Is there a review mechanism for pushing datasets through DVC?

Hi,

Usually when we are writing code, we work on a branch and then raise a merge request to merge it into the main branch. A reviewer then reviews the code and approves the merge request.

I am wondering if there is any approval/review mechanism before we do a “dvc push”, so that the reviewer has an opportunity to catch dataset errors before the dataset is uploaded to the remote storage?

For instance, I am following this tutorial: https://dvc.org/doc/use-cases/sharing-data-and-model-files

Here, I would like a reviewer to review and approve the dataset before the “dvc push” command is run and the dataset is uploaded to the remote storage.

If not, what procedures do you guys follow to check a dataset before it is uploaded?

Hi @subhankar, this functionality is not implemented in DVC.

But it would be possible to set something like this up yourself in a custom CI workflow. Here’s one potential example which could work when you are using DVC pipelines (dvc repro) to generate datasets:

What you could do is configure your default DVC remote credentials with a user account that does not have write permissions for your remote storage (so that your regular users’ dvc push would always fail).

For your CI environment, you would configure a separate remote account that does have the appropriate permissions to dvc push.
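
As a rough sketch of how the write-capable account could be kept out of regular users' hands, the Python helper below builds `dvc remote modify --local` commands from environment variables that only the CI environment would define. Everything specific here is an assumption for illustration: an S3-backed remote named `storage`, and the hypothetical variable names `CI_DVC_ACCESS_KEY_ID` and `CI_DVC_SECRET_ACCESS_KEY`. Other storage backends use different option names.

```python
import os


def ci_credential_commands(remote="storage", env=None):
    """Return dvc commands that store write credentials in .dvc/config.local.

    Assumptions (not from the thread): the remote is S3-backed, it is named
    "storage", and the CI system exposes the two CI_DVC_* variables below.
    """
    env = os.environ if env is None else env
    # Map each S3 remote option to the (hypothetical) CI variable holding it.
    required = {
        "access_key_id": "CI_DVC_ACCESS_KEY_ID",
        "secret_access_key": "CI_DVC_SECRET_ACCESS_KEY",
    }
    missing = [var for var in required.values() if not env.get(var)]
    if missing:
        # Fail early instead of attempting a push that cannot succeed.
        raise RuntimeError("missing CI credentials: " + ", ".join(missing))
    # --local keeps the values in .dvc/config.local, which is not committed to Git.
    return [
        ["dvc", "remote", "modify", "--local", remote, option, env[var]]
        for option, var in required.items()
    ]
```

The returned lists could be passed to `subprocess.run` at the start of the CI job. On a developer machine, where the variables are not defined, the helper raises instead of configuring anything.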

You could then set up a CI workflow that automatically runs any time a new commit is made to your master branch (so it would be triggered whenever a pull request is merged into master). This CI workflow would re-run dvc repro for that latest master commit, and then dvc push the results (from the CI account with write permissions for your remote).
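
The CI step described above might look roughly like the following Python sketch, which runs `dvc repro` and then `dvc push` in order. The remote name `storage` and the function name `publish_dataset` are hypothetical, and how the job is triggered on merges to master depends on your CI system and is not shown.

```python
import subprocess


def publish_dataset(remote="storage", run=subprocess.run):
    """Regenerate pipeline outputs, then upload them to the DVC remote.

    `remote` is a hypothetical remote name. `run` is injectable so the
    command sequence can be exercised without DVC installed.
    """
    commands = [
        ["dvc", "repro"],                   # rebuild outputs for the merged commit
        ["dvc", "push", "--remote", remote],  # upload using the CI account
    ]
    for cmd in commands:
        # check=True makes subprocess.run raise on a non-zero exit code,
        # so a failed repro stops the job before anything is pushed.
        run(cmd, check=True)
    return commands
```

A CI job that runs on each new master commit could call `publish_dataset()` once the write credentials are in place. Because the push only happens after the merge request has been approved and merged, the code review doubles as the dataset review.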
