Getting started with DVC - developer vs CI environments

Hi,

I’m getting started with DVC, and after a couple of days of wrestling with it I’ve managed to adapt the “Get Started” instructions to use S3, document how other developers on my team can use DVC, write scripts that run dvc pull in a Bitbucket Pipeline, and generally get DVC to do what is promised in our specific situation.

Is there a place to document or share these learnings?

  • What you’re going to need to push and pull DVC-managed files to an S3 bucket
  • How to refer to ~/.aws/config and ~/.aws/credentials given that DVC does not support the expansion of ~ or $HOME
  • When to use “dvc config --local” vs “dvc config --project”
  • How to switch between .dvc/config settings and environment variables in a CI/CD pipeline
  • How to add the hooks for DVC to your repository’s .pre-commit-config.yaml file (specifically, how to not freak out that the DVC hook definitions are “upside down” with respect to repo and rev). See the order of “hooks”, “repo”, and “rev” in dvc/.pre-commit-config.yaml on the main branch of iterative/dvc on GitHub.
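As an illustration of the CI side of the list above, here is a minimal sketch of a Bitbucket Pipelines step that runs dvc pull. It assumes the S3 remote URL is committed in .dvc/config (for example, added with dvc remote add -d), and that per-developer settings such as profile or credentialpath were written with --local, so they land in the git-ignored .dvc/config.local rather than the shared config. Under those assumptions the runner never sees the developer-specific settings and falls back to AWS credentials from environment variables. The image tag and the repository variable names are assumptions, not something confirmed in this thread.

```yaml
# bitbucket-pipelines.yml (sketch; image tag is an assumption)
image: python:3.11

pipelines:
  default:
    - step:
        name: Pull DVC-tracked data from S3
        script:
          # Assumed: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION
          # are defined as secured repository variables. Bitbucket exposes them as
          # environment variables, and the AWS SDK used by DVC's S3 support reads
          # them from there, so no ~/.aws files are needed on the runner.
          - pip install "dvc[s3]"
          - dvc pull
```

On developer machines the same .dvc/config is used, and each developer's own ~/.aws files or --local settings supply the credentials instead.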

Thanks

bump bumpy, bumpety, bumper, bump.

Oops, sorry @drjasonharrison.
By a place to document and share, do you mean something where one could put tutorial-style docs explaining particular use cases?

We don’t have a place like this, though one could consider adding to the examples in our docs (example) if you feel they are missing some use cases. I would recommend opening an issue or proposing a PR in our docs project.

It seems to me that the answers to most of these questions can be found in the docs, except probably the last one, which should be covered by the pre-commit docs.
I am not sure I understand what you mean by “upside down”.


@Paffciu by “upside down” I mean that the keys in the pre-commit hook definitions for DVC are in the opposite order from most pre-commit hook definitions I have seen.

For example, isort hooks from https://github.com/pre-commit/mirrors-isort:

```yaml
-   repo: https://github.com/pre-commit/mirrors-isort
    rev: ''  # Use the revision sha / tag you want to point at
    hooks:
    -   id: isort
```

The order is: repo, rev, hooks, id.

However, in dvc/.pre-commit-config.yaml:

```yaml
  - hooks:
      - id: isort
        language_version: python3
    repo: https://github.com/timothycrosley/isort
    rev: 5.10.1
```

The order is: hooks, id, repo, rev.

Maybe this difference in ordering reflects a desire to highlight the hooks rather than the repo or rev. But it could also be lexical sorting of the keys, which seems more likely when you look at the DVC-specific hooks, whose keys are ordered: args, entry, id, language, name, stage, require_serial…

The key ordering in .pre-commit-config.yaml doesn’t matter; it’s just a YAML mapping. You can define those keys in whatever order you want, and you don’t have to follow any particular convention to use the DVC hooks. If you prefer writing the repo key first, you can. The pre-commit tool will work either way.
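For illustration, a minimal sketch of a .pre-commit-config.yaml that uses the DVC hooks with the conventional repo, rev, hooks order. The hook ids are assumed to match those defined in DVC’s .pre-commit-hooks.yaml, the rev value is a placeholder to pin to the DVC release you actually use, and the '.[s3]' extra is an assumption for an S3 remote. Reordering repo, rev and hooks would not change how pre-commit reads the file.

```yaml
# .pre-commit-config.yaml (sketch)
# Also install the pre-push and post-checkout git hooks on a plain `pre-commit install`.
default_install_hook_types: [pre-commit, pre-push, post-checkout]

repos:
  - repo: https://github.com/iterative/dvc
    rev: 3.0.0  # placeholder tag: pin to the DVC release you actually use
    hooks:
      - id: dvc-pre-commit
      - id: dvc-pre-push
        # Assumption: the push hook needs DVC's S3 extra in its isolated environment.
        additional_dependencies: ['.[s3]']
      - id: dvc-post-checkout
```

default_install_hook_types needs a reasonably recent pre-commit; with older versions, pre-commit install --hook-type pre-push --hook-type post-checkout installs the extra git hooks instead.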

Also, the iterative/dvc repo’s .pre-commit-config.yaml is specifically for configuring a development environment for contributing code to DVC itself. That file is not relevant to setting up a DVC repository (see https://dvc.org/doc/user-guide/contributing/core#development-environment).

(The actual DVC hook definitions come from .pre-commit-hooks.yaml: https://github.com/iterative/dvc/blob/main/.pre-commit-hooks.yaml.)

Correct, the key ordering doesn’t matter. However, there is an ordering that you will typically find in pre-commit-hooks.yaml files, documentation, etc.

All I am trying to note is that the key order in https://github.com/iterative/dvc/blob/main/.pre-commit-hooks.yaml is not the typical ordering. As a new DVC user, I didn’t find a difference like this to be a show-stopper, but I noticed it and wanted to give feedback on it. I guess a PR would be the next step.
