Basic DVC workflow

I’m a new DVC user and I have a question about the basic model management workflow.

I’ve configured a repo for DVC and ran a test experiment on a feature branch. I ran a sweep and selected the best model, i.e.

dvc exp run -S 'train.batch_size=16,32,64,128' --queue
dvc queue start
dvc exp apply ex1
git add .
git commit -m 'My Experiment'

I have a question about how to merge this back to main correctly. The process I’ve been following is:

  • switch to main
  • merge the feature branch to main
  • dvc pull
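
For reference, the three steps above could be written out roughly as follows. This is only a sketch: the `run` helper, the `DRY_RUN` variable, the function name and the branch argument are hypothetical, added so the sequence can be previewed without touching the repo.

```shell
# Hypothetical helper: print each command, and execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Merge a feature branch into main, then sync the DVC-tracked files.
# $1 is the name of the feature branch (a placeholder, not from the post).
merge_experiment_branch() {
  run git switch main &&
  run git merge "$1" &&
  run dvc pull
}
```

With `DRY_RUN=1` set, calling `merge_experiment_branch my-feature` only prints the commands. Note that `dvc pull` downloads from the remote and then checks the files out; if the model files are already in the local cache, `dvc checkout` on its own should be enough to bring the workspace in line with the merged `dvc.lock`.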

The last step seems to be important: if I don’t do that, the VS Code Source Control sidebar shows uncommitted DVC-tracked model files. Is this the correct workflow? (I’m about to learn GTO, but I wanted to make sure I can correctly manage the state of my main branch first.)

Hey, @david.waterworth !

TL;DR: I think this link is worth checking: Example: Make an experiment persistent

A few comments on the workflow and suggestions to try:

  • I don’t think you need a branch for this. If you want the experiment to end up in the main branch (e.g. as a new commit resulting from the sweep), I would just run the experiments in main. Experiments run from the queue don’t affect the workspace.
  • When you are done, you can use dvc exp apply (followed by a Git commit) to “materialize” an experiment as a commit in main, or dvc exp branch to turn it into a new branch.
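
As a rough sketch of the two options above, assuming the experiment is called ex1 as in the question: the `run` helper, the `DRY_RUN` variable and the function names are hypothetical, and the commit message and branch name are placeholders.

```shell
# Hypothetical helper: print each command, and execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Queue the sweep while on main and start the queue workers.
# The workers run in the background; `dvc queue status` shows progress.
sweep_on_main() {
  run git switch main &&
  run dvc exp run -S 'train.batch_size=16,32,64,128' --queue &&
  run dvc queue start
}

# Option A: bring the chosen experiment into the workspace and commit it on main.
# $1 = experiment name, $2 = commit message.
keep_as_commit() {
  run dvc exp apply "$1" &&
  run git add . &&
  run git commit -m "$2"
}

# Option B: turn the chosen experiment into a new Git branch instead.
# $1 = experiment name, $2 = new branch name.
keep_as_branch() {
  run dvc exp branch "$1" "$2"
}
```

For example, `keep_as_commit ex1 'My Experiment'` mirrors the commands from the question without the feature branch, while `keep_as_branch ex1 my-experiment` leaves main untouched and creates a branch you can merge or review later.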

In VS Code, you can use the context menu to apply an experiment or create a branch from it.

Let us know if that solves the issue.