Is it possible to change branches while running repro?

In our project we sometimes have long repros (e.g., with hyperparameter optimization) that can take hours or days to finish. While such a repro is running, I can’t change branches because:

  • The repro reads files as the stages are being executed, so changing branches mid-repro could make subsequent stages use file versions from a different branch;
  • We have commit hooks that use dvc, which complains about the lockfile if I try to change branches during a repro.

Right now I see two solutions to be able to continue working while a repro is running:

  1. Run the repro on another machine (e.g., in the cloud).
  2. Create an entire copy of my repo in another folder, and work there.

Solution 1 is bad because I have to waste resources to keep the cloud instance running, and because running it remotely is cumbersome: I don’t currently have a system to notify me if the repro fails, so I have to keep checking, and I don’t have the same tools on the remote machine to investigate outputs, monitor resource usage, etc.

Solution 2 is bad because I need twice as much data stored on my computer (which I usually don’t have enough space for, since it’s a big project), and because more than once I ended up working in the wrong copy of the repo, messing up the repro and having to move my changes to the other copy.

Are there any recommended approaches here? This seems like a common problem that most users with long-running repros would face.

One solution would be to do 2 and set up a shared cache locally so the storage isn’t duplicated; see the DVC guide “How to Share a Cache Among Projects”.
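
As a rough sketch of what that could look like for two clones on the same machine (the cache location and the `use_shared_cache` helper are made up for illustration, and `symlink` is just one possible link type):

```shell
# Hypothetical location for the shared cache; adjust to your machine.
SHARED_CACHE="$HOME/dvc-shared-cache"

# Point one clone at the shared cache. Call it once per clone.
use_shared_cache() {
    repo_dir="$1"
    mkdir -p "$SHARED_CACHE"
    (
        cd "$repo_dir" || exit 1
        # --local writes to .dvc/config.local, which Git does not track
        dvc cache dir --local "$SHARED_CACHE"
        # Link workspace files to the cache instead of copying them
        dvc config --local cache.type symlink
    )
}
```

You would call it as `use_shared_cache path/to/clone` for each copy of the repo. Data already sitting in a clone’s own `.dvc/cache` would still have to be moved into the shared directory by hand.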

Another option would be to treat the pipeline as an experiment and use dvc exp run --temp or dvc exp run --queue to run it in a temporary directory.
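
For the `--queue` variant, a minimal sketch could look like the following. The function names are only for illustration, and it assumes a DVC release that ships the `dvc queue` commands; check `dvc queue --help` on your version.

```shell
# Queue the pipeline as an experiment instead of running it right away.
queue_long_repro() {
    dvc exp run --queue
}

# Start a background worker that runs the queued experiments
# (assumes the `dvc queue` command group is available).
start_queue() {
    dvc queue start
}

# Show which queued tasks are waiting, running, or finished.
check_queue() {
    dvc queue status
}
```

Queued experiments run outside the workspace, so the checked-out branch should be free to change once the worker has started.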

Using dvc experiments sounds like a really nice solution, thank you for the input! Would you mind elaborating a bit more on how the workflow would be? I’m guessing something like:

  1. dvc exp run --temp, leave the long-running repro/exp running
  2. git checkout other_branch, work in another branch
  3. ?

Maybe step 3, when the repro is done, is running dvc exp apply? Maybe git checkout the previous branch first?

For anyone stumbling across my question, this is the workflow I ended up with:

  1. dvc exp run --temp [-n experiment_name], leave the long-running repro/exp running.
  2. Change branches, work on other things.
  3. Once the repro/experiment finishes, git checkout the branch you ran the experiment in.
  4. dvc exp apply <experiment_name>. A name is chosen for you if you don’t pass -n to dvc exp run; you can see which experiments you have with dvc exp list, which is branch-specific. This applies all the changes you had made for the long-running repro, plus all of its outputs, to your workspace.
  5. git commit your repro results.
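
The steps above can be sketched as two shell functions. The branch name, experiment name, and function names are placeholders I made up, not anything DVC requires:

```shell
EXP_NAME="long-hyperopt"   # hypothetical experiment name
RUN_BRANCH="tuning"        # hypothetical branch the experiment is started from

# Step 1: this blocks until the pipeline finishes, so run it in its own
# terminal and do the other work (step 2) elsewhere.
start_experiment() {
    git checkout "$RUN_BRANCH" &&
    dvc exp run --temp -n "$EXP_NAME"
}

# Steps 3 and 4: go back to the starting branch and bring the results
# into the workspace, then show what changed.
collect_results() {
    git checkout "$RUN_BRANCH" &&
    dvc exp list &&
    dvc exp apply "$EXP_NAME" &&
    git status
}
```

Step 5 is left out on purpose: after `collect_results`, review the `git status` output and commit whatever belongs to the repro.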

Caveats to have in mind:

  • If your repro depends on files specified via global (absolute) paths, you should not touch those files while the repro is running (I believe?)

@dberenbaum Thank you for the suggestion, it works really well. Can you confirm the caveat I mentioned and let me know if you’re aware of any others?


Yes, that all looks correct @julianotusi!