dvc exp show: experiment not showing / wrong position

Hi there,
I have been trying out DVC for a few days now and have a question regarding dvc exp show. My setup is:

  • Have a git <branch 1> with a head commit <commit 1>
  • Run an experiment on a remote machine (checkout <commit 1>, dvc pull, dvc exp run)
  • Create a new branch (dvc exp branch) which creates <branch 2> and <commit 2> in it
  • I push the experiment (dvc exp push)
  • On my local machine, I fetch the experiment (dvc exp pull)
    (* In the meantime, I have more commits on <branch 1>)

Then, dvc exp show behaves unexpectedly:

  • If I run dvc exp show -a, it shows <branch 1> and <branch 2>, but no experiment
  • If I run dvc exp show -A, it shows the experiment under <commit 1>

Expected behavior:

  • If I run dvc exp show -a, it shows the experiment under <branch 2>

Hope I got this across. Am I doing something wrong here?

Hi again,

so after stepping through the dvc code to see what is happening, I realized I must have misunderstood the concept of experiments.

What I would like to have is an overview of all my experiments, grouped by the branches they are in and sorted by metrics, parameters or whatever. Apparently, dvc exp show is not the right tool for that. Does someone have a suggestion for how I can achieve what I want?

I am also very open to suggestions on what a better workflow would look like. My “special requirement” is that I need to run experiments on another machine inside a docker container (which I tried to do by running inside the container: git clone, dvc exp run, [dvc exp branch], dvc exp push, git push).
This requirement brings another one: all experiments must be pushed to git automatically after the experiment run, and I need to somehow remove unwanted experiments later.

I tried to solve this by having each experiment create its own branch, then either merging the branch into my main or feature branch afterwards or deleting the branch. If I could then run dvc exp show on all experiments for some branch, I’d be fine. But this does not seem to be the dvc way.

Hi @gugar ,

And what does plain dvc exp show show if you git checkout to branch 2?

DVC experiments are always associated with parent commits (and not branches). dvc exp show -a/--all-branches shows your current branch heads, and experiments associated with those branch head commits. So in your scenario, your experiment will always be associated with commit 1, and not with branch 1 (unless the tip of branch 1 is currently pointing to commit 1).

Using dvc exp branch just creates a new git branch that contains the contents of the experiment. It does not move/re-associate the experiment itself to be under the new git branch. If you are using dvc exp branch, the expectation is that afterwards you will just use that git branch (instead of using the experiment ref).

This is why you get:

  • If I run dvc exp show -a, it shows <branch 1> and <branch 2>, but no experiment
  • If I run dvc exp show -A, it shows the experiment under <commit 1>

branch 1’s HEAD has moved because of “In the meantime, I have more commits on <branch 1>”, so exp show -a does not include commit 1 in the table. branch 2 is an entirely new git branch, and is no longer considered to be a child experiment.
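
To make the commit association concrete, here is a small git-only simulation (it does not call DVC at all). The ref name refs/exps/demo/exp-a and the demo identity are made up for illustration and are not DVC's real ref layout; the point is only that a ref whose commit has commit 1 as its parent stays attached to commit 1 while the branch head moves on.

```shell
# Git-only sketch; refs/exps/demo/exp-a is a hypothetical name, not DVC's real layout.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/branch-1

git commit -q --allow-empty -m "commit 1"
commit_1=$(git rev-parse HEAD)

# Stand-in for an experiment: a commit whose parent is commit 1,
# reachable only through a ref outside refs/heads.
exp_commit=$(git commit-tree -m "exp-a" -p "$commit_1" "$(git rev-parse "$commit_1^{tree}")")
git update-ref refs/exps/demo/exp-a "$exp_commit"

# "In the meantime, I have more commits on branch 1"
git commit -q --allow-empty -m "later work"
branch_head=$(git rev-parse branch-1)
exp_base=$(git rev-parse "refs/exps/demo/exp-a^")

echo "commit 1:      $commit_1"
echo "branch-1 head: $branch_head"
echo "exp baseline:  $exp_base"
```

The experiment's baseline still equals commit 1, while the branch-1 head is a different commit, which matches why -a (branch heads only) misses the experiment and -A (all commits) finds it.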

Hi @kupruser, @pmrowla,
Thanks for your replies. @pmrowla thanks for your detailed explanation, which would be great to have in the dvc documentation! @kupruser it only shows workspace and branch 2, but no experiments under branch 2 which seems to be the intended behavior after reading @pmrowla 's answer.

Now I see that my approach is not supported by dvc, or at least not as straightforward as I was thinking. To get to a solution of the issue: Should I change the way I organize experiments and if so, how? Or is my approach ok but I am not using the dvc commands correctly?

Do you actually want to sort or group by branches? Could you provide more info about how you want to group the experiments? Like you suggested, it seems like there is simply a misunderstanding or disconnect about expected workflows.

Some possible workflows depending on what you want:

  • Leave your setup as is. Once you do dvc exp branch, the result in dvc exp show -a for branch 2 is the exact same as the experiment you see under commit 1. You could create as many branches as you want like this, but it could get messy to manage that many branches.
  • Remove the dvc exp branch step. You can push experiments as is (and pull them locally) and see them all with dvc exp show without needing to create new branches or commits. This is a more typical workflow, in which case you could later use dvc exp branch (or dvc exp apply) to bring the best experiment into your regular Git workflow.
  • Check out a different branch before running the experiment so that the experiment is based on the tip of that branch. This could be useful if you actually need to group bunches of experiments by branches.
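
As a rough sketch of the second option (no dvc exp branch step), the two sides could look like the helpers below. The function names, the default remote name "origin", and the experiment name are placeholders; the sketch assumes the repository has already been cloned and dvc pull has been run on the remote machine.

```shell
# Sketch only: function names and the "origin" default are placeholders.

# On the remote machine (e.g. inside the container): run and publish one experiment.
run_and_push_exp() {
    name=$1
    remote=${2:-origin}
    dvc exp run -n "$name" &&
    dvc exp push "$remote" "$name"
}

# On the local machine: fetch that experiment and display the table.
fetch_and_show_exp() {
    name=$1
    remote=${2:-origin}
    dvc exp pull "$remote" "$name" &&
    dvc exp show
}
```

For example, run_and_push_exp exp-a on the remote machine followed by fetch_and_show_exp exp-a locally. Note that plain dvc exp show only lists experiments derived from the current commit, so the local checkout needs to be at the commit the experiment was based on (or use -A).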

Hi @dberenbaum,
thanks for your answer! I tried all your suggestions, but still experiments do not show up in dvc exp show. I can only see my HEAD commit there but without experiment name. When I use -A I can see all commits with their results but then it is pretty messy and I cannot sort by any of the metrics.

After trying out a lot of stuff, I think my major misconception is that dvc experiments are meant to be multiple experiments per git commit and sorting only those. Is that correct? My use case is that I have at max 1 experiment per git commit and I want to sort them e.g. per branch.

So I could of course write my own dvc exp show replacement, but I am asking myself: Am I using dvc wrong?

I think my major misconception is that dvc experiments are meant to be multiple experiments per git commit and sorting only those.

Right, makes sense. Right now, the table only sorts within a commit, but you want to sort across all commits or branches, right? Please feel free to open a feature request on GitHub, or I can copy the relevant points there if you would prefer.

My use case is that I have at max 1 experiment per git commit and I want to sort them e.g. per branch.

Do you mind explaining why you want to have 1 experiment per commit? What differs between each commit? This sort of defeats the point of dvc experiments, which is to keep track of those differences for you so that you don’t need so many commits/branches.

If you have long gaps with lots of commits between every experiment, and you don’t think you have any use for running multiple experiments based on a single commit, then I think your workflow is fine, although you could probably accomplish the same without the dvc exp commands (other than dvc exp show).

Do you mind explaining why you want to have 1 experiment per commit? What differs between each commit? This sort of defeats the point of dvc experiments, which is to keep track of those differences for you so that you don’t need so many commits/branches.

I am doing AI research and mostly the difference between experiments is in code. So I want to change code, then of course commit with git and run an experiment on top of it. Of course, sometimes I also just change parameters, where experiments would come in handy. But (perhaps that’s a personal thing) I prefer to have a completely defined state within git when running an experiment, therefore, I would rather update and commit my params.yaml file than running an experiment with parameters in the arguments.

[…] although you could probably accomplish the same without the dvc exp commands […]

Yes, that’s what I am thinking as well. Just use dvc repro. TBH, the exp stuff of dvc is adding a whole bunch of complexity to the workflow for a (at least for me) quite limited benefit. I think working with experiment names and hashes in addition to all the git stuff introduces many new ways to make errors.

Btw, so I don’t just grumble here: I love all the rest of DVC so far :) Great tool!

I am doing AI research and mostly the difference between experiments is in code. So I want to change code, then of course commit with git and run an experiment on top of it.

One thing to note here would be that DVC experiments will include uncommitted code changes (they aren’t meant to only support parameter changes). So rather than “commit with git and then run an experiment on top of it”, you could just make your code changes and then do dvc exp run (without modified parameters and without git commit’ing anything), to see what you would get before committing those code changes (with the results and those code changes stored in the DVC experiment).

You can also retrieve your changes from an experiment using dvc exp apply. So you can do things like:

# edit some code
$ vim src/file.py
$ dvc exp run -n exp-a
# edit code again (to test some alternate version of the changes)
$ vim src/file.py
$ dvc exp run -n exp-b
# compare the results of code changes in A vs code changes in B
$ dvc exp show
# decide that I want to keep the changes+results from A and discard everything else
$ git reset --hard
$ dvc exp apply exp-a
$ git add .
$ git commit -m 'this commit contains the code changes and result of A'

I see, thank you for pointing that out. However, because I have to run my experiments on specific remote machines, this workflow will not work for me. I have to commit and push any changes so that the remote machine can pick up the desired state and run from there.
Also, by doing so, I will use a mix of git and dvc for code version control. This feels very error prone to me.

Also, by doing so, I will use a mix of git and dvc for code version control. This feels very error prone to me.

Your explanation about why this workflow doesn’t fit your use case makes sense, but just to clarify this, internally DVC experiments are just git commits, so for all intents and purposes you would still only be using git for code version control.

Named experiments are just git refs (like branches or tags) stored in a separate refs/exps namespace, and you can use experiment refs or SHAs in git commands like any other commit SHA or git ref. You can read more about the implementation here: Git Custom References for ML Experiments
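
Since these are plain git refs, they can be inspected with ordinary git commands. A minimal sketch (the helper name list_exp_refs is made up here) that prints every ref under refs/exps together with its commit and that commit's parent:

```shell
# Hypothetical helper: list refs under refs/exps with their commit and parent commit.
list_exp_refs() {
    git for-each-ref \
        --format='%(refname) exp=%(objectname) parent=%(parent)' \
        refs/exps
}
```

Any SHA printed this way can be passed to commands such as git show or git diff like any other commit. The exact set of refs DVC keeps under refs/exps may include internal ones in addition to named experiments.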

Hello @dberenbaum @pmrowla,

I have been experimenting with DVC (version 2.18.1) for the past week and am trying to establish a workflow for our ML experiments. I am trying to use a workflow where experiments are grouped in separate branches, and I am using the data and project structure provided by this tutorial: Data Version Control With Python and DVC – Real Python. I forked the repo, which you can see here: GitHub - ChristianP45/real-python-dvc.

So for example, I would like to train a simple SGD classifier, so I create a branch called sgd so that I can group all SGD-related experiments in that branch. I ran 2 experiments using a different number of iterations for each (i.e. baseline and 500-iters). It all looks good up until the point where I actually commit and push the changes (FYI, I installed the DVC git hooks).

As you can see, the experiments then appear under master branch after committing and pushing the changes for some reason? And I cannot figure out why. Is this expected? Am I missing something? It would be really helpful if you guys could assist, as I am running out of ideas.

Thanks in advance :)

Cheers,
Chris

Hi @chrisP45. This is not expected; it appears to be a bug in dvc exp show -a. I will take a look.

As you can see, the experiments then appear under master branch after committing and pushing the changes for some reason? And I cannot figure out why.

I took a look.
This only happens when the sgd branch doesn’t contain any commits of its own, so it points to the same commit as master (let’s call it master-commit-0).

When you run the experiments in the sgd branch, they are actually derived from master-commit-0.

So, when you run git add and git commit, you are introducing a new commit in the sgd branch, and that’s why the default exp show doesn’t show the experiments. By default, it only shows experiments derived from the current commit (you could see the experiments if you ran dvc exp show -n 2).

However, dvc exp show -a looks at all branches and finds that master points to master-commit-0, from which the 2 experiments are derived.

You can use dvc exp gc -w (or other flags) to clean up the dvc experiments.

I think the problem here is that git add and git commit are used as an alternative to the commands we provide for persisting experiments, and they don’t take care of cleaning up experiments, which causes the confusion.
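
A minimal sketch of that cleanup step, assuming you still want to commit by hand (the helper name is made up for illustration): commit the workspace, then run dvc exp gc -w so that experiments not derived from the new workspace commit are removed.

```shell
# Hypothetical helper: commit manually, then garbage-collect experiments
# that are no longer derived from the current workspace.
commit_and_clean_exps() {
    msg=$1
    git add . &&
    git commit -m "$msg" &&
    dvc exp gc -w
}
```

Before cleaning, dvc exp show -n 2 can be used to check which experiments still hang off the previous commit. dvc exp gc asks for confirmation before deleting anything, so this is meant to be run interactively.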

@chrisP45 does this answer your question?