Hello,
What is the best way to programmatically access experiments? I want to write a script that goes through all of the experiments on a particular ref (e.g. HEAD) and collects some data from the files there. The files could be Git- or DVC-tracked. My pipeline produces a lot of output, more than I would want to put in a metrics JSON file. It seems like DVCFileSystem might do this? Would I just need to get a list of all the experiment revs, then loop over them and create a DVCFileSystem for each one?
Thanks,
Greg
Hey Greg,
Currently there is no experiment-specific API like that, but if all you need is a filesystem-like interface to experiments, then yes, creating a DVCFileSystem for each rev would do the trick. Please let us know how it goes or if you run into any problems.
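A minimal sketch of the "one filesystem per rev" approach, assuming each filesystem object supports an fsspec-style `find(path, detail=False)` call (DVCFileSystem is fsspec-compatible and lists both Git- and DVC-tracked files). The helper name `files_per_rev`, the `make_fs` factory parameter and the `suffix` filter are hypothetical, chosen so the loop can be exercised without a real repository.

```python
from typing import Any, Callable, Dict, Iterable, List


def files_per_rev(
    revs: Iterable[str],
    make_fs: Callable[[str], Any],
    root: str = "/",
    suffix: str = "",
) -> Dict[str, List[str]]:
    """Hypothetical helper: list the files under `root` for every revision.

    `make_fs(rev)` is assumed to return an fsspec-style filesystem for that
    revision, e.g. `lambda rev: DVCFileSystem(rev=rev)`.
    """
    result: Dict[str, List[str]] = {}
    for rev in revs:
        fs = make_fs(rev)  # a separate filesystem for each revision
        paths = fs.find(root, detail=False)  # all file paths under root
        # keep only paths with the requested suffix ("" keeps everything)
        result[rev] = sorted(p for p in paths if p.endswith(suffix))
    return result
```

With DVC installed, the factory would be `lambda rev: DVCFileSystem(rev=rev)` and `revs` would be the experiment names you want to inspect. Passing a factory instead of constructing DVCFileSystem inside the loop keeps the iteration logic separate from DVC, so it can be tested with a stand-in object.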
Thanks for the advice.
This is what I ended up doing:
```python
from dvc.api import DVCFileSystem
from dvc.repo import Repo

repo = Repo(".")
# determine the current branch
branch = repo.scm.active_branch()
# get the experiments based on this branch's current commit
experiments = repo.experiments.ls()[branch]
for exp_name in experiments:
    print(f"collecting data for {exp_name}")
    # make a DVC filesystem for this experiment
    fs = DVCFileSystem(rev=exp_name)
```
Then, inside the loop, I access the files through the DVC filesystem, e.g. with pandas.read_csv.
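Continuing the loop above, reading the same output file from every experiment could look like the sketch below. It assumes the file is a CSV at a hypothetical `path` and that the filesystem's `open(path, mode="r")` returns a text handle, as fsspec-style filesystems do. The helper name `read_csv_per_experiment` is also hypothetical. It uses the standard `csv` module so it runs without pandas, but `pandas.read_csv` accepts the same file handle.

```python
import csv
from typing import Any, Callable, Dict, Iterable, List


def read_csv_per_experiment(
    exp_names: Iterable[str],
    make_fs: Callable[[str], Any],
    path: str,
) -> Dict[str, List[Dict[str, str]]]:
    """Hypothetical helper: read the same CSV file from every experiment.

    `make_fs(name)` is assumed to return a filesystem whose `open(path, mode)`
    behaves like fsspec's, e.g. `lambda name: DVCFileSystem(rev=name)`.
    """
    rows_by_exp: Dict[str, List[Dict[str, str]]] = {}
    for name in exp_names:
        fs = make_fs(name)
        # open in text mode so csv can parse it; pandas.read_csv(f) would also work here
        with fs.open(path, mode="r") as f:
            rows_by_exp[name] = list(csv.DictReader(f))
    return rows_by_exp
```

Each experiment's rows end up under its name, so results from different experiments can be compared afterwards without re-opening any files.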