Fill-back metrics

jonilaserson · July 14, 2020, 2:35pm

Say I am maintaining some data using DVC, and at some point decide I want to have a metric showing some data statistics (e.g. how many positive samples I have). Can I back-fill this metric to previous commits? (The goal is to track the number of positives I had after each data commit.)

If the commit history is A->B->C, I know I can go back to any commit and run a pipeline, but the metric output of this pipeline will need to be saved in a different commit, right? So I will have to create new git commits, A->A’, B->B’, C->C’, that will store the metric results, or is there a different way?

jorgeorpinel · July 15, 2020, 4:43pm

Very interesting question @jonilaserson !

I think it’s fundamentally a Git question, as versioning is done entirely with Git. (DVC does the tracking of data via placeholders in small dvc.lock and .dvc files.)

> So I will have to create new git commits: A->A’, B->B’, C->C’ that will store the metric results, or is there a different way?

Correct. I believe this can be achieved via rebase, but that may mean rewriting the commit history, which is sometimes frowned upon.

Please feel free to open a feature request for DVC to support this use case directly though!

jonilaserson · July 15, 2020, 6:49pm

I actually meant to add the three branches like this:
A->A’
|
v
B->B’
|
v
C->C’
And then I won’t need rebase, but I will still need a way to know I should collect the metrics in these “detached” commits.

jorgeorpinel · July 15, 2020, 7:56pm

So C’ is where you add the metrics to C, and then you cherry-pick that commit separately onto B and onto A, creating your 2 extra branches.
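
A minimal sketch of this back-fill with plain Git follows. It departs slightly from a pure cherry-pick: since the metric value depends on each commit's data, it re-runs a metric command on every side branch instead of copying C’'s result. The function name `backfill_metrics`, the `metrics/<short-hash>` branch naming, and the `metrics.json` file name are assumptions made for illustration, not something DVC prescribes.

```shell
#!/bin/sh
# Sketch: create one side branch per historical commit (A', B', C')
# and commit a freshly computed metrics.json on each of them.

backfill_metrics() {
    # $1: shell command that writes metrics.json for the checked-out data
    # remaining arguments: the commits to back-fill (e.g. A B C)
    compute_cmd=$1
    shift
    start=$(git rev-parse --abbrev-ref HEAD)   # remember where we started
    for rev in "$@"; do
        short=$(git rev-parse --short "$rev")
        # Side branch that starts at the historical commit
        git checkout --quiet -b "metrics/$short" "$rev"
        # In a DVC project you would also run `dvc checkout` here so the
        # workspace data matches this commit before computing the metric.
        sh -c "$compute_cmd"
        git add metrics.json
        git commit --quiet -m "Back-fill metrics for $short"
    done
    git checkout --quiet "$start"              # return to the starting branch
}
```

It could be called as `backfill_metrics 'grep -c 1 labels.txt > metrics.json' A B C`, where the counting command and `labels.txt` are placeholders for whatever computes your statistic. The original history A->B->C is left untouched, so no rebase is needed.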

> still need a way to know I should collect the metrics in these “detached” commits

If the metrics are data series (plots), you can use dvc plots diff A' B' C'. See plots diff. But this feature isn’t available for plain metrics yet — I just opened an issue for that.
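
Until something like that exists for plain metrics, one possible workaround is to read the metrics file straight out of each side branch with `git show`. This is only a sketch: `collect_metrics` is a hypothetical helper, and it assumes each revision stores its metrics in a file named `metrics.json` at the repository root.

```shell
#!/bin/sh
# Sketch: print the metrics file of several revisions side by side.

collect_metrics() {
    # Arguments: revisions (branches, tags or hashes), e.g. A' B' C'
    for rev in "$@"; do
        # `git show <rev>:<path>` prints the file as stored in that revision
        # without touching the working tree; newlines are flattened so each
        # revision occupies one output line.
        printf '%s\t%s\n' "$rev" "$(git show "$rev:metrics.json" | tr -d '\n')"
    done
}
```

Running `collect_metrics` with the names of the side branches prints one tab-separated line per revision, which can then be pasted into a spreadsheet or piped to another tool.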
