How to deal with multiple metric files that may define homonymous metrics?

julianotusi · December 21, 2022, 4:15pm

I have a pipeline with multiple independent evaluation stages. Each one generates its own .yaml file with the computed metrics inside.

I noticed that if any of these files defines a metric that shares its name with a metric in another metrics file, dvc will name the metric <filename>:<metric_name> to avoid ambiguities. Although better than being ambiguous, this is not optimal for me, as some of the metrics will be named simply <metric_name> (as in the corresponding .yaml file, if no other metrics file defines a metric with the same name) and others <filename>:<metric_name> (if there were conflicts). This makes it hard to regex for a specific metric when running e.g. dvc exp show, as you have to know in advance whether there were conflicts to know the name of the metric.
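To make the naming rule described above concrete, here is a minimal Python sketch. It only models the behavior as described in this post and is not DVC's actual implementation; the function name display_names and the example file names are hypothetical.

```python
from collections import Counter


def display_names(files):
    """Model the naming rule described above (not DVC's real code).

    files maps a metrics filename to the metric names it defines.
    Returns a dict mapping (filename, metric) to the displayed name.
    """
    # Count in how many files each metric name appears.
    counts = Counter(name for names in files.values() for name in set(names))
    shown = {}
    for filename, names in files.items():
        for name in names:
            # Prefix with the filename only when another file uses the same name.
            shown[(filename, name)] = (
                f"{filename}:{name}" if counts[name] > 1 else name
            )
    return shown


# Hypothetical metrics files for illustration.
example = {
    "a_metrics.yaml": ["accuracy", "loss"],
    "b_metrics.yaml": ["accuracy", "f1"],
}
for key, value in display_names(example).items():
    print(key, "->", value)
```

The point of the sketch is that the displayed name of a metric depends on what every other metrics file contains, which is why the name cannot be predicted from one file alone.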

Is there a way to tell dvc to name all my metrics <filename>:<metric_name>? If not, what’s the best practice in this scenario, never having metrics with the same name being defined in multiple files (e.g. I manually append the filename to the metric name inside the yaml files)?

shcheklein · December 21, 2022, 9:13pm

@julianotusi hi! Good question. I’m hesitant to say that it makes sense to prefix all the names by default, since we’d be “punishing” people with simpler scenarios for no reason (they would have to complicate their regexps, right?). Unfortunately, there is no special mode for this that I’m aware of.

But before we jump into discussing potential solutions, or deciding whether we need a GH ticket to request this as a feature, could you please elaborate a bit on the scenario you are trying to solve? Specifically, are you manually using the keep and drop arguments to fine-tune the table? Or are you trying to automate something, and it’s hard to keep changing an automation script that breaks whenever the metrics schema changes, etc.?

Also, btw, have you seen the VS Code extension? It might help with the table as well in certain cases.

julianotusi · December 22, 2022, 9:53am

Thanks for the reply!

I can’t discuss certain details, but here’s an approximation of my use-case. Imagine you’re building a machine learning model for analyzing eSports matches (like Overwatch, counter-strike, etc). The model looks at a video stream for a game and predicts some interesting things about the match. Since we’d like to build support for multiple games, we set up our dvc pipeline with foreaches over each of the games, including the final evaluation of the models.

The outputs of the model and the metrics used in evaluation are game-dependent, and there might be homonymous outputs/metrics among games. For example, maybe in counter-strike the important metrics are number_of_headshots.accuracy (measuring how accurate the model is in predicting this number), who_planted_the_bomb.cross_entropy, and other things. Then maybe in Overwatch we’re also predicting the number of headshots, so we also want a metric number_of_headshots.accuracy, but the other metrics are maybe ult_was_used.precision, or other things.

Since each evaluation stage should have different outputs, we keep multiple metrics files named <game>_metrics.yaml (generated by the corresponding evaluation stage for that game). So in the end the behavior we get from dvc is that certain metrics will have a name like counter-strike_metrics.yaml:number_of_headshots.accuracy and overwatch_metrics.yaml:number_of_headshots.accuracy (since there was a conflict), and others will have a name like ult_was_used.precision (since overwatch is the only game with that metric).

Now let’s say you train models for all your games in multiple experiments, and you want to compare performance among them. Specifically, you want to know if experiment-A is better than experiment-B at predicting headshots for counter-strike (that is, the number_of_headshots.accuracy metric inside counter-strike_metrics.yaml). I would run a dvc exp show that drops all columns but the one with this metric. But the problem is: how do you know if the metric name is just <metric_name> or if it is <game>_metrics.yaml:<metric_name>? You would have to keep in mind all the other games and the metrics used there to figure this out, which is what I’m trying to sidestep. Sure, you could just have a regex that catches both cases, but that seems unnecessarily complicated, and the resulting table would have column names with different naming conventions if you want to see multiple metrics at once.
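For reference, the "regex that catches both cases" could look roughly like the sketch below. It uses Python's re module only to demonstrate the pattern; whether the filter options of dvc exp show accept this exact syntax is an assumption not checked here, and the helper name metric_pattern is hypothetical.

```python
import re


def metric_pattern(filename, metric):
    """Build a pattern matching a metric with or without its file prefix."""
    # The "<filename>:" prefix is optional, so the pattern matches both the
    # bare name and the prefixed name, but not the same metric in other files.
    return re.compile(rf"^(?:{re.escape(filename)}:)?{re.escape(metric)}$")


# Column names taken from the example in this post.
columns = [
    "counter-strike_metrics.yaml:number_of_headshots.accuracy",
    "overwatch_metrics.yaml:number_of_headshots.accuracy",
    "ult_was_used.precision",
]

pattern = metric_pattern(
    "counter-strike_metrics.yaml", "number_of_headshots.accuracy"
)
selected = [name for name in columns if pattern.match(name)]
print(selected)
```

This works whether or not a conflict occurred, but it also shows the downside mentioned above: every filter has to carry the optional prefix, and the selected columns can still end up with mixed naming conventions.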

Finally, I’m a PyCharm user, so unfortunately I can’t use the extension. But I would love to see an extension for it too.

shcheklein · December 22, 2022, 9:23pm

Thanks for a really comprehensive explanation. That helps a lot. To summarize, the way I understand it: yes, you are running a CLI command (dvc exp show) with different filters to see different slices of all the metrics. And depending on the project layout, there is no single rule that nicely describes the filter. And it’s not about any kind of automation (running dvc exp show on CI, for example), right?

In this case, would it be possible for you to, say, run it first with a simple regexp and see certain metrics (some of them extra), then refine the regexp in the next command?

I’m trying to better understand how important this is: how often does the project change in a way that you no longer know the metrics, or whether their names are now prefixed?

julianotusi · December 23, 2022, 1:34pm

Right: running CLI commands, no single rule to describe the filter, and no automation atm.

I didn’t understand this. Do you mean piping dvc exp show to a grep?

Atm I’m leaning towards appending <game> to all the metric names. This way I get no more conflicts, at the expense of longer metrics names. I think that’s fine by me, but do you see alternatives?
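A minimal sketch of that workaround, assuming each evaluation stage holds its metrics in a plain dict before writing the .yaml file. The helper name prefix_metrics and the underscore separator are hypothetical choices, and writing the result to <game>_metrics.yaml is left to whatever YAML library the stage already uses.

```python
def prefix_metrics(game, metrics):
    """Return a copy of metrics with every top-level name prefixed by game."""
    # Only top-level keys are renamed; nested values such as
    # {"accuracy": 0.9} are kept unchanged.
    return {f"{game}_{name}": value for name, value in metrics.items()}


# Hypothetical metric values for illustration.
overwatch = prefix_metrics(
    "overwatch",
    {
        "number_of_headshots": {"accuracy": 0.9},
        "ult_was_used": {"precision": 0.8},
    },
)
print(overwatch)
```

With this, no two files define the same metric name, so every metric keeps one predictable name, at the cost of longer names.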
