How to handle dynamic number of outputs?

I have a training script for a self-supervised model, and I would like to evaluate every checkpoint it produces (one per epoch) on various downstream tasks. The whole setup would be configured as a pipeline, with the training and evaluation scripts as pipeline stages. Since training stops early once the loss stops improving, the number of checkpoints is not known beforehand.

How could this use-case be handled with DVC? I was thinking that perhaps the checkpoint files produced by the training stage could be defined as a wildcard output, and the downstream stages could then loop over that wildcard as well. But this pattern does not seem to be supported by DVC at the moment.

Is there any other way to accomplish this?

Hi @randhash.

The only way to handle this right now is to put those files into a common directory and track that directory with DVC, instead of tracking the individual files.
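A minimal sketch of that approach in Python, using only the standard library. The directory name `checkpoints`, the `epoch_NNN.ckpt` naming and the `train`/`evaluate_all` functions are hypothetical, and a random number stands in for the real validation loss. The point is that the evaluation step globs the directory at run time, so it never needs to know in advance how many checkpoints early stopping left behind.

```python
import random
import tempfile
from pathlib import Path

# Hypothetical layout: all checkpoints live in one directory, which is the
# single thing DVC tracks (instead of one entry per checkpoint file).
CHECKPOINT_DIR = Path("checkpoints")


def train(ckpt_dir: Path, max_epochs: int = 50, patience: int = 3, seed: int = 0) -> int:
    """Training stage: write one checkpoint per epoch, stop early on a plateau.

    Returns the number of epochs actually run (= number of checkpoints).
    """
    rng = random.Random(seed)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    best_loss = float("inf")
    epochs_without_improvement = 0
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        loss = rng.random()  # stand-in for a real validation loss
        # Placeholder contents; a real script would save model weights here.
        (ckpt_dir / f"epoch_{epoch:03d}.ckpt").write_text(f"loss={loss}\n")
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # early stopping: checkpoint count is only known now
    return epoch


def evaluate_all(ckpt_dir: Path) -> dict[str, float]:
    """Evaluation stage: discover checkpoints at run time instead of listing them."""
    results = {}
    for ckpt in sorted(ckpt_dir.glob("epoch_*.ckpt")):
        # Stand-in for running a downstream task against this checkpoint.
        results[ckpt.name] = float(ckpt.read_text().strip().split("=", 1)[1])
    return results


if __name__ == "__main__":
    # Use a temporary directory so the demo does not litter the working tree.
    with tempfile.TemporaryDirectory() as tmp:
        ckpt_dir = Path(tmp) / CHECKPOINT_DIR
        n_epochs = train(ckpt_dir)
        results = evaluate_all(ckpt_dir)
        print(f"trained {n_epochs} epochs, evaluated {len(results)} checkpoints")
```

In `dvc.yaml`, the directory would be listed under `outs` of the training stage and under `deps` of each evaluation stage. DVC hashes a tracked directory as a whole, so any new or changed checkpoint invalidates the evaluation stages, and each of them then re-evaluates every checkpoint in the directory.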

Wildcards are not currently supported because DVC needs to determine the structure of your pipeline deterministically in order to be able to rerun it later.