Output with timestamp in its name

Thytu · October 29, 2021, 10:59am

Hello everyone,

When running dvc run, is there any way to specify an output with an unknown name?

Here is my situation: I have a stage named train_model which creates a file at models/{timestamp}/model.pt, and another stage, optimize_model, which uses this output to create models/*/model_freezed_jit.pt, where * is the timestamp.

Because I can’t specify the exact folder name, dvc can’t find the output: failed to reproduce 'dvc.yaml': output 'models/*/model.pt' does not exist.

Is there any way to do that?

stages:
  train_model:
    cmd: python src/train_model.py
    deps:
    - data/formated/
    - src/train_model.py
    outs:
    - models/*/model.pt:
        cache: false
        persist: true
  optimize_model:
    cmd: python src/model_optimizer.py
    deps:
    - models/*/model.pt
    outs:
    - models/*/model_freezed_jit.pt:
        cache: false
        persist: true
    - models/*/half_model_freezed_jit.pt:
        cache: false
        persist: true
    - models/*/quant_model_freezed_jit.pt:
        cache: false
        persist: true

Paffciu · October 29, 2021, 12:26pm

@Thytu I am afraid that this exact functionality is not available.
I am not sure it would fit your setup, but maybe specifying the whole models dir as an output would help you?
Can you share some more context for your project? Why do you need to specify a timestamp in outs?

Thytu · October 29, 2021, 12:52pm

@Paffciu Thanks for your reply!

train_model saves a torch model, model.pt.
optimize_model loads this model and converts it to graph mode as model_freezed_jit.pt.

Because I want to keep track of my last experiments, I do not want to erase the previous model.pt at each experiment (this is why persist is set to true) so I need to change my model path (or name) at each experiment.

So at each experiment, I create a new folder with a new name (this is why I use timestamp) to store my new models.

As for why I can’t just specify models as the output: optimize_model would then have models in both deps and outs, which produces an error.

If you have any idea how to solve this, it would be amazing!

Paffciu · October 29, 2021, 2:09pm

@Thytu
Hmm this use case seems to be covered by the experiments functionality (dvc exp, https://dvc.org/doc/command-reference/exp#exp)

Basically, when tinkering with your project and exploring the different paths it might take, you set up your project so that your stage has a single output (let's name it model.pt). Then you change something in your baseline project (e.g. modify the training code, or change parameters tracked by dvc). Then you use dvc exp run {stage-name}, and dvc takes care of storing the changes (both to the code and to the produced model.pt). You iterate until you are satisfied with the result, and then choose the best experiment with dvc exp apply to “promote” the best result as, for example, the new baseline.
That way you don’t have to create an output dir with a complicated structure; you keep a simple repository, and DVC takes care of tracking the experiments.
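
As a rough sketch of what that setup could look like for the dvc.yaml posted above: the version below assumes both scripts are changed to write to fixed paths directly under models/ instead of a timestamped folder. It also drops cache: false and persist: true so that DVC caches the outputs of each run, and adds src/model_optimizer.py as a dependency. None of these changes were confirmed in this thread, so treat them as assumptions to adapt rather than the required layout.

```yaml
stages:
  train_model:
    cmd: python src/train_model.py
    deps:
    - data/formated/
    - src/train_model.py
    outs:
    # Assumption: train_model.py now always writes to this fixed path.
    - models/model.pt
  optimize_model:
    cmd: python src/model_optimizer.py
    deps:
    # Assumption: added so that changes to the optimizer code rerun this stage.
    - src/model_optimizer.py
    - models/model.pt
    outs:
    # Assumption: model_optimizer.py writes its results next to model.pt.
    - models/model_freezed_jit.pt
    - models/half_model_freezed_jit.pt
    - models/quant_model_freezed_jit.pt
```

With a layout like this, each dvc exp run would record its own version of the files under models/, dvc exp show lists the recorded experiments, and dvc exp apply restores the chosen one into the workspace, so earlier models are not lost even though the paths never change.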

I would recommend going through the experiments “get started” guide in order to get a grasp on how this works.

Thytu · October 29, 2021, 2:18pm

@Paffciu
Thanks for your advice, I will do that.

Paffciu · October 29, 2021, 2:24pm

Just to point out one thing:
I mentioned specifying a single output (model.pt) as a contrast to a complicated outputs dir with timestamps. It doesn’t mean that experiments can handle only a single output.
