Variables in DVC-files

Hi,

Is it possible to use variables in DVC-files? This would help me keep track of repeated values. Take, as an example, the DVC-file from the docs:

INPUT=input.data
OUTPUT=output.data
always_changed: true
locked: true
cmd: python cmd.py $INPUT $OUTPUT metrics.json
deps:
  - md5: da2259ee7c12ace6db43644aef2b754c
    path: cmd.py
  - md5: e309de87b02312e746ec5a500844ce77
    path: $INPUT
md5: 521ac615cfc7323604059d81d052ce00
outs:
  - cache: true
    md5: 70f3c9157e3b92a6d2c93eb51439f822
    metric: false
    path: $OUTPUT

Thanks for help!

Bartosz


Hi @btel,

At the moment I think the closest you could get is to manually edit the DVC-file after creation. Something like this:

always_changed: true
locked: true
cmd: "INPUT=input; OUTPUT=output; python cmd.py $INPUT $OUTPUT metrics.json"
deps:
  - md5: da2259ee7c12ace6db43644aef2b754c
    path: cmd.py
  - md5: e309de87b02312e746ec5a500844ce77
    path: input
md5: 521ac615cfc7323604059d81d052ce00
outs:
  - cache: true
    md5: 70f3c9157e3b92a6d2c93eb51439f822
    metric: false
    path: output

But I think this probably defeats the point of what you’re trying to achieve. In fact, I’m not sure I understand the usefulness of having vars in DVC-files. What do you mean by “keep track of repeated values”? Do you mean better readability of the DVC-file?

How do you envision providing these values to dvc add or dvc run? Or would it be a trick only for manually edited DVC-files?

Yes, readability is one reason, but not the most important one. In my current workflow, I often edit the files manually, for example to rename the targets or to add more steps to the pipeline. From time to time I forget to change the name of a file in the deps or in the cmd, with all the mess that follows.

A second use case is automatically generating the DVC-files from a template. This is useful when you need to train models for different subsets of your data (for example, different user groups). The DVC-files might be identical except for the names of the deps/targets.
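To make the template idea concrete, here is a minimal sketch of what I have in mind, done outside of DVC with Python's standard `string.Template`, which happens to use the same `$NAME` placeholder syntax as my example above. The template text, the `render_stage` helper and the group names are all hypothetical. I also left out the md5 fields on the assumption that DVC fills them in when the stage is reproduced.

```python
from string import Template

# Hypothetical stage template. The fields mirror the DVC-file above, and
# $INPUT / $OUTPUT are placeholders filled in by Python, not by DVC.
# Assumption: the md5 fields are omitted and left for DVC to compute.
STAGE_TEMPLATE = Template(
    "cmd: python cmd.py $INPUT $OUTPUT metrics.json\n"
    "deps:\n"
    "  - path: cmd.py\n"
    "  - path: $INPUT\n"
    "outs:\n"
    "  - cache: true\n"
    "    metric: false\n"
    "    path: $OUTPUT\n"
)


def render_stage(input_path: str, output_path: str) -> str:
    """Return the stage file text with both placeholders substituted.

    Template.substitute raises KeyError if a placeholder is left unfilled,
    so a forgotten name fails loudly instead of producing a broken file.
    """
    return STAGE_TEMPLATE.substitute(INPUT=input_path, OUTPUT=output_path)


if __name__ == "__main__":
    # Hypothetical user groups, one generated stage file per group.
    for group in ("group_a", "group_b"):
        print(f"# {group}.dvc")
        print(render_stage(f"{group}/input.data", f"{group}/output.data"))
```

Because each file name is written only once per stage, the cmd and the deps/outs cannot drift apart, which is exactly the mistake I keep making when editing by hand.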


OK. I think we want DVC-files to be easy to edit in these kinds of situations, as well as for advanced users, so your use cases seem reasonable to me, even if vars were only supported for manually edited or generated stage files. For now, however, the answer is no: we don’t support this. Also, the format would probably look a bit different if we did, since it would need to be valid YAML.

Do you mind opening a feature request for this at https://github.com/iterative/dvc/issues/new/choose so the engineering team can look into it? Thanks!

P.S. In fact, this is already mentioned as part of the existing issue https://github.com/iterative/dvc/issues/2437 (see points 2 and 6 in the long description), so you can chime in there instead if you prefer.


@btel Also, I think this discussion is highly relevant here: https://github.com/iterative/dvc/issues/1462. It would be really great if you could leave a comment there describing your use case.
