Untracked vars/parameters

Hi all,

How can I parameterize the cmd of a stage so that changing the parameter does not cause DVC to re-run the stage on repro?

Let’s say I have an optional local service which may or may not be active in the instance where I run experiments. I want to add the following parameter (or variable) to DVC:

service_url: "http://url:port"

and use it in my pipeline:

...
    cmd:
      - service_url=${service_url} scripts/script.py # pass URL to the script as an env variable

But I would like to be able to change this variable without invalidating the cached outputs of the stage. In other words, I want this variable to be completely transparent to DVC, so that DVC treats the outputs as independent of it. They in fact are independent of it: the service just consumes some side effects of script.py and does not change the generated output.

Okay, so basically I found what I was looking for. The section about variables mentions that

To use the expression literally in dvc.yaml (so DVC does not replace it for a value), escape it with a backslash, e.g. \${....

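So I can escape the reference (\${service_url}), which leaves it in the command for the shell to expand at run time. That way I can use ordinary environment variables in dvc.yaml to parameterize my scripts in an untracked way.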
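
For anyone landing here later, a minimal sketch of what the stage might look like. The stage name run_script, the environment variable name SERVICE_URL, and the output path are all hypothetical and only there for illustration; the escaping itself is the part taken from the docs quote above.

```yaml
stages:
  run_script:            # hypothetical stage name
    cmd:
      # The backslash keeps DVC from interpolating the expression, so the
      # shell receives ${SERVICE_URL} and expands it from the environment.
      - service_url=\${SERVICE_URL} scripts/script.py
    deps:
      - scripts/script.py
    outs:
      - data/output.csv  # hypothetical output path
```

With this, the URL would be supplied from the shell, e.g. SERVICE_URL="http://url:port" dvc repro. Since the command text in dvc.yaml stays the same whatever the value is, changing the URL should not by itself cause the stage to be re-run.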
