Switching between virtual Python environments within `cmd`

Hello,

We are using Poetry to manage Python environments. In general, it is possible to invoke a Python script B in a given Poetry environment from within a Python program A. So far I have done this by issuing shell commands from within the Python program, like this:

Given this structure with two different envs:

folder_a

  • prog_a.py
  • pyproject.toml
  • poetry.lock

folder_b

  • prog_b.py
  • pyproject.toml
  • poetry.lock

# within prog_a.py
import subprocess

# change into folder_b so Poetry picks up that folder's environment for prog_b.py
cmd = "cd folder_b; poetry run python prog_b.py"
subprocess.run(cmd, shell=True, check=True)

This worked without problems.

Now, when I try the same from DVC it unfortunately does not work. Say I now have the following setup:

folder_a

  • dvc.yaml
  • pyproject.toml # has DVC installed
  • poetry.lock

folder_b

  • prog_b.py
  • pyproject.toml
  • poetry.lock

If I specify a cmd in a dvc.yaml like this: cmd: "cd abs/path/to/folder_b; poetry run python prog_b.py", then upon dvc repro ... I expected prog_b.py to run in the Poetry env from folder_b. Instead, it runs in the env from folder_a (where DVC is installed).
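
For reference, the corresponding stage definition looks roughly like this (the stage name is just a placeholder):

stages:
  run_prog_b:
    cmd: "cd abs/path/to/folder_b; poetry run python prog_b.py"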

There seems to be no way around this. I have even tried calling a wrapper through cmd that performs the exact same subprocess call as in the first example (meaning it receives the full rest of the command as a string, including the cd…). Still no luck; the env stays the same.

Does anyone have a clue why this is the case? How does DVC execute cmd? Does it do so in some special way that prevents running a process in a different env? I have tried a lot of variants of DVC/Poetry configs, to no avail.

Alternatively, I would appreciate suggestions on how else to deal with the need to execute different stages in different environments.

Thanks lots and lots. :slight_smile:

Best

Jonas

After some more experimentation, it appears that this probably has nothing to do with DVC, so please disregard the question about how DVC calls cmd.

I’d still be interested in alternative suggestions on how to handle separate virtual environments for the stages in the context of DVC.

Hello @Jonas!

I am glad you found a solution to your problem.
Regarding the execution of your stages from different envs:
I think there are a few ways you could approach this problem:

  1. Use a particular Python interpreter directly (see the dvc.yaml sketch after this list), e.g.:
    ~/.virtualenvs/my_env/bin/python my_script.py

  2. You could leverage a subshell, for example on UNIX:
    (source ~/.virtualenvs/my_env/bin/activate && which python) - putting parentheses around the command launches a subshell that is exited after execution
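
For approach 1, the interpreter call can go straight into a stage's cmd. A minimal sketch, assuming a virtualenv at ~/.virtualenvs/my_env and a placeholder stage name:

stages:
  prog_b:
    cmd: ~/.virtualenvs/my_env/bin/python folder_b/prog_b.py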

One thing to keep in mind is that we need a mechanism for verifying whether a particular env has changed and whether the run might need retriggering. You can specify the Python environment directory as a dependency, but with bulky setups calculating checksums for it might take some time. I presume that you probably don’t want to keep a virtualenv in the DVC cache, so that might be a potential drawback.
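
If the environment lives inside the project (for example Poetry's in-project .venv), declaring it as a dependency could look roughly like this (paths are placeholders):

stages:
  prog_b:
    cmd: folder_b/.venv/bin/python folder_b/prog_b.py
    deps:
      - folder_b/prog_b.py
      - folder_b/.venv   # retriggers the stage when the env changes; checksumming a large env can be slow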

Hi @Paffciu,

thank you! I will try this for sure.

Regarding the change monitoring of envs: I address this by automatically cloning version-tagged commits of the pipeline components and defining the code as a dependency. The code includes a poetry.lock file, which defines the environment, so a change is detected whenever this file changes (which incidentally happens only when the version changes).
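
As a rough sketch (paths simplified, interpreter location just illustrative), a stage then declares the cloned component, including its poetry.lock, as a dependency:

stages:
  prog_b:
    cmd: folder_b/.venv/bin/python folder_b/prog_b.py
    deps:
      - folder_b   # cloned, version-tagged code; contains poetry.lock, so env changes are picked up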

@Paffciu Hi! Just to wrap this up and report back: I am now calling the interpreter directly, as you suggested in your first point, and it works very nicely. It also feels more robust than cd-ing to the dir and then running poetry run .... Thanks again!

(Side note: I saw that DVC also simply uses the subprocess library to call the commands, so there should be no difference in principle between DVC and calling processes from other Python programs.)