Switching between virtual Python environments within `cmd`

Hello,

We are using Poetry to manage Python environments. In general, it is possible to invoke a Python script B in a given Poetry environment from within a Python program A. So far I have done this by issuing shell commands from within the Python program, like this:

Given this structure with two different envs:

folder_a

  • prog_a.py
  • pyproject.toml
  • poetry.lock

folder_b

  • prog_b.py
  • pyproject.toml
  • poetry.lock

# within prog_a.py
import subprocess

# change into folder_b so Poetry picks up that folder's environment for prog_b.py
cmd = "cd folder_b; poetry run python prog_b.py"
subprocess.run(cmd, shell=True, check=True)

This worked without problems.

Now, when I try the same from DVC it unfortunately does not work. Say I now have the following setup:

folder_a

  • dvc.yaml
  • pyproject.toml # has DVC installed
  • poetry.lock

folder_b

  • prog_b.py
  • pyproject.toml
  • poetry.lock

If I specify a cmd in a dvc.yaml like this: cmd: "cd abs/path/to/folder_b; poetry run python prog_b.py", then upon dvc repro ... I expected prog_b.py to run in the Poetry env from folder_b. Instead, it runs in the env from folder_a (where DVC is installed).
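
For reference, the corresponding stage definition looks roughly like this (the stage name is just a placeholder):

stages:
  run_prog_b:
    cmd: "cd abs/path/to/folder_b; poetry run python prog_b.py"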

There seems to be no way around this. I have even tried calling a wrapper through cmd that performs the exact same subprocess call as in the first example (meaning it receives the full rest of the command as a string, including the cd…). Still no luck; the env stays the same.

Does anyone have a clue why this is the case? How does DVC execute cmd? Does it do so in some special way that prevents running a process in a different env? I have tried a lot of variants of DVC/Poetry configs, to no avail.

Alternatively, I would appreciate suggestions on how else to deal with the need to execute different stages in different environments.

Thanks lots and lots. :slight_smile:

Best

Jonas

After some more experimentation, it appears that this probably has nothing to do with DVC, so please disregard the question about how DVC calls cmd.

I’d still be interested in alternative suggestions on how to handle separate virtual environments for the stages in the context of DVC.

Hello @Jonas!

I am glad you found a solution to your problem.
Regarding the execution of your stages from different envs:
I think there are a few ways you could approach this problem:

  1. Use a particular Python interpreter directly (see the dvc.yaml sketch after this list), e.g.:
    ~/.virtualenvs/my_env/bin/python my_script.py

  2. You could leverage a subshell, for example on UNIX:
    (source ~/.virtualenvs/my_env/bin/activate && which python) - putting parentheses around the command launches a subshell that is exited after execution
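
For approach 1, the interpreter call can go straight into a stage's cmd. A minimal sketch, assuming a virtualenv at ~/.virtualenvs/my_env and a placeholder stage name:

stages:
  prog_b:
    cmd: ~/.virtualenvs/my_env/bin/python folder_b/prog_b.py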

One thing to keep in mind is that we need a mechanism for verifying whether a particular env has changed and whether the run might need retriggering. You can specify the Python environment directory as a dependency, but with bulky setups calculating checksums for it might take some time. I presume that you probably don’t want to keep a virtualenv in the DVC cache, so that might be a potential drawback.
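
If the environment lives inside the project (for example Poetry's in-project .venv), declaring it as a dependency could look roughly like this (paths are placeholders):

stages:
  prog_b:
    cmd: folder_b/.venv/bin/python folder_b/prog_b.py
    deps:
      - folder_b/prog_b.py
      - folder_b/.venv   # retriggers the stage when the env changes; checksumming a large env can be slow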

Hi @Paffciu,

thank you! I will try this for sure.

Regarding the change monitoring of envs: I address this by automatically cloning version-tagged commits of the pipeline components and defining the code as a dependency. The code includes a poetry.lock file, which defines the environment, so a change is detected whenever this file changes (which incidentally happens only when the version changes).
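
As a rough sketch (paths simplified, interpreter location just illustrative), a stage then declares the cloned component, including its poetry.lock, as a dependency:

stages:
  prog_b:
    cmd: folder_b/.venv/bin/python folder_b/prog_b.py
    deps:
      - folder_b   # cloned, version-tagged code; contains poetry.lock, so env changes are picked up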

@Paffciu Hi! Just to wrap this up and report back: I am now calling the interpreter directly, as you suggested in your first point, and it works very nicely. It also feels more robust than cd-ing to the dir and then running poetry run .... Thanks again!

(Side note: I saw that DVC also simply uses the subprocess library to call the commands, so there should be no difference in principle between DVC and calling processes from other Python programs.)