Lock error with parallelized dvc repro

I have a pipeline with many parallel stages. I use a combination of a (generated) params.yaml file and the foreach approach. See below for simplified examples of dvc.yaml, params.yaml, and dvc.lock.

I wrote a simple scheduler in Python to run the stages with dvc repro in parallel using multiprocessing. Essentially, it’s running dvc repro setup@sim1, dvc repro setup@sim2, etc. in parallel.
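
For illustration, a minimal sketch of such a scheduler might look like the following. It is not the original script: the function names are hypothetical, and it uses a thread pool from multiprocessing.pool, since each worker only waits on a dvc subprocess.

```python
import subprocess
from multiprocessing.pool import ThreadPool


def build_repro_commands(stage, sim_keys):
    """Return one `dvc repro <stage>@<key>` command per foreach key."""
    return [["dvc", "repro", f"{stage}@{key}"] for key in sim_keys]


def run_command(cmd):
    """Run one command and return (target, exit code, stderr)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return cmd[-1], result.returncode, result.stderr


def run_parallel(commands, workers=2, runner=run_command):
    """Run the commands concurrently and collect the results in input order."""
    with ThreadPool(workers) as pool:
        return pool.map(runner, commands)
```

Calling run_parallel(build_repro_commands("setup", ["sim1", "sim2"])) starts both dvc repro processes at the same time, which is the situation in which the lock error shows up.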

From what I understand this should work. However, I’m getting lock errors: “ERROR: Unable to acquire lock. Most likely another DVC process is running or was terminated abruptly.”

dvc.yaml:

stages:
  setup:
    foreach: ${sims}
    do:
      cmd: python my_script.py --simlabel ${item.label}
      deps:
        - my_script.py
      outs:
        - ${item.work_dir}/outputs.txt

params.yaml:

sims:
  sim1:
    label: label1
    parameters:
      param1: 1
      param2: 2
    work_dir: label1_workdir
  sim2:
    label: label2
    parameters:
      param1: 1
      param2: 2
    work_dir: label2_workdir

dvc.lock:

schema: '2.0'
stages:
  setup@sim1:
    cmd: python my_script.py --simlabel label1
    deps:
    - path: my_script.py
      md5: 614d9f23b42a36de65981aa76026600e
      size: 17401
    outs:
    - path: label1_workdir/outputs.txt
      md5: eaf7cdf7540bb96a6307394ad421ad7b
      size: 4412
  setup@sim2:
    cmd: python my_script.py --simlabel label2
    deps:
    - path: my_script.py
      md5: 614d9f23b42a36de65981aa76026600e
      size: 17401
    outs:
    - path: label2_workdir/outputs.txt
      md5: eaf7cdf7540bb96a6307394ad421ad7b
      size: 4412

Hi @maartenb, sorry for the late response.
The problem here is that both stages (while different due to the parameters they use) share the same dependency:

  • my_script.py

DVC locks it, hence the error. This is a slightly different use case from the one you linked in your post, where the pipelines are completely separate.
I am afraid there is no straightforward workaround for that besides creating several copies of the script under different names, though that would not be very elegant.

Having said that, you might be interested in the exp command, which should allow you to achieve your goal:

You could get parallel execution with the following commands:

dvc exp run setup@sim1 --queue
dvc exp run setup@sim2 --queue
dvc exp run --run-all -j 2
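
With many foreach entries, the same commands could be generated from a script rather than typed by hand. The following is only a sketch: the helper names are hypothetical, and it assumes a DVC 2.x-style CLI in which queued experiments are started with --run-all.

```python
import subprocess


def build_queue_commands(stage, sim_keys, jobs=2):
    """Return commands that queue one experiment per key, then run the queue."""
    commands = [
        ["dvc", "exp", "run", f"{stage}@{key}", "--queue"] for key in sim_keys
    ]
    # Assumption: --run-all starts the queued experiments, -j sets parallelism.
    commands.append(["dvc", "exp", "run", "--run-all", "-j", str(jobs)])
    return commands


def run_all(commands, runner=subprocess.run):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        runner(cmd, check=True)
```

Here run_all(build_queue_commands("setup", ["sim1", "sim2"])) would queue both stages and then run them with two jobs. The queueing commands themselves run sequentially, so only the final command executes anything in parallel.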

Please read up on exp and share your thoughts on whether this is something you might find useful.