Improved parallel execution documentation

gregstarr · November 28, 2023, 2:01am

Hello,

It seems that proper parallel stage execution is in the works, so this may eventually be overtaken by events (OBE). At the moment, the way to run stages in parallel (when they share common dependencies) is to run them as experiments and then merge them back together. This process is a bit of a pain, so I am looking forward to proper parallel execution. In the meantime, though, the workaround could be made much easier with better documentation.

I have used this technique twice to train 10 components of an ensemble model. The first time, all the dvc.lock files got jumbled/reordered, so the merge was quite involved: I ended up manually extracting the dvc.lock sections for the specific stages I knew had run in each experiment branch and combining them into a single lock file. This ultimately worked, but it was an arduous process. The second time, I did an octopus merge and everything combined seamlessly. I’m not sure exactly what I did differently to get that result, so good documentation on how to go through this process would be great.
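The manual lock-file merge described above (taking each stage's entry from the branch that actually ran it) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not a DVC feature: `merge_lock_stages` is a hypothetical helper, the lock files are assumed to be already parsed into dicts with a YAML loader (not shown), each is assumed to have a top-level `stages` mapping, and you are assumed to know which stages ran in which experiment branch. The stage names, commands, and hashes below are placeholders.

```python
from copy import deepcopy


def merge_lock_stages(base_lock, branch_locks):
    """Hypothetical helper (not part of DVC): copy each stage's lock entry
    from the experiment branch that actually ran it into a single lock dict.

    base_lock    -- parsed dvc.lock from the common starting commit (a dict)
    branch_locks -- {branch_name: (parsed_lock_dict, [stage names run there])}
    """
    merged = deepcopy(base_lock)  # never mutate the caller's data
    stages = merged.setdefault("stages", {})
    owner = {}  # stage name -> branch whose entry we kept
    for branch, (lock, ran) in branch_locks.items():
        for name in ran:
            if name in owner:
                raise ValueError(
                    f"stage {name!r} claimed by both {owner[name]!r} and {branch!r}"
                )
            entry = lock.get("stages", {}).get(name)
            if entry is None:
                raise KeyError(f"stage {name!r} not found in the lock for {branch!r}")
            # Existing keys keep their position; new stages are appended.
            stages[name] = deepcopy(entry)
            owner[name] = branch
    return merged


# Toy data standing in for parsed dvc.lock files (hashes are placeholders).
base = {"schema": "2.0", "stages": {"prepare": {"cmd": "python prepare.py"}}}
exp_a = {"schema": "2.0", "stages": {
    "prepare": {"cmd": "python prepare.py"},
    "train_0": {"cmd": "python train.py 0", "outs": [{"path": "m0.pkl", "md5": "aaa"}]},
}}
exp_b = {"schema": "2.0", "stages": {
    "prepare": {"cmd": "python prepare.py"},
    "train_1": {"cmd": "python train.py 1", "outs": [{"path": "m1.pkl", "md5": "bbb"}]},
}}

merged = merge_lock_stages(base, {"exp-a": (exp_a, ["train_0"]),
                                  "exp-b": (exp_b, ["train_1"])})
print(list(merged["stages"]))  # ['prepare', 'train_0', 'train_1']
```

Refusing a stage that two branches both claim is a deliberate choice: it surfaces the "which branch really ran this?" question instead of silently keeping whichever entry came last. After writing the merged dict back to dvc.lock, running `dvc status` would be a reasonable sanity check.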

kupruser · November 28, 2023, 9:05pm

Great points. Could you please create an issue in the iterative/dvc.org repository on GitHub (DVC website and documentation)? Contributions to the docs are also welcome, if you are keen.

gregstarr · November 28, 2023, 9:51pm

I would contribute, but first I need to understand what made things go smoothly the second time. Was it because I used an octopus merge of all the experiment branches at once rather than merging them one at a time?

kupruser · November 29, 2023, 8:51am

Oh, I see. Unfortunately, I can’t explain that at the moment either.

Related topics

Topic	Category	Replies	Views	Activity
Running multiple dvc pipeline in parallel	Feature Requests	5	4195	December 12, 2019
Lock error with parallelized dvc repro	Questions	7	939	November 13, 2023
Multiple dvc runs in parallel	Questions	10	2094	March 11, 2022
Dvc experiments multiple branches workflow	Questions	0	666	March 17, 2022
Versioning predictions	Questions	7	949	February 10, 2021