Workflow on Slurm-like clusters

Gotcha. I’ll check if other team members have specific tips for those tools.

I know that DVC and MLflow can happily coexist. But if you mainly use the latter for tracking/visualizing results, and since you’re using GitLab, you may be interested in our sister project CML (for this or other projects).

This seems like a good area where DVC can help. I’m just not sure how to integrate it with the existing tools you’re using for hyperparams and metrics (DVC has its own solutions for that); you’d have to play around with it a bit. Feel free to send us follow-up questions here or at dvc.org/chat.
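
For reference, DVC’s own approach looks roughly like this from the CLI; the stage, parameter, and file names below are just placeholders (and on older DVC versions `dvc stage add` was `dvc run`):

```bash
# Define a stage whose hyperparameters come from params.yaml (-p) and
# whose metrics are written to metrics.json (-M keeps the file out of
# the DVC cache so Git can version it directly).
dvc stage add -n train \
    -p lr,epochs \
    -M metrics.json \
    -d train.py \
    python train.py

# Then compare hyperparameters and metrics across Git revisions:
dvc params diff
dvc metrics diff
```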

But tracking large files/dirs is one of DVC’s core features.
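
For example, putting a large dataset under DVC control is a one-liner (the path here is hypothetical):

```bash
# DVC moves the data into its cache and leaves a small .dvc
# placeholder file that Git tracks instead of the data itself.
dvc add data/raw
git add data/raw.dvc data/.gitignore
git commit -m "Track raw dataset with DVC"
```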

Convenient! With DVC, you could use a variation of the “shared server” approach: construct the pipeline stages using external dependencies on the data in ~/, since you know it will be there at run time (see the sketch below). Locally, you would need dummy or sample data files with the same file names to test your code.
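
A minimal sketch of what such a stage could look like, assuming the shared data sits at ~/project_data (all paths and script names here are made up):

```bash
# The first dependency points outside the workspace, to the shared
# location on the cluster; DVC checksums it like any other dep, so
# the stage re-runs when the shared data changes.
dvc stage add -n preprocess \
    -d ~/project_data/raw.csv \
    -d preprocess.py \
    -o data/clean.csv \
    python preprocess.py ~/project_data/raw.csv data/clean.csv
```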

Another option is to use external outputs, if you want to track changes in those data files/dirs.
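
Rough sketch with a made-up path; note that external outputs require an external cache on the same filesystem, and support for the `--external` flag has changed between DVC versions, so check the “managing external data” docs for yours:

```bash
# Track changes to a dataset in place, without copying it into the
# workspace (requires an external cache configured beforehand).
dvc add --external ~/project_data/raw.csv
```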
