Workflow on Slurm-like clusters

Gotcha. I’ll check if other team members have specific tips for those tools.

I know that DVC and MLflow can happily coexist. But if you mainly use the latter for tracking/visualizing results, and since you’re using GitLab, you may be interested in our sister project CML (for this or other projects).

This seems like a good area where DVC can help. I’m just not sure how to integrate it with the existing tools you’re using for hyperparams and metrics (DVC has its own solutions for that); you’d have to play around with it a bit. Feel free to send us follow-up questions here or at dvc.org/chat.
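
For reference, DVC’s own approach looks roughly like this from the CLI; the stage, parameter, and file names below are just placeholders (and on older DVC versions `dvc stage add` was `dvc run`):

```bash
# Define a stage whose hyperparameters come from params.yaml (-p) and
# whose metrics are written to metrics.json (-M keeps the file out of
# the DVC cache so Git can version it directly).
dvc stage add -n train \
    -p lr,epochs \
    -M metrics.json \
    -d train.py \
    python train.py

# Then compare hyperparameters and metrics across Git revisions:
dvc params diff
dvc metrics diff
```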

But tracking large files/dirs is one of DVC’s core features.
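
For example, putting a large dataset under DVC control is a one-liner (the path here is hypothetical):

```bash
# DVC moves the data into its cache and leaves a small .dvc
# placeholder file that Git tracks instead of the data itself.
dvc add data/raw
git add data/raw.dvc data/.gitignore
git commit -m "Track raw dataset with DVC"
```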

Convenient! With DVC, you could use a variation of the “shared server” approach: construct the pipeline stages using external dependencies on the data in ~/, since you know it will be there at run time (see the sketch below). Locally, you would need dummy or sample data files with the same file names to test your code.
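
A minimal sketch of what such a stage could look like, assuming the shared data sits at ~/project_data (all paths and script names here are made up):

```bash
# The first dependency points outside the workspace, to the shared
# location on the cluster; DVC checksums it like any other dep, so
# the stage re-runs when the shared data changes.
dvc stage add -n preprocess \
    -d ~/project_data/raw.csv \
    -d preprocess.py \
    -o data/clean.csv \
    python preprocess.py ~/project_data/raw.csv data/clean.csv
```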

Another option is to use external outputs, if you want to track changes in those data files/dirs.
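
Rough sketch with a made-up path; note that external outputs require an external cache on the same filesystem, and support for the `--external` flag has changed between DVC versions, so check the “managing external data” docs for yours:

```bash
# Track changes to a dataset in place, without copying it into the
# workspace (requires an external cache configured beforehand).
dvc add --external ~/project_data/raw.csv
```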
