Link types and run options

Hello,
I have questions about the different link types. I’m mainly working on Windows at the moment.

  1. What is the default link type on Windows?
  2. While I am training a new model (which takes some time), I want to keep working with the old model. To do this I added the --outs-persist flag. Is this a good idea in general? If I now change the link type to symlink or hardlink, will this still work as expected?
  3. I also track the training with MLflow. To add the mlruns folder as an output of the training stage, I again used the --outs-persist flag. The main reason is that I want to share the MLflow results with others. This works quite well at the moment. What happens if I change the link type now? Will this break something?

Thank you

Hi @jimmy

All DVC projects have the same default: reflink,copy (DVC tries reflinks first and falls back to copying). Since Windows doesn’t support reflinks (last time I checked), in effect it’s just copy. Windows does support symlinks and hard links on some file systems, so you can try enabling those. See cache.type in the config.

Let me double check about the persistent outputs before I can give you a complete answer. For now you may be interested in the discussions here

Using --outs-persist is a good idea if you’re only reading the old model (e.g. loading it into memory), but keep in mind that it can be deleted, modified, or overwritten by the stage commands that dvc repro executes.

Nothing bad should happen if you change from copy to a link type (symlink or hardlink) now. It just means that the next time repro finishes, the model file in the workspace will be rewritten as a link to its cached version.

BTW another way to play with the previous version is to find the file in the cache and use that directly, but that’s tricky/hacky… Unless that version was committed to Git, in which case something like dvc get . model --rev HEAD^ -o model.old would do the trick (see https://dvc.org/doc/command-reference/get).

To be safe, please use dvc checkout --relink after you change the cache link type.
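
Put together, changing the link type and relinking might look like the sketch below. The helper name is made up for illustration, it assumes you run it from the root of your DVC project, and hardlink,copy is only an example value (the list is tried in order, so copy stays as the fallback).

```shell
# Hypothetical helper: switch the cache link type, then relink the workspace.
# $1 = comma-separated link types in order of preference, e.g. hardlink,copy
set_link_type_and_relink() {
    # Store the preference in the project's DVC config.
    dvc config cache.type "$1" &&
    # Recreate workspace files from the cache using the new link type.
    dvc checkout --relink
}
```

For example, set_link_type_and_relink hardlink,copy. Running dvc config cache.type afterwards with no value prints the current setting, so you can confirm the change took effect.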

Thank you, this is already good news. One reason I want to change the link type is that I want to use Git worktrees. To avoid data duplication, I set the cache dir of the worktree to the cache dir of the main project. I guess this should then work.
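
Roughly, the worktree setup I have in mind looks like the sketch below. It is not verified end to end: the helper name, the worktree path, and the branch name are placeholders, and it assumes the main project keeps its cache in the default .dvc/cache location.

```shell
# Hypothetical helper: run it from the root of the main project.
# $1 = path for the new worktree, $2 = name of the new branch.
add_worktree_with_shared_cache() {
    # Assumption: the main project uses the default cache location.
    main_cache="$(pwd)/.dvc/cache"
    git worktree add -b "$2" "$1" || return 1
    (
        cd "$1" || exit 1
        # --local writes to .dvc/config.local, which Git does not track,
        # so the setting stays private to this worktree.
        dvc cache dir --local "$main_cache" &&
        # Populate the worktree from the shared cache.
        dvc checkout
    )
}
```

For example, add_worktree_with_shared_cache ../project-experiment experiment. Hard links cannot span volumes, so with a hardlink cache type the worktree has to live on the same drive as the cache; keeping copy at the end of cache.type gives a fallback otherwise.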
