Tracking data and code dependencies

Hello Ruslan,
thank you for your wonderful answer!

First, it’s wonderful to know that reflinks are coming, but I wonder whether our platforms are supported. I searched the GitHub issue tracker to read about it (Issue #280), but it is not clear to me whether copy can be used as the default instead. I’d rather trade speed for safety on this, so it would be great if DVC could be configured to prefer reflink, then copy, and only then hard or soft links.
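To make the preference order concrete, here is a minimal sketch of what I have in mind. It is not DVC’s implementation: `place_file`, `STRATEGIES` and the strategy names are hypothetical, the reflink path assumes Linux (the `FICLONE` ioctl on a filesystem such as Btrfs or XFS), and it assumes the destination does not exist yet.

```python
import os
import shutil

# Linux ioctl request number for FICLONE (share a file's extents, i.e. a reflink).
FICLONE = 0x40049409


def reflink(src, dst):
    """Copy-on-write clone of src to dst; raises OSError where unsupported."""
    import fcntl  # POSIX-only; the ImportError on Windows is handled by the caller
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())


# Hypothetical strategy names, chosen for this sketch only.
STRATEGIES = {
    "reflink": reflink,
    "copy": shutil.copy2,
    "hardlink": os.link,
    "symlink": lambda src, dst: os.symlink(os.path.abspath(src), dst),
}


def place_file(src, dst, order=("reflink", "copy", "hardlink", "symlink")):
    """Try each strategy in order and return the name of the first that works.

    Assumes dst does not exist yet: a failed attempt removes whatever is at dst.
    """
    for name in order:
        try:
            STRATEGIES[name](src, dst)
            return name
        except (OSError, ImportError):
            # Remove any partial result before trying the next strategy.
            if os.path.lexists(dst):
                os.remove(dst)
    raise OSError(f"no strategy in {order!r} could place {dst!r}")
```

With something like `place_file(src, dst, order=("reflink", "copy"))`, the workspace file would never be a hard or soft link into the cache, which is the safety property I care about.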

Regarding the auto-magical system to track new files: yes, I read about that somewhere and I agree the current system is better. Explicit is better than implicit :slight_smile:

On this, I must admit that, in some ways, I prefer the approach that Pachyderm is using: a virtual filesystem where data is exposed, and applications are free to read and write there so that outputs are tracked automatically. With DVC, Docker is not necessary (as it is in Pachyderm), so a thin virtual filesystem layer would be sufficient, but I don’t know whether this is portable (there is FUSE on Linux, BSD and macOS, but I don’t know what’s available on Windows).
I think that would be a nice alternative to how DVC is currently operating, but I understand it might take too much dev effort.

Regarding the tracking: thanks for letting me know. I’ll grok this, then ask again if something is still unclear :slight_smile:

Regarding our use case: it’s very similar to this, but it’s kind of complex as we have some requirements regarding authentication and data access. I think it’s better to start a new conversation, but I’ll take some time to learn more about DVC before doing that.

Thanks!
~Alessandro
