Does DVC have any support for restricting access to the DVC remote storage, in order to make sure data is not lost or corrupted?
The use case is this: we are a group of data scientists / developers working on the same data.

- The data is medical data. It is not allowed to leave our premises (i.e. it cannot be stored in the cloud), and it cannot be reproduced / recovered if lost.
- We do our day-to-day work on a number of machines (3 shared machines plus our workstations).
- We are developing medical software, so we are required to keep track of each and every change to the data, and we must be able to reproduce any software build from up to 5 years back.
We currently have the following setup: all code is placed in a git repository, which contains a number of DVC repos. The DVC remote storage lives on a NAS (with off-site backup). The NAS is mounted at the same location on all machines, and the remote storage is accessed via this mount point.
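For context, a minimal sketch of what the `.dvc/config` for such a setup might look like, assuming the NAS is mounted at `/mnt/nas` and the remote name `nas` (the remote name is hypothetical, the path is the one mentioned below):

```ini
; Hypothetical .dvc/config: a "local" DVC remote that is really
; a directory on the NAS mount, shared by all machines.
['remote "nas"']
    url = /mnt/nas/DVC
[core]
    remote = nas
```

Because the remote is just a directory on a shared mount, any user who can `dvc push` can also modify or delete the files under `/mnt/nas/DVC` directly, which is exactly the problem described next.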
The problem is this: since the remote storage is mounted on all machines and everyone needs to be able to `dvc push` to it, everyone needs direct write access to the DVC remote storage. This is very fragile: if someone accidentally deletes /mnt/nas/DVC, all our data is lost. I guess the problem is the same even when storing data in the cloud.
Is there a way to restrict access to the remote storage such that only DVC can read/write the data?
I’m thinking: if DVC were running as a daemon with its own privileges (similar to the docker daemon), everything would be fine. Is something like this possible?