Sure. Let’s try to set up an MWE together:
# Mock up the setup
# In reality user1 would be something like `/home/user1`; the cache is best
# located on the same volume as `/home` and the data (which is `NAS/temp_registy/Dogs`)
% mkdir example-shared-cache
% cd example-shared-cache
% mkdir data
% mkdir user1
% mkdir user2
% echo "dog1" > data/dog1.txt
% echo "dog2" > data/dog2.txt
% mkdir cache
% cd user1
% mkdir project
% cd project
% git init
# Initialize a DVC repo with a shared (external) cache and enable all link types to avoid copies
% dvc init
% dvc config cache.type "reflink,symlink,hardlink,copy"
% dvc cache dir /Users/ivan/Projects/example-shared-cache/cache
% git add .dvc/config
# Finally, add the data. See https://dvc.org/doc/command-reference/add#example-transfer-to-an-external-cache and https://dvc.org/doc/command-reference/add#-o
% dvc add ../../data -o data
% ls
data data.dvc
% git add .gitignore data.dvc
% git commit -a -m "add data"
% # git push should go here to GH/GitLab/etc
# Now the second user comes ...
% cd ../../user2
% git clone ../user1/project # in reality it would be a clone from GH/GitLab or something
% cd project
% ls
data.dvc
% dvc checkout
% ls
data data.dvc
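On the note at the top about keeping the cache on the same volume as the data: reflinks and hard links only work within a single filesystem, so if the cache lives elsewhere DVC moves on to the next link type in the `cache.type` list. Here is a rough way to check this, as a sketch only. `same_volume` is a hypothetical helper name, and comparing `df -P` output is an approximation rather than anything DVC does itself:

```shell
# Hypothetical helper: succeeds if both paths appear to be on the same filesystem.
same_volume() {
    # `df -P` prints a header plus one POSIX-format line per path;
    # field 1 is the filesystem, field 6 is where it is mounted.
    fs_a=$(df -P "$1" 2>/dev/null | awk 'NR == 2 { print $1, $6 }')
    fs_b=$(df -P "$2" 2>/dev/null | awk 'NR == 2 { print $1, $6 }')
    # Empty output means the path does not exist or df failed.
    [ -n "$fs_a" ] && [ "$fs_a" = "$fs_b" ]
}

# Paths from the example above; adjust to your own layout.
if same_volume /Users/ivan/Projects/example-shared-cache/cache \
               /Users/ivan/Projects/example-shared-cache/data; then
    echo "same volume: reflink/hardlink can work"
else
    echo "different volumes (or a path is missing): expect symlink or copy"
fi
```

If the check fails, things still work; you just end up with symlinks or full copies instead of the cheaper link types.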
Now, let’s say we’d like to add one more dog into the initial dataset:
% echo "dog3" > /Users/ivan/Projects/example-shared-cache/data/dog3.txt
To update the data in the repository, do this:
% cd /Users/ivan/Projects/example-shared-cache/user1/project
# Having to remove these first is a bug; I'll create a ticket
% rm -rf data data.dvc
% dvc add ../../data -o data
% git add data.dvc ...
% git commit -m "update data"
% git push
There is one important caveat to keep in mind: check the `cache.shared` option and, if it’s needed, configure it as part of the initial setup.
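For reference, here is a minimal sketch of that setting, assuming all users of the shared cache belong to a common Unix group. `group` is the value this option takes, and it makes the files DVC writes into the cache group-writable so that user2 can add to a cache user1 created:

```shell
# Run inside user1/project, next to the other `dvc config` calls.
# Assumption: every user of the shared cache is in the same Unix group.
dvc config cache.shared group
# Commit the setting so that clones pick it up too.
git add .dvc/config
```

Whether you need this depends on how permissions are set up on the volume that holds the cache.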
An interesting alternative to `dvc add ... -o` is `dvc import-url`, like this:
% dvc import-url /Users/ivan/Projects/example-shared-cache/data data
It’s similar to `dvc add ... -o`, but it also saves the source (`/Users/ivan/Projects/example-shared-cache/data`) in the .dvc file, so later you could use `dvc update data.dvc` or something like this to update your data.
Another alternative is to set up an extra data registry repo, as @pmrowla mentioned.