I am very new to DVC and am in the process of putting all of our existing datasets under DVC, so that we can all share the data efficiently and in a meaningful way.
DVC has been set up successfully, using network drives as data storage. I have created a repo that contains several smaller datasets. Each dataset sits in its own subfolder of a single directory, and that directory has been pushed to the storage location on the network drive with dvc push, as one repo.
My question is: is it possible to pull only one dataset (one subfolder) from this repo? Later we will be applying different transformation/data augmentation techniques to these datasets to create new datasets. If we can’t pull subdirectories within the same repo, does this mean we need to create new repos instead?
OK, I think I know why. I had to install the dvc[gs] package after running pip install dvc.
I think the instructions on this page could be a bit clearer: https://dvc.org/doc/install/linux
Or the error messages could be made a bit clearer. Just a suggestion.