Is it possible to only pull/get a subfolder from an existing repo?

Hi,

I am very new to DVC and am in the process of putting all of our existing datasets under DVC, so that we can share the data efficiently.

DVC has been set up successfully with network drives as data storage. I have created a repo that contains several smaller datasets. Each dataset sits in its own subfolder of a single directory, and that directory has been pushed to the storage location on the network drive with dvc push as one repo.

My question: is it possible to pull only one dataset (one subfolder) from this repo? Later we will apply different transformation/data augmentation techniques to these datasets to create new ones. If we can’t pull subdirectories within the same repo, does this mean we need to create new repos instead?

Thanks,

You can use dvc pull on subfolders (or individual files) of a DVC-tracked directory. So if you have something like:

dvc add dataset/
dvc push

You can do:

dvc pull dataset/some/subdir/

to only pull the specific data you are interested in.
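If you want to confirm that the pull actually restored the path, a minimal sketch along these lines may help. The pull_and_check function is a hypothetical helper, not part of DVC; it only wraps the dvc pull call shown above and then checks that the target exists in the workspace.

```shell
# Hypothetical helper (not part of DVC): pull one DVC-tracked path and
# confirm that it exists in the workspace afterwards.
pull_and_check() {
    target="$1"
    # `dvc pull <target>` limits the pull to the given file or directory.
    dvc pull "$target" || return 1
    if [ -e "$target" ]; then
        echo "restored: $target"
    else
        echo "still missing: $target" >&2
        return 1
    fi
}
```

You would call it as pull_and_check dataset/some/subdir/ from the root of the repo. A non-zero exit status means either the pull itself failed or the path is still absent afterwards.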

Hi, I followed your instructions. For example:

dvc add data/
dvc push

Then I removed my local copy with:
rm -r data/original/dataset1

Then I tried to get it back from my remote copy with:
dvc pull data/original/dataset1

The response I got was: Everything is up to date.
But dataset1 was still missing from the data/original directory.

What am I doing wrong?

Thanks,

Can you please run

dvc doctor

and then post the output here? It sounds like you might be using an outdated version of DVC.

Here it is:

Platform: Python 3.8.12 on Linux-4.15.0-166-generic-x86_64-with-glibc2.17
Supports:
webhdfs (fsspec = 2022.1.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6)
Cache types: hardlink, symlink
Cache directory: xfs on /dev/sda3
Caches: local
Remotes: local
Workspace directory: xfs on /dev/sda3
Repo: dvc, git

OK, I think I know why. I had to install the dvc[gs] package after running pip install dvc.
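For anyone hitting the same thing, here is a rough sketch of how one might check whether a given remote type shows up under "Supports:" in the dvc doctor output. The supports_scheme function is a hypothetical helper, and it assumes the output format posted above (one "name (dependency = version)" entry per line), which may differ between DVC versions.

```shell
# Hypothetical helper (not part of DVC): read `dvc doctor` output on stdin
# and succeed if the given scheme is listed as "<scheme> (...)".
# Assumes the one-entry-per-line format shown earlier in this thread.
supports_scheme() {
    grep -Eq "^[[:space:]]*$1 \("
}

# Usage (shown as comments so nothing is installed when the sketch is sourced):
#   pip install "dvc[gs]"
#   dvc doctor | supports_scheme gs && echo "gs is supported"
```

With the output I posted above, a check for http would succeed and a check for gs would fail, which matches what I saw before installing the extra package.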

I think the instructions on this page could be a bit clearer: https://dvc.org/doc/install/linux
Or the error messages could be made a bit clearer. Just a suggestion.

Anyway, thanks for your help.