Hi everyone! First question - How to point multiple projects to a single dataset?

stauntonjr · February 17, 2021, 2:32am

I stumbled upon DVC over the weekend after discussing with my colleagues our need for just such a tool. I got it working in some test repos with data on our private S3 buckets. Very cool! I really appreciate the documentation; it’s very well put together.

One thing I’m not sure of is:

If I have a dataset on a remote server and several different repos/projects that use it, what is the appropriate way to point them all to that dataset and/or a single DVC remote representation of it? (In each repo/project, do I need to download it, dvc add it, and assign the dataset to the same remote?)

thanks!
Rory

kupruser · February 17, 2021, 3:21am

Hi @stauntonjr !

One way to handle it would be to put the dataset into a data registry and then just dvc import it in the other projects: https://dvc.org/doc/use-cases/data-registries. Would that work for you?
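
For anyone following along, here is a minimal sketch of that registry workflow, not a definitive setup. The registry URL, remote name, bucket, and dataset path are all hypothetical placeholders, and both repos are assumed to be already initialised with git and dvc. The commands are wrapped in functions so nothing runs until you call them.

```shell
# Hypothetical values: replace with your own registry repo, remote and dataset.
REGISTRY_URL="https://github.com/example/data-registry"
REMOTE_NAME="storage"
REMOTE_URL="s3://example-bucket/registry"
DATASET_PATH="datasets/images"
DVC="${DVC:-dvc}"   # override (e.g. DVC=echo) to preview commands without running them

# Run once inside the data registry repo.
register_dataset() {
    "$DVC" remote add -d "$REMOTE_NAME" "$REMOTE_URL"   # set the default remote
    "$DVC" add "$DATASET_PATH"                          # writes datasets/images.dvc
    "$DVC" push                                         # uploads the data to the remote
    # Afterwards, commit .dvc/config and the generated .dvc/.gitignore files with git.
}

# Run inside each project that needs the dataset.
import_dataset() {
    # Downloads the data and writes images.dvc in the current directory,
    # recording the registry URL and the revision the data came from.
    "$DVC" import "$REGISTRY_URL" "$DATASET_PATH"
}

# Run inside a project later to pick up a newer version from the registry.
update_dataset() {
    "$DVC" update "$(basename "$DATASET_PATH").dvc"
}
```

Each project then commits only the small images.dvc file to git, while the data itself stays in the registry's remote.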

naama · February 17, 2021, 2:24pm

Hi, I have the same problem.
If I use a data registry as you suggested and then dvc import the data into the other projects,
will the data be downloaded and duplicated in each project? Will I end up with multiple copies of the same data in different projects on the same server?
thanks!
Naama

kupruser · February 17, 2021, 3:02pm

Hi.

Data brought in with dvc import is not uploaded by dvc push, so it is stored only in the data registry’s remote and is not duplicated in each project’s remote.

naama · February 17, 2021, 4:25pm

Thanks,
what do you mean by "is not affected by dvc push"?
I have a data registry and a separate code project.
After I run dvc import in the code project, the data is downloaded to the project folder along with its .dvc file. Doesn’t this data take up space on my server? Does it use links?

kupruser · February 17, 2021, 4:31pm

After you run dvc import, the data is downloaded locally and its .dvc file is created, but dvc push won’t push that data to this project’s remote. The next time you run dvc pull, the data is downloaded from the data registry it was imported from, so it takes up no space in this project’s remote.
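
A rough sketch of that push/pull behaviour, as described in this thread. It assumes the project already contains an images.dvc file from an earlier dvc import (the file name is a hypothetical example). The commands are wrapped in functions so nothing runs until you call them.

```shell
DVC="${DVC:-dvc}"   # override (e.g. DVC=echo) to preview commands without running them

# Run inside the consuming project.
push_project_outputs() {
    # Uploads data this project tracks itself; per the thread, the
    # imported dataset is not uploaded to this project's remote.
    "$DVC" push
}

# Run inside a fresh clone of the consuming project.
restore_imported_dataset() {
    # Downloads the imported data again from the registry recorded in
    # images.dvc (hypothetical file name), not from this project's remote.
    "$DVC" pull images.dvc
}
```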
