Raw data from google big query

So I have created a DVC stage that builds a local raw data file from a query against a Google BigQuery (GBQ) database.
Now I want to add this file to DVC for tracking, so that if the database becomes unavailable in the future I am still fully reproducible.
When I do dvc add I get the following error:
data.csv is specified as an output in more than one stage:
raw_data.dvc
This is not allowed. Consider using a different output name.

What am I doing wrong?


Hi @hanan-vian!

Could you please share what your raw_data.dvc file looks like and how you generated it?

If you generated it with dvc run -o data.csv ..., then data.csv is already tracked by DVC and you don’t need to add it explicitly with dvc add. In your case that looks like the most reasonable option.
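For illustration, a minimal sketch of such a stage (the script name query_gbq.py and the stage file name are hypothetical; the -o flag is what tells DVC to track data.csv as an output, so no separate dvc add is needed):

```shell
# Hypothetical stage that queries GBQ and writes data.csv.
# -f names the stage file, -d declares a dependency, -o declares the output.
dvc run -f raw_data.dvc \
        -d query_gbq.py \
        -o data.csv \
        python query_gbq.py
```

After this, data.csv is already under DVC control via raw_data.dvc, which is why a later dvc add on the same file fails with the "specified as an output in more than one stage" error.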

But how do I make the csv reach the remote storage?
Will I be able to repro without accessing the database?

The .dvc file was generated as you said, with an explicit -o.


But how do I make the csv reach the remote storage?

That’s what the dvc push command is for (similar to git push, but it handles DVC-tracked data). You need to set up a remote storage first. Do you use Google Cloud Storage?

Usually dvc remote add -d myremote gs://bucket/path should be enough to be able to do dvc push.
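A sketch of the full setup, assuming Google Cloud Storage (the bucket name is hypothetical):

```shell
# -d marks this remote as the default, so `dvc push` uses it automatically.
dvc remote add -d myremote gs://my-bucket/dvc-storage
git add .dvc/config              # the remote configuration lives in .dvc/config
git commit -m "Configure DVC remote"
dvc push                         # upload data.csv and any other tracked outputs
```

Committing .dvc/config means anyone cloning the repo gets the remote configuration along with the stage files.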

Will I be able to repro without accessing the database?

Yes! The workflow usually looks like:

git clone <your-repo>
dvc pull
(after that you will see your data.csv)

Then:

dvc repro

if you’d like to reproduce. But if all outputs (including models) are saved in DVC, this command should report “Nothing to reproduce” after a successful dvc pull. There are explicit options to force reproduction anyway.
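For example, a sketch of forcing re-execution (note this re-runs the stage, so it will query the database again rather than use the pulled copy):

```shell
# --force re-runs the stage even if DVC considers everything up to date.
dvc repro --force raw_data.dvc
```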