Does DVC always need a CSV file? Most of our data lives in the cloud: we establish connections and assemble our training data from various sources. Let's assume I have data in Snowflake, and my pipelines push code into the Snowflake infrastructure, where all data splitting and training happen. Now I want to introduce DVC, but I can't find any guidance on how DVC can help when my data is in the cloud or in a warehouse. Where should I install the DVC library, on the pipeline agent or in the Snowflake infrastructure, so that snapshots are saved to a remote?
Hi. DVC does not work directly with remote database connections, as it's hard to provide reproducibility guarantees for SQL queries.
DVC does have a helper command, `dvc import-db`, that can export your database to CSV or JSON records format and keep the export updated via `dvc update`. Your pipelines can then rely on that exported file as a dependency.
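As a rough illustration of what such an export step produces, here is a minimal Python sketch. It is not how `dvc import-db` is implemented; it uses the standard-library `sqlite3` module as a local stand-in for a warehouse like Snowflake, and the helper `export_query_to_csv` and the `customers` table are hypothetical. Note the `ORDER BY`: without a deterministic row order, the same data could be written as a different file, and therefore get a different hash, on each export.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def export_query_to_csv(conn: sqlite3.Connection, sql: str, out_path: Path) -> int:
    """Run `sql` and write its result, with a header row, to `out_path` as CSV.

    Returns the number of data rows written. SQLite is only a local stand-in
    for a warehouse connection (assumption for illustration).
    """
    cur = conn.execute(sql)
    header = [col[0] for col in cur.description]  # column names from the cursor
    rows = cur.fetchall()
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, region TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(2, "US"), (1, "EU")])
    with tempfile.TemporaryDirectory() as d:
        out = Path(d) / "data" / "customers.csv"
        # ORDER BY keeps the file byte-identical across exports of unchanged data
        n = export_query_to_csv(conn, "SELECT id, region FROM customers ORDER BY id", out)
        print(n, out.read_text().splitlines())
```

The exported file is what a downstream DVC stage would list as a dependency, so it only reruns when the exported content actually changes.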
Are you hoping to run a DVC pipeline on Snowflake infrastructure? What problem are you hoping DVC will solve?
If you are running some processes outside of Snowflake, I could imagine one DVC “always changed” stage that fetches a table version identifier and writes it to a file, which downstream stages then use as a dependency.
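A minimal sketch of that idea, under stated assumptions: the helpers `write_version_marker` and `file_md5` and the version strings are hypothetical, and `fetch_version` is any callable you supply. For Snowflake, one candidate identifier might be the `LAST_ALTERED` column of `INFORMATION_SCHEMA.TABLES`, but verify it changes when your data does. The script would be the command of a stage marked `always_changed: true` in `dvc.yaml`, with the marker file listed as a dependency of downstream stages. Because DVC compares content hashes of dependencies, rewriting the same identifier leaves the hash unchanged, so downstream stages are skipped; md5 is used below only to demonstrate that.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def write_version_marker(fetch_version: Callable[[], str], out_path: Path) -> str:
    """Write the current table version identifier to `out_path` and return it.

    `fetch_version` is a hypothetical hook, e.g. a function that queries the
    warehouse for a last-modified timestamp of the source table.
    """
    version = fetch_version().strip()  # normalize so whitespace can't change the hash
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(version + "\n")
    return version


def file_md5(path: Path) -> str:
    """Content hash of a file, illustrating how a dependency change is detected."""
    return hashlib.md5(path.read_bytes()).hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        marker = Path(d) / "table_version.txt"
        write_version_marker(lambda: "v1", marker)
        first = file_md5(marker)
        write_version_marker(lambda: "v1", marker)  # same version: same hash
        print("unchanged:", file_md5(marker) == first)
        write_version_marker(lambda: "v2", marker)  # new version: hash changes
        print("changed:", file_md5(marker) != first)
```

Keeping the marker file tiny and deterministic is the design point: the always-changed stage is cheap to run on every `dvc repro`, while the expensive downstream stages only rerun when the identifier moves.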