DVC and Snowflake setup

Does DVC always need a CSV file? Mostly we have data in some cloud, where we establish a connection and create our training data from various sources. Let's assume I have data in Snowflake, and my pipelines push my code into Snowflake infrastructure, where all the data splitting and training happen. Now I want to introduce DVC, but I can't find any clue on how DVC can help while my data is in a cloud or a warehouse. Where would I install the dvc library, on the pipeline agent or on the Snowflake infrastructure, to get the snapshot saved to a remote?

Hi. DVC does not directly work with remote database connections, as it’s hard to provide reproducibility guarantees for SQL queries.

DVC does have a helper command, dvc import-db, that can export your database to a CSV or JSON records file, and dvc update keeps that export current. Your pipelines can then depend on the exported file.
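If it helps to see what such an export amounts to, here is a minimal hand-rolled sketch. It is not how dvc import-db is implemented. It only assumes a Python DB-API connection (for Snowflake, presumably one from snowflake.connector.connect) and a hypothetical helper name, export_query_to_csv. The resulting file is an ordinary CSV that DVC can track like any other.

```python
import csv


def export_query_to_csv(conn, sql, out_path, batch_size=10_000):
    """Write the result of `sql` to `out_path` as CSV (hypothetical helper).

    `conn` is any DB-API 2.0 connection. Include an ORDER BY in `sql` so
    that an unchanged table yields a byte-identical file on every run;
    otherwise DVC would see a new hash even though the data is the same.
    """
    cur = conn.cursor()
    cur.execute(sql)
    # cursor.description holds one tuple per column; item 0 is the name.
    header = [col[0] for col in cur.description]
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        while True:
            rows = cur.fetchmany(batch_size)  # avoid loading everything at once
            if not rows:
                break
            writer.writerows(rows)
    return out_path
```

After a run, something like "dvc add" on the output file (or listing it as a stage output) would snapshot it to your remote. The script runs wherever the query is issued from, so DVC would be installed on that machine, for example the pipeline agent.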

Are you hoping to run a DVC pipeline on Snowflake infrastructure? What problem are you hoping DVC will solve?

If you were trying to run some processes outside of Snowflake, I could imagine one DVC “always changed” stage to fetch a table version identifier, write that to a file and use it as a dependency for downstream stages.
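To make that idea concrete, here is a rough sketch of what the fetch script might look like. Everything named here is an assumption for illustration: the function write_table_version, the file names, and the choice of LAST_ALTERED from INFORMATION_SCHEMA.TABLES as the "version identifier" (any value that changes when the table changes would do). The %s placeholders assume the driver uses pyformat parameters.

```python
from pathlib import Path

# Assumption: the table's last-altered timestamp is a good enough
# version identifier for your use case.
VERSION_SQL = (
    "SELECT LAST_ALTERED FROM INFORMATION_SCHEMA.TABLES "
    "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s"
)


def write_table_version(conn, schema, table, out_path):
    """Fetch the table's version identifier and write it to `out_path`.

    `conn` is a DB-API connection, e.g. one returned by
    snowflake.connector.connect(...). Unquoted Snowflake identifiers are
    stored in upper case, so pass `schema` and `table` accordingly.
    """
    cur = conn.cursor()
    cur.execute(VERSION_SQL, (schema, table))
    row = cur.fetchone()
    if row is None:
        raise LookupError(f"table {schema}.{table} not found")
    version = str(row[0])
    Path(out_path).write_text(version + "\n")
    return version


# A matching dvc.yaml could look roughly like this (stage and file names
# are hypothetical):
#
# stages:
#   table_version:
#     cmd: python fetch_version.py
#     always_changed: true
#     outs:
#       - table_version.txt:
#           cache: false
#   train:
#     cmd: python train.py
#     deps:
#       - table_version.txt
```

Because the first stage is marked always_changed, it runs on every dvc repro, but the downstream stage reruns only when the content of table_version.txt actually differs from the last run.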