What type of protocol for a huge dataset


I want to use DVC for my dataset versioning. My dataset can grow up to 10 GB, and I want to set up a protocol to retrieve my data.
I see these protocols are available for retrieving data:

  • Amazon S3
  • SSH
  • HDFS
  • Local files and directories outside the workspace

Which protocol should I use to be able to pull that quantity of data?
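For context, all of these remote types are configured the same way and then fetched from with `dvc pull`; here is a minimal sketch using S3 (the remote name and bucket path are hypothetical):

```shell
# Add a default remote named "storage" pointing at a hypothetical S3 bucket
dvc remote add -d storage s3://my-dataset-bucket/dvc-store

# An SSH remote would differ only in the URL, e.g.:
#   dvc remote add -d storage ssh://user@example.com/path/to/dvc-store

# Push the tracked data to the remote once, from the machine that has it
dvc push

# Later, from any clone of the repo, fetch the data back
dvc pull
```

Only the chunks referenced by the current workspace are downloaded, so the protocol choice matters less than the bandwidth and access you have to the storage backend.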

Hi @xavier !

What do you mean by retrieve? dvc pull? The article you’ve linked talks about an experimental workflow, which we don’t recommend using (there is a note about add/import at the top of it, as that is what people usually are looking for).

Yes, it is dvc pull. What is the best way to manage a huge dataset? Is DVC a good solution for this?