I want to use DVC for my dataset versioning. My dataset can grow up to 10 GB, and I want to set up a protocol to retrieve my data.
I see that the following protocols can be used to retrieve data (https://dvc.org/doc/user-guide/managing-external-data):
Amazon S3
SSH
HDFS
Local files and directories outside the workspace
Which protocol should I use to be able to pull that quantity of data?
What do you mean by "retrieve"? Do you mean dvc pull? The article you’ve linked describes an experimental workflow, which we don’t recommend using. There is a note about add/import at the top of it, as that is what people are usually looking for.