What type of protocol for a huge dataset


I want to use DVC for my dataset versioning. My dataset can grow up to 10 GB, and I want to set up a protocol to retrieve my data.
I see these protocols are available for retrieving data:

  • Amazon S3
  • SSH
  • HDFS
  • Local files and directories outside the workspace

Which protocol should I use to be able to pull that quantity of data?
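For context, all of these remote types are configured the same way and then fetched from with `dvc pull`; here is a minimal sketch using S3 (the remote name and bucket path are hypothetical):

```shell
# Add a default remote named "storage" pointing at a hypothetical S3 bucket
dvc remote add -d storage s3://my-dataset-bucket/dvc-store

# An SSH remote would differ only in the URL, e.g.:
#   dvc remote add -d storage ssh://user@example.com/path/to/dvc-store

# Push the tracked data to the remote once, from the machine that has it
dvc push

# Later, from any clone of the repo, fetch the data back
dvc pull
```

Only the chunks referenced by the current workspace are downloaded, so the protocol choice matters less than the bandwidth and access you have to the storage backend.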

Hi @xavier !

What do you mean by retrieve? dvc pull? The article you’ve linked talks about an experimental workflow, which we don’t recommend using (there is a note about add/import at the top of it, as that is what people usually are looking for).

Yes, it is dvc pull. What is the best way to manage a huge dataset? Is DVC a good solution for this?