Hi,
I want to use DVC to version a large dataset folder (too large to fit in local storage) stored in an S3-like object store.
In my git repo, I used the following command to track my remote dataset:
dvc import-url --no-download remote://minio/dataset_v1.0 data
So I now have a “dataset_v1.0.dvc” file in my local “data” folder.
Now I want to write a Python script that loads and transforms this dataset (several subdirectories containing files).
I am a bit confused about which path I should use in my Python code to open these files.
For example, using “open("dataset/dataset_v1.0.dvc/some_subdir/some_file.csv", "r")” does not seem like a good idea.
What should I do?
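A minimal sketch of what I think the path should look like, assuming DVC's default naming where “data/dataset_v1.0.dvc” is only a metadata file tracking an output named “data/dataset_v1.0”: paths would then point into the output, not into the .dvc file. The helper name `tracked_output_path` is hypothetical, and I have not verified whether `dvc.api.open` can stream a `--no-download` import directly with my DVC version.

```python
from pathlib import PurePosixPath


def tracked_output_path(dvc_file: str, *inner: str) -> str:
    """Hypothetical helper: path of a file inside the output tracked by `dvc_file`.

    Assumes the default naming scheme, where `<name>.dvc` tracks an output
    called `<name>` in the same directory.
    """
    p = PurePosixPath(dvc_file)
    if p.suffix != ".dvc":
        raise ValueError(f"not a .dvc file: {dvc_file}")
    # Drop only the final ".dvc" suffix: data/dataset_v1.0.dvc -> data/dataset_v1.0
    output = p.with_suffix("")
    return str(output.joinpath(*inner))


path = tracked_output_path("data/dataset_v1.0.dvc", "some_subdir", "some_file.csv")
print(path)  # data/dataset_v1.0/some_subdir/some_file.csv

# Possible next step (requires DVC installed; untested with --no-download imports):
# import dvc.api
# with dvc.api.open(path, mode="r") as f:
#     ...
```

If this is right, the question becomes whether DVC can read that path from the remote without a full local download.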