Hi,
I have a DVC repo whose data is stored on S3. Right now I have two versions of the data: v2.0.0 and v3.0.0.
-
Data v2.0.0 was probably added with DVC v2. Its location is:
s3://<my-bucket-name>/dataset-registry/cache/09/<the-number>
-
Data v3.0.0 was created recently with DVC v3. Its location is:
s3://<my-bucket-name>/dataset-registry/cache/files/md5/81/<the-number>
-
I am now using DVC v3. Given the repo, path, and version, dvc.api.get_url returns a working URL for data v2.0.0, but not for data v3.0.0. For data v3.0.0 it always returns the wrong path:
s3://<my-bucket-name>/dataset-registry/cache/81/<the-number>
instead of the correct one under cache/files/md5/...
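To make the difference between the two paths concrete, here is a minimal Python sketch of how I understand the two cache layouts. It does not call DVC at all; the function name `cache_object_url`, the `layout` labels, and the example hash are made up for illustration, and the bucket name is only a placeholder.

```python
def cache_object_url(remote_url: str, md5: str, layout: str) -> str:
    """Build the remote URL of one cached object from its MD5 hash.

    layout="v2": <remote>/<first 2 hex chars>/<rest>            (older layout)
    layout="v3": <remote>/files/md5/<first 2 hex chars>/<rest>  (DVC 3 layout)
    """
    base = remote_url.rstrip("/")
    prefix, rest = md5[:2], md5[2:]
    if layout == "v2":
        return f"{base}/{prefix}/{rest}"
    if layout == "v3":
        return f"{base}/files/md5/{prefix}/{rest}"
    raise ValueError(f"unknown layout: {layout!r}")


# Placeholder remote and a made-up 32-character hash, for illustration only.
remote = "s3://my-bucket-name/dataset-registry/cache"
fake_md5 = "81" + "0" * 30

url_v2_layout = cache_object_url(remote, fake_md5, "v2")  # what get_url gives me
url_v3_layout = cache_object_url(remote, fake_md5, "v3")  # where the object really is
```

In these terms, get_url seems to build the "v2" form for my v3.0.0 data, while the object actually sits at the "v3" form.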
-
Here is my dvc version information:
DVC version: 3.42.0 (pip)
-------------------------
Platform: Python 3.8.18 on Linux-4.14.330-250.540.amzn2.x86_64-x86_64-with-glibc2.10
Subprojects:
dvc_data = 3.8.0
dvc_objects = 3.0.6
dvc_render = 1.0.1
dvc_task = 0.3.0
scmrepo = 2.0.4
Supports:
http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
s3 (s3fs = 2024.2.0, boto3 = 1.34.34)
My .dvc/config defines the remote as follows:
[core]
    remote = s3cache
['remote "s3cache"']
    url = s3://<my-bucket-name>/dataset-registry/cache
I suspect this is related to the DVC version, but I am not very familiar with DVC's internals. Could anyone share ideas on how to fix this? Thanks!