Need Help -- Failing to pull data updated with DVC v3 when the data was initialized with DVC v2

Hi,

I have a DVC repo and my data is stored on S3. Right now I have two versions of the data: data v2.0.0 and data v3.0.0.

  • Data v2.0.0 was probably added with DVC v2. Its link is: s3://<my-bucket-name>/dataset-registry/cache/09/<the-number>

  • Data v3.0.0 was created recently with DVC v3. Its link is: s3://<my-bucket-name>/dataset-registry/cache/files/md5/81/<the-number>

  • I now use DVC v3. I can pull data v2.0.0 with dvc.api.get_url, given the repo, path, and version. But I cannot pull data v3.0.0 the same way: for data v3.0.0, get_url always returns the wrong path, s3://<my-bucket-name>/dataset-registry/cache/81/<the-number>, instead of the correct one under cache/files/md5/...

  • Here is my DVC version information:

DVC version: 3.42.0 (pip)
-------------------------
Platform: Python 3.8.18 on Linux-4.14.330-250.540.amzn2.x86_64-x86_64-with-glibc2.10
Subprojects:
        dvc_data = 3.8.0
        dvc_objects = 3.0.6
        dvc_render = 1.0.1
        dvc_task = 0.3.0
        scmrepo = 2.0.4
Supports:
        http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2024.2.0, boto3 = 1.34.34)
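The two URLs in the bullets above differ only in their layout under the remote: as I understand it, DVC 2.x puts an object directly in a two-character prefix directory (cache/09/...), while DVC 3.x puts newly added data under files/md5/ (cache/files/md5/81/...). A minimal sketch for telling the two apart; classify_object_url is a hypothetical helper (not part of dvc.api), and the bucket name and hash below are placeholders:

```python
from typing import Tuple


def classify_object_url(remote_url: str, object_url: str) -> Tuple[str, str]:
    """Split an object URL under `remote_url` into (layout, md5).

    Hypothetical helper. Layouts, as I understand them:
      DVC 2.x: <remote>/<md5[:2]>/<md5[2:]>
      DVC 3.x: <remote>/files/md5/<md5[:2]>/<md5[2:]>
    """
    base = remote_url.rstrip("/") + "/"
    if not object_url.startswith(base):
        raise ValueError("URL is not under the given remote")
    parts = object_url[len(base):].split("/")
    # 3.x objects sit two directories deeper, under files/md5/.
    if len(parts) == 4 and parts[:2] == ["files", "md5"] and len(parts[2]) == 2:
        return "3.x", parts[2] + parts[3]
    # 2.x objects sit directly in a two-character prefix directory.
    if len(parts) == 2 and len(parts[0]) == 2:
        return "2.x", parts[0] + parts[1]
    raise ValueError("unrecognized object layout")


remote = "s3://my-bucket/dataset-registry/cache"  # placeholder for <my-bucket-name>
rest = "c" * 30  # made-up remainder of a hash
print(classify_object_url(remote, f"{remote}/81/{rest}"))            # ('2.x', '81ccc...')
print(classify_object_url(remote, f"{remote}/files/md5/81/{rest}"))  # ('3.x', '81ccc...')
```

Fed the URL that get_url returns for data v3.0.0, it reports 2.x, even though the object was actually pushed with the 3.x layout.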

And my .dvc/config has this remote URL:

[core]
    remote = s3cache
['remote "s3cache"']
    url = s3://<my-bucket-name>/dataset-registry/cache
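One thing that may be worth checking (an unconfirmed guess, not a diagnosis): as far as I know, DVC 3.x marks outputs it hashed itself with a hash: md5 field in the outs entry of the .dvc file, and an entry without that field is treated as a legacy 2.x output, which resolves to the old layout directly under the remote URL above. A minimal sketch of that rule, assuming the outs entry has already been loaded into a plain dict (for example with yaml.safe_load); expected_url is a hypothetical helper and the values are made up:

```python
def expected_url(remote_url: str, out: dict) -> str:
    """Where one `outs` entry of a .dvc file should live on the remote.

    Hypothetical helper. Assumption: an entry with `hash: md5` was written by
    DVC 3.x (files/md5/ layout); an entry without it is a legacy 2.x output
    (objects directly under the remote root).
    """
    md5 = out["md5"]
    base = remote_url.rstrip("/")
    if out.get("hash") == "md5":
        base += "/files/md5"
    return f"{base}/{md5[:2]}/{md5[2:]}"


# `outs` entries as yaml.safe_load(...)["outs"][0] might return them; values are made up.
remote = "s3://my-bucket/dataset-registry/cache"  # placeholder for <my-bucket-name>
legacy_out = {"md5": "09" + "a" * 30, "path": "data"}
new_out = {"md5": "81" + "b" * 30, "hash": "md5", "path": "data"}
print(expected_url(remote, legacy_out))  # .../cache/09/aaa...
print(expected_url(remote, new_out))     # .../cache/files/md5/81/bbb...
```

If the .dvc file at the v3.0.0 revision does contain hash: md5 and get_url still returns the legacy path, that would point away from the data itself and toward how get_url resolves the output.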

I suspect this is related to the DVC version, but I am not very familiar with how DVC structures its cache and remotes. Could anyone share ideas on how to fix this? Thanks!