List all remote paths of tracked files

For a given commit, I’d like to get a list of the local paths & corresponding remote paths of all files tracked by DVC, possibly limited to a certain directory. Is there not an easy way to do this?

Currently looking at a workaround involving “dvc list -R --dvc-only --rev …” to get all tracked files, reading the md5 out of the corresponding .dvc files, and then using those to build the remote paths. Doesn’t seem very elegant.
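For anyone following along, here is a rough sketch of that workaround in Python. It rests on assumptions not stated in this thread: that the remote mirrors the cache layout (first two characters of the md5 as a directory, the rest as the file name), and that newer DVC 3.x remotes add a files/md5 prefix. The function names and the bucket URL are hypothetical, so check the result against your own remote before relying on it.

```python
import re

# Matches the "md5:" entry of a .dvc file; directory outputs end in ".dir".
_MD5_RE = re.compile(r"^\s*-?\s*md5:\s*([0-9a-f]{32}(?:\.dir)?)\s*$", re.MULTILINE)


def md5_from_dvc_text(text):
    """Return the first md5 recorded in the text of a .dvc file."""
    match = _MD5_RE.search(text)
    if match is None:
        raise ValueError("no md5 entry found")
    return match.group(1)


def remote_path(remote_url, md5, dvc3_layout=False):
    """Build the object path under the remote for a given md5.

    Assumed layout (not confirmed in this thread):
      DVC 2.x: <remote>/<md5[:2]>/<md5[2:]>
      DVC 3.x: <remote>/files/md5/<md5[:2]>/<md5[2:]>
    For a ".dir" md5 the object is a JSON listing of the directory's
    files, not the data itself.
    """
    prefix = remote_url.rstrip("/")
    if dvc3_layout:
        prefix += "/files/md5"
    return f"{prefix}/{md5[:2]}/{md5[2:]}"


# Hypothetical .dvc file contents and remote URL, for illustration only.
sample = "outs:\n- md5: d41d8cd98f00b204e9800998ecf8427e\n  size: 0\n  path: data.csv\n"
print(remote_path("s3://my-bucket/dvcstore", md5_from_dvc_text(sample)))
```

To read the .dvc files as they were at a given commit, something like git show with the commit and file path would supply the text to feed into md5_from_dvc_text.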

Hi @jmiller! May I ask what the final purpose of this operation is?

There is not really a straightforward way to get all the remote paths of a repo; what you are already doing is probably the easiest way.

Usually, you would access those files directly with either dvc get or the Python API dvc.api.read.

Not sure if this helps, but you can use dvc get --show-url to “print the storage location (URL) of the target data”. Alternatively, you can use the Python API dvc.api.get_url.
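A minimal sketch of how dvc.api.get_url could be applied to a list of tracked paths at one revision. The path list is assumed to come from somewhere else (for example the dvc list output mentioned earlier); the function name remote_urls, the repo URL, and the revision in the usage note are hypothetical. The get_url argument exists only so the helper can be tried without a real repo.

```python
def remote_urls(paths, repo, rev, get_url=None):
    """Map each tracked path to its storage URL at the given revision.

    By default this calls dvc.api.get_url(path, repo=..., rev=...);
    pass a different callable as get_url to test without DVC installed.
    """
    if get_url is None:
        import dvc.api  # imported lazily so the helper loads without DVC

        get_url = dvc.api.get_url
    return {path: get_url(path, repo=repo, rev=rev) for path in paths}
```

Usage would look something like remote_urls(["data/train.csv"], repo="https://github.com/user/project", rev="v1.0"), which should return a dict from local path to remote URL. This makes one lookup per file, so it may be slow for large repos.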

Thanks! Possibly I’m not using DVC for its intended purposes. I want to load the data, as it was at a certain commit, into a cloud database and/or data visualization tool. Doing this directly with the remote copy of the data (S3) seems to make the most sense, because it doesn’t require me to upload another copy of the data to the cloud first.

Currently my repo has a local cache, but eventually I’d like to have all data stored in the cloud, not in a local workspace or cache at all, since the data is quite large.

From the Python API reference for dvc.api.open:

makes a direct connection to the remote storage (except for Google Drive), so the file contents can be streamed.

So using the API won’t create a local copy of the data, but dvc get will, since it downloads the file.
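As a rough illustration of the streaming approach, the sketch below reads only the first few lines of a tracked file at a given revision through dvc.api.open, without saving the file locally. The function name head and the opener argument are hypothetical; opener is there only so the helper can be tried without a real repo.

```python
from itertools import islice


def head(path, repo, rev, n=5, opener=None):
    """Return the first n lines of a tracked file at the given revision.

    By default the file is opened with dvc.api.open, which is used as a
    context manager and yields a file-like object.
    """
    if opener is None:
        import dvc.api  # imported lazily so the helper loads without DVC

        opener = dvc.api.open
    with opener(path, repo=repo, rev=rev, mode="r") as f:
        # islice stops after n lines, so the rest of the file is not read.
        return [line.rstrip("\n") for line in islice(f, n)]
```

The same pattern should work for feeding rows into a database loader: iterate over the file object inside the with block instead of collecting lines into a list.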