API call to read DVC file fails

How do I programmatically get a file URI so I can read it? I’ve created a simple Git repo with the following steps from the Katacoda tutorial:

dvc init
dvc add file.txt
git add file.txt.dvc
git add .gitignore

Call to api.get_url returns a non-existent URI with https://github.com/amesar/dbws2 and file.txt as arguments.
But the call with https://github.com/iterative/example-get-started and model.pkl works.

What am I missing? Would be super useful to add doc-strings to api.py. For example, what is the “rev” argument? Git commit hash?

import sys
import requests
from dvc import api

# Usage: python script.py <repo-url> <path-in-repo>
repo = sys.argv[1]
path = sys.argv[2]
uri = api.get_url(path, repo=repo)
rsp = requests.get(uri)
with open(f"out/{path}", 'wb') as f:
    f.write(rsp.content)

Hi @Andrej!

The api module has had some docstrings since late January (see https://github.com/iterative/dvc/pull/3130/files). Maybe you are using an old version of DVC?

Call to api.get_url returns a non-existent URI with https://github.com/amesar/dbws2 and file.txt as arguments.

You should pass file.txt as the first argument and the repo URL as repo="https://github.com/amesar/dbws2"

Does that work for you?


Keep in mind we also have a DVC chat for possibly faster replies to this kind of usage question :)

@Andrej api functions are designed to make it easier to work with remote repos, and especially with the remote cache of those repos. Say you have your repo at https://github.com/amesar/dbws2 and you added a remote and pushed some files to it:

# Add a remote named "s3" pointing to some amazon s3 bucket and path, 
# set it as default, mind `-d`
dvc remote add -d s3 s3://bucket/path

# Push any files added via `dvc add` or `dvc run` to default remote
dvc push

# On a fresh clone or on an outdated working copy,
# this will pull data from a dvc remote. 
dvc pull

After that your file.txt will be stored at s3://bucket/path/b4/1abaf44fdd5f41b0d7c57669c9109a and you can get it with:

aws s3 cp s3://bucket/path/b4/1abaf44fdd5f41b0d7c57669c9109a file.txt.copy 
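
The object path above follows from the file's MD5 hash: the first two hex characters become a directory and the remaining 30 the object name. Here is a minimal Python sketch of that mapping, assuming the layout shown in this thread (newer DVC versions may lay out remotes differently). remote_object_url is a hypothetical helper for illustration, not part of DVC:

```python
def remote_object_url(remote_url: str, md5: str) -> str:
    """Build the remote location of a file from its MD5 hex digest.

    Assumes the layout <remote>/<first 2 hex chars>/<remaining 30 chars>
    seen in this thread. Hypothetical helper; DVC does this internally.
    """
    if len(md5) != 32:
        raise ValueError("expected a 32-character MD5 hex digest")
    int(md5, 16)  # raises ValueError if the digest is not valid hex
    return f"{remote_url.rstrip('/')}/{md5[:2]}/{md5[2:]}"
```

The hash itself is recorded in file.txt.dvc, so remote_object_url("s3://bucket/path", "b41abaf44fdd5f41b0d7c57669c9109a") gives the S3 location used in the aws command above.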

To get that S3 URL you can use dvc.api.get_url("file.txt", repo="https://github.com/amesar/dbws2"). You can also use dvc.api.open() or dvc.api.read() to access the file contents directly.
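
As a rough sketch of how these calls fit together, assuming DVC is installed and the repo has a properly configured default remote, something like the following should work. fetch_tracked_file is a hypothetical wrapper name; only dvc.api.get_url and dvc.api.read come from DVC:

```python
def fetch_tracked_file(path, repo):
    """Return (storage_url, contents) for a DVC-tracked file in a Git repo.

    Hypothetical wrapper for illustration; assumes the repo has a working
    default DVC remote that the caller has credentials for.
    """
    # Imported here so this module still loads where DVC is not installed.
    import dvc.api

    # Location of the file in the repo's DVC remote (e.g. an s3:// URL).
    url = dvc.api.get_url(path, repo=repo)
    # Contents of the file fetched from that remote; "rb" returns bytes.
    data = dvc.api.read(path, repo=repo, mode="rb")
    return url, data
```

Reading through dvc.api.read avoids having to download the returned URL yourself, which matters because an s3:// URL cannot be fetched with a plain HTTP client.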

The reason none of this works for you is that your remote is not set up properly. Looking at your .dvc/config file, I can see that you are trying to use a Git URL as your remote, which won’t work. The idea behind DVC remotes is that you store your big files separately from your Git-versioned files, e.g. in some cloud storage, on some server via SSH, or simply in a separate local directory (a bigger drive, a network share or whatever).

I showed the basic usage above, i.e. dvc remote add -d ..., dvc push and dvc pull, but you can read more about it here:

The last one covers all the remote types we support.

BTW, we are still refining our api, so you may want to keep your DVC version up to date. Docstrings are also being discussed, so any input is welcome. I hope this helps you make it work; don’t hesitate to get back to us otherwise.

P.S. The rev argument in all api functions is a Git revision, i.e. a branch name, a tag or a commit SHA.
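
To illustrate rev, here is a small sketch that looks up where the same file is stored at several Git revisions. It assumes DVC is installed and the repo has a working remote; urls_by_revision is a hypothetical helper name, and the revision names in the comment are placeholders:

```python
def urls_by_revision(path, repo, revs):
    """Map each Git revision to the storage URL of `path` at that revision.

    Hypothetical helper for illustration. Each entry of `revs` may be a
    branch name, a tag or a commit SHA.
    """
    # Imported here so this module still loads where DVC is not installed.
    import dvc.api

    return {rev: dvc.api.get_url(path, repo=repo, rev=rev) for rev in revs}

# Example call (placeholder revisions, replace with ones from your repo):
# urls_by_revision("file.txt", "https://github.com/amesar/dbws2",
#                  ["master", "<some-tag>", "<some-commit-sha>"])
```

If the file changed between two revisions, the URLs differ, because each version is stored under its own hash.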
