Track remote data on Azure

Hello DVC community!

I’m currently trying out DVC on Ubuntu 20.04 LTS, with Python 3.8.10, on WSL2 with Windows 10.
I am running into some seemingly inconsistent issues trying to track ZIP files in a remote Azure BLOB Storage container.

At least in the initial set-up, I am trying to track those files without downloading them to the local machine. Apologies if this is a really basic question, but I am a bit stumped at the moment.

I used “dvc remote add” and “dvc remote modify”, and the generated config and config.local files in the .dvc folder look good. I added a valid connection string for the Azure storage account with those commands, then I also added that connection string to my environment with:

export AZURE_STORAGE_CONNECTION_STRING='myconnectionstring'

I am trying the following command:

dvc import-url azure://[mycontainer]/[myfile] --to-remote

The remote is another BLOB container in the same Storage Account where the files that I want to track are located.

Yesterday I managed to use the “import-url … --to-remote” command even though the .dvc files that were created locally landed in the wrong folder (my mistake with paths etc). Today I am simply getting the error:

ERROR: unexpected error - : ‘azure’

If I try to run:

dvc import-url azure://[mycontainer]/[myfile] ./path/to/local/folder/ --to-remote

The error I get is:

ERROR: failed to import azure://[mycontainer]/[myzipfilename]. You could also try downloading it manually, and adding it with dvc add. - bad DVC file name ‘path/to/local/folder/[myzipfilename].dvc’ is git-ignored.

The same error appears if I move into the folder where I want the .dvc files to land and run the dvc import-url command from there, whether or not I specify the output file name.

EDIT: I ran it with -v and this is what I get:

2022-03-09 11:39:09,229 DEBUG: Lockfile for ‘dvc.yaml’ not found
2022-03-09 11:39:09,231 ERROR: unexpected error - : ‘azure’

Somehow there is no lock file, so it seems I am missing something super basic :(

SECOND EDIT:
If I run the import-url command from the root project folder without specifying a path that is different from this project root, it works, with the “problem” that the local .dvc file is in the root and not in a subdirectory where I actually want it to be. I still get the “Lockfile for ‘dvc.yaml’ not found” debug msg but then “Computing md5 for a large file …” indicating that the command “works”.

I am getting more confused by the minute lol.
Any ideas how I can fix this?

Thanks a bunch in advance!

Looks like several different questions.

First, could you please provide some details about the first error, “ERROR: unexpected error - : ‘azure’”?
You can get more verbose output by running dvc import-url azure://[mycontainer]/[myfile] --to-remote -vv

Second, for dvc import-url azure://[mycontainer]/[myfile] ./path/to/local/folder/ --to-remote, the error says that ‘path/to/local/folder/[myzipfilename].dvc’ is git-ignored. To solve this, you need to either modify your .gitignore to remove the matching pattern or switch to a different path.
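If it is not obvious which rule is ignoring the .dvc file, `git check-ignore -v` can point to it. Here is a minimal sketch in a throwaway repository. The file name myfile.zip.dvc and the catch-all `*` pattern in a .gitignore placed next to the data are assumptions for illustration, not the actual contents of your repo:

```shell
#!/bin/sh
set -eu

# Throwaway repository so nothing in a real project is touched.
demo=$(mktemp -d)
cd "$demo"
git init -q .
mkdir -p path/to/local/folder

# Assumption: a .gitignore sitting next to the data ignores everything there.
printf '*\n' > path/to/local/folder/.gitignore

# -v prints the .gitignore file, the line number, and the pattern that
# matched, followed by the path that was checked.
match=$(git check-ignore -v path/to/local/folder/myfile.zip.dvc)
echo "$match"
```

In a real project you would only run the `git check-ignore -v` line, with the path of the .dvc file that DVC complains about.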

The line “2022-03-09 11:39:09,229 DEBUG: Lockfile for ‘dvc.yaml’ not found” is only a DEBUG message; ‘dvc.yaml’ is not required in your case.

If I run the import-url command from the root project folder without specifying a path that is different from this project root, it works, with the “problem” that the local .dvc file is in the root and not in a subdirectory where I actually want it to be. I still get the “Lockfile for ‘dvc.yaml’ not found” debug msg but then “Computing md5 for a large file …” indicating that the command “works”.

Since it works from your root path, I guess what you need to do is modify your .gitignore file. For example, add an exclude rule !*.dvc so that any .dvc file can be tracked by Git. (This may not be so easy, though: Git cannot re-include a file if a parent directory of that file is excluded. More details in Git - gitignore Documentation.)
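To illustrate why a `!*.dvc` rule alone may not be enough, here is a small sketch in a throwaway repository. The directory name data and the file names are hypothetical. It compares ignoring the directory itself with ignoring only the directory's contents:

```shell
#!/bin/sh
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q .
mkdir data

# Helper: report whether Git considers a path ignored.
status() {
    if git check-ignore -q "$1"; then echo "ignored"; else echo "not ignored"; fi
}

# Case 1: the directory itself is excluded. Per the gitignore documentation,
# a file cannot be re-included if its parent directory is excluded.
printf 'data/\n!*.dvc\n' > .gitignore
dir_rule=$(status data/myfile.zip.dvc)

# Case 2: exclude the directory's contents instead, then re-include .dvc files.
printf 'data/*\n!data/*.dvc\n' > .gitignore
glob_rule=$(status data/myfile.zip.dvc)
zip_rule=$(status data/myfile.zip)

echo "data/  + !*.dvc      : .dvc file is $dir_rule"
echo "data/* + !data/*.dvc : .dvc file is $glob_rule, zip file is $zip_rule"
```

With the second form, the data files stay out of Git while the .dvc files next to them can still be committed.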

Hi Yanxiang,

you were spot on, your comment seems to have fixed everything. I removed a .gitignore file and voilà. My simple DVC pipeline runs.

I actually did not start this repo from scratch but from a colleague who was tracking files locally instead of from Azure. There was a .gitignore next to the directory where those files were. The files are no longer there but on Azure, so I removed that .gitignore and everything is flowing. Thank you!
