AttributeError with dvc.api.read to Azure Blob Storage

Hello,

I am new to DVC and am evaluating it in a proof-of-concept implementation for our ML projects, which it seems to fit perfectly! However, I have run into a problem with dvc.api.open while using Azure Blob Storage.

What I have done:

  • cloned the iterative/dataset-registry repository from GitHub (Dataset registry DVC project)
  • dvc remote add -d myremote azure://BLOB/PATH
  • dvc remote modify --local myremote connection_string 'CONNECTION_STRING'
  • created test file
  • dvc add & push
  • removed the test file incl. cache from local repo
  • dvc.api.read ==> AttributeError
  • dvc pull
  • dvc.api.read ==> works

I am able to use dvc push and dvc pull, but when I use dvc.api.read I get "AttributeError: 'NoneType' object has no attribute 'account_key'" (see attached screenshots). If the file has been downloaded with dvc pull and is available in the cache folder, everything works.

Can anyone point me to the problem or my misunderstanding? I want to use the streaming functionality, as we have very large files and do not want to store them on the storage of a virtual machine.
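
A minimal sketch of the streaming use case described above, reading a tracked file in chunks through dvc.api.open instead of pulling it into the cache first. The helper names (iter_chunks, count_streamed_bytes) and the chunk size are hypothetical and only for illustration; whether the data is truly streamed depends on the remote type.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1 MiB; an arbitrary choice for this sketch


def iter_chunks(fobj, chunk_size=CHUNK_SIZE):
    """Yield successive byte chunks from a binary file-like object."""
    while True:
        chunk = fobj.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        yield chunk


def count_streamed_bytes(path, repo=None, remote=None):
    """Read a DVC-tracked file chunk by chunk and return its size in bytes.

    Hypothetical helper: it only illustrates the call pattern.
    """
    import dvc.api  # imported here so iter_chunks works without DVC installed

    total = 0
    # mode="rb" asks for a binary stream instead of decoded text
    with dvc.api.open(path, repo=repo, remote=remote, mode="rb") as fobj:
        for chunk in iter_chunks(fobj):
            total += len(chunk)
    return total


# The chunking helper can be checked locally with an in-memory stream.
demo = io.BytesIO(b"abcdefghij")
print([len(c) for c in iter_chunks(demo, chunk_size=4)])  # prints [4, 4, 2]
```

With the setup from the list above, a call might look like count_streamed_bytes("PATH/TO/TESTFILE", remote="myremote"), where the path is a placeholder for the test file that was added and pushed.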

Thanks!

Hi @svenw3, could you please run dvc doctor from the command line and then post the output here?

DVC version: 1.11.16 (pip)

Platform: Python 3.9.1 on Windows-10-10.0.18362-SP0
Supports: azure, http, https
Cache types: hardlink
Caches: local
Remotes: https, azure
Repo: dvc, git

It looks like the problem here is related to a known limitation in DVC, where local config settings (such as those set via dvc remote modify --local ...) are not used by certain DVC commands, including api.open() and api.read().

The good news is that this limitation has been addressed in an upcoming release; however, we are not currently planning to backport these changes to DVC 1.11.x.

Would you mind trying the pre-release version (see: https://dvc.org/blog/dvc-2-0-pre-release#install) from pip, and checking if that resolves your issue? When you install the pre-release version, don't forget to also install the Azure dependency:

pip install --upgrade --pre dvc[azure]

Thanks for your fast reply! The problem still exists with version 2.0; see the attached screenshot.

Looks like this is an Azure-specific bug then. Would you mind filing a bug report on our GitHub?

Thanks, I will file a bug report later today.

EDIT:
See: "dvc.api.read: AttributeError while using azure blob storage for streaming data" (Issue #5524, iterative/dvc on GitHub)