Import/list + webdavs remote not working

I have set up two repositories. The first is a data registry, which currently holds a single dataset. Its git remote is hosted under a company account on github.com, and its DVC remote is a storage location I access using the webdavs protocol.

I can reach the DVC storage through a web interface, and the data registry itself works: I can push data to it and pull data from it. I set this up with the default dvc remote add data-remote webdavs://cloud.com/data-registry and stored the username and password in a config.local file.

I’ve created a second project in which I would like to use the data stored in the registry, i.e., a project for running experiments. This project’s DVC remote is hosted by the same cloud provider, but in a different folder. I’ve added the credentials for both the experiment remote and the data registry remote to its config.local (they are the same credentials), so the experiment project knows about both remotes.

To test the connection, I added a stage with an output and tracked some empty files. I was able to push these to the remote, so the connection works. However, when I try to list or import from the data registry, DVC reports (with some privacy edits):

ERROR: unexpected error - received 401 (Unauthorized)
Client error '401 Unauthorized' for url 'https://cloud.com/data-registry/21/8ff9c6f73767c8126b66f7213cd0d2.dir'

What am I missing here? Thanks for the help!

EDIT:

When using dvc import to import the dataset, I also get the following warning, which might explain the 401:

WARNING: Some of the cache files do not exist neither locally nor on remote. Missing cache files:
name: None, md5: 218ff9c6f73767c8126b66f7213cd0d2.dir

I brought the data registry up to date (git push/pull and dvc push/pull), and dvc status --cloud reports that everything is up to date. I also checked the storage through the web interface, and the file does exist there, i.e., 21/8ff9c6f73767c8126b66f7213cd0d2.dir exists.
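For anyone comparing the md5 in the warning with the path in the 401 error: the two appear to be the same object, with the first two characters of the hash used as a folder name. Here is a minimal Python sketch of that mapping. The function names object_path and object_url are my own, and the webdavs-to-https translation is an assumption based only on the URLs shown in this thread, not on DVC's source code.

```python
from urllib.parse import urlsplit, urlunsplit


def object_path(md5: str) -> str:
    """Split a hash into '<first two chars>/<rest>', as seen in the error URL."""
    if len(md5) < 3:
        raise ValueError("hash is too short to split")
    return f"{md5[:2]}/{md5[2:]}"


def object_url(remote_url: str, md5: str) -> str:
    """Build the HTTP(S) URL that corresponds to an object on a webdav(s) remote.

    Assumption: webdavs:// maps to https:// and webdav:// maps to http://.
    """
    parts = urlsplit(remote_url)
    scheme = {"webdavs": "https", "webdav": "http"}.get(parts.scheme, parts.scheme)
    path = parts.path.rstrip("/") + "/" + object_path(md5)
    return urlunsplit((scheme, parts.netloc, path, "", ""))


# Values taken from the messages quoted in this thread.
print(object_url("webdavs://cloud.com/data-registry",
                 "218ff9c6f73767c8126b66f7213cd0d2.dir"))
```

With the remote URL and hash from this thread, this yields the same URL that DVC reports in the 401 error, so the file I see in the web interface is the one DVC is asking for.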

Could you please run dvc config --list in both repos and show how the configurations differ?
Is anything configured differently apart from the remote path?
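If comparing the two listings by eye is tedious, a small script can do it. The sketch below assumes the output of dvc config --list consists of key=value lines. The sample values (the user alice, the exp-remote name, and the experiments folder) are made up for illustration and are not taken from the real repos. Mask the password before pasting any output here.

```python
def parse_config_list(text: str) -> dict:
    """Parse 'key=value' lines into a dict, skipping blank or malformed lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_settings(left: dict, right: dict) -> dict:
    """Return {key: (left_value, right_value)} for keys whose values differ.

    A key that is missing on one side shows up with None on that side.
    """
    diffs = {}
    for key in sorted(set(left) | set(right)):
        if left.get(key) != right.get(key):
            diffs[key] = (left.get(key), right.get(key))
    return diffs


# Hypothetical listings; replace with the real output from each repo.
registry = parse_config_list("""core.remote=data-remote
remote.data-remote.url=webdavs://cloud.com/data-registry
remote.data-remote.user=alice
""")
experiment = parse_config_list("""core.remote=exp-remote
remote.exp-remote.url=webdavs://cloud.com/experiments
remote.data-remote.url=webdavs://cloud.com/data-registry
""")

for key, (left, right) in diff_settings(registry, experiment).items():
    print(f"{key}: registry={left!r} experiment={right!r}")
```

Keys that are identical in both repos are left out, so anything printed is a real difference, including settings that exist in only one of the two repos.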

I’ve checked the output of dvc config --list in both repos. It lists the remotes (default, name, url, username, password), and everything looks fine. I haven’t configured anything else, so it only shows the remotes.

Any other ideas?

Hi @RiCk

Could you please try:

  1. Copy your first repo to somewhere else.
  2. Rename the remote to the same as the second remote.
  3. Change the remote directory path to the same as the second remote.

That way we can see which step triggers the error.

I copied the repo, removed all cache and the remote storage (I recreated the folder), removed the dataset.dvc with dvc remove dataset.dvc, added the dataset again with dvc add dataset, committed everything to git, and pushed to both the git remote and the DVC remote.

I copied the config and config.local to the second repo, and after running

dvc import git@github.com:company/data-registry.git dataset -o data

I still get

Importing 'dataset (git@github.com:company/data-registry.git)' -> 'data/dataset'
WARNING: Some of the cache files do not exist neither locally nor on remote. Missing cache files:
name: None, md5: 218ff9c6f73767c8126b66f7213cd0d2.dir
ERROR: unexpected error - received 401 (Unauthorized): Client error '401 Unauthorized' for url 'https://company.com/webdav/data-registry-storage/21/8ff9c6f73767c8126b66f7213cd0d2.dir'
For more information check: https://httpstatuses.com/401
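To rule out the credentials themselves, the same URL can be requested outside DVC. The following is only a sketch: it assumes the server accepts HTTP Basic authentication (it may use something else), and the function names build_head_request and check_status are hypothetical helpers, not part of DVC. Replace the placeholder user and password with the values from config.local.

```python
import base64
import urllib.error
import urllib.request


def build_head_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Create a HEAD request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        method="HEAD",
        headers={"Authorization": f"Basic {token}"},
    )


def check_status(url: str, user: str, password: str, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code (e.g. 200 or 401)."""
    request = build_head_request(url, user, password)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; report their status code.
        return err.code


# Example call (not executed here; fill in real credentials first):
# check_status(
#     "https://company.com/webdav/data-registry-storage/21/8ff9c6f73767c8126b66f7213cd0d2.dir",
#     "my-user",
#     "my-password",
# )
```

If this returns 200 with the credentials from config.local while dvc import still reports 401, that would suggest the credentials are valid but are not being sent when DVC accesses the registry through its git URL.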

I don’t get it. From the folder in which I created the repository everything is fine, but when I try to import or list the repo from another one (with the exact same config) it doesn’t work…

EDIT:

We found out that it is possible to list and import the data when referencing the local git folder!

dvc list git@github.com:company/data-registry.git

The command above doesn’t work, but the one below does (run from the second repo, the experiment project, which sits in the same base folder as the data-registry).

dvc list ../data-registry

Any new ideas based on this? Thank you!

Because the local git path worked and the remote URL didn’t, I have opened an issue on the DVC GitHub: get/list/import: not working with company github + webdavs storage · Issue #8016 · iterative/dvc