DVC-initialized subdir invisible when using dvc.api to access data in a remote GitHub repo

Hi,

I have a GitHub monorepo containing a few subdirectories. DVC was initialized in only one of them, using dvc init --subdir. Since then, a fair amount of data has been added to that subdirectory and tracked with DVC, and many changes have been committed to Git.

I want to use dvc.api to access the data. However, running

from dvc.api import DVCFileSystem
url = "git@github.com:ORG/repo-name.git"
fs = DVCFileSystem(url)
fs.find("/", detail=False)

returns only the files in the repo root and in every other subdirectory, but nothing from the one where DVC was initialized. It seems as though this subdir is invisible to dvc.api. Passing dvc_only=True, or passing the subdir path instead of "/", always returns an empty list.
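A minimal sketch for confirming that a subdirectory is hidden because it is a nested DVC project (the result of dvc init --subdir): list it once with the default settings and once with subrepos=True, and compare. The helper names and the "/my-dvc-project" path are hypothetical; the filesystem factory is passed in so the comparison logic can be tried without network access.

```python
from typing import Any, Callable, List


def _find_or_empty(fs: Any, path: str) -> List[str]:
    """Call fs.find() on `path`, treating a missing path as an empty listing."""
    try:
        return list(fs.find(path, detail=False))
    except FileNotFoundError:
        return []


def dvc_fs_factory(url: str) -> Callable[[bool], Any]:
    """Return a function that opens `url` with or without subrepo support."""

    def open_fs(subrepos: bool) -> Any:
        # Lazy import: DVC is only needed when a real repo is opened.
        from dvc.api import DVCFileSystem

        return DVCFileSystem(url, subrepos=subrepos)

    return open_fs


def hidden_by_subrepo(open_fs: Callable[[bool], Any], subdir: str) -> bool:
    """True if `subdir` lists as empty by default but has files with subrepos=True."""
    default_listing = _find_or_empty(open_fs(False), subdir)
    subrepo_listing = _find_or_empty(open_fs(True), subdir)
    return not default_listing and bool(subrepo_listing)


# Usage (needs DVC and access to the repo; "/my-dvc-project" is hypothetical):
# open_fs = dvc_fs_factory("git@github.com:ORG/repo-name.git")
# print(hidden_by_subrepo(open_fs, "/my-dvc-project"))
```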

Running dvc.api.scm.all_commits() does show the SHAs of all commits. But when I try to check out a specific commit hash, the same thing happens: the DVC-initialized subdir is still invisible to dvc.api.
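To inspect the subdir across history, the commit list can be combined with the rev and subrepos=True arguments of DVCFileSystem. This is a sketch under a few assumptions: that dvc.api.scm.all_commits accepts the repo URL as its first argument, that the subdir path ("/my-dvc-project") is hypothetical, and that a missing subdir at older commits surfaces as FileNotFoundError. Opening each revision separately may be slow on long histories.

```python
from typing import Any, Callable, Dict, Iterable, List


def dvc_fs_at_rev(url: str) -> Callable[[str], Any]:
    """Return a function that opens `url` at a given Git revision."""

    def open_fs(rev: str) -> Any:
        from dvc.api import DVCFileSystem  # lazy import: only needed for real repos

        # subrepos=True makes the `dvc init --subdir` project visible.
        return DVCFileSystem(url, rev=rev, subrepos=True)

    return open_fs


def tracked_files_per_commit(
    commits: Iterable[str], subdir: str, open_fs: Callable[[str], Any]
) -> Dict[str, List[str]]:
    """Map each commit SHA to the sorted DVC-tracked files under `subdir`."""
    result: Dict[str, List[str]] = {}
    for rev in commits:
        fs = open_fs(rev)
        try:
            result[rev] = sorted(fs.find(subdir, detail=False, dvc_only=True))
        except FileNotFoundError:
            # The subdir may not exist yet at older commits.
            result[rev] = []
    return result


# Usage (needs DVC and repo access; "/my-dvc-project" is hypothetical):
# import dvc.api
# url = "git@github.com:ORG/repo-name.git"
# commits = dvc.api.scm.all_commits(url)
# print(tracked_files_per_commit(commits, "/my-dvc-project", dvc_fs_at_rev(url)))
```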

What am I doing wrong? I’m using DVC version 3.2.2.

I found the answer in the DVCFileSystem docstring: to see DVC-initialized subdirectories, the subrepos argument has to be passed to the constructor:

fs = DVCFileSystem(url, subrepos=True)
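With subrepos enabled, the usual filesystem calls reach into the nested project. A minimal sketch, assuming the same url as above; the helper names and the "/my-dvc-project" paths are hypothetical, and files are read in binary mode to avoid relying on text-mode handling.

```python
from typing import Any, List, Optional


def open_with_subrepos(url: str, rev: Optional[str] = None) -> Any:
    """Open a Git repo whose DVC project lives in a subdirectory."""
    from dvc.api import DVCFileSystem  # lazy import so the helpers below work without DVC

    # Without subrepos=True, the nested DVC project is skipped.
    return DVCFileSystem(url, rev=rev, subrepos=True)


def list_tracked(fs: Any, subdir: str) -> List[str]:
    """Return the DVC-tracked files under `subdir`, sorted for stable output."""
    return sorted(fs.find(subdir, detail=False, dvc_only=True))


def read_bytes(fs: Any, path: str) -> bytes:
    """Read one file; for DVC-tracked paths the content is fetched by DVC."""
    with fs.open(path, mode="rb") as f:
        return f.read()


# Usage (needs DVC and repo access; the paths are hypothetical):
# fs = open_with_subrepos("git@github.com:ORG/repo-name.git")
# files = list_tracked(fs, "/my-dvc-project")
# print(files)
# print(len(read_bytes(fs, files[0])))
# fs.get("/my-dvc-project/data", "local-data", recursive=True)  # download a directory
```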

Before that, I had only been looking at the online docs, where this information was missing.

Leaving this here in case it can help someone else.
