Queued jobs failing before they get started

Hi!

I have a couple of DVC projects for ML model development.
I have been using the queue functionality for experiments quite a lot and it seems like all of a sudden queued experiments don’t work.

I can run an experiment to completion with dvc exp run, but the following sequence fails:

dvc exp run --queue
dvc queue start
dvc queue logs {exp_id}
> ERROR: No output logs found for experiment {exp_id}

The queued experiment fails a few seconds after starting. This happens across all my DVC projects, which run in different Python environments.
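
To make the failure easier to share, the steps can be scripted so that each command's exit code and output are captured in one place. This is only a rough sketch: run_steps and DVC_STEPS are hypothetical names of my own, and the pause between steps is an assumption to give the queue worker a few seconds before checking the status. The dvc commands themselves are the ones used in this thread.

```python
import subprocess
import time


def run_steps(steps, pause=0.0):
    """Run each command in order and collect its exit code and combined output."""
    results = []
    for cmd in steps:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append(
            {
                "cmd": " ".join(cmd),
                "returncode": proc.returncode,
                # Keep stdout and stderr together so nothing is lost.
                "output": proc.stdout + proc.stderr,
            }
        )
        time.sleep(pause)  # assumption: leave the worker time to pick up the job
    return results


# Commands taken from this thread; run them from inside the DVC repository.
DVC_STEPS = [
    ["dvc", "exp", "run", "--queue"],
    ["dvc", "queue", "start", "--jobs", "1", "--verbose"],
    ["dvc", "queue", "status"],
]

# Example call (left commented out because it needs dvc and a repo):
# for step in run_steps(DVC_STEPS, pause=5.0):
#     print(step["cmd"], "->", step["returncode"])
#     print(step["output"])
```

Pasting the collected output into a reply gives others the exact sequence and any stderr text that the terminal may have scrolled past.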

dvc doctor output

DVC version: 3.58.0 (pip)
-------------------------
Platform: Python 3.10.16 on macOS-15.0.1-arm64-arm-64bit
Subprojects:
        dvc_data = 3.16.7
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.40.2
        scmrepo = 3.3.9
Supports:
        http (aiohttp = 3.11.11, aiohttp-retry = 2.9.1),
        https (aiohttp = 3.11.11, aiohttp-retry = 2.9.1),
        s3 (s3fs = 2024.12.0, boto3 = 1.35.81)
Config:
        Global: /Users/user/Library/Application Support/dvc
        System: /Library/Application Support/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc (subdir), git
Repo.site_cache_dir: /Library/Caches/dvc/repo/a4ff6d6a37d7fcdb70adbcdc6d8a45d6

Any suggestions for further debugging or possible solutions are much appreciated!

Same issue here!

Likewise on macOS with ARM64 architecture.

Strangely, this only happens on one of my macs!

  • I have access to two macs, a 2021 macbook pro and a 2024 M4 mac mini.
  • The macbook pro works
  • The mac mini shows the same issue you do
  • I tried re-creating the pyenv environment on the mac mini by piping pip list into a requirements.txt and then running pip install -r requirements.txt on the mini. I’m using python==3.12.5 on both machines, so the packages and Python versions are the same.
  • This was attempted on the same repo on both machines.
  • I run dvc exp run --queue --set-param "train.epochs=7,8" and then dvc queue start. When I check dvc queue status, both jobs have immediately failed with no logs. Running the same set of commands on my macbook works, and the jobs run.
  • I have tried starting the jobs with dvc queue start --jobs 1 --verbose, but that gave no new information.
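
For checking that two environments really match, one option is to save pip freeze output on each machine and compare the pinned versions. A minimal sketch, assuming name==version lines as pip freeze prints them; parse_freeze and compare_envs are hypothetical helper names, and the sample inputs below are illustrative only (version numbers borrowed from the dvc doctor outputs in this thread).

```python
def parse_freeze(text):
    """Turn `pip freeze` output into a {package: version} dict."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Ignore blanks, comments, and entries that are not pinned with "==".
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, version = line.partition("==")
        # Treat "dvc_task" and "dvc-task" as the same package name.
        packages[name.lower().replace("_", "-")] = version
    return packages


def compare_envs(freeze_a, freeze_b):
    """Return {package: (version_a, version_b)} for every mismatch."""
    a, b = parse_freeze(freeze_a), parse_freeze(freeze_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


# Illustrative inputs, not real freeze output from either machine.
machine_a = "dvc==3.54.0\ndvc-task==0.4.0\nscmrepo==3.3.7\n"
machine_b = "dvc==3.58.0\ndvc_task==0.4.0\n"
for name, versions in compare_envs(machine_a, machine_b).items():
    print(name, versions)
```

A package that is missing on one side is reported with None for that machine, so an empty result means the pinned packages are identical.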

Running dvc doctor on the two machines produces slightly different results. The packages are the same, but the configs are slightly different:

macbook (works):

DVC version: 3.54.0 (pip)
-------------------------
Platform: Python 3.12.5 on macOS-15.4.1-arm64-arm-64bit
Subprojects:
        dvc_data = 3.16.4
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.4.0
        scmrepo = 3.3.7
Supports:
        gdrive (pydrive2 = 1.20.0),
        http (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.10.5, aiohttp-retry = 2.8.3)
Config:
        Global: /Users/sylvi/Library/Application Support/dvc
        System: /Library/Application Support/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: None
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/793506f254fda887f758d4aad4869507

mac mini (doesn’t work):

DVC version: 3.54.0 (pip)
-------------------------
Platform: Python 3.12.5 on macOS-15.4.1-arm64-arm-64bit
Subprojects:
        dvc_data = 3.16.4
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.4.0
        scmrepo = 3.3.7
Supports:
        gdrive (pydrive2 = 1.20.0),
        http (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.10.5, aiohttp-retry = 2.8.3)
Config:
        Global: /Users/sylviawhittleadmin/Library/Application Support/dvc
        System: /Library/Application Support/dvc

In the meantime, I’ll try to make the config on the mini match the one on the macbook.
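
To see exactly which fields differ between two dvc doctor outputs, they can be parsed into key/value pairs and compared. A minimal sketch, assuming the layout shown in the outputs above (top-level Key: value lines, plus indented entries under section headers such as Subprojects: and Config:); parse_doctor and diff_doctor are hypothetical helpers, not part of DVC.

```python
def parse_doctor(text):
    """Flatten `dvc doctor` output into a {key: value} dict."""
    fields = {}
    section = None  # current section header, e.g. "Subprojects"
    for raw in text.splitlines():
        stripped = raw.strip().rstrip(",")
        # Skip blank lines and the dashed rule under the version line.
        if not stripped or set(stripped) == {"-"}:
            continue
        if raw[:1].isspace() and section:
            # Indented entry such as "dvc_data = 3.16.4" or "Global: /path".
            key, _, value = stripped.partition(" ")
            value = value.strip()
            if value.startswith("= "):
                value = value[2:]
            fields[f"{section}.{key.rstrip(':')}"] = value
        else:
            key, _, value = stripped.partition(":")
            if value.strip():
                fields[key.strip()] = value.strip()
                section = None
            else:
                # A bare "Name:" line opens a new section.
                section = key.strip()
    return fields


def diff_doctor(output_a, output_b):
    """Return {key: (value_a, value_b)} for every field that differs."""
    a, b = parse_doctor(output_a), parse_doctor(output_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }
```

Passing the two saved outputs to diff_doctor lists only the differing fields. A field that is present in one output but absent from the other shows up with None on the missing side, which makes a whole block of missing lines easy to notice.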

On your mac mini, can you cd into the DVC repository and run dvc doctor there?

Ah, you’re right, I forgot to cd into the repo before running dvc doctor.

Here’s the dvc doctor for the mac mini while in the repo:

DVC version: 3.54.0 (pip)
-------------------------
Platform: Python 3.12.5 on macOS-15.4.1-arm64-arm-64bit
Subprojects:
        dvc_data = 3.16.4
        dvc_objects = 5.1.0
        dvc_render = 1.0.2
        dvc_task = 0.4.0
        scmrepo = 3.3.7
Supports:
        gdrive (pydrive2 = 1.20.0),
        http (aiohttp = 3.10.5, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.10.5, aiohttp-retry = 2.8.3)
Config:
        Global: /Users/sylviawhittleadmin/Library/Application Support/dvc
        System: /Library/Application Support/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: None
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/622e64154b3334d0410024ac67350202