DVC experiments missing names

initial issue (dead experiments, missing names)

  ├── 2f2a178            Feb 18, 2023   Running   -          0.70166   0.83538    0.42834                46791    0.15652              185381            
  ├── 9dc97e9            Feb 18, 2023   Running   -          0.70166   0.83538    0.42834                46791    0.15652              185381            
  ├── 65a6585            Feb 18, 2023   Running   dvc-task   0.70166   0.83538    0.42834                46791    0.15652              185381            
  ├── e5c1060            Feb 18, 2023   Running   -          0.70166   0.83538    0.42834                46791    0.15652              185381            
  ├── 9aa34d4 [exp_50]   Feb 18, 2023   Queued    -                -         -          -                    -          -                   -            
  └── f4c9181 [exp_51]   Feb 18, 2023   Queued    -   

Above is what dvc exp show looks like. When I call dvc queue kill 2f2a178, for example, I get:

ERROR: '2f2a178' is not a valid queued experiment name

I ran the experiments from VS Code with the number of jobs set to 4. At this point it looks like DVC thinks only one is running? However, none of them are actually running. dvc queue status is too slow to even finish on this machine; it gets cancelled by the OS or something (see Dvc exp show, dvc queue status execution time - #6 by gregstarr).

Where did the experiment names go? When I check for running processes, there is only a single process related to DVC:

starrgw1 256549  0.0  0.0 568940 21844 ?        SNl  11:37   0:00 [...]/.vscode-server/bin/[...]/node [...]/.vscode-server/extensions/iterative.dvc-0.6.10/dist/node_modules/dvc-vscode-lsp/dist/server.js --node-ipc --clientProcessId=256403

Usually, when multiple experiments are running, there are several DVC-related processes.
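The same process check can be scripted. This is only a sketch: `dvc_related_processes` and its `exclude` parameter are hypothetical names, and it assumes a `ps` that accepts the `aux` form (as on the Linux host in the dvc doctor output below). By default it skips the VS Code extension's language server, since that process path contains "dvc" (see the process line above) but it is not an experiment worker.

```python
import shutil
import subprocess
import sys
from typing import List, Sequence


def dvc_related_processes(
    ps_output: str, exclude: Sequence[str] = (".vscode-server",)
) -> List[str]:
    """Return lines of `ps aux`-style output that mention "dvc".

    The first line is treated as the column header and skipped. Lines that
    contain any substring in `exclude` are dropped; by default this filters
    out the VS Code extension's language server, whose path contains "dvc".
    """
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        if "dvc" not in line.lower():
            continue
        if any(pattern in line for pattern in exclude):
            continue
        matches.append(line)
    return matches


if __name__ == "__main__":
    # Assumption: `ps` accepts the BSD-style `aux` form (procps on Linux does).
    if shutil.which("ps") is None:
        print("ps not found on PATH", file=sys.stderr)
    else:
        out = subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout
        found = dvc_related_processes(out)
        print(f"{len(found)} dvc-related process(es) besides the VS Code server")
        for line in found:
            print(line)
```

With four jobs actually executing, one would expect this to list several worker processes rather than none.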

Here is my dvc doctor output:

DVC version: 2.41.1 (pip)
---------------------------------
Platform: Python 3.9.11 on Linux-3.10.0-693.el7.x86_64-x86_64-with-glibc2.17
Subprojects:
        dvc_data = 0.29.0
        dvc_objects = 0.14.1
        dvc_render = 0.0.17
        dvc_task = 0.1.9
        dvclive = 1.3.2
        scmrepo = 0.1.5
Supports:
        http (aiohttp = 3.8.1, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.8.3)
Cache types: symlink
Cache directory: lustre on 192.168.199.212@o2ib:192.168.199.213@o2ib:/scratch
Caches: local
Remotes: local
Workspace directory: nfs on master:/home
Repo: dvc, git

other issue (dvc slowness)

I tried downgrading to DVC 2.9.2 to avoid the slowness and got this error when running dvc exp show:

Traceback (most recent call last):
  File "/home/starrgw1/.local/lib/python3.8/site-packages/dvc/main.py", line 54, in main
    cmd = args.func(args)
  File "/home/starrgw1/.local/lib/python3.8/site-packages/dvc/command/base.py", line 35, in __init__
    from dvc.repo import Repo
  File "/home/starrgw1/.local/lib/python3.8/site-packages/dvc/repo/__init__.py", line 13, in <module>
    from dvc.ignore import DvcIgnoreFilter
  File "/home/starrgw1/.local/lib/python3.8/site-packages/dvc/ignore.py", line 10, in <module>
    from dvc.fs.base import FileSystem
  File "/home/starrgw1/.local/lib/python3.8/site-packages/dvc/fs/__init__.py", line 5, in <module>
    from .azure import AzureFileSystem
  File "/home/starrgw1/.local/lib/python3.8/site-packages/dvc/fs/azure.py", line 5, in <module>
    from fsspec.asyn import fsspec_loop
ImportError: cannot import name 'fsspec_loop' from 'fsspec.asyn' (/home/starrgw1/.local/lib/python3.8/site-packages/fsspec/asyn.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/starrgw1/.local/bin/dvc", line 8, in <module>
    sys.exit(main())
  File "/home/starrgw1/.local/lib/python3.8/site-packages/dvc/main.py", line 84, in main
    from dvc.info import get_dvc_info
  File "/home/starrgw1/.local/lib/python3.8/site-packages/dvc/info.py", line 10, in <module>
    from dvc.fs import FS_MAP, get_fs_cls, get_fs_config
  File "/home/starrgw1/.local/lib/python3.8/site-packages/dvc/fs/__init__.py", line 5, in <module>
    from .azure import AzureFileSystem
  File "/home/starrgw1/.local/lib/python3.8/site-packages/dvc/fs/azure.py", line 5, in <module>
    from fsspec.asyn import fsspec_loop
ImportError: cannot import name 'fsspec_loop' from 'fsspec.asyn' (/home/starrgw1/.local/lib/python3.8/site-packages/fsspec/asyn.py)
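The ImportError says the installed fsspec does not provide `fsspec_loop` in `fsspec.asyn`, which the dvc 2.9.2 code in `dvc/fs/azure.py` imports at startup. That suggests the downgrade left DVC and fsspec at mismatched versions. A minimal sketch for checking what the failing interpreter actually sees, using only the standard library (the helper names are hypothetical). Note that the traceback paths point at a python3.8 install under ~/.local, while the dvc doctor output above reports Python 3.9.11, so it is probably worth running this with the same interpreter as the failing dvc entry point.

```python
import sys
from importlib import import_module, metadata
from typing import Optional


def has_fsspec_loop() -> Optional[bool]:
    """Whether fsspec.asyn exposes `fsspec_loop` (the name dvc 2.9.2 imports).

    Returns None if fsspec.asyn cannot be imported at all in this interpreter.
    """
    try:
        asyn = import_module("fsspec.asyn")
    except ImportError:
        return None
    return hasattr(asyn, "fsspec_loop")


def installed_version(dist_name: str) -> Optional[str]:
    """Version string of an installed distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for dist in ("dvc", "fsspec"):
        print(f"{dist}: {installed_version(dist)}")
    print("fsspec.asyn.fsspec_loop available:", has_fsspec_loop())
```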

I have also posted about the slowness here: Running `exp queue status` is very slow · Issue #8676 · iterative/dvc · GitHub
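To put a number on how slow dvc queue status is, a small timing wrapper can run it with a hard cap so the measurement itself cannot hang indefinitely. This is a sketch: `time_command` is a hypothetical helper, and the 600-second cap is an arbitrary illustrative value, not a DVC setting.

```python
import shutil
import subprocess
import sys
import time
from typing import List, Optional, Tuple


def time_command(cmd: List[str], timeout_s: float) -> Tuple[Optional[int], float]:
    """Run `cmd` and return (exit code, wall-clock seconds).

    The exit code is None when the command is still running after `timeout_s`;
    subprocess.run then kills the direct child process (not any grandchildren).
    """
    start = time.perf_counter()
    try:
        completed = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        code: Optional[int] = completed.returncode
    except subprocess.TimeoutExpired:
        code = None
    return code, time.perf_counter() - start


if __name__ == "__main__":
    if shutil.which("dvc") is None:
        print("dvc not found on PATH", file=sys.stderr)
    else:
        # 600 s is an arbitrary cap chosen for illustration, not a DVC default.
        code, elapsed = time_command(["dvc", "queue", "status"], timeout_s=600)
        outcome = "timed out" if code is None else f"exit code {code}"
        print(f"dvc queue status: {outcome} after {elapsed:.1f} s")
```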
