Version 3.0: "extra keys not allowed" error

We’ve been using DVC for a while, so have many files in our repository with contents like

md5: a79fe063aea70ad56cd25885d24e6da8
outs:
- path: images
  cache: true
  metric: false
  md5: e92da1023b4968d0bf63edbc93db25b1.dir
  persist: false

After updating to dvc version 3.0.0, these files now cause “dvc status” to crash with an error of the form

validation failed.

extra keys not allowed, in outs -> 0 -> metric, line 3, column 3

As I understand it, version 3.0 removed support for some older types of “.dvc” files. Is there a command to convert existing files to the new format without having to download the data and run “dvc add” again?

Thanks

Unfortunately, we don’t have a command to migrate.

Support for cmd, param, plot and metric was removed in 3.0. Note that metric: false was always the default, so it should be safe to remove.

Also note that the md5 that you have at the top is no longer required and can be removed safely (except for imports). How many .dvc files do you have?

You could write a simple script to remove these; there’s no need to run dvc add again or download the data.
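A minimal sketch of such a script, assuming the only offending key is an indented metric: false line like the one in the example above. The function names strip_metric_false and patch_dvc_files are made up for illustration and are not part of DVC. It edits the files as plain text, so it needs no YAML library, and it leaves the top-level md5 alone since that one is optional to remove.

```python
import re
from pathlib import Path

# Matches an indented "metric: false" key on its own line, as in the example
# .dvc file above. It does not handle the key appearing first in a list item
# ("- metric: false"); that layout is an untested assumption.
_METRIC_LINE = re.compile(r"^\s+metric:\s*false\s*$")


def strip_metric_false(text: str) -> str:
    """Return the .dvc file text with indented 'metric: false' lines dropped."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if not _METRIC_LINE.match(line)
    ]
    return "".join(kept)


def patch_dvc_files(root: str) -> int:
    """Rewrite every *.dvc file under root in place; return how many changed."""
    changed = 0
    for path in Path(root).rglob("*.dvc"):
        if not path.is_file():  # skips the .dvc directory itself
            continue
        original = path.read_text(encoding="utf-8")
        patched = strip_metric_false(original)
        if patched != original:
            path.write_text(patched, encoding="utf-8")
            changed += 1
    return changed


if __name__ == "__main__":
    print(f"patched {patch_dvc_files('.')} file(s)")
```

Run it from the repository root and review the result with git diff before committing. The other removed keys (cmd, param, plot) would need their own handling if they appear in your files.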

Thanks for the reply

I had been concerned that just deleting the extra keys like “metric” wouldn’t work because the newer .dvc files have extra values like “nfiles” and “size” that aren’t in older versions. I wrote a script to drop the values, and it seems to work, so all good there.

It does mean that if we want to check out an older version of our data repo we need to

  • do a git checkout
  • run the script to patch the .dvc files
  • run dvc

That’s doable but a bit unfortunate. Is there a chance of modifying DVC to drop deprecated params in the .dvc files instead of crashing?

Since you asked, there are about 5000 .dvc files in this repo, and probably that many more again spread across other repos. We’re pretty heavy DVC users, so thanks for your work!

For now I’m going to pin the 2.58 release


Hello,

I’m getting the same issue while I’m trying to test DVC for the first time.
So this tutorial is not working with 3.0:

I’ll try removing the lines you suggested, skshetry. Thank you.

@j_b What part of the tutorial fails for you?

I have the same problem.
When I try to run this: dvc get https://github.com/iterative/dataset-registry get-started/data.xml -o data/data.xml

I get this:

@rtekby What version of DVC shows up when you run dvc doctor? You will need DVC 3.0 or later for this command to work, since the iterative/dataset-registry project now uses the 3.0 format.

2.56.0. Okay, thanks. I had installed the latest version of DVC, but I use the venv in a slightly different way with Poetry, so the old DVC installed outside any venv was being used by mistake.

Yes, me too.
I followed the instructions from Get Started: Data Versioning,
but when I ran this command:

dvc get https://github.com/iterative/dataset-registry \
          get-started/data.xml -o data/data.xml

I got this:
‘…/…/…/…/tmp/tmp73epx4t9dvc-clone/dvc.yaml’ validation failed.

extra keys not allowed, in artifacts, line 2, column 3
1 artifacts:
2 get-started-data:

@elif.buyukorhan, could you please share output of dvc version?

You are likely using an older version of dvc.

Yes, I guess. How can I update?

I ran dvc version and got this:

DVC version: 2.8.1 (pip)

Platform: Python 3.6.9 on Linux-5.4.0-150-generic-x86_64-with-Ubuntu-18.04-bionic
Supports:
webhdfs (fsspec = 2022.1.0),
http (aiohttp = 3.8.6, aiohttp-retry = 2.5.1),
https (aiohttp = 3.8.6, aiohttp-retry = 2.5.1)
Cache types: https://error.dvc.org/no-dvc-cache
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/nvme0n1p2
Repo: dvc, git

Check the output of which dvc to see where it’s installed.

You should be able to do pip install -U dvc and normally that should be enough.
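If it helps, here is a rough sketch that does both checks at once: it finds which dvc executable is first on the PATH and reads its version. It assumes the first line of dvc version looks like the "DVC version: 2.8.1 (pip)" output posted above. The helper names are made up for illustration.

```python
import re
import shutil
import subprocess


def parse_dvc_version(output: str):
    """Extract (major, minor, patch) from `dvc version` output, or None."""
    match = re.search(r"DVC version:\s*(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def supports_3_0_format(version) -> bool:
    """True if the parsed version is 3.0.0 or newer."""
    return version is not None and version >= (3, 0, 0)


def check_installed_dvc():
    """Return (path, version) for the dvc found first on PATH."""
    exe = shutil.which("dvc")  # same lookup that `which dvc` performs
    if exe is None:
        return None, None
    result = subprocess.run(
        [exe, "version"], capture_output=True, text=True, check=False
    )
    return exe, parse_dvc_version(result.stdout)


if __name__ == "__main__":
    path, version = check_installed_dvc()
    print(f"dvc at {path}, version {version}")
    if not supports_3_0_format(version):
        print("This dvc is older than 3.0; try: pip install -U dvc")
```

If the path printed is not inside the environment you expected (for example, a system-wide install instead of your venv), upgrading with pip in the venv will not change which dvc gets run.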