Commit/add only changed files

Hi, we are trying to add all locally checked-out and modified files to the cache with one command, so that we can then push them to the remote.

We are using an S3 remote from which we only want to check out certain files, to reduce the size of our repository on local hard drives. When we modify some of the checked-out files, we want to add them to DVC without specifying all their names as targets for dvc add/commit.

E.g.: We track the files foo.txt, bar.txt and baz.txt, but only have bar.txt checked out. The other two exist only as .dvc files and are not resolved. When we then modify the content of bar.txt and run dvc commit, it also tries to record the deletion of foo.txt and baz.txt, because they were never checked out in the first place. We know we could specify bar.txt as a target for dvc commit. However, this approach is not practical when modifying a large number of files.

Is there a way to add only the files that were actually modified to DVC? We were thinking of a list of file names like the one dvc diff returns, but restricted to modified files, which we could use as input for dvc commit.

Hi @hermann.ralf

Can you confirm the remote is configured as the external cache of the project? I’ll assume so for now.

Actually I tried something like this and commit gives an ERROR: failed to commit - output 'xyz' does not exist. Can you maybe share the exact steps and output DVC gives you?

You should also be able to use terminal wildcards, like dvc commit f*.txt (or xargs if not). Would that help?

BTW you don’t have to use dvc commit to re-add. You can dvc add again instead :slightly_smiling_face:. And it has an explicit --glob option to employ wildcards in its targets. See https://dvc.org/doc/command-reference/add for details.

There’s no other way for DVC to know which modifications you consider “real”, unfortunately.

dvc diff already separates changes by Added/Modified/Deleted so yeah, you could try to process that output with awk or sed, and feed it to add or commit via xargs.
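If awk/sed gets fiddly, here is a rough Python sketch of the same idea. It assumes your DVC version supports dvc diff --json and that the output has a "modified" list whose entries carry a "path" key (please check this against your version). The function names modified_paths and readd_modified are made up for this example, not part of DVC.

```python
import json
import subprocess


def modified_paths(diff_json):
    """Return the paths listed under "modified" in `dvc diff --json` output.

    Assumption: the JSON is an object with optional "added", "deleted" and
    "modified" lists, each entry holding a "path" key.
    """
    diff = json.loads(diff_json) if diff_json.strip() else {}
    return [entry["path"] for entry in diff.get("modified", [])]


def readd_modified(command=("dvc", "add")):
    """Hypothetical helper: pass only the modified paths to `dvc add`.

    Files that were never checked out show up as deleted, not modified,
    so they are skipped.
    """
    result = subprocess.run(
        ["dvc", "diff", "--json"],
        check=True,
        capture_output=True,
        text=True,
    )
    paths = modified_paths(result.stdout)
    if paths:  # avoid calling the command with no targets
        subprocess.run([*command, *paths], check=True)
    return paths
```

Calling readd_modified() from the repository root would re-add only the modified files; pass command=("dvc", "commit") if you prefer commit, which may ask for confirmation.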

If you’d like to request a feature for diff to filter its output with some new options (e.g. only show modifications with dvc diff --mod), please feel free to submit it at Issues · iterative/dvc · GitHub. Thanks!
