I created a DVC project to use as a data registry (Registry) for other projects. In one of those other projects (External), I imported a couple files with
External will never need to use all of the files in the Registry project, just a select few. After the import was finished, I ran `dvc status`, which reported all data and pipelines up to date.
Next, I added some new data to the Registry with `dvc add`, and decided I also wanted to import some of that data into the External project. I switched over to External and, out of habit, ran `dvc status` before doing the import.
The status output suggested the files I had already imported now have
update available: unchanged_file
I haven’t modified those files at all. All I did was add some new data to Registry that is unrelated to the pre-existing files (i.e. outside of any already-tracked directories).
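Roughly, the sequence looks like the sketch below. This is only an illustration: the registry URL, the paths, and `new_file` are hypothetical placeholders, and the `git commit` and `dvc push` steps are an assumption about a typical registry workflow rather than something stated above.

```shell
# Sketch of the sequence described above.
# REGISTRY_URL and every path here are hypothetical placeholders.
REGISTRY_URL="${REGISTRY_URL:-https://example.com/registry.git}"

import_existing() {
    # In External: import one file from the registry, then check status.
    dvc import "$REGISTRY_URL" data/unchanged_file
    dvc status
}

add_new_to_registry() {
    # In Registry: track new data that is unrelated to the imported file.
    # The git and push steps are assumed, not taken from the post.
    dvc add new_data/new_file
    git add new_data/new_file.dvc new_data/.gitignore
    git commit -m "Add new data"
    dvc push
}

check_external_again() {
    # Back in External: this is the status check that reported
    # "update available" for the previously imported file.
    dvc status
}
```

Each function is meant to be run from inside the corresponding project directory, in the order shown.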
Maybe this is important, but the files which are being reported as changed (but are not changed) were added to the Registry project via a pipeline stage with `persist: true` for each of them. The `always_changed` flag was not set.
Is this expected behavior? If so, why? I suppose I could resolve the message with `dvc update` or `dvc commit`, but that takes ages since one of the files is quite long. I don’t want all of my collaborators who will be downloading data from the registry to have to do this every time I add new data, either.