I created a DVC project to use as a data registry (Registry) for other projects. In one of those other projects (External), I imported a couple of files with `dvc import`.
External will never need all of the files in the Registry project, just a select few. After the import finished, I ran `dvc status`, which reported that all data and pipelines were up to date.
Next, I added some new data to the Registry with `dvc add` and decided I wanted to import some of that data into the External project as well. I switched over to External and, out of habit, ran `dvc status` before doing the import.
The status output suggested that the files I had already imported have changed:

    changed deps:
        update available: unchanged_file

I haven't modified those files at all. All I did was add some new data to Registry that is unrelated to the pre-existing files (i.e. outside of any already-tracked directories).
This may be relevant: the files being reported as changed (but which are not changed) were added to the Registry project via a pipeline stage, with `persist: true` set for each of them. The `always_changed` flag was not set.
Is this expected behavior? If so, why? I suppose I could resolve the message with `dvc update` or `dvc commit`, but that takes ages since one of the files is quite long. I also don't want all of my collaborators who download data from the registry to have to do this every time I add new data.
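
In case it helps with reproducing this, here is a rough sketch of the sequence described above. The registry URL, the directory names, the `new_data` path, and the `in_dir` helper are all placeholders I made up for illustration; only `unchanged_file` comes from the status output. By default the script only prints each step; it runs the commands only when `RUN=1` is set.

```shell
#!/bin/sh
# Placeholder locations (assumptions): adjust to the real projects.
REGISTRY_URL="${REGISTRY_URL:-https://example.com/me/registry.git}"
REGISTRY_DIR="${REGISTRY_DIR:-./registry}"
EXTERNAL_DIR="${EXTERNAL_DIR:-./external}"

# Hypothetical helper: print a step, and run it inside the given
# directory only when RUN=1 is set.
in_dir() {
    dir=$1
    shift
    echo "[$dir] $*"
    if [ "${RUN:-0}" = "1" ]; then
        (cd "$dir" && "$@")
    fi
}

# 1. In External: import a file from the registry, then check status.
in_dir "$EXTERNAL_DIR" dvc import "$REGISTRY_URL" unchanged_file
in_dir "$EXTERNAL_DIR" dvc status

# 2. In Registry: track unrelated new data and commit it.
in_dir "$REGISTRY_DIR" dvc add new_data
in_dir "$REGISTRY_DIR" git add new_data.dvc .gitignore
in_dir "$REGISTRY_DIR" git commit -m "Add new_data"
in_dir "$REGISTRY_DIR" git push

# 3. Back in External: this is where "update available" shows up
#    for unchanged_file, even though the file itself was not touched.
in_dir "$EXTERNAL_DIR" dvc status
```

Running it with `RUN=1` assumes both projects already exist as initialized Git and DVC repositories at those paths.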