Storage usage for large file versioning

Hi, several months ago I asked a question about the mechanism DVC uses to manage versioning of large files.

The example was a 10 GB file managed by DVC: a new version of the file is committed with a very small change. What is the storage impact of this commit?

The answer I got was that storage usage doubles, so 20 GB is used after the new version is committed, because DVC doesn't compare changes inside a large file.

I understand there was a plan for DVC to track changes within large files, so that in my example, instead of doubling the storage usage, only the changed part would be stored on top of the base version.

My questions are:

  1. Is there a timeline to implement this new feature?
  2. Does DVC compare file content while versioning small files?
  1. Is there a timeline to implement this new feature?
    The current plan is to start this in 2022 Q1.

  2. Does DVC compare file content while versioning small files?
    For two files with the same contents, DVC saves only one copy.

Thank you for your prompt response.

  1. Is there an expected date or timeframe for when this feature will become available to end users? What will be the impact on files that are already versioned?
  2. Do I understand correctly that there is currently no content comparison for small files? For a tiny change in a small file, a second copy will be made (as for a large file).
  1. Is there an expected date or timeframe for when this feature will become available to end users? What will be the impact on files that are already versioned?
    It depends on when the work starts; there is still a lot of work to be done at the underlying levels. We will try to make it compatible with the old version of the cache, but details will only be available once we begin working on it.

  2. Do I understand correctly that there is currently no content comparison for small files? For a tiny change in a small file, a second copy will be made (as for a large file).
    Yes, we just save each of them independently.
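
To illustrate the behaviour described in the answers above, here is a minimal sketch of whole-file, content-addressed storage. It is not DVC's actual code: the `WholeFileStore` class and its methods are hypothetical names made up for this example, and it assumes each file is identified by an MD5 hash of its full contents. Identical files share one stored copy, while a one-byte change produces a new hash and therefore a second full copy.

```python
import hashlib


class WholeFileStore:
    """Toy content-addressed store (hypothetical, not DVC's API).

    Each object is keyed by the MD5 hex digest of the whole file contents.
    """

    def __init__(self):
        self.objects = {}  # hex digest -> file contents

    def add(self, data: bytes) -> str:
        digest = hashlib.md5(data).hexdigest()
        # Contents already present under this digest are not stored again.
        self.objects.setdefault(digest, data)
        return digest

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.objects.values())


store = WholeFileStore()

v1 = b"a" * 1000          # first version of a file
copy_of_v1 = bytes(v1)    # a second file with identical contents
v2 = v1[:-1] + b"b"       # same file with a one-byte change

h1 = store.add(v1)
h_copy = store.add(copy_of_v1)
size_after_duplicates = store.stored_bytes()

h2 = store.add(v2)
size_after_change = store.stored_bytes()

print(h1 == h_copy, size_after_duplicates)  # identical contents: one copy
print(h1 != h2, size_after_change)          # tiny change: a second full copy
```

In this sketch the store holds 1000 bytes after the two identical files are added and 2000 bytes after the modified version is added, which mirrors the 10 GB to 20 GB example from the original question.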

Here is an issue for this; you can follow it for the latest updates on this question.