DVC cache content

Previously, when I looked into the DVC cache, I found directories 00 to ff, which corresponded to the first two characters of the md5 hashes of my files. Now I also see a directory called files/md5, which again contains directories 00 to ff that correspond to md5 values of different files. Does anyone know why there is now a files/md5 directory, and why not all md5-related directories live in the root dir of the DVC cache?

I am not saying this is causing any problems; I was just surprised to see it today.

This change in the structure of the cache was shipped as part of DVC 3.0. See:

Upgrading to DVC 3.0 | Data Version Control · DVC
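
To make the two layouts concrete, here is a minimal sketch of how an md5 hash maps to a path in the old and the new cache structure. The helper name `cache_path` and the example hash are made up for illustration; this is not DVC's own code, just the directory scheme described in this thread (00 to ff directories in the cache root before 3.0, and under files/md5 since 3.0).

```python
from pathlib import Path


def cache_path(cache_dir, md5, legacy=False):
    """Return where an object with the given md5 would sit in the cache.

    Hypothetical helper for illustration only, not part of DVC.
    legacy=True  -> pre-3.0 layout: <cache>/<first 2 chars>/<rest>
    legacy=False -> 3.0 layout:     <cache>/files/md5/<first 2 chars>/<rest>
    """
    root = Path(cache_dir)
    if not legacy:
        root = root / "files" / "md5"
    # The first two hex characters select one of the 00..ff directories.
    return root / md5[:2] / md5[2:]


# Example hash (the md5 of an empty file), used only as sample input.
md5 = "d41d8cd98f00b204e9800998ecf8427e"
print(cache_path(".dvc/cache", md5, legacy=True).as_posix())
print(cache_path(".dvc/cache", md5).as_posix())
```

So the 00 to ff directories in the cache root hold objects written with the older layout, and the ones under files/md5 hold objects written with the 3.0 layout.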

Thanks, that was exactly the info I was after! :grinning:

Not sure if I should open another issue or not (the last one got hidden by the AI spam filter :slight_smile: … fingers crossed that gets released soon :grinning:)

I read in the documentation that, by default, no de-duplication happens between 2.0 and 3.0 cache entries, but that
dvc cache migrate
would help with this.
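
For anyone who wants to see how many entries are affected before migrating, here is a rough sketch that lists hashes present in both the cache root (old layout) and under files/md5 (3.0 layout). The function names are made up for this post, and it only compares object names on disk; it does not read file contents or call DVC, so treat it as a quick check and not as a replacement for `dvc cache migrate`.

```python
from pathlib import Path

HEX = set("0123456789abcdef")


def list_objects(root):
    """Map full hash name -> path for objects under 00..ff directories in root."""
    objects = {}
    root = Path(root)
    if not root.is_dir():
        return objects
    for prefix in root.iterdir():
        # Only two-character hex directories count; this skips "files" etc.
        if prefix.is_dir() and len(prefix.name) == 2 and set(prefix.name) <= HEX:
            for obj in prefix.iterdir():
                if obj.is_file():
                    objects[prefix.name + obj.name] = obj
    return objects


def find_duplicates(cache_dir):
    """Return sorted hash names stored in both the old and the 3.0 layout."""
    legacy = list_objects(cache_dir)
    current = list_objects(Path(cache_dir) / "files" / "md5")
    return sorted(set(legacy) & set(current))


if __name__ == "__main__":
    # Assumes the default cache location inside the repository.
    for name in find_duplicates(".dvc/cache"):
        print(name)
```

The same check should work on any cache-like directory that is reachable as a local path, under the assumption that it uses the same two layouts.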

I have now checked the remote and noticed the same duplicate files. What is the right way to remove duplicates on the remote side of things?