Previously, when I looked into the DVC cache, I found directories 00 to ff, which corresponded to the first two hex characters of my files' md5 hashes. But now I see another directory called files/md5, which again contains directories 00 to ff that correspond to the md5 hashes of different files. Does anyone know why there is now a files/md5 directory, and why not all md5-related directories live in the root dir of the DVC cache?
I am not saying this is causing any problem; I was just surprised to see it today.
This change in the structure of the cache was shipped as part of DVC 3.0. See:
Upgrading to DVC 3.0 | Data Version Control · DVC
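To make the two layouts concrete, here is a minimal sketch of how an md5 digest maps to a cache path before and after the change. The function name cache_paths and the default cache root .dvc/cache are assumptions for illustration, not part of DVC's API. Also note that DVC 2.x hashed text files after normalizing line endings, so for such files the legacy digest may differ from a plain md5 of the raw bytes.

```python
import hashlib
from pathlib import Path


def cache_paths(md5_hex: str, cache_root: str = ".dvc/cache") -> dict:
    """Return the legacy (pre-3.0) and 3.0-style cache paths for a digest.

    Hypothetical helper for illustration only; it is not a DVC function.
    """
    # The first two hex characters name the directory (00..ff),
    # the remaining characters name the object file inside it.
    prefix, rest = md5_hex[:2], md5_hex[2:]
    root = Path(cache_root)
    return {
        "legacy": root / prefix / rest,
        "dvc3": root / "files" / "md5" / prefix / rest,
    }


# Plain md5 of the raw bytes, which is what the 3.0 layout is keyed on.
digest = hashlib.md5(b"hello").hexdigest()
paths = cache_paths(digest)
print(paths["legacy"].as_posix())
print(paths["dvc3"].as_posix())
```

So the same 00 to ff fan-out exists in both places; the 3.0 objects are simply nested under files/md5.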
Thanks, that was exactly the info I was after!
Not sure if I should open another issue or not (the last one got hidden by the AI spam filter … fingers crossed that it gets released soon).
I read in the documentation that, by default, no de-duplication happens between 2.0 and 3.0 cache entries, but that
dvc cache migrate
would help with this.
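For anyone who wants to see how much overlap exists in a local cache before migrating, here is a read-only sketch that counts objects in each layout and lists hash names present in both. It assumes the cache lives at .dvc/cache and that objects sit in two-character hex directories; the helper names collect and summarize are made up for this example. A matching name only suggests duplicated content, since DVC 2.x hashed text files after line-ending normalization. The script never deletes or moves anything.

```python
import re
from pathlib import Path

# Object directories are named with two lowercase hex characters (00..ff).
HEX2 = re.compile(r"^[0-9a-f]{2}$")


def collect(layout_root: Path) -> set:
    """Collect full hash names (prefix + file name) under one layout root."""
    hashes = set()
    if not layout_root.is_dir():
        return hashes
    for prefix_dir in layout_root.iterdir():
        # Skips anything that is not a 00..ff directory, e.g. "files".
        if prefix_dir.is_dir() and HEX2.match(prefix_dir.name):
            for obj in prefix_dir.iterdir():
                if obj.is_file():
                    hashes.add(prefix_dir.name + obj.name)
    return hashes


def summarize(cache_root: str = ".dvc/cache") -> dict:
    """Count objects per layout and report names found in both."""
    root = Path(cache_root)
    legacy = collect(root)                    # <cache>/xx/...
    dvc3 = collect(root / "files" / "md5")    # <cache>/files/md5/xx/...
    return {
        "legacy": len(legacy),
        "dvc3": len(dvc3),
        "in_both": sorted(legacy & dvc3),
    }


if __name__ == "__main__":
    print(summarize())
```

This only inspects the local cache directory; it says nothing about what is stored on a remote.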
I have now checked the remote and noticed the same duplicate files there. What is the right way to remove duplicates on the remote side of things?