Hello,
I am looking into building a CI/CD pipeline for a project that runs a (tiny) PyTorch model training end-to-end. The job should tell us whether a code or data change has modified anything in the training pipeline. My question comes down to the hash (md5, IIRC) that DVC computes on saved files: it appears that files written by torch.save() contain some form of metadata that changes between runs (e.g., timestamps, file paths), so the hash differs even when nothing meaningful has changed. I’m having a hard time finding any resources on DVC compatibility with torch.save()-generated files. Can someone point me in the right direction?
I’m guessing someone here has run into this before, and I’m curious how it was solved. If possible, I’d rather not write a custom weights/model writer just to keep file hashes stable between runs.
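For concreteness, the check I have in mind amounts to roughly the sketch below. It assumes the hash is an MD5 over the raw file bytes (as far as I recall). The helper names, file names, and byte payloads are hypothetical stand-ins, not real torch.save() output; they only illustrate how a small metadata difference changes the digest.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file's raw bytes, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if both files have the same MD5 digest of their raw bytes."""
    return file_md5(path_a) == file_md5(path_b)


if __name__ == "__main__":
    # Hypothetical stand-ins for two runs: same "weights", different metadata.
    with tempfile.TemporaryDirectory() as tmp:
        run_a = Path(tmp) / "run_a.pt"
        run_b = Path(tmp) / "run_b.pt"
        run_a.write_bytes(b"weights:0123" + b"|saved_at:1")
        run_b.write_bytes(b"weights:0123" + b"|saved_at:2")
        print(same_content(run_a, run_b))  # False: the digests differ
```

A byte-level comparison like this cannot tell a real change in the weights apart from a change in incidental metadata, which is exactly the problem I am hitting.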
Best,
Patrick