I would like to know whether we could store run info for reproducibility and also add the output data to a storage from which it can then be pulled.
For example, I have a raw dataset and a cleanup script:
$ ls raw cleanup.py
I can run the cleanup
$ dvc run -d raw -o clean python cleanup.py raw clean
$ cat clean.dvc
cmd: python cleanup.py
deps:
- md5: xxxx
  path: raw
md5: yyyyy
outs:
- cache: true
  md5: zzzz
  path: clean
and I can see how the clean folder is produced. However, if I then add the clean folder with dvc in order to share it, the information on how to produce the clean folder is modified:
$ dvc add clean
$ cat clean.dvc
md5: xxxx
outs:
- cache: true
  md5: xxxxx
So, can we have both features: the command that produces the dataset stored in the stage file, and the dataset itself stored in the cache?