Hi,
I would like to know whether we can both store the run information for reproducibility and add the output data to a storage from which it can then be pulled.
For example, I have a raw dataset and a cleanup script:
$ ls
raw cleanup.py
I can run the cleanup:
$ dvc run -d raw -o clean python cleanup.py raw clean
$ cat clean.dvc
cmd: python cleanup.py
deps:
- md5: xxxx
  path: raw
md5: yyyyy
outs:
- cache: true
  md5: zzzz
  path: clean
and I can see how the clean folder was produced. However, if I add the clean folder with dvc in order to share it, the information on how to produce it is modified: the command and dependencies are no longer recorded.
$ dvc add clean
$ cat clean.dvc
md5: xxxx
outs:
- cache: true
  md5: xxxxx
So, can we have both features: a stored command describing how the dataset is produced, and the dataset itself stored in the cache?
Thanks