First run of dvc

#1

Hi,
I just cloned a project from a colleague. The project contains a .dvc file with some deps and some outs.
When I run dvc repro -f myfile.dvc, I get an error:

…failed to reproduce ‘myfile.dvc’: output ‘xxx’ does not exist

Well, obviously the output does not exist: this is the first time I have run the script, so nothing has been created yet!
I get the same error when I run with the '--force' flag.

How do I run the pipeline for the first time?
Am I missing something?


#2

Hi @spiette !

Could you show us the output of cat myfile.dvc? I suspect it is a dvc add'ed file, which means that it is not reproducible and should either be placed manually or pulled from a remote. Try running dvc pull first to download the needed data.

Thanks,
Ruslan


#3

cmd: "conda activate myenv\n python src/myscript.py -c config_all.ini;\n
  \ python src/myscript2.py -c config_all.ini"
deps:
- md5: 78da773cf0aa66adb3dc21647992ecfc
  path: src/myscript.py
- md5: 8394510c4a7e1c7fa7b36645c92545a4
  path: src/myscript2.py
- md5: 1e3216f583241b07eb5f67a454718966
  path: src/myscript3.py
- md5: 0cf90aca2f6dfc86f586696839309291
  path: config_all.ini
md5: 3422060a09515220f65dcc0befe3367a
outs:
- cache: true
  md5: 246087ad63ed1d4d0b34959e48335213
  metric: false
  path: …/abc/output1.pkl
- cache: true
  md5: e78ff278a3cc1544d9593d0da83e2344
  metric: false
  path: …/abc/output2.csv
- cache: true
  md5: cbf185aeee1a963812b4338577ae4b04
  metric: false
  path: …/abc/output4.csv
- cache: true
  md5: dde8676c4e199de6fb47fd495edf91ed.dir
  metric: false
  path: …/abc/data_all
- cache: false
  md5: 273a2fced4472c0959c775d37380915f
  metric: true
  path: …/abc/some.json
wdir: .

dvc pull just throws an error; we haven’t set up a remote for the data yet.


#4

@spiette Thanks! I think I need a bit more info. What is xxx in the original error? Is it one of the outputs of myfile.dvc, or is it defined in some other file like xxx.dvc?


#5

The error comes from the first entry in outs: …/abc/output1.pkl

 dvc repro .\myfile.dvc
Warning: Output '..\abc\output1.pkl' of 'myfile.dvc' changed because it is 'not in cache'
Warning: Dvc file 'myfile.dvc' changed.
Stage 'myfile.dvc' changed.
Reproducing 'myfile.dvc'
Running command:
        conda activate myenv
 python src/myscript.py -c config_all.ini;
 python src/myscript2.py -c config_all.ini
Error: failed to reproduce 'myfile.dvc': output '..\abc\output1.pkl' does not exist

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

#6

Thanks! DVC ran your command, but for some reason the command didn’t produce ..\abc\output1.pkl, which is why DVC is complaining. Could you look around your project? Maybe your command created output1.pkl at some other path.
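
If it helps, here is a rough sketch for that search, assuming Python is available (the scripts are Python anyway). The helper name find_by_name is made up for illustration, and the directory you start from is up to you:

```python
from pathlib import Path


def find_by_name(root, filename):
    """Return every regular file under root whose name equals filename, sorted."""
    return sorted(p for p in Path(root).rglob(filename) if p.is_file())
```

For example, find_by_name(".", "output1.pkl") lists every output1.pkl below the current directory, which should show whether the script wrote the file somewhere other than the path listed in outs.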


#7

@spiette Also, I see you are running Windows. Was your colleague using Linux or macOS? If so, you might want to check that your script builds its paths with os.path instead of hard-coded separators. E.g. it might write to ../abc/output1.pkl, which might not be treated as a path with subdirectories on Windows. That could have resulted in a file literally named ../abc/output1.pkl (no subdirectories, the slashes being handled as part of the filename) instead of ..\abc\output1.pkl.
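
A minimal sketch of what I mean by building the path without hard-coded separators. The helper name output_path is hypothetical, and I am only guessing at how your script constructs its output paths:

```python
from pathlib import Path


def output_path(base_dir, *parts):
    """Join path components using the separator of the current OS."""
    return Path(base_dir).joinpath(*parts)


# Prints the path with "\" on Windows and "/" on Linux or macOS.
print(output_path("..", "abc", "output1.pkl"))
```

The same idea works with os.path.join("..", "abc", "output1.pkl") if the script does not use pathlib.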


#8

Hi,
I think I found my problem: the “cmd” part of the .dvc file contains multiple commands. While that works on *nix, on PowerShell it seems to start only one command and ignore the others.
Putting all these commands in one script file solves the problem.
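
For anyone hitting the same thing, here is a sketch of such a wrapper script. It makes a few assumptions: the file name run_all.py is hypothetical, the conda environment is already activated before dvc repro is called (so the wrapper does not run conda activate itself), and every script takes the same -c config_all.ini argument as in my cmd:

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order; raise CalledProcessError on the first failure."""
    for args in steps:
        subprocess.run(args, check=True)


def main(script_paths, config="config_all.ini"):
    # sys.executable is the interpreter running this wrapper, so every step
    # uses the same (assumed already activated) environment.
    run_steps([[sys.executable, script, "-c", config] for script in script_paths])


if __name__ == "__main__":
    main(sys.argv[1:])
```

The cmd then becomes a single command, e.g. python run_all.py src/myscript.py src/myscript2.py, so it no longer depends on how the shell splits multiple commands. Because of check=True, a failing step stops the run instead of silently continuing.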
