Getting error in 3rd stage of pipeline

python src/cnnClassifier/pipeline/stage_03_training.py
ERROR: failed to reproduce ‘training’: output ‘artifacts\training\model.h5’ does not exist

If I run the same pipeline using main.py, it works fine, but when I run it with DVC, the third stage (training) fails.

Any guidance is highly appreciated.

Thanks

Hi @Shami! It looks like that stage expects the output artifacts/training/model.h5. Please make sure that this file gets generated by that Python script.

Hi @dberenbaum!
Thanks for your comment. This file is the output of the third pipeline stage and will be generated once that stage has run.

I am also attaching the relevant section of dvc.yaml.

[image: screenshot of the relevant dvc.yaml section]

That dvc.yaml shows that src/cnnClassifier/pipeline/stage_03_training.py should create artifacts/training/model.h5. The error states that this file is not found at the end of that stage. The Python script may run without creating the expected output file. Since you have defined that file as an output of the stage, DVC will fail if the file is missing after the script has run.

I see. What might be the reasons that the file is not generated? Any tips?

If I run without DVC, every stage runs perfectly.

Thanks

That is entirely up to your script in src/cnnClassifier/pipeline/stage_03_training.py. Do you expect that script to create the file artifacts/training/model.h5? DVC is checking that the output you specified in that stage is in fact being created. You can check that your script creates that file or drop that path from the outs in dvc.yaml.
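
One way to check this is a small diagnostic like the sketch below, run from the same directory as dvc.yaml right after the training script finishes. As far as I know, DVC runs cmd from the directory containing dvc.yaml unless wdir is set, so a relative path resolves against that directory. The helper name describe_output is hypothetical and not part of DVC; the path is the one from your outs.

```python
import os
from pathlib import Path


def describe_output(path_str: str) -> dict:
    """Report where a (possibly relative) output path resolves and whether it exists."""
    path = Path(path_str)
    return {
        "cwd": os.getcwd(),                # directory the check is run from
        "resolved": str(path.resolve()),   # absolute location being checked
        "exists": path.exists(),           # True only if something is at that path
    }


if __name__ == "__main__":
    # Path taken from the outs entry discussed in this thread.
    print(describe_output("artifacts/training/model.h5"))
```

If exists comes back False right after a successful run, the script is either saving the model under a different name or location, or not saving it at all.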

Thanks, I will check the script file again.

It seems something related to DVC is not correct; otherwise the same script would not have worked without DVC.

Thanks again

Maybe I can try to explain it differently. This is what will happen when you run the stage with DVC:

  1. DVC will check if it needs to run the stage by seeing if the files under the deps section in dvc.yaml have changed or the values under the params section have changed.
  2. If anything has changed, DVC will execute whatever is in the cmd section (python src/cnnClassifier/pipeline/stage_03_training.py).
  3. When the script completes successfully, DVC will check whether the files under the outs section exist (artifacts/training/model.h5). If not, DVC will fail because you have listed these files as outputs, yet they don’t exist.

Your training stage here fails at step 3. You can think of it like DVC verifying that the stage does what is described in dvc.yaml. Let’s say you have another stage like evaluation that takes artifacts/training/model.h5 as a dependency. If DVC doesn’t check that it exists at the end of the training stage, your evaluation stage might fail when it tries to load this file.
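
The three steps above can be sketched roughly as follows. This is only an illustration of the logic, not DVC's actual implementation: run_stage, MissingOutputError, and the changed flag (standing in for the deps/params comparison in step 1) are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path
from typing import Sequence


class MissingOutputError(Exception):
    """Raised when a declared output is absent after the command ran."""


def run_stage(cmd: Sequence[str], outs: Sequence[str], changed: bool) -> bool:
    """Mimic the three steps; return True if the command was executed."""
    # Step 1: skip the stage when no deps or params have changed.
    if not changed:
        return False
    # Step 2: run the command; a non-zero exit code raises CalledProcessError.
    subprocess.run(list(cmd), check=True)
    # Step 3: every declared output must exist once the command has finished.
    missing = [out for out in outs if not Path(out).exists()]
    if missing:
        raise MissingOutputError(f"output(s) do not exist: {missing}")
    return True


if __name__ == "__main__":
    # A command that exits successfully but may write nothing,
    # which is the situation the training stage appears to be in.
    try:
        run_stage(
            [sys.executable, "-c", "pass"],
            ["artifacts/training/model.h5"],
            changed=True,
        )
    except MissingOutputError as err:
        print(f"ERROR: {err}")
```

Note that the command in step 2 can succeed and the stage can still fail in step 3, which matches the error you are seeing.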

Thanks @dberenbaum for detailed answer.

Hope it will work for me.