My working directory contains the following folders:
The data.dvc file currently points to all the contents of the data folder. The src folder contains a
prepare_data.py file that takes images from both the data/train and data/test folders as inputs and outputs four files: data/train/imgs_train.npy, data/train/train_labels.pkl, data/test/imgs_test.npy, and data/test/test_labels.pkl.
Now, I want to create a reproducible stage for
src/prepare_data.py. To do this, I ran the following command:
    dvc run -f prepare_data.dvc \
        -d src/prepare_data.py -d data/train -d data/test \
        -o data/train/imgs_train.npy -o data/train/train_labels.pkl \
        -o data/test/imgs_test.npy -o data/test/test_labels.pkl \
        python src/prepare_data.py
However, I received the following error message:
ERROR: failed to run command - Paths for outs: 'data'('data.dvc') 'data\train\imgs_train.npy'('prepare_data.dvc') overlap. To avoid unpredictable behaviour, rerun command with non overlapping outs paths.
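To make the overlap concrete, here is a minimal Python sketch of the kind of check I assume is behind this error. It is my own illustration, not DVC's actual implementation, and the `overlaps` helper is a hypothetical name. It only tests whether one path is equal to, or nested inside, another.

```python
from pathlib import PurePosixPath


def overlaps(tracked: str, candidate: str) -> bool:
    """Return True if the two paths are equal or one contains the other.

    Hypothetical helper for illustration only; not part of DVC.
    """
    a = PurePosixPath(tracked)
    b = PurePosixPath(candidate)
    return a == b or a in b.parents or b in a.parents


# Output already tracked by data.dvc
tracked_out = "data"

# Outputs declared with -o in the dvc run command
stage_outs = [
    "data/train/imgs_train.npy",
    "data/train/train_labels.pkl",
    "data/test/imgs_test.npy",
    "data/test/test_labels.pkl",
]

# Every stage output lives under data/, so each one conflicts
conflicts = [out for out in stage_outs if overlaps(tracked_out, out)]
print(conflicts)
```

Under this check, all four outputs are flagged because each one sits inside the already tracked data folder, while a path such as src/prepare_data.py would not be.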
I can see that the problem comes from writing outputs into data/train and data/test when the data folder is already tracked by the
data.dvc file. However, I still want to create the reproducible stage. Any suggestions?