Tutorial improvement + other suggestions

Once again, thanks for open-sourcing the tool!

Remarks on the tutorial

  • The dataset used in the tutorial is a bit large: the featurization step requires more than 8GB of RAM, which is unwieldy on basic laptops. You might want to consider using a smaller one.
  • In the Running in a bulk section, and possibly some others, the output of the command shows Reproducing. However, when run for the first time, dvc simply outputs Running command. For consistency, you may want to fix this.

Other questions and suggestions

  • When there is nothing to reproduce and we run dvc repro, nothing happens. It would be nice to display a message stating that there is indeed nothing to be done.
  • When a .dvc file already exists, dvc asks: 'data/XXX.dvc' already exists. Do you wish to run the command and overwrite it? (y/n). If one replies no, it would be better to change the message to something like Not overwriting: 'data/XXX.dvc' rather than Failed to run command: 'data/XXX.dvc' already exists.

Hi @tmain !

Thank you for the feedback!

  • The dataset used in the tutorial is a bit large: the featurization step requires more than 8GB of RAM, which is unwieldy on basic laptops. You might want to consider using a smaller one.

Agreed, we are currently working on simplifying the tutorial.

  • In the Running in a bulk section, and possibly some others, the output of the command shows Reproducing. However, when run for the first time, dvc simply outputs Running command. For consistency, you may want to fix this.

This is actually on purpose: "reproduce" implies that the stage has already been run once, so we print ‘Running command’ the first time and ‘Reproducing’ after that.
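
As a rough illustration of that rule, here is a minimal sketch. It is not DVC's actual code: the function name `status_message`, the `has_run_before` flag, and the exact output format are assumptions made for this example.

```python
def status_message(stage_path: str, has_run_before: bool) -> str:
    """Pick the progress message for a stage (hypothetical helper, not DVC's API)."""
    # A stage can only be "reproduced" if it has already been run at least once.
    verb = "Reproducing" if has_run_before else "Running command"
    # The trailing quoted path is an assumed format, chosen for illustration.
    return f"{verb} '{stage_path}'"


if __name__ == "__main__":
    print(status_message("data/XXX.dvc", has_run_before=False))
    print(status_message("data/XXX.dvc", has_run_before=True))
```

Under this rule, a tutorial that shows a first run would be expected to display the ‘Running command’ form.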

  • When there is nothing to reproduce and we run dvc repro, nothing happens. It would be nice to display a message stating that there is indeed nothing to be done.

Great point! Created repro: print msg when there is nothing to reproduve · Issue #1049 · iterative/dvc · GitHub to track the progress on it.

  • When a .dvc file already exists, dvc asks: 'data/XXX.dvc' already exists. Do you wish to run the command and overwrite it? (y/n). If one replies no, it would be better to change the message to something like Not overwriting: 'data/XXX.dvc' rather than Failed to run command: 'data/XXX.dvc' already exists.

Great point as well! Created repro: change 'already exists' msg · Issue #1050 · iterative/dvc · GitHub to track the progress on it.
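
A minimal sketch of the suggested behaviour, assuming a hypothetical helper `overwrite_reply_message`. This is not DVC's real prompt-handling code; only the two message strings come from this thread.

```python
from typing import Optional


def overwrite_reply_message(stage_path: str, reply: str) -> Optional[str]:
    """Return the message to print after the overwrite prompt, or None to proceed.

    Hypothetical helper for illustration only.
    """
    if reply.strip().lower() in ("y", "yes"):
        # The user agreed to overwrite: nothing to report, the command runs.
        return None
    # Declining is a deliberate choice, so report it neutrally instead of
    # "Failed to run command: '<path>' already exists".
    return f"Not overwriting: '{stage_path}'"


if __name__ == "__main__":
    print(overwrite_reply_message("data/XXX.dvc", "n"))
```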

Thanks,
Ruslan

I agree! I was merely remarking that when one runs the tutorial, one is presumably running the code for the first time, so the output should show Running command. However, the tutorial displays Reproducing command.

Ah, sorry, I didn’t notice that :) Great catch! Thank you for the feedback! We are actually preparing an update for the tutorial and will be sure to change the messages there as we go.