Documentation: tutorial problem?

I understand this may not be fully related to dvc, but the problem happens when following the tutorial at https://blog.dataversioncontrol.com/data-version-control-tutorial-9146715eda46, so here is my question.

When following the tutorial, at the step that executes

dvc run -d data/Posts.tsv -d code/split_train_test.py -d code/conf.py -o data/Posts-test.tsv -o data/Posts-train.tsv python code/split_train_test.py 0.33 20180319

it throws this error:

from ._sparsetools import csr_tocsc, csr_tobsr, csr_count_blocks, \
ImportError: /tmp/_MEIUqCWxh/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /usr/lib/python2.7/dist-packages/scipy/sparse/_sparsetools.x86_64-linux-gnu.so)
Failed to run command: Stage 'Posts-test.tsv.dvc' cmd python code/split_train_test.py 0.33 20180319 failed

I am not familiar with Python or data science; I was just trying to evaluate whether dvc fits our internal requirements so we can decide whether to go with it.

How can I fix this error? Otherwise, is there an even simpler example that just shows that datasets and models are versioned, so we can see the differences between, say, version 0.0.1 and 0.0.2?

Thanks

Hi @jtodd5527 !

The error you are encountering seems to be the same as in https://github.com/iterative/dvc/issues/749 . The problem is that previous versions of dvc didn’t preserve the shell you are running if it is not the default shell for your user. Could you please check that the shell you are running matches the default one for your user? I.e. these two commands should show the same shell:

$ echo $0
# /bin/zsh
$ grep $USER /etc/passwd
# efiop:x:1000:1000:efiop:/home/efiop:/bin/zsh

If the shells don’t match, you could try running chsh -s $(echo $0) $USER to set the current shell as the default. After that, you could try running your dvc run command once again. Please let us know if it worked for you.
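As a rough sketch of the comparison above: `echo $0` may print a bare name such as `bash`, or `-bash` in a login shell, while the passwd entry holds a full path such as `/bin/bash`, so the two need normalizing before they are compared. The helper name `same_shell` is made up for this illustration and is not part of dvc, and `getent` is assumed to be available (it is on most Linux systems).

```shell
# same_shell is a hypothetical helper for illustration, not part of dvc.
# It compares two shell names after dropping the leading "-" that login
# shells put in $0 and any directory part of the path.
same_shell() {
    a=${1#-}
    b=${2#-}
    [ "${a##*/}" = "${b##*/}" ]
}

# Field 7 of the passwd entry is the user's login shell.
login_shell=$(getent passwd "$(id -un)" | cut -d: -f7)

# Paste this into the interactive shell so that $0 is the shell you are
# actually running (inside a script, $0 is the script name instead).
if same_shell "$0" "$login_shell"; then
    echo "shells match: $login_shell"
else
    echo "shells differ: running $0, login shell is $login_shell"
fi
```

With this normalization, `bash` and `/bin/bash` count as the same shell.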

The fix for that issue has been merged into master and is going to be released in 0.10.0 (end of next week).

Thanks,
Ruslan

Not sure if that matches or not

$ echo $0
bash
$ grep $USER /etc/passwd
jtodd:x:1000:1000:jtodd,,,:/home/jtodd:/bin/bash

When I run chsh -s $(echo $0), it complains: chsh: bash is an invalid shell. Running chsh -s /bin/$(echo $0) works; however, executing dvc run ... still throws ImportError: /tmp/_MEIFQ9LXk/libstdc++.so.6: version `GLIBCXX_3.4.21' not found.

I will also try the release next week; maybe that version will fix it. Thanks
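For anyone hitting the same "invalid shell" message: chsh expects a full path, and for ordinary users that path normally has to be listed in /etc/shells. The following is only a sketch of how the bare name from `$0` could be turned into such a path before calling chsh; `resolve_shell` is a hypothetical helper name, and the script prints the chsh command instead of running it.

```shell
# resolve_shell is a hypothetical helper for illustration only.
# It turns a shell name such as "bash" or "-bash" into an absolute path.
resolve_shell() {
    name=${1#-}                        # drop the "-" login shells add to $0
    case $name in
        /*) printf '%s\n' "$name" ;;   # already an absolute path
        *)  command -v "$name" ;;      # look the name up on PATH
    esac
}

# Paste into the interactive shell so that $0 names the running shell.
shell_path=$(resolve_shell "$0") || shell_path=""

# Only suggest a path that chsh is likely to accept.
if [ -n "$shell_path" ] && grep -qxF "$shell_path" /etc/shells 2>/dev/null; then
    echo "would run: chsh -s $shell_path"
else
    echo "could not find '$shell_path' in /etc/shells"
fi
```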

Looks like there is still a problem with the environment :frowning: Does running python code/split_train_test.py 0.33 20180319 directly, without dvc run, work for you?

Oh. Looks like it was merely a problem with my local environment. I had been doing some other Python tasks. Before this run, I removed .local and then reinstalled the related packages. The execution works fine now.

dvc run -d data/Posts.tsv -d code/split_train_test.py -d code/conf.py -o data/Posts-test.tsv -o data/Posts-train.tsv python code/split_train_test.py 0.33 20180319
Using 'Posts-test.tsv.dvc' as a stage file
Reproducing 'Posts-test.tsv.dvc':
    python code/split_train_test.py 0.33 20180319
Positive size 2049, negative size 97951

Sorry for creating confusion, and I really appreciate your assistance!

No worries :slight_smile: Glad it resolved itself.