I understand this may not be strictly a dvc issue, but the problem happens when following the tutorial at https://blog.dataversioncontrol.com/data-version-control-tutorial-9146715eda46, so here is my question.
When following the tutorial, at the step that executes
dvc run -d data/Posts.tsv -d code/split_train_test.py -d code/conf.py -o data/Posts-test.tsv -o data/Posts-train.tsv python code/split_train_test.py 0.33 20180319
it throws this error:
from ._sparsetools import csr_tocsc, csr_tobsr, csr_count_blocks, \
ImportError: /tmp/_MEIUqCWxh/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /usr/lib/python2.7/dist-packages/scipy/sparse/_sparsetools.x86_64-linux-gnu.so)
Failed to run command: Stage 'Posts-test.tsv.dvc' cmd python code/split_train_test.py 0.33 20180319 failed
I am not familiar with Python or data science; I was just trying to evaluate whether dvc fits our internal requirements so we can decide whether to go with it.
How can I fix this error? Alternatively, is there an even simpler example that just shows the dataset and model being versioned, so we can see the differences between, say, version 0.0.1 and 0.0.2?
Thanks
Hi @jtodd5527 !
The error you are encountering seems to be the same as in https://github.com/iterative/dvc/issues/749 . The problem is that previous versions of dvc didn’t preserve the shell you are running if it is not the default shell for your user. Could you please check that the shell you are running matches the default one for your user? I.e. these two commands should show the same shell:
$ echo $0
# /bin/zsh
$ grep $USER /etc/passwd
# efiop:x:1000:1000:efiop:/home/efiop:/bin/zsh
If the shells don’t match, you could try running chsh -s $(echo $0) $USER to set the current shell as the default. After that you could try running your dvc run command once again. Please let us know if it worked for you.
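For reference, the comparison above can be scripted. This is only a minimal sketch: the helper names login_shell and same_shell are made up for illustration, and it assumes a POSIX sh with awk and basename available. It reads field 7 of the user's passwd entry instead of grepping the whole line, and compares by base name so that bash and /bin/bash count as the same shell.

```shell
# Hypothetical helpers, not part of dvc.

# Print the login shell recorded for a user in a passwd-style file.
# Usage: login_shell USER [PASSWD_FILE]   (defaults to /etc/passwd)
login_shell() {
    # Field 1 is the user name, field 7 is the login shell.
    awk -F: -v user="$1" '$1 == user { print $7 }' "${2:-/etc/passwd}"
}

# Succeed when two shell names refer to the same shell.
# A leading "-" (as in "-bash" for login shells) and the directory
# part are ignored, so "bash", "-bash" and "/bin/bash" all match.
same_shell() {
    a=$(basename -- "${1#-}")
    b=$(basename -- "${2#-}")
    [ "$a" = "$b" ]
}

# Interactive usage (run in the shell you want to check):
#   same_shell "$0" "$(login_shell "$USER")" && echo match || echo differ
```

Note that chsh expects a full path that is listed in /etc/shells, so a bare name such as bash is rejected; pass something like /bin/bash instead.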
The fix for that issue has been merged into master and is going to be released in 0.10.0 (end of next week).
Thanks,
Ruslan
Not sure if that matches or not:
$ echo $0
bash
$ grep $USER /etc/passwd
jtodd:x:1000:1000:jtodd,,,:/home/jtodd:/bin/bash
When I run chsh -s $(echo $0), it complains chsh: bash is an invalid shell. Setting chsh -s /bin/$(echo $0) works; however, executing dvc run ... still throws ImportError: /tmp/_MEIFQ9LXk/libstdc++.so.6: version `GLIBCXX_3.4.21' not found.
I will also try the release next week; maybe it will be fixed in that version. Thanks
Looks like there is still a problem with the environment. Does running python code/split_train_test.py 0.33 20180319 without dvc run work for you?
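One way to narrow this kind of error down is to check which GLIBCXX version tags a given libstdc++.so.6 actually provides. The sketch below is an illustration under assumptions, not part of dvc: glibcxx_versions and has_glibcxx are hypothetical helper names, the library path in the usage comment is a guess that varies by distribution, and it relies on grep supporting -a and -o (GNU and BSD grep do).

```shell
# Hypothetical helpers, not part of dvc.

# List the distinct GLIBCXX version tags embedded in a library file.
# Usage: glibcxx_versions /path/to/libstdc++.so.6
glibcxx_versions() {
    # -a treats the binary as text, -o prints only the matching part.
    grep -a -o 'GLIBCXX_[0-9][0-9.]*' "$1" | sort -u
}

# Succeed when the library provides exactly the given tag.
# Usage: has_glibcxx /path/to/libstdc++.so.6 GLIBCXX_3.4.21
has_glibcxx() {
    # -x matches the whole line, -F treats the tag as a fixed string.
    glibcxx_versions "$1" | grep -q -x -F "$2"
}

# Example (the path is an assumption, adjust for your system):
#   has_glibcxx /usr/lib/x86_64-linux-gnu/libstdc++.so.6 GLIBCXX_3.4.21 \
#       && echo "system library is new enough"
```

If the system library has the tag but the copy named in the traceback does not, the wrong libstdc++ is being loaded, which points at the environment and not at the script itself.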
Oh, it looks like it was merely a problem with my local environment. I had been doing some other Python tasks. Before this run, I removed .local and reinstalled the related packages. The execution works fine now:
dvc run -d data/Posts.tsv -d code/split_train_test.py -d code/conf.py -o data/Posts-test.tsv -o data/Posts-train.tsv python code/split_train_test.py 0.33 20180319
Using 'Posts-test.tsv.dvc' as a stage file
Reproducing 'Posts-test.tsv.dvc':
python code/split_train_test.py 0.33 20180319
Positive size 2049, negative size 97951
Sorry for the confusion, and I really appreciate your assistance!
No worries! Glad it resolved itself.