I understand this may not be fully related to dvc, but I ran into the problem while following the tutorial at https://blog.dataversioncontrol.com/data-version-control-tutorial-9146715eda46 so I am asking here.
The problem occurs at the tutorial step that runs this command:
dvc run -d data/Posts.tsv -d code/split_train_test.py -d code/conf.py -o data/Posts-test.tsv -o data/Posts-train.tsv python code/split_train_test.py 0.33 20180319
It fails with this error:
from ._sparsetools import csr_tocsc, csr_tobsr, csr_count_blocks, \
ImportError: /tmp/_MEIUqCWxh/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /usr/lib/python2.7/dist-packages/scipy/sparse/_sparsetools.x86_64-linux-gnu.so)
Failed to run command: Stage 'Posts-test.tsv.dvc' cmd python code/split_train_test.py 0.33 20180319 failed
I am not familiar with Python or data science. I am just trying to evaluate whether dvc fits our internal requirements so we can decide whether to adopt it.
How can I fix this error? Alternatively, is there an even simpler example that just shows a dataset and a model being versioned, so we can see the differences between, say, version 0.0.1 and version 0.0.2?