Install from source

I am trying to use DVC in an academic HPC environment without root privileges.
It would be extremely helpful to have an option to install DVC from source so that I can create a module that other team members can simply load to start using DVC.
We have been experiencing problems with the conda installation option (specifically when adding dvc-ssh), and this kind of friction before I can even demonstrate DVC's capabilities might become a deal breaker.

Hi John,

What kind of problems are you experiencing with conda?

If pip is available, I would go with that.
First create a virtualenv and then install dvc in the virtualenv in order to have an isolated environment:

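# The built-in venv module is enough; a separate "pip install virtualenv" is not needed
python3 -m venv dvc_virtualenv
source dvc_virtualenv/bin/activate
# The [ssh] extra pulls in dvc-ssh for SSH remotes
pip install 'dvc[ssh]'
# Check the installation and the supported remote types
dvc doctor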

Hi dtrifiro,

Thanks for the quick response. I was able to get DVC running with conda just fine, but my colleague ran into problems running dvc pull from an SSH remote. Initially, his conda environment did not include dvc-ssh, but neither installing the package nor rebuilding the environment from scratch fixed the problem: DVC still complained that dvc-ssh was missing, even though both dvc and dvc-ssh appeared in the output of conda list.
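
When conda list shows a package that dvc still cannot find, one possible cause (an assumption; the thread never pins it down) is that the dvc found first on $PATH belongs to a different Python installation than the active conda environment. Python console scripts installed by pip or conda usually begin with a shebang naming their interpreter, so printing that line shows which environment dvc actually loads. The helper name show_entrypoint_interpreter is hypothetical.

```shell
# Hypothetical helper: report which interpreter backs a command on $PATH.
# Python console scripts (pip or conda) usually start with a shebang such as
# "#!/path/to/env/bin/python", which tells you which environment they load.
show_entrypoint_interpreter() {
  local cmd_path
  cmd_path=$(command -v "$1") || { echo "$1: not found on PATH" >&2; return 1; }
  echo "command:     $cmd_path"
  # Print the first line of the script with the leading "#!" stripped.
  echo "interpreter: $(head -n 1 "$cmd_path" | sed 's/^#!//')"
}
```

Running show_entrypoint_interpreter dvc and then "<that interpreter> -c 'import dvc_ssh'" (dvc_ssh being, to our knowledge, the module the dvc-ssh package installs) would show whether the plugin is visible to the Python that dvc really uses.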

Perhaps you could look into the conda issue. But the heart of this feature request is being able to install DVC without conda or virtual environments, and without root privileges. Then I can create a modulefile that puts dvc on the $PATH, so that less technical team members can access dvc with just module load dvc, which they can add to their ~/.bashrc file.
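
The modulefile itself can be very small. A minimal sketch, assuming an Environment Modules or Lmod setup that reads Tcl modulefiles and a shared prefix whose bin/ directory already contains a dvc executable (however it got there); the helper name write_dvc_modulefile, the paths, and the version label are placeholders.

```shell
# Hypothetical helper: write a Tcl modulefile that puts DVC's bin directory
# at the front of PATH when "module load dvc" is run.
#   $1 = a modulefiles directory (placeholder, e.g. $HOME/modulefiles)
#   $2 = install prefix whose bin/ holds the dvc executable (placeholder)
#   $3 = version label used as the modulefile name (placeholder)
write_dvc_modulefile() {
  local moddir prefix version
  moddir=$1
  prefix=$2
  version=$3
  mkdir -p "$moddir/dvc"
  # Unquoted EOF: $prefix is expanded now, so the file holds a literal path.
  cat > "$moddir/dvc/$version" <<EOF
#%Module1.0
module-whatis "DVC (Data Version Control)"
prepend-path PATH $prefix/bin
EOF
}
```

After write_dvc_modulefile "$HOME/modulefiles" /shared/apps/dvc 3.0, team members would run module use "$HOME/modulefiles" once (or put it in ~/.bashrc) and then module load dvc.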

This slightly convoluted approach is necessary in academic HPC environments with constrained privileges. But the academic community is crying out for better data management, so it would be great to be able to make DVC readily available.

That said, still happy to troubleshoot conda if that is a quicker route to a solution.

  1. I think I have figured out what we were doing wrong with SSH remotes. I'll test further and report back.
  2. I have come up with a workaround for creating a DVC module in an HPC environment without admin privileges. Again, I'll describe it after further testing.

Hi John,
how about creating a virtualenv dedicated to dvc and just having people put /path/to/virtualenv/bin/ in their PATH?

People would just have to add the following to their ~/.bashrc:

export DVC_VIRTUALENV=/path/to/dvc/venv
export PATH="$PATH:$DVC_VIRTUALENV/bin"
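
Because ~/.bashrc runs for every new interactive shell, and nested shells inherit the already-extended PATH, appending unconditionally can pile up duplicate entries. A small guard keeps the addition idempotent. This is a sketch; path_append is a hypothetical helper name, not something DVC provides, and /path/to/dvc/venv is the same placeholder as above.

```shell
# Hypothetical helper for ~/.bashrc: append a directory to PATH only if it
# is not already present, so repeated sourcing does not add duplicates.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;                     # already on PATH: do nothing
    *) PATH="${PATH:+$PATH:}$1" ;;   # append (also handles an empty PATH)
  esac
  export PATH
}

export DVC_VIRTUALENV=/path/to/dvc/venv
path_append "$DVC_VIRTUALENV/bin"
```

Appending rather than prepending means that a dvc found earlier on PATH (for example in an active conda environment) still wins; prepending would reverse that precedence.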

Thanks dtrifiro.

Most of my team uses conda pretty heavily, so I’m reluctant to introduce nested environments. It would probably work if we are careful about the order of operations, but might still lead to unexpected problems.

The work-around we have used goes as follows:

  1. Create a Docker image for DVC. Mine is on Docker Hub, but it might be good if you guys released an official image.
  2. Convert the Docker image to a Singularity image.
  3. Write a bash wrapper script called ‘dvc’ that runs the Singularity image with the appropriate bind mounts and executes the dvc command inside the container, transparently passing all parameters through to it.
  4. Write a module file that puts the wrapper script in the path when the module is loaded.
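
Step 3 might look roughly like the following. This is a sketch, not the actual script described here: the image path $HOME/containers/dvc.sif, the DVC_IMAGE variable, and the function name run_dvc_in_container are assumptions. It binds the current directory into the container, starts dvc there, and forwards all arguments unchanged.

```shell
#!/bin/sh
# Sketch of a `dvc` wrapper that runs DVC inside a Singularity image.
# Assumption: the image was built in step 2 with something like
#   singularity build "$HOME/containers/dvc.sif" docker://<user>/<image>:<tag>
DVC_IMAGE=${DVC_IMAGE:-$HOME/containers/dvc.sif}

run_dvc_in_container() {
  # --bind makes the working directory visible inside the container,
  # --pwd starts dvc there, and "$@" forwards every argument unchanged.
  singularity exec \
    --bind "$PWD" \
    --pwd "$PWD" \
    "$DVC_IMAGE" dvc "$@"
}

# In the installed wrapper script, the last line would call the function:
#   run_dvc_in_container "$@"
```

Singularity typically binds $HOME by default as well, which matters for the SSH keys an SSH remote needs; sites can change that default, so extra --bind entries may be necessary.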

This may sound like a lot of work, but I’ve written a script that does steps 2, 3 and 4 for any containerised command, so the hardest part was step 1.

Anyway, so far so good.

Hi, there's no DVC-only Docker image, but there is a CML image that includes DVC:

docker pull ghcr.io/iterative/cml:0-dvc1-base1

Thanks for the feedback