There’s also now an official guide that incorporates a lot of this info thanks to @jorgeorpinel: SSH & SFTP.
A: SSH is a protocol for connecting (logging in) to a remote computer. It is commonly used to execute commands on a remote machine or to transfer files to and from it.
For that to work, you’ll need an SSH client and an SSH server. You can type ssh -V in your terminal to see whether the client is already installed (it is normally included in macOS and Linux distributions). If you are on Windows, the most widely used client is PuTTY.
Now, on the computer that you want to reach, you need an SSH server, usually sshd (the SSH daemon). If the remote machine runs an operating system with systemd, you can check whether the server is running by executing systemctl status sshd on it (the output should say “Active”).
If everything went well and you have both a client on your computer and a server on the computer that you want to connect to, you can use the client (either ssh or putty) by specifying the address of the remote machine: an IP or a hostname that resolves to the desired destination.
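The client check described above can be sketched as a small shell snippet. It assumes a POSIX shell, and the helper name have_command is made up for this illustration.

```shell
# Hypothetical helper: succeeds if the given command is on the PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command ssh; then
    # Print the version of the installed SSH client.
    ssh -V
else
    echo "No ssh client found. Install OpenSSH, or PuTTY on Windows."
fi
```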
DVC needs the same information in order to establish a connection, and you provide it with the dvc remote add and dvc remote modify commands. Let’s say that we are using the following configuration:
hostname/ip: example.com
user: mroutis
password: 123456789
port: 22
With those values, I can configure the SSH remote in DVC as follows:
dvc remote add --default ssh-storage ssh://example.com/path/to/storage
dvc remote modify ssh-storage user mroutis
dvc remote modify ssh-storage port 22
dvc remote modify --local ssh-storage password 123456789
Notice the --local flag on the last command. This is important because, by default, the dvc remote command modifies the .dvc/config file, which is tracked by Git, so the password could end up on GitHub, Bitbucket, GitLab, or wherever you host your code repository. With --local, the setting is written to .dvc/config.local instead, which is ignored by Git.
Instead of a password, the recommended way to authenticate is with SSH keys.
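As a rough sketch of what key-based authentication could look like with the example values above: generate a key pair, install the public key on the server, and point DVC at the private key through the remote's keyfile option. The key path ~/.ssh/dvc_storage is an arbitrary choice for this example, and the commands are wrapped in a function with a made-up name so nothing runs until you call it.

```shell
# Arbitrary key location chosen for this example.
key_file="$HOME/.ssh/dvc_storage"

# Hypothetical wrapper; call setup_key_auth once to run the three steps.
setup_key_auth() {
    # Create a new key pair (you will be prompted for an optional passphrase).
    ssh-keygen -t ed25519 -f "$key_file"
    # Append the public key to the remote user's authorized_keys.
    ssh-copy-id -i "$key_file.pub" mroutis@example.com
    # Tell DVC which private key to use; --local keeps the setting out of Git.
    dvc remote modify --local ssh-storage keyfile "$key_file"
}
```

ssh-copy-id ships with OpenSSH on most Linux and macOS systems; where it is not available, the public key can be added to ~/.ssh/authorized_keys on the server by hand.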
There’s a difference between the host example.com and the URL I’m passing to DVC, ssh://example.com/path/to/storage. DVC needs a URL of the form scheme://host/path. In this case, the scheme is ssh, the host is the one we already have, and the path is the absolute path where DVC will upload the cache.
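To make the scheme://host/path breakdown concrete, here is a minimal sketch that splits the example URL using plain POSIX shell parameter expansion. It only illustrates the parts of the URL; it is not how DVC parses it internally, and the variable names are chosen for this example.

```shell
url="ssh://example.com/path/to/storage"

scheme="${url%%://*}"          # everything before "://"
rest="${url#*://}"             # everything after "://"
host="${rest%%/*}"             # up to the first "/"
remote_path="/${rest#*/}"      # the remainder, with the leading "/" restored

echo "scheme=$scheme host=$host path=$remote_path"
# prints: scheme=ssh host=example.com path=/path/to/storage
```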
DVC will be uploading (push) and downloading (pull) files to and from your remote; the URL ssh://hostname/path/to/storage is just a way to tell DVC in which directory to store the data. For example, you can connect to your server with PuTTY and create a directory with mkdir /tmp/dvc-storage. If you then configure DVC with the URL ssh://your-host-name-or-ip/tmp/dvc-storage, dvc push will upload files to the directory you created (/tmp/dvc-storage), and dvc pull will connect to the remote computer via SSH, go to that directory, and download the files from there.
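Put together, the walkthrough above might look like the following sketch. The host name is the placeholder used in the text, and the steps are wrapped in a function with a made-up name so they only run when you call it from a machine that can reach the server.

```shell
remote_host="your-host-name-or-ip"   # placeholder, replace with your server
remote_dir="/tmp/dvc-storage"
remote_url="ssh://${remote_host}${remote_dir}"

# Hypothetical wrapper; call push_to_ssh_remote inside a DVC project.
push_to_ssh_remote() {
    # Create the storage directory on the server (-p: no error if it exists).
    ssh "$remote_host" mkdir -p "$remote_dir"
    # Register the directory as the default DVC remote, then upload.
    dvc remote add --default ssh-storage "$remote_url"
    dvc push
}

echo "$remote_url"   # prints: ssh://your-host-name-or-ip/tmp/dvc-storage
```

After a successful push, running dvc pull in another clone of the repository fetches the same files back from /tmp/dvc-storage.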