How do I use DVC with an SSH remote?

Q: I’m just getting started with DVC, but I’d like to use it for multiple developers to access the data and share models and code.

I do own the server, but I’m not sure how to use DVC with SSH remote?


More context to this question and original versions are here.

A: SSH is a protocol for connecting and logging in to a remote computer. It is commonly used to execute commands or transfer files on a remote machine.

For that to work, you’ll need an SSH client and a server. You can run ssh -V in your terminal to see whether the client is already installed (it is normally included in macOS and Linux distributions). If you are on Windows, the most widely used client is PuTTY.

Now, on the computer that you want to reach, you need an SSH server, usually sshd (the SSH daemon). If the server runs an operating system with systemd, you can check whether it is running by executing systemctl status sshd on it (it should say “Active”).

If everything went well and you have both a client on your computer and a server on the machine that you want to connect to, you can use the client (either ssh or PuTTY) by specifying the address of the remote machine: an IP or a hostname that resolves to the desired destination.
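As a quick sanity check before involving DVC, a sketch like the following reports whether a client is installed and prints the login command to try. The user, host, and port are placeholder values assumed for illustration; substitute your own.

```shell
# Placeholder connection details (assumptions for illustration only).
user="mroutis"
host="example.com"
port=22

# Report whether an SSH client is on the PATH.
if command -v ssh >/dev/null 2>&1; then
    client_status="found"
else
    client_status="missing"
fi
echo "ssh client: ${client_status}"

# Build the login command; run it by hand once the client is installed.
ssh_command="ssh -p ${port} ${user}@${host}"
echo "${ssh_command}"
```

If the printed command logs you in when you run it manually, DVC should be able to reach the same machine with the same details.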

DVC needs the same information in order to establish a connection, and you configure it with the dvc remote add and dvc remote modify commands. Let’s say that we are using the following configuration:

hostname/ip: example.com
user: mroutis
password: 123456789
port: 22

So the way I can configure the SSH remote with DVC will be the following:

dvc remote add --default ssh-storage ssh://example.com/path/to/storage
dvc remote modify ssh-storage user mroutis
dvc remote modify ssh-storage port 22
dvc remote modify --local ssh-storage password 123456789

Warning: notice the --local on the last command. This is important because, by default, dvc remote modify writes to the .dvc/config file, which is tracked by Git, so the password could end up on GitHub, Bitbucket, GitLab, or wherever you host your code repository. With --local, the value goes to .dvc/config.local instead, which Git does not track.

Instead of a password, the recommended approach is to use SSH keys.
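A key-based setup might look like the sketch below. It assumes OpenSSH's ssh-keygen and ssh-copy-id are available and reuses the example user, host, and remote name from above; the key path is a hypothetical choice. The run helper is not part of DVC: it only prints each command by default, so nothing changes until you set DRY_RUN=0.

```shell
# Hypothetical helper (not part of DVC): print commands unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

key_file="${HOME}/.ssh/id_ed25519"   # assumed key location

# 1. Create a key pair on your own machine.
run ssh-keygen -t ed25519 -f "${key_file}"
# 2. Install the public key for your account on the server.
run ssh-copy-id -i "${key_file}.pub" -p 22 mroutis@example.com
# 3. Point DVC at the private key; --local keeps the setting out of Git.
run dvc remote modify --local ssh-storage keyfile "${key_file}"
```

Review the printed commands first, then rerun with DRY_RUN=0 once the user, host, and key path match your environment.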

Note the difference between the host example.com and the URL I’m passing to DVC, ssh://example.com/path/to/storage. DVC needs a URL of the form scheme://host/path. In this case, the scheme is ssh, the host is the one we already have, and the path is the absolute path where DVC will upload the cache.

DVC uploads (dvc push) and downloads (dvc pull) files to and from your remote, and the URL ssh://hostname/path/to/storage simply tells DVC which directory on the server holds the data. For example, you can connect to your server with PuTTY and create a directory with mkdir /tmp/dvc-storage. If you then configure DVC with the URL ssh://your-host-name-or-ip/tmp/dvc-storage, dvc push will upload files to the /tmp/dvc-storage directory you created, and dvc pull will connect to the remote computer via SSH, go to that directory, and download the files from it.
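To make the URL structure concrete, the following sketch assembles the remote URL from its parts and prints the matching dvc remote add command. The variable names are illustrative, and the directory is assumed to exist on the server already.

```shell
# Illustrative values; replace with your own server details.
host="your-host-name-or-ip"
storage_dir="/tmp/dvc-storage"   # absolute path, created on the server with mkdir

# scheme://host/path: the absolute path follows the host directly.
remote_url="ssh://${host}${storage_dir}"

# Print the command rather than running it, so the sketch has no side effects.
echo "dvc remote add --default ssh-storage ${remote_url}"
```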

I don’t think this answers the original question, which is how to set up DVC so that MANY DIFFERENT users can access the ssh server with their own accounts. You just provided an explanation on how to set up the ssh remote. However, your step for the local password is new to me, and might be useful somehow? Can people set up the user for ssh as local as well? Something like
dvc remote modify --local ssh-storage user mroutis
???

Hi @fercook !

Yes, you could use --local that way to set up different users, sure. E.g. you would do:

dvc remote modify --local ssh-storage user fercook
dvc remote modify --local ssh-storage password 123456789

and your colleague would do

dvc remote modify --local ssh-storage user mroutis
dvc remote modify --local ssh-storage password 987654321
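One way to make this repeatable is a small script that every developer runs once in their own clone. This is only a sketch: dvc_local and configure_user are hypothetical helpers, the remote name ssh-storage is taken from the example above, and ask_password is used here instead of a stored password so that nothing secret is written to disk. Commands are printed by default; set DRY_RUN=0 to actually run them.

```shell
remote_name="ssh-storage"

# Hypothetical helper: print the command unless DRY_RUN=0.
dvc_local() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "dvc remote modify --local $*"
    else
        dvc remote modify --local "$@"
    fi
}

# Hypothetical helper: apply per-user SSH settings to .dvc/config.local.
configure_user() {
    dvc_local "${remote_name}" user "$1"
    dvc_local "${remote_name}" ask_password true
}

# Each developer passes their own account name on the server.
configure_user "fercook"
```

A colleague would call configure_user with their own account name, for example mroutis, while the shared .dvc/config stays unchanged.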

Let us know if you have any other questions.

Do I need to have DVC installed on the remote machine for it to work, or just locally?

P.S.: thanks @shcheklein for this tutorial.

Just locally, @arthurcgusmao. Please refer to https://dvc.org/doc/start/data-versioning#storing-and-sharing for an updated intro to DVC remote storage. Thanks

@shcheklein, I wasn’t able to replicate the remote repo configuration using the argument --local as in your answer:

dvc remote add --default ssh-storage ssh://example.com/path/to/storage
dvc remote modify ssh-storage user mroutis
dvc remote modify ssh-storage port 22
dvc remote modify --local ssh-storage password 123456789

When using --local only for configuring the password, DVC says:

ERROR: configuration error - config file error: remote 'ssh-storage' doesn't exist.

What seems to be happening is that DVC looks only on the local config file in that case, disregarding the default (non-local) config.

One can, however, work around that by also running the steps before the password with the --local argument.

Should I open an issue regarding this or is it the new expected behavior?

Hi @arthurcgusmao! This is a known issue and it will be fixed in the next release in a few days - https://github.com/iterative/dvc/issues/4276 (fix is already merged).

The workaround is to run (along with the regular dvc remote add):

dvc remote add --local ssh-storage ssh://example.com/path/to/storage

to add an entry, so that it is not complaining.

@arthurcgusmao 1.3.1 is out with the fix. Please upgrade and give it a try.

Hey, I have tried this:
dvc remote add --default ssh-storage ssh://example.com/path/to/storage
dvc remote modify ssh-storage user mroutis
dvc remote modify ssh-storage port 22
dvc remote modify --local ssh-storage password 123456789

and then this:
dvc remote modify --local ssh-storage user fercook
dvc remote modify --local ssh-storage password 123456789

My goal is to use SSH instead of logging in to my account every time I run, for instance, dvc list. After running the commands above, I still have to log in to my GitHub account every time I execute a DVC command.

This command by shcheklein gives me the following error:
dvc remote add --local ssh-storage ssh://dvc-data/dataset-registry
ERROR: configuration error - config file error: remote 'ssh-storage' already exists. Use -f|--force to overwrite it.

I have installed dvc with pip, because installing it in Arch with yay failed, as many files were not found in the mirrors. Now that I try to upgrade with pip, I can only get version 1.11 and not 1.3.

How should I proceed? Thanks!

Hi @Konstantina!

You have a few things happening at once. No worries, let me try to help you out of it step by step.

  1. SSH storage (the dvc remote * set of commands) is not about GitHub access. It is about accessing a server that stores your data files when you run dvc push, dvc pull, etc.

  2. GitHub access (e.g. dvc list might need it internally to read the repository’s .dvc files, etc.). Here you don’t need anything special from the DVC perspective. You should set up GitHub itself for this; see “Connecting to GitHub with SSH” in the GitHub documentation.

Please give it all a try and let us know if that works or not.

Hi! Thank you for such a great tool.

I am new to DVC and am trying to set up my SSH server.
OS: Windows, all commands run in WSL2
Installation: pip install dvc, pip install 'dvc[ssh]'

config:

[core]
    remote = ssh-storage
['remote "ssh-storage"']
    url = ssh://my_name@00.000.00.000:11111/home/my_user/my_path

config.local:

['remote "ssh-storage"']
    keyfile = /home/my_name/.ssh/id_rsa
    password = my_passphrase_for_private_key

The command ssh my_name@00.000.00.000 -p11111 works in the same terminal.

If I try dvc push, I see:

ERROR: unexpected error - Passphrase must be specified to import encrypted private keys

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

I also tried setting ask_password = true and typing the passphrase manually, with the same result.

DVC with local storage works well, so I think the problem is in the config, but I don’t see what is wrong.
Could you help me?

Hi @olga.malyugina ! Could you provide a little more info about what sequence of commands you have run to set up the remote?

Also could you share the output of dvc doctor and dvc push -v ?

Wow, thank you for such a fast answer!

This was the sequence of commands, but afterwards I changed the configs manually:

dvc remote add --default ssh-storage ssh://olga.malyugina@00.00.000.000/home/olga.malyugina/dvc
dvc remote modify ssh-storage user olga.malyugina
dvc remote modify ssh-storage port 42022
dvc remote modify --local ssh-storage password 123456789
dvc remote modify --local ssh-storage password my_password
dvc push
pip install 'dvc[ssh]'
dvc push #didn't work

dvc doctor

(dssm-linux) ➜  data git:(feature/dvc) ✗ dvc doctor
DVC version: 2.6.4 (pip)
---------------------------------
Platform: Python 3.8.11 on Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.17
Supports:
        http (requests = 2.25.1),
        https (requests = 2.25.1),
        ssh (sshfs = 2021.8.1)
Cache types: hardlink, symlink
Cache directory: 9p on D:\
Caches: local
Remotes: ssh, local
Workspace directory: 9p on D:\
Repo: dvc, git

dvc push -v (I changed host address)

(dssm-linux) ➜  data git:(feature/dvc) ✗ dvc push -v
2021-09-07 07:42:02,683 DEBUG: Preparing to transfer data from '../../.dvc/cache' to 'ssh://olga.malyugina@00.000.00.000:42022/home/olga.malyugina/dvc'
2021-09-07 07:42:02,684 DEBUG: Preparing to collect status from 'ssh://olga.malyugina@00.000.00.000:42022/home/olga.malyugina/dvc'
2021-09-07 07:42:02,684 DEBUG: Collecting status from 'ssh://olga.malyugina@00.000.00.000:42022/home/olga.malyugina/dvc'
2021-09-07 07:42:02,697 ERROR: unexpected error - Passphrase must be specified to import encrypted private keys
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/dvc/main.py", line 55, in main
    ret = cmd.do_run()
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/dvc/command/base.py", line 45, in do_run
    return self.run()
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/dvc/command/data_sync.py", line 57, in run
    processed_files_count = self.repo.push(
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/dvc/repo/__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/dvc/repo/push.py", line 48, in push
    pushed += self.cloud.push(obj_ids, jobs, remote=remote, odb=odb)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/dvc/data_cloud.py", line 85, in push
    return transfer(
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/dvc/objects/transfer.py", line 221, in transfer
    status = compare_status(src, dest, obj_ids, check_deleted=False, **kwargs)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/dvc/objects/status.py", line 160, in compare_status
    dest_exists, dest_missing = status(
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/dvc/objects/status.py", line 132, in status
    odb.hashes_exist(hashes, name=str(odb.path_info), **kwargs)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/dvc/objects/db/base.py", line 501, in hashes_exist
    remote_size, remote_hashes = self._estimate_remote_size(hashes, name)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/dvc/objects/db/base.py", line 303, in _estimate_remote_size
    remote_hashes = set(hashes)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/dvc/objects/db/base.py", line 257, in _hashes_with_limit
    for hash_ in self.list_hashes(prefix, progress_callback):
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/dvc/objects/db/base.py", line 247, in list_hashes
    for path in self._list_paths(prefix, progress_callback):
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/dvc/objects/db/base.py", line 227, in _list_paths
    for file_info in self.fs.walk_files(path_info, prefix=prefix):
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/dvc/fs/fsspec_wrapper.py", line 110, in walk_files
    for file in self.find(path_info, **kwargs):
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/dvc/fs/fsspec_wrapper.py", line 103, in find
    files = self.fs.find(path, detail=detail)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/funcy/objects.py", line 50, in __get__
    return prop.__get__(instance, type)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/funcy/objects.py", line 28, in __get__
    res = instance.__dict__[self.fget.__name__] = self.fget(instance)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/dvc/fs/ssh.py", line 114, in fs
    return _SSHFileSystem(**self.fs_args)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/fsspec/spec.py", line 75, in __call__
    obj = super().__call__(*args, **kwargs)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/sshfs/spec.py", line 77, in __init__
    self._client, self._pool = self.connect(
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/fsspec/asyn.py", line 88, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/fsspec/asyn.py", line 69, in sync
    raise result[0]
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/asyncio/tasks.py", line 494, in wait_for
    return fut.result()
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/sshfs/utils.py", line 27, in wrapper
    return await func(*args, **kwargs)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/sshfs/spec.py", line 92, in _connect
    client = await self._stack.enter_async_context(_raw_client)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/contextlib.py", line 568, in enter_async_context
    result = await _cm_type.__aenter__(cm)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/asyncssh/misc.py", line 220, in __aenter__
    self._result = await self._coro
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/asyncssh/connection.py", line 6798, in connect
    options = SSHClientConnectionOptions(options, config=config, host=host,
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/asyncssh/connection.py", line 5574, in __init__
    super().__init__(options=options, last_config=last_config, **kwargs)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/asyncssh/misc.py", line 268, in __init__
    self.prepare(**self.kwargs)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/asyncssh/connection.py", line 6256, in prepare
    self.client_keys = load_keypairs(client_keys, passphrase,
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/asyncssh/public_key.py", line 3141, in load_keypairs
    key, certs = read_private_key_and_certs(key, passphrase)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/asyncssh/public_key.py", line 2959, in read_private_key_and_certs
    key, cert = import_private_key_and_certs(read_file(filename), passphrase)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/asyncssh/public_key.py", line 2849, in import_private_key_and_certs
    key, end = _decode_private(data, passphrase)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/asyncssh/public_key.py", line 2509, in _decode_private
    key = _decode_pem_private(pem_name, headers, data, passphrase)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/asyncssh/public_key.py", line 2423, in _decode_pem_private
    return _decode_openssh_private(data, passphrase)
  File "/home/olgamalyugina/miniconda/envs/dssm-linux/lib/python3.8/site-packages/asyncssh/public_key.py", line 2266, in _decode_openssh_private
    raise KeyImportError('Passphrase must be specified to import '
asyncssh.public_key.KeyImportError: Passphrase must be specified to import encrypted private keys
------------------------------------------------------------

I have an SSH private key with a passphrase. Can the existence of the passphrase be a problem?

Could you share the output of dvc config -l and dvc config --local -l ?

Done. (I changed it a little to work with AWS, but restored the settings to reproduce the SSH error one more time.)

(dssm-linux) ➜  dssm_backend git:(feature/dvc) ✗ dvc config -l
remote.aws-remote.url=s3://my_path
remote.ssh-storage.url=ssh://olga.malyugina@00.000.00.000:42022/home/olga.malyugina/dvc
core.remote=ssh-storage
remote.ssh-storage.keyfile=/home/olgamalyugina/.ssh/id_rsa
remote.ssh-storage.password=my_passphrase
(dssm-linux) ➜  dssm_backend git:(feature/dvc) ✗ dvc config --local -l
remote.ssh-storage.keyfile=/home/olgamalyugina/.ssh/id_rsa
remote.ssh-storage.password=my_passphrase

Hi there, I am facing the exact same problem.
To provide more insight, my DVC version is 2.7.1.
It is strange, because I was able to run dvc push in the past (a few months ago…), but today I cannot.
The outputs are the same as @olga.malyugina’s.

I created a ticket to track progress on this: https://github.com/iterative/dvc/issues/6561

I have seen that the bug was fixed after the 2.7.2 release.
With this new version (2.7.2), I am still facing this problem.
Will the fix be in 2.7.3?

Yes, it will be available in a future release. You can try it right now by installing from upstream (pip install --upgrade 'dvc[ssh] @ git+https://github.com/iterative/dvc.git') and setting the password config option to your passphrase (dvc remote modify --local REMOTE_NAME password PASSPHRASE).