"dvc add -external S3://mybucket/data.csv" is failing with access error even after giving correct remote cache configurations

The other option is add, as you discovered (also available with import-url). This is a way of bootstrapping your repo with external data that you don’t want locally right now, but that another system with a clone of the project can actually download and process in its own environment.

I think add --external (using an external cache) is the only method currently available that ensures the data never reaches the “local” environment (on any machine with a repo clone). Note that a copy of the data may still be created in the external cache if the Minio/S3 file system doesn’t support reflinks. If it supports symlinks or hardlinks instead, those need to be configured explicitly in the project before using add --external (see Large Dataset Optimization).
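For reference, here is a rough sketch of what that setup could look like. It assumes a DVC release that still supports add --external, and the remote name s3cache and the cache path s3://mybucket/cache are made-up placeholders, so adjust them to your bucket. Whether the link-type setting matters at all depends on what your Minio/S3 storage supports.

```shell
# Assumption: "s3cache" and s3://mybucket/cache are placeholder names, not from the question.
# Define a remote on the same storage and use it as the external cache for s3:// data.
dvc remote add s3cache s3://mybucket/cache
dvc config cache.s3 s3cache

# Optional: set the preferred link types explicitly before adding the data
# (see Large Dataset Optimization); DVC tries them in the order listed.
dvc config cache.type "reflink,symlink,hardlink,copy"

# Track the file in place, without downloading it into the local workspace.
dvc add --external s3://mybucket/data.csv
```

If the access error still shows up after this, it may be worth checking that the credentials DVC uses can write to the cache path as well as read the data path.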
