With DuckDB you can easily query an S3 bucket using a glob pattern. Hive partitioning lets it skip files that cannot match the filter, and Parquet's columnar layout lets it read only the columns the query needs, so it avoids downloading everything, e.g.,
select some_col from 'mydata/*.parquet' where some_other_col = 5
DVC stores files in object storage under their MD5 hash rather than their original path, so glob patterns that would work on a local checkout don’t match anything in the S3 remote.
Has anyone dealt with a similar use case, where you don’t want to dvc pull all files to run the query locally? The only solution I can think of at the moment is to duplicate the working directory structure at a given commit somewhere else in object storage and query that.
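
One alternative to copying the data might be to resolve the glob yourself against DVC's directory manifest and hand DuckDB an explicit list of hashed object URLs. The following is only a sketch under several assumptions: that the tracked directory's `.dir` manifest is a JSON list of `{"md5", "relpath"}` entries, that the remote lays objects out as `<remote>/files/md5/<first two hex chars>/<rest>` (the DVC 3.x layout; older remotes omit the `files/md5` prefix), and that the manifest for the commit you care about has already been fetched. The names `resolve_dvc_urls` and `build_query` and the bucket URL are made up for illustration. Since the hashed URLs carry no `key=value` directories, DuckDB cannot prune on hive partitions here; any partition filtering would have to happen on `relpath` before building the list.

```python
import fnmatch
import json


def resolve_dvc_urls(dir_manifest_json, remote_url, pattern, prefix="files/md5"):
    """Map each relpath matching `pattern` to its hashed object URL.

    Assumes the manifest is a JSON list of {"md5": ..., "relpath": ...}
    entries. Note that fnmatch's `*` also matches `/`, unlike a shell glob.
    """
    urls = {}
    for entry in json.loads(dir_manifest_json):
        relpath, md5 = entry["relpath"], entry["md5"]
        if not fnmatch.fnmatch(relpath, pattern):
            continue
        parts = [remote_url.rstrip("/")]
        if prefix:  # pass prefix="" for the older layout without files/md5
            parts.append(prefix)
        parts += [md5[:2], md5[2:]]
        urls[relpath] = "/".join(parts)
    return urls


def build_query(urls, select, where):
    """Build a DuckDB query over an explicit file list.

    read_parquet is called explicitly because the hashed object names
    have no .parquet extension for DuckDB to infer the format from.
    """
    if not urls:
        raise ValueError("no files matched the pattern")
    file_list = ", ".join(f"'{u}'" for u in urls.values())
    return f"select {select} from read_parquet([{file_list}]) where {where}"


if __name__ == "__main__":
    # Hypothetical manifest contents and remote URL, for illustration only.
    manifest = json.dumps([
        {"md5": "d41d8cd98f00b204e9800998ecf8427e", "relpath": "year=2023/a.parquet"},
        {"md5": "9e107d9d372bb6826bd81d3542a419d6", "relpath": "notes.csv"},
    ])
    urls = resolve_dvc_urls(manifest, "s3://my-bucket/dvc", "*.parquet")
    print(build_query(urls, "some_col", "some_other_col = 5"))
```

The resulting string could then be passed to `duckdb.sql(...)` with S3 credentials configured. I haven't verified this against a real remote, so treat the path layout as something to check against your own bucket first.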