Using DuckDB to query DVC-versioned files directly in an object storage remote?

With DuckDB you can easily query an S3 bucket using a glob pattern, taking advantage of Hive partitioning and Parquet column projection to avoid downloading every file and every column, e.g.,

select some_col from 'mydata/*.parquet' where some_other_col = 5

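Since DVC stores files by MD5 hash in object storage, the original file names and directory layout are not present in the remote, so glob patterns that would work locally don’t work against the DVC remote in S3.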
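To make the problem concrete, here is a rough sketch of how a tracked file ends up keyed in the remote. It assumes the DVC 3.x layout (files/md5/<first two hex chars>/<rest of the hash>); older DVC versions omit the files/md5 prefix. The helper name dvc_object_key is made up for illustration and is not part of DVC's API.

```python
import hashlib


def dvc_object_key(content: bytes, prefix: str = "files/md5") -> str:
    """Return the remote-relative key a file with this content would get.

    Assumption: DVC 3.x content-addressed layout. The key depends only on
    the file's bytes, never on its name or its directory in the workspace.
    """
    digest = hashlib.md5(content).hexdigest()
    # The first two hex characters become a directory, the rest the object name.
    return f"{prefix}/{digest[:2]}/{digest[2:]}"


# Stand-in bytes for something like mydata/part-0.parquet
key = dvc_object_key(b"hello")
print(key)  # files/md5/5d/41402abc4b2a76b9719d911017c592
```

Nothing in that key mentions mydata/ or .parquet, which is why a pattern like 'mydata/*.parquet' has nothing to match in the bucket. For tracked directories, the mapping from relative paths to hashes lives in a separate .dir manifest object, so any remote-side query has to resolve paths through DVC first.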

Has anyone dealt with a similar use case, where you don’t want to dvc pull all files to run the query locally? The only solution I can think of at the moment is to duplicate the working directory structure at a given commit somewhere else in object storage and query that.

I just realized DuckDB is compatible with fsspec filesystems, and DVC provides one (DVCFileSystem). Would it be possible to adapt the fsspec example in the DuckDB docs to use a DVC filesystem?