I am using the DVC Python API to pull a file from an S3 remote. The call looks like this:
) as fd:
    df = pd.read_csv(fd)
This successfully downloads the file, but I’d like to make it faster. I would like the API to download the file from S3 only if it doesn’t already exist in my local cache. If it is in my local cache, I would like to read the file from there, since that would be much faster.
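To make the behaviour I’m after concrete, here is a rough sketch of the "check a local cache first, otherwise download" logic. It is only an illustration under my own assumptions: cached_fetch, key and cache_dir are hypothetical names, the cache is a plain directory of my own rather than DVC’s internal cache, and the download step is passed in as a callable so that nothing here depends on DVC internals.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def cached_fetch(key, fetch, cache_dir):
    """Return a local path for `key`, calling `fetch()` only on a cache miss.

    `key` should identify the exact file version (for example repo URL,
    pinned revision and path), otherwise stale copies may be returned.
    `fetch` is any zero-argument callable that returns the file as bytes.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the key so it is always a safe file name.
    target = cache_dir / hashlib.sha256(key.encode("utf-8")).hexdigest()
    if not target.exists():
        data = fetch()  # slow path: only runs when nothing is cached yet
        # Write to a temp file first so a failed download never leaves
        # a partial file under the final name.
        fd, tmp = tempfile.mkstemp(dir=cache_dir)
        with os.fdopen(fd, "wb") as out:
            out.write(data)
        os.replace(tmp, target)
    return target
```

With DVC, I imagine the download callable would wrap dvc.api.read in binary mode, something like `lambda: dvc.api.read(path, repo=repo_url, rev=rev, mode="rb")`, where path, repo_url and rev are placeholders for the real values, and the returned path would then go straight into pd.read_csv. The cache directory could live under the user’s home directory so that no repo-specific paths end up in the code.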
I realize that I could run dvc pull from the command line and point the API at my local repo; however, I’m trying to keep the code path-agnostic so that other users (who share the data repo) can run this script without having to modify paths in the code.
I’ve looked through the forum and the API documentation, but haven’t been able to find a solution. Is this possible today, or is it perhaps a candidate for a future feature? It would really speed up my workflow.