What I want to achieve:
Hi! I'm trying to add data from my Google Drive to a local DVC repository without downloading it. I usually train my models on Google Colab, so I don't need all the data on my local machine. However, I do want to use DVC on Google Colab to track versions of the data stored on Google Drive. Once I've trained my final model, I want to run `dvc pull` to get the version of the data I need. I have to rewrite the model as Python classes rather than notebooks, so at that point I may need the dataset on my local machine.
What I tried:
- I created a Git repository and a DVC repository on Google Drive and ran `dvc add data` there.
- I then created a Git repo and a DVC repo in my local project folder and ran `dvc remote add mydrive gdrive://*folder-on-gdrive-id*`.
- Finally, I tried `dvc add --to-remote -r mydrive https://*url-to-data-folder*`, but got an error: "ERROR: unexpected error - seek".
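For reference, `dvc remote add` records the remote in the local repo's `.dvc/config`. Assuming no other remotes or options were configured (for example, `-d` was not used, so `mydrive` is not the default remote), that file should contain roughly the following. The folder ID placeholder is kept as above:

```ini
# .dvc/config in the local project (sketch; folder ID left as a placeholder)
['remote "mydrive"']
    url = gdrive://*folder-on-gdrive-id*
```

Because no default remote is set in this sketch, commands that talk to the remote need `-r mydrive`, as in the `dvc add --to-remote` call above.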
My questions:
- Is the plan I described at the beginning correct?
- What does this error mean?
`dvc version` output on my local machine:
```
DVC version: 3.50.1 (pip)
Platform: Python 3.10.8 on Windows-10-10.0.22631-SP0
Subprojects:
	dvc_data = 3.15.1
	dvc_objects = 5.1.0
	dvc_render = 1.0.2
	dvc_task = 0.4.0
	scmrepo = 3.3.2
Supports:
	gdrive (pydrive2 = 1.19.0),
	http (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
	https (aiohttp = 3.9.5, aiohttp-retry = 2.8.3)
Config:
	Global: C:\Users\Lenovo\AppData\Local\iterative\dvc
	System: C:\ProgramData\iterative\dvc
Cache types: https://error.dvc.org/no-dvc-cache
Caches: local
Remotes: gdrive
Workspace directory: NTFS on D:
Repo: dvc, git
Repo.site_cache_dir: C:\ProgramData\iterative\dvc\Cache\repo\de1325664c16713ee098d5f1f1a9c37a
```