I ran this command, but it looks like it didn’t push any targets to the remote cache:
$ dvc exp push origin rf_sgm_v6.01 -v --no-cache
2024-07-11 17:09:57,717 DEBUG: v3.51.2 (pip), CPython 3.10.12 on Linux-6.8.0-35-generic-x86_64-with-glibc2.35
2024-07-11 17:09:57,717 DEBUG: command: /home/ermolaev/projects/radml/venv/bin/dvc exp push origin rf_sgm_v6.01 -v --no-cache
2024-07-11 17:09:57,904 DEBUG: git push experiment ['refs/exps/f6/0bc58f3f1ee9bbe7842f164d360270fc677032/rf_sgm_v6.01:refs/exps/f6/0bc58f3f1ee9bbe7842f164d360270fc677032/rf_sgm_v6.01'] -> 'origin'
2024-07-11 17:09:59,395 DEBUG: Studio token not found.
Experiment rf_sgm_v6.01 is up to date on Git remote 'origin'.
To push cached outputs for this experiment to DVC remote storage,re-run this command without '--no-cache'.
2024-07-11 17:09:59,447 DEBUG: Analytics is enabled.
2024-07-11 17:09:59,488 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpy3w8oiof', '-v']
2024-07-11 17:09:59,493 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpy3w8oiof', '-v'] with pid 6879
I also tried --no-run-cache because I thought this flag might help, but got the following error:
$ dvc exp push origin rf_sgm_v6.01 -v --no-run-cache
2024-07-11 17:23:36,273 DEBUG: v3.51.2 (pip), CPython 3.10.12 on Linux-6.8.0-35-generic-x86_64-with-glibc2.35
2024-07-11 17:23:36,273 DEBUG: command: /home/ermolaev/projects/radml/venv/bin/dvc exp push origin rf_sgm_v6.01 -v --no-run-cache
2024-07-11 17:23:36,477 DEBUG: git push experiment ['refs/exps/f6/0bc58f3f1ee9bbe7842f164d360270fc677032/rf_sgm_v6.01:refs/exps/f6/0bc58f3f1ee9bbe7842f164d360270fc677032/rf_sgm_v6.01'] -> 'origin'
2024-07-11 17:23:38,032 DEBUG: dvc push experiment '[ExpRefInfo(baseline_sha='f60bc58f3f1ee9bbe7842f164d360270fc677032', name='rf_sgm_v6.01')]'
Collecting |0.00 [00:00, ?entry/s]
<...>
FileNotFoundError: [Errno 2] No such file or directory: '/data/projects/radml/DVC_CACHE//1c/bddf98ac99bbf716816b36811e3fb6.dir'
2024-07-11 17:29:57,551 DEBUG: Preparing to transfer data from '/data/projects/radml/DVC_CACHE/' to 'gdrive://1GkI-Tgwdi1r2oat1E0wxFG3sXc6oL2UL'
2024-07-11 17:29:57,551 DEBUG: Preparing to collect status from '1GkI-Tgwdi1r2oat1E0wxFG3sXc6oL2UL'
2024-07-11 17:29:57,552 DEBUG: Collecting status from '1GkI-Tgwdi1r2oat1E0wxFG3sXc6oL2UL'
2024-07-11 17:29:57,555 DEBUG: Querying 1 oids via object_exists
2024-07-11 17:29:58,629 DEBUG: Querying 12 oids via object_exists
2024-07-11 17:30:01,965 DEBUG: Estimated remote size: 256 files
2024-07-11 17:30:01,967 DEBUG: Querying 45 oids via traverse
Pushing
2024-07-11 17:30:39,931 DEBUG: Studio token not found.
Experiment rf_sgm_v6.01 is up to date on Git remote 'origin'.
2024-07-11 17:30:39,987 ERROR: failed to push cache: <HttpError 400 when requesting https://www.googleapis.com/drive/v2/files returned "The query is too complex.". Details: "[{'message': 'The query is too complex.', 'domain': 'global', 'reason': 'queryTooComplex', 'location': 'q', 'locationType': 'parameter'}]">
Traceback (most recent call last):
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/pydrive2/files.py", line 84, in _GetList
self.auth.service.files()
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
return wrapped(*args, **kwargs)
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/googleapiclient/http.py", line 938, in execute
raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://www.googleapis.com/drive/v2/files returned "The query is too complex.". Details: "[{'message': 'The query is too complex.', 'domain': 'global', 'reason': 'queryTooComplex', 'location': 'q', 'locationType': 'parameter'}]">
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/repo/experiments/push.py", line 126, in push
result["uploaded"] = _push_cache(repo, pushed_refs_info, **kwargs)
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/repo/experiments/push.py", line 182, in _push_cache
return repo.push(
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/repo/__init__.py", line 58, in wrapper
return f(repo, *args, **kwargs)
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/repo/push.py", line 147, in push
push_transferred, push_failed = ipush(
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc_data/index/push.py", line 76, in push
result = transfer(
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc_data/hashfile/transfer.py", line 203, in transfer
status = compare_status(
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc_data/hashfile/status.py", line 179, in compare_status
dest_exists, dest_missing = status(
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc_data/hashfile/status.py", line 151, in status
exists.update(odb.oids_exist(hashes, jobs=jobs, progress=pbar.callback))
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc_objects/db.py", line 454, in oids_exist
return list(oids & set(wrap_iter(remote_oids, callback)))
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc_objects/db.py", line 35, in wrap_iter
for index, item in enumerate(iterable, start=1):
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc_objects/db.py", line 346, in _list_oids_traverse
yield from self._list_oids(prefixes=traverse_prefixes, jobs=jobs)
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc_objects/db.py", line 250, in _list_oids
for path in self._list_prefixes(prefixes=prefixes, jobs=jobs):
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc_objects/db.py", line 225, in _list_prefixes
yield from self.fs.find(paths, batch_size=jobs, prefix=prefix)
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 529, in find
yield from self.fs.find(path)
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/pydrive2/fs/spec.py", line 490, in find
for item in self._gdrive_list_ids(query_ids):
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/funcy/decorators.py", line 47, in wrapper
return deco(call, *dargs, **dkwargs)
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/funcy/flow.py", line 99, in retry
return call()
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/funcy/decorators.py", line 68, in __call__
return self._func(*self._args, **self._kwargs)
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/pydrive2/fs/spec.py", line 308, in <lambda>
get_list = _gdrive_retry(lambda: next(file_list, None))
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/pydrive2/apiattr.py", line 150, in __next__
result = self._GetList()
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/pydrive2/auth.py", line 85, in _decorated
return decoratee(self, *args, **kwargs)
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/pydrive2/files.py", line 89, in _GetList
raise ApiRequestError(error)
pydrive2.files.ApiRequestError: <HttpError 400 when requesting https://www.googleapis.com/drive/v2/files returned "The query is too complex.". Details: "[{'message': 'The query is too complex.', 'domain': 'global', 'reason': 'queryTooComplex', 'location': 'q', 'locationType': 'parameter'}]">
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/cli/__init__.py", line 211, in main
ret = cmd.do_run()
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/cli/command.py", line 27, in do_run
return self.run()
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/commands/experiments/push.py", line 55, in run
result = self.repo.experiments.push(
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/repo/experiments/__init__.py", line 364, in push
return push(self.repo, *args, **kwargs)
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/repo/__init__.py", line 58, in wrapper
return f(repo, *args, **kwargs)
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/repo/scm_context.py", line 143, in run
return method(repo, *args, **kw)
File "/home/ermolaev/projects/radml/venv/lib/python3.10/site-packages/dvc/repo/experiments/push.py", line 134, in push
raise UploadError("failed to push cache", result) from e
dvc.repo.experiments.push.UploadError: failed to push cache
2024-07-11 17:30:39,990 DEBUG: Analytics is enabled.
2024-07-11 17:30:40,043 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmpp2q20wf1', '-v']
2024-07-11 17:30:40,049 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmpp2q20wf1', '-v'] with pid 8052
It looks like dvc push tolerates the missing data, but the final request to Google Drive is too complex. I also tried a plain “dvc push” and got the same error, whereas pushing a specific target finishes successfully.
I don’t know how to interpret this message. What should I do? Could it be that the repo contains too many targets to push to Google Drive? (I’ve checked the limits on total size and file count, and they are still fine.)
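To see which of the objects named in the logs are really absent from the local cache, a small check like the one below may help. It is only a sketch: it assumes the cache root shown in the log and the two layouts visible in these logs (objects directly under the cache root, as in the FileNotFoundError path, and under files/md5, as in the S3 run). The helper names candidate_paths and missing_objects are made up for illustration.

```python
from pathlib import Path


def candidate_paths(cache_root, oid):
    """Return the two places an object with this md5 could live.

    Assumption: the object is stored as <first 2 chars>/<rest>, either
    directly under the cache root or under files/md5.
    """
    root = Path(cache_root)
    rel = Path(oid[:2]) / oid[2:]
    return [root / rel, root / "files" / "md5" / rel]


def missing_objects(cache_root, oids):
    """List the oids that exist under neither layout."""
    return [
        oid
        for oid in oids
        if not any(p.is_file() for p in candidate_paths(cache_root, oid))
    ]


if __name__ == "__main__":
    # Hashes copied from the logs above; extend the list as needed.
    oids = [
        "1cbddf98ac99bbf716816b36811e3fb6.dir",
        "ad7272d675a124f37f10bc398465ef1d",
    ]
    for oid in missing_objects("/data/projects/radml/DVC_CACHE", oids):
        print("missing:", oid)
```

Any hash it prints is one that a push cannot upload from this machine, which would match the "missing cache files" warning.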
I also tried pushing the same experiment to S3, and it finished successfully:
$ dvc exp push origin rf_sgm_v6.01 -v --no-run-cache -r yadrive
<...>
2024-07-11 17:47:23,298 DEBUG: Preparing to transfer data from '/data/projects/radml/DVC_CACHE/files/md5' to 's3://cvisionrad-ml-data/files/md5'
2024-07-11 17:47:23,298 DEBUG: Preparing to collect status from 'cvisionrad-ml-data/files/md5'
2024-07-11 17:47:23,299 DEBUG: Collecting status from 'cvisionrad-ml-data/files/md5'
2024-07-11 17:47:23,300 DEBUG: Querying 214 oids via object_exists
2024-07-11 17:47:26,525 DEBUG: Indexing new .dir '6cba07d1ca2e93b30166b8bdc2e21a73.dir' with '2' nested files
2024-07-11 17:47:26,689 DEBUG: 'cvisionrad-ml-data/files/md5/00' doesn't look like a cache file, skipping
2024-07-11 17:47:26,690 DEBUG: Estimated remote size: 4096 files
2024-07-11 17:47:26,691 DEBUG: Querying 1403 oids via traverse
2024-07-11 17:47:26,790 DEBUG: Preparing to collect status from '/data/projects/radml/DVC_CACHE/files/md5'
2024-07-11 17:47:26,790 DEBUG: Collecting status from '/data/projects/radml/DVC_CACHE/files/md5'
2024-07-11 17:47:26,830 WARNING: Some of the cache files do not exist neither locally nor on remote. Missing cache files:
md5: ad7272d675a124f37f10bc398465ef1d
md5: d434b7e969ccea967016fcdbfd77eee6
md5: 7c3fe9919a5a653cc452dba9b49a8211
md5: c0ed4d8be64bd04b652f6d1605ad35b9.dir
<...>
2024-07-11 17:47:26,908 DEBUG: transfer dir: md5: 2f98661e28ec4d1a3cd6fb3ea63f9156.dir with 4 files
2024-07-11 17:48:33,659 DEBUG: transfer dir: md5: 61f81d4920dea83b745edd5aa482c00a.dir with 1 files
2024-07-11 17:48:33,734 DEBUG: directory 'md5: 61f81d4920dea83b745edd5aa482c00a.dir' contains missing files, skipping .dir file upload
2024-07-11 17:48:33,880 DEBUG: transfer dir: md5: 46ac83add73a3028cbadecee1d2a480d.dir with 4 files
2024-07-11 17:52:21,563 DEBUG: transfer dir: md5: eddc406f7c5257821fe9429e6cf1c55c.dir with 3 files
2024-07-11 17:52:22,036 DEBUG: transfer dir: md5: 06511bea29d83440a5e7ac3e34c573eb.dir with 3 files
2024-07-11 17:52:22,336 DEBUG: transfer dir: md5: 24900e46d96e4cc7c10097e02c3fa35e.dir with 6 files
2024-07-11 17:52:24,198 DEBUG: transfer dir: md5: 48772dedc99c75d126200daea53a9757.dir with 4 files
2024-07-11 17:53:12,348 DEBUG: transfer dir: md5: da5bcbd96cbc59c869b4e04a14445c5e.dir with 0 files
2024-07-11 17:53:12,350 DEBUG: directory 'md5: da5bcbd96cbc59c869b4e04a14445c5e.dir' contains missing files, skipping .dir file upload
2024-07-11 17:53:12,432 DEBUG: transfer dir: md5: a6e9497ab771d6f7b18b9a6f852038d9.dir with 0 files
2024-07-11 17:53:12,433 DEBUG: directory 'md5: a6e9497ab771d6f7b18b9a6f852038d9.dir' contains missing files, skipping .dir file upload
2024-07-11 17:53:12,523 DEBUG: transfer dir: md5: 1a6bae2f85bc9b30d446c91ac9602c32.dir with 4 files
2024-07-11 17:53:52,643 DEBUG: transfer dir: md5: f15aa5cafe956ac709f52ef977db2d16.dir with 4 files
2024-07-11 17:54:29,195 DEBUG: transfer dir: md5: 295a62e47c066006b3ff583c8c09148d.dir with 4 files
Pushing
0%| |Pushing to s3
P.S. I’m not providing the full log because this command tries to push everything in the repo to S3, which is not what I want.
So it looks like the problem is in the pydrive2 package or somewhere on the Google Drive side, but I don’t know how to reproduce the same conditions synthetically.
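One way to approximate the conditions without touching Google Drive is to look at how large the listing query could get. The sketch below assumes, based on the _gdrive_list_ids frame in the traceback, that the listing joins one "'<id>' in parents" clause per remote folder with "or". That query shape, the fake folder IDs, and the batch size of 50 are assumptions for illustration, not documented Drive limits.

```python
def build_parents_query(folder_ids):
    """Build a Drive-style search query with one clause per folder ID.

    Assumption: this mirrors the shape of the query sent while listing
    the remote; it is not copied from pydrive2.
    """
    clauses = " or ".join(f"'{fid}' in parents" for fid in folder_ids)
    return f"({clauses}) and trashed=false"


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # 256 fake IDs: one per possible two-character hex prefix folder.
    fake_ids = [f"folder{i:03d}" for i in range(256)]
    full = build_parents_query(fake_ids)
    print(len(fake_ids), "clauses,", len(full), "characters")

    # The same IDs split into smaller batches (50 is an arbitrary choice).
    for batch in chunked(fake_ids, 50):
        query = build_parents_query(batch)
        print(len(batch), "clauses,", len(query), "characters")
```

If the error depends on the number of clauses, sending queries of increasing size against a test folder should show where it starts, and that could be turned into a minimal reproduction for a bug report.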