I am having a puzzling problem with dvc pull. It works great manually, but not when building a Docker image.
I had all kinds of problems with an old project I was reading, so I decided to start over from scratch. If you need to see the whole new project I wrote, it is here
I have a models folder under mlops_basics_modif, in which running train.py and convert_model_to_onnx.py has generated two models. This folder is DVC-controlled, and my remote is simply a folder in another location on my local computer.
When I run dvc status and dvc doctor, I get:
$ dvc status
Data and pipelines are up to date.
$ dvc doctor
DVC version: 3.58.0 (pip)
-------------------------
Platform: Python 3.10.12 on Linux-6.8.0-49-generic-x86_64-with-glibc2.35
Subprojects:
dvc_data = 3.16.7
dvc_objects = 5.1.0
dvc_render = 1.0.2
dvc_task = 0.40.2
scmrepo = 3.3.9
Supports:
http (aiohttp = 3.11.9, aiohttp-retry = 2.9.1),
https (aiohttp = 3.11.9, aiohttp-retry = 2.9.1)
Config:
Global: /home/sensetime/.config/dvc
System: /etc/xdg/xdg-ubuntu/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdb2
Caches: local
Remotes: local
Workspace directory: ext4 on /dev/sdb2
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/1c16a182c2a2b603623bb9e578d560a0
The thing is:
- If I delete one of the models manually and run
$ poetry run dvc pull
Collecting |3.00 [00:00, 48.7entry/s]
Fetching
Building workspace index |4.00 [00:00, 381entry/s]
Comparing indexes |5.00 [00:00, 800entry/s]
Applying changes |1.00 [00:00, 1.46file/s]
M models/
1 file modified
The pull is successful. It is the same if I do
$ poetry run dvc pull models.dvc
Collecting |0.00 [00:00, ?entry/s]
Fetching
Building workspace index |4.00 [00:00, 1.45kentry/s]
Comparing indexes |5.00 [00:00, 1.65kentry/s]
Applying changes |1.00 [00:00, 67.9file/s]
M models/
1 file modified
Everything works great even if I delete the whole models folder.
However:
- I have a Dockerfile inside the mlops_basics_modif folder:
# Use an official Python image
FROM python:3.9-slim
# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
# Set the working directory in the container
WORKDIR /app
# Copy the requirements file into the container
COPY requirements_inference.txt /app/
# Install dependencies
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements_inference.txt
# Copy the app code into the container
COPY . /app/
RUN rm -rf /app/models # Ensure the models directory is excluded
# Initialize DVC repository
RUN dvc init --no-scm
# configuring remote server in dvc
# RUN dvc remote add -d mylocalremote /media/sensetime/cbe421fe-1303-4821-9392-a849bfdd00e21/DVC_remote3
RUN dvc remote add -d mylocalremote /mnt/dvc_remote
RUN cat .dvc/config
# pulling the trained model
RUN dvc pull models.dvc
# Expose port 8000
EXPOSE 8000
# Run the application
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
and a docker-compose.yml:
version: "3"
services:
  prediction_api:
    build: .
    container_name: "inference_container"
    ports:
      - "8000:8000"
    volumes:
      - ~/dvc_remote:/mnt/dvc_remote # Map the remote directory to the container
When I run docker compose up, the dvc pull step fails:
[+] Building 6.2s (14/14) FINISHED docker:default
=> [prediction_api internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.01kB 0.0s
=> [prediction_api internal] load metadata for docker.io/library/python:3.9-slim 0.7s
=> [prediction_api internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [prediction_api 1/10] FROM docker.io/library/python:3.9-slim@sha256:6250eb7983c08b3cf5a7db9309f8630d3ca03dd152158fa3 0.0s
=> [prediction_api internal] load build context 0.0s
=> => transferring context: 6.30kB 0.0s
=> CACHED [prediction_api 2/10] WORKDIR /app 0.0s
=> CACHED [prediction_api 3/10] COPY requirements_inference.txt /app/ 0.0s
=> CACHED [prediction_api 4/10] RUN pip install --no-cache-dir --upgrade pip && pip install --no-cache-dir -r requi 0.0s
=> [prediction_api 5/10] COPY . /app/ 0.5s
=> [prediction_api 6/10] RUN rm -rf /app/models # Ensure the models directory is excluded 0.3s
=> [prediction_api 7/10] RUN dvc init --no-scm 1.4s
=> [prediction_api 8/10] RUN dvc remote add -d mylocalremote /mnt/dvc_remote 1.1s
=> [prediction_api 9/10] RUN cat .dvc/config 0.4s
=> ERROR [prediction_api 10/10] RUN dvc pull models.dvc 1.5s
------
> [prediction_api 10/10] RUN dvc pull models.dvc:
1.277 WARNING: Some of the cache files do not exist neither locally nor on remote. Missing cache files:
1.277 md5: 523fafc64cf9d92f9d7e2bd21dad1406.dir
1.302 Everything is up to date.
1.302 ERROR: failed to pull data from the cloud - Checkout failed for following targets:
1.302 models
1.302 Is your cache up to date?
1.302 <https://error.dvc.org/missing-files>
------
failed to solve: process "/bin/sh -c dvc pull models.dvc" did not complete successfully: exit code: 1
I don’t know how to interpret this warning:
1.277 WARNING: Some of the cache files do not exist neither locally nor on remote. Missing cache files:
1.277 md5: 523fafc64cf9d92f9d7e2bd21dad1406.dir
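One way to narrow this warning down is to check, from the same environment that runs dvc pull, whether the remote folder is visible at all and whether it holds the object named in the message. The following is only a minimal sketch under stated assumptions: the helper names expected_object_path and diagnose_remote are hypothetical, and the layout <remote>/files/md5/<first two characters>/<rest> is assumed to be what DVC 3.x uses for a local remote. It may also be worth keeping in mind that volumes listed in docker-compose.yml are attached when the container runs, not while the image is being built, so /mnt/dvc_remote might not exist yet during a RUN step.

```python
from pathlib import Path


def expected_object_path(remote_root: str, md5: str) -> Path:
    """Return where the object for `md5` is assumed to live in a local remote."""
    # Assumption: DVC 3.x layout, <remote>/files/md5/<first 2 chars>/<remaining chars>.
    # Directory objects keep their ".dir" suffix as part of the file name.
    return Path(remote_root) / "files" / "md5" / md5[:2] / md5[2:]


def diagnose_remote(remote_root: str, md5: str) -> str:
    """Classify the situation as 'remote-missing', 'object-missing', or 'ok'."""
    if not Path(remote_root).is_dir():
        # The remote folder itself is not visible from this environment.
        return "remote-missing"
    if not expected_object_path(remote_root, md5).is_file():
        # The folder exists, but the object named in the warning is not in it.
        return "object-missing"
    return "ok"


if __name__ == "__main__":
    # Path and hash taken from the Dockerfile and the warning above.
    print(diagnose_remote("/mnt/dvc_remote", "523fafc64cf9d92f9d7e2bd21dad1406.dir"))
```

Running this on the host against the real remote folder and again inside the build (for example from a temporary RUN step) would show whether the two environments see the same storage. A "remote-missing" result during the build would point at the mount rather than at DVC itself.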
The only thing I suspect now is that Docker ends up with the .dvc folder inside the mlops_basics_modif folder, and that this might be causing the problem?
Can someone help me here, please? I am stuck on this.