dvc pull does not work while building a Docker image

I am having a puzzling problem with dvc pull. It works when I run it manually, but it fails when building a Docker image.

I had all kinds of problems with an old project I was following, so I decided to start over from scratch. If you need to see the whole new project I wrote, it is here

I have a models folder under mlops_basics_modif, in which running train.py and convert_model_to_onnx.py has generated two models. This folder is tracked by DVC, and my remote is simply a folder in another location on my local computer.

When I run dvc status and dvc doctor, I get:

$ dvc status
Data and pipelines are up to date.
$ dvc doctor
DVC version: 3.58.0 (pip)
-------------------------
Platform: Python 3.10.12 on Linux-6.8.0-49-generic-x86_64-with-glibc2.35
Subprojects:
	dvc_data = 3.16.7
	dvc_objects = 5.1.0
	dvc_render = 1.0.2
	dvc_task = 0.40.2
	scmrepo = 3.3.9
Supports:
	http (aiohttp = 3.11.9, aiohttp-retry = 2.9.1),
	https (aiohttp = 3.11.9, aiohttp-retry = 2.9.1)
Config:
	Global: /home/sensetime/.config/dvc
	System: /etc/xdg/xdg-ubuntu/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdb2
Caches: local
Remotes: local
Workspace directory: ext4 on /dev/sdb2
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/1c16a182c2a2b603623bb9e578d560a0

The thing is:

  1. If I delete one of the models manually and run
$ poetry run dvc pull 
Collecting                                                                                            |3.00 [00:00, 48.7entry/s]
Fetching
Building workspace index                                                                              |4.00 [00:00,  381entry/s]
Comparing indexes                                                                                     |5.00 [00:00,  800entry/s]
Applying changes                                                                                      |1.00 [00:00,  1.46file/s]
M       models/
1 file modified

The pull is successful. It is the same if I do

$ poetry run dvc pull models.dvc
Collecting                                                                                            |0.00 [00:00,    ?entry/s]
Fetching
Building workspace index                                                                             |4.00 [00:00, 1.45kentry/s]
Comparing indexes                                                                                    |5.00 [00:00, 1.65kentry/s]
Applying changes                                                                                      |1.00 [00:00,  67.9file/s]
M       models/
1 file modified

Everything works great even if I delete the whole models folder.

However

  2. I have a Dockerfile inside the mlops_basics_modif folder
# Use an official Python image
FROM python:3.9-slim

# Set environment variables
ENV PYTHONDONTWRITEBYTECODE 1
ENV PYTHONUNBUFFERED 1

# Set the working directory in the container
WORKDIR /app

# Copy the requirements file into the container
COPY requirements_inference.txt /app/

# Install dependencies
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements_inference.txt

# Copy the app code into the container
COPY . /app/
RUN rm -rf /app/models  # Ensure the models directory is excluded

# Initialize DVC repository
RUN dvc init --no-scm

# configuring remote server in dvc
# RUN dvc remote add -d mylocalremote /media/sensetime/cbe421fe-1303-4821-9392-a849bfdd00e21/DVC_remote3
RUN dvc remote add -d mylocalremote /mnt/dvc_remote

RUN cat .dvc/config
# pulling the trained model
RUN dvc pull models.dvc


# Expose port 8000
EXPOSE 8000

# Run the application
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]

and a docker-compose.yml

version: "3"
services:
    prediction_api:
        build: .
        container_name: "inference_container"
        ports:
            - "8000:8000"
        volumes:
            - ~/dvc_remote:/mnt/dvc_remote  # Map the remote directory to the container

When I run docker compose up, the dvc pull step fails:

[+] Building 6.2s (14/14) FINISHED                                                                               docker:default
 => [prediction_api internal] load build definition from Dockerfile                                                        0.0s
 => => transferring dockerfile: 1.01kB                                                                                     0.0s
 => [prediction_api internal] load metadata for docker.io/library/python:3.9-slim                                          0.7s
 => [prediction_api internal] load .dockerignore                                                                           0.0s
 => => transferring context: 2B                                                                                            0.0s
 => [prediction_api  1/10] FROM docker.io/library/python:3.9-slim@sha256:6250eb7983c08b3cf5a7db9309f8630d3ca03dd152158fa3  0.0s
 => [prediction_api internal] load build context                                                                           0.0s
 => => transferring context: 6.30kB                                                                                        0.0s
 => CACHED [prediction_api  2/10] WORKDIR /app                                                                             0.0s
 => CACHED [prediction_api  3/10] COPY requirements_inference.txt /app/                                                    0.0s
 => CACHED [prediction_api  4/10] RUN pip install --no-cache-dir --upgrade pip &&     pip install --no-cache-dir -r requi  0.0s
 => [prediction_api  5/10] COPY . /app/                                                                                    0.5s
 => [prediction_api  6/10] RUN rm -rf /app/models  # Ensure the models directory is excluded                               0.3s
 => [prediction_api  7/10] RUN dvc init --no-scm                                                                           1.4s
 => [prediction_api  8/10] RUN dvc remote add -d mylocalremote /mnt/dvc_remote                                             1.1s 
 => [prediction_api  9/10] RUN cat .dvc/config                                                                             0.4s 
 => ERROR [prediction_api 10/10] RUN dvc pull models.dvc                                                                   1.5s 
------                                                                                                                          
 > [prediction_api 10/10] RUN dvc pull models.dvc:                                                                              
1.277 WARNING: Some of the cache files do not exist neither locally nor on remote. Missing cache files:                         
1.277 md5: 523fafc64cf9d92f9d7e2bd21dad1406.dir                                                                                 
1.302 Everything is up to date.                                                                                                 
1.302 ERROR: failed to pull data from the cloud - Checkout failed for following targets:                                        
1.302 models
1.302 Is your cache up to date?
1.302 <https://error.dvc.org/missing-files>
------
failed to solve: process "/bin/sh -c dvc pull models.dvc" did not complete successfully: exit code: 1

I don’t know how to interpret this warning:

1.277 WARNING: Some of the cache files do not exist neither locally nor on remote. Missing cache files:                         
1.277 md5: 523fafc64cf9d92f9d7e2bd21dad1406.dir

The only thing I suspect now is that having the .dvc folder inside the mlops_basics_modif folder in the Docker image might be causing the problem.

Can someone help me here, please? I am stuck on this.

I would really appreciate some help here.

I commented out the RUN dvc pull models.dvc line in the Dockerfile and ran the container. Then I entered the container with docker exec -it inference_container /bin/bash and ran ls /mnt/dvc_remote/, and yes, the remote is there!

So there is no valid reason why the pull should not work, right?

What is more, if I am inside the container and manually run

root@1a2dd68db6b8:/app# dvc status
models.dvc:                                                           
	changed outs:
		not in cache:       models
root@1a2dd68db6b8:/app# dvc pull models.dvc
Collecting                                                                                                                    |3.00 [00:00, 47.0entry/s]
Fetching
Building workspace index                                                                                                      |0.00 [00:00,    ?entry/s]
Comparing indexes                                                                                                             |4.00 [00:00,  171entry/s]
Applying changes                                                                                                              |2.00 [00:00,  45.4file/s]
A       models/
1 file added and 3 files fetched

So, I can manually pull the files!

Then why does it not work when building the container?

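Maybe it’s because your /mnt/dvc_remote is mounted only at run time, not at build time. The volumes listed in docker-compose.yml are attached when the container starts, so the RUN steps of the Dockerfile never see them, and dvc pull finds nothing at /mnt/dvc_remote during the build.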
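One way to test this guess is to check, from the environment where the pull runs, whether the object named in the warning is actually reachable. The sketch below is only an illustration: the function names are made up, and it assumes the DVC 3.x layout for a local remote, where an object with hash abcdef is stored under files/md5/ab/cdef.

```shell
#!/bin/sh
# Hypothetical helpers, not part of DVC.

# Print the path where a DVC 3.x local remote is assumed to keep an object:
# <remote>/files/md5/<first two hash characters>/<remaining characters>
dvc_object_path() {
    remote_dir="$1"
    hash="$2"
    prefix=$(printf '%s' "$hash" | cut -c1-2)
    rest=$(printf '%s' "$hash" | cut -c3-)
    printf '%s/files/md5/%s/%s\n' "$remote_dir" "$prefix" "$rest"
}

# Report whether that object is visible from the current environment.
# Returns 0 if the file exists, 1 otherwise.
check_remote_object() {
    path=$(dvc_object_path "$1" "$2")
    if [ -f "$path" ]; then
        echo "found: $path"
    else
        echo "missing: $path"
        return 1
    fi
}
```

Calling check_remote_object /mnt/dvc_remote 523fafc64cf9d92f9d7e2bd21dad1406.dir from a RUN step and again from a shell inside the running container should show whether the remote is visible in both places. If the guess is right, the object would be reported missing only during the build.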

I use a webdav remote, which is accessible during the build.
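If the remote has to stay a folder on the host, one possible workaround is to move the pull from build time to container start, when the volume from docker-compose.yml is already mounted. This is a minimal sketch, assuming a hypothetical entrypoint.sh that the image runs instead of calling uvicorn directly; the function names are invented for illustration.

```shell
#!/bin/sh
# entrypoint.sh (hypothetical name): pull the DVC-tracked models when the
# container starts, then hand control over to the real command.

# Return 0 if the directory exists and contains at least one entry.
# Note: an empty but mounted remote is treated as "not mounted" here.
remote_is_mounted() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Usage: pull_then_run REMOTE_DIR COMMAND [ARGS...]
# Pulls models.dvc if the remote is visible, then replaces the shell
# with COMMAND. Returns 1 without starting COMMAND if anything fails.
pull_then_run() {
    remote_dir="$1"
    shift
    if ! remote_is_mounted "$remote_dir"; then
        echo "DVC remote not visible at $remote_dir; is the volume mounted?" >&2
        return 1
    fi
    dvc pull models.dvc || return 1
    exec "$@"
}

# Typical last line of the entrypoint (left commented out in this sketch):
# pull_then_run /mnt/dvc_remote "$@"
```

With something like this in place, the RUN dvc pull models.dvc line would be dropped from the Dockerfile, and the container would start through the script, for example as pull_then_run /mnt/dvc_remote uvicorn app:app --host 0.0.0.0 --port 8000. The trade-off is that the models are fetched on every container start instead of being baked into the image.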