How to remove cache for specific targets/imports?

Hi, I am working with several quite large datasets, DVC pipelines, and a few big outputs that they produce.

Let’s say that I have the following pipeline (pseudo-graph):

        DVC imports                DVC pipeline             Pipeline outs

     ┌────────────────┐        ┌────────────────┐         ┌───────────────┐
     │                │        │                │         │               │
     │                │        │   Stage  AB    │         │   Output      │
     │    Import      │        │                │         │               │
     │      A         ├───────►│                ├────────►│     AB        │
     │                │        │                │         │               │
     │                │        │                │         │               │
     └────────────────┘        └────────────────┘         └───────────────┘
                                       ▲
     ┌────────────────┐                │
     │                │                │
     │                │                │
     │    Import      │                │
     │      B         ├────────────────┘
     │                │
     │                │
     └────────────────┘

     ┌────────────────┐        ┌────────────────┐         ┌────────────────┐
     │                │        │                │         │                │
     │                │        │                │         │    Output      │
     │    Import      ├───────►│    Stage C     ├────────►│                │
     │      C         │        │                │         │      C         │
     │                │        │                │         │                │
     │                │        │                │         │                │
     └────────────────┘        └────────────────┘         └────────────────┘

I download the cache for the imports A, B and C (dvc pull A.dvc B.dvc C.dvc) and start working on the part of the DVC pipeline that reproduces the C outputs. After finishing my work, I run dvc repro, commit everything and push the DVC cache to the remote. Now I want to focus on the part of the pipeline concerned with AB. I no longer need import C or the outputs of Stage C, but they still take up disk space in my local cache. Can I somehow remove C and its output from the local cache while keeping A and B? Thanks for any help!
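For reference, my workflow so far can be sketched roughly as the commands below. The function name and the commit message are placeholders made up for illustration, and I am assuming the imports are tracked as A.dvc, B.dvc and C.dvc in the repository root. None of these steps frees any space in the local cache, which is exactly my problem.

```shell
# Sketch of the workflow described in this post (hypothetical wrapper function).
# Assumption: the imports are tracked by A.dvc, B.dvc and C.dvc in the repo root.
work_on_stage_c() {
    # Fetch the data for all three imports into the local cache and workspace.
    dvc pull A.dvc B.dvc C.dvc

    # ... edit the code behind Stage C here ...

    # Re-run the stages whose dependencies changed.
    dvc repro

    # Record the result in Git (the commit message is a placeholder).
    git add -A
    git commit -m "Update stage C"

    # Upload the newly cached outputs to the DVC remote.
    dvc push
}
```

After this, the data for import C and Output C is stored both on the remote and in my local cache, and only the local copy is what I would like to get rid of.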
