Hi DVC community,
I’m trying to implement a hybrid tracking approach where DVC manages large binary files (.stl meshes) and Git manages small metadata files (.json) within the same directories. I’m running into a conflict when trying to update DVC tracking after adding new files.
Current Setup
data/
├── models/
│   ├── sample_001_a.stl    # DVC-tracked (large binary ~10-50MB)
│   ├── sample_001_a.json   # Git-tracked (small metadata ~1-2KB)
│   ├── sample_001_b.stl
│   ├── sample_001_b.json
│   └── ...
├── processed/
│   ├── sample_001_x.stl    # DVC-tracked
│   ├── sample_001_x.json   # Git-tracked
│   └── ...
└── raw/
    ├── sample_001.stl      # DVC-tracked
    └── ...
Configuration:
.dvcignore:
data/models/*.json
data/processed/*.json
data/raw/*.json
.gitignore:
/data/raw/*.stl
/data/models/*.stl
/data/processed/*.stl
Existing DVC files:
- data/models.dvc - tracks the models directory
- data/processed.dvc - tracks the processed directory
- data/raw.dvc - tracks the raw directory
Git tracking:
- All JSON files are tracked by Git (currently 80+ JSON files committed)
The Problem
Initial setup worked fine. But when I add new files (e.g., sample_017_a.stl and sample_017_a.json) and try to update DVC tracking:
$ dvc status
data\models.dvc:
changed outs:
modified: data\models
$ dvc add data/models
ERROR: output 'data\models' is already tracked by SCM (e.g. Git).
The same error occurs with:
- dvc add data/models --force
- dvc commit data/models.dvc
- dvc add data/models/sample_017_a.stl (trying to add an individual file)
What I’ve Verified
- .dvcignore is configured to exclude JSON files
- .gitignore is configured to exclude STL files
- git ls-files data/models/ only shows JSON files (not STL files)
- DVC status correctly detects the new files
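The intended split between the two tools can be illustrated with a small sketch. It is not part of the actual setup: `classify` and `matches_any` are hypothetical helpers, the patterns are copied from the .dvcignore and .gitignore above, and `PurePosixPath.match` only approximates real ignore-file semantics (it should be adequate for simple `dir/*.ext` patterns like these).

```python
from pathlib import PurePosixPath

# Patterns copied from the .dvcignore and .gitignore shown above.
DVCIGNORE = ["data/models/*.json", "data/processed/*.json", "data/raw/*.json"]
GITIGNORE = ["/data/raw/*.stl", "/data/models/*.stl", "/data/processed/*.stl"]


def matches_any(path, patterns):
    """Return True if path matches at least one pattern.

    Approximation: a leading '/' (anchor to the repo root) is stripped and
    PurePosixPath.match is used, which is not a full gitignore matcher.
    """
    p = PurePosixPath(path)
    return any(p.match(pat.lstrip("/")) for pat in patterns)


def classify(path):
    """Hypothetical helper: report which tool is expected to own a file."""
    dvc_ignored = matches_any(path, DVCIGNORE)
    git_ignored = matches_any(path, GITIGNORE)
    if dvc_ignored and not git_ignored:
        return "git"
    if git_ignored and not dvc_ignored:
        return "dvc"
    return "unclear"  # ignored by both tools, or by neither


if __name__ == "__main__":
    for f in ["data/models/sample_017_a.stl", "data/models/sample_017_a.json"]:
        print(f, "->", classify(f))
```

This only checks that the ignore patterns assign each file to exactly one tool. It says nothing about how dvc add treats a directory that also contains Git-tracked files, which is the actual question here.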
My Questions
- Is it possible to use DVC directory tracking with Git tracking some files in the same directory? Even with .dvcignore configured?
- How should I update DVC tracking when adding new files to an already-tracked directory that contains Git-tracked files?
- Is there a DVC configuration option to allow this hybrid approach?
Why am I keeping Git-tracked and DVC-tracked files in the same folder?
- STL files: Large binary meshes (10-50MB each) → Best for DVC cloud storage
- JSON files: Small text metadata with ground truth labels (1-2KB each) → Best for Git (need diffs, version history, code review)
Separating into different directories would work, but it loses the logical grouping of related files (e.g., sample_001_a.stl and sample_001_a.json belong together).
Environment
- DVC version: 3.59.2 (pip)
Any guidance would be greatly appreciated! Is this hybrid approach supported, or should I restructure to separate directories?
Thank you!