Check DVC status from the Python API

Hi,

I’m wondering whether there is a way to check the DVC status of a repo through any of the Python APIs. I’m trying to write a utility function that aborts a training run if any tracked data has been modified or if any data is not tracked.

Thanks

There is no official API, but you can use the internal Repo API like this:

>>> from dvc.repo import Repo
>>> repo = Repo()
>>> repo.status() # status of the pipeline
{'featurize': [{'changed outs': {'data/features': 'modified'}}], 'train': [{'changed deps': {'data/features': 'modified'}}], 'evaluate': [{'changed deps': {'data/features': 'modified'}}], 'data/data.xml.dvc': [{'changed deps': {'get-started/data.xml (https://github.com/iterative/dataset-registry)': 'update available'}}]}
>>> repo.data_status() # status of data since last git commit
{'not_in_cache': [], 'not_in_remote': [], 'committed': {'modified': ['model.pkl', 'eval/importance.png']}, 'uncommitted': {'modified': ['data/features/']}, 'untracked': [], 'unchanged': ['data/data.xml', 'data/prepared/'], 'git': {'staged': {}, 'unstaged': ['dvc.lock', 'eval/live/plots/sklearn/cm/test.json', 'eval/live/plots/sklearn/cm/train.json', 'eval/live/plots/sklearn/roc/train.json', 'eval/prc/train.json'], 'untracked': [], 'is_empty': False, 'is_dirty': True}}
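
Building on those two calls, here is a minimal sketch of the abort check asked about in the question. The helper names `find_data_problems` and `abort_if_data_dirty` are made up for illustration. The sketch assumes the dict shapes shown above, assumes `repo.status()` returns an empty dict when nothing has changed, and assumes `data_status()` accepts `untracked_files="all"` (mirroring the `dvc data status --untracked-files` CLI flag). Since this is the internal API, any of that may differ between DVC versions, so check it against the version you have installed.

```python
def find_data_problems(pipeline_status, data_status):
    """Collect human-readable problems from the two status dicts.

    Hypothetical helper: expects dicts shaped like the output of
    Repo.status() and Repo.data_status() shown above.
    """
    problems = []

    # Assumption: Repo.status() is an empty dict when every stage is up to date.
    for stage, changes in pipeline_status.items():
        problems.append(f"pipeline: {stage}: {changes}")

    # As I understand it, 'uncommitted' lists workspace data that differs from
    # the .dvc/dvc.lock files, and 'committed' lists changes recorded in those
    # files but not yet in a Git commit. Drop 'committed' if that is too strict.
    for section in ("uncommitted", "committed"):
        for change, paths in data_status.get(section, {}).items():
            for path in paths:
                problems.append(f"{section} {change}: {path}")

    for path in data_status.get("untracked", []):
        problems.append(f"untracked: {path}")

    return problems


def abort_if_data_dirty():
    """Exit the process if the DVC repo in the current directory is not clean."""
    # Imported here so the pure helper above works without DVC installed.
    from dvc.repo import Repo

    repo = Repo()
    # Assumption: untracked files are only reported when explicitly requested.
    data_status = repo.data_status(untracked_files="all")
    problems = find_data_problems(repo.status(), data_status)
    if problems:
        raise SystemExit("Aborting training run:\n" + "\n".join(problems))
```

Calling `abort_if_data_dirty()` at the top of the training script would then stop the run before any work is done. Keeping the dict inspection in a separate function makes it easy to adjust which sections count as "dirty" without touching the DVC calls.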