Additional functions for the Python API library

Hello,

We are planning to use DVC for a media management system with several automation functions. As it is being written in Python, the DVC Python library would be perfect for us.

However, the functionality of this library is currently very limited. There are no methods, for example, to perform an “add” or “push” operation, or a “diff” check. As many ML experiments and apps are written in Python, it would be really useful for lots of developers if all the basic operations were supported by the programmatic API.

As DVC itself is written in Python, it wouldn’t be hard to extend the API, and we could even do it ourselves, but the result would be difficult to maintain, as any update to DVC could break it.

Do you plan any further work on this API?


Hi, @yeraydavid. We don’t provide a documented API for these operations yet, but we do have an internal Repo API that we plan to document and make official. Most of the high-level methods, such as add, are fairly stable, but at this point we cannot guarantee that there will be no breaking changes.

The command line is only a wrapper around the Repo API: the CLI reports an exit code, whereas the Repo methods may return internal data structures.

For example, dvc add file is the same as calling add("file") on a Repo instance, i.e. Repo().add("file"). You will have to look at our codebase to find the rest, but most CLI commands have a matching Repo method.
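A minimal sketch of that equivalence, assuming Repo is importable from dvc.repo and that its add and push methods keep their current, undocumented behavior. The helper name track_and_push is hypothetical, and the repo parameter is there only so the function can be exercised without a real DVC project:

```python
def track_and_push(path, repo=None, push=False):
    """Rough equivalent of `dvc add <path>`, optionally followed by `dvc push`."""
    if repo is None:
        # Imported lazily. The module path is an assumption about the
        # internal, undocumented API and may change between DVC versions.
        from dvc.repo import Repo

        # Assumed to locate the DVC project from the current directory.
        repo = Repo()

    # The return value is an internal data structure, not a stable contract.
    result = repo.add(path)
    if push:
        repo.push()
    return result
```

Because this goes through the internal API, pinning the DVC version in your requirements is a reasonable precaution against the breakages mentioned above.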

Another way is to use the dvc.main.main function, which is the entry point of our CLI. However, it prints to the terminal, may show progress bars, and communicates only through exit codes.

from dvc.main import main

# Equivalent to running `dvc add file`; returns the CLI exit code (0 on success).
ret = main(["add", "file"])
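If the terminal output gets in the way of automation, one option is to wrap the entry point, as in the sketch below. The helper name run_dvc and its entry_point parameter are hypothetical; the dvc.main import path is taken from the snippet above and may differ between DVC versions. The wrapper only captures text written through Python's sys.stdout, so anything sent to stderr still reaches the terminal:

```python
import contextlib
import io


def run_dvc(args, entry_point=None):
    """Run a DVC CLI command in-process; return (exit_code, captured_stdout)."""
    if entry_point is None:
        # Import path as shown in the reply above; treat it as an assumption.
        from dvc.main import main as entry_point

    buffer = io.StringIO()
    # Redirect Python-level stdout writes into the buffer for the duration of the call.
    with contextlib.redirect_stdout(buffer):
        exit_code = entry_point(list(args))
    return exit_code, buffer.getvalue()


# Usage (requires DVC and an initialized project):
# code, output = run_dvc(["add", "file"])
# if code != 0:
#     raise RuntimeError(f"dvc add failed with exit code {code}")
```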

Great!

Actually, using the Repo class directly was what we were planning to do in case the “official” public API wasn’t going to be extended in the near future, but we wanted to confirm that and get some advice first.

Thank you very much!