Is it possible to use DVC without knowing how it works internally?

dashohoxha · August 12, 2019, 10:31am

I am wondering: when we do a dvc init, dvc add, etc., why don’t we also do an automatic git add and git commit (with a suitable message)?

If there are local changes in the repo, it is possible to do a git stash before and a git stash pop after these steps, so that the automatic commit does not interfere with the other changes.
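The idea above can be sketched as a small shell wrapper. This is only an illustration, not something DVC provides: the function names `run` and `dvc_add_autocommit`, the `DRY_RUN` variable, and the commit message are all hypothetical. It also assumes that `dvc add <file>` writes `<file>.dvc` next to the file and updates the `.gitignore` in the same directory.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of DVC: wraps `dvc add` in stash / commit / pop.

# Run a command, or only print it when DRY_RUN is set (handy for previewing).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

dvc_add_autocommit() {
    target=$1
    # Assumption: `dvc add` creates <target>.dvc and edits the sibling .gitignore.
    dvc_file="$target.dvc"
    ignore_file="$(dirname "$target")/.gitignore"

    # Stash only when tracked files differ from HEAD. Otherwise `git stash`
    # creates nothing and a later `git stash pop` would restore some
    # unrelated, older stash.
    stashed=no
    if [ -n "${DRY_RUN:-}" ] || ! git diff --quiet HEAD 2>/dev/null; then
        run git stash && stashed=yes
    fi

    # Track the data file, then commit only the files DVC touched.
    status=0
    { run dvc add "$target" &&
      run git add "$dvc_file" "$ignore_file" &&
      run git commit -m "Track $target with DVC"; } || status=$?

    # Restore the user's local changes even if a step above failed.
    if [ "$stashed" = yes ]; then
        run git stash pop
    fi
    return "$status"
}
```

Calling `DRY_RUN=1 dvc_add_autocommit data/Posts.xml` prints the five commands instead of running them. Even this short sketch needs a special case for the stash, and it does not handle untracked files, a wrong branch, or a conflicting `git stash pop`.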

This question is also related somehow to another question.
I see that the internals of how DVC works are explained all over the docs. I just wonder, is it possible to use DVC without knowing how it works internally, and without having to explain it to the users, similar to git?

Of course, knowing the internals has its own advantages (in understanding better the commands and their options, making better decisions, etc.) but I think that it should not be a must for using DVC.

Note: Sorry if I am raising questions that have been discussed and resolved before, but it is much easier to ask than to search all the docs and previous discussions.

Paffciu · August 12, 2019, 11:35am

At some point we researched whether it is possible to introduce more Git hooks. Regrettably, the capabilities of Git hooks are not enough for our use case. That’s why there is an idea to introduce a feature that would automatically handle Git calls. Here is some context. I think we should create an issue for an autoscm feature.

shcheklein · August 12, 2019, 3:51pm

@dashohoxha good questions!! no need to be sorry at all.

why don’t we do an automatic git add and git commit as well (with a suitable message)?

People have a lot of different opinions on when and how commits should be created. It’s not only about stash; it’s about being on the right branch, being up to date with the upstream, etc. Any edge case that goes wrong would require significant effort to recover from. So, I believe it’s just to keep it simple and less opinionated.

knowing the internals

It’s totally possible to do this; that’s why we use expandable sections a lot. Still, it’s very beneficial to learn how DVC-files are organized, for example. As with Git, it gives you another level of ability to use the tool, understand its options, etc.

dashohoxha · August 12, 2019, 4:55pm

If we tell the user: “Now do git add Posts.xml.dvc data/.gitignore”, then we also have to explain to them why, how DVC manages data files internally, etc. If we do it automatically, followed by an auto-commit, then the user probably does not have to be concerned with the internal details (which may also change from time to time, if we realize that there are better or more efficient ways to implement things). In this case the user can focus only on the high-level steps of DS/ML processing. This is a kind of encapsulation.

I trust that you are right about this, but I still cannot see how it can go wrong. Maybe that is because I am not well-versed in DS/ML processes and workflows (I would need to see some concrete cases/examples where this can fail hopelessly, without being able to prevent or repair it).

What if there is a configuration option like auto-commit=True, which users can disable if they wish? In this case it would auto-commit by default, unless there is a complex situation where the user does not want this to happen.

jorgeorpinel · August 12, 2019, 6:48pm

@dashohoxha have you noticed the dvc install command? It partially does what you suggest.
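To see what dvc install changed in a repository, one can inspect the Git hooks directory. The sketch below rests on an assumption: that dvc install writes post-checkout, pre-commit and pre-push hooks which call dvc, as I understand the DVC documentation of the time. The function name `list_dvc_hooks` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: report which Git hooks in a directory call dvc.
# Assumption: `dvc install` writes the three hooks listed in the loop below.
list_dvc_hooks() {
    hooks_dir=${1:-.git/hooks}
    for hook in post-checkout pre-commit pre-push; do
        if [ -f "$hooks_dir/$hook" ] && grep -q dvc "$hooks_dir/$hook"; then
            echo "$hook: calls dvc"
        else
            echo "$hook: not found"
        fi
    done
}
```

Run `list_dvc_hooks` from the repository root after dvc install. Note that these hooks work in the opposite direction from the proposal in this thread: Git actions trigger dvc commands, while dvc commands still do not create commits.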
