I am wondering: when we do a dvc init, dvc add, etc., why don’t we also do an automatic git add and git commit (with a suitable message)?
If there are local changes in the repo, we could run git stash before these commands and git stash pop after them, so that the automatic commit doesn't get mixed up with the user's other changes.
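A minimal sketch of the proposed flow, assuming `dvc add <target>` writes `<target>.dvc` next to the target and updates the `.gitignore` in the target's directory. The function names (`tracking_files`, `plan_auto_commit`, `has_local_changes`, `run_plan`) are hypothetical; this is not how DVC itself is implemented. It only stashes when tracked files are actually dirty, because a bare `git stash pop` after a no-op stash would pop an unrelated, older stash entry.

```python
import posixpath
import subprocess
from typing import List


def tracking_files(target: str) -> List[str]:
    """Files that `dvc add <target>` is ASSUMED to create or modify."""
    directory = posixpath.dirname(target)
    return [target + ".dvc", posixpath.join(directory, ".gitignore")]


def plan_auto_commit(target: str, message: str, dirty: bool) -> List[List[str]]:
    """Build the command sequence for a hypothetical `dvc add` + auto-commit."""
    plan: List[List[str]] = []
    if dirty:
        # Stash only when there is something to stash; otherwise the final
        # `git stash pop` would restore some older, unrelated stash entry.
        plan.append(["git", "stash", "push"])
    plan.append(["dvc", "add", target])
    plan.append(["git", "add", *tracking_files(target)])
    plan.append(["git", "commit", "-m", message])
    if dirty:
        plan.append(["git", "stash", "pop"])
    return plan


def has_local_changes() -> bool:
    """True if tracked files have uncommitted (staged or unstaged) changes."""
    out = subprocess.run(
        ["git", "status", "--porcelain", "--untracked-files=no"],
        check=True, capture_output=True, text=True,
    ).stdout
    return bool(out.strip())


def run_plan(plan: List[List[str]]) -> None:
    # check=True stops at the first failing command (e.g. a conflicting
    # `git stash pop`) and leaves the repo as-is for the user to inspect.
    for cmd in plan:
        subprocess.run(cmd, check=True)
```

Usage would look like `run_plan(plan_auto_commit("data/Posts.xml", "Track Posts.xml with DVC", has_local_changes()))`. Note that if the user already had uncommitted edits to `data/.gitignore`, the stash would take them, `dvc add` would rewrite the file, and the final `git stash pop` could conflict; this is one example of the edge cases that make a fully automatic commit harder than it looks.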
This question is also somewhat related to another one.
I see that the docs explain the internals of DVC almost everywhere. I just wonder: is it possible to use DVC without knowing how it works internally, and without having to explain that to users, similar to Git?
Of course, knowing the internals has its own advantages (understanding the commands and their options better, making better decisions, etc.), but I think it should not be a requirement for using DVC.
Note: Sorry if I am raising questions that have been discussed and resolved before, but it is much easier to ask than to search through all the docs and previous discussions.
At some point we researched whether it's possible to introduce more Git hooks. Unfortunately, the capabilities of Git hooks are not enough for our use case. That's why there is an idea to introduce a feature that would handle Git calls automatically. Here is some context. I think we should create an issue for the autoscm feature.
@dashohoxha Good questions! No need to be sorry at all.
why don’t we do an automatic git add and git commit as well (with a suitable message)?
People have a lot of different opinions on when and how commits should be created. It's not only about stashing; it's also about being on the right branch, being up to date with the upstream, etc. Any edge case can require significant effort to recover from. So I believe it's mainly about keeping things simple and less opinionated.
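To make the "right branch, up to date with upstream" concerns concrete, here is a hypothetical pre-flight check an auto-commit feature might run before touching Git. The helper names and the optional `allowed_branches` policy are assumptions for illustration only; a real tool would need many more checks (hooks, signing policies, merges in progress, etc.).

```python
import subprocess
from typing import Iterable, List, Optional


def _git(*args: str) -> subprocess.CompletedProcess:
    return subprocess.run(["git", *args], capture_output=True, text=True)


def current_branch() -> Optional[str]:
    # `git symbolic-ref -q --short HEAD` exits non-zero on a detached HEAD.
    res = _git("symbolic-ref", "-q", "--short", "HEAD")
    return res.stdout.strip() if res.returncode == 0 else None


def commits_behind_upstream() -> Optional[int]:
    # Based on the last fetch only; None if no upstream is configured.
    res = _git("rev-list", "--count", "HEAD..@{upstream}")
    return int(res.stdout.strip()) if res.returncode == 0 else None


def auto_commit_blockers(
    branch: Optional[str],
    behind: Optional[int],
    allowed_branches: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return reasons NOT to auto-commit; an empty list means 'looks safe'."""
    problems: List[str] = []
    if branch is None:
        problems.append("HEAD is detached")
    elif allowed_branches is not None and branch not in set(allowed_branches):
        problems.append(f"branch {branch!r} is not in the allowed list")
    if behind:
        problems.append(f"{behind} commit(s) behind upstream")
    return problems
```

Usage would be `auto_commit_blockers(current_branch(), commits_behind_upstream())`; separating the decision from the Git queries keeps the policy easy to test and to change, which matters given how opinionated these rules are.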
knowing the internals
It’s totally possible to do this; that’s why we use expandable sections a lot. It’s still very beneficial to learn how DVC-files are organized, for example. Similar to Git, it gives you a deeper ability to use the tool, understand its options, etc.
If we tell the user “Now do git add Posts.xml.dvc data/.gitignore”, then we also have to explain why, and how DVC manages data files internally, etc. If we do it automatically, followed by an auto-commit, then the user probably does not need to be concerned with the internal details (which may also change over time, if we find better or more efficient ways to implement things). The user can then focus only on the high-level steps of DS/ML processing. This is a kind of encapsulation.
I trust that you are right about this, but I still cannot see how it could go wrong. Maybe that's because I am not well-versed in DS/ML processes and workflows (I would need to see some concrete cases where this fails hopelessly, without any way to prevent or repair it).
What if there were a configuration option like auto-commit=True that users could disable if they wish? Then it would auto-commit by default, and users in more complex situations, where they don't want this to happen, could turn it off.
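A minimal sketch of the proposed opt-out switch, assuming an INI-style config file with a hypothetical `autocommit` key in a `[core]` section (this option name is an assumption for illustration, not an existing DVC setting). The point is the default: a missing key or section means "on", so only users who explicitly opt out get the manual behavior.

```python
import configparser


def autocommit_enabled(config_text: str) -> bool:
    """Read a HYPOTHETICAL `[core] autocommit` flag, defaulting to True."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # fallback=True covers both a missing section and a missing key,
    # which gives "enabled by default, users can disable it".
    return parser.getboolean("core", "autocommit", fallback=True)
```

When this returns False, the tool would simply skip the git add / git commit step and print the usual hint telling the user which files to stage.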