Output '' is already tracked by SCM (e.g. Git)

I’m trying to add a .csv file to an S3 bucket.

poetry run dvc add <data file>
gives the following error:
Adding…
ERROR: output ‘filename’ is already tracked by SCM (e.g. Git)

The file doesn’t get added to the bucket.


Hi @saumya31,

This error means that <data file> is already tracked by Git, so unfortunately it can’t also be tracked by DVC. If you want to move it from Git to DVC tracking, first run git rm --cached <data file>, then run dvc add <data file> again.
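A minimal Python sketch of the "move from Git to DVC" step, assuming git is on your PATH. The helper names is_tracked_by_git and untrack_from_git are hypothetical; they only wrap git ls-files --error-unmatch (is the path in Git's index?) and git rm --cached (remove it from the index but leave the file on disk).

```python
import subprocess
from pathlib import Path


def is_tracked_by_git(repo: Path, path: str) -> bool:
    """Return True if `path` is in Git's index, i.e. currently tracked by Git."""
    result = subprocess.run(
        ["git", "ls-files", "--error-unmatch", "--", path],
        cwd=repo,
        capture_output=True,
    )
    return result.returncode == 0


def untrack_from_git(repo: Path, path: str) -> bool:
    """Stop tracking `path` with Git but keep it on disk.

    Returns True if the path was tracked and has been removed from the index,
    or False if Git was not tracking it in the first place.
    """
    if not is_tracked_by_git(repo, path):
        return False
    # --cached only drops the index entry; the working copy is left alone.
    # -r lets the same call handle a directory output as well as a single file.
    subprocess.run(
        ["git", "rm", "--cached", "-r", "--quiet", "--", path],
        cwd=repo,
        check=True,
    )
    return True
```

For example, untrack_from_git(Path("."), "data.csv") (data.csv is a placeholder name) followed by a git commit leaves the file on disk and ready for dvc add.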

However, I’m not sure I understand your goal. What do you mean by “add it to an S3 bucket”? DVC can track the file for you, and if you set up an S3 data remote and then run dvc push, its contents will be stored there. Note, though, that you won’t find <data file> in the bucket under the same file name: DVC stores it in a special content-addressed structure, named after a hash of the file’s contents (you can see the same structure in .dvc/cache in your project without pushing anything to a remote).
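To illustrate what that "special structure" means, here is a simplified Python sketch of content-addressed storage, the general idea behind DVC's cache: a file is stored under a name derived from the MD5 hash of its contents, split into a two-character directory plus the remaining hex digits. This is not DVC's code, only an illustration; the exact layout differs between DVC versions (for example, DVC 3.x nests objects under files/md5/), and add_to_cache and cache_dir are hypothetical names.

```python
import hashlib
import shutil
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files don't have to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_to_cache(path: Path, cache_dir: Path) -> dict:
    """Copy `path` into a content-addressed cache and return a small pointer.

    The pointer (hash, size, original name) is the kind of metadata that
    belongs in a Git-tracked file, while the data lives in the cache/remote.
    """
    md5 = file_md5(path)
    # The object is named after its hash: cache_dir/<first 2 hex>/<rest>.
    target = cache_dir / md5[:2] / md5[2:]
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is only stored once
        shutil.copyfile(path, target)
    return {"md5": md5, "size": path.stat().st_size, "path": path.name}
```

Because the object name is the hash, a bucket laid out this way holds hash-named objects rather than a file called data.csv; the small pointer is what maps the original name back to that hash.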

I am relatively new to DVC/Git. If DVC works alongside Git, why can’t DVC track a file that is already tracked by Git?

I keep ending up in a loop: every time I run dvc repro <insert stage>, I have to git rmv and then git commit the dependencies. How can I stop having to run git rmv?

@Tom_K can you clarify your question? Are you getting an actual error message when you run dvc repro? git rmv is not a Git command, so it may depend on what you actually have git rmv aliased to.

If dvc repro changes a file that is tracked by DVC (i.e. the stage output changed), you will have to git add and git commit the resulting changes to the dvc.lock file. That’s how data tracking works in DVC: the data file itself is stored by DVC (in .dvc/cache and in DVC remote storage), while the dvc.lock or .dvc files are tracked by Git.
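A small Python sketch of that bookkeeping step, assuming git is on your PATH: after dvc repro or dvc add, list the changed or new files that look like DVC metadata, so you know what to git add. The helper name dvc_files_to_stage and the set of filenames it looks for (dvc.yaml, dvc.lock, *.dvc, .gitignore) are assumptions based on the files discussed in this thread.

```python
import subprocess
from pathlib import Path, PurePosixPath

# Metadata files DVC typically creates or updates and that belong in Git
# (an assumed list, not taken from DVC itself); *.dvc is matched separately.
DVC_METADATA_NAMES = {"dvc.yaml", "dvc.lock", ".gitignore"}


def dvc_files_to_stage(repo: Path) -> list[str]:
    """Return modified or untracked DVC metadata files, relative to the repo root."""
    out = subprocess.run(
        ["git", "status", "--porcelain", "--untracked-files=all"],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    paths = []
    for line in out.splitlines():
        # Porcelain v1 lines look like "XY path" (or "XY old -> new" for renames).
        # Assumes plain file names that Git does not need to quote.
        path = line[3:].split(" -> ")[-1]
        name = PurePosixPath(path).name
        if name in DVC_METADATA_NAMES or name.endswith(".dvc"):
            paths.append(path)
    return sorted(paths)
```

The result can be passed straight to git add; ordinary source files are left for you to stage deliberately.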

If you want a pipeline output to be tracked only by Git (and not as DVC data), mark it as cache: false in dvc.yaml.


To automate the git add step, enable DVC’s core.autostage config option (dvc config core.autostage true). DVC will then run git add automatically when dvc repro modifies dvc.lock, but you will still have to run git commit yourself.

@pmrowla I think you are right that I just needed to run git add dvc.lock; the issue hasn’t come back since.
However, since I stopped tracking some files in Git, I can’t seem to use git revert to undo that. Not a huge deal; this is a side project, so I can always start over.

OK, so I thought I had sorted it out, but I ran into the same error again: ERROR: failed to reproduce stage <file> is already tracked by SCM (e.g. Git). This happened when I ran dvc repro scrap_listings a second time after modifying some code.

I also found that whenever I run a script that writes an output to intermediates/, those files are automatically added to .gitignore. Since these are dependencies, I’d like to track them. intermediates/.gitignore was created automatically when I ran dvc run -n -d <file> -o <file> python ...

It looks like intermediates/saved_hyperlinks.txt is still tracked by Git. You will have to git rm --cached (and git commit) that file if you want to track it with DVC instead.
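One detail explains the loop here: an entry in a .gitignore (such as the one DVC writes to intermediates/.gitignore) does not stop Git from tracking a file that is already in its index. Below is a minimal Python sketch for checking which state a path is in, assuming git is on your PATH; git_path_state is a hypothetical helper built on git ls-files --error-unmatch and git check-ignore -q.

```python
import subprocess
from pathlib import Path


def git_path_state(repo: Path, path: str) -> str:
    """Classify `path` as "tracked", "ignored" or "untracked" from Git's point of view."""
    tracked = subprocess.run(
        ["git", "ls-files", "--error-unmatch", "--", path],
        cwd=repo,
        capture_output=True,
    ).returncode == 0
    if tracked:
        # Ignore rules never apply to files already in the index, so a tracked
        # file stays tracked even if DVC has listed it in a .gitignore.
        return "tracked"
    # check-ignore exits with 0 when the path matches an ignore rule.
    ignored = subprocess.run(
        ["git", "check-ignore", "-q", path],
        cwd=repo,
        capture_output=True,
    ).returncode == 0
    return "ignored" if ignored else "untracked"
```

"tracked" is the state that triggers DVC’s "already tracked by SCM" error; "ignored" is what you would expect for a DVC-cached stage output.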

Also note that stage outputs don’t have to be tracked as DVC data. If you prefer, you can keep tracking an output with Git; in that case, mark it as cache: false (with dvc run, use -O instead of -o for that output).