We use DVC for our non-ML models, since it provides clear dependency management for data. The input data is tracked on AWS S3, and the results of all stages should be uploaded back to S3.
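For context, our pipeline stages look roughly like this (the stage name, script, and paths below are placeholders, not our real setup):

```yaml
stages:
  preprocess:
    cmd: python preprocess.py    # hypothetical processing script
    deps:
      - data/raw                 # large input tracked on S3 (gigabytes)
    outs:
      - data/results             # small result files we actually care about
```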
However, when running `dvc push`, it wants to upload gigabytes of data (since our inputs are large). That looks very expensive, and in the end we only care about the resulting files, not the cache.
Is there any way to avoid uploading gigabytes of cache while still having the stage outputs tracked and uploaded?