I tried DVC with a small data set and really liked it. It mainly helped me control the data pipeline and version the data accordingly. To compare experiments, we already use MLflow.
My team has another project with a much bigger data set, which is stored in PostgreSQL (on AWS). Can we use DVC to version our tables? For example:
Raw_Table → One_hot_conversion_table → Normalized_one_hot_conversion_table
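
To make the chain above concrete, here is a minimal Python sketch of what the three stages might look like. The column names (`color`, `price`), the sample rows, and the helpers `one_hot` and `min_max_normalize` are all hypothetical and only for illustration; in the real project each stage would be a Postgres table rather than an in-memory list.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def one_hot(rows: List[Row], column: str) -> List[Row]:
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    result = []
    for row in rows:
        # Copy every column except the categorical one.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1.0 if row[column] == category else 0.0
        result.append(new_row)
    return result


def min_max_normalize(rows: List[Row], column: str) -> List[Row]:
    """Rescale a numeric column to the range [0, 1]."""
    values = [row[column] for row in rows]
    low, high = min(values), max(values)
    span = high - low
    # If all values are equal, map them to 0.0 to avoid dividing by zero.
    return [
        {**row, column: (row[column] - low) / span if span else 0.0}
        for row in rows
    ]


# Hypothetical stand-in for Raw_Table.
raw_table = [
    {"color": "red", "price": 10.0},
    {"color": "blue", "price": 20.0},
    {"color": "red", "price": 30.0},
]

# Raw_Table -> One_hot_conversion_table -> Normalized_one_hot_conversion_table
one_hot_table = one_hot(raw_table, "color")
normalized_table = min_max_normalize(one_hot_table, "price")
```

Each derived table depends only on the one before it, and that dependency chain is what I would like to version.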