I have a reinforcement learning use case with the following initial steps:
- Construct initial episodes
- Add episodes to initial experience replay buffer
- Train initial model
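For concreteness, here is roughly how I picture the one-time initial pipeline as a `dvc.yaml` (all stage, script, and file names here are made up):

```yaml
# Sketch of the one-time initial pipeline — names are hypothetical
stages:
  construct_episodes:
    cmd: python construct_episodes.py
    outs:
      - data/episodes
  build_replay_buffer:
    cmd: python build_buffer.py data/episodes
    deps:
      - data/episodes
    outs:
      - data/replay_buffer
  train_initial_model:
    cmd: python train.py data/replay_buffer
    deps:
      - data/replay_buffer
    outs:
      - models/model.pt
```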
Then the following steps which repeat every day:
- Construct most recent time steps
- Add new time steps to existing experience replay buffer
- Fine-tune existing model
- Compare the existing model with the fine-tuned model and keep whichever performs better
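A naive `dvc.yaml` for the daily steps would look something like the sketch below (again, all names are made up), and I think it shows the problem: the buffer and the model would each need to be both a dependency and an output.

```yaml
# Naive daily pipeline — illustrates the cycle I'm worried about
stages:
  append_timesteps:
    cmd: python append_timesteps.py data/new_timesteps data/replay_buffer
    deps:
      - data/new_timesteps
    outs:
      - data/replay_buffer   # but it's also an input — circular
  finetune:
    cmd: python finetune.py models/model.pt data/replay_buffer
    deps:
      - models/model.pt      # depends on the model it would update
      - data/replay_buffer
    outs:
      - models/model_finetuned.pt
  select_best:
    cmd: python select_best.py models/model.pt models/model_finetuned.pt
    deps:
      - models/model.pt
      - models/model_finetuned.pt
    outs:
      - models/model.pt      # again circular — overwrites a dependency
```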
What would be the best way to set this up with DVC? Is it even possible, given that DVC does not allow circular dependencies?