I have a reinforcement learning use case with the following initial steps:
- Construct initial episodes
- Add episodes to initial experience replay buffer
- Train initial model
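For concreteness, here is roughly how I picture the one-time initial pipeline as a `dvc.yaml` (all stage, script, and file names here are made up):

```yaml
# Sketch of the one-time initial pipeline — names are hypothetical
stages:
  construct_episodes:
    cmd: python construct_episodes.py
    outs:
      - data/episodes
  build_replay_buffer:
    cmd: python build_buffer.py data/episodes
    deps:
      - data/episodes
    outs:
      - data/replay_buffer
  train_initial_model:
    cmd: python train.py data/replay_buffer
    deps:
      - data/replay_buffer
    outs:
      - models/model.pt
```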
Then the following steps which repeat every day:
- Construct most recent time steps
- Add new time steps to existing experience replay buffer
- Fine-tune existing model
- Compare the existing model with the fine-tuned model and keep whichever performs better
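A naive `dvc.yaml` for the daily steps would look something like the sketch below (again, all names are made up), and I think it shows the problem: the buffer and the model would each need to be both a dependency and an output.

```yaml
# Naive daily pipeline — illustrates the cycle I'm worried about
stages:
  append_timesteps:
    cmd: python append_timesteps.py data/new_timesteps data/replay_buffer
    deps:
      - data/new_timesteps
    outs:
      - data/replay_buffer   # but it's also an input — circular
  finetune:
    cmd: python finetune.py models/model.pt data/replay_buffer
    deps:
      - models/model.pt      # depends on the model it would update
      - data/replay_buffer
    outs:
      - models/model_finetuned.pt
  select_best:
    cmd: python select_best.py models/model.pt models/model_finetuned.pt
    deps:
      - models/model.pt
      - models/model_finetuned.pt
    outs:
      - models/model.pt      # again circular — overwrites a dependency
```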
What would be the best way to set this up with DVC? Is it even possible, given that DVC does not allow circular dependencies?