I'm wondering what the best way is to use SageMaker with DVC, in particular for running a train step that is part of a DVC pipeline. The problem I'm running into is that SageMaker creates a training job whose output ends up somewhere other than the machine that invoked it.
Before using SageMaker, our flow was to SSH into the relevant EC2 instance and run dvc repro. Now we have a command that launches the same training job from our local machine: it kicks off training on an EC2 instance of our choice and stores the output in an S3 bucket.
Is there a way to run the training job, sync the output locally, and tell DVC that this is the output of a particular step in the pipeline? If that is complicated, is there a better alternative for working with SageMaker and DVC?
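One possible shape for the "sync locally" part, sketched under assumptions: the DVC train stage's cmd runs a local script that first calls your existing launch command (assumed to block until the SageMaker job finishes and to report the job name), then downloads the model archive to a path listed under that stage's outs. Because DVC hashes a stage's outs after its cmd completes, the downloaded file would then be recorded as that step's output. The helper names (split_s3_uri, model_artifact_uri, sync_model) and the local path are hypothetical. The S3 layout assumes SageMaker's default of writing to <output_path>/<job_name>/output/model.tar.gz.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def model_artifact_uri(output_path: str, job_name: str) -> str:
    """Assumed SageMaker default location of the trained model archive."""
    return f"{output_path.rstrip('/')}/{job_name}/output/model.tar.gz"


def sync_model(output_path: str, job_name: str, local_path: str) -> Path:
    """Download the job's model archive to local_path (a DVC stage out).

    Hypothetical helper: call it after the (blocking) launch command returns.
    """
    import boto3  # imported lazily so the pure helpers above need no AWS deps

    bucket, key = split_s3_uri(model_artifact_uri(output_path, job_name))
    target = Path(local_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # boto3's S3 client downloads the object to the given filename
    boto3.client("s3").download_file(bucket, key, str(target))
    return target
```

For example, a stage script might call sync_model("s3://my-bucket/runs", job_name, "models/model.tar.gz") with models/model.tar.gz declared in the stage's outs (the bucket and path here are placeholders). The trade-off is that dvc repro now waits on the remote job, so the local machine must stay connected until training finishes.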