Hello,
I’m developing a dataset storage service that uses DVC to version datasets and track their change history, similar to Hugging Face. The goal is to let users upload files through a web interface without running DVC commands themselves, while the service manages the datasets with DVC internally.
Here’s the upload process I’m considering:
- The API provides a signed URL that allows direct file uploads to remote storage with an S3 interface.
- Users upload files directly to the remote storage using the provided URL.
- Once the file upload is complete, the server is notified that the upload has finished.
- The server then manages the file uploaded to the remote storage using DVC.
Given this approach, where users upload data without using `dvc push`, I’m seeking recommendations on how the server can manage DVC for the uploaded files (the last step above).
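To make that last step more concrete, here is a rough Python sketch of one possible shape for it, not a confirmed DVC workflow. It assumes the server can stream the uploaded object back, that the DVC remote uses the content-addressed `files/md5/<2 hex chars>/<rest>` layout of DVC 3.x, and that a minimal single-file `.dvc` pointer contains the fields shown. The function names (`md5_of_stream`, `remote_object_key`, `dvc_pointer_text`) are hypothetical helpers, and the layout and file format should be checked against what `dvc add` produces in the DVC version in use.

```python
import hashlib
import io
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> tuple[str, int]:
    """Return (hex MD5, byte count) for a binary stream, reading it in chunks."""
    digest = hashlib.md5()
    size = 0
    while chunk := stream.read(chunk_size):
        digest.update(chunk)
        size += len(chunk)
    return digest.hexdigest(), size


def remote_object_key(md5_hex: str) -> str:
    """Key the object would be copied to inside the DVC remote.

    Assumption: DVC 3.x-style layout, files/md5/<first 2 hex chars>/<remaining chars>.
    """
    return f"files/md5/{md5_hex[:2]}/{md5_hex[2:]}"


def dvc_pointer_text(md5_hex: str, size: int, path: str) -> str:
    """Text of a minimal single-file .dvc pointer.

    Assumption: these fields mirror what `dvc add` writes; the path is written
    unquoted, so this only suits simple file names.
    """
    return (
        "outs:\n"
        f"- md5: {md5_hex}\n"
        f"  size: {size}\n"
        "  hash: md5\n"
        f"  path: {path}\n"
    )


if __name__ == "__main__":
    # Stand-in for the uploaded object streamed back from storage.
    upload = io.BytesIO(b"hello")
    md5_hex, size = md5_of_stream(upload)
    print(remote_object_key(md5_hex))
    print(dvc_pointer_text(md5_hex, size, "hello.txt"))
```

Under these assumptions, the server would copy the uploaded object to the computed key within the remote, write the pointer text to a `.dvc` file in the Git repository that tracks the dataset, and commit it, so that each upload becomes one Git commit in the dataset's history.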
Thank you in advance for your advice and suggestions!