Managing DVC for Files Uploaded Directly to Remote Storage via Web Interface


I’m developing a dataset storage service that uses DVC to version datasets and track their change history, similar to Hugging Face. The goal is to let users upload files through a web interface without running DVC commands themselves, while the service manages the datasets with DVC internally.

Here’s the upload process I’m considering:

  1. The server provides a signed URL through an API, which allows direct file uploads to remote storage that exposes an S3 interface.
  2. Users upload files directly to the remote storage using the provided URL.
  3. Once the upload is complete, the server is notified.
  4. The server then manages the file uploaded to the remote storage using DVC.
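
To make the flow above concrete, here is a rough sketch of the server-side bookkeeping I have in mind. Every name in it is hypothetical: `sign_url` stands in for whatever S3 client call produces the signed URL, the `incoming/` key prefix is just an assumed staging layout, and `track_with_dvc` is a placeholder for step 4, which is the part I don't know how to implement.

```python
import secrets
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class PendingUpload:
    object_key: str          # where the file will land in remote storage
    completed: bool = False  # set once the server has been notified (step 3)


class UploadCoordinator:
    """Tracks an upload from URL issuance (step 1) to the DVC hand-off (step 4)."""

    def __init__(
        self,
        sign_url: Callable[[str, int], str],    # hypothetical: wraps the S3 client
        track_with_dvc: Callable[[str], None],  # hypothetical: step 4 placeholder
    ) -> None:
        self._sign_url = sign_url
        self._track_with_dvc = track_with_dvc
        self._pending: Dict[str, PendingUpload] = {}

    def issue_upload_url(
        self, dataset: str, filename: str, ttl_seconds: int = 900
    ) -> Tuple[str, str]:
        """Step 1: reserve an object key and return (upload_id, signed_url)."""
        upload_id = secrets.token_hex(8)
        # Assumed staging layout; any unique key scheme would do.
        object_key = f"incoming/{dataset}/{upload_id}/{filename}"
        self._pending[upload_id] = PendingUpload(object_key)
        return upload_id, self._sign_url(object_key, ttl_seconds)

    def notify_complete(self, upload_id: str) -> str:
        """Step 3: called once the client reports that the upload finished."""
        upload = self._pending[upload_id]  # raises KeyError for unknown ids
        if upload.completed:
            raise ValueError("upload already finalized")
        upload.completed = True
        # Step 4: hand the uploaded object over to whatever does the DVC work.
        self._track_with_dvc(upload.object_key)
        return upload.object_key


# Stand-ins so the sketch runs without any storage backend or DVC installed.
def fake_sign_url(object_key: str, ttl_seconds: int) -> str:
    return f"https://storage.example.invalid/{object_key}?expires_in={ttl_seconds}"


tracked = []  # records which object keys reached the step 4 placeholder
coordinator = UploadCoordinator(fake_sign_url, tracked.append)
upload_id, upload_url = coordinator.issue_upload_url("demo-dataset", "train.csv")
# Step 2 happens on the client: it sends the file to upload_url directly.
object_key = coordinator.notify_complete(upload_id)
```

In a real service the pending uploads would live in a database rather than in memory, and the notification in step 3 would need to be verified against the storage backend before anything is handed to DVC.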

Given this approach, where users upload data without running dvc push, I’m looking for recommendations on how the server can bring the uploaded files under DVC management (step 4 above).

Thank you in advance for your advice and suggestions!