Batch run rules

Hello! I read the tutorial and have a question about what DVC can do. With make, one can specify a rule in a Makefile that is applied to all files whose names match some template. Is this possible in DVC? I’d like to batch-process a large number of files once and then re-process only new or updated files.

Hi @hombit !

Unfortunately, DVC has no direct equivalent of this feature. Could you elaborate on your scenario, please?

Thank you,
Ruslan

My problem is processing a lot of files, where each file is processed independently. If the process crashes, or data is updated or added, I’d like to reproduce only the missing data products, not everything. A Makefile makes such tasks easy, because all you need is a single rule for some template, e.g. dir/*.dat. In DVC I don’t know how to do it without N identical “runs”, which makes dvc.json non-human-readable and can cause errors if a developer forgets to add new runs when data is added.
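To make the request concrete, here is a rough sketch in Python of the make-like behaviour described above: one rule applied to every file matching a template, re-running only where the output is missing or older than its input. It is not a DVC feature or API. The names `stale_inputs`, `process_all`, the `.out` suffix, and the `process` callback are all hypothetical, and comparing modification times is an assumption borrowed from how make decides what is out of date.

```python
from pathlib import Path


def stale_inputs(src_dir, out_dir, pattern="*.dat", out_suffix=".out"):
    """Return (input, output) pairs whose output is missing or older than the input."""
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    pairs = []
    for src in sorted(src_dir.glob(pattern)):
        # Hypothetical naming rule: dir/name.dat -> out_dir/name.out
        dst = out_dir / (src.stem + out_suffix)
        if not dst.exists() or dst.stat().st_mtime < src.stat().st_mtime:
            pairs.append((src, dst))
    return pairs


def process_all(src_dir, out_dir, process, pattern="*.dat", out_suffix=".out"):
    """Call process(src, dst) for stale inputs only; return the rebuilt output names."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    rebuilt = []
    for src, dst in stale_inputs(src_dir, out_dir, pattern, out_suffix):
        process(src, dst)  # user-supplied step that must write dst
        rebuilt.append(dst.name)
    return rebuilt
```

Calling `process_all("dir", "out", my_step)` a second time with no changes does nothing, and a newly added `dir/new.dat` is the only file processed on the next call. This is the behaviour I would like to get from a single DVC rule instead of N separate runs.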

It seems at least partially related to this issue: https://github.com/iterative/dvc/issues/331. But I also don’t quite understand the full use case, specifically the part with name templates. Is it separate from the incremental updates?

This issue looks very relevant to me, thank you!

Thanks!! It would be great if you could chime in on the ticket! It’ll help us prioritize this.