So now you know how to get data from basic PDFs and large batches. What about mixed formats, ongoing jobs, and so on?
This repo covers how to process PDFs in large batches, use Amazon Mechanical Turk, schedule OCR jobs to run while you aren't around, and build a data pipeline. If you’re comfortable with the command line and Tesseract, this is for you.
https://github.com/jsfenfen/pdf17