

Resource ID: #5049
Source: Jacob Fenton
Date: 1905-07-09



So now you know how to get data out of basic PDFs and large batches. What about mixed formats, ongoing jobs and so on?

This repo covers how to process PDFs in large batches, use Amazon Mechanical Turk, schedule OCR jobs to run while you aren't around, and build a data pipeline. If you're comfortable with the command line and Tesseract, this is for you.
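
To give a feel for what an unattended batch OCR job involves, here is a minimal sketch in Python. It is not taken from the repo. It assumes the `pdftoppm` (Poppler) and `tesseract` command-line tools are installed, and the function names and the idea of using one `.txt` file per PDF as the "already done" marker are illustrative choices only.

```python
import subprocess
import tempfile
from pathlib import Path


def pending_pdfs(in_dir, out_dir):
    """Return PDFs in in_dir that have no matching .txt in out_dir yet.

    Skipping finished files lets a scheduled job be re-run safely.
    """
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    return sorted(
        pdf for pdf in in_dir.glob("*.pdf")
        if not (out_dir / (pdf.stem + ".txt")).exists()
    )


def ocr_pdf(pdf, out_dir, dpi=300):
    """Rasterize one PDF with pdftoppm, OCR each page with tesseract,
    and write the combined text to out_dir/<pdf name>.txt."""
    pdf, out_dir = Path(pdf), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        # pdftoppm writes one PNG per page: page-1.png, page-2.png, ...
        subprocess.run(
            ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)],
            check=True,
        )
        pages = []
        for image in sorted(Path(tmp).glob("page-*.png")):
            # Passing "stdout" as the output base makes tesseract print the text.
            result = subprocess.run(
                ["tesseract", str(image), "stdout"],
                check=True, capture_output=True, text=True,
            )
            pages.append(result.stdout)
    (out_dir / (pdf.stem + ".txt")).write_text("\n".join(pages), encoding="utf-8")
```

A scheduled run (from cron, for example) would then loop over `pending_pdfs(...)` and call `ocr_pdf(...)` on each result, so files added between runs are picked up and finished ones are left alone.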

109 Lee Hills Hall, Missouri School of Journalism | 221 S. Eighth St., Columbia, MO 65201 | 573-882-2042