

Resource ID: #5049
Source: Jacob Fenton
Date: 2017



So now you know how to get data from basic PDFs and large batches. What about mixed formats, ongoing jobs, and so on?

This repo covers how to process PDFs in large batches, use Amazon Mechanical Turk, schedule OCR jobs to run while you aren't around, and build a data pipeline. If you're comfortable with the command line and Tesseract, this is for you.
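To give a feel for the kind of batch job described above, here is a minimal Python sketch. It is not taken from the repo. The function names, the directory layout, and the choice of Poppler's pdftoppm for rasterizing are all assumptions made for illustration. It finds PDFs that have no text output yet, so an unattended run can be repeated safely, and builds the two command lines needed to OCR each one with Tesseract.

```python
import subprocess
from pathlib import Path


def pending_pdfs(in_dir: Path, out_dir: Path) -> list[Path]:
    """Return PDFs in in_dir that do not yet have a matching .txt in out_dir."""
    return sorted(
        pdf for pdf in in_dir.glob("*.pdf")
        if not (out_dir / f"{pdf.stem}.txt").exists()
    )


def ocr_commands(pdf: Path, work_dir: Path, out_dir: Path) -> list[list[str]]:
    """Build the commands for one PDF: rasterize it, then OCR the image."""
    image_base = work_dir / pdf.stem
    return [
        # pdftoppm with -png -singlefile writes <image_base>.png.
        # Simplification: -singlefile converts only the first page.
        ["pdftoppm", "-r", "300", "-png", "-singlefile", str(pdf), str(image_base)],
        # tesseract appends ".txt" to the output base it is given.
        ["tesseract", f"{image_base}.png", str(out_dir / pdf.stem)],
    ]


def run_batch(in_dir: Path, work_dir: Path, out_dir: Path) -> None:
    """OCR every pending PDF; requires pdftoppm and tesseract on the PATH."""
    work_dir.mkdir(parents=True, exist_ok=True)
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf in pending_pdfs(in_dir, out_dir):
        for cmd in ocr_commands(pdf, work_dir, out_dir):
            subprocess.run(cmd, check=True)
```

A script that calls run_batch could be started by a scheduler such as cron. Because finished files are skipped, an interrupted or repeated run picks up only what is left.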
