

Resource ID: #5049
Subject: 
Source: Jacob Fenton
Affiliation: 
Date: 2017


Description

So now you know how to get data from basic PDFs and large batches. What about mixed formats, ongoing jobs, etc.?

This repo covers how to process PDFs in large batches, use Amazon Mechanical Turk, schedule OCR jobs to run while you aren't around, and build a data pipeline. If you’re comfortable with the command line and Tesseract, this is for you.

https://github.com/jsfenfen/pdf17
