Tools to handle PDFs
This class will cover basic approaches for getting text out of PDF documents using powerful, freely available tools. Participants will be introduced to basic concepts and walked through common challenges encountered with tricky PDF documents.
This class is best for: People who are unfamiliar with PDF-to-text tools, or who would like to learn how optical character recognition (OCR) tools can be used to extract difficult text from images embedded in PDF documents.
Acton Gorton is a newsroom product engineer for Tribune Publishing, based in the Chicago Tribune newsroom. He is also a PhD candidate in Informatics at the University of Illinois at Urbana-Champaign, specializing in data analytics and information visualization. His research focuses on augmented reality as a medium for data-driven news and as a tool for assisting journalists in their work. Gorton's work has appeared with the Chicago Tribune, New York Daily News, Morning Call, Midwest Center for Investigative Reporting, and CU-CitizenAccess.org.
A long, long time ago… 22 years, to be precise… Adobe created the PDF as a way to make a document look the same no matter what computer or operating system displayed it. This seemingly innocent idea has become the pervasive standard for governments and corporations to store documents. Odds are, you’ve come across useful bits of data locked inside a PDF. Or worse, your local government complied with its electronic mandate for open records requests by providing scanned images of spreadsheets embedded in a PDF. It’s enough to make you wonder if PDFs were put on this planet to make your job harder. Luckily, you’re not alone, and there are now lots of tools available for getting organized information out of a PDF. And, if need be, we can put together a workflow to crack some worst-case scenarios.