Using OCR to extract data from PDFs

  • Event: 2018 CAR Conference
  • Speaker: Miguel Barbosa of CitizenAudit
  • Date/Time: Saturday, Mar. 10 at 2:15pm
  • Location: Great America
  • Audio file: No audio file available.

This class will cover basic approaches for getting text out of PDF documents using powerful and freely available tools. Participants will be introduced to basic concepts and walked through common challenges encountered with tricky PDF documents.

This session is good for: People who are unfamiliar with PDF-to-text tools or would like to learn how optical character recognition (OCR) tools can be used to extract difficult text from images embedded in PDF documents.

Speaker Bios

  • Miguel Barbosa is the Co-Founder & CEO of CitizenAudit.org, a tool designed to help journalists and investigators research nonprofits. Prior to this, Miguel worked as an analyst at a hedge fund in Chicago.

Related Tipsheets

No tipsheets have yet been uploaded for this event.