Short Course on Optical Character Recognition
October 26, 28, and 30, 1998

Tapas Kanungo

Description of the course
Optical Character Recognition (OCR) algorithms take as input
a scanned image of a paper document and produce as output a symbolic
text document (e.g., ASCII, Word, or HTML). The text produced by OCR
algorithms can then be searched and indexed by information retrieval algorithms.
Although researchers have worked on the problem of OCR for at least
thirty years, there has been renewed interest in OCR
technology in recent years. This is partly due to:

i) the increasing need for efficient information storage and retrieval,
ii) the increasing need for cross-language information access, and
iii) the dramatic drop in scanner prices.

The purpose of this course is to teach the internals of an OCR system. Much of the time will be spent on OCR systems that are based on hidden Markov models (HMMs). The labs will let you experiment with the sub-modules of an OCR system. No programming experience is necessary for the labs. Reading material will be provided at the course site.

Tentative course outline:

