Identification and removal of extraneous graphics in a commercial OCR operation

Ray R. Hashemi, Charlie Epperson, Steve Jones, Lei Jin, John Talburt

Research output: Contribution to book or proceeding › Conference article › peer-review

1 Scopus citation

Abstract

The major issue in applying OCR to a document composed of a mixture of text and graphics (i.e., a mixed document) is the presence of the graphics themselves. In this research effort we propose two algorithms for identifying and removing two special types of graphics, namely company logos and graphic displays with broken boundaries. A prototype was built and its performance evaluated on a test set of 198 scanned images of mixed documents. The prototype removed 100% of the two types of graphics from the images.

Original language: English
Title of host publication: Multimedia, Image Processing and Soft Computing
Subtitle of host publication: Trends, Principles and Applications - Proceedings of the 5th Biannual World Automation Congress, WAC 2002, ISSCI 2002 and IFMIP 2002
Pages: 389-394
Number of pages: 6
State: Published - 2002
Event: 4th International Symposium on Soft Computing for Industry, ISSCI 2002 and the 3rd International Forum on Multimedia and Image Processing, IFMIP 2002, Held within the World Automation Congress, WAC 2002 - Orlando, FL, United States
Duration: Jun 9 2002 to Jun 13 2002

Publication series

Name: Multimedia, Image Processing and Soft Computing: Trends, Principles and Applications - Proceedings of the 5th Biannual World Automation Congress, WAC 2002, ISSCI 2002 and IFMIP 2002
Volume: 13

Conference

Conference: 4th International Symposium on Soft Computing for Industry, ISSCI 2002 and the 3rd International Forum on Multimedia and Image Processing, IFMIP 2002, Held within the World Automation Congress, WAC 2002
Country/Territory: United States
City: Orlando, FL
Period: 06/9/02 to 06/13/02

Scopus Subject Areas

  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Software

Keywords

  • Document analysis
  • Image enhancement
  • OCR
  • Pattern recognition
  • Text mining
