Manuscripts Character Recognition Using Machine Learning and Deep Learning

Mohammad Anwarul Islam, Ionut E. Iacob

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

The automatic character recognition of historic documents has recently gained more attention from scholars, owing to major improvements in computer vision, image processing, and digitization. While Neural Networks, the current state-of-the-art models for image recognition, perform very well, they typically require large amounts of training data. In our study we manually built our own relatively small dataset of 404 characters by cropping letter images from a popular historic manuscript, the Electronic Beowulf. To compensate for the small dataset, we used the ImageDataGenerator utility from the Keras Python library to augment our Beowulf manuscript’s dataset. The training dataset was augmented once, twice, and thrice, which we call resampling 1, resampling 2, and resampling 3, respectively. To classify the manuscript’s character images efficiently, we developed a customized Convolutional Neural Network (CNN) model. We compared the results achieved by our proposed model with those of other machine learning (ML) models: support vector machine (SVM), K-nearest neighbor (KNN), decision tree (DT), random forest (RF), and XGBoost. For these ML models, we used the pretrained models VGG16, MobileNet, and ResNet50 to extract features from the character images, then trained and tested the ML models on those features and recorded the results. Moreover, we validated our proposed CNN model against the well-established MNIST dataset. Our proposed CNN model achieves very good recognition accuracies of 88.67%, 90.91%, and 98.86% in the cases of resampling 1, resampling 2, and resampling 3, respectively, for the Beowulf manuscript’s data. Additionally, our CNN model achieves a benchmark recognition accuracy of 99.03% on the MNIST dataset.
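The "resampling 1, 2, 3" scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it swaps ImageDataGenerator for a plain NumPy random shift to stay dependency-light, and it assumes that "resampling k" means appending k augmented copies of every training image to the originals. The names `random_shift`, `resample`, and `max_shift` are hypothetical.

```python
import numpy as np


def random_shift(image, rng, max_shift=2):
    """Shift a 2-D grayscale image by a random offset, padding with zeros.

    Stand-in for the richer transforms (rotation, zoom, ...) that an
    augmentation utility such as ImageDataGenerator can apply.
    """
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    h, w = image.shape
    shifted = np.zeros_like(image)
    # Matching source/destination windows for the overlap after shifting.
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    shifted[dst_y, dst_x] = image[src_y, src_x]
    return shifted


def resample(images, labels, times, seed=0):
    """Return the originals plus `times` augmented copies of each image.

    Assumption: "resampling k" == originals + k augmented passes.
    """
    rng = np.random.default_rng(seed)
    all_images, all_labels = [images], [labels]
    for _ in range(times):
        all_images.append(np.stack([random_shift(img, rng) for img in images]))
        all_labels.append(labels)  # augmentation keeps the class label
    return np.concatenate(all_images), np.concatenate(all_labels)
```

Under this assumption, calling `resample(train_images, train_labels, times=3)` on N training images yields 4N images, with every augmented image keeping the label of the image it was derived from. Only the training split would be augmented, so test accuracy is still measured on unmodified character crops.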

Original language: English
Pages (from-to): 168-188
Number of pages: 21
Journal: Modelling
Volume: 4
Issue number: 2
State: Published - Jun 2023

Scopus Subject Areas

  • Computer Science (miscellaneous)
  • Engineering (miscellaneous)
  • Mathematics (miscellaneous)
  • Modeling and Simulation

Keywords

  • character recognition
  • computer vision
  • convolutional neural network
  • deep learning
  • machine learning
  • Old English
