Handwritten and machine-printed text discrimination using a template matching approach

Mehryar Emambakhsh; Yulan He; Ian Nabney

doi:10.1109/DAS.2016.22

Research output: Chapter in Book/Published conference output › Conference publication

Abstract

We propose a novel template matching approach for the discrimination of handwritten and machine-printed text. We first pre-process the scanned document images by performing denoising, circles/lines exclusion and word-block level segmentation. We then align and match characters in a flexible sized gallery with the segmented regions, using parallelised normalised cross-correlation. The experimental results over the Pattern Recognition & Image Analysis Research Lab-Natural History Museum (PRImA-NHM) dataset show remarkably high robustness of the algorithm in classifying cluttered, occluded and noisy samples, in addition to those with significant high missing data. The algorithm, which gives 84.0% classification rate with false positive rate 0.16 over the dataset, does not require training samples and generates compelling results as opposed to the training-based approaches, which have used the same benchmark.
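The matching step described above can be illustrated with a minimal sketch of zero-mean normalised cross-correlation between a gallery template and a segmented region. This is not the authors' implementation: the function names, the brute-force sliding window, the 0.8 threshold, and the assumption that the gallery holds machine-printed character templates are all hypothetical choices made for illustration, and the denoising, circle/line exclusion, segmentation and parallelisation stages are omitted.

```python
import numpy as np


def normalised_cross_correlation(region, template):
    """Return the peak zero-mean NCC score of `template` slid over `region`.

    Both inputs are 2-D grayscale arrays. The score lies in [-1, 1];
    -1.0 is also returned when no window has a defined correlation.
    """
    region = np.asarray(region, dtype=float)
    template = np.asarray(template, dtype=float)
    th, tw = template.shape
    rh, rw = region.shape
    if th > rh or tw > rw:
        raise ValueError("template must fit inside region")

    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())

    best = -1.0
    for y in range(rh - th + 1):
        for x in range(rw - tw + 1):
            patch = region[y:y + th, x:x + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            if denom == 0:
                continue  # flat patch or flat template: correlation undefined
            best = max(best, float((p * t).sum() / denom))
    return best


def classify_region(region, gallery, threshold=0.8):
    """Hypothetical decision rule: a region is labelled machine-printed
    when its best match against any gallery template reaches `threshold`."""
    score = max(normalised_cross_correlation(region, g) for g in gallery)
    label = "machine-printed" if score >= threshold else "handwritten"
    return label, score


# Usage: embed a template in a random region and match it back.
rng = np.random.default_rng(0)
template = rng.random((5, 4))
region = rng.random((12, 10))
region[3:8, 2:6] = template
label, score = classify_region(region, [template])
```

Because each template and each region is scored independently, the outer loop over the gallery is the natural place to parallelise; the sketch keeps it sequential for clarity.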

Original language	English
Title of host publication	Proceedings : 12th IAPR International Workshop on Document Analysis Systems, DAS 2016
Publisher	IEEE
Pages	399-404
Number of pages	6
ISBN (Print)	978-1-5090-1792-8
DOIs	https://doi.org/10.1109/DAS.2016.22
Publication status	Published - 13 Jun 2016
Event	12th IAPR International Workshop on Document Analysis Systems - Santorini, Greece Duration: 11 Apr 2016 → 14 Apr 2016

Workshop

Workshop	12th IAPR International Workshop on Document Analysis Systems
Abbreviated title	DAS 2016
Country/Territory	Greece
City	Santorini
Period	11/04/16 → 14/04/16

Bibliographical note

© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Keywords

classification
handwritten
machine-printed
OCR
shape analysis
template matching

Access to Document

10.1109/DAS.2016.22

Handwritten and machine-printed text discrimination
Accepted author manuscript, 817 KB

Cite this

@inproceedings{f197b789a2514a50b5308d0d2c0983da,

title = "Handwritten and machine-printed text discrimination using a template matching approach",

abstract = "We propose a novel template matching approach for the discrimination of handwritten and machine-printed text. We first pre-process the scanned document images by performing denoising, circles/lines exclusion and word-block level segmentation. We then align and match characters in a flexible sized gallery with the segmented regions, using parallelised normalised cross-correlation. The experimental results over the Pattern Recognition \& Image Analysis Research Lab-Natural History Museum (PRImA-NHM) dataset show remarkably high robustness of the algorithm in classifying cluttered, occluded and noisy samples, in addition to those with significant high missing data. The algorithm, which gives 84.0% classification rate with false positive rate 0.16 over the dataset, does not require training samples and generates compelling results as opposed to the training-based approaches, which have used the same benchmark.",

keywords = "classification, handwritten, machine-printed, OCR, shape analysis, template matching",

author = "Mehryar Emambakhsh and Yulan He and Ian Nabney",

note = "{\textcopyright} 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. 12th IAPR International Workshop on Document Analysis Systems, DAS 2016; Conference date: 11-04-2016 through 14-04-2016",

year = "2016",

month = jun,

day = "13",

doi = "10.1109/DAS.2016.22",

language = "English",

isbn = "978-1-5090-1792-8",

pages = "399--404",

booktitle = "Proceedings : 12th IAPR International Workshop on Document Analysis Systems, DAS 2016",

publisher = "IEEE",

address = "United States",

}

TY - GEN

T1 - Handwritten and machine-printed text discrimination using a template matching approach

AU - Emambakhsh, Mehryar

AU - He, Yulan

AU - Nabney, Ian

N1 - © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

PY - 2016/6/13

Y1 - 2016/6/13

AB - We propose a novel template matching approach for the discrimination of handwritten and machine-printed text. We first pre-process the scanned document images by performing denoising, circles/lines exclusion and word-block level segmentation. We then align and match characters in a flexible sized gallery with the segmented regions, using parallelised normalised cross-correlation. The experimental results over the Pattern Recognition & Image Analysis Research Lab-Natural History Museum (PRImA-NHM) dataset show remarkably high robustness of the algorithm in classifying cluttered, occluded and noisy samples, in addition to those with significant high missing data. The algorithm, which gives 84.0% classification rate with false positive rate 0.16 over the dataset, does not require training samples and generates compelling results as opposed to the training-based approaches, which have used the same benchmark.

KW - classification

KW - handwritten

KW - machine-printed

KW - OCR

KW - shape analysis

KW - template matching

UR - http://www.scopus.com/inward/record.url?scp=84979523443&partnerID=8YFLogxK

UR - http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7490151

U2 - 10.1109/DAS.2016.22

DO - 10.1109/DAS.2016.22

M3 - Conference publication

AN - SCOPUS:84979523443

SN - 978-1-5090-1792-8

SP - 399

EP - 404

BT - Proceedings : 12th IAPR International Workshop on Document Analysis Systems, DAS 2016

PB - IEEE

T2 - 12th IAPR International Workshop on Document Analysis Systems

Y2 - 11 April 2016 through 14 April 2016

ER -
