A SURVEY ON OPTICAL CHARACTER RECOGNITION TECHNIQUES
ABSTRACT:
There is a growing demand for software that can recognize characters when
information is scanned from paper documents into a computer system. This paper
presents a detailed review of the field of Optical Character Recognition (OCR).
It identifies the techniques that have been proposed to realize the core of
character recognition in an OCR system. OCR translates images of typewritten or
handwritten characters into an electronically editable format while preserving
font properties. Different techniques for preprocessing and segmentation are
surveyed and discussed in this paper.
Keywords:
Character Recognition System, Image Segmentation, OCR, Preprocessing, Skew
correction, Classifier
1. INTRODUCTION
OCR (Optical Character Recognition) translates images of typewritten or
handwritten characters into a machine-editable format. OCR reads damaged or
low-quality codes and returns the best guess at what the code is. It is widely
used as a form of information entry from printed paper records such as passport
documents, invoices, bank statements, computerized receipts, business cards,
mail, printouts of static data, or any other suitable documentation. OCR itself
does not assess the quality and sharpness of characters; OCV (Optical Character
Verification) is an approach introduced to overcome this limitation. Different
methods are used at each intermediate stage of OCR. Projection-profile-based
methods make it easy to segment the text in a document image into lines, words,
and characters, independent of the language of the text. In [1], text
segmentation is done using the projection profile method, and an algorithm is
proposed for correcting the skew angle of the text document. Blur is an
important factor that degrades OCR accuracy. In [2], a prediction method based
on local blur estimation is proposed; the relation between blur and character
size is investigated, which is useful for the classifier, and the classifier
assigns a given document to one of three classes: readable, intermediate, or
non-readable. In [3], a grading system is used to evaluate printed text using
various quality measures; the system achieved a recognition rate of 98.69 %
along with a precision of 0.9857 and a sensitivity of 1. Paper [4] presents a
complete OCR system for camera-captured textual documents with embedded images
and graphics, aimed at handheld devices. Paper [5] describes skew detection and
correction of scanned document images written in the Assamese language using
horizontal and vertical projection profile analysis. OCR consists of many
phases, such as pre-processing, segmentation, feature extraction,
classification and recognition [6].
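The skew correction work cited in [1] and [5] relies on projection profiles.
The sketch below is not the algorithm of either paper; it is a minimal
illustration of one common projection-profile approach, assuming a binary image
stored as a list of rows (1 for ink, 0 for background). The function names, the
search range and the step size are hypothetical choices made for this example.
Candidate angles are tried one by one, and the angle whose horizontal
projection profile is sharpest is returned.

```python
import math
from collections import Counter


def profile_score(points, angle_deg):
    """Sharpness of the horizontal projection profile after rotating the points.

    The score is the sum of squared row counts: it is large when the ink
    concentrates in a few rows, i.e. when text lines are horizontal.
    """
    theta = math.radians(angle_deg)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    # Only the rotated row coordinate matters for a horizontal profile.
    rows = Counter(round(-x * sin_t + y * cos_t) for x, y in points)
    return sum(count * count for count in rows.values())


def estimate_skew(binary, max_angle=10.0, step=0.5):
    """Return the candidate angle (degrees) with the sharpest profile.

    `binary` is a list of rows with 1 for ink and 0 for background.
    Positive angles mean the text lines drop toward the right in image
    coordinates (y grows downward).
    """
    points = [(x, y) for y, row in enumerate(binary)
              for x, value in enumerate(row) if value]
    n_steps = int(round(2 * max_angle / step))
    candidates = [-max_angle + i * step for i in range(n_steps + 1)]
    return max(candidates, key=lambda angle: profile_score(points, angle))
```

This is a brute-force search, so its cost grows with the number of ink pixels
times the number of candidate angles. Once the angle is known, the image would
be rotated by the opposite amount before segmentation.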
Modules:
1.1 Digitization
Digitization is the process of converting a paper-based handwritten document
into an electronic format. Here, each document consists of only one character.
The conversion is accomplished by scanning the document and producing an
electronic representation of the original as an image file. The author used
various scanners for digitization, and the resulting digital image was passed
to the next step, the pre-processing phase.
1.2 Pre-processing
In the pre-processing phase, a series of operations is performed on the scanned
input image. These operations enhance the image and render it suitable for
segmentation. The gray-level character image is normalized into a window of
fixed size. After noise reduction, a bitmap image is produced. Then, the bitmap
image is transformed into a thinned image.
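The first three operations named above (size normalization, noise reduction and
conversion to a bitmap) can be sketched as follows. This is an illustrative
sketch only: the text does not state the window size, the noise filter or the
threshold, so the 32 x 32 window, the 3 x 3 median filter and the global
threshold of 128 are all assumptions, as are the function names. The gray-level
image is taken to be a list of rows of values in 0..255, with dark ink on a
light background. Thinning is left out to keep the example short.

```python
def resize_nearest(gray, out_h, out_w):
    """Normalize the image to out_h x out_w using nearest-neighbour sampling."""
    in_h, in_w = len(gray), len(gray[0])
    return [[gray[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def median_filter3(gray):
    """Reduce noise by replacing each pixel with the median of its 3x3 window."""
    h, w = len(gray), len(gray[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            # The window is clipped at the image border.
            window = sorted(gray[rr][cc]
                            for rr in range(max(0, r - 1), min(h, r + 2))
                            for cc in range(max(0, c - 1), min(w, c + 2)))
            row.append(window[len(window) // 2])
        out.append(row)
    return out


def binarize(gray, threshold=128):
    """Produce a bitmap: 1 for ink (darker than threshold), 0 for background."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def preprocess(gray, size=(32, 32), threshold=128):
    """Normalize, denoise and binarize a gray-level character image."""
    normalized = resize_nearest(gray, size[0], size[1])
    denoised = median_filter3(normalized)
    return binarize(denoised, threshold)
```

A median filter removes isolated specks but also rounds off the corners of thin
strokes, so a practical system would tune the filter to the scan resolution.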
1.3 Segmentation
The segmentation phase is the most important process. Segmentation separates
the individual characters of an image. Segmenting handwritten text into zones
(upper, middle and lower zone) and characters is more difficult than segmenting
printed documents, which are in a standard form. This is mainly because of
variability in paragraphs, in the words of a line and in the characters of a
word, as well as in skew, slant, size and curvature. Sometimes components of
two adjacent characters touch or overlap, which creates difficulties for the
segmentation task. The touching or overlapping problem occurs frequently
because of modified characters in the upper and lower zones.
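A minimal sketch of projection-profile segmentation, the method this survey
names for the segmentation stage, is given below. It assumes a deskewed binary
image stored as a list of rows (1 for ink, 0 for background); the function
names are hypothetical. Rows with no ink separate text lines, and within a
line, columns with no ink separate characters.

```python
def runs_of_ink(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    runs, start = [], None
    for index, value in enumerate(profile):
        if value and start is None:
            start = index
        elif not value and start is not None:
            runs.append((start, index))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(binary):
    """Split the page into text lines using the horizontal projection profile."""
    horizontal = [sum(row) for row in binary]
    return runs_of_ink(horizontal)


def segment_characters(binary, line):
    """Split one text line into characters using its vertical projection profile."""
    top, bottom = line
    width = len(binary[0])
    vertical = [sum(binary[r][c] for r in range(top, bottom))
                for c in range(width)]
    return runs_of_ink(vertical)
```

Because the split relies on empty rows and columns, touching or overlapping
characters come out as a single segment, which is the difficulty described
above for handwritten text.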
1.4 Feature Extraction and Classification
Feature extraction is the phase that measures the relevant shape information
contained in the character. Features can be extracted at different levels of
text, e.g., character level, word level, line level and paragraph level. The
classification phase is the decision-making phase of an OCR engine; it uses the
features extracted in the previous stage to assign class memberships in the
pattern recognition system. The primary aim of the classification phase is to
develop constraints that reduce the misclassification related to feature
extraction.
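As an illustration of how character-level features feed a classifier, the
sketch below pairs a simple zoning feature (ink density per grid cell) with a
nearest neighbour classifier. The zoning feature, the 2 x 2 grid and the
function names are assumptions chosen for this example, not a method taken from
the surveyed papers. The input is a binary character image stored as a list of
rows, at least as large as the grid in each dimension.

```python
import math


def zoning_features(binary, zones=2):
    """Return the ink density of each cell in a zones x zones grid."""
    h, w = len(binary), len(binary[0])
    features = []
    for zr in range(zones):
        for zc in range(zones):
            r0, r1 = zr * h // zones, (zr + 1) * h // zones
            c0, c1 = zc * w // zones, (zc + 1) * w // zones
            ink = sum(binary[r][c] for r in range(r0, r1) for c in range(c0, c1))
            features.append(ink / ((r1 - r0) * (c1 - c0)))
    return features


def nearest_neighbour(features, training):
    """Return the label of the training sample closest in Euclidean distance.

    `training` is a list of (feature_vector, label) pairs.
    """
    closest = min(training, key=lambda sample: math.dist(features, sample[0]))
    return closest[1]
```

A usage example: with one template for a vertical bar labelled "I" and one for
a horizontal bar labelled "-", a slightly noisy vertical bar is assigned the
label of the closer template.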
CONCLUSION
This paper surveyed various techniques for OCR. Handwritten characters, natural
scene images, business cards and TV set images are selected for
experimentation. A systematic flow of an OCR system is discussed. The
projection-profile-based method for segmentation, the Fourier transform
technique for pre-processing, and the nearest neighbour classifier for
classification are described. This paper can help researchers select the most
appropriate techniques to achieve optimum results for an application, according
to the different parameters described in the previous sections.
REFERENCES
[1] A. S. Sawant, “Script Independent Text Pre-processing and Segmentation for OCR,” Int. Conf. Electr. Electron. Signals, Commun. Optim. - 2015, pp. 1–5, 2015.
[2] V. Kieu, F. Cloppet, and N. Vincent, “OCR Accuracy Prediction Method Based on Blur Estimation,” 2016 12th IAPR Work. Doc. Anal. Syst., pp. 317–322, 2016.
[3] J. B. Pedersen, K. Nasrollahi, and T. B. Moeslund, “Quality Inspection of Printed Texts,” IWSSP 2016 - 23rd Int. Conf. Syst. Image Process., 23-25 May 2016, Bratislava, Slovakia, pp. 6–9, 2016.
[4] A. F. Mollah, N. Majumder, S. Basu, and M. Nasipuri, “Design of an Optical Character Recognition System for Camera-based Handheld Devices,” IJCSI, vol. 8, no. 4, pp. 283–289, 2011.
[5] B. Jain and M. Borah, “A Comparison Paper on Skew Detection of Scanned Document Images Based on Horizontal and Vertical,” IJSRP, vol. 4, no. 6, pp. 4–7, 2014.
[6] E. N. Bhatia, “Optical Character Recognition Techniques: A Review,” IJARCSSE, vol. 4, no. 5, pp. 1219–1223, 2014.
[7] M. Shen, “Improving OCR Performance with Background Image Elimination,” 2015 12th Int. Conf. Fuzzy Syst. Knowl. Discov., pp. 1566–1570, 2015.
[8] A. Coates et al., “Text Detection and Character Recognition in Scene Images with Unsupervised Feature Learning.”
[9] P. Road, “Confidence Guided Progressive Search and Fast Match Techniques for High Performance Chinese/English OCR,” IEEE, pp. 89–92, 2002.
[10] H. Wang and J. Kangas, “Character-Like Region Verification for Extracting Text in Scene Images,” no. 11, 2001.
[11] I. Kastelan, S. Kukolj, V. Pekovic, V. Marinkovic, and Z. Marceta, “Extraction of Text on TV Screen using Optical Character Recognition,” IEEE, pp. 153–156, 2012.
[12] J. Diaz-Escobar, “Optical Character Recognition based on phase features,” IEEE, 2015.
[13] A. Thilagavathy, K. Aarthi, and A. Chilambuchelvan, “A Hybrid Approach to Extract Scene Text from Videos,” ICCEET, pp. 1017–1022, 2012.
[14] S. Goyal, “Optical Character Recognition,” IJARCSSE, vol. 3, no. 11, pp. 982–985, 2013.
[15] G. Vamvakas, B. Gatos, N. Stamatopoulos, and S. J. Perantonis, “A Complete Optical Character Recognition Methodology for Historical Documents,” pp. 525–532, 2008.
[16] L. S. Yaeger, B. J. Webb, and R. F. Lyon, “Search for Online, Printed Handwriting NEWTON,” Am. Assoc. Artif. Intell., vol. 19, no. 1, pp. 73–90, 1998.
[17] J. Hu, S. G. Lim, and M. K. Brown, “Writer independent on-line handwriting recognition using an HMM approach,” J. Pattern Recognit. Soc., vol. 33, pp. 133–147, 2000.
[18] A. Funada, D. Muramatsu, and T. Matsumoto, “The Reduction of Memory and the Improvement of Recognition Rate for HMM On-line Handwriting Recognition,” IEEE, pp. 0–5, 2004.
[19] J. Matas, “Real-Time Scene Text Localization and Recognition,” IEEE, pp. 3538–3545, 2012.
[20] H. Lin and C. Hsu, “Optical Character Recognition with Fast Training Neural Network,” IEEE, pp. 1458–1461, 2016.
[21] C. N. E. Anagnostopoulos, I. E. Anagnostopoulos, V. Loumos, and E. Kayafas, “A License Plate-Recognition Algorithm for Intelligent Transportation System Applications,” IEEE, vol. 7, no. 3, pp. 377–392, 2006.
[22] Y. J. Zhang, “A survey on evaluation methods for image segmentation,” pp. 1–13.
[23] A. Singh, K. Bacchuwar, and A. Bhasin, “A Survey of OCR Applications,” Int. J. Mach. Learn. Comput., vol. 2, no. 3, pp. 314–318, 2012.
