Evaluation and Improvement of OCR Performance for Japanese-English Mixed Text
 -----------------------------------------------------------------------------
    Masahiko HATA, Tetsushi WAKABAYASHI, Fumitaka KIMURA, and Yasuji MIYAKE
  Faculty of Engineering, Mie University, 1515 Kamihama, Tsu, 514-8507 JAPAN
            kimura@hi.info.mie-u.ac.jp  TEL/FAX  +81-59-231-9457
1. Introduction
   The performance of existing commercial Japanese OCR software deteriorates
when the input Japanese text includes English words, English sentences,
computer programs, or commands. This deterioration for Japanese-English
mixed text is mainly caused by problems in character segmentation and
recognition in the English region.
   Japanese OCR software has two reading modes: Japanese mode and English
mode. The English mode is designed to recognize the characters used in
English text (alphanumerals and symbols), while the Japanese mode is designed
to recognize all characters used in Japanese text (alphanumerals, symbols,
Kanji, Hiragana and Katakana). Because the English mode is specialized for
segmenting and recognizing English characters, it performs better on the
English region than the Japanese mode does. However, the English mode cannot
be applied to Japanese-English mixed text, so the recognition accuracy of
the English region is relatively low.
   In Section 2, the accuracy of character segmentation and recognition
for Japanese-English mixed text is evaluated to reveal the problems. In
Section 3, a procedure for detecting fixed pitch regions to improve
character segmentation in the English region is described. In Section 4, a
procedure for merging and correcting the OCR output of the Japanese mode and
the English mode is described.
2. Evaluation of OCR performance for Japanese-English mixed text
 Eight test sheets are used in the performance evaluation.  Table 1 shows
the number of characters in each region of the test sheets.
	Table 1. Number of characters in each region of Japanese-English
                 mixed text sheets
	Test sheet                English region   Japanese region   Total
	Windows manual                       215               528     743
	Magazine (ASCII)                     157               893    1050
	Advertisement                        263               972    1235
	Magazine (Nikkei Byte)               643              1065    1708
	Magazine (Interface)                 253               295     548
	Magazine (ASCII)                      98              1133    1231
	Magazine (Interface)                 349              1317    1666
	Magazine (Nikkei Byte)               190               796     986
	Total                               2466              8838   11304
   Four typical Japanese OCR software packages (A, B, C and D) are used in
the evaluation.
   Tables 2 and 3 show, respectively, the accuracy of character segmentation
and of character recognition achieved by each OCR package on the test sheets.
These tables show that, on average, the error rates of character segmentation
and recognition in the English region are reduced by about 40% when the
English mode is used; OCR C is an exception, performing worse in English
mode. While most errors in the Japanese region are recognition errors, about
half of the errors in the English region are character segmentation errors.
These results show that improving the accuracy of character segmentation
and recognition in the English region is necessary to improve the total OCR
performance for Japanese-English mixed text.
	Table 2. Accuracy of character segmentation for Japanese-English
                 mixed text (%)
	           Japanese region   English region
	OCR        Japanese mode     Japanese mode   English mode
	A          98.97             86.70           94.93
	B          98.65             88.77           96.57
	C          99.12             94.93           90.54
	D          99.13             90.06           95.23
	Average    98.97             90.12           94.32
	Table 3. Accuracy of character recognition for Japanese-English
                 mixed text (%)
	           Japanese region   English region
	OCR        Japanese mode     Japanese mode   English mode
	A          92.88             78.85           90.44
	B          91.23             80.20           92.51
	C          95.89             89.38           84.73
	D          88.41             79.19           87.73
	Average    92.10             81.90           88.85
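
   The average accuracies for the English region in Tables 2 and 3 can be
converted into error rates to see how much the English mode helps. The
following Python sketch does this arithmetic; the function name
error_reduction and the dictionary layout are illustrative choices, not part
of the evaluated software.

```python
# Average English-region accuracies (%) copied from Tables 2 and 3.
averages = {
    "segmentation": {"japanese_mode": 90.12, "english_mode": 94.32},
    "recognition": {"japanese_mode": 81.90, "english_mode": 88.85},
}


def error_reduction(acc_before, acc_after):
    """Relative reduction of the error rate, given two accuracies in percent."""
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    return (err_before - err_after) / err_before


for task, acc in averages.items():
    reduction = error_reduction(acc["japanese_mode"], acc["english_mode"])
    print(f"{task}: error {100 - acc['japanese_mode']:.2f}% -> "
          f"{100 - acc['english_mode']:.2f}% ({reduction:.0%} lower)")
```

   With these averages the segmentation error falls from 9.88% to 5.68% and
the recognition error from 18.10% to 11.15%.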
3. Detection of fixed pitch region
   The height and width of printed Japanese characters are correlated, and
the characters are usually aligned at a fixed pitch. This property can be
used to estimate the pitch of character alignment and to detect the
fixed pitch regions. Once the fixed pitch regions are detected, the Japanese
region (fixed pitch) and the English region (variable pitch) can be
separated.
3.1 Estimation of character pitch
   The pitch of character alignment in each line is estimated by the
following procedure.
(1) For a given width of the rectangular character frame, a ladder of
horizontally aligned frames is shifted from left to right. The width of
the frame ranges from 80 to 125% of the character height, and the
horizontal displacement of the ladder ranges from 0 to 100% of the width.
(2) The frame width that minimizes the number of black pixels on the edges
of the ladder found in (1) is defined as the estimated pitch.
   The number of black pixels on the edges of the ladder is calculated
using the horizontal pixel projection of the text line.
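
   Steps (1) and (2) can be sketched in Python as follows. This is a minimal
illustration under several assumptions that are not stated in the text: the
text line is a binary image given as a list of rows (1 = black), the
projection is the black-pixel count of each pixel column, and the cost is
averaged over the number of edges so that wide frames are not favored simply
because they have fewer edges. All function names are hypothetical.

```python
def column_projection(line_image):
    """Count the black pixels (value 1) in each pixel column of a text line."""
    return [sum(column) for column in zip(*line_image)]


def edge_cost(projection, width, offset):
    """Mean black-pixel count on the vertical frame edges of one ladder."""
    edges = range(offset, len(projection), width)
    if len(edges) == 0:
        return float("inf")
    return sum(projection[x] for x in edges) / len(edges)


def estimate_pitch(line_image):
    """Return (width, offset) of the ladder whose edges cross the least ink."""
    height = len(line_image)
    projection = column_projection(line_image)
    min_width = max(1, round(0.80 * height))   # 80% of the character height
    max_width = round(1.25 * height)           # 125% of the character height
    best = None
    for width in range(min_width, max_width + 1):
        for offset in range(width):            # displacement: 0..100% of width
            cost = edge_cost(projection, width, offset)
            if best is None or cost < best[0]:
                best = (cost, width, offset)
    return best[1], best[2]


# Synthetic line: six characters, each 8 pixels wide, followed by a 2-pixel gap.
row = ([1] * 8 + [0] * 2) * 6
line = [row[:] for _ in range(10)]
pitch, offset = estimate_pitch(line)
```

   On the synthetic line the ladder edges fit into the gaps only when the
frame width equals the true character pitch of 10 pixels.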
3.2 Detection of fixed pitch region
   The ladder with the estimated frame width is shifted from left to right
along the text line, and a region of characters enclosed in five or more
successive frames without intersection is detected as a fixed pitch region.
At either end of the text line, a region of characters enclosed in three or
more successive frames is detected as a fixed pitch region.
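
   The run-length rule above can be sketched as follows. The sketch assumes
that "without intersection" means that neither vertical edge of a frame
crosses a black pixel, and that the projection is a list of per-column
black-pixel counts; the function name and parameters are hypothetical.

```python
def detect_fixed_pitch_regions(projection, pitch, offset=0,
                               min_run=5, min_run_at_ends=3):
    """Return (start, end) column ranges covered by runs of clean frames."""
    n = len(projection)
    starts = list(range(offset, n - pitch + 1, pitch))

    def edge_is_clear(x):
        # An edge lying past the end of the line cannot cross any ink.
        return x >= n or projection[x] == 0

    clean = [edge_is_clear(s) and edge_is_clear(s + pitch) for s in starts]

    regions = []
    i = 0
    while i < len(clean):
        if not clean[i]:
            i += 1
            continue
        j = i
        while j < len(clean) and clean[j]:
            j += 1
        # Runs touching either end of the line need only three frames.
        touches_end = i == 0 or j == len(clean)
        needed = min_run_at_ends if touches_end else min_run
        if j - i >= needed:
            regions.append((starts[i], starts[j - 1] + pitch))
        i = j
    return regions


# Six fixed pitch characters (pitch 10) followed by six narrower characters
# (6 pixels wide, 1-pixel gap) that do not fit the ladder.
japanese = ([0] + [5] * 8 + [0]) * 6
english = ([5] * 6 + [0]) * 6
regions = detect_fixed_pitch_regions(japanese + [0] + english, pitch=10)
```

   In this example only the first 60 columns are reported as a fixed pitch
region; the remaining columns would be treated as variable pitch.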
3.3 Character segmentation of Japanese-English mixed text
   Characters in the fixed pitch regions are segmented synchronously at
the estimated pitch. This synchronous character segmentation avoids
mis-separation of Kanji or Hiragana characters whose left and right parts
are disconnected. Characters in the variable pitch regions are segmented
asynchronously. The asynchronous character segmentation is suitable for
alphanumerals aligned at a narrow, variable pitch.
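
   The difference between the two segmentation styles can be illustrated
with a short Python sketch. It assumes that synchronous segmentation cuts at
multiples of the estimated pitch and that asynchronous segmentation cuts at
blank columns of the projection; both function names are hypothetical.

```python
def segment_synchronous(start, end, pitch):
    """Cut a fixed pitch region into frames of exactly one pitch each."""
    return [(x, x + pitch) for x in range(start, end - pitch + 1, pitch)]


def segment_asynchronous(projection, start, end):
    """Cut a region at blank columns, one box per connected run of ink."""
    boxes = []
    box_start = None
    for x in range(start, end):
        if projection[x] > 0 and box_start is None:
            box_start = x                      # ink begins
        elif projection[x] == 0 and box_start is not None:
            boxes.append((box_start, x))       # ink ends at a blank column
            box_start = None
    if box_start is not None:
        boxes.append((box_start, end))
    return boxes


# One character with disconnected left and right parts, inside one frame.
split_character = [0, 5, 5, 5, 0, 5, 5, 5, 5, 0]
sync_boxes = segment_synchronous(0, 10, pitch=10)
async_boxes = segment_asynchronous(split_character, 0, 10)
```

   Here the synchronous cut keeps the character whole as one frame, whereas
the projection-based cut splits it into two boxes at the internal gap.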
   Table 4 shows the accuracy of character segmentation for
Japanese-English mixed text. In the region-independent character
segmentation, the entire text was assumed to be fixed pitch and was
segmented synchronously. In this experiment, character boundaries were
simply detected from the horizontal pixel projection of the text lines in
both the fixed and the variable pitch regions. The results show that the
accuracy of character segmentation in the English region is significantly
improved by the fixed pitch region detection.
	Table 4. Accuracy of character segmentation of Japanese-English
                 mixed text (%)
	                     Alphanumeral region   Japanese region   Total region
	Region independent   56.20                 96.02             87.33
	Region detection     83.90                 96.16             93.49
4. String matching and correction of the OCR output
   The recognition accuracy of the English region can be improved by
replacing the alphanumeral strings output by the Japanese mode with the
corresponding strings output by the English mode. An alphanumeral string
output by the Japanese mode is matched against the output of the English
mode by a string matching algorithm to detect the corresponding string with
the minimum edit cost. The string matching algorithm uses the operations of
deletion, insertion and substitution of characters, each with a fixed cost.
The edit cost is the total cost of the operations needed to edit an input
string into the reference string, and it is minimized by dynamic
programming. The cost of insertions preceding and succeeding the reference
string is neglected so that the corresponding substring can be detected.
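
   The matching step can be sketched with a standard approximate substring
matching recurrence. The sketch below is an illustration, not the authors'
implementation: it assumes a unit cost for every operation, treats the
Japanese mode string as the pattern and the English mode output as the text
to search, and uses hypothetical function names. Leaving the first row of
the table at zero and taking the minimum over the last row makes characters
before and after the matched substring free of charge.

```python
def find_best_substring(pattern, text, cost=1):
    """Return (edit_cost, start, end) of the substring text[start:end]
    that can be reached from pattern with minimum edit cost."""
    m, n = len(pattern), len(text)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    start = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        start[0][j] = j                  # a match may begin anywhere for free
    for i in range(1, m + 1):
        dist[i][0] = i * cost
        for j in range(1, n + 1):
            same = pattern[i - 1] == text[j - 1]
            substitute = dist[i - 1][j - 1] + (0 if same else cost)
            delete = dist[i - 1][j] + cost
            insert = dist[i][j - 1] + cost
            best = min(substitute, delete, insert)
            dist[i][j] = best
            if best == substitute:
                start[i][j] = start[i - 1][j - 1]
            elif best == delete:
                start[i][j] = start[i - 1][j]
            else:
                start[i][j] = start[i][j - 1]
    end = min(range(n + 1), key=lambda j: dist[m][j])
    return dist[m][end], start[m][end], end


def correct_string(japanese_mode_string, english_mode_output, min_length=5):
    """Replace a Japanese mode string by its best English mode match."""
    if len(japanese_mode_string) < min_length:
        return japanese_mode_string      # too short to match reliably
    _, begin, end = find_best_substring(japanese_mode_string,
                                        english_mode_output)
    return english_mode_output[begin:end]


corrected = correct_string("Wind0ws", "the Windows manual")
```

   In this made-up example the misread string "Wind0ws" is replaced by
"Windows" at an edit cost of one substitution. A practical system would
probably also reject matches whose edit cost is too high; that threshold is
not specified in the text and is omitted here.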
   Table 5 shows the accuracy improvement achieved by the string matching
and correction. The string matching and correction is applied only to
output alphanumeral strings of length five or more, because it is not
effective for very short character strings. The OCR software used is C in
Japanese mode and B in English mode.
	Table 5. Accuracy improvement by string matching and correction (%)
	                     Alphanumeral region   Japanese region   Total region
	Before correction    85.81                 96.01             92.72
	After correction     91.41                 96.04             94.55
5. Conclusions
   In this paper, the accuracy of character segmentation and recognition for
Japanese-English mixed text was evaluated, and a procedure for detecting
fixed pitch regions to improve character segmentation in the English region
was described. A procedure for merging and correcting the OCR output of the
Japanese mode and the English mode was also described. The experimental
results are summarized as follows.
  (1) The performance deterioration for Japanese-English mixed text
recognition is mainly caused by problems in character segmentation and
recognition in the English region.
  (2) The detection and separation of fixed pitch regions is effective for
improving character segmentation of Japanese-English mixed text.
  (3) The recognition accuracy of the English region can be improved by
replacing the alphanumeral strings output by the Japanese mode with the
corresponding strings output by the English mode.
   Regarding the detection of fixed pitch regions, (1) improving the
accuracy of character segmentation in variable pitch regions and
(2) evaluating performance in terms of character recognition accuracy
remain as future research topics. Regarding the string matching and
correction, (1) improving the accuracy of alphanumeral string detection and
(2) string matching and correction of short alphanumeral strings remain to
be studied.