HP Labs and today maintained by Google. Tesseract can be trained to support additional languages and fonts. I trained Tesseract to read my handwriting and reached an accuracy of over 90%, though this still means roughly one error in every 10 characters or so. This page explains how you can train Tesseract yourself, and shares some of the conclusions and pitfalls I found during my experiment.
To join several images into one multipage TIFF, ImageMagick's convert can be used:
convert.exe img1.bmp img2.jpg -adjoin res.tiff
Common Errors
Error: Illegal short name for a feature!
signal_termination_handler:Error:Signal_termination_handler called:Code 2000
I got this error after the .box file became corrupted for some reason. I opened it and did a "binary search": each time I deleted a different part of the file and tried to build again, until I found the offending line. Typically the bad line exists because Tesseract identified a few tiny specks as letters.
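The hunt described above can be sketched in Python. This is a minimal, hypothetical helper set, not the tool I used: it assumes the Tesseract 3.x box format, where each line is `<char> <left> <bottom> <right> <top> <page>`. `tiny_boxes` flags suspiciously small boxes (the `min_size` threshold is an arbitrary guess), and `find_bad_line` performs the binary search, assuming exactly one bad line and a hypothetical `build_fails` callback that writes a subset of lines to the .box file, reruns training and reports whether it crashed.

```python
# Hypothetical helpers for locating a bad line in a Tesseract 3.x .box file.
# Each box line is assumed to be "<char> <left> <bottom> <right> <top> <page>".


def tiny_boxes(lines, min_size=3):
    """Return indices of boxes narrower or shorter than min_size pixels.

    min_size is an arbitrary guess; tune it for your image resolution.
    """
    suspects = []
    for i, line in enumerate(lines):
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        left, bottom, right, top = map(int, parts[1:5])
        if right - left < min_size or top - bottom < min_size:
            suspects.append(i)
    return suspects


def find_bad_line(lines, build_fails):
    """Binary-search for the single line that makes the build fail.

    build_fails(subset) is a hypothetical callback: it should write `subset`
    to the .box file, rerun training and return True if training crashed.
    Assumes exactly one bad line, and that any subset containing it fails.
    """
    lo, hi = 0, len(lines)  # invariant: the bad line is in lines[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if build_fails(lines[lo:mid]):
            hi = mid  # the bad line is in the first half
        else:
            lo = mid  # otherwise it must be in the second half
    return lo
```

Checking `tiny_boxes` first is cheap; `find_bad_line` needs roughly log2(n) rebuilds for an n-line box file, which is still far fewer than deleting lines one at a time.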
Writing Merged Microfeat ...Warning: no protos/configs for { in
CreateIntTemplates()
Class->NumConfigs == this->fontset_table_.get(Class->font_set_id).size:Error:Assert failed:in file ..\classify\intproto.cpp, line 1312
As stated here, Tesseract 3.01 only supports one image per font; it actually crashes when you try to add another image (exp2). If you need multiple images, use a multipage TIFF file instead. This way you can always add more images to an existing font without losing the previous coordinates. However, generating a box file for the new TIFF overwrites the existing one (which you have probably fixed by hand), so I built a utility that backs up the previous box file and copies all values for the previous TIFF pages into the newly generated one.
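A backup-and-merge utility of that kind might look roughly like the sketch below. It is not the original tool: the function names and the `.bak` suffix are hypothetical, and it assumes the Tesseract 3.x box format, where the last field of every line is the page index. Lines for pages already present in the backup (your hand-fixed ones) are kept; only lines for new pages are taken from the regenerated file.

```python
import shutil
from pathlib import Path


def box_page(line):
    """Page index: assumed to be the last field of a Tesseract 3.x box line."""
    return int(line.split()[-1])


def merge_box_lines(old_lines, new_lines):
    """Keep hand-fixed lines for pages already in the old box file and
    take only the lines for new pages from the regenerated one."""
    old = [l for l in old_lines if l.strip()]
    old_pages = {box_page(l) for l in old}
    added = [l for l in new_lines if l.strip() and box_page(l) not in old_pages]
    return old + added


def backup_box(box_path):
    """Copy the hand-fixed .box file aside before regenerating it."""
    backup = Path(str(box_path) + ".bak")  # hypothetical naming scheme
    shutil.copyfile(box_path, backup)
    return backup


def restore_fixed_pages(box_path, backup):
    """After regeneration, rewrite box_path with the fixed pages restored."""
    old_lines = Path(backup).read_text(encoding="utf-8").splitlines()
    new_lines = Path(box_path).read_text(encoding="utf-8").splitlines()
    merged = merge_box_lines(old_lines, new_lines)
    Path(box_path).write_text("\n".join(merged) + "\n", encoding="utf-8")
```

The intended order is: call `backup_box`, regenerate the box file for the extended multipage TIFF, then call `restore_fixed_pages` so only the new page needs manual correction.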
read_params_file: Can't open batch.nochop
The Windows executable package does not include the configs. To perform training, you need to copy the 'tessdata' folder from the source distribution into the same directory as tesseract.exe (the source has two folders under tessdata that we need: configs and tessconfigs).
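If you prefer to script that copy, a minimal Python sketch follows. The paths are placeholders for wherever you unpacked the source and installed tesseract.exe, and `copy_tessdata_configs` is a hypothetical helper name. It merges the two folders into the existing tessdata directory rather than replacing it, which relies on `dirs_exist_ok` (Python 3.8+).

```python
import shutil
from pathlib import Path


def copy_tessdata_configs(source_tessdata, install_dir):
    """Copy configs/ and tessconfigs/ from the source tree's tessdata folder
    into <install_dir>/tessdata, keeping any files already there."""
    dest = Path(install_dir) / "tessdata"
    for sub in ("configs", "tessconfigs"):
        # dirs_exist_ok=True merges into existing directories (Python 3.8+)
        shutil.copytree(Path(source_tessdata) / sub, dest / sub, dirs_exist_ok=True)
    return dest
```

For example, `copy_tessdata_configs(r"C:\src\tesseract-3.01\tessdata", r"C:\Program Files\Tesseract-OCR")` (both paths hypothetical).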
tessdata_manager.SeekToStart(TESSDATA_INTTEMP):Error:Assert failed:in file adaptmatch.cpp, line 512 Segmentation fault
You did not follow the documentation. Before combining everything into the traineddata file, you need to:
"All you need to do now is collect together all (normproto, Microfeat, inttemp, pffmtable) the files and rename them with a lang. prefix..."
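The renaming step in that quote can be scripted. The sketch below is a hypothetical helper, not part of Tesseract: it renames only the four files named in the quote, so check the training documentation for your version in case other files also need the prefix. It refuses to run if any of the four is missing, which is the situation that produces the assert above.

```python
from pathlib import Path

# Only the four files named in the documentation quote; your Tesseract version
# may require more files to carry the prefix, so check its training docs.
TRAINING_OUTPUTS = ("normproto", "Microfeat", "inttemp", "pffmtable")


def add_lang_prefix(workdir, lang):
    """Rename each training output to '<lang>.<name>' inside workdir."""
    workdir = Path(workdir)
    missing = [n for n in TRAINING_OUTPUTS if not (workdir / n).exists()]
    if missing:
        raise FileNotFoundError(f"training outputs not found: {missing}")
    renamed = []
    for name in TRAINING_OUTPUTS:
        target = workdir / f"{lang}.{name}"
        (workdir / name).rename(target)
        renamed.append(target.name)
    return renamed
```

After renaming, the 3.0x training documentation's next step is `combine_tessdata lang.` (for example `combine_tessdata eng.`), which packs the prefixed files into `lang.traineddata`.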
2 comments:
I get this error using Windows XP and Tesseract-OCR 3.01.
How do I fix it?
Writing Merged Microfeat ...Warning: no protos/configs for { in
CreateIntTemplates()
Class->NumConfigs == this->fontset_table_.get(Class->font_set_id).size:Error:Assert failed:in file ..\classify\intproto.cpp, line 1312
I think you may be trying to use two fonts for the same language, which is not supported.