commit c258d01
Showing 195 changed files with 54,269 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
Authors: | ||
|
||
Antonio Diaz Diaz [OCRAD] - Coder of OCRAD | ||
Bruno Barberi Gnecco [GOCR] - Programmer | ||
Joerg Schulenburg [GOCR] - Original idea and creation, programmer leader | ||
Sven Bansemer [DLLs] - modifications to build the libraries as DLLs on Windows |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
Thanks: | ||
...to everyone who contributed to gocr. If you feel that your | ||
name should be in this list, write mail to the author. These | ||
are in no particular order: | ||
|
||
G.Kugler for sending me first example files and testing. (MayMM) | ||
Klaas Freitag for the libPgm2asc-patch <freitag@suse.de> | ||
Ryan Dibble for the otsu.c file <dibbler@umich.edu> | ||
Tim Waugh for the man page <twaugh@redhat.com> | ||
David Pinson for the tkispell-patch <dpinson@materials.unsw.EDU.AU> | ||
Martin Goldhahn for some patches <Martin.Goldhahn@Webcenter.no> | ||
Eberhard Burkard for the gocr.tcl patch <E.Burkard@web.de> | ||
James R. Van Zandt for lots of tips <jrv@vanzandt.mv.com> | ||
... | ||
|
||
... and everyone else who submitted bug-reports, | ||
feature-requests, patches and lots of example files. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
History: (Changes,ChangeLog) | ||
|
||
1.00 2015-10-08 | ||
First release, with sample code in Object Pascal that calls the | ||
two DLLs (gocr.dll and ocrad.dll). | ||
|
||
* OCRAD has been modified to compile as a DLL | ||
* GOCR has been modified to compile as a DLL, plus some minor changes to avoid | ||
division-by-zero exceptions |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
(* GSA-Win-OCR.dpr | ||
Copyright (C) 2015 Sven Bansemer/GSA | ||
This program is free software: you can redistribute it and/or modify | ||
it under the terms of the GNU General Public License as published by | ||
the Free Software Foundation, either version 2 of the License, or | ||
(at your option) any later version. | ||
This library is distributed in the hope that it will be useful, | ||
but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
GNU General Public License for more details. | ||
You should have received a copy of the GNU General Public License | ||
along with this library. If not, see <http://www.gnu.org/licenses/>. | ||
*) | ||
|
||
{$APPTYPE CONSOLE} | ||
uses windows, sysutils; | ||
|
||
var dojob_ext:function(input:pchar; certainty:integer; Output:pchar; Charset:pchar):integer; cdecl; | ||
ocrad_dojob:function(input:pchar; Output:pchar):integer; cdecl; | ||
|
||
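function get_ocrad(const filename:string):string; | ||
var p:pchar; | ||
begin | ||
// The DLL receives no buffer size; 102 bytes is this sample's fixed limit. | ||
p:=AllocMem(102); // zero-filled, so the result is empty if nothing is written | ||
try | ||
ocrad_dojob(pchar(filename),p); | ||
result:=p; result:=trim(result); | ||
finally | ||
FreeMem(p,102); | ||
end; | ||
end; | ||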
|
||
function get_gocr(const filename:string;charset:string):string; | ||
var p:pchar; | ||
f1:integer; | ||
begin | ||
getmem(p,102); | ||
f1:=DoJob_ext(pchar(filename),0,p,pchar(charset)); | ||
if f1>=1 then begin | ||
result:=p; result:=trim(result); | ||
end else result:=''; | ||
FreeMem(p,102); | ||
end; | ||
|
||
var dll_gsa:thandle; | ||
|
||
begin | ||
writeln('GSA-Win-OCR v1.00 (C) 2015 GSA '); | ||
writeln(''); | ||
writeln('This tool and the DLL''s are licensed under GNU General Public License.'); | ||
writeln('It uses GOCR and OCRAD (both GPL licensed) to extract text from images'); | ||
writeln('in ppm format.'); | ||
writeln(''); | ||
|
||
if (paramcount=0) or (not fileexists(paramstr(1))) then begin | ||
writeln('usage: '+paramstr(0)+' <image.ppm>'); | ||
exit; | ||
end; | ||
|
||
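dll_gsa:=Windows.LoadLibrary('gocr.dll'); | ||
if dll_gsa<>0 then begin | ||
dojob_ext:=GetProcAddress(dll_gsa, 'DoJob_ext'); | ||
// GetProcAddress returns nil if the export is missing; calling nil would crash | ||
if Assigned(dojob_ext) then | ||
writeln('GOCR: '+get_gocr(paramstr(1),'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+-*?=')) | ||
else | ||
writeln('Error: DoJob_ext not found in gocr.dll'); | ||
windows.FreeLibrary(dll_gsa); | ||
end else begin | ||
writeln('Error loading DLL: gocr.dll'); | ||
end; | ||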
|
||
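dll_gsa:=Windows.LoadLibrary('ocrad.dll'); | ||
if dll_gsa<>0 then begin | ||
// '_Z5DoJobPcS_' is the g++-mangled export name of DoJob(char*, char*) | ||
ocrad_dojob:=GetProcAddress(dll_gsa, '_Z5DoJobPcS_'); | ||
if Assigned(ocrad_dojob) then | ||
writeln('OCRAD: '+get_ocrad(paramstr(1))) | ||
else | ||
writeln('Error: DoJob not found in ocrad.dll'); | ||
windows.FreeLibrary(dll_gsa); | ||
end else begin | ||
writeln('Error loading DLL: ocrad.dll'); | ||
end; | ||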
|
||
end. |
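Since INSTALL notes that the sample is easy to port to other languages, here is a minimal sketch of the same GOCR call in Python using ctypes. The signature of DoJob_ext and the 102-byte buffer are taken from the Pascal declarations above; the helper names (bind_dojob_ext, get_gocr, run), the ASCII encoding of the arguments, and the latin-1 decoding of the result are assumptions made for illustration and have not been checked against the DLL.

```python
import ctypes

BUF_SIZE = 102  # the Pascal sample allocates 102 bytes for the result


def bind_dojob_ext(lib):
    """Declare the cdecl signature of DoJob_ext as given in the Pascal sample."""
    fn = lib.DoJob_ext
    fn.argtypes = [ctypes.c_char_p, ctypes.c_int, ctypes.c_char_p, ctypes.c_char_p]
    fn.restype = ctypes.c_int
    return fn


def get_gocr(dojob_ext, filename, charset):
    """Mirror get_gocr: return the trimmed text, or '' if the call returns < 1."""
    out = ctypes.create_string_buffer(BUF_SIZE)  # zero-filled output buffer
    # Assumption: the DLL expects plain byte strings (ASCII path and charset).
    rc = dojob_ext(filename.encode("ascii"), 0, out, charset.encode("ascii"))
    # Assumption: the result is single-byte text; latin-1 never fails to decode.
    return out.value.decode("latin-1").strip() if rc >= 1 else ""


def run(path, dll_name="gocr.dll"):
    """Load the DLL (Windows only) and print the recognized text."""
    lib = ctypes.CDLL(dll_name)  # CDLL uses the cdecl calling convention
    print("GOCR:", get_gocr(bind_dojob_ext(lib), path, "0123456789"))
```

On Windows, with gocr.dll next to the script, run("sample.ppm") would correspond to the GOCR half of the Pascal program. Passing the bound function into get_gocr keeps the buffer handling separate from DLL loading, so it can be exercised with a stand-in function on other platforms.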
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
|
||
Requirements | ||
------------ | ||
1. You will need a C/C++ compiler for the DLLs. I built the DLLs with Dev-C++ 5 from | ||
http://www.bloodshed.net/devcpp.html | ||
|
||
2. You will need an Object Pascal compiler for the sample program that uses the DLLs. | ||
I used Delphi 7 because I own a license for it, but any Delphi compiler will do, as will | ||
Lazarus (http://www.lazarus-ide.org/) or Free Pascal (http://www.freepascal.org/). | ||
The source is also simple enough to port | ||
to any other language. | ||
|
||
3. Images in PPM format. See http://netpbm.sourceforge.net/doc/ppm.html | ||
Any decent image viewer, such as IrfanView, can read this format or convert to it. | ||
|
||
4. You will need a Windows tool to apply the patches to the original sources. I used | ||
WinMerge - http://winmerge.org/ | ||
|
||
Procedure | ||
--------- | ||
1. Unpack the archive if you have not done so already: | ||
|
||
unzip winocr[version].zip | ||
|
||
2. Apply the patches using WinMerge or any other tool of your choice. | ||
The *.patch files are in the folder where the two different OCR libs are, | ||
together with the newly added files. | ||
|
||
3. Compile the DLLs using Dev-C++ or any other compatible compiler. | ||
|
||
4. Compile the sample program using any of the mentioned compilers. | ||
|
||
5. Put the DLLs (gocr.dll and ocrad.dll) and GSA-Win-OCR.exe into one folder. | ||
|
||
6. Use the tool with: "GSA-Win-OCR sample.ppm" or any other sample image as | ||
argument. |
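For a quick smoke test without an image converter, a PPM file can also be written by hand. The sketch below writes the binary (P6) variant described at the netpbm link above: an ASCII header with magic number, width, height and maximum value, followed by raw RGB bytes. The function names and the tiny 4x2 image are illustrative only; an image this small only demonstrates the file layout and is not meaningful OCR input.

```python
def write_ppm(path, width, height, pixels, maxval=255):
    """Write a binary (P6) PPM; pixels is a row-major sequence of (r, g, b)."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    header = f"P6\n{width} {height}\n{maxval}\n".encode("ascii")
    body = bytes(channel for rgb in pixels for channel in rgb)
    with open(path, "wb") as f:
        f.write(header + body)


def make_sample(path):
    """Write a 4x2 white image with a single black pixel (layout demo only)."""
    white, black = (255, 255, 255), (0, 0, 0)
    image = [white] * 8
    image[5] = black  # second row, second column
    write_ppm(path, 4, 2, image)
```

Calling make_sample("sample.ppm") produces a file that any PPM-aware viewer should open.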
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
|
||
GSA-Win-OCR | ||
============= | ||
Description | ||
----------- | ||
|
||
This tool shows how to use the DLLs built from the modified sources of GOCR | ||
and OCRAD on Windows. GOCR has a Windows executable, but launching it and | ||
exchanging data through stdin/stdout was a bit uncomfortable, so I modified | ||
the sources to build DLLs that can be used from any Windows program. | ||
OCRAD, however, had no Windows build at all when I checked. | ||
|
||
The two OCR engines work somewhat differently, and combining their results can | ||
improve your final result. | ||
|
||
I hope this is useful for someone experimenting with OCR. | ||
|
||
Installation | ||
------------ | ||
|
||
Please read the file "INSTALL" for details. | ||
|
||
License | ||
------- | ||
|
||
GNU GENERAL PUBLIC LICENSE - Please read the file "COPYING" for details. |
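The description above suggests combining the output of both engines but does not say how. The following is one possible heuristic, sketched in Python; it is not part of this project. It assumes both engines mark unrecognized characters with "_" (adjust the unknown parameter if your builds differ), and the function name combine_results is hypothetical.

```python
def combine_results(gocr_text, ocrad_text, unknown="_"):
    """Pick or merge two OCR results using a simple, hypothetical heuristic."""
    a, b = gocr_text.strip(), ocrad_text.strip()
    if a == b or not b:
        return a
    if not a:
        return b
    if len(a) == len(b):
        # Same length: fill each unknown character in the GOCR text
        # with the character OCRAD produced at that position.
        return "".join(y if x == unknown else x for x, y in zip(a, b))
    # Different lengths: prefer the result with fewer unknown markers.
    return min((a, b), key=lambda s: s.count(unknown))
```

When the two results have the same length and disagree on a recognized character, this sketch keeps the GOCR character; a real application would likely need a better tie-breaker, such as a dictionary check.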
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
.cvsignore | ||
.version | ||
Makefile | ||
autom4te.cache | ||
config.status | ||
config.log |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
Authors (in chronological order): | ||
|
||
Joerg Schulenburg <jNOschulen{at}gmx.SPAM.de> (remove NO+SPAM for valid EMAIL address) | ||
* Original idea and creation, programmer leader | ||
|
||
Bruno Barberi Gnecco <brunobg{at}users.sourceforge.net> | ||
* Programmer |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
BUGS | ||
|
||
Reporting | ||
--------- | ||
Please do not hesitate to report bugs, and if possible their fixes! If you | ||
send an example file, please make sure it's small. To report bugs, do one | ||
of the following: | ||
|
||
* go to http://sourceforge.net/bugs/?func=addbug&group_id=7147. | ||
This is the preferred way to report bugs. | ||
|
||
* send it to one of the authors. Note that sometimes we may be busy and | ||
may not reply for days. If you post using the previous method, one of | ||
the authors will surely read it. | ||
|
||
* use | ||
diff -ru gocr_origin/ gocr_changed/ >patch | ||
to create patches | ||
|
||
* if you have compiling problems, do not forget to send your configure-output | ||
and the config.log file | ||
|
||
|
||
Known bugs (see jocr.SF.net page too) | ||
---------- | ||
|
||
v0.48 cutting of double melted chars will fail (example: serif MN) | ||
v0.43 on dithered images gocr runs extremely long (seems to hang) | ||
v0.41 linker error using g++ and netpbm under SuSE-9.3 | ||
v0.3.5 | ||
- segfault on some systems which do not support ifalpha(256+x) | ||
- hexcode not read from database | ||
v0.2.5 german umlauts and i-dots are not handled correctly | ||
problems with high resolution fonts | ||
v0.2.4 I guess there are still bugs. | ||
Some systems do not handle the stack well (AmigaOS?). | ||
gocr makes extensive use of the stack for recursive functions. | ||
Therefore you can get memory protection failures or strange results. | ||
The worst case is a huge black area. If that is a problem for you, | ||
ask for it to be changed. | ||
--- --- --- --- only for linux freaks --- --- --- | ||
By mistake I programmed an endless recursive function and ... | ||
SuSE6.4+linux2.2.12/13 got several "out of mem" and system CRASHED!!! | ||
ulimit: stack=unlimited | ||
- if text is framed, frame should be ignored, but it is not | ||
v0.2.3 still problems with segmentation | ||
- gcc 2.95.2 (SuSE6.4) error in load_db(), => fixed (thx to jasper) | ||
v0.2.1 | ||
- some people have problems running gocr on DOS/Win95 | ||
I guess: stack overflow. Is someone able to analyze or fix this? | ||
- large black areas on pbm-files cause a segfault on | ||
Ultra/Sparc (64bit) machines running Linux (2.1.126). | ||
There is a recursive function in the program which causes a | ||
stack overflow, which is not detected by the linux-kernel (BUG?). | ||
I look for a better solution. |
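Several entries above trace crashes on large black areas to deep recursion. A common way around this class of problem is to replace the recursion with an explicit stack, so a large region consumes heap memory instead of call-stack depth. The Python sketch below illustrates that general technique on a small grid of 0/1 pixels; it is not gocr's actual code, and the name fill_region is hypothetical.

```python
def fill_region(grid, start_row, start_col):
    """Return the set of connected black (1) pixels reachable from the start.

    Uses an explicit stack instead of recursion, so a large black area
    costs heap memory rather than call-stack depth.
    """
    rows, cols = len(grid), len(grid[0])
    seen = set()
    stack = [(start_row, start_col)]
    while stack:
        r, c = stack.pop()
        if not (0 <= r < rows and 0 <= c < cols):
            continue  # outside the image
        if grid[r][c] != 1 or (r, c) in seen:
            continue  # white pixel, or already visited
        seen.add((r, c))
        # Visit the four direct neighbours.
        stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return seen
```

A naive recursive version of the same fill would need one call frame per pixel in the worst case, which is the situation the bug notes describe.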
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
Thanks: | ||
...to everyone who contributed to gocr. If you feel that your | ||
name should be in this list, write mail to the author. These | ||
are in no particular order: | ||
|
||
G.Kugler for sending me first example files and testing. (MayMM) | ||
Klaas Freitag for the libPgm2asc-patch <freitag@suse.de> | ||
Ryan Dibble for the otsu.c file <dibbler@umich.edu> | ||
Tim Waugh for the man page <twaugh@redhat.com> | ||
David Pinson for the tkispell-patch <dpinson@materials.unsw.EDU.AU> | ||
Martin Goldhahn for some patches <Martin.Goldhahn@Webcenter.no> | ||
Eberhard Burkard for the gocr.tcl patch <E.Burkard@web.de> | ||
James R. Van Zandt for lots of tips <jrv@vanzandt.mv.com> | ||
... | ||
|
||
... and everyone else who submitted bug-reports, | ||
feature-requests, patches and lots of example files. |