If you have cloned Tesseract from GitHub, you must generate the configure script.
If you have a Tesseract 4.0x installation on your system, please remove it before starting a new build.
Known dependencies for training tools (excluding leptonica):
- compiler with C++11 support
- autoconf-archive
- automake
- pkg-config
- pango-devel
- cairo-devel
- icu-devel
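Before building, it can help to check which of these dependencies are already present. The following is a minimal sketch, not part of the Tesseract sources. The helper names missing_tools and missing_modules are hypothetical. The pkg-config module names in the example (pango, cairo, icu-uc, icu-i18n) are assumptions that may differ between distributions. autoconf-archive installs no command of its own, so it is not covered here.

```shell
#!/bin/sh
# Print every command from the argument list that is not found on PATH.
# Prints nothing when all commands are available.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Print every pkg-config module from the argument list that is not installed.
# If pkg-config itself is absent, every module is reported as missing.
missing_modules() {
    for module in "$@"; do
        if ! pkg-config --exists "$module" 2>/dev/null; then
            echo "$module"
        fi
    done
}

# Example invocation (names are assumptions, adjust them for your system):
#   missing_tools g++ autoconf automake pkg-config
#   missing_modules pango cairo icu-uc icu-i18n
```

Empty output from both calls suggests the listed tools and libraries are installed. Any printed name is a candidate for installation through your package manager.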
The steps for building Tesseract are:
$ ./autogen.sh
$ ./configure
$ make
$ sudo make install
$ make training
$ sudo make training-install
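The steps above can also be run from a small wrapper that stops at the first failing command, so that a later step never runs against a broken earlier one. This is only a sketch. The function name run_steps and the DRY_RUN variable are hypothetical and not part of the Tesseract build system.

```shell
#!/bin/sh
# Run each argument as a shell command line, in order.
# Stops and returns 1 as soon as one command fails.
# With DRY_RUN=1 the commands are only printed, not executed.
run_steps() {
    for step in "$@"; do
        echo "+ $step"
        if [ "${DRY_RUN:-0}" != "1" ]; then
            sh -c "$step" || { echo "failed: $step" >&2; return 1; }
        fi
    done
}

# Example invocation from the top of the source tree:
#   run_steps "./autogen.sh" "./configure" "make" "sudo make install" \
#             "make training" "sudo make training-install"
```

Setting DRY_RUN=1 before the call prints the sequence without executing anything, which is a cheap way to review it first.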
You need to install at least the English language and OSD data files in the TESSDATA_PREFIX directory. You can retrieve a single file with tools like wget, curl, GithubDownloader or a browser.
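As an illustration, the two required files could be fetched with wget as sketched below. The base URL is an assumption about where the tessdata repository serves raw files, so verify it before relying on it. The helper names traineddata_url and fetch_traineddata are hypothetical.

```shell
#!/bin/sh
# Assumption: raw traineddata files are reachable under this base URL.
TESSDATA_BASE_URL="https://github.com/tesseract-ocr/tessdata/raw/main"

# Print the download URL for one language code, e.g. "eng" or "osd".
traineddata_url() {
    echo "$TESSDATA_BASE_URL/$1.traineddata"
}

# Download the given language files into the TESSDATA_PREFIX directory.
# Returns 1 if TESSDATA_PREFIX is unset or any download fails.
fetch_traineddata() {
    if [ -z "${TESSDATA_PREFIX:-}" ]; then
        echo "TESSDATA_PREFIX is not set" >&2
        return 1
    fi
    mkdir -p "$TESSDATA_PREFIX" || return 1
    for lang in "$@"; do
        wget -O "$TESSDATA_PREFIX/$lang.traineddata" \
            "$(traineddata_url "$lang")" || return 1
    done
}

# Example invocation for the minimum set of files:
#   fetch_traineddata eng osd
```

The same URLs work with curl or a browser. Only the download command inside the loop would need to change.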
All language data files can be retrieved from the git repository (useful only for packagers!):
$ git clone https://github.com/tesseract-ocr/tessdata.git tesseract-ocr.tessdata
(The repository is huge, more than 1.2 GB. You do not need to download all languages.)
You need an Internet connection to compile ScrollView.jar because the build will automatically download piccolo2d-core-3.0.jar and piccolo2d-extras-3.0.jar and place them in tesseract/java.
Just run:
$ make ScrollView.jar
and follow the instructions on the Viewer Debugging wiki page.
There is an alternative build system based on the cross-platform CMake:
$ mkdir build
$ cd build && cmake .. && make
$ sudo make install
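If you prefer to install somewhere other than the default location, the configure step can be given an install prefix. The sketch below wraps the CMake commands above in a hypothetical helper named cmake_build. It uses the standard CMake variables CMAKE_INSTALL_PREFIX and CMAKE_BUILD_TYPE. The prefix in the example is only an illustration.

```shell
#!/bin/sh
# Configure and build in ./build, using the first argument as install prefix.
# The work happens in a subshell so the caller's directory is left unchanged.
cmake_build() {
    mkdir -p build || return 1
    (
        cd build || exit 1
        cmake .. -DCMAKE_INSTALL_PREFIX="$1" -DCMAKE_BUILD_TYPE=Release || exit 1
        cmake --build .
    )
}

# Example invocation from the top of the source tree (prefix is an assumption):
#   cmake_build "$HOME/.local"
```

After a successful build, running make install inside the build directory copies the files to the chosen prefix. sudo is only needed when the prefix is not writable by the current user.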
You need to use Leptonica with the CMake patch:
git clone https://github.com/DanBloomberg/leptonica.git
cd leptonica
mkdir build
cd build
cmake ..
cmake --build .
cd ..\..
git clone https://github.com/tesseract-ocr/tesseract.git
cd tesseract
mkdir build
cd build
cmake .. -DLeptonica_BUILD_DIR=\abs\path\to\leptonica\build
cmake --build .
Please read http://vorba.ch/2014/tesseract-3.03-vs2013.html