mirror of https://codeberg.org/gitdode/ocr-cpp.git synced 2026-07-10 04:19:55 +02:00

A simple web service to perform optical character recognition on images using Tesseract API with Leptonica

Torsten Römer 5b01dc7344 Enable auto-fitting params into free memory (#20 ) - Set n_gpu_layers default to -1 (auto) - Pad tensor_buft_overrides to enable auto-fit - Add (commented) system prompt just for reference - Correct initialization order - Interrupt Tesseract and Llama multipage text extraction - Adjust unit tests Reviewed-on: https://codeberg.org/gitdode/ocr-cpp/pulls/20		2026-06-28 16:41:08 +02:00

ocr-cpp

About

OCR Service in C++.

Web service to perform optical character recognition on images using Tesseract API with Leptonica.

There is currently experimental support for optionally using a local LLM such as GLM OCR with llama.cpp to recognize text that Tesseract is not designed for, e.g. handwriting.

The web service uses cpp-httplib, which provides multithreading depending on the number of available logical CPUs, and json to structure recognized text in a JSON array. Plain text, hOCR and TSV output are supported as well.

Since a Tesseract API instance cannot be used concurrently, one instance is created per thread and reused for all requests handled by that thread.
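
The per-thread reuse described above can be sketched with a thread_local object. This is only an illustration, not the project's code: Engine is a hypothetical stand-in for a Tesseract API instance, and engineForThisThread is an invented helper name.

```cpp
#include <atomic>
#include <string>

// Hypothetical stand-in for an OCR engine that is expensive to create
// and must not be shared between threads.
class Engine {
public:
    Engine() : id_(nextId_.fetch_add(1)) {}

    int id() const { return id_; }

    // Placeholder for the actual recognition call.
    std::string recognize(const std::string& image) const {
        return "engine " + std::to_string(id_) + " processed " +
               std::to_string(image.size()) + " bytes";
    }

private:
    static inline std::atomic<int> nextId_{0};
    int id_;
};

// Returns the engine owned by the calling thread. The instance is created
// on the first call made by that thread and reused on all later calls.
Engine& engineForThisThread() {
    thread_local Engine engine;
    return engine;
}
```

A request handler would call engineForThisThread() instead of constructing an engine per request, so no locking is needed around the engine itself.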

Only one LLM recognition can run at a time; concurrent requests using the LLM are blocked until the ongoing one completes. Recognitions using Tesseract, however, are handled concurrently, even while the LLM is busy.
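
One way to get this behaviour is a single mutex around the LLM path only. The following is a minimal sketch under that assumption; llmMutex, recognizeWithLlm and recognizeWithTesseract are hypothetical names with placeholder bodies, and the project may serialize LLM requests differently.

```cpp
#include <mutex>
#include <string>

// Hypothetical guard: at most one LLM recognition runs at any time.
std::mutex llmMutex;

// Placeholder for an LLM recognition. A caller blocks on the lock until
// any ongoing LLM recognition has finished.
std::string recognizeWithLlm(const std::string& image) {
    std::lock_guard<std::mutex> lock(llmMutex);
    return "llm: " + std::to_string(image.size()) + " bytes";
}

// Placeholder for a Tesseract recognition. It never takes llmMutex, so it
// is not held up by a running LLM recognition.
std::string recognizeWithTesseract(const std::string& image) {
    return "tesseract: " + std::to_string(image.size()) + " bytes";
}
```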

Images are converted to JPEG, so all image formats supported by libvips can be used for LLM recognitions, including multipage TIFF and PDF. If necessary, images are scaled down to a reasonable size to improve performance.
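
The arithmetic behind scaling down while keeping the aspect ratio can be sketched as follows. The actual scaling is done by libvips and the size limit used by the service is not stated here, so Size, scaleDown and maxEdge are hypothetical and only illustrate the idea.

```cpp
#include <algorithm>
#include <cmath>

struct Size {
    int width;
    int height;
};

// Returns dimensions whose longest edge is at most maxEdge, keeping the
// aspect ratio. Images that are already small enough are left unchanged.
Size scaleDown(Size original, int maxEdge) {
    const int longest = std::max(original.width, original.height);
    if (longest <= maxEdge) {
        return original;
    }
    const double factor = static_cast<double>(maxEdge) / longest;
    return {
        std::max(1, static_cast<int>(std::lround(original.width * factor))),
        std::max(1, static_cast<int>(std::lround(original.height * factor)))
    };
}
```

For example, with maxEdge set to 2000, a 4000 x 3000 image becomes 2000 x 1500, while an 800 x 600 image is not touched.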

As a little by-product, the service can also generate thumbnails very efficiently.

Limitations

LLM recognition currently returns only plain text/Markdown.

Building

A C++ compiler supporting modules, Make and the dependent libraries are required to build and test the project. Building works fine on Debian 13 with the following toolchain and libraries installed from the Debian repositories with apt:

GNU Make
g++ 15.2.0
doxygen
catch2
libtesseract5
libleptonica6
libcpp-httplib0.41
nlohmann-json3
libicu78
libvips

The following command should install all that is needed to build the project:

sudo apt install build-essential libtesseract-dev libleptonica-dev \
libcpp-httplib-dev nlohmann-json3-dev libicu-dev libvips-dev \
doxygen catch2

Since this project currently uses the internal API of llama.cpp, it is probably easiest to download the llama.cpp source, build it, and copy the needed headers to e.g. /usr/local/include/llama and the shared libraries to /usr/local/lib/llama.

Once all dependencies are satisfied the project can be built by running:

export LD_LIBRARY_PATH=/usr/local/lib/llama
make

This compiles the executable ocr.

Testing

To build and run the tests:

export LD_LIBRARY_PATH=/usr/local/lib/llama
make test

This compiles the executable ocr-test and runs it, which executes the actual tests.

Running

It might be necessary to install libtesseract5 and language files, and other libraries:

sudo apt install libtesseract5 tesseract-ocr-deu tesseract-ocr-eng \
tesseract-ocr-fra libleptonica6 libcpp-httplib0.41 libicu78 \
libvips42t64 libopenblas0 libvulkan1

The service can be started with, for example:

export LD_LIBRARY_PATH=/usr/local/lib/llama
./ocr 0.0.0.0 8080 /path/to/GLM-OCR-Q8_0.gguf /path/to/mmproj-GLM-OCR-Q8_0.gguf

Available parameters are:

bind address: e.g. '0.0.0.0' or a hostname/FQDN
http port: e.g. '8080'
model path: the path to the model, e.g. '/path/to/GLM-OCR-Q8_0.gguf'
mmproj path: the path to the multimodal projection, e.g. '/path/to/mmproj-GLM-OCR-Q8_0.gguf'

Environment variables that can be set:

OCR_GPU_LAYERS: number of layers to keep in VRAM. '-1' is auto (default), '-2' or less is all
OCR_NO_GPU: if set to any value, the GPU is not used for multimodal
OMP_THREAD_LIMIT: max. number of CPU threads
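
Reading these variables could look roughly like the sketch below. The function names are hypothetical, and falling back to '-1' for a value that is not a valid integer is an assumption; how the service treats malformed values is not documented here.

```cpp
#include <charconv>
#include <cstdlib>
#include <string_view>
#include <system_error>

// Interprets a raw OCR_GPU_LAYERS value; nullptr means the variable is unset.
// Returns -1 (auto) if the variable is unset. Assumption: a value that is
// not a complete, valid integer is also treated as -1.
int parseGpuLayers(const char* raw) {
    if (raw == nullptr) {
        return -1;
    }
    const std::string_view text(raw);
    const char* const last = text.data() + text.size();
    int value = 0;
    const auto [ptr, ec] = std::from_chars(text.data(), last, value);
    if (ec != std::errc{} || ptr != last) {
        return -1;
    }
    return value;
}

// Reads OCR_GPU_LAYERS from the environment.
int gpuLayersFromEnv() {
    return parseGpuLayers(std::getenv("OCR_GPU_LAYERS"));
}

// OCR_NO_GPU counts as set for any value, including an empty one.
bool gpuDisabledFromEnv() {
    return std::getenv("OCR_NO_GPU") != nullptr;
}
```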

An effort is made to clean up before exiting on CTRL-C.
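
A common pattern for this is a signal handler that only sets a flag, with the actual cleanup done by regular code that polls the flag. The sketch below shows that pattern with hypothetical names; it is not taken from the project, which may handle SIGINT differently.

```cpp
#include <csignal>

// Set from the signal handler; sig_atomic_t is safe to write there.
volatile std::sig_atomic_t stopRequested = 0;

// The handler only records the request. Cleanup happens elsewhere.
extern "C" void onInterrupt(int) {
    stopRequested = 1;
}

// Registers the handler for CTRL-C (SIGINT).
void installInterruptHandler() {
    std::signal(SIGINT, onInterrupt);
}

// Long-running work can poll this and stop early when it returns true.
bool shouldStop() {
    return stopRequested != 0;
}
```

Keeping the handler this small avoids calling functions that are not safe inside a signal handler, such as those that allocate memory or take locks.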

Using

Images can be sent with PUT to the REST endpoint /ocr with the following query parameters:

llm: 'true' for using the LLM, 'false' or absent to use Tesseract.
lang: e.g. 'lang=en', applies only to Tesseract. The matching language file must be available.
format: can be one of: 'text', 'json', 'hocr', 'tsv'. Ignored by the LLM option.
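
Mapping the format parameter to a typed value might look like this sketch. OutputFormat and parseFormat are hypothetical names. Treating an absent value as 'text' follows the examples, where a request without a format parameter returns plain text; returning an empty optional for unknown values is an assumption, since the service's handling of them is not described here.

```cpp
#include <optional>
#include <string_view>

enum class OutputFormat { Text, Json, Hocr, Tsv };

// Maps the value of the 'format' query parameter to an OutputFormat.
// An empty value stands for an absent parameter and selects plain text.
// Assumption: unknown values yield std::nullopt so the caller can reject them.
std::optional<OutputFormat> parseFormat(std::string_view value) {
    if (value.empty() || value == "text") {
        return OutputFormat::Text;
    }
    if (value == "json") {
        return OutputFormat::Json;
    }
    if (value == "hocr") {
        return OutputFormat::Hocr;
    }
    if (value == "tsv") {
        return OutputFormat::Tsv;
    }
    return std::nullopt;
}
```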

Just for fun, the header X-Prompt can be used to override the default prompt with for example "Could you describe the image for me?"

Examples

Recognize text in 'res/eng.png' with Tesseract and return it as plain text:

curl --request PUT --url 'http://localhost:8080/ocr?lang=en' \
--data-binary @res/eng.png

Recognize script in 'res/scribble.png' with llama and return it as plain text:

curl --request PUT --url 'http://localhost:8080/ocr?llm=true' \
--data-binary @res/scribble.png

Generate a thumbnail (JPEG):

curl --request PUT --url 'http://localhost:8080/thu' \
--data-binary @res/eng.png --output thumb.jpg

Container

Building a container image currently requires some manual work:

Copy the llama shared libraries into the build context, e.g.: mkdir -p llama && cp -r /usr/local/lib/llama/* llama/
Copy the model and multimodal projection to: models/model.gguf and models/mmproj.gguf
Run make image

To run the container:

docker run --privileged --rm -p 8080:8080 gitdode/ocr-cpp

Running the container in privileged mode is necessary to give access to the graphics device.

Documentation

To update the documentation in the directory doc, run:

make doc

Disassembly

For insights on what the compiler produces, like how many and which instructions are executed for a function, run

make disasm

and have a look at ocr.lst.

Clang/Eclipse CDT LSP Editor

Initially, the BMI for libstdc++ needs to be precompiled once:

make bmi

Precompiling the module interfaces *.cppm after they were modified is done the same way; for the changes to be reflected in the LSP editor, it is enough to edit an affected file.

TODO

Write more tests (always)
See issues