A simple web service to perform optical character recognition on images using Tesseract API with Leptonica
Torsten Römer 8377920a85 Allow setting basic GPU parameters, cleanup (#14)
- Add function to get envariable with default value incl. tests
- Allow setting 'mmproj_use_gpu' through OCR_NO_GPU with default true
- Allow setting 'gpu_layers' through OCR_GPU_LAYERS with default 99
  => reducing helps with insufficient memory
- Terminate if initialization (of the model) fails

Reviewed-on: https://codeberg.org/gitdode/ocr-cpp/pulls/14
2026-05-24 15:56:56 +02:00

ocr-cpp

About

OCR Service in C++.

Web service to perform optical character recognition on images using Tesseract API with Leptonica.

There is currently experimental support for optionally using a local LLM such as GLM OCR with llama.cpp to recognize, e.g., handwritten text, which Tesseract is not designed for.

The web service uses cpp-httplib, which provides multithreading depending on the number of available logical CPUs, and json to structure recognized text in a JSON array. Plain text, hOCR and TSV output are supported as well.

Since a Tesseract API instance cannot be used concurrently, one instance is created per thread and reused for all requests handled by that thread.
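The one-instance-per-thread idea can be sketched with `thread_local` storage. This is a minimal illustration and not necessarily how the project implements it: `Engine`, `thread_engine` and `simulate` are hypothetical names, and `Engine` is a stand-in for a real Tesseract API instance so that the example has no external dependencies.

```cpp
#include <atomic>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for an OCR engine that must not be shared between
// threads. The real service would hold a Tesseract API instance instead.
struct Engine {
    static inline std::atomic<int> created{0};
    Engine() { ++created; }  // count how many instances get constructed
    std::string recognize(const std::string& image) { return "text:" + image; }
};

// Returns the engine owned by the calling thread. It is constructed on the
// first call made by that thread and reused for every later call.
Engine& thread_engine() {
    thread_local Engine engine;
    return engine;
}

// Simulates `workers` threads, each handling `requests` requests, and returns
// how many engines were constructed while doing so.
int simulate(int workers, int requests) {
    const int before = Engine::created.load();
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([requests] {
            for (int r = 0; r < requests; ++r) {
                thread_engine().recognize("page" + std::to_string(r));
            }
        });
    }
    for (auto& t : pool) {
        t.join();
    }
    return Engine::created.load() - before;
}
```

Calling `simulate(4, 10)` handles 40 simulated requests but constructs only four engines, one per worker thread.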

LLM recognitions are possible only one at a time; concurrent requests using LLM are blocked until an ongoing process completes. Recognitions using Tesseract are however handled concurrently, also while the LLM is busy.
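Serializing LLM requests while leaving other work untouched could be done with a single mutex around the model call. The following is only a sketch under that assumption; `llm_mutex`, `recognize_with_llm` and `run_concurrent_requests` are hypothetical names, and the sleep stands in for the actual inference.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

std::mutex llm_mutex;            // guards the single (hypothetical) model context
std::atomic<int> active{0};      // LLM recognitions running right now
std::atomic<int> max_active{0};  // highest number ever observed running at once

// Stand-in for one LLM recognition. A caller blocks on the lock until any
// ongoing recognition has finished.
void recognize_with_llm() {
    std::lock_guard<std::mutex> lock(llm_mutex);
    const int now = ++active;
    // Record the peak concurrency so the effect of the lock can be checked.
    int seen = max_active.load();
    while (now > seen && !max_active.compare_exchange_weak(seen, now)) {
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(5));  // fake inference
    --active;
}

// Fires `count` requests from separate threads and returns the peak number
// of recognitions that ran at the same time.
int run_concurrent_requests(int count) {
    std::vector<std::thread> threads;
    for (int i = 0; i < count; ++i) {
        threads.emplace_back(recognize_with_llm);
    }
    for (auto& t : threads) {
        t.join();
    }
    return max_active.load();
}
```

Because only the LLM path takes this lock, request threads that use Tesseract would never wait on it.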

Images are converted to JPEG, so all image formats supported by libvips can be used for LLM recognitions, including multipage TIFF and PDF. If necessary, images are scaled down to a reasonable size to improve performance.
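The arithmetic behind scaling an image down while keeping its aspect ratio can be sketched as follows. The project delegates the actual conversion to libvips; `Size`, `fit_within` and the `max_side` limit are hypothetical and only illustrate the calculation, not the limit the service really applies.

```cpp
#include <algorithm>
#include <cmath>

struct Size {
    int width;
    int height;
};

// Shrinks `in` so that its longer side is at most `max_side` pixels while
// preserving the aspect ratio. Images that already fit are returned as-is,
// so nothing is ever enlarged.
Size fit_within(Size in, int max_side) {
    const int longest = std::max(in.width, in.height);
    if (longest <= max_side) {
        return in;
    }
    const double scale = static_cast<double>(max_side) / longest;
    return {std::max(1, static_cast<int>(std::lround(in.width * scale))),
            std::max(1, static_cast<int>(std::lround(in.height * scale)))};
}
```

For example, `fit_within({4000, 3000}, 2000)` yields 2000 x 1500, while an 800 x 600 image is left unchanged.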

As a little by-product, the service can also generate thumbnails very efficiently.

Limitations

  • LLM recognition currently only returns plain text/Markdown

Building

A C++ compiler supporting modules, Make and the dependent libraries are required to build and test the project. Building works fine on Debian 13 with the following toolchain and libraries installed from the Debian repositories with 'apt':

  • GNU Make
  • g++ 15.2.0
  • doxygen
  • catch2
  • libtesseract5
  • libleptonica6
  • libcpp-httplib0.41
  • nlohmann-json3
  • libicu78
  • libvips

The following command should install all that is needed to build the project:

sudo apt install build-essential libtesseract-dev libleptonica-dev \
libcpp-httplib-dev nlohmann-json3-dev libicu-dev libvips-dev \
doxygen catch2

Since this project currently uses the internal API of llama.cpp, it is probably easiest to download the llama.cpp source, build it and copy the needed headers to e.g. /usr/local/include/llama and the shared libraries to /usr/local/lib/llama.

Once all dependencies are satisfied the project can be built by running:

export LD_LIBRARY_PATH=/usr/local/lib/llama
make

This compiles the executable ocr.

Testing

To build and run the tests:

export LD_LIBRARY_PATH=/usr/local/lib/llama
make test

This compiles the executable ocr-test and runs it, which executes the actual tests.

Running

It might be necessary to install libtesseract5, the language files and other runtime libraries:

sudo apt install libtesseract5 tesseract-ocr-deu tesseract-ocr-eng \
tesseract-ocr-fra libleptonica6 libcpp-httplib0.18 libicu78 \
libvips42t64 libopenblas0 libvulkan1

The service can be started with, for example:

export LD_LIBRARY_PATH=/usr/local/lib/llama
./ocr 0.0.0.0 8080 /path/to/GLM-OCR-Q8_0.gguf /path/to/mmproj-GLM-OCR-Q8_0.gguf

Available parameters are:

  • bind address: e.g. 0.0.0.0 or a hostname/FQDN
  • http port: e.g. 8080
  • model path: the path to the model, e.g. /path/to/GLM-OCR-Q8_0.gguf
  • mmproj path: the path to the multimodal projection, e.g. /path/to/mmproj-GLM-OCR-Q8_0.gguf
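Collecting these settings at startup might look like the sketch below. It assumes that all four positional parameters are required, which the README does not state explicitly. It also reads OCR_GPU_LAYERS with the default of 99 mentioned in the latest commit message. `Config`, `env_or`, `parse_int` and `parse_config` are hypothetical names. OCR_NO_GPU is left out because its exact semantics are not described here.

```cpp
#include <cstdlib>
#include <limits>
#include <stdexcept>
#include <string>

// Startup settings gathered from the command line and the environment.
struct Config {
    std::string bind_address;
    int port;
    std::string model_path;
    std::string mmproj_path;
    int gpu_layers;
};

// Returns the value of environment variable `name`, or `fallback` if unset.
std::string env_or(const char* name, const std::string& fallback) {
    const char* value = std::getenv(name);
    return value != nullptr ? std::string(value) : fallback;
}

// Parses `text` as a whole decimal integer within [min, max]; throws
// std::invalid_argument on trailing characters or out-of-range values.
int parse_int(const std::string& text, int min, int max) {
    std::size_t used = 0;
    int value = 0;
    try {
        value = std::stoi(text, &used);
    } catch (const std::exception&) {
        throw std::invalid_argument("not a number: " + text);
    }
    if (used != text.size() || value < min || value > max) {
        throw std::invalid_argument("invalid value: " + text);
    }
    return value;
}

// Builds the configuration from argv (program name plus four parameters).
Config parse_config(int argc, const char* const argv[]) {
    if (argc != 5) {
        throw std::invalid_argument(
            "usage: ocr <bind address> <http port> <model path> <mmproj path>");
    }
    const int max_int = std::numeric_limits<int>::max();
    return Config{argv[1], parse_int(argv[2], 1, 65535), argv[3], argv[4],
                  parse_int(env_or("OCR_GPU_LAYERS", "99"), 0, max_int)};
}
```

With this sketch, starting the service with `OCR_GPU_LAYERS=20` in the environment would result in `gpu_layers == 20`, and a missing parameter is reported before any model is loaded.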

An effort is made to clean up before exiting on CTRL-C.
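A common way to react to CTRL-C is a signal handler that only sets a flag, with the actual cleanup done outside the handler. This is a sketch of that pattern, not the project's code; `stop_requested`, `on_signal`, `install_handlers` and `should_stop` are hypothetical names.

```cpp
#include <csignal>

// Written by the signal handler and read by normal code. A handler should do
// as little as possible, so it only sets this flag.
volatile std::sig_atomic_t stop_requested = 0;

extern "C" void on_signal(int) {
    stop_requested = 1;
}

// Installs the handler for SIGINT (CTRL-C) and SIGTERM.
void install_handlers() {
    std::signal(SIGINT, on_signal);
    std::signal(SIGTERM, on_signal);
}

// Polled by the main thread, which then stops the server and releases the
// OCR engines and the model before exiting.
bool should_stop() {
    return stop_requested != 0;
}
```

Keeping the handler this small avoids calling functions that are not safe to use in a signal context.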

Using

Images can be sent with PUT to the REST endpoint /ocr with the following query parameters:

  • llm: 'true' for using the LLM, 'false' or absent to use Tesseract.
  • lang: e.g. 'lang=en', applies only to Tesseract. The matching language file must be available.
  • format: can be one of: 'text', 'json', 'hocr', 'tsv'. Ignored by the LLM option.
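How these parameters could be turned into typed options is sketched below. The names `Format`, `Options`, `parse_format` and `parse_options` are hypothetical, and the query string is assumed to be already split into a key/value map. The default of plain text follows the first example in this README; leaving `lang` empty when it is absent and rejecting unknown formats with an exception are assumptions.

```cpp
#include <map>
#include <stdexcept>
#include <string>

enum class Format { Text, Json, Hocr, Tsv };

struct Options {
    bool use_llm = false;          // 'llm=true' selects the LLM
    std::string lang;              // empty if not given (assumption)
    Format format = Format::Text;  // plain text unless requested otherwise
};

// Maps the value of the 'format' parameter to the enum.
Format parse_format(const std::string& value) {
    if (value == "text") return Format::Text;
    if (value == "json") return Format::Json;
    if (value == "hocr") return Format::Hocr;
    if (value == "tsv") return Format::Tsv;
    throw std::invalid_argument("unsupported format: " + value);
}

// Builds the options from already decoded query parameters.
Options parse_options(const std::map<std::string, std::string>& query) {
    Options options;
    if (auto it = query.find("llm"); it != query.end()) {
        options.use_llm = (it->second == "true");
    }
    if (auto it = query.find("lang"); it != query.end()) {
        options.lang = it->second;
    }
    // 'format' is ignored by the LLM option, so it is only parsed for Tesseract.
    if (auto it = query.find("format"); it != query.end() && !options.use_llm) {
        options.format = parse_format(it->second);
    }
    return options;
}
```

A request such as `/ocr?lang=en&format=tsv` would then yield Tesseract with TSV output, while `/ocr?llm=true&format=tsv` selects the LLM and leaves the format at its default.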

Just for fun, the header X-Prompt can be used to override the default prompt, for example with "Could you describe the image for me?"

Examples

Recognize text in 'res/eng.png' with Tesseract and return it as plain text:

curl --request PUT --url 'http://localhost:8080/ocr?lang=en' \
--data-binary @res/eng.png

Recognize script in 'res/scribble.png' with llama and return it as plain text:

curl --request PUT --url 'http://localhost:8080/ocr?llm=true' \
--data-binary @res/scribble.png

Generate a thumbnail (JPEG):

curl --request PUT --url 'http://localhost:8080/thu' \
--data-binary @res/eng.png --output thumb.jpg

Container

Building a container image currently requires some manual work:

  • Copy the llama shared libraries into the build context, e.g.: mkdir -p llama && cp -r /usr/local/lib/llama/* llama/
  • Copy the model and multimodal projection to: models/model.gguf and models/mmproj.gguf
  • Run make image

To run the container:

docker run --privileged --rm -p 8080:8080 gitdode/ocr-cpp

Running the container in privileged mode is necessary to give access to the graphics device.

Documentation

To update the documentation in the directory doc, run:

make doc

Disassembly

For insights into what the compiler produces, such as how many and which instructions are executed for a function, run:

make disasm

and have a look at ocr.lst.

Clang/Eclipse CDT LSP Editor

The BMI for libstdc++ needs to be precompiled once initially, for example:

make bmi

Precompiling the module interfaces *.cppm after they were modified is done the same way. For the changes to be reflected in the LSP editor, it is enough to edit an affected file.

TODO

  • Write more tests (always)
  • See issues