I'm using Tesseract with gscan2pdf under Linux.
When I run OCR on a document, Tesseract's recognition is quite good; however, it puts a line break after almost every word, which is very annoying!
- Scanned document in gscan2pdf:

- Text recognition in gscan2pdf:

- Generated PDF opened in Okular, with the selected text copied:

- Pasted text:
Conformément
à l’article
12 du
Règlement du Fonds, le Fonds a procédé à sa deuxième
distribution.
Ce deuxième
remboursement
de capital
s'élève
à
un
montant
de
5.74
€
par part,
soit
5,74 % du nominal
investi.
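A stopgap would be to clean up the pasted text afterwards, although that does not fix the text layer in the PDF itself. Below is a minimal sketch in Python, assuming the unwanted breaks are single newlines and that real paragraph breaks show up as blank lines. The function name `unwrap_lines` is hypothetical, and this is only an illustration of the output I would expect, not a fix for Tesseract or gscan2pdf:

```python
import re


def unwrap_lines(text: str) -> str:
    """Join lines separated by a single newline; keep blank-line paragraph breaks.

    Assumption: a blank line marks a genuine paragraph break, while a single
    newline is an unwanted break introduced by the OCR text layer.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # str.split() with no argument splits on any run of whitespace,
    # so newlines inside a paragraph collapse into single spaces.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


sample = "Conformément\nà l’article\n12 du\nRèglement du Fonds"
print(unwrap_lines(sample))
# prints: Conformément à l’article 12 du Règlement du Fonds
```

This only repairs text after copying, so I would still like to know how to get a correct text layer in the PDF in the first place.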
Here is some information about my system:
pascal@pascal-Latitude-5580:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 24.04.3 LTS
Release: 24.04
Codename: noble
pascal@pascal-Latitude-5580:~$ tesseract --version
tesseract 5.3.4
leptonica-1.82.0
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.1.5) : libpng 1.6.43 : libtiff 4.5.1 : zlib 1.3 : libwebp 1.3.2 : libopenjp2 2.5.0
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found OpenMP 201511
Found libarchive 3.7.2 zlib/1.3 liblzma/5.4.5 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.5
Found libcurl/8.5.0 OpenSSL/3.0.13 zlib/1.3 brotli/1.1.0 zstd/1.5.5 libidn2/2.3.7 libpsl/0.21.2 (+libidn2/2.3.7) libssh/0.10.6/openssl/zlib nghttp2/1.59.0 librtmp/2.3 OpenLDAP/2.6.7
Any ideas?
Thanks a lot in advance for your help.