Support for the ALTO option in Tesseract for Linux


Tommy Klausen

Aug 8, 2019, 9:29:05 AM
to tesseract-ocr
Hi.

Is the ALTO config option supported in the latest Linux version of Tesseract?
I have managed to get hOCR output but not ALTO.
Is there something I need to do with the config files?

Tommy

Shree Devi Kumar

Aug 8, 2019, 9:51:27 AM
to tesseract-ocr

You can use the `alto` config file, or set the config variable as part of the command:

-c tessedit_create_alto=1 

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-oc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/de13eba7-8b6f-47bc-b1a7-981bc87e1ed5%40googlegroups.com.



Tommy Klausen

Aug 8, 2019, 10:04:23 AM
to tesseract-ocr
Ok.

So if a config file for alto exists (for some reason it didn't in my install), I can just write the command with "alto" at the end, right?

Can you give me the two different commands for reading an image (with and without the config file)?


Tommy Klausen

Aug 8, 2019, 10:52:52 AM
to tesseract-ocr
Take a look at the attached file.

How can I implement ALTO in it, and what will the command look like in the terminal?

Tommy
ocr.py

Shree Devi Kumar

Aug 8, 2019, 2:44:52 PM
to tesseract-ocr
Yes, it works the same way as hocr: you can write it at the end of the command.

Examples:

With the `alto` config file:

tesseract phototest.tif - alto

Without the config file, using the config variable:

tesseract phototest.tif - -c tessedit_create_alto=1


<?xml version="1.0" encoding="UTF-8"?>
<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/standards/alto/ns-v3# http://www.loc.gov/alto/v3/alto-3-0.xsd">
        <Description>
                <MeasurementUnit>pixel</MeasurementUnit>
                <sourceImageInformation>
                        <fileName>                      </fileName>
                </sourceImageInformation>
                <OCRProcessing ID="OCR_0">
                        <ocrProcessingStep>
                                <processingSoftware>
                                        <softwareName>tesseract 5.0.0-alpha-332-gb839</softwareName>
                                </processingSoftware>
                        </ocrProcessingStep>
                </OCRProcessing>
        </Description>
        <Layout>
Page 1
                <Page WIDTH="640" HEIGHT="480" PHYSICAL_IMG_NR="0" ID="page_0">
                        <PrintSpace HPOS="0" VPOS="0" WIDTH="640" HEIGHT="480">
                                <TextBlock ID="block_0" HPOS="36" VPOS="92" WIDTH="582" HEIGHT="269">
                                        <TextLine ID="line_0" HPOS="36" VPOS="92" WIDTH="544" HEIGHT="30">
                                                <String ID="string_0" HPOS="36" VPOS="92" WIDTH="60" HEIGHT="24" WC="0.87" CONTENT="This"/><SP WIDTH="13" VPOS="92" HPOS="96"/>
                                                <String ID="string_1" HPOS="109" VPOS="92" WIDTH="20" HEIGHT="24" WC="0.87" CONTENT="is"/><SP WIDTH="12" VPOS="92" HPOS="129"/>
                                                <String ID="string_2" HPOS="141" VPOS="98" WIDTH="15" HEIGHT="18" WC="0.86" CONTENT="a"/><SP WIDTH="13" VPOS="98" HPOS="156"/>
                                                <String ID="string_3" HPOS="169" VPOS="92" WIDTH="32" HEIGHT="24" WC="0.86" CONTENT="lot"/><SP WIDTH="11" VPOS="92" HPOS="201"/>
                                                <String ID="string_4" HPOS="212" VPOS="92" WIDTH="28" HEIGHT="24" WC="0.92" CONTENT="of"/><SP WIDTH="11" VPOS="92" HPOS="240"/>
                                                <String ID="string_5" HPOS="251" VPOS="92" WIDTH="31" HEIGHT="24" WC="0.93" CONTENT="12"/><SP WIDTH="14" VPOS="92" HPOS="282"/>
                                                <String ID="string_6" HPOS="296" VPOS="92" WIDTH="68" HEIGHT="30" WC="0.92" CONTENT="point"/><SP WIDTH="10" VPOS="92" HPOS="364"/>
                                                <String ID="string_7" HPOS="374" VPOS="93" WIDTH="53" HEIGHT="23" WC="0.92" CONTENT="text"/><SP WIDTH="10" VPOS="93" HPOS="427"/>
                                                <String ID="string_8" HPOS="437" VPOS="93" WIDTH="26" HEIGHT="23" WC="0.93" CONTENT="to"/><SP WIDTH="11" VPOS="93" HPOS="463"/>
                                                <String ID="string_9" HPOS="474" VPOS="93" WIDTH="52" HEIGHT="23" WC="0.92" CONTENT="test"/><SP WIDTH="10" VPOS="93" HPOS="526"/>
                                                <String ID="string_10" HPOS="536" VPOS="92" WIDTH="44" HEIGHT="24" WC="0.92" CONTENT="the"/>
                                        </TextLine>
                                        <TextLine ID="line_1" HPOS="36" VPOS="126" WIDTH="582" HEIGHT="31">
                                                <String ID="string_11" HPOS="36" VPOS="132" WIDTH="45" HEIGHT="18" WC="0.91" CONTENT="ocr"/><SP WIDTH="10" VPOS="132" HPOS="81"/>
                                                <String ID="string_12" HPOS="91" VPOS="126" WIDTH="69" HEIGHT="24" WC="0.91" CONTENT="code"/><SP WIDTH="12" VPOS="126" HPOS="160"/>
                                                <String ID="string_13" HPOS="172" VPOS="126" WIDTH="51" HEIGHT="24" WC="0.90" CONTENT="and"/><SP WIDTH="13" VPOS="126" HPOS="223"/>
                                                <String ID="string_14" HPOS="236" VPOS="132" WIDTH="50" HEIGHT="18" WC="0.88" CONTENT="see"/><SP WIDTH="13" VPOS="132" HPOS="286"/>
                                                <String ID="string_15" HPOS="299" VPOS="126" WIDTH="15" HEIGHT="24" WC="0.84" CONTENT="if"/><SP WIDTH="11" VPOS="126" HPOS="314"/>
                                                <String ID="string_16" HPOS="325" VPOS="126" WIDTH="14" HEIGHT="24" WC="0.93" CONTENT="it"/><SP WIDTH="9" VPOS="126" HPOS="339"/>
                                                <String ID="string_17" HPOS="348" VPOS="126" WIDTH="85" HEIGHT="24" WC="0.91" CONTENT="works"/><SP WIDTH="12" VPOS="126" HPOS="433"/>
                                                <String ID="string_18" HPOS="445" VPOS="132" WIDTH="33" HEIGHT="18" WC="0.40" CONTENT="on"/><SP WIDTH="22" VPOS="132" HPOS="478"/>
                                                <String ID="string_19" HPOS="500" VPOS="126" WIDTH="29" HEIGHT="24" WC="0.40" CONTENT="all"/><SP WIDTH="12" VPOS="126" HPOS="529"/>
                                                <String ID="string_20" HPOS="541" VPOS="127" WIDTH="77" HEIGHT="30" WC="0.92" CONTENT="types"/>
                                        </TextLine>
                                        <TextLine ID="line_2" HPOS="36" VPOS="160" WIDTH="187" HEIGHT="24">
                                                <String ID="string_21" HPOS="36" VPOS="160" WIDTH="28" HEIGHT="24" WC="0.92" CONTENT="of"/><SP WIDTH="8" VPOS="160" HPOS="64"/>
                                                <String ID="string_22" HPOS="72" VPOS="160" WIDTH="41" HEIGHT="24" WC="0.89" CONTENT="file"/><SP WIDTH="10" VPOS="160" HPOS="113"/>
                                                <String ID="string_23" HPOS="123" VPOS="160" WIDTH="100" HEIGHT="24" WC="0.91" CONTENT="format."/>
                                        </TextLine>
                                        <TextLine ID="line_3" HPOS="36" VPOS="194" WIDTH="549" HEIGHT="31">
                                                <String ID="string_24" HPOS="36" VPOS="194" WIDTH="55" HEIGHT="24" WC="0.88" CONTENT="The"/><SP WIDTH="11" VPOS="194" HPOS="91"/>
                                                <String ID="string_25" HPOS="102" VPOS="194" WIDTH="75" HEIGHT="30" WC="0.90" CONTENT="quick"/><SP WIDTH="12" VPOS="194" HPOS="177"/>
                                                <String ID="string_26" HPOS="189" VPOS="194" WIDTH="85" HEIGHT="24" WC="0.90" CONTENT="brown"/><SP WIDTH="13" VPOS="194" HPOS="274"/>
                                                <String ID="string_27" HPOS="287" VPOS="194" WIDTH="52" HEIGHT="31" WC="0.91" CONTENT="dog"/><SP WIDTH="9" VPOS="194" HPOS="339"/>
                                                <String ID="string_28" HPOS="348" VPOS="194" WIDTH="108" HEIGHT="31" WC="0.91" CONTENT="jumped"/><SP WIDTH="12" VPOS="194" HPOS="456"/>
                                                <String ID="string_29" HPOS="468" VPOS="200" WIDTH="63" HEIGHT="18" WC="0.91" CONTENT="over"/><SP WIDTH="9" VPOS="200" HPOS="531"/>
                                                <String ID="string_30" HPOS="540" VPOS="194" WIDTH="45" HEIGHT="24" WC="0.93" CONTENT="the"/>
                                        </TextLine>
                                        <TextLine ID="line_4" HPOS="37" VPOS="228" WIDTH="548" HEIGHT="31">
                                                <String ID="string_31" HPOS="37" VPOS="228" WIDTH="55" HEIGHT="31" WC="0.91" CONTENT="lazy"/><SP WIDTH="11" VPOS="228" HPOS="92"/>
                                                <String ID="string_32" HPOS="103" VPOS="228" WIDTH="50" HEIGHT="24" WC="0.92" CONTENT="fox."/><SP WIDTH="12" VPOS="228" HPOS="153"/>
                                                <String ID="string_33" HPOS="165" VPOS="228" WIDTH="55" HEIGHT="24" WC="0.92" CONTENT="The"/><SP WIDTH="12" VPOS="228" HPOS="220"/>
                                                <String ID="string_34" HPOS="232" VPOS="228" WIDTH="75" HEIGHT="30" WC="0.90" CONTENT="quick"/><SP WIDTH="12" VPOS="228" HPOS="307"/>
                                                <String ID="string_35" HPOS="319" VPOS="228" WIDTH="85" HEIGHT="24" WC="0.90" CONTENT="brown"/><SP WIDTH="13" VPOS="228" HPOS="404"/>
                                                <String ID="string_36" HPOS="417" VPOS="228" WIDTH="51" HEIGHT="31" WC="0.91" CONTENT="dog"/><SP WIDTH="10" VPOS="228" HPOS="468"/>
                                                <String ID="string_37" HPOS="478" VPOS="228" WIDTH="107" HEIGHT="31" WC="0.91" CONTENT="jumped"/>
                                        </TextLine>
                                        <TextLine ID="line_5" HPOS="36" VPOS="262" WIDTH="561" HEIGHT="31">
                                                <String ID="string_38" HPOS="36" VPOS="268" WIDTH="63" HEIGHT="18" WC="0.91" CONTENT="over"/><SP WIDTH="10" VPOS="268" HPOS="99"/>
                                                <String ID="string_39" HPOS="109" VPOS="262" WIDTH="44" HEIGHT="24" WC="0.91" CONTENT="the"/><SP WIDTH="12" VPOS="262" HPOS="153"/>
                                                <String ID="string_40" HPOS="165" VPOS="262" WIDTH="56" HEIGHT="31" WC="0.91" CONTENT="lazy"/><SP WIDTH="10" VPOS="262" HPOS="221"/>
                                                <String ID="string_41" HPOS="231" VPOS="262" WIDTH="50" HEIGHT="24" WC="0.92" CONTENT="fox."/><SP WIDTH="13" VPOS="262" HPOS="281"/>
                                                <String ID="string_42" HPOS="294" VPOS="262" WIDTH="55" HEIGHT="24" WC="0.91" CONTENT="The"/><SP WIDTH="11" VPOS="262" HPOS="349"/>
                                                <String ID="string_43" HPOS="360" VPOS="262" WIDTH="75" HEIGHT="30" WC="0.91" CONTENT="quick"/><SP WIDTH="12" VPOS="262" HPOS="435"/>
                                                <String ID="string_44" HPOS="447" VPOS="262" WIDTH="85" HEIGHT="24" WC="0.91" CONTENT="brown"/><SP WIDTH="13" VPOS="262" HPOS="532"/>
                                                <String ID="string_45" HPOS="545" VPOS="262" WIDTH="52" HEIGHT="31" WC="0.91" CONTENT="dog"/>
                                        </TextLine>
                                        <TextLine ID="line_6" HPOS="43" VPOS="296" WIDTH="518" HEIGHT="31">
                                                <String ID="string_46" HPOS="43" VPOS="296" WIDTH="107" HEIGHT="31" WC="0.90" CONTENT="jumped"/><SP WIDTH="12" VPOS="296" HPOS="150"/>
                                                <String ID="string_47" HPOS="162" VPOS="302" WIDTH="64" HEIGHT="18" WC="0.92" CONTENT="over"/><SP WIDTH="9" VPOS="302" HPOS="226"/>
                                                <String ID="string_48" HPOS="235" VPOS="296" WIDTH="44" HEIGHT="24" WC="0.92" CONTENT="the"/><SP WIDTH="13" VPOS="296" HPOS="279"/>
                                                <String ID="string_49" HPOS="292" VPOS="296" WIDTH="55" HEIGHT="31" WC="0.91" CONTENT="lazy"/><SP WIDTH="10" VPOS="296" HPOS="347"/>
                                                <String ID="string_50" HPOS="357" VPOS="296" WIDTH="50" HEIGHT="24" WC="0.92" CONTENT="fox."/><SP WIDTH="13" VPOS="296" HPOS="407"/>
                                                <String ID="string_51" HPOS="420" VPOS="296" WIDTH="55" HEIGHT="24" WC="0.91" CONTENT="The"/><SP WIDTH="11" VPOS="296" HPOS="475"/>
                                                <String ID="string_52" HPOS="486" VPOS="296" WIDTH="75" HEIGHT="30" WC="0.91" CONTENT="quick"/>
                                        </TextLine>
                                        <TextLine ID="line_7" HPOS="37" VPOS="330" WIDTH="524" HEIGHT="31">
                                                <String ID="string_53" HPOS="37" VPOS="330" WIDTH="85" HEIGHT="24" WC="0.92" CONTENT="brown"/><SP WIDTH="13" VPOS="330" HPOS="122"/>
                                                <String ID="string_54" HPOS="135" VPOS="330" WIDTH="52" HEIGHT="31" WC="0.92" CONTENT="dog"/><SP WIDTH="9" VPOS="330" HPOS="187"/>
                                                <String ID="string_55" HPOS="196" VPOS="330" WIDTH="108" HEIGHT="31" WC="0.91" CONTENT="jumped"/><SP WIDTH="12" VPOS="330" HPOS="304"/>
                                                <String ID="string_56" HPOS="316" VPOS="336" WIDTH="63" HEIGHT="18" WC="0.91" CONTENT="over"/><SP WIDTH="9" VPOS="336" HPOS="379"/>
                                                <String ID="string_57" HPOS="388" VPOS="330" WIDTH="45" HEIGHT="24" WC="0.92" CONTENT="the"/><SP WIDTH="12" VPOS="330" HPOS="433"/>
                                                <String ID="string_58" HPOS="445" VPOS="330" WIDTH="55" HEIGHT="31" WC="0.92" CONTENT="lazy"/><SP WIDTH="11" VPOS="330" HPOS="500"/>
                                                <String ID="string_59" HPOS="511" VPOS="330" WIDTH="50" HEIGHT="24" WC="0.93" CONTENT="fox."/>
                                        </TextLine>
                                </TextBlock>
                        </PrintSpace>
                </Page>
        </Layout>
</alto>
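
If you want the recognised text back out of output like the above, here is a minimal Python sketch using only the standard library. The function name `alto_to_lines` is my own, and the namespace is simply the one shown in this sample output; another Tesseract build may emit a different ALTO namespace, so treat that as an assumption to check.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the sample output in this thread (assumption: your
# Tesseract build writes the same ALTO v3 namespace).
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v3#"}


def alto_to_lines(alto_xml):
    """Return the recognised text, one string per TextLine element."""
    root = ET.fromstring(alto_xml)
    lines = []
    for text_line in root.iterfind(".//alto:TextLine", ALTO_NS):
        # Each word is a String element; its text is in the CONTENT attribute.
        words = [
            string.get("CONTENT", "")
            for string in text_line.iterfind("alto:String", ALTO_NS)
        ]
        lines.append(" ".join(words))
    return lines
```

The SP elements are skipped and words are joined with a single space, so exact spacing widths are not preserved; the HPOS/VPOS/WIDTH/HEIGHT attributes are available on each String if you need positions.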




Tommy Klausen

Aug 8, 2019, 9:03:15 PM
to tesseract-ocr
Thank you.
Did you see my attached file above?

Tommy

shree

Aug 9, 2019, 2:41:48 AM
to tesseract-ocr
I hope other members who use tesseract with python will provide the needed guidance.
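
In the meantime, a rough sketch of how the two commands from this thread could be called from a Python script with the standard `subprocess` module. I have not seen the contents of ocr.py, so this is not based on it; the function names `build_alto_command` and `image_to_alto` are made up for illustration, and it assumes the `tesseract` executable is on your PATH.

```python
import subprocess


def build_alto_command(image_path, use_config_file=True):
    """Return the tesseract argument list for ALTO output written to stdout."""
    # "-" as the output base sends the result to stdout, as in the
    # command-line examples given earlier in this thread.
    command = ["tesseract", image_path, "-"]
    if use_config_file:
        # Relies on an `alto` config file being available in the install.
        command.append("alto")
    else:
        # Sets the variable directly, so no config file is needed.
        command += ["-c", "tessedit_create_alto=1"]
    return command


def image_to_alto(image_path, use_config_file=True):
    """Run tesseract on one image and return the ALTO XML as a string."""
    result = subprocess.run(
        build_alto_command(image_path, use_config_file),
        capture_output=True,  # needs Python 3.7 or newer
        encoding="utf-8",
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout
```

If the `alto` config file is missing from the install, as mentioned above, calling `image_to_alto("phototest.tif", use_config_file=False)` should avoid depending on it.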

Tommy Klausen

Aug 18, 2019, 6:25:25 PM
to tesseract-ocr
I hope that happens soon, because I am in a bit of a hurry.
Am I in the right forum for this question?

Tommy
