"expansion did not yield any files" assertion

carolin

Aug 24, 2012, 6:12:36 AM

to ocr...@googlegroups.com

Hi,

We installed OCRopus 0.6 on a virtual Ubuntu 12.04 machine according to the instructions on your web page and got the following assertion failure on one of our files when using the ocropus-recognize-book script:

carolin@carolin-VirtualBox:~/ocrinput$ ocropus-recognize-book testbild2.png -o output.html
book directory ./_book-003266
testbild2.png -> ./_book-003266/0001.png

=== preprocess

# ocropus-nlbin ./_book-003266/????.png
=== ./_book-003266/0001.png 1
flattening
estimating skew angle
estimating thresholds
rescaling
./_book-003266/0001.png lo-hi (0.05 1.00) angle 0.0
writing

=== page segmentation

# ocropus-gpageseg ./_book-003266/????.bin.png
./_book-003266/0001.bin.png
./_book-003266/0001.bin.png: scale (6.9282) less than --minscale; skipping

=== line recognition

# ocropus-lattices ./_book-003266/????/??????.bin.png
Traceback (most recent call last):
  File "/usr/local/bin/ocropus-lattices", line 56, in <module>
    args.files = ocrolib.glob_all(args.files)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/toplevel.py", line 204, in argument_checks
    result = f(*args,**kw)
  File "/usr/local/lib/python2.7/dist-packages/ocrolib/common.py", line 509, in glob_all
    raise Exception("%s: expansion did not yield any files"%arg)
Exception: ./_book-003266/????/??????.bin.png: expansion did not yield any files
Traceback (most recent call last):
  File "/usr/local/bin/ocropus-recognize-book", line 79, in <module>
    run(args.linerec,book+"/????/??????.bin.png",m=args.model)
  File "/usr/local/bin/ocropus-recognize-book", line 57, in run
    assert subprocess.call(args)==0
AssertionError

We attached the sample file which we used. Is there any obvious thing that we are doing wrong?

Best,

Carolin

testbild2.png

rs_nuke

Oct 3, 2012, 11:23:18 AM

to ocr...@googlegroups.com

What does "scale (6.9282) less than --minscale; skipping" really mean? Is it the resolution of the image, or the DPI?

I have this issue too and am curious to know what causes it.

Tom

Oct 4, 2012, 2:49:10 AM

to ocr...@googlegroups.com

ocropus-gpageseg is telling you that the characters in the image are too small: they are smaller than 10pt characters at 300dpi.

As a result, it doesn't find any text lines. And that's why the next step doesn't have any lines to work on.
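
The failure chain can be reproduced in miniature. The following is only a sketch of the behavior visible in the traceback above, not ocrolib's actual implementation: `expand_or_fail` is a hypothetical stand-in for `glob_all`, and the directory layout is an assumption based on the pattern `????/??????.bin.png` in the log. When page segmentation skips a page, no line images exist under a page directory, so the pattern matches nothing.

```python
import glob
import os
import tempfile


def expand_or_fail(pattern):
    """Expand a shell-style pattern and fail loudly if nothing matches.

    Hypothetical stand-in for ocrolib.glob_all, written only to mirror
    the error message seen in the traceback.
    """
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise Exception("%s: expansion did not yield any files" % pattern)
    return matches


if __name__ == "__main__":
    book = tempfile.mkdtemp(prefix="_book-")
    # Binarization wrote the page image, but segmentation skipped the page,
    # so no per-page directory with line images was created (assumed layout).
    open(os.path.join(book, "0001.bin.png"), "w").close()
    pattern = os.path.join(book, "????", "??????.bin.png")
    try:
        expand_or_fail(pattern)
    except Exception as exc:
        print(exc)
```

In other words, the exception in ocropus-lattices is a symptom; the real problem is reported one step earlier, in the "skipping" line from ocropus-gpageseg.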

OCRopus 0.6 works down to about 10pt at 300dpi. OCRopus 0.7 will work at lower resolutions.

If you want to test with generated text, be sure to generate reasonably sized text or scale up the image.
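
To judge how much to scale up, one can compare the reported scale with the threshold. A rough sketch follows. The `min_scale` value of 12.0 is an assumption for illustration only (check the `--minscale` setting of your ocropus-gpageseg version), `required_zoom` is a hypothetical helper that is not part of OCRopus, and the calculation assumes the estimated scale grows linearly with the image size.

```python
def required_zoom(estimated_scale, min_scale=12.0):
    """Return the factor by which to enlarge the image so that the
    estimated scale reaches min_scale (1.0 if it is already large enough).

    min_scale=12.0 is an assumed threshold, not a confirmed default.
    """
    if estimated_scale <= 0:
        raise ValueError("estimated_scale must be positive")
    return max(1.0, float(min_scale) / estimated_scale)


if __name__ == "__main__":
    # 6.9282 is the scale reported in the log at the top of this thread.
    print("zoom by at least %.2fx" % required_zoom(6.9282))
```

The image itself can then be enlarged by that factor with any image tool before running ocropus-recognize-book again.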

Tom

Oct 4, 2012, 2:53:12 AM

to ocr...@googlegroups.com

Yes, basically. The "scale" is the estimated size of the character x-height in pixels, and yours is too small. Your characters are set in a 3pt font on a 300dpi page.
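
The point-size figures can be checked with a little arithmetic. One point is 1/72 inch, so a font of p points at d dpi has a nominal size of p * d / 72 pixels. The x-height is some fraction of that; the ratio of 0.5 used below is an assumed, typeface-dependent value, so the results are only approximate, and the function names are illustrative.

```python
def font_size_px(points, dpi):
    """Nominal font size in pixels: one point is 1/72 inch."""
    return points * dpi / 72.0


def x_height_px(points, dpi, x_height_ratio=0.5):
    """Approximate x-height in pixels.

    x_height_ratio=0.5 is an assumption; the real ratio depends on the typeface.
    """
    return font_size_px(points, dpi) * x_height_ratio


if __name__ == "__main__":
    for pt in (3, 10):
        print("%dpt at 300dpi: x-height about %.2f px" % (pt, x_height_px(pt, 300)))
```

Under these assumptions a 3pt font gives an x-height of about 6 pixels, close to the reported scale of 6.9282, while a 10pt font gives about 21 pixels.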

Tom

Sriranga(78yrsold)

Oct 8, 2012, 5:43:18 AM

to Tom

Tom,
Would it not be possible to move data to swap programmatically once the data held in RAM exceeds 1GB, or to move data from RAM to swap every second so that RAM is freed for the next processing step? I presume this could be done with suitable changes to the relevant Python source code.
In my view, the majority of users may not have a high-end computer with more than 4GB of RAM, which will discourage them from using the OCRopus project. I searched for such a computer but none was available; it all depends on the motherboard.

If you feel that the OCRopus project is meant only for large-scale machine learning applied to problems in document analysis, and NOT for small-scale OCR of any language by ordinary users, kindly confirm.
With warmest regards,
-sriranga(79yrs)

On Fri, Oct 5, 2012 at 6:42 PM, Sriranga(78yrsold) <withbl...@gmail.com> wrote:

Tom,
Since you indicated that training requires a lot of memory, I am wondering whether it is feasible to increase the swap area to 20GB in addition to the existing 4GB of RAM, which is the maximum for my Dell machine. However, I find that the program does not use the swap area effectively; I guess suitable coding is required for that, which would ultimately solve the memory error problem. At present the swap area is 16GB.
Alternatively, can an external or internal HDD be used as combined RAM+swap?
With Warmest Regards,
-sriranga(79yrs)

On Fri, Oct 5, 2012 at 2:48 PM, Tom <tmb...@gmail.com> wrote:

Well, I'm sorry I couldn't help you more. I am not aware of any memory leaks, but as I indicated, training requires a lot of memory. We usually use 8-16GB of memory for training.

Tom

On Oct 4, 2012 9:25 AM, "Sriranga(78yrsold)" <withbl...@gmail.com> wrote:

My computer is a Dell OptiPlex 330 (system type: x86-based PC) with an Intel(R) Core(TM) Duo CPU E7...@2.53GHz 2.53GHz processor and 4GB of RAM (recently upgraded). It runs Windows XP with SP3 as well as Ubuntu 12.04 (32-bit) and 12.10, both with 16GB of swap. It appears that my machine cannot support the OCRopus project, which requires a higher-capacity computer.

After several experiments with the Python programs of OCRopus 0.6, I observe that a memory error is displayed very often. I suspect there is a memory leak in the relevant Python source code of 0.6. Moreover, some of the Python programs do not support Kannada script except during run-test and run-box-training. Since I am neither a Python programmer nor a developer, I find it difficult to pursue the Kannada OCR project under OCRopus, as I have received support from no one except you, and you are too busy with the release of the next version.
As such, I am frustrated with the Kannada OCR project under OCRopus and have decided to discontinue it.
I am always ready to furnish Kannada text and TIFF or PNG images with box files generated in tesseract-ocr to any Python programmer or developer for research purposes at any time, and also to undertake beta testing and feedback on Python programs for Kannada at any stage, if I receive any from you.

Thanks for the help rendered to me from time to time.
With warmest Regards,
-sriranga(79yrs)

--
You received this message because you are subscribed to the Google Groups "ocropus" group.
To post to this group, send email to ocr...@googlegroups.com.
To unsubscribe from this group, send email to ocropus+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msg/ocropus/-/c_q4jBZOA8UJ.

For more options, visit https://groups.google.com/groups/opt_out.

Tom

Oct 15, 2012, 1:18:56 AM

to ocr...@googlegroups.com, Tom

> Would it not be possible to move data to swap programmatically once the data held in RAM exceeds 1GB, or to move data from RAM to swap every second so that RAM is freed for the next processing step? I presume this could be done with suitable changes to the relevant Python source code. [...]

> In my view, the majority of users may not have a high-end computer with more than 4GB of RAM, which will discourage them from using the OCRopus project.

Users will generally only use OCRopus for recognizing text, and that works fine in 4GB of memory.

It is training that requires large amounts of memory, but users don't usually do that.

The motivation for sharing open source projects is so that people contribute source code to the project. The box training code that you are having problems with is indeed very memory inefficient. Since that seems to be a problem for you, I encourage you to improve the Python code and contribute the improvements.
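
One technical point bears on the swap question. The machine described earlier in the thread runs 32-bit Ubuntu, and a 32-bit process can address at most 4 GiB of virtual memory (less in practice, since the kernel reserves part of that range), however much swap is configured. A small sketch for checking this on a given Python interpreter, using only the standard library; the helper names are illustrative:

```python
import struct


def pointer_bits():
    """Width of a native pointer in bits for the running interpreter."""
    return struct.calcsize("P") * 8


def address_space_gib(bits):
    """Upper bound on addressable virtual memory, in GiB, for a pointer width."""
    return (2 ** bits) / float(2 ** 30)


if __name__ == "__main__":
    bits = pointer_bits()
    print("%d-bit interpreter, at most %.0f GiB addressable"
          % (bits, address_space_gib(bits)))
```

If this reports a 32-bit interpreter, adding more swap will not let a single training process grow beyond that limit; a 64-bit system would be needed.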

Tom

Sriranga(78yrsold)

Oct 15, 2012, 5:18:51 AM

to ocr...@googlegroups.com, T.M. Breuel

Tom,
Thanks for the prompt clarification. My replies are inline below for your perusal and further guidance, please.
With warmest Regards,
-sriranga(79yrs)

On Mon, Oct 15, 2012 at 10:48 AM, Tom <tmb...@gmail.com> wrote:

>> Would it not be possible to move data to swap programmatically once the data held in RAM exceeds 1GB, or to move data from RAM to swap every second so that RAM is freed for the next processing step? I presume this could be done with suitable changes to the relevant Python source code. [...]

>> In my view, the majority of users may not have a high-end computer with more than 4GB of RAM, which will discourage them from using the OCRopus project.

> Users will generally only use OCRopus for recognizing text, and that works fine in 4GB of memory.

Yes, I agree with you. That is what I thought: OCRopus would only be used for recognizing text in Kannada script or other Indic scripts such as Hindi, Telugu, Sanskrit, etc.
But how can that be done without trained language data for those languages being available for OCRopus?

Can OCRopus recognize text using the <lang>.traineddata files that users have generated for tesseract-OCR?

> It is training that requires large amounts of memory, but users don't usually do that.

Yes, that is true. If training consumes large amounts of memory, users (except commercial vendors) will be discouraged from training the language of their choice, especially Indic languages such as Kannada, Tamil, Telugu, Hindi, Sanskrit, Bengali, etc. As such, OCRopus for Indic languages may never become available to users around the world!

> The motivation for sharing open source projects is so that people contribute source code to the project. The box training code that you are having problems with is indeed very memory inefficient. Since that seems to be a problem for you, I encourage you to improve the Python code and contribute the improvements.

Sorry, I am not a Python programmer or developer, so I have to depend on other Python experts. I can contribute only by way of beta testing and feedback for further improvements at your end.

> Tom
