Re: Android Tesseract - How to include trained languages in my app without workarounds?

Jean-Jérôme Sarrasin

Sep 12, 2012, 9:37:15 AM
to tesser...@googlegroups.com
Have a look at:

They use Android's getExternalStorageDirectory function, which stores the data on shared storage, but you can try "getFilesDir" instead: http://developer.android.com/reference/android/content/Context.html#getFilesDir()

I'm not sure whether the NDK Tesseract code will be able to access this application-private storage, but it should.
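
To make the path convention concrete, here is a minimal sketch in plain Java (no Android classes, so it can run anywhere). It assumes the tess-two convention that init(datapath, lang) looks for datapath/tessdata/lang.traineddata. The class TessDataPaths and its methods are hypothetical helpers, not part of tess-two, and on Android the root directory would come from context.getFilesDir().

```java
import java.io.File;

/** Hypothetical helper for illustration; not part of tess-two. */
final class TessDataPaths {
    /** File Tesseract is expected to load: <dataRoot>/tessdata/<lang>.traineddata */
    static File trainedDataFile(File dataRoot, String lang) {
        return new File(new File(dataRoot, "tessdata"), lang + ".traineddata");
    }

    /** True when the language file exists and is readable by this process. */
    static boolean isReady(File dataRoot, String lang) {
        File f = trainedDataFile(dataRoot, lang);
        return f.isFile() && f.canRead();
    }

    public static void main(String[] args) {
        // Assumption: on Android this would be context.getFilesDir().
        File dataRoot = new File(System.getProperty("java.io.tmpdir"), "demo-files-dir");
        System.out.println("expects: " + trainedDataFile(dataRoot, "eng"));
        System.out.println("ready:   " + isReady(dataRoot, "eng"));
    }
}
```

Checking isReady before calling init gives a clearer error than a failed init, and the directory to pass to init is dataRoot itself, not the tessdata folder inside it.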

Keep us informed



2012/9/12 Napalm <gabriel.no...@gmail.com>
I'm using Tesseract for Android and don't know how to include trained language data in my application. I see that most people use a static path to the tessdata directory, as in the code below:

TessBaseAPI ocrManager = new TessBaseAPI();
// Expects the language file at /mnt/sdcard/tessdata/eng.traineddata
if (!ocrManager.init("/mnt/sdcard/", "eng")) {
    Log.e(TAG, "Error on tesseract...");
}

This is not good, because if I install my app on another device I must manually copy the trained language data to /mnt/sdcard/tessdata/.

Another approach is to include the language data in the app's assets (AssetManager). But if I'm not mistaken, it's not possible to get a file path from the AssetManager. A workaround is to copy the trained language data to a static path like the one in the code above. Again, that's bad, because the app creates files outside its own storage space.

Is it possible to somehow include the trained data in my app and use it in the Tesseract initialization without any workaround (a static path, or a copy to a static path)?

Thanks

--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to tesser...@googlegroups.com
To unsubscribe from this group, send email to
tesseract-oc...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en

Jean-Jérôme Sarrasin

Sep 12, 2012, 4:02:21 PM
to tesser...@googlegroups.com
FYI, I've tested it and it works!



2012/9/12 Jean-Jérôme Sarrasin <sarr...@icare.ch>
You can't get a file path to an asset, because assets are packed inside the application file (APK). The best solution, as I said, is to copy the file into your application directory, obtained through the "getFilesDir" function: http://developer.android.com/reference/android/content/Context.html#getFilesDir()





Jean-Jérôme Sarrasin

Sep 12, 2012, 2:54:46 PM
to tesser...@googlegroups.com
You can't get a file path to an asset, because assets are packed inside the application file (APK). The best solution, as I said, is to copy the file into your application directory, obtained through the "getFilesDir" function: http://developer.android.com/reference/android/content/Context.html#getFilesDir()
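
A minimal sketch of that copy step, written in plain Java so it runs outside Android. The assumptions: on a device the InputStream would come from something like context.getAssets().open("tessdata/eng.traineddata") (the asset path is only an example) and filesDir from context.getFilesDir(). TessDataInstaller and install are hypothetical names, not part of tess-two.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper for illustration; not part of tess-two. */
final class TessDataInstaller {
    /**
     * Copies source to <filesDir>/tessdata/<lang>.traineddata unless that file
     * already exists, and returns the directory to use as the Tesseract data path.
     */
    static File install(InputStream source, File filesDir, String lang) {
        File tessdata = new File(filesDir, "tessdata");
        File target = new File(tessdata, lang + ".traineddata");
        File tmp = new File(tessdata, lang + ".traineddata.tmp");
        try (InputStream in = source) {
            if (target.isFile() && target.length() > 0) {
                return filesDir; // already copied on an earlier launch
            }
            if (!tessdata.isDirectory() && !tessdata.mkdirs()) {
                throw new IOException("cannot create " + tessdata);
            }
            // Write to a temporary name first so an interrupted copy
            // is never mistaken for a complete language file.
            try (OutputStream out = new FileOutputStream(tmp)) {
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
            }
            if (!tmp.renameTo(target)) {
                throw new IOException("cannot rename " + tmp + " to " + target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return filesDir;
    }

    public static void main(String[] args) {
        // Stand-ins for the asset stream and for context.getFilesDir().
        File filesDir = new File(System.getProperty("java.io.tmpdir"), "demo-" + System.nanoTime());
        InputStream fakeAsset = new ByteArrayInputStream(new byte[] {1, 2, 3});
        File dataPath = install(fakeAsset, filesDir, "eng");
        System.out.println("data path for init: " + dataPath);
    }
}
```

The returned directory is what would be passed as the first argument of init, with "eng" as the second. Because the copy is skipped when the file is already present, the cost is paid only on the first launch, although the data still exists twice on the device (once in the APK, once in the files directory).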



2012/9/12 Napalm <gabriel.no...@gmail.com>
Using getExternalStorageDirectory does not solve the problem: in that source code, he copies the trained data from the AssetManager to the root folder of the Android device. For me that is not a good solution. It would be better if I could pass the path of the tessdata directory directly from my AssetManager, or use another approach that avoids both static paths (including paths external to the app, such as getExternalStorageDirectory) and copies.

The better solution would be to add the trained languages to the app and use them without any copying or access to external paths. The AssetManager would be a good fit: I can create a tessdata folder and put all the trained data in it. But I don't know how to get a path from the AssetManager that I can pass to the Tesseract initialization.

If anyone has any ideas, it would greatly help my progress. Thanks.

On Wednesday, September 12, 2012 at 10:37:15 UTC-3, Hrk wrote:

Jean-Jérôme Sarrasin

Sep 13, 2012, 2:25:17 PM
to tesser...@googlegroups.com
Yes, as you say, it's not ideal because it doubles the storage needed, but it's better than getExternalStorageDirectory because the files stay in the application directory.

The last option, as you say, is to modify the Tesseract source. After some research, it should be possible. Here is a hint:
- If you use tess-two, the path is loaded in the "getpath" function, at line 40 of basedir.cpp.

Let us know if you manage to make it work.



2012/9/13 Napalm <gabriel.no...@gmail.com>
I modified my project to use getFilesDir(). This solution is not ideal, because I make copies when the app runs. Now I will try to modify the Tesseract source code to read the traineddata files directly from the bundled tessdata directory. For me it is the last chance.

Other ideas?

Thanks

Jean-Jérôme Sarrasin

Sep 19, 2012, 2:36:59 PM
to tesser...@googlegroups.com
Hello Gabriel,

Did you manage to make it work?



2012/9/13 Jean-Jérôme Sarrasin <sarr...@icare.ch>