On Tue, Feb 25, 2014 at 09:13:15AM -0800, Will Manley wrote:
> For anyone who's interested: I've created a git mirror of tesseract-ocr. It
> includes history from both googlecode SVN and sourceforge.net CVS. I've tidied
> up the commit/author information where I could.
That's great, thanks for that. Is it automatically updated when
there are new SVN commits?
It would be handy if we could switch to using git as part of the
main project; having proper local git branching is incredibly
useful :)
--
You received this message because you are subscribed to the Google Groups "tesseract-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-de...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Well, I'm ambivalent about it, as I would have to learn git from scratch, although that may be useful. I currently use svn only at a very basic level anyway: most of the time I use a perforce-like system, and merge with the svn repository in a separate svn client. I have been told that the merge facility in git is "really good", but I currently use meld for all my merges and it works very well for that. Would anyone with experience of both like to comment on that?
If we were to switch to git, there is a button in the admin pages that could just make it happen. I have no idea how easy it would be to add the sourceforge change data to it.
There are two semi-independent changes which could be made:
1. Switch from svn to git for source control
2. Switch from Google Code to Github for project hosting
When we move to git we should get rid of everything in tessdata/
except the tessconfigs/ and configs/ directories.
Hi,
On the question of the Github user and repositories: can somebody request a Tesseract organization user? This way there can be administrators etc.
Jan
Hi Ray,
That plan sounds great to me, thanks for this.
On Fri, Aug 08, 2014 at 03:51:55PM -0700, Ray Smith wrote:
> OK, after much time spent doing little about this, I now have a plan:
> 1. Switch tesseract-ocr to git.
> 2. Create 2 new repositories - tessdata and langdata.
> 3. Add new language source data to langdata, and .traineddata files to
> tessdata. (configs and tessconfigs stay with the source code.)
> 4. Updates to the new repositories are the releases of the big data blobs -
> they can be tagged with versions quite easily for clarity.
> 5. tarballs probably have to still go to Google drive.
> 6. Syncing and updating the code to fix more issues will be my learning
> experience with git, but it looks very similar to svn in many ways.
A couple of "nice to have, but not worth spending much extra time
over" items:
- If you could use git tags to make clear exactly which langdata
commit the .traineddata in tessdata was built from, that would be
a nice thing.
- Add a line at the top of lang.config in the .traineddata that
makes clear the git revision it's built from, so add a rule doing
that to whatever make process you currently have.
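As a rough illustration of the second item, here is a minimal Python sketch of such a rule. It is not part of any existing build process: the function names and the header wording are made up for this example, and it assumes that lines starting with "#" are skipped when the config is read.

```python
import subprocess


def current_git_revision(repo_dir):
    """Return the full hash of the commit checked out in repo_dir."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def with_revision_header(config_text, revision):
    """Prepend a comment line naming the revision the data was built from.

    Assumption: a leading '#' marks a comment line in the config file.
    The header wording is hypothetical.
    """
    header = "# built from langdata git revision " + revision + "\n"
    return header + config_text
```

A make rule could call current_git_revision() on the langdata checkout and pass the result to with_revision_header() before the config is packed into the .traineddata.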
Also, what do you want to do regarding training developed by others?
Could appropriately automated and self-contained trainings make it
into the langdata repository? That seems sensible to me.
Thanks again for this :)
Nick
Something is currently interfering with your secure connection to tessdata.tesseract-ocr.googlecode.com.
Try to reload this page in a few minutes or after switching to a new network. If you have recently connected to a new Wi-Fi network, finish logging in before reloading.
If you were to visit tessdata.tesseract-ocr.googlecode.com right now, you might share private information with an attacker. To protect your privacy, Chrome will not load the page until it can establish a secure connection to the real tessdata.tesseract-ocr.googlecode.com.
I can confirm this. An example failing URL is
https://tessdata.tesseract-ocr.googlecode.com/archive/bf82613055ebc6e63d9e3b438a5c234bfd638c93.zip
Command-line access
Get a local copy of the tesseract-ocr tessdata repository with this command:
But, this picks up the whole repository for all languages. Is there a way to just download traineddata for one language?
Thanks,
Shree
Yes, I also wanted to download only one set of kan.langdata.
Srirangaji,
There's a bug in Google Code, I think. Remove the "s" from https://langdata.tesseract-ocr...., so the new address starts with http://langdata.tesseract-ocr....
Yes, I meant that the bug is that it links to HTTPS even though the SSL certificate is not valid for the domain.
Paul
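For anyone scripting the downloads, the workaround of dropping the "s" can be sketched in Python as below. The function name and the host in the usage line are placeholders, not real project URLs. Note that the fetch then goes over plain HTTP, so it has no transport security at all.

```python
from urllib.parse import urlsplit, urlunsplit


def downgrade_to_http(url):
    """Return url with an https scheme swapped for http; other URLs are unchanged."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return url
    # SplitResult is a named tuple, so _replace gives a copy with a new scheme.
    return urlunsplit(parts._replace(scheme="http"))


# Placeholder host, for illustration only.
print(downgrade_to_http("https://example.invalid/archive/file.zip"))
```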
Rao, I was able to download kan.traineddata and have attached it herewith for your use. Please let me know whether it is the same as the old tdf or an improved one.