I recently had to edit a few Tesseract box files to generate training data and didn't find any of the
existing tools to my liking. I wanted something that ran on Mac OS X and showed letters inside their boxes.
So I built a web-based tool which I'm calling boxedit.
A few things to like about it:
- It's entirely browser-based, so it runs on any platform and requires no installation.
- You can use the browser's zoom in/out features.
- It shows OCR'd letters on top of the source image, so the accuracy is easy to gauge.
- It can split boxes N ways.
- You can edit the raw box data or use the GUI; either works, and they stay in sync.
- It's easy to get going: drag & drop an image and its box file to get started.
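
For readers new to the format: each line of a Tesseract box file describes one glyph as `<char> <left> <bottom> <right> <top> <page>`, with pixel coordinates measured from the bottom-left corner of the image. The Python sketch below shows one way to parse such a line and split a box N ways. It is not boxedit's actual code. The names `Box`, `parse_box_line`, `format_box` and `split_box` are hypothetical, and the splitting rule (equal-width vertical cuts, each piece keeping the original character for later correction) is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """One line of a box file: a glyph and its bounding box."""
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int = 0


def parse_box_line(line: str) -> Box:
    """Parse '<char> <left> <bottom> <right> <top> <page>'."""
    # Split from the right so the glyph field is left untouched.
    char, left, bottom, right, top, page = line.rstrip("\r\n").rsplit(" ", 5)
    return Box(char, int(left), int(bottom), int(right), int(top), int(page))


def format_box(box: Box) -> str:
    """Serialize a Box back into a box-file line."""
    return f"{box.char} {box.left} {box.bottom} {box.right} {box.top} {box.page}"


def split_box(box: Box, n: int) -> List[Box]:
    """Split a box into n side-by-side boxes of (nearly) equal width.

    Assumption: every piece keeps the original character, to be fixed by hand.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    width = box.right - box.left
    # Integer cut points; the pieces share edges and cover the original box.
    edges = [box.left + (width * i) // n for i in range(n + 1)]
    return [
        Box(box.char, edges[i], box.bottom, edges[i + 1], box.top, box.page)
        for i in range(n)
    ]


if __name__ == "__main__":
    for piece in split_box(parse_box_line("m 10 20 40 50 0"), 3):
        print(format_box(piece))
    # m 10 20 20 50 0
    # m 20 20 30 50 0
    # m 30 20 40 50 0
```

Because parsing and formatting are inverses here, a round trip through `parse_box_line` and `format_box` leaves a well-formed line unchanged, which is the property that lets a raw-text view and a GUI view stay in sync.
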
A few things to dislike:
- The UI could use some work: the overlaying of transcribed letters could be much clearer.
- Saving your changes back to disk is tedious (my best solution is to copy/paste back into the box file).
- It's missing a few important features (e.g. n-way merge and moving/resizing boxes visually).
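
As a rough idea of what an n-way merge could look like, here is a minimal Python sketch. It is not part of boxedit, and `Box` and `merge_boxes` are hypothetical names. It assumes that a merge takes the smallest rectangle covering all the selected boxes and concatenates their characters in the order given.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Box:
    """One line of a box file: a glyph and its bounding box."""
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int = 0


def merge_boxes(boxes: Sequence[Box]) -> Box:
    """Merge boxes into their bounding rectangle (assumed merge rule)."""
    if not boxes:
        raise ValueError("need at least one box to merge")
    return Box(
        char="".join(b.char for b in boxes),  # keep the text in the order given
        left=min(b.left for b in boxes),
        bottom=min(b.bottom for b in boxes),
        right=max(b.right for b in boxes),
        top=max(b.top for b in boxes),
        page=boxes[0].page,  # assumes all boxes are on the same page
    )


if __name__ == "__main__":
    merged = merge_boxes([Box("r", 10, 20, 20, 50), Box("n", 20, 18, 40, 50)])
    print(merged.char, merged.left, merged.bottom, merged.right, merged.top)
    # rn 10 18 40 50
```
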
If people find this useful, I'm happy to polish it a bit more. Feel free to
file issues on GitHub.
- Dan