My suggestion is to have a "lean" OCR library (libtesseract) with minimal dependencies, so it can easily be integrated into other projects, perhaps with a simple example of how to use the library.
Then it would be great to have a (feature-rich) tool that helps with the standard usage problems you mention above, e.g. downloading missing data files, fixing DPI, and image preprocessing (fixing rotation, deskewing, ...). Such a tool could accept more external dependencies in exchange for a better user experience.
In this scenario, the training tools would be separated too... ;-)
I believe this can bring more flexibility, because:
- a more user-friendly frontend can be developed and released rapidly
- adding new features will bring more problems that are not related to OCR itself (e.g. for downloading data: using a proxy, parsing JSON data from the GitHub API)
- more advanced users can focus on improving the API and the OCR library (e.g. for use from Python, Java, or C#)
- not to forget: others could focus on training and on finding improvements in that area (also from a coding point of view, e.g. using CUDA or OpenCL)