Good day.
INTRODUCTION
The Open Source Buddhism Library initiative is a team of developers
and practitioners dedicated to making Dharma texts publicly
accessible for study and practice.
You can read more about this initiative here:
https://www.buddism.ru///___DHARMA___/1548842022.phtml?edit=print
COOPERATION PROPOSAL
1. Over 5 years of research we have developed algorithms for
optical character recognition (OCR) of Tibetan manuscripts.
For about 7 years our library has supported a free OCR service for
Tibetan and Sanskrit texts. This service was established with a
Trace Foundation grant and donations from library users.
The core of this service is OCRLib, which is developed in C++.
https://github.com/RimeOCRLIB/OCRLib
www.buddism.ru/ocr
For Tibetan and Sanskrit manuscript OCR we are developing an
open source OCR library. The core of this library is written in
C, assembly and C++.
This OCR core is a four-level convolutional network with
segmentation at the functional network level. Each level of the
network models a stage of the human physiological recognition
process and is implemented as fast, optimized C and assembly
functions. For recognition of handwritten texts, topological graph
analysis of the image skeleton structure is used.
https://www.buddism.ru///ocrlib/documentation/OCRLib_documentation2018_eng.pdf
This development has been supported by several official and private
grants and donations, and we present these efforts as a collective
work.
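
To illustrate what topological analysis of an image skeleton can
look like, here is a minimal Python sketch. It is not taken from
OCRLib; the function names and the tiny test image are hypothetical.
It assumes the skeleton is already a one-pixel-wide binary image and
uses the classic crossing-number test: walking around the 8
neighbours of a pixel, one background-to-stroke transition marks a
stroke end, and three or more mark a branch point. These points can
serve as the nodes of a stroke graph.

```python
# Illustrative sketch only, not OCRLib code. All names are hypothetical.

# The 8 neighbour offsets (row, col), clockwise starting from north.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def crossing_number(grid, r, c):
    """Count background-to-stroke transitions around pixel (r, c)."""
    def pixel(rr, cc):
        # Pixels outside the image are treated as background.
        if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]):
            return grid[rr][cc]
        return 0

    ring = [pixel(r + dr, c + dc) for dr, dc in RING]
    return sum(1 for i in range(8) if ring[i] == 0 and ring[(i + 1) % 8] == 1)


def skeleton_nodes(grid):
    """Return (endpoints, junctions) of a one-pixel-wide skeleton."""
    endpoints, junctions = [], []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value != 1:
                continue
            cn = crossing_number(grid, r, c)
            if cn == 1:
                endpoints.append((r, c))   # a stroke ends here
            elif cn >= 3:
                junctions.append((r, c))   # strokes branch here
    return endpoints, junctions


# A small T-shaped skeleton: one horizontal bar with a vertical stem.
T_SHAPE = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

endpoints, junctions = skeleton_nodes(T_SHAPE)
print("endpoints:", endpoints)
print("junctions:", junctions)
```

For the T shape this reports the three stroke ends and the single
branch point where the stem meets the bar. A real recognizer would
go on to trace the strokes between these nodes and compare the
resulting graph with reference letter graphs.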
2. Building on this development, we can now set the task of
recognizing all the materials of the library and creating a single
corpus of Buddhist texts, including printed books and manuscripts.
This corpus needs to include Tibetan, Sanskrit, Pali, Mongolian,
Chinese and Western-language Dharma texts, with support for
search, grammatical analysis and dictionary analysis.
At present the corpus includes 5 GB of Tibetan, Pali and
Western-language texts. An example of a text corpus record:
http://www.buddism.ru:4000/?index=5030&field=1&ln=eng&ocrData=TibetanUTFToEng&mode=read&ln=eng
Development of this text corpus includes a combined
Tibetan-Sanskrit-Western dictionary edition and a translation
memory. At present the combined Tibetan dictionary includes
360,000 unique entries and the translation memory database
includes 150,000 translation records.
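
As an illustration of how a translation memory can be queried, here
is a minimal Python sketch. It does not describe the library's
actual database or API; the record layout, the function name and the
three sample records are hypothetical. It assumes each record pairs
a source segment with its translation and uses the standard-library
difflib.SequenceMatcher to find the closest stored segment.

```python
# Illustrative sketch only. Record layout and names are hypothetical.
from difflib import SequenceMatcher

# A tiny in-memory translation memory: (source segment, translation).
SAMPLE_TM = [
    ("སངས་རྒྱས", "Buddha"),
    ("ཆོས", "Dharma"),
    ("དགེ་འདུན", "Sangha"),
]


def tm_lookup(records, query, threshold=0.75):
    """Return (source, translation, score) of the best fuzzy match.

    Returns None when no stored segment reaches the threshold.
    """
    best = None
    for source, translation in records:
        # ratio() is 1.0 for identical strings and 0.0 for no overlap.
        score = SequenceMatcher(None, query, source).ratio()
        if best is None or score > best[2]:
            best = (source, translation, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


# The query has a trailing tsheg, so the match is close but not exact.
print(tm_lookup(SAMPLE_TM, "སངས་རྒྱས་"))
# An unrelated query yields no match.
print(tm_lookup(SAMPLE_TM, "xyz"))
```

A linear scan like this is only suitable for a demonstration. For a
database of 150,000 records an index over the source segments would
be needed to keep lookups fast.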
3. On the basis of the OCR and grammar analysis development, we
need to publish a free and publicly open comparative edition of
the whole Tibetan Tripitaka Canon and translate three volumes into
Russian and English.
At present we are trying to reach the 84000.co team with this
proposal. We can represent the work of leading Russian translators,
based on the library's knowledge base of Tibetan texts.
Sarva Mangalam!
Open Source Buddhism Library
www.buddism.ru
Alexander Stroganov