The Okapi development team is happy to announce the release of the brand new cross-platform Okapi Framework. It has been completely re-designed and re-written in Java. It runs on Linux, Windows and Macintosh.
You can download the different Okapi packages from here:
http://okapi.opentag.com/downloads.html
The framework is composed of different parts, including:
- Rainbow, a toolbox for localization tasks. For example, you can use Rainbow to prepare documents for translation (with XLIFF, OmegaT, RTF), to convert the encoding of a file, to run search and replace on the text parts of files in different file formats, and much more.
- A segmenter, an SRX-based segmentation engine that breaks paragraphs down into sentences. It can be easily integrated with the other components of the framework. A small standalone application (Ratel) allows you to create and maintain segmentation rules in WYSIWYG mode.
- A library, probably the most interesting part, with many components that you can re-use in your own programs or scripts (with Jython, for example). It includes several filters sharing a common API that allow you to access, manipulate and re-write translatable text. The resource model of the extracted data and the event-driven pipeline mechanism used by the framework are documented in the Developer's Guide here: http://okapi.opentag.com/devguide/index.html
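To give a rough idea of what SRX-style segmentation involves, here is a minimal sketch in plain Python. It is not the Okapi segmenter and does not read real SRX files: the `RULES` list, the `segment` function and the two sample rules are assumptions made up for illustration. It only shows the general idea of ordered break and no-break rules, each made of a "before" and an "after" pattern, where the first rule matching at a position decides.

```python
import re

# Hypothetical, simplified SRX-like rules (not Okapi's API or a real SRX file):
# each rule is (is_break, before_pattern, after_pattern).
RULES = [
    (False, r"\b(?:Mr|Mrs|Dr|etc)\.", r"\s"),  # exception: no break after these abbreviations
    (True, r"[.?!]+", r"\s+"),                 # break after sentence-ending punctuation
]


def segment(text, rules=RULES):
    """Split text at every position where the first matching rule is a break rule."""
    compiled = [
        (is_break, re.compile("(?:" + before + r")\Z"), re.compile(after))
        for is_break, before, after in rules
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # A rule applies when 'before' ends exactly at pos and 'after' starts there.
            if before.search(text[:pos]) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # rules are ordered: the first match decides
    segments.append(text[start:])
    return segments
```

Each segment keeps the whitespace that follows the break, so joining the segments gives back the original paragraph. The slicing makes this sketch quadratic in the length of the text, which is fine for an illustration but not for production use.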
The pipeline components are still in early development in this first milestone, but they will become the backbone of the framework in the next releases. Several other important components are also planned: more filters, a TM engine, terminology support, and much more.
The Okapi project’s main purpose is to architect a set of building blocks for the creation of larger open source localization and translation tools. But many Okapi components are generic enough to be of interest to the text mining, natural language processing and text retrieval communities. Okapi’s many text filters (HTML, Properties, XML (ITS XPath-based rules), OpenXML, ODF, Regex etc.) provide a straightforward way to access the text of multiple document formats. Okapi’s document events and pipeline can be made to integrate with other frameworks such as UIMA, LingPipe, OpenPipeline, OpenNLP, GATE and Lucene.
The advantage of Okapi’s text filters is that not only is text extracted, but all non-textual formatting is preserved. It is possible to decompose a document into events, process them via the pipeline, and then rebuild the input document without loss. Structural information can be added to Okapi document events so that tables, lists, links, titles etc. are grouped together and treated as a unit. This is useful when context based on a “universal” document structure is needed.
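The decompose, process and rebuild cycle described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the Okapi API: the event kinds ("skeleton" and "text"), the functions `read_events`, `apply_step` and `write_events`, and the toy "key=value" format are all hypothetical.

```python
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical event shape: (kind, content), where kind is "skeleton" or "text".
Event = Tuple[str, str]


def read_events(document: str) -> Iterator[Event]:
    """Toy filter for 'key=value' lines: values are text, everything else is skeleton."""
    for line in document.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        eol = line[len(body):]
        if "=" in body and not body.lstrip().startswith("#"):
            key, value = body.split("=", 1)
            yield ("skeleton", key + "=")
            yield ("text", value)
            yield ("skeleton", eol)
        else:
            # Comments and blank lines are kept untouched as skeleton.
            yield ("skeleton", line)


def apply_step(events: Iterable[Event], step: Callable[[str], str]) -> Iterator[Event]:
    """Pipeline step: transform the text events and pass skeleton events through."""
    for kind, content in events:
        yield (kind, step(content) if kind == "text" else content)


def write_events(events: Iterable[Event]) -> str:
    """Rebuild the document by concatenating the content of all events in order."""
    return "".join(content for _, content in events)
```

With no processing step, `write_events(read_events(doc))` returns `doc` unchanged, because every character of the input ends up in exactly one event. Passing the events through `apply_step(events, str.upper)` changes only the values and leaves keys, comments and line endings as they were.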
The Okapi event model supports user-configurable annotations, similar to UIMA, but simpler and more restricted in scope. Users can annotate spans of text or add new resources such as translation memory matches, terminology, token types, or part-of-speech information.
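As a rough picture of what span annotations look like in practice, here is a small sketch in plain Python. The `Annotation` and `TextUnit` classes and their methods are hypothetical names chosen for this example and do not reflect the actual Okapi resource model; the sample term and part-of-speech values are invented as well.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class Annotation:
    start: int   # index of the first annotated character
    end: int     # index just past the last annotated character
    kind: str    # for example "term" or "pos"
    value: Any   # payload attached to the span


@dataclass
class TextUnit:
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def annotate(self, start: int, end: int, kind: str, value: Any) -> None:
        """Attach a value of the given kind to the span text[start:end]."""
        if not 0 <= start <= end <= len(self.text):
            raise ValueError("span out of range")
        self.annotations.append(Annotation(start, end, kind, value))

    def spans(self, kind: str) -> List[Tuple[str, Any]]:
        """Return (covered text, value) pairs for all annotations of one kind."""
        return [
            (self.text[a.start:a.end], a.value)
            for a in self.annotations
            if a.kind == kind
        ]
```

For example, after `tu = TextUnit("Open the file menu")`, calling `tu.annotate(9, 18, "term", "menu Fichier")` attaches a terminology entry to the span "file menu" without changing the text itself, and `tu.spans("term")` retrieves it later in the pipeline.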
Okapi main web site:
Google-Code project:
http://code.google.com/p/okapi/
Users group:
http://tech.groups.yahoo.com/group/okapitools/
Developer group:
http://groups.google.com/group/okapi-devel
-the Okapi Team
Asgeir (Red Hat),
Fredrik, Yves (ENLASO)
Jim, Sergei, Christian, Dan (LDS Church)
On Thursday 30 Apr. 09, at 23:35, Yves Savourel wrote:
> The Okapi development team is happy to announce the release of the
> brand new cross-platform Okapi Framework. It has been completely
> re-designed and re-written in Java. It runs on Linux, Windows and
> Macintosh.
>
> You can download the different Okapi packages from here:
> http://okapi.opentag.com/downloads.html
Jean-Christophe Helary
------------------------------------
http://mac4translators.blogspot.com/