<xdxf> should have a 'version' attribute to be truly extensible, and more


Alex Enz

Dec 2, 2011, 11:43:27 AM
to XDXF - The Extensible (XML) Dictionary Exchange Format
Here I suggest discussing new features that are dictionary (article)
related and could be put into the spec.
A short list of proposals:
- dictionary index
- embedded XPM

Sergey

Dec 3, 2011, 10:40:58 AM
to XDXF - The Extensible (XML) Dictionary Exchange Format
Good idea.

Leonīds Sošinskis

Apr 13, 2012, 7:31:03 AM
to xdxf-...@googlegroups.com
I think that the dictionary index should be built by the dictionary software developers,
because in different cases they might need a different structure/format for the index.

What is so magical about the XPM format? I'm thinking about base64, though.

Alex Enz

Apr 14, 2012, 6:03:15 PM
to XDXF - The Extensible (XML) Dictionary Exchange Format
I don't see any reason not to have a standardized index file format
for XDXF. Client software could use it without needing to index the
dictionary itself. It must be noted, though, that modifying the
dictionary invalidates the related index file. Still, a dictionary
without an index file is just raw data that must be parsed and indexed
in some way. Additionally, I expect that a standardized index could be
more advanced than the indexes produced by existing clients. For
example, suppose we have the following English-Russian article:

<ar><k>cast</k>
<b>I</b>
<dtrn>приведение <co>(<i>к какой-л. форме</i>)</co> || приводить
<co>(<i>к какой-л. форме</i>)</co></dtrn>
<ex>to cast out — отбрасывать <co>(<i>напр., члены ряда</i>)</co></ex>
- <kref>type cast</kref>
<b>II</b>
<dtrn>изобразительный ряд <co>(<i>создаваемого компьютерного фильма</i>)</co></dtrn></ar>


This article has one example, 'to cast out'. A generic client would
index only the 'cast' keyword, but not the examples. I consider this a
serious shortcoming. We may want to find exactly the 'cast out'
article, or an article containing such an example.
This means that we can't rely on client software to index the
dictionary correctly.
A standard index can be produced by some standard XDXF index generator.
That's it. It's up to the client software whether to use such an index
or not.
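
To make the idea concrete, here is a minimal Python sketch of such an index generator. It is only an illustration, not part of any existing XDXF tool: the function name `build_index`, the trimmed sample article, and the rule that the source phrase of an `<ex>` is the text before the dash are all assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: inside <ex>, the source phrase precedes this dash,
# as in the 'to cast out' example above.
EXAMPLE_SEPARATOR = "—"

# Trimmed version of the article above, wrapped in a root element.
SAMPLE = """<xdxf>
<ar><k>cast</k>
<dtrn>приведение || приводить</dtrn>
<ex>to cast out — отбрасывать <co>(<i>напр., члены ряда</i>)</co></ex>
- <kref>type cast</kref>
</ar>
</xdxf>"""


def build_index(xdxf_text):
    """Map each keyword and example phrase to the positions of its articles."""
    root = ET.fromstring(xdxf_text)
    index = defaultdict(set)
    for position, article in enumerate(root.iter("ar")):
        # Index the article keywords, as a generic client would.
        for key in article.iter("k"):
            index["".join(key.itertext()).strip().lower()].add(position)
        # Additionally index the source phrase of every example.
        for example in article.iter("ex"):
            phrase = (example.text or "").split(EXAMPLE_SEPARATOR)[0]
            phrase = phrase.strip().lower()
            if phrase:
                index[phrase].add(position)
    return {term: sorted(positions) for term, positions in index.items()}


index = build_index(SAMPLE)
print(index["cast"])         # [0]
print(index["to cast out"])  # [0]
```

With this sketch, both 'cast' and 'to cast out' point to the same article, while the `<kref>` target 'type cast' is deliberately left out, since it refers to a different article.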

Leonīds Sošinskis

Apr 18, 2012, 3:51:26 AM
to xdxf-...@googlegroups.com
I'm quite sure that creating the index should be the software's job.
And every feature list notes whether full-text indexing is performed.
You should just choose the dictionary software that supports it :)
By the way, you might want to look at the new XDXF standard: https://github.com/soshial/xdxf_makedict/tree/master/format_standard
Any criticism is welcome! :)

Alex Enz

Jun 18, 2012, 7:14:06 AM
to xdxf-...@googlegroups.com
As far as I understand, there are new tags such as <abbreviations>. It looks remarkable. It puts the abbreviations and the dictionary itself in one file. That's good. I don't see any reason to keep them split.
<syn>, <ant>, <hpr> ... these are all good, good, good :).

Lenny Soshinskiy

Jun 18, 2012, 7:22:32 AM
to xdxf-...@googlegroups.com
Glad you liked it :) Unfortunately, there is still a lack of support for these features. If you have the time and skill to help, you could join the work on the xdxf converter or on GoldenDict's XDXF support.

Alex Enz

Jun 18, 2012, 11:10:24 AM
to xdxf-...@googlegroups.com
GoldenDict is hard to build. It has many dependencies that should be made optional. For example, if I don't need audio or spelling support, I really don't want to build it. I couldn't test it on my Slackware machine. Anyway, I'm working on a parallel project, XDClient, which is more Unix-oriented.

Alex Enz

Jun 18, 2012, 12:19:24 PM
to xdxf-...@googlegroups.com
Has anybody thought about creating a more general format, not tied to dictionaries? A dictionary is a subclass of reference, which is the more general concept. It could be called XRXF (XML Reference eXtensible Format) or similar. It could be used for storing technical docs, such as man pages, with very good formatting, extended with new tags. Why should we tie ourselves to dictionaries?! I searched the web and didn't find any standard format for documentation references. All references are shipped either as man pages or as visually formatted HTML. What do you think?



Lenny Soshinskiy

Jul 17, 2012, 6:48:06 AM
to xdxf-...@googlegroups.com
If I understood you correctly, there are already formats meant for what you describe, such as Markdown (.md): http://en.wikipedia.org/wiki/Markdown
Here is an example of GitHub supporting this documentation format: https://github.com/nltk/nltk/blob/master/README.md
I would also like to state that XDXF as a dictionary format is designed to be as simple as possible while still attempting to encode all dictionary semantics. BUT it is already becoming overgrown with excess tags, and I'm not yet sure whether they are needed. So it will be a process of simplification, not the other way around.

Looking forward to your answer.

Alex Enz

Jul 19, 2012, 1:51:42 PM
to xdxf-...@googlegroups.com
No, Markdown is not that. I'm not talking about free formatting and natural languages. I'm talking about a strict logical structure for reference books (encyclopedias, dictionaries, and other technical and non-technical references). Keep in mind that a dictionary is a kind of reference. Every reference has a list of words and an article tied to each word. Take man pages as a reference database: you type `man grep` and it shows the article for the 'grep' page. In a dictionary, every word has a corresponding article with a translation or definition. XDXF is well suited to dictionaries, but not to other types of references. Take a technical reference as an example, say FreeType (http://freetype.sourceforge.net/freetype2/docs/reference/ft2-index.html). You can use the HTML docs for this. But what about packing it into a standard logical format, like XDXF for dictionaries? That would make it possible to look up a particular reference page with a simple utility. This is what 'man' does. But:
1. 'man' is relatively slow and not very suitable for really big references.
2. It uses the roff language, which is visual, not logical.

A standard logical format would be very useful for technical programming references, among others. The point is to avoid PDF and HTML and to use a 'man'-like CLI or GUI utility for quick search.
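
As a rough illustration of the kind of utility I mean, here is a small Python sketch of a 'man'-like lookup over a logically structured reference. Nothing here is an existing format or tool: the sample reuses XDXF-style `<ar>`/`<k>` tags as a stand-in for the proposed format, the article texts are placeholders, and `load_articles` and `prefix_search` are hypothetical names.

```python
import bisect
import xml.etree.ElementTree as ET

# Assumption: each reference page is an <ar> whose <k> holds the page name,
# followed directly by the page text. Texts below are placeholders.
REFERENCE = """<xdxf>
<ar><k>grep</k>placeholder article for grep</ar>
<ar><k>groff</k>placeholder article for groff</ar>
<ar><k>gzip</k>placeholder article for gzip</ar>
</xdxf>"""


def load_articles(text):
    """Return a dict mapping each page name to its article text."""
    articles = {}
    for article in ET.fromstring(text).iter("ar"):
        key = article.find("k")
        if key is None:
            continue
        # key.tail is the text that follows the closing </k> tag.
        articles[(key.text or "").strip()] = (key.tail or "").strip()
    return articles


def prefix_search(sorted_keys, prefix):
    """Return all page names starting with prefix, using binary search."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for name in sorted_keys[start:]:
        if not name.startswith(prefix):
            break
        matches.append(name)
    return matches


articles = load_articles(REFERENCE)
names = sorted(articles)
print(prefix_search(names, "gr"))  # ['grep', 'groff']
print(articles["grep"])            # placeholder article for grep
```

Because the page names are kept sorted, finding the first candidate is a binary search, so the lookup should stay quick even for a large reference.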