UKB for Czech Wordnet


pomp...@gmail.com

Dec 16, 2015, 10:53:22 AM
to ukblist
Hello everybody,

I'm trying to use UKB with Czech Wordnet (https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0001-4880-3), but the problem is that it is stored as a single XML file with entries like:

<SYNSET><ID>00005811-v</ID><POS>v</POS><SYNONYM><LITERAL>mrkat<SENSE>2</SENSE></LITERAL><LITERAL>zamrkat<SENSE>1</SENSE></LITERAL></SYNONYM><ILR>00559482-v<TYPE>hypernym</TYPE></ILR></SYNSET>

or
 
<SYNSET><ID>00014558-n</ID><POS>n</POS><SYNONYM><LITERAL>forma<SENSE>1</SENSE></LITERAL><LITERAL>tvar<SENSE>1</SENSE></LITERAL><LITERAL>podoba<SENSE>1</SENSE></LITERAL></SYNONYM><BASE>*</BASE></SYNSET>

but I haven't found any script to convert it to dict/ files, which are expected by UKB scripts.
Please, has anybody encountered anything similar?

Best regards,
Roman Sudarikov

Aitor Soroa

Dec 16, 2015, 11:05:05 AM
to ukb...@googlegroups.com
Hi Roman,

if you provide me with a sample XML file I can try to hack a script to
create the graph file and dictionary as needed by ukb.

best,
aitor

Roma Sudarikov

Dec 16, 2015, 11:08:32 AM
to ukblist
Hi Aitor,

thank you for such a quick response.

Best regards,
Roman Sudarikov
Czech_WordNet_1.9_PDT.zip

Aitor Soroa

Dec 16, 2015, 11:23:44 AM
to ukb...@googlegroups.com
Hi Roman,

unfortunately, this is not a valid XML file:

$ xmllint --noout nas_anotacni_slovnik.xml
nas_anotacni_slovnik.xml:3: parser error : Extra content at the end of the document
<SYNSET><ID>00004022-v</ID><POS>v</POS><SYNONYM><LITERAL>inhalovat<SENSE>1</SENS
^

and therefore I can't use any XML library, which makes the job more
complex. Do you have input files which are valid XML?

best,
aitor
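
For reference, xmllint reports "Extra content at the end of the document" when, among other causes, a file holds more than one top-level element. The thread does not confirm that this was the problem here. If it is, one possible workaround is to wrap the content in a single synthetic root before parsing. The sketch below assumes that situation; the function name `parse_synset_fragments` and the root tag `WN` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_synset_fragments(text):
    """Parse a string that holds several top-level <SYNSET> elements.

    Assumption: the only well-formedness problem is the missing single
    root element. Any other defect will still raise ET.ParseError.
    """
    body = text.lstrip()
    # An XML declaration is only legal at the very start of a document,
    # so drop it before wrapping the content in a new root.
    if body.startswith("<?xml"):
        body = body.split("?>", 1)[1]
    # "WN" is a hypothetical wrapper tag, not part of the Czech WordNet format.
    root = ET.fromstring("<WN>" + body + "</WN>")
    return root.findall("SYNSET")


# The two entries quoted in the first message of this thread.
sample = (
    "<SYNSET><ID>00005811-v</ID><POS>v</POS><SYNONYM>"
    "<LITERAL>mrkat<SENSE>2</SENSE></LITERAL>"
    "<LITERAL>zamrkat<SENSE>1</SENSE></LITERAL></SYNONYM>"
    "<ILR>00559482-v<TYPE>hypernym</TYPE></ILR></SYNSET>\n"
    "<SYNSET><ID>00014558-n</ID><POS>n</POS><SYNONYM>"
    "<LITERAL>forma<SENSE>1</SENSE></LITERAL>"
    "<LITERAL>tvar<SENSE>1</SENSE></LITERAL>"
    "<LITERAL>podoba<SENSE>1</SENSE></LITERAL></SYNONYM>"
    "<BASE>*</BASE></SYNSET>\n"
)

synsets = parse_synset_fragments(sample)
for synset in synsets:
    print(synset.findtext("ID"))
```

Wrapping the content in memory leaves the original file untouched, which is convenient when the file comes from a repository and should not be edited by hand.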

On Wed, Dec 16, 2015 at 08:08:32AM -0800, Roma Sudarikov wrote:
> Hi Aitor,
>
> thank you for such a quick response.
> I've attached the file, but also you can download it from the page I've
> mentioned
> (https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle/11858/00-097C-0000-0001-4880-3/Czech_WordNet_1.9_PDT.zip?sequence=1&isAllowed=y)
>
> Best regards,
> Roman Sudarikov



--
ondo izan
aitor

Roma Sudarikov

Dec 17, 2015, 1:47:56 PM
to ukblist
Hi Aitor,

sorry, I didn't notice that.
Sure, here is the fixed version. I've checked it with xmllint and it worked fine.

Best regards,
Roman

On Wednesday, December 16, 2015 at 5:23:44 PM UTC+1, Aitor Soroa wrote:
CzechWordnet1.9.zip

Aitor Soroa

Dec 18, 2015, 3:33:11 AM
to ukb...@googlegroups.com
Hi Roman,

please find attached the graph and dictionary files for the Czech
wordnet, along with the script used to create them.

best, and good luck with your experiments!

aitor
ukb_cz.bz2
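
The attached script is not reproduced in this thread. As a rough sketch only, and not the script that was actually used, a conversion along these lines might look as follows. It assumes the input is well-formed XML with `SYNSET` elements shaped like the entries quoted in the first message, that a UKB dictionary line has the form `lemma concept-id concept-id ...`, and that a UKB relation line has the form `u:SOURCE v:TARGET`. Both formats are given from memory of the UKB README and should be checked against the UKB version in use. The function name `synsets_to_ukb` and the root tag `WN` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def synsets_to_ukb(synsets):
    """Build UKB-style dictionary and relation lines from <SYNSET> elements.

    Assumed output formats (verify against the UKB README):
      dictionary: "lemma id1 id2 ..."
      relations:  "u:source_id v:target_id"
    """
    dictionary = defaultdict(list)
    relations = []
    for synset in synsets:
        synset_id = (synset.findtext("ID") or "").strip()
        if not synset_id:
            continue
        for literal in synset.iter("LITERAL"):
            # The lemma is the text that precedes the nested <SENSE> element.
            # Assumption: spaces in multiword lemmas become underscores,
            # because dictionary fields are separated by whitespace.
            lemma = (literal.text or "").strip().replace(" ", "_")
            if lemma and synset_id not in dictionary[lemma]:
                dictionary[lemma].append(synset_id)
        for ilr in synset.iter("ILR"):
            # The target synset ID is the text that precedes the nested <TYPE>.
            target = (ilr.text or "").strip()
            if target:
                relations.append(f"u:{synset_id} v:{target}")
    dict_lines = [
        f"{lemma} {' '.join(ids)}" for lemma, ids in sorted(dictionary.items())
    ]
    return dict_lines, relations


# One entry from the first message, wrapped in a hypothetical root element.
doc = ET.fromstring(
    "<WN><SYNSET><ID>00005811-v</ID><POS>v</POS><SYNONYM>"
    "<LITERAL>mrkat<SENSE>2</SENSE></LITERAL>"
    "<LITERAL>zamrkat<SENSE>1</SENSE></LITERAL></SYNONYM>"
    "<ILR>00559482-v<TYPE>hypernym</TYPE></ILR></SYNSET></WN>"
)
dict_lines, relations = synsets_to_ukb(doc.iter("SYNSET"))
print("\n".join(dict_lines))
print("\n".join(relations))
```

The sketch ignores the `SENSE` numbers and the relation `TYPE`, and it does not filter relations whose target synset is missing from the file; a real conversion may need to handle all three.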

Roma Sudarikov

Dec 21, 2015, 9:12:26 AM
to ukb...@googlegroups.com
Hi Aitor,

thank you very much for your help!

Best regards,
Roman Sudarikov
