zest

Mar 24, 2014, 7:04:49 AM
to unitex-...@googlegroups.com
Hello, 

I wonder how to detect and split concatenated words such as:

laville -> la ville
pain auchocolat -> pain au chocolat

My first idea was to use a regular expression with capture groups, but I didn't see anything about regex capture in the manual.

Is there a way to do this in Unitex?


Thanks for your response.

Eric

eric.laporte

Mar 24, 2014, 8:03:06 AM
to unitex-...@googlegroups.com
Hello,
You can use the morphological mode (see section 6.4, p. 135 of the manual). Make sure your graph does not recognise isolated letters (a, b, c, etc.) as words; otherwise each word in the text might be segmented into letters.
Best regards,
Eric Laporte
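
To illustrate the warning about isolated letters, here is a minimal Python sketch of dictionary-based segmentation. It is not how Unitex implements the morphological mode; the `segmentations` function and the tiny lexicons are hypothetical and only show why allowing single letters as words multiplies the possible splits.

```python
def segmentations(token, lexicon):
    """Return every way to split `token` into consecutive words of `lexicon`."""
    if not token:
        return [[]]  # one way to split the empty string: no words at all
    results = []
    for end in range(1, len(token) + 1):
        head = token[:end]
        if head in lexicon:
            # Keep this prefix and split the remainder recursively.
            for rest in segmentations(token[end:], lexicon):
                results.append([head] + rest)
    return results


# Hypothetical lexicon containing only real words.
lexicon = {"la", "ville", "au", "chocolat"}
print(segmentations("laville", lexicon))
print(segmentations("auchocolat", lexicon))

# Same lexicon plus isolated letters: the token now has several splits,
# including ones that break "ville" into single letters.
with_letters = lexicon | {"a", "l", "v", "i", "e"}
for split in segmentations("laville", with_letters):
    print(split)
```

With the letter-free lexicon each token has a single split; once single letters are accepted, unwanted letter-by-letter segmentations appear alongside the intended one.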

zest

Mar 24, 2014, 10:31:08 AM
to unitex-...@googlegroups.com
Hello,

Thanks for your response.

I found a solution using the morphological mode, as you said. Here is a link to a screenshot of the graph: http://pbrd.co/1h1lh5m

I noticed that using "<<vil?e>>" in place of "ville+vile" doesn't work for me, although the manual gives this example:

in -> un -> <A><<ble$>>  ->  out  

Thanks again.

Eric

eric.laporte

Mar 25, 2014, 5:01:07 AM
to unitex-...@googlegroups.com
Hello,
The "?" operator means "one or zero times". The morphological filter for what you want to do should be <<vill?e>>.
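
The difference between the two patterns can be checked outside Unitex. The sketch below uses Python's `re` module, where `?` has the same "zero or one" meaning; whole-word matching via `fullmatch` and the helper `full_matches` are assumptions made for the illustration, not a description of how Unitex applies morphological filters.

```python
import re

# "?" makes only the character just before it optional.
narrow = re.compile(r"vil?e")   # "vi" + optional "l" + "e"
wide = re.compile(r"vill?e")    # "vil" + optional "l" + "e"


def full_matches(pattern, words):
    """Return the words that the pattern matches in their entirety."""
    return [w for w in words if pattern.fullmatch(w)]


words = ["vie", "vile", "ville"]
print(full_matches(narrow, words))  # forms accepted by vil?e
print(full_matches(wide, words))    # forms accepted by vill?e
```

Here `vil?e` accepts "vie" and "vile" but not "ville", while `vill?e` accepts "vile" and "ville".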