OmegaT 2.0 extract tags/extract text

Jean-Christophe Helary

Mar 27, 2009, 4:24:17 AM
to Ome...@yahoogroups.com, Mac for Translators
I have two sed expressions to extract the tags and the text from
source.txt, and one to extract the text from selection.txt.

Extract tags from source:
sed 's/>[^<]*</></g' /path/to/source.txt

-> gives all the tags from the current source segment. It is based on
Robert's grep expression. I also want to thank the debian-fr IRC
members for their inspiration :)
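
To make the behaviour concrete, here is the expression run on two made-up
segments. The sample strings and their <f0>-style tags are only an assumption
about what source.txt might hold. The second case shows that text sitting
outside the outermost tags is not removed, which is worth knowing before
pasting the result into the target field.

```shell
# Made-up sample segments (assumption: real source.txt content will differ)
wrapped='<f0>Click <f1>Save</f1> to continue.</f0>'
unwrapped='Click <f1>Save</f1> to continue.'

# Empty everything between a ">" and the next "<"
tags=$(printf '%s\n' "$wrapped" | sed 's/>[^<]*</></g')
echo "$tags"        # <f0><f1></f1></f0>

# Text outside the outermost tags has no ">" before it or no "<" after it,
# so the expression leaves it in place
partial=$(printf '%s\n' "$unwrapped" | sed 's/>[^<]*</></g')
echo "$partial"     # Click <f1></f1> to continue.
```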

Extract text from source:
sed 's/<[^>]*>//g' /path/to/source.txt

-> gives all the text from the current (presumably tag-encumbered)
source segment.
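
As a small illustration, the same made-up segment (an assumption, not real
OmegaT output) goes through the text expression below. The expression used
for selection.txt is identical, so the result applies there too.

```shell
# Made-up sample segment (assumption: real source.txt content will differ)
segment='<f0>Click <f1>Save</f1> to continue.</f0>'

# Delete every "<...>" run, leaving only the text between the tags
text=$(printf '%s\n' "$segment" | sed 's/<[^>]*>//g')
echo "$text"        # Click Save to continue.
```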

Extract text from selection:
sed 's/<[^>]*>//g' /path/to/selection.txt

-> gives all the text from the current (presumably tag-encumbered)
selection.


Use case:

With MS Open XML files (and possibly ODF), when I enter a segment with
a lot of tags:
1) Extract tags from source
2) Paste the tags into the target field
3) Type the translation between the tags and proceed

When I need to search source content that is mixed with tags:
1) Extract text from source
2) Paste into the search window

When I have a 100% match where all the tags are different:
1) Select the match text with the mouse and export
2) Extract text from selection
3) Paste the matching text at an appropriate location inside the
target field
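
For anyone who would rather keep one script than three, the workflows above
could be served by a single shell function. This is only a sketch: the name
omegat_extract and its tags/text argument are hypothetical, not something
OmegaT provides.

```shell
# Hypothetical helper, not part of OmegaT: one entry point for both expressions.
# Usage: omegat_extract tags|text FILE
omegat_extract() {
    case "$1" in
        tags) sed 's/>[^<]*</></g' "$2" ;;   # keep the tags, drop the text between them
        text) sed 's/<[^>]*>//g' "$2" ;;     # drop the tags, keep the text
        *)    echo "usage: omegat_extract tags|text FILE" >&2; return 1 ;;
    esac
}
```

It would be called as "omegat_extract tags /path/to/source.txt" or
"omegat_extract text /path/to/selection.txt", with the output piped to
whatever clipboard tool you use.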


On OS X:

The three commands are wrapped in three small AppleScripts (each saved
as an application bundle) that pipe their output to xpbcopy (Apple's
pbcopy has issues with encodings). Because they are applications, I can
launch them from Spotlight.

I could not find the source to xpbcopy, but there seem to be a number
of similar command line tools, so pick the one that suits you best.


============================
Extract text from selection:
============================
on run {}
do shell script "sed \"s/<[^>]*>//g\" ~/Library/Preferences/OmegaT/script/selection.txt | /usr/local/bin/xpbcopy"
end run

=========================
Extract text from source:
=========================
on run {}
do shell script "sed \"s/<[^>]*>//g\" ~/Library/Preferences/OmegaT/script/source.txt | /usr/local/bin/xpbcopy"
end run

=========================
Extract tags from source:
=========================
on run {}
do shell script "sed \"s/>[^<]*</></g\" ~/Library/Preferences/OmegaT/script/source.txt | tr -d \"\\n\\r\" | /usr/local/bin/xpbcopy"
end run

Jean-Christophe Helary

------------------------------------
http://mac4translators.blogspot.com/
