For example, if you're trying to extract the textual portion of raw
HTML, there's a built-in cleaner to do that.
Otherwise, you'll need to set up a custom cleaner with an Extract Text
action. You'll need a regular expression to match the "markers"
(both beginning & ending) and the text between them. Depending on the
complexity of the markers, it can be a simple expression or something
much more involved. For example: <b> </b> are fairly easy to find.
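To make the "easy" case concrete, here is a rough sketch in Python rather than in TextSoap itself. The sample string and variable names are made up for illustration, and TextSoap's regular expression engine may differ from Python's in small ways.

```python
import re

# Hypothetical sample input with simple <b> </b> markers.
html = "Some <b>bold text</b> and <b>more</b> here."

# The non-greedy .*? stops at the nearest closing marker, so each
# marked run is captured separately instead of as one long match.
bold_parts = re.findall(r"<b>(.*?)</b>", html)
print(bold_parts)
```

Markers that vary (attributes, mixed case, nesting) would need a more involved expression than this.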
Can you offer some additional context to indicate the type of markers
you're dealing with?
Mark
> --
> You received this message because you are subscribed to the Google Groups "TextSoap" group.
> To post to this group, send email to text...@googlegroups.com.
> To unsubscribe from this group, send email to textsoap+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/textsoap?hl=en.
>
>
--
Mark Munz
unmarked software
http://www.unmarked.com/
First action: Extract Text
Expression: (?s)---begin--->.*<---end---
Second action: Find/Replace
Find: ---begin--->
Replace:
Third action: Find/Replace
Find: <---end---
Replace:
The first action will extract the text, including the markers. The
remaining two steps simply find the beginning and end markers and
delete them (by replacing them with nothing).
The (?s) at the beginning of the regular expression means that .* will
match newlines (returns). Without it, .* stops at the end of each line,
so text that spans multiple lines between the markers won't be
extracted.
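If it helps to see the three steps outside TextSoap, here is a minimal sketch of the same logic in Python. This is only an illustration of the idea, not how TextSoap runs it internally; the sample text and variable names are assumptions.

```python
import re

# Hypothetical sample input: the marked text spans two lines.
text = "header\n---begin--->first line\nsecond line<---end---\nfooter"

# Step 1 (Extract Text): grab the markers and everything between them.
# (?s) lets . match newlines, so the match can cross line breaks.
match = re.search(r"(?s)---begin--->.*<---end---", text)
extracted = match.group(0) if match else ""

# Steps 2 and 3 (Find/Replace): delete each marker by replacing it
# with nothing.
result = extracted.replace("---begin--->", "").replace("<---end---", "")
print(result)

# Without (?s), . does not match a newline, so nothing is found here.
no_dotall = re.search(r"---begin--->.*<---end---", text)
```

Note that .* is greedy: if the text has several marked sections, this matches from the first beginning marker to the last ending marker.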
Mark
Create a workflow or application (droplet) with a "Clean Text Files"
action. This will allow you to specify the type of files that can be
processed as well as the TextSoap cleaner to use to process the files.
There are also options to preserve the original copies for safe
cleaning.
With a droplet, you can just drag-n-drop the files you want to process
onto the droplet icon and they'll be cleaned for you.
Mark