Automator actions and Grep: Extracting lines


Miguel Perez

May 19, 2021, 9:30:03 AM
to BBEdit Talk
Hi!

I have a question regarding Automator and BBEdit.

Context:

On a daily basis I get an XML file. This file contains information about some dossiers. I need to extract two elements from each dossier: (1) a URL to download associated images, and (2) the dossier's name.


Information in the file is in Spanish.

What I currently do:

I open the XML file in BBEdit and use Grep search to extract the information. My Grep patterns are:

To extract the URLs:
<clave><!\[CDATA\[Imagen\]\]></clave>\n\s+<valor><!\[CDATA\[(.+?)\]

To extract the dossier's name:
<clave><!\[CDATA\[Denominación\]\]></clave>\n\s+<valor><!\[CDATA\[(.+?)\]

I "replace" these Grep patterns with \1 to extract everything, and it works like a charm.

Both pieces of information get saved in their own plain text files.

Then I download the images using some wget magic:
wget -E -H -k -K -p -e robots=off -P /users/USERNAME/TARGETFOLDER -i /users/USERNAME/URLSLIST.txt

As a final touch to my workflow, I run a batch rename on all the downloaded files to add the *.GIF filetype, and I'm ready to work.

What I want to do:

I want to further automate the process.

Using Automator I created a Service (Quick action) that uses files as input in Finder.

What I have in mind is:
➤ Run the service on the XML file
➤ Read the contents of the file
➤ Use BBEdit's Automator action called "Extract lines containing" in Grep mode to extract the URLs
➤ Use a shell script to download all images
➤ Use a batch rename action to add the *.GIF filetype

For the life of me I can't get "Extract lines containing" to work. I'm using BBEdit 13.5.6 and macOS Big Sur 11.3.1.

Any ideas?

Does anybody know if BBEdit's Automator actions still work?

Christopher Stone

May 20, 2021, 1:47:27 AM
to BBEdit-Talk
On 05/19/2021, at 08:30, Miguel Perez <maperez...@gmail.com> wrote:
On a daily basis I get an XML file. This file contains information about some dossiers. I need to extract two elements from each dossier: (1) a URL to download associated images, and (2) the dossier's name.


Hey Miguel,

I don't think that Automator action will take a file as input, but I'm not absolutely certain.

In any case I think you should bite the bullet and learn a little more shell scripting – since you're already using wget.

I'm just playing around here a bit to show you what's easily accomplished.

You can run these conveniently in a BBEdit worksheet to see what they do.



filePath=~/Downloads/MA_NO_2021_05_011.xml

# Prints the found pattern in the given text file.
grep -E -o -e 'http.+ImagenFichaServlet[^]]+' "$filePath"



filePath=~/Downloads/MA_NO_2021_05_011.xml

# Prints the line after the found pattern along with the found text.
grep -E -A 1 -o 'Denominación' "$filePath"



filePath=~/Downloads/MA_NO_2021_05_011.xml

# Prints the line after the found pattern then deletes the detritus.
sed -n '/Denominación/{n;p;}' "$filePath" \
| sed -E 's!.+\[|\].+!!g'



The output of these is easily redirected to files.
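
A minimal sketch of that redirection, reusing the two patterns above. The function name `extract_lists` and the output names `urls.txt` and `nombres.txt` are placeholders invented for illustration, not anything from the original workflow.

```shell
#!/usr/bin/env bash

# Hypothetical helper: writes the image URLs and the dossier names found in
# the XML file ($1) to urls.txt and nombres.txt inside the folder ($2).
extract_lists() {
    local filePath="$1" outDir="$2"

    # One URL per line.
    grep -E -o -e 'http.+ImagenFichaServlet[^]]+' "$filePath" > "$outDir/urls.txt"

    # One dossier name per line: take the line after each "Denominación"
    # key, then strip everything outside the innermost brackets.
    sed -n '/Denominación/{n;p;}' "$filePath" \
    | sed -E 's!.+\[|\].+!!g' > "$outDir/nombres.txt"
}
```

Called as `extract_lists ~/Downloads/MA_NO_2021_05_011.xml ~/Downloads`, it should leave a URL list that can be handed to `wget -i`.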

Your wget downloader can follow the extract-text segments.

You can rename your files using wget itself, or you can follow on with a rename script.
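
A follow-on rename script could be sketched like this. It assumes every downloaded file really is a GIF and that the lowercase `.gif` extension is acceptable; the function name `add_gif_extension` is made up for the example.

```shell
#!/usr/bin/env bash

# Hypothetical helper: appends ".gif" to every regular file in the folder
# ($1) that does not already end in .gif or .GIF.
add_gif_extension() {
    local dir="$1" f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip subfolders and an unmatched glob
        case "$f" in
            *.gif|*.GIF) continue ;;     # already has the extension
        esac
        mv -n -- "$f" "$f.gif"           # -n: never overwrite an existing file
    done
}
```

Run it on the download folder once wget has finished, for example `add_gif_extension /users/USERNAME/TARGETFOLDER`.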

Personally I'd do all of this in Bash.
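
As one piece of such a script, here is a hedged sketch of letting wget name each file as it downloads it, via its `-O` option. The numbered names (001.gif, 002.gif, ...) are an assumption chosen because they need no pairing of URLs with dossier names; the function name `download_numbered` and the `DRY_RUN` switch are invented for the example, and the target folder is assumed to exist already.

```shell
#!/usr/bin/env bash

# Hypothetical helper: downloads every URL listed in the file ($1) into the
# existing folder ($2), naming the files 001.gif, 002.gif, ... in list order.
# Set DRY_RUN=1 to print the wget commands instead of running them.
download_numbered() {
    local urlList="$1" targetDir="$2" url dest n=0
    while IFS= read -r url || [ -n "$url" ]; do
        [ -n "$url" ] || continue                  # skip blank lines
        n=$((n + 1))
        dest=$(printf '%s/%03d.gif' "$targetDir" "$n")
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'wget -q -O %s %s\n' "$dest" "$url"
        else
            wget -q -O "$dest" "$url"              # -O sets the output file name
        fi
    done < "$urlList"
}
```

Trying it first with `DRY_RUN=1 download_numbered urls.txt images` shows what would be fetched before anything touches the network.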

That said let me recommend Keyboard Maestro to you as a great tool for getting real work done on a Mac. In my expert opinion it's currently the Best-of-Breed Mac Automation Utility. (Commercial $36.00 US.)

It's a bit like Automator on steroids, and I've been a happy superuser for about 17 years now.

Like BBEdit it runs 24/7 on my Macs.

--
Best Regards,
Chris

jj

May 20, 2021, 7:52:23 AM
to BBEdit Talk
Miguel,

If you are using Automator and AppleScript, "System Events" has some useful XML parsing functionality.


Your input XML could be parsed with something like this:

```applescript
    tell application "System Events"
        tell XML file "~/Downloads/MA_NO_2021_05_011.xml"
            tell XML element "ejemplar"
                set vSecciones to every XML element whose name = "seccion"
                repeat with vSeccion in vSecciones
                    tell vSeccion
                        set vFichas to (every XML element whose name = "ficha")
                        repeat with vFicha in vFichas
                            tell vFicha
                                set vCampos to (every XML element whose name = "campo")
                                repeat with vCampo in vCampos
                                    tell vCampo
                                        set vClave to value of XML element "clave"
                                        set vValor to value of XML element "valor"
                                        log {clave:vClave, valor:vValor}
                                    end tell
                                end repeat
                            end tell
                        end repeat
                    end tell
                end repeat
            end tell
        end tell
    end tell
```
HTH,

Jean Jourdain