grep 2 text strings and extract them to the SAME line of output?


BBLuv

Aug 25, 2021, 8:12:48 AM
to BBEdit Talk
Hi, non-technical user here, proud to have figured out part of my goal.

I have multiple XML files that I want to scan to pluck out the values of certain tag attributes. The XML looks like this, and I want to extract the NAME and LIKES attributes across the files.

<ANIMAL NAME="Fluffy" SPECIES="dog" LIKES="bones" AGE="6">

Using grep + extract I am able to produce this:

search = (?<=NAME=").*?(?=")|(?<=LIKES=").*?(?=")
output =
Fluffy
bones
Tiger
chewtoys
Rusty
mailmen
           
I would love output that includes my own text:
Fluffy -- bones
Tiger -- chewtoys
Rusty -- mailmen

or at least has the output values in one line per XML tag
Fluffy bones
Tiger chewtoys
Rusty mailmen

I learned the pipe character lets me make multiple extractions but I am not able to control the output.

Can you help?

Thanking you...



jj

Aug 25, 2021, 9:53:49 AM
to BBEdit Talk

Hi BBLuv,

Find: 

    <ANIMAL[^>]+NAME="([^"]*)"[^>]+LIKES="([^"]*)"[^>]*>

Replace:

    \1 -- \2

and use the Extract button of the find window.
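
For anyone who wants to try the same pattern outside BBEdit, here is a minimal Python sketch. It assumes Python's `re` module handles this pattern the same way BBEdit's grep does, which should hold for a simple pattern like this one. The `extract_pairs` helper and the Tiger and Rusty sample lines are made up for illustration; only the Fluffy tag comes from the question.

```python
import re

# Same pattern as the Find field: group 1 = NAME value, group 2 = LIKES value.
ANIMAL_RE = re.compile(r'<ANIMAL[^>]+NAME="([^"]*)"[^>]+LIKES="([^"]*)"[^>]*>')


def extract_pairs(text):
    """Return one 'NAME -- LIKES' line per matching ANIMAL tag."""
    # expand() fills the template the way the Replace field does.
    return [m.expand(r"\1 -- \2") for m in ANIMAL_RE.finditer(text)]


# Hypothetical sample data (SPECIES and AGE for Tiger and Rusty are invented).
sample = '''<ANIMAL NAME="Fluffy" SPECIES="dog" LIKES="bones" AGE="6">
<ANIMAL NAME="Tiger" SPECIES="cat" LIKES="chewtoys" AGE="3">
<ANIMAL NAME="Rusty" SPECIES="dog" LIKES="mailmen" AGE="9">'''

print("\n".join(extract_pairs(sample)))
```

Note that this pattern only matches tags where NAME comes before LIKES and both are present.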

HTH

Jean Jourdain

BBLuv

Aug 25, 2021, 3:48:50 PM
to BBEdit Talk
Much appreciated, Jean! It works like a miracle. If you would accept a further challenge: how can the expression also find tags where only one of the attributes is present? Bonus challenge: what if the attributes are not in that order?
Great big thanks

jj

Aug 25, 2021, 5:20:03 PM
to BBEdit Talk
Find:

    <ANIMAL[^>]+?(?:NAME="([^"]+)"[^>]+LIKES="([^"]+)"|LIKES="([^"]+)"[^>]+NAME="([^"]+)"|(?:NAME|LIKES)="([^"]+)")[^>]*>
    
Replace:

    \1 \2 \4 \3 \5
    
Test data:

    <ANIMAL NAME="Fluffy" SPECIES="dog" LIKES="bones" AGE="6">
    <ANIMAL LIKES="Banana" NAME="Abu" SPECIES="Monkey" AGE="6">
    <ANIMAL NAME="Fluffy" SPECIES="dog" LIKES="" AGE="6">
    <ANIMAL LIKES="Banana" NAME="" SPECIES="Monkey" AGE="6">
    <ANIMAL AGE="6" NAME="Peter" SPECIES="Rabbit" >
    <ANIMAL SPECIES="Rabbit" AGE="6" LIKES="Carrots">

Extract:

    Fluffy bones   
      Abu Banana 
        Fluffy
        Banana
        Peter
        Carrots
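
To see where the stray spaces in that output come from, here is a rough Python sketch of the same find and replace. It assumes Python's `re` module behaves like BBEdit's grep for this pattern; the `extract` helper is a made-up name. Groups that do not take part in a match contribute nothing, but the literal spaces between them in the replacement are still written.

```python
import re

# The three alternatives: NAME then LIKES, LIKES then NAME, or just one of them.
PATTERN = re.compile(
    r'<ANIMAL[^>]+?(?:NAME="([^"]+)"[^>]+LIKES="([^"]+)"'
    r'|LIKES="([^"]+)"[^>]+NAME="([^"]+)"'
    r'|(?:NAME|LIKES)="([^"]+)")[^>]*>'
)

TEST_DATA = '''<ANIMAL NAME="Fluffy" SPECIES="dog" LIKES="bones" AGE="6">
<ANIMAL LIKES="Banana" NAME="Abu" SPECIES="Monkey" AGE="6">
<ANIMAL NAME="Fluffy" SPECIES="dog" LIKES="" AGE="6">
<ANIMAL LIKES="Banana" NAME="" SPECIES="Monkey" AGE="6">
<ANIMAL AGE="6" NAME="Peter" SPECIES="Rabbit" >
<ANIMAL SPECIES="Rabbit" AGE="6" LIKES="Carrots">'''


def extract(text):
    """Build one output line per tag, mirroring the replacement \\1 \\2 \\4 \\3 \\5."""
    lines = []
    for m in PATTERN.finditer(text):
        # Unmatched groups come back as None; treat them as empty strings.
        g = [m.group(i) or "" for i in range(1, 6)]
        # Groups 3 and 4 are swapped so the name is printed before the like.
        lines.append(" ".join([g[0], g[1], g[3], g[2], g[4]]))
    return lines


for line in extract(TEST_DATA):
    print(repr(line))  # repr() makes the leading and trailing spaces visible
```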

If you want to remove the leading spaces:

Find:

    ^\s+

Replace:

    <empty>
    
If it gets more complicated than that, you should probably reach for a proper XML parsing tool.
Regular expressions are not the best fit for those cases.
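
As one possible illustration of the parser route, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the files are well-formed XML; the `<ZOO>` root element, the self-closing ANIMAL tags, and the `name_and_likes` helper are hypothetical and only there to make the sample parse. Attribute order and missing attributes stop mattering because the parser hands back attributes by name.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed sample: a made-up <ZOO> root and self-closing tags.
xml_text = '''<ZOO>
  <ANIMAL NAME="Fluffy" SPECIES="dog" LIKES="bones" AGE="6"/>
  <ANIMAL LIKES="Banana" NAME="Abu" SPECIES="Monkey" AGE="6"/>
  <ANIMAL AGE="6" NAME="Peter" SPECIES="Rabbit"/>
  <ANIMAL SPECIES="Rabbit" AGE="6" LIKES="Carrots"/>
</ZOO>'''


def name_and_likes(xml_string):
    """Return 'NAME -- LIKES' per ANIMAL, or the single value that is present."""
    root = ET.fromstring(xml_string)
    lines = []
    for animal in root.iter("ANIMAL"):
        # get() returns None when the attribute is missing.
        values = [animal.get("NAME"), animal.get("LIKES")]
        present = [v for v in values if v]  # drop missing or empty values
        if present:
            lines.append(" -- ".join(present))
    return lines


print("\n".join(name_and_likes(xml_text)))
```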

HTH

Jean Jourdain

BBLuv

Aug 26, 2021, 11:18:14 AM
to BBEdit Talk
You are a grandmaster, JJ.
Merci!