Sort via GREP involving tabs and invisibles


Matthew Bender

Oct 8, 2022, 4:58:37 PM
to BBEdit Talk
Hi all,

I'm a photographer, and my experience sorting with GREP is incredibly minimal. If this can't be done - apologies! 

I'm building a structured hierarchical keyword list for stock photography purposes. I acquired two separate pre-assembled lists. Each did certain things better than the other. I want to combine keyword groups across these two lists into one large, alphabetized group that maintains the established hierarchy.

Most keywords are simply words, but a few involve numbers (#1, for example). Being hierarchical, the list is built out using tabs to communicate nested levels and curly brackets to communicate synonyms. An example is below; the spaces are all tabs.

        ambition
            {ambitious}
        ancient
        train
        ancient civilization
        apex
            {#1}
            {number 1}
        appetizing
        apprehend
        approach
            {approaching}
        approachable
        approval
            {approve}
            {approving}elegant
        email
        embryonic
        emotion
            {emotional}
            {emotions}
            affection
                devotion
                fondness
                love
                passion
                sympathy
                tenderness
                warmth
            agitated
                {agitation}
                flustered
                frantic
            amusement
                {amused}
                {amusing}
        ego
        elegance
        embarrassment
        emergence
        empathy
        emphasis
        enchantment
        encouragement
        ending
        endurance
        energy
        enhancement
        enjoyment
        ennui
        enthusiasm
        envy
        equality

This example shows what I'd like to do - I want to merge the bottom set (ego to equality, which came from one list) into the top set (ambition to emotion, which came from the other list) alphabetically while maintaining the synonyms and nested structure. Is there a way to sort only the 'top layer' alphabetically (accounting for the 1% of keywords that are things like #1, and for special characters like the curly brackets) while maintaining the established hierarchy? My attempts so far leave things alphabetically ordered, but the nested layers and top layers get jumbled together and the structure is completely broken. It alphabetizes every single word, which isn't what I want at all.

Any help or tips would be much appreciated. 

Thanks,
-Matt

Media Mouth

Oct 8, 2022, 5:22:47 PM
to bbe...@googlegroups.com
Following.  Sure is an interesting challenge.

It'll be cool if someone with mad GREP skills has a brilliant solution.
What seems like it might work more easily would be to convert the lists to JSON data, merge them, then sort, then convert back to the tabbed format of your example.

Out of curiosity, what system is making use of the keyword formatting in your example?  Photo Mechanic?  Wondering if it's mostly proprietary or if it's a standard keyword categorization system that's in wider use.

 - MM



--
This is the BBEdit Talk public discussion group. If you have a feature request or need technical support, please email "sup...@barebones.com" rather than posting here. Follow @bbedit on Twitter: <https://twitter.com/bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bbedit/d43d48c6-3181-457c-a29b-53f7a8ca5f8fn%40googlegroups.com.

Bruce Van Allen

Oct 8, 2022, 5:55:49 PM
to bbe...@googlegroups.com
It’s not clear to me how you want the lower list (ego to equality) to fit into the upper list. Are the lower list items all at the same hierarchical level as email and embryonic in the upper list? Did any hierarchy of the lower list get lost in the email translation?

Could you show the result you want from those two list segments once merged?

Is it only a matter of sorting while maintaining the whitespace indentation, or does merging need to shift some items' levels?



— Bruce

_bruce__van_allen__santa_cruz_ca_
_831_429_1688_p_
_831_332_3649_c_






GJR groups

Oct 8, 2022, 6:37:19 PM
to bbe...@googlegroups.com
This is a pedestrian solution from a non-coder, just done step by step, so I can check if I’m getting the expected result and backing up if not. Fortunately BBEdit is zippy quick and believes in CMD-Z. I’m sure someone who has actual skills will supply a more elegant solution.

convert spaces to tabs - email text has spaces
replace ^\t\t with nothing - (strips out the 2 leading tabs at the beginning of each line)
replace \t\t with \t@ - all remaining double tabs reduced to one tab and @
replace \n\t with @ - all indented items are now in the line following the term to be alphabetized, with the indentation tier coded as @s
sort lines - as it says on the tin
replace @@ with @\t - decodes second-tier terms by one tab
replace @ with \n\t - finishes replacing the tabs and puts terms back on their own lines

This assumes that there are no more than two indented levels, and that the indented terms are wanted in their existing order. Alphabetizing those levels as well would mean disregarding the brackets, and I suspect that a clever data structure would help. Note that input data errors do exist: "elegant" should probably be on its own line, unindented, for example, so you may have a few more things to account for.
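
A rough Python sketch of the same glue, sort, split idea, for anyone who would rather script it. Instead of @ markers it keeps each top-level line and its indented children together as one block, so it isn't limited to two indented levels. The function name `sort_top_level` is made up for the example, and it assumes the file really is tab-indented and that children should stay in their existing order.

```python
def sort_top_level(text: str) -> str:
    """Sort top-level entries, carrying each entry's indented children along."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ""
    # Assumption: the shallowest tab indentation found is the "top level".
    base = min(len(ln) - len(ln.lstrip("\t")) for ln in lines)
    blocks = []
    for ln in lines:
        depth = len(ln) - len(ln.lstrip("\t"))
        if depth == base or not blocks:
            blocks.append([ln])       # start a new top-level block
        else:
            blocks[-1].append(ln)     # child line stays with its parent
    # Case-insensitive sort on the top-level keyword only.
    blocks.sort(key=lambda block: block[0].strip().lower())
    return "\n".join(ln for block in blocks for ln in block)
```

Called on a small sample such as `"apex\n\t{#1}\nego\nambition\n\t{ambitious}"`, the synonyms travel with `apex` and `ambition` while only the top-level words are reordered.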

Greg

Matthew Bender

Oct 11, 2022, 6:25:21 PM
to BBEdit Talk
Thank you all for the replies thus far. 

For MM - I'm using Photo Mechanic, yes, although the hierarchy was recognized by Capture One when I imported it as a test as well. The keyword list is just saved as a .txt file, and the tab/bracket hierarchy for a structured keyword list such as this is pretty standardized. I actually can't speak for Lightroom because I don't use it a lot, but I imagine it would be similar, if not identical. Merging into JSON would maintain the existing hierarchy? Could you recommend a program or method to do this? Also out of my wheelhouse.

For Bruce - To clarify on what I'd like to happen, I'd like this, my original list - 

        ambition
            {ambitious}
        ancient
        train
        ancient civilization
        apex
            {#1}
            {number 1}
        appetizing
        apprehend
        approach
            {approaching}
        approachable
        approval
            {approve}
            {approving}
to become the below - ego, elegance, embarrassment and emergence have moved upward into the list on the same 'level' and filed in the appropriate place alphabetically. The 'top level' is the one furthest to the left. Layers tabbed in one level are the next level. And so forth. 

I believe sorting while maintaining white space indentation would solve my issue, yes. The indentations on this list are all tabs. I want to force only the top 'level' into alphabetical order while making sure that subsequent layers remain children to their parents. For example, 'apex' is in my list with two synonyms, {#1} and {number 1}. If apex needs to be moved to another location on the list, I need those two bracketed items to go along with it. That will keep my hierarchy intact.

Some layers need to be shifted to make the top levels 'match' between the two lists, but I've figured that out myself in BBEdit. I've prepped things so nothing needs to be shifted during the sort. Does this clarify things?

        ambition
            {ambitious}
        ancient
        train
        ancient civilization
        apex
            {#1}
            {number 1}
        appetizing
        apprehend
        approach
            {approaching}
        approachable
        approval
            {approve}
            {approving}
        ego
        elegance
        email
        embarrassment
        embryonic
        emergence

        emotion
            {emotional}
            {emotions}
            affection
                devotion
                fondness
                love
                passion
                sympathy
                tenderness
                warmth
            agitated
                {agitation}
                flustered
                frantic
            amusement
                {amused}
                {amusing}
        empathy
        emphasis
        enchantment
        encouragement
        ending
        endurance
        energy
        enhancement
        enjoyment
        ennui
        enthusiasm
        envy
        equality

For Greg - there may be more than two indents. Yeah, I just noticed elegant myself as I typed this reply. That was my mistake when copy/pasting earlier, there aren't any mistakes like that in the actual list itself. I'll definitely look into your approach as well, it's much appreciated.

Harvey Pikelberger

Oct 11, 2022, 7:16:39 PM
to BBEdit Talk
Curious if the list below represents what you're trying to achieve.
It uses a JS/JSON approach and is based on the entries you provided.  It assumes the indentations were tabs and returns them as tabs.
In this case it looks like the synonyms (curly braces) sort to the bottom, after all alpha-numeric characters.
Putting them above should be a relatively simple tweak.

ambition
    {ambitious}
ancient

ancient civilization
apex
    {#1}
    {number 1}
appetizing
apprehend
approach
    {approaching}
approachable
approval
    {approve}
    {approving}
ego
elegance
elegant

email
embarrassment
embryonic
emergence
emotion
    affection
        devotion
        fondness
        love
        passion
        sympathy
        tenderness
        warmth
    agitated
        flustered
        frantic
        {agitation}
    amusement
        {amused}
        {amusing}
    {emotional}
    {emotions}

empathy
emphasis
enchantment
encouragement
ending
endurance
energy
enhancement
enjoyment
ennui
enthusiasm
envy
equality
train

Matthew Bender

Oct 12, 2022, 11:17:41 AM
to BBEdit Talk
This is definitely very close to what I want to happen, yes. It works in PM with the synonyms at the bottom. I'd really like the synonyms at the top for consistency and to avoid confusion, but it's on the right track. Is there a way to have the synonyms stay at the top?

How would I do this sorting? I can convert the txt file into a json file - can it be sorted via BBEdit, or does it require another program?
-Matt

Harvey Pikelberger

Oct 12, 2022, 11:48:21 AM
to BBEdit Talk
I ended up doing it with rather ugly JS code where it creates a hierarchy of keys only, then converts that into a more traditional JSON of proper key/values that are sortable, then finally exports that back out to your original file format. 
I'm sure there's a tighter, more elegant way to do it, but it seems to do the trick.
I made some assumptions about the formatting of your original files, so the code probably wouldn't be reliable out in the wild.
If you're comfortable sending the originals, I'll just run them and send you back the single sorted list?
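
For anyone who wants to try the parse, sort, export idea themselves, here is a rough sketch. It is a Python illustration, not the JS that was actually run, and the names `parse`, `sort_tree` and `export` are made up for the example. It assumes tab indentation with the list starting at column zero, and it puts {synonyms} ahead of ordinary keywords at every level.

```python
def parse(text):
    """Turn a tab-indented list into nested [keyword, children] nodes."""
    root = []
    stack = [(-1, root)]                  # (depth, list that receives children)
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip blank lines
        depth = len(line) - len(line.lstrip("\t"))
        while stack[-1][0] >= depth:      # climb back up to this line's parent
            stack.pop()
        node = [line.strip(), []]
        stack[-1][1].append(node)
        stack.append((depth, node[1]))
    return root


def sort_tree(nodes):
    """Sort every level in place: {synonyms} first, then case-insensitive A-Z."""
    nodes.sort(key=lambda n: (not n[0].startswith("{"), n[0].lower()))
    for _, children in nodes:
        sort_tree(children)
    return nodes


def export(nodes, depth=0):
    """Write the tree back out as a list of tab-indented lines."""
    out = []
    for name, children in nodes:
        out.append("\t" * depth + name)
        out.extend(export(children, depth + 1))
    return out
```

Something like `"\n".join(export(sort_tree(parse(text))))` gives the sorted list back in the original format. The nested lists are plain data, so they could also be dumped with `json.dumps` if an intermediate JSON file is wanted.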

-MM

Matthew Bender

Oct 12, 2022, 12:15:40 PM
to bbe...@googlegroups.com
Got it. Sure, I have no problem sending it. It's attached. Hopefully nothing in here throws you off. 
-Matt



--
Matthew Bender / Photographer 

PHL / 215.300.1789


Concepts List.txt

Harvey Pikelberger

Oct 12, 2022, 12:24:39 PM
to BBEdit Talk
LMK if this worked...
ConceptsListSorted.txt

Harvey Pikelberger

Oct 12, 2022, 12:31:47 PM
to BBEdit Talk
Here's the same list, sorted case-insensitively...
ConceptsListSortedCaseInsensitive.txt

Matthew Bender

Oct 12, 2022, 1:01:23 PM
to BBEdit Talk
Case-insensitive works perfectly! Thank you so much, this is exactly what I wanted to do. 
Knowing now that this works, is there any info you could provide to let me do this kind of sorting myself? As my workflow evolves it's possible I'll need to do it again. PM doesn't let one move or sort more than one keyword at a time, so I'll be using an external program for bulk sorting and changes.
-Matt

Harvey Pikelberger

Oct 12, 2022, 1:49:24 PM
to BBEdit Talk
Basically I'm using JavaScript as the solution, and my impression is a reliable solution might take a while to develop.

For instance, the 2nd list you sent tripped up the original JS solution because it included keywords with spaces.
No big deal / easy enough to tweak, but kind of shows how one fix reveals underlying additional challenges.

So check this out: You might notice that the case-insensitive sorted list reveals a lot of duplicate keywords.
I imagine you'd want those removed, which in turn would involve preserving but not duplicating sub-keywords.
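
To make that concrete, here is a minimal Python sketch of one way duplicates could be merged while pooling their sub-keywords. It is only an illustration under assumptions: tab indentation, duplicates matched case-insensitively within the same parent, and the first spelling seen is the one kept. The names `merge_duplicates` and `dump` are hypothetical.

```python
def merge_duplicates(text):
    """Merge repeated keywords (case-insensitively), pooling their children."""
    tree = {}                    # lowercased keyword -> [display name, subtree]
    stack = [(-1, tree)]         # (depth, dict that receives children)
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = len(line) - len(line.lstrip("\t"))
        while stack[-1][0] >= depth:
            stack.pop()
        name = line.strip()
        # setdefault keeps the first spelling seen and reuses its subtree,
        # so a repeated keyword adds its children to the existing entry.
        entry = stack[-1][1].setdefault(name.lower(), [name, {}])
        stack.append((depth, entry[1]))
    return tree


def dump(tree, depth=0):
    """Write the merged tree back out as tab-indented lines, in input order."""
    out = []
    for name, sub in tree.values():
        out.append("\t" * depth + name)
        out.extend(dump(sub, depth + 1))
    return out
```

With this, a list containing both `Love` (with `{loving}`) and `love` (with `{loving}` and `passion`) comes out as a single `Love` entry holding `{loving}` and `passion` once each.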

I'm not familiar with Photo Mechanic, but apparently it uses IPTC metadata.  I wonder if that means the keywords are embedded into the metadata on a per-photo basis.  The best solution would involve better understanding the underlying logic (along with the idiosyncratic quirks) of the software + knowing what you're trying to achieve (and how that might evolve).

Apologies for the not-so-concise answer here.  Asset management can get pretty involved.  imho, everything off-the-shelf has frustrating limits.
And the solution to that -- building out custom solutions -- has a steep learning curve.

Matthew Bender

Oct 12, 2022, 3:23:10 PM
to BBEdit Talk
Got it, thanks. It's fine, I know it's a complex topic. The 'asset' in this particular case is my keyword library. I love PM, but how little it offers for managing a keyword library in-program is a severe limitation. A change of any real size clearly needs to be done through a second program and then loaded back into PM.
Yeah, I'll want duplicates removed. I've deleted them manually for this, but that's not ideal as a long-term solution.
PM uses IPTC metadata, and they're linked on a per-photo basis, you're correct. I use PM to attach appropriate keywords to each image and then upload them to my archive site. The structured keyword feature of PM lets me attach them faster and more consistently to a group of many related images. It's just unfortunate that it's not easier to manage the keywords themselves through the program.
-Matt

Harvey Pikelberger

Oct 12, 2022, 3:59:34 PM
to BBEdit Talk
What comes to mind that might work would be to layer on (and eventually migrate to) a database-managed system.
As an approach it's not necessarily super easy, but it's also not impossible or beyond reach.  FileMaker is actually a pretty good / easy-to-learn system for this kind of stuff.
There's a TROI plugin for it that handles metadata. Even better (if you have the patience) is to build your own metadata integration using something like exiftool (an open source command line app, very powerful) which, with a little jujitsu, integrates with FMP very nicely.

Process would go something like this:
Database that includes a "Photos" table.
From each photo you extract a list of all associated keywords
Each new keyword is added to a "Keywords" table in the same database.  Any duplicate keyword is not added.
Each item in the Photos table "knows" its associated keywords by means of a reference to the Keywords table.
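
As a very rough illustration of that table layout, here is a sketch using Python's built-in `sqlite3` module as a stand-in for FileMaker. Everything in it is an assumption made for the example: the table and column names, the `photo_keywords` link table, and the helper names `build_db`, `add_photo` and `keywords_for`. The keyword lists passed in are assumed to have been extracted from each photo by some other tool.

```python
import sqlite3


def build_db():
    """Create an in-memory database with Photos, Keywords and a link table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE photos   (id INTEGER PRIMARY KEY, filename TEXT UNIQUE);
        CREATE TABLE keywords (id INTEGER PRIMARY KEY,
                               word TEXT COLLATE NOCASE UNIQUE);
        CREATE TABLE photo_keywords (
            photo_id   INTEGER REFERENCES photos(id),
            keyword_id INTEGER REFERENCES keywords(id),
            PRIMARY KEY (photo_id, keyword_id)
        );
    """)
    return db


def add_photo(db, filename, words):
    """Record a photo and link it to its keywords, adding only new keywords."""
    db.execute("INSERT OR IGNORE INTO photos (filename) VALUES (?)", (filename,))
    photo_id = db.execute("SELECT id FROM photos WHERE filename = ?",
                          (filename,)).fetchone()[0]
    for word in words:
        # UNIQUE plus OR IGNORE means a duplicate keyword is simply not added.
        db.execute("INSERT OR IGNORE INTO keywords (word) VALUES (?)", (word,))
        keyword_id = db.execute("SELECT id FROM keywords WHERE word = ?",
                                (word,)).fetchone()[0]
        db.execute("INSERT OR IGNORE INTO photo_keywords VALUES (?, ?)",
                   (photo_id, keyword_id))


def keywords_for(db, filename):
    """Return the keywords linked to one photo, sorted case-insensitively."""
    rows = db.execute("""
        SELECT k.word FROM keywords k
        JOIN photo_keywords pk ON pk.keyword_id = k.id
        JOIN photos p ON p.id = pk.photo_id
        WHERE p.filename = ? ORDER BY k.word""", (filename,))
    return [r[0] for r in rows]
```

The link table is one way to let each photo "know" its keywords without storing any keyword twice; a real FileMaker layout would express the same relationship in its own terms.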

The icing on the cake would be to not only extract but also push-to-update those keywords into the IPTC metadata of each individual photo in such a way that PM continues to recognize them -- so you use both your custom solution and PM seamlessly with each other.

All that assumes you even need embedded keywords.  If the point is to coordinate how images are presented on a web page or how they appear in web searches, you might be able to forego the IPTC data altogether and handle everything via database + dynamic web pages with keywords attached.

It can take a while to figure out the best approach, but it sounds like you're doing cool stuff.

jj

Oct 16, 2022, 1:34:51 PM
to BBEdit Talk
Considering that list indentation is a form of compression, this BBEdit script sort_and_merge_indented_list.applescript decompresses the list, sorts it with the shell `sort` command, and compresses it back again.
In doing so, the indented list will be sorted, sub lists will be merged and duplicates will be eliminated.
For sorting options (case insensitive, etc.) see the `sort` man page in Terminal and adapt the do shell script according to your needs.
This script works on the selection of the front document, or on the whole document if nothing is selected. Selected lines should cover whole sub-lists appropriately to make sense.
By default the script accepts TAB or 4 SPACES indented lists but outputs TAB indented lists.
If you'd prefer a final 4 SPACES indented list, change the `vIndentation` variable.
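
For readers without the AppleScript to hand, the decompress, sort, compress idea can be sketched in a few lines of Python. This is not the script itself: it uses Python's `sorted` rather than the shell `sort` command, the name `sort_and_merge` is made up, and putting {synonyms} ahead of ordinary keywords is an extra assumption added because it was asked for earlier in the thread.

```python
def sort_and_merge(text, indent="\t"):
    """Sort an indented list, merging sub-lists and dropping duplicates."""
    paths = set()
    current = []
    for line in text.splitlines():
        body = line.lstrip(" \t")
        if not body:
            continue
        # Accept TAB or 4-SPACE indentation by normalising the leading run.
        lead = line[: len(line) - len(body)].replace("    ", "\t")
        depth = lead.count("\t")
        # Decompress: each line becomes its full path from the top level.
        current = current[:depth] + [body.rstrip()]
        paths.add(tuple(current))    # the set drops duplicate paths

    def key(path):
        # {synonyms} first, then case-insensitive, exact spelling as tiebreak.
        return [(not s.startswith("{"), s.lower(), s) for s in path]

    # Compress: a path of length n is printed as its last item at depth n - 1.
    return "\n".join(indent * (len(p) - 1) + p[-1] for p in sorted(paths, key=key))
```

Because every line carries its whole ancestry while it is being sorted, children can never drift away from their parents, and two copies of the same parent collapse into one with their sub-lists combined.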

INSTALL

    Copy this script to ~/Library/Application\ Support/BBEdit/Scripts/sort_and_merge_indented_list.applescript
    Call it from BBEdit's AppleScript menu.

HTH,

Jean Jourdain

Harvey Pikelberger

Oct 17, 2022, 11:54:36 AM
to BBEdit Talk
JJ -- very cool.  Simple, smart algorithm which is head-smackingly obvious... AFTER you presented it.

Thanks for sharing!