Sorting multiple records in a text file


Howard

Nov 4, 2024, 11:35:38 AM
to BBEdit Talk

I have multiple records in a text file in the format below (seven sample records shown). I want to sort all of them by Author and then, within Author, by Created At. In a record, the first four fields are always one line each; however, the fifth field (Comment) can run to 30-40 lines, possibly more.


Is this something that BBEdit can do? If it is, how can I do it?


Howard

______________________________________________________________________________


Annotation 1:

Created at: 2024-10-30 08:02

Author: Arthur Antique

Type: Reply

Comment: Lorem ipsum odor amet, consectetuer adipiscing elit. Quisque fusce placerat rutrum fusce; purus mollis nulla lobortis. Vivamus faucibus mi neque; a pretium elementum ut suscipit rutrum. Enim sapien et rutrum dictum elit habitasse porta. “Ipsum euismod fermentum facilisis quisque eu ut elementum diam.”


Annotation 2:

Created at: 2024-10-30 08:02

Author: Alice Underhill

Type: Reply

Comment: Lorem ipsum odor amet, consectetuer adipiscing elit. “Interdum dictumst maecenas lacinia parturient per dignissim etiam quis.”


Annotation 3:

Created at: 2024-10-31 19:15

Author: Andy Absence

Type: Annotation

Comment: Lorem ipsum odor amet, consectetuer adipiscing elit. Interdum dictumst maecenas lacinia parturient per dignissim etiam quis. Asapien hendrerit curae; eu malesuada imperdiet et. Lectus taciti curabitur auctor sodales placerat. Mus consequat amet ornare ridiculus mi et ipsum. Nullam est urna interdum nunc ultricies efficitur tellus.


Annotation 4:

Created at: 2024-10-31 19:15

Author: Andy Absence

Type: Page note

Comment: Lorem ipsum odor amet, consectetuer adipiscing elit. Interdum dictumst maecenas lacinia parturient per dignissim etiam quis. Asapien hendrerit curae; eu malesuada imperdiet et. Lectus taciti curabitur auctor sodales placerat. Mus consequat amet ornare ridiculus mi et ipsum. Nullam est urna interdum nunc ultricies efficitur tellus.


Annotation 5:

Created at: 2024-10-31 19:15

Author: Andy Absence

Type: Annotation

Comment: Lorem ipsum odor amet, consectetuer adipiscing elit. Interdum dictumst maecenas lacinia parturient per dignissim etiam quis. Asapien hendrerit curae; eu malesuada imperdiet et. Lectus taciti curabitur auctor sodales placerat. Mus consequat amet ornare ridiculus mi et ipsum. Nullam est urna interdum nunc ultricies efficitur tellus.


Annotation 6:

Created at: 2024-11-03 19:15

Author: Audrey Afterall

Type: Page note

Comment: Lorem ipsum odor amet, consectetuer adipiscing elit. Interdum dictumst maecenas lacinia parturient per dignissim etiam quis. Asapien hendrerit curae; eu malesuada imperdiet et. Lectus taciti curabitur auctor sodales placerat. Mus consequat amet ornare ridiculus mi et ipsum. Nullam est urna interdum nunc ultricies efficitur tellus.


Annotation 7:

Created at: 2024-11-03 22:51

Author: Audrey Afterall

Type: Annotation

Comment: Lorem ipsum odor amet, consectetuer adipiscing elit. Interdum dictumst maecenas lacinia parturient per dignissim etiam quis. Asapien hendrerit curae; eu malesuada imperdiet et.

Neil Faiman

Nov 4, 2024, 11:53:49 AM
to BBEdit Talk Mailing List
As far as I know, BBEdit simply supports sorting lines — not arbitrary records represented by batches of text lines. But do not despair. All is not lost. BBEdit has really robust support for sorting lines.

I would start with a GREP that could match across multiple lines and collapse them into a single line, with some arbitrary separator character representing where the original line breaks were. (You might need two patterns: one to collapse the multi-line Comments into a single line, and then a second one to collapse all the lines in the record into a single line.)

Now that each record is represented by a single line, you can write a regular expression that recognizes the sort keys within the line. Then you can use the “Sort using pattern” feature of the Text > Sort Lines… command to sort the records on those keys.

Finally, you can reverse the process from the first step and split the records back into multiple lines.

Once you’ve got each of the steps perfected, you can create a text factory that will apply them to a file automatically, and you should be good to go.
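
For anyone who wants to see the collapse, sort, and split steps end to end outside BBEdit, here is a minimal Python sketch of the same idea. It is an illustration under assumptions, not BBEdit's behavior: the separator character, the function name sort_records, and the shortened sample data are all made up for the example, and it assumes every record starts on a line beginning with "Annotation N:".

```python
import re

SEP = "\x1f"  # assumed separator: a control character unlikely to occur in the text

SAMPLE = """\
Annotation 1:
Created at: 2024-10-31 19:15
Author: Andy Absence
Type: Annotation
Comment: First line of the comment.
Second line of the comment.

Annotation 2:
Created at: 2024-10-30 08:02
Author: Alice Underhill
Type: Reply
Comment: A short comment.

Annotation 3:
Created at: 2024-10-30 07:00
Author: Andy Absence
Type: Page note
Comment: Another comment.
"""

# Sort keys inside a collapsed record: group 1 is the date, group 2 the author.
KEY_RE = re.compile(rf"Created at: ([^{SEP}]*){SEP}+Author: ([^{SEP}]*)")


def sort_records(text: str) -> str:
    # A record is assumed to start at any line beginning with "Annotation N:".
    records = [r.strip() for r in re.split(r"(?m)^(?=Annotation \d+:)", text) if r.strip()]

    # Step 1: collapse each record onto a single line.
    collapsed = [r.replace("\n", SEP) for r in records]

    # Step 2: sort on the keys, author first and then date.
    def key(line: str) -> tuple[str, str]:
        m = KEY_RE.search(line)
        return (m.group(2), m.group(1))

    collapsed.sort(key=key)

    # Step 3: split the records back into multiple lines.
    return "\n\n".join(line.replace(SEP, "\n") for line in collapsed) + "\n"


if __name__ == "__main__":
    print(sort_records(SAMPLE), end="")
```

Because the dates are in YYYY-MM-DD HH:MM form, sorting them as plain strings gives chronological order; a different date format would need a real date parse in the key.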

Good luck,
Neil Faiman

Bruce Van Allen

Nov 4, 2024, 11:59:35 AM
to bbe...@googlegroups.com
Neil’s recommendation is the same as what I was just composing.

Be sure to operate on a copy of the original file as you test the steps.

People here can review your regular expressions as you try them.

— Bruce

_bruce__van_allen__santa_cruz_ca_
> --
> This is the BBEdit Talk public discussion group. If you have a feature request or believe that the application isn't working correctly, please email "sup...@barebones.com" rather than posting here. Follow @bbedit on Mastodon: <https://mastodon.social/@bbedit>
> ---
> You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+un...@googlegroups.com.
> To view this discussion visit https://groups.google.com/d/msgid/bbedit/B3ABB4DB-E670-4991-8155-2ED358198C1D%40faiman.org.

James Reynolds

Nov 4, 2024, 12:00:52 PM
to bbe...@googlegroups.com
If you only want it sorted by author, you could use BBEdit. But if there is any chance you will want to sort it by something else, I'd use a grep search to turn it into a CSV file, open it in Excel or Numbers, and sort it there.
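
As a rough illustration of the CSV route, here is a small Python sketch that turns records in this format into CSV text a spreadsheet can open. The names RECORD_RE and records_to_csv and the two-record sample are hypothetical, and the sketch assumes the fields always appear in the order Created at, Author, Type, Comment.

```python
import csv
import io
import re

SAMPLE = """\
Annotation 1:
Created at: 2024-10-31 19:15
Author: Andy Absence
Type: Annotation
Comment: First line, with a comma.
Second line.

Annotation 2:
Created at: 2024-10-30 08:02
Author: Alice Underhill
Type: Reply
Comment: A short comment.
"""

# One match per record; the Comment group runs lazily up to the next record or the end.
RECORD_RE = re.compile(
    r"^Annotation (\d+):\s*\n"
    r"\s*Created at: (.*)\n"
    r"\s*Author: (.*)\n"
    r"\s*Type: (.*)\n"
    r"\s*Comment: ([\s\S]*?)\s*(?=^Annotation \d+:|\Z)",
    re.MULTILINE,
)


def records_to_csv(text: str) -> str:
    out = io.StringIO()
    writer = csv.writer(out)  # quotes fields that contain commas or quotes
    writer.writerow(["Annotation", "Created at", "Author", "Type", "Comment"])
    for m in RECORD_RE.finditer(text):
        number, created, author, kind, comment = m.groups()
        # Fold a multi-line comment onto one line so each record is one row.
        comment = " ".join(comment.split())
        writer.writerow([number, created.strip(), author.strip(), kind.strip(), comment])
    return out.getvalue()


if __name__ == "__main__":
    print(records_to_csv(SAMPLE), end="")
```

Folding the comment onto one line loses its original line breaks, which is usually acceptable for a spreadsheet but means the CSV cannot be turned back into the exact original file.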

James Reynolds
https://magnusviri.com

flet...@cumuli.com

Nov 4, 2024, 12:02:30 PM
to bbe...@googlegroups.com
I think Neil has the right idea. This works for me but it isn't real elegant. One problem is that it sorts by the first name of the author rather than the last. The dates work fine as long as they are all in the same format.

First change all the returns in each record to a tab by entering these patterns into the find dialog box and doing replace all.

Find: (?<=\S)\n(?=\S)
Replace: \t

Then you can run the sort command using sub-patterns so the desired fields are used. Select All in the document and then select Text > Sort Lines... and use these settings.

Sort using pattern: ^Annotation[\s\S]+?Created at: (.*?)[\n\t]Author: (.*?)[\n\t][\s\S]+?$
Specific sub-patterns: \2\1

Strip off the whitespace at the top of the document. Then replace \n with \n\n to restore the space between records and replace \t with \n to change the tabs back into returns.
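
If it helps to see what those three steps do, here is a minimal Python sketch that mimics them. It is only an approximation of the BBEdit workflow: the function name sort_tab_joined and the by_last_name option are assumptions added for illustration (the option addresses the first-name problem mentioned above), and the sample is shortened.

```python
import re

SAMPLE = """\
Annotation 1:
Created at: 2024-10-30 08:02
Author: Arthur Antique
Type: Reply
Comment: Lorem ipsum.

Annotation 2:
Created at: 2024-10-30 08:02
Author: Alice Underhill
Type: Reply
Comment: Lorem ipsum,
continued on a second line.

Annotation 3:
Created at: 2024-10-31 19:15
Author: Andy Absence
Type: Annotation
Comment: Lorem ipsum.
"""

# Same idea as the "Sort using pattern" setting: group 1 is the date, group 2 the author.
KEY_RE = re.compile(r"^Annotation.*?Created at: (.*?)\tAuthor: (.*?)\t")


def sort_tab_joined(text: str, by_last_name: bool = False) -> str:
    # Step 1: a newline directly between two non-space characters becomes a tab,
    # so each record ends up on one line and blank lines between records survive.
    joined = re.sub(r"(?<=\S)\n(?=\S)", "\t", text)
    lines = [ln for ln in joined.split("\n") if ln.strip()]

    # Step 2: sort on sub-patterns \2\1 (author, then date).
    def key(line: str) -> tuple[str, str]:
        created, author = KEY_RE.search(line).groups()
        name = author.split()[-1] if by_last_name else author
        return (name, created)

    lines.sort(key=key)

    # Step 3: restore the blank line between records and turn tabs back into returns.
    return "\n\n".join(ln.replace("\t", "\n") for ln in lines) + "\n"


if __name__ == "__main__":
    print(sort_tab_joined(SAMPLE, by_last_name=True), end="")
```

Two limits carry over from the find/replace version: a line that ends in trailing whitespace is not joined to the next one, and any tab already present in a Comment comes back as a return.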

[fletcher]

Howard

Nov 4, 2024, 12:10:04 PM
to BBEdit Talk
I think I can write the GREP code that matches the first four lines, but I am not sure how to do that for the Comment lines. Also, once I do that, how do I "write a regular expression that recognizes the sort keys within the line"? 

I've also never used text factory. (Is it easier to use in BBEdit 15 than in BBEdit 14?)
Howard

jj

Nov 4, 2024, 5:15:56 PM
to BBEdit Talk
Hi Howard,

You could do that with a canonize file and a few regular expressions.

 1. Create a new file named canonize_lines_to_columns.txt with this content:

# -*- x-bbedit-canon-case-sensitive: 1; x-bbedit-canon-match-words: 0; x-bbedit-canon-grep: 1; -*-
# End:
# Local Variables:
# coding: utf-8
# indent_style: tab
#===
# Save the annotation number in a <<< >>> bracket that we will need later.
^Annotation (\d+):<TAB><<<\1>>>
# Replace all the column titles by tabs.
\n(Created at|Author|Type|Comment):\h*<TAB>\t
# Replace all the newlines by a space in case there are some in the contents of Comment fields.
\n<TAB>\x20
# Put a newline before each annotation number and remove the <<< >>> bracket.
<<<(\d+)>>><TAB>\n\1
# Put the column names in the first line.
\A\h*$<TAB>Annotation\tCreated At\tAuthor\tType\tComment
# Put a single space where there is more than one.
\x20{2,}<TAB>\x20
# Reorder the columns to Author, Created At, Annotation, Type, Comment.
^(.+?)\t(.+?)\t(.+?)\t<TAB>\3\t\2\t\1\t
# Now the file could be sorted in BBEdit by Author, Created At
# Or imported into a Spreadsheet as Tab Separated Values.

 2. Once you have created this file, replace every <TAB> in it with a real tab character. Take care to deselect the "Auto-expand tabs" option on the file before you save it; otherwise the tabs will be replaced by spaces, and we need them as separators.

Find: <TAB>
Replace: \t

3. Go to your data file and use the menu Text > Canonize… with the saved canonize file and apply it to your data.

4. Your data should now be converted to tab-separated values, with the columns reordered so that they can be sorted in this order: Author, Created At, Annotation, Type, Comment.

5. Use the menu Text > Sort Lines… or import the resulting TSV into a spreadsheet.
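
To make the effect of the rules easier to follow, here is a rough Python sketch that applies the same search and replace pairs in the same order and then sorts the rows. It assumes the rules run one after another over the whole document, which is my reading of the steps above rather than a statement about how Canonize is implemented. Python's re module has no \h, so [ \t] stands in for it; the names RULES and to_sorted_tsv and the sample data are made up for the example.

```python
import re

SAMPLE = """\
Annotation 1:
Created at: 2024-10-31 19:15
Author: Andy Absence
Type: Annotation
Comment: First line.
Second line.

Annotation 2:
Created at: 2024-10-30 08:02
Author: Alice Underhill
Type: Reply
Comment: Short comment.
"""

# (search, replace) pairs mirroring the rules in the canonize file, in order.
RULES = [
    (r"^Annotation (\d+):", r"<<<\1>>>"),                    # remember the number
    (r"\n(?:Created at|Author|Type|Comment):[ \t]*", "\t"),  # field titles -> tabs
    (r"\n", " "),                                            # remaining newlines -> spaces
    (r"<<<(\d+)>>>", r"\n\1"),                               # one record per line
    (r"\A[ \t]*$", "Annotation\tCreated At\tAuthor\tType\tComment"),  # header row
    (r" {2,}", " "),                                         # squeeze runs of spaces
    (r"^(.+?)\t(.+?)\t(.+?)\t", r"\3\t\2\t\1\t"),            # Author, Created At, Annotation
]


def to_sorted_tsv(text: str) -> str:
    for pattern, repl in RULES:
        text = re.sub(pattern, repl, text, flags=re.MULTILINE)
    header, *rows = [ln.rstrip() for ln in text.split("\n") if ln.strip()]
    # Sort on the first two columns: Author, then Created At.
    rows.sort(key=lambda row: row.split("\t")[:2])
    return "\n".join([header, *rows]) + "\n"


if __name__ == "__main__":
    print(to_sorted_tsv(SAMPLE), end="")
```

The header row is kept out of the sort here; in BBEdit you would either leave the first line out of the selection or move it back to the top afterwards.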

HTH

Jean Jourdain

Howard

Nov 5, 2024, 6:37:59 AM
to BBEdit Talk
Jean, I tried to apply your canonize_lines_to_columns.txt file to the data shown earlier in this post, following your directions; however, after letting it run for a few minutes, nothing happened. I had to Force Quit.

I had to manually change every <TAB> to `\t` and the BBEdit Find/Replace wouldn't do it. In BBEdit Settings, the "Auto-expand tabs" option was off. How do I deselect that option?

How long should it take for the data to be converted? What could I be doing wrong?

Howard

jj

Nov 5, 2024, 1:13:46 PM
to BBEdit Talk
1. Here is the find/replace dialog:
Screenshot 2024-11-05 at 18.50.03.png
2. Here is what the file should look like once the replacement has been done:
(the text options are displayed by clicking on the ⚙️ on the left of the navigation bar)
Screenshot 2024-11-05 at 18.49.22.png

Notice that the <TAB> tags were replaced by tab characters (the little triangles Δ, only visible when Show invisibles > Show tabs is checked).

The conversion should be immediate.

Transforming your example to this Tab-Separated-Values result: 

Screenshot 2024-11-05 at 19.00.50.png
(Notice that the invisible tab characters are visible here because Show invisibles > Show tabs is checked for this file too.)

HTH

Jean Jourdain

Howard

Nov 6, 2024, 7:45:36 AM
to BBEdit Talk
Thanks everyone for your responses. They enabled me to solve the problem.
Howard

Roland Küffner

Nov 10, 2024, 7:39:35 PM
to bbe...@googlegroups.com
Hi, although this is solved, here is another possible way using a text factory.

The recipe (text factory steps) would be this:
1) Replace every line break with a placeholder (e.g. <br>): "\n(?!Annotation)" => "<br>"  (this features "Positional Assertions" <= RTM) => effect: every record is on its own line
2) Sort lines (using "Sort using pattern") – Searching pattern = "Created at: (.+?)<br>Author: .+\s(.+?)<br>", Specific sub-patterns = "\2\1"
3) Re-Replace your placeholder with real line breaks: TF-step Replace all: "<br>" => "\n"

Caveat: this assumes: 1) "Annotation" will always be the first field in a record; 2) "Created at" will always occur before "Author"; 3) the sequence "<br>" will not occur in the original text (change the placeholder if so)

It worked on your sample data,

Regards
Roland



test.textfactory

Roland Küffner

Nov 10, 2024, 7:51:00 PM
to bbe...@googlegroups.com
Sorry, I missed a question mark in step 2. It should be: "Created at: (.+?)<br>Author: .+?\s(.+?)<br>", Specific sub-patterns = "\2\1"
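
For readers without the attached text factory, here is a minimal Python sketch of the three steps using the corrected pattern. It is an illustration, not the text factory itself: the function name sort_with_placeholder and the shortened sample are assumptions, and the same caveats apply (every record starts with "Annotation", "Created at" precedes "Author", and "<br>" does not occur in the text).

```python
import re

PLACEHOLDER = "<br>"  # must not occur anywhere in the original text

SAMPLE = """\
Annotation 1:
Created at: 2024-10-30 08:02
Author: Arthur Antique
Type: Reply
Comment: Lorem ipsum.

Annotation 2:
Created at: 2024-10-30 08:02
Author: Alice Underhill
Type: Reply
Comment: Lorem ipsum.

Annotation 3:
Created at: 2024-10-31 19:15
Author: Andy Absence
Type: Annotation
Comment: Lorem ipsum.
"""

# Group 1 is the date; group 2 is whatever follows the first word of the author's name.
KEY_RE = re.compile(r"Created at: (.+?)<br>Author: .+?\s(.+?)<br>")


def sort_with_placeholder(text: str) -> str:
    # Step 1: every line break not followed by "Annotation" becomes the placeholder,
    # which leaves each record on its own line.
    lines = re.sub(r"\n(?!Annotation)", PLACEHOLDER, text).split("\n")

    # Step 2: sort on sub-patterns \2\1 (last name, then date).
    def key(line: str) -> tuple[str, str]:
        m = KEY_RE.search(line)
        return (m.group(2), m.group(1))

    lines.sort(key=key)

    # Step 3: turn the placeholders back into real line breaks.
    return "\n".join(lines).replace(PLACEHOLDER, "\n")


if __name__ == "__main__":
    print(sort_with_placeholder(SAMPLE), end="")
```

With a name of three or more words, group 2 captures everything after the first word, so "Bruce Van Allen" would sort under "Van Allen".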

test.textfactory