Tag conversion to a new format

lux

Feb 22, 2022, 8:05:00 PM
to BBEdit Talk
Hello everyone,

I have a huge number of posts from an old static blog. In each post's header there is a "tags" line that looks like this:

tags: Steve Jobs, Steve Throughton-Smith, T'Bone, Better to Be a Pirate Than Join the Navy, Ben & Jerry, NBA75

A line can hold any number of tags. Commas act as delimiters; apart from that, a tag can contain any character, spaces included.

I can choose between two possible target formats. The first, which I prefer, is:

tags:
- Steve Jobs
- Steve Throughton-Smith
- T'Bone
- (et cetera)

The second format is:

tags: ["Steve Jobs", "Steve Throughton-Smith", "T'Bone", "and so on"]
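For illustration, the second format can be sketched in a few lines of Python (not the Perl used in the replies, and not something anyone in the thread posted). The function name tags_line_to_array is hypothetical, and the sketch assumes JSON-style double quoting is acceptable in the target format and that each line is passed in without its trailing newline.

```python
import json

def tags_line_to_array(line: str) -> str:
    """Turn 'tags: a, b' into 'tags: ["a", "b"]'; any other line passes through."""
    key, sep, rest = line.partition(":")
    if key.strip().lower() != "tags" or not sep or not rest.strip():
        return line  # not a tags line, or a tags line with no items
    # Commas are the only delimiter; everything else belongs to the tag.
    items = [item.strip() for item in rest.split(",") if item.strip()]
    # json.dumps adds the double quotes and escapes embedded quotes;
    # ensure_ascii=False keeps accented characters readable.
    quoted = ", ".join(json.dumps(item, ensure_ascii=False) for item in items)
    return f"{key}: [{quoted}]"

print(tags_line_to_array("tags: Steve Jobs, T'Bone, NBA75"))
# tags: ["Steve Jobs", "T'Bone", "NBA75"]
```

This handles one line at a time, so it would still need to be limited to the header if the body can contain lines starting with "tags:".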

Can anyone help me get at least one of the two conversions right?

Many thanks in advance.

lux

David G Wagner

Feb 23, 2022, 12:30:31 PM
to BBEdit Talk
When you say "header", that implies there is more data than what you showed. Could you provide a small sample of what the data file actually looks like? ;)

Wags ;)
WagsWorld
Hebrews 4:15
Ph(primary) : 408-914-1341
Ph(secondary): 408-761-7391
--
This is the BBEdit Talk public discussion group. If you have a feature request or need technical support, please email "sup...@barebones.com" rather than posting here. Follow @bbedit on Twitter: <https://twitter.com/bbedit>
---
You received this message because you are subscribed to the Google Groups "BBEdit Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bbedit+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bbedit/82bdf83e-e068-4d56-a7db-5fc6819c9c9fn%40googlegroups.com.

lux

Feb 24, 2022, 11:05:02 PM
to BBEdit Talk
On Wednesday, February 23, 2022 at 6:30:31 PM UTC+1 wagsw...@gmail.com wrote:
When you say "header", that implies there is more data than what you showed. Could you provide a small sample of what the data file actually looks like? ;)

This is an example of a complete header:

((( header begins )))
---
title: "Più uguali degli altri"
date: 2022-02-24T01:43:23+01:00
draft: false
toc: false
comments: false
categories:
- Education
tags:
- MacBook Pro
- iPad mini
- Apple Pencil
- Bowdoin
---
((( header ends )))

Right under the header, the body text begins.

The number of items under categories and tags can be zero, one, or more.

My main concern is to avoid mistakenly capturing lines in the body text that share the same structure (i.e., lists beginning with dashes).

While working on it, I managed to simplify the problem somewhat. I would now like to convert the header above into the following one:

((( header begins )))
---
title: "Più uguali degli altri"
date: 2022-02-24T01:43:23+01:00
draft: false
toc: false
comments: false
categories: [Education]
tags: [MacBook Pro, iPad mini, Apple Pencil, Bowdoin]
---
((( header ends )))
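One way to keep body text out of reach is to split the post at the closing "---" line and rewrite only the header part. The following is a minimal Python sketch of that idea, not the approach posted in the replies. The names BLOCK_LIST, _to_flow and convert_front_matter are hypothetical. It assumes the header starts on the very first line and that items contain no commas or brackets, which would need quoting inside [ ].

```python
import re

# A "tags:" or "categories:" key followed by one or more "- item" lines.
BLOCK_LIST = re.compile(
    r"^(tags|categories):[ \t]*\n((?:[ \t]*-[ \t]+.*\n)+)", re.MULTILINE
)

def _to_flow(match: re.Match) -> str:
    # Drop the leading dash and surrounding spaces from every item line.
    items = [ln.strip()[1:].strip() for ln in match.group(2).splitlines()]
    return f"{match.group(1)}: [{', '.join(items)}]\n"

def convert_front_matter(text: str) -> str:
    """Rewrite block lists as [a, b] arrays, but only inside the leading --- block."""
    if not text.startswith("---\n"):
        return text  # no header: leave the post alone
    end = text.find("\n---\n", 3)
    if end == -1:
        return text  # unterminated header: leave the post alone
    # header ends just before the closing "---"; body starts at that line.
    header, body = text[: end + 1], text[end + 1 :]
    return BLOCK_LIST.sub(_to_flow, header) + body
```

A key with zero items is not matched by the pattern and stays as it is.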

Again, thanks very much for the attention. :-)

lux

jj

Feb 25, 2022, 5:37:06 AM
to BBEdit Talk
Hi Lux,

Here are two Perl text filters.
Copy them to ~/Library/Application Support/BBEdit/Text Filters.
You can test them from the menu Text > Apply Text Filter.

This one (tags_and_categories_to_list_pl.pl) is for your initial case (CSV -> one hyphenated item per line):

    #!/usr/bin/env perl
    use v5.14;
    use strict;
    use warnings;

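    # Read the input line by line; only "tags:" and "categories:" lines are rewritten.
    while (my $line = <>) {
        # Capture the field name and its comma-separated items (trailing spaces excluded).
        if (my ($field, $items) = $line =~ /^(tags|categories):\s*(.+?)\s*$/i) {
            # Turn each comma into a line break followed by a YAML list dash.
            my $yaml_items = $items =~ s/\s*,\s*/\n- /gr;
            print "${field}:\n- ${yaml_items}\n";
        } else {
            print $line;
        }
    }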

    =for test


    I got a huge number of posts from an old static blog. Inside the header there's a "tags" line, made this way:

    tags: Steve Jobs, Steve Throughton-Smith, T'Bone, Better to Be a Pirate Than Join the Navy, Ben & Jerry, NBA75

    The actual number of tags in the line can be any. Comma act as a delimiter and, after that, any character can be used inside a tag, spaces included.

    categories: Education

    =cut

This other one (tags_and_categories_to_array_pl.pl) is for your second case (one hyphenated item per line -> array):

    #!/usr/bin/env perl
    use v5.14;
    use strict;
    use warnings;

    $/ = undef;

    sub replace {
        my $match = shift;
        my @splitted = split /[[:blank:]]*\n[[:blank:]]*-[[:blank:]]/, $match;
        my $field = shift @splitted;
        s/^\s+|\s+$//g for @splitted;    # trim each item in place
        my $joined = join ", ", @splitted;
        return "${field} [${joined}]\n";
    }

    print <> =~ s/^((?:tags|categories)[[:blank:]]*:[[:blank:]]*\n[[:blank:]]*(?:-[^\n]+\n)+)/replace $1/grimse;

    =for test


    title: "Più uguali degli altri"
    date: 2022-02-24T01:43:23+01:00
    draft: false
    toc: false
    comments: false
    categories:
    - Education
    tags:
    - MacBook Pro
    - iPad mini
    - Apple Pencil
    - Bowdoin

    =cut

HTH,

Jean Jourdain

Massimo Rainato

Feb 25, 2022, 1:05:32 PM
to bbe...@googlegroups.com
Just as a note, you could use a Text Factory:

replace ‘^- (.*)’ with ‘[\1]’, then ‘\r[’ with ‘[’, then ‘][’ with ‘, ’

Right now I'm unable to test it, but that's my suggestion.
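To see what these three steps do, here is a rough Python rendering of them. It is only a sketch: the function name text_factory_steps is hypothetical, "\n" stands in for BBEdit's "\r" line break, and Python's re and str.replace are assumed to behave like the Text Factory's grep and literal replacements for this simple case.

```python
import re

def text_factory_steps(text: str) -> str:
    """Apply the three search-and-replace steps, in order, to the whole text."""
    # Step 1: "- item" at the start of a line becomes "[item]".
    text = re.sub(r"^- (.*)$", r"[\1]", text, flags=re.MULTILINE)
    # Step 2: pull every line that now starts with "[" up onto the previous line.
    text = text.replace("\n[", "[")
    # Step 3: merge adjacent bracketed items into one comma-separated list.
    text = text.replace("][", ", ")
    return text

print(text_factory_steps("tags:\n- MacBook Pro\n- iPad mini\n"), end="")
# tags:[MacBook Pro, iPad mini]
```

Two things stand out: the result has no space after the colon, and because the steps run over the whole file, any dash list in the body text is rewritten as well.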

lux

Feb 25, 2022, 7:28:24 PM
to BBEdit Talk
On Friday, February 25, 2022 at 11:37:06 AM UTC+1 jj wrote:
Hi Lux,

Here are two perl text filters.

Thanks a lot, I used the second filter and it worked like a charm! I'll test the first one too, to learn something more. Much appreciated. :-)

lux 

lux

Feb 25, 2022, 7:28:27 PM
to BBEdit Talk
On Friday, February 25, 2022 at 7:05:32 PM UTC+1 Massimo Rainato wrote:
Just as a note, you could use a Text Factory:

replace ‘^- (.*)’ with ‘[\1]’, then ‘\r[’ with ‘[’, then ‘][’ with ‘, ’

Thanks, that's good, but unfortunately it also matches some places in the body text of the posts, where similar structures exist. I really appreciate the help anyway. :-)

lux 