Replace leading 4x[space]s with 1x[tab] (each!)


Brian Porter

Feb 25, 2016, 3:22:09 PM
to BBEdit Talk
I'm trying to find a trick for using BBEdit's grep-enabled Find/Replace to convert files that use spaces for indentation to use tabs instead. Here's an example input text snippet:

    1. Item
        a. sub-item
            i. third level
            ii. another third level
    2. Second Top Level


What I'd like to do is replace every set of 4 spaces at the beginning of a line with a single tab (one tab per 4-space set), but it seems like this would require being able to count the number of matches and use that count in the replacement somehow.

For example, the following search and replace patterns can be used to replace the first set of spaces with a tab, but this pattern must be applied repeatedly for every "level" of indent being used in the file:

Find:
^(\t*)(\ \ \ \ )

Replace:
\1\t


Ideally I want something like this:

Find:
^((\ \ \ \ )+)

Replace:
\t{countOf(\2 in \1)}    # Yes it's wildly invalid; use your imagination. :-P


I'm unaware of any mechanism in grep/regex that would allow for this. Am I correct in thinking it's impossible? Is there an alternative approach that I'm not considering, perhaps via AppleScript or a shell script, that could get the job done?
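
For reference, the imagined countOf replacement above can be approximated in a scripting language that accepts a function as the replacement. The following is only a minimal Python sketch, not a BBEdit feature; the names LEADING_GROUPS and entab_leading are made up for illustration.

```python
import re

# One or more complete groups of four spaces at the start of a line.
LEADING_GROUPS = re.compile(r"^(?: {4})+", re.MULTILINE)


def entab_leading(text):
    """Replace each leading run of 4-space groups with one tab per group."""
    def to_tabs(match):
        # The matched run is a multiple of 4 spaces; emit one tab per group.
        return "\t" * (len(match.group(0)) // 4)

    return LEADING_GROUPS.sub(to_tabs, text)


sample = "1. Item\n    a. sub-item\n        i. third level\n"
print(repr(entab_leading(sample)))
```

Because the pattern is anchored with ^, sets of four spaces later in a line are left alone, and leftover spaces that do not fill a whole group stay as spaces.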

Thanks,
Brian

Patrick Woolsey

Feb 25, 2016, 3:26:14 PM
to bbe...@googlegroups.com
Provided I correctly understand the task at hand :-), the Entab
command (Text -> Entab) set to a tab width of 4 spaces should do
what you need.


Regards,

Patrick Woolsey
==
Bare Bones Software, Inc. <http://www.barebones.com/>




Fletcher Sandbeck

Feb 25, 2016, 3:45:38 PM
to bbe...@googlegroups.com
The simpler answer is to use the Text > Entab... command. This will generally do the replacement you want, although it doesn't strictly follow the rule of replacing spaces only at the start of a line. The Text > Detab... command does the reverse and replaces tabs with spaces.

You can do this with a regular expression as well. First, if you want to find four of a character you can write "a{4}" rather than "aaaa". Then, since you only want to replace spaces at the beginning of the line, you can use a look-behind assertion to make sure you are at the start of a line or just after a tab. The look-behind assertion (?<=[\r\t])a matches an "a" only if it occurs immediately after a return or tab.

To also catch spaces that are the very first characters in the file, we need to switch this around: (?<![^\r\t])a uses a negative look-behind assertion that does the same thing, but also matches at the start of the file.

Find: (?<![^\r\t]) {4}
Replace: \t
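
To see how this pattern behaves on deeper indentation, here is a rough Python sketch. It assumes \n as the line break where the BBEdit pattern uses \r, and the helper names one_pass and until_stable are invented for this example. In a single pass only the first set of four spaces on each line is replaced, because the next set is still preceded by a space in the text being searched; repeating the pass until nothing changes handles the remaining levels.

```python
import re

# The pattern above, with \n standing in for BBEdit's \r line break (assumption).
AFTER_BREAK_OR_TAB = re.compile(r"(?<![^\n\t]) {4}")


def one_pass(text):
    """Apply the replacement once, like a single Replace All."""
    return AFTER_BREAK_OR_TAB.sub("\t", text)


def until_stable(text):
    """Repeat the pass until the text stops changing."""
    while True:
        result = one_pass(text)
        if result == text:
            return text
        text = result


line = " " * 10 + "foo"
print(repr(one_pass(line)))
print(repr(until_stable(line)))
```

Note that the look-behind accepts any preceding tab, so four spaces that directly follow a tab in the middle of a line would be converted as well.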

[fletcher]
 

Brian Porter

Feb 29, 2016, 1:06:03 PM
to BBEdit Talk
So after I complained a bit to my coworkers in Slack, they proved their superior Google-fu. For posterity, the regex solution turns out to be based on this StackOverflow answer: https://stackoverflow.com/a/26255717/70876

Find:
(\G|^) {4}

Replace:
\t

The \G anchor is new to me and apparently "magical". I'm still not sure I understand what it's doing, but it completely fulfills my request. I've saved it in BBEdit as a memorized pattern, so I can keep my muscle memory without having to re-apply my old regex repeatedly.
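
As background on what \G does: in PCRE-style engines it matches only at the position where the previous match ended, so each further set of four spaces qualifies only if it directly follows a set that was already replaced, and the chain has to begin at ^. Python's built-in re module does not offer \G, so the sketch below imitates the behavior by hand with an explicit position; entab_line and entab_text are hypothetical names used only for illustration.

```python
import re

FOUR_SPACES = re.compile(r" {4}")


def entab_line(line):
    """Imitate (\\G|^) {4} on one line: each match must begin exactly
    where the previous one ended, starting from the beginning of the line."""
    pos = 0
    tabs = 0
    while True:
        # Pattern.match(string, pos) only succeeds if the match starts at pos.
        match = FOUR_SPACES.match(line, pos)
        if match is None:
            break
        tabs += 1
        pos = match.end()
    return "\t" * tabs + line[pos:]


def entab_text(text):
    """Apply entab_line to every line, keeping the original line endings."""
    return "".join(entab_line(l) for l in text.splitlines(keepends=True))


print(repr(entab_line(" " * 10 + "foo")))
```

The chain breaks at the first position that is not followed by four spaces, which is why sets of four spaces further along the line are never touched.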

Thanks again everyone.
--Brian

Brian Porter

Feb 29, 2016, 1:06:22 PM
to BBEdit Talk
Well how about that.

Patrick, Fletcher: Thank you. I should have known there would be a menu command for that. You get so used to the parts of the app you "normally" use it can be easy to block out the rest. It's a shame the Entab command can't be constrained to the start of the line though (NOT a feature request) since I'd like to avoid replacing any 4x-spaces that might show up elsewhere in the file.

Likewise, the regex unfortunately doesn't seem to have a benefit over the previous example I provided since it can't match _multiple_ consecutive sets of 4 spaces at the beginning of the line and replace them with the corresponding "correct" number of tabs. For example:

# Imagine 10x spaces plus foo:       "          foo"
# First application of either regex: "\t      foo"
# Second application:                "\t\t  foo"

In other words: Applying either regex will only replace *one* set of 4 spaces with a tab at a time. The regex needs to be re-applied for each level of indentation.

Even so, judicious use of both techniques (along with review in git) should slightly simplify my workflow. If that still gets annoying, perhaps I can try hacking together a one-line Perl script.

Thanks again,
Brian