Zap Gremlins With Numerical Code instead of Hex Code

DarthPixel

Jun 18, 2009, 2:13:15 PM
to BBEdit Talk
Kinda new to BBEdit as well as dealing with special characters in HTML
pages. We've been doing more Spanish language sites which include
special characters with accents (acutes and tildes) and non-English
punctuation marks (inverted exclamation point & question mark).

I tried Zap Gremlins with code option, however it substitutes the
desired characters with a Hex code. Is there a way to target those
characters with a numeric code?

Example, instead of replacing the inverted question mark with the Hex
code: "\xBF" is it possible to have BBEdit replace it with the numeric
code: "&#191;"

For now, I have put together a Text Factory that searches and
substitutes a few of these characters, but I figured this
functionality already exists somewhere in BBEdit...I am just not
looking in the right place.

Thanks in advance!

Patrick Woolsey

Jun 18, 2009, 4:53:04 PM
to bbe...@googlegroups.com
DarthPixel <alcone.a...@gmail.com> sez:

[...]


>Example, instead of replacing the inverted question mark with the Hex
>code: "\xBF" is it possible to have BBEdit replace it with the numeric
>code: "&#191;"
>

The Translate command (Markup -> Utilities -> Translate, with appropriate
options set) should do what you need.


Regards,

Patrick Woolsey
==
Bare Bones Software, Inc. <http://www.barebones.com>
P.O. Box 1048, Bedford, MA 01730-1048

Robert A. Rosenberg

Jun 18, 2009, 11:11:58 PM
to bbe...@googlegroups.com, bbe...@googlegroups.com, Patrick Woolsey
At 16:53 -0400 on 06/18/2009, Patrick Woolsey wrote about Re: Zap
Gremlins With Numerical Code instead of Hex Code:

>DarthPixel <alcone.a...@gmail.com> sez:
>
>[...]
>>Example, instead of replacing the inverted question mark with the Hex
>>code: "\xBF" is it possible to have BBEdit replace it with the numeric
>>code: "&#191;"
>>
>
>The Translate command (Markup -> Utilities -> Translate, with appropriate
>options set) should do what you need.

No it will not. Translate does TXT to HTML and HTML to Text
transformations (and will do the code changes when it goes to HTML).
It does NOT take HTML and output HTML with the codes altered (ie: It
will not take an HTML file and just alter the entries to
&named/&hex/&decimal).
--


Robert A. Rosenberg
RAR Programming Systems Ltd.
(845)-357-0931 - Home
(646)-479-1984 - Cell Phone
(646)-349-4025 - Fax

Patrick Woolsey

Jun 19, 2009, 8:50:37 AM
to bbe...@googlegroups.com
"Robert A. Rosenberg" <rar...@banet.net> sez:

>At 16:53 -0400 on 06/18/2009, Patrick Woolsey wrote:
>
>>DarthPixel <alcone.a...@gmail.com> sez:
>>[...]
>>>Example, instead of replacing the inverted question mark with the Hex
>>>code: "\xBF" is it possible to have BBEdit replace it with the numeric
>>>code: "&#191;"
>>>
>>
>>The Translate command (Markup -> Utilities -> Translate, with appropriate
>>options set) should do what you need.
>
>No it will not. Translate does TXT to HTML and HTML to Text
>transformations (and will do the code changes when it goes to HTML).
>It does NOT take HTML and output HTML with the codes altered (ie: It
>will not take an HTML file and just alter the entries to
>&named/&hex/&decimal).


Sorry, I ought to've been more specific. :-)

You can use the Translate command for this purpose as follows: select
"Translate: Text to HTML", with the "Paragraphs" option turned OFF, and the
"Ignore < and >" option turned ON.

Now, when you apply this command to an HTML file, BBEdit will process the
file's contents and convert all extended characters, without affecting its
tags.

Semper Fidelis

Jun 19, 2009, 9:02:30 AM
to bbe...@googlegroups.com
... AND choose "Decimal" as the form of the HTML entity.
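
For anyone who needs the same conversion outside BBEdit, here is a minimal Python sketch. It is not what Translate does internally; it only relies on Python's built-in "xmlcharrefreplace" error handler, which writes every non-ASCII character as a decimal numeric reference and leaves ASCII (tags included) alone. The function name is my own.

```python
def to_decimal_entities(text: str) -> str:
    """Rewrite every non-ASCII character as a decimal reference such as &#191;."""
    # ASCII characters, including < and >, pass through unchanged.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_decimal_entities("<p>¿Dónde está?</p>"))
# prints: <p>&#191;D&#243;nde est&#225;?</p>
```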

DarthPixel

Jun 19, 2009, 12:10:55 PM
to BBEdit Talk
Hot dog!

Thanks to all of you, this is awesome (simple mind, simple pleasure)!

It's easy when you know how.

Thanks again!

Robert A. Rosenberg

Jun 19, 2009, 4:51:35 PM
to bbe...@googlegroups.com
At 08:50 -0400 on 06/19/2009, Patrick Woolsey wrote about Re: Zap
Gremlins With Numerical Code instead of Hex Code:

>Sorry, I ought to've been more specific.
>


>You can use the Translate command for this purpose as follows: select
>"Translate: Text to HTML", with the "Paragraphs" option turned OFF, and the
>"Ignore < and >" option turned ON.
>
>Now, when you apply this command to an HTML file, BBEdit will process the
>file's contents and convert all extended characters, without affecting its
>tags.

Thank you for the follow-up. I tried this and it works. While I would
rather have this as part of the Format Command, it seems to work as
you stated.

There are two small enhancements I would like to see. First, an
option to rescan the existing HTML entities and convert them to the
current setting. IOW: If I have &copy; but I am set to decimal, have
it alter that to &#169;. Second, the ability to fix invalid x80-x9F
codes so they become the correct Unicode equivalents. I note that if
I use the character itself, the Unicode value is used for the
replacement, so that part works already. But unless you add the
rescan, there is no way to correct existing uses of the invalid
values except manually.
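
To make the two requests concrete, here is a rough Python sketch of such a rescan (not a BBEdit feature). It rewrites named and hex references as decimal, and maps references in the x80-x9F range to the Unicode character at the same position in Windows-1252, on the assumption that this is what the author meant. The function names are hypothetical, and leaving &amp;, &lt;, &gt; and &quot; as they are is my own choice.

```python
import re
from html.entities import name2codepoint

# Matches &name;, &#123; and &#x7B; style references.
_ENTITY = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")
# Assumption: markup-significant entities are more readable left as names.
_KEEP = {"amp", "lt", "gt", "quot"}


def _fix_c1(code: int) -> int:
    """Map an invalid 0x80-0x9F code to its Windows-1252 character, if any."""
    if 0x80 <= code <= 0x9F:
        try:
            return ord(bytes([code]).decode("cp1252"))
        except UnicodeDecodeError:
            return code  # position unassigned in Windows-1252; leave it alone
    return code


def normalize_entities(html: str) -> str:
    """Rewrite entity references in html as decimal numeric references."""
    def repl(match):
        body = match.group(1)
        if body[0] == "#":
            code = int(body[2:], 16) if body[1] in "xX" else int(body[1:])
        elif body in _KEEP or body not in name2codepoint:
            return match.group(0)  # kept by choice, or not a known name
        else:
            code = name2codepoint[body]
        return f"&#{_fix_c1(code)};"
    return _ENTITY.sub(repl, html)


print(normalize_entities("&copy; 2009 &#x93;quoted&#148; &amp; done"))
# prints: &#169; 2009 &#8220;quoted&#8221; &amp; done
```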

John Delacour

Jun 7, 2010, 3:06:50 PM
to bbe...@googlegroups.com

When I search documents in various languages in BBEdit I don't want
to have to use accents in the search pattern.

When I look for "ver" I want to find verse, vérbo and vèrso and not
only the unaccented forms, not to speak of Greek.

So far as I can see BBEdit is not able to do such searches and this
seems to be a serious deficiency. There must be a standard algorithm
for such searches -- Google must use one. Have I missed something,
and if not, are there any plans to allow a diacritic-blind search?

JD

Greg Shenaut

Jun 7, 2010, 3:49:11 PM
to bbe...@googlegroups.com
On Jun 7, 2010, at 12:06 PM, John Delacour wrote:
> When I search documents in various languages in BBEdit I don't want to have to use accents in the search pattern.
>
> When I look for "ver" I want to find verse, vérbo and vèrso and not only the unaccented forms, not to speak of Greek.

>
> So far as I can see BBEdit is not able to do such searches and this seems to be a serious deficiency. There must be a standard algorithm for such searches -- Google must use one. Have I missed something, and if not, are there any plans to allow a diacritic-blind search?

The normal way to do that is to put [=e=] inside a grep character class, like « v[[=e=]]r », but when I tried that just now in BBEdit, I got a message "The search cannot proceed, because of a syntax error in the Grep pattern: POSIX collating elements are not supported...".

To see this in action, try the command line command

printf 'xyz\nEh?\nélève\nabc\nêtre\nest\net\n' | grep '[[=e=]]'

or similar.
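
Where equivalence classes are rejected, one workaround is to spell the class out by hand. The Python sketch below builds such a pattern from a plain word. It assumes the target engine accepts literal accented letters inside a character class (I have not checked this in BBEdit), it only covers precomposed Latin letters in U+00C0..U+024F, and both function names are made up for the example.

```python
import re
import unicodedata


def accent_variants(base: str) -> str:
    """Return base followed by the accented Latin letters that decompose to it."""
    variants = [base]
    for cp in range(0x00C0, 0x0250):  # Latin-1 Supplement .. Latin Extended-B
        ch = chr(cp)
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) > 1 and decomposed[0] == base:
            variants.append(ch)
    return "".join(variants)


def diacritic_blind_pattern(plain: str) -> str:
    """Expand each letter of plain into an explicit character class."""
    parts = []
    for ch in plain:
        variants = accent_variants(ch) if ch.isalpha() else ch
        parts.append(f"[{variants}]" if len(variants) > 1 else re.escape(ch))
    return "".join(parts)


pattern = diacritic_blind_pattern("ver")
print(pattern)  # a pattern of the form v[e...][r...], ready to paste into a search
```

The generated pattern is long but can be pasted into any search field; text stored in decomposed form (base letter plus combining accent) would still be missed.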


Greg Shenaut

John Delacour

Jun 7, 2010, 4:27:43 PM
to bbe...@googlegroups.com
At 12:49 -0700 7/6/10, Greg Shenaut wrote:

>The normal way to do that is to put [=e=] inside a grep character
>class, like « v[[=e=]]r », but when I tried that just now in BBEdit,
>I got a message "The search cannot proceed, because of a syntax
>error in the Grep pattern: POSIX collating elements are not
>supported...".

I wasn't aware of that trick. Thank you. I thought at least I'd be
able to get round it by writing a Unix filter in Perl, but that too
gives this error:

POSIX syntax [= =] is reserved for future extensions in regex;

and that's using Perl 5.12.0

So what's the deal with these POSIX extensions?

Surely our friends at Barebones are not going to be intimidated?!
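
In the meantime, a filter does not need regex support for [= =] at all. A common approach, sketched here in Python rather than Perl, is to decompose both the text and the search term (Unicode NFD), drop the combining marks, and compare what is left. The function names are my own, and folding case as well is an extra assumption beyond what was asked.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip diacritics (NFD, then drop combining marks) and fold case."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def matching_lines(needle: str, lines):
    """Return the lines that contain needle, ignoring accents and case."""
    key = fold(needle)
    return [line for line in lines if key in fold(line)]


print(matching_lines("ver", ["verse", "vérbo", "vèrso", "vino"]))
# prints: ['verse', 'vérbo', 'vèrso']
```

The same folding covers accented Greek, since those letters also decompose into a base letter plus combining marks.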

JD
