
How to replace control characters and special characters in emacs lisp function? much like unix "tr"


H.Singh

Nov 2, 2021, 1:17:07 AM
This is likely a basic question for experts, from a newbie like myself.

Suppose I have some text and I want to transliterate its characters.
I want to do it in Emacs, so Unix utilities like tr and sed are not what I want for the moment.

Also, I want to build it incrementally and write it as a Lisp function.
For example, I have this text

Figure

and I want to transform it with an arbitrary character mapping. To keep the example easy, for the moment the mapping is

Figure -> fIGURE

I have cooked up the following and tested it in Emacs.

(mapcar* '(lambda (x y) (save-excursion (replace-string (format "%s" x) (format "%s" y))))
'[F i g u r e]
'[f I G U R E])

I just tested it again and it works on the text below it when evaluated with C-x C-e.

But now I have control characters in the mapping.
I cannot type the control characters, but the source I copy and paste from gives me the text as control characters, and I do not want to convert them manually into hex, octal, or decimal character codes.

This is the main reason I built the function above on mapcar*: it walks through the [] (a vector in Lisp) one element at a time, and I can write a macro to put spaces between the characters.

I do not even want to quote the pasted characters.

A similar problem, with a solution I find undesirable, is discussed on this webpage:

https://unix.stackexchange.com/questions/148837/replace-control-characters-in-emacs

That solution requires you to discover the encoding, and I do not want to do that. I strictly want a simple Emacs transliteration function along the lines above.

The above for control character fails mainly because
(format "%s" x) does not work for non-string or non-printable characters.

Here are some examples I tried:

(format "%s" 'F)
(format "%s" '§)
(format "%s" '𝛱)
(format "%s" ' )

The last one is Control-H, which is displayed as ^H, and it fails.

The specific example that fails, together with the desired transliteration, is below.

-------------------------------------
(mapcar* '(lambda (x y) (save-excursion (replace-string (format "%s" x) (format "%s" y))))
'[ ]
'[F i g u r e])



Figure
----------------------------------------

At some stage I will probably also want to add special characters to my transformation table.

NOTE: I do not want to write the code line by line like this:

(replace-string "F" "f")
(replace-string "i" "I")
(replace-string "g" "G")

and so on.

Thank you in advance for any help.
H Singh









H.Singh

Nov 2, 2021, 1:26:10 AM
On Monday, November 1, 2021 at 10:17:07 PM UTC-7, H.Singh wrote:
> Likely, that this is a basic question to experts from a newbie like myself.
>
> Suppose, I have a list of text and I want to transliterate the characters.
> I want to do it in emacs so unix utilities like tr and sed might not be desirable for the moment.
>
> Also, I want to build it incrementally and write this as a lisp function.
> For example, I have this text
>
> Figure
> and I want to replace it with any kind of mapping and to make example easy for the moment it is
>
> Figure -> fIGURE
>
> and I have cooked this and tested it in emacs.
>
> (mapcar* '(lambda (x y) (save-excursion (replace-string (format "%s" x) (format "%s" y))))
> '[F i g u r e]
> '[f I G U R E])
>
> I just tested it again and it works on the textg below it by C-x C-e.

I just tested it again and it works on the text below it by C-x C-e. <--------------- typo corrected

> But now, I have control characters in the mapping.
> I cannot write the control characters but the place where I get them by copy-paste gives me the replacement text as control characters and I do not want to manually be converting them into Hex or Octal or Decimal character codes.
>
> This is the main reason I have built this above function based on mapcar* so it repeatedly car's the [] (list in C and vector in lisp) and here I can write a macro to put spaces between the characters.
>
> I do not even want to quote the pasted text characters.
>
> Have a read at a similar problem and undesirable solution at this webpage
>
> https://unix.stackexchange.com/questions/148837/replace-control-characters-in-emacs
>
> where it requires you to discover the encoding and I do not want to do that. I want strictly a simple emacs transliteration function to do this along the lines above.
>
> The above for control character fails mainly because
> (format "%s" x) does not work for non-string or non-printable characters.
>
> Here are some examples I tried:
>
> (format "%s" 'F)
> (format "%s" '§)
> (format "%s" '𝛱)
> (format "%s" ' )
>
> and the last one is Control-H which looks like ^H and it fails.
>
> The very specific example which fails and the transliteration desired is below
>
> -------------------------------------
> (mapcar* '(lambda (x y) (save-excursion (replace-string (format "%s" x) (format "%s" y))))
> '[ ]

In the [] above there was exactly one control character for each letter of the string "Figure":

^H ^V ^T ^_ ^\ ^R

Ben Bacarisse

Nov 2, 2021, 7:08:39 AM
"H.Singh" <hashis...@gmail.com> writes:

> Suppose, I have a list of text and I want to transliterate the characters.
> I want to do it in emacs so unix utilities like tr and sed might not
> be desirable for the moment.
>
> Also, I want to build it incrementally and write this as a lisp function.
> For example, I have this text
>
> Figure
> and I want to replace it with any kind of mapping and to make example
> easy for the moment it is
>
> Figure -> fIGURE
>
> and I have cooked this and tested it in emacs.
>
> (mapcar* '(lambda (x y) (save-excursion (replace-string (format "%s" x) (format "%s" y))))
> '[F i g u r e]
> '[f I G U R E])
>
> I just tested it again and it works on the textg below it by C-x C-e.

I suggest you take a look at subst-char-in-region and translate-region.
The latter can apply a translation table to a region.
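
A minimal sketch of the translate-region route, for illustration only. The name my-tr-region is made up here, not an Emacs built-in, and the sketch assumes both strings are ASCII and of equal length. Characters outside ASCII (such as §) are simply left alone by this 128-entry table.

```elisp
;; Sketch only: `my-tr-region' is a hypothetical helper name.
(defun my-tr-region (start end from to)
  "Between START and END, map each character of FROM to the
character at the same index in TO, in the manner of Unix tr.
FROM and TO are assumed to be ASCII strings of equal length."
  ;; Start from an identity table: entry N holds character N.
  (let ((table (apply #'string (number-sequence 0 127))))
    ;; Overwrite only the entries we want to change.
    (dotimes (i (length from))
      (aset table (aref from i) (aref to i)))
    (translate-region start end table)))

;; Example call on the whole buffer, with the control characters
;; written as escapes instead of being pasted raw:
;; (my-tr-region (point-min) (point-max) "\^H\^V\^T\^_\^\\\^R" "Figure")
```

Because the table is applied to each character once, in a single pass, a swap such as "ab" to "ba" works here, which a sequence of replace-string calls would not manage.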

> But now, I have control characters in the mapping.
> I cannot write the control characters but the place where I get them
> by copy-paste gives me the replacement text as control characters and
> I do not want to manually be converting them into Hex or Octal or
> Decimal character codes.
>
> This is the main reason I have built this above function based on
> mapcar* so it repeatedly car's the [] (list in C and vector in lisp)
> and here I can write a macro to put spaces between the characters.
>
> I do not even want to quote the pasted text characters.
>
> Have a read at a similar problem and undesirable solution at this webpage
>
> https://unix.stackexchange.com/questions/148837/replace-control-characters-in-emacs
>
> where it requires you to discover the encoding and I do not want to do that. I want strictly a simple emacs transliteration function to do this along the lines above.
>
> The above for control character fails mainly because
> (format "%s" x) does not work for non-string or non-printable characters.
>
> Here are some examples I tried:
>
> (format "%s" 'F)
> (format "%s" '§)
> (format "%s" '𝛱)
> (format "%s" ' )
>
> and the last one is Control-H which looks like ^H and it fails.
>
> The very specific example which fails and the transliteration desired is below
>
> -------------------------------------
> (mapcar* '(lambda (x y) (save-excursion (replace-string (format "%s" x) (format "%s" y))))
> '[ ]

'["\^H" "\^V" "\^T" "\^_" "\^\\" "\^R"]

> '[F i g u r e])

Should work. The trouble is that you are relying on the fact that '[F i
g u r e] is an array of symbols, and symbols convert naturally (using
%s) to a string.
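
To illustrate the point about symbols versus characters, here is a small sketch. The helper name my-as-string is hypothetical; it is just one way to turn whatever sits in the vector into a string without leaning on %s.

```elisp
;; Sketch only: `my-as-string' is a hypothetical helper name.
(defun my-as-string (x)
  "Return X as a string.
Strings are returned unchanged, characters become one-character
strings, and symbols are converted to their names."
  (cond ((stringp x) x)
        ((characterp x) (string x))   ; note: characters are integers
        ((symbolp x) (symbol-name x))
        (t (format "%s" x))))

(format "%s" 'F)        ; a symbol prints as its name
(format "%s" ?F)        ; a character is an integer, so %s prints its code
(my-as-string ?\^H)     ; a one-character string holding Control-H
```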

--
Ben.

Ben Bacarisse

Nov 2, 2021, 7:38:21 AM
Ben Bacarisse <ben.u...@bsb.me.uk> writes:

> "H.Singh" <hashis...@gmail.com> writes:

>> (mapcar* '(lambda (x y)
>> (save-excursion (replace-string (format "%s" x) (format "%s" y))))
>
> '[" " "" "" " " "" ""]
>
>> '[F i g u r e])
>
> Should work.

On a side note, strings are sequences that you can "mapcar" over, so you
could, more simply write

(mapcar* '(lambda (x y)
            (save-excursion (replace-string (string x) (string y))))
         "\^H\^V\^T\^_\^\\\^R"
         "fIGURE")

On another side note, having control characters in posts might not
always work, so you might want to use this representation:

'[?\^H ?\^V ?\^T ?\^_ ?\^\\ ?\^R]
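
Putting the two side notes together, here is a hedged sketch of the same idea as a reusable function. The name my-replace-chars is made up for illustration. It uses search-forward and replace-match because the documentation of replace-string says it is meant for interactive use, and it uses cl-mapcar, the cl-lib name for mapcar*. Unlike the original, it works on the whole buffer rather than from point onwards.

```elisp
(require 'cl-lib)

;; Sketch only: `my-replace-chars' is a hypothetical helper name.
(defun my-replace-chars (from to)
  "In the current buffer, replace each character of FROM with the
character at the same index in TO.
FROM and TO are sequences of characters (strings or vectors).
The pairs are applied one after another, not simultaneously."
  (let ((case-fold-search nil))         ; keep \"F\" and \"f\" distinct
    (cl-mapcar (lambda (x y)
                 (save-excursion
                   (goto-char (point-min))
                   (while (search-forward (string x) nil t)
                     ;; FIXEDCASE and LITERAL: insert the text as is.
                     (replace-match (string y) t t))))
               from to)))

;; Either form of the table should do:
;; (my-replace-chars "\^H\^V\^T\^_\^\\\^R" "Figure")
;; (my-replace-chars [?\^H ?\^V ?\^T ?\^_ ?\^\\ ?\^R] "Figure")
```

Since the pairs are applied in sequence, a later pair can rewrite the output of an earlier one, so this only behaves like tr when no replacement character is also a source character further along.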

--
Ben.

Gene

Nov 3, 2021, 8:59:35 AM
On Tuesday, November 2, 2021 at 1:17:07 AM UTC-4, H.Singh wrote:
> Likely, that this is a basic question to experts from a newbie like myself.
>
> Suppose, I have a list of text and I want to transliterate the characters.
> I want to do it in emacs so unix utilities like tr and sed might not be desirable for the moment.
>
> Also, I want to build it incrementally and write this as a lisp function.
> For example, I have this text
>
> Figure
> and I want to replace it with any kind of mapping and to make example easy for the moment it is
>
> Figure -> fIGURE
>
> and I have cooked this and tested it in emacs.
>
> (mapcar* '(lambda (x y) (save-excursion (replace-string (format "%s" x) (format "%s" y))))
> '[F i g u r e]
> '[f I G U R E])
>
> I just tested it again and it works on the textg below it by C-x C-e.
>
> But now, I have control characters in the mapping.
> I cannot write the control characters but the place where I get them by copy-paste gives me the replacement text as control characters and I do not want to manually be converting them into Hex or Octal or Decimal character codes.
>
> This is the main reason I have built this above function based on mapcar* so it repeatedly car's the [] (list in C and vector in lisp) and here I can write a macro to put spaces between the characters.
>
> I do not even want to quote the pasted text characters.
>
> Have a read at a similar problem and undesirable solution at this webpage
>
> https://unix.stackexchange.com/questions/148837/replace-control-characters-in-emacs
>
> where it requires you to discover the encoding and I do not want to do that. I want strictly a simple emacs transliteration function to do this along the lines above.


> At some stage, I probably also want to add special characters in my tranformation table.

As someone who has used emacs on MS-Dog and Winblows machines, the control-M character used to terminate lines was the goad for me to grope around for a means to do what you presently want to do.

So for the example I'm about to present, imagine a file tainted with those dumbass ^M characters (EG as they are displayed via emacs, although only a single control character).

Emacs has query-replace, which allows one to replace one string with another.

(query-replace FROM-STRING TO-STRING &optional DELIMITED START END
BACKWARD REGION-NONCONTIGUOUS-P)

By default, I believe this function is mapped to Alt-Shift-% (M-%) if you want to do a C-h k to get help on that key binding.

As every character may be thought of as a string of length 1, this option seems viable, right?

The next trick is to input or specify the control character ^M, the extraneous end-of-line character (mis)used in MS-Dog text files.
As a rookie it took me quite some time to discover a magic key sequence which would allow me to specify a control character.
The key sequence maps onto

(quoted-insert ARG)

... which by default, I believe, is mapped onto C-q

So when I open a file with obnoxious ^M characters terminating every line I use these 2 emacs functions.

And when I've wanted to automate things, I use command-history AFTER having used those two functions; the quoted-insert function exploited in the UI results in a delightfully re-useable sexp being inserted in command-history for re-use.

(query-replace "
" "" nil nil nil nil nil)

note that viewing this in command history the ^M is visible in emacs, whereas above the interface of the mail app I'm using to compose this interpreted the ^M as a newline, thus displaying the single-line sexp on two lines instead of one as you'll see it in emacs.

So if one wants to exploit this method by generalizing, one may use the lambda calculus by turning a specific case into a `parametric equation' cum anonymous parametric function as follows:

((lambda (from-string to-string) (query-replace from-string to-string nil nil nil nil nil)) <your `before'> <your `after'>)

... or something to this effect.
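
For what it's worth, here is a minimal non-interactive sketch of the same ^M clean-up. The command name my-strip-cr is made up for illustration. Writing the character as the escape "\r" (the same character as "\^M") avoids pasting a raw control character, which is what a mail client can turn into a line break.

```elisp
;; Sketch only: `my-strip-cr' is a hypothetical command name.
(defun my-strip-cr ()
  "Delete every carriage return (^M) in the current buffer."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (while (search-forward "\r" nil t)
      ;; Replace the match with the empty string, literally.
      (replace-match "" t t))))
```

It can be run with M-x my-strip-cr or called from other Lisp code, and it never prompts, unlike query-replace.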

As there are many string substitution and character substitution functions available in emacs lisp, the permutations are myriad.
The variation I presented is something which has worked for me to accomplish a variation on your central theme.

Best of luck with hacking together something which suits your needs and temperament.

Cheers!
Gene



Ben Bacarisse

Nov 3, 2021, 11:48:14 AM
Gene <gene.s...@gmail.com> writes:

> As someone who has used emacs on MS-Dog and Winblows machines, the
> control-M character used to terminate lines was the goad for me to
> grope around for a means to do what you presently want to do.
>
> So for the example I'm about to present, image a file tainted with
> those dumbass ^M characters (EG as they are displayed via emacs,
> although only a single control character)
>
> Emacs has query-replace, which allows one to replace on string with another.
>
> (query-replace FROM-STRING TO-STRING &optional DELIMITED START END
> BACKWARD REGION-NONCONTIGUOUS-P)
>
> By default, I believe this function is mapped to Alt-Shft-% if you
> want to do a Ctrl-h-k to get help on that key binding.
>
> As every character may be thought of as a string of length 1, this
> option seems viable, right?
>
> The next trick is input or specify the control character ^M, the
> extraneous end-of-line character (mis)used in MS-Dog text files. As a
> rookie it took me quite some time to discover a magic key sequence
> which would allow me to specify a control character. The key sequence
> maps onto
>
> (quoted-insert ARG)
>
> ... which by default, I believe, is mapped onto C-q
>
> So when I open a file with obnoxious ^M characters terminating every
> line I use these 2 emacs functions.

I am happy this works for you, but it seems odd and rather a lot of
effort. In general, I don't want to remove ^M characters from Windows
or DOS files because they are needed when other, native, programs
process the files.

I get the kind of convenience I like without removing them using file
coding systems (essentially a translation on the way in and on the way
out).
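
A small sketch of what that translation amounts to, using strings rather than files so it can be evaluated anywhere. The coding system utf-8-dos is only an assumption chosen for the example; any coding system with DOS line endings should behave the same way.

```elisp
;; On the way in: CR LF pairs are decoded to plain newlines.
(decode-coding-string "one\r\ntwo\r\n" 'utf-8-dos)

;; On the way out: newlines are encoded back to CR LF pairs.
(encode-coding-string "one\ntwo\n" 'utf-8-dos)
```

When visiting a file, Emacs normally picks the line-ending convention by itself, so the buffer shows no ^M and the file is saved with its original line endings.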

--
Ben.

Gene

Nov 3, 2021, 12:31:22 PM
On Wednesday, November 3, 2021 at 11:48:14 AM UTC-4, Ben Bacarisse wrote:

> > So when I open a file with obnoxious ^M characters terminating every
> > line I use these 2 emacs functions.

> I am happy this works for you, but it seems odd and rather a lot of
> effort.

I'm with Fritz Perls; I too have never heard anything before the `but'.
It's very little effort to have emacs remove the to-me-obnoxious ^M characters.
I place point/cursor at the start of the buffer containing the monopoly$oft bullshit, initiate the string replace function, specify the ^M as the dog shit I want replaced, and no character at all as what I want in its place.
Then when prompted by emacs I opt for `!' to replace all.
Then I save the much-improved contents to the original file as I thumb my nose at the Monopoly$oft Hegemony and their (l)users.

There ... was that more flame-warish for you? 8-}

> In general, I don't want to remove ^M characters from Windows
> or DOS files because they are needed when other, native, programs
> process the files.

All the power to you, my friend.
>
> I get the kind of convenience I like without removing them using file
> coding systems (essentially a translation on the way in and on the way
> out).

Once again ... good for you.

Now, if I were to point at the moon on behalf of another user would you be commenting on the manicure of my index finger?

The OP pointed at ... what?
I used an example pursuant to addressing HIS issue, NOT YOURS.

Yes, if I wanted to use a text file downstream via another Winblows app -- which I rarely, if EVER, do -- my attitude would be exactly like yours, I suppose.
While I'm snarfing text, downloading text files, and working with ad hoc snippets of text I haphazardly encounter ^M to which I give short shrift via emacs lisp.
Ahhhh ... the joys of having a Domain Specific Lisp Engine masquerading as a text editor!

Gene

Ben Bacarisse

Nov 3, 2021, 6:44:14 PM
Gene <gene.s...@gmail.com> writes:

> On Wednesday, November 3, 2021 at 11:48:14 AM UTC-4, Ben Bacarisse wrote:
>
>> > So when I open a file with obnoxious ^M characters terminating every
>> > line I use these 2 emacs functions.
>
>> I am happy this works for you, but it seems odd and rather a lot of
>> effort.
>
> I'm with Fritz Perls; I too have never heard anything before the `but'.
> It's very little effort to have emacs remove the too-me-obnoxious ^M
> characters.

You missed the "odd" bit. If it were not an odd thing to do, the small
effort would not seem to be rather a lot. But maybe it's not odd
because you only see ^M in situations where they don't matter.

>> In general, I don't want to remove ^M characters from Windows
>> or DOS files because they are needed when other, native, programs
>> process the files.
>
> All the power to you, my friend.
>
>> I get the kind of convenience I like without removing them using file
>> coding systems (essentially a translation on the way in and on the way
>> out).
>
> Once again ... good for you.

I was trying to be helpful by explaining my usual use case.

--
Ben.

H.Singh

Nov 27, 2021, 3:08:38 PM
Thank you all. It was a good discussion and a help.
I learnt something, even though the problem had
no fixed encoding, just garbled letters in a PDF
where the text selection does not match the glyphs.

It probably requires a genius like Ben to tackle such
problems.