Method to "translate" characters

krep...@tuta.io

Jun 28, 2019, 4:44:08 AM
to elixir-l...@googlegroups.com
Recently I found myself wanting to replace certain characters in a string with others (similar to the "tr" shell command).

In my case I wanted to replace any of the characters in :;<=>?@[\]^_` with the corresponding character from ABCDEFGabcdef.

I'm not sure what the most idiomatic way to write this would be in Elixir but I used:

input |> String.graphemes() |> Enum.map_join(&my_replace/1)

defp my_replace(":"), do: "A"
defp my_replace(";"), do: "B"
defp my_replace("<"), do: "C"
defp my_replace("="), do: "D"
defp my_replace(">"), do: "E"
defp my_replace("?"), do: "F"
defp my_replace("@"), do: "G"
defp my_replace("["), do: "a"
defp my_replace("\\"), do: "b"
defp my_replace("]"), do: "c"
defp my_replace("^"), do: "d"
defp my_replace("_"), do: "e"
defp my_replace("`"), do: "f"
defp my_replace(char), do: char

This works and is readable, but it is not concise. Unless there is already a concise way to do this that I'm not aware of, I would like to open a discussion on how a more concise approach could be implemented, provided people agree that it's worth implementing.

This has been partly discussed before in https://github.com/elixir-lang/elixir/issues/4473

Here are some reference implementations in other languages, to open up ideas:

PHP:

strtr($input, ":;<=>?@[\]^_`", "ABCDEFGabcdef")

Ruby:

input.tr!(':;<=>?@[\\]^_`', 'ABCDEFGabcdef')

Go has an implementation that is not concise but does allow multiple char/string translation:

r := strings.NewReplacer(
     ":", "A",
     ";", "B",
     "<", "C",
     // and so on...
)
r.Replace(input)

Python (3) requires creation of a "translation map" first:

table = str.maketrans(":;<=>?@[\\]^_`", "ABCDEFGabcdef")
input.translate(table)

A lot of these implementations also allow other uses, besides simple character translations.

José Valim

Jun 28, 2019, 4:56:09 AM
to elixir-l...@googlegroups.com
Thanks for the proposal!

I assume those languages do not have a powerful mechanism for traversing binaries like Elixir does though?

In Elixir you could do:

    for <<char <- input>>, do: maybe_replace(char), into: ""

Note that we are matching on characters (integers such as ?A) rather than strings, so:

    defp maybe_replace(?A), do: "a"
    ...
    defp maybe_replace(char), do: <<char>>

You can also use pattern matching and recursion.
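
As a rough illustration of the pattern matching and recursion variant, here is a minimal sketch. The module name TrRecursive and the helper do_translate/2 are made up for this example, and only the first three pairs from the original post are spelled out. It walks the binary byte by byte and collects the output as iodata:

```elixir
defmodule TrRecursive do
  # Hypothetical module name, used only for this sketch.
  def translate(input) when is_binary(input) do
    input
    |> do_translate([])
    |> IO.iodata_to_binary()
  end

  # One clause per byte that should be replaced.
  defp do_translate(<<?:, rest::binary>>, acc), do: do_translate(rest, [acc, "A"])
  defp do_translate(<<?;, rest::binary>>, acc), do: do_translate(rest, [acc, "B"])
  defp do_translate(<<?<, rest::binary>>, acc), do: do_translate(rest, [acc, "C"])
  # The remaining pairs would follow the same pattern.

  # Any other byte is copied through unchanged.
  defp do_translate(<<byte, rest::binary>>, acc), do: do_translate(rest, [acc, byte])
  # End of input: return the accumulated iodata.
  defp do_translate(<<>>, acc), do: acc
end
```

Because it works on single bytes, multi-byte UTF-8 characters pass through untouched as long as none of their bytes has its own clause.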

Both variants should be considerably more efficient than an API that expects a map and does map look-ups. They also give you more control over what you are matching (is it characters? Unicode codepoints? 16-bit integers?).

But if for some reason you need map look-ups:

    map = %{
      ?A => "a"
    }
    for <<char <- input>>, do: Map.get(map, char) || <<char>>, into: ""
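
For the character set from the original post, the map does not have to be typed out by hand. The sketch below builds it once, at compile time, from the two strings, which gets fairly close to a "tr"-style call. The module name TrMap and the attribute names are assumptions made for this example, not an existing API:

```elixir
defmodule TrMap do
  # Hypothetical module; the two character lists come from the original post.
  @from String.to_charlist(":;<=>?@[\\]^_`")
  @to String.to_charlist("ABCDEFGabcdef")

  # Map from source byte to replacement string, e.g. ?: => "A".
  @table Map.new(Enum.zip(@from, @to), fn {from, to} -> {from, <<to>>} end)

  def translate(input) when is_binary(input) do
    # Bytes without an entry in the table are emitted unchanged.
    for <<char <- input>>, into: "", do: Map.get(@table, char, <<char>>)
  end
end
```

This sketch matches on bytes, so it only suits source characters in the single-byte range.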

And finally, if you want to match on substrings (requires Elixir v1.9):

    map = %{
      "AA" => "a"
    }
    String.replace(input, Map.keys(map), &map[&1])
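
A self-contained version of the substring variant might look like the sketch below. The module name TrSubstrings and the second table entry are invented for illustration. It passes a function as the replacement, which is the part that needs Elixir v1.9:

```elixir
defmodule TrSubstrings do
  # Hypothetical module and table, used only for this sketch.
  @replacements %{"AA" => "a", "<=" => "LE"}

  def translate(input) when is_binary(input) do
    # Every matched key is looked up in the table to get its replacement.
    String.replace(input, Map.keys(@replacements), &Map.fetch!(@replacements, &1))
  end
end
```

For example, TrSubstrings.translate("AA <= b") returns "a LE b".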

TL;DR - I believe we already have more performant and more flexible ways of achieving the same result, so the particular "tr" implementation found in other languages is not warranted.

José Valim
Skype: jv.ptec
Founder and Director of R&D


--
You received this message because you are subscribed to the Google Groups "elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-co...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/LiSFMq0--3-1%40tuta.io.
For more options, visit https://groups.google.com/d/optout.

Krep Flap

Jun 28, 2019, 8:08:24 AM
to elixir-lang-core
Thanks for your reply.

I agree that adding syntactic sugar purely for an ultra-concise implementation is probably unneeded, given the options and flexibility you demonstrated.

In your example, what if we also want to replace multi-byte characters, e.g. replace "€" with "E"?

Since we cannot use the bitstring generator in that case, is it still efficient to use a charlist to get the codepoints?

    for char <- to_charlist(input), do: maybe_replace(char), into: ""

    defp maybe_replace(?A), do: "a"
    defp maybe_replace(?€), do: "E"
    ...
    defp maybe_replace(char), do: <<char>>


On Friday, June 28, 2019 at 10:56:09 UTC+2, José Valim wrote:

José Valim

Jun 28, 2019, 8:20:52 AM
to elixir-l...@googlegroups.com
If you want Unicode characters, then you can do:

    for <<char::utf8 <- input>>, do: maybe_replace(char), into: ""

Note that we are now matching on Unicode codepoints (integers such as ?€) rather than strings, so:

    defp maybe_replace(?A), do: "a"
    defp maybe_replace(?€), do: "E"
    ...
    defp maybe_replace(char), do: <<char::utf8>>
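
Put together as a complete module, the codepoint variant could look like this sketch. The module name TrUtf8 is made up for the example, and it assumes the input is valid UTF-8:

```elixir
defmodule TrUtf8 do
  # Hypothetical module name, used only for this sketch.
  def translate(input) when is_binary(input) do
    # The ::utf8 modifier makes the generator yield codepoints instead of bytes.
    for <<char::utf8 <- input>>, into: "", do: maybe_replace(char)
  end

  defp maybe_replace(?A), do: "a"
  defp maybe_replace(?€), do: "E"
  # Everything else is re-encoded as UTF-8 unchanged.
  defp maybe_replace(char), do: <<char::utf8>>
end
```

With these clauses, TrUtf8.translate("A € é") returns "a E é".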

It won't work for graphemes though (which is a much more complicated problem, because the same grapheme can take different forms and its interpretation may also be locale dependent).
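
To make the "different forms" point concrete, here is a small sketch with a made-up module name, GraphemeDemo. It replaces the precomposed "é" (U+00E9) with "e". The same grapheme written as "e" plus a combining acute accent (U+0301) is not caught by that clause:

```elixir
defmodule GraphemeDemo do
  # Hypothetical module name, used only for this sketch.
  def translate(input) when is_binary(input) do
    for <<char::utf8 <- input>>, into: "", do: maybe_replace(char)
  end

  # Matches only the single precomposed codepoint U+00E9.
  defp maybe_replace(0x00E9), do: "e"
  defp maybe_replace(char), do: <<char::utf8>>
end
```

GraphemeDemo.translate("\u00E9") returns "e", while GraphemeDemo.translate("e\u0301") comes back unchanged. Running the input through String.normalize(input, :nfc) first is one possible mitigation for this particular case, though it does nothing about locale-dependent interpretation.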

José Valim