Proposal: hook to filter/modify all incoming text

Stephan Weinberger

Oct 22, 2023, 9:10:46 PM
to LDMud Talk
Hello,

I'm struggling with encodings (again), particularly the sketchy/broken
implementation of TRANSLIT in my libiconv.
Currently I'm using H_MODIFY_COMMAND to "manually" transliterate German
umlauts (ä -> ae, etc.), but unfortunately that does not catch text sent
to input_to().
I could implement a wrapper simul_efun for input_to(), that passes the
string through a closure doing the transliteration before passing it on
to the target function, but that would interfere with input_to()'s
ability to call static functions, so it isn't ideal.
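
A minimal sketch of such a manual transliteration, written in Python purely for illustration (the mudlib code itself would be LPC). The table contents and the name `transliterate` are assumptions, not taken from the actual lib:

```python
# Hypothetical lookup table: German umlauts and sharp s mapped to ASCII digraphs.
UMLAUTS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def transliterate(text: str) -> str:
    """Replace each mapped character; all other characters pass through unchanged."""
    return text.translate(UMLAUTS)
```

The same function would have to run on both command input and input_to() input, which is why a single place to hook it in is attractive.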

A much cleaner solution IMHO would be a new hook H_MODIFY_INPUT, that is
applied to _all_ incoming text.

For convenience the same could also be done for text that is about to be
sent out (e.g. to apply colors, etc.).


Comments appreciated,
  Stephan

Gnomi

Oct 23, 2023, 4:39:21 AM
to ldmud...@googlegroups.com
Hi,

Stephan Weinberger wrote:
> I'm struggling with encodings (again), particularly the sketchy/broken
> implementation of TRANSLIT in my libiconv.

//TRANSLIT would not work on inputs, as the target encoding (UTF-8) can
handle all characters from whatever source encoding, so no transliteration
is necessary when reading user input.

> Currently I'm using H_MODIFY_COMMAND to "manually" transliterate German
> umlauts (ä -> ae, etc.), but unfortunately that does not catch text sent to
> input_to().
> I could implement a wrapper simul_efun for input_to(), that passes the
> string through a closure doing the transliteration before passing it on to
> the target function, but that would interfere with input_to()'s ability to
> call static functions, so it isn't ideal.

That's what we had before setting the encoding. I don't recall
having problems with static functions (the simul-efun does
set_this_object(previous_object()); at the beginning).

> A much cleaner solution IMHO would be a new hook H_MODIFY_INPUT, that is
> applied to _all_ incoming text.

A companion to H_MODIFY_COMMAND for input_to()s would make sense; that hook
could receive the input_to information as well. Combining both into a single
function for all text could be done on the mudlib side.

But would it be sufficient to get the already converted unicode string, or
do you need the actual byte input? Because that would be messy. The
conversion is done right at the beginning interwoven with telnet handling
(as telnet handling can change the encoding) and before command (newline)
detection.

Greetings,
Gnomi

Stephan Weinberger

Oct 23, 2023, 6:08:12 AM
to ldmud...@googlegroups.com
On 23.10.23 10:39, Gnomi wrote:
> Hi,
>
> Stephan Weinberger wrote:
>> I'm struggling with encodings (again), particularly the sketchy/broken
>> implementation of TRANSLIT in my libiconv.
>
> //TRANSLIT would not work on inputs, as the target encoding (UTF-8) can
> handle all characters from whatever source encoding, so no transliteration
> is necessary when reading user input.

Sure. I was trying to use to_text(to_bytes(...)) to coerce it to do the transliteration for me, but I only got "invalid character sequence" errors, or very strange transliterations like "ö" -> "A?", when trying to use "ASCII//TRANSLIT" as the target encoding. After some googling I came to the conclusion that there are several different implementations of iconv in circulation that all do TRANSLIT a bit differently (or not at all).

> That's what we had before setting the encoding. I don't recall
> having problems with static functions (the simul-efun does
> set_this_object(previous_object()); at the beginning).

Does a simul_efun get the same special treatment as input_to() does when it comes to calling static functions? I admit I haven't tried that yet.

>> A much cleaner solution IMHO would be a new hook H_MODIFY_INPUT, that is
>> applied to _all_ incoming text.
>
> A companion to H_MODIFY_COMMAND for input_to()s would make sense; that hook
> could receive the input_to information as well. Combining both into a single
> function for all text could be done on the mudlib side.

My thinking was that if you want to do something like that, it would likely affect all input anyway, so having one "central" hook would make more sense to me. The special hook for commands would then be geared more towards aliases, handling of paralysis, etc., applied after that.

But either way is fine with me, as long as there is a way to capture any input at some point. Commands and input_to()s are the only ways to get text input, aren't they?

> But would it be sufficient to get the already converted unicode string, or
> do you need the actual byte input? Because that would be messy. The
> conversion is done right at the beginning interwoven with telnet handling
> (as telnet handling can change the encoding) and before command (newline)
> detection.

Yes, telneg handling and conversion to Unicode should happen before that. That's exactly what I _don't_ want to deal with in the lib ;-)
So either H_MODIFY_INPUT right after telnegs/conversion but before the branch into command/input_to, or H_MODIFY_INPUT_TO right before calling the input_to target function.
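
To make the two placements concrete, here is a toy model of the dispatch order in Python. It is only a sketch of the idea, not the driver's actual code: `dispatch_line` and all parameter names are hypothetical, and `line` is assumed to be already-decoded text after telnet handling.

```python
from typing import Callable, Optional

Hook = Callable[[str], str]


def dispatch_line(
    line: str,
    pending_input_to: Optional[Callable[[str], None]],
    run_command: Callable[[str], None],
    modify_input: Optional[Hook] = None,     # option 1: one hook for all input
    modify_command: Optional[Hook] = None,   # stands in for H_MODIFY_COMMAND
    modify_input_to: Optional[Hook] = None,  # option 2: hook for input_to only
) -> None:
    """Route one decoded input line to a pending input_to or to the command parser."""
    # Option 1: applied before the branch, so it sees every line.
    if modify_input is not None:
        line = modify_input(line)

    if pending_input_to is not None:
        # Option 2: applied right before the input_to target is called.
        if modify_input_to is not None:
            line = modify_input_to(line)
        pending_input_to(line)
    else:
        if modify_command is not None:
            line = modify_command(line)
        run_command(line)
```

With option 1 the command hook stays free for aliases and the like; with option 2 the lib has to install the same filter in two hooks to cover all input.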


Or we just all stick to unicode and have a lot of "fun" on intermud :-D

Greets
  Invis