Input filtering


Neil Hodgson

Apr 14, 2014, 6:24:28 AM
to Scintilla mailing list
Over Scintilla’s history there have been requests to enable filtering of user input to Scintilla and for this to be easy to implement in one place without having to understand how to intercept keyboard, paste, drag-and-drop, and other events.

One example would be to disallow the entry of most control characters and the associated blobs in Scintilla. Some users are apparently confused when they type an unassigned control character and see something like [BEL]. Another situation is where a technical limitation or policy decision requires that the name “Löwis” be inserted as “Loewis” or “₩” be inserted as “\u20A9”. Characters that look very similar to other characters, like Cyrillic ‘о’, could be inserted as “&#1086;” in an HTML file.
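The substitutions described above can be sketched as a table lookup over whole UTF-8 characters. This is only an illustration: `Utf8Length`, `SubstituteCharacters`, and `exampleTable` are hypothetical names, not part of Scintilla, and the input is assumed to be UTF-8.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: number of bytes in the UTF-8 sequence that starts
// with `lead` (1 for ASCII and for bytes that are not valid lead bytes).
static std::size_t Utf8Length(unsigned char lead) {
    if (lead >= 0xF0) return 4;
    if (lead >= 0xE0) return 3;
    if (lead >= 0xC0) return 2;
    return 1;
}

// Hypothetical filter: replaces each whole UTF-8 character found in `table`
// and copies every other character unchanged.
std::string SubstituteCharacters(const std::string &text,
                                 const std::map<std::string, std::string> &table) {
    std::string result;
    for (std::size_t i = 0; i < text.size();) {
        std::size_t len = Utf8Length(static_cast<unsigned char>(text[i]));
        if (len > text.size() - i) len = text.size() - i;  // truncated tail
        const std::string character = text.substr(i, len);
        const auto it = table.find(character);
        result += (it != table.end()) ? it->second : character;
        i += len;
    }
    return result;
}

// Example table holding two of the substitutions mentioned in the text.
const std::map<std::string, std::string> exampleTable = {
    {"\xC3\xB6", "oe"},           // U+00F6 (o with diaeresis)
    {"\xE2\x82\xA9", "\\u20A9"},  // U+20A9 (won sign)
};
```

With this table, inserting “Löwis” would yield “Loewis”. An application would choose its own table according to its policy.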

Some applications keep Scintilla in Unicode mode and translate between UTF-8 and the file encoding when loading and saving. With filtering, they can ensure that any characters that cannot be saved in the destination encoding are transformed into something that can be saved. Since this transformation occurs immediately, any mistake is more likely to be seen and fixed in context than if an encoding check is performed at save time.
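As a rough sketch of such an encoding-aware filter, assume the destination encoding is ISO 8859-1 (Latin-1), so any character above U+00FF cannot be saved. The function name `MakeSavableAsLatin1` and the choice of `\uXXXX` escapes are assumptions made for illustration. The input is assumed to be valid UTF-8.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical filter for a Latin-1 destination: UTF-8 characters above
// U+00FF are replaced by \uXXXX (or \UXXXXXXXX) escapes, others are copied.
std::string MakeSavableAsLatin1(const std::string &utf8) {
    std::string result;
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t len = 1;
        unsigned int codePoint = lead;
        if (lead >= 0xF0) { len = 4; codePoint = lead & 0x07; }
        else if (lead >= 0xE0) { len = 3; codePoint = lead & 0x0F; }
        else if (lead >= 0xC0) { len = 2; codePoint = lead & 0x1F; }
        if (len > utf8.size() - i) len = utf8.size() - i;  // truncated tail
        // Fold in the low six bits of each continuation byte.
        for (std::size_t j = 1; j < len; j++) {
            codePoint = (codePoint << 6) |
                        (static_cast<unsigned char>(utf8[i + j]) & 0x3F);
        }
        if (codePoint <= 0xFF) {
            result.append(utf8, i, len);  // representable: keep as UTF-8
        } else {
            char escape[16];
            if (codePoint <= 0xFFFF) {
                std::snprintf(escape, sizeof(escape), "\\u%04X", codePoint);
            } else {
                std::snprintf(escape, sizeof(escape), "\\U%08X", codePoint);
            }
            result.append(escape);
        }
        i += len;
    }
    return result;
}
```

Because the text stays in UTF-8 inside the editor, representable characters such as “ö” are left alone here and only converted when the file is saved.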

A problem with implementing this has been that Scintilla itself inserts text with the expectation that the insertion will succeed, and that code would have to change to allow for filtering. There are over 40 call sites where text is inserted into the document. One approach would be to have two calls for inserting text: one that runs the filter and another that always inserts the exact requested text. I currently think that there may be useful transformations that the application could want in all circumstances, and it is possible to rewrite the call sites.

To implement this, there needs to be a notification (say, SC_MOD_INSERTCHECK) from Scintilla to the application that some text is about to be inserted into the document. Then there needs to be a call to change the text if wanted. Call that SCI_CHANGEINSERTION. An example implementation that changes control characters to octal escapes and changes spaces to a Unicode character with a small square box could look like this:

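    // Text about to be inserted, as reported by the notification.
    std::string sInsertion(notification->text, notification->length);
    std::string sChanged;
    for (unsigned char ch : sInsertion) {
        if (ch == '\r' || ch == '\n' || ch == '\t') {
            // Keep line ends and tabs as they are.
            sChanged.push_back(ch);
        } else if (ch == ' ') {
            // Small white square (U+25AB) in UTF-8
            sChanged.append("\xe2\x96\xab");
        } else if (ch < ' ') {
            // Other control characters become octal escapes such as \007.
            char szOctal[10];
            snprintf(szOctal, sizeof(szOctal), "\\%03o", ch);
            sChanged.append(szOctal);
        } else {
            sChanged.push_back(ch);
        }
    }
    // Only ask Scintilla to change the insertion when something differs.
    if (sChanged != sInsertion) {
        wEditor.CallString(SCI_CHANGEINSERTION, sChanged.length(), sChanged.c_str());
    }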
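The notify-then-change flow proposed above can be modelled without Scintilla. In the toy sketch below, `ToyDocument`, `onInsertCheck`, and `ChangeInsertion` are hypothetical stand-ins for the document, the SC_MOD_INSERTCHECK notification, and the SCI_CHANGEINSERTION call. It is not how Scintilla is structured internally.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Toy stand-in for a document; not Scintilla's real classes.
class ToyDocument {
public:
    // Called before each insertion with the text about to be inserted,
    // playing the role of the proposed SC_MOD_INSERTCHECK notification.
    std::function<void(ToyDocument &, const std::string &)> onInsertCheck;

    // Plays the role of the proposed SCI_CHANGEINSERTION. It only has an
    // effect while onInsertCheck is running.
    void ChangeInsertion(std::string replacement) {
        if (inCheck) {
            pending = std::move(replacement);
        }
    }

    // Inserts text at position (which must not exceed the current length),
    // giving the handler one chance to substitute different text.
    void InsertString(std::size_t position, const std::string &text) {
        pending = text;
        if (onInsertCheck) {
            inCheck = true;
            onInsertCheck(*this, text);
            inCheck = false;
        }
        contents.insert(position, pending);
    }

    const std::string &Text() const { return contents; }

private:
    std::string contents;
    std::string pending;
    bool inCheck = false;
};
```

A handler that does nothing leaves the insertion untouched, so call sites that insert plain whitespace keep working unless the application chooses to transform it.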

When Scintilla is itself performing insertions, it is often for whitespace formatting using the space, tab, new line and carriage return characters and transforming these characters may confuse Scintilla or application code. However, it could conceivably be useful to insert a Unicode line end character like PS (paragraph separator) for new line or a non-breaking space for space.
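A minimal sketch of the whitespace transformation mentioned above, assuming UTF-8 text. The function name `UseUnicodeWhitespace` is hypothetical. Whether Scintilla or application code would cope with the result is exactly the open question raised in the paragraph.

```cpp
#include <string>

// Hypothetical filter: maps '\n' to U+2029 PARAGRAPH SEPARATOR and ' ' to
// U+00A0 NO-BREAK SPACE, both written as UTF-8 byte sequences.
// Carriage returns and tabs are copied unchanged.
std::string UseUnicodeWhitespace(const std::string &text) {
    std::string result;
    for (const char ch : text) {
        if (ch == '\n') {
            result.append("\xE2\x80\xA9");  // U+2029
        } else if (ch == ' ') {
            result.append("\xC2\xA0");      // U+00A0
        } else {
            result.push_back(ch);
        }
    }
    return result;
}
```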

This feature would be purely to remove or transform individual characters. It would not be suitable for generic translation of one string to another as the triggering string could be entered over multiple insert operations, possibly interleaved with other modifications. Text inserted for undo and redo would not trigger the notification.
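The limitation on string-level translation can be seen with a small experiment. The sketch below uses a hypothetical filter, `ReplaceArrows`, that turns "->" into U+2192, and a hypothetical driver, `TypeSequence`, that applies it to each insertion separately, as a per-insertion notification would.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical string-level filter: replaces each "->" with U+2192 in UTF-8.
std::string ReplaceArrows(const std::string &insertion) {
    std::string result;
    for (std::size_t i = 0; i < insertion.size(); i++) {
        if (insertion.compare(i, 2, "->") == 0) {
            result.append("\xE2\x86\x92");
            i++;  // skip the '>' as well
        } else {
            result.push_back(insertion[i]);
        }
    }
    return result;
}

// Applies the filter to each insertion on its own and appends the results,
// so the filter never sees text from an earlier insertion.
std::string TypeSequence(const std::vector<std::string> &insertions) {
    std::string document;
    for (const std::string &insertion : insertions) {
        document += ReplaceArrows(insertion);
    }
    return document;
}
```

Pasting "a->b" as a single insertion is translated, but typing the same four characters one at a time is not, because the filter never sees '-' and '>' together.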

Neil

Neil Hodgson

Apr 16, 2014, 12:29:41 AM
to scintilla...@googlegroups.com