Non escaped redirect form

57 views
Skip to first unread message

Nicolas Crittin

unread,
Dec 6, 2024, 5:07:37 AM12/6/24
to Pac4j development mailing list
Hi,

I've been playing with pac4j for a few days now and I'm in the testing phase to determine whether pac4j is suited to my needs. It guess it is, although I'm stuck at version 5 because of the java version I'm limited to and I wonder if and how long it is maintained.

Now I'm experiencing an issue when I run failover tests. In my use case, I send a POST request to an instance that has just started. The network stack shows a redirect to the callback URL. This generates an HTML form in which the fields from the initial form are taken over as hidden fields. The problem is that the values are not correctly escaped. In particular, quotes are taken as they are, generating invalid values of the kind:

<input type='hidden' name="myInput" value="{"key":"val"}" />

I would expect something like

<input type='hidden' name="myInput" value="{&quot;key &quot; : &quot; val &quot; }" />

The org.pac4j.core.util.HttpActionHelper class seems to be responsible for this error because the buildFormPostContent() method that builds the form does not escape values (although input names are well escaped). I suggest chanting the line

buffer.append("<input type='hidden' name=\"" + escapeHtml(entry.getKey()) + "\" value=\"" + values[0] + "\" />\n");

into:

buffer.append("<input type='hidden' name=\"" + escapeHtml(entry.getKey()) + "\" value=\"" + escapeHtml(values[0]) + "\" />\n");

Best Regards

Jérôme LELEU

unread,
Dec 6, 2024, 8:33:44 AM12/6/24
to Nicolas Crittin, Pac4j development mailing list
Hi,

The idea was to abandon v5.x as soon as possible, but actually most people are still stuck on it so it is planned to keep it as long as required.

Yes, it seems like a bug. It should be fixed in v5.8.8-SNAPSHOT and v6.1.1-SNAPSHOT:

Can you confirm?

Thanks.
Best regards,
Jérôme


--
You received this message because you are subscribed to the Google Groups "Pac4j development mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pac4j-dev+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/pac4j-dev/b24b38a3-2723-43f6-8d31-e5cc9c5b90b0n%40googlegroups.com.

Nicolas Crittin

unread,
Dec 6, 2024, 9:09:07 AM12/6/24
to Pac4j development mailing list
I just tested with 5.7.8-SNAPSHOT and it fixes the issue.

Merci !

Moritz Kobel

unread,
Dec 17, 2024, 11:15:31 AM12/17/24
to Pac4j development mailing list
Hello Jérôme,

we had issues with this form too:
- encoding: to ensure "special characters" as umlauts are not broken, we added a meta tag with the used character encoding and set the same encoding on the native response in `DefaultSavedRequestHandler.restore(..)`. (We are not sure if this issue is caused by pac4j or our application setup)
- HTML escape: Is there any reason the local escapeHtml method is used, especially since it only performs partial HTML escaping? Perhaps it could be replaced by org.apache.commons.text.StringEscapeUtils.escapeHtml4, since commons-text is already included as a dependency.
- Only the first value of a parameter is used. (We did not yet fix this because it does not affect our applications)

Let me know if you would like to get more information about these issues and our workarounds/fixes.

Best regards

Moritz

Jérôme LELEU

unread,
Dec 18, 2024, 2:34:28 AM12/18/24
to Moritz Kobel, Pac4j development mailing list
Hi,

Master now uses escapeHtml4https://github.com/pac4j/pac4j/commit/fb657a351b6180c8a6187fcf6c9f568a97b8ed8e. It's not possible for v5.7.x.

Please submit a PR with unit tests to demonstrate the problems and the fixes.

Thanks.
Best regards,
Jérôme

 

Nicolas Crittin

unread,
Oct 22, 2025, 4:01:07 AMOct 22
to Pac4j development mailing list
Hi,

I'm coming back on this issue because non ISO-8859-1 charset is not well supported. Since form does not declare any meta tag, browser may serialize it with inappropriate charset.

For instance, if I post a UTF-8 form which contains an input with value "é", the original POST will contain %C3%A9 sequence, which is a correct UTF-8 encoding of "é". The escapeHtml4 will then encode it as &eacute;. When returned to browser for final POST (after authentication), the browser will read this &eacute; and import as Unicode U+00E9, and then serialize it as ISO-8859-1 sequence %E9. This will make pac4j generate a HTTP 500 error because of exception when trying to invoke getRequestParameter() method. Exception cause is:

Character decoding failed. Parameter [sample] with value [�] has been ignored. Note that the name and value quoted here may be corrupted due to the failed decoding. Use debug level logging to see the original, non-corrupted values.

As Moritz says, a meta tag would be a safe way to fix this.

I prepared the PR #3635 that adds a meta tag in form.

Best Regards
Nicolas

Jérôme LELEU

unread,
Oct 23, 2025, 2:05:59 AMOct 23
to Nicolas Crittin, Pac4j development mailing list
Hi,

Shouldn't the UTF-8 encoding be mandatory instead when using pac4j (to make things easier)? Where does this ISO-8859-1 encoding come from?
Thanks.
Best regards,
Jérôme



Nicolas Crittin

unread,
Oct 23, 2025, 3:26:28 AMOct 23
to Pac4j development mailing list
Hi,

Perhaps I was imprecise in mentioning iso-8859-1. I just did another test using the character "€" instead of "é" (because "€" is not part of iso-8859-1). The form value is serialized with sequence %80 , which is the sequence for "€" in windows-1252 (note that windows-1252 is an extension of iso-8859-1 with some extra chars, including "€") .

Where does this charset come from ? As far as I know, when charset is not declared (i.e. when there is not meta tag or accept-charset attribute), browsers apply a default encoding. According to https://html.spec.whatwg.org/multipage/parsing.html#determining-the-character-encoding, the "suggested default encoding" without locale info is windows-1252.

Now, I agree with you that UTF-8 should be used by default, and my first idea was to force the <meta charset="UTF-8">. But that might be a little too aggressive and cause regressions. So I chose to reuse the charset encoding from the initial request.

Jérôme LELEU

unread,
Oct 23, 2025, 4:19:03 AMOct 23
to Nicolas Crittin, Pac4j development mailing list
Hi,

Yes, this might be too aggressive to force UTF-8 for a minor version. Similarly, the Optional<String> getCharacterEncoding(); method should return Optional.empty() by default to avoid any breaking change.
Thanks.
Best regards,
Jérôme


Nicolas Crittin

unread,
Oct 23, 2025, 5:15:06 AMOct 23
to Pac4j development mailing list
Hi,

Here is one more element:

I notice that the encoding regression appears with commit fb657a351b6180c8a6187fcf6c9f568a97b8ed8e (https://github.com/pac4j/pac4j/commit/fb657a351b6180c8a6187fcf6c9f568a97b8ed8e) : in this commit, original local method escapeHtml() was replaced with escapeHtml4(). The latter converts special characters into html entities. As mentioned in https://html.spec.whatwg.org/multipage/parsing.html#determining-the-character-encoding, browser can sniff encoding based on different elements, including bytes. For instance, a byte greater than 0x7F indicates with high probability a UTF-8 encoding. Using html entites prevent this sniffing algorithm because it only contains standard bytes.

So, one alternative would be to simply rollback this commit, although adding explicit charset in form would be a good practice imho.

Best Regards
Nicolas

Jérôme LELEU

unread,
Oct 23, 2025, 8:00:38 AMOct 23
to Nicolas Crittin, Pac4j development mailing list
Hi,

Yes, adding explicit charset seems a better option to me.
Thanks.
Best regards,
Jérôme


Reply all
Reply to author
Forward
0 new messages