
Problems with accents


andre

May 17, 2018, 4:02:03 AM
This works well:
print "L'homme\n";
But this
print "<P>INPUT NAME='titre' VALUE='l\'homme'>\n";
prints only up to the apostrophe: l\
Any idea?
Many thanks
André
--
Politicians lack foresight, and voters have no memory!

Thomas 'PointedEars' Lahn

May 17, 2018, 6:32:26 AM
andre wrote:
^^^^^
Your last name is missing there, André no. 74656.

> This is working well:
> print "L'homme\n";
> Buts this
> print "<P>INPUT NAME='titre' VALUE='l\'homme'>\n";
> Print only upto the appostrophe! : l\
> Any idea??

As you can see if you look at the source code of the dynamically generated
Web document (e.g. with Ctrl+U), this generates some sort of HTML markup:

<P>INPUT NAME='titre' VALUE='l\'homme'>

(without indentation).

But HTML is _not_ PHP; if, as you intended, you use the apostrophe as
(“VALUE”) attribute value delimiter, then for a literal apostrophe you have
to escape it with a character reference:

<P>INPUT NAME='titre' VALUE='l&#39;homme'>

This can be automated with PHP:

echo "<P>INPUT NAME='titre' VALUE='l" . htmlspecialchars("'", ENT_QUOTES)
. "homme'>";

<https://php.net/htmlspecialchars>

(Do not use “print” unless you are interested in the return value.
<https://php.net/print>)

In fact, if the attribute value is from a PHP value, the whole attribute
value should be escaped:

echo "<P>INPUT NAME='titre' VALUE='" . htmlspecialchars($value,
ENT_QUOTES) . "'>";
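
A minimal sketch of that escaping, with the missing "<" added. The helper name titre_input() and the sample value are assumptions for illustration only, not part of the original code:

```php
<?php
// Hypothetical helper: builds the input element with an apostrophe-delimited
// attribute value and escapes the value for HTML output.
function titre_input(string $value): string
{
    // ENT_QUOTES converts both ' and " to character references;
    // with ENT_HTML401 the apostrophe is written as &#039;.
    $escaped = htmlspecialchars($value, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    return "<input name='titre' value='" . $escaped . "'>";
}

echo titre_input("l'homme"), "\n";
```

The browser decodes the character reference again, so the user still sees l'homme in the field.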

But if the value is fixed, in this special case it is easier, and usually
better, to use another delimiter both for the PHP string and the HTML
attribute value:

echo '<P>INPUT NAME="titre" VALUE="l'homme">';

The problem does not arise in the first place if you use the *proper*
(typographical) apostrophe:

<P>INPUT NAME='titre' VALUE='l’homme'>


Also, your markup is syntactically wrong; a leading “<” character is missing
for the “input” element:

<P><input name="titre" value="l’homme">

This is still *semantically* wrong, because an “input” element does not
belong in a paragraph (“p”) element; but, for example, in a “fieldset”
element:

<fieldset><input name="titre" value="l’homme"> …</fieldset>

Assuming that “l’homme” is an attribute value that is not fixed, but comes
from a PHP value, then it is not necessary to use either “echo” or “print”;
PHP is the *P*HP *H*ypertext *P*reprocessor:

$value = 'l’homme';

?><fieldset><input name="titre" value="<?= htmlspecialchars($value) ?>"> …
</fieldset>

This approach also works better with syntax highlighting.


You should avoid language-specific identifiers and values that are not
displayed to the user. Your code will be easier to write, and more easily
understood by others, if you choose identifiers in English, the
/lingua franca/ of computer technology.

For example, in this case a “select” element is usually the proper element,
where the “option” elements’ values should be in English, and the element’s
content can be in the user’s language (which can be automated with classes
such as Zend\I18n\Translator\Translator):

<select name="title" size="1">
<option value="Mr.">M.</option>
<option value="Ms.">Mme</option>
<option value="Dr.">Dr</option>
<option value="Prof.">Pr</option>
</select>
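
Such a list could also be generated from a PHP array. The following is only a sketch: the array $titles and the function title_select() are made-up names, and the labels are hard-coded here instead of coming from a translator class:

```php
<?php
// Hypothetical mapping: English option values => labels in the user's language.
$titles = [
    'Mr.'   => 'M.',
    'Ms.'   => 'Mme',
    'Dr.'   => 'Dr',
    'Prof.' => 'Pr',
];

// Builds the "select" element; $selected is the value to preselect, if any.
function title_select(array $titles, string $selected = ''): string
{
    $html = "<select name=\"title\" size=\"1\">\n";

    foreach ($titles as $value => $label) {
        $html .= sprintf(
            "<option value=\"%s\"%s>%s</option>\n",
            htmlspecialchars($value, ENT_QUOTES, 'UTF-8'),
            $value === $selected ? ' selected' : '',
            htmlspecialchars($label, ENT_QUOTES, 'UTF-8')
        );
    }

    return $html . "</select>\n";
}

echo title_select($titles, 'Ms.');
```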


If this is part of a form, it would be better to use a table anyway, where
the table headers contain the labels and the table data are the form
controls:

<fieldset>
<legend>User data</legend>
<table>
<tr>
<th>Titre</th>
<td><select name="title" size="1">
<option value="Mr.">M.</option>
<option value="Ms.">Mme</option>
<option value="Dr.">Dr</option>
<option value="Prof.">Pr</option>
</select></td>
</tr>

</table>
</fieldset>

Such *HTML* basics are usually discussed in
<news:comp.infosystems.www.authoring.html> only.


Finally, an apostrophe is _not_ an accent; both are diacritic marks.

--
PointedEars
Zend Certified PHP Engineer <http://www.zend.com/en/yellow-pages/ZEND024953>
<https://github.com/PointedEars> | <http://PointedEars.de/wsvn>
Twitter: @PointedEars2 | Please do not cc me./Bitte keine Kopien per E-Mail.

andre

May 17, 2018, 7:36:53 AM
On 17/05/2018 12:32, Thomas 'PointedEars' Lahn wrote:

Many thanks.

This is meant to display the result of an SQL query, but the contents of
the database are multilingual and some of the data are in French!
The file, called from an HTML form, is a PHP script generating HTML output,
so in fact the line is:
<P>INPUT NAME='titre' VALUE:'$Roww[0]' SIZE=60\n";
This is to give users the possibility to modify an erroneous entry!

Thomas 'PointedEars' Lahn

May 17, 2018, 8:31:13 AM
andre wrote:

> On 17/05/2018 12:32, Thomas 'PointedEars' Lahn wrote:
>> But if the value is fixed, in this special case it is easier, and usually
>> better, to use another delimiter both for the PHP string and the HTML
>> attribute value:
>>
>> echo '<P>INPUT NAME="titre" VALUE="l'homme">';
^------------------------------^

This is syntactically incorrect PHP code; should be at least

echo '<P>INPUT NAME="titre" VALUE="l\'homme">';
// ^---------------------------------------^

instead. [In this way, the PHP-based escaping works because the "attribute"
value is delimited by <"> (straight quotation mark) instead.]

> The file called from a html form, is a php script generating HTML output.
> so in fact the line is:
> <P>INPUT NAME='titre' VALUE:'$Roww[0]' SIZE=60\n";

No, it is not.

> This to give users the possibility to modify a erroneous entry!

What I said before applies, then. If the value can be entered this way
(which AISB I do not think it should), you have to escape the previous,
*user-provided* value for output in order to prevent code injection such
as cross-site scripting (XSS):

<https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet>
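
A small sketch of what can go wrong without escaping. The array $row stands in for the $Roww of the original code, and the injected value is invented for the demonstration:

```php
<?php
// Hypothetical value as it might come back from the database.
$row = ["l'homme' onfocus='alert(1)"];

// Unescaped: the apostrophe in the data ends the attribute value early,
// and the rest is parsed by the browser as a new attribute.
$unsafe = "<input name='titre' value='$row[0]' size='60'>";

// Escaped: the apostrophes become character references and stay
// inside the attribute value.
$safe = "<input name='titre' value='"
    . htmlspecialchars($row[0], ENT_QUOTES | ENT_HTML401, 'UTF-8')
    . "' size='60'>";

echo $unsafe, "\n", $safe, "\n";
```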

Next time, do not top-post:

<https://www.netmeister.org/news/learn2quote.html>

Also, while your real name is somewhat optional (it is considered polite),
it is a violation of Internet standards and disregard of Network etiquette
(Netiquette) to falsify address header fields and use foreign namespaces
without authorization. For example, your “pas...@pasici.be” is *presently*
_not_ an e-mail address as *presently* pasici.be is not a registered second-
level domain (but may be in the future):

| Verifying <pas...@pasici.be>...
| Mail exchanger(s) for pasici.be: none.
| `A' record for pasici.be:
| None, thus <pas...@pasici.be> is definitely not an e-mail address (no MX).

<http://www.interhack.net/pubs/munging-harmful/>

Richard Yates

May 17, 2018, 8:58:39 AM
On Thu, 17 May 2018 10:01:26 +0200, andre <a...@blabla.be> wrote:

>This is working well:
>print "L'homme\n";
>Buts this
>print "<P>INPUT NAME='titre' VALUE='l\'homme'>\n";
>Print only upto the appostrophe! : l\
>Any idea??
>Many thanks
>André

You are missing a '<' before INPUT.

Escaping a character with a backslash is not something that works in
HTML.

This does what you want:

print "<P><INPUT NAME='titre' VALUE='l&#39;homme'>\n";



Jerry Stuckle

May 18, 2018, 8:47:09 AM
On 5/17/2018 6:32 AM, the internet troll Thomas 'Pointed Head' Lahn wrote:
> andre wrote:
> ^^^^^
> Your last name is missing there, André no. 74656.
>
You're the only one who gives a damn. He doesn't have to give ANY name
if he doesn't want to.

--
==================
Remove the "x" from my email address
Jerry Stuckle
jstu...@attglobal.net
==================

Jerry Stuckle

May 18, 2018, 8:50:04 AM
On 5/17/2018 8:31 AM, Thomas 'PointedEars' Lahn wrote:
> andre wrote:
>
>> On 17/05/2018 12:32, Thomas 'PointedEars' Lahn wrote:
>>> But if the value is fixed, in this special case it is easier, and usually
>>> better, to use another delimiter both for the PHP string and the HTML
>>> attribute value:
>>>
>>> echo '<P>INPUT NAME="titre" VALUE="l'homme">';
> ^------------------------------^
>
> This is syntactically incorrect PHP code; should be at least
>
> echo '<P>INPUT NAME="titre" VALUE="l\'homme">';
> // ^---------------------------------------^
>
> instead. [In this way, the PHP-based escaping works because the "attribute"
> value is delimited by <"> (straight quotation mark) instead.]
>

Which will cause a problem if there is a double quote in the string.

>> The file called from a html form, is a php script generating HTML output.
>> so in fact the line is:
>> <P>INPUT NAME='titre' VALUE:'$Roww[0]' SIZE=60\n";
>
> No, it is not.
>

You know his code better than he does? I do NOT think so.

>> This to give users the possibility to modify a erroneous entry!
>
> What I said before applies, then. If the value can be entered this way
> (which AISB I do not think it should), you have to escape the previous,
> *user-provided* value for output in order to prevent code injection such
> as cross-site scripting (XSS):
>

You don't know what processing is done on the input.

> <https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet>
>
> Next time, do not top-post:
>
> <https://www.netmeister.org/news/learn2quote.html>
>
> Also, while your real name is somewhat optional (it is considered polite),
> it is a violation of Internet standards and disregard of Network etiquette
> (Netiquette) to falsify address header fields and use foreign namespaces
> without authorization. For example, your “pas...@pasici.be” is *presently*
> _not_ an e-mail address as *presently* pasici.be is not a registered second-
> level domain (but may be in the future):
>
> | Verifying <pas...@pasici.be>...
> | Mail exchanger(s) for pasici.be: none.
> | `A' record for pasici.be:
> | None, thus <pas...@pasici.be> is definitely not an e-mail address (no MX).
>
> <http://www.interhack.net/pubs/munging-harmful/>
>

YOU are a violation of the Internet standards.

Jerry Stuckle

May 18, 2018, 8:55:39 AM
On 5/17/2018 7:36 AM, andre wrote:
> On 17/05/2018 12:32, Thomas 'PointedEars' Lahn wrote:
>
> Many thanks.
>
> This is made to display the result of a SQL query, but the contains of
> the database is plurilingal and some of the data are in french!!
> The file called from a html form, is a php script generating HTML output.
> so in fact the line is:
> <P>INPUT NAME='titre' VALUE:'$Roww[0]' SIZE=60\n";
> This to give users the possibility to modify a erroneous entry!
>
>

You can still use htmlspecialchars() on $Roww[0] (although
htmlentities() may be a better choice, especially since you could have
some French characters).
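
To compare the two functions, here is a short sketch. It assumes the strings and the source file are UTF-8, and the sample value is made up:

```php
<?php
$value = "l'été"; // hypothetical database value, UTF-8 encoded

// htmlspecialchars() only touches &, <, >, " and (with ENT_QUOTES) '.
$special = htmlspecialchars($value, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// htmlentities() additionally replaces every character that has a named
// HTML entity, such as the accented letters.
$entities = htmlentities($value, ENT_QUOTES | ENT_HTML401, 'UTF-8');

echo $special, "\n";  // l&#039;été
echo $entities, "\n"; // l&#039;&eacute;t&eacute;
```

Both results are safe inside an attribute value. If the page is served as UTF-8, the accented characters display correctly either way, so the choice mostly matters when the page encoding cannot represent them.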

andre

May 18, 2018, 11:13:50 AM
On 17/05/2018 10:01, andre wrote:
> This is working well:
> print "L'homme\n";
> Buts this
> print "<P>INPUT NAME='titre' VALUE='l\'homme'>\n";
> Print only upto the appostrophe! : l\
> Any idea??
> Many thanks
> André
As the data comes from a database, I have made the change in the entry
process to the db:
$TT = htmlspecialchars($Titre, ENT_QUOTES);
and oops, it is OK.
I have a few records to update; that will not be too difficult.
Many thanks again
André

Jerry Stuckle

May 18, 2018, 1:35:49 PM
HTML is a scripting language for displaying data, not storing it. You
really don't want to store the HTML equivalents in your database. That
will make searching the database more complicated. You should store the
characters in their natural form in the database then use
htmlspecialchars() or htmlentities() to display the characters.
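
A rough sketch of that separation. The array $rows is a hypothetical stand-in for the stored records, which are kept as plain text and escaped only when written into HTML:

```php
<?php
// Hypothetical rows as stored in the database: plain text, no HTML escaping.
$rows = ["l'homme", "la femme", "l'enfant"];

// Searching works on the natural form of the characters.
$matches = array_values(array_filter(
    $rows,
    function (string $titre): bool {
        return strpos($titre, "l'") !== false;
    }
));

// Escape only at the moment the value goes into the HTML output.
foreach ($matches as $titre) {
    echo "<input name='titre' value='"
        . htmlspecialchars($titre, ENT_QUOTES, 'UTF-8')
        . "'>\n";
}
```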

andre

May 19, 2018, 4:10:03 AM
On 18/05/2018 19:36, Jerry Stuckle wrote:
> On 5/18/2018 11:13 AM, andre wrote:
>> On 17/05/2018 10:01, andre wrote:
>>> This is working well:
>>> print "L'homme\n";
>>> Buts this
>>> print "<P>INPUT NAME='titre' VALUE='l\'homme'>\n";
>>> Print only upto the appostrophe! : l\
>>> Any idea??
>>> Many thanks
>>> André
>> As the data comes from a database, I have made the change in the entry
>> procees to the db.
>> $TT = htmlspecialchars($Titre, ENT_QUOTES);
>> and oups OK.
>> I have a few records to update will not be too difficult.
>> Many thanks again
>> André
>
> HTML is a scripting language for displaying data, not storing it. You
> really don't want to store the HTML equivalents in your database. That
> will make searching the database more complicated. You should store the
> characters in their natural form in the database then use
> htmlspecialchars() or htmlentities() to display the characters.
>
Here the problem is limited to a character that is never used for searching
the database (full text), so in my case it is a limited problem, but you
are right: better no HTML in the database.
Many thanks

Jerry Stuckle

May 19, 2018, 8:48:02 AM
Or if you ever want to generate non-html output, such as ad hoc queries
or reports. Many reasons you don't want HTML in your database.
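
For records that were already stored in escaped form, something along these lines might bring them back to plain text. The value in $stored is an invented example of such a record:

```php
<?php
// Hypothetical record that was saved after htmlspecialchars(..., ENT_QUOTES).
$stored = 'l&#039;homme';

// In a plain-text report the character reference shows up literally.
$report_before = 'Titre: ' . $stored;

// htmlspecialchars_decode() with ENT_QUOTES reverts the reference, so the
// record can be written back to the database in its natural form.
$raw = htmlspecialchars_decode($stored, ENT_QUOTES);
$report_after = 'Titre: ' . $raw;

echo $report_before, "\n", $report_after, "\n";
```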

Thomas 'PointedEars' Lahn

May 19, 2018, 2:18:33 PM
andre <pas...@pasici.be> wrote:
^^^^^^^^^^^^^^^^
Yes, and another reason is that it requires more space to be stored.
There are instances where one wants to store at least some markup in a
database, but this is not one of them.

However, HTML, the HyperText Markup Language, is (as the name says, too)
a _markup_ language, _not_ a scripting language. (The scripting languages
that can be used in HTML include ECMAScript implementations such as
JavaScript, which is one of my other fields of expertise.)

> Many thanks

If only you would play by the rules…

Jerry Stuckle

May 19, 2018, 6:33:30 PM
On 5/19/2018 2:18 PM, Thomas 'PointedEars' Lahn wrote:
> andre <pas...@pasici.be> wrote:
> ^^^^^^^^^^^^^^^^
>
> However, HTML, the HyperText Markup Language, is (as the name says, too)
> a _markup_ language, _not_ a scripting language. (The scripting languages
> that can be used in HTML include ECMAScript implementations such as
> JavaScript, which is one of my other fields of expertise.)
>

So please educate us - what is the difference between a scripting
language and a markup language?

Peter H. Coffin

May 20, 2018, 11:55:13 AM
On Sat, 19 May 2018 18:33:47 -0400, Jerry Stuckle wrote:
> On 5/19/2018 2:18 PM, Thomas 'PointedEars' Lahn wrote:
>>
>> However, HTML, the HyperText Markup Language, is (as the name says, too)
>> a _markup_ language, _not_ a scripting language. (The scripting languages
>> that can be used in HTML include ECMAScript implementations such as
>> JavaScript, which is one of my other fields of expertise.)
>>
>
> So please educate us - what is the difference between a scripting
> language and a markup language?

Oooh! Oooh! Mr Kottair, sir! Oooh! ;)

--
48. I will treat any beast which I control through magic or technology
with respect and kindness. Thus if the control is ever broken, it
will not immediately come after me for revenge.
--Peter Anspach's list of things to do as an Evil Overlord