Gene Wirchenko wrote:
> On Fri, 04 Nov 2011 10:30:14 +0100, Christian Kirsch wrote:
>
>>Gene Wirchenko schrieb:
>>
>>> You can read my questions as being about JavaScript specifically
>>> or the conglomerate of systems I mentioned. I could use help with
>>> both.
>
>>But what *are* your questions? You mention one thing ("escape
>
> They are the sentences that end with "?". Here they are again:
>
> Are there any gotchas that I should be particularly alert for?
For some reason the biggest "gotcha" with javascript is premature
confidence. There seem to be recurrent manifestations of the
Dunning-Kruger effect (if you are not familiar with that, look it up as
it is an interesting, and all too human, psychological phenomenon). To
date I have had three epiphanies where I have suddenly "got" javascript
and realized how utterly deficient my previous understanding of the
subject had been. From the perspective following the third I realized
that 1. the code I was writing after the first epiphany was so bad it
was actively dangerous (to the people who were employing me to write
it), and 2. that there is no good reason to be certain that I have
actually "got" javascript now (even given that I have been writing
nothing but javascript full time for the last 8 years).
The next general "gotcha" is related to the first. Premature (but
actually unjustified) confidence, if examined, would lead to cognitive
dissonance; humans deal with that by irrationally reinforcing and
bolstering the decisions that they have made. This manifests itself in
some people adopting a religion-like attitude towards some aspects of
browser scripting, and you will find these people writing all over the
internet. Generally you want to be interested in the reasoning that
backs any position that anyone takes (how good are their actual
arguments) and be very wary of those who only assert that they are/were
correct (and particularly those who will not listen to/engage with
criticism).
Recently I had to write a technical test for some JS programmer
interview. I tried to include every common misconception about
javascript that I have noticed. These are (most of) the points I tested
for that best qualify as "gotchas":-
1. Understanding the factors that determine the - this - value in
javascript code; that it is determined by how you call a function/method
and not, for example, which object that function/method may 'belong' to.
2. Clearly seeing the boundaries between the programming language and
the environment that the browser provides to be programmed; the
distinctions between what the 3rd edition of the language specification
terms "host objects", "built-in objects" and "native objects".
3. Identifying and understanding the types of values that javascript
uses; that there are 5 primitive data types (boolean, number, string,
null and the undefined type). You can create String, Number and Boolean
objects (explicitly or implicitly) but those objects are distinct from
their corresponding primitive types. (Java programmers are particularly
fond of assuming the string primitive type is actually an object
instance created (implicitly) with the String constructor, it is not.)
4. Understanding that in a loosely typed language it is the programmer's
responsibility to keep track of the types of the values that any
particular variable/property/parameter will/may have; to know when and
what implicit type conversions will occur and avoid ever having the
need to exhaustively test the type of an unknown value (particularly
function arguments).
5. The specification for the User Agent header and the corresponding
userAgent property of the navigator object defines its contents as an
effectively arbitrary sequence of characters that does not even need to
be identical for two consecutive HTTP requests. This means that there is
no technical basis for assuming that the userAgent property can be used
as a means of identifying web browser types or versions. Tempting as it
may seem at times, driving script behaviour on the basis of a limited
set of observations of what example userAgent strings may have contained
at some time is a common mistake, and takes quite a bit of fixing once
reality hits home.
5. Confusing javascript's object/array literals with the (necessarily)
much more formal JSON.
6. Assumptions/Inferences about web browser environments are dangerous
and should not be used if they can be avoided (and they can be avoided
much more often than people seem to think). For example:-
if(document.images){
    var img1 = new Image();
    img1.src = 'foo.png';
}
- tests for the existence of - document.images - in order to guard
against the use of the - Image - constructor in browsers that don't
support it. These two features were introduced together so in practice
this test (which was a common construct, and is given as an example of
'best practice' in some places) did not fail significantly (to the best
of my knowledge it only failed in versions of IceBrowser between 2 and
5, and that browser was so little used that few are even aware that it
ever existed (and it is now dead)). However, if the point is to guard
against attempting to use the - Image - constructor in an environment
where it is not available then the logical subject for the test is the -
Image - constructor itself. When you cannot find the constructor then it
is not there, and you shouldn't try to use it.
The general rule for feature testing is: design the test with its
subject having as close a relationship as possible to the feature that
is to be used, and preferably a one-to-one relationship.
7. Failing to learn the javascript related technologies (HTML, DOM, CSS
and HTTP (at minimum)) formally. That is, become sufficiently familiar
with their specifications to know what is there, and where to look in
order to find the answers to the questions that you may have about them.
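A rough sketch of points 1 and 3, the JSON point, and the feature
testing rule follows. Every name in it (counter, parsesAsJson,
preloadImage and so on) is invented for the illustration, and the env
parameter is only an assumed stand-in for the browser's global object so
that the code can also be run outside a browser. Treat it as one
possible demonstration, not as the test questions themselves.

```javascript
// Point 1: the this value is set by how a function is called.
var counter = {
    count: 0,
    increment: function () {
        // Defensive check so a detached call fails quietly.
        if (this && typeof this.count == 'number') {
            this.count++;
            return true;
        }
        return false;
    }
};
var other = { count: 10 };
var detached = counter.increment;

var asMethod = counter.increment();           // this is counter
var asPlainCall = detached();                 // this is not counter
var viaCall = counter.increment.call(other);  // this is other

// Point 3: a string primitive is not a String object.
var primitive = 'abc';
var wrapped = new String('abc');
var primitiveType = typeof primitive;         // 'string'
var wrappedType = typeof wrapped;             // 'object'
var looselyEqual = (primitive == wrapped);    // true, after type conversion
var strictlyEqual = (primitive === wrapped);  // false, the types differ
// Any object is truthy, including a Boolean object that wraps false.
var wrappedFalseIsTruthy = new Boolean(false) ? true : false;

// Object literal syntax is more permissive than JSON.
function parsesAsJson(text) {
    try {
        JSON.parse(text);
        return true;
    } catch (e) {
        return false;
    }
}
var literalStyle = "{a: 1, 'b': 2}";   // acceptable as an object literal
var jsonStyle = '{"a": 1, "b": 2}';    // acceptable as JSON

// Feature testing: test the thing that is about to be used.
// env is a hypothetical parameter; in a browser it would be window.
function preloadImage(env, src) {
    if (env.Image) {
        var img = new env.Image();
        img.src = src;
        return img;
    }
    return null;   // no constructor found, so do not try to use it
}
```

In a browser the last function would be called as preloadImage(window,
'foo.png'), and the caller has to cope with the null result.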
> Are there any good books that get into the nasty bits?
It was the books that I first read on javascript that got me to the
point of writing code that was actively dangerous. Be very careful with
javascript books; 60%+ factually false is not unknown and I would be
surprised to see worthwhile script design advice (given what I have seen
to date).
>>characters"), and I don't even understand, what you mean by this. If
>>you
>
> Maybe you use different terminology? In some languages, to
> represent certain characters in a string, one must escape the
> character. C example:
> char tab='\t';
> tab would then contain the tab character, not a backslash followed by
> a lower-case T.
Escaping is necessary but can be very context dependent. If you get an
arbitrary string out of a database what you are going to have to do to it
in order to render it 'safe' depends on where it is going to end up. If
it is going into a javascript string then it needs one set of escaping, if
its destination is as text in an HTML element then it needs another,
while an HTML attribute may need another. Then there are things like
inserting data as a javascript string that will be used as part of a URL
in the code that will be used for an HTML intrinsic event attribute;
that is three possible levels of escaping for different contexts, that
will need to be applied in the correct order.
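The three-level case can be sketched roughly as follows. The function
names, the sample data and the 'search?a=1&q=' URL are all assumptions
made up for the illustration, and both escaping functions are
deliberately minimal (they only cover the characters that this example
can contain), so this is a sketch of the ordering and not a complete
escaping library.

```javascript
// Escape for a quoted javascript string literal (minimal set only).
function escapeForJsString(text) {
    return String(text)
        .replace(/\\/g, '\\\\')   // the escape character itself goes first
        .replace(/'/g, '\\x27')
        .replace(/"/g, '\\x22')
        .replace(/\r/g, '\\x0D')
        .replace(/\n/g, '\\x0A');
}

// Escape for a double-quoted HTML attribute value (minimal set only).
function escapeForHtmlAttribute(text) {
    return String(text)
        .replace(/&/g, '&amp;')   // again, the escape character first
        .replace(/"/g, '&quot;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

var fromDatabase = 'O\'Reilly & "Sons"';

// Level 1: the data becomes part of a URL query string.
var asUrlPart = encodeURIComponent(fromDatabase);

// Level 2: the URL becomes a javascript string literal.
var asJsStatement = "location.href='" +
    escapeForJsString('search?a=1&q=' + asUrlPart) + "'";

// Level 3: the statement becomes an intrinsic event attribute value.
var asAttribute = escapeForHtmlAttribute(asJsStatement);
var markup = '<a href="#" onclick="' + asAttribute + '">search</a>';
```

The order is innermost context first: the browser will undo the HTML
level first, then the javascript level, and the URL level last, so the
escaping has to be applied in the reverse of that.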
Javascript has escape sequences that start with a backslash, but there
are a number of types. So '\t' (the tab) may also be the hexadecimal
escape sequence '\x09' and the Unicode escape sequence '\u0009'. Notice
that the Unicode sequence can accommodate a much wider range of
characters. A little worthwhile advice; you will usually need to escape
quote marks and apostrophes for use inside a javascript string, but if you
do that by just preceding them with a backslash then the quote character
is still in the source text and so may need handling at a separate level
in the escaping process. But if you use a hexadecimal or a Unicode
escape sequence instead then the problematic character is gone from that
point forward (and the remaining backslash is both easier to handle and
less often problematic (e.g. not a character that HTML treats as
special)).
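A small sketch of both observations, with invented variable names. The
evaluateLiteral helper is only there so that the two pieces of source
text can be compared once they have been interpreted; it is not
something to use on untrusted input.

```javascript
// Three spellings of the same single character.
var tabSimple = '\t';
var tabHex = '\x09';
var tabUnicode = '\u0009';
var allSame = (tabSimple === tabHex && tabHex === tabUnicode);

// Two pieces of source text for the same string value. The first still
// contains an apostrophe between its delimiters, the second does not.
var sourceWithBackslash = "'It\\'s'";    // the text: 'It\'s'
var sourceWithHexEscape = "'It\\x27s'";  // the text: 'It\x27s'

// Hypothetical helper: interpret a string literal's source text.
function evaluateLiteral(source) {
    return new Function('return ' + source + ';')();
}

// Does the text between the delimiters still contain an apostrophe?
function bodyHasApostrophe(source) {
    return source.slice(1, -1).indexOf("'") != -1;
}
```

Both literals evaluate to the same string, but only the hexadecimal
form is free of the apostrophe that a later escaping level might
otherwise have to deal with.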
Also, IE's JScript has had (may still have, I have not checked this in
IE9 yet) a long-standing bug where the escape sequence '\v' was taken as
a literal 'v' when it should have been treated as a vertical tab. You
don't see vertical tabs often but if they are going to be escaped it is
definitely more predictable to escape them to a hexadecimal or Unicode
sequence.
Escaping forward slashes may seem unnecessary. However, I have seen
(possibly overzealous) security software objecting to seeing them in URL
query strings and I have a recollection of seeing a JSON script
insertion strategy that would be neutralised by unconditionally escaping
that character.
The set of characters that probably should be escaped when included in a
javascript string are the backslash, apostrophe, double quote, carriage
return (\x0D), line feed (\x0A), form feed (\x0C), line separator
(\u2028) and paragraph separator (\u2029). I would also habitually escape
backspace (\x08), horizontal tab (\x09), vertical tab (\x0B), BOM (\uFEFF) and
the forward slash. Always escape the escape character first when
escaping and unescape it last when unescaping (backslash when escaping
for javascript, but whichever escape character is used in whichever
context: it is a rule and an oft forgotten one).
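One way the character list above might be turned into code is sketched
below. The names are invented, and the choice to emit only hexadecimal
and Unicode escape sequences (including \x5C for the backslash itself)
is an assumption of this sketch. Because it makes a single pass over the
input, each character is looked at once and the ordering problem does
not arise; the second function is included only to show what goes wrong
when sequential replacements handle the backslash last.

```javascript
// The characters listed in the text above, as one character class.
var JS_STRING_SPECIALS =
    /[\\'"\r\n\f\u2028\u2029\x08\t\x0B\uFEFF\/]/g;

// Turn one character into a \xNN or \uNNNN escape sequence.
function toEscapeSequence(ch) {
    var code = ch.charCodeAt(0);
    var hex = code.toString(16).toUpperCase();
    if (code < 0x100) {
        return '\\x' + (hex.length < 2 ? '0' : '') + hex;
    }
    return '\\u' + ('0000' + hex).slice(-4);
}

// Single pass: no character is ever escaped twice.
function escapeForJsString(text) {
    return String(text).replace(JS_STRING_SPECIALS, toEscapeSequence);
}

// Counter-example: the backslash is escaped last, so the backslash that
// was added for the apostrophe gets escaped too and the apostrophe ends
// up unprotected.
function escapeBackslashLast(text) {
    return String(text)
        .replace(/'/g, "\\'")
        .replace(/\\/g, '\\\\');
}

var sample = 'a\\b\'c"d\r\n\u2028/e';
var escapedSample = escapeForJsString(sample);
var brokenSample = escapeBackslashLast("it's");
```

With the counter-example the text produced for "it's" is it\\'s, which
a javascript parser reads as an escaped backslash followed by a bare
apostrophe that ends the string early.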
>>want answers, you should probably be asking clear questions.
>
> One of the problems of getting started is finding out exactly
> where to start. When I have specific questions, I will ask them. For
> now, I am trying to get the lay of the land.
No matter how bad things may look they are actually 100 times better
than they were 10 years ago, and people learnt to cope back then.
Richard.