Jorgen Grahn <grahn...@snipabacken.se> writes:
[...]
> I always assumed parsing meant reading any kind of data, deciding if
> it matches the parser's language, and if it does, offering the
> "pieces" in a helpful way to the next step of processing.
This description is consistent with my own understanding and also
with the various definitions and other online resources I
consulted.
> E.g. yacc is a parser generator, but in my mind so is scanf().
I wouldn't say scanf() is a parser generator, because it doesn't
generate a parser; it goes ahead and does (some) parsing itself.
Perhaps more significantly, scanf() is usually used to parse not
an entire language but only some part of an input, with other
calls to scanf() parsing other parts of the input (and hence
other parts of the language being accepted). To me scanf() is
closer to being a lexer than a parser. However, these are minor
distinctions; it's fair to say that a scanf() call parses a
line (or some other part) of an input.
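To make the lexer comparison concrete, here is a minimal sketch. It uses sscanf(), which behaves like scanf() but reads from a string. The line format ("<word> <integer>", e.g. "apples 3"), `struct record`, and `parse_record()` are all hypothetical, made up only for illustration. One call splits one line into pieces and reports whether the line had that shape; nothing in it knows anything about the rest of the input.

```c
#include <stdio.h>

/* Hypothetical record type for an assumed line format "<word> <integer>". */
struct record {
    char name[32];
    int count;
};

/* Parse a single line. sscanf() returns the number of conversions it
   stored, so a well-formed line yields exactly 2.
   Returns 1 if the line matches the assumed shape, 0 otherwise. */
int parse_record(const char *line, struct record *out)
{
    char extra;
    /* %31s: one whitespace-delimited word, bounded to fit name[32].
       %d:   an integer.
       " %c": skip whitespace, then grab any leftover character; if one
             is found the line has trailing junk and is rejected. */
    int n = sscanf(line, "%31s %d %c", out->name, &out->count, &extra);
    return n == 2;
}
```

A real program would call `parse_record()` once per line (for example on each buffer filled by `fgets()`). Any rule about which lines may follow which would live in that surrounding loop, not in the format string, and that is roughly the split between lexing and parsing.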
> (And as background: I know for sure I was taught in computer science
> that "language" is anything with a syntax and (I suppose) semantics.
> It doesn't have to be Turing-complete or anything.)
In formal Computer Science, a language is any set of strings over
a finite alphabet. In most cases we are concerned only with those
languages that can be recognized by a computer program, and so
necessarily have a finite (and well-defined) "grammar". I put the
word "grammar" in quotes only because computer programs are more
varied than what is generally meant by the word grammar.
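In the usual textbook notation, the definition and the idea of a program recognizing a language look like this. The symbols $\Sigma$, $L$, and $P$ are just conventional names, and the lowercase alphabet is only an example choice:

```latex
\begin{aligned}
&\Sigma = \{a, b, \ldots, z\} && \text{a finite alphabet (example choice)}\\
&\Sigma^{*} && \text{all finite strings over } \Sigma\text{, including the empty string } \varepsilon\\
&L \subseteq \Sigma^{*} && \text{a language: any set of such strings}\\
&L(P) = \{\, w \in \Sigma^{*} : P \text{ accepts } w \,\} && \text{the language recognized by a program } P
\end{aligned}
```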
By themselves, languages are just sets of strings and do not have
any associated semantics. However, it is certainly true that
a language needn't be large or "complete" in any sense. The set
of strings { "cat", "dog" } qualifies as a language in formal
language theory.
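As a small illustration, a recognizer for the language { "cat", "dog" } can be a plain lookup. This is a minimal sketch. The names `pets` and `in_pets_language()` are hypothetical. The function only answers "is this string in the set?" and attaches no meaning to either string, which matches the point that a language by itself carries no semantics.

```c
#include <stddef.h>
#include <string.h>

/* The two strings that make up the example language { "cat", "dog" }. */
static const char *const pets[] = { "cat", "dog" };

/* Recognizer: returns 1 if s is in the language, 0 otherwise.
   For a finite language, membership is just a comparison against
   every member; no grammar beyond the list itself is needed. */
int in_pets_language(const char *s)
{
    for (size_t i = 0; i < sizeof pets / sizeof pets[0]; i++) {
        if (strcmp(s, pets[i]) == 0)
            return 1;
    }
    return 0;
}
```

Infinite languages (for example, all syntactically valid C programs) cannot be listed this way, which is where a finite grammar, or a program standing in for one, becomes necessary.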
> [...] I tend to do things which fit in Unix pipelines.
(Comment about English language usage only.) In English, this
sentence is better written using "that" in place of "which". The
reason is that "that" introduces a restrictive clause, whereas
"which" introduces a non-restrictive one. That should be enough
for you to find a fuller description somewhere on the web. (Or,
if not, ask and I will try to track down a good reference.)
(Note: my comment is meant only to be helpful, not corrective. My
friends for whom English is not a first language tell me it's
helpful for them to hear remarks like this, so they can improve
their English. That's all I'm trying to do here.)