On 04.09.2012 00:11, David Li wrote:
> So perhaps some heuristic for differentiating
> between various input languages (Python, TeX, "English-like", etc.)
> and then interpreting them as Python could also be an interesting task.
Heh. That's simple:
- Have a grammar for each syntax that we have,
- run the input through all grammars,
- use the grammar that doesn't return an error.
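The three steps above could be sketched roughly as follows. This is only an illustration under assumptions: each grammar is modelled as a parser function that returns a result or raises SyntaxError, and the two grammars (python_grammar, built on the standard ast module, and a toy tex_grammar that only understands \frac{a}{b}) are hypothetical stand-ins for real ones.

```python
import ast
import re
from typing import Callable, Dict, List, Tuple

# Assumption: a "grammar" is a parser function that returns a parse result
# or raises SyntaxError when the input is not in its language.
Parser = Callable[[str], object]


def python_grammar(text: str) -> str:
    """Hypothetical grammar: Python expression syntax via the ast module."""
    return ast.dump(ast.parse(text, mode="eval").body)


def tex_grammar(text: str) -> Tuple[str, str, str]:
    """Hypothetical toy grammar: only accepts \\frac{a}{b}."""
    m = re.fullmatch(r"\\frac\{(\w+)\}\{(\w+)\}", text.strip())
    if m is None:
        raise SyntaxError("not a \\frac{...}{...} expression")
    return ("/", m.group(1), m.group(2))


GRAMMARS: Dict[str, Parser] = {"python": python_grammar, "tex": tex_grammar}


def parse_all(text: str, grammars: Dict[str, Parser]) -> List[Tuple[str, object]]:
    """Run the input through every grammar and keep the ones that accept it."""
    matches = []
    for name, parse in grammars.items():
        try:
            matches.append((name, parse(text)))
        except SyntaxError:
            pass  # this grammar rejects the input
    return matches


print([name for name, _ in parse_all("1*2", GRAMMARS)])          # ['python']
print([name for name, _ in parse_all(r"\frac{1}{2}", GRAMMARS)])  # ['tex']
```

An empty result is case (1) below; more than one entry is case (2).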
The fun begins when considering the following cases:
1) No grammar matches.
2) More than one grammar matches.
For (1), you'd want to somehow rank the grammars according to how close
the input is to each grammar, and assume the user really meant the
closest one.
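One possible, deliberately simple closeness measure is how far into the input each parser gets before it fails; the grammar that got furthest is assumed to be the one the user meant. In this sketch ParseError, char_grammar and rank_grammars are hypothetical names, and the "grammars" are toy stand-ins that only check which characters may appear. A real version would take the error position from real parsers.

```python
import string
from typing import Callable, Dict, List


class ParseError(Exception):
    """Hypothetical error type that records how far the parser got."""

    def __init__(self, pos: int) -> None:
        super().__init__(f"parse error at position {pos}")
        self.pos = pos


def char_grammar(allowed: str) -> Callable[[str], str]:
    """Toy stand-in for a grammar: accepts strings built from `allowed` only."""
    def parse(text: str) -> str:
        for i, ch in enumerate(text):
            if ch not in allowed:
                raise ParseError(i)
        return text
    return parse


def rank_grammars(text: str, grammars: Dict[str, Callable[[str], str]]) -> List[str]:
    """Order grammar names from closest to farthest for the given input.

    Heuristic (an assumption): a grammar that consumed more of the input
    before failing is considered closer to what the user meant.
    """
    def progress(name: str) -> int:
        try:
            grammars[name](text)
            return len(text)
        except ParseError as err:
            return err.pos
    return sorted(grammars, key=progress, reverse=True)


GRAMMARS = {
    "arithmetic": char_grammar("0123456789+-*/() "),
    "tex": char_grammar("0123456789+-*/{}\\ " + string.ascii_lowercase),
}
print(rank_grammars("2*(3+x)", GRAMMARS))  # ['arithmetic', 'tex']
```

Here "arithmetic" fails only at the x (position 5), while "tex" already fails at the parenthesis (position 2), so "arithmetic" is ranked first.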
For (2), you'd want to check whether the different grammars all really mean
the same thing. E.g. "1*1" should parse the same for all math grammars; if
they agree, just continue processing.
Otherwise, you'll have to ask the user, or pick one at random and let
the user explicitly select a different grammar.
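The decision logic for case (2) might look roughly like this sketch. It assumes each matching grammar's result has already been turned into a hashable "meaning" (plain tuples here). resolve, ask_user and pick_first are hypothetical names. The second usage example is a real ambiguity: in Python 2^3 is bitwise XOR, while in TeX notation it is a power.

```python
from typing import Callable, Hashable, List, Tuple

Match = Tuple[str, Hashable]  # (grammar name, meaning of the parse)


def resolve(matches: List[Match],
            ask_user: Callable[[List[Match]], Hashable]) -> Hashable:
    """Return the agreed meaning, or defer to the user when grammars disagree."""
    if not matches:
        raise ValueError("no grammar matched; this is case (1)")
    meanings = {meaning for _, meaning in matches}
    if len(meanings) == 1:
        return matches[0][1]  # all grammars agree: just continue processing
    return ask_user(matches)  # disagreement: ask, or guess and allow override


def pick_first(matches: List[Match]) -> Hashable:
    """Placeholder for asking the user: simply takes the first candidate."""
    return matches[0][1]


agree = [("python", ("*", 1, 1)), ("tex", ("*", 1, 1))]
clash = [("python", ("xor", 2, 3)), ("tex", ("pow", 2, 3))]
print(resolve(agree, pick_first))  # ('*', 1, 1)
print(resolve(clash, pick_first))  # ('xor', 2, 3)
```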
There's also a slight complication for case (2): you may get different
parse trees that boil down to the same operations. For example,
grammars with different numbers of precedence levels tend to end up that
way; 1*2 could end up as
  op: *
    int: 1
    int: 2
or as
  op: *
    literal
      int: 1
    literal
      int: 2
where the second grammar, for whatever reason, distinguishes literals
from names and other representations, while the first does not.
You'll either need a pass that normalizes the resulting parse trees, or
require that constructs common to several grammars are handled by
identical rules.
The first approach probably requires less work because SymPy already has
routines for simplifying expressions; however, it makes error
reporting more difficult because those transformations aren't built to
keep track of input line/column numbers.
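As a rough illustration of the normalizing approach, the pass below collapses wrapper nodes such as "literal" so that the two trees for 1*2 compare equal. The tuple-based tree format and the WRAPPERS set are assumptions made for this sketch; a real pass would work on whatever node types the grammars actually produce, and would drop position information unless it explicitly carries it along.

```python
from typing import Any

# Assumed tree format: a node is a tuple (tag, *children); leaves are plain values.
WRAPPERS = {"literal"}  # hypothetical: tags that carry no meaning of their own


def normalize(node: Any) -> Any:
    """Remove single-child wrapper nodes so equivalent trees compare equal."""
    if not isinstance(node, tuple):
        return node  # leaf value such as 1 or "x"
    tag, *children = node
    if tag in WRAPPERS and len(children) == 1:
        return normalize(children[0])
    return (tag, *(normalize(child) for child in children))


tree_a = ("*", ("int", 1), ("int", 2))
tree_b = ("*", ("literal", ("int", 1)), ("literal", ("int", 2)))
print(normalize(tree_a) == normalize(tree_b))  # True
```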
You see, there's enough to do :-)
Not all aspects need to be addressed in the first round, though. Just
choose how much of this you want to deal with, and write the code so
that the rest can be added later without rewriting everything.
> Since Gamma only deals with mathematical expressions (which is more limited
> than Wolfram|Alpha) I believe at least some basic English-like queries can
> be interpreted.
> ...
> Given how
> difficult it is, though, I guess just being able to interpret 2x, sin
> x, and integral of x^2 would be a nice step up in functionality.
Indeed, that's easy enough. You can always write a grammar that accepts
a subset of English.
Main points:
- Do not require parentheses around function arguments; a function call is
  just: name {expr}
- Make name {expr} bind more weakly than all operators, so sin x+y is
  equivalent to sin (x+y).
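A small recursive-descent parser following these two points could look like the sketch below. The function-name set FUNCTIONS, the token rules, the tuple output format and the restriction to exactly one argument per call are assumptions for illustration; it also accepts implicit multiplication such as 2x, which the quoted message mentions. Because a call's argument is parsed as a whole sum, the call extends as far right as possible, which is what makes sin x+y mean sin (x+y).

```python
import re
from typing import List, Optional, Union

Tree = Union[str, tuple]
FUNCTIONS = {"sin", "cos", "exp", "log"}  # assumption: known function names


def tokenize(text: str) -> List[str]:
    """Split into numbers, names and single-character symbols."""
    return re.findall(r"\s*(\d+|[A-Za-z]+|\S)", text)


class Parser:
    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def parse(self) -> Tree:
        tree = self.parse_sum()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def parse_sum(self) -> Tree:
        left = self.parse_term()
        while self.peek() in ("+", "-"):
            op = self.take()
            left = (op, left, self.parse_term())
        return left

    def parse_term(self) -> Tree:
        left = self.parse_power()
        while True:
            tok = self.peek()
            if tok in ("*", "/"):
                op = self.take()
                left = (op, left, self.parse_power())
            elif tok is not None and (tok == "(" or tok.isalnum()):
                left = ("*", left, self.parse_power())  # implicit: 2x
            else:
                return left

    def parse_power(self) -> Tree:
        base = self.parse_atom()
        if self.peek() == "^":
            self.take()
            return ("^", base, self.parse_power())  # right-associative
        return base

    def parse_atom(self) -> Tree:
        tok = self.take()
        if tok in FUNCTIONS:
            # name {expr}: the argument is a whole sum, so the call binds
            # more weakly than every operator.
            return ("call", tok, self.parse_sum())
        if tok == "(":
            inner = self.parse_sum()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if tok.isalnum():
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")


print(Parser("sin x+y").parse())  # ('call', 'sin', ('+', 'x', 'y'))
print(Parser("2x").parse())       # ('*', '2', 'x')
```

One consequence of this design is that 2 + sin x + y parses as 2 + sin(x + y), much like a lambda body in Python extends to the right.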
> I should've been more specific about that. I thought that
> natural language could help somewhat with the task, or at least point me
> towards algorithms and ideas, which is why I mentioned it.
That wouldn't have worked. Parsing natural language is really hard, and
the algorithms beyond parsing have little to do with natural language.
Still, the parsing techniques used for natural language should be suitable.