On 2017-01-19, Janis Papanagnou <
janis_pa...@hotmail.com> wrote:
> On 19.01.2017 05:52, Kaz Kylheku wrote:
>> On 2017-01-19, Janis Papanagnou <
janis_pa...@hotmail.com> wrote:
>>> On 19.01.2017 00:31, Kaz Kylheku wrote:
>>>>> [ match() function ]
>>>>
>>>
>>> In addition to my previous post one more comment on your code below...
>>>
>>> Ordinary (non-backtracking based) regexp matching is of complexity O(N).
>>> If you assume the code below you would get a severe degradation in
>>> performance.
>>
>> That model with the search loop specifies the abstract behavior
>> (the result which is obtained), not necessarily the implementation.
>
> The point was that the abstract code you provided makes no sense to
> me. It immediately imposes an O(N^2) complexity where only O(N) would
> do.
More correctly, it relaxes the worst case upper bound on running
time to that.
Poor, yet very simple, algorithms have excellent uses.
They can clearly specify *what* is to be computed, so then
faster algorithms can be checked that they compute the same thing.
They provide a simple implementation when performance isn't
important.
The basic principle of the regexp search is that the input data
> (And any concrete implementation would need some counterpart of your
> abstract model. So I'm still looking forward seeing concrete pointers.)
Gawk-wise, see the function re_search_internal in regexec.c.
See the loop starting here:
for (;; match_first += incr)
{
err = REG_NOMATCH;
if (match_first < left_lim || right_lim < match_first)
goto free_return;
/* Advance as rapidly as possible through the string, until we
find a plausible place to start matching. This may be done
with varying efficiency, so there are various possibilities:
only the most common of them are specialized, in order to
save on code size. We use a switch statement for speed. */
and where it later calls the actual regex matcher:
/* It seems to be appropriate one, then use the matcher. */
/* We assume that the matching starts from 0. */
mctx.state_log_top = mctx.nbkref_ents = mctx.max_mb_elem_len = 0;
match_last = check_matching (&mctx, fl_longest_match,
range >= 0 ? &match_first : NULL);
What is check_matching? This:
/* Check whether the regular expression match input string INPUT or not,
and return the index where the matching end, return -1 if not match,
or return -2 in case of an error.
FL_LONGEST_MATCH means we want the POSIX longest matching.
If P_MATCH_FIRST is not NULL, and the match fails, it is set to the
next place where we may want to try matching.
Note that the matcher assume that the maching starts from the current
index of the buffer. */
static int
internal_function __attribute_warn_unused_result__
check_matching (re_match_context_t *mctx, int fl_longest_match,
int *p_match_first)
See? A function that appears to implement a pure regex automaton which
reports how many characters of the input match, if at all.
The function is optimized in how it searches forward through the string;
it doesn't just throw a "check_matching" call at every character
position. But it *could* do that. Quite possibly, the first versions of
it might have been like that.