Hi Steve,
Yes, I am well aware that this regex example is not well suited to SPM.
It was a proof of concept. Pushing things to the extreme is my
way of understanding them deeply, so this was something I needed.
For some reason, I love and hate regex. I hate it because it is
unpythonic, character-only and ugly. I love it because it is fast and,
with the verbose flag, also quite readable.
But getting rid of regex in favor of something even more capable is
a long-standing wish that is not yet fulfilled, because the two
features are (still) quite different in nature.
I would love to have similar building blocks as in regex, but with a
pythonic syntax, and extending the basic string matching to general
objects. At the moment I don't see this in SPM because basic flexible
patterns are missing. The only flexible element in sequence patterns is
the star operator, but in my example it is always used up by the need
for an open end in the pattern. This is something that might improve.
As a drive-by, while looking into the Pilgrim algorithm for Roman
numerals, I found a faster algorithm by chance :)
Not only is my SPM craziness now really faster than the regex
solution, but I also found something better, based on the `toRoman`
part of Pilgrim's algorithm :D
Given one of the basic algorithms on the internet that are fast
but incomplete, the following is much faster than using regex:
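
    import roman  # third-party package that provides toRoman
    # Assumption: the exception is the one defined in the roman package.
    from roman import InvalidRomanNumeralError

    def from_roman_fastest(numeral):
        if numeral == 'N':
            return 0
        # from_roman_numeral: any fast, lenient converter (not shown here)
        num = from_roman_numeral(numeral)
        # Round trip: the canonical spelling must equal the input.
        cmp = roman.toRoman(num)
        if numeral != cmp:
            raise InvalidRomanNumeralError(f"Invalid Roman numeral: {numeral}")
        return num
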
This follows the old observation "Listening is much harder than talking",
so the algorithm does not attempt a complex parse. It uses a simple
conversion and then checks whether the input string is reconstructed
exactly.
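
For readers without the `roman` package, the same round-trip idea can be
sketched in a self-contained way. All names below (`to_roman`,
`from_roman_lenient`, `from_roman_checked`) are hypothetical stand-ins,
not the functions used in the timing experiment, and the lenient
converter is just one common variant of the simple algorithm. Unlike the
`roman` package, this sketch imposes no upper bound on the value.

```python
_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(num):
    """Return the canonical Roman spelling of a non-negative integer."""
    parts = []
    for value, symbol in _PAIRS:
        count, num = divmod(num, value)
        parts.append(symbol * count)
    return "".join(parts)


def from_roman_lenient(numeral):
    """Fast but incomplete: also accepts malformed input such as 'IIII'."""
    total = 0
    prev = 0
    for ch in reversed(numeral):
        value = _VALUES[ch]  # KeyError for characters that are not numerals
        # A smaller symbol to the left of a larger one is subtracted.
        total += value if value >= prev else -value
        prev = value
    return total


def from_roman_checked(numeral):
    """Convert leniently, then validate by reconstructing the input."""
    if numeral == "N":
        return 0
    try:
        num = from_roman_lenient(numeral)
    except KeyError:
        raise ValueError(f"Invalid Roman numeral: {numeral!r}") from None
    # num == 0 only happens for the empty string, which is not a numeral.
    if num == 0 or to_roman(num) != numeral:
        raise ValueError(f"Invalid Roman numeral: {numeral!r}")
    return num
```

All validation work is delegated to the generator: a malformed spelling
such as 'IIII' or 'IC' converts to some number, but that number's
canonical spelling differs from the input, so the comparison rejects it.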
Cheers -- Chris
On 02.08.23 22:30, Steve Holden wrote:
> Hi Chris,
>
> Nice to see you on the list.
>
> While this is definitely off-topic, I trust I might be given license by
> the list's few remaining readers to point out that the match-case
> construct is for _structural_ pattern matching. As I wrote in the latest
> Nutshell: "Resist the temptation to use match unless there is a need to
> analyse the _structure_ of an object."
>
> I don't believe it's accidental that match-case sequence patterns won't
> match str, bytes or bytearray objects - regexen are the tool already
> optimised for that purpose, so it's quite impressive that you are
> managing to approach the same level of performance!
>
> Kind regards,
> Steve
>
>
> On Wed, 2 Aug 2023 at 18:26, Christian Tismer-Sperling
> <tis...@stackless.com> wrote:
>
> On 02.08.23 18:30, Paul Moore wrote:
> > On Wed, 2 Aug 2023 at 15:24, Stephen J. Turnbull
> > <turnbull....@u.tsukuba.ac.jp> wrote:
> Message archived at
> https://mail.python.org/archives/list/pytho...@python.org/message/DYTVT7CUFVVGIDPXG2MKIOELWJPG3W73/
Message archived at
https://mail.python.org/archives/list/pytho...@python.org/message/XGTQVVTRMQRKVKXSE4O5WZYZITMN5DBE/