
Help with RegExp in English?


Heinz Kesting

Jun 10, 2017, 2:57:32 PM
Hallo,
I am new to RegExp, and I don't speak any French. I was lucky to finally
have found a newsgroup for this topic, can you give some advice or
example for me in English, or point me to an English-speaking forum for
RegExp?

Thank you very much for helping!

Kind regards, Heinz

Duzz'

Jun 10, 2017, 4:43:03 PM

Heinz Kesting

Jun 11, 2017, 6:16:51 AM
Hi,
>
> This could be a first step :
> <https://www.google.fr/?gws_rd=ssl#q=english+regexp+forum>
>

Thanks for this, but I'd prefer to find a forum in a newsreader, like
thunderbird, not in a web interface with all its adverts and distractions.

Could you be so kind and allow here an exception for me in English, please?
I am looking for a RegExp that would match the following pattern:

[Any alphanumeric text][<a href="][Any alphanumeric text][<a href="][Any
alphanumeric text][<a/>]

I need this for a text which is intended to be displayed later in a
web-like editor with hyperlinks.
I'd like to make sure that the hyperlinks are correctly inserted, before
the text goes into the web display. So the following pattern would be OK
and must not be a match:

[Any alphanumeric text][<a href="][Any alphanumeric text][<a/>][Any
alphanumeric text][<a href="][Any alphanumeric text][<a/>]

I tried patterns like '<a href=.+?^[<a/>].+?<a href=.+?</a>'
which was intended to work like:

'<a href=' not being followed by '<a/>' but 'any alphanumeric text' and
a second '<a href=">'

in other words if '<a href=' is not followed by '</a>' but a second '<a
href=' that's faulty and must be matched and reported for correction.

Hope I could explain well enough ...

Thanks for any help on this!
Kind regards, Heinz

Olivier Miakinen

Jun 11, 2017, 6:55:58 AM
Hello,

On 11/06/2017 12:16, Heinz Kesting wrote:
> Hi,
>>
>> This could be a first step :
>> <https://www.google.fr/?gws_rd=ssl#q=english+regexp+forum>
>>
>
> Thanks for this, but I'd prefer to find a forum in a newsreader, like
> thunderbird, not in a web interface with all its adverts and distractions.
>
> Could you be so kind and allow here an exception for me in English, please?

I'm happy to read your English, but please accept that the replies will be
in French.

> I am looking for a RegExp that would match the following pattern:
>
> [Any alphanumeric text][<a href="][Any alphanumeric text][<a href="][Any
> alphanumeric text][<a/>]

"alphanumeric" = [a-zA-Z0-9]

I assume that's not what you want, because the HTML would not even be
well-formed.

> I need this for a text which is intended to be displayed later in a
> web-like editor with hyperlinks.
> I'd like to make sure that the hyperlinks are correctly inserted, before
> the text goes into the web display. So the following pattern would be OK
> and must not be a match:
>
> [Any alphanumeric text][<a href="][Any alphanumeric text][<a/>][Any
> alphanumeric text][<a href="][Any alphanumeric text][<a/>]

Ah, so what you actually want is:
[<a href="][Any text *but* <a/>][<a href="]

Except that the <a/> is surely a mistake.

So:
[<a href="][Any text *but* </a>][<a href="]

> I tried patterns like '<a href=.+?^[<a/>].+?<a href=.+?</a>'
> which was intended to work like:
>
> '<a href=' not being followed by '<a/>' but 'any alphanumeric text' and
> a second '<a href=">'

1) Don't confuse </a> with <a/>
2) ^ outside [] means "start of line", not "exclude"
3) [<a/>] matches a single character that is either a <, an a, a /,
or a >. It is equivalent to [/<>a].
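
As a rough illustration of the three points above, here is a minimal sketch in Python. Using Python's `re` module is an assumption (the thread only links to PHP's PCRE documentation), but the constructs shown behave the same way in both. The sample strings are made up.

```python
import re

# [<a/>] is a character class: it matches ONE character out of <, a, / or >.
char_class = re.compile(r'[<a/>]')
print(char_class.findall('b</a>'))  # ['<', '/', 'a', '>']

# Outside a class, ^ is an anchor (start of string or line), not a negation.
print(re.search(r'^foo', 'foo bar') is not None)  # True
print(re.search(r'^foo', 'bar foo') is not None)  # False

# "Not followed by </a>" is written as a negative lookahead, (?!...).
not_closing = re.compile(r'<(?!/a>)')
print(not_closing.search('</a>') is None)           # True
print(not_closing.search('<a href="k">').start())   # 0
```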

> in other words if '<a href=' is not followed by '</a>' but a second '<a
> href=' that's faulty and must be matched and reported for correction.

OK, now that's clear.

http://php.net/manual/en/regexp.reference.assertions.php

So:
'<a href=((?!</a>).)*<a href='

Or more simply:
'<a ((?!</a>).)*<a '
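
A minimal sketch of how the long pattern behaves, assuming Python's `re` module and made-up sample strings (the names `NESTED_LINK`, `ok` and `bad` are purely illustrative). The group `((?!</a>).)*` consumes one character at a time, but only at positions where a `</a>` does not start, so the match can never run past a closing tag.

```python
import re

# The "long" pattern: an opening '<a href=' followed by characters, none of
# which begins a '</a>', up to a second '<a href='.
NESTED_LINK = re.compile(r'<a href=((?!</a>).)*<a href=')

ok = 'intro <a href="K1">one</a> middle <a href="K2">two</a> end'
bad = 'intro <a href="K1">one <a href="K2">two</a></a> end'

print(NESTED_LINK.search(ok))            # None
print(NESTED_LINK.search(bad).group(0))  # <a href="K1">one <a href=
```

By default `.` does not match a line break, so if a link can span several lines a flag such as `re.DOTALL` would be needed. That is an assumption about the input, not something stated in the thread.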

> Hope I could explain well enough ...

Hope you can use Google translation or something like that.


--
Olivier Miakinen

Heinz Kesting

Jun 11, 2017, 5:21:42 PM
Bonjour, Olivier!

> Ah, so what you actually want is:
> [<a href="][Any text *but* <a/>][<a href="]
>
> Except that the <a/> is surely a mistake.
>
> So:
> [<a href="][Any text *but* </a>][<a href="]
>

Sorry, that was a typo, of course!

>> in other words if '<a href=' is not followed by '</a>' but a second '<a
>> href=' that's faulty and must be matched and reported for correction.
>
> OK, now that's clear.
>
> http://php.net/manual/en/regexp.reference.assertions.php
>

My dear, I've read so many of such articles, but somehow I can't get it
into my brain! Obviously, Regular Expressions (and French, too!) are two
languages my brain seems to be almost incompatible with ... (grin)

> So:
> '<a href=((?!</a>).)*<a href='
>
> Or more simply:
> '<a ((?!</a>).)*<a'
>

Oh yeah, that did the trick! For all cases I could think of and tested
up to now, it worked like a charm, perfectly from the very start. By the
way, I used the first, 'long' version you proposed, just in case the
text might contain '<a' within any other context than a hyperlink.
So my main mistake seems to be the mix-up or confusion between ^ and !
besides the bracket syntax ...
>
> Hope you can use Google translation or something like that.

Yes, using an internet translation site helped quite well in understanding
your reply. I must confess, except for Bonjour or Merci I don't have
much French vocabulary at hand - I would have been completely lost without
such a tool.

So I'd like to say a very big MERCI to you - thank you sooooo much, you
saved my day, or even more correctly, you saved my week with your quick
and accurate support!

Kind regards, Heinz

Olivier Miakinen

Jun 11, 2017, 6:36:03 PM
On 11/06/2017 23:21, Heinz Kesting wrote:
>>
>> http://php.net/manual/en/regexp.reference.assertions.php
>
> My dear, I've read so many of such articles, but somehow I can't get it
> into my brain! Obviously, Regular Expressions (and French, too!) are two
> languages my brain seems to be almost incompatible with ... (grin)

;-)

>> So:
>> '<a href=((?!</a>).)*<a href='
>>
>> Or more simply:
>> '<a ((?!</a>).)*<a'

A space was dropped in your reply; I had written:
'<a ((?!</a>).)*<a '
and not:
'<a ((?!</a>).)*<a'
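
To see why the trailing space matters, here is a small sketch (Python's `re` module and the sample text are assumptions). Without the space, `<a` also matches the start of other tags such as `<abbr>`, which produces a false positive on a perfectly valid link.

```python
import re

with_space = re.compile(r'<a ((?!</a>).)*<a ')
without_space = re.compile(r'<a ((?!</a>).)*<a')

# A well-formed link whose text contains an <abbr> element (made-up sample).
text = '<a href="K1">see <abbr title="x">HTML</abbr></a>'

print(with_space.search(text))               # None
print(without_space.search(text).group(0))   # <a href="K1">see <a
```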

> Oh yeah, that did the trick! For all cases I could think of and tested
> up to now, it worked like a charm, perfectly from the very start. By the
> way, I used the first, 'long' version you proposed, just in case the
> text might contain '<a' within any other context than a hyperlink.
> So my main mistake seems to be the mix-up or confusion between ^ and !
> besides the bracket syntax ...

Whatever the context, a '<a ' (don't forget the space) must always be
followed by a '</a>', even if it is an '<a name=' rather than an
'<a href='.

But the test can be improved by also allowing for the end of the file:
'<a ((?!</a>).)*(<a |$)'

If you really insist on the long version:
'<a href=((?!</a>).)*(<a href=|$)'
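
A sketch of what the `|$` alternative adds, assuming Python's `re` module with `re.DOTALL` (the thread does not mention any flags) and a made-up input: a link that is opened but never closed before the end of the text is reported as well.

```python
import re

# re.DOTALL lets '.' cross line breaks, so '$' here effectively means
# "end of the whole text" (an assumption; no flags are given in the thread).
UNCLOSED = re.compile(r'<a href=((?!</a>).)*(<a href=|$)', re.DOTALL)
NESTED_ONLY = re.compile(r'<a href=((?!</a>).)*<a href=', re.DOTALL)

truncated = 'intro <a href="K1">link text that is never closed'
valid = 'intro <a href="K1">link text</a> outro'

print(NESTED_ONLY.search(truncated))            # None
print(UNCLOSED.search(truncated) is not None)   # True
print(UNCLOSED.search(valid))                   # None
```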

> [...]
>
> So I'd like to say a very big MERCI to you

It was my pleasure.


--
Olivier Miakinen

Heinz Kesting

Jun 15, 2017, 6:08:57 PM
Hallo Olivier,

Sorry to keep you waiting for so long for my reply, but I really didn't
find the time to get back to this until now.

> Whatever the context, a '<a ' (don't forget the space) must always be
> followed by a '</a>', even if it is an '<a name=' rather than an
> '<a href='.
>

I am not sure if I understood what you're trying to point at here, but
since I am using the 'long' version of your solution, I guess (and hope)
it doesn't matter here.

> But the test can be improved by also allowing for the end of the file:
> '<a ((?!</a>).)*(<a |$)'
>
> If you really insist on the long version:
> '<a href=((?!</a>).)*(<a href=|$)'
>

If I understand correctly what you're doing here, then you make sure
that we will have a match even if the string "<a href=" is at the very
end of the text we're testing it on, right?
But that won't be happening - if a hyperlink is inserted, it will always
be something in the following pattern:

... some other text before a <a href="LINK_KEY_VALUE1">linked text
passage</a> appears and then any more text ...

where LINK_KEY_VALUE1 would be the value the programme would be looking
for (and jump to, if found) while the 'linked text passage' would be the
text in the web interface which would appear underlined to show that
this is a link to click on. So the text we're testing the RegExp on will
always end with some normal characters, or if the last word would be a
hyperlink, it would actually end with </a>, but never with a "<a href="
So, as I understand your change to the RegExp, it would not be necessary
to test for the possible ending "<a href=".

What I am trying to avoid with this RegExp formula is something like:

<a href="LINK_KEY_VALUE1">linked text <a
href="LINK_KEY_VALUE2">passage</a></a>

where two consecutive beginning tags of a hyperlink appear before an
ending tag. This might happen if the user who creates the links
highlights a text passage which is already part of a hyperlink - in this
example 'linked text passage' is already linked to 'LINK_KEY_VALUE1',
and now the word 'passage' is highlighted again and given a link to
'LINK_KEY_VALUE2', which doesn't make sense.
But since I can not prevent the user from highlighting any text
passages, I can only test AFTER the link has been created, if the syntax
of the new link is valid, like the following example:

<a href="LINK_KEY_VALUE1">linked text</a> <a
href="LINK_KEY_VALUE2">passage</a>

Here the text passage 'linked text' is linked to LINK_KEY_VALUE1 and the
word 'passage' is linked to LINK_KEY_VALUE2, no consecutive beginning
tags, but each beginning tag is followed by an ending tag before the
next beginning tag appears.

I hope you can understand what I am meaning ...

Thanks again for your great support!

Kind regards, Heinz

Olivier Miakinen

Jun 15, 2017, 7:10:44 PM
On 16/06/2017 00:08, Heinz Kesting wrote:
> Hallo Olivier,

Hallo Heinz,

> Sorry to keep you waiting for so long for my reply, but I really didn't
> find the time to get back to this until now.
>
>> Whatever the context, a '<a ' (don't forget the space) must always be
>> followed by a '</a>', even if it is an '<a name=' rather than an
>> '<a href='.
>
> I am not sure if I understood what you're trying to point at here, but
> since I am using the 'long' version of your solution, I guess (and hope)
> it doesn't matter here.

Ja, du hast Recht. [Yes, you are right.]

> If I understand correctly what you're doing here, then you make sure
> that we will have a match even if the string "<a href=" is at the very
> end of the text we're testing it on, right?

Richtig. [Correct.]

> [...]
>
> I hope you can understand what I am meaning ...

Yes, perfectly. The first regexp I proposed (and which you adopted) is
then perfectly suitable. No need to change it.

Best regards,
--
Olivier Miakinen

Heinz Kesting

Jun 18, 2017, 6:14:50 AM
Hallo Olivier,

It's finally Sunday, a little time to get to the things you don't get
done in the rest of the week ...
>>
>> I am not sure if I understood what you're trying to point at here, but
>> since I am using the 'long' version of your solution, I guess (and hope)
>> it doesn't matter here.
>
> Ja, du hast Recht.
>

You actually touched me by answering in my mother tongue - how did you
know that - was it my 'Hallo' that gave me away?
Yes, it feels REALLY good to receive such a warm welcome.
But anyway, I'll continue my reply in English, in case others are
following this, English seems to be more commonly acknowledged than
German is. I'd feel ashamed to write in German in a French forum ...

>> If I understand correctly what you're doing here, then you make sure
>> that we will have a match even if the string "<a href=" is at the very
>> end of the text we're testing it on, right?
>
> Richtig.

Well, that feels good again, to find that I have understood correctly
what you're pointing to.

>>
>> I hope you can understand what I am meaning ...
>
> Yes, perfectly. The first regexp I proposed (and which you adopted) is
> then perfectly suitable. No need to change it.

Good, so I'll stay with the 'long' version of your solution, as it works
like a charm in all the testing I've done with it.
>
> Best regards

Thanks once more for your great and quick support!

Kind regards from Germany, Heinz

Olivier Miakinen

Jun 18, 2017, 1:05:42 PM
Hallo Heinz,

On 18/06/2017 12:14, Heinz Kesting wrote:
>
> You actually touched me by answering in my mother tongue - how did you
> know that - was it my 'Hallo' that gave me away?
> Yes, it feels REALLY good to receive such a warm welcome.

I like the German language very much, more than English, even though
unfortunately I have fewer opportunities to speak German (and as a
result I have lost a lot of it since school).

Yes, when I saw your "Hallo" I thought you might be a German speaker,
and on reading your first name "Heinz", plus the fact that your surname
begins with a "K", I figured there was little chance I was wrong.

> But anyway, I'll continue my reply in English, in case others are
> following this, English seems to be more commonly acknowledged than
> German is. I'd feel ashamed to write in German in a French forum ...

You're right. As a rule I also grumble when people write in English here,
but the respect you showed by asking permission, plus the fact that there
is no English-language Usenet group on regexps, convinced me not to be
too disagreeable. ;-) Besides, you gladly accepted being answered in
French.

>> [...]
>
> Kind regards from Germany, Heinz

Viele Grüße aus Frankreich, [Best wishes from France,]
--
Olivier Miakinen

Logan Won-Ki Lee

Jul 13, 2021, 10:19:17 PM
Hi Heinz. This site is great: https://www.rexegg.com

;)

Olivier Miakinen

Jul 14, 2021, 3:23:50 AM
Hello,

[in reply to a question in English from 2011 or 2017]

On 14/07/2021 04:19, Logan Won-Ki Lee wrote:
> Hi Heinz. This site is great: https://www.rexegg.com

I hope that by now Heinz had already found it. All the more so since
English-language resources are, after all, easier to find, and it wasn't
really worth asking the question on a French-speaking group. ;-)

Still, the site does look good indeed. If you read English.

--
Olivier Miakinen

Otomatic

Jul 14, 2021, 11:09:31 AM
Olivier Miakinen <om+...@miakinen.net> wrote:

> Still, the site does look good indeed. If you read English.
I quite like the PHP documentation on regexps:
https://www.php.net/manual/fr/reference.pcre.pattern.syntax.php