Matching multiple regexes

Dirk Olmes

Oct 9, 2013, 10:32:35 PM
to streamfly...@googlegroups.com
Hi,

I'm using streamflyer to parse HTML pages directly from the response stream. Matching a single regular expression works quite well, but I'm failing to understand how to match multiple regular expressions against the same stream.

The HTML page I'm trying to parse displays different sections, each with a list of items. I need to extract a section's title and then each item in its list. I got as far as matching each bit separately, but I'm having a hard time figuring out how to match both types in a single pass.

Is this possible with streamflyer at all?

-dirk

rw...@gmx.de

Oct 12, 2013, 9:29:07 AM
to streamfly...@googlegroups.com
Hi Dirk,

If you want to match different regular expressions in a stream, you can combine them into a single regular expression.

After a match, your match processor has to find out which of the combined expressions has matched. The processor can then perform the action specific to that expression.
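
As a rough illustration of the idea above, here is a minimal sketch in plain `java.util.regex`. It does not use any streamflyer classes, and it runs on an in-memory string rather than a stream. The `<h2>`/`<li>` markup, the class name and the group names are assumptions made up for the example. Two expressions are joined with `|`, and named groups tell the code which alternative matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CombinedRegexSketch {

    // Hypothetical markup: section titles in <h2>, list items in <li>.
    // Both expressions are combined into one pattern with alternation.
    private static final Pattern COMBINED = Pattern.compile(
            "<h2>(?<title>[^<]*)</h2>|<li>(?<item>[^<]*)</li>");

    /** Returns one event per match, in document order. */
    static List<String> scan(CharSequence html) {
        List<String> events = new ArrayList<>();
        Matcher m = COMBINED.matcher(html);
        while (m.find()) {
            // A named group is null when its alternative did not take part in the match.
            if (m.group("title") != null) {
                events.add("section:" + m.group("title"));
            } else {
                events.add("item:" + m.group("item"));
            }
        }
        return events;
    }

    public static void main(String[] args) {
        String html = "<h2>Fruit</h2><ul><li>apple</li><li>pear</li></ul>";
        scan(html).forEach(System.out::println);
    }
}
```

In a streamflyer match processor the same null check on the groups could decide which action to run, but how the match result is handed to the processor depends on the library version, so please check the sources for that part.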

If this looks a bit complicated, please have a look at the class MyTokenProcessorTest, which I just committed to the Maven module streamflyer-support (get the latest sources, please).

This class uses a little framework that makes it very easy to match many expressions.

best regards
rod

rw...@gmx.de

Nov 11, 2013, 1:59:43 PM
to streamfly...@googlegroups.com
Hi Dirk,

There is a bunch of new classes in the new release 1.1.1 which might help you.

Have a look at StateMachine. This class allows you to exchange the searched regular expression during the stream reading/writing.
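
To show what exchanging the searched expression means in general, here is a small sketch in plain `java.util.regex`. It is not the streamflyer `StateMachine` API, and it scans an in-memory string rather than a stream. The state names and the `<h2>`/`<li>`/`</ul>` markup are assumptions made up for the example. Each state carries its own pattern, and a match can switch the scanner to another state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexStateSketch {

    // Hypothetical states; each one searches for a different expression.
    private enum State {
        SEEK_TITLE(Pattern.compile("<h2>([^<]*)</h2>")),
        IN_LIST(Pattern.compile("<li>([^<]*)</li>|</ul>"));

        final Pattern pattern;

        State(Pattern pattern) {
            this.pattern = pattern;
        }
    }

    static List<String> scan(String html) {
        List<String> events = new ArrayList<>();
        State state = State.SEEK_TITLE;
        int pos = 0;
        while (true) {
            // Search with the pattern of the current state, starting after the last match.
            Matcher m = state.pattern.matcher(html);
            if (!m.find(pos)) {
                break;
            }
            pos = m.end();
            if (state == State.SEEK_TITLE) {
                events.add("section:" + m.group(1));
                state = State.IN_LIST;      // from now on look for items
            } else if (m.group(1) != null) {
                events.add("item:" + m.group(1));
            } else {
                state = State.SEEK_TITLE;   // </ul> closes the list
            }
        }
        return events;
    }
}
```

The point of the design is that items outside a list are never matched, because the item expression is only searched for while the scanner is in the `IN_LIST` state.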

Regards
Rod


Dirk Olmes

Nov 13, 2013, 8:44:27 PM
to streamfly...@googlegroups.com
On Monday, November 11, 2013 7:59:43 PM UTC+1, rw...@gmx.de wrote:
> Hi Dirk,

Hi Rod,

> There is a bunch of new classes in the new release 1.1.1 which might help you.
>
> Have a look at StateMachine. This class allows you to exchange the searched regular expression during the stream reading/writing.

I just had a look at StateMachine and friends and it looks useful to match multiple regexes.

Meanwhile I took a different approach to solving the problem that triggered my initial question. Parsing HTML was a kludgy approach anyway ...

Thanks for your feedback and your help,

-dirk

rw...@gmx.de

Nov 24, 2013, 12:52:38 PM
to streamfly...@googlegroups.com
> Thanks for your feedback and your help,

Kamyar Nadjafloo

Dec 2, 2014, 4:52:40 PM
to streamfly...@googlegroups.com
I'm running into issues with the character set, as you mentioned here.

I have a byte stream with the UTF-8 charset, and I am getting garbled data when it is converted to text and back. Is there any solution to this?

rw...@gmx.de

Dec 2, 2014, 5:36:03 PM
to streamfly...@googlegroups.com
Hi Kamyar,

If your byte stream contains only UTF-8 encoded characters, then you shouldn't get any problems. Are you sure that your byte stream contains UTF-8 encoded characters?

Are you aware of BOM bytes at the beginning of the stream? Could a BOM cause your problem?
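
To check both questions quickly, a minimal sketch like the following might help. It uses only JDK classes (Java 9 or later for `readNBytes` and `readAllBytes`), not streamflyer. The class and method names are made up for the example. It drops a leading UTF-8 BOM (the bytes EF BB BF) if one is present and decodes the rest with an explicit charset, so the platform default charset never comes into play.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class Utf8BomSketch {

    /** Wraps the stream and skips a leading UTF-8 BOM (EF BB BF) if there is one. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pushback.readNBytes(head, 0, 3);
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            // Not a BOM: put the bytes back so the caller sees the stream unchanged.
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    /** Decodes the bytes as UTF-8, ignoring a leading BOM. */
    static String decodeUtf8(byte[] raw) {
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(raw))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If text decoded this way and encoded again with `getBytes(StandardCharsets.UTF_8)` differs from the original bytes (apart from a removed BOM), the input was probably not valid UTF-8 in the first place.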

Kamyar Nadjafloo

Dec 3, 2014, 11:48:28 AM
to streamfly...@googlegroups.com
Thank you for the reply. I would think it should work too. It only fails with chunked encoding. We're using https://github.com/membrane/service-proxy/blob/master/core/src/main/java/com/predic8/membrane/core/http/ChunkedBody.java#L52, and it can't find the chunk size because the characters are different from what they would be if I just used the byte stream without modifying the InputStream. I'm trying to figure out whether something changes the initial part of the stream. I wasn't aware of BOM bytes, so I will look into them as well. Thank you.

Kamyar Nadjafloo

Dec 4, 2014, 12:24:13 PM
to streamfly...@googlegroups.com
It was actually an issue on my side. It looks like the library is working great. Thank you.

rw...@gmx.de

Dec 5, 2014, 2:40:43 PM
to streamfly...@googlegroups.com
Hi Kamyar,

I am glad to hear that.

Cheers
Rod