url parser

Ovidiu

Oct 16, 2014, 8:14:16 AM
to professi...@googlegroups.com
Hello interwebs,

I'm looking for a logical solution for the following task:

I want to match the category part of a URL, for any possible combination, and limit it to at most two pieces.

For example:

http://example.com/cat/subcat/page.html should be converted into an array of at most two values: array('cat', 'subcat')
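A minimal sketch of the basic extraction, assuming the input is a full URL: parse_url() isolates the path, empty pieces are dropped, and the first two segments are kept. The function names pathSegments() and firstTwoSegments() are hypothetical. This naive version does not yet exclude the page: for /cat/page.html it would return array('cat', 'page.html').

```php
<?php
// Split the path of a URL into its non-empty segments.
// parse_url() with PHP_URL_PATH returns only the path ("/cat/subcat/page.html"),
// so the scheme, host and query string never end up in the result.
function pathSegments(string $url): array
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return []; // no path component (e.g. "http://example.com") or a malformed URL
    }
    // array_filter() with 'strlen' drops the empty strings produced by leading,
    // trailing or doubled slashes (but keeps a segment such as "0");
    // array_values() re-indexes the result from 0.
    return array_values(array_filter(explode('/', $path), 'strlen'));
}

// Keep at most the first two segments as category candidates.
function firstTwoSegments(string $url): array
{
    return array_slice(pathSegments($url), 0, 2);
}

var_export(firstTwoSegments('http://example.com/cat/subcat/page.html'));
```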

The problem is that I can't rely on checking for .html or any other .* extension, because not all websites use this kind of URL structure.

My current approach is this: explode by '/'; if (count($pieces) <= 1) continue; array_pop($pieces); $pieces = array_slice($pieces, 0, 2); implode('/', $pieces)
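For reference, here is that pipeline as a runnable function, with two bugs fixed: count() has to wrap $pieces (count($pieces <= 1) passes a boolean to count()), and array_slice() returns a new array rather than modifying $pieces in place, so its result must be assigned. The function name categoryPrefix() and the trim() of surrounding slashes are assumptions added for illustration.

```php
<?php
// The explode / array_pop / array_slice / implode approach, as a function.
// $path is a URL path such as "/cat/subcat/page.html"; surrounding slashes are
// trimmed first (an assumption) so that explode() yields no empty first piece.
function categoryPrefix(string $path): ?string
{
    $pieces = explode('/', trim($path, '/'));
    if (count($pieces) <= 1) {
        return null; // stands in for "continue": no category to extract
    }
    array_pop($pieces);                   // drop the last segment, assumed to be the page
    $pieces = array_slice($pieces, 0, 2); // array_slice() returns a copy, so assign it
    return implode('/', $pieces);
}

var_export(categoryPrefix('/cat/subcat/page.html'));
```

Note that categoryPrefix('/cat/subcat') still returns 'cat': array_pop() has no way to tell a page from a subcategory.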

The issue is that for pages such as /cat/subcat the result is only 'cat', because array_pop() removes 'subcat' as if it were the page; likewise, /cat is reduced to /.

To make sense of my task: I want to match all possible URLs of a website by their most common prefixes, limiting the match to at most two prefix categories (and excluding page.html or any other kind of page that is not a subcategory). Simple, right?
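One way to get at "the most common prefixes" without relying on extensions, sketched here as an assumption rather than a tested recipe, is to look at the whole list of a site's URLs: a prefix of up to two segments counts as a category if at least two URLs continue past it. A bare category URL such as /cat/subcat is then recognized because other URLs live under it. The function names siteSegments(), countPrefixes() and categoryOf() and the threshold of two URLs are hypothetical.

```php
<?php
// Non-empty path segments of a URL.
function siteSegments(string $url): array
{
    $path = parse_url($url, PHP_URL_PATH);
    return is_string($path) ? array_values(array_filter(explode('/', $path), 'strlen')) : [];
}

// Count, for every prefix of at most $maxDepth segments, how many URLs have it
// as a *proper* prefix (something still follows it). The last segment of a URL
// is therefore never counted for that URL, whether or not it has an extension.
function countPrefixes(array $urls, int $maxDepth = 2): array
{
    $counts = [];
    foreach ($urls as $url) {
        $parts = siteSegments($url);
        $limit = min($maxDepth, count($parts) - 1); // stop one segment before the end
        for ($depth = 1; $depth <= $limit; $depth++) {
            $prefix = implode('/', array_slice($parts, 0, $depth));
            $counts[$prefix] = ($counts[$prefix] ?? 0) + 1;
        }
    }
    return $counts;
}

// Longest prefix (up to $maxDepth segments) shared by at least $minUrls URLs.
// The full path is allowed here, so /cat/subcat can match itself.
function categoryOf(string $url, array $counts, int $maxDepth = 2, int $minUrls = 2): array
{
    $parts = siteSegments($url);
    for ($depth = min($maxDepth, count($parts)); $depth >= 1; $depth--) {
        $candidate = array_slice($parts, 0, $depth);
        if (($counts[implode('/', $candidate)] ?? 0) >= $minUrls) {
            return $candidate;
        }
    }
    return []; // no shared prefix: no category
}

$urls = [
    'http://example.com/cat/subcat/page.html',
    'http://example.com/cat/subcat/other.html',
    'http://example.com/cat/subcat',
    'http://example.com/cat/intro',
];
$counts = countPrefixes($urls);
var_export(categoryOf('http://example.com/cat/subcat', $counts));
```

The trade-off is that it needs a reasonably complete list of the site's URLs; a category with only one page under it will not reach the threshold.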

My question to all of you, if you would be so kind as to help, is: "What suggestions do you have for me regarding this concept?"

I thank the PHP community,

Cheers!

Robert Gonzalez

Oct 28, 2014, 8:27:29 PM
to professional-php
I know this is late, but if this is still an issue for you, can you reply with some common formats of the URLs you will be checking and what you expect from each? I've done this sort of thing countless times, but each time was a little different, depending on what was being requested.

--
You received this message because you are subscribed to the Google Groups "Professional PHP Developers" group.

Ovidiu Alexa

Oct 29, 2014, 4:40:10 AM
to professi...@googlegroups.com
Hi,

I've got this working. I used array_pop() and everything is just great. I was just curious about your opinions, that's all.

Cheers!

vishal bhandare

Oct 31, 2014, 1:03:24 PM
to professi...@googlegroups.com

Hi,
preg_split() will give you the correct result if you are looking for the strings between the slashes, /(.*)/. Have you tried it?
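Read as "the strings between the slashes" (an interpretation of the suggestion above), a preg_split() call can do the splitting; PREG_SPLIT_NO_EMPTY discards the empty pieces that leading, trailing or doubled slashes would otherwise produce. The function name splitPath() is hypothetical.

```php
<?php
// Split a URL path on runs of one or more slashes.
// With PREG_SPLIT_NO_EMPTY, "/cat/subcat/page.html" gives no empty first piece.
function splitPath(string $path): array
{
    $parts = preg_split('#/+#', $path, -1, PREG_SPLIT_NO_EMPTY);
    return $parts === false ? [] : $parts; // false only if the regex itself fails
}

var_export(splitPath('/cat/subcat/page.html'));
```

This yields the same segments as explode() followed by filtering out empty strings; the page segment still has to be dropped separately.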

Ovidiu Alexa

Oct 31, 2014, 1:53:17 PM
to professi...@googlegroups.com
Hi,

How about a preg_match() on a web URL that MIGHT contain categories either in the form /cat/subcat/page.html OR cat-subcat-page.aspx?
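A minimal preg_match() sketch that accepts both forms, assuming the page always ends in one of the extensions mentioned in this thread (.html, .php, .aspx). For the dash form it simply takes the first two dash-separated words as categories, which is a guess: with '-' used everywhere, a regex cannot tell category words from page words. The function name matchCategories() is hypothetical.

```php
<?php
// Extract up to two categories from a URL whose path is either
// /cat[/subcat][/more...]/page.ext or /cat-subcat-page.ext.
function matchCategories(string $url): array
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return [];
    }
    // Slash form: first segment, optional second segment, any further
    // directories, then a page name with a known extension.
    if (preg_match('#^/([^/]+)/(?:([^/]+)/)?(?:[^/]+/)*[^/]+\.(?:html|php|aspx)$#i', $path, $m)) {
        // An unmatched optional group may be missing from $m, hence the ?? ''.
        return array_values(array_filter([$m[1], $m[2] ?? ''], 'strlen'));
    }
    // Dash form: assume the first two dash-separated words are the categories.
    if (preg_match('#^/([^/-]+)-([^/-]+)-[^/]+\.(?:html|php|aspx)$#i', $path, $m)) {
        return [$m[1], $m[2]];
    }
    return []; // neither form recognized
}

var_export(matchCategories('http://example.com/cat-subcat-page.aspx'));
```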

My first 2 cents:

1. Get the extension (.html, .php, .aspx).
2. Get rid of the page that carries the extension (page.html).
3. If the separator character is the same everywhere (for SEO purposes), such as '-' throughout the WHOLE URL, then bummer. ELSE, if there is a dedicated character for the cat/subcat delimiter, the match should be simple to program with a regex.
4. Finally, check the count of the categories found and decide: is this a valid category? If so, return the text; otherwise, do the monkey :D
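The four steps can be sketched roughly as follows, assuming '/' is the only cat/subcat delimiter and that a page is a final segment with an extension; extension-less pages will be mistaken for categories. The function name extractCategories() and the null return for "no category found" are hypothetical choices.

```php
<?php
function extractCategories(string $url, int $max = 2): ?string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    $segments = array_values(array_filter(explode('/', $path), 'strlen'));

    // Steps 1 and 2: if the last segment has an extension, it is the page; drop it.
    if ($segments !== [] && pathinfo(end($segments), PATHINFO_EXTENSION) !== '') {
        array_pop($segments);
    }

    // Step 3: only '/' counts as a category delimiter here. A URL such as
    // /cat-subcat-page.aspx leaves no segments, since '-' may just as well
    // separate words of the page name ("bummer").
    // Step 4: no categories found -> null (the caller decides what to do);
    // otherwise keep at most $max of them and return them as text.
    if (count($segments) === 0) {
        return null;
    }
    return implode('/', array_slice($segments, 0, $max));
}

var_export(extractCategories('http://example.com/cat/subcat/page.html'));
```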

Good night everybody!

Is this a difficult regex to write?

Thanks for any help.