How to find the 'best' match

Robert Labrie

Jun 4, 2014, 9:07:28 AM
to brow...@googlegroups.com
I'm writing my own parser for browscap.csv, because I want to write matched lines out to a cache that will be checked first on subsequent lookups. The goal is to speed up the browscap parsing.

The part I'm not understanding is how to find the "best" match. My UA string matches 32bit and 64bit Windows (WOW64).

Existing Python and PHP projects do special processing of the file, and rebuilding the original data line turned into a gigantic hassle, which is why I'm re-writing.

Any suggestions?

Thanks

James Titcumb

Jun 4, 2014, 12:24:58 PM
to browscap on behalf of Robert Labrie
Hi Robert,

I can't remember exactly how the CSV is generated, but the INI file is certainly generated in a specific order, with the most specific regex first. The file should then be processed in order from top to bottom: if a regex fails to match, fall down to the next one. That eventually gets down to the "fallback" regexes (like Chrome, generic), and finally to "DefaultBrowser", which just matches .*
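A minimal sketch of the ordered, first-match lookup described above. It assumes the patterns are browscap-style wildcards (`*` for any run of characters, `?` for a single character) rather than full regexes. The pattern list and the helper names `pattern_to_regex` and `first_match` are hypothetical and not part of any browscap library.

```python
import re

def pattern_to_regex(pattern):
    """Compile a wildcard pattern into an anchored, case-insensitive regex."""
    escaped = re.escape(pattern)
    # re.escape turns '*' into '\*' and '?' into '\?'; map them back to wildcards.
    escaped = escaped.replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile("^" + escaped + "$", re.IGNORECASE)

def first_match(user_agent, ordered_patterns):
    """Return the first pattern that matches, scanning top to bottom."""
    for pattern in ordered_patterns:
        if pattern_to_regex(pattern).match(user_agent):
            return pattern
    return None

# Hypothetical patterns, ordered from most specific to the catch-all.
patterns = [
    "Mozilla/5.0 (*Windows NT 6.1*WOW64*) *Chrome/35.*",
    "Mozilla/5.0 (*) *Chrome/*",
    "*",
]
ua = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36")
print(first_match(ua, patterns))
```

This only gives the right answer when the input really is ordered from most to least specific; with an unordered file the first hit may be a generic pattern.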

Hope this helps!  & good luck, I'd be interested to see what you come up with!

Thanks
James



--
You received this message because you are subscribed to the Google Groups "browscap" group.
To unsubscribe from this group and stop receiving emails from it, send an email to browscap+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Robert Labrie

Jun 7, 2014, 11:00:54 PM
to brow...@googlegroups.com

Hi James,

Well, I got my Splunk app working and submitted it to them for acceptance. This is what I came up with:
https://github.com/tnwinc/TA-browscap_express

The comments in browscap_lookup.py explain what I was doing pretty well. My big unknown is handling generics. I want to cache "Generic Java Crawler" but not "Chrome Generic". If "Chrome Generic" ever becomes "Generic Chrome" I'm in trouble.

The CSV file is not in order of detail. You have to use the longest matched string to get the winner. Still, this app is super fast compared to the others, and the browscap data is the best there is.

Cheers,
Rob

James Titcumb

Jun 8, 2014, 4:20:13 AM
to browscap on behalf of Robert Labrie
Hi Rob,

Thanks for this - I've added this to our list of tools that work with browscap :) (see: https://github.com/browscap/browscap/wiki/Using-Browscap)

Just to point out: the "parent" stuff is about inheritance. I think it's pretty much redundant for the CSV, as all fields are displayed regardless of inheritance, but in the INI file fields are omitted from the "children" if the "parent" has the same configured field. E.g. Chrome 35 is a child of Chrome Generic, which is a child of Default Properties. It's to reduce duplication.
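To illustrate the inheritance described above, here is a minimal sketch that rebuilds a section's full set of fields by walking its `Parent` chain. The section names follow the example in this message; the field values and the helper name `resolve` are made up for illustration and are not taken from the real browscap data.

```python
def resolve(section_name, sections):
    """Merge a section with its ancestors; child values override parent values."""
    chain = []
    seen = set()
    name = section_name
    while name is not None:
        if name in seen:
            raise ValueError("circular Parent reference at " + name)
        seen.add(name)
        section = sections[name]
        chain.append(section)
        name = section.get("Parent")
    merged = {}
    # Apply the root ancestor first so that each child overrides it.
    for section in reversed(chain):
        merged.update(section)
    merged.pop("Parent", None)
    return merged

# Illustrative values only: each child lists just the fields that differ.
sections = {
    "Default Properties": {"Browser": "Default", "Version": "0.0", "Frames": "false"},
    "Chrome Generic": {"Parent": "Default Properties", "Browser": "Chrome", "Frames": "true"},
    "Chrome 35": {"Parent": "Chrome Generic", "Version": "35.0"},
}
print(resolve("Chrome 35", sections))
```

A CSV row would correspond to the already merged result, which is why the parent column adds little there.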

Thanks
James


Ben Simo

Jul 9, 2014, 4:17:44 PM
to brow...@googlegroups.com
Hi James,

I'm encountering issues searching for things in the CSV file from top to bottom, specifically matching iPad browsers. The file contains the non-tablet iPhone versions of the PropertyNames before the iPad ones, so the match lands on the non-tablet one. It seems to me that the file is therefore ordered incorrectly.

For example,

Mozilla/5.0 (iPad; CPU OS 5_1_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3

will match to

Mozilla/5.0 (*CPU*OS*like Mac OS X*)*AppleWebKit/*(*KHTML, like Gecko*)*Version/*Mobile/*Safari/*

instead of something more specific like

Mozilla/5.0 (iPad*CPU OS 5_1* like Mac OS X*)*AppleWebKit/*(*KHTML, like Gecko*)*Version/*Mobile/*Safari/*


which occurs hundreds of lines beyond the first match.
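A minimal sketch of the "longest matching pattern wins" rule mentioned earlier in the thread, applied to the user agent and the two patterns quoted above. It assumes `*` and `?` are the only wildcards; the helper names `pattern_to_regex` and `best_match` are hypothetical.

```python
import re

def pattern_to_regex(pattern):
    """Compile a wildcard pattern into an anchored, case-insensitive regex."""
    escaped = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile("^" + escaped + "$", re.IGNORECASE)

def best_match(user_agent, patterns):
    """Return the longest pattern that matches, or None if nothing matches."""
    matches = [p for p in patterns if pattern_to_regex(p).match(user_agent)]
    return max(matches, key=len, default=None)

generic = ("Mozilla/5.0 (*CPU*OS*like Mac OS X*)*AppleWebKit/*"
           "(*KHTML, like Gecko*)*Version/*Mobile/*Safari/*")
ipad = ("Mozilla/5.0 (iPad*CPU OS 5_1* like Mac OS X*)*AppleWebKit/*"
        "(*KHTML, like Gecko*)*Version/*Mobile/*Safari/*")
ua = ("Mozilla/5.0 (iPad; CPU OS 5_1_1 like Mac OS X) AppleWebKit/534.46 "
      "(KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3")

# File order puts the generic pattern first, but both patterns match.
print(best_match(ua, [generic, ipad]))
```

Both patterns match this user agent; choosing by length rather than by file position selects the iPad pattern.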



Ben

Ben Simo

Jul 9, 2014, 4:19:39 PM
to brow...@googlegroups.com
Doh! Now I see Robert's comment about having to use the longest matched string. I'll try that.

Ben

Ben Simo

Jul 10, 2014, 2:37:39 PM
to browscap on behalf of Ben Simo

My "good enough for now" quickie solution was to sort the file by line length in descending order.
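A rough sketch of that approach: sort once, longest first, then stop at the first hit. As an assumption, it sorts on the length of the pattern rather than the whole CSV line, and the helper names `compile_pattern`, `build_index`, and `lookup` are hypothetical.

```python
import re

def compile_pattern(pattern):
    """Compile a wildcard pattern into an anchored, case-insensitive regex."""
    escaped = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile("^" + escaped + "$", re.IGNORECASE)

def build_index(patterns):
    """Sort patterns longest first and compile each one once."""
    ordered = sorted(patterns, key=len, reverse=True)
    return [(p, compile_pattern(p)) for p in ordered]

def lookup(user_agent, index):
    """Return the first hit; after the sort it is also the longest match."""
    for pattern, regex in index:
        if regex.match(user_agent):
            return pattern
    return None

# Patterns in their original (unsorted) file order.
file_order = [
    "Mozilla/5.0 (*CPU*OS*like Mac OS X*)*AppleWebKit/*(*KHTML, like Gecko*)*Version/*Mobile/*Safari/*",
    "Mozilla/5.0 (iPad*CPU OS 5_1* like Mac OS X*)*AppleWebKit/*(*KHTML, like Gecko*)*Version/*Mobile/*Safari/*",
    "*",
]
index = build_index(file_order)
ua = ("Mozilla/5.0 (iPad; CPU OS 5_1_1 like Mac OS X) AppleWebKit/534.46 "
      "(KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3")
print(lookup(ua, index))
```

Sorting up front means each lookup can return early instead of testing every pattern and comparing lengths afterwards.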

Ben


James Titcumb

Jul 10, 2014, 2:55:06 PM
to browscap

I believe that is how the browscap/browscap-php library works anyway :)

Thanks
James
