Q: Is there a way to speed up the Ruby MetaInspector?


Ronnie Kessler

Nov 17, 2015, 2:22:25 AM
to MetaInspector
I'm finding it's taking on average around 2294.466 ms to process URLs that I'm interested in.

Any recommendations (other than caching) that might speed things up?

If I'm just looking for meta tag information, can I stop image + link traversing?

Was considering just writing my own parser but I really like your gem and the helper methods.

Jaime Iniesta

Nov 17, 2015, 5:21:33 AM
to metain...@googlegroups.com
Hi Ronnie, 

I'll be happy to investigate this.

2015-11-17 8:22 GMT+01:00 Ronnie Kessler <rlke...@gmail.com>:
> I'm finding it's taking on average around 2294.466 ms to process URLs that I'm interested in.

It would be good to have a benchmark that we can take as a reference. Could you post some URLs that we can use?
 
> Any recommendations (other than caching) that might speed things up?


I'd bet most of the time is spent in the request phase, so caching is important here.

Also, if you have already downloaded the page, you can pass its contents to MetaInspector using the :document option.
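
A minimal sketch of the caching idea, assuming an in-memory Hash is enough and that your own code downloads the page. PageCache and its fetcher block are hypothetical names, not part of MetaInspector; only the :document option comes from the gem.

```ruby
# Hypothetical in-memory cache: maps a URL to the HTML fetched for it.
class PageCache
  # The block performs the actual download and returns the HTML as a String.
  def initialize(&fetcher)
    @fetcher = fetcher
    @store = {}
  end

  # Returns the cached HTML, calling the fetcher only the first time a URL is seen.
  def fetch(url)
    @store.fetch(url) { @store[url] = @fetcher.call(url) }
  end
end

# Usage (needs the metainspector gem and network access):
#   require "net/http"
#   require "metainspector"
#   cache = PageCache.new { |url| Net::HTTP.get(URI(url)) } # does not follow redirects
#   html  = cache.fetch("http://example.com")
#   page  = MetaInspector.new("http://example.com", document: html)
```

A cache like this only helps when the same URL shows up more than once; for a feed of mostly unique links the request time remains.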


> If I'm just looking for meta tag information, can I stop image + link traversing?


On initialization, MetaInspector will request the URL and then parse the document to build a tree of its contents with Nokogiri:


Other than that, it does not traverse links or images until you ask it to.

If you're only interested in the meta tags defined in the <head> section, you could try fetching the page on your own, removing the whole <body> section, and passing the rest to MetaInspector using the :document option. That way Nokogiri may have less work to do, since it no longer parses the whole document.
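
A rough sketch of the "remove the <body>" idea, assuming the page has a well-formed closing </head> tag. The head_only helper is a hypothetical name, not something MetaInspector provides, and anything that lives in the body (images, links, stray meta tags) will no longer be visible to the gem.

```ruby
# Hypothetical helper: keeps everything up to and including </head>
# and closes the document, dropping the <body> entirely.
# Returns the input unchanged when no </head> tag is found.
def head_only(html)
  match = html.match(/\A.*?<\/head\s*>/mi)
  return html unless match

  "#{match[0]}</html>"
end

# Usage (needs the metainspector gem; html is a page you fetched yourself):
#   page = MetaInspector.new(url, document: head_only(html))
```

Whether this saves a noticeable amount of time depends on the page size; it is worth measuring before relying on it.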

> Was considering just writing my own parser but I really like your gem and the helper methods.


This is a good option as well, if you need just a subset of what MetaInspector does!

Ronnie Kessler

Nov 17, 2015, 5:41:26 AM
to Jaime Iniesta, metain...@googlegroups.com
Thanks for the quick response Jaime.

I don't request the same docs each time. It's dynamic based on a feed from Twitter. So the kind of links are: 

I realised it could be an issue with my internet speed at home and potentially the specs of my MacBook.

I am going to run some tests in my production environment now and see whether I get similar timings.

I'm not too sure that stripping the body section will save that much, as it seems Nokogiri is quite efficient.

Q: when you call page.images.best (note: download_images: false), is that only calculated on initialization or only when called?


Ronnie Kessler

Nov 17, 2015, 6:29:31 AM
to Jaime Iniesta, metain...@googlegroups.com
Btw, I just ran it in production and it seems a LOT faster, so I may not have to optimise too much in the end.

Jaime Iniesta

Nov 17, 2015, 6:39:58 AM
to Ronnie Kessler, metain...@googlegroups.com
Good to know, Ronnie!

It looks like most of the time is spent following the redirects from Twitter. In fact, when you click one of those links, you can see that it takes a while to reach the final page.

I've done some benchmarks, take a look. Would be great to see your results in production as well:


Here I'm running each report 10 times to get an average. As you can see, it takes almost 5 seconds just to initialize it, because it has to make a request to t.co and then follow the redirects to get to the final page.

On the second line, I'm doing the same but with the resolved URL; in that case it takes just 1 second!

On the third line, I initialize it by passing the document contents (so no request is made); in that case it takes just 0.14 seconds.

On the rest of the lines, I measure how long several parsing operations take once the document has been initialized. You can see that this time is very small compared to what a request takes.

Sure, we could optimize the parsing code, but this is not going to change much as long as we have to request pages and resolve redirects.
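
For anyone wanting to reproduce this kind of comparison, here is a minimal sketch using Ruby's standard Benchmark module. The average_ms helper and the variable names in the usage comments are hypothetical, and this is not the script that produced the numbers above.

```ruby
require "benchmark"

# Runs the given block `runs` times and returns the average
# wall-clock time per run in milliseconds.
def average_ms(runs = 10)
  total = Benchmark.realtime { runs.times { yield } }
  (total / runs) * 1000.0
end

# Usage (needs the metainspector gem and network access; short_url,
# final_url and html are placeholders you supply yourself):
#   puts average_ms { MetaInspector.new(short_url) }                 # request + redirects + parse
#   puts average_ms { MetaInspector.new(final_url) }                 # request + parse
#   puts average_ms { MetaInspector.new(final_url, document: html) } # parse only
```

Comparing the three cases on your own links should show how much of the total comes from redirects, from the request itself, and from parsing.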

Jaime Iniesta :: Ruby on Rails consultant
http://jaimeiniesta.com

Jaime Iniesta

Nov 17, 2015, 6:43:08 AM
to Ronnie Kessler, metain...@googlegroups.com
2015-11-17 11:41 GMT+01:00 Ronnie Kessler <rlke...@gmail.com>:

> Q: when you call page.images.best (note: download_images: false), is that only calculated on initialization or only when called?

That is only calculated when called. 
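
To illustrate what "only calculated when called" means in general, here is a small sketch of lazy evaluation in plain Ruby. LazyPage and its fields are hypothetical and are not MetaInspector's actual code; the memoization with ||= is also an assumption made for the example, since nothing here says whether the gem caches the result.

```ruby
# Illustration only: a made-up class, not MetaInspector's implementation.
class LazyPage
  attr_reader :computations

  # Initialization is cheap: it only stores the data.
  def initialize(image_urls)
    @image_urls = image_urls
    @computations = 0
  end

  # The work happens on the first call; later calls reuse the stored result.
  def best_image
    @best_image ||= begin
      @computations += 1
      @image_urls.first
    end
  end
end
```

With this pattern, creating the object costs nothing extra, and the cost of picking an image is paid only if and when best_image is called.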

Ronnie Kessler

Nov 17, 2015, 7:33:17 PM
to Jaime Iniesta, metain...@googlegroups.com
Looks like the Twitter redirects are the problem, but unfortunately I can't do anything about that.

I will keep an eye on it and may get in touch should I need some more help.

Thanks Jaime.