Crawling youtube.com


Juls

Nov 12, 2009, 4:13:58 PM
to Google Search Appliance/Google Mini
Has anyone played around with crawling their company's content that
lies within social media channels?

Initially I'm thinking about how to get our YouTube channel videos
indexed, but this could be taken further. So far I can get our
YouTube channel homepage indexed, but that's as far as I get after
playing around with the regex and avoiding the robots.txt disallow
file that's in place.

Any ideas?

Adam Burr

Nov 15, 2009, 10:32:30 AM
to google-search-...@googlegroups.com
Hi Juls,

I have done this... But I did it using a feed. I wrote a small app that
used the YouTube REST API to query the website periodically. Once it has
the data from YouTube the app feeds the URLs along with some descriptive
text and metadata to the GSA. If you want more details then let me know.

Regards,

Adam
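
A minimal sketch of the kind of feed Adam describes, not his actual app. It assumes the GSA XML feed format (a `gsafeed` document with a `header` and a `group` of `record` elements) and builds a content feed so the appliance does not have to fetch the YouTube pages itself. The function name `build_feed`, the datasource name, the video IDs, and the metadata keys are all hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def build_feed(datasource, videos):
    """Build a GSA content feed (as a string) from a list of video dicts.

    Each video dict is assumed to have "url", "content" (descriptive text)
    and "meta" (a dict of metadata name/value pairs).
    """
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    # "incremental" is a content feed type: the text is supplied in the feed.
    ET.SubElement(header, "feedtype").text = "incremental"

    group = ET.SubElement(root, "group")
    for video in videos:
        record = ET.SubElement(
            group, "record", url=video["url"], mimetype="text/plain"
        )
        metadata = ET.SubElement(record, "metadata")
        for name, value in video["meta"].items():
            ET.SubElement(metadata, "meta", name=name, content=value)
        ET.SubElement(record, "content").text = video["content"]

    body = ET.tostring(root, encoding="unicode")
    # Prolog as recalled from the GSA feeds documentation (assumption).
    prolog = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">\n'
    )
    return prolog + body


if __name__ == "__main__":
    example = [
        {
            "url": "https://www.youtube.com/watch?v=VIDEO_ID_1",
            "content": "Short description of the first video.",
            "meta": {"title": "First video", "channel": "Example channel"},
        }
    ]
    print(build_feed("youtube_channel", example))
```

The resulting XML would then be submitted to the appliance's feed interface; that step depends on the appliance's host and configuration and is left out here.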
--

You received this message because you are subscribed to the Google Groups
"Google Search Appliance/Google Mini" group.
To post to this group, send email to
google-search-...@googlegroups.com.
To unsubscribe from this group, send email to
google-search-applia...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/google-search-appliance-help?hl=.


Pedro Jacinto

Nov 16, 2009, 10:40:13 AM
to Google Search Appliance/Google Mini
Hi,

I'm also interested in crawling my company's channel on YouTube. Can
you tell me how to bypass the robots.txt?

I understand that the more flexible solution would be the one Adam
suggested but for now I'm just interested in showing something on the
search results...

Thanks in advance,
Pedro Jacinto

brianb

Nov 19, 2009, 9:49:11 PM
to Google Search Appliance/Google Mini
You cannot bypass the robots.txt settings, since robots.txt is the
standard mechanism that protects websites (from people like you. :-) ).
So the best and only way to do this would be to do what Adam did above
and feed those URLs in.

Brian
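
To illustrate why the crawl stops at the channel homepage, here is a small sketch of how a well-behaved crawler evaluates robots.txt rules, using Python's standard `urllib.robotparser`. The rules in `SAMPLE_ROBOTS` are hypothetical and are not YouTube's actual robots.txt; the host `www.example.com` and the user agent name are likewise placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only (not YouTube's real file).
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /watch
Allow: /
"""


def allowed(robots_text, user_agent, url):
    """Return True if robots_text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    agent = "gsa-crawler"  # assumed crawler user agent name
    for url in (
        "https://www.example.com/user/examplechannel",
        "https://www.example.com/watch?v=VIDEO_ID_1",
    ):
        print(url, allowed(SAMPLE_ROBOTS, agent, url))
```

With rules like these, the channel page is fetchable but every individual video page is not, which matches the behaviour described in the first post.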

OC

May 22, 2012, 1:44:11 PM
to Google-Search-...@googlegroups.com, google-search-...@googlegroups.com
Can you tell me how you did this using a feed?
 
Thanks,
Omar

Dave Watts

May 22, 2012, 3:03:09 PM
to google-search-...@googlegroups.com
> Can you tell me how you did this using a feed?

Actually, if the videos are in a single channel, you can just point
your GSA to the channel.

Dave Watts, CTO, Fig Leaf Software
http://www.figleaf.com/
http://training.figleaf.com/

Fig Leaf Software is a Veteran-Owned Small Business (VOSB) on
GSA Schedule, and provides the highest caliber vendor-authorized
instruction at our training centers, online, or onsite.

Kim Negaard

Jun 25, 2013, 11:04:30 AM
to Google-Search-...@googlegroups.com, google-search-...@googlegroups.com
I just came upon this post as we are trying to index our YouTube channel.
I tried pointing the GSA to our channel, but because the videos don't seem to have any unique pattern in their URLs, I am unsure how to stop it from continuing beyond our channel and crawling other videos on YouTube.
Any ideas on what to put for "Follow and Crawl Only URLs with the Following Patterns"?

Thanks,
Kim

Jeremy Garreau

Jun 25, 2013, 11:54:20 AM
to Google-Search-...@googlegroups.com, google-search-...@googlegroups.com
Feed them using the YouTube API instead.

Dave Watts

Jun 25, 2013, 12:42:04 PM
to Google-Search-...@googlegroups.com
> I just came upon this post as we are trying to index our YouTube channel.
> I tried pointing the GSA to our channel, but because the videos don't seem
> to have any unique patterns to their URL I am unsure of how to stop it from
> continuing beyond our channel and crawling other videos on YouTube.
> Any ideas on what to put for "Follow and Crawl Only URLs with the Following
> Patterns"?

You used to be able to crawl a YouTube channel, but I don't think it's
possible to do that any more without capturing a bunch of other links
you don't want. So, Jeremy is right - you need to build a content feed
against the YouTube API.
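
As a rough sketch of the API side of that approach: the code below builds a request URL for listing a channel's videos and turns a response into the URL-plus-metadata records a feed would need. It assumes the YouTube Data API v3 `search` endpoint; the thread does not say which API version the posters used. `CHANNEL_ID`, `API_KEY`, the function names, and the sample response are placeholders, and no network request is made.

```python
from urllib.parse import urlencode

# YouTube Data API v3 search endpoint (assumed API version).
API_URL = "https://www.googleapis.com/youtube/v3/search"


def search_url(channel_id, api_key, page_token=None):
    """Build the request URL for one page of a channel's videos."""
    params = {
        "part": "snippet",
        "channelId": channel_id,
        "type": "video",
        "maxResults": 50,
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return API_URL + "?" + urlencode(params)


def to_records(response):
    """Convert a parsed search response into records for a GSA feed."""
    records = []
    for item in response.get("items", []):
        snippet = item["snippet"]
        records.append(
            {
                "url": "https://www.youtube.com/watch?v=" + item["id"]["videoId"],
                "title": snippet["title"],
                "description": snippet["description"],
            }
        )
    return records


if __name__ == "__main__":
    # Hand-written sample shaped like a search response, for illustration.
    sample = {
        "items": [
            {
                "id": {"kind": "youtube#video", "videoId": "VIDEO_ID_1"},
                "snippet": {"title": "Demo", "description": "A demo video."},
            }
        ]
    }
    print(search_url("CHANNEL_ID", "API_KEY"))
    print(to_records(sample))
```

Because only videos returned for the given channel end up in the feed, this sidesteps the URL-pattern problem: nothing outside the channel is ever handed to the appliance.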

Kim Negaard

Jun 25, 2013, 1:50:01 PM
to Google-Search-...@googlegroups.com
Thanks for the insight Jeremy and Dave.
Kim

Kim Negaard

Aug 28, 2013, 10:36:31 AM
to Google-Search-...@googlegroups.com
Following up on this post: since feeding in the data with the YouTube API was needed, we decided to create a connector and make a product out of it.
If anyone else needs to index YouTube channels, this connector is one option: http://www.fishbowlsolutions.com/Google/Search/youtubeconnector/index.htm
You can contact me if you want more info.