[REQUEST] Data sets of blog posts

Adam Dossa

Nov 18, 2012, 3:18:02 PM
to get-t...@googlegroups.com
Hi All,

I am looking for data sets containing blog posts (I don't mind if they are annotated or not) for a University project.

So far I have found:
http://www.icwsm.org/2009/data/ (which requires a form to be completed etc.)
and

Does anyone have any pointers to other data sets of this type? I would rather they were based on general news-type topics than on other, more specific topics.

Thanks very much,
Adam

Drew

Nov 18, 2012, 4:26:33 PM
to get-t...@googlegroups.com
Hi Adam -

It's not exactly what you're looking for, but I recently uploaded an archive of pull quotes (anything the writer cited as a quote) from 19 different blogs. There are about 15k quotes, with metadata about the article each one is from.


Might be of help.

Best,
Drew

Miles Thompson

Nov 18, 2012, 5:05:37 PM
to get-t...@googlegroups.com


On Mon, Nov 19, 2012 at 11:04 AM, Miles Thompson <mi...@hashmapd.com> wrote:

Hi Adam,

Have you looked at Common Crawl yet? It's not limited to blog posts as far as I know, but there may be jobs on there that are doing just that?


 
For instance, if you limited yourself only to content with 'parsed_as' set to 'rss', that would be just blog posts, pretty much?
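
A rough sketch of that kind of filter, in case it helps. It assumes the metadata has already been dumped as one JSON object per line with a 'parsed_as' field; that layout, the sample URLs, and the helper name rss_records are all assumptions for illustration, not something taken from the Common Crawl docs, so check the actual format before relying on it.

```python
import json
from typing import Iterable, Iterator


def rss_records(lines: Iterable[str],
                field: str = "parsed_as",
                value: str = "rss") -> Iterator[dict]:
    """Yield metadata records whose `field` equals `value`.

    Assumes one JSON object per line (a hypothetical layout).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if isinstance(record, dict) and record.get(field) == value:
            yield record


# Made-up sample records, only to show the filter in use.
sample = [
    '{"url": "http://example.com/feed", "parsed_as": "rss"}',
    '{"url": "http://example.com/index.html", "parsed_as": "html"}',
    'not json',
]
matches = list(rss_records(sample))
```

Because it is a generator, the same function can be pointed at an open file handle instead of a list, so a large dump does not need to fit in memory.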