Extracting basic info from CrunchBase

TJS

Sep 11, 2009, 12:41:11 AM
to crunchbase-api
Hi,

I'm a linguist at Stanford interested in trends in company names. For
that reason, I'd love to be able to extract the CrunchBase data for
particular zip codes at particular times (what were the trends in
names in Mountain View pre-dot-com-bubble-bursting vs. post, for
example). The easiest output would be something like a spreadsheet.

If anyone has advice on how to do this, I'd appreciate it.

Thanks,

Tyler

Benjamin Nowack

Sep 11, 2009, 7:51:14 AM
to crunchb...@googlegroups.com

Hi Tyler,

You can perhaps use Semantic CrunchBase[1] for this. The data hasn't
been re-crawled for quite some time (I'm rebuilding the system but
don't have funds), but *maybe* that's not too problematic for your
use case.

Semantic CB provides a SPARQL endpoint. SPARQL is a graph query
language for the semantic web which lets you retrieve tabular results
via a query such as

{{{
SELECT DISTINCT ?name ?zip ?city ?founded WHERE {
  ?comp a cb:Company ;
        cb:name ?name ;
        cb:founded_year ?founded ;
        cb:office ?office .
  ?office cb:zip_code ?zip ;
          cb:city ?city .
  FILTER(?founded < 2000)
  FILTER(REGEX(?zip, "^9"))
}
LIMIT 50
}}}

This will return up to 50 rows for companies that were founded before
2000 and have an office in an area whose zip code starts with "9".
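
To compare periods (for example, before vs. after 2000 for one zip
code), only the two FILTER lines need to change. A minimal Python
sketch of that idea follows. It assumes the endpoint predefines the
cb: prefix, as the sample query implies; build_query and its
parameters are hypothetical names, and 94043 is used only as an
example Mountain View zip code.

```python
def build_query(zip_prefix, founded_before=None, founded_from=None, limit=50):
    """Return a SPARQL query string shaped like the sample query.

    zip_prefix     -- leading digits of the zip code to match
    founded_before -- keep companies founded before this year (optional)
    founded_from   -- keep companies founded in or after this year (optional)
    """
    # Digits only, so the value is safe inside the REGEX string literal.
    if not zip_prefix.isdigit():
        raise ValueError("zip_prefix must contain digits only")

    filters = []
    if founded_before is not None:
        filters.append("  FILTER(?founded < %d)" % founded_before)
    if founded_from is not None:
        filters.append("  FILTER(?founded >= %d)" % founded_from)
    filters.append('  FILTER(REGEX(?zip, "^%s"))' % zip_prefix)

    lines = [
        "SELECT DISTINCT ?name ?zip ?city ?founded WHERE {",
        "  ?comp a cb:Company ;",
        "        cb:name ?name ;",
        "        cb:founded_year ?founded ;",
        "        cb:office ?office .",
        "  ?office cb:zip_code ?zip ;",
        "          cb:city ?city .",
    ]
    lines.extend(filters)
    lines.append("}")
    lines.append("LIMIT %d" % limit)
    return "\n".join(lines)


if __name__ == "__main__":
    # Example cutoff year and zip code; adjust both to the question at hand.
    print(build_query("94043", founded_before=2000))
    print()
    print(build_query("94043", founded_from=2000))
```

The two printed queries can be pasted into the endpoint's query form
one after the other to get the "pre" and "post" result sets.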

The endpoint can return XML, JSON, serialized PHP, and a few other
formats. A little trick for getting this into Excel: pick "HTML Table"
as the output format and simply save the result as a ".xls" file. Excel
or OpenOffice.org Calc will then open the file directly as a spreadsheet.
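
If you would rather work from the JSON output, it can be turned into
a CSV file with a few lines of Python. This is only a sketch: it
assumes the endpoint's JSON follows the W3C SPARQL Query Results JSON
layout (head.vars plus results.bindings), which I have not verified
for this endpoint. sparql_json_to_csv is a hypothetical helper name,
and the SAMPLE row is invented purely for illustration.

```python
import csv
import io
import json

# Invented sample in the W3C SPARQL JSON results layout (not real data).
SAMPLE = """
{
  "head": {"vars": ["name", "zip", "city", "founded"]},
  "results": {"bindings": [
    {"name": {"type": "literal", "value": "Example Co"},
     "zip": {"type": "literal", "value": "94043"},
     "city": {"type": "literal", "value": "Mountain View"},
     "founded": {"type": "literal", "value": "1998"}}
  ]}
}
"""


def sparql_json_to_csv(json_text):
    """Convert SPARQL JSON results text into CSV text with a header row."""
    data = json.loads(json_text)
    columns = data["head"]["vars"]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    for binding in data["results"]["bindings"]:
        # A variable may be unbound in a given row; write an empty cell then.
        writer.writerow(
            [binding.get(col, {}).get("value", "") for col in columns]
        )
    return out.getvalue()


if __name__ == "__main__":
    print(sparql_json_to_csv(SAMPLE), end="")
```

Saving the returned text to a ".csv" file gives something Excel or
Calc can open as a spreadsheet.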

You can find the above sample query with inline results at [2], where
you can (hopefully) tweak the patterns without too much SPARQL know-how.

Just in case: If you want to install the system locally, the complete
data can be DUMPed and the RDF/PHP system can be downloaded at
arc.semsol.org.

HTH,
Benji

[1] http://cb.semsol.org/
[2] http://bit.ly/2zFBkh

--
Benjamin Nowack
http://bnode.org/
http://semsol.com/
