U.S. corporate ownership RDF data


Josh Tauberer

Apr 19, 2008, 8:20:45 AM
to get-t...@googlegroups.com
(I just posted this on the Linking Open Data semantic web community mailing
list and tried to cross-post it here, but I sent it from an address
that isn't subscribed to this list, so I'm sending it here separately.
Also, btw, I've updated the source code and data dump (new URL; gzipped,
it's actually quite small) since the previous emails.)

In response to a thread on Aaron Swartz's get-theinfo list, I
resurrected my RDF data on U.S. corporate ownership, derived from
records publicly filed in the U.S. Securities and Exchange Commission's
EDGAR database.

It's 1 million triples, HTTP and SPARQL-accessible. More here (including
source code, data dump, and examples):
http://rdfabout.com/demo/sec/
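Since the data is SPARQL-accessible, here is a minimal Python sketch of how a client might ask for every property of one resource. It follows the SPARQL protocol convention of sending the query in a `query` parameter of an HTTP GET. The endpoint URL is a hypothetical placeholder (the real one is linked from the demo page, not named here), and the function names are illustrative only.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real SPARQL endpoint is linked from
# http://rdfabout.com/demo/sec/ and isn't named in this post.
ENDPOINT = "http://example.org/sparql"


def properties_query(subject_uri: str) -> str:
    """SPARQL SELECT returning every (predicate, object) pair for one resource."""
    return f"SELECT ?p ?o WHERE {{ <{subject_uri}> ?p ?o }}"


def sparql_get_url(endpoint: str, query: str) -> str:
    """SPARQL protocol over HTTP GET: the query goes in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    news_corp = "http://www.rdfabout.com/rdf/usgov/sec/id/cik0001308161"
    # Fetch this URL with any HTTP client to run the query.
    print(sparql_get_url(ENDPOINT, properties_query(news_corp)))
```

The query uses only variables, so it makes no assumptions about which RDF predicates the dataset actually uses.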

The records establish board membership, officer positions, and
10%-or-more ownership relations. People can enter into any of these
relations with a corporation; in addition, corporations can be 10%
owners of other corporations. A record exists only for a point in time
when the interest (i.e., stock ownership) of an individual or
corporation holding one of these relations with a corporation changes.
It is thus possible (and likely) that individuals who are no longer in
such a relation with a corporation are still listed as such in this data.
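One way to model the three relations just described, as a Python sketch. The class and field names are hypothetical (they are not the dataset's RDF vocabulary), and the rule that a corporation only ever appears as a 10% owner is my reading of the paragraph above. Because a record appears only when holdings change, the helper reports when a relation was last reported, not whether it still holds.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Role(Enum):
    DIRECTOR = "director"            # board membership
    OFFICER = "officer"              # officer position
    TEN_PERCENT_OWNER = "10% owner"  # owns 10% or more of the stock


@dataclass(frozen=True)
class Relation:
    holder: str           # hypothetical ID (e.g. CIK) of the person or corporation
    holder_is_corp: bool
    issuer: str           # the corporation the relation is with
    role: Role
    as_of: date           # date of the filing that reported the relation

    def __post_init__(self):
        # Reading of the post: people may hold any role, but a corporation
        # shows up only as a 10%-or-more owner of another corporation.
        if self.holder_is_corp and self.role is not Role.TEN_PERCENT_OWNER:
            raise ValueError("a corporation can only be a 10% owner")


def last_reported(relations, holder, issuer):
    """Date of the latest filing linking holder to issuer, or None.

    Filings appear only when holdings change, so this is when the relation
    was last reported, not evidence that it still holds today.
    """
    dates = [r.as_of for r in relations if r.holder == holder and r.issuer == issuer]
    return max(dates, default=None)
```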

Here are some starting points:

News Corp (owner of FOX, WSJ, and other media things):
http://www.rdfabout.com/rdf/usgov/sec/id/cik0001308161

Rupert Murdoch (media mogul behind News Corp):
http://www.rdfabout.com/rdf/usgov/sec/id/cik0001024835
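Both example URIs follow the same pattern: a fixed base, "cik", and the SEC CIK number zero-padded to ten digits. Assuming that pattern holds for every entity (it's inferred from the two links above, not documented here), building and parsing these URIs in Python might look like this:

```python
BASE = "http://www.rdfabout.com/rdf/usgov/sec/id/"


def cik_uri(cik: int) -> str:
    """URI for an SEC filer: 'cik' plus the CIK zero-padded to ten digits."""
    return f"{BASE}cik{cik:010d}"


def cik_from_uri(uri: str) -> int:
    """Inverse of cik_uri; rejects URIs outside the assumed pattern."""
    prefix = BASE + "cik"
    if not uri.startswith(prefix):
        raise ValueError(f"not a CIK URI: {uri}")
    return int(uri[len(prefix):])
```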

There are no links to other data sets.

--
- Josh Tauberer
- GovTrack.us

http://razor.occams.info

"Yields falsehood when preceded by its quotation! Yields
falsehood when preceded by its quotation!" Achilles to
Tortoise (in "Gödel, Escher, Bach" by Douglas Hofstadter)

croc...@corpwatch.org

May 1, 2008, 2:06:55 PM
to get-t...@googlegroups.com
Hi,

I posted this:

http://www.netsquared.org/blog/ian-elwood/screen-scraping-tools-secs-edgar-database

to try to solicit help from people involved in the NetSquared Mashup
Challenge. Does it look like an accurate description of a project to
build a visualization on top of Josh's dataset, and do you think it is
possible?

--ian--

Aaron Swartz

May 2, 2008, 11:58:11 AM
to get-t...@googlegroups.com
Having tried to look at large graphs with these tools, I expect it's
probably too big to just import directly into GraphViz or Social
Action, although the transmogrification wouldn't be hard. You might
ask if the Social Action people think their software could hold a
network of that size; if they do, converting it into the right format
is pretty trivial.
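To give a sense of how small that transmogrification is, here is a Python sketch that renders (source, relation, target) edges as a GraphViz DOT digraph. The edge tuples and the graph name are hypothetical; a real run would first extract them from the RDF dump. Names are double-quoted with embedded quotes escaped, so company names containing spaces still produce valid DOT.

```python
def dot_quote(s: str) -> str:
    # DOT IDs in double quotes may contain spaces; escape embedded quotes.
    return '"' + s.replace('"', '\\"') + '"'


def to_dot(edges, name="ownership"):
    """Render (source, relation, target) edges as a GraphViz digraph."""
    lines = [f"digraph {name} {{"]
    for src, rel, dst in edges:
        lines.append(f"  {dot_quote(src)} -> {dot_quote(dst)} [label={dot_quote(rel)}];")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample edges, not taken from the dataset.
    sample = [("Person A", "director", "Corp B"), ("Corp C", "10% owner", "Corp B")]
    print(to_dot(sample))  # pipe into `dot -Tsvg` to draw it
```

Writing the file is the easy part; as noted above, the real question is whether the layout tool copes with a graph of this size.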

croc...@corpwatch.org

May 2, 2008, 6:15:16 PM
to get-t...@googlegroups.com
Ah, I see. So it might not be the right solution for this project. I
want people to be able to select a company name and see all of the
"next connections" from there. Basically, the idea is to follow the
chain "up" to find the parent company, which is usually culpable for
what its subsidiaries do.
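Following the chain "up" is a graph traversal over the corporate 10%-ownership edges. A minimal Python sketch, assuming those records have already been extracted as hypothetical (owner, owned) pairs of company identifiers: it walks from a company to every corporate owner it can reach and returns the ones with no recorded corporate owner of their own. Since the data only records 10%-or-more stakes, a company returned here is an owner, not necessarily a controlling parent.

```python
from collections import defaultdict


def build_owner_index(ownership):
    """Map each company to the set of corporations reported as its 10%+ owners."""
    owners = defaultdict(set)
    for owner, owned in ownership:
        owners[owned].add(owner)
    return owners


def ultimate_parents(company, owners):
    """Walk 'up' the ownership edges; return reachable owners with no owner above them."""
    tops, seen, stack = set(), {company}, [company]
    while stack:
        node = stack.pop()
        ups = owners.get(node, set())  # .get avoids growing the defaultdict
        if not ups and node != company:
            tops.add(node)
        for up in ups:
            if up not in seen:  # `seen` guards against ownership cycles
                seen.add(up)
                stack.append(up)
    return tops
```

The one-step "next connections" for a company are simply `owners[company]`; the traversal repeats that step until it runs out of owners.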

Any other suggestions for web applications that might be able to do this
with the dataset from the SEC?

--ian--

Philip (flip) Kromer

May 6, 2008, 12:51:04 AM
to get-t...@googlegroups.com
Freebase has apparently been working with the SEC data as well; this
is from their recent newsletter:
If you want to see some cool applications for Freebase data, Toby Segaran,
the guy behind the Walmart Growth Video
(http://blog.kiwitobes.com/?p=51), has created a fascinating map of the
interrelationships between some of America's largest corporations
(http://blog.freebase.com/2008/04/03/company-data-from-the-sec/).
All this comes out of the public domain SEC data we've been loading into
Freebase over the last month or so.

I don't know whether you want something graphical or textual, but
Freebase might be just the thing for this kind of exploration, and it
sounds like they're already playing with that data.

Also, I was going to point you to a tool called LGL (Large Graph
Layout) that is quite rough but handles *really* large graphs (as you
might guess). As I said, it's pretty rough: it's scientific software
for visualizing massive protein network structures.
http://apropos.icmb.utexas.edu/lgl/
http://sourceforge.net/projects/lgl
Then I reread your question and realized that this is exactly what you
don't want, sorry. But! While we are on the subject of graph
exploration tools, what else is out there for visualizing/exploring
sparse random network graphs at the n>10million nodes / ~10-1000
typical edges per node? I suppose I'd better repost to
view.theinfo.org.
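If you do want to try LGL on this data, my understanding is that it reads a plain edge list (its ".ncol" format: one whitespace-separated "vertex1 vertex2" pair per line, optionally followed by a weight). A minimal Python sketch of writing such a file, assuming the edges are (a, b) pairs of identifiers without spaces (CIK numbers, say) and that the graph is treated as undirected, so reversed duplicates and self-loops are dropped:

```python
def to_ncol(edges):
    """Edge list with one 'a b' line per undirected edge (LGL's .ncol style)."""
    seen = set()
    lines = []
    for a, b in edges:
        if any(ch.isspace() for ch in a + b):
            raise ValueError(f"vertex names can't contain whitespace: {a!r}, {b!r}")
        if a == b:
            continue  # drop self-loops
        key = (a, b) if a <= b else (b, a)  # treat (a, b) and (b, a) as one edge
        if key not in seen:
            seen.add(key)
            lines.append(f"{a} {b}")
    return "\n".join(lines) + "\n"
```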

flip

--
http://www.infochimps.org
Connected Open Free Data

Joseph Turian

May 6, 2008, 1:18:52 AM
to get-t...@googlegroups.com
> While we are on the subject of graph
> exploration tools, what else is out there for visualizing/exploring
> sparse random network graphs at the n>10million nodes / ~10-1000
> typical edges per node? I suppose I'd better repost to
> view.theinfo.org.

I am curious what other people have to say.

http://www.humanbraincloud.com/ has a great visualization, although it
is a few orders of magnitude smaller than your task.

Joseph

--
Academic: http://www-etud.iro.umontreal.ca/~turian/
Business: http://www.metaoptimize.com/

Philip (flip) Kromer

May 6, 2008, 2:33:27 AM
to get-t...@googlegroups.com
That human brain cloud is beautiful. I've emailed them to see if
they'll share the dataset.

The Prefuse Gallery has a lot of beautiful-looking graph
visualizations that appear to scale to ~1000s
http://prefuse.org/gallery/
but the great majority of them hate my browser. Or me. Or my version of Java.

The Visual Thesaurus (http://www.visualthesaurus.com/) is based on
Thinkmap (http://www.thinkmap.com/faq.jsp), a Java-based viz tool that
looks pretty expensive. It's snappy but doesn't seem to tower above
the others, and since they ask you to pay (subscription!) just to use
their demo, I don't think they're interested in our business.

There's also a Flex component for graph visualization
(http://flexed.wordpress.com/2006/11/20/mark-shepard-a-flex-component-for-graph-visualization/),
which, I can say from experience, will start to drag ass at, say,
n > 100 nodes+edges on-screen. It's simple, but its code is as clean
as it comes.

All of these take the "explore local clusters" approach: out of your
arbitrary-sized network graph, you can wander through a neighborhood of
tens to hundreds of nodes, and all of them seem to use the
weights-on-springs-molecular-dynamics approach. Are there
browser-based tools that give you multiscale visualization (use
clustering to find a reduced graph of cliques, then let you drill down
or wander)? And are there ones that let you pivot among different
visualizations (cluster matrix, ring graph, ...)?
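The "explore local clusters" idea boils down to cutting a bounded neighborhood out of the big graph before handing it to a layout engine. A minimal Python sketch of that selection step only: a breadth-first search with a hop limit and a node cap. The adjacency-dict input and the default limits are assumptions, and the spring layout itself is left to whichever tool draws the result.

```python
from collections import deque


def neighborhood(adj, start, max_nodes=100, max_hops=2):
    """Breadth-first 'local cluster' around start, capped at max_nodes nodes."""
    dist = {start: 0}
    queue = deque([start])
    while queue and len(dist) < max_nodes:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # don't expand past the hop limit
        for nbr in sorted(adj.get(node, ())):  # sorted for deterministic output
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
                if len(dist) >= max_nodes:
                    break
    nodes = set(dist)
    # Keep only edges whose endpoints both made it into the neighborhood.
    edges = {(a, b) for a in nodes for b in adj.get(a, ()) if b in nodes}
    return nodes, edges
```

Wandering then just means calling it again with a newly clicked node as `start`.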

Summarizing my post to view-theinfo: for standalone, interactive apps,
the best one seems to be Cytoscape (http://www.mkbergman.com/?p=415);
Circos (http://mkweb.bcgsc.ca/circos/) is worth mentioning too, though I
can't find any information on how it scales. LGL
(http://apropos.icmb.utexas.edu/lgl) is one of the better tools out of a
rough lot for huge (~10^7 nodes+edges) graphs on a supercomputer or big
cluster. There's a great review of viz tools at
http://www.mkbergman.com/?p=414 . I've entered a lot of this into the
page at
http://theinfo.org/view/tools
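For Cytoscape specifically, one low-effort way in is its Simple Interaction Format (SIF), with one "source interaction target" line per edge. A Python sketch, assuming (source, relation, target) triples with hypothetical names; it uses tabs as the delimiter, which, as I understand Cytoscape's SIF reader, lets node names contain spaces.

```python
def to_sif(edges):
    """Cytoscape SIF: one 'source<TAB>interaction<TAB>target' line per edge."""
    lines = []
    for src, rel, dst in edges:
        for field in (src, rel, dst):
            # Tabs and newlines are the delimiters, so they can't appear in names.
            if "\t" in field or "\n" in field:
                raise ValueError(f"tab or newline in field: {field!r}")
        lines.append("\t".join((src, rel, dst)))
    return "\n".join(lines) + "\n"
```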

flip

