In response to a thread on Aaron Swartz's get-theinfo list, I
resurrected my RDF data for U.S. corporate ownership, derived from
records publicly filed in the U.S. Securities and Exchange Commission's
EDGAR database.
It's 1 million triples, HTTP and SPARQL-accessible. More here (including
source code, data dump, and examples):
http://rdfabout.com/demo/sec/
The records establish board membership, officer positions, and
10%-or-more ownership relations. Note that people can enter into any of
those relations with corporations, but additionally corporations can be
10% owners of other corporations. The records exist at time points when
the interest (i.e. stock ownership) of an individual or corporation that
is in one of the relations above with a corporation changes. It is thus
possible (and likely) that individuals who are no longer in such a
relation with a corporation are still listed as such in this data.
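
To make that caveat concrete, here is a minimal sketch in Python. The record layout (a `Filing` with `filer`, `company`, `relation`, `filed_on`) and the sample names are hypothetical and are not the actual vocabulary of the data; the sketch only illustrates that a relation is listed because a filing was made at some point, and that nothing records when it ended.

```python
from collections import namedtuple
from datetime import date

# Hypothetical record layout; the real data is RDF and its vocabulary may differ.
Filing = namedtuple("Filing", ["filer", "company", "relation", "filed_on"])

def latest_filings(filings):
    """Return the most recent filing date for each (filer, company, relation)."""
    latest = {}
    for f in filings:
        key = (f.filer, f.company, f.relation)
        if key not in latest or f.filed_on > latest[key]:
            latest[key] = f.filed_on
    return latest

def possibly_stale(filings, today, max_age_days=365):
    """List relations whose latest filing is older than max_age_days.

    An old filing does not prove the relation ended; it only means no
    change in interest has been reported since then.
    """
    return sorted(
        key for key, filed_on in latest_filings(filings).items()
        if (today - filed_on).days > max_age_days
    )

# Made-up sample records, for illustration only.
sample = [
    Filing("person-a", "corp-x", "director", date(2005, 3, 1)),
    Filing("person-a", "corp-x", "director", date(2007, 9, 15)),
    Filing("corp-y", "corp-x", "ten-percent-owner", date(2004, 6, 30)),
]
stale = possibly_stale(sample, today=date(2008, 5, 6))
```

The one-year cutoff is an arbitrary choice for the example; any threshold only flags relations worth double-checking against newer filings.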
Here are some starting points:
News Corp (owner of FOX, WSJ, and other media things):
http://www.rdfabout.com/rdf/usgov/sec/id/cik0001308161
Rupert Murdoch (media mogul behind News Corp):
http://www.rdfabout.com/rdf/usgov/sec/id/cik0001024835
There are no links to other data sets.
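
As a rough sketch of how one of these starting points could be explored over SPARQL, the Python below builds a query that lists every predicate and object attached to the News Corp URI, and encodes it as a GET URL using the standard `query` parameter of the SPARQL protocol. The endpoint address is a placeholder (the real one is on the demo page above), and the helper names are made up for this example. The query deliberately uses only `?p ?o` so that it assumes nothing about the vocabulary.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

NEWS_CORP = "http://www.rdfabout.com/rdf/usgov/sec/id/cik0001308161"

# Placeholder endpoint (an assumption): substitute the one from the demo page.
ENDPOINT = "http://example.org/sparql"

def describe_query(resource_uri, limit=25):
    """Build a SPARQL query listing predicate/object pairs of one resource."""
    return "SELECT ?p ?o WHERE { <%s> ?p ?o } LIMIT %d" % (resource_uri, int(limit))

def query_url(endpoint, query):
    """Encode a query as a SPARQL-protocol GET URL (the 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})

def decode_query(url):
    """Recover the query text from a URL built by query_url (round-trip check)."""
    return parse_qs(urlsplit(url).query)["query"][0]

url = query_url(ENDPOINT, describe_query(NEWS_CORP))
```

Fetching `url` with any HTTP client should then return the result set, assuming the endpoint accepts GET requests.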
--
- Josh Tauberer
- GovTrack.us
"Yields falsehood when preceded by its quotation! Yields
falsehood when preceded by its quotation!" Achilles to
Tortoise (in "Godel, Escher, Bach" by Douglas Hofstadter)
I posted this:
http://www.netsquared.org/blog/ian-elwood/screen-scraping-tools-secs-edgar-database
to try to solicit help from people involved in the NetSquared Mashup
Challenge. Does it look like an accurate description of a project to build a
visualization on top of Josh's dataset, and do you think it is possible?
--ian--
Any other suggestions for web applications that might be able to do this
with the dataset from the SEC?
--ian--
I don't know whether you want something graphical or textual, but
Freebase might be just the thing for this kind of exploration, and it
sounds like they're already playing with that data.
Also, I was going to point you to a tool called LGL (Large Graph
Layout) that is quite rough but handles *really* large graphs (as you
might guess.) It's, as I said, pretty rough -- it's scientific
software for visualizing massive protein network structures.
http://apropos.icmb.utexas.edu/lgl/
http://sourceforge.net/projects/lgl
Then I reread your question and realized that this is exactly what you
don't want, sorry. But! While we are on the subject of graph
exploration tools, what else is out there for visualizing/exploring
sparse random network graphs at the n>10million nodes / ~10-1000
typical edges per node? I suppose I'd better repost to
view.theinfo.org.
flip
--
http://www.infochimps.org
Connected Open Free Data
I am curious what other people have to say.
http://www.humanbraincloud.com/ has a great visualization, although it
is a few orders of magnitude smaller than your task.
Joseph
--
Academic: http://www-etud.iro.umontreal.ca/~turian/
Business: http://www.metaoptimize.com/
The Prefuse Gallery has a lot of beautiful-looking graph
visualizations that appear to scale to ~1000s
http://prefuse.org/gallery/
but the great majority of them hate my browser. Or me. Or my version of Java.
The Visual Thesaurus http://www.visualthesaurus.com/ is based on
http://www.thinkmap.com/faq.jsp Thinkmap, a Java-based viz tool that
looks pretty expensive. It's snappy but doesn't seem to tower above
the others -- since they ask you to pay (subscription!) to even use
their demo I don't think they're interested in our business.
There's also
http://flexed.wordpress.com/2006/11/20/mark-shepard-a-flex-component-for-graph-visualization/
which, I can say from experience, will start to drag ass at, say, n >
100 nodes+edges on-screen. It's simple, but its code is as clean as it
comes.
All of these take the "Explore local clusters" approach -- out of your
arbitrary-sized network graph you can wander through a neighborhood of
10's to 100's of nodes; and all of them seem to use the
weights-on-springs-molecular-dynamics approach. Are there
browser-based tools that give you multiscale visualization (use
clustering to find a reduced graph of cliques, let you drill down or
wander)? And are there ones that let you pivot among different
visualizations (cluster matrix, ring graph, ...)?
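
To pin down what "multiscale" means here, a minimal Python sketch: given some clustering of the nodes (assumed to be computed elsewhere, by whatever method), collapse the graph to one node per cluster for the overview, and pull out a single cluster's internal edges when drilling down. The function names and the toy graph are made up for illustration and are not taken from any of the tools above.

```python
from collections import defaultdict

def coarsen(edges, cluster_of):
    """Collapse an undirected graph into one node per cluster.

    edges: iterable of (u, v) pairs.
    cluster_of: dict mapping each node to a cluster label.
    Returns {(c1, c2): count} for edges between distinct clusters, c1 < c2.
    """
    weights = defaultdict(int)
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:
            weights[tuple(sorted((cu, cv)))] += 1
    return dict(weights)

def drill_down(edges, cluster_of, cluster):
    """Return the edges lying entirely inside one cluster."""
    return [(u, v) for u, v in edges
            if cluster_of[u] == cluster and cluster_of[v] == cluster]

# Toy graph: two triangles joined by a single edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
cluster_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

overview = coarsen(edges, cluster_of)    # reduced graph to draw first
inside = drill_down(edges, cluster_of, 0)  # detail for one cluster
```

Only the reduced graph needs to be laid out at the top level, so the on-screen node count stays small regardless of the size of the full graph.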
Summarizing my post to view-theinfo: for standalone&interactive apps,
the best one seems to be http://www.mkbergman.com/?p=415 Cytoscape;
Circos http://mkweb.bcgsc.ca/circos/ is worth mentioning too; I can't
find any information on how it scales, though. LGL
http://apropos.icmb.utexas.edu/lgl is one of the better out of a rough
lot for huge (~10^7 nodes+edges) supercomputer-or-big-cluster-based
tools. There's a great review of viz tools at
http://www.mkbergman.com/?p=414 . I've entered a lot of this into the
page at
http://theinfo.org/view/tools
flip
On Tue, May 6, 2008 at 12:18 AM, Joseph Turian <tur...@gmail.com> wrote:
> > While we are on the subject of graph
> > exploration tools, what else is out there for visualizing/exploring
> > sparse random network graphs at the n>10million nodes / ~10-1000
> > typical edges per node? I suppose I'd better repost to
> > view.theinfo.org.
>
> I am curious what other people have to say.
> http://www.humanbraincloud.com/ has a great visualization, although it
> is a few orders of magnitude smaller than your task.
> Joseph
> Academic: http://www-etud.iro.umontreal.ca/~turian/
> Business: http://www.metaoptimize.com/