Sounds like you're thinking of categorical (social science, survey,
etc.) data? I know there have been some efforts to build repositories
with some successes and some crashes and burns (the promising but now
defunct Google Research Data project is a good example there).
I'm sure others here know more than I do about this.
best, Joe
--
Joseph Lorenzo Hall
ACCURATE Postdoctoral Research Associate
UC Berkeley School of Information
Princeton Center for Information Technology Policy
http://josephhall.org/
Hello Group,
My name is Alexis Madrigal and I'm a science reporter with Wired.com.
I'm working on a story about specific areas where the Obama
administration could make scientific data from the USDA, Minerals
Management Service, DOE, NIH, and other agencies more available and/or
accessible. We're talking not just about having it online, but also in
--
Rick
cell: 703-201-9129
web: http://www.rickmurphy.org
blog: http://phaneron.rickmurphy.org
> (As to why they aren't running a redirector on the current
> whitehouse.gov site that accesses 43.archive.whitehouse.gov for
> 404s .... well, nobody asked me. :))
>
> Carl
Carl,
My response to that is a little more "techy" and lower-level than is
customary on this list, but I hope you and others like it anyway.
Every time a static page on whitehouse.gov or a similar site is updated,
the service at that host should generate:
1) A stable URL for that version of that page, valid for some
time period (I'd say 30 years but even 30 days could work).
The site MUST NOT ever re-use these stable URLs: it
can take down the content after some period of time but
must not re-use the URL.
2) Meta-data on that page (e.g., using something like RDFa)
that includes checksums and perhaps a signature on the payload.
3) An RSS item announcing the publication and its stable URL.
4) Optionally: versioning meta-data relating it to previous
publications (e.g., "THIS replaces THAT" or "THIS combines
THAT and THAT OTHER THING"). Other optional meta-data such
as authorship.
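To make that concrete, here is a rough sketch of what steps (1)..(3)
might look like for a single page update. This is only an illustration
in Python, not anything currently deployed; the URL layout, the
checksum-in-description convention, and the function name are all made
up for the example, not a spec.

import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime

def publish_version(site, path, html_payload):
    """Produce a stable versioned URL, checksum meta-data, and an RSS item."""
    now = datetime.now(timezone.utc)
    version = now.strftime("%Y%m%dT%H%M%SZ")

    # (1) A stable URL for this version of this page; never re-used.
    stable_url = f"https://{site}/versions/{version}{path}"

    # (2) A checksum for the payload; in practice this would be embedded
    #     in the page itself as RDFa meta-data and perhaps signed.
    sha256 = hashlib.sha256(html_payload.encode("utf-8")).hexdigest()

    # (3) An RSS item announcing the publication and its stable URL.
    rss_item = f"""<item>
  <title>Updated: {path}</title>
  <link>{stable_url}</link>
  <guid isPermaLink="true">{stable_url}</guid>
  <pubDate>{format_datetime(now)}</pubDate>
  <description>sha256:{sha256}</description>
</item>"""
    return stable_url, sha256, rss_item

Step (4), the versioning meta-data, would just be further elements in
the same RSS item or RDFa block.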
Given those technically simple steps, it is no longer necessary to
archive those sites by spidering and hoping for meaningful snapshots.
An archivist can simply read off the RSS feed and collect the relevant
page snapshots from their stable URLs, using spidering mainly to
validate the archive.
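A corresponding archivist-side sketch, again purely illustrative and
assuming the feed and checksum conventions from the publisher sketch
above (the feed URL is hypothetical):

import hashlib
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlparse

FEED_URL = "https://example.gov/updates.rss"   # hypothetical feed location
ARCHIVE_ROOT = Path("archive")

def archive_feed(feed_url=FEED_URL):
    # Read the RSS feed rather than spidering the whole site.
    with urllib.request.urlopen(feed_url) as resp:
        tree = ET.parse(resp)
    for item in tree.iterfind(".//item"):
        stable_url = item.findtext("link")
        expected = item.findtext("description", "")   # e.g. "sha256:<hex>"

        with urllib.request.urlopen(stable_url) as resp:
            payload = resp.read()

        # Validate against the published checksum where one is present.
        digest = "sha256:" + hashlib.sha256(payload).hexdigest()
        if expected.startswith("sha256:") and digest != expected:
            print("checksum mismatch, skipping:", stable_url)
            continue

        # Save under the stable URL's own path so the archive can later be
        # served from a new site with the same relative structure.
        local = ARCHIVE_ROOT / urlparse(stable_url).path.lstrip("/")
        local.parent.mkdir(parents=True, exist_ok=True)
        local.write_bytes(payload)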
An archivist can save the documents by using their stable URLs as
relative URLs on a new site.
A "redirector" is then a generic thing. A single redirector can be
applied to *any* site that constructs such an archive.
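One way such a generic redirector might look, sketched as a tiny WSGI
app. The archive host name is carried over from Carl's example; in a
real deployment the live site would hand requests to this only for
paths it no longer serves, i.e. on what would otherwise be 404s.

from wsgiref.simple_server import make_server

ARCHIVE_HOST = "https://43.archive.whitehouse.gov"   # any archive host works

def redirector(environ, start_response):
    """Redirect any request to the same path on the configured archive."""
    target = ARCHIVE_HOST + environ.get("PATH_INFO", "/")
    start_response("302 Found", [("Location", target)])
    return [b""]

if __name__ == "__main__":
    make_server("", 8000, redirector).serve_forever()

Nothing in it is specific to whitehouse.gov; point ARCHIVE_HOST at any
site that constructs such an archive and it works the same way.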
One valuable contribution of taking these steps, earlier rather than
later, is arguably this:
The resulting form of archive is easy for non-technical people to
understand. Anyone who can understand the concept of the Federal
Register can understand this form of archive. This form of archiving
not only creates an accurate record of how these sites change over time,
it also reifies, in a human-friendly way, the form and function of an
archive.
Government communications to the public should be idealized as a kind of
"journaling" / "write-once" database / file-system with meta-data
sensitively designed for archival needs. Making that ideal real for
the web sites is well within reach, roughly along the lines of what I
described in (1)..(4) above.
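The "write-once" part of that ideal is small enough to sketch as well;
a purely illustrative publish step that refuses to overwrite an
existing stable path (the function name and layout are made up):

from pathlib import Path

def write_once(journal_root, stable_path, payload):
    """Append-only store: a stable path may be written exactly once."""
    target = Path(journal_root) / stable_path.lstrip("/")
    if target.exists():
        raise FileExistsError(f"stable path already published: {stable_path}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)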
As a technical matter, what I've described can likely be implemented in
a layered fashion without the need to substantially disrupt the content
management systems currently used.
Regards,
-t