Standards for making government data more accessible


Dennis D. McDonald

Jul 17, 2008, 12:56:18 PM
to DataPortability.General
This is indirectly related to data portability. I was listening to the
June 27 "In Motion" podcast and the interview with Drummond Reed, where
he discussed the "data web" and XDI, and that got me thinking about
ways to make government data more accessible. I ended up writing this:

"An Approach to Accommodating Government Alteration of Census Bureau
Data on Gay Marriages"

http://www.ddmcd.com/managing-technology/census.html

and would be very interested in hearing from those more technically
knowledgeable than I am whether the ideas presented there are feasible.

Dennis McDonald
Alexandria, Virginia
http://www.ddmcd.com

Elias Bizannes

Jul 18, 2008, 7:36:44 AM
to dataportabi...@googlegroups.com
Sounds like an interesting proposal. By defining an ontology, the data that governments collect could conform to an international standard, allowing international comparisons and manipulation of the data to derive information such as marriage figures. The issue of identifiability is key, however, so I'm not sure how granular this could be.
--
Elias Bizannes
http://liako.biz

Dennis D. McDonald

Jul 18, 2008, 11:59:10 AM
to DataPortability.General
Elias,

Yes, the issue of granularity is key. My thinking was somewhat along
the lines of the way Google is proposing to provide YouTube data: by
removing information related to individual IP addresses. I'm assuming
that what's then provided is aggregated data that somehow answers the
questions the requestor is asking.

What I was thinking with respect to the marriage data was that certain
levels of aggregation would be made available for both the original and
the recoded data, say, by state or region, but that disaggregation to
get at individual identities would be impossible. Users of the data
would be able to see the rules for recoding, but the rules would not
allow disaggregation of the data to the individual respondent level.
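
To make that a bit more concrete, here is a rough Python sketch of the
idea. Everything in it is an assumption for illustration: the record
fields (state, original, recoded), the sample values, and the
small-cell suppression threshold MIN_CELL are hypothetical and do not
come from the Census Bureau. Suppressing small cells is just one
common way to make it harder to work back to individuals; on its own
it does not guarantee that.

```python
from collections import Counter

# Hypothetical respondent-level records. Field names and values are
# illustrative only, not an actual Census Bureau layout.
RECORDS = (
    [{"state": "CA", "original": "married", "recoded": "unmarried_partner"}] * 3
    + [{"state": "MA", "original": "married", "recoded": "married"}] * 4
    + [{"state": "MA", "original": "married", "recoded": "unmarried_partner"}] * 1
)

MIN_CELL = 3  # assumed threshold: smaller cells are suppressed (None)


def aggregate(records, field, min_cell=MIN_CELL):
    """Count records per (state, value of `field`), hiding small cells."""
    counts = Counter((r["state"], r[field]) for r in records)
    return {key: (n if n >= min_cell else None) for key, n in counts.items()}


def publish(records, min_cell=MIN_CELL):
    """Release state-level counts for both the original and recoded answers.

    Only aggregates leave this function; respondent-level rows do not.
    """
    return {
        "original": aggregate(records, "original", min_cell),
        "recoded": aggregate(records, "recoded", min_cell),
    }


if __name__ == "__main__":
    for view, table in publish(RECORDS).items():
        print(view, table)
```

A user could compare the "original" and "recoded" tables state by
state and read the published recoding rules, without ever seeing a
single respondent's row.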

I also had not thought about the international comparison aspect. My
goal was to provide data for multiple uses within the US, given the
differing beliefs about gay marriage, but your suggestion is
interesting.

Dennis

Gordon Rae

Jul 18, 2008, 1:07:50 PM
to dataportabi...@googlegroups.com
I'd say that the taxonomy issues are more complicated than the granularity
ones. In various parts of the United States, there are:
* church marriages
* civil marriages
* common-law marriages
* deemed or putative marriages
* civil unions
* domestic partnerships.

California and Massachusetts allow same-sex civil marriages. California also
allows domestic partnerships for both same-sex and opposite-sex couples, I
believe, and so does Oregon, but Oregon does not allow same-sex marriage.
At least one church (the United Church of Christ) allows same-sex church
marriages.

So, as I posted on Dennis' blog yesterday, unless the census takers actually
capture separate data points for all the different legal unions provided for
under state and federal law, producing aggregate statistics is not going to
be feasible.
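
As a rough illustration of the taxonomy point, here is a minimal
Python sketch. The UnionType names simply mirror the list above; the
function names and the idea of a published roll-up mapping are
hypothetical, not anything the Census Bureau actually does. The point
it shows is that coarser statistics can be derived from detailed
categories, but not the other way round.

```python
from collections import Counter
from enum import Enum


class UnionType(Enum):
    """One value per kind of union listed above (illustrative only)."""
    CHURCH_MARRIAGE = "church marriage"
    CIVIL_MARRIAGE = "civil marriage"
    COMMON_LAW_MARRIAGE = "common-law marriage"
    PUTATIVE_MARRIAGE = "deemed or putative marriage"
    CIVIL_UNION = "civil union"
    DOMESTIC_PARTNERSHIP = "domestic partnership"


def count_by_state_and_type(responses):
    """Count responses given as (state, UnionType) pairs."""
    return Counter(responses)


def collapse(counts, mapping):
    """Roll detailed counts up into coarser categories.

    `mapping` sends each UnionType to a coarser label. This only works
    because the detailed type was captured in the first place.
    """
    rolled = Counter()
    for (state, union_type), n in counts.items():
        rolled[(state, mapping[union_type])] += n
    return rolled
```

If the survey had only recorded "married" versus "not married", there
would be nothing for collapse() to start from, which is the problem
described above.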

The granularity issue is that you can only re-code partnerships one at a
time. You can delete data in bulk (which is what the Washington Post was
talking about, converting same-sex couples to unmarried singletons) but you
can't improve the data unless you know the identity of the couple.

Gordon

Dennis D. McDonald

Jul 19, 2008, 8:02:28 AM
to DataPortability.General


On Jul 18, 1:07 pm, "Gordon Rae" <gor...@premiumadvice.net> wrote:
<snip>

> The granularity issue is that you can only re-code partnerships one at a
> time. You can delete data in bulk (which is what the Washington Post was
> talking about, converting same-sex couples to unmarried singletons) but you
> can't improve the data unless you know the identity of the couple.

Gordon's last point is a good one. At some point you may need the
original data to accurately recode the data. In some cases it is
possible to infer or estimate population characteristics based on
known population characteristics, but such probabilistic methods can
be subject to error.

I fall back on my original reaction to the Washington Post article:
when you start changing what the survey respondent actually said to
what you think the respondent should have said (for legal, political,
or other reasons), your entire process loses credibility.

Dennis McDonald
http://www.ddmcd.com