SQL -- minimum information about file needed to use the file?


dwighthines

Jan 17, 2008, 8:59:44 AM
to process.theinfo
In Florida, we have a state rule on management of electronic
information, Rule 1B-26.003, F.A.C., that is not enforced, which means
that most managers of information systems have only a vague idea of
what data the different departments are collecting.

Now, it is possible to get the entire SQL file, with sensitive info
like social security numbers deleted, but there is no narrative about
the different software used, there is no identification of the
individual variables, and it is just not possible to make sense of the
master file.

The other thing is that I don't believe that SQL qualifies as a
universal format, which is required by the public records law.

Does anyone have a statement from some information organization that
states the minimum amount of information that should be provided by an
SQL file?

NASCIO has all kinds of good requirements but I need lots of
statements prior to going to some type of court action.

Dwight Hines
St. Augustine, Florida

P.S. Is everyone using DataFerret? Please let me know what you think
of it, I just downloaded it last night.

Aaron Swartz

Jan 17, 2008, 9:34:49 AM
to process...@googlegroups.com
So a couple thoughts come to mind:

1. Now you've got me curious! How do I get my hands on this SQL file?
How big is it? Any idea what's in it?

2. SQL is not a terrible format for distributing data, but if there's
literally no documentation about what the different columns mean, I
can see why that would be problematic. Still, perhaps it's easier to
track down the people who dumped the data and get them to explain the
columns than to challenge them to switch formats.

3. You might want to talk to the people at the Open Government Data list:

http://groups.google.com/group/open-government/

I think they'd be very interested. They're the ones trying to develop
principles about what counts as open data:

http://wiki.opengovdata.org/index.php/OpenDataPrinciples

> P.S. Is everyone using DataFerret? Please let me know what you think
> of it, I just downloaded it last night.

Looks interesting. There ought to be an API for accessing the data
sets directly, though.

jmay

Jan 17, 2008, 12:42:13 PM
to process.theinfo
Hi Dwight-

I assume you are referring to http://dataferrett.census.gov/

I downloaded this tool a few months ago and experimented with it
briefly. To be blunt: it's a stunningly awful piece of software. I'd
like to know whether anyone is actually using it - I'd love to
find a reason to soften my opinion. It would seem to me that locating
the source data and pulling it directly into Excel would be preferable
to using DataFerrett.

-Jason

Jason May
Numbrary
http://numbrary.com/

Keith

Jan 18, 2008, 1:13:01 PM
to process.theinfo
I actually just learned about DataFerrett in a workshop earlier this
week. It seems to be in regular use with demographers and
statisticians. Basically it lets you select data from a network of
sources (TheDataWeb) and combine it in different ways. For example,
you can get average rents by state, by the age of the building, by the
travel time to work, etc. (I'll agree that the interface could
withstand some modernization, but I've definitely seen worse.)

TheDataWeb includes data from the following sources:

American Community Survey (ACS)
American Housing Survey (AHS)
Behavioral Risk Factor Surveillance System (BRFSS)
Consumer Expenditure Survey (CES)
County Business Patterns (CBP)
Current Population Survey (CPS)
Decennial Census of Population and Housing (Census2000)
Decennial Census of Population and Housing (Census1990)
Delaware Statistics
Harvard MIT Data Center Collection
Home Mortgage Disclosure Act (HMDA)
Maryland Statistics
National Ambulatory Medical Care Survey (NAMCS)
National Health and Nutrition Examination Survey (NHANES)
National Health Interview Survey (NHIS)
National Survey of Fishing, Hunting, and Wildlife-Associated Recreation (FHWAR)
Small Area Income and Poverty Estimates (SAIPE)
Social Security Administration
Survey of Income and Program Participation (SIPP)
Survey of Program Dynamics (SPD)

From what I understand, different organizations are making their data
accessible using TheDataWeb Publisher. For details, see:
http://www.thedataweb.org/meta_serve.html
http://www.thedataweb.org/mif_usersguide_rev.html

Most of these datasets would probably be too large for an application
like Excel to handle in their entirety. However, the query response
time seems fairly quick (at least for the tabulations I tried), so
perhaps someone wants to create a web service that taps these data
servers on the fly?

Keith


On Jan 17, 12:42 pm, jmay <jason....@gmail.com> wrote:
> I assume you are referring to http://dataferrett.census.gov/

Josh Tauberer

Jan 19, 2008, 9:25:48 AM
to process...@googlegroups.com
Keith wrote:
> I actually just learned about DataFerrett in a workshop earlier this
> week. It seems to be in regular use with demographers and
> statisticians. Basically it lets you select data from a network of
> sources (TheDataWeb) and combine it in different ways. For example,
> you can get average rents by state, by the age of the building, by the
> travel time to work, etc.

Are the data sets networked together, do you know? That is, can I draw
data from different data sets at once, provided the data sets describe
different information about the same entities?

--
- Josh Tauberer
- GovTrack.us

http://razor.occams.info

"Yields falsehood when preceded by its quotation! Yields
falsehood when preceded by its quotation!" Achilles to
Tortoise (in "Gödel, Escher, Bach" by Douglas Hofstadter)

Keith

Jan 21, 2008, 4:37:11 PM
to process.theinfo
On Jan 19, 9:25 am, Josh Tauberer <taube...@govtrack.us> wrote:
> Are the data sets networked together, do you know? That is, can I draw
> data from different data sets at once, provided the data sets describe
> different information about the same entities?

Most of the datasets are based on surveys and samples, where the
"entities" are either individuals or households, and the identities
are anonymized. So I don't think it is possible to combine data
across datasets at the observation level. You could, however, bring
together generalizations from across data sources, by tabulating data
within specific slices of the population, assuming that the different
sources have the data to enable tabulation of the same subsets (e.g.
unemployed apartment renters in Vermont).
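
To make that concrete, here is a rough sketch in Python of tabulating two sources within the same slices and then lining up the results. It is not how DataFerrett or TheDataWeb work internally; the records, field names, and the `tabulate` and `combine` helpers are all made up for illustration.

```python
from collections import defaultdict


def tabulate(records, keys, value):
    """Average `value` within each slice of the population defined by `keys`."""
    totals = defaultdict(lambda: [0.0, 0])  # slice -> [running sum, count]
    for rec in records:
        slice_key = tuple(rec[k] for k in keys)
        totals[slice_key][0] += rec[value]
        totals[slice_key][1] += 1
    return {k: s / n for k, (s, n) in totals.items()}


def combine(tab_a, tab_b):
    """Keep only the slices that both sources are able to tabulate."""
    return {k: (tab_a[k], tab_b[k]) for k in tab_a.keys() & tab_b.keys()}


# Hypothetical anonymized records from two unrelated surveys.
# No observation can be matched across them; only the slices line up.
housing = [
    {"state": "VT", "tenure": "rent", "rent": 700},
    {"state": "VT", "tenure": "rent", "rent": 900},
    {"state": "VT", "tenure": "own", "rent": 0},
]
labor = [
    {"state": "VT", "tenure": "rent", "weeks_unemployed": 10},
    {"state": "VT", "tenure": "rent", "weeks_unemployed": 20},
    {"state": "NH", "tenure": "rent", "weeks_unemployed": 5},
]

keys = ("state", "tenure")
merged = combine(
    tabulate(housing, keys, "rent"),
    tabulate(labor, keys, "weeks_unemployed"),
)
print(merged)  # {('VT', 'rent'): (800.0, 15.0)}
```

The join happens on the slice definition (state and tenure here), never on individuals, which is why both sources need to carry the same classifying variables.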

Josh Tauberer

Jan 22, 2008, 8:47:18 AM
to process...@googlegroups.com

Okay, so by entities I guess I meant the aggregates like "Vermont" ---
so that's possible? I'll have to go back to the site to try to figure
out how to do it.

dwighthines

Jan 23, 2008, 7:30:31 AM
to process.theinfo
The dataset is about 160 meg in size, but that includes no defining
information, or old tables requests.
I will email the folks at open gov, thanks for that link.
The main issue with SQL is that if a nice man, say about 56 years old,
is intimidated by asking for paper documents, how is he going to
respond to someone saying he can have the information in SQL? He'll
balk. The thing to remember is that Florida has some great rules on
how the data must be kept so others can use it readily. It's usually
the folks who are covering for shady operations who give out
unintelligible data.
It would be nice to have some empirical research that compared
jurisdictions with high levels of open records compliance versus low
compliance levels on some dependent measures like community
satisfaction or innovation.
Dwight

Lukasz Szybalski

Feb 19, 2008, 11:37:18 PM
to process.theinfo


On Jan 23, 6:30 am, dwighthines <dwight.hi...@gmail.com> wrote:
> The dataset is about 160 meg in size, but that includes no defining
> information, or old tables requests.
> I will email the folks at open gov, thanks for that link.
> The main issue with SQL is that if a nice man, say about 56 years old,
> is intimidated by asking for paper documents, how is he going to
> respond to someone saying he can have the information in SQL? He'll
> balk. The thing to remember is that Florida has some great rules on
> how the data must be kept so others can use it readily. It's usually
> the folks who are covering for shady operations who give out
> unintelligible data.
> It would be nice to have some empirical research that compared
> jurisdictions with high levels of open records compliance versus low
> compliance levels on some dependent measures like community
> satisfaction or innovation.
> Dwight
>

It seems to me that instead of SQL it would be better to send the data
as a CSV file, with a separate text file that describes the data, both
zipped together into a single file. CSV has been around long enough
that almost any tool can open it.

Wouldn't that be better than an SQL file?
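
A minimal sketch of what I mean, in Python, assuming the SQL dump has already been loaded into a SQLite database (the real dump may well be in another dialect). The `export_bundle` function, the file names, and the `descriptions` mapping are hypothetical names chosen for illustration.

```python
import csv
import io
import sqlite3
import zipfile


def export_bundle(conn, zip_path, descriptions=None):
    """Write each table as a CSV plus one README.txt data dictionary into a zip.

    `descriptions` maps (table, column) to a plain-language explanation.
    Columns nobody has described are flagged rather than silently skipped.
    """
    descriptions = descriptions or {}
    readme = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        for table in tables:
            cursor = conn.execute(f'SELECT * FROM "{table}"')
            columns = [col[0] for col in cursor.description]
            buf = io.StringIO()
            writer = csv.writer(buf)
            writer.writerow(columns)   # header row with column names
            writer.writerows(cursor)   # one CSV line per table row
            bundle.writestr(f"{table}.csv", buf.getvalue())
            readme.append(f"{table}.csv")
            for col in columns:
                note = descriptions.get((table, col), "NOT DOCUMENTED")
                readme.append(f"  {col}: {note}")
        bundle.writestr("README.txt", "\n".join(readme) + "\n")
    return tables
```

Calling something like `export_bundle(sqlite3.connect("records.db"), "records.zip", {("permits", "id"): "Permit number"})` would give one zip that opens in a spreadsheet, with every undescribed column marked NOT DOCUMENTED in the README so the gaps are obvious.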

Lucas