Adopting common schemas when possible

Jamal Mazrui

Mar 1, 2015, 7:11:35 PM
to us-govern...@googlegroups.com

I wonder if it would be helpful to encourage consistency in data schemas used by government APIs, whenever possible.  Naturally, if a data set has been in existence for a while and users have come to expect certain field names and data types, there is a good argument for continuing with the same schema even when the API is strengthened over time.

 

For new data sets coming online, however, schema consistency across agency APIs could shorten developers' learning curve and simplify their code bases, since the same kinds of objects would be associated with the same terminology and data types.  For example, schema consistency might cover such things as date format and the terms used for each component of a postal address.
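To make the date and address example concrete, here is a minimal sketch in Python. It assumes an agency chose the Schema.org PostalAddress property names and ISO 8601 dates; the helper names, the `lastUpdated` field, and the sample address are hypothetical and only for illustration.

```python
import json
from datetime import date


def make_postal_address(street, city, region, postal_code, country="US"):
    """Build an address record keyed by Schema.org PostalAddress property names."""
    return {
        "@context": "https://schema.org",
        "@type": "PostalAddress",
        "streetAddress": street,
        "addressLocality": city,
        "addressRegion": region,
        "postalCode": postal_code,
        "addressCountry": country,
    }


def iso_date(year, month, day):
    """Return a date as an ISO 8601 string (YYYY-MM-DD)."""
    return date(year, month, day).isoformat()


# Hypothetical sample data; "lastUpdated" is an assumed field name.
office = make_postal_address("123 Example St", "Washington", "DC", "20001")
office_record = {"address": office, "lastUpdated": iso_date(2015, 3, 1)}

print(json.dumps(office_record, indent=2))
```

If two agencies both emitted addresses in this shape, a developer could reuse the same parsing code for both APIs.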

 

Based on my research so far, the most promising collection of open schemas is at

www.schema.org

 

As you may know, this project was established by Google, Microsoft, Yahoo!, and other major search engines to standardize structured data to some extent.  Some common schemas that agencies might adopt include the following:  government agency, government license, government service, person, place, and publication.

 

Thoughts?

 

Jamal

 

Jamal Mazrui
Deputy Director, Accessibility and Innovation Initiative
Federal Communications Commission
202.418.0069

Philip Ashlock - XAAB

Mar 2, 2015, 8:42:08 PM
to Jamal Mazrui, us-govern...@googlegroups.com
In theory this is meant to be accomplished through NIEM, but I find people in government asking this question quite often without being aware that NIEM exists. Even if you're aware that it exists, the documentation isn't particularly accessible; see https://github.com/NIEM/Implementation-Cookbook/issues/1

Schema.org builds on top of a lot of existing schemas and depending on the context it may make sense to use the underlying native schema for some things and the Schema.org variant or serialization for others. For example, Data.gov and the Open Data Policy use a JSON-LD serialization of DCAT for the primary manifestation of the data, a JSON file, but we also use the Schema.org variant of DCAT when the data is rendered as HTML so that search engines can get it that way. 

The native schema is DCAT: http://www.w3.org/TR/vocab-dcat/
Our JSON-LD serialization is: http://project-open-data.cio.gov/v1.1/schema

We also provide mappings to the Schema.org variant: https://project-open-data.cio.gov/v1.1/metadata-resources/#field-mappings
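As a rough illustration of what such a mapping looks like in practice, here is a small Python sketch. The four field pairs below are an assumed, partial subset chosen for illustration; the field-mappings page linked above is the authoritative list. The function name and the sample dataset entry are hypothetical.

```python
# Assumed partial mapping from Project Open Data field names to
# Schema.org Dataset property names (illustrative subset only).
POD_TO_SCHEMA_ORG = {
    "title": "name",
    "description": "description",
    "keyword": "keywords",
    "modified": "dateModified",
}


def to_schema_org(pod_dataset):
    """Translate the mapped fields of one dataset entry; skip unmapped fields."""
    out = {"@context": "https://schema.org", "@type": "Dataset"}
    for pod_key, sdo_key in POD_TO_SCHEMA_ORG.items():
        if pod_key in pod_dataset:
            out[sdo_key] = pod_dataset[pod_key]
    return out


# Hypothetical dataset entry for demonstration.
sample = {
    "title": "Example Dataset",
    "description": "A made-up entry used to show the mapping.",
    "keyword": ["example", "demo"],
    "modified": "2015-03-02",
    "accessLevel": "public",  # not in the subset above, so it is dropped
}

converted = to_schema_org(sample)
print(converted)
```

The same record can then be served natively in the primary JSON file and in the Schema.org form when rendered as HTML.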

I think there's a lot of value in really broad efforts like NIEM and Schema.org, but depending on the context it may make more sense to focus on a schema/spec/standard that better captures the nuances of a specific domain and is already widely adopted in that industry - and then you can provide mappings back to those broader frameworks where needed.

As far as capturing more domain-specific schemas used for APIs, I think there's a lot of potential in simply getting more people to provide machine-readable documentation. All CFO Act agencies are already required to document their APIs in a standardized way, and while machine-readable API documentation isn't required, it would let us search through, catalog, and compare existing API schemas in a much more sophisticated way.
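To sketch the kind of comparison that machine-readable documentation would allow: assume each API's documentation could be reduced to a simple map of field name to data type. The two schemas below are hypothetical, and the function is only a minimal illustration in Python, not an existing tool.

```python
def compare_schemas(a, b):
    """Compare two {field_name: type_name} maps and report overlap and conflicts."""
    shared = sorted(set(a) & set(b))
    return {
        "shared": shared,
        # Fields present in both schemas but declared with different types.
        "type_conflicts": {f: (a[f], b[f]) for f in shared if a[f] != b[f]},
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
    }


# Hypothetical schemas extracted from two agencies' API documentation.
agency_a = {"name": "string", "postalCode": "string", "issued": "date"}
agency_b = {"name": "string", "postalCode": "integer", "zip4": "string"}

report = compare_schemas(agency_a, agency_b)
print(report)
```

Run across many documented APIs, a report like this would surface where agencies already agree on field names and where the same field carries different types.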




--
Philip Ashlock
Chief Architect, Data.gov
U.S. General Services Administration