Draft Open Contracting Data Standard - Schema Definition Format and Newest Version?


Owen Scott

Apr 28, 2014, 12:05:21 AM
to publi...@webfoundation.org
Hi Everyone,

First off, very happy to be part of this group and looking forward to working with all of you! For background, I work with AidData, and am in Nepal right now working with the Government of Nepal, the Open Aid Partnership, and the Open Contracting Partnership on a pilot exercise to publish a snapshot of well-structured open contracting data for a number of government ministries and departments.

As with our smaller pilot project in Nepal last year (see details here and here), one of our goals for this work is to do a solid field test of the Open Contracting Data Standard, in its draft form, and to build up some tooling for both data production and data visualization. With that in mind I had three questions:

1) Is this still the most up-to-date version of the draft standard? https://github.com/birdsarah/oc-datamerge-spike
2) Is there a chosen solution for schema definition (e.g. something like http://json-schema.org/ that provides a bit more guidance on data types and required vs. optional fields)? And, related, is JSON the chosen format, or will the standard be data format agnostic?
3) Is there a process in place for versioning the standard and submitting proposed changes?

I had a look on http://open-contracting.github.io/ but just wanted to get some more clarity before we start this new phase of work in earnest.

Thanks!

Owen

James McKinney

Apr 28, 2014, 4:16:21 PM
to publi...@webfoundation.org
In case no solution has been chosen for schema definition, Popolo offers JSON Schema and JSON-LD definitions. The JSON-LD definition can be transformed to other RDF formats, but there has been no demand.

Those schema definitions are for documents in JSON or other RDF formats. But we also simply define the terms used in those schemas in HTML documents, so that people can read that documentation and reuse the terms in any format (CSV, etc.).
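
As a rough illustration of what a JSON Schema definition adds (data types, required vs. optional fields), here is a minimal Python sketch. The field names (`id`, `title`, `amount`) and the `check` helper are hypothetical and are not taken from the draft standard or from Popolo. The helper handles only the `required` and `type` keywords, so a real project would use a full JSON Schema validator instead.

```python
# Hypothetical schema fragment written in JSON Schema style.
# Field names are illustrative only, not part of any published standard.
CONTRACT_SCHEMA = {
    "type": "object",
    "required": ["id", "title"],
    "properties": {
        "id": {"type": "string"},
        "title": {"type": "string"},
        "amount": {"type": "number"},
    },
}

# Map JSON Schema type names to Python types (small subset only).
_TYPES = {"string": str, "number": (int, float), "object": dict}


def check(record, schema):
    """Return a list of problems found; handles only 'required' and 'type'."""
    problems = []
    for name in schema.get("required", []):
        if name not in record:
            problems.append(f"missing required field: {name}")
    for name, rule in schema.get("properties", {}).items():
        if name in record:
            expected = _TYPES[rule["type"]]
            value = record[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                problems.append(f"wrong type for field: {name}")
    return problems
```

Here `amount` is optional: a record without it passes, while a record that omits `title` or gives `amount` as a string is reported.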

James

--
You received this message because you are subscribed to the Google Groups "Public OCDS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to public-ocds...@webfoundation.org.
To post to this group, send email to publi...@webfoundation.org.
Visit this group at http://groups.google.com/a/webfoundation.org/group/public-ocds/.
To view this discussion on the web visit https://groups.google.com/a/webfoundation.org/d/msgid/public-ocds/57fc4aba-0f04-477d-9bb8-89be1f8be3c3%40webfoundation.org.

Tim Davies

Apr 30, 2014, 3:56:19 PM
to publi...@webfoundation.org
Hey Owen,

Sorry for the slow reply, I'll try some quick replies to the best of my knowledge below:

1) Is this still the most up-to-date version of the draft standard? https://github.com/birdsarah/oc-datamerge-spike

We've not got a new draft out yet. We're aiming to do that by mid-June, I believe.

The new draft will very likely evolve quite a lot from the 5-day sprint last year. Right now Sarah, Ana and the team are looking across a far larger range of different datasets than were explored in that spike (I think we had to commit to cover 40 or so in the funding proposal with Omidyar), and talking with lots of experts - which we'll be bringing together with work on use-cases in May to try and work out what kind of structure we're heading for.

(You can see some of the comparison of datasets starting to come together at http://ocds.aptivate.org/opendatacomparison/ - such as in grids like http://ocds.aptivate.org/opendatacomparison/grids/g/oc-award-data/, which show how different publishers are making different parts of contracting data available. These will be followed up with detailed field-by-field comparisons and mappings, hopefully helping us to boil down to a more robust common/important fieldset which can be tested against use cases.)

2) Is there a chosen solution for schema definition (e.g. something like http://json-schema.org/ that provides a bit more guidance on data types and required vs. optional fields)? And, related, is JSON the chosen format, or will the standard be data format agnostic?

The technical scoping we've been doing with the help of Jeni Tennison at https://github.com/open-contracting/technical-approach#data-model is pointing us towards a format-agnostic data model, which then has recommended serializations in a range of formats.

A full version of that is probably not to be anticipated before Q3 this year - as it will take us some time to develop tooling.

The other concept suggested in the technical scoping is that we might define required fields in terms of the needs of different 'communities of practice' (much like the idea of different validation rulesets for IATI data has been discussed).
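
To make the 'communities of practice' idea concrete, here is a minimal Python sketch in which each community gets its own list of required fields and a record is checked against whichever rulesets apply. The ruleset names, the field names, and both helper functions are hypothetical; nothing here comes from the technical scoping document itself.

```python
# Hypothetical rulesets: each community of practice lists the fields it needs.
RULESETS = {
    "basic": {"id", "title"},
    "value-for-money": {"id", "title", "amount", "currency"},
}


def missing_fields(record, ruleset_name):
    """Return the sorted field names the named ruleset requires but the record lacks."""
    required = RULESETS[ruleset_name]
    return sorted(required - set(record))


def conforming_rulesets(record):
    """Return the sorted names of every ruleset the record satisfies."""
    return sorted(name for name in RULESETS if not missing_fields(record, name))
```

The same record can then be valid for one community and incomplete for another, without the data model itself fixing a single list of required fields.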
 
3) Is there a process in place for versioning the standard and submitting proposed changes?

Not formally until we get the next draft coming together. For the time being the best place to propose vital fields, use-cases, models etc. is on this list.

If useful though I'm down in DC at OpenGovHub a week Monday (12th/13th May). If you happen to be around and with time to catch up around the immediate use cases you have for the standard we could arrange a time to chat (or set up a skype call otherwise).

All the best

Tim

-- 
Tim Davies
Research Coordinator, Open Data Research Network
@timdavies | @odrnetwork | www.opendataresearch.org 

World Wide Web Foundation | 1110 Vermont Ave NW, Suite 500, Washington DC 20005, USA | www.webfoundation.org | Twitter: @webfoundation


Juan Pane

May 5, 2014, 10:24:57 AM
to publi...@webfoundation.org
Hi Tim, James:

I'm working here in Paraguay with the national contracting body (https://www.contrataciones.gov.py/) to define the new format that they will use for releasing all the public contracting data around July/August using Open Data standards.
We are technically relying on JSON-LD, which gives a lot of flexibility, and we are also aiming at defining a tailored ontology for the case of public contracts in Paraguay.

If possible, I'd like to talk to the people doing the modelling, if they have some time, so that we can be as compliant as possible when releasing the data.

Looking forward to talking to you.

Greetings from Paraguay

Juan Pane




Owen Scott

May 5, 2014, 10:32:44 AM
to publi...@webfoundation.org
Thanks for the input, everyone. An update from our side as well: our initial data collection is in Postgres, so I am working off a SQL schema for a loose subset of the original draft standard. The final product will have serializations in CSV and JSON as well. All of this will be up on GitHub shortly.
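
For readers following along, a minimal Python sketch of that pipeline (one SQL table, serialized to both JSON and CSV) might look like the following. It uses an in-memory SQLite database as a stand-in for Postgres, and the `contract` table and its columns are hypothetical, not the actual schema used in the pilot.

```python
import csv
import io
import json
import sqlite3

# In-memory SQLite database standing in for the Postgres store (assumption).
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE contract (id TEXT PRIMARY KEY, title TEXT NOT NULL, amount REAL)"
)
conn.executemany(
    "INSERT INTO contract VALUES (?, ?, ?)",
    [("c-1", "Road repair", 1200.0), ("c-2", "School roof", None)],
)

# Read the rows once; both serializations are produced from the same records.
rows = [
    dict(r)
    for r in conn.execute("SELECT id, title, amount FROM contract ORDER BY id")
]

# JSON serialization: one object per contract; SQL NULL becomes null.
as_json = json.dumps(rows, indent=2)

# CSV serialization: same fields as columns; SQL NULL becomes an empty cell.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["id", "title", "amount"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
as_csv = buffer.getvalue()
```

Both outputs are derived from the same list of records, so the choice of output format does not affect what is stored.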

We are working with our partners in Nepal on a reasonably sized build-out (including a portal for open contract data visualization), so it would be good to be as close as possible to the newest version. If there were a chance to share an informal update, even a few bullet points on what's roughly meant by evolving "quite a lot" from the draft standard, that would be super helpful. Thanks a tonne in advance.






--

Owen Scott

Project Manager, Development Gateway

osc...@aiddata.org; oscott@developmentgateway.org

917-940-4923

www.aiddata.org 

www.twitter.com/aiddata 

A Partnership of Development Gateway l College of William & Mary l Brigham Young University 

-------------------------------------

Michael Roberts

May 21, 2014, 12:20:55 PM
to publi...@webfoundation.org
Hi Juan,

Just following up on this message in case others might also want to discuss the data model. As mentioned in the 'media and open contracting data standard' web meeting last week, we are planning an alpha release towards early June. We plan to have some specific web meetings and episodic discussions surrounding the data standard efforts.

The website UI is also going to be reorganized slightly to make it clearer where to find the resources you might need and the data standard roadmap. With all of that said, we would be most happy to discuss bilaterally as well with anyone interested in getting a more thorough understanding of the data model. Please feel free to email me.

---
Michael Roberts
Open Data Technical Manager
@michaeloroberts | mich...@webfoundation.org
World Wide Web Foundation | 1110 Vermont Ave NW, Suite 500, Washington DC, 20005, USA | www.webfoundation.org | Twitter: @webfoundation


Juan Pane

May 21, 2014, 12:55:41 PM
to publi...@webfoundation.org, mich...@webfoundation.org
Hi Michael

I'd be more than happy to collaborate, and I can bring in people from the government so that they can see how their work will be used later.

Regards

Juan Pane
Open Data Consultant
Programa de Democracia y Gobernabilidad - Paraguay


James McKinney

May 21, 2014, 1:27:50 PM
to publi...@webfoundation.org, mich...@webfoundation.org
I am interested in the data model.

An approach used by the W3C that I’ve reused in Popolo [1] is to declare and define the terms for the classes and properties in the data model, describe the finer points of the data model, and then establish different serializations using those terms.

W3C often publishes only an RDF serialization [2]. In Popolo, we publish both RDF and JSON. In the case of JSON, we publish JSON-LD contexts to make the link between JSON and RDF, and we publish JSON Schema, as it is a popular way of describing JSON documents.
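
As a very simplified picture of how a JSON-LD context makes that link, the Python sketch below maps short JSON keys to full term IRIs. The `example.org` IRIs and the `expand_keys` helper are hypothetical, and this is not a JSON-LD processor; a real one handles much more (`@id`, `@type`, nested contexts, and so on).

```python
# Hypothetical JSON-LD-style context: maps short JSON keys to full term IRIs.
# The IRIs below are placeholders, not Popolo's or OCDS's actual vocabulary.
CONTEXT = {
    "name": "http://example.org/terms#name",
    "birthDate": "http://example.org/terms#birthDate",
}


def expand_keys(document, context):
    """Replace each key the context defines with its IRI; drop undefined keys."""
    return {
        context[key]: value
        for key, value in document.items()
        if key in context
    }
```

The plain JSON document stays readable for people who ignore RDF, while the context tells RDF tooling which term each key refers to.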

It would also be possible to describe CSV serializations. There are experimental tools for writing CSV schema, and the W3C CSV on the Web Working Group [3] is working on some. They recently published two first public working drafts [4] [5] in case these are of interest. I anticipate that their later deliverables will be more relevant.

Anyway, all this to say that the data model can be separated from the particular technologies used to serialize the data.


James
