Language as an array instead of string


Daniel Clair

Oct 23, 2014, 5:15:21 PM
to publi...@webfoundation.org
Countries with more than one official language may post their Tender Notices, Award Notices and Contracts in all the official languages. I'm trying to create samples of current Tender Notices (as a contract release) using OCDS and then I'll validate with http://ocds.open-contracting.org/validator/validate/, but I'm not quite sure that it will work using an array instead of 'en' or whatever language I put.

Tim Davies

Oct 24, 2014, 4:36:39 AM
to publi...@webfoundation.org
Hello Daniel,

The approach we've been working with for language, which will be fully detailed in the updated documentation currently in progress, is to suggest:
  • The 'language' field at the top of a release or record should contain the language code of the default language for titles, descriptions etc. in the file.
  • All descriptive text fields can then have variations for other languages, using the format {fieldname}_{language_code}.

For example, if "language":"es", then "title":"Contratos Abiertos" should be interpreted by a consuming application as Spanish, and you would use "title_en":"Open Contracting", "title_fr":... and so on to express the other language variations.

However, as far as I'm aware we've not heavily tested files like this against the validator yet, and we have to use a feature of JSON Schema draft 4 (patternProperties) to support validation, so it is possible that files may not validate perfectly right now.
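
As a rough illustration of how a consuming application might read fields named this way, here is a minimal Python sketch. The helper names `localized` and `variants` are hypothetical, and the regular expression assumes plain two-letter lowercase codes only; it is not the pattern used in the OCDS schema.

```python
import re


def localized(release, field, lang):
    """Return the text of `field` in `lang`, or None if it is not published."""
    # The unsuffixed field is in the default language named by "language".
    if release.get("language") == lang and field in release:
        return release[field]
    # Any other language lives in {fieldname}_{language_code}.
    return release.get(f"{field}_{lang}")


def variants(release, field):
    """Map language code -> text for every suffixed variant of `field`."""
    # Assumption: two-letter lowercase codes, e.g. "title_en".
    pattern = re.compile(rf"^{re.escape(field)}_([a-z]{{2}})$")
    found = {}
    for key, value in release.items():
        match = pattern.match(key)
        if match:
            found[match.group(1)] = value
    return found


release = {
    "language": "es",
    "title": "Contratos Abiertos",
    "title_en": "Open Contracting",
}

print(localized(release, "title", "es"))  # Contratos Abiertos
print(localized(release, "title", "en"))  # Open Contracting
print(variants(release, "title"))         # {'en': 'Open Contracting'}
```

A language with no published variation simply yields None here, so the caller decides whether to fall back to the default language.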

There are also currently a few open issues at https://github.com/open-contracting/standard/issues/21 and https://github.com/open-contracting/standard/issues/48 discussing aspects of multilingual publishing, in particular whether we should extend from just two-letter language codes to also allow extended language codes (e.g. en_GB and en_US rather than just en, although this is intended more for languages other than English where a publisher might have local dialect versions of data).

If you are finding the validator isn't working for you right now (it should next be updated around 10th November when the next schema revisions are made) then do feel free to share draft content for community feedback - as seeing how the draft standard is being used is extremely helpful for us in making revisions towards the Release Candidate.

All the best

Tim


--
You received this message because you are subscribed to the Google Groups "Public OCDS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to public-ocds...@webfoundation.org.
To post to this group, send email to publi...@webfoundation.org.
Visit this group at http://groups.google.com/a/webfoundation.org/group/public-ocds/.
To view this discussion on the web visit https://groups.google.com/a/webfoundation.org/d/msgid/public-ocds/91727e3c-0ed6-42f5-8fa2-3679ac3918b6%40webfoundation.org.



--
-- 
Tim Davies
Research Coordinator, Open Data Research Network
@timdavies | @odrnetwork | www.opendataresearch.org 

World Wide Web Foundation | 1110 Vermont Ave NW, Suite 500, Washington DC 20005, USA | www.webfoundation.org | Twitter: @webfoundation


Jamon Camisso

Nov 4, 2014, 10:41:16 AM
to publi...@webfoundation.org
This is an interesting issue - what if there is no default language? For example, English and French are both official/default languages in Canada. Is it preferable then to set language to null in order to not privilege one language over others? A tender in this instance would essentially consist of two separate tenders, one in English and one in French, bound by their common ID.

Is this approach reasonable? Does anything scream, 'No, you're doing it wrong!'?

Cheers, Jamon

Tim Davies

Nov 4, 2014, 11:22:33 AM
to publi...@webfoundation.org
Hello Jamon,

That's an interesting point. I can understand that politically in some contexts there cannot be a 'default' language.

Publishing two separate versions of a file bound by a common OCID would not be good, as merging them into a record could overwrite the differing title and description values.
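
To make the overwriting concern concrete, here is a small Python sketch. It assumes a naive "later value wins" merge, which is only a stand-in and not the actual OCDS merge routine, and the OCID value is made up for illustration.

```python
def naive_merge(releases):
    """Merge releases in order; later values overwrite earlier ones.

    Assumption: a simple last-value-wins merge, used only to illustrate
    the problem. It is not the OCDS record merging algorithm.
    """
    record = {}
    for release in releases:
        record.update(release)
    return record


# Two separate releases sharing one (hypothetical) OCID.
english = {"ocid": "ocds-example-001", "language": "en", "title": "Open Contracting"}
french = {"ocid": "ocds-example-001", "language": "fr", "title": "Contrats ouverts"}

merged = naive_merge([english, french])
print(merged["title"])  # Contrats ouverts

# A single release with suffixed fields keeps both titles after merging.
bilingual = {
    "ocid": "ocds-example-001",
    "language": None,
    "title_en": "Open Contracting",
    "title_fr": "Contrats ouverts",
}
merged_bilingual = naive_merge([bilingual])
print(merged_bilingual["title_en"], "/", merged_bilingual["title_fr"])
```

Under this assumption the English title is lost in the first case, because both releases write to the same 'title' key.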

I *think* the best approach in this context may be for us to allow the default language to be null, in which case only the extended language fields would be used (e.g. 'title_en' and 'title_fr' in the Canadian case, with no 'title').

This makes me wonder whether it would also be useful to ask publishers to specify an array of the languages their files contain, so consuming applications can easily work out the title/language strings they should use.

All the best

Tim



Jamon Camisso

Nov 4, 2014, 2:43:55 PM
to publi...@webfoundation.org
Thanks for the feedback! I've thought about this a little more and wonder if it could be taken a step further to allow for all three language use cases. I'll outline them here to make sure my understanding is correct and then show an example:

  1. Single language, say, ES. It is the default language.
  2. Three languages, say, default ES, then EN, and FR.
  3. Two languages, no default, EN and FR.

In all three cases, the languages could be an array, even of length 1 in the first case.

Now a title can look like this:

{
    "title": {"es": "título aquí"}
}

Or this:

{
    "title": {
        "en": "Title here",
        "fr": "Titre ici"
    }
}

This way the default language can still be used in a release that uses it, while also allowing for equality between languages. In both cases the same schema is used and thus the same parsing or rendering methods can be applied.

Moreover, rendering a French version of a tender, say, would just require pulling out all individual (untranslated) fields, along with all {"fr":"foo"} elements from those fields that are marked as translatable.

This also allows flattening using the aforementioned title_en or title_es methods, for up to n languages inside the title field.
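
A possible sketch of that flattening step in Python is below. The function name `flatten_language_maps` and the `translatable` set are hypothetical, introduced only for illustration; nothing here is part of the standard.

```python
def flatten_language_maps(release, translatable, default_language=None):
    """Turn {"title": {"en": ..., "fr": ...}} into title_en / title_fr fields.

    `translatable` is the set of field names that hold language maps.
    If `default_language` is given, that language keeps the plain field name.
    """
    flat = {}
    for key, value in release.items():
        if key in translatable and isinstance(value, dict):
            for lang, text in value.items():
                name = key if lang == default_language else f"{key}_{lang}"
                flat[name] = text
        else:
            # Untranslated fields are copied through unchanged.
            flat[key] = value
    return flat


no_default = {"id": "1", "title": {"en": "Title here", "fr": "Titre ici"}}
print(flatten_language_maps(no_default, {"title"}))
# {'id': '1', 'title_en': 'Title here', 'title_fr': 'Titre ici'}

spanish = {"id": "2", "title": {"es": "título aquí"}}
print(flatten_language_maps(spanish, {"title"}, default_language="es"))
# {'id': '2', 'title': 'título aquí'}
```

The same function covers all three cases listed above, since the only difference between them is whether a default language is passed in.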

The more I think about it, the more I wonder if this method could be used for every field in the entire schema that can be translated? Thoughts?

Cheers, Jamon

Tim Davies

Nov 4, 2014, 3:47:46 PM
to publi...@webfoundation.org
Hello Jamon

We've just been chatting through this with the tech team.

We did look early on at using a language map like you suggest, but felt that it added quite a lot of extra complexity to the schema, both for the majority of publishers and users who are dealing with single-language cases and for flattening out data. So we are fairly settled on the _en, _fr, _es approach right now (which can be used with any number of languages).

We definitely want to have a consistent approach across all translated fields.

I have proposed in the issue queue that for now we add the array of languages used to the head of each release, so that consuming tools can easily work out which language variations to look for, and that for the political cases where there is no default language there can be a null language.
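
For illustration only, a consuming tool might use such an array along these lines. The field name 'languages' is an assumption (the proposal does not fix a name here), and `pick_text` is a hypothetical helper.

```python
def pick_text(release, field, preferred):
    """Return (language, text) for the first preferred language published.

    Assumptions: the release lists its languages in a "languages" array
    (hypothetical field name) and "language" may be None when there is
    no default language.
    """
    default = release.get("language")
    available = release.get("languages", [])
    for lang in preferred:
        if lang not in available:
            continue
        # The default language uses the plain field; others use a suffix.
        key = field if lang == default else f"{field}_{lang}"
        if key in release:
            return lang, release[key]
    return None


canadian = {
    "language": None,
    "languages": ["en", "fr"],
    "title_en": "Title here",
    "title_fr": "Titre ici",
}

print(pick_text(canadian, "title", ["fr", "en"]))  # ('fr', 'Titre ici')
print(pick_text(canadian, "title", ["es"]))        # None
```

With the array available, the tool never has to scan every key for suffixes to discover which language variations exist.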

A larger change to language would be tricky to make right now, but we are certainly open to ongoing review of this based on early implementations.

Hope this helps...

All the best

Tim







