attachment data structure


jamon...@gmail.com

Nov 3, 2014, 4:44:36 PM
to publi...@webfoundation.org
There is no place to add metadata beyond basic description, URL, and date information. There are many other attributes an attachment might have, like MIME type, size, language, etc. The attachment type should be able to represent an unknown metadata field without requiring a data type for each kind of name/value pair that might occur.

I would very much like to add the following to the attachment structure:
       
{
  "type": "object",
  "title": "Attachment",
  "properties": {
    "description": {
      "description": "A description of the document.",
      "type": ["string", "null"]
    },
    "metadata": {
      "description": "Additional name/value paired information about the document",
      "type": "array",
      "items": {
        "$ref": "#/definitions/metadata"
      }
    },
    "uri": {
      "description": "Link to the document or attachment.",
      "type": ["string", "null"],
      "format": "uri"
    },
    "lastModified": {
      "description": "Date that the document was last modified",
      "type": ["string", "null"],
      "format": "date-time"
    }
  },
  "patternProperties": {
    "^(description_[A-Za-z]{2})$": {
      "type": ["string", "null"]
    }
  }
}

Then add the 'metadata' type to the list of available types:

"metadata": {
  "description": "A name/value pair describing an attribute of a document.",
  "type": "object"
}
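A minimal, stdlib-only sketch of what a consumer might check against the proposed Attachment structure. It hand-rolls only the type checks (the `format` keywords are not checked), and it assumes, beyond the schema text, that each `metadata` item holds exactly one name/value pair, as the definition's description suggests. The function name `attachment_errors` is hypothetical; in practice a full JSON Schema validator would more likely be used.

```python
import re

# Mirrors the proposed patternProperties key for language-specific descriptions.
_LANG_DESCRIPTION = re.compile(r"^(description_[A-Za-z]{2})$")


def _string_or_null(value):
    return value is None or isinstance(value, str)


def attachment_errors(attachment):
    """Return a list of problems in an attachment dict (empty if none found).

    Hypothetical hand-rolled check of the proposed schema; it does not
    validate "format": "uri" or "format": "date-time".
    """
    errors = []
    for key in ("description", "uri", "lastModified"):
        if key in attachment and not _string_or_null(attachment[key]):
            errors.append(f"{key} must be a string or null")

    metadata = attachment.get("metadata", [])
    if not isinstance(metadata, list):
        errors.append("metadata must be an array")
    else:
        for i, pair in enumerate(metadata):
            # Assumption: each item is an object holding a single name/value pair.
            if not isinstance(pair, dict) or len(pair) != 1:
                errors.append(f"metadata[{i}] must be a single name/value pair")

    for key, value in attachment.items():
        if _LANG_DESCRIPTION.match(key) and not _string_or_null(value):
            errors.append(f"{key} must be a string or null")
    return errors
```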

Full disclosure: I have no understanding of the processes, issues, stakeholders and intended constraints involved in formalizing the schema. But I definitely need a way to add meaningful, non-string data to a release in a set of fields that match my data. Storing such data inside the description field as a string is far too error-prone, and it does not make the structure within the string evident to a user upon parsing.

I am aware of the intent to ensure easy mapping to flat data structures like CSV, and to that end I opted to use the array type so that each object inside the array can map to the parent type. For example, the key name in the metadata field could be used to generate a column in a CSV file:

Given a set of pairs in the hypothetical metadata list:

[
  {"mime-type": "application/json"},
  {"size": 123456},
  {"language": "en"}
]

The corresponding CSV columns could be generated by prefixing the name of each element: metadata_mime-type, metadata_size and metadata_language in this example.
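The column-naming idea above can be sketched with Python's standard `csv` module. The helper names are hypothetical and the `metadata_` prefix follows the example in this message; first-seen column order and later duplicate names overwriting earlier ones are assumptions of this sketch.

```python
import csv
import io


def flatten_metadata(metadata, prefix="metadata_"):
    """Turn [{"name": value}, ...] into {"metadata_name": value, ...}."""
    row = {}
    for pair in metadata:
        for name, value in pair.items():
            row[prefix + name] = value  # a repeated name overwrites the earlier value
    return row


def attachments_to_csv(attachments):
    """Write one CSV row per attachment, with metadata pairs as extra columns."""
    rows, columns = [], []
    for attachment in attachments:
        row = {k: v for k, v in attachment.items() if k != "metadata"}
        row.update(flatten_metadata(attachment.get("metadata", [])))
        for column in row:
            if column not in columns:
                columns.append(column)  # keep first-seen column order
        rows.append(row)
    out = io.StringIO()
    # Missing columns in a row are written as empty strings (DictWriter's restval).
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

With the three pairs from the example above, `flatten_metadata` produces the keys metadata_mime-type, metadata_size and metadata_language.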

Does this make sense? Is there room for this? I'm sure people have strong opinions about making generic bins where data can end up getting ditched outside the more formal schema elements, but at the same time, the current structure is too rigid to fully encapsulate the data with which I work and makes me wary of wholeheartedly adopting the standard.

Myroslav Opyr

Nov 4, 2014, 7:49:16 AM
to publi...@webfoundation.org
Hi,

See the comments at https://github.com/open-contracting/standard/issues/17#issuecomment-60084340 - that should cover majority of use cases you'd raised.

Regards,

m.

--
You received this message because you are subscribed to the Google Groups "Public OCDS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to public-ocds...@webfoundation.org.
To post to this group, send email to publi...@webfoundation.org.
Visit this group at http://groups.google.com/a/webfoundation.org/group/public-ocds/.
To view this discussion on the web visit https://groups.google.com/a/webfoundation.org/d/msgid/public-ocds/986c11be-f8e9-46cf-9910-fb3014a11502%40webfoundation.org.



--
Myroslav Opyr   ▪   CTO   ▪    Quintagroup   ▪   +1.917.475.4725   ▪   http://quintagroup.com

Tim Davies

Nov 4, 2014, 10:05:18 AM
to publi...@webfoundation.org
Thanks Myroslav for the pointer to the GitHub issue.

As that notes, we hope to have an expanded attachment schema block in place by later this week.

At the moment it doesn't include a field for file size, but if you wanted to propose that on the issue it could certainly be taken into consideration.

The other element that might be useful, soon to be in the documentation, is the Conformance section at https://github.com/open-contracting/standard/issues/50#issuecomment-60893164

I think the preferred approach, if you find that a property you need is not handled by the standard, would be to add it as a direct property rather than a key-value map, so that it can be considered for future incorporation into the specification with a clear definition.

So, for example, if we don't end up with fileSize in the specification right now, you could still generate files that include it as an extension, and it could then be considered in the (yet to be fully defined) future change processes for the spec.
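To illustrate the difference Tim describes, here is a hypothetical helper that takes a document written with the key-value `metadata` list proposed earlier in the thread and promotes each pair to a direct property such as `fileSize`. `CORE_FIELDS` lists only the three fields from that proposal and stands in for whatever the specification actually defines; refusing to overwrite an existing field is a design choice of this sketch.

```python
# Hypothetical stand-in for the core Document fields; not the real OCDS list.
CORE_FIELDS = {"description", "uri", "lastModified"}


def promote_metadata(document):
    """Move each metadata name/value pair onto the document as a direct property.

    Raises ValueError rather than silently overwriting a core or existing field.
    """
    promoted = {k: v for k, v in document.items() if k != "metadata"}
    for pair in document.get("metadata", []):
        for name, value in pair.items():
            if name in CORE_FIELDS or name in promoted:
                raise ValueError(f"metadata name {name!r} clashes with an existing field")
            promoted[name] = value
    return promoted
```

Direct properties like this give each extension field a stable, documentable name, which is what makes later incorporation into the specification straightforward.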

Hope this makes sense.

Would love to hear more about the possible use case you have for the standard.

All best wishes

Tim




-- 
Tim Davies
Research Coordinator, Open Data Research Network
@timdavies | @odrnetwork | www.opendataresearch.org 

World Wide Web Foundation | 1110 Vermont Ave NW, Suite 500, Washington DC 20005, USA | www.webfoundation.org | Twitter: @webfoundation


Jamon Camisso

Nov 4, 2014, 10:28:23 PM
to publi...@webfoundation.org
Hmm, another use case/need. When grabbing data from CKAN repositories, I've run into issues with CSVs and byte order marks (BOMs), as well as general encoding issues. I would very much like to know the encoding of a file beforehand, say, with a simple string field called 'encoding'. That field could use any of the ISO8859-*, CP*, windows-*, KOI* and UTF-* encodings.

Then, for any release that declares an encoding, I wouldn't need to write a parser to fiddle with BOMs and guess the encoding.
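A sketch of the consumer side described here, assuming a hypothetical `encoding` field has already been read from the release and passed in as `declared_encoding`. When no encoding is declared it falls back to byte order mark detection and then to UTF-8, which is exactly the guesswork a declared field would remove. The function name `read_csv_bytes` is illustrative only.

```python
import codecs
import csv
import io

# Byte order marks, longest first: the UTF-32 LE mark begins with the UTF-16 LE one.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def read_csv_bytes(raw, declared_encoding=None):
    """Parse CSV bytes, trusting a declared encoding when one is given."""
    if declared_encoding:
        encoding = declared_encoding
    else:
        # Fallback: sniff a BOM, otherwise assume UTF-8.
        encoding = next((name for bom, name in _BOMS if raw.startswith(bom)), "utf-8")
    text = raw.decode(encoding)
    # e.g. a declared "utf-8" (rather than "utf-8-sig") leaves the BOM in the text.
    if text.startswith("\ufeff"):
        text = text[1:]
    return list(csv.reader(io.StringIO(text)))
```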

Would this be useful to anyone else?

Cheers, Jamon

Sarah Bird

Nov 5, 2014, 7:03:32 PM
to publi...@webfoundation.org
Issue #17 has now been fixed. Any additional fields desired, as Tim said, can be added as needed for a specific use case and proposed for addition.




--
sa...@aptivate.org
skype: birdsarah

Aptivate - Ethical IT for International Development
Aptivate | http://www.aptivate.org | Phone: +44 1223 967838
Citylife House, Sturton Street, Cambridge CB1 2QF

Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791.



Tim Davies

Nov 7, 2014, 3:37:36 AM
to publi...@webfoundation.org
Hello Jamon,

This is an interesting suggestion and I've opened an issue for it at https://github.com/open-contracting/standard/issues/166

As I understand it, this shouldn't be an issue in JSON, as it is natively Unicode, but I understand the issue for CSV.

We are envisaging that good publication of flattened OCDS data would happen inside a Data Package (http://data.okfn.org/doc/data-package) to carry the metadata, so perhaps the more natural place for the encoding information would be there, or, where CSV files are served directly, in a header sent with the file?
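Following these two suggestions, a consumer could take the encoding from a Data Package resource or from the HTTP `Content-Type` header. This sketch assumes the resource-level `encoding` property described in the Data Package specification and uses the standard library's `email.message.Message` to parse the header's `charset` parameter. The function names and the preference order (resource first, then header, then UTF-8) are assumptions, not something the thread settles.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Read the charset parameter from an HTTP Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns `default` if absent.
    return msg.get_content_charset(default)


def resource_encoding(resource, content_type=None, default="utf-8"):
    """Pick an encoding for a flattened CSV resource (hypothetical helper).

    Preference order assumed here: the resource's own "encoding" property,
    then the HTTP charset parameter, then `default`.
    """
    if resource.get("encoding"):
        return resource["encoding"]
    if content_type:
        return charset_from_content_type(content_type, default)
    return default
```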

Tim


Jamon Camisso

Nov 7, 2014, 9:40:56 AM
to publi...@webfoundation.org
Ah, I hadn't even considered the flattened CSV output. I was stuck thinking about Attachment/Document fields where this could be used (we have a lot of attachment metadata).

Say I'm parsing a Tender, and an Attachment/Document URI points to a CSV: what I was after is knowing the encoding of that referenced CSV beforehand, so that when it comes time to parse it I don't have to write an encoding-detection algorithm or use an extra library.

That said, being able to add fields that are not part of the schema to
the Document section might just be enough for that case.

Cheers, Jamon

Tim Davies

Nov 8, 2014, 7:58:35 AM
to publi...@webfoundation.org
Ah - sorry for misunderstanding your original request.

Yes: this could certainly be added as an extension of the attachment block.

We should have an example extension available soon to illustrate how to create these.
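The thread does not say how extensions will be expressed. One plausible mechanism, assumed here, is to write an extension as a JSON Merge Patch (RFC 7386) against the core schema, so that adding a field like the hypothetical `encoding` to the Document definition becomes a small patch document. The schema fragments used to exercise it are illustrative, not the published OCDS schema.

```python
import copy


def merge_patch(target, patch):
    """Apply a JSON Merge Patch (RFC 7386) without mutating either input."""
    if not isinstance(patch, dict):
        # Non-object patch values replace the target outright.
        return copy.deepcopy(patch)
    # Shallow copy: untouched members are shared but never modified.
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null in the patch deletes the member
        else:
            result[key] = merge_patch(result.get(key), value)
    return result
```

Under this scheme, setting a member to `null` in the patch removes it from the merged schema.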

Tim


