Here are my comments on the introduction.
The first paragraph should talk about how parts of a molecule can exhibit particular functionality or be specifically involved in biological processes. Consider the importance of gene regulatory elements for control of the circadian rhythm, the SH3 domain in cytoskeletal control, or phosphorylation of specific tyrosine residues in signal transduction and regulatory control. We therefore need to specify annotations at a finer level of granularity.
Next, talk about the various ways that people conceptualize sub-structure annotations, and the formats that have emerged as a means to represent and exchange these data computationally. There should be some description of the NCBI data model, which codified this important aspect and provides the means to read and write it in the GenBank flat file format as well as in ASN.1 (for which parsers are generated from the specification).
Enumerate the considerations in a format specification (which are specifically addressed in our paper too; this should be reiterated in the discussion as it pertains to FALDO and potentially contrasts with others). There should then be some discussion of the evolution of other formats (presumably because they were "human" friendly, at the expense of almost everything else). Then discuss the bio libraries as a means to reduce programmer time by providing a common set of functions to work with arbitrary formats.
Still, the problem with ASN.1 and other languages used for biological sequence representation is that while machine-readable, they are not intrinsically machine-understandable. Enter the semantic web as a way to assign formal semantics that are machine interpretable and arbitrarily extensible, and that enable linking of data. However, the integration of data *nevertheless* requires conforming to, or mapping to, a data model with a common terminology.
Present FALDO as a community-grown effort to provide an ontology-based specification for the representation of regions, their location on reference sequences, and their association with sequence features. Describe its basis and how we will demonstrate its utility, thereby addressing the issues of machine readability (standard, web-friendly format), large-scale data integration, distributed query answering, and application development.
m.