Deadline 2 weeks from now (just before New Year): submit paper

Jerven Bolleman

Dec 10, 2013, 3:42:21 PM

to fa...@googlegroups.com

Hi All,

I would like to give us all a Christmas present by submitting the paper on Friday the 27th.
Preferably to the Journal of Biomedical Semantics to be part of the hackathon series.

Please read the current state of the paper, especially if you are a co-author!
If you don't like that date or the current content of the paper, please speak up and open issues.

Something that needs to be stated more clearly in the paper is that faldo:Positions
are like coordinates on a map, the map being the sequence database records linked via the faldo:reference predicate.
How the map relates to the real world is an exercise left to the reader.

Now for the tricky part: author order. Currently Peter is last, I am first, and everyone else is ordered by first name.
However, that is not common and arguably not fair.
Options are:
- whoever pays goes first, then the rest alphabetically by last name;
- by number of git commits (tie breakers: issues opened, then last name);
- something else that people can be happy with (e.g. a discussion based on closeness to tenure).

For those who are at SWAT4LS, it sounds like a lot of nice things are happening!

Regards,
Jerven

Jerven Bolleman

Dec 10, 2013, 3:44:18 PM

to fa...@googlegroups.com

For ease of commenting, here is the paper in its current state.

locations.pdf

Michel Dumontier

Dec 10, 2013, 7:47:33 PM

to Jerven Bolleman, fa...@googlegroups.com

Hi,

Here are my comments on the introduction.

The first paragraph should talk about how parts of a molecule can exhibit particular functionality or be specifically involved in biological processes. Consider the importance of gene regulatory elements for control of the circadian rhythm, the SH3 domain in cytoskeletal control, or the phosphorylation of specific tyrosine residues in signal transduction and regulatory control. We therefore need to specify annotations at a finer level of granularity.

Next, talk about the various ways that people conceptualize sub-structure annotations, and the formats that have emerged as a means to represent and exchange these data computationally. There should be some description of the NCBI data model, which codified this important aspect and provides the means to read/write it in the GenBank flat file format as well as in ASN.1 (where parsers are generated from the specification).

http://www.ncbi.nlm.nih.gov/pubmed/9707929

http://www.ncbi.nlm.nih.gov/books/NBK7198/#ch_datamod.datamodel.data_model

Enumerate the considerations in a format specification (which are specifically addressed in our paper too; this should be reiterated in the discussion as it pertains to FALDO and potentially contrasts with others). There should then be some discussion of the evolution of other formats (presumably because they were "human" friendly, at the expense of almost everything else). Then discuss the bio libraries as a means to reduce programmer time by providing a common set of functions to work with arbitrary formats.

Still, the problem with ASN.1 and other languages used for biological sequence representation is that while they are machine-readable, they are not intrinsically machine-understandable. Enter the Semantic Web as a way to assign formal semantics that are machine-interpretable, arbitrarily extensible, and enable the linking of data. However, the integration of data *nevertheless* requires conformance to, or mapping to, a data model with a common terminology.

Present FALDO as a community-grown effort to provide an ontology-based specification for the representation of regions, their location on reference sequences, and their association with sequence features. Describe its basis and how we will demonstrate its utility, thereby addressing the issues of machine readability (a standard, web-friendly format), large-scale data integration, distributed query answering, and application development.

m.

--

Michel Dumontier

Associate Professor of Medicine (Biomedical Informatics), Stanford University

Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group

http://dumontierlab.com

Peter Cock

Dec 11, 2013, 5:59:19 AM

to Michel Dumontier, Jerven Bolleman, fa...@googlegroups.com

Hi Michel,

I agree we should add something about the NCBI data model in ASN.1.
Do you have write access to the repository, or could you suggest some
specific language to add or change?

Peter

Joachim Baran

Dec 11, 2013, 7:44:09 AM

to Peter Cock, Michel Dumontier, Jerven Bolleman, fa...@googlegroups.com

Hello,

I second Michel and Peter here.

I am going to go over the manuscript, update it for grammar, and fix some facts (e.g. licensing info).

Joachim

Jerven Bolleman

Dec 12, 2013, 6:35:30 AM

to Joachim Baran, Peter Cock, Michel Dumontier, fa...@googlegroups.com

Hi Everyone,

Thanks for replying. I have some UniProt issues to deal with this week, but
I am on holiday next week, when I will be happy to integrate changes for
those who do not have repo access.

If we discuss ASN.1 etc., we need to mention that the record-centric
view makes it difficult for scientists to link data to gain insights
into biology.

Regards,
Jerven

--
Jerven Bolleman
m...@jerven.eu

Robert Buels

Dec 20, 2013, 4:29:42 PM

to Jerven Bolleman, Joachim Baran, Peter Cock, Michel Dumontier, fa...@googlegroups.com

I just went over the first part of the paper (didn't get to the
discussion yet) and made a pull request with some edits, see:

https://github.com/JervenBolleman/FALDO-paper/pull/18

Jerven, you'll probably want to review each commit, I tried to separate
them by the nature of the change.

One issue that I couldn't resolve myself was that I didn't understand
the "Implementation -> Validating data encoded with FALDO" section. In
particular, how does the UniProtKB example demonstrate that the
constraints do not restrict users? If anything, it suggests the
opposite, doesn't it? Here's a link to the section:
https://github.com/JervenBolleman/FALDO-paper/blob/06876e5e750e1e62ff76216703f99bce3f245506/implementation.tex#L106

Jerven, could you clarify that?

Robert Buels
Lead Developer
JBrowse - http://jbrowse.org

Jerven Bolleman

Dec 22, 2013, 5:47:29 AM

to Robert Buels, Joachim Baran, Peter Cock, Michel Dumontier, fa...@googlegroups.com

Hi Robert,

Thanks for the edits. I am going to apply all of them. If I think
something should be different, I will make new commits/edits.

I will improve the validation section, probably by explaining that
validation is external to the format, i.e. validation is
application-specific, not format-specific.

Happy x-mas,
Jerven


Jerven Bolleman

Dec 23, 2013, 1:02:14 PM

to Robert Buels, Joachim Baran, Peter Cock, Michel Dumontier, fa...@googlegroups.com

Hi All,

Robert and Francesco's changes have been merged. Expect a small
content update from me tomorrow morning, integrating open issues
such as many of Michel's comments.

Regards,
Jerven


Joachim Baran

Dec 29, 2013, 1:29:44 PM

to Robert Buels, Jerven Bolleman, Michel Dumontier, Peter Cock, fa...@googlegroups.com

Hi!

I just sent out a pull request with a few of my changes.

I think the document still needs quite a bit of work, because I found some unresolved statements. For example, the implementation section mentions that FALDO has 14 classes, but only eight of them are explicitly explained (2x 4 classes). Some statements also need to be backed up. When talking about query efficiency, there should ideally be query speeds listed. Sentences such as "Its clear to anyone reading this paper that FALDO is considerably more verbose than existing formats." might need further explanation; it is not clear to me, because I do not know which existing formats we are comparing FALDO against.

Shall some of the outstanding tasks be delegated to individuals, so that we can speed this up?

Joachim