[Obi-protocol-application-branch] DNA sequencing

Philippe Rocca-Serra

Jul 27, 2009, 6:07:45 PM
to OBI Developers, Protocol App Branch
Hi Everyone,

As discussed with some of you during the ICBO conference, I have done some
review and work on sequencing.
It turns out that the existing restrictions were a bit off. More details
can be found in the log.

I have added DNA sequencing as a defined class. For now I have had to rely
on 'information content entity' as the specified output, in the absence of
a sequence (or read, for that matter) class, but these will soon be
available. I have also distinguished DNA sequencing by use of DNA ligase
from DNA sequencing by use of DNA polymerase.
The classes created were pyrosequencing, chain termination sequencing, and
SOLiD sequencing.

The classifier runs smoothly and, interestingly, places 'genotyping' (as
currently defined) as a kind of DNA sequencing, which is correct.
It also places PCR-SSCP assay as a kind of DNA sequencing, but this is down
to the fact that we don't yet have enough shades under 'information content
entity' (these are in the pipeline).

There is still a lot of work to carry out, but before going further I
wanted to give a heads-up. It would be nice if OBI could cover
sequencing technology better, since it is such a hot topic.

I'd now like to add various instruments and their suppliers (hence the
work on organization).
I will also need a number of materials (luciferase, various enzymes, and
chemicals with the role of reagent), roles/functions for primers and
adaptors, and processes related to clonal amplification and
library construction.

Finally, I will add metadata to the recently added classes.

All comments welcome.


It was great to see some of you at the ICBO meeting. I think it has been a
good meeting for OBI.

--
Philippe Rocca-Serra, PhD

Technical Coordinator
www.ebi.ac.uk/net-project

The European Bioinformatics Institute email: ro...@ebi.ac.uk
EMBL Outstation - Hinxton direct: +44 (0)1223 492 553
Wellcome Trust Genome Campus fax: +44 (0)1223 492 620
Cambridge CB10 1SD, UK room: A3-141
--


_______________________________________________
Obi-protocol-application-branch mailing list
Obi-protocol-ap...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/obi-protocol-application-branch

Alan Ruttenberg

Jul 27, 2009, 11:43:25 PM
to Philippe Rocca-Serra, OBI Developers, Protocol App Branch
On Mon, Jul 27, 2009 at 6:07 PM, Philippe Rocca-Serra<ro...@ebi.ac.uk> wrote:
> There is still a lot of work to carry out. But before going further, I
> wanted to give a heads up. It would be nice if  OBI could cover
> sequencing technology better since it is such a hot topic

Indeed. And thanks for getting this started. Not surprisingly, I have
some comments on the restrictions, meant to give an idea of where
I think some of the work lies.

I don't like the has_agent relation. It is very ambiguous and I've
been trying to get it removed from RO. I see that you want to use it
to distinguish sequencing by ligation from sequencing
by synthesis, but I think we need to find a different way. I will
give some thought to how.

The specified output, 'information content entity', is way too general.
As it stands, it could refer to a measurement of the mass of the
supplied DNA, for example, or a count of CpG islands, or behavioral
assessments of the technicians who do the work.
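
To make the over-generality concrete, here is a minimal sketch in Python.
It is a toy subsumption check, not an OWL reasoner, and every class name
in it (for example 'conformation polymorphism datum' and 'sequence read')
is an illustrative assumption rather than an existing OBI or IAO term. It
only shows why a defined class whose output filler is 'information content
entity' captures assays such as PCR-SSCP, while a narrower filler would not.

```python
# Toy is-a hierarchy (child -> parent). Names are illustrative assumptions,
# not actual OBI/IAO identifiers.
PARENT = {
    "sequence read": "information content entity",
    "conformation polymorphism datum": "information content entity",
    "DNA": "material entity",
}


def is_a(cls, ancestor):
    """True if cls equals ancestor or reaches it by following PARENT links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False


def subsumed_by(asserted, definition):
    """Both arguments map a relation name to a filler class.

    The asserted class falls under the defined class when every restriction
    in the definition is met by an equal or narrower filler.
    """
    return all(
        rel in asserted and is_a(asserted[rel], filler)
        for rel, filler in definition.items()
    )


# Hypothetical description of PCR-SSCP assay.
pcr_sscp = {
    "has_specified_input": "DNA",
    "has_specified_output": "conformation polymorphism datum",
}

# 'DNA sequencing' defined with the current, very general output filler ...
loose_def = {
    "has_specified_input": "DNA",
    "has_specified_output": "information content entity",
}
# ... and with a narrower (assumed) filler.
tight_def = {
    "has_specified_input": "DNA",
    "has_specified_output": "sequence read",
}

print(subsumed_by(pcr_sscp, loose_def))  # True: misclassified as sequencing
print(subsumed_by(pcr_sscp, tight_def))  # False: no longer captured
```

The same check with the narrower filler stops capturing mass measurements
or CpG island counts as well, since none of those outputs would sit under
the assumed 'sequence read' class.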

Definitions are missing. I know you say that the metadata is not complete,
but not having some indication of the scope of what you mean by the
terms makes it hard to offer precise suggestions for improving the
logical definitions. In particular, what are the boundaries of the
process? Are you thinking about one in which a vial of DNA is input
and a genome is output, or something at the chemical reaction level?
Is any kind of preparation included in these processes or not? Do
these processes include data transformations?

I believe that it should be added that these processes achieve planned
objective: sequence feature identification objective.

That's it for now.

Best,
Alan

> Obi-devel mailing list
> Obi-...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/obi-devel

Philippe Rocca-Serra

Jul 28, 2009, 12:30:38 AM
to Alan Ruttenberg, OBI Developers, Protocol App Branch
Hi Alan,

Thanks for the feedback, really helpful.

> I don't like the has_agent relation. It is very ambiguous and I've
> been trying to get it removed from RO. I see that you want to use it
> to make the difference between sequencing by ligation and sequencing
> by synthesis, but I think we need to find a different way. I will
> think some about how.
>

I used it on purpose to prompt a reaction. Essentially, what I wanted
to get at is the following:
Should the enzyme used in the sequencing reaction be described in the
same terms as the input DNA material?
I was looking at the definition of has_specified_input, and it says the
following:
"The continuant realizes specified_Input_Role for that process. In
general, not all participants present at the beginning of the process
are specified_inputs."
Initially I used the has_specified_input relation instead of has_agent,
then had second thoughts.
The DNA polymerase or DNA ligase does realize its role/function in a
sequencing process.
Side note:
I have used the continuants 'DNA ligase' and 'DNA polymerase
complex'. I am not too happy about the 'complex' part, so I was thinking
of MIREOTing a 'DNA polymerase'.

> The specified output: information content entity, is way too general.
> As it is it could refer to getting a measurement of the mass of the
> supplied dna, for example, or a count of cpg islands, or behavioral
> assessments of the technicians who do the work.
>

As I said, I was aware of this (I pointed to the classification of
PCR-SSCP assay as a consequence of 'information content entity' being too
general).
View these as placeholders that will be refined as more additions are
made to the information content entities.
There was also a technical reason: I wanted to confine all my additions
to PlanandPlannedProcesses.owl at the time of editing.


> Definitions are missing. I know you say that metadata is not complete,
> but not having some indication of the scope of what you mean by the
> terms makes it hard to offer precise suggestions for improving the
> logical definitions.

You are right, I will work on those today.


> In particular, what are the boundaries of the
> process - are you thinking about one in which a vial of dna is input
> and a genome is output, or something at the chemical reaction level.
> Is any kind of preparation included in these processes or not? Do
> these processes included data transformations?
>

I am with you here. I have very fine-grained representations for some of
the techniques, really drilling down. The work actually pointed to the
interesting issues you picked up on.
When describing new sequencing techniques, I have included references to
key steps, for instance using preceded_by 'immobilization' and
preceded_by 'amplification of a clone',
but I could not find a way to specify the order in which those steps
occur. Those steps make it possible to distinguish Helicos sequencing
(single molecule sequencing, no amplification needed) from Solexa or SOLiD
methods, where an emulsion PCR is used for amplification.
Also, for a number of techniques, a sequence of subprocesses is
repeated (introduction of reagent mix, enzymatic reaction, washing,
imaging, cleavage) in a cycle which is run over and over.
How can we describe this 'motif' and these cycles?
But more importantly, do we really need this level of detail? Hence the
scope issue. I've been conservative, choosing to insist on the input (a
library of genomic DNA fragments, which needs to be added) and the
output (possibly images, then sequence reads). On this side, I am
confident that the discussion in IAO and SO will give us what we need
with very good consistency.


> I believe that it should be added that these processes achieve planned
> objective: sequence feature identification objective.
>

+1. I overlooked this. Will add.

Bjoern pointed out that a range of information in the realm of genome
assembly might also require attention, so that we have the
capability to indicate redundancy, fold coverage, number of
contigs, and so forth.
This is information that matters to Dawn Field and the people behind
the Genomic Standards Consortium.


--
Philippe Rocca-Serra, PhD

Technical Coordinator
www.ebi.ac.uk/net-project

Frank Gibson

Jul 28, 2009, 6:20:36 AM
to Philippe Rocca-Serra, OBI Developers, Protocol App Branch
On Tue, Jul 28, 2009 at 5:30 AM, Philippe Rocca-Serra <ro...@ebi.ac.uk> wrote:
> Hi Alan,
>
> Thanks for the feedback. really helpful.
>
> > I don't like the has_agent relation. It is very ambiguous and I've
> > been trying to get it removed from RO. I see that you want to use it
> > to make the difference between sequencing by ligation and sequencing
> > by synthesis, but I think we need to find a different way. I will
> > think some about how.
>
> I've used it on purpose to prompt a reaction: Essentially what I wanted
> to get at it the following:
> Should the enzyme used in the sequencing reaction be described in the
> same terms as the input DNA material?


Yes, it is a specified_input

 

> I was looking at the definition of has_specific_input and it said the
> following:
> "The continuant realizes specified_Input_Role for that process. In
> general, not all participants present at the beginning of the process
> are specified_inputs."

The text actually needs changing: we decided to exclude the notion of roles. This was the reason we dropped the has_specified_data_input relation. We must have overlooked updating the text to something like "those entities specified as inputs in the plan/specification".

> Initially I used has_specific_input relation instead of has_agent, then
> had second thoughts.
> The DNA polymerase or DNA ligase does realize their role / function in a
> sequencing process.

Yes, what is the issue here? We have the function catalytic_activity.



 

> When describing new sequencing techniques, I have included references to
> key steps. For instance using preceded_by 'immobilization' and
> preceded_by 'amplification of a clone'
> but i could not find a way specify the order in which those steps would
> occur.

I am not sure I follow. If you say immobilization is preceded_by amplification, then you have said that amplification is 1 and immobilization is 2.

> Those steps allow to distinguish Helicos sequencing (single
> molecule sequencing no amplification needed) from Solexa or Solid
> methods where an emulsion PCR is used for amplification

They are two distinct processes, which could be represented as follows:

Helicos sequencing has_part single_molecule_sequencing

Solexa has_part single_molecule_sequencing preceded_by (has_part amplification)

(no guarantee that this actually reasons, though :)
 

> Also, for a number of techniques, a sequence of subprocesses are
> repeated (introduction of reagent mix, enzymatic reaction, washing,
> imaging, clivage in a cycle which a run over and over)
> How can we describe this 'motif' and these cycles ?

Create a defined class for this "motif" using has_part and preceded_by, then refer to it. Each time you repeat the "motif" in another process, I assume you are using the specified_output of the previous run, so you should be able to link the different motifs in sequence based on the "named" specified_output of the last run.
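
A minimal sketch of that linking idea, in Python rather than OWL. The class
name CycleRun, the helper functions, and the output labels are all
hypothetical and only illustrate the chaining: each run of the motif takes
the specified_output of the previous run as its specified_input.

```python
from dataclasses import dataclass

# The repeated "motif": one fixed sequence of subprocesses (from the thread).
CYCLE_STEPS = (
    "introduction of reagent mix",
    "enzymatic reaction",
    "washing",
    "imaging",
    "cleavage",
)


@dataclass
class CycleRun:
    """One run of the motif (hypothetical structure, not an OBI class)."""
    index: int
    specified_input: str
    specified_output: str
    steps: tuple = CYCLE_STEPS


def chain_cycles(start_material, n_cycles):
    """Build n_cycles runs, feeding each run's output into the next run."""
    runs = []
    current = start_material
    for i in range(1, n_cycles + 1):
        output = f"template after cycle {i}"  # illustrative "named" output
        runs.append(CycleRun(i, current, output))
        current = output
    return runs


def is_linked(runs):
    """True if every run consumes exactly what the previous run produced."""
    return all(
        earlier.specified_output == later.specified_input
        for earlier, later in zip(runs, runs[1:])
    )


runs = chain_cycles("immobilized library", 3)
for run in runs:
    print(run.index, run.specified_input, "->", run.specified_output)
print(is_linked(runs))
```

The order of the cycles is then recoverable from the input/output links
alone, without numbering the runs explicitly.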

 

> But more importantly, do we really need this level of detail ? hence the
> scope issue. I've been conservative choosing to insist on the input (a
> library of genomic DNA fragments, which needs to be added) and the
> output (possibly images, then sequence reads). on this side, I am
> confident that the discussion in IAO and SO will give us what we need
> with a very good consistency.

Just look at your use case. Your use case, and how it is used, determines your scope.


 

> > I believe that it should be added that these processes achieve planned
> > objective: sequence feature identification objective.
>
> +1. I overlooked this. will add.
>
> Bjoern pointed out that a range of information might also require
> attention in the realm of genome assembly in order to have the
> capability to indicate 'redundancy and fold coverage or number of
> contigs and so forth.
> These are information that matters to Dawn Field and the people behind
> the Genome Standard Consortium.

Can you present this information in the form of concrete case studies and use cases, please?

Frank



 



--
Frank Gibson, PhD
http://peanutbutter.wordpress.com/

Philippe Rocca-Serra

Jul 28, 2009, 7:14:16 AM
to Frank Gibson, OBI Developers, Protocol App Branch
Hi Frank

>
> yes, what is the issue here? We have the function catalytic_activity
>

The thing is that, based on the information I have collected from
manufacturers, I am pretty sure that DNA ligase (protein) is added (more
precisely, it should be T4 phage DNA ligase)
and that the DNA polymerase is added. A complex, as indicated by the
definition, comprises 2 or more subunits. We may want to set
restrictions on those classes to formally distinguish
protein complexes from the rest.
I think I am simply missing DNA polymerase in OBI at the moment: it may
be imported from... well, this is where it can be difficult to decide. Or,
just as for the current DNA ligase, we create a class in OBI, but I'd rather
not assert this in OBI since I feel it could live happily in another
resource and we should MIREOT it.


>
> I am not sure I follow. If you say immobilization is preceeded_by
> amplification then you have said that immobilization is 1 and
> amplification is 2.

I am only saying that sequencing is preceded_by 'immobilization' and
preceded_by 'amplification'. I guess I need to change the restriction to
preceded_by some ('immobilization' and (preceded_by some 'amplification'))
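
A small sketch of why the nested form pins down the order while the two
flat restrictions do not. This is plain Python, not OWL semantics: it
assumes each step occurs exactly once and simply enumerates the step
orders that each set of preceded_by constraints admits. The function
names are made up for the illustration.

```python
from itertools import permutations

STEPS = ("amplification", "immobilization", "sequencing")


def satisfies(order, constraints):
    """constraints holds (later, earlier) pairs: 'later preceded_by earlier'."""
    position = {step: i for i, step in enumerate(order)}
    return all(position[earlier] < position[later] for later, earlier in constraints)


def admitted(constraints):
    """All orderings of STEPS that are consistent with the constraints."""
    return [order for order in permutations(STEPS) if satisfies(order, constraints)]


# Two flat restrictions on sequencing: both steps come before it,
# but nothing is said about their order relative to each other.
flat = [("sequencing", "immobilization"), ("sequencing", "amplification")]

# Nested restriction: sequencing preceded_by (immobilization preceded_by amplification).
nested = [("sequencing", "immobilization"), ("immobilization", "amplification")]

print(admitted(flat))    # two orderings remain possible
print(admitted(nested))  # only amplification, immobilization, sequencing
```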


>
> Those steps allow to distinguish Helicos sequencing (single
> molecule sequencing no amplification needed) from Solexa or Solid
> methods where an emulsion PCR is used for amplification
>
>
> They are two distinct process which could be represented by the following
>
> Helicos sequencing has_part single_molecule_sequencing
>
> Solexa has_part single_molecule_sequencing preceeded_by (has_part
> amplification)
>
>
> (no guarantee that actually reasons though :)

I am currently adding the different libraries (paired end ditag library
or single fragment library) to the biomaterial branch and I will need to
add 'library construction' as a planned process.


>
> > I believe that it should be added that these processes achieve
> planned
> > objective: sequence feature identification objective.
> >
> +1. I overlooked this. will add.
>
> Bjoern pointed out that a range of information might also require
> attention in the realm of genome assembly in order to have the
> capability to indicate 'redundancy and fold coverage or number of
> contigs and so forth.
> These are information that matters to Dawn Field and the people behind
> the Genome Standard Consortium.
>
>
> Can you present this information in the form of concrete case-studies
> and use-case please.

thanks for the input

P


Frank Gibson

Jul 28, 2009, 7:42:49 AM
to Philippe Rocca-Serra, OBI Developers, Protocol App Branch
On Tue, Jul 28, 2009 at 12:14 PM, Philippe Rocca-Serra <ro...@ebi.ac.uk> wrote:
> Hi Frank
>
> > yes, what is the issue here? We have the function catalytic_activity
>
> The thing is that, based on the information I have collected from
> manufacturers, I am pretty sure that DNA ligase (protein) is added (more
> precisely it should be T4 phage DNA ligase))
> and the DNA polymerase is added. A Complex as indicated by the
> definition is comprised of 2 or more subunits. We may want to set
> restrictions on those classes to formally distinguish
> protein complex from the rest.
> I think I am simply missing DNA polymerase in OBI at the moment,: it may
> be imported from...well this is where it can be difficult to decide. or
> just as for current DNA ligase, we create a class in OBI but I 'd rather
> not assert this in OBI since I feel it could live happily in another
> resource and we should mireot it.

DNA polymerase exists in GO, so we should MIREOT it

 




> > I am not sure I follow. If you say immobilization is preceeded_by
> > amplification then you have said that immobilization is 1 and
> > amplification is 2.
>
> I am only saying that Sequencing is 'preceded_by immobilization' and
> is_preceded_by 'amplification'. I guess I need to change to restriction to
> preceded_by some ( 'immobilization' preceded_by some 'amplification')

I think you need to include the part_of relation here (has_part and preceded_by).

Frank





Bjoern Peters

Jul 28, 2009, 12:02:47 PM
to Frank Gibson, OBI Developers, Protocol App Branch
- DNA polymerase is already imported under protein complex, from GO as suggested.
- Agree with Frank that the 'has specified ... ' relations should have been cleaned up and not refer to roles any more. I created a ticket for Larissa.
- I thought at the workshop we wanted to create the 'has specified participant' relation, which was to be used for reagents / instruments etc. which are not necessarily present at the start of the process, and more importantly aren't the things transformed into outputs.



Philippe Rocca-Serra

Jul 28, 2009, 12:31:45 PM
to OBI Developers, Protocol App Branch
Bjoern Peters wrote:
> - DNA polymerase is already imported under protein complex. From GO as suggested
>
What is imported is 'DNA polymerase complex', which would mean that
there are parts which may be specified.
Taq polymerase is a DNA polymerase with DNA polymerase activity but I
don't think it is a DNA polymerase complex (as currently defined by OBI
definition of a complex).
We could potentially refer to the Klenow Fragment of DNA polymerase I,
which again is not a complex. This is what I was trying to point at.

I guess using 'DNA polymerase or DNA polymerase complex' on the
restriction would solve the problem.

> - Agree with Frank that the 'has specified ... ' relations should have been cleaned up and not refer to roles any more. I created a ticket for Larissa.
> - I thought at the workshop we wanted to create the 'has specified participant' relation, which was to be used for reagents / instruments etc. which are not necessarily present at the start of the process, and more importantly aren't the things transformed into outputs.
>

This is still fine for the time being. I will use those relations as
placeholders and carry out the fixes as soon as the more refined
relations are phased in.


cheers

Philippe


--
Philippe Rocca-Serra, PhD

Technical Coordinator
www.ebi.ac.uk/net-project


Bjoern Peters

Jul 28, 2009, 3:40:22 PM
to Philippe Rocca-Serra, OBI Developers, Protocol App Branch
Ah, I misunderstood your point about complex. I think that means we should request the term DNA polymerase from PRO + all the subclasses you need.


