How to encode multiple values


Ramon

Dec 3, 2012, 8:39:35 PM
to isaf...@googlegroups.com
Hi --

Can someone please clarify how we should instruct submitters to provide multiple values for a field? For example, we would like to encode a Characteristics[Phenotype] entry in the Study file for samples that could have multiple values. Should these be semicolon-delimited in a single column, or appear as multiple columns?

Thanks,

Ramon

Philippe

Dec 4, 2012, 5:09:30 AM
to isaf...@googlegroups.com, Ramon
Hello Ramon,


Multiple values can be provided, using a semicolon as the separator, in
some fields of the investigation file (for instance when reporting the
roles played by a Person or the Parameters associated with a Protocol).
This is owing to the particular layout of the investigation spreadsheet.
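
As an illustration only (not part of any ISA tool), here is a minimal Python sketch of how a consumer could split such a semicolon-separated investigation-file cell. The function name, the sample row and the role values are assumptions made up for this example.

```python
from typing import List


def split_multi_value(cell: str, sep: str = ";") -> List[str]:
    """Split one investigation-file cell into its individual values."""
    # Drop surrounding whitespace and ignore empty fragments (e.g. a trailing ';').
    return [part.strip() for part in cell.split(sep) if part.strip()]


# Hypothetical investigation-file row: a label followed by one cell per person.
row = "Study Person Roles\tsubmitter;investigator\tcurator"
label, *cells = row.split("\t")
roles_per_person = [split_multi_value(cell) for cell in cells]
print(label, roles_per_person)
```

Each cell stays attached to its column (one person per column), so the separator only has to express the one-to-many relation inside a single cell.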

For study and assay tables, however, the syntax does not specify how to
do this, and any string found in a cell is currently treated as the single
value associated with the header.
Extra code would be required to support such representation conventions.
We are somewhat hitting the limits of what can be done with a tabular
representation. I will give 2 examples (see attached spreadsheet).

1. Phenotype example:
Here, it depends on your needs in terms of granularity and recall once
the information is persisted.
Phenotype is a broad category. Should it be coded as 'Characteristics',
the values could be reported as a semicolon-separated list.
I have represented 4 different ways of coding, the ideal one being
representation 4. The 'cost' is an increase in the number of fields;
the 'benefit' is an unambiguous representation.
Representations 1 or 2 would be more compact but hard for users to
produce accurately without software support for data entry/rendering.

2. Representing mixtures:
This is closer to a case where I would see a need for reporting a
one-to-many relation in one single cell.
It is frequent to treat patients or study subjects not with one single
compound but with a cocktail of drugs.

Representation 1 would be possible in theory but is not supported by our
tools.

Representation 2 is possible and is supported, but the semantics are
unclear: is it a series of individual treatments or a mixture? (I omitted
the amounts from the representation.)

So we are working on clarifying the specifications and possibly devising new
structures to facilitate representation.
The constraint here is to retain consistency with the current representation.
> --
> --
>
> You received this message because you are subscribed to the Google
> Groups "ISAforum" group.
> To post to this group, send email to isaf...@googlegroups.com
> To unsubscribe from this group, send email to
> isaforum+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/isaforum?hl=en-GB
>
> Visit the ISAtools website at http://isa-tools.org

ISA-TAB coding-multiple values.xlsx

Bob MacCallum

Dec 4, 2012, 7:29:20 AM
to isaf...@googlegroups.com, Ramon
Philippe's examples are useful and interesting.

The "Characteristics [short name(purl identifier)]" form is new to me. Is there any advantage of a purl over ONTO:acc (e.g. EFO:0012345)?
(And although I can see the PURL page on Wikipedia and have occasionally seen them on BioPortal, what exactly is a purl?)

I had personally converged on using "Characteristics [ONTO:name]" (e.g. "Characteristics [EFO:cell type]") because it has the advantage of human-readability and as long as there are no typos it should be possible to retrieve the correct ontology term in software.  Unless you have a crazy "ontology" like GAZ with multiple terms with the same name (but you wouldn't use geo place terms as a column heading anyway).

cheers,
Bob.

Philippe

Dec 4, 2012, 9:13:08 AM
to isaf...@googlegroups.com, Bob MacCallum, Ramon
Hi Bob,


The problem with the following representation,
"Characteristics [ONTO:name]" (e.g. "Characteristics [EFO:cell type]"),
is the loss of information: we know the term is somehow anchored to an
ontology, but we cannot reliably perform the lookup because we are
missing the associated identifier.
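
To make the difference concrete, here is a rough Python sketch of parsing both header styles. It is a guess at a reasonable parser, not the one used by ISAcreator or ISAvalidator; the regular expression, the function name and the example.org URI are all assumptions for illustration.

```python
import re
from typing import Optional, Tuple

# Assumed header shape: "Characteristics [short name(identifier)]",
# with the "(identifier)" part optional.
HEADER_RE = re.compile(
    r"^(?P<kind>Characteristics|Factor Value|Parameter Value)\s*"
    r"\[(?P<name>[^\[\]()]+?)\s*(?:\((?P<uri>[^()]+)\))?\]$"
)


def parse_header(header: str) -> Tuple[str, str, Optional[str]]:
    """Return (kind, term name, identifier or None) for a column header."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"unrecognised column header: {header!r}")
    return match.group("kind"), match.group("name").strip(), match.group("uri")


# Hypothetical identifier: a lookup can go straight to the resource.
print(parse_header("Characteristics [organism(http://purl.example.org/obo/EX_0000001)]"))
# Name only: the identifier is None, so software must search by label.
print(parse_header("Characteristics [EFO:cell type]"))
```

With the name-only form, any lookup has to rely on matching the label within the named ontology, which is exactly where typos or duplicate term names cause trouble.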

Now, I used 'purl identifier' as shorthand, and it is probably slightly
inaccurate.
I have used it for 2 reasons:
i. to indicate that it is possible to provide a resource
identifier to qualify the term supplied between the square brackets
ii. to highlight the future direction of developments.
As you may know, efforts are under way to convert ISA-Tab formatted
information to RDF and OWL. There are currently 2 GitHub repositories:
- ISA2RDF, developed by Nina Jeliazkova from the ToxBank project
- ISA2OWL, being developed by Alejandra Gonzalez-Beltran
For both projects, having persistent and resolvable URLs would be better,
as they would allow navigation to broader
I have used purl as an example; we are not prescriptive, and we would be
happy with identifier.org identifiers.
Also, I have used purls as this is what we can obtain very easily from the
BioPortal service.

Regarding GAZ, it actually would be nice to have resolvable
URIs for the terms. Purls exist but don't resolve; this will probably
change in the near future. This could be useful for environmental
genomics studies.

Thanks again.

Philippe

Bob MacCallum

Dec 4, 2012, 11:09:19 AM
to isaf...@googlegroups.com
Cool, thanks for the explanation.  Does the latest and greatest ISA-Creator put the purl identifier into the "Characteristics [...]" heading automatically? (Sorry, I haven't checked the last few versions.)
 

Philippe

Dec 5, 2012, 5:52:11 AM
to isaf...@googlegroups.com, Bob MacCallum
Hello Bob,

>
> Cool, thanks for the explanation. Does the latest and greatest
> ISA-Creator put the purl identifier into the "Characteristics [...]"
> heading automatically? (Sorry, I haven't checked the last few versions.)
>

:) In fact, this has been supported since ISAcreator 1.6 and ISAvalidator 1.5.
I have attached a screenshot: the organism characteristic is reported with an
associated URI (second column) or as a simple string (third column).

all the best

P.




Screen shot 2012-12-05 at 10.42.10.png

Mathias Kuhring

Jul 17, 2018, 9:09:33 AM
to ISAforum
Hey Philippe,

Sorry to dig up this old thread, but we also got stuck on the question of whether we can indicate multiple ontology terms (their number not known in advance) in one field.

CASE2-representation1 seems like a reasonable solution to us:
Sample Name    Factor Value[chemical mix]    Term Source REF    Term Accession Number
s1    aspirin;vitamin C;ephedrine    CHEBI;CHEBI;CHEBI    chebi Id;chebi Id;chebi Id

You mentioned that:
> representation 1 would be possible in theory but is not supported by our
> tools.

Does this mean it is allowed by the specification, so that we could implement it when we set up documents manually or with our own code/software?
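
For reference, this is roughly what reading such a row with our own code might look like. It is only a sketch of a custom convention (parallel semicolon-separated lists across the three cells), not something the ISA tools are said to support in this thread; the function name is made up, and "chebi Id" is the placeholder from the example row above.

```python
from typing import List, Tuple


def expand_parallel_cells(
    names: str, sources: str, accessions: str, sep: str = ";"
) -> List[Tuple[str, str, str]]:
    """Zip three parallel multi-value cells into (name, source, accession) triples."""
    columns = [
        [value.strip() for value in cell.split(sep)]
        for cell in (names, sources, accessions)
    ]
    # The convention only works if all three cells carry the same number of values.
    if len({len(column) for column in columns}) != 1:
        raise ValueError("cells do not contain the same number of values")
    return list(zip(*columns))


terms = expand_parallel_cells(
    "aspirin;vitamin C;ephedrine",
    "CHEBI;CHEBI;CHEBI",
    "chebi Id;chebi Id;chebi Id",
)
for name, source, accession in terms:
    print(name, source, accession)
```

The length check is the important part: positions are the only thing tying a term to its source and accession, so a missing value in any cell would silently shift the mapping.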

Best, Mathias