Submission details

Andrew Collins

Oct 5, 2018, 7:01:56 PM
to OGRDB Discussion
Hi William,

Your notes highlight the fact that submissions can include sequences from multiple individuals, but the Submission Details tab takes you to a page that amongst other things documents 'Ethnicity'. Should there be access to information on each of the individuals whose sequences make up the submission?

On a related matter, no doubt there will soon be familial studies, and identification of a polymorphism within different related individuals could strengthen an inference. Such information could be provided in the Notes section, but should there be fields available for such information? Is there any other personal information that people could find useful? Age? Sex? Or does this just open a can of worms, ethical and otherwise?

Andrew

wil...@lees.org.uk

Oct 6, 2018, 4:52:35 AM
to OGRDB Discussion
Thanks Andrew - good point. Another thing that's been at the back of my mind for some time is that 'ethnicity' is a good descriptor for human samples, but animal samples may need other fields.

One possibility would be to link to the NIH Biosample data. I've shown below the details of the record for an example Biosample. NIH provide a range of records tailored to different organisms, with
many optional fields. You can see what's available here: https://submit.ncbi.nlm.nih.gov/biosample/template/

On the plus side, relying on the Biosample record would mean that the information is held as far 'upstream' as possible: it comes from the original depositor, is disseminated to anyone looking at
the sequences, and the original author attests to NIH that privacy and ethical issues have been addressed. And we'd know that our data was structured in a way that was consistent with NIH.

On the minus side, it would align us more strongly with NIH. With publications, I thought it was important to retrieve some details so that the submitter could verify that the correct PMID had been
entered: it's so easy to mistype a long number. The same is also true of the accession numbers such as the Biosample ID, but I don't check those at all at the moment, because we reference NIH
in the submission guidelines as just one repository people can use, and I was worried about getting drawn into checking details against multiple repositories with different data representations,
interfaces and so on. It could be quite a bit of work. A compromise might be to encourage people to submit NIH records where available (given that records are exchanged between the major
repositories), do a good job of integrating against NIH, but allow people to submit details from another repository with less checking and information exchange if the samples were for some reason
not available in NIH.

William
