xml vs. simpler formats like csv

Joseph Picone

May 11, 2020, 10:56:49 AM
to nedc_tuh_eeg
We would like to hear your thoughts on the best way to distribute
annotation information.

We currently use a simple ASCII format that is very close to a csv
format. Its advantages are that it is easy to process using simple
scripts and Linux command line tools.

An alternative is xml. It is a more powerful representation, but it adds
complexity.

We have used xml in speech recognition research for many years.
Advantages include being able to leverage a large amount of software
that supports xml. The disadvantage is that it requires more programming
expertise than you typically find in small machine learning research
groups. We are concerned this would make the data less accessible to
our customers.
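
To make the tradeoff concrete, here is a minimal sketch that reads the same single annotation from a csv line and from an xml element using only the Python standard library. The field names (channel, start, stop, label) and the `<annotations>`/`<event>` element names are assumptions chosen for illustration; they are not the actual TUSZ annotation format.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical annotation record: channel, start (s), stop (s), label.
# These names are illustrative only, not the TUSZ format.
CSV_TEXT = "channel,start,stop,label\nFP1-F7,12.5,30.0,seiz\n"
XML_TEXT = (
    "<annotations>"
    '<event channel="FP1-F7" start="12.5" stop="30.0" label="seiz"/>'
    "</annotations>"
)


def read_csv(text):
    """Parse csv rows into (channel, start, stop, label) tuples."""
    return [
        (row["channel"], float(row["start"]), float(row["stop"]), row["label"])
        for row in csv.DictReader(io.StringIO(text))
    ]


def read_xml(text):
    """Parse <event> elements into (channel, start, stop, label) tuples."""
    root = ET.fromstring(text)
    return [
        (e.get("channel"), float(e.get("start")), float(e.get("stop")), e.get("label"))
        for e in root.iter("event")
    ]


csv_events = read_csv(CSV_TEXT)
xml_events = read_xml(XML_TEXT)
```

For a flat record like this one, both readers are about the same size. The csv version also works with command line tools such as cut and awk, which is harder to do reliably with xml.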

So, today's question is: would you prefer we distribute annotation
information in an xml or csv format?

Thanks in advance,

-Joe Picone

P.S. v1.5.2 of TUSZ will be released next week.

Joseph Picone

May 11, 2020, 1:35:04 PM
to nedc_tuh_eeg
Thanks to all who responded so quickly. A digest of the responses
(anonymized) is attached.

One of the reasons we asked this question is related to another research
project we are working on - digital pathology (DPATH). In this space,
the tool providers output xml-formatted annotations. The annotations,
like the annotations we have used in speech recognition research, are
hierarchical.

We currently have a schema that works for both EEG and DPATH and are
developing tools that handle both transparently.

The responses today were overwhelmingly in favor of csv, as I
expected. Therefore, if we adopt xml, we will also provide csv. We will
also continue to provide tools that make it easy to access the data
regardless of its format.
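
As a rough illustration of how a csv version could be derived from hierarchical xml annotations, the sketch below flattens a two-level hierarchy into one csv row per event, copying the parent attributes onto each row. The element and attribute names (`session`, `region`, `event`, and so on) are hypothetical and do not reflect the schema mentioned above.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical hierarchical annotation: session -> region -> event.
XML_TEXT = """
<session id="s001">
  <region name="left_temporal">
    <event start="12.5" stop="30.0" label="seiz"/>
    <event start="45.0" stop="52.0" label="bckg"/>
  </region>
  <region name="right_frontal">
    <event start="3.0" stop="9.5" label="seiz"/>
  </region>
</session>
"""

FIELDS = ["session", "region", "start", "stop", "label"]


def flatten(xml_text):
    """Return one dict per event, repeating the parent attributes."""
    root = ET.fromstring(xml_text)
    rows = []
    for region in root.findall("region"):
        for event in region.findall("event"):
            rows.append({
                "session": root.get("id"),
                "region": region.get("name"),
                "start": event.get("start"),
                "stop": event.get("stop"),
                "label": event.get("label"),
            })
    return rows


def to_csv(rows):
    """Serialize the flattened rows as csv text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


csv_text = to_csv(flatten(XML_TEXT))
```

The cost of flattening is redundancy: each row repeats its session and region. In exchange, every line is self-contained and easy to filter with simple scripts.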

Thanks again for many excellent responses.

-Joe

========
Subject: xml vs. simpler formats like csv
Date: Mon, 11 May 2020 10:56:35 -0400
From: Joseph Picone <joseph...@gmail.com>
To: nedc_tuh_eeg <nedc_t...@googlegroups.com>
comments.txt