Browse view: allow us to select the data in one column


mark

Jun 16, 2008, 9:39:39 AM
to FLEx list
Sometimes you want to export just one column of data in the browse
view. For example, I just made a custom field with more or less
automatically generated syllable structures for all my lexeme entries.
I now want to calculate some rough statistics; for example, I want to
find out if ideophones on average tend to have more syllables than
other words. To do this I'll simply use the character counting
abilities of MS Excel.

This would be trivial if the Browse view in FLEx let me select
the values in one column. However, it doesn't, so any time I want to
get data like this out I have to sanitize it and remove unneeded
columns in another application before I can get to the actual work.
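
A rough sketch of the kind of calculation described above, written in Python rather than Excel. It assumes the data is already available as tab-delimited lines with two columns (grammatical category, then syllable structure) and that syllable structures are written like CV.CVC with a period between syllables. Both the layout and the notation are assumptions made for the example; the sample words are invented.

```python
from collections import defaultdict

def mean_syllables_by_category(lines):
    """Return {category: mean syllable count} for tab-delimited lines.

    Assumes each line is "category<TAB>structure" and that syllables
    inside a structure are separated by periods (hypothetical notation).
    """
    totals = defaultdict(lambda: [0, 0])  # category -> [syllable sum, word count]
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        category, structure = line.split("\t")
        totals[category][0] += len(structure.split("."))
        totals[category][1] += 1
    return {cat: total / count for cat, (total, count) in totals.items()}

# Invented sample data, for illustration only.
sample = [
    "ideophone\tCV.CV.CV",
    "ideophone\tCVC.CVC",
    "noun\tCV.CVC",
    "noun\tCVC",
]
print(mean_syllables_by_category(sample))
```

The hard part is not this calculation but getting clean two-column input, which is what the request is about.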

Andreas_Joswig

Jun 16, 2008, 10:10:47 AM
to flex...@googlegroups.com
I second this request.

Andreas

Craig Farrow

Jun 16, 2008, 10:53:25 PM
to flex...@googlegroups.com

How are you getting the data out of FLEx into Excel? It only takes a
couple of seconds to delete extra columns in Excel, so I'm wondering
what other work/process you are going through to 'sanitize' your data...

Craig.


mark

Jun 17, 2008, 6:01:38 AM
to FLEx list
> How are you getting the data out of FLEx into Excel? It only takes a
> couple of seconds to delete extra columns in Excel, so I'm wondering
> what other work/process you are going through to 'sanitize' your data...

Have you tried it? Trouble is, you don't get neat columns in the copy/
paste output from FLEx, as FLEx mixes tabs and paragraph marks as
delimiters, and great problems arise if a column in FLEx contains more
than one thing (e.g., twice the grammatical category because a word
has two senses) because then the tab count goes awry and stuff appears
in the wrong column. Excel's text import wizard doesn't help because
it doesn't recognize the paragraph mark as a possible delimiter. So I
usually turn to Word's Text>Table conversion feature but still have to
fix a lot manually because of the delimiter problem.

Craig Farrow

Jun 17, 2008, 8:49:37 AM
to flex...@googlegroups.com
Mark,

Have you tried turning off the columns you don't want in FLEx? I saw the trouble with extraneous paragraph marks, but turning off definition field and others I got a clean output with Select All, Copy (with three columns.) As I understood your first message you only want one or two columns of data anyway. Does that help, or are you still stuck?

Craig.


mark

Jun 19, 2008, 6:15:04 AM
to FLEx list
On Jun 17, 2:49 pm, Craig Farrow <craig_far...@sil.org> wrote:
> Mark,
>
> Have you tried turning off the columns you don't want in FLEx? I saw the
> trouble with extraneous paragraph marks, but turning off definition
> field and others I got a clean output with Select All, Copy (with three
> columns.) As I understood your first message you only want one or two
> columns of data anyway. Does that help, or are you still stuck?
>
> Craig.


Hi Craig, yes, still stuck, because I need one of the problem-causing
columns for filtering. Say I want to display the data from a certain
custom field, but only for nouns. I will have to display the
'Grammatical info' column to filter by part of speech. There is no way
to get rid of this column without also losing the filter.

All this would be solved of course by allowing CSV or tab-delimited
text export of a certain view; or by making the copy/paste
functionality respect the columns.

Mark

Marlin_...@gial.edu

Jun 20, 2008, 2:15:57 PM
to flex...@googlegroups.com

Mark,

If you can, bulk copy your syllable structures custom field data into one of the Entry level fields that are visible in Bulk Edit Senses view. I would recommend Import Residue (Entry), though Bibliography and Etymology fields are viewable too. Then you might access your data in Bulk Edit Senses view. It does not have a problem copying missing data or multiple senses like the Entry views do (at least in FLEx 5.2 beta. I don't know about 5.0). It puts a tab character after each column, filled or not filled, and a paragraph at the end of the line. Bulk Edit Senses gives you one line per sense and multiplies the entry level data to appear on each line. You might then be able to do your statistics count in FLEx, or Select All twice and paste into Excel or Word.
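
As a sketch of what could be done with text pasted from Bulk Edit Senses, assuming it really has the shape described above (a tab after every column, filled or not, and a paragraph break at the end of each line), the following Python splits it into rows and then keeps one row per entry. The function names and the sample data are made up for illustration. Keying on the first column is an assumption, and it would wrongly merge homographs.

```python
def parse_rows(text):
    """Split pasted text into rows of cells.

    Every cell is assumed to be followed by a tab, so each line ends
    with a tab and splitting leaves one empty piece at the end.
    """
    rows = []
    for line in text.splitlines():
        if not line:
            continue  # ignore empty lines
        cells = line.split("\t")
        if cells[-1] == "":
            cells.pop()  # drop the empty piece after the final tab
        rows.append(cells)
    return rows

def one_row_per_entry(rows, key_col=0):
    """Keep the first row seen for each value in key_col (e.g. headword)."""
    seen = set()
    unique = []
    for cells in rows:
        if cells[key_col] not in seen:
            seen.add(cells[key_col])
            unique.append(cells)
    return unique

# Invented sample: headword, grammatical category, syllable structure.
pasted = "kala\tn\tCV.CV\t\r\nkala\tv\tCV.CV\t\r\nbo\t\tCV\t\r\n"
rows = parse_rows(pasted)
print(one_row_per_entry(rows))
```

Collapsing to one row per entry matters for statistics on entry-level fields, because otherwise an entry with several senses is counted several times.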

Hope I got it correctly.

Marlin

mark

Jun 24, 2008, 4:30:30 AM
to FLEx list
Thanks Marlin, I'll try that next time. (Also using 5.2 here.) It
sounds like a good workaround.

Mark

David Baines JarMail

Jun 26, 2008, 11:00:53 AM
to flex...@googlegroups.com
Hi All,

I'm inclined to agree with Mark about exporting as CSV. I think that there would be many views of the data that can be displayed in
FLEx which users would like to be able to deal with in a spreadsheet or table. Not many users will be able to deal with XML
themselves, so they are dependent on others providing tools.

An "Export" option that exports the current view in real CSV would be useful. It would seem to be something that would be simple to
implement, though no doubt there are problems like how to deal with multiline fields, and repeating the entry data for each sense.
This could be useful for gathering statistics, or any kind of analysis that FLEx doesn't (yet) do. I'm sure that more uses would be
found for the feature if it were there.

Another idea: if we can export to CSV, it would perhaps be good to be able to import from CSV too.

However, I like using spreadsheets so this is a fairly biased view. It would be good to know whether there are many linguists that
would use the feature before adding it.

FWIW.
David.

Heidi James Rosendall

Jun 27, 2008, 2:47:52 AM
to flex...@googlegroups.com
I agree. I would like to see FieldWorks export and import CSV. This would be extremely useful. It would make the FieldWorks data accessible outside of the program to many more of our people who are computer savvy and like to manipulate their data but are not on the cutting edge of computer knowledge (like me and my husband).

Heidi Rosendall
Wycliffe Nigeria

Robert Hedinger

Jun 26, 2008, 3:24:20 PM
to flex...@googlegroups.com
Does CSV assume you have each field only once in an entry? In dictionaries
you commonly get multiple sense bundles, and there can be multiple example
bundles in one entry. Would this be a problem?

Robert


David Baines

Jun 27, 2008, 6:17:46 AM
to flex...@googlegroups.com
Hi Robert,

I'm not sure that I've followed your question...

Exporting as CSV shouldn't make any assumptions, it should just export the data in the columns of the current view. When the CSV
file is opened in a spreadsheet it should look similar to the data in FLEx.
All of the data to be exported is already in a table. The aim of the export would be to facilitate the transfer of that view into a
spreadsheet.

So for Lexicon Edit, it would only be the data in the Entries pane that is exported. For Lexicon - Browse, Bulk Edit Entries, Bulk
Edit Senses, and Bulk Edit Reversal Entries, the data is already in a tabular format and it shouldn't be difficult to export it.

I imagine that the option wouldn't be available for the Dictionary since it isn't already in a tabular form.

It would be good to provide the option for all the tabular data in FLEx, including most of the tools in the Grammar area, and the
concordance tools. Wherever there is data already in a table.

In Bulk Edit Senses List Choice, on the development version, one can use cut and paste to get the data from the table into a
spreadsheet. In this case, entry information, such as Headword, is repeated for each sense. Unfortunately this doesn't work in
Bulk Edit Entries, or Lexicon Edit. The cells are not respected and data ends up in the wrong column.

Does this answer your question?

David B.

Robert Hedinger

Jun 27, 2008, 6:48:00 AM
to flex...@googlegroups.com
Dear David,

I assume that CSV works like a spreadsheet, so you would only have each field
once. If this is so, my concern would be that in a dictionary, as opposed to a
simple wordlist, you can have the same field many times in one dictionary
entry. I can't see how this would work with a spreadsheet. But perhaps I don't
understand CSV and need educating.

Robert

PS: Did you get my zip sent via YouSendIt?

Robert Hedinger

Jun 27, 2008, 6:57:49 AM
to flex...@googlegroups.com
Sorry, I didn't read your email properly. I see you would only use it for
the bulk edit view. So ignore my comments/questions.

Robert


John_T...@sil.org

Jun 27, 2008, 10:30:26 AM
to flex...@googlegroups.com

> Does CSV assume you have each field only once in an entry? In dictionaries
> you commonly get multiple sense bundles, and there can be multiple example
> bundles in one entry. Would this be a problem?

As David suggested, I think we would basically just output what is in the
view. That can be one line per entry, with some cells containing lists of
sense information, or one per sense, with entry-level information
duplicated.

But there are a few tricky points:

1. How do we delimit fields? Strict CSV uses commas...but data could well
contain commas, confusing things. At least some CSV programs allow quotes
to be put around data containing commas...but data could contain quotes,
too. Then you start wondering how different possible clients escape quotes.
On the whole, I would be inclined to actually delimit with tab characters
between fields. FieldWorks data can't have tab characters embedded, so this
is unambiguous, and some (most?) CSV programs, certainly Excel, can also
handle tab-delimited. Of course we could make it configurable, but the more
complex you make the task, the less likely it gets done.

2. What do we do about multiple paragraphs in a cell? Some of our views
show multiple values on separate lines within a cell (e.g., sense glosses
if looking at entries, or multiple semantic domains). We don't want
newlines in a CSV format except one at the end of each record. Would it
make sense to use a slash or a vertical bar to separate 'lines' within a
cell? Any better ideas?
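
To make the two points above concrete, here is a minimal Python sketch of such an export, not actual FieldWorks code. It assumes the view is already available as a list of rows, each a list of cell strings, with multiple values in a cell separated by newlines. The name export_rows and the sample data are invented for the example.

```python
def export_rows(rows, cell_sep="|"):
    """Render rows as tab-delimited text, one record per line.

    Lines inside a cell are joined with cell_sep so that the only
    newline in a record is the one that ends it.
    """
    out = []
    for row in rows:
        cells = []
        for cell in row:
            if "\t" in cell:
                # The sketch relies on data never containing tabs.
                raise ValueError("cell contains a tab character")
            parts = [part.strip() for part in cell.splitlines()]
            cells.append(cell_sep.join(parts))
        out.append("\t".join(cells) + "\n")
    return "".join(out)

# Invented sample: headword, glosses (one per line), semantic domain.
rows = [
    ["kala", "fish\nriver fish", "animal"],
    ["bo", 'say, "tell"', ""],
]
print(export_rows(rows), end="")
```

Because tabs separate the fields, the comma and the quotes in the second row need no escaping. The cost is that a vertical bar occurring in the data itself would be indistinguishable from the in-cell separator.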

Being able to import is much harder. Some issues that immediately come to
mind:

- Will imported entries replace existing ones? All the existing ones, or
just ones with the same headword? Does 'same headword' mean lexeme form,
citation form, or what? Does it include homograph number? Or will all the
imported entries be additional, and it will be up to the user to merge
unwanted homographs? Or should we merge automatically (perhaps only if the
imported entry is identical)?

- How do we tell how the columns in an imported file correspond to fields
in the FW database? How do we even know whether the CSV represents one line
per entry or one per sense?

- How do we match up sense data in an Entry import (e.g., if one column has
a list of glosses and another a list of semantic domains, which domains
belong to which senses)? For that matter, how do we even know how multiple
glosses corresponding to multiple senses are delimited?

The basic problem is that the row-oriented data represents a selection and
simplification of what is really in FLEx. To import it we have to
regenerate the internal complexity. Being able to import anything we can
export might be too hard a goal for any time soon.

That said, there are probably some simple cases we could fairly easily
handle, such as assuming a single entry/sense per line and importing to an
empty database. And that much would be helpful for some situations.
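
A sketch of that simple case in Python, under stated assumptions: the file is tab-delimited, has one entry with a single sense per line, and the user says which field each column holds, since the file itself cannot. The function name and the field names (lexeme_form, gloss, pos) are hypothetical and are not FieldWorks identifiers.

```python
def read_simple_entries(text, columns):
    """Turn tab-delimited lines into one dict per entry.

    columns is the user-supplied list of field names, in column order.
    Lines with the wrong number of cells are rejected rather than guessed at.
    """
    entries = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line:
            continue  # skip blank lines
        cells = line.split("\t")
        if len(cells) != len(columns):
            raise ValueError(
                f"line {number}: expected {len(columns)} cells, got {len(cells)}"
            )
        entries.append(dict(zip(columns, cells)))
    return entries

# Invented sample data.
text = "kala\tfish\tn\nbo\tsay\tv\n"
entries = read_simple_entries(text, ["lexeme_form", "gloss", "pos"])
print(entries)
```

Everything listed as hard above (merging with existing entries, homographs, several senses per line) is deliberately left out of this sketch.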

John Thomson

mark

Jul 1, 2008, 7:11:58 AM
to FLEx list
> 1. How do we delimit fields?

Tab-delimited seems the best default indeed. An added option to
customize this would be useful for those who need it to be different.

> 2. What do we do about multiple paragraphs in a cell? Some of our views
> show multiple values on separate lines within a cell (e.g., sense glosses
> if looking at entries, or multiple semantic domains). We don't want
> newlines in a CSV format except one at the end of each record. Would it
> make sense to use a slash or a vertical bar to separate 'lines' within a
> cell? Any better ideas?

I think a slash or a vertical bar would be perfectly fine; in fact any
character that is very unlikely to be used in the data will be fine.
As above, a preference to customize the character would solve any
possible problems.

As for importing, I agree that this is far from trivial. I do hope
that precedence will be given to the export option, since that seems
relatively easy to implement.