XSL for CSV


Jonathan

Nov 21, 2009, 2:39:44 AM
to FLEx list
I want to export my lexicon as a csv file. I think I need an XSL
file. Before I try to write one I thought I would ask if anyone has
already done something like this. It need not be complex.

Kevin Penner

Nov 26, 2009, 7:28:09 PM
to flex...@googlegroups.com
I also need to export my FLEx lexicon to a csv file. Has anyone responded
to this? I need a quick solution for a project in my Master's program.
Kevin


Jeff and Peg Shrum

Nov 27, 2009, 5:00:47 AM
to flex...@googlegroups.com


Kevin wrote:

I also need to export my FLEx lexicon to a csv file. Has anyone responded
to this? I need a quick solution for a project in my Master's program.
Kevin


And Jonathan wrote:

>
> I want to export my lexicon as a csv file. I think I need an XSL
> file. Before I try to write one I thought I would ask if anyone has
> already done something like this. It need not be complex.


------------------------------------------------

Could it be that you are trying to print out a formatted copy of your
lexicon? Have you tried "Printing Solutions"? It will export a formatted
copy of your lexicon in either PDF or OpenOffice format.

Jeff S.
Mozambique

David Baines

Nov 27, 2009, 5:06:55 AM
to flex...@googlegroups.com, Jonathan Dailey
Hello All,

I don't have a ready solution to convert LIFT XML into a csv file. However, there are a couple of things you can try.
Try the first one first; it is more likely to give satisfactory results.

1) Use copy and paste from the Bulk Edit tool while it is displaying senses.
2) Use copy and paste from the Browse tool or browse pane - this doesn't work well if you have multiple senses per entry.


Here are the steps in more detail:

1) Open the FLEx project to the Bulk Edit tool, choose the Bulk Copy tab, and in the Target field select a sense-level field such as Glosses. This ensures that the table displays one line per sense and avoids having entries displayed with multi-line cells.

2) Configure the view to contain what you would like in the csv file: add the necessary columns, and filter and sort as you wish.

3) Select the whole table (press Ctrl+A).

4) Wait until FLEx is ready and shows that all of the records are selected.

5) Copy the data to the clipboard (press Ctrl+C).

6) Paste it into a spreadsheet (press Ctrl+V).

7) Save as CSV.
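If you would rather skip the spreadsheet in steps 6 and 7, a short script can do the conversion instead. This is a minimal sketch, assuming that the copied rows arrive on the clipboard as tab-separated text (one row per line, one column per tab) and that you have pasted them into a UTF-8 text file; the function name `tsv_to_csv` is hypothetical, and it is worth checking what your FLEx version actually puts on the clipboard first.

```python
import csv
import io


def tsv_to_csv(tsv_text: str) -> str:
    """Convert tab-separated rows (assumed clipboard format) to CSV text."""
    # Split on tabs only; QUOTE_NONE keeps any quote characters as plain data.
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    # The default QUOTE_MINIMAL writer quotes any field containing a comma or a quote.
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


print(tsv_to_csv("dog\tn\tcanine, hound\ncat\tn\tfeline\n"), end="")
```

To save the result, write the returned text to a file opened with `encoding="utf-8"` so that non-Latin characters survive.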

I hope that this helps, though it is far from a perfect solution.
David Baines.

Beth

Nov 27, 2009, 10:44:09 AM
to flex...@googlegroups.com
On Nov 26, 2009, at 6:28 PM, Kevin Penner wrote:

> I also need to export my FLEx lexicon to a csv file. Has anyone
> responded
> to this? I need a quick solution for a project in my Master's
> program.

Here is something someone else has recommended in the past. I have
not done it myself, so I might have a few details wrong, but I think
you'll get the idea.

1. Export your lexicon using whichever SFM export contains the
contents you want. (If you are in Dictionary view and use the
Configured Dictionary export to SFM, you get to configure in FLEx
which fields show, and what the sort and filter is.)

2. Use a text editor to change <CR>\ to <tab>\. (That is, convert
the carriage return just before all the backslashes into a tab. If
there were two carriage returns between each record, then now there
should still be a carriage return between the records, and all the
fields should be on one line, separated by tab.)

3. Load this file into Excel.

4. Use find/replace to get rid of the SFMs.

If you had an editor that could do regular expressions and you're
comfortable with them, you could convert all the SFMs to tabs in step
2 and that would eliminate step 4. If you really want commas instead
of tabs between the fields, then use that instead, but you'll run
into trouble if there are commas in any of your fields--tabs are
probably safer.
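The regular-expression variant of step 2 (turning every SFM marker into a tab) can be sketched in Python as below. This is a sketch under assumptions: each field sits on its own line beginning with a backslash marker, records are separated by a blank line as described above, and every record has the same fields in the same order, because the resulting columns are purely positional. The function name is hypothetical.

```python
import re


def sfm_to_tab_lines(sfm_text: str) -> list[str]:
    """Put each blank-line-separated SFM record on one tab-separated line."""
    # Normalise Windows line endings so the patterns only need to handle "\n".
    text = sfm_text.replace("\r\n", "\n")
    # Step 2: a newline followed by a marker becomes a tab. The lookbehind
    # skips the newline that closes a blank line, so records stay separate.
    text = re.sub(r"(?<!\n)\n\\[^\s\\]+ ?", "\t", text)
    # The marker that still opens each record (e.g. \lx) is simply dropped.
    text = re.sub(r"(?m)^\\[^\s\\]+ ?", "", text)
    # Blank lines between records are no longer needed.
    return [line for line in text.split("\n") if line.strip()]


sample = "\\lx dog\n\\ps n\n\\ge canine\n\n\\lx cat\n\\ps n\n\\ge feline\n"
for line in sfm_to_tab_lines(sample):
    print(line)
```

Writing these lines to a .txt file gives tab-delimited text that Excel can load in step 3, with no markers left to remove in step 4.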

I bet someone could easily write a Perl or Python script to convert a
basic SFM file to csv, or maybe already has.

-Beth
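Taking up the idea of a Python script, here is a minimal sketch of an SFM-to-CSV converter; it is not an existing tool. It assumes that each record starts with an `\lx` marker (the `record_marker` parameter is a hypothetical knob for other setups), that a line without a marker continues the previous field, and that repeated markers within one record (for example several `\ge` lines for several senses) can be joined with "; " in a single cell, which flattens the sense structure. Unlike the positional approach, columns are named after the markers, so a record with a missing field just gets an empty cell, and the `csv` module quotes any field that contains a comma.

```python
import csv
import io
import re

# One SFM field line: backslash, marker name, optional space, value.
FIELD = re.compile(r"^\\(\S+) ?(.*)$")


def sfm_records(sfm_text: str, record_marker: str = "lx") -> list[dict[str, str]]:
    """Split SFM text into records, each a {marker: value} dict."""
    records = []
    current = None
    last_marker = None
    for line in sfm_text.splitlines():
        m = FIELD.match(line)
        if m:
            marker, value = m.group(1), m.group(2).strip()
            if marker == record_marker:
                # The record marker (e.g. \lx) starts a new entry.
                current = {}
                records.append(current)
            if current is None:
                continue  # skip header fields before the first record
            if marker in current:
                # Repeated marker (e.g. a second gloss): keep both in one cell.
                current[marker] += "; " + value
            else:
                current[marker] = value
            last_marker = marker
        elif line.strip() and current is not None and last_marker is not None:
            # A non-blank line without a marker continues the previous field.
            current[last_marker] += " " + line.strip()
    return records


def sfm_to_csv(sfm_text: str, record_marker: str = "lx") -> str:
    """Convert SFM text to CSV: one row per record, one column per marker."""
    records = sfm_records(sfm_text, record_marker)
    columns = []
    for record in records:
        for marker in record:
            if marker not in columns:
                columns.append(marker)  # column order = order first seen
    out = io.StringIO()
    # restval="" leaves an empty cell where a record lacks a field;
    # the default QUOTE_MINIMAL quoting protects commas inside values.
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


sample = "\\lx dog\n\\ps n\n\\ge canine, hound\n\\ge pet\n\n\\lx cat\n\\ge feline\n"
print(sfm_to_csv(sample), end="")
```

For the sample, the `cat` row gets an empty `ps` cell and the two `dog` glosses share one quoted cell; if you need one row per sense instead, the Bulk Edit route described earlier in the thread may suit better.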



Piotr Bański

Nov 27, 2009, 12:11:32 PM
to flex...@googlegroups.com
Greetings All,

This is surely not the most fortunate way to start your acquaintance
with a new group, but my curiosity (regarding the development team's
policy and/or my own ignorance) wouldn't let me pass the opportunity to ask:

Beth> I bet someone could easily write a Perl or Python script to
Beth> convert a basic SFM file to csv, or maybe already has.

Both the results posted so far were of the simple
tinker/kludge/find-and-replace type. Just how trivial would it be for
FLEx itself to cleanly implement this functionality? It exports
next-to-newest LIFT, why not something as simple as CSV?

Naturally, my instinct *is* telling me to sit quietly and merely watch
this exchange, but, on the other hand, I've come to FLEx as a possible
solution to my problems regarding manipulation of various kinds of
lexical databases, and this issue may be near the heart of things, from
my perspective. I'm really curious to know about the software's
import/export limitations and also the dev team's stance on enhancing
them (this also goes for LIFT or any other XML format).

Thanks a bunch,

Piotr

Kevin Penner

Nov 27, 2009, 12:50:57 PM
to flex...@googlegroups.com
Thanks!  This worked for me.  I haven’t tried any of the other possible solutions.  I hope that someday soon there will be an “export lexicon to csv” option in FLEx.
Kevin



From: David Baines <david_...@sil.org>
Reply-To: <flex...@googlegroups.com>
Date: Fri, 27 Nov 2009 11:06:55 +0100
To: <flex...@googlegroups.com>
Cc: Jonathan Dailey <jonathan...@gmail.com>
Subject: Re: [FLEx] XSL for CSV

Mike Maxwell

Nov 27, 2009, 1:20:52 PM
to flex...@googlegroups.com
Kevin Penner wrote:
> Thanks! This worked for me. I haven’t tried any of the other possible
> solutions. I hope that someday soon there will be an “export lexicon to
> csv” option in FLEx.

It seems like a CSV file is suited only to a very simple lexicon
structure. In particular, I would think it wouldn't easily work if you
had repeating fields (like one or more senses). (I can imagine ways
around the problem, but they would be pretty messy to implement.) And
of course you'll need to escape any commas that are in your entries
(like in example sentences or even definitions).

But maybe I don't understand why you want to do this.
--
Mike Maxwell
What good is a universe without somebody around to look at it?
--Robert Dicke, Princeton physicist
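On the point about escaping commas: standard CSV does not use backslash escapes. A field that contains a comma, a double quote or a line break is wrapped in double quotes, and any double quote inside it is doubled. Python's `csv` module applies these rules automatically, as this small sketch shows (the helper name is hypothetical).

```python
import csv
import io


def to_csv_line(fields: list[str]) -> str:
    """Render one row as a CSV line, quoting fields only where needed."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerow(fields)
    return out.getvalue()


print(to_csv_line(["kuku", 'a hen, or "chicken"']), end="")
# kuku,"a hen, or ""chicken"""
```

Any CSV reader that follows the same convention turns the quoted field back into the original text, commas and quotes included.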

Kevin Penner

Nov 27, 2009, 9:43:29 PM
to flex...@googlegroups.com
There are a number of occasions when one wants to be able to export just a
simple word list with several fields, say for writing a paper, for
teaching a class, etc. On this occasion, I'm going to use the csv word list
to create a data set for the statistical program R.
Kevin


> From: Mike Maxwell <max...@umiacs.umd.edu>
> Reply-To: <flex...@googlegroups.com>
> Date: Fri, 27 Nov 2009 13:20:52 -0500
> To: <flex...@googlegroups.com>
> Subject: Re: [FLEx] XSL for CSV
>
> Kevin Penner wrote:
>> Thanks! This worked for me. I haven’t tried any of the other possible
>> solutions. I hope that someday soon there will be an “export lexicon to
>> csv” option in FLEx.
>
> It seems like a CSV file is suited only to a very simple lexicon
> structure. In particular, I would think it wouldn't easily work if you
> had repeating fields (like one or more senses). (I can imagine ways
> around the problem, but they would be pretty messy to implement.) And
> of course you'll need to escape any commas that are in your entries
> (like in example sentences or even definitions).
>
> But maybe I don't understand why you want to do this.
> --
> Mike Maxwell
> What good is a universe without somebody around to look at it?
> --Robert Dicke, Princeton physicist
>

Allan Johnson

Nov 27, 2009, 9:44:09 PM
to flex...@googlegroups.com
Piotr Bański wrote:
> Both the results posted so far were of the simple
> tinker/kludge/find-and-replace type. Just how trivial would it be for
> FLEx itself to cleanly implement this functionality? It exports
> next-to-newest LIFT, why not something as simple as CSV?
>
> Naturally, my instinct *is* telling me to sit quietly and merely watch
> this exchange, but, on the other hand, I've come to FLEx as a possible
> solution to my problems regarding manipulation of various kinds of
> lexical databases, and this issue may be near the heart of things, from
> my perspective. I'm really curious to know about the software's
> import/export limitations and also the dev team's stance on enhancing
> them (this also goes for LIFT or any other XML format).
>

Hi Piotr,

I'm curious what purpose you all are finding for CSV exports. I wouldn't
have thought that such a simple format would be all that useful for
typical dictionary data. What happens in CSV format when several fields
of information need to be nested within another field? I'm not sure, but
I don't think it handles that. So a "clean" export to CSV might not be
all that trivial. Seems that you'd likely be losing some data in the
process.

Allan

Mike Maxwell

Nov 27, 2009, 11:02:06 PM
to flex...@googlegroups.com
Kevin Penner wrote:
> There are a number of occasions when one wants to be able to export just a
> simple word list with several fields, say for writing a paper, for
> teaching a class, etc. On this occasion, I'm going to use the csv word list
> to create a data set for the statistical program R.

Ah--well, if it's *just* a list of words and POSs or some such (and you
don't want the inflected forms, or the language doesn't have any
significant inflectional morphology), then it might be simple to extract
the words from an XML dump. XSL would be the standard, but xml_grep
might work, too.
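As an alternative to XSL, the same kind of extraction can be sketched with Python's standard `xml.etree.ElementTree` module. This assumes the usual LIFT layout, with the headword in `entry/lexical-unit/form/text`, the part of speech in the `value` attribute of `sense/grammatical-info`, and glosses in `sense/gloss[@lang]/text`; please check these paths against your own LIFT export. The sketch writes one row per sense, and `findtext` only returns the text that comes before any embedded `span` element, so formatted runs may be cut short.

```python
import csv
import io
import xml.etree.ElementTree as ET


def lift_to_csv(lift_xml: str, gloss_lang: str = "en") -> str:
    """Extract headword, part of speech and gloss from LIFT, one row per sense."""
    root = ET.fromstring(lift_xml)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["headword", "pos", "gloss"])
    for entry in root.iter("entry"):
        # Take the first lexical-unit form as the headword.
        headword = entry.findtext("lexical-unit/form/text", default="")
        for sense in entry.findall("sense"):
            gi = sense.find("grammatical-info")
            pos = gi.get("value", "") if gi is not None else ""
            # Only the gloss in the requested writing system is exported.
            gloss = sense.findtext(f"gloss[@lang='{gloss_lang}']/text", default="")
            writer.writerow([headword, pos, gloss])
    return out.getvalue()
```

Read the file with `open(path, encoding="utf-8").read()` and pass the text in; the gloss language code `"en"` is only a default and can be changed to whatever writing system your glosses use.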

Piotr Bański

Nov 28, 2009, 6:20:06 AM
to flex...@googlegroups.com
Hello Allan,

Thanks for your reply.

> I'm curious what purpose you all are finding for CSV exports.

My thinking wasn't driven directly by my own need for CSV: two people
asked about CSV in this thread, and others offered hints coming either
from their own experience or from memory of past messages. Conclusion:
some users apparently need this. Question: why let them kludge around or
ask for help, why not let the software do it cleanly, at the user's
responsibility?

I don't imagine people wanting to export complex dictionaries to CSV,
but there are lots of simple-but-useful glossaries floating around, so
I'm guessing that may be part of it. Kevin Penner gave another example:
CSV is pretty popular in statistics, hence a natural format to ask for
for statistical purposes (his is potentially a different case, involving
selective export from possibly complex databases rather than just simple
glossaries). In other words, there is a range of software packages
accepting CSV as input (rather than MDF or LIFT), so allowing FLEx to
export into CSV might make a few people happy.

Now, I spoke up in this thread not exactly in order to start waving
flags with "go CSV!" on them, I merely suggested that CSV export might
be an inexpensive way of giving some users what they need, and admitted
to a degree of curiosity wrt what the dev team thinks of this kind of
enhancement. I'd normally sit quietly and watch, but had 5 minutes too
many at the wrong moment, perhaps ;-)

As far as losing data, part of it might be intended (you only export
FLEx->CSV, for a specific purpose, no round-trips intended), and part of
it might be in column headers: the serializer would have to look for the
most complex entries and then use empty fields (",,,,") for entries
lacking the complexity. Nesting is yet another problem for CSV, but
that's why I mentioned the user's responsibility: "If you really want
CSV, I'll give you CSV, just be aware of the limitations of the format
and better make sure your database is uniform or that you select some
sensible fields for the export", FLEx might say.
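The padding idea can be made concrete with a small sketch: the entry with the most senses decides how many sense columns the file gets, and shorter entries are filled out with empty cells. This assumes a simplified, hypothetical input of (headword, list of glosses) pairs rather than anything FLEx exports directly, and the sample data is made up.

```python
import csv
import io


def entries_to_wide_csv(entries: list[tuple[str, list[str]]]) -> str:
    """One row per entry and one gloss column per sense, padded to the widest entry."""
    # The most complex entry decides how many sense columns every row gets.
    width = max((len(glosses) for _, glosses in entries), default=0)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["headword"] + [f"gloss{i}" for i in range(1, width + 1)])
    for headword, glosses in entries:
        # Entries with fewer senses get trailing empty cells (e.g. "mbwa,dog,").
        writer.writerow([headword] + glosses + [""] * (width - len(glosses)))
    return out.getvalue()


print(entries_to_wide_csv([("kuku", ["chicken", "hen"]), ("mbwa", ["dog"])]), end="")
```

The trade-off is that a handful of entries with many senses forces every row to carry many mostly empty columns; one row per sense is the other common way to flatten the same data.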

Kind regards,

Piotr


Jonathan

Nov 28, 2009, 7:18:27 AM
to FLEx list
My reason for the original question was to be able to upload a gloss
list to Google Translator. I wanted to test it out and see what it
could do. There is a place where one can upload a glossary as a CSV file.
Yes, it is pretty simple: basically it is a word-for-word gloss list,
part of speech, and a description if you want one. It also requires
the whole thing to be in UTF-8. They have a very sizable list of
languages that they are prepared to have translations in. I would
post a link, but I think I would disappear into spam purgatory, so I will
refrain.

Jonathan
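For an upload like this, the main technical requirement mentioned is that the file be UTF-8. A minimal Python sketch of writing such a gloss list follows; the three column names are placeholders, not the upload tool's documented format, so check its own instructions for the exact columns and whether a header row is expected.

```python
import csv


def write_glossary(path: str, rows: list[tuple[str, str, str]]) -> None:
    """Write (word, part of speech, description) rows to a UTF-8 CSV file."""
    # encoding="utf-8" keeps non-Latin characters intact; newline="" lets the
    # csv module write its own line endings without extra translation.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        # Placeholder header; the upload tool may expect different names or none.
        writer.writerow(["word", "part of speech", "description"])
        writer.writerows(rows)
```

If a spreadsheet program later misreads the special characters, `encoding="utf-8-sig"` adds a byte-order mark, which some programs use to recognise UTF-8.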

David Baines

Nov 28, 2009, 8:11:20 AM
to flex...@googlegroups.com
Hello All,

It would be good to integrate a CSV export function into FLEx for the reasons already mentioned in this thread. It will likely be quite a while before there is a new version of FLEx with these functions. In the meantime, I suspect that a script that converted an SFM export to CSV would be useful. Any volunteers?

FLEx works well with lexicons, since it was designed to deal with them. It is easy to imagine that one could arrange a lexicon nicely in FLEx and then wish to process some of the data in another tool. For example, to print just a few lines of the browse pane, I can select the rows in FLEx and press Print; the Print dialog opens, but the "Print Selection" option is greyed out. If I could paste those few lines correctly into a spreadsheet, I could use the print function there. This is a trivial example, but many users may be more familiar with spreadsheets or their other favorite tools than they are with FLEx. It also shows that some of the reasons users might want CSV or copy-and-paste functionality have nothing to do with the fact that the data is lexical. I like this example because it highlights one of the difficulties that the programmers face: knowing what users want or need. Developers and linguists cannot imagine all of the different ways that users, a much broader group, would like to use FLEx. We should therefore do a lot to ensure that the data in FLEx can be exported for processing in many different ways, including ways beyond those we can imagine. Supporting common standards such as CSV is one way of doing this. The learning curve for manipulating CSV is much shallower than for XML, so more users are likely to find uses for it.

Here are two functions that I imagine could be widely useful. I suspect that neither is as easy to implement as it might seem.

First function:
The data from a table view could be selected, copied and pasted into a spreadsheet or other program that deals with data in table form.
A user could reasonably expect that pasting data from the browse pane into a spreadsheet would produce something that looks the same: all the data in one cell in FLEx would end up in a single cell in the spreadsheet, and all the line breaks within each FLEx cell would be retained. In this way a single entry with multiple senses would be displayed in the spreadsheet in much the same way as it is in FLEx. While this might look trivial once implemented, I suspect that it is actually extremely complicated to achieve. The clipboard is limited in size, and it might not cope robustly with all the scripts that FLEx can handle, so for some users this solution might not be available at all.

Second function:
Export any table view as a CSV file.

As Mike points out, a CSV file is only suitable for a simple structure. I imagine that one workaround would be to add manual line-feed characters within cells, to indicate where the data should sit in the cell relative to the data in other cells in the same row. Alternatively, one could hide cell borders so that cells look merged while there is actually one row per sense. However, I don't think it is possible to encode that kind of information (border styles) within a CSV file. So perhaps a macro would then be needed, and as Mike said, it gets messy.
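The first workaround, line breaks inside a cell, is something CSV itself can carry: a field containing a line break is enclosed in double quotes, and spreadsheet programs that follow this convention should import it as a single multi-line cell. A minimal Python sketch, assuming each entry is a headword plus a list of sense glosses (a hypothetical input shape):

```python
import csv
import io


def entries_to_multiline_csv(entries: list[tuple[str, list[str]]]) -> str:
    """One row per entry, with all sense glosses stacked inside one cell."""
    out = io.StringIO()
    # Rows end in "\r\n"; a bare "\n" inside a field is then a line break
    # within the cell, and because it appears in the line terminator the
    # writer quotes that field.
    writer = csv.writer(out, lineterminator="\r\n")
    writer.writerow(["headword", "glosses"])
    for headword, glosses in entries:
        writer.writerow([headword, "\n".join(glosses)])
    return out.getvalue()


print(repr(entries_to_multiline_csv([("kuku", ["chicken", "hen"])])))
# 'headword,glosses\r\nkuku,"chicken\nhen"\r\n'
```

This keeps one row per entry, as in the browse pane, but it does not solve the alignment problem across columns: each cell's lines are independent, so nothing ties the second gloss to, say, the second part of speech in a neighbouring cell.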

All the best,
David Baines.




Mike Maxwell wrote:
It seems like a CSV file is suited only to a very simple lexicon structure. In particular, I would think it wouldn't easily work if you had repeating fields (like one or more senses). (I can imagine ways around the problem, but they would be pretty messy to implement.) And of course you'll need to escape any commas that are in your entries (like in example sentences or even definitions). But maybe I don't understand why you want to do this.
--
Mike Maxwell
What good is a universe without somebody around to look at it?
--Robert Dicke, Princeton physicist

