custom exporters

David Huynh

Aug 24, 2011, 9:51:36 PM
to google-r...@googlegroups.com
Hi all,

To match the new importer UI, I'm thinking of adding a custom exporter UI, attached. It won't be extensible like the importers, but for the common formats, it should give users a bit more control.

Please let me know what you think. Thanks,

David

refine-custom-export-dialog-1.png
refine-custom-export-dialog-2.png
refine-custom-export-dialog-3.png

Thad Guidry

Aug 24, 2011, 10:24:15 PM
to google-r...@googlegroups.com
Perhaps add an "RFC 4180 strict" option checkbox? https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/tags/ARQ-2.8.8/src/com/hp/hpl/jena/sparql/resultset/CSVOutput.java

Add a text input so folks can add a long comment line? (Typically in CSV files I have seen this as a line beginning with a single # character, followed by some text and ending with \n.)

Also, I think "One Cell Per Line" is redundant, since you can get that by setting the column separator to \n?
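
A rough sketch of how the strict checkbox and the comment-line input could interact, written in Python with the standard csv module rather than Refine's Java exporter code. The function name export_csv and its parameters are hypothetical. It assumes "strict" means CRLF line breaks plus double-quote escaping as RFC 4180 describes, and it refuses a comment line in strict mode because the RFC defines no comment syntax.

```python
import csv
import io


def export_csv(rows, comment=None, strict=False):
    """Return rows as CSV text (hypothetical helper, not Refine's exporter).

    strict=True: CRLF line breaks and no comment line, per RFC 4180.
    comment: optional text written as a single '#'-prefixed first line.
    """
    if strict and comment is not None:
        # RFC 4180 has no notion of comments, so the two options conflict.
        raise ValueError("a comment line is not allowed in strict mode")

    out = io.StringIO()
    if comment is not None:
        # Keep the comment on one line so it ends with exactly one \n.
        out.write("# " + comment.replace("\n", " ") + "\n")

    writer = csv.writer(
        out,
        delimiter=",",
        quotechar='"',
        doublequote=True,           # a quote inside a field becomes ""
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
        lineterminator="\r\n" if strict else "\n",
    )
    writer.writerows(rows)
    return out.getvalue()


print(export_csv([["id", "note"], ["1", "x,y"]], comment="exported from Refine"))
```

Whether the dialog should reject the combination or silently drop the comment is a design choice; the sketch rejects it so the conflict is visible.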

Thad Guidry

Aug 25, 2011, 9:33:47 AM
to google-r...@googlegroups.com
I just remembered my other special case.

Per-row options:
  1. The ability to prefix each row (delimited by its ending \n) with the row.index. (Reordering permanently does change this, we know, but still.) Perhaps the best way to handle this is to have row.index as the first selectable column in Content & Preview, looking like any other column but unchecked by default?
  2. A separate option to add sequential line-number prefixes at the start of each row: "1. ", "2. "?
The special case is generally having to clearly mark the individual rows that I transformed or changed in Refine, with a reference number prefix, so I can discuss specific rows with other team members by looking right at the raw CSV file in TextMate, Notepad, etc. (without using tools that add arbitrary line numbers). Does that make sense?

Tom Morris

Aug 25, 2011, 1:01:08 PM
to google-r...@googlegroups.com
That looks good. Do we need any options to control quoting of
strings? (quote character, quoting quotes, etc)

The "convert date to GMT" option raises a whole host of ill-formed
questions in my mind concerning date handling. What do we do on
import? Is there any way to control the default timezone assumed for
imported dates/times?

Tom

David Huynh

Aug 25, 2011, 1:47:25 PM
to google-r...@googlegroups.com
On Wed, Aug 24, 2011 at 7:24 PM, Thad Guidry <thadg...@gmail.com> wrote:
> Perhaps add an "RFC 4180 strict" option checkbox? https://svn.apache.org/repos/asf/incubator/jena/Jena2/ARQ/tags/ARQ-2.8.8/src/com/hp/hpl/jena/sparql/resultset/CSVOutput.java
>
> Add a text input so folks can add a long comment line? (Typically in CSV files I have seen this as a line beginning with a single # character, followed by some text and ending with \n.)

Interesting. That suggests another option for the *sv importers, to skip over #-prefixed lines: "Skip until non-blank lines not starting with ___" (# or // or : or whatever).
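
A minimal sketch of that importer option, in Python for brevity. The names skip_leading_comments and read_sv are made up for illustration, and the sketch assumes the option only skips lines at the top of the file: once the first non-blank, non-prefixed line is seen, everything after it is treated as data, even if it starts with the prefix.

```python
import csv
import io
import itertools


def skip_leading_comments(lines, prefix="#"):
    """Drop leading lines that are blank or start with the given prefix."""
    def skippable(line):
        stripped = line.strip()
        return stripped == "" or stripped.startswith(prefix)

    # dropwhile stops skipping at the first line that is real data.
    return itertools.dropwhile(skippable, lines)


def read_sv(text, delimiter=",", prefix="#"):
    """Parse separator-delimited text after skipping the comment header."""
    lines = io.StringIO(text, newline="")
    return list(csv.reader(skip_leading_comments(lines, prefix), delimiter=delimiter))


sample = "# exported 2011-08-25\n\nname,count\n#1,2\n"
print(read_sv(sample))
```

The "#1" cell in the sample survives because it appears after the header row, which is the reason for skipping only until the first data line.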

> Also, I think "One Cell Per Line" is redundant, since you can get that by setting the column separator to \n?

+1.

David Huynh

Aug 25, 2011, 2:49:45 PM
to google-r...@googlegroups.com
On Thu, Aug 25, 2011 at 6:33 AM, Thad Guidry <thadg...@gmail.com> wrote:
> I just remembered my other special case.
>
> Per-row options:
>   1. The ability to prefix each row (delimited by its ending \n) with the row.index. (Reordering permanently does change this, we know, but still.) Perhaps the best way to handle this is to have row.index as the first selectable column in Content & Preview, looking like any other column but unchecked by default?
>   2. A separate option to add sequential line-number prefixes at the start of each row: "1. ", "2. "?
> The special case is generally having to clearly mark the individual rows that I transformed or changed in Refine, with a reference number prefix, so I can discuss specific rows with other team members by looking right at the raw CSV file in TextMate, Notepad, etc. (without using tools that add arbitrary line numbers). Does that make sense?

I see. Wouldn't you need to keep the very original row indexes? E.g., by first adding a column with the expression "row.index" before doing anything else. Otherwise, your text editor can already display the line numbers.
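
To illustrate why the index has to be captured before any editing, here is a small Python sketch. The helper add_index_column and the column name original_index are hypothetical, and the index is 0-based here. Once the column exists, sorting or filtering the rows no longer loses track of where each row started.

```python
def add_index_column(rows, name="original_index"):
    """Prepend each row's starting position; rows[0] is the header."""
    header, *body = rows
    return [[name] + header] + [[i] + row for i, row in enumerate(body)]


rows = [["name"], ["pear"], ["apple"], ["fig"]]
indexed = add_index_column(rows)

# Reorder permanently: the stored index still points at the original position.
header, *body = indexed
reordered = [header] + sorted(body, key=lambda row: row[1])
print(reordered)
```

An index computed at export time would instead just number the rows in their final order, which is what a text editor's line numbers already show.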

David

Tom Morris

Aug 25, 2011, 3:00:55 PM
to google-r...@googlegroups.com
On Thu, Aug 25, 2011 at 2:49 PM, David Huynh <dfh...@gmail.com> wrote:
> On Thu, Aug 25, 2011 at 6:33 AM, Thad Guidry <thadg...@gmail.com> wrote:
>>
>> I just remembered my other special case....
>> Per row options,
>>
>> 1. The ability to prefix each row (delimited by its ending \n) with the
>> row.index. (Reordering permanently does change this, we know, but still.)
>> Perhaps the best way to handle this is to have row.index as the first
>> selectable column in Content & Preview, looking like any other column
>> but unchecked by default?
>> 2. A separate option to add sequential line-number prefixes at the start
>> of each row: "1. ", "2. "?
>>
>> The special case is generally having to clearly mark the individual rows
>> that I transformed or changed in Refine, with a reference number prefix,
>> so I can discuss specific rows with other team members by looking right
>> at the raw CSV file in TextMate, Notepad, etc. (without using tools that
>> add arbitrary line numbers). Does that make sense?
>
> I see. Wouldn't you need to keep the very original row indexes? E.g., by
> first adding a column with the expression "row.index" before doing anything
> else. Otherwise, your text editor can already display the line numbers.

Adding a column with the original index is something I try to do
consistently as a best practice with data sets whether using Refine or
a spreadsheet. Perhaps we should make it an import option to add an
initial index column?

Tom

Thad Guidry

Aug 25, 2011, 3:01:21 PM
to google-r...@googlegroups.com
On Thu, Aug 25, 2011 at 1:49 PM, David Huynh <dfh...@gmail.com> wrote:
> On Thu, Aug 25, 2011 at 6:33 AM, Thad Guidry <thadg...@gmail.com> wrote:
>> I just remembered my other special case.
>>
>> Per-row options:
>>   1. The ability to prefix each row (delimited by its ending \n) with the row.index. (Reordering permanently does change this, we know, but still.) Perhaps the best way to handle this is to have row.index as the first selectable column in Content & Preview, looking like any other column but unchecked by default?
>>   2. A separate option to add sequential line-number prefixes at the start of each row: "1. ", "2. "?
>> The special case is generally having to clearly mark the individual rows that I transformed or changed in Refine, with a reference number prefix, so I can discuss specific rows with other team members by looking right at the raw CSV file in TextMate, Notepad, etc. (without using tools that add arbitrary line numbers). Does that make sense?
>
> I see. Wouldn't you need to keep the very original row indexes? E.g., by first adding a column with the expression "row.index" before doing anything else. Otherwise, your text editor can already display the line numbers.
>
> David

Tomato, tomato? Not all text editors display line numbers, or even can. But you're right: I guess it's just as easy, prior to export, for someone to add a column with the expression "row.index", and then they have that column. I was just thinking of making it easy at export time, that's all. But it's already fairly easy with an expression, so ignore that particular feature request.

What I was looking for was a way to somehow keep a record of which rows I changed. So I guess I can keep track myself with the Flag and add a new column based on Flag, so it shows as True/False during export. The problem is that we only have Star and Flag. I guess I can keep adding columns of manual flags to help me track changes along rows, but that becomes arduous and cumbersome. I was hoping for something easy that keeps track of any cells changed along rows during my daily transform processes and flags those rows. (We chatted before about history things; I wonder whether the undo/redo extract could have the extra ability to say WHICH rows were affected, that kind of tracking.) It's a really quirky special case, I know, and it falls more into auditing support than anything else. And no, I'm not using it for true auditing, haha. I'm only trying to fake it, I guess? ;)
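
One way to picture the tracking described here, as a Python sketch outside of Refine. It assumes a snapshot of the rows taken before editing and a stable key column (called original_index here) that survives the edits. The function flag_changed_rows and the changed column are invented for illustration and are not Refine features.

```python
def flag_changed_rows(original, edited, key="original_index"):
    """Return edited rows with an extra boolean 'changed' column.

    A row counts as changed if any cell differs from the snapshot, or if
    its key does not appear in the snapshot at all.
    """
    before = {row[key]: row for row in original}
    flagged = []
    for row in edited:
        old = before.get(row[key])
        changed = old is None or any(
            old.get(column) != value
            for column, value in row.items()
            if column != key
        )
        flagged.append({**row, "changed": changed})
    return flagged


snapshot = [
    {"original_index": 0, "city": "new york"},
    {"original_index": 1, "city": "Boston"},
]
after_edits = [
    {"original_index": 1, "city": "Boston"},
    {"original_index": 0, "city": "New York"},
]
for row in flag_changed_rows(snapshot, after_edits):
    print(row)
```

Exporting the changed column then gives the True/False marker per row without maintaining manual flags.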

--
-Thad
http://www.freebase.com/view/en/thad_guidry

David Huynh

Aug 25, 2011, 3:07:11 PM
to google-r...@googlegroups.com
On Thu, Aug 25, 2011 at 10:01 AM, Tom Morris <tfmo...@gmail.com> wrote:
> That looks good.  Do we need any options to control quoting of
> strings? (quote character, quoting quotes, etc)

I was thinking quoting happens automatically. When would you not want that?
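
For reference, a sketch of what automatic quoting looks like next to the knobs Tom mentions, using Python's csv module as a stand-in for the exporter. The wrapper format_row and its parameters are hypothetical. The default mode quotes only fields that contain the separator, the quote character, or a line break; a consumer that expects every field quoted, or a different quote character, would need the extra options.

```python
import csv
import io


def format_row(fields, quotechar='"', quote_all=False):
    """Format one row as a delimited line, without the trailing newline."""
    out = io.StringIO()
    writer = csv.writer(
        out,
        quotechar=quotechar,
        doublequote=True,  # escape a quote inside a field by doubling it
        quoting=csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writerow(fields)
    return out.getvalue()[:-1]  # strip the single trailing \n


print(format_row(["plain", "has,comma", 'say "hi"']))
print(format_row(["a", "b"], quote_all=True))
print(format_row(["it's", "x,y"], quotechar="'"))
```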

> The "convert date to GMT" option raises a whole host of ill-formed
> questions in my mind concerning date handling.  What do we do on
> import?  Is there any way to control the default timezone assumed for
> imported dates/times?

Good point. What should the option at import be? "For dates, unless specified, assume GMT rather than local time zone"? Or do we want actual +/-nn:nn time zone designation?

As for export, since Refine stores dates as java.util.Date, which does not have time zone information, maybe instead of "convert date to GMT" it makes more sense to have "convert to local time zone".
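
A small sketch of the export side, in Python rather than Java. It assumes the stored value is just an instant (milliseconds since the epoch), which is what java.util.Date holds, so the time zone only comes into play when the value is formatted. The function format_instant and the fixed -07:00 offset standing in for "local time zone" are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def format_instant(epoch_millis, tz=timezone.utc):
    """Render a zone-less instant as ISO 8601 text in the requested zone."""
    moment = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return moment.astimezone(tz).isoformat()


# Build an instant from a known UTC wall-clock time for the demonstration.
instant = datetime(2011, 8, 25, 19, 7, 11, tzinfo=timezone.utc).timestamp() * 1000

local = timezone(timedelta(hours=-7))  # stand-in for the user's local zone
print(format_instant(instant))         # formatted as GMT/UTC
print(format_instant(instant, local))  # same instant, local wall-clock time
```

Both lines describe the same stored value; the export option only chooses which rendering is written out, and including the offset in the output keeps the two unambiguous.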

David

Thad Guidry

Aug 25, 2011, 3:13:08 PM
to google-r...@googlegroups.com
This idea of auditing or tracking changes is relevant in the database world, in transaction processing (http://en.wikipedia.org/wiki/Transaction_processing), where I need to know which atomic units (in this case, rows) I processed. Does that make more sense?

--
-Thad
http://www.freebase.com/view/en/thad_guidry

David Huynh

Aug 25, 2011, 3:25:21 PM
to google-r...@googlegroups.com
Yes, but outputting the row indexes *after* having done all the editing isn't giving you any tracking benefit. That's just the same as the line number in your text editor. I must be missing something ...

David

Thad Guidry

Aug 25, 2011, 3:31:07 PM
to google-r...@googlegroups.com
I didn't explain it well the first time. It is about tracking and knowing which rows I made cell changes on, and seeing flags in the export file on any of the rows I affected (to tell them apart from the rows that had no changes). Whatever can help with that would be great.
--
-Thad
http://www.freebase.com/view/en/thad_guidry