CsvParse

Peter Boughton

Jun 28, 2010, 11:00:13
to cfml-convent...@googlegroups.com
With the number of times I see people say "I'm trying to import a CSV
file and the list functions ignore blanks", it's clear there's a need
for this function, to avoid wasting time for every person recreating
this particular wheel.

Of course, there is the cfhttp csv import thing, but that requires a
public URL, and requires knowing about it (someone looking "is there a
csv function" isn't likely to see it).

Why doesn't CFML have a function that accepts either a filename or a
string (like XmlParse) and other suitable arguments
(headers, delimiters, qualifiers) and can be used to provide robust and
reliable importing, so people can stop trying to roll their own using
the List* functions?

Like this:

CsvParse, returns query
Input, required string (filename, url or text)
Delimiter, string default ','
TextQualifier, string default '"'
FirstRowAsHeaders, boolean default true
Columns, optional string (list of colnames)
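
To make the proposal above concrete, here is a minimal sketch of how such a signature might behave. It is written in Python rather than CFML, purely for illustration, and is not any engine's implementation. The name csv_parse, the list-of-dicts return value (standing in for a CFML query), the generated col1/col2 names, and the rule that an explicit column list replaces the header row are all assumptions. Filename and URL input are left out.

```python
import csv
import io


def csv_parse(text, delimiter=",", text_qualifier='"',
              first_row_as_headers=True, columns=None):
    """Parse CSV text into a list of row dicts (a stand-in for a CFML query)."""
    rows = list(csv.reader(io.StringIO(text, newline=""),
                           delimiter=delimiter, quotechar=text_qualifier))
    if not rows:
        return []
    if columns is not None:
        # Assumption: an explicit column list overrides the header row.
        names = [c.strip() for c in columns.split(",")]
        if first_row_as_headers:
            rows = rows[1:]
    elif first_row_as_headers:
        names, rows = rows[0], rows[1:]
    else:
        # Assumption: generate placeholder names when none are available.
        names = [f"col{i + 1}" for i in range(len(rows[0]))]
    return [dict(zip(names, row)) for row in rows]


rows = csv_parse('name,city\n"Smith, J",\nBob,"Leeds"\n')
```

In this sketch the empty city cell survives as an empty string, and the comma inside the qualified name stays within one cell.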


So yeah, those seem obvious choices, but if anyone might have better
ideas, feel free to give feedback.

After any discussion, I'll go raise feature requests for all three
engines with whatever turns out to be most popular.
(It's such a trivial and yet useful function, I'd hope they'll all
just accept the idea and add it to their next releases.)


So yeah, anyone think I'm crazy here, or have any ideas about this?

Matthew Woodward

Jun 28, 2010, 11:11:16
to cfml-convent...@googlegroups.com
On Mon, Jun 28, 2010 at 8:00 AM, Peter Boughton <boug...@gmail.com> wrote:
> So yeah, anyone think I'm crazy here, or have any ideas about this?


Clearly you're not crazy. ;-)

http://openbluedragon.org/manual/?/function/readcsv
http://openbluedragon.org/manual/?/function/tocsv
--
Matthew Woodward
ma...@mattwoodward.com
http://blog.mattwoodward.com
identi.ca / Twitter: @mpwoodward

Please do not send me proprietary file formats such as Word, PowerPoint, etc. as attachments.
http://www.gnu.org/philosophy/no-word-attachments.html

Todd Rafferty

Jun 28, 2010, 11:07:57
to cfml-convent...@googlegroups.com
With Railo, you can make this a built in function by creating a UDF and shoving it into one of the magic directories and you're done.


Peter Boughton

Jun 28, 2010, 11:22:37
to cfml-convent...@googlegroups.com

Heh, well that's good, and also having a toCsv makes sense.

I can't help but think "readcsv" is the wrong name though - so far as
consistency with rest of CFML goes.

Having delimiter/qualifier last probably makes more sense than the
order I had above though.

Hmm, no qualifier argument for toCsv?

Todd Rafferty wrote:
> With Railo, you can make this a built in function by creating a UDF and
> shoving it into one of the magic directories and you're done.

Yep, and that's great, but why I'm posting here (instead of responding
to the recent railo post about this) is that this needs to be the same
for all engines, and part of the documentation for all engines.


I'd also be proposing for it to be Core CFML if there was a hint of a
way how to do that...?

Todd Rafferty

Jun 28, 2010, 11:49:08
to cfml-convent...@googlegroups.com

Write up the function and submit it as a patch in Jira. Explain that OpenBD already has it built in.

Mark Drew

Jun 28, 2010, 11:55:02
to cfml-convent...@googlegroups.com
I agree here, it usually is something like CSVRead, CSVWrite or something?

MD


Mark Drew
Railo Technologies UK
Professional Open Source
skype: mark_railo
email: ma...@getrailo.com
gtalk: ma...@getrailo.com
tel: +44 7971 85 22 96
web: http://www.getrailo.com

Matthew Woodward

Jun 28, 2010, 12:04:21
to cfml-convent...@googlegroups.com
On Mon, Jun 28, 2010 at 8:22 AM, Peter Boughton <boug...@gmail.com> wrote:
> I can't help but think "readcsv" is the wrong name though - so far as
> consistency with rest of CFML goes.

Consistency with what specifically, the XML functions?
 

> Hmm, no qualifier argument for toCsv?


The CSV stuff is brand new, so feel free to create tickets for things you'd like to see changed:
http://code.google.com/p/openbluedragon/issues/list



> I'd also be proposing for it to be Core CFML if there was a hint of a
> way how to do that...?


This is one of the reasons this list was created, because there are no public submission and discussion outlets for the CFML Advisory Committee. I believe there are representatives from all the engines on this list though.

John Allen

Jun 28, 2010, 23:46:42
to cfml-convent...@googlegroups.com
This functionality should be added to the core CFML language without question.

I missed this in OBD. 

Mark Jones

Jun 29, 2010, 08:31:28
to cfml-convent...@googlegroups.com
All I can add to this is that, from experience, it's REALLY EASY to
write a REALLY BAD csv parser that misses things like empty or
qualified cells, and that it's REALLY HARD to write a good one. Since
CSVs (and other text documents) come up a decent amount, having
something built into the engine would definitely help eliminate a lot
of app bugs.

Alan Williamson (aw2.0 ltd)

Jun 29, 2010, 08:40:35
to cfml-convent...@googlegroups.com
The language always surprises me in that there are still some holes left in it.

We bubbled up these functions because we had them for the SpreadSheet
and cfhttp functionality. It seemed a shame not to set them free.

Todd Rafferty

Jun 29, 2010, 08:40:31
to cfml-convent...@googlegroups.com
What you're stating is a reason why it shouldn't be built into the app engine. If CSV is that easy to screw up, then either there isn't a common way of doing it or the developers are building the CSV incorrectly; a function like this would then be needed on a case-by-case basis and shouldn't be integrated into the app engine. Any csvimport or such should be built on what is standard for csv files.

Mark Jones

Jun 29, 2010, 09:00:09
to cfml-convent...@googlegroups.com
On Tue, Jun 29, 2010 at 7:40 AM, Todd Rafferty <web...@gmail.com> wrote:
> What you're stating is a reason why it shouldn't be built into the app
> engine. If CSV is that easy to screw up, then that means there isn't a
> common way of doing it or the developers are building the CSV incorrectly
> and creating a function like such is on a case by case basis and shouldn't
> be integrated into the app engine. Any csvimport or such should be built on
> what is standard for csv files.

I would normally agree, except that the *basic* delimited-text format
is well enough defined that a single solution will work for 90% of all
cases and that currently, people build custom solutions by looping
with lists and keep running into the same problems over and over:
empty cells, qualified cells (removing the qualifiers) and embedded
commas within the qualified cells (resulting in two cells when it
should have been one).
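
The three problems just listed can be shown in a few lines. This is a sketch in Python, not CFML. The helper list_to_array is a hypothetical stand-in that imitates list-style splitting which skips empty elements, and it is compared with Python's standard csv reader.

```python
import csv

line = 'Smith,,"Leeds, UK"'


# Hypothetical stand-in for list-style splitting that drops empty elements.
def list_to_array(text, delimiter=","):
    return [item for item in text.split(delimiter) if item != ""]


naive = list_to_array(line)        # ['Smith', '"Leeds', ' UK"']
parsed = next(csv.reader([line]))  # ['Smith', '', 'Leeds, UK']
```

The naive version loses the empty cell, keeps the qualifiers, and splits the qualified cell in two; the csv reader returns the three intended cells.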

It's true that different methods for quoting fields, escaping those
quotes, etc. might exist. But the vast majority of the time you're
dealing with comma-separated, double-quote-qualified files. Being
able to change just those characters allows for basic tab-delimited
files as well. More complex than that, and you're back to custom
code. However, an entry-level programmer could use CsvRead() or
equivalent. I wouldn't expect them to write a *correct* version
themselves, though. I've had to explain to many a developer to
pre-format their lists by replacing ",," with something like ",NULL,"
and then converting back later... and that was ignoring qualified
fields (since they didn't exist in that project... yet).
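
As an aside on that ",NULL," workaround, a single replace pass is not enough when several empty cells sit next to each other, because the matches cannot overlap. The sketch below shows this in Python; the behaviour of any particular CFML engine's Replace() is not claimed here.

```python
line = "a,,,b"

# One pass of the ",," -> ",NULL," workaround; matches do not overlap.
once = line.replace(",,", ",NULL,")   # 'a,NULL,,b' (one empty cell remains)

# Repeating until nothing changes fills every interior empty cell.
filled = line
while ",," in filled:
    filled = filled.replace(",,", ",NULL,")
# filled is now 'a,NULL,NULL,b'
```

Even the repeated version does nothing for an empty first or last cell, and it would also rewrite a ",," that appears inside a qualified field.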

It's perfectly valid to say they should use cflib or somesuch... but
the CSV functions on cflib are *also* bad, or at least they were the
last time I looked. This one seems okay
http://cflib.org/udf/CSVtoArray but it needs an argument for the field
delimiter (easy enough). Actually, I'm sure this wasn't there last
time I looked, because it even seems to handle embedded commas. The
CSVToQuery function doesn't look like it handles qualified cells.

Really, this is more of a "library" function than a core language
function. Perhaps some sort of distinction between core/logic
functions, "Library" functions and "UI" functions would be useful?
Probably not.

Peter Boughton

Jun 29, 2010, 09:43:36
to cfml-convent...@googlegroups.com
> Really, this is more of a "library" function than a core language
> function.  Perhaps some sort of distinction between core/logic
> functions, "Library" functions and "UI" functions would be useful?
> Probably not.

I agree this could be seen as a "library function" rather than "core
logic", but I (and hopefully others) are using "core" based on the
CFML Advisory Committee terms - Core/Extended/Vendor-specific.

Using these three categories, I'd say it fits into the "Core"
definition rather than the other two.

Todd Rafferty

Jun 29, 2010, 09:49:08
to cfml-convent...@googlegroups.com
I thought it wasn't "core" until all the engines agree to implement it and put it there. Right now, it's "vendor specific"?

Peter Boughton

Jun 29, 2010, 10:01:03
to cfml-convent...@googlegroups.com
It's not part of the spec at all yet (which by default makes it vendor
specific).

I want it added to "Core" because that means all (compliant) engines
are required to implement it (whereas "Extended" just says, "you
should implement it, like this", and Vendor-specific means it can be
implemented differently by any/all engines).

It would be a shame for something so basic as CSV to not be
implemented uniformly across the engines.

Todd Rafferty

Jun 29, 2010, 10:15:17
to cfml-convent...@googlegroups.com
I suggest getting it nailed down (the function, with all the agreed-upon attributes, etc.) in at least one engine. If OpenBD's functions are a good enough base to start from, we can go that route. We can draft up a function to implement in Railo, test it via the magic functions directory, and submit the function as a patch. We'll see what we can do to get it into the core of Railo by default - I can't see any reason why they would object.

Mark Jones

Jun 29, 2010, 10:47:36
to cfml-convent...@googlegroups.com
> I agree this could be seen as a "library function" rather than "core
> logic", but I (and hopefully others) are using "core" based on the
> CFML Advisory Committee terms - Core/Extended/Vendor-specific.

Oops. Yes, sorry. I didn't mean to confuse the terminology. That's
the definition of "core" that should be used by this list.

Peter Boughton

Jun 29, 2010, 10:56:48
to cfml-convent...@googlegroups.com
Well, as I mentioned previously, there are some things I'd personally do
differently to OpenBD's implementation.

So, what I'll do (this evening) is create a proposed set of functions
and test cases and submit them to this list for review/comments/etc.

If the OBD team go "yay, we love it", then all is good. Otherwise we
can perhaps discuss/vote on the differences with an aim of coming up
with something everyone (on this list, at least) is happy with, and
then proceed as you said.

Sound good?

Matthew Woodward

Jun 29, 2010, 11:15:21
to cfml-convent...@googlegroups.com
On Tue, Jun 29, 2010 at 7:56 AM, Peter Boughton <boug...@gmail.com> wrote:
> Otherwise we
> can perhaps discuss/vote on the differences with an aim of coming up
> with something everyone (on this list, at least) is happy with, and
> then proceed as you said.
>
> Sound good?


Perfect. Exactly what this list is all about!

Baz

Jun 29, 2010, 13:06:42
to cfml-convent...@googlegroups.com
Someone alluded to this earlier, but would it make sense to generalize it a tad more to include other formats, like tab-delimited, and call the function something like:

- TextParse(ColumnSeparator, LineSeparator) or StringParse(ColumnSeparator, LineSeparator)
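
A rough sketch of what that generalised call might look like, in Python rather than CFML and only as an illustration: the name text_parse, the defaults, and the choice to skip empty lines are assumptions. Qualified cells that contain the line separator are not handled faithfully by this simple split.

```python
import csv


def text_parse(text, column_separator=",", line_separator="\n"):
    """Split text into rows on line_separator, then into cells per row."""
    # Assumption: empty lines are skipped rather than returned as empty rows.
    lines = [ln for ln in text.split(line_separator) if ln != ""]
    return list(csv.reader(lines, delimiter=column_separator))


# Tab-delimited cells, with "|" as an arbitrary line separator.
table = text_parse('a\tb|c\t"d\te"', column_separator="\t", line_separator="|")
```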

Baz


Peter Boughton

Jun 29, 2010, 08:01:01
to cfml-convent...@googlegroups.com
> Consistency with what specifically, the XML functions?

Well, the order is the first thing - we've got FileRead, ImageCrop, XmlNew, etc.
(Even though I would prefer <verb><object>, consistency with existing
functions is important.)

And I'd pick "parse" over "read" because its primary function is
parsing data - same as XmlParse - whether a file is read before data
parsed is less relevant here.
(I put this in a different category to
FileRead/ImageRead/SpreadsheetRead/etc type of tags, where their
primary function is reading the files).


> The CSV stuff is brand new, so feel free to create tickets for things you'd
> like to see changed:
> http://code.google.com/p/openbluedragon/issues/list

Yep, I'll certainly do that, (once we've got a consensus).
I'll do likewise on the Railo Jira, and wherever that ACF bugtracker is.


> This is one of the reasons this list was created, because there are no
> public submission and discussion outlets for the CFML Advisory Committee. I
> believe there are representatives from all the engines on this list though.

Good, I hoped that was the case. :)

Though it would be nice to have an official channel, (or at least
acknowledged/unofficial) - any chance of getting this group
specifically mentioned on the OpenCFML wiki?

Alan Williamson (aw2.0 ltd)

Jun 29, 2010, 17:08:43
to cfml-convent...@googlegroups.com
> Though it would be nice to have an official channel, (or at least
> acknowledged/unofficial) - any chance of getting this group
> specifically mentioned on the OpenCFML wiki?

Yes we can do that.

consider that OFFICIAL! ;)

denstar

Jul 22, 2010, 15:34:29
to CFML Conventional Wisdom
On Jun 28, 9:07 am, Todd Rafferty <web...@gmail.com> wrote:
> With Railo, you can make this a built in function by creating a UDF and
> shoving it into one of the magic directories and you're done.

I wrote a built-in function (I called it cfcsv) for Railo using the
opencsv-2.1.jar.

It's in the railoprojects svn repo.

What do you guys think about leveraging an existing project like
opencsv?

:Den

--
Every man's highest, nameless though it be, is his 'living God'.
James Martineau

Alan Williamson

Jul 22, 2010, 15:39:20
to CFML Conventional Wisdom
OpenBD already has a core set of rich csv parse functions: csvread() and csvwrite() see

http://www.openbluedragon.org/manual/

--
aw2.0 ltd www.aw20.co.uk

denstar

Jul 22, 2010, 15:57:49
to cfml-convent...@googlegroups.com
On Thu, Jul 22, 2010 at 1:39 PM, Alan Williamson <al...@aw20.co.uk> wrote:
> OpenBD already has a core set of rich csv parse functions: csvread() and csvwrite() see
>
> http://www.openbluedragon.org/manual/

That's fine and dandy (I had to enable javascript to get the full
effect, but those are beautiful docs!), I'm more than happy to let
other people name stuff and whatnot. That's the hard part! =)

I was more wondering about, like, maybe using the same underlying java
libraries or something along those lines.

BTW we'll probably want to add escape chars and newline stuff, neh?
At least I exposed those in mine, and they were useful.

HSQLDB has some freakishly nice ways of dealing with CSV files, just
to toss that out there as well.

:Den

--
The incarnation is true, not of Christ exclusively, but of Man
universally, and God everlastingly.
James Martineau

denstar

Jul 22, 2010, 16:02:16
to cfml-convent...@googlegroups.com
On Thu, Jul 22, 2010 at 1:57 PM, denstar wrote:
...

> I was more wondering about, like, maybe using the same underlying java
> libraries or something along those lines.

To be clear, this is an "in general" type of question, versus a CSV
specific one.

Should the engines each write their own implementations of
functionality X, or /lean/ that way at least?

There are pros and cons to each approach... compatibility being at the
heart of it.

:Den

--
The pinafore of the child will be more than a match for the frock of
the bishop and the surplice of the priest.
James Martineau

Peter Boughton

Jul 22, 2010, 20:36:13
to cfml-convent...@googlegroups.com
Doh! I completely forgot about this thread. :(

Anyway, you're basically asking "should the engines re-invent the
wheel", and the obvious answer is probably not!

One of the key benefits to Open Source is not wasting effort re-doing
things that have already been done.

Unless there is a specific reason/benefit in re-writing a particular
functionality, the default case for adding functionality should be to
find a suitably licensed existing project and integrate it in a clean
CFML way.

That way, we have more time to spend on progressing other things!

denstar

Jul 22, 2010, 21:08:22
to cfml-convent...@googlegroups.com

While that sounds logical enough, I think CFML is one of the absolute
worst as far as Not Invented Here syndrome goes. =)

We're slowly bucking that trend, is the good news. At least I think we are.

The only argument, specifically for leveraging java libs vs writing
our own java, is things like classloader issues. Theoretically Java 7
will have some magic for dependency management, but I'm not holding my
breath. :)

So we can avoid library "issues" by rolling our own(s), but then the
incompatibility is shifted into the actual engines, which, IMO, is
probably potentially harder to manage.

Either way, the various engine folks should probably try to
collaborate on included java libraries (also IMO) to some extent.

Eh. *shrug*

Thanks for the reply, Peter! No need for apologies, I feel this stuff
is "evolution", and will happen one way or another. :)

:Denny

--
Democracy is the road to socialism.
Karl Marx
