N00b question: how to deal with quotes and double quotes in cfhttp csv files

110 views
Skip to first unread message

Peter Whitby

unread,
Sep 5, 2016, 7:06:33 AM9/5/16
to Lucee
Evening, All.

I'm converting a truly ancient application from CF6 to lucee.

Everything works great, the actual code migration went very smoothly and I've managed to get everything working, as well as cleaning up a few old irritations. So far, so good.

However, I'm having a problem when using the production data. The app uses cfhttp to create an sql-like result set from a CSV file and, when a single or double quote appears in a field, lucee appears to ignore the field separators, line feeds , etc and keeps reading until it finds another instance of the same character. This may be several lines/records later and usually ends with a an error stating that there is an incorrect number of columns in the input line. 

I'm running on Ubuntu server 16.10 with the latest Apache2. Lucee was installed with 'lucee-5.0.0.252-pl0-linux-x64-installer.run'.

cfhttp is:
<cfhttp url=#csv_file#
        method="get"
        delimiter="|"
        textqualifier=""
        Name="BuildQuery">

I've tried encasing the fields of the csv file in different textdelimiters (' " ` and others); tried different separator characters and I'm all but tearing my hair out. I could write a preprocessing filter script but that wasn't needed in CF6 and I'm wondering why it's needed in lucee.

What have I missed?

Let me know if more doco is required and I'll be happy to post it.

Thanks,
Pete

AJ Mercer

unread,
Sep 5, 2016, 10:17:10 PM9/5/16
to lu...@googlegroups.com

--
Get 10% off of the regular price for this years CFCamp in Munich, Germany (Oct. 20th & 21st) with the Lucee discount code Lucee@cfcamp. 189€ instead of 210€. Visit https://ti.to/cfcamp/cfcamp-2016/discount/Lucee@cfcamp
---
You received this message because you are subscribed to the Google Groups "Lucee" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lucee+unsubscribe@googlegroups.com.
To post to this group, send email to lu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lucee/66f1baf0-e710-4d8e-8f21-373aba3d0892%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--

Peter Whitby

unread,
Sep 5, 2016, 10:56:47 PM9/5/16
to Lucee
Thanks AJ. Does lucee-spreadsheet read from CSV files? I couldn't see anything in the documentation that indicated it could.

AJ Mercer

unread,
Sep 5, 2016, 11:08:46 PM9/5/16
to lu...@googlegroups.com
Oh, maybe it doesn't then - sorry

Just can save to csv

--
Get 10% off of the regular price for this years CFCamp in Munich, Germany (Oct. 20th & 21st) with the Lucee discount code Lucee@cfcamp. 189€ instead of 210€. Visit https://ti.to/cfcamp/cfcamp-2016/discount/Lucee@cfcamp
---
You received this message because you are subscribed to the Google Groups "Lucee" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lucee+unsubscribe@googlegroups.com.
To post to this group, send email to lu...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Andrew Penhorwood

unread,
Sep 6, 2016, 7:06:17 AM9/6/16
to Lucee
Can you post a sample of the CVS file?

Andrew penhorwood

Peter Whitby

unread,
Sep 6, 2016, 8:58:46 PM9/6/16
to Lucee
Sample CSV file attached. It has the same problem that breaks the prod app. Lucee error screen below (and attached, just in case). replacing single quote characters with double quotes generates the same error.

In the Detail area below, you can see that the field separator | in record 1 is recognised until the 5th field where the error string (H've) causes Lucee to look for it's matching character in record 3 field 6 (i') before accepting new field separators.


As a work-around, I'm pre-processing the csv file to replace quotes and double quotes with HTML entities (&#39 and &#34), which works fine. I'd prefer it if Lucee could be told to ignore them...

More information:
I put the second single quote in the same field on record 3 which means that the field count is correct and Lucee doesn't fail. However, the data between the two quotes is included in the summary field for record 1 as shown below.
sample.csv
Capture1.PNG
Auto Generated Inline Image 1
Auto Generated Inline Image 2

Andrew Penhorwood

unread,
Sep 7, 2016, 6:58:59 AM9/7/16
to Lucee
I downloaded your sample.csv file and created a test.  I ran it on ACF 10 and it parses correctly.  At this point I think the problem is a bug in how Lucee is handling the delimiters.  I also ran my test on Lucee 4.5.2.018 express.  The express version generated the same error that you listed here.  One possible workaround is to add double-quotes (") around each column in the file creation process.  When I did this to the sample.csv and changed the textqualifier the file processed correctly.

You should open a bug report with Lucee ( https://luceeserver.atlassian.net/secure/Dashboard.jspa ) with a test case so Mica can look into the issue.

Andrew Penhorwood

Peter Whitby

unread,
Sep 8, 2016, 3:47:19 AM9/8/16
to Lucee
Thanks Andrew. I'll raise a bug report.

Regarding the doublequote textdelimiter - that will not work for random doublequotes in the fields. I'll ensure that there's a sample of that in the files I provide with the bug report.

My work around is to write a pre-processor and replace any quote or doublequote with an appropriate html entity (eg &#39).

Thanks - Pete 

Peter Whitby

unread,
Sep 8, 2016, 3:48:46 AM9/8/16
to Lucee
I tried to raise a Jira report. The account creation failed with the following:


Any ideas?

Pete
Auto Generated Inline Image 1

Andrew Penhorwood

unread,
Sep 8, 2016, 6:19:57 AM9/8/16
to Lucee
No idea on why that happened.  Create a new post here about not being able to add a JIRA issue.  At this point only people who are interesting in this thread are reading it.  Did you create an account on the JIRA site?  It has been awhile since I reported anything there.

Andrew Penhorwood

mar...@cubicstate.com

unread,
Sep 8, 2016, 11:49:38 AM9/8/16
to Lucee

As another alternative method, I have used the following function in the past to parse csv style strings. I tested it against your sample file and it seemed to work OK:

<cffunction name="csvToArray" access="public" output="false" returntype="array" hint="Transforms a .CSV file to an array.">
<cfargument name="csvString" type="string" default="" required="true">
<cfargument name="itemDelim" type="string" default="," required="false">
<cfargument name="rowDelim" type="string" default="#chr(13) & chr(10)#" required="false">
<cfargument name="escapeChar" type="string" default="""" required="false">
<cfargument name="escapedEscapeChar" type="string" default="""""" required="false">
<cfscript>
var currentVal = ""; // string containing the value currently being parsed
var inEscapedCell = false; // set to true when we are parsing an escaped cell value
var i = 1;
var currentChar = "";
var rowDelimCheck = "";
var escapeDelimCheck = "";
var escapedEscapeCheck = "";
var aDataSet = arrayNew(1);
var aRowSet = arrayNew(1);
// we parse our string a character at a time
for(i=1; i LTE len(arguments.csvString); i=i+1){
currentChar = mid(arguments.csvString, i, 1);
rowDelimCheck = mid(arguments.csvString, i, len(arguments.rowDelim));
escapeDelimCheck = mid(arguments.csvString, i, len(arguments.escapeChar));
escapedEscapeCheck = mid(arguments.csvString, i, len(arguments.escapedEscapeChar));
// are we in an escaped cell???
if(inEscapedCell){
// we are in an escaped cell, we need to check for our ending escape character
if((NOT escapeDelimCheck IS arguments.escapeChar) OR (escapedEscapeCheck IS arguments.escapedEscapeChar)){
if(escapedEscapeCheck IS arguments.escapedEscapeChar){
currentVal = currentVal & arguments.escapeChar; // log our character
i = i + len(arguments.escapedEscapeChar) - 1;
}else{
currentVal = currentVal & currentChar; // log our character
}
}else{
i = i + len(arguments.escapeChar) - 1;
inEscapedCell = false; // break from our escaped cell
}
}else{
// are we at the end of our cell???
if((currentChar IS arguments.itemDelim) OR (rowDelimCheck IS arguments.rowDelim)){
arrayAppend(aRowSet, currentVal);// commit our current value
currentVal = ""; // get ready for our next value
if(rowDelimCheck IS arguments.rowDelim){
arrayAppend(aDataSet, duplicate(aRowSet));// commit our current row 
aRowSet = arrayNew(1); // get ready for our next row
i = i + len(arguments.rowDelim) - 1;
}
}else{
if(escapeDelimCheck IS arguments.escapeChar){
i = i + len(arguments.escapeChar) - 1;
inEscapedCell = true; // open our escaped cell clause
}else{
currentVal = currentVal & currentChar; // log our character
}
}
}
}
return aDataSet;
</cfscript>
</cffunction>

<cfset csv = fileRead(expandPath("./sample.csv"))>

<cfset parsed = csvToArray(csv, "|", chr(10))>
<cfdump var="#parsed#">


All the best,

Martin

Peter Whitby

unread,
Sep 8, 2016, 11:10:18 PM9/8/16
to Lucee
As I stated in my post the error occurred while trying to create an account. I'll create a new post. Thanks.
Reply all
Reply to author
Forward
0 new messages