Using VT for validating CSV data


Jason Durham

Jun 2, 2014, 12:22:31 PM
to valida...@googlegroups.com
I'm working on a project to validate customer-supplied CSV data.  There are a number of different types of validation, but most of them are currently somewhat simple (regex, inList, required, boolean, rangeLength).  However, there are 4-5 "custom" validations that merely call listFind() on an application-scoped list (numeric values only).  The largest of these "custom" lists is 95 values.

I'm seeing execution times through VT of about 200ms for each row (~30 columns) in the CSV.  I've been asked to find a way to reduce this execution time by half.

Has anyone been in a similar situation?  Have I mentioned any validation types that are inherently slow?  

Jason Durham

John Whish

Jun 2, 2014, 12:25:18 PM
to valida...@googlegroups.com
Eeek! Could you create one huge regular expression to match those 95 potential values, so that you're calling just one validator rather than several?
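A minimal sketch of that suggestion, assuming the allowed values are a comma-delimited string of plain integers. `allowedCodes`, `allowedCodePattern`, and `isAllowedCode` are hypothetical names, and the short sample list stands in for the real 95 values. Because the values are numeric only, none of them contain regex metacharacters, so they can be joined with `|` without escaping.

```cfml
<!--- Hypothetical sample list; in the real app this would be the application-scoped list --->
<cfset allowedCodes = "3,17,42,95">

<!--- Build one anchored alternation, ^(3|17|42|95)$, once (e.g. in onApplicationStart), not per row --->
<cfset allowedCodePattern = "^(" & replace(allowedCodes, ",", "|", "all") & ")$">

<!--- Hypothetical helper: true only when the whole trimmed value equals one allowed code --->
<cffunction name="isAllowedCode" returntype="boolean" output="false">
    <cfargument name="value" type="string" required="true">
    <cfargument name="pattern" type="string" required="true">
    <cfreturn reFind(arguments.pattern, trim(arguments.value)) GT 0>
</cffunction>
```

The `^` and `$` anchors matter: without them "420" would match because it contains "42". The same pattern string could presumably be supplied to the regex validation type already in use instead of a custom validator; check the VT documentation for how that rule takes its pattern.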


--
You received this message because you are subscribed to the Google Groups "ValidateThis" group.
Visit this group at http://groups.google.com/group/validatethis.

Cameron Childress

Jun 2, 2014, 1:03:25 PM
to valida...@googlegroups.com
On Mon, Jun 2, 2014 at 12:22 PM, Jason Durham wrote:
> However, there are 4-5 "custom" validations that merely call listFind() on an application-scoped list (numeric values only).  The largest of these "custom" lists is 95 values.

Lists are (relatively) slow in CF, and the longer the list, the slower they get. Convert the list to an array and reference the array instead. It will be much faster.
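A minimal sketch of converting once and then looking up in the array, assuming ColdFusion 9 or later for arrayFind(); the variable names and the short sample list are hypothetical. arrayFind() still checks elements in order, but it no longer re-parses a delimited string on every call. The struct at the end is an alternative not mentioned in the thread: keying a struct by value gives a hashed structKeyExists() lookup.

```cfml
<!--- Hypothetical sample list; in the real app this would be the application-scoped list --->
<cfset allowedCodes = "3,17,42,95">

<!--- Convert once (e.g. in onApplicationStart) so lookups stop re-parsing the string --->
<cfset allowedCodeArray = listToArray(allowedCodes)>

<!--- arrayFind returns the 1-based position of the match, or 0 when absent --->
<cfset isAllowed = arrayFind(allowedCodeArray, "42") GT 0>

<!--- Alternative (not from the thread): a struct keyed by code; for an array loop, index holds each element --->
<cfset allowedCodeSet = {}>
<cfloop array="#allowedCodeArray#" index="code">
    <cfset allowedCodeSet[code] = true>
</cfloop>
<cfset isAllowedViaStruct = structKeyExists(allowedCodeSet, "42")>
```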

You may also want to check this out if you haven't already done so:

-Cameron

--
Cameron Childress
--
p:   678.637.5072
im: cameroncf

Jason Durham

Jun 2, 2014, 1:09:13 PM
to valida...@googlegroups.com
I just ran the following code in my local environment (same environment as the 200 ms benchmark).  

<!--- 95-item comma-delimited list, the same size as the largest validation list --->
<cfset list = "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95">

<cfset start = getTickCount()>

<!--- a single lookup; the return value is discarded --->
<cfset listFind(list, 5)>

<cfoutput>ListFind took #getTickCount() - start# ms</cfoutput>

The output was 0 seconds.  I also ran it on CFLive.net and the output indicated 1 ms.  
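A single listFind() on 95 items completes well within the millisecond granularity of getTickCount(), so a one-call timing like the one above will show 0 or 1 ms either way. A rough sketch that repeats each lookup many times gives a measurable total; the iteration count is arbitrary, arrayFind() assumes ColdFusion 9 or later, and the resulting numbers depend entirely on the server.

```cfml
<!--- Same 95-item list as above, built in a loop --->
<cfset list = "">
<cfloop from="1" to="95" index="i">
    <cfset list = listAppend(list, i)>
</cfloop>
<cfset arr = listToArray(list)>
<cfset iterations = 100000>

<!--- Time many listFind() calls; 95 is the last item, i.e. the worst case for a scan --->
<cfset start = getTickCount()>
<cfloop from="1" to="#iterations#" index="n">
    <cfset found = listFind(list, 95)>
</cfloop>
<cfset listMs = getTickCount() - start>

<!--- Same number of arrayFind() calls on the pre-split array --->
<cfset start = getTickCount()>
<cfloop from="1" to="#iterations#" index="n">
    <cfset found = arrayFind(arr, "95")>
</cfloop>
<cfset arrayMs = getTickCount() - start>

<cfoutput>listFind: #listMs# ms, arrayFind: #arrayMs# ms for #iterations# lookups</cfoutput>
```

Running this inside the real validation request (rather than a standalone test page) gives a fairer picture of where the time actually goes.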

Jason Durham


Jason Durham

Jun 2, 2014, 1:09:49 PM
to valida...@googlegroups.com
Typo... my local environment completed in 0 ms, not 0 seconds.

Jason Durham

Jason Durham

Jun 2, 2014, 1:12:46 PM
to valida...@googlegroups.com
Never mind. I'm an idiot... request debugging output was on. When I turned it off, my time dropped to under 50 ms.

Cameron Childress

Jun 3, 2014, 7:54:50 AM
to valida...@googlegroups.com
On Mon, Jun 2, 2014 at 1:09 PM, Jason Durham wrote:
> I just ran the following code in my local environment (same environment as the 200 ms benchmark).

Lists are slow in CF. The difference on a small list is small. On a big list it's big. On a list that you loop over repeatedly it can be bigger still.

The reason for this is that CF parses the list as a string character by character, counting delimiters as it goes. The more delimiters it has to count, the slower it gets. So even on the same list, listGetAt(list,5) will be much faster than listGetAt(list,500): the further down the list CF has to go to find the next item, the longer it takes. I think this also applies to lists used as attributes on cfloop tags (but I'm not 100% sure).
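To make the delimiter-counting explanation concrete, here is a deliberately naive, hypothetical scanListGetAt() that walks the string one character at a time. It is a conceptual model of the behavior described above, not ColdFusion's actual implementation, and it assumes a comma delimiter and no empty items (the real listGetAt() skips empty elements; this sketch does not). Reaching item N means stepping over every character of items 1 through N-1, whereas an array element such as arr[N] is read directly by index.

```cfml
<!--- Hypothetical illustration only: returns the Nth comma-delimited item, or "" if there is none --->
<cffunction name="scanListGetAt" returntype="string" output="false">
    <cfargument name="list" type="string" required="true">
    <cfargument name="position" type="numeric" required="true">
    <cfset var itemIndex = 1>
    <cfset var current = "">
    <cfset var ch = "">
    <cfset var i = 0>
    <!--- Every character before the target item has to be visited --->
    <cfloop from="1" to="#len(arguments.list)#" index="i">
        <cfset ch = mid(arguments.list, i, 1)>
        <cfif compare(ch, ",") EQ 0>
            <!--- Delimiter reached: either the wanted item is complete, or move on to the next one --->
            <cfif itemIndex EQ arguments.position>
                <cfreturn current>
            </cfif>
            <cfset itemIndex = itemIndex + 1>
            <cfset current = "">
        <cfelse>
            <cfset current = current & ch>
        </cfif>
    </cfloop>
    <!--- The last item has no trailing delimiter --->
    <cfif itemIndex EQ arguments.position>
        <cfreturn current>
    </cfif>
    <cfreturn "">
</cffunction>
```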

As an anecdote, I had a challenge similar to yours once. A request was taking 30-50 seconds on average to run. This was a shopping cart app, and the more items the user put in their cart, the slower the app got. They were getting to 2,000-3,000 items in their cart (don't ask why), and the items were all stored as a list in a session variable. Once I changed the list to an array, request times went to about 400ms (from 50 seconds!!!!).

But it might not be the lists. I don't have access to your app so I'm just throwing out ideas. It could also be the way you are processing your CSV (are you treating it as a list too maybe???). 

I would try changing the list to an array in your actual app and not a test file and see how it goes. I'd also check out the article I linked to for some helpful info on speeding up CSV processing in general.
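If the CSV rows are also being walked with list functions (say, listGetAt() per column), the same idea applies: split each line once into an array and read columns by index. A minimal sketch, assuming a simple CSV with no quoted fields containing commas or line breaks; `csvText` and its sample content are hypothetical. The third listToArray() argument, includeEmptyFields (ColdFusion 8+), keeps blank columns in their positions.

```cfml
<!--- Hypothetical CSV content; real code might load it with fileRead() --->
<cfset csvText = "id,code,active" & chr(10) & "1,42,true" & chr(10) & "2,,false">

<!--- Split into rows once; CR and LF are both delimiters, and empty rows are dropped by default --->
<cfset rows = listToArray(csvText, chr(13) & chr(10))>

<!--- Start at 2 to skip the header row --->
<cfloop from="2" to="#arrayLen(rows)#" index="r">
    <!--- Split each row once; includeEmptyFields=true keeps the blank code in position 2 --->
    <cfset columns = listToArray(rows[r], ",", true)>
    <!--- Columns are read by index instead of with listGetAt() --->
    <cfoutput>Row #r - 1#: code="#columns[2]#" active=#columns[3]#<br></cfoutput>
</cfloop>
```

For real customer files with quoted fields, a proper CSV parser is safer than splitting on commas.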

-Cameron 

Cameron Childress

Jun 3, 2014, 7:55:49 AM
to valida...@googlegroups.com
Oh, then never mind. :)

-Cameron

On Mon, Jun 2, 2014 at 1:12 PM, Jason Durham wrote:
> Never mind. I'm an idiot... request debugging output was on. When I turned it off, my time dropped to under 50 ms.

 