Message from discussion Bug in TextDelimited Scheme for quotes char support - For Some combination

Subject: Re: Bug in TextDelimited Scheme for quotes char support - For Some combination
From: Chris K Wensel <ch...@wensel.net>
Date: Tue, 7 Feb 2012 09:10:27 -0800
To: cascading-user@googlegroups.com

I'll see if I can add this as a test and resolve it simply, but know that
TextDelimited is only intended to handle the best case plus some variations.
Otherwise the regex we use would be too slow for any use other than
passing the tests.

If the data is fairly complicated, it makes sense to build your own parsing
rules in the Flow to cleanse the data.
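
As a rough illustration of what such a parsing rule might look like, here is a minimal sketch in plain Java. It is not Cascading's implementation and uses no Cascading API; the class name QuotedCsvLine and the method parse are hypothetical. It walks the line one character at a time instead of using a regex, treats a doubled quote inside a quoted field as an escaped quote, and assumes a field never spans more than one line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cascading.
class QuotedCsvLine {

    // Splits one line on `delimiter`. Fields wrapped in `quote` may contain
    // the delimiter; a doubled quote inside a quoted field is read as one
    // literal quote character.
    static List<String> parse(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != quote) {
                    current.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == quote) {
                    current.append(quote); // escaped quote
                    i++;                   // skip the second quote of the pair
                } else {
                    inQuotes = false;      // closing quote
                }
            } else if (c == quote) {
                inQuotes = true;           // opening quote
            } else if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());    // last field has no trailing delimiter
        return fields;
    }

    public static void main(String[] args) {
        // The line from the report below; this yields five fields.
        List<String> fields = parse("\"a\",b,,\"d1,d2\",3", ',', '"');
        System.out.println(fields.size() + " fields: " + fields);
    }
}
```

Logic like this could be wrapped in a custom operation that reads each raw line and emits the cleansed fields, at the cost of not using TextDelimited's built-in quote handling for that source.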

chris

On Feb 7, 2012, at 6:15 AM, Mayilazhagan K wrote:

> Hi,
>
> I have a .csv file delimited by commas. Some data fields contain a comma as
> part of the value, and only those fields are enclosed in double quotes.
> The rest of the fields are not quoted. For the combination below, the row
> fields are extracted incorrectly.
>
> "a",b,,"d1,d2",3
>
> I have tested this combination with TextDelimitedTest in the
> Cascading-Test project, using the test method testQuotedTextAll,
> and I am getting an error.
>
> I have replaced data line 3 with my data.
>
> delimited.txt
> ------------------
>
> foo,bar,baz,bin,1
> foo,"bar",baz,bin,2
> "a",b,,"d1,d2",3
> foo,"bar"",bar",baz,bin,4
> foo,"bar"""",bar",baz,bin,5
> ,"",baz,,6
> ,,,,7
> foo,,,,8
> ,"",,,9
> "f",,,,"10"
> "f",,,",bin","11"
> "f",,,",bin","11"
>
> Below is the error in parsing.
>
> 2012-02-07 19:32:21,155 WARN  mapred.LocalJobRunner (LocalJobRunner.java:run(256)) - job_local_0001
> cascading.operation.OperationException: number of input tuple values: 4, does not match destination array size: 5
> 	at cascading.tuple.Tuples.asArray(Tuples.java:48)
> 	at cascading.scheme.TextDelimited.sink(TextDelimited.java:670)
> 	at cascading.tap.Tap.sink(Tap.java:280)
> 	at cascading.flow.stack.SinkMapperStackElement.operateSink(SinkMapperStackElement.java:95)
> 	at cascading.flow.stack.SinkMapperStackElement.collect(SinkMapperStackElement.java:72)
> 	at cascading.flow.stack.FlowMapperStack.map(FlowMapperStack.java:220)
> 	at cascading.flow.FlowMapper.map(FlowMapper.java:75)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> 	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> 2012-02-07 19:32:25,799 WARN  flow.FlowStep (FlowStep.java:logWarn(643)) - [pipe] task completion events identify failed tasks
>
> Has this bug been addressed, and is a fix available?
>
> I am using Cascading 1.2.3 currently.
>
> Thanks,
> Mayilazhagan.K

--
Chris K Wensel
ch...@concurrentinc.com
http://concurrentinc.com