Question about number of rows


Xiaobo.Gu

Aug 1, 2011, 10:35:35 AM
to Nectar user group
Operators like mean require a number-of-rows parameter for the input
file. If I want to use all the rows of the file but don't know the
row count, is there a shortcut?
Specifying just the file name does not work:

nectar>mean(1)<<book.csv
line 1:17 mismatched input '<EOF>' expecting '('
11/08/02 05:17:23 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/08/02 05:17:23 INFO input.FileInputFormat: Total input paths to process : 1
11/08/02 05:17:23 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
11/08/02 05:17:23 INFO mapreduce.JobSubmitter: number of splits:1
11/08/02 05:17:23 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:null
mean is Infinity

madhukara phatak

Aug 1, 2011, 10:40:55 AM
to nectar-u...@googlegroups.com
Yes, as of now the API requires the row count, because counting the number of rows is a costly operation if the file is big. In the future we may try to remove this requirement. If you are using book.csv, the number of rows is 30.
--
Regards
Madhukara Phatak
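For comparison, a column mean does not strictly need a separate counting pass: the row count can be accumulated in the same pass that sums the values. The Python sketch below is illustrative only and is not Nectar's API; `column_mean` and its zero-based `column` argument are hypothetical, and it assumes a headerless CSV whose chosen column is numeric. The sample rows are made up, not the contents of book.csv.

```python
import csv
import io


def column_mean(lines, column):
    """Mean of one numeric CSV column, computed in a single pass.

    Hypothetical helper: `column` is a zero-based index and the input is
    assumed to have no header row. The row count is accumulated alongside
    the sum, so it never has to be known beforehand.
    """
    total = 0.0
    count = 0
    for row in csv.reader(lines):
        total += float(row[column])
        count += 1
    if count == 0:
        raise ValueError("no rows to average")
    return total / count


# Made-up stand-in data; the real book.csv layout is not shown in the thread.
sample = io.StringIO("1,10\n2,20\n3,30\n")
print(column_mean(sample, 1))  # 20.0
```

One pass over the data yields both the sum and the count, so the cost of "knowing n" is a single integer increment per row.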

Ted Dunning

Aug 1, 2011, 12:40:58 PM
to nectar-u...@googlegroups.com
That is kind of silly.

Use Welford's method.
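Welford's method updates the mean (and, if wanted, the variance) one value at a time, using only the number of values seen so far. A minimal Python sketch follows; the class and method names are hypothetical and not part of Nectar.

```python
import math


class Welford:
    """Welford's online algorithm for running mean and variance.

    Each update uses only the count of values seen so far, so the total
    number of rows never needs to be known in advance.
    """

    def __init__(self):
        self.n = 0        # values seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance; undefined (NaN) for fewer than two values.
        return self.m2 / (self.n - 1) if self.n > 1 else math.nan


w = Welford()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    w.update(x)
# n = 8, mean ≈ 5.0, sample variance ≈ 4.571 (= 32/7)
print(w.n, w.mean, w.variance())
```

Besides removing the need for n up front, this update avoids the precision loss of accumulating a large sum of squares.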

madhukara phatak

Aug 1, 2011, 12:48:53 PM
to nectar-u...@googlegroups.com
Even this method requires the value of n.
--
Regards
Madhukara Phatak

Ted Dunning

Aug 1, 2011, 1:41:02 PM
to nectar-u...@googlegroups.com
It doesn't require it ahead of time.

You can even take the mean of any prefix of an unbounded sequence.  All you need is a count of the samples seen so far.
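To illustrate the point about unbounded sequences, the Python generator below (hypothetical name `running_means`) yields the mean of every prefix. It never asks for the total length; the only count it uses is the running one supplied by `enumerate`.

```python
import itertools


def running_means(values):
    """Yield the mean of every prefix of `values`.

    Works on unbounded iterators: each step needs only the count of
    samples seen so far, never the final length.
    """
    mean = 0.0
    for n, x in enumerate(values, start=1):
        mean += (x - mean) / n  # incremental update of the running mean
        yield mean


# itertools.count() is an infinite stream 0, 1, 2, ...; take the first 5 prefix means.
print(list(itertools.islice(running_means(itertools.count()), 5)))
# [0.0, 0.5, 1.0, 1.5, 2.0]
```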

madhukara phatak

Aug 1, 2011, 1:44:39 PM
to nectar-u...@googlegroups.com
Yeah, that's one way of doing it. But here the function computes the mean of a column, so we are processing the full column. In the future we will support partial file processing too.
--
Regards
Madhukara Phatak
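Even when the full column is processed across several input splits, no split needs the global row count up front: each split can keep its own (count, mean, M2) summary, and the summaries can be merged afterwards with the pairwise combination formula of Chan et al. This is a swapped-in technique, not something the thread or Nectar describes; `Partial` and `summarize` are hypothetical names, and the two lists merely stand in for two splits of one column.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Summary of one split: count, mean, and sum of squared deviations (M2)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x):
        # Welford update for a single value within this split.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        # Pairwise combination of two summaries (Chan et al.); neither
        # side needed to know the other's size ahead of time.
        n = self.n + other.n
        if n == 0:
            return Partial()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Partial(n, mean, m2)


def summarize(values):
    p = Partial()
    for x in values:
        p.add(x)
    return p


# Two hypothetical splits of one column, as two map tasks might see them.
total = summarize([1.0, 2.0, 3.0]).merge(summarize([4.0, 5.0]))
print(total.n, total.mean)  # 5 3.0
```

The merged result matches a single pass over all five values, so splits can be summarized independently and combined in a reduce step.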
