Description:
User group for Cascading users
|
|
|
FTP Tap
|
| |
Does anyone know if there is an FTP tap? Or has anyone tried to create one?
If not, I'm interested in the possibility of creating one.
Best
Fede
-- Federico Brubacher
@fbru02
|
|
Sink to database in 2.0-WIP
|
| |
Hi,
I'm trying to write Tap and Scheme classes for database output using 2.0-WIP,
following the approach used in the Hadoop example 'DbCountPageView':
*org.apache.hadoop.mapred.lib.db.DBConfiguration.configureDB(jobConf,
driverClassName, connectionUrl);*
*org.apache.hadoop.mapred.lib.db.DBOutputFormat.setOutput(jobConf,...
|
|
AggregateBy and sort support
|
| |
Hi Chris,
While looking into implementing a FirstBy, I was expecting that I could specify sorting fields in the AggregateBy constructor.
This would then let me efficiently use First as my aggregator.
But AggregateBy currently doesn't let you specify sorting.
In the 1.2.5 code it looks like this would be a trivial change, as it would just get used in the initialize() method when setting up the GroupBy:...
|
|
Line numbers in Hadoop
|
| |
Hi,
We're using cascading for validating files submitted by users. We want
to report errors with line numbers to the users. So if they wrote a
string where an int is expected, we'd like to say "Line 45: field X
should be an int".
I understand that hadoop cannot provide this information since it...
|
|
Set-Similarity Join on Multiple Attributes
|
| |
I saw a previous post [link] regarding set-similarity joins for deduplicating data. How would I implement this fuzzy join on multiple attributes? I am trying to implement an entity resolution solution using Hadoop. I could, for instance, use the...
|
|
Crawling using the Scalding framework
|
| |
Hi,
I am looking to crawl a site.
The contents of a page can have:
1. information to extract, plus
2. some more URLs to crawl (pagination)
I tried to follow the Bixo tutorial
[link],
but am wondering if there is anything similar in Scalding (by...
|
|
Does the cascading-jdbc module do connection pooling?
|
| |
Hi
I have a requirement to load aggregated data created by a Cascading
flow into a DB. The file will be a few hundred MB. I was thinking of using
cascading-jdbc, but I am worried that it could amount to a DDoS attack
on my DB because of the many connections opened by the Cascading
program.
Does cascading-jdbc have something like a connection pool so that it...
|
|
Can Pig scripts be run as part of a Cascading flow?
|
| |
Can I execute a Pig script from my Cascading flow? I have a
requirement to join data from two files in different formats based
on a common key, which I believe will be difficult to do in Cascading. So
I was thinking of using a Pig script to do this as part of my
Cascading flow.
Please suggest...
|
|
GroupBy(NONE,FIRST) when input is UNKNOWN
|
| |
Should I be able to do this? I want to bring everything onto one reducer (group by Fields.NONE), and then sort by whatever is in the first field (Fields.FIRST). This works if the incoming fields are known, but fails if they are unknown. cascading.tuple.TupleException: given tuple not same size as position...
|
|
|