Just wanted to thank everyone for all the feedback and comments on the
list for the last few months.
I'm starting to plan a minor release that will add some
incremental features while trying to keep backward compatibility.
My first question to everyone: how important is backward
compatibility in a 1.x release? By that I mean both API compatibility
and semantic compatibility (a semantic break would be, for example,
slightly changing how a function behaves).
Clearly maintenance releases (1.x.y) should remain compatible in both
areas (unless the semantic changes are actually bug fixes, like what
happened to our date operations).
Further, what changes or features would you like to see in the next
minor or possibly next major release?
For example, we are thinking of adding Fields.REPLACE. This would
allow a Pipe to replace a field inline, rather than relying on the
traditional field 'drifting' through renaming.
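
To make the difference concrete, here is a minimal conceptual sketch in
plain Java. It is not Cascading code: the "tuple" is just an ordered map
of field name to value, and ReplaceSketch, applyWithRename and
applyWithReplace are hypothetical names made up for illustration. It only
assumes that a replace leaves field names and positions untouched, while
the renaming style lets the name drift.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Conceptual model only. None of these names are Cascading API.
final class ReplaceSketch {

    // Renaming style: the result is stored under a new field name and the
    // argument field is dropped, so the name "drifts" (e.g. "date" -> "year").
    static Map<String, Object> applyWithRename(Map<String, Object> tuple, String field,
                                               String resultField,
                                               Function<Object, Object> fn) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : tuple.entrySet()) {
            if (e.getKey().equals(field)) {
                out.put(resultField, fn.apply(e.getValue()));
            } else {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    // Replace style: the result overwrites the argument field in place,
    // so the field keeps its name and its position in the tuple.
    static Map<String, Object> applyWithReplace(Map<String, Object> tuple, String field,
                                                Function<Object, Object> fn) {
        if (!tuple.containsKey(field)) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        Map<String, Object> out = new LinkedHashMap<>(tuple);
        out.put(field, fn.apply(tuple.get(field)));
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("id", 1);
        tuple.put("date", "2009");

        Function<Object, Object> parse = v -> Integer.parseInt((String) v);
        System.out.println(applyWithRename(tuple, "date", "year", parse));
        System.out.println(applyWithReplace(tuple, "date", parse));
    }
}
```

With the replace style, anything downstream that selects "date" keeps
working without a separate rename step.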
Please feel free to reply to the list, or directly to me.
cheers,
chris
--
Chris K Wensel
ch...@wensel.net
http://www.cascading.org/
http://www.scaleunlimited.com/
my pleasure!
all these suggestions are great. let me see what I can do with them.
ckw
Keep your eyes open. Just might have something Real Soon Now.
To your points specifically: use the cluster-local HDFS as the
default in all your jobs, and only integrate with S3 to pull/push the
data that needs to live longer than your cluster.
So just use Hfs and relative paths everywhere, except when the data
is in S3 or must go to S3 (new Hfs( "s3n://....." )).
And my recommendation is to use s3n:// rather than s3://, so that
other apps can get at the data (s3cmd, http://, etc.). The drawback is
that on input you can only have one mapper for every file being read
from S3 (in the first MR job in your Flow).
p.s. there is always this too:
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2293&categoryID=263
ckw
check out
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2440
A Cascading app written by Amazon for CloudFront.
ckw
That's really cool!
cheers,
chris
Yes, it works exactly as you expect. The FileNameFilteredHfs calls
FileInputFormat.setInputPathFilter internally.
Pavel