Can I read a properties file from HDFS?


Leonardo Brambilla

Oct 11, 2012, 10:25:00 AM
to cascadi...@googlegroups.com
Hello all, I am working on a project that obtains some metrics from a log file filled with JSON objects. Each line is a JSON object. To validate input I am using a JSON Validator.
As you can imagine this validator uses a schema definition, which is stored in a text file. Right now the text file is in my project's resources dir, which is packaged into the main Jar file. Everything works fine so far.

The schema definition changes frequently, so if possible I want to put the schema file into HDFS and load it into the Validator from there. That way, I can update the schema without repackaging and redeploying.
Does Cascading provide an easy way to read a text file from HDFS, like the FileSystem package or something similar?
If I use an override file outside the Jar file, it will not be sent to the cluster, right?

I hope I have explained myself clearly.

Thanks in advance,

Leonardo

Chris K Wensel

Oct 11, 2012, 7:00:55 PM
to cascadi...@googlegroups.com
The easy way to read a file from HDFS is to use an Hfs tap; see Hfs#openForRead.

--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To view this discussion on the web visit https://groups.google.com/d/msg/cascading-user/-/k1ZP9iitJiAJ.
To post to this group, send email to cascadi...@googlegroups.com.
To unsubscribe from this group, send email to cascading-use...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/cascading-user?hl=en.


JPatrick Davenport

Oct 12, 2012, 10:18:11 AM
to cascadi...@googlegroups.com
If the file is small, and it sounds like it is, you can leverage the FlowProcess. It is passed to Buffers, Aggregators and Filters. You can cast it to the Hadoop version (there might be a way to do this with local too, I never tried) and get the conf from it: JobConf conf = ((HadoopFlowProcess) flowProcess).getJobConf();

Once you get the conf, you can use FileSystem.
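        // conf is the JobConf obtained from the HadoopFlowProcess above.
        try {
            // Do not close fs; see the note below.
            final FileSystem fs = FileSystem.get(conf);
            final FSDataInputStream schema = fs.open(new Path("location/in/HDFS"));
            // your stuff goes here.
        } catch (IOException e) {
            throw new RuntimeException("could not read the schema from HDFS", e);
        }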

Don't close the FS. Hadoop (at least 1.0.3) caches it per conf. If you close that FS, you'll close the FS for your MR job. But this should work.
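
For the "your stuff goes here" part, here is a minimal sketch of what could be done with the stream. FSDataInputStream is a plain java.io.InputStream, so standard Java I/O is enough. The class name HdfsTextFiles and both helper methods are hypothetical (they are not part of Cascading or Hadoop), and UTF-8 is assumed as the file encoding. The helpers close the stream they are given, which is fine; it is only the FileSystem itself that should stay open.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper class for illustration; not part of Cascading or Hadoop.
final class HdfsTextFiles {
    private HdfsTextFiles() {}

    // Reads the whole stream as text (UTF-8 is an assumption) and closes the stream.
    static String readAll(InputStream in) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException("could not read stream", e);
        }
        return text.toString();
    }

    // Loads key=value pairs from the stream (UTF-8 is an assumption) and closes the stream.
    static Properties loadProperties(InputStream in) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException("could not load properties", e);
        }
        return props;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the stream that fs.open(...) would return.
        InputStream fake = new ByteArrayInputStream(
                "schema.version=3\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(loadProperties(fake).getProperty("schema.version")); // prints 3
    }
}
```

With the snippet above, that would be something like HdfsTextFiles.readAll(schema) for a JSON schema, or HdfsTextFiles.loadProperties(schema) for a properties file.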

Chris K Wensel

Oct 12, 2012, 4:05:14 PM
to cascadi...@googlegroups.com

JPatrick Davenport

Oct 16, 2012, 7:27:38 AM
to cascadi...@googlegroups.com
Chris,
Can you make that call from within a Buffer? I thought that the taps had to be manipulated at the FlowDef stage.

Chris K Wensel

Oct 16, 2012, 9:59:43 AM
to cascadi...@googlegroups.com

JPatrick Davenport

Oct 16, 2012, 1:55:30 PM
to cascadi...@googlegroups.com
Please write a set of blog posts about Awesome Cascading Features. I would never have guessed to do that. I look at those features and assume they are for the framework and that Java is just bad at access control.

JPatrick Davenport

Oct 16, 2012, 2:32:09 PM
to cascadi...@googlegroups.com
Ok. So I've tried this. I get an exception when I run in local mode.

Here's the snippet of code. The try/catch removed for brevity.
if(flowProcess instanceof HadoopFlowProcess) {
            cncFile = createGlobbyInput(RuleCNCBenePercent.BAD_BENE, path);
} else {
            cncFile = new Lfs(new cascading.scheme.local.TextDelimited(RuleCNCBenePercent.BAD_BENE), path);
}
final TupleEntryIterator openTapForRead = flowProcess.openTapForRead(cncFile);

I watched the flow using debug. We go into the else branch. An Lfs gets created. We then try to open the tap.

ERROR c.flow.local.planner.LocalStepRunner - unable to prepare operation graph
java.lang.ClassCastException: java.util.Properties cannot be cast to org.apache.hadoop.mapred.JobConf
    at cascading.tap.hadoop.io.MultiRecordReaderIterator.<init>(MultiRecordReaderIterator.java:78) ~[cascading-hadoop-2.0.5.jar:na]
    at cascading.tap.hadoop.io.HadoopTupleEntrySchemeIterator.makeIterator(HadoopTupleEntrySchemeIterator.java:57) ~[cascading-hadoop-2.0.5.jar:na]
    at cascading.tap.hadoop.io.HadoopTupleEntrySchemeIterator.<init>(HadoopTupleEntrySchemeIterator.java:44) ~[cascading-hadoop-2.0.5.jar:na]
    at cascading.tap.hadoop.Hfs.openForRead(Hfs.java:405) ~[cascading-hadoop-2.0.5.jar:na]
    at cascading.tap.hadoop.Hfs.openForRead(Hfs.java:78) ~[cascading-hadoop-2.0.5.jar:na]
    at cascading.tap.Tap.openForRead(Tap.java:262) ~[cascading-core-2.0.5.jar:na]

I'm not really sure why the code switched to Hfs.

This code is running in the prepare method of a buffer.

Chris K Wensel

Oct 16, 2012, 3:28:37 PM
to cascadi...@googlegroups.com

JPatrick Davenport

Oct 16, 2012, 3:37:04 PM
to cascadi...@googlegroups.com
Ah, so FileTap.

Bertrand Dechoux

Oct 17, 2012, 4:40:47 AM
to cascading-user
With regard to the platform (cascading local vs hadoop), it is
important to read the package names.
FileTap is inside cascading.tap.local
Hfs is inside cascading.tap.hadoop (and so are Dfs/Lfs/TempHfs)

Each one will only work with the related FlowPlanner (LocalPlanner or
HadoopPlanner)

Regards

Bertrand

On Oct 16, 9:37 pm, JPatrick Davenport <virmu...@gmail.com> wrote:
> Ah, so FileTap.
>
> On Tuesday, October 16, 2012 3:28:41 PM UTC-4, Chris K Wensel wrote:
>
> > Lfs is sub-class of Hfs.
>
> >http://docs.cascading.org/cascading/2.0/javadoc/cascading/tap/hadoop/...
>
> > ckw
>
> > On Oct 16, 2012, at 11:32 AM, JPatrick Davenport <virm...@gmail.com>
> >>>http://docs.cascading.org/cascading/2.0/javadoc/cascading/operation/B...)
>
> >>> providing
>
> >>>http://docs.cascading.org/cascading/2.0/javadoc/cascading/flow/FlowPr...)
>
> >>>http://docs.cascading.org/cascading/2.0/javadoc/cascading/flow/FlowPr...)
>
> >>> On Oct 16, 2012, at 4:27 AM, JPatrick Davenport <virm...@gmail.com>
> >>> wrote:
>
> >>> Chris,
> >>> Can you make that call from within a Buffer? I thought that the taps
> >>> had to be manipulated at the FlowDef stage.
>
> >>> On Friday, October 12, 2012 4:05:14 PM UTC-4, Chris K Wensel wrote:
>
> >>>> or just use
>
> >>>>http://docs.cascading.org/cascading/2.0/javadoc/cascading/flow/FlowPr...)
>
> > --
> > Chris K Wensel
> > ch...@concurrentinc.com
> >http://concurrentinc.com

Leonardo Brambilla

Oct 17, 2012, 11:24:21 AM
to cascadi...@googlegroups.com
Wow, I see a nice bunch of replies. I am swamped at work and will get back here soon. I have something working that reads from HDFS, but it was implemented by another developer; I will review it and post here accordingly.

Thank you all for the responses.

Leo

mfu...@spryinc.com

Mar 5, 2014, 8:58:26 PM
to cascadi...@googlegroups.com
This openTapForRead is pretty useful, but suppose I am running flows in a cascade. When does the code actually run during processing? Before the query plan is created?

Suppose the tap I am accessing is generated by a different flow than the one that reads it. Is there any way to make that access part of the flow's dependencies in the cascade?

I thought a workaround might be to set up a pipe that copies the tap to a sink, since Cascading requires you to use all of the source taps. Maybe there is some other method.
