--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To post to this group, send email to cascadi...@googlegroups.com.
To unsubscribe from this group, send email to cascading-use...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/cascading-user?hl=en.
I'm running in local mode using the Cloudera distribution of Hadoop. You are correct in that I have one dataset but I want to find tuples where field X in one tuple is the same as field X in another tuple.
To view this discussion on the web visit https://groups.google.com/d/msg/cascading-user/-/dYcUzExYDwgJ.
import java.util.Iterator;

import cascading.flow.FlowProcess;
import cascading.operation.BaseOperation;
import cascading.operation.Buffer;
import cascading.operation.BufferCall;
import cascading.tuple.Fields;
import cascading.tuple.Tuple;
import cascading.tuple.TupleEntry;

public class PredicateBuffer extends BaseOperation implements Buffer {
    public PredicateBuffer() {
        super(new Fields("subject", "predicate", "object"));
    }

    public void operate(FlowProcess flowProcess, BufferCall bufferCall) {
        TupleEntry group = bufferCall.getGroup();
        Iterator<TupleEntry> arguments = bufferCall.getArgumentsIterator();
        while (arguments.hasNext()) {
            TupleEntry argument = arguments.next();
            ////////////////////////
            // Debugging block: apparently arguments has only one element and
            // hasNext() returns false after the first run.
            int i = 0;
            String bloc = "" + i;
            bufferCall.getOutputCollector().add(new Tuple(bloc, argument.getString("predicate"), argument.getString("object")));
            i = i + 1;
            ////////////////////////
            // Only groups whose predicate is rdf:type are of interest.
            if (group.getString("predicate").equalsIgnoreCase("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")) {
                if (isDesirable(argument, group)) {
                    bufferCall.getOutputCollector().add(new Tuple(argument.getString("subject"), argument.getString("predicate"), argument.getString("object")));
                }
            }
        }
    }

    // True when the argument tuple has the same object as the grouping key (case-insensitive).
    protected boolean isDesirable(TupleEntry argument, TupleEntry group) {
        String argObjStr = argument.getString("object");
        String grpObjStr = group.getString("object");
        return grpObjStr.equalsIgnoreCase(argObjStr);
    }
}
Pipe assembly = new Pipe("Sort");
assembly = new Each(assembly, new Fields("line"), new LineSplitter()); //Splits up lines into Tuple Entries with ("subject","predicate","object") fields.
assembly = new Unique(assembly, new Fields("predicate","object"));
assembly = new GroupBy(assembly,new Fields("subject","predicate","object"), new Fields("predicate","object"));
assembly = new Every(assembly, new Fields("subject","predicate","object"), new PredicateBuffer(),Fields.REPLACE);
Alright so I have been bashing my brain against this problem for a couple of days and I still cannot seem to get the results I want. If I could again receive some of your insight on this I would appreciate it. I think I have diagnosed the problem but I don't know how to fix it.
My iterator appears to only iterate through one TupleEntry, the same one that is received as the group.
assembly = new GroupBy(assembly,new Fields("subject","predicate","object"), new Fields("predicate","object"));
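Means that you'll get one group for each unique combination of subject, predicate, and object. Since the stream has already passed through
Unique(assembly, new Fields("predicate","object"));
each of those groups contains exactly one tuple, which is why the iterator only ever returns a single TupleEntry.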
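To make the grouping behaviour concrete, here is a minimal plain-Java sketch that does not use Cascading at all. The class GroupSizeSketch, its Triple type, the helper methods, and the sample data are all hypothetical and exist only for illustration. It mimics a de-duplication on (predicate, object) followed by a grouping on all three fields, and compares that with grouping the original stream on (predicate, object) alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java illustration only; none of these names come from Cascading.
final class GroupSizeSketch {

    /** Minimal stand-in for a tuple with subject/predicate/object fields. */
    static final class Triple {
        final String subject, predicate, object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /** Small hypothetical data set: two subjects share the (type, Person) pair. */
    static List<Triple> sample() {
        return Arrays.asList(
            new Triple("a", "type", "Person"),
            new Triple("b", "type", "Person"),
            new Triple("a", "name", "Alice"),
            new Triple("b", "name", "Bob"));
    }

    /** Keeps the first triple seen for each (predicate, object) pair. */
    static List<Triple> uniqueByPredicateObject(List<Triple> input) {
        Map<String, Triple> firstSeen = new LinkedHashMap<>();
        for (Triple t : input) {
            firstSeen.putIfAbsent(t.predicate + "|" + t.object, t);
        }
        return new ArrayList<>(firstSeen.values());
    }

    /** Groups triples by the given key and returns the size of the largest group. */
    static int largestGroup(List<Triple> input, Function<Triple, String> key) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        for (Triple t : input) {
            sizes.merge(key.apply(t), 1, Integer::sum);
        }
        int max = 0;
        for (int size : sizes.values()) {
            max = Math.max(max, size);
        }
        return max;
    }

    public static void main(String[] args) {
        List<Triple> unique = uniqueByPredicateObject(sample());
        // De-duplicate first, then group on all three fields: no group has more than one member.
        System.out.println(largestGroup(unique, t -> t.subject + "|" + t.predicate + "|" + t.object));
        // Group the original stream on (predicate, object) only: groups can hold several tuples.
        System.out.println(largestGroup(sample(), t -> t.predicate + "|" + t.object));
    }
}
```

In this sketch the first grouping never yields more than one tuple per group, while the second one does, which matches the single-element iterator seen in the buffer.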
Ken, my apologies for not having explained this well in the first place. What I am trying to do is this: I have a blob of triples, each of the form <"subject","predicate","object">. I want to find all the triples that have the predicate "type". Then I want to take the objects of those triples and find all the other predicates associated with them.
My thinking was that I would first find all the unique predicate-object combinations to reduce the workload for the rest of the pipe, and then apply the buffer to the stream to get the desired result. Unfortunately I couldn't put an Every pipe directly after a Unique pipe, so I stuck a GroupBy in between and configured it so that it preserved the stream from my Unique. I guess that's where things went wrong.
-Thomas
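The goal described above can be sketched in plain Java, independent of Cascading, as a two-pass lookup. This is only one possible reading of the description: it assumes that "predicates associated with" a type object means the predicates of other triples that carry the same value in their object field, which mirrors the object-to-object comparison in isDesirable. The class TypePredicateSketch, its methods, and the sample triples are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java illustration only; triples are {subject, predicate, object} arrays.
final class TypePredicateSketch {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    /** Hypothetical sample data. */
    static List<String[]> sample() {
        return Arrays.asList(
            new String[] {"alice", RDF_TYPE, "Person"},
            new String[] {"bob", RDF_TYPE, "Person"},
            new String[] {"doc1", "mentions", "Person"},
            new String[] {"doc2", "about", "Person"},
            new String[] {"alice", "name", "Alice"});
    }

    /** Pass 1: collect the objects of every triple whose predicate is rdf:type. */
    static Set<String> typeObjects(List<String[]> triples) {
        // Case-insensitive ordering, in the spirit of the equalsIgnoreCase calls in the buffer.
        Set<String> objects = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        for (String[] t : triples) {
            if (RDF_TYPE.equalsIgnoreCase(t[1])) {
                objects.add(t[2]);
            }
        }
        return objects;
    }

    /** Pass 2: for each type object, collect the non-type predicates that share that object. */
    static Map<String, Set<String>> otherPredicates(List<String[]> triples) {
        Set<String> types = typeObjects(triples);
        Map<String, Set<String>> result = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String[] t : triples) {
            if (!RDF_TYPE.equalsIgnoreCase(t[1]) && types.contains(t[2])) {
                result.computeIfAbsent(t[2], k -> new TreeSet<>()).add(t[1]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(otherPredicates(sample()));
    }
}
```

In a pipe assembly the same shape would presumably correspond to grouping on the object field alone rather than on all three fields, so that every tuple sharing an object reaches the buffer in one group; that mapping is an assumption and has not been checked against a running flow.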