My question: is this even possible with Cascading? And if so, could someone please point me in the right direction…
--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at http://groups.google.com/group/cascading-user?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
Pipe p = new Pipe("edges");
p = new GroupBy("document order", p, new Fields("doc_id"));
p = new Each(p, Fields.ALL, new GetRelatedEntities(), new Fields("source", "target", "relationship"));
The Each pipe applies operations that are subclasses of Functions and Filters (described in the Javadoc). For example, using Each you can parse lines from a logfile into their constituent fields, filter out all lines except the HTTP GET requests, and replace the timestring fields with date fields.
Similarly, since the Every pipe works on tuple groups (the output of a GroupBy or CoGroup pipe), it applies operations that are subclasses of Aggregators and Buffers. For example, you could use GroupBy to group the output of the above Each pipe by date, then use an Every pipe to count the GET requests per date. The pipe would then emit the operation results as the date and count for each group.
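To make the Each/Every split concrete, here is a plain-Java analogy of the log example above. It is not Cascading code and uses no Cascading API. The per-line parse and GET filter play the role of an Each, and the per-date count plays the role of a GroupBy followed by an Every with Count. The log line format `<date> <method> <path>` and the names `LogPipelineSketch`, `getDateOrNull` and `countGetsPerDate` are hypothetical, chosen only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java analogy of Each -> GroupBy -> Every(Count); NOT the Cascading API.
class LogPipelineSketch {

    // "Each" step (Function + Filter): parse one line and keep only GET requests.
    // Assumed (hypothetical) line format: "<date> <method> <path>".
    static String getDateOrNull(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 3 || !parts[1].equals("GET")) {
            return null; // filtered out
        }
        return parts[0];
    }

    // "GroupBy" on date, then "Every" with a count: one (date, count) per group.
    static Map<String, Long> countGetsPerDate(List<String> lines) {
        return lines.stream()
                .map(LogPipelineSketch::getDateOrNull)
                .filter(date -> date != null)
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }
}
```

The important point carried over from Cascading is where each step runs: the parse/filter looks at one tuple at a time, while the count only makes sense once all tuples with the same date have been brought together.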
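// Keep a reference to the head pipe: sources must be bound to the head, not the tail
Pipe head = new Pipe("graph edges");
// Bring all entity tuples of one document together
Pipe p = new GroupBy("document order", head, new Fields("doc_id"));
// GetRelatedEntities is an Aggregator, so it runs in an Every, not an Each
p = new Every(p, new GetRelatedEntities(), Fields.ALL);
// Tag every emitted (source, target) pair with a constant relationship type
p = new Each(p, new Insert(new Fields("relationship"), "directed"), Fields.ALL);

// nerSourceTap, nerDestinationTap and flowConnector are created elsewhere
FlowDef nerFlowDef = FlowDef.flowDef()
    .addSource(head, nerSourceTap)
    .addTailSink(p, nerDestinationTap);
Flow nerFlow = flowConnector.connect(nerFlowDef);
nerFlow.complete();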
import java.util.HashMap;
import java.util.Map;

import cascading.flow.FlowProcess;
import cascading.operation.Aggregator;
import cascading.operation.AggregatorCall;
import cascading.operation.BaseOperation;
import cascading.tuple.Fields;
import cascading.tuple.Tuple;
import cascading.tuple.TupleEntry;

public class GetRelatedEntities extends BaseOperation<GetRelatedEntities.Context>
    implements Aggregator<GetRelatedEntities.Context> {

  // Per-group state: entity name -> character offset within the document.
  // Note: a name that occurs twice in a document keeps only its last offset.
  public static class Context {
    Map<String, Integer> map = new HashMap<String, Integer>();
  }

  public GetRelatedEntities() {
    // Declare the two fields emitted for every related pair
    super(new Fields("source", "target"));
  }

  @Override
  public void start(FlowProcess flowProcess, AggregatorCall<Context> aggregatorCall) {
    // Called once per group, i.e. once per doc_id
    aggregatorCall.setContext(new Context());
  }

  @Override
  public void aggregate(FlowProcess flowProcess, AggregatorCall<Context> aggregatorCall) {
    // Argument positions follow the input layout: doc_id, name, type, offset, length
    TupleEntry arguments = aggregatorCall.getArguments();
    Context context = aggregatorCall.getContext();
    context.map.put(arguments.getString(1), Integer.parseInt(arguments.getString(3)));
  }

  @Override
  public void complete(FlowProcess flowProcess, AggregatorCall<Context> aggregatorCall) {
    // Compare every entity with every other entity of the same document;
    // entities less than 50 characters apart are related.
    // Both (A, B) and (B, A) are emitted.
    Map<String, Integer> entities = aggregatorCall.getContext().map;
    for (Map.Entry<String, Integer> node : entities.entrySet()) {
      for (Map.Entry<String, Integer> other : entities.entrySet()) {
        int distance = Math.abs(other.getValue() - node.getValue());
        if (distance > 0 && distance < 50) {
          aggregatorCall.getOutputCollector().add(new Tuple(node.getKey(), other.getKey()));
        }
      }
    }
  }
}
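The pairwise comparison in complete() checks every entity against every other one, and because the map is keyed by name, an entity mentioned twice keeps only its last offset. One possible alternative (a swapped-in sliding-window technique, not what the aggregator above does) is to keep the mentions as a list, sort them by offset once, and scan forward from each mention only while the gap stays under 50 characters. The sketch below is plain Java so the logic can be tested outside Cascading before moving it into complete(); `RelatedEntityPairs`, `Entity`, `relatedPairs` and `MAX_DISTANCE` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: pairs of mentions whose offsets differ by more than 0
// and less than MAX_DISTANCE characters, the same rule as in complete().
class RelatedEntityPairs {

    static final int MAX_DISTANCE = 50;

    // One entity mention inside a document: name and character offset.
    static final class Entity {
        final String name;
        final int offset;

        Entity(String name, int offset) {
            this.name = name;
            this.offset = offset;
        }
    }

    // Sort once by offset, then scan forward from each mention and stop as soon
    // as the gap reaches MAX_DISTANCE: later mentions can only be farther away.
    static List<String[]> relatedPairs(List<Entity> entities) {
        List<Entity> sorted = new ArrayList<>(entities);
        sorted.sort(Comparator.comparingInt(e -> e.offset));

        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            Entity a = sorted.get(i);
            for (int j = i + 1; j < sorted.size(); j++) {
                Entity b = sorted.get(j);
                int distance = b.offset - a.offset; // never negative, list is sorted
                if (distance >= MAX_DISTANCE) {
                    break;
                }
                if (distance > 0) {
                    // Each unordered pair is emitted once, earlier mention first
                    pairs.add(new String[] {a.name, b.name});
                }
            }
        }
        return pairs;
    }
}
```

Unlike the aggregator, this emits each pair only once; if directed edges in both directions are wanted, add the reversed pair as well.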
Hi,

Today I showed this to a colleague of mine, and he was very excited to see it working. But he had a few good questions, and one of them I couldn't answer.

The first question was whether we could put a weight on a relationship by simply counting its occurrences. I solved it like this:

contains = new GroupBy(contains, new Fields("source", "target", "relationship"));
contains = new Every(contains, new Fields("source", "target", "relationship"), new Count(), new Fields("source", "target", "relationship", "count"));

I don't know if this is the right solution, but it solved the problem for me.
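For reference, the GroupBy + Every(Count) above computes the same thing as the plain-Java loop below: one count per distinct (source, target, relationship) key. This is only an analogy to check the expected weights on small data, not Cascading code; `EdgeWeights` and `weigh` are hypothetical names, and each edge is assumed to be a three-element array in that field order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java analogy of GroupBy("source", "target", "relationship") + Every(Count).
class EdgeWeights {

    static Map<List<String>, Integer> weigh(List<String[]> edges) {
        Map<List<String>, Integer> counts = new LinkedHashMap<>();
        for (String[] edge : edges) {
            // Grouping key: source, target, relationship (content-based equals/hashCode)
            List<String> key = Arrays.asList(edge[0], edge[1], edge[2]);
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```

Since the aggregator emits both (A, B) and (B, A), those two directions end up as separate keys with their own counts.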
The second question was a bit more complicated: he wondered whether you could also create relationships between two entities. Let's say we want to know which entities know one another. For the test case, we defined that entities within 50 characters of one another have a relationship. Looking at the entity list, this would mean that within the document with id 1 the following two entities have a relationship:

doc_id  name       type    offset  length
1       John       Person  100     4
1       Joe Smith  Person  110     8

Does Cascading have a way of comparing tuples with each other, or is this something Cascading cannot do?
I have been trying for a couple of hours now and I don't have a clue…