Failing to retrieve recommendations


Digsss

Oct 5, 2016, 6:15:34 AM
to actionml-user
Hi,

I have 3-4 million users. I have deployed the engine successfully. I want to get recommendations for all users and store them in another DB, so that I don't need to send every query to PIO.

So I have written a Python script which fetches user ids from Cassandra (there is one table which has all user ids) and then loops through the ids, querying PIO and storing the recommendations back to Cassandra.
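[Editor's note: the batch-export loop described above could be sketched roughly as below. This is a minimal illustration, not the poster's actual script; the engine URL, query fields, and the `store` callback are assumptions (a deployed PIO engine typically serves POSTs to `queries.json` on port 8000, but check your deployment).]

```python
import json
import urllib.request

# Hypothetical engine endpoint; adjust host/port to your deployment.
ENGINE_URL = "http://localhost:8000/queries.json"

def build_query(user_id, num=10):
    """JSON body for a UR-style query: top-num recommendations for one user."""
    return json.dumps({"user": user_id, "num": num}).encode("utf-8")

def fetch_recommendations(user_id, timeout=10):
    """POST one query to the deployed engine and return the parsed response."""
    req = urllib.request.Request(
        ENGINE_URL,
        data=build_query(user_id),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())

def export_all(user_ids, store):
    """Loop over ids fetched from the source table, query the engine,
    and hand each result to a store callback (e.g. a Cassandra insert)."""
    for uid in user_ids:
        store(uid, fetch_recommendations(uid))
```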

But the problem is that it works for around 20k user ids, then throws the following exception and exits.

Stack Trace:
java.lang.RuntimeException: org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
        at org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:94)
        at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
        at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
        at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
        at scala.collection.AbstractIterator.to(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:257)
        at scala.collection.AbstractIterator.toList(Iterator.scala:1157)
        at com.jio.URAlgorithm.getBiasedRecentUserActions(URAlgorithm.scala:503)
        at com.jio.URAlgorithm.buildQuery(URAlgorithm.scala:333)
        at com.jio.URAlgorithm.predict(URAlgorithm.scala:318)
        at com.jio.URAlgorithm.predict(URAlgorithm.scala:102)
        at io.prediction.controller.P2LAlgorithm.predictBase(P2LAlgorithm.scala:70)
        at io.prediction.workflow.ServerActor$$anonfun$24$$anonfun$25.apply(CreateServer.scala:516)
        at io.prediction.workflow.ServerActor$$anonfun$24$$anonfun$25.apply(CreateServer.scala:515)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at io.prediction.workflow.ServerActor$$anonfun$24.apply(CreateServer.scala:515)
        at io.prediction.workflow.ServerActor$$anonfun$24.apply(CreateServer.scala:493)
        at spray.routing.ApplyConverterInstances$$anon$22$$anonfun$apply$1.apply(ApplyConverterInstances.scala:25)
        at spray.routing.ApplyConverterInstances$$anon$22$$anonfun$apply$1.apply(ApplyConverterInstances.scala:24)
        at spray.routing.ConjunctionMagnet$$anon$1$$anon$2$$anonfun$happly$1$$anonfun$apply$1.apply(Directive.scala:38)
        at spray.routing.ConjunctionMagnet$$anon$1$$anon$2$$anonfun$happly$1$$anonfun$apply$1.apply(Directive.scala:37)
        at spray.routing.directives.BasicDirectives$$anon$1.happly(BasicDirectives.scala:26)
        at spray.routing.ConjunctionMagnet$$anon$1$$anon$2$$anonfun$happly$1.apply(Directive.scala:37)
        at spray.routing.ConjunctionMagnet$$anon$1$$anon$2$$anonfun$happly$1.apply(Directive.scala:36)
        at spray.routing.directives.BasicDirectives$$anon$2.happly(BasicDirectives.scala:79)
        at spray.routing.Directive$$anon$7$$anonfun$happly$4.apply(Directive.scala:86)
        at spray.routing.Directive$$anon$7$$anonfun$happly$4.apply(Directive.scala:86)
        at spray.routing.directives.BasicDirectives$$anon$3$$anonfun$happly$1.apply(BasicDirectives.scala:92)
        at spray.routing.directives.BasicDirectives$$anon$3$$anonfun$happly$1.apply(BasicDirectives.scala:92)
        at spray.routing.directives.ExecutionDirectives$$anonfun$detach$1$$anonfun$apply$7$$anonfun$apply$3.apply$mcV$sp(ExecutionDirectives.scala:89)
        at spray.routing.directives.ExecutionDirectives$$anonfun$detach$1$$anonfun$apply$7$$anonfun$apply$3.apply(ExecutionDirectives.scala:89)
        at spray.routing.directives.ExecutionDirectives$$anonfun$detach$1$$anonfun$apply$7$$anonfun$apply$3.apply(ExecutionDirectives.scala:89)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
        at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
        at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:403)
        at org.apache.hadoop.hbase.client.AbstractClientScanner$1.hasNext(AbstractClientScanner.java:91)
        ... 47 more
Caused by: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 10408566 number_of_rows: 100 close_scanner: false next_call_seq: 0
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2057)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:31305)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
        at java.lang.Thread.run(Thread.java:745)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:97)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:214)
        at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:59)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:90)
        at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:355)
        ... 48 more


I am not able to figure out where the actual problem is.

I am using UR 0.2.3.


Thanks.

Pat Ferrel

Oct 5, 2016, 1:13:09 PM
to Digsss, actionml-user
This is often not a good idea, because the standard pipeline uses real-time user behavior to make recommendations, while you are only using behavior up to the time you query for all recommendations.

In any case, it may be that your script is querying faster than your HBase can respond. The usual answer is to separate it out or scale it in some way. Since you are not doing anything in real time, you might try throttling your queries so they come more slowly.
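[Editor's note: the throttling suggested above could be implemented, for example, by pacing the id loop in the export script. This generator is a hedged sketch; the rate is arbitrary and should be tuned to what the HBase cluster can sustain.]

```python
import time

def throttled(ids, per_second=50.0):
    """Yield ids no faster than per_second, sleeping between items as needed."""
    interval = 1.0 / per_second
    last = 0.0
    for uid in ids:
        wait = interval - (time.monotonic() - last)
        if wait > 0:
            time.sleep(wait)
        last = time.monotonic()
        yield uid
```

Wrapping the id source in `throttled(...)` leaves the rest of the loop unchanged while capping the query rate.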

Again, I wouldn’t recommend this approach. If you have more Cassandra experience and so want to use it, it might be easier to implement the PEventStore and LEventStore classes in PredictionIO to support Cassandra. We are thinking about supporting it too, since it is somewhat easier to set up and run.


--
You received this message because you are subscribed to the Google Groups "actionml-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to actionml-use...@googlegroups.com.
To post to this group, send email to action...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/actionml-user/1207eaf1-4f1e-4e09-ab33-aa92454899ec%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Digsss

Oct 6, 2016, 2:01:55 AM
to actionml-user
Hello Pat,

Thanks for the reply.

  • This is often not a good idea because the standard pipeline uses real time user behavior to make recommendations and you are only using behavior up to the time you query for all recommendations.
We are following this approach because we have at least 4-5 million active users daily (and this is increasing day by day; it might reach 100 million in the next few days). We do not have enough knowledge of PIO to scale it up, so we thought we would export pre-computed recommendations to Cassandra and serve them from there. If you can help us with the architecture, then we are ready for real-time recommendations as well. Can you help us define the correct architecture for 100 million users? How many servers, and with what configuration, are needed to serve 100 million users in real time?
  • In any case it may be that your script is querying faster than your HBase can respond. The usual answer is separate it out or scale it in some way. Since you are not doing anything real time you might try throttling your queries to come slower
It is not that the script is querying faster than HBase can respond. I tried one trick: I logged the failed user ids to a file, and after running the Python script I found that the exception occurs for only some user ids. After 14 hours, the script had processed data for almost 1 million users and failed for only 4 users. I also tried querying PIO manually with those ids and got the same exception. I do not understand why it throws the exception for those ids, so if you have any idea what the reason might be, please let me know.
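[Editor's note: for a handful of intermittently failing ids, a retry with backoff around each query helps distinguish transient scanner timeouts from ids that always fail. This is a minimal sketch under assumptions; the query function and the failed-id log callback are placeholders, not part of the original script.]

```python
import time

def query_with_retries(query, user_id, attempts=3, backoff=1.0, log=None):
    """Call query(user_id); on failure, retry with exponential backoff.
    If every attempt fails, record the id via the log callback and re-raise."""
    for attempt in range(attempts):
        try:
            return query(user_id)
        except Exception:
            if attempt == attempts - 1:
                if log is not None:
                    log(user_id)  # e.g. append the id to a failed-ids file
                raise
            time.sleep(backoff * (2 ** attempt))
```

Ids that still fail after several spaced-out attempts are likely a data problem (e.g. an unusually large event history for that user) rather than load.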

  • Again, I wouldn’t recommend this approach. If you have more Cassandra experience and so want to use it, it might be easier to implement PEventStore and LEventStore classes in PredictionIO to support Cassandra. We are thinking about supporting it too since it it somewhat easier to run and setup.
I will definitely give this a try.

Pat, it would be very helpful if you could help us set up an architecture for such a large user base.

Big thanks for your help.

satya sai

Sep 20, 2018, 12:44:43 PM
to actionml-user
Hello Pat, can you please guide me on how to implement the Cassandra data source for PIO? I would like to work on it. TIA

Pat Ferrel

Sep 20, 2018, 2:15:50 PM
to satya sai, actionml-user
PIO is an Apache project, so I would seek advice there. We are moving to using Harness for the Universal Recommender for several reasons. To add Cassandra to Harness you would define 2 classes that implement an abstract API for Dao[T] and SparkDaoSupport[T], with a CassandraDao[T] and CassandraSparkDao[T]. There are similar ways to add new DBs to PIO, but if you are using the UR you may want to know that we will probably be slowly deprecating support for it. Harness has several benefits and is also data-compatible with PIO, so you will always be able to export from PIO and import into Harness.

That said, if you want to implement Cassandra for PIO I’m sure they would like to help you so try the PIO dev mailing list.
--
