Hi Sohil,
I'm using 2.4M3.
I found a push server performance issue caused by the processing logic in BusConsumer.
The class is org.openmobster.core.common.bus.BusConsumer, in common/src/main/java/org/openmobster/core/common/bus/BusConsumer.java:
public void run()
{
    do
    {
        try
        {
            Map<String,Bus> activeBuses = Bus.getActiveBuses();
            if(activeBuses == null)
            {
                continue;
            }
            Collection<Bus> buses = activeBuses.values();
            for(Bus bus:buses)
            {
                try
                {
                    this.consume(bus);
                }
                catch(Throwable t)
                {
                    //just eat this one....will try again..for this particular Bus
                }
            }
        }
        catch(Throwable t)
        {
            //something went wrong....but no need to abort the thread
        }
    }while(!exit);
}
private void consume(Bus bus)
{
    ClientSessionFactory sessionFactory = Bus.getSessionFactory();
    ClientSession session = null;
    ClientConsumer messageConsumer = null;
    String uri = bus.getUri();
    try
    {
        session = sessionFactory.createSession();
        session.start();
        messageConsumer = session.createConsumer(uri);
        ClientMessage message = messageConsumer.receive(1000);
        if(message != null)
        {
            boolean isStartedHere = TransactionHelper.startTx();
            try
            {
                SimpleString msg = (SimpleString)message.getProperty("message");
                BusMessage busMessage = (BusMessage)XMLUtilities.unmarshal(msg.toString());
                busMessage.setAttribute("hornetq-message", message);
                this.sendBusListenerEvent(bus,busMessage);
                if(isStartedHere)
                {
                    TransactionHelper.commitTx();
                }
            }
            catch(Exception e)
            {
                e.printStackTrace();
                if(isStartedHere)
                {
                    TransactionHelper.rollbackTx();
                }
            }
        }
    }
    catch(HornetQException hqe)
    {
        ErrorHandler.getInstance().handle(hqe);
        throw new SystemException(hqe.getMessage(),hqe);
    }
    finally
    {
        if(messageConsumer != null && !messageConsumer.isClosed())
        {
            try
            {
                messageConsumer.close();
            }
            catch(HornetQException hqe)
            {
                ErrorHandler.getInstance().handle(hqe);
                throw new SystemException(hqe.getMessage(),hqe);
            }
        }
        if(session != null && !session.isClosed())
        {
            try
            {
                session.stop();
                session.close();
            }
            catch(HornetQException hqe)
            {
                ErrorHandler.getInstance().handle(hqe);
                throw new SystemException(hqe.getMessage(),hqe);
            }
        }
    }
}
For Bus queue processing, the code first loads the records for all users from the user table.
It then goes through the users' buses round robin, extracting the message records from each bus for processing.
ClientMessage message = messageConsumer.receive(1000);
This call waits up to 1000 ms for a message each time it reads a queue.
When the number of users is small, this is not a big problem.
In my tests, with more than 10,000 users a push took more than one hour to arrive.
That is, one full round robin over all users can take up to 10,000 * 1 s = nearly 3 hours (with 10,000 users, in many cases most of them are not online).
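The estimate above can be written out as a small calculation. This is only a sketch: PollingPassEstimate and worstCasePass are made-up names, not part of OpenMobster, and it assumes every queue is empty so that each receive() call blocks for its full timeout.

```java
import java.time.Duration;

// Hypothetical helper, not part of OpenMobster: estimates the worst-case
// duration of one polling pass when every queue is empty.
class PollingPassEstimate {
    // Assumption: each empty queue blocks receive(timeout) for the full timeout.
    static Duration worstCasePass(long busCount, long receiveTimeoutMillis) {
        return Duration.ofMillis(busCount * receiveTimeoutMillis);
    }

    public static void main(String[] args) {
        // 10,000 buses, receive(1000) on each one
        Duration pass = worstCasePass(10_000, 1_000);
        System.out.println(pass.getSeconds() + " s"); // prints "10000 s", about 2.8 hours
    }
}
```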
I then tested with 100,000 users after changing the source code to ClientMessage message = messageConsumer.receive(1); so that the wait time is 1 ms.
In this case, a push takes up to 5 minutes to reach the user.
We have now reached one million users on our server. Even with the modified ClientMessage message = messageConsumer.receive(1), the system still has to poll every user queue once before a push goes out again.
In this case, a push takes close to six hours to reach the user.
This greatly affects performance.
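As a rough check on these figures, the average time spent per bus can be derived from the reported delays. This sketch assumes the reported delay is about one full pass over all buses; PerBusCost and millisPerBus are made-up names. Both results are above the 1 ms receive timeout, which suggests (as an assumption, not something I measured) that the rest of consume(), such as creating and closing a session and a consumer for every bus, also costs time.

```java
// Hypothetical helper: average time spent per bus in one pass, derived
// only from the figures reported in this post.
class PerBusCost {
    static double millisPerBus(double passSeconds, long busCount) {
        return passSeconds * 1000.0 / busCount;
    }

    public static void main(String[] args) {
        // 100,000 users, up to 5 minutes (300 s) per pass
        System.out.println(millisPerBus(300, 100_000) + " ms per bus");
        // 1,000,000 users, close to 6 hours (21,600 s) per pass
        System.out.println(millisPerBus(21_600, 1_000_000) + " ms per bus");
    }
}
```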
I tried to optimize the code myself, but a lot of related code depends on it, which makes it difficult to change.
Here is my idea for solving the performance problem with a large number of users.
The user data is fetched in the round robin loop:
Map<String,Bus> activeBuses = Bus.getActiveBuses();
if(activeBuses == null)
{
    continue;
}
Collection<Bus> buses = activeBuses.values();
for(Bus bus:buses)
{
    try
    {
        this.consume(bus);
    }
    catch(Throwable t)
    {
        //just eat this one....will try again..for this particular Bus
    }
}
Map<String,Bus> activeBuses = Bus.getActiveBuses();
This call returns the queues of all users for the round robin, both online and offline.
According to statistics from our server, with 1,000,000 registered users the regular online peak averages about 20,000. If the list were filtered when it is fetched, so that only online users are polled,
then even at the peak of 20,000 concurrent users one round robin pass would take only about 20 s.
When a user is not online, polling that user's queue is wasted work and takes up a lot of system time.
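A minimal sketch of the filtering idea, outside of OpenMobster. Everything here is hypothetical: OnlineBusFilter, selectOnline and the bus:// URIs are made up, buses are represented by plain strings, and isOnline is a placeholder, because how the server knows whether a user is online is not shown in the code above.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: keep only the buses of online users before polling.
class OnlineBusFilter {
    // isOnline is a placeholder for whatever presence check the server has.
    static List<String> selectOnline(Collection<String> busUris, Predicate<String> isOnline) {
        return busUris.stream().filter(isOnline).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> allBuses = Arrays.asList("bus://alice", "bus://bob", "bus://carol");
        Set<String> online = new HashSet<>(Arrays.asList("bus://alice", "bus://carol"));

        // Only these buses would be passed to consume() in the polling loop.
        List<String> toPoll = selectOnline(allBuses, online::contains);
        System.out.println(toPoll); // prints [bus://alice, bus://carol]
    }
}
```

With 20,000 online users out of 1,000,000, the polling loop would then visit only 2% of the buses in each pass.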
Is there a better way to optimize this? If possible, please include a fix in the next code update.
Thanks and regards,