Hi Jalez,
Unfortunately, you’re having issues at a bad time: the two primary Genie engineers at Netflix (Amit and I) are out of the office for personal reasons, so we probably won’t be able to give you an in-depth answer for a couple of weeks. The quick answer is yes: Genie 2 has been running at scale (40,000+ jobs per day) for a couple of years now, and we haven’t seen the issues you’re describing.
Our RDS instance is a db.m2.4xlarge running MySQL 5.6.21 with the default MySQL 5.6 option and parameter groups, so there are no special tweaks there. On the client side, each Genie node runs with a connection pool of size 20.
As you can see, our database is far bigger than yours:
mysql> SELECT TABLE_NAME, TABLE_ROWS FROM `information_schema`.`tables` WHERE `table_schema` = 'genie';
+---------------------+------------+
| TABLE_NAME          | TABLE_ROWS |
+---------------------+------------+
| Application         |         22 |
| Application_configs |         21 |
| Application_jars    |         47 |
| Application_tags    |         96 |
| Cluster             |         87 |
| Cluster_Command     |       9356 |
| Cluster_configs     |       4663 |
| Cluster_tags        |       4155 |
| Command             |         62 |
| Command_configs     |         45 |
| Command_tags        |        288 |
| Job                 |    2371443 |
| Job_tags            |    9166369 |
+---------------------+------------+
13 rows in set (0.09 sec)
One difference I can see on the surface is that we don’t attach nearly as many tags to our clusters and commands as you seem to. What kind of tags are you sending in your command and cluster criteria? Additionally, how are you tagging your clusters and commands?
There may very well be something wrong with the query. You can take a look at how it works in Genie 2 here. There is likely some inefficiency in using isMember on large tag sets. I vaguely remember Amit writing it this way to avoid making multiple queries to the database.
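To illustrate why per-tag membership checks can get expensive, here is a minimal sketch using Python's built-in sqlite3 module. It is not the Genie 2 query: the table and column names (cluster, cluster_tags, cluster_id, tag) and both function names are hypothetical, and the SQL is only roughly the shape a membership test per tag tends to produce. The point is that every tag in the criteria adds another subquery against the tag join table.

```python
import sqlite3


def find_clusters_join(conn, required_tags):
    """Return ids of clusters that carry every tag in required_tags."""
    # One membership subquery per required tag (hypothetical schema):
    # the work grows with the number of criteria tags and with the
    # size of the tag join table.
    clauses = " AND ".join(
        "EXISTS (SELECT 1 FROM cluster_tags t "
        "WHERE t.cluster_id = c.id AND t.tag = ?)"
        for _ in required_tags
    ) or "1 = 1"  # no criteria: match every cluster
    sql = f"SELECT c.id FROM cluster c WHERE {clauses} ORDER BY c.id"
    return [row[0] for row in conn.execute(sql, list(required_tags))]


def demo_join_conn():
    """Build a tiny in-memory database with tags kept in a join table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cluster (id TEXT PRIMARY KEY);
        CREATE TABLE cluster_tags (cluster_id TEXT, tag TEXT);
        INSERT INTO cluster VALUES ('c1'), ('c2');
        INSERT INTO cluster_tags VALUES
            ('c1', 'prod'), ('c1', 'hadoop'),
            ('c2', 'test'), ('c2', 'hadoop');
    """)
    return conn
```

With this sample data, asking for both 'prod' and 'hadoop' runs two subqueries per candidate cluster and matches only c1. A criteria set with dozens of tags would multiply that work accordingly.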
I will say that we’re no longer actively developing Genie 2 and are hard at work getting Genie 3 ready for production use at Netflix. In Genie 3 the database schema has been optimized to collapse the tags into the rows of the tables themselves rather than storing them in join tables. (In Genie 2 those join tables were auto-generated by OpenJPA; Genie 3 uses a static schema defined in DDL files.) As such, the logic for cluster selection was rewritten using LIKE statements (see here) and will hopefully perform far better. This was done because, at the scale we were running, we DID see large performance degradation on job search due to millions of Job tag to Job record joins.
Hope some of that helps a little bit, at least while we’re out. If you see something that can be quickly fixed in Genie 2, PRs are always welcome. Let me know if there is something else specific I can answer while I’m out of the office and I’ll try to help.
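As a rough illustration of the collapsed-tag idea, the sketch below stores all of a cluster's tags in one delimited column and matches them with LIKE, again using Python's built-in sqlite3 module. This is not the Genie 3 schema or query: the table layout, the '|' delimiter, and the function names are assumptions made for the example, and it assumes tags never contain '|'.

```python
import sqlite3


def like_pattern(tag):
    """Build a LIKE pattern that matches one whole delimited tag."""
    # Escape LIKE wildcards so a tag such as "a_b" is matched literally.
    escaped = tag.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%|{escaped}|%"


def find_clusters_like(conn, required_tags):
    """Return ids of clusters whose tags column contains every tag."""
    # One LIKE predicate per tag on the same row; no join table involved.
    clauses = " AND ".join(
        "c.tags LIKE ? ESCAPE '\\'" for _ in required_tags
    ) or "1 = 1"  # no criteria: match every cluster
    sql = f"SELECT c.id FROM cluster c WHERE {clauses} ORDER BY c.id"
    params = [like_pattern(t) for t in required_tags]
    return [row[0] for row in conn.execute(sql, params)]


def demo_like_conn():
    """Build a tiny in-memory database with tags collapsed into one column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cluster (id TEXT PRIMARY KEY, tags TEXT);
        INSERT INTO cluster VALUES
            ('c1', '|hadoop|prod|'),
            ('c2', '|hadoop|test|'),
            ('c3', '|hadoop|prod_x|'),
            ('c4', '|hadoop|prodax|');
    """)
    return conn
```

Wrapping each tag in delimiters keeps 'prod' from matching 'prod_x'. A pattern with a leading wildcard generally cannot use a regular index, so any gain in a layout like this would come from avoiding the joins rather than from index lookups.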
Thanks,
Tom