As you can see from the chart above, the disk and flash storage capacity, the CPU core counts, the database node memory capacity, and the Ethernet bandwidth into the Exadata clusters grew steadily in the first decade of products. The Exadata X7-2 and X7-8 systems were unveiled in October 2017, and Oracle had thousands of customers in all kinds of industries that had yanked out their big NUMA machines running the Oracle database (the dominant driver of Unix machines three decades ago, two decades ago, a decade ago, and today) and replaced them with Exadata iron.
That little bit of history brings us to the tenth Exadata generation from Oracle: the X9M-2 and X9M-8 systems announced last week, which offer unprecedented scale for running clustered relational databases.
The HC hybrid disk/flash and EF all-flash storage servers are based on a two-socket server node employing a pair of Ice Lake Xeon SP 8352Y processors, which have 16 cores each running at 2.2 GHz. The HC node has 256 GB of DDR4 DRAM that is extended with 1.5 TB of Optane 200 Series persistent memory, which is configured to act as a read and write cache for main memory. The HC chassis has room for a dozen 18 TB, 7.2K RPM disk drives and four of the 6.4 TB NVM-Express flash drives. The EF chassis has the same DDR4 and PMEM memory configuration, but has no disks at all and eight of the 6.4 TB NVM-Express flash cards. Both storage server types have a pair of 100 Gb/sec switches to link into the fabric with each other and to the database servers.
The first thing to note is that even though 200 Gb/sec and 400 Gb/sec Ethernet (with RoCE support, even) is available in the market and certainly affordable (well, compared to Oracle software pricing, for sure), Oracle is sticking with 100 Gb/sec switching for the Exadata backplane. We would not be surprised if the company was using cable splitters to take a tier out of the 200 Gb/sec switch fabric, and if we were building a large scale Exadata cluster ourselves, we would consider using a higher radix switch and buying a whole lot fewer switches to cross connect the database and storage servers. A jump to 400 Gb/sec switchery would provide even more radix, fewer hops between devices, and fewer devices in the fabric.
An Exadata rack has 14 of the storage servers, with a usable capacity of 3 PB for disk, 358 TB of flash plus 21 TB of Optane PMEM for the HC storage and 717 TB of flash and 21 TB of Optane for the EF storage. The rack can have two of the eight-socket database servers (384 cores) or eight of the two-socket servers (512 cores) for the database compute. If you take some of the storage out, you can add more compute to any Exadata rack, of course. Up to a dozen racks in total can be hooked into the RoCE Ethernet fabric with the existing switches that Oracle provides, and with additional switching tiers even larger configurations can be built.
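Those per-rack totals can be sanity-checked against the per-server specs quoted earlier (14 storage servers; 12 × 18 TB disks and 4 × 6.4 TB flash cards per HC server, 8 × 6.4 TB flash cards per EF server, 1.5 TB of Optane PMEM on either type). A quick back-of-envelope check in Python — note these are raw figures, and Oracle's "usable" disk number reflects redundancy and formatting:

```python
# Back-of-envelope check of the per-rack Exadata X9M capacity figures
# quoted above (raw capacity, not accounting for redundancy overhead).

SERVERS_PER_RACK = 14

# HC (hybrid) storage server: 12 disks + 4 flash cards + 1.5 TB PMEM
hc_disk_tb  = SERVERS_PER_RACK * 12 * 18   # 3,024 TB raw, roughly 3 PB
hc_flash_tb = SERVERS_PER_RACK * 4 * 6.4   # 358.4 TB of flash
pmem_tb     = SERVERS_PER_RACK * 1.5       # 21 TB of Optane PMEM

# EF (all-flash) storage server: 8 flash cards, no disks
ef_flash_tb = SERVERS_PER_RACK * 8 * 6.4   # 716.8 TB of flash

print(hc_disk_tb, round(hc_flash_tb, 1), pmem_tb, round(ef_flash_tb, 1))
```

The raw disk total comes out a hair above the 3 PB usable figure Oracle quotes, and the flash and PMEM numbers line up with the 358 TB, 717 TB, and 21 TB in the spec sheet.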
As for performance, a single rack with the Exadata X9M-8 database nodes and the hybrid disk/flash HC storage can do 15 million random 8K read IOPS and 6.75 million random flash write IOPS. Switching to the EF storage, which is good for data analytics work, a single rack can scan 75 GB/sec per server, for a total of 1 TB/sec on a single rack, which has three of the eight-socket database servers and eleven of the EF storage servers.
Finally, Oracle is still the only high-end server maker that publishes a price list for its systems, and it has done so for each and every generation of Exadata machines. You can see it here. A half rack of the Exadata X9M-2 (with four of the two-socket database servers and seven storage servers) using the HC hybrid disk/flash storage costs $935,000, and the half rack with the EF all-flash storage costs the same. So $1.87 million per rack. That cost is just for the hardware, not the Oracle database or RAC clustering software or any of the other goodies companies need to make this a Database Machine, as its other name is. And that software is gonna cost ya, but no more than it does on big Unix and big Linux and big Windows iron that Exadata is meant to replace. Our guess, at least for the initial system, is that it costs quite a bit less when you move from some other system to Exadata.
So the team is thinking of splitting it into two parts: 1) an OLTP-type use case in which we will persist/write the transaction data fast and show it to the UI-related apps in near real time, or the quickest possible time; this database will store a maximum of 60-90 days of transaction data. Not sure if we have an Oracle Exadata equivalent option on AWS, so the team is planning to use/experiment with Aurora PostgreSQL. Please correct me if there are other options we should consider instead.
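For the 60-90 day retention piece specifically, PostgreSQL (and therefore Aurora PostgreSQL) supports declarative range partitioning, which lets old data be dropped a partition at a time instead of via expensive bulk DELETEs. A minimal sketch of the idea — table and column names here are made up for illustration, not from this thread:

```python
from datetime import date, timedelta

def partition_ddl(day: date) -> str:
    """Render the CREATE TABLE ... PARTITION OF statement for one
    daily partition of a hypothetical 'transactions' parent table."""
    nxt = day + timedelta(days=1)
    return (
        f"CREATE TABLE IF NOT EXISTS txn_{day:%Y%m%d} "
        f"PARTITION OF transactions "
        f"FOR VALUES FROM ('{day}') TO ('{nxt}');"
    )

def expired_partitions(today: date, keep_days: int = 90) -> list:
    """Names of the daily partitions just past the retention window;
    dropping a whole partition is far cheaper than DELETEing rows.
    (In practice you would enumerate children via pg_inherits; here we
    just show the naming convention for the three oldest days.)"""
    cutoff = today - timedelta(days=keep_days)
    return [f"txn_{cutoff - timedelta(days=i):%Y%m%d}" for i in range(1, 4)]

print(partition_ddl(date(2024, 1, 15)))
```

Extensions like pg_partman can automate exactly this create-ahead/drop-behind cycle, so you would not normally hand-roll it.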
Is the above design okay? Also, regarding the second point above, i.e. persisting the historical data (in a queryable state), should we go for a database like Snowflake, or should we just keep it on S3 as-is and make it queryable through APIs? Please advise.
Overall, the approach of using Aurora for a highly transactional OLTP system is the right one. You can leverage your AWS account team, who can bring in additional resources as needed from within AWS, as well as professional services like certified partners if needed. Your AWS team can work with you to come up with an initial design for your overall architecture, and subsequently conduct technical deep dives via Workshops and Immersion Days so you gain a better understanding of particular AWS services. Also, a PoC can go a long way toward validating the capabilities of a service using your own data, or even a pilot/prototype where you pick an end-to-end use case and not only design but also build it out.
As far as historical data goes, I suggest you leverage the S3 capabilities for data archival and do not narrow down to a singular service like Snowflake just yet. Having data on S3 in open formats, be it file formats like Parquet or table formats like Iceberg, gives you the freedom to choose any compute of your choice for querying the data. You can choose Athena, EMR, Redshift or even partner services like Snowflake - pick whichever service meets your price-performance needs and feel free to swap one out for another if required.
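The "open formats on S3" point usually comes down in practice to a date-partitioned key layout (Hive-style `year=/month=/day=` prefixes) that Athena, EMR, Redshift Spectrum, and Snowflake external tables can all read without any conversion. A small sketch of that convention — the bucket prefix and file names are assumptions for illustration:

```python
from datetime import date

def s3_key(day: date, filename: str,
           prefix: str = "warehouse/transactions") -> str:
    """Hive-style partitioned S3 key. Engines that support partition
    pruning (Athena, Spark on EMR, Redshift Spectrum, Snowflake
    external tables) can skip whole date prefixes when a query filters
    on the partition columns. Prefix/filename here are illustrative."""
    return (f"{prefix}/year={day.year}/month={day.month:02d}/"
            f"day={day.day:02d}/{filename}")

print(s3_key(date(2024, 1, 15), "part-0000.parquet"))
# warehouse/transactions/year=2024/month=01/day=15/part-0000.parquet
```

Writing the files as Parquet (or managing them as Iceberg tables) on top of this layout is what keeps the compute layer swappable later.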
Thank you so much. I have heard that vacuuming can be a problem in highly transactional PostgreSQL systems. Is that still a problem in Aurora PostgreSQL, or has it been taken care of? And also, a higher number of partitions causing issues in query parsing time, something highlighted in this blog: -partition-pains-lockmanager-waits
Amazon Aurora is being used by customers big and small (refer ) to successfully implement highly transactional systems. You can connect to an AWS Specialist using -us/sales-support-rds/ and learn more about Amazon Aurora.
Caching databases are intended for caching data, and the cost per GB is very different for those compared to OLTP databases. AWS purpose-built services like Amazon Aurora are specialized for the OLTP use case.
Also, I want to understand: in Oracle we used to look at the data dictionary views (the AWR views) to see current and historical performance statistics like CPU, I/O, and memory usage, object-level contention, and so on. Do we have similar views available in Aurora PostgreSQL (apart from the Performance Insights UI tool), so we can manually fetch performance data, get some idea of how well a load test goes, and see what capacity is available or whether we are saturating it?
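For reference, the closest PostgreSQL analogue to the AWR views is the set of cumulative statistics views (`pg_stat_database`, `pg_stat_user_tables`, `pg_stat_activity`) plus the `pg_stat_statements` extension for per-statement timings, and Aurora PostgreSQL exposes all of these. A sketch of the kind of queries you can fetch manually — the exact column set assumes PostgreSQL 13 or later, and `pg_stat_statements` must be enabled via `shared_preload_libraries` and `CREATE EXTENSION`:

```python
# Standard PostgreSQL statistics views (available in Aurora PostgreSQL)
# that cover much of what Oracle's AWR views report.

AWR_EQUIVALENTS = {
    "top statements by total time":
        "SELECT query, calls, total_exec_time, mean_exec_time "
        "FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;",
    "buffer cache hit ratio per database":
        "SELECT datname, blks_hit::float / NULLIF(blks_hit + blks_read, 0) "
        "AS hit_ratio FROM pg_stat_database;",
    "table bloat / vacuum activity":
        "SELECT relname, n_dead_tup, last_autovacuum "
        "FROM pg_stat_user_tables ORDER BY n_dead_tup DESC;",
    "current waits and active sessions":
        "SELECT pid, state, wait_event_type, wait_event, query "
        "FROM pg_stat_activity WHERE state <> 'idle';",
}

for purpose, sql in AWR_EQUIVALENTS.items():
    print(f"-- {purpose}\n{sql}")
```

Unlike AWR, these views are cumulative counters rather than timed snapshots, so for historical trending you either snapshot them yourself on a schedule or lean on Performance Insights.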
To test whether Aurora PostgreSQL will be able to cater to the above needs (with the expected performance at a nominal cost), how should we test it? Since we won't be able to test everything right away, should we test basic read and write performance and benchmark it, to gain some confidence before going ahead with development?
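The standard tool for exactly this kind of first-pass benchmark is pgbench, which works unchanged against an Aurora PostgreSQL endpoint. Whatever driver runs the load, the numbers to compare across instance sizes are latency percentiles rather than averages. A small pure-Python helper for summarizing measured latencies (no database needed; the sample numbers are invented):

```python
import math

def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile of a list of measured latencies (ms)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize(latencies_ms: list) -> dict:
    """p50/p95/p99 summary -- averages hide the tail that users feel."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}

# e.g. per-statement latencies collected around each INSERT/SELECT
print(summarize([1.2, 0.9, 1.1, 5.4, 0.8, 1.0, 7.9, 1.3, 1.1, 0.95]))
```

Running the same read-heavy and write-heavy workloads against two or three instance sizes, and comparing p95/p99 against your near-real-time UI target, gives a defensible go/no-go signal without testing everything up front.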
And another question comes to mind: I have read in the past that vacuum can be a problem in PostgreSQL. Is it going to give trouble in Aurora PostgreSQL too, for such a highly transactional read/write system? How do we validate that?