During some SPARC road map discussions, a particular anonymous IBM POWER enthusiast inquires:
How... are 192 S3 cores going to provide 6x the throughput of 128 SPARC64-VII+ cores?
The Base-Line:
This is a very interesting question... how does one get to 600% throughput increase with the release of the "M4" SPARC processor? One must consider where engineering is moving from and moving towards.
[from] SPARC64-VII+ (code-named M3)
- 4 cores per socket
- 2 threads per core
- 8 threads per socket
[to] M4 S3 Cores (based upon T4 S3 cores)
- 6 cores per socket (conservative)
- 8 threads per core (normative)
- 48 threads per socket
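The per-socket thread math above can be sketched in a few lines (figures taken from the lists above; the M4 core and thread counts are the article's projections, not published specifications):

```python
# Speculative per-socket thread math, using the figures above.
m3_threads = 4 * 2    # SPARC64-VII+ ("M3"): 4 cores x 2 threads
m4_threads = 6 * 8    # projected M4: 6 S3 cores x 8 threads

multiplier = m4_threads // m3_threads
print(multiplier)     # -> 6, the per-socket thread multiplier
```

The same 6x ratio holds system-wide: 192 S3 cores carry 1536 threads against 256 threads on 128 SPARC64-VII+ cores.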
How to Get There:
The core swap results in a 6x thread increase... now that we understand this is purely a mathematical question with a definitive end result, the question REALLY is:
How can EACH S3 thread perform on-par with a SPARC64-VII+ thread?
Let's speculate upon this question:
- Increasing cores by 50% increases throughput by 50%
Threads no longer need to perform on-par, although a 50% per-thread increase is projected.
- Increasing the clock rate to 3.6GHz provides
~300% faster per-thread throughput over T1 threads.
- Out-of-Order Execution
Another significant increase in throughput over T1-T3 threads.
- Increase memory bandwidth over old T processors
Provides opportunity to ~2x socket throughput for instructions & data.
- Increase memory bandwidth over the previous SPARC64 V* series
The move from the VII+'s DDR2 to DDR3 offers a throughput improvement opportunity.
- Increase cache
Provides faster per-thread throughput opportunities with S3, but could increase thread starvation.
- Decrease cores
Reduce the number of cores & threads in a socket, to ensure all threads can run at 100% capacity.
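Some of the speculation above reduces to simple arithmetic. A back-of-the-envelope sketch, assuming a 1.2 GHz T1 baseline clock (T1 parts shipped at roughly 1.0-1.4 GHz; the 3.6 GHz figure is this article's projection):

```python
# Back-of-the-envelope factors behind the speculation above.
# Assumption: 1.2 GHz T1 baseline; 3.6 GHz M4 clock is projected.
core_factor = 6 / 4         # 50% more cores per socket -> 1.5x
clock_factor = 3.6 / 1.2    # ~3x per-thread clock over T1

print(core_factor, clock_factor)
```

With 1.5x the cores contributing 1.5x throughput on their own, the remaining per-thread gains (clock, out-of-order execution, memory bandwidth, cache) only have to close the rest of the gap.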
Of course, it is not only about the hardware; software has a lot to do with it.
- Produce more database operations (e.g. integer operations) in hardware, instead of in software; specialized applications such as Oracle RDBMS perform faster on nearly every operation.
- Add compression in hardware
I/O throughput increases 500% to 1000% with no CPU impact.
Oracle 11g RDBMS, or Solaris ZFS with compression hosting any database, sees the benefit.
- Solaris ZFS support of LogZilla (ZFS ZIL Write Cache)
Regular file system applications experience extremely high file system write throughput.
- Solaris ZFS support of ReadZilla (ZFS Read Cache)
Regular file system applications experience extremely high file system read throughput.
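A minimal sketch of wiring up the ZFS features above; the pool name (tank), dataset name (oradata), and device names are hypothetical placeholders for illustration:

```shell
# Enable lightweight compression on a dataset hosting database files
zfs set compression=lzjb tank/oradata

# Add a fast SSD as a dedicated ZIL write log device ("LogZilla")
zpool add tank log c0t5d0

# Add a fast SSD as an L2ARC read cache device ("ReadZilla")
zpool add tank cache c0t6d0
```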
Oracle does not appear that far off from producing the numbers they suggest with standard applications. They have the technology to do every step, from thread performance acceleration, to storage performance acceleration, to I/O performance acceleration, to file system acceleration, to application performance acceleration.
A cursory and objective look at the numbers demonstrates how it is possible - it is not solely about the cores... but the cores are key to understanding how it is possible.