Wednesday, June 27, 2012

The Processor Market: POWER #1 HPC

Abstract:
During June 2012, some very interesting updates happened - open source pieces from Sun and Oracle were combined with POWER processors to build a new supercomputer. An odd result: IBM's POWER required a lot more sockets to outrun Fujitsu's SPARC64... but did so with better power efficiency, and by using the open source technology of arch-rival Sun Microsystems (now Oracle).

[wiring 123% more sockets for 55% greater performance, courtesy The Register]
IBM Denies Fujitsu's SPARC64 Year Long #1 HPC Rank!
After a long list of losses, IBM's POWER architecture finally has a win: the proprietary POWER architecture now holds the #1 HPC performance spot, running Lustre on top of ZFS - denying Fujitsu, with its own fork of the Lustre clustered filesystem, the spot it had held for nearly a year as the fastest computer in the world!

[zfs write performance under linux with lustre, courtesy Lawrence Livermore Laboratory]
Whamcloud, Lustre, and Sequoia Supercomputer

The Lustre clustered/distributed filesystem was formerly owned by Sun Microsystems and is now owned by Oracle. A merger of Lustre with ZFS has long been promised. Whamcloud is a commercial enterprise which develops a fork of the Lustre file system. They announced the release of Chroma Enterprise, to bring enterprise management to Lustre.

Whamcloud is using a non-kernel, emulated fork of ZFS taken from OpenSolaris. Even so, the ZFS implementation still shows linear scalability as the load increases (in comparison to the native Linux filesystem).

The Sequoia Supercomputer, run by the United States Department of Energy, has an interesting feature - the use of a merged Sun ZFS and Sun Lustre filesystem. Here is a short 30 minute video that walks through the PDF presented at the Lustre User Group (LUG) 2012.

IBM's Tortoise vs Fujitsu's Hare
IBM needed 123% more proprietary POWER CPU sockets to outrun Fujitsu's open SPARCv9 SPARC64 architecture by a mere 55%. The IBM POWER solution proved itself to be about 23% more power efficient, which is truly an achievement, considering how many more sockets were required. The tortoise POWER processor takes less energy than the hare SPARC64 processor.

Fujitsu SPARC64 Loses The Battle of the Alamo
This is somewhat of a Pyrrhic victory, kind of like winning the Battle of the Alamo. Could any one-year-old platform hold its performance position, when the new opposition has a 123% numeric advantage?

This victory was a solid win for IBM, from a supercomputer to supercomputer perspective, but there is an odd conclusion that some people may notice: taking the figures above at face value, 123% more sockets for 55% more performance means each old SPARC64 socket delivers roughly 44% more throughput than each new POWER socket (2.23 / 1.55 ≈ 1.44).

Considering that each SPARC64 socket is an 8 core processor socket, in comparison to the 18 core POWER processor socket (of which 16 cores are usable) - each SPARC64 core comes out roughly 188% faster than each POWER core (about 2.88 times the throughput)!
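
The per-socket and per-core comparison can be sanity-checked with a short Python sketch. It takes this article's own ratios at face value (123% more sockets, 55% more performance, 8 cores per SPARC64 socket, 16 usable cores per POWER socket) and does not re-verify them against the published benchmark list; the function and constant names are purely illustrative.

```python
# Ratios as stated in this article, taken at face value (assumptions, not
# re-verified against the published benchmark results).
SOCKET_RATIO = 2.23   # POWER sockets per SPARC64 socket ("123% more sockets")
PERF_RATIO = 1.55     # POWER performance per SPARC64 performance ("55% greater")
SPARC64_CORES_PER_SOCKET = 8
POWER_USABLE_CORES_PER_SOCKET = 16


def per_socket_advantage(socket_ratio: float, perf_ratio: float) -> float:
    """Throughput of one SPARC64 socket relative to one POWER socket."""
    # POWER per-socket throughput is perf_ratio / socket_ratio of SPARC64's,
    # so the SPARC64 advantage is the inverse of that.
    return socket_ratio / perf_ratio


def per_core_advantage(socket_ratio: float, perf_ratio: float,
                       sparc_cores: int, power_cores: int) -> float:
    """Throughput of one SPARC64 core relative to one usable POWER core."""
    # A POWER socket spreads its throughput over more cores, which widens
    # the per-core gap by the ratio of cores per socket.
    return per_socket_advantage(socket_ratio, perf_ratio) * power_cores / sparc_cores


def percent_faster(ratio: float) -> float:
    """Convert a throughput ratio such as 1.44 into a 'percent faster' figure."""
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    socket_adv = per_socket_advantage(SOCKET_RATIO, PERF_RATIO)
    core_adv = per_core_advantage(SOCKET_RATIO, PERF_RATIO,
                                  SPARC64_CORES_PER_SOCKET,
                                  POWER_USABLE_CORES_PER_SOCKET)
    print(f"per socket: {socket_adv:.2f}x ({percent_faster(socket_adv):.0f}% faster)")
    print(f"per core:   {core_adv:.2f}x ({percent_faster(core_adv):.0f}% faster)")
```

Under these assumed inputs, the SPARC64 advantage works out to about 1.44 times per socket and about 2.88 times per core. Changing any of the four constants shows how sensitive the conclusion is to the socket and core counts used.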

Fujitsu's SPARC64 Other Battle Fronts
The battles have been continuous since 2011.
SPARC continues to be on the map, in new locations, as well as eating IBM POWER's lunch in smaller installations - for very good reason. The new 16 core SPARC64 chips offer double the performance, in the same socket, making POWER look pale in comparison.


Better Options for Super Computers
IBM's main processor is POWER, with its main OS being AIX. AIX is lacking a modern file system. IBM had a second operating system option, Linux, but it was also lacking a modern file system. IBM briefly toyed with the idea of purchasing Sun Microsystems, before Oracle made the final purchase. In short, both the AIX and Linux choices on POWER were lacking.

Why was the choice made to emulate ZFS? The licensing in Linux is so restrictive that ZFS could not be combined with the Linux kernel, so it had to be emulated in userland. Why did IBM use Lustre instead of IBM's own GPFS clustered file system? Cost may be a factor, and Lustre is basically the de facto standard in High Performance Computing.

Lustre was going to be merged into ZFS by Sun Microsystems, after its acquisition in 2007. Lustre support purchased directly from Oracle, without hardware, came to an end shortly after Oracle's purchase of Sun Microsystems: Oracle limited the support of Lustre to Oracle hardware in 2010.

Code changes to OpenSolaris were delivered for Lustre friendliness. Completing the Lustre and ZFS work under Illumos in kernel space could have offered better performance than user space ZFS, since fewer system calls would be required at the emulation layer. Illumos could have delivered native performance on the IBM POWER Sequoia or the Fujitsu SPARC64 K Supercomputer.

Fujitsu, being the SPARC64 creator, was more than capable of delivering their drivers into the Illumos market, had Illumos been interested in SPARC. Clearly, pushing IBM to adopt forks of Oracle's Solaris ZFS and Oracle's Lustre was still pretty aggressive; perhaps pushing them all the way to adopt Illumos, a fork of Solaris, was a bridge too far (especially after a failed Solaris acquisition).

Conclusions
With some in the Illumos community seemingly less interested in POSIX subsystems, pulling out SVR4 features, and disinterested in non-Intel distributions - some are questioning the value of Illumos once its differentiators, ZFS and DTrace, are available on an OS like Linux.

With POWER sitting at #1, SPARC64 at #2, and ARM growing in market prevalence - the window for Illumos relevance may be closing if they don't start actively supporting some non-x64 architectures, as their differentiating features get ported to competing OSes.

IBM has long tried to demonstrate POWER's superiority in per-socket or per-core performance. The POWER platform uses 18 cores per socket while Fujitsu uses 8 cores per socket - so each POWER core is vastly slower than a Fujitsu SPARC64 core.

IBM has long tried to demonstrate the superiority of its technologies over companies like Sun and Oracle, yet at the core of their supercomputer were ZFS and Lustre - in order to compete in this arena, former Sun Microsystems (now Oracle) technology was used to scale their solution.

A non-IBM operating system, running a fork of Oracle Solaris ZFS and a fork of Oracle Lustre, is not the way some might want to advertise an IBM POWER architecture (which normally runs the IBM AIX operating system with the IBM GPFS file system).
