Wednesday 2 June 2010

China supercomputer design points to future speed kings


China’s new Nebulae Supercomputer is No. 2, right on the Tail of ORNL’s Jaguar in Newest TOP500 List of Fastest Supercomputers


Jack Dongarra, a professor at University of Tennessee's department of electrical engineering, says graphics chips will be used increasingly in supercomputers to boost performance.
(Credit: University of Tennessee)

China has muscled into the No. 2 spot on the list of the world's fastest supercomputers thanks, in part, to specialized Nvidia graphics chips: a technology that Intel is now pursuing to keep pace with this new trend in high-performance computing. 

China's Nebulae supercomputer is located at the recently constructed National Supercomputing Centre in Shenzhen, and achieved 1.271 petaflop/s (1.271 quadrillion floating point operations per second) running the Linpack benchmark, which put it in the No. 2 spot on the widely reported Top500 list. The latest list was formally presented Monday at the International Supercomputing Conference in Hamburg, Germany. (Jaguar, a Cray system at the Oak Ridge National Laboratory in Tennessee, retained the top spot.)

Nebulae achieved this "in part due to its Nvidia GPU (graphics processing unit) accelerators...Nebulae reports an impressive theoretical peak capability of almost 3 petaflop/s--the highest ever on the TOP500," according to a press release Friday.

Though Nebulae also uses Intel Xeon processors, those are so-called commodity processors that are also employed in standard server computers. So, Intel--despite canceling its Larrabee graphics chip project--is pursuing a technology that leverages Larrabee R&D. On Monday, Intel said the first product of this kind, code-named Knights Corner, will be made on its future 22-nanometer manufacturing process--using transistor structures as small as 22 billionths of a meter--to pack more than 50 processing cores on a single chip.

On Tuesday, I spoke with Jack Dongarra, Distinguished Professor at University of Tennessee's Department of Electrical Engineering and Computer Science and director of the Innovative Computing Laboratory. Dongarra introduced the LINPACK Benchmark, which is used as the primary yardstick to measure supercomputer performance.

Q: Are GPU accelerators in supercomputers a trend we'll see more of in coming years?

Jack Dongarra: This looks like this is going to be one of the modes of high-performance computing. Taking commodity processors (such as standard Intel or AMD server-class processors) together with specialized accelerators, in this case graphics processors.

How much do GPUs generally boost performance?

Dongarra: A board by Nvidia can give an order of magnitude greater performance than the commodity processor.

But programs must be written to take advantage of this; it doesn't just happen, correct?

Dongarra: There's nothing automatic about it. You have to write a program that explicitly passes information to the GPU and tells the GPU what to do. That can be easy or hard. In most cases it becomes a challenge to write an efficient program to do the operations. Part of the issue there is that the connection between the commodity part of the computer and the graphics processor is a very thin pipe. So, you have to pass information and think of a very thin straw through which you're passing a lot of information. And once you move it over there, you have to do a lot of operations to gain back any benefit.
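
Dongarra's "thin straw" point can be illustrated with a rough cost model. The sketch below is not from the interview: the throughput and bandwidth figures are hypothetical placeholders, chosen only so that the accelerator is an order of magnitude faster than the host, and the workload (a dense matrix multiply) is an assumed example. It compares the estimated time to compute on the host with the time to ship the data across the link and compute on the GPU.

```python
# All hardware figures below are hypothetical placeholders, not measurements.
CPU_FLOPS = 50e9        # assumed host throughput, flop/s
GPU_FLOPS = 500e9       # assumed accelerator throughput: 10x the host
LINK_BYTES_PER_S = 5e9  # assumed host<->GPU link bandwidth, bytes/s


def matmul_times(n):
    """Estimate (cpu_seconds, gpu_seconds) for an n x n double-precision matrix multiply."""
    flops = 2 * n**3             # floating-point operations in a dense n x n product
    bytes_moved = 3 * n * n * 8  # send A and B, fetch C; 8 bytes per double
    cpu = flops / CPU_FLOPS
    gpu = bytes_moved / LINK_BYTES_PER_S + flops / GPU_FLOPS
    return cpu, gpu


def speedup(n):
    """Estimated host time divided by estimated offload time (transfer included)."""
    cpu, gpu = matmul_times(n)
    return cpu / gpu


for n in (100, 1_000, 10_000):
    print(f"n={n:>6}: estimated speedup from offloading = {speedup(n):.2f}x")
```

Under these assumed numbers, the small problem runs slower when offloaded because the transfer dominates, while the large one approaches, but never reaches, the 10x hardware advantage. That is the sense in which a lot of operations are needed per byte moved before the accelerator pays off.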

And what's the future hold for GPU supercomputing?

Dongarra: Two things will happen. One, the connection will improve slightly. And then ultimately what's going to happen is that the graphics processor is going to be integrated into the commodity processor. So, you'll have a chip that has both the commodity processor's cores plus the graphics processors or an accelerator for doing floating-point arithmetic embedded into the chip itself. It's a path a number of companies are pursuing. Intel is one. AMD is another. Companies would like to pursue that path because it does provide the best performance but it does require another ratchet up in chip design.

Dongarra added that chips have been designed in the past with accelerators, though, of course, the chip-manufacturing technology at the time yielded different results. "There were companies that made these things that attached to mainframes," he said, citing Floating Point Systems, a company founded in 1970.

by Brooke Crothers
Brooke Crothers has been an editor at large at CNET News, an analyst at IDC Japan, and an editor at The Asian Wall Street Journal Weekly, among other endeavors, including co-manager of an after-school math-and-reading center. He writes for the CNET Blog Network and is not a current employee of CNET.

China’s new Nebulae Supercomputer is No. 2, right on the Tail of ORNL’s Jaguar in Newest TOP500 List of Fastest Supercomputers



HAMBURG, Germany -- China’s ambitions to enter the supercomputing arena have become obvious with a system called Nebulae, built from a Dawning TC3600 Blade system with Intel X5650 processors and NVidia Tesla C2050 GPUs. Nebulae is currently the fastest system worldwide in theoretical peak performance at 2.98 PFlop/s. With a Linpack performance of 1.271 PFlop/s it holds the No. 2 spot on the 35th edition of the closely watched TOP500 list of supercomputers.

The newest version of the TOP500 list, which is issued twice yearly, will be formally presented on Monday, May 31st, at the ISC’10 Conference to be held at the CCH-Congress Center in Hamburg, Germany.

Jaguar, which is located at the Department of Energy’s Oak Ridge Leadership Computing Facility, held on to the No. 1 spot on the TOP500 with its record 1.75 petaflop/s performance speed running the Linpack benchmark. Jaguar has a theoretical peak capability of 2.3 petaflop/s and nearly a quarter of a million cores. One petaflop/s refers to one quadrillion calculations per second.

Nebulae, which is located at the newly built National Supercomputing Centre in Shenzhen, China, achieved 1.271 PFlop/s running the Linpack benchmark, which puts it in the No. 2 spot on the TOP500 behind Jaguar. In part due to its NVidia GPU accelerators, Nebulae reports an impressive theoretical peak capability of almost 3 petaflop/s – the highest ever on the TOP500.
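
The gap between the two systems shows up clearly when measured Linpack performance is divided by theoretical peak. The short sketch below does that arithmetic with the figures quoted above; the function and variable names are illustrative only, and the peak figure for Jaguar is the rounded 2.3 petaflop/s given in this release, so the results are approximate.

```python
# Measured Linpack performance and theoretical peak, in petaflop/s,
# as quoted in the announcement above.
systems = {
    "Jaguar": {"linpack": 1.75, "peak": 2.3},
    "Nebulae": {"linpack": 1.271, "peak": 2.98},
}


def linpack_efficiency(linpack, peak):
    """Return the fraction of theoretical peak achieved on the Linpack benchmark."""
    return linpack / peak


efficiency = {
    name: linpack_efficiency(s["linpack"], s["peak"]) for name, s in systems.items()
}

for name, eff in efficiency.items():
    print(f"{name}: {eff:.1%} of theoretical peak")
```

On these figures Jaguar sustains roughly 76 percent of its peak on Linpack, while the GPU-accelerated Nebulae sustains roughly 43 percent, which is why Nebulae leads in peak capability but not in measured performance.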

Roadrunner at Los Alamos, which in June 2008 became the first-ever petaflop/s system, dropped to No. 3 with a performance of 1.04 petaflop/s.

At No. 5 is the most powerful system in Europe -- an IBM BlueGene/P supercomputer located at the Forschungszentrum Juelich (FZJ) in Germany. It achieved 825.5 teraflop/s on the Linpack benchmark.

Tianhe-1 (meaning River in Sky), installed at the National Super Computer Center in Tianjin, China, is the second Chinese system in the TOP10, ranked at No. 7. Tianhe-1 and Nebulae are both hybrid designs with Intel Xeon processors and AMD or NVidia GPUs used as accelerators. Each node of Tianhe-1 consists of two AMD GPUs attached to two Intel Xeon processors.

The performance of Nebulae and Tianhe-1 was enough to catapult China into the No. 2 spot in installed performance (9.2 percent), ahead of various European countries but still clearly behind the U.S. (55.4 percent).

Here are some other highlights from the latest list showing changes from the November 2009 edition:
  • The entry level to the list moved up to the 24.7 teraflop/s mark on the Linpack benchmark from 20 teraflop/s six months ago. The last system on the newest list would have been listed at position 357 in the previous TOP500 just six months ago. This replacement rate was far below average. This might reflect the impact of the recession and purchase delays in anticipation of new products with six or more cores per processor replacing current quad-core based systems.
  • Quad-core processor based systems have saturated the TOP500, with 425 systems now using them. However, processors with six or more cores can already be found in 25 systems.
  • A total of 408 systems (81.6 percent) are now using Intel processors. This is slightly up from six months ago (402 systems, 80.4 percent). Intel continues to provide the processors for the largest share of TOP500 systems. The AMD Opteron is the second most commonly used processor family with 47 systems (9.4 percent), up from 42. They are followed by the IBM Power processors with 42 systems (8.4 percent), down from 52.
  • IBM and Hewlett-Packard continue to sell the bulk of systems at all performance levels of the TOP500. HP lost its narrow lead in systems to IBM and now has 185 systems (37 percent) compared to IBM with 198 systems (39.8 percent). HP had 210 systems (42 percent) six months ago, compared to IBM with 186 systems (37.2 percent). In the system category, Cray, SGI, and Dell follow with 4.2 percent, 3.4 percent and 3.4 percent respectively.
  • IBM remains the clear leader in the TOP500 list in performance with 33.6 percent of installed total performance (down from 35.1 percent), compared to HP with 20.4 percent (down from 23 percent). In the performance category, the manufacturers with more than 5 percent are: Cray (14.8 percent of performance) and SGI (6.6 percent), each of which benefits from large systems in the TOP10.
  • The U.S. is clearly the leading consumer of HPC systems with 282 of the 500 systems (up from 277). The European share (144 systems – down from 152) is still substantially larger than the Asian share (57 systems – up from 51). In Europe, the UK remains No. 1 with 38 systems (45 six months ago). France passed Germany and now has 29 (up from 26). Germany now holds the No. 3 spot with 24 systems (27 six months ago). Dominant countries in Asia are China with 24 systems (up from 21), Japan with 18 systems (up from 16), and India with 5 systems (up from 3).
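
The percentage shares quoted in the highlights above are system counts divided by the 500 entries on the list. As a small illustration, the sketch below recomputes the processor-family shares from the counts given; the helper name is illustrative and not part of any TOP500 tooling.

```python
TOTAL_SYSTEMS = 500  # number of entries on the list

# System counts by processor family, as quoted in the highlights above.
processor_counts = {"Intel": 408, "AMD Opteron": 47, "IBM Power": 42}


def share_percent(count, total=TOTAL_SYSTEMS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {family: share_percent(n) for family, n in processor_counts.items()}

for family, pct in shares.items():
    print(f"{family}: {pct} percent")
```

The results match the 81.6, 9.4 and 8.4 percent quoted for Intel, AMD Opteron and IBM Power.
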

The TOP500 list is compiled by Hans Meuer of the University of Mannheim, Germany; Erich Strohmaier and Horst Simon of NERSC/Lawrence Berkeley National Laboratory; and Jack Dongarra of the University of Tennessee, Knoxville. For more information, visit www.TOP500.org.
