| The HPC market faces some tough decisions. SGI CTO Dr Eng Lim Goh is backing large single memory and virtualisation to emerge as major winners.
As SGI’s Chief Technology Officer, I spend around half my time with our sales reps – meeting customers, talking about our long-term plans, and listening to the issues they have with today’s systems and their needs for new solutions. This feedback is crucial to the other half of my work, which I spend with SGI’s senior engineers – looking for innovations, improvements and breakthrough ideas, and bringing customer feedback into the company to influence our next generation of machines.
SGI has always been customer-driven, but now we are even more so, especially with major customers who want to be more involved and feel proud that they can help to steer us. Many of our customers have similar requirements, and we also have a smaller group who – either because they’re doing more extreme work or have bigger plans – are starting to talk about systems with upwards of 100,000 or even one million cores. Nevertheless, what everyone, big or small, has to deal with is the new challenge of “many-core” processor chips, and the fact that they can no longer rely on ever-faster clock speeds to give them the performance improvements they’ve come to expect.
Meeting the challenge of many-core
For 20 years, clock speeds kept improving, and the resulting increase in power consumption was kept under control – at least to an extent – by decreasing voltages. Eventually, however, voltages decreased to the point where they couldn’t go much lower, and so the industry had to switch to a different approach, i.e. many-core.
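The trade-off described above can be summarised with the standard first-order model of dynamic power in CMOS logic. The symbols are not from the original text and are introduced here for illustration: α is the switching activity factor, C the switched capacitance, V the supply voltage and f the clock frequency.

```latex
P_{\text{dyn}} \approx \alpha \, C \, V^{2} f
```

Because power scales with the square of V, lowering the voltage could offset a higher f. Once V cannot fall much further, any increase in f shows up directly as more power, which is the pressure behind the move to many-core.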
This is bringing a whole new set of problems for the software industry. In the past when your code ran on one processor, if your needs weren’t that extreme and you wanted to process more data, you could just wait for a faster processor to come along. Soon it won’t be that way. So a big issue for the industry is to realise this and put resources into scaling software codes.
In HPC, at the extremes we have two types of applications. The first is embarrassingly parallel applications, such as cluster-based render farms, which have virtually no communication and so will be fine in this new world of scaling codes. The second is communications-intensive applications, like fluid dynamics or molecular dynamics, where moving one molecule affects many others because so many things are interrelated.
For communications-intensive applications, if software writers solve the issue of distributing codes, the problem then shifts to us – to build hardware that can deal with the increasing communication between processors that results from more and more cores. This is why we believe our large single memory approach will become increasingly attractive, even for cluster applications. As applications are forced to scale to more and more cores, there will be a point where the cluster approach no longer works: no matter how fast the processors and cluster interconnects are, they will be bogged down dealing with communication. Large single memory can alleviate that problem.
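The scaling argument above can be made concrete with a toy model. This is a minimal sketch under stated assumptions, not a measurement of any real system: the function names, the default constants and the choice of a communication cost that grows linearly with core count are all hypothetical and chosen only to show the shape of the curve.

```python
def run_time(cores, compute_work=1.0, comm_cost_per_core=1e-4):
    """Toy model: compute time shrinks with cores, communication time grows.

    Assumption (not from the article): communication cost is linear in cores.
    """
    return compute_work / cores + comm_cost_per_core * cores


def comm_fraction(cores, compute_work=1.0, comm_cost_per_core=1e-4):
    """Share of the total run time spent on communication in the toy model."""
    comm = comm_cost_per_core * cores
    return comm / run_time(cores, compute_work, comm_cost_per_core)


def best_core_count(max_cores, **model):
    """Core count in 1..max_cores that minimises the modelled run time."""
    return min(range(1, max_cores + 1), key=lambda p: run_time(p, **model))


if __name__ == "__main__":
    for p in (10, 100, 1000):
        print(p, round(run_time(p), 4), round(comm_fraction(p), 3))
    print("best core count:", best_core_count(1000))
```

In this model, adding cores helps only up to a point. Beyond it the communication term dominates and the run time rises again, while the fraction of time spent communicating keeps climbing. Lowering the per-core communication cost pushes that turning point out to higher core counts.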
Of the different types of communications-intensive applications, there is one that even large single memory cannot deal with totally – “global collectives”. To deal with these effectively we will have our next-generation Ultraviolet platform, which is due in the second half of 2009. Ultraviolet’s goals are to lower the cost of SGI® Altix® 4700 relative to clusters, and to solve the global collective communication problem by adding a hardware MPI Offload Engine (MOE) that offloads these collectives from the CPU and runs them in parallel.
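For readers unfamiliar with the term, a global collective is an operation such as MPI's `MPI_Allreduce`, in which every process contributes a value and every process receives the combined result. The sketch below simulates one common software approach, recursive doubling, in plain Python. It is an illustration only: it says nothing about how the MOE is implemented, and the function name and the power-of-two restriction are assumptions made to keep the example short.

```python
def allreduce_sum(values):
    """Simulate a recursive-doubling all-reduce (sum) across len(values) ranks.

    Returns the per-rank results and the number of communication rounds.
    Assumption: the rank count is a power of two, to keep the sketch simple.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("rank count must be a power of two in this sketch")

    current = list(values)
    rounds = 0
    distance = 1
    while distance < n:
        # In each round, every rank exchanges its partial sum with the
        # partner whose rank differs in exactly one bit, then adds the two.
        current = [current[rank] + current[rank ^ distance] for rank in range(n)]
        distance *= 2
        rounds += 1
    return current, rounds


if __name__ == "__main__":
    results, rounds = allreduce_sum([1, 2, 3, 4, 5, 6, 7, 8])
    print(results, rounds)
```

Every rank takes part in every round, and the number of rounds grows with the logarithm of the rank count. When the CPUs carry out these rounds themselves, that time is taken away from computation, which is the cost a hardware offload engine is meant to remove.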
Already our more “extreme” customers see this as the way forward, because they’ve profiled their codes and seen the worrying trend that a higher and higher percentage of CPU time is spent dealing with communications. In fact, a number of these customers have already pre-ordered Ultraviolet systems from us. Following in the footsteps of these “extreme” customers will be our mainstream customers as their own application core count increases. I don’t think the majority of the industry has fully realised the implications of this yet, and it is therefore a major opportunity for SGI.
|