CLCAR, Panama R.P. · Tsubame 2.0 4,224 Tesla GPUs + 2,816 x86 CPUs 12,784 x86 CPUs Hopper- NERSC...

CLCAR, Panama R.P. · Michael P. Lasen, NVIDIA Professional Solutions Group, Latin America

NVIDIA Processors

GeForce™ and Quadro™: Visual Computing

Tesla™: Supercomputing

Tegra™: Mobile

Why are we talking about high-performance computing, and how does it, or could it, affect your life?

Scientific research requires 1,000x more computing power.

• Renewable Energy
• Personalized Medicine
• Tools for Scientific Discovery
• Complex Information Management
• Thinking Machines
• Natural Human Interaction with Machines
• Prediction of Environmental Change
• Economic and Financial Analysis

Example: Drug Discovery, Simulating a Single Bacterium

[Figure: simulation size vs. computing power (gigaflops, log scale, 1 to 1,000,000,000), 1982 to 2012. Systems: BPTI (3K atoms), Estrogen Receptor (36K atoms), F1-ATPase (327K atoms), Ribosome (2.7M atoms), Chromatophore (50M atoms), Bacteria (100s of chromatophores). Reference levels: 1 TeraFLOPS, 1 PetaFLOPS, 1 ExaFLOPS. Annotation: "Ran for 8 months to simulate 2 nanoseconds."]
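To give the figure's annotation ("Ran for 8 months to simulate 2 nanoseconds") a sense of scale, the sketch below estimates how that run would shrink on a machine 1,000x faster, the step from teraFLOPS to petaFLOPS in the figure. It assumes 30-day months and perfectly linear scaling, both simplifications (real molecular-dynamics codes do not scale ideally), and the helper name is hypothetical.

```python
# Hypothetical helper: estimate wall-clock time after a pure speedup.
# Assumes ideal (linear) scaling and 30-day months; both are simplifications.
HOURS_PER_MONTH = 30 * 24  # assumption: 30-day month


def scaled_runtime_hours(months: float, speedup: float) -> float:
    """Return the runtime in hours of a job that took `months` at 1x,
    when run on a machine `speedup` times faster with ideal scaling."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return months * HOURS_PER_MONTH / speedup


if __name__ == "__main__":
    # 8 months at teraFLOPS scale, rerun at petaFLOPS scale (1,000x).
    print(f"{scaled_runtime_hours(8, 1_000):.2f} hours")  # 5.76 hours
```

Under these assumptions, eight months of computing collapses to under six hours.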

HPC: why you should care…

• HPC is a national focus around the world. Governments are investing heavily, and with good reason.

• HPC is used today more than ever for scientific discovery, in both the private and public sectors.

• HPC is cheaper than ever: it is available and within reach of almost anyone. With advances in heterogeneous technologies, the density of efficient cores is very high and the cost per FLOP is very low.

• HPC is a competitive advantage in academia. Research funding increasingly depends on this technology.

HPC: are you already using it?

• Folding@home: 6.1 PFLOPS
• MilkyWay@Home: 700 TFLOPS
• SETI@Home: 540 TFLOPS
• Einstein@Home: 260 TFLOPS
• GIMPS: 86 TFLOPS


A Whole Yotta FLOPS

Name          FLOPS
yottaFLOPS    10^24
zettaFLOPS    10^21
exaFLOPS      10^18
petaFLOPS     10^15
teraFLOPS     10^12
gigaFLOPS     10^9
megaFLOPS     10^6
kiloFLOPS     10^3
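The table above amounts to a lookup from powers of ten to names. As a minimal sketch (the function name is illustrative, not from any library), a raw FLOPS figure can be expressed with the largest prefix it reaches:

```python
# Minimal sketch: express a FLOPS figure using the SI prefixes in the table.
PREFIXES = [
    (10**24, "yottaFLOPS"),
    (10**21, "zettaFLOPS"),
    (10**18, "exaFLOPS"),
    (10**15, "petaFLOPS"),
    (10**12, "teraFLOPS"),
    (10**9, "gigaFLOPS"),
    (10**6, "megaFLOPS"),
    (10**3, "kiloFLOPS"),
]


def flops_label(flops: float) -> str:
    """Return `flops` scaled to the largest prefix it reaches, e.g. '6.1 petaFLOPS'."""
    for scale, name in PREFIXES:
        if flops >= scale:
            return f"{flops / scale:g} {name}"
    return f"{flops:g} FLOPS"


if __name__ == "__main__":
    print(flops_label(6.1e15))   # 6.1 petaFLOPS (Folding@home figure above)
    print(flops_label(20e15))    # 20 petaFLOPS
    print(flops_label(10**18))   # 1 exaFLOPS
    print(10**18 // 10**15)      # 1000: exascale = petascale x 1,000
```

Each step in the table is a factor of 1,000, so the list is ordered from largest to smallest and the first match wins.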


Petascale (today): a computer system capable of reaching performance in excess of one petaFLOPS, that is, one quadrillion (10^15) floating-point operations per second.


Exascale = Petascale x 1,000: one exaFLOPS is a thousand petaFLOPS, or 10^18 FLOPS.


GPU + CPU

Heterogeneous Computing. Accelerates Applications.

Two Supercomputers Built at the Same Time

Tsubame 2.0: the world's greenest petaflop supercomputer
4,224 Tesla GPUs + 2,816 x86 CPUs
1.4 megawatts (as much as 2,060 homes in Japan)

Hopper (NERSC)
12,784 x86 CPUs
4.0 megawatts (as much as 5,860 homes in Japan)
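The household comparison can be checked with simple arithmetic. This short sketch (constant and function names are illustrative) recovers the average household draw each comparison implies, about 680 W in both cases, so the two "homes in Japan" figures use the same yardstick, and shows that Hopper draws roughly 2.9x the power of Tsubame 2.0.

```python
# Sketch: compare the two systems' power figures from the slide.
TSUBAME_WATTS = 1.4e6
TSUBAME_HOMES = 2060
HOPPER_WATTS = 4.0e6
HOPPER_HOMES = 5860


def watts_per_home(watts: float, homes: int) -> float:
    """Average household draw implied by an 'as much as N homes' comparison."""
    return watts / homes


if __name__ == "__main__":
    print(f"Tsubame 2.0: {watts_per_home(TSUBAME_WATTS, TSUBAME_HOMES):.0f} W per home")
    print(f"Hopper:      {watts_per_home(HOPPER_WATTS, HOPPER_HOMES):.0f} W per home")
    print(f"Hopper draws {HOPPER_WATTS / TSUBAME_WATTS:.2f}x the power of Tsubame 2.0")
```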

Worldwide GPU Supercomputer Momentum

[Figure: timeline with milestones "Tesla GPUs launched", "First double-precision GPU", and "Tesla 20-series (Fermi) launched".]

Who Uses GPU Supercomputing?

Sectors: Edu/Research, Government, Oil & Gas, Life Sciences, Finance, Manufacturing.

Organizations shown include the Chinese Academy of Sciences, Air Force Research Laboratory, Naval Research Laboratory, Max Planck Institute, and Mass General Hospital.

What Commercial Apps are They Running on GPU?

• Engineering Simulation: Agilent EMPro ● ANSYS Mechanical ● ANSYS Nexxim ● CST Microwave Studio ● Impetus AFEA ● Remcom XFdtd ● SIMULIA Abaqus
• Earth Sciences: ASUCA ● HOMME ● NASA GEOS-5 ● NOAA NIM ● WRF
• Fluid Dynamics: Altair AcuSolve ● Autodesk Moldflow ● OpenFOAM ● Prometech Particleworks ● Turbostream
• Molecular Dynamics: AMBER ● CHARMM ● DL_POLY ● GAMESS-US ● GROMACS ● LAMMPS ● NAMD
• Others: GADGET2 ● MATLAB ● Mathematica ● NBODY ● Paradigm VoxelGeo ● PARATEC ● Schlumberger Petrel

NAMD Is Much Faster: 7x Speedup with GPUs

Benchmarks: ApoA1 (92,224 atoms) and STMV (1,066,628 atoms)

Test platform: 1 node, dual Tesla M2090 GPUs (6 GB), dual Intel 4-core Xeon (2.4 GHz), NAMD 2.8, CUDA 4.0, ECC on. Visit www.nvidia.com/simcluster for more information on speedup results, configuration, and test models.

NAMD 2.8 B1 + unreleased patch, STMV benchmark. A node is dual-socket, quad-core X5650 with 2 Tesla M2070 GPUs. Performance numbers are for 2 M2070 + 8 cores (GPU+CPU) vs. 8 cores (CPU).

What's next in supercomputing?

On October 11, 2011, Oak Ridge National Laboratory announced it was building a 20 petaFLOPS supercomputer named Titan, scheduled to become operational in 2012. The hybrid Titan system will combine Opteron processors with NVIDIA Tesla “Kepler” graphics processing unit (GPU) technology.

Given the current pace of progress, supercomputers are projected to reach 1 exaFLOPS (EFLOPS) in 2019. In December 2009, Cray, Inc. announced a plan to build a 1 EFLOPS supercomputer before 2020.

Erik P. DeBenedictis of Sandia National Laboratories theorizes that a zettaFLOPS (ZFLOPS) computer is required to accomplish full weather modeling over a two-week time span. Such systems might be built around 2030.

YottaFLOPS? Finally, the complete simulation of the human brain.
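The projections above imply a growth rate. Going from Titan's 20 petaFLOPS (2012) to a projected 1 exaFLOPS (2019) is a 50x increase in 7 years; the sketch below converts that into a constant yearly multiplier. It assumes smooth exponential growth between one real system and one projection, which is a simplification, and the function name is illustrative.

```python
# Sketch: implied compound annual growth from the projections above.
import math


def annual_growth_factor(start: float, end: float, years: float) -> float:
    """Constant yearly multiplier that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years)


if __name__ == "__main__":
    titan_2012 = 20e15     # Titan: 20 petaFLOPS, operational 2012
    exa_2019 = 1e18        # projected 1 exaFLOPS in 2019
    g = annual_growth_factor(titan_2012, exa_2019, 2019 - 2012)
    print(f"~{g:.2f}x per year")                               # ~1.75x per year
    print(f"doubling every ~{math.log(2) / math.log(g):.1f} years")  # ~1.2 years
```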

Titan at Oak Ridge National Labs: World's Top Open Science Computing Research Facility

• 2x faster, 3x more efficient per watt
• More efficient than today's #1 supercomputer (K Computer)
• 18,000 Tesla GPUs
• 20+ petaflops
• ~90% of the FLOPS come from the GPUs
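A back-of-the-envelope check of the Titan figures: if about 90% of 20 petaFLOPS comes from 18,000 GPUs, each GPU contributes roughly 1 teraFLOPS. The sketch below treats the slide's approximate values ("20+", "~90%") as exact, which is an assumption, and the function name is illustrative.

```python
# Sketch: rough per-GPU throughput implied by the Titan figures.
def per_gpu_flops(system_flops: float, gpu_share: float, num_gpus: int) -> float:
    """FLOPS contributed by each GPU, given the GPUs' share of the system total."""
    return system_flops * gpu_share / num_gpus


if __name__ == "__main__":
    # 20 petaFLOPS (a lower bound: the slide says "20+"), ~90% from 18,000 GPUs.
    tf = per_gpu_flops(20e15, 0.90, 18_000) / 1e12
    print(f"~{tf:.2f} teraFLOPS per GPU")  # ~1.00 teraFLOPS per GPU
```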

Power Crisis in Supercomputing

[Figure: power required by leading supercomputers, Gigaflop (1982), Teraflop (1996), Petaflop (2008), Exaflop (2020), with household power equivalents: Block, Neighborhood, Town, City. Labeled values: 60,000 Watts; 850,000 Watts; 7,000,000 Watts; 25,000,000 Watts. Annotation: 2 gigawatts (Hoover Dam), exascale with CPUs today. Data: U.S. Dept. of Energy.]
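The power figures translate directly into an efficiency target. In this sketch, the 2 GW value is the figure's "exascale with CPUs today" estimate. Pairing exascale with the 25,000,000 W level is an assumption for illustration only, since this transcript does not state that pairing explicitly. Dividing FLOPS by watts shows the required improvement is roughly 80x.

```python
# Sketch: energy efficiency (GFLOPS per watt) implied by a power budget.
EXAFLOPS = 1e18


def gflops_per_watt(flops: float, watts: float) -> float:
    """Billions of floating-point operations per second delivered per watt."""
    return flops / watts / 1e9


if __name__ == "__main__":
    # Exascale with today's CPUs, per the figure: about 2 gigawatts.
    print(f"{gflops_per_watt(EXAFLOPS, 2e9):.1f} GFLOPS/W")    # 0.5 GFLOPS/W
    # Hypothetical budget: exascale within 25,000,000 W (assumption, see text).
    print(f"{gflops_per_watt(EXAFLOPS, 25e6):.1f} GFLOPS/W")   # 40.0 GFLOPS/W
```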

ARM Enables Energy Efficient Computing

Personal Computing, ARM Servers

ARM is Pervasive and Open

[Figure: annual shipments, ARM vs. x86 (units in billions). Source: ARM, Mercury Research, NVIDIA.]

Project Denver: NVIDIA-Designed High Performance ARM CPU

[Figure: Tegra roadmap, relative performance (log scale: 1, 10, 100) from 2010 to 2014: Tegra 2, Tegra 3, Wayne, Logan, Stark, with Core 2 Duo and Core i5 shown for comparison.]

CARMA DevKit: CUDA for ARM Development Kit

• Tegra ARM CPU + CUDA GPU
• Tegra 3 quad-core ARM A9
• Quadro 1000M (96 CUDA cores)
• Ubuntu
• Gigabit Ethernet, SATA connector
• HDMI, DisplayPort, USB

Pre-register on www.nvidia.com/CARMADevKit. Launch: Q2 2012.

World's First ARM CPU / CUDA GPU Supercomputer

Mont Blanc Research Project: exploring energy-efficient supercomputer architectures for exascale

ARM CPU + GPU prototype: 256 ARM CPUs + GPUs

http://www.montblanc-project.eu

http://www.eesi-project.eu/media/BarcelonaConference/Day2/13-Mont-Blanc_Overview.pdf

HPC: how much does it cost?

2000: Sandia National Lab, ASCI Red, 2 TFlops (DP)
2011: NVIDIA Personal SuperComputer, 2 TFlops (DP)

                      ASCI Red                   Personal SuperComputer
Performance           2 TFlops                   2 TFlops
Compute nodes         4,736                      1
Processor type        Pentium II                 Tesla C2075
Processor count       9,472                      4
Cabinet               104 racks                  1 workstation
Floor space           230 m²                     0.12 m²
Power consumption     0.85 MW                    1,400 W
Cost                  US$100 to 200 million      US$30K
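Because both systems deliver the same 2 TFlops (DP), the table's other rows reduce to simple ratios. The sketch below computes energy efficiency and the power, floor-space, and cost ratios; the `System` dataclass is illustrative, and using the low end of the US$100 to 200 million range is an assumption.

```python
# Sketch: ratios between the two 2 TFLOPS (DP) systems in the table.
from dataclasses import dataclass


@dataclass
class System:
    name: str
    flops: float       # double-precision FLOPS
    watts: float
    floor_m2: float
    cost_usd: float


ASCI_RED = System("ASCI Red (2000)", 2e12, 0.85e6, 230, 100e6)  # low end of US$100-200M
PSC = System("Personal SuperComputer (2011)", 2e12, 1400, 0.12, 30e3)


def mflops_per_watt(s: System) -> float:
    """Millions of floating-point operations per second per watt."""
    return s.flops / s.watts / 1e6


if __name__ == "__main__":
    for s in (ASCI_RED, PSC):
        print(f"{s.name}: {mflops_per_watt(s):,.1f} MFLOPS/W")
    print(f"Power ratio: {ASCI_RED.watts / PSC.watts:,.0f}x")
    print(f"Floor-space ratio: {ASCI_RED.floor_m2 / PSC.floor_m2:,.0f}x")
    print(f"Cost ratio (low end): {ASCI_RED.cost_usd / PSC.cost_usd:,.0f}x")
```

Run as a script, it reports about 2.4 MFLOPS/W for ASCI Red versus about 1,428.6 MFLOPS/W for the Personal SuperComputer, roughly a 607x gain in energy efficiency at equal performance.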

Homemade Desktop Supercomputer with Tesla

Univ Industrial Santander, Bucaramanga, Colombia

• 8 nodes: 2 Xeon + 8 Tesla C2050
• 24 CPU + 64 GPU
• 52 TeraFLOPS

A Professional GPU Cluster

Getting Started with GPUs and CUDA

• Try: try CUDA on a laptop or desktop with a GPU.
• Develop: optimize applications on a workstation with Tesla GPUs.
• Scale: run applications on a GPU cluster for massively parallel computing.

Performance · Efficiency · Accessibility

Kepler

Tesla CUDA Architecture Roadmap

[Figure: double-precision GFLOPS per watt (0 to 16) by year, 2008 to 2014: T10, Fermi, Kepler, Maxwell. Annotation: 3x performance/watt.]

Kepler: Fast and Efficient

• Fermi SM: control logic + 32 cores; max 512 cores per GPU
• Kepler SMX: control logic + 192 cores; max 1,536 cores per GPU
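Dividing the maximum cores per GPU by the cores per multiprocessor gives the multiprocessor count each figure implies: 16 SMs for Fermi and 8 SMXs for Kepler. In other words, Kepler reaches 3x the cores with half as many, much wider, multiprocessors. The function name is illustrative.

```python
# Sketch: number of streaming multiprocessors implied by the core counts above.
def multiprocessor_count(max_cores_per_gpu: int, cores_per_mp: int) -> int:
    """Multiprocessors per GPU, given total cores and cores per multiprocessor."""
    if max_cores_per_gpu % cores_per_mp:
        raise ValueError("core counts do not divide evenly")
    return max_cores_per_gpu // cores_per_mp


if __name__ == "__main__":
    print("Fermi SMs:  ", multiprocessor_count(512, 32))    # 16
    print("Kepler SMXs:", multiprocessor_count(1536, 192))  # 8
```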

Tesla K10 vs. M2090: 2x Performance/Watt

• Tesla K10 (available now): 3x single precision; 1.8x memory bandwidth; imaging, signal, seismic
• Tesla K20 (available Q4 2012): 3x double precision; Hyper-Q, Dynamic Parallelism; CFD, FEA, finance, physics

Tesla K10: Same Power, 2x the Performance of Fermi

Product name                 M2090           K10 (board)     K10 (per GPU)
GPU architecture             Fermi           Kepler GK104
Number of GPUs               1               2
Single-precision FLOPS       1.3 TF          4.58 TF         2.29 TF
Double-precision FLOPS       0.66 TF         0.190 TF        0.095 TF
CUDA cores                   512             3072            1536
Memory size                  6 GB            8 GB            4 GB
Memory bandwidth (ECC off)   177.6 GB/s      320 GB/s        160 GB/s
PCI-Express                  Gen 2: 8 GB/s   Gen 3: 16 GB/s
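Recomputing the headline ratios from the table: the K10 board's single-precision figure is about 3.5x the M2090's (the earlier slide quotes 3x), memory bandwidth is 1.8x, and double precision drops to about 0.29x, consistent with the earlier slide positioning K10 for single-precision workloads and K20 for double precision. The dictionary keys below are illustrative.

```python
# Sketch: recompute the headline ratios from the K10 / M2090 table.
M2090 = {"sp_tflops": 1.3, "dp_tflops": 0.66, "mem_bw_gbs": 177.6, "gpus": 1}
K10 = {"sp_tflops": 4.58, "dp_tflops": 0.190, "mem_bw_gbs": 320.0, "gpus": 2}


def ratio(new: dict, old: dict, key: str) -> float:
    """How many times larger `new[key]` is than `old[key]`."""
    return new[key] / old[key]


if __name__ == "__main__":
    for key in ("sp_tflops", "mem_bw_gbs", "dp_tflops"):
        print(f"{key}: {ratio(K10, M2090, key):.2f}x")
    # Per-GPU figures are the board totals split across the K10's two GPUs.
    print("K10 per-GPU SP TFLOPS:", K10["sp_tflops"] / K10["gpus"])  # 2.29
```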

