LU out-of-core

F. J. Martínez Zaldívar

Departamento de Comunicaciones / iTEAM, Universitat Politècnica de València

Parallel Programming and High-Performance Computing
Master's Degree in New Technologies in Computer Science

Universidad de Murcia

22 November 2013

Contents

1. Introduction
   1.1. Out-of-core
   1.2. Input/output
2. LU factorization
   2.1. Solving systems of linear equations
   2.2. Gauss transformations
   2.3. LU factorization
   2.4. LU with partial pivoting
   2.5. Block LU
3. Out-of-core solutions for LU
   3.1. Initiatives and solutions
4. Bibliography

What is out-of-core?

• Problems whose data size far exceeds the available main memory (RAM)
• Typical cases: dense matrix problems
• Secondary memory (disks) is cheaper than main memory (RAM)
• Virtual memory: managed by the OS (paging), which moves pages between main memory and secondary memory
• Out-of-core computing:
  • Do not delegate memory management to the OS: efficiency
  • Sometimes the algorithm has to be rethought
  • A return to the computing of the 1950s?
  • The hard part:
    • Managing I/O
    • Overlapping I/O with computation: trade-off between buffers and sizes

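Example

Solving a double-precision complex system of 1 000 000 × 1 000 000 equations:

$$10^6 \times 10^6 = 10^{12} \text{ complex numbers}$$
$$= 2 \cdot 10^{12} \text{ double-precision numbers}$$
$$= 16 \cdot 10^{12} \text{ bytes}$$
$$= 15\,258\,789.0625 \text{ MB}$$
$$= 14\,901.161\,193\,848 \text{ GB}$$
$$= 14.551\,915\,228 \text{ TB}$$

(binary units: 1 MB = 2^20 bytes, 1 GB = 2^30 bytes, 1 TB = 2^40 bytes)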
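
The arithmetic of this example can be reproduced with a short sketch. The function name `dense_matrix_bytes` and its parameters are illustrative assumptions, not part of the slides; binary units (powers of 1024) are used, as in the figures above.

```python
def dense_matrix_bytes(n, bytes_per_real=8, complex_valued=True):
    """Bytes needed to store a dense n-by-n matrix."""
    reals_per_entry = 2 if complex_valued else 1  # real and imaginary parts
    return n * n * reals_per_entry * bytes_per_real

total = dense_matrix_bytes(10**6)
for unit, power in (("MB", 2), ("GB", 3), ("TB", 4)):
    # Binary units: 1 MB = 1024**2 bytes, and so on.
    print(f"{total / 1024**power:,.3f} {unit}")
```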

I/O: disk access

Options depending on the parallel architecture:

• Shared memory
• Distributed memory

Parallel disk access:

• A single file pointer to disk
• Individual file pointers to disk

I/O: single file pointer (I)

A single process: scatter/gather

I/O: single file pointer (II)

A single effective file pointer: synchronized access

I/O: single file pointer (III)

Collective communication

I/O: individual file pointers (I)

Shared file:

• More generic: independent of the process grid
• Writing contiguous elements (Fortran columns): difficult because of overlapping writes / coherence
• Performance problems with small sizes

I/O: individual file pointers (II)

Distributed file:

• Rigid: tied to the process grid, block sizes, ...
• Better performance in general
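
To make the "contiguous elements (Fortran columns)" remark concrete, here is a minimal single-process sketch of how a submatrix could be fetched from a file stored column by column. The file layout (raw little-endian doubles, no header) and the helper names `write_colmajor` and `read_block` are assumptions made for illustration; a parallel code would more likely go through MPI-IO or a parallel file system.

```python
import os
import struct
import tempfile

DOUBLE = 8  # bytes per double-precision value

def write_colmajor(path, rows):
    """Store a dense matrix (list of rows) column by column, as Fortran would."""
    m, n = len(rows), len(rows[0])
    with open(path, "wb") as f:
        for j in range(n):
            f.write(struct.pack(f"<{m}d", *(rows[i][j] for i in range(m))))

def read_block(path, m, i0, j0, mb, nb):
    """Read the mb-by-nb block whose top-left entry is (i0, j0), 0-based.

    m is the row count of the full matrix. Each block column is one
    contiguous run of mb doubles, so the block costs nb seeks.
    """
    block = [[0.0] * nb for _ in range(mb)]
    with open(path, "rb") as f:
        for j in range(nb):
            f.seek(((j0 + j) * m + i0) * DOUBLE)
            column = struct.unpack(f"<{mb}d", f.read(mb * DOUBLE))
            for i in range(mb):
                block[i][j] = column[i]
    return block

# Usage: a 4x4 matrix with A[i][j] = 10*i + j; read its lower-right 2x2 block.
A = [[float(10 * i + j) for j in range(4)] for i in range(4)]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.bin")
    write_colmajor(path, A)
    lower_right = read_block(path, m=4, i0=2, j0=2, mb=2, nb=2)
```

Short block columns mean many small, scattered reads, which is one way to see why small sizes perform poorly.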

Parallel I/O

• Parallelism in I/O to increase bandwidth
• Use of multiple sources/devices/paths
• Parallel file systems:
  • Organize the I/O devices into a single logical space
  • Generic API
• Additional software for:
  • Coordinated accesses
  • Mapping the application onto the I/O
• Levels:
  • Application
  • High-level I/O library (HDF5, netCDF, ...)
  • Middleware layer (MPI-IO, UPC-IO, ...)
  • Parallel file system (PVFS, GPFS, Lustre, ...)
  • Storage hardware

MPI-IO

• Partitioning of a shared file among processes
• Interface for collective transfers between memory and files
• Asynchronous I/O: overlapping computation and communication is very important
• Control over the layout, datatypes, ...
• MPI-IO implementations:
  • ROMIO: Argonne National Laboratory
  • MPI-IO/GPFS: IBM
  • MPI/SX and MPI/PC-32: NEC

Solving Ax = b

Solution of $Ax = b$. If

$$A = LU, \qquad \begin{cases} L \text{ lower triangular} \\ U \text{ upper triangular} \end{cases}$$

then:

$$Ax = b$$
$$LUx = b$$
$$Ux = y \;\Rightarrow\; Ly = b$$
$$y = L^{-1}b \;\Rightarrow\; x = U^{-1}y$$

The solution is obtained by solving two triangular systems.
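
The two triangular solves can be sketched as follows. This is an illustrative plain-Python version, with function names chosen here rather than taken from any library; the factors and right-hand side are example values.

```python
def forward_substitution(L, b):
    """Solve L y = b for a lower triangular L."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - s) / L[i][i]
    return y

def back_substitution(U, y):
    """Solve U x = y for an upper triangular U."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

def solve_lu(L, U, b):
    """Solve (L U) x = b: first L y = b, then U x = y."""
    return back_substitution(U, forward_substitution(L, b))

# Example factors of A = [[2, 3, 4], [6, 15, 14], [8, 30, 24]].
L = [[1.0, 0.0, 0.0], [3.0, 1.0, 0.0], [4.0, 3.0, 1.0]]
U = [[2.0, 3.0, 4.0], [0.0, 6.0, 2.0], [0.0, 0.0, 2.0]]
x = solve_lu(L, U, [9.0, 35.0, 62.0])
```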

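Gauss transformations

Zeroing components:

$$M_k x = \begin{pmatrix}
1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & \cdots & 1 & 0 & \cdots & 0 \\
0 & \cdots & -\tau_{k+1} & 1 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & -\tau_n & 0 & \cdots & 1
\end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_k \\ x_{k+1} \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} x_1 \\ \vdots \\ x_k \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$

where the multipliers occupy column $k$, rows $k+1$ to $n$, with

$$\tau_i = \frac{x_i}{x_k}, \quad i = k+1, \dots, n \quad \text{(Gauss multipliers)}$$

$x_k$: pivot, and

$$\tau^{(k)} = (0, \dots, 0, \tau_{k+1}, \dots, \tau_n)^T \quad \text{(Gauss vector)}$$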

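Gauss transformations: example

$$M_3 x = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & -4/3 & 1 & 0 \\
0 & 0 & -5/3 & 0 & 1
\end{pmatrix}
\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{pmatrix}
=
\begin{pmatrix} 1 \\ 2 \\ 3 \\ 0 \\ 0 \end{pmatrix}$$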
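
A minimal sketch of this example in exact rational arithmetic. Because $M_k$ differs from the identity only in column $k$, the product $M_k x$ equals $x - \tau^{(k)} x_k$ and the matrix never has to be formed. The function names are illustrative assumptions.

```python
from fractions import Fraction

def gauss_vector(x, k):
    """Gauss vector tau^(k) of x for the 1-based pivot index k."""
    pivot = Fraction(x[k - 1])
    if pivot == 0:
        raise ZeroDivisionError("zero pivot")
    # k leading zeros, then the multipliers x_i / x_k for i = k+1..n
    return [Fraction(0)] * k + [Fraction(xi) / pivot for xi in x[k:]]

def apply_gauss(tau, k, x):
    """Compute M_k x = x - tau * x_k without forming M_k."""
    xk = x[k - 1]
    return [xi - t * xk for xi, t in zip(x, tau)]

x = [1, 2, 3, 4, 5]
tau = gauss_vector(x, 3)
y = apply_gauss(tau, 3, x)
```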

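Gauss transformations: inverse

It can be checked that

$$\tau^{(k)} e_k^T =
\begin{pmatrix} 0 \\ \vdots \\ 0 \\ \tau_{k+1} \\ \vdots \\ \tau_n \end{pmatrix}
\begin{pmatrix} 0 & \cdots & 1 & 0 & \cdots & 0 \end{pmatrix}
=
\begin{pmatrix}
0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \cdots & \tau_{k+1} & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & \tau_n & 0 & \cdots & 0
\end{pmatrix}$$

where the 1 of $e_k^T$ and the nonzero column of the product are in position $k$, and the first nonzero row is row $k+1$. Therefore

$$M_k = I - \tau^{(k)} e_k^T$$
$$M_k^{-1} = I + \tau^{(k)} e_k^T$$
$$M_k M_k^{-1} = (I - \tau^{(k)} e_k^T)(I + \tau^{(k)} e_k^T) = I - \tau^{(k)} e_k^T \tau^{(k)} e_k^T = I$$

The last step holds because $e_k^T \tau^{(k)} = 0$: the $k$-th component of the Gauss vector is zero.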

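Gauss transformations: inverse (example)

$$\tau^{(3)} e_3^T =
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 4/3 \\ 5/3 \end{pmatrix}
\begin{pmatrix} 0 & 0 & 1 & 0 & 0 \end{pmatrix}
=
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 4/3 & 0 & 0 \\
0 & 0 & 5/3 & 0 & 0
\end{pmatrix}$$

$$M_3 = I - \tau^{(3)} e_3^T =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & -4/3 & 1 & 0 \\
0 & 0 & -5/3 & 0 & 1
\end{pmatrix}
\qquad
M_3^{-1} = I + \tau^{(3)} e_3^T =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 4/3 & 1 & 0 \\
0 & 0 & 5/3 & 0 & 1
\end{pmatrix}$$

$$M_3 M_3^{-1} =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & -4/3 & 1 & 0 \\
0 & 0 & -5/3 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 4/3 & 1 & 0 \\
0 & 0 & 5/3 & 0 & 1
\end{pmatrix}
= I$$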

Application to a matrix

$$M_k A = (I - \tau^{(k)} e_k^T) A$$
$$= A - \tau^{(k)} e_k^T A$$
$$= A - \tau^{(k)} A(k, :)$$
$$= A - \begin{pmatrix} 0 \\ \vdots \\ 0 \\ \tau_{k+1} \\ \vdots \\ \tau_n \end{pmatrix} A(k, :)$$

Only $A(k+1 : n, :)$ is updated.

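Application to a matrix: example

$$M_3 A =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & -4/3 & 1 & 0 \\
0 & 0 & -5/3 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 2 & 3 & 4 & 5 \\
2 & 7 & 8 & 9 & 1 \\
3 & 3 & 4 & 5 & 6 \\
4 & 8 & 9 & 1 & 2 \\
5 & 4 & 5 & 6 & 7
\end{pmatrix}
=
\begin{pmatrix}
1 & 2 & 3 & 4 & 5 \\
2 & 7 & 8 & 9 & 1 \\
3 & 3 & 4 & 5 & 6 \\
0 & * & * & * & * \\
0 & * & * & * & *
\end{pmatrix}$$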
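
The starred entries can be computed with the rank-one form $M_k A = A - \tau^{(k)} A(k, :)$. The sketch below uses exact fractions and touches only rows $k+1$ to $n$; the function name is an illustrative assumption.

```python
from fractions import Fraction

def apply_gauss_to_matrix(A, tau, k):
    """Return M_k A = A - tau * A(k, :), updating only rows k+1..n (k is 1-based)."""
    pivot_row = A[k - 1]
    result = [row[:] for row in A]
    for i in range(k, len(A)):  # 0-based indices k..n-1 are rows k+1..n
        result[i] = [a - tau[i] * p for a, p in zip(A[i], pivot_row)]
    return result

A = [[1, 2, 3, 4, 5],
     [2, 7, 8, 9, 1],
     [3, 3, 4, 5, 6],
     [4, 8, 9, 1, 2],
     [5, 4, 5, 6, 7]]
tau = [0, 0, 0, Fraction(4, 3), Fraction(5, 3)]
M3A = apply_gauss_to_matrix(A, tau, 3)
```

With these values the last two rows come out as (0, 4, 11/3, -17/3, -6) and (0, -1, -5/3, -7/3, -3).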

Triangularization and LU factorization

$$M_{n-1} \cdots M_2 M_1 A = U \quad \text{(upper triangular)}$$

Then:

$$A = M_1^{-1} M_2^{-1} \cdots M_{n-1}^{-1} U$$

Since $M_i^{-1} = I + \tau^{(i)} e_i^T$ is unit lower triangular,

$$M_1^{-1} M_2^{-1} \cdots M_{n-1}^{-1} = L$$

is unit lower triangular. Therefore:

$$A = LU$$

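Forming L

$$L = M_1^{-1} M_2^{-1} \cdots =
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
\tau_2^{(1)} & 1 & 0 & \cdots & 0 \\
\tau_3^{(1)} & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\tau_n^{(1)} & 0 & 0 & \cdots & 1
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & \tau_3^{(2)} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \tau_n^{(2)} & 0 & \cdots & 1
\end{pmatrix}
\cdots$$

$$=
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
\tau_2^{(1)} & 1 & 0 & \cdots & 0 \\
\tau_3^{(1)} & \tau_3^{(2)} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\tau_n^{(1)} & \tau_n^{(2)} & \tau_n^{(3)} & \cdots & 1
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
l_{2,1} & 1 & 0 & \cdots & 0 \\
l_{3,1} & l_{3,2} & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
l_{n,1} & l_{n,2} & l_{n,3} & \cdots & 1
\end{pmatrix}$$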

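L and U: compact storage

$$A = LU$$

$$L = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
l_{2,1} & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
l_{n,1} & l_{n,2} & \cdots & 1
\end{pmatrix}
\qquad
U = \begin{pmatrix}
u_{1,1} & u_{1,2} & \cdots & u_{1,n} \\
0 & u_{2,2} & \cdots & u_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & u_{n,n}
\end{pmatrix}$$

$$A \Leftarrow L \backslash U$$

$$A \Leftarrow \begin{pmatrix}
u_{1,1} & u_{1,2} & \cdots & u_{1,n} \\
l_{2,1} & u_{2,2} & \cdots & u_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
l_{n,1} & l_{n,2} & \cdots & u_{n,n}
\end{pmatrix}$$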

Existence and uniqueness

Let $A \in \mathbb{R}^{n \times n}$. The factorization

$$A = LU \text{ exists if } |A(1:k, 1:k)| \neq 0, \quad k = 1 : n-1$$

and if, in addition, $A$ is nonsingular, the factorization is unique and $|A| = u_{1,1} \cdots u_{n,n}$.

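Example

First step:

$$\begin{pmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ -4 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 3 & 4 \\ 6 & 15 & 14 \\ 8 & 30 & 24 \end{pmatrix}
=
\begin{pmatrix} 2 & 3 & 4 \\ 0 & 6 & 2 \\ 0 & 18 & 8 \end{pmatrix}$$

Second step, with multiplier $18/6 = 3$:

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -18/6 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ -4 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 3 & 4 \\ 6 & 15 & 14 \\ 8 & 30 & 24 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 3 & 4 \\ 0 & 6 & 2 \\ 0 & 18 & 8 \end{pmatrix}
=
\begin{pmatrix} 2 & 3 & 4 \\ 0 & 6 & 2 \\ 0 & 0 & 2 \end{pmatrix}$$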

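LU factorization: example

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ -4 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 3 & 4 \\ 6 & 15 & 14 \\ 8 & 30 & 24 \end{pmatrix}
=
\begin{pmatrix} 2 & 3 & 4 \\ 0 & 6 & 2 \\ 0 & 0 & 2 \end{pmatrix}$$

$$\begin{pmatrix} 2 & 3 & 4 \\ 6 & 15 & 14 \\ 8 & 30 & 24 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ -4 & 0 & 1 \end{pmatrix}^{-1}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{pmatrix}^{-1}
\begin{pmatrix} 2 & 3 & 4 \\ 0 & 6 & 2 \\ 0 & 0 & 2 \end{pmatrix}$$

$$=
\begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 4 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 3 & 4 \\ 0 & 6 & 2 \\ 0 & 0 & 2 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 4 & 3 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 3 & 4 \\ 0 & 6 & 2 \\ 0 & 0 & 2 \end{pmatrix}$$

$$A = LU, \qquad L \backslash U = \begin{pmatrix} 2 & 3 & 4 \\ 3 & 6 & 2 \\ 4 & 3 & 2 \end{pmatrix}$$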
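
As a quick check of the compact form, the following sketch unpacks $L \backslash U$ into its two factors and multiplies them back. The helper names `unpack_lu` and `matmul` are illustrative assumptions.

```python
def unpack_lu(compact):
    """Split the compact L\\U storage into unit lower triangular L and upper triangular U."""
    n = len(compact)
    L = [[compact[i][j] if j < i else (1 if i == j else 0) for j in range(n)]
         for i in range(n)]
    U = [[compact[i][j] if j >= i else 0 for j in range(n)] for i in range(n)]
    return L, U

def matmul(X, Y):
    """Plain triple-loop matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

compact = [[2, 3, 4], [3, 6, 2], [4, 3, 2]]
L, U = unpack_lu(compact)
product = matmul(L, U)  # should reproduce A
```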

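Algorithm

    function [L, U] = mi_LU(A)
      % LU without pivoting; the trailing submatrix is updated column by column.
      M = size(A, 1);
      L = eye(M);
      U = zeros(M);
      for i = 1 : M
        U(i, i) = A(i, i);                      % pivot
        for j = i+1 : M
          L(j, i) = A(j, i) / U(i, i);          % Gauss multipliers (column i of L)
        end
        for j = i+1 : M
          U(i, j) = A(i, j);                    % row i of U
          for k = i+1 : M
            A(k, j) = A(k, j) - L(k, i) * U(i, j);
          end
        end
      end

    function [L, U] = mi_LU2(A)
      % Same factorization; the trailing submatrix is updated row by row.
      M = size(A, 1);
      L = eye(M);
      U = zeros(M);
      for i = 1 : M
        for j = i : M
          U(i, j) = A(i, j);                    % row i of U, pivot included
        end
        for j = i+1 : M
          L(j, i) = A(j, i) / U(i, i);          % Gauss multiplier for row j
          for k = i+1 : M
            A(j, k) = A(j, k) - L(j, i) * U(i, k);
          end
        end
      end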

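Cost:

$$\sum_{i=1}^{M} \sum_{j=i+1}^{M} \left( 1 + \sum_{k=i+1}^{M} 2 \right)
= \sum_{i=1}^{M} \sum_{j=i+1}^{M} \bigl( 1 + 2(M - i) \bigr)
= \sum_{i=1}^{M} \bigl( 1 + 2(M - i) \bigr)(M - i)
\approx \frac{2}{3} M^3 \text{ flops}$$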
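
The estimate can be checked by counting operations along the same loop nest: one division per multiplier plus a multiply and a subtract per updated entry. This is an illustrative counter, not a timing model.

```python
def lu_flops(M):
    """Flops of the unblocked LU loop nest for an M-by-M matrix."""
    flops = 0
    for i in range(1, M + 1):
        for j in range(i + 1, M + 1):
            flops += 1            # division that produces the multiplier
            flops += 2 * (M - i)  # multiply and subtract for k = i+1..M
    return flops

ratio = lu_flops(300) / (2 / 3 * 300**3)  # approaches 1 as M grows
```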

LU of a rectangular matrix

If $|A(1:k, 1:k)| \neq 0, \; \forall\, k = 1 : \min(m, n)$:

$$A = LU$$

$$A \in \mathbb{R}^{m \times n}, \quad L \in \mathbb{R}^{m \times \min(m,n)}, \quad U \in \mathbb{R}^{\min(m,n) \times n}$$

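Pivoting (I)

Example:

$$A = \begin{pmatrix} 0.0001 & 1 \\ 1 & 1 \end{pmatrix} = LU
= \begin{pmatrix} 1 & 0 \\ 10\,000 & 1 \end{pmatrix}
\begin{pmatrix} 0.0001 & 1 \\ 0 & -9999 \end{pmatrix}$$

• $\kappa_2(A) = 2.6184$, $\kappa_2(L) = 10^8$ and $\kappa_2(U) = 9.9 \cdot 10^7$
• Cause: small pivot → large multipliers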

Pivoting (II)

Solution:

• Permute rows to obtain the largest possible pivot (smallest possible multiplier): partial pivoting
• Permute rows and columns to obtain the largest possible pivot (smallest possible multiplier): complete pivoting
• Example:

$$PA = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} 0.0001 & 1 \\ 1 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 1 \\ 0.0001 & 1 \end{pmatrix}$$

$$PA = LU = \begin{pmatrix} 1 & 0 \\ 0.0001 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ 0 & 0.9999 \end{pmatrix}$$

Now: $\kappa_2(L) = 1.0001$ and $\kappa_2(U) = 2.6182$
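
The effect of a small pivot can be seen numerically with a hedged sketch. It assumes a much smaller pivot than the slide's 0.0001, namely 1e-20, so that the loss of accuracy shows up in IEEE double precision; the right-hand side (1, 2) is also an assumption. The exact solution is very close to (1, 1).

```python
def solve_2x2(A, b, pivot):
    """Solve a 2x2 system by Gaussian elimination, optionally with partial pivoting."""
    (a11, a12), (a21, a22) = A
    b1, b2 = b
    if pivot and abs(a21) > abs(a11):
        # Row interchange: bring the larger entry to the pivot position.
        a11, a12, a21, a22 = a21, a22, a11, a12
        b1, b2 = b2, b1
    m = a21 / a11          # Gauss multiplier
    u22 = a22 - m * a12
    y2 = b2 - m * b1
    x2 = y2 / u22
    x1 = (b1 - a12 * x2) / a11
    return x1, x2

A = [[1e-20, 1.0], [1.0, 1.0]]
b = [1.0, 2.0]
without_pivoting = solve_2x2(A, b, pivot=False)  # multiplier 1e20 swamps the data
with_pivoting = solve_2x2(A, b, pivot=True)      # multiplier 1e-20
```

Without pivoting the first component comes out as 0 instead of roughly 1; with the row interchange both components are recovered.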

Permutation matrices

Properties:

• They result from permuting the rows (columns) of an identity matrix
• They are usually neither stored nor applied explicitly
• They are orthogonal: $P^{-1} = P^T$
• $P_1 P_2 = P_3$: the product of two permutation matrices is a permutation matrix
• Interchange permutations $E_{ij}$: the identity with rows (columns) $i$ and $j$ exchanged. Consequences:
  • $E_{ij} = E_{ij}^T = E_{ij}^{-1}$
  • $E_{ij} A$: exchanges rows $i$ and $j$
  • $A E_{ij}$: exchanges columns $i$ and $j$
• Special case and notation: $E_k$ exchanges row $k$ with some row $j \geq k$

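Interchange permutations: example

$$E_{2,5} A =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0
\end{pmatrix}
\begin{pmatrix}
3 & 5 & 7 & 9 & 0 \\
1 & 2 & 8 & 3 & 4 \\
5 & 3 & 1 & 3 & 8 \\
9 & 6 & 7 & 8 & 4 \\
0 & 1 & 3 & 5 & 7
\end{pmatrix}
=
\begin{pmatrix}
3 & 5 & 7 & 9 & 0 \\
0 & 1 & 3 & 5 & 7 \\
5 & 3 & 1 & 3 & 8 \\
9 & 6 & 7 & 8 & 4 \\
1 & 2 & 8 & 3 & 4
\end{pmatrix}$$

$$A E_{2,5} =
\begin{pmatrix}
3 & 5 & 7 & 9 & 0 \\
1 & 2 & 8 & 3 & 4 \\
5 & 3 & 1 & 3 & 8 \\
9 & 6 & 7 & 8 & 4 \\
0 & 1 & 3 & 5 & 7
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0
\end{pmatrix}
=
\begin{pmatrix}
3 & 0 & 7 & 9 & 5 \\
1 & 4 & 8 & 3 & 2 \\
5 & 8 & 1 & 3 & 3 \\
9 & 4 & 7 & 8 & 6 \\
0 & 7 & 3 & 5 & 1
\end{pmatrix}$$

$$E_{2,5}^T =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0
\end{pmatrix}
= E_{2,5}, \qquad E_{2,5}^{-1} = E_{2,5}$$

$$E_{2,5} E_{2,5}^{-1} = E_{2,5} E_{2,5} = I$$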
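
Since permutation matrices are normally not stored, an interchange is applied by swapping rows or columns directly. A minimal sketch, with 1-based indices as in the slides and illustrative function names:

```python
def swap_rows(A, i, j):
    """Return E_ij A: a copy of A with rows i and j exchanged (1-based)."""
    B = [row[:] for row in A]
    B[i - 1], B[j - 1] = B[j - 1], B[i - 1]
    return B

def swap_columns(A, i, j):
    """Return A E_ij: a copy of A with columns i and j exchanged (1-based)."""
    B = [row[:] for row in A]
    for row in B:
        row[i - 1], row[j - 1] = row[j - 1], row[i - 1]
    return B

A = [[3, 5, 7, 9, 0],
     [1, 2, 8, 3, 4],
     [5, 3, 1, 3, 8],
     [9, 6, 7, 8, 4],
     [0, 1, 3, 5, 7]]
E25_A = swap_rows(A, 2, 5)
A_E25 = swap_columns(A, 2, 5)
```

Applying the same interchange twice returns the original matrix, which is the statement that $E_{ij}$ is its own inverse.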

LU with partial pivoting (I)

• Computing the Gauss vector:
  • Prior reordering: exchange the pivot with the largest entry (in absolute value) of the subcolumn below it
• LU:
  • Alternating sequence of interchange permutations $E_k$ and Gauss matrices $M_k$

$$\underbrace{M_{n-1} E_{n-1} \cdots M_1 E_1}_{\neq \text{ lower triangular}} A = U$$

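LU with partial pivoting (II)

Define

$$\tilde{M}_k = \begin{cases}
E_{n-1} \cdots E_{k+1} M_k E_{k+1} \cdots E_{n-1} & \text{if } k < n-1 \\
M_k & \text{if } k = n-1
\end{cases}
\qquad \text{($\tilde{M}_k$ is lower triangular!)}$$

$$P = E_{n-1} \cdots E_1 \quad \text{(permutation matrix)}$$

then

$$\tilde{M}_{n-1} \cdots \tilde{M}_2 \tilde{M}_1 P = M_{n-1} E_{n-1} \cdots M_1 E_1 \quad \text{(equivalent reordering!)}$$

and therefore

$$\underbrace{\tilde{M}_{n-1} \cdots \tilde{M}_2 \tilde{M}_1}_{\text{lower triangular} \,\equiv\, L^{-1}} PA = U$$

$$PA = LU$$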

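$\tilde{M}_k$ is lower triangular

$E_{k+1}$ exchanges row $k+1$ with some row $j \geq k+1$. Applied on the right of $M_k$ it exchanges columns $k+1$ and $j$:

$$E_{k+1} M_k E_{k+1} = E_{k+1} \cdot
\begin{pmatrix}
1 & \cdots & 0 & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & & \vdots & & \vdots \\
0 & \cdots & 1 & 0 & \cdots & 0 & \cdots & 0 \\
0 & \cdots & -\tau_{k+1} & 0 & \cdots & 1 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & -\tau_j & 1 & \cdots & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & -\tau_n & 0 & \cdots & 0 & \cdots & 1
\end{pmatrix}$$

Here the third, fourth and sixth displayed columns are columns $k$, $k+1$ and $j$, and the third to sixth displayed rows are rows $k$, $k+1$, the intermediate rows and row $j$. Applying $E_{k+1}$ on the left then exchanges rows $k+1$ and $j$. This restores the unit diagonal and only exchanges the multipliers $\tau_{k+1}$ and $\tau_j$ in column $k$, so the result is lower triangular!

Inductively:

$$\tilde{M}_k = \begin{cases}
E_{n-1} \cdots E_{k+1} M_k E_{k+1} \cdots E_{n-1} & \text{if } k < n-1 \\
M_k & \text{if } k = n-1
\end{cases}
\qquad \text{is lower triangular!}$$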

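Equivalent reordering

$$\tilde{M}_1 = E_{n-1} \cdots E_2 M_1 E_2 \cdots E_{n-1}$$
$$\tilde{M}_2 = E_{n-1} \cdots E_3 M_2 E_3 \cdots E_{n-1}$$

$$\tilde{M}_2 \tilde{M}_1 = E_{n-1} \cdots E_3 M_2 \underbrace{E_3 \cdots \overbrace{E_{n-1} E_{n-1}}^{I} \cdots E_3}_{I} E_2 M_1 E_2 \cdots E_{n-1}
= E_{n-1} \cdots E_3 M_2 E_2 M_1 E_2 \cdots E_{n-1}$$

$$\tilde{M}_{n-1} \cdots \tilde{M}_2 \tilde{M}_1 = M_{n-1} E_{n-1} \cdots M_2 E_2 M_1 E_2 \cdots E_{n-1}$$

$$P = E_{n-1} \cdots E_2 E_1$$

$$\tilde{M}_{n-1} \cdots \tilde{M}_2 \tilde{M}_1 P = M_{n-1} E_{n-1} \cdots M_2 E_2 M_1 \underbrace{E_2 \cdots \overbrace{E_{n-1} E_{n-1}}^{I} \cdots E_2}_{I} E_1
= M_{n-1} E_{n-1} \cdots M_2 E_2 M_1 E_1$$

$$\overbrace{\tilde{M}_{n-1} \cdots \tilde{M}_2 \tilde{M}_1}^{L^{-1}} PA = M_{n-1} E_{n-1} \cdots M_2 E_2 M_1 E_1 A = U$$

$$PA = LU$$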


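Example

$$\underbrace{\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 9/11 & 1 \end{pmatrix}}_{M_2}
\underbrace{\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}}_{E_2}
\underbrace{\begin{pmatrix} 1 & 0 & 0 \\ -6/8 & 1 & 0 \\ -2/8 & 0 & 1 \end{pmatrix}}_{M_1}
\underbrace{\begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}}_{E_1}
\underbrace{\begin{pmatrix} 2 & 15 & 28 \\ 6 & 3 & 19 \\ 8 & 16 & 24 \end{pmatrix}}_{A}
= U$$

Intermediate results, applying the factors from right to left:

$$A = \begin{pmatrix} 2 & 15 & 28 \\ 6 & 3 & 19 \\ 8 & 16 & 24 \end{pmatrix}
\;\xrightarrow{E_1}\;
\begin{pmatrix} 8 & 16 & 24 \\ 6 & 3 & 19 \\ 2 & 15 & 28 \end{pmatrix}
\;\xrightarrow{M_1}\;
\begin{pmatrix} 8 & 16 & 24 \\ 0 & -9 & 1 \\ 0 & 11 & 22 \end{pmatrix}
\;\xrightarrow{E_2}\;
\begin{pmatrix} 8 & 16 & 24 \\ 0 & 11 & 22 \\ 0 & -9 & 1 \end{pmatrix}
\;\xrightarrow{M_2}\;
\begin{pmatrix} 8 & 16 & 24 \\ 0 & 11 & 22 \\ 0 & 0 & 19 \end{pmatrix} = U$$

$$M_2 E_2 M_1 E_1 A = U$$
$$M_2 E_2 M_1 \, I \, E_1 A = U$$
$$M_2 E_2 M_1 \overbrace{E_2 E_2}^{I} E_1 A = U$$
$$\underbrace{M_2}_{\tilde{M}_2} \underbrace{E_2 M_1 E_2}_{\tilde{M}_1} \underbrace{E_2 E_1}_{P} A = U$$
$$\tilde{M}_2 \tilde{M}_1 PA = U$$
$$PA = \tilde{M}_1^{-1} \tilde{M}_2^{-1} U$$
$$PA = LU$$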

Example: P and L?

For the same example, with $A = \begin{pmatrix} 2 & 15 & 28 \\ 6 & 3 & 19 \\ 8 & 16 & 24 \end{pmatrix}$, $\tilde{M}_2 = M_2$, $\tilde{M}_1 = E_2 M_1 E_2$ and $P = E_2 E_1$:

$$P = E_2 E_1 = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$$

$$L = \tilde{M}_1^{-1} \tilde{M}_2^{-1}
= \begin{pmatrix} 1 & 0 & 0 \\ 2/8 & 1 & 0 \\ 6/8 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -9/11 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 2/8 & 1 & 0 \\ 6/8 & -9/11 & 1 \end{pmatrix}$$

so that $PA = LU$ with $U = \begin{pmatrix} 8 & 16 & 24 \\ 0 & 11 & 22 \\ 0 & 0 & 19 \end{pmatrix}$.
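
The example can be reproduced with a short sketch of LU with partial pivoting in exact rational arithmetic. The permutation is kept as an index vector rather than as a matrix, and the multipliers already stored in L are swapped together with the rows, which is what the reordering $\tilde{M}_k$ expresses. The function name and return convention are assumptions made for illustration.

```python
from fractions import Fraction

def lu_partial_pivoting(A):
    """Return (perm, L, U) such that the rows of A taken in order perm equal L U."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    perm = list(range(n))
    for k in range(n - 1):
        # Pivot search: largest entry in absolute value of the subcolumn.
        p = max(range(k, n), key=lambda i: abs(U[i][k]))
        if U[p][k] == 0:
            raise ZeroDivisionError("matrix is singular")
        if p != k:
            U[k], U[p] = U[p], U[k]
            perm[k], perm[p] = perm[p], perm[k]
            for j in range(k):  # swap the multipliers computed so far
                L[k][j], L[p][j] = L[p][j], L[k][j]
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return perm, L, U

A = [[2, 15, 28], [6, 3, 19], [8, 16, 24]]
perm, L, U = lu_partial_pivoting(A)
```

For this matrix `perm` is `[2, 0, 1]` (0-based), meaning that PA consists of rows 3, 1 and 2 of A in that order.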

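Block LU (without pivoting)

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$$

$$A = LU = \begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}
\begin{pmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{pmatrix}
= \begin{pmatrix} L_{11} U_{11} & L_{11} U_{12} \\ L_{21} U_{11} & L_{21} U_{12} + L_{22} U_{22} \end{pmatrix}
= \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$$

$$L_{11} U_{11} = A_{11} \;\Rightarrow\; [L_{11}, U_{11}] = \mathrm{lu}(A_{11})$$
$$L_{11} U_{12} = A_{12} \;\Rightarrow\; U_{12} = L_{11}^{-1} A_{12}$$
$$L_{21} U_{11} = A_{21} \;\Rightarrow\; L_{21} = A_{21} U_{11}^{-1}$$
$$L_{21} U_{12} + L_{22} U_{22} = A_{22} \;\Rightarrow\; [L_{22}, U_{22}] = \mathrm{lu}(A_{22} - L_{21} U_{12})$$

$$A \Leftarrow L \backslash U$$

$$A \Leftarrow \begin{pmatrix}
L_{11} \backslash U_{11} \leftarrow \mathrm{lu}(A_{11}) & U_{12} \leftarrow L_{11}^{-1} A_{12} \\
L_{21} \leftarrow A_{21} U_{11}^{-1} & L_{22} \backslash U_{22} \leftarrow \mathrm{lu}(A_{22} - L_{21} U_{12})
\end{pmatrix}$$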
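
The four block operations can be sketched as an in-place blocked factorization in plain Python with exact fractions. This is an illustrative in-core version without pivoting; the function name `block_lu` and the block size parameter `nb` are assumptions, and a real code would call xGETRF, xTRSM and xGEMM instead of these loops.

```python
from fractions import Fraction

def block_lu(A, nb):
    """Blocked LU without pivoting; returns the compact L\\U form."""
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    for s in range(0, n, nb):
        e = min(s + nb, n)  # the current diagonal block is A[s:e, s:e]
        # L11\U11 <- lu(A11): unblocked LU of the diagonal block
        for k in range(s, e):
            for i in range(k + 1, e):
                A[i][k] /= A[k][k]
                for j in range(k + 1, e):
                    A[i][j] -= A[i][k] * A[k][j]
        # U12 <- L11^{-1} A12: forward substitution with unit lower triangular L11
        for j in range(e, n):
            for k in range(s, e):
                for i in range(k + 1, e):
                    A[i][j] -= A[i][k] * A[k][j]
        # L21 <- A21 U11^{-1}: solve X U11 = A21 row by row
        for i in range(e, n):
            for k in range(s, e):
                A[i][k] /= A[k][k]
                for j in range(k + 1, e):
                    A[i][j] -= A[i][k] * A[k][j]
        # A22 <- A22 - L21 U12: update of the trailing submatrix
        for i in range(e, n):
            for j in range(e, n):
                A[i][j] -= sum(A[i][k] * A[k][j] for k in range(s, e))
    return A

compact = block_lu([[2, 3, 4], [6, 15, 14], [8, 30, 24]], nb=2)
```

Because the trailing submatrix is updated as soon as each block column is factored, this particular loop order is the right-looking variant.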

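Operations and costs

Operation               Dimensions                                                                      BLAS/LAPACK   flops              storage
$\mathrm{lu}(A_{ii})$   $A_{ii} \in R^{n_i \times n_i}$                                                 xGETRF        $(2/3)\, n_i^3$    $n_i^2$
$A_{ij} U_{jj}^{-1}$    $A_{ij} \in R^{m_i \times n_j}$, $U_{jj} \in R^{n_j \times n_j}$                xTRSM         $m_i n_j^2$        $m_i n_j + n_j^2$
$L_{ii}^{-1} A_{ij}$    $L_{ii} \in R^{m_i \times m_i}$, $A_{ij} \in R^{m_i \times n_j}$                xTRSM         $m_i^2 n_j$        $m_i n_j + m_i^2$
$A_{ij} - L_{ik} U_{kj}$  $A_{ij} \in R^{m_i \times n_j}$, $L_{ik} \in R^{m_i \times k}$, $U_{kj} \in R^{k \times n_j}$   xGEMM   $2 k m_i n_j$   $m_i n_j + m_i k + k n_j$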

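[Figure: dependency graph of the block operations]

Right-looking version

Recursive scheme: once a block column has been factored, the whole trailing submatrix is updated immediately.

$$A = \begin{pmatrix}
A_{11} & A_{12} & A_{13} \\
A_{21} & A_{22} & A_{23} \\
A_{31} & A_{32} & A_{33}
\end{pmatrix}$$

$$A \Leftarrow \begin{pmatrix}
L_{11} \backslash U_{11} \leftarrow \mathrm{lu}(A_{11}) & U_{12} \leftarrow L_{11}^{-1} A_{12} & U_{13} \leftarrow L_{11}^{-1} A_{13} \\
L_{21} \leftarrow A_{21} U_{11}^{-1} & L_{22} \backslash U_{22} \leftarrow \mathrm{lu}(A_{22} - L_{21} U_{12}) & U_{23} \leftarrow L_{22}^{-1} (A_{23} - L_{21} U_{13}) \\
L_{31} \leftarrow A_{31} U_{11}^{-1} & L_{32} \leftarrow (A_{32} - L_{31} U_{12}) U_{22}^{-1} & L_{33} \backslash U_{33} \leftarrow \mathrm{lu}(A_{33} - L_{31} U_{13} - L_{32} U_{23})
\end{pmatrix}$$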

Left-looking version

The block operations are the same, but they are scheduled one block column at a time: all the updates coming from the block columns to its left are applied to the current block column just before it is factored.

$$A = \begin{pmatrix}
A_{11} & A_{12} & A_{13} \\
A_{21} & A_{22} & A_{23} \\
A_{31} & A_{32} & A_{33}
\end{pmatrix}$$

$$A \Leftarrow \begin{pmatrix}
L_{11} \backslash U_{11} \leftarrow \mathrm{lu}(A_{11}) & U_{12} \leftarrow L_{11}^{-1} A_{12} & U_{13} \leftarrow L_{11}^{-1} A_{13} \\
L_{21} \leftarrow A_{21} U_{11}^{-1} & L_{22} \backslash U_{22} \leftarrow \mathrm{lu}(A_{22} - L_{21} U_{12}) & U_{23} \leftarrow L_{22}^{-1} (A_{23} - L_{21} U_{13}) \\
L_{31} \leftarrow A_{31} U_{11}^{-1} & L_{32} \leftarrow (A_{32} - L_{31} U_{12}) U_{22}^{-1} & L_{33} \backslash U_{33} \leftarrow \mathrm{lu}(A_{33} - L_{31} U_{13} - L_{32} U_{23})
\end{pmatrix}$$

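I/O right-looking

$R$ and $W$ count the matrix elements read and written, $M$ is the matrix order and $n_b$ the block width.

First step: $R = M \cdot M$, $W = M \cdot M$

Over all steps:

$$R = M \cdot M + (M - n_b)(M - n_b) + (M - 2n_b)(M - 2n_b) + \dots + n_b n_b
= \sum_{k=0}^{M/n_b - 1} (M - k n_b)^2
= \frac{M^3}{3 n_b} \bigl( 1 + O(n_b / M) \bigr)$$

$$W = M \cdot M + (M - n_b)(M - n_b) + (M - 2n_b)(M - 2n_b) + \dots + n_b n_b
= \sum_{k=0}^{M/n_b - 1} (M - k n_b)^2
= \frac{M^3}{3 n_b} \bigl( 1 + O(n_b / M) \bigr)$$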

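I/O left-looking

First step: $R = M n_b$, $W = M n_b$

Over all steps, step $k$ reads the current block column ($M n_b$ elements) plus the $k$ block columns of $L$ to its left:

$$R = M n_b + \bigl( M n_b + M n_b \bigr) + \bigl( M n_b + (M - n_b) n_b + M n_b \bigr) + \dots + \bigl( M n_b + (M - n_b) n_b + \dots + 2 n_b n_b + M n_b \bigr)$$

$$= \sum_{k=0}^{M/n_b - 1} \left( M n_b + \sum_{i=1}^{k} \bigl( M - (i - 1) n_b \bigr) n_b \right)
= \frac{M^3}{3 n_b} \bigl( 1 + O(n_b / M) \bigr)$$

$$W = M n_b + M n_b + \dots + M n_b = \sum_{k=0}^{M/n_b - 1} M n_b = M^2$$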

I/O comparison

                 R                                              W
right-looking    $\frac{M^3}{3 n_b} (1 + O(n_b / M))$           $\frac{M^3}{3 n_b} (1 + O(n_b / M))$
left-looking     $\frac{M^3}{3 n_b} (1 + O(n_b / M))$           $M^2$

#I/O left-looking < #I/O right-looking

(The inequality still holds when additional details are taken into account.)
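
The sums behind this comparison can be evaluated exactly with a small sketch. It assumes that the block width `nb` divides the matrix order `M`; the function names are illustrative.

```python
def io_right_looking(M, nb):
    """Elements read (and, equally, written): sum of (M - k*nb)^2 for k = 0..M/nb - 1."""
    return sum((M - k * nb) ** 2 for k in range(M // nb))

def io_left_looking(M, nb):
    """Return (reads, writes) in elements for the left-looking variant."""
    steps = M // nb
    reads = sum(
        M * nb + sum((M - (i - 1) * nb) * nb for i in range(1, k + 1))
        for k in range(steps)
    )
    writes = steps * M * nb  # every block column is written once: M^2 in total
    return reads, writes

M, nb = 10_000, 100
right_total = 2 * io_right_looking(M, nb)  # reads plus writes
left_reads, left_writes = io_left_looking(M, nb)
left_total = left_reads + left_writes
leading_term = M**3 / (3 * nb)
```

For these sizes both read volumes stay within a few percent of $M^3 / (3 n_b)$, while the left-looking write volume is only $M^2$, so its total is roughly half.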

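Block LU with partial pivoting

$$A \Leftarrow L \backslash U$$

$$A \Leftarrow \left( \begin{array}{cc}
\begin{pmatrix} L_{11} \backslash U_{11} \\ L_{21} \end{pmatrix} \leftarrow \mathrm{lu} \begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix}
&
\begin{matrix} U_{12} \leftarrow L_{11}^{-1} A_{12} \\ L_{22} \backslash U_{22} \leftarrow \mathrm{lu}(A_{22} - L_{21} U_{12}) \end{matrix}
\end{array} \right)$$

For comparison, without pivoting (as before):

$$A \Leftarrow \begin{pmatrix}
L_{11} \backslash U_{11} \leftarrow \mathrm{lu}(A_{11}) & U_{12} \leftarrow L_{11}^{-1} A_{12} \\
L_{21} \leftarrow A_{21} U_{11}^{-1} & L_{22} \backslash U_{22} \leftarrow \mathrm{lu}(A_{22} - L_{21} U_{12})
\end{pmatrix}$$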

[Figure: dependency graph (with pivoting)]

[Figure: right-looking version]

LAPACK GETRF

*  DGETRF computes an LU factorization of a general M-by-N matrix A
*  using partial pivoting with row interchanges.
*
*  The factorization has the form
*     A = P * L * U
*  where P is a permutation matrix, L is lower triangular with unit
*  diagonal elements (lower trapezoidal if m > n), and U is upper
*  triangular (upper trapezoidal if m < n).
*
*  This is the right-looking Level 3 BLAS version of the algorithm.
*
*  =====================================================================
      SUBROUTINE DGETRF( M, N, A, LDA, IPIV, INFO )
*
*  -- LAPACK computational routine (version 3.4.0) --
*  -- LAPACK is a software package provided by Univ. of Tennessee,    --
*  -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG
*     Ltd..--
*     November 2011
*
*     .. Scalar Arguments ..
      INTEGER            INFO, LDA, M, N
*     ..
*     .. Array Arguments ..
      INTEGER            IPIV( * )
      DOUBLE PRECISION   A( LDA, * )
*     ..
*
*  =====================================================================
*
*     .. Parameters ..
      DOUBLE PRECISION   ONE
      PARAMETER          ( ONE = 1.0D+0 )
*     ..
*     .. Local Scalars ..
      INTEGER            I, IINFO, J, JB, NB
*     ..
*     .. External Subroutines ..
      EXTERNAL           DGEMM, DGETF2, DLASWP, DTRSM, XERBLA
*     ..
*     .. External Functions ..
      INTEGER            ILAENV
      EXTERNAL           ILAENV
*     ..
*     .. Intrinsic Functions ..
      INTRINSIC          MAX, MIN
*     ..
*     .. Executable Statements ..
*
*     Test the input parameters.
*
      INFO = 0
      IF( M.LT.0 ) THEN
         INFO = -1
      ELSE IF( N.LT.0 ) THEN
         INFO = -2
      ELSE IF( LDA.LT.MAX( 1, M ) ) THEN
         INFO = -4
      END IF
      IF( INFO.NE.0 ) THEN
         CALL XERBLA( 'DGETRF', -INFO )
         RETURN
      END IF
*
*     Quick return if possible
*
      IF( M.EQ.0 .OR. N.EQ.0 )
     $   RETURN
*
*     Determine the block size for this environment.
*
      NB = ILAENV( 1, 'DGETRF', ' ', M, N, -1, -1 )
      IF( NB.LE.1 .OR. NB.GE.MIN( M, N ) ) THEN
*
*        Use unblocked code.
*
         CALL DGETF2( M, N, A, LDA, IPIV, INFO )
      ELSE
*
*        Use blocked code.
*
         DO 20 J = 1, MIN( M, N ), NB
            JB = MIN( MIN( M, N )-J+1, NB )
*
*           Factor diagonal and subdiagonal blocks and test for exact
*           singularity.
*
            CALL DGETF2( M-J+1, JB, A( J, J ), LDA, IPIV( J ), IINFO )
*
*           Adjust INFO and the pivot indices.
*
            IF( INFO.EQ.0 .AND. IINFO.GT.0 )
     $         INFO = IINFO + J - 1
            DO 10 I = J, MIN( M, J+JB-1 )
               IPIV( I ) = J - 1 + IPIV( I )
   10       CONTINUE
*
*           Apply interchanges to columns 1:J-1.
*
            CALL DLASWP( J-1, A, LDA, J, J+JB-1, IPIV, 1 )
*
            IF( J+JB.LE.N ) THEN
*
*              Apply interchanges to columns J+JB:N.
*
               CALL DLASWP( N-J-JB+1, A( 1, J+JB ), LDA, J, J+JB-1,
     $                      IPIV, 1 )
*
*              Compute block row of U.
*
               CALL DTRSM( 'Left', 'Lower', 'No transpose', 'Unit', JB,
     $                     N-J-JB+1, ONE, A( J, J ), LDA, A( J, J+JB ),
     $                     LDA )
               IF( J+JB.LE.M ) THEN
*
*                 Update trailing submatrix.
*
                  CALL DGEMM( 'No transpose', 'No transpose', M-J-JB+1,
     $                        N-J-JB+1, JB, -ONE, A( J+JB, J ), LDA,
     $                        A( J, J+JB ), LDA, ONE, A( J+JB, J+JB ),
     $                        LDA )
               END IF
            END IF
   20    CONTINUE
      END IF
      RETURN
*
*     End of DGETRF
*
      END

[Figure: left-looking version]

I/O with pivoting

• Conclusions similar to those without pivoting
• The rewriting of $L_{ij}$ has to be added (if it is stored pivoted)

$$W = (M - n_b) \times n_b + (M - 2 n_b) \times 2 n_b + \dots + n_b \times (M - n_b)
= \sum_{k=1}^{M/n_b - 1} (M - k n_b)\, k\, n_b \approx \frac{1}{6} \frac{M^3}{n_b}$$

Left-looking still performs less I/O.
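
The additional write volume can be evaluated exactly with a one-line sum. The sketch assumes that `nb` divides `M`; under that assumption the sum equals $M^3 / (6 n_b) - M n_b / 6$, so the leading term above is accurate for $n_b \ll M$.

```python
def pivot_rewrite_volume(M, nb):
    """Extra elements written when the pivoted L blocks are rewritten."""
    return sum((M - k * nb) * k * nb for k in range(1, M // nb))

M, nb = 10_000, 100
extra = pivot_rewrite_volume(M, nb)
ratio = extra / (M**3 / (6 * nb))  # close to 1 when nb is much smaller than M
```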

Initiatives: SOLAR

Scalable Out-of-core Linear Algebra computations

• Portable high-performance library for dense out-of-core matrices
• Supports in-core computations on shared-memory and distributed-memory multiprocessors
• The I/O library supports conventional and parallel I/O interfaces
• Discontinued...

Initiatives: POOCLAPACK

Parallel Out-Of-Core Linear Algebra PACKage

• Extension of PLAPACK (Parallel LAPACK)
• Allows an easy implementation of OOC on top of PLAPACK
• Partial alternative to ScaLAPACK
• Discontinued...

Initiatives: ScaLAPACK prototype

/netlib/scalapack/prototype

    SCALAPACK/PROTOTYPE directory
    ===================

    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ! The "prototype" software provided in this directory has    !
    ! been produced as part of the ScaLAPACK Project. The        !
    ! software is "prototype" because is it in pre-release state !
    ! and is not as robust as the rest of the ScaLAPACK software.!
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    For more information on the ScaLAPACK Project, please refer
    to the URL:
    http://www.netlib.org/scalapack/index.html

    NOTE: SuperLU_DIST is now available (September, 1999)!
    Out-of-core solvers updated May, 1999!
    PBLAS (version 2.0) updated!

    Questions/comments should be directed to [email protected].
    ...
    ##############################
    # Out-of-Core Linear Solvers #
    ##############################

    file: readme.outofcore
    for: README file for Out-of-Core software package Refer to LAPACK Working
    Note 118 for design details. http://www.netlib.org/lapack/lawns/lawn118.ps
    size: 3964 bytes

    file: outofcore.tgz
    ...

ScaLAPACK out-of-core

Out-of-core extension: scalapack/prototype

• QR:
  • PFxGEQRF: QR factorization
  • PFxGEQRS: system solution using the QR factorization
• Cholesky:
  • PFxPOTRF: Cholesky factorization
  • PFxPOTRS: system solution using the Cholesky factorization
• LU:
  • PFxTRF, PFxTF2: LU factorization
  • PFxTRS: system solution using the LU factorization
• Auxiliary routines: PFxGEMM, PFxTRSM, PFxORMQR, PFxMATGEN, ...

PFxTRF: out-of-core LU in ScaLAPACK

      SUBROUTINE PFDGETRF( M, N, A, IA, JA, DESCA, IPIV, INFO )
*
*
*  -- ScaLAPACK routine (version 2.0) --
*     University of Tennessee, Knoxville, Oak Ridge National Laboratory,
*     and University of California, Berkeley.
*     Oct 10, 1996
*
*
*  Purpose
*  =======
*
*  PZGETRF computes an LU factorization of a general M-by-N distributed
*  matrix sub( A ) = A(IA:IA+M-1,JA:JA+N-1) using partial pivoting with
*  row interchanges.
*
*  The factorization has the form sub( A ) = P * L * U, where P is a
*  permutation matrix, L is lower triangular with unit diagonal ele-
*  ments (lower trapezoidal if m > n), and U is upper triangular
*  (upper trapezoidal if m < n). L and U are stored in sub( A ).
*
*  This is the left-looking Parallel out-of-core version of the
*  algorithm.
*
*  For details, see routine PxGETRF.
   ...

PFxTRF: out-of-core LU in ScaLAPACK

In-core operations:

• xLAREAD: read a submatrix
• xLAWRITE: write a submatrix
• PxGETRF: LU
• xLAPIV: row permutation
• PxTRSM: triangular system solve
• PxGEMM: matrix multiplication

References

Eddy Caron, Dominique Lazure, and Gil Utard. Performance prediction and analysis of parallel out-of-core matrix factorization. In Proceedings of the 7th International Conference on High Performance Computing (HiPC'00), 2000.

Eddy Caron and Gil Utard. On the performance of parallel factorization of out-of-core matrices. Parallel Computing, 30:357–375, 2004.

E. F. D'Azevedo and Jack Dongarra. The design and implementation of the parallel out-of-core ScaLAPACK LU, QR, and Cholesky factorization routines. LAPACK Working Note 118, CS-97-247, 1997.

Jack J. Dongarra, Sven Hammarling, and David W. Walker. Key concepts for parallel out-of-core LU factorization. Computers Math. Applic., 35(7):13–31, 1998.

John R. Gilbert and Sivan Toledo. High-performance out-of-core sparse LU factorization, 1999.

Thierry Joffrain, Enrique S. Quintana-Ortí, and Robert van de Geijn. Updating an LU factorization and its application to scalable out-of-core. Technical report, Dept. of Computer Sciences, The University of Texas, Austin, TX and Dept. de Ingeniería y Ciencia de Computadores, Universidad Jaume I, Castellón, Spain.

Wesley C. Reiley and Robert A. van de Geijn. POOCLAPACK: Parallel Out-Of-Core Linear Algebra PACKage. Technical report, Department of Computer Sciences, The University of Texas at Austin, 1999.

Sivan Toledo. A survey of out-of-core algorithms in Numerical Linear Algebra, 1999.

Sivan Toledo and Fred G. Gustavson. The design and implementation of SOLAR, a portable library for Scalable Out-of-core Linear Algebra computations. In Workshop on I/O in Parallel and Distributed Systems, pages 28–40. ACM, 1996.
