API Reference
This page documents the public API of LinearAlgebraMPI.jl.
Types
SparseMatrixMPI
LinearAlgebraMPI.SparseMatrixMPI — Type
SparseMatrixMPI{T}
A distributed sparse matrix partitioned by rows across MPI ranks.
Fields
- structural_hash::Blake3Hash: 256-bit Blake3 hash of the structural pattern
- row_partition::Vector{Int}: Row partition boundaries, length = nranks + 1
- col_partition::Vector{Int}: Column partition boundaries, length = nranks + 1 (placeholder for transpose)
- col_indices::Vector{Int}: Global column indices that appear in the local part (local→global mapping)
- A::SparseMatrixCSR{T,Int}: Local rows in CSR format for efficient row-wise iteration
- cached_transpose: Cached materialized transpose (bidirectionally linked)
Invariants
- col_indices, row_partition, and col_partition are sorted
- row_partition[nranks+1] = total number of rows + 1
- col_partition[nranks+1] = total number of columns + 1
- size(A, 1) == row_partition[rank+2] - row_partition[rank+1] (number of local rows, with rank 0-based)
- size(A.parent, 1) == length(col_indices) (compressed column dimension)
- A.parent.rowval contains local indices in 1:length(col_indices)
Storage Details
The local rows are stored in CSR format (Compressed Sparse Row), which enables efficient row-wise iteration - essential for a row-partitioned distributed matrix.
In this package, SparseMatrixCSR{Tv,Ti} is a type alias for Transpose{Tv, SparseMatrixCSC{Tv,Ti}}. This type has a dual interpretation:
- Semantic view: A lazy transpose of a CSC matrix
- Storage view: Row-major (CSR) access to the data
The underlying A.parent::SparseMatrixCSC stores the transposed data with:
- A.parent.m = length(col_indices) (compressed, not the global number of columns)
- A.parent.n = number of local rows (columns in the parent = rows in CSR)
- A.parent.colptr = row pointers for the CSR format
- A.parent.rowval = LOCAL column indices in 1:length(col_indices)
- col_indices[local_idx] maps local→global column indices
This compression avoids "hypersparse" storage where the column dimension would be the global number of columns even if only a few columns have nonzeros locally.
Access the underlying CSC storage via A.parent when needed for low-level operations.
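To make the compression concrete, here is a standalone sketch (plain SparseArrays, not the package's internal constructor) of how two local rows of a very wide matrix collapse to a small local block plus a col_indices map:
using SparseArrays
# Hypothetical rank owning 2 rows of a 2 × 1_000_000 block, with nonzeros
# only in global columns 7 and 999_999.
rows = [1, 2]; cols = [7, 999_999]; vals = [1.0, 2.0]
col_indices = sort(unique(cols))                    # local→global column map: [7, 999999]
g2l = Dict(g => l for (l, g) in enumerate(col_indices))
cols_local = [g2l[j] for j in cols]                 # rewrite as local column indices
compressed = sparse(rows, cols_local, vals, 2, length(col_indices))  # 2×2, not 2×1_000_000
# col_indices[j_local] recovers the global column of any stored entry.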
MatrixMPI
LinearAlgebraMPI.MatrixMPI — Type
MatrixMPI{T}
A distributed dense matrix partitioned by rows across MPI ranks.
Fields
- structural_hash::Blake3Hash: 256-bit Blake3 hash of the structural pattern
- row_partition::Vector{Int}: Row partition boundaries, length = nranks + 1
- col_partition::Vector{Int}: Column partition boundaries, length = nranks + 1 (for transpose)
- A::Matrix{T}: Local rows (NOT transposed), size = (local_nrows, ncols)
Invariants
- row_partition and col_partition are sorted
- row_partition[nranks+1] = total number of rows + 1
- col_partition[nranks+1] = total number of columns + 1
- size(A, 1) == row_partition[rank+2] - row_partition[rank+1]
- size(A, 2) == col_partition[end] - 1
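A quick arithmetic check of the local-row-count invariant (standalone sketch, not package code):
row_partition = [1, 4, 7, 9, 11]   # 10 global rows over 4 ranks (cf. uniform_partition below)
rank = 2                            # 0-based MPI rank
local_range = row_partition[rank + 1]:(row_partition[rank + 2] - 1)   # 7:8
n_local = row_partition[rank + 2] - row_partition[rank + 1]           # 2 == size(A, 1) on this rank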
VectorMPI
LinearAlgebraMPI.VectorMPI — Type
VectorMPI{T}
A distributed dense vector partitioned across MPI ranks.
Fields
- structural_hash::Blake3Hash: 256-bit Blake3 hash of the partition
- partition::Vector{Int}: Partition boundaries, length = nranks + 1
- v::Vector{T}: Local vector elements owned by this rank
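A minimal construction sketch (assuming MPI is already initialized and, as with all global constructors here, that the same global vector is passed on every rank):
v = VectorMPI([1.0, 2.0, 3.0, 4.0])   # same global vector on every rank
length(v)      # 4: global length
v.partition    # partition boundaries, length nranks + 1
v.v            # the elements owned by this rank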
SparseMatrixCSR
LinearAlgebraMPI.SparseMatrixCSR — Type
SparseMatrixCSR{Tv,Ti} = Transpose{Tv, SparseMatrixCSC{Tv,Ti}}
Type alias for CSR (Compressed Sparse Row) storage format.
The Dual Life of Transpose{SparseMatrixCSC}
In Julia, the type Transpose{Tv, SparseMatrixCSC{Tv,Ti}} has two interpretations:
- Semantic interpretation: A lazy transpose wrapper around a CSC matrix. When you call transpose(A) on a SparseMatrixCSC, you get this wrapper that represents A^T without copying data.
- Storage interpretation: CSR (row-major) access to sparse data. The underlying CSC stores columns contiguously, but through the transpose wrapper, we can iterate efficiently over rows instead of columns.
This alias clarifies intent: use SparseMatrixCSR when you want row-major storage semantics, and transpose(A) when you want the mathematical transpose.
CSR vs CSC Storage
- CSC (Compressed Sparse Column): Julia's native sparse format. Efficient for column-wise operations, matrix-vector products with column access.
- CSR (Compressed Sparse Row): Efficient for row-wise operations, matrix-vector products with row access, and row-partitioned distributed matrices.
For SparseMatrixCSR, the underlying parent::SparseMatrixCSC stores the transposed matrix. If B = SparseMatrixCSR(A) represents matrix M, then B.parent is a CSC storing M^T. This means:
- B.parent.colptr acts as row pointers for M
- B.parent.rowval contains column indices for M
- B.parent.nzval contains values in row-major order
Usage Note
Julia will still display this type as Transpose{Float64, SparseMatrixCSC{...}}, not as SparseMatrixCSR. The alias improves code clarity but doesn't affect type printing.
CSR Storage Format
LinearAlgebraMPI uses CSR (Compressed Sparse Row) format internally for SparseMatrixMPI because row-partitioned distributed matrices need efficient row-wise access.
The Dual Life of Transpose{SparseMatrixCSC}
In Julia, the type Transpose{T, SparseMatrixCSC{T,Int}} has two interpretations:
- Semantic: A lazy transpose of a CSC matrix (what you get from transpose(A))
- Storage: Row-major (CSR) access to sparse data
This duality can be confusing. When you call transpose(A) on a SparseMatrixCSC, you get a wrapper that represents A^T. But the same wrapper type, when used for storage, provides efficient row iteration.
The SparseMatrixCSR Type Alias
To clarify intent, LinearAlgebraMPI exports:
const SparseMatrixCSR{Tv,Ti} = Transpose{Tv, SparseMatrixCSC{Tv,Ti}}
Use SparseMatrixCSR when you want row-major storage, and transpose(A) when you want the mathematical transpose.
Converting Between CSC and CSR
# CSC to CSR (same matrix, different storage)
A_csc = sparse([1,2,2], [1,1,2], [1.0, 2.0, 3.0], 2, 2)
A_csr = SparseMatrixCSR(A_csc)
A_csr[1,1] == A_csc[1,1] # true
# CSR to CSC
A_back = SparseMatrixCSC(A_csr)
A_back == A_csc # true
Why CSR for Distributed Matrices?
SparseMatrixMPI partitions matrices by rows across MPI ranks. Each rank needs to efficiently iterate over its local rows for operations like matrix-vector multiplication. CSR format provides O(1) access to each row's nonzeros, while CSC would require searching every column for entries in a given row.
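A small standalone illustration of that access pattern, using only SparseArrays (not the package internals): for a CSR matrix stored as the transpose of a CSC parent, row i of the matrix is column i of the parent, so its nonzeros are one contiguous nzrange.
using SparseArrays, LinearAlgebra
M = sparse([1, 1, 2], [1, 3, 2], [1.0, 2.0, 3.0], 2, 3)  # the matrix we want row access to
P = sparse(transpose(M))            # CSC parent storing M^T
B = transpose(P)                    # Transpose{Float64, SparseMatrixCSC}: CSR view of M
for i in 1:size(B, 1)               # iterate rows of M
    for k in nzrange(P, i)          # contiguous nonzeros of row i (= column i of the parent)
        j = rowvals(P)[k]           # column index in M
        v = nonzeros(P)[k]          # stored value: (i, j, v) is an entry of M
    end
end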
Sparse Matrix Operations
Arithmetic
A * B # Matrix multiplication
A + B # Addition
A - B # Subtraction
a * A # Scalar multiplication
A * a # Scalar multiplication
Threaded Sparse Multiplication
LinearAlgebraMPI.:⊛ — Function
⊛(A::SparseMatrixCSC{Tv,Ti}, B::SparseMatrixCSC{Tv,Ti}; max_threads=Threads.nthreads()) where {Tv,Ti}
Multithreaded sparse matrix multiplication. Splits B into column blocks and computes A * B_block in parallel using Julia's optimized builtin *.
Threading behavior
- Uses at most n ÷ 100 threads, where n = size(B, 2), ensuring at least 100 columns per thread
- Falls back to single-threaded A * B when n < 100 or when threading overhead would dominate
- The max_threads keyword limits the maximum number of threads used
Examples
using SparseArrays
A = sprand(1000, 1000, 0.01)
B = sprand(1000, 500, 0.01)
C = A ⊛ B # Use all available threads (up to n÷100)
C = ⊛(A, B; max_threads=2) # Limit to 2 threads
The ⊛ operator (typed as \circledast<tab>) provides multithreaded sparse matrix multiplication for SparseMatrixCSC. It is used internally by SparseMatrixMPI multiplication and can also be used directly on local sparse matrices:
using SparseArrays
A = sprand(10000, 10000, 0.001)
B = sprand(10000, 5000, 0.001)
C = A ⊛ B # Parallel multiplication using available threads
Transpose and Adjoint
transpose(A) # Lazy transpose
conj(A) # Conjugate (new matrix)
A' # Adjoint (conjugate transpose, lazy)
SparseMatrixMPI(transpose(A)) # Materialize lazy transpose (cached)
Matrix-Vector Multiplication
y = A * x # Returns VectorMPI
mul!(y, A, x) # In-place version
Vector-Matrix Multiplication
transpose(v) * A # Row vector times matrix
v' * A # Conjugate row vector times matrix
Norms
norm(A) # Frobenius norm (default)
norm(A, 1) # Sum of absolute values
norm(A, Inf) # Maximum absolute value
norm(A, p) # General p-norm
opnorm(A, 1) # Maximum absolute column sum
opnorm(A, Inf) # Maximum absolute row sum
Properties
size(A) # Global dimensions (m, n)
size(A, d) # Size along dimension d
eltype(A) # Element type
nnz(A) # Number of nonzeros
issparse(A) # Returns true
Reductions
sum(A) # Sum of all stored elements
sum(A; dims=1) # Column sums (returns VectorMPI) - SparseMatrixMPI only
sum(A; dims=2) # Row sums (returns VectorMPI) - SparseMatrixMPI only
maximum(A) # Maximum of stored values
minimum(A) # Minimum of stored values
tr(A) # Trace (sum of diagonal) - SparseMatrixMPI only
LinearAlgebraMPI.mean — Function
mean(v::VectorMPI{T}) where T
Compute the mean of all elements in the distributed vector.
mean(A::SparseMatrixMPI{T}) where T
Compute the mean of all elements (including implicit zeros) in the distributed sparse matrix.
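A small sketch (assuming the same global data is supplied on every rank):
using SparseArrays
v = VectorMPI([1.0, 2.0, 3.0, 4.0])
mean(v)    # 2.5
A = SparseMatrixMPI{Float64}(sparse([1], [1], [6.0], 2, 3))
mean(A)    # 1.0: the single stored 6.0 averaged over all 6 entries, implicit zeros included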
Element-wise Operations
abs(A) # Absolute value
abs2(A) # Squared absolute value
real(A) # Real part
imag(A) # Imaginary part
floor(A) # Floor
ceil(A) # Ceiling
round(A) # Round
Utilities
copy(A) # Deep copy
dropzeros(A) # Remove stored zeros
diag(A) # Main diagonal (returns VectorMPI)
diag(A, k) # k-th diagonal
triu(A) # Upper triangular
triu(A, k) # Upper triangular from k-th diagonal
tril(A) # Lower triangular
tril(A, k) # Lower triangular from k-th diagonal
Block Operations
cat(A, B, C; dims=1) # Vertical concatenation
cat(A, B, C; dims=2) # Horizontal concatenation
cat(A, B, C, D; dims=(2,2)) # 2x2 block matrix [A B; C D]
vcat(A, B, C) # Vertical concatenation
hcat(A, B, C) # Horizontal concatenation
blockdiag(A, B, C) # Block diagonal matrix
Diagonal Matrix Construction
spdiagm(v) # Diagonal matrix from VectorMPI
spdiagm(m, n, v) # m x n diagonal matrix
spdiagm(k => v) # k-th diagonal
spdiagm(0 => v, 1 => w) # Multiple diagonals
Dense Matrix Operations
Arithmetic
A * x # Matrix-vector multiplication (returns VectorMPI)
transpose(A) # Lazy transpose
conj(A) # Conjugate
A' # Adjoint
a * A # Scalar multiplication
mapslices
Apply a function to rows or columns of a distributed dense matrix.
mapslices(f, A; dims=2) # Apply f to each row (local, no MPI)
mapslices(f, A; dims=1) # Apply f to each column (requires MPI)
Example:
using LinearAlgebra
# Create deterministic test matrix (same on all ranks)
A_global = Float64.([i + 0.1*j for i in 1:100, j in 1:10])
A = MatrixMPI(A_global)
# Compute row statistics: norm, max, sum for each row
# Transforms 100x10 to 100x3
B = mapslices(x -> [norm(x), maximum(x), sum(x)], A; dims=2)
map_rows
LinearAlgebraMPI.map_rows — Function
map_rows(f, A...)
Apply function f to corresponding rows of distributed vectors/matrices.
Each argument in A... must be either a VectorMPI or MatrixMPI. All inputs are repartitioned to match the partition of the first argument before applying f.
For each row index i, f is called with the i-th row from each input:
- For VectorMPI, the i-th "row" is a length-1 view of element i
- For MatrixMPI, the i-th row is a row vector (a view into the local matrix)
Result Type (vcat semantics)
The result type depends on what f returns, matching the behavior of vcat:
| f returns | Julia type | Result |
|---|---|---|
| scalar | Number | VectorMPI (one element per input row) |
| column vector | AbstractVector | VectorMPI (vcat concatenates all vectors) |
| row vector | Transpose, Adjoint | MatrixMPI (vcat stacks as rows) |
| matrix | AbstractMatrix | MatrixMPI (vcat stacks rows) |
Lazy Wrappers
Julia's transpose(v) and v' (adjoint) return lazy wrappers that are subtypes of AbstractMatrix, so they produce MatrixMPI results:
map_rows(r -> [1,2,3], A) # Vector → VectorMPI (length 3n)
map_rows(r -> [1,2,3]', A) # Adjoint → MatrixMPI (n×3)
map_rows(r -> transpose([1,2,3]), A) # Transpose → MatrixMPI (n×3)
map_rows(r -> conj([1,2,3]), A) # Vector → VectorMPI (length 3n)
map_rows(r -> [1 2 3], A) # Matrix literal → MatrixMPI (n×3)
Examples
# Element-wise product of two vectors
u = VectorMPI([1.0, 2.0, 3.0])
v = VectorMPI([4.0, 5.0, 6.0])
w = map_rows((a, b) -> a[1] * b[1], u, v) # VectorMPI([4.0, 10.0, 18.0])
# Row norms of a matrix
A = MatrixMPI(randn(5, 3))
norms = map_rows(r -> norm(r), A) # VectorMPI of row norms
# Expand each row to multiple elements (vcat behavior)
A = MatrixMPI(randn(3, 2))
result = map_rows(r -> [1, 2, 3], A) # VectorMPI of length 9
# Return row vectors to build a matrix
A = MatrixMPI(randn(3, 2))
result = map_rows(r -> [1, 2, 3]', A) # 3×3 MatrixMPI
# Variable-length output per row
v = VectorMPI([1.0, 2.0, 3.0])
result = map_rows(r -> ones(Int(r[1])), v) # VectorMPI of length 6 (1+2+3)
# Mixed inputs: matrix rows weighted by vector elements
A = MatrixMPI(randn(4, 3))
w = VectorMPI([1.0, 2.0, 3.0, 4.0])
result = map_rows((row, wi) -> sum(row) * wi[1], A, w) # VectorMPI
This is the MPI-distributed version of:
map_rows(f, A...) = vcat((f.((eachrow.(A))...))...)
Apply a function to corresponding rows of multiple distributed vectors/matrices. This is the MPI-distributed version of vcat((f.((eachrow.(A))...))...).
Result type follows vcat semantics:
| f returns | Result |
|---|---|
| scalar | VectorMPI (one element per row) |
| column vector [1,2,3] | VectorMPI (concatenated) |
| row vector [1,2,3]' or transpose([1,2,3]) | MatrixMPI (stacked rows) |
| matrix [1 2 3] | MatrixMPI (stacked rows) |
Examples:
# Row norms
A = MatrixMPI(randn(100, 10))
norms = map_rows(r -> norm(r), A) # VectorMPI of length 100
# Expand rows to vectors (vcat behavior)
result = map_rows(r -> [1, 2, 3], A) # VectorMPI of length 300
# Build matrix from row vectors
result = map_rows(r -> [1, 2, 3]', A) # 100×3 MatrixMPI
# Multiple inputs with automatic partition alignment
A = MatrixMPI(randn(100, 10))
w = VectorMPI(randn(100))
weighted = map_rows((row, wi) -> sum(row) * wi[1], A, w)
Block Operations
cat(A, B; dims=1) # Vertical concatenation
cat(A, B; dims=2) # Horizontal concatenation
vcat(A, B) # Vertical concatenation
hcat(A, B) # Horizontal concatenation
Norms
norm(A) # Frobenius norm
norm(A, p) # General p-norm
opnorm(A, 1) # Maximum absolute column sum
opnorm(A, Inf) # Maximum absolute row sum
Vector Operations
Arithmetic
u + v # Addition (auto-aligns partitions)
u - v # Subtraction
-v # Negation
a * v # Scalar multiplication
v * a # Scalar multiplication
v / a # Scalar division
Transpose and Adjoint
transpose(v) # Lazy transpose (row vector)
conj(v) # Conjugate
v' # Adjoint
Norms
norm(v) # 2-norm (default)
norm(v, 1) # 1-norm
norm(v, Inf) # Infinity norm
norm(v, p) # General p-norm
Reductions
sum(v) # Sum of elements
prod(v) # Product of elements
maximum(v) # Maximum element
minimum(v) # Minimum element
mean(v) # Mean of elements
Element-wise Operations
abs(v) # Absolute value
abs2(v) # Squared absolute value
real(v) # Real part
imag(v) # Imaginary part
copy(v) # Deep copy
Broadcasting
VectorMPI supports broadcasting for element-wise operations:
v .+ w # Element-wise addition
v .* w # Element-wise multiplication
sin.(v) # Apply function element-wise
v .* 2.0 .+ w # Compound expressions
Block Operations
vcat(u, v, w) # Concatenate vectors (returns VectorMPI)
hcat(u, v, w) # Stack as columns (returns MatrixMPI)
Properties
length(v) # Global length
size(v) # Returns (length,)
eltype(v) # Element type
Indexing
All distributed types support element access and assignment. These are collective operations - all MPI ranks must call them with the same arguments.
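For example, a single-element read or write is issued on every rank with identical arguments (sketch, assuming the vector is constructed on all ranks):
v = VectorMPI([10.0, 20.0, 30.0, 40.0])
x = v[3]       # every rank calls this together; every rank receives 30.0
v[2] = -5.0    # likewise collective; the owning rank's local storage is updated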
VectorMPI Indexing
v[i] # Get element (collective)
v[i] = x # Set element (collective)
v[1:10] # Range indexing (returns VectorMPI)
v[1:10] = x # Range assignment (scalar or vector)
v[idx] # VectorMPI{Int} indexing (returns VectorMPI)
v[idx] = src # VectorMPI{Int} assignment (src::VectorMPI)
MatrixMPI Indexing
# Single element
A[i, j] # Get element
A[i, j] = x # Set element
# Range indexing (returns MatrixMPI)
A[1:3, 2:5] # Submatrix by ranges
A[1:3, :] # Row range, all columns
A[:, 2:5] # All rows, column range
# VectorMPI indices (returns MatrixMPI)
A[row_idx, col_idx] # Both indices are VectorMPI{Int}
# Mixed indexing (returns MatrixMPI or VectorMPI)
A[row_idx, 1:5] # VectorMPI rows, range columns
A[row_idx, :] # VectorMPI rows, all columns
A[1:5, col_idx] # Range rows, VectorMPI columns
A[:, col_idx] # All rows, VectorMPI columns
A[row_idx, j] # VectorMPI rows, single column (returns VectorMPI)
A[i, col_idx] # Single row, VectorMPI columns (returns VectorMPI)
SparseMatrixMPI Indexing
# Single element
A[i, j] # Get element (returns 0 for structural zeros)
A[i, j] = x # Set element (modifies structure if needed)
# Range indexing (returns SparseMatrixMPI)
A[1:3, 2:5] # Submatrix by ranges
A[1:3, :] # Row range, all columns
A[:, 2:5] # All rows, column range
# VectorMPI indices (returns SparseMatrixMPI)
A[row_idx, col_idx] # Both indices are VectorMPI{Int}
# Mixed indexing (returns SparseMatrixMPI or VectorMPI)
A[row_idx, 1:5] # VectorMPI rows, range columns
A[row_idx, :] # VectorMPI rows, all columns
A[1:5, col_idx] # Range rows, VectorMPI columns
A[:, col_idx] # All rows, VectorMPI columns
A[row_idx, j] # VectorMPI rows, single column (returns VectorMPI)
A[i, col_idx] # Single row, VectorMPI columns (returns VectorMPI)
setindex! Source Types
For setindex! operations, the source type depends on the indexing pattern:
| Index Pattern | Source Type |
|---|---|
| A[i, j] = x | Scalar |
| A[range, range] = x | Scalar, Matrix, or distributed matrix |
| A[VectorMPI, VectorMPI] = src | MatrixMPI (matching partitions) |
| A[VectorMPI, range] = src | MatrixMPI |
| A[range, VectorMPI] = src | MatrixMPI |
| A[VectorMPI, j] = src | VectorMPI |
| A[i, VectorMPI] = src | VectorMPI |
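A sketch of a few of these patterns (collective, with the same arguments on every rank):
A = MatrixMPI(zeros(4, 4))
A[1, 1] = 1.0                        # scalar source
A[2:3, 2:3] = [1.0 2.0; 3.0 4.0]     # Matrix source for a range/range pattern
A[1:4, 4:4] = 7.0                    # scalar broadcast over a range block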
Utility Functions
Partition Computation
LinearAlgebraMPI.uniform_partition — Function
uniform_partition(n::Int, nranks::Int) -> Vector{Int}
Compute a balanced partition of n elements across nranks ranks. Returns a vector of length nranks + 1 with 1-indexed partition boundaries.
The first mod(n, nranks) ranks get div(n, nranks) + 1 elements, the remaining ranks get div(n, nranks) elements.
Example
partition = uniform_partition(10, 4) # [1, 4, 7, 9, 11]
# Rank 0: 1:3 (3 elements)
# Rank 1: 4:6 (3 elements)
# Rank 2: 7:8 (2 elements)
# Rank 3: 9:10 (2 elements)
Repartitioning
Redistribute distributed data to a new partition. Provides plan caching for repeated operations and a fast path when partitions already match (no communication needed).
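For instance, repartitioning onto a vector's existing partition hits the fast path and moves no data (sketch, using the partition field documented above):
v = VectorMPI(collect(1.0:8.0))
w = repartition(v, v.partition)   # partitions already match: no communication needed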
LinearAlgebraMPI.repartition — Function
repartition(x::VectorMPI{T}, p::Vector{Int}) where T
Redistribute a VectorMPI to a new partition p.
The partition p must be a valid partition vector of length nranks + 1 with p[1] == 1 and p[end] == length(x) + 1.
Returns a new VectorMPI with the same data but partition == p.
Example
v = VectorMPI([1.0, 2.0, 3.0, 4.0]) # uniform partition
new_partition = [1, 2, 5] # rank 0 gets 1 element, rank 1 gets 3
v_repart = repartition(v, new_partition)
repartition(A::MatrixMPI{T}, p::Vector{Int}) where T
Redistribute a MatrixMPI to a new row partition p. The col_partition remains unchanged.
The partition p must be a valid partition vector of length nranks + 1 with p[1] == 1 and p[end] == size(A, 1) + 1.
Returns a new MatrixMPI with the same data but row_partition == p.
Example
A = MatrixMPI(randn(6, 4)) # uniform partition
new_partition = [1, 2, 4, 5, 7] # 1, 2, 1, 2 rows per rank
A_repart = repartition(A, new_partition)
repartition(A::SparseMatrixMPI{T}, p::Vector{Int}) where T
Redistribute a SparseMatrixMPI to a new row partition p. The col_partition remains unchanged.
The partition p must be a valid partition vector of length nranks + 1 with p[1] == 1 and p[end] == size(A, 1) + 1.
Returns a new SparseMatrixMPI with the same data but row_partition == p.
Example
A = SparseMatrixMPI{Float64}(sprand(6, 4, 0.5)) # uniform partition
new_partition = [1, 2, 4, 5, 7] # 1, 2, 1, 2 rows per rank
A_repart = repartition(A, new_partition)
Rank-Selective Output
LinearAlgebraMPI.io0 — Function
io0(io=stdout; r::Set{Int}=Set{Int}([0]), dn=devnull)
Return io if the current MPI rank is in set r, otherwise return dn (default: devnull).
This is useful for printing only from specific ranks:
println(io0(), "Hello from rank 0!")
println(io0(r=Set([0,1])), "Hello from ranks 0 and 1!")
With string interpolation:
println(io0(), "Matrix A = $A")Gathering Distributed Data
Convert distributed MPI types to standard Julia types (gathers data to all ranks):
Vector(v::VectorMPI) # Gather to Vector
Matrix(A::MatrixMPI) # Gather to Matrix
SparseMatrixCSC(A::SparseMatrixMPI) # Gather to SparseMatrixCSC
These conversions enable show and string interpolation:
println(io0(), "Result: $v") # Works with VectorMPI
println(io0(), "Matrix: $A") # Works with MatrixMPI/SparseMatrixMPILocal Constructors
Create distributed types from local data (each rank provides only its portion):
LinearAlgebraMPI.VectorMPI_local — Function
VectorMPI_local(v_local::Vector{T}, comm::MPI.Comm=MPI.COMM_WORLD) where T
Create a VectorMPI from a local vector on each rank.
Unlike VectorMPI(v_global) which takes a global vector and partitions it, this constructor takes only the local portion of the vector that each rank owns. The partition is computed by gathering the local sizes from all ranks.
Example
# Rank 0 has [1.0, 2.0], Rank 1 has [3.0, 4.0, 5.0]
v = VectorMPI_local([1.0, 2.0]) # on rank 0
v = VectorMPI_local([3.0, 4.0, 5.0]) # on rank 1
# Result: distributed vector [1.0, 2.0, 3.0, 4.0, 5.0] with partition [1, 3, 6]
LinearAlgebraMPI.MatrixMPI_local — Function
MatrixMPI_local(A_local::Matrix{T}; comm=MPI.COMM_WORLD, col_partition=...) where T
Create a MatrixMPI from a local matrix on each rank.
Unlike MatrixMPI(M_global) which takes a global matrix and partitions it, this constructor takes only the local rows of the matrix that each rank owns. The row partition is computed by gathering the local row counts from all ranks.
All ranks must have local matrices with the same number of columns. A collective error is raised if the column counts don't match.
Keyword Arguments
- comm::MPI.Comm: MPI communicator (default: MPI.COMM_WORLD)
- col_partition::Vector{Int}: Column partition boundaries (default: uniform_partition(size(A_local, 2), nranks))
Example
# Rank 0 has 2×3 matrix, Rank 1 has 3×3 matrix
A = MatrixMPI_local(randn(2, 3)) # on rank 0
A = MatrixMPI_local(randn(3, 3)) # on rank 1
# Result: 5×3 distributed matrix with row_partition [1, 3, 6]
LinearAlgebraMPI.SparseMatrixMPI_local — Function
SparseMatrixMPI_local(A_local::SparseMatrixCSR{T,Int}; comm=MPI.COMM_WORLD, col_partition=...) where T
SparseMatrixMPI_local(A_local::Adjoint{T,SparseMatrixCSC{T,Int}}; comm=MPI.COMM_WORLD, col_partition=...) where T
Create a SparseMatrixMPI from a local sparse matrix on each rank.
Unlike SparseMatrixMPI{T}(A_global) which takes a global matrix and partitions it, this constructor takes only the local rows of the matrix that each rank owns. The row partition is computed by gathering the local row counts from all ranks.
The input A_local must be a SparseMatrixCSR{T,Int} (or Adjoint of SparseMatrixCSC{T,Int}) where:
- A_local.parent.n = number of local rows on this rank
- A_local.parent.m = global number of columns (must match on all ranks)
- A_local.parent.rowval = global column indices
All ranks must have local matrices with the same number of columns (block widths must match). A collective error is raised if the column counts don't match.
Note: For Adjoint inputs, the values are conjugated to match the adjoint semantics.
Keyword Arguments
- comm::MPI.Comm: MPI communicator (default: MPI.COMM_WORLD)
- col_partition::Vector{Int}: Column partition boundaries (default: uniform_partition(A_local.parent.m, nranks))
Example
# Create local rows in CSR format
# Rank 0 owns rows 1-2 of a 5×3 matrix, Rank 1 owns rows 3-5
local_csc = sparse([1, 1, 2], [1, 2, 3], [1.0, 2.0, 3.0], 2, 3) # 2 local rows, 3 cols
A = SparseMatrixMPI_local(SparseMatrixCSR(local_csc))
Factorization
LinearAlgebraMPI provides distributed sparse direct solvers using MUMPS (MUltifrontal Massively Parallel Solver).
LU Factorization
F = lu(A::SparseMatrixMPI{T})
Compute LU factorization of a distributed sparse matrix using MUMPS. Suitable for general (non-symmetric) matrices.
LDLT Factorization
F = ldlt(A::SparseMatrixMPI{T})
Compute LDLT factorization of a distributed symmetric sparse matrix using MUMPS. More efficient than LU for symmetric matrices.
Note: Uses transpose (L^T), not adjoint (L*). Correct for real symmetric and complex symmetric matrices, but NOT for complex Hermitian matrices.
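In practice that means a complex Hermitian system should be factored with lu instead. A hedged sketch, assuming the constructors accept complex element types the same way they accept Float64:
using SparseArrays, LinearAlgebra
H_global = sparse([2.0+0im 1im; -1im 3.0+0im])  # Hermitian: H' == H, but transpose(H) != H
H = SparseMatrixMPI{ComplexF64}(H_global)
b = VectorMPI(ComplexF64[1.0, 2.0])
F = lu(H)      # LU is safe here; ldlt would factor with transpose(L) and give wrong results
x = F \ b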
Solving Linear Systems
LinearAlgebraMPI.solve — Function
solve(F::MUMPSFactorizationMPI{T}, b::VectorMPI{T}) where T
Solve the linear system A*x = b using the precomputed MUMPS factorization.
LinearAlgebraMPI.solve! — Function
solve!(x::VectorMPI{T}, F::MUMPSFactorizationMPI{T}, b::VectorMPI{T}) where T
Solve A*x = b in-place using MUMPS factorization.
Resource Management
LinearAlgebraMPI.finalize! — Function
finalize!(F::MUMPSFactorizationMPI)Release MUMPS resources. Must be called on all ranks together.
Note: If the MUMPS object is shared with the analysis cache (owns_mumps=false), this only removes the factorization from the registry. The MUMPS object itself is finalized when `clear_mumps_analysis_cache!()` is called.
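A typical lifecycle, sketched with a trivial identity system purely to show the call order:
using SparseArrays, LinearAlgebra
A = SparseMatrixMPI{Float64}(sparse(1.0I, 4, 4))
b = VectorMPI(ones(4))
F = lu(A)            # factorize once
x = solve(F, b)      # reuse F for as many right-hand sides as needed
finalize!(F)         # then release MUMPS resources (collective: all ranks together)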
Usage Examples
using LinearAlgebraMPI
using LinearAlgebra
using SparseArrays
# Create a distributed sparse matrix (the same global matrix must be passed on every rank)
using Random; Random.seed!(1234) # keep the random matrix identical across ranks
A_global = sprand(1000, 1000, 0.01) + 10I
A_global = A_global + A_global' # Make symmetric
A = SparseMatrixMPI{Float64}(A_global)
# LDLT factorization (for symmetric matrices)
F = ldlt(A)
# Solve Ax = b
b = VectorMPI(ones(1000))
x = solve(F, b)
# Or use backslash
x = F \ b
# For non-symmetric matrices, use LU
A_nonsym = SparseMatrixMPI{Float64}(sprand(1000, 1000, 0.01) + 10I)
F_lu = lu(A_nonsym)
x = F_lu \ b
Direct Solve Syntax
Both left division (\) and right division (/) are supported:
# Left division: solve A*x = b
x = A \ b
x = transpose(A) \ b # solve transpose(A)*x = b
x = A' \ b # solve A'*x = b
# Right division: solve x*A = b (for row vectors)
x = transpose(b) / A # solve x*A = transpose(b)
x = transpose(b) / transpose(A) # solve x*transpose(A) = transpose(b)
Cache Management
LinearAlgebraMPI.clear_plan_cache! — Function
clear_plan_cache!()
Clear all memoized plan caches, including the MUMPS analysis cache. This is a collective operation that must be called on all MPI ranks together.
LinearAlgebraMPI.clear_mumps_analysis_cache! — Function
clear_mumps_analysis_cache!()
Clear the MUMPS analysis cache. This is a collective operation that must be called on all MPI ranks together.
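Both are collective; a minimal usage sketch:
clear_mumps_analysis_cache!()   # all ranks together: drop cached MUMPS analyses
clear_plan_cache!()             # all ranks together: drop every memoized plan, including the analysis cache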
Full API Index
- LinearAlgebraMPI.MatrixMPI
- LinearAlgebraMPI.SparseMatrixCSR
- LinearAlgebraMPI.SparseMatrixMPI
- LinearAlgebraMPI.VectorMPI
- LinearAlgebraMPI.:⊛
- LinearAlgebraMPI.MatrixMPI_local
- LinearAlgebraMPI.SparseMatrixMPI_local
- LinearAlgebraMPI.VectorMPI_local
- LinearAlgebraMPI.clear_mumps_analysis_cache!
- LinearAlgebraMPI.clear_plan_cache!
- LinearAlgebraMPI.finalize!
- LinearAlgebraMPI.io0
- LinearAlgebraMPI.map_rows
- LinearAlgebraMPI.mean
- LinearAlgebraMPI.repartition
- LinearAlgebraMPI.solve
- LinearAlgebraMPI.solve!
- LinearAlgebraMPI.uniform_partition