API Reference

This page documents the public API of LinearAlgebraMPI.jl.

Types

SparseMatrixMPI

LinearAlgebraMPI.SparseMatrixMPI (Type)
SparseMatrixMPI{T}

A distributed sparse matrix partitioned by rows across MPI ranks.

Fields

  • structural_hash::Blake3Hash: 256-bit Blake3 hash of the structural pattern
  • row_partition::Vector{Int}: Row partition boundaries, length = nranks + 1
  • col_partition::Vector{Int}: Column partition boundaries, length = nranks + 1 (placeholder for transpose)
  • col_indices::Vector{Int}: Global column indices that appear in the local part (local→global mapping)
  • A::SparseMatrixCSR{T,Int}: Local rows in CSR format for efficient row-wise iteration
  • cached_transpose: Cached materialized transpose (bidirectionally linked)

Invariants

  • col_indices, row_partition, and col_partition are sorted
  • row_partition[nranks+1] = total number of rows + 1
  • col_partition[nranks+1] = total number of columns + 1
  • size(A, 1) == row_partition[rank+2] - row_partition[rank+1] (number of local rows, with rank the 0-indexed MPI rank)
  • size(A.parent, 1) == length(col_indices) (compressed column dimension)
  • A.parent.rowval contains local indices in 1:length(col_indices)

Storage Details

The local rows are stored in CSR format (Compressed Sparse Row), which enables efficient row-wise iteration, essential for a row-partitioned distributed matrix.

SparseMatrixCSR{T,Ti} is a type alias, exported by LinearAlgebraMPI, for Transpose{T, SparseMatrixCSC{T,Ti}}. This type has a dual interpretation:

  • Semantic view: A lazy transpose of a CSC matrix
  • Storage view: Row-major (CSR) access to the data

The underlying A.parent::SparseMatrixCSC stores the transposed data with:

  • A.parent.m = length(col_indices) (compressed, not global ncols)
  • A.parent.n = number of local rows (columns in the parent = rows in CSR)
  • A.parent.colptr = row pointers for the CSR format
  • A.parent.rowval = LOCAL column indices (1:length(col_indices))
  • col_indices[local_idx] maps local→global column indices

This compression avoids "hypersparse" storage where the column dimension would be the global number of columns even if only a few columns have nonzeros locally.

Access the underlying CSC storage via A.parent when needed for low-level operations.
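To make the storage layout concrete, here is a minimal sketch (assuming an initialized MPI session and only the documented fields above) that walks the local rows and recovers global row and column indices:

using MPI, LinearAlgebraMPI, SparseArrays
MPI.Initialized() || MPI.Init()
rank = MPI.Comm_rank(MPI.COMM_WORLD)

# Same global matrix on every rank; the constructor keeps only this rank's rows.
A = SparseMatrixMPI{Float64}(sparse([1, 2, 2], [1, 1, 2], [1.0, 2.0, 3.0], 2, 2))

csc = A.A.parent                       # CSC holding the transposed local block
first_row = A.row_partition[rank+1]    # first global row owned by this rank
for i in 1:size(A.A, 1)                # loop over local rows
    grow = first_row + i - 1           # global row index
    for k in csc.colptr[i]:(csc.colptr[i+1] - 1)
        gcol = A.col_indices[csc.rowval[k]]   # local -> global column index
        val  = csc.nzval[k]
        # (grow, gcol, val) is one stored entry of the global matrix
    end
end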

MatrixMPI

LinearAlgebraMPI.MatrixMPI (Type)
MatrixMPI{T}

A distributed dense matrix partitioned by rows across MPI ranks.

Fields

  • structural_hash::Blake3Hash: 256-bit Blake3 hash of the structural pattern
  • row_partition::Vector{Int}: Row partition boundaries, length = nranks + 1
  • col_partition::Vector{Int}: Column partition boundaries, length = nranks + 1 (for transpose)
  • A::Matrix{T}: Local rows (NOT transposed), size = (local_nrows, ncols)

Invariants

  • row_partition and col_partition are sorted
  • row_partition[nranks+1] = total number of rows + 1
  • col_partition[nranks+1] = total number of columns + 1
  • size(A, 1) == row_partition[rank+2] - row_partition[rank+1]
  • size(A, 2) == col_partition[end] - 1
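A small sketch of these invariants (assuming the same global matrix is constructed on every rank; rank is the 0-indexed MPI rank):

using MPI, LinearAlgebraMPI
MPI.Initialized() || MPI.Init()
rank = MPI.Comm_rank(MPI.COMM_WORLD)

A_global = Float64.([10i + j for i in 1:6, j in 1:4])   # identical on all ranks
A = MatrixMPI(A_global)

@assert size(A.A, 1) == A.row_partition[rank+2] - A.row_partition[rank+1]  # local rows
@assert size(A.A, 2) == A.col_partition[end] - 1                           # all 4 columns stored locally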

VectorMPI

LinearAlgebraMPI.VectorMPI (Type)
VectorMPI{T}

A distributed dense vector partitioned across MPI ranks.

Fields

  • structural_hash::Blake3Hash: 256-bit Blake3 hash of the partition
  • partition::Vector{Int}: Partition boundaries, length = nranks + 1
  • v::Vector{T}: Local vector elements owned by this rank
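For example, a minimal sketch showing that the local block v.v holds the global entries partition[rank+1] through partition[rank+2] - 1 (rank is the 0-indexed MPI rank):

using MPI, LinearAlgebraMPI
MPI.Initialized() || MPI.Init()
rank = MPI.Comm_rank(MPI.COMM_WORLD)

v = VectorMPI(collect(1.0:10.0))   # same global vector on every rank
lo = v.partition[rank+1]           # first global index owned by this rank
hi = v.partition[rank+2] - 1       # last global index owned by this rank
@assert v.v == collect(Float64, lo:hi)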

SparseMatrixCSR

LinearAlgebraMPI.SparseMatrixCSR (Type)
SparseMatrixCSR{Tv,Ti} = Transpose{Tv, SparseMatrixCSC{Tv,Ti}}

Type alias for CSR (Compressed Sparse Row) storage format.

The Dual Life of Transpose{SparseMatrixCSC}

In Julia, the type Transpose{Tv, SparseMatrixCSC{Tv,Ti}} has two interpretations:

  1. Semantic interpretation: A lazy transpose wrapper around a CSC matrix. When you call transpose(A) on a SparseMatrixCSC, you get this wrapper that represents A^T without copying data.

  2. Storage interpretation: CSR (row-major) access to sparse data. The underlying CSC stores columns contiguously, but through the transpose wrapper, we can iterate efficiently over rows instead of columns.

This alias clarifies intent: use SparseMatrixCSR when you want row-major storage semantics, and transpose(A) when you want the mathematical transpose.

CSR vs CSC Storage

  • CSC (Compressed Sparse Column): Julia's native sparse format. Efficient for column-wise operations, matrix-vector products with column access.
  • CSR (Compressed Sparse Row): Efficient for row-wise operations, matrix-vector products with row access, and row-partitioned distributed matrices.

For SparseMatrixCSR, the underlying parent::SparseMatrixCSC stores the transposed matrix. If B = SparseMatrixCSR(A) represents matrix M, then B.parent is a CSC storing M^T. This means:

  • B.parent.colptr acts as row pointers for M
  • B.parent.rowval contains column indices for M
  • B.parent.nzval contains values in row-major order
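The following sketch, relying only on the field layout described above, reads the nonzeros of a single row of M through B.parent; the values in the comments correspond to the small example matrix built here:

using SparseArrays, LinearAlgebraMPI

M_csc = sparse([1, 2, 2], [1, 1, 2], [1.0, 2.0, 3.0], 2, 2)
B = SparseMatrixCSR(M_csc)          # same matrix M, row-major storage

i = 2                               # read row i of M using the row pointers
rng  = B.parent.colptr[i]:(B.parent.colptr[i+1] - 1)
cols = B.parent.rowval[rng]         # column indices of row i: [1, 2]
vals = B.parent.nzval[rng]          # values of row i: [2.0, 3.0]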

Usage Note

Julia will still display this type as Transpose{Float64, SparseMatrixCSC{...}}, not as SparseMatrixCSR. The alias improves code clarity but doesn't affect type printing.

CSR Storage Format

LinearAlgebraMPI uses CSR (Compressed Sparse Row) format internally for SparseMatrixMPI because row-partitioned distributed matrices need efficient row-wise access.

The Dual Life of Transpose{SparseMatrixCSC}

In Julia, the type Transpose{T, SparseMatrixCSC{T,Int}} has two interpretations:

  1. Semantic: A lazy transpose of a CSC matrix (what you get from transpose(A))
  2. Storage: Row-major (CSR) access to sparse data

This duality can be confusing. When you call transpose(A) on a SparseMatrixCSC, you get a wrapper that represents A^T. But the same wrapper type, when used for storage, provides efficient row iteration.

The SparseMatrixCSR Type Alias

To clarify intent, LinearAlgebraMPI exports:

const SparseMatrixCSR{Tv,Ti} = Transpose{Tv, SparseMatrixCSC{Tv,Ti}}

Use SparseMatrixCSR when you want row-major storage, and transpose(A) when you want the mathematical transpose.

Converting Between CSC and CSR

using SparseArrays, LinearAlgebraMPI

# CSC to CSR (same matrix, different storage)
A_csc = sparse([1,2,2], [1,1,2], [1.0, 2.0, 3.0], 2, 2)
A_csr = SparseMatrixCSR(A_csc)
A_csr[1,1] == A_csc[1,1]  # true

# CSR to CSC
A_back = SparseMatrixCSC(A_csr)
A_back == A_csc  # true

Why CSR for Distributed Matrices?

SparseMatrixMPI partitions matrices by rows across MPI ranks. Each rank needs to efficiently iterate over its local rows for operations like matrix-vector multiplication. CSR format provides O(1) access to each row's nonzeros, while CSC would require scanning the entire column pointer array.

Sparse Matrix Operations

Arithmetic

A * B          # Matrix multiplication
A + B          # Addition
A - B          # Subtraction
a * A          # Scalar multiplication
A * a          # Scalar multiplication

Threaded Sparse Multiplication

LinearAlgebraMPI.:⊛ (Function)
⊛(A::SparseMatrixCSC{Tv,Ti}, B::SparseMatrixCSC{Tv,Ti}; max_threads=Threads.nthreads()) where {Tv,Ti}

Multithreaded sparse matrix multiplication. Splits B into column blocks and computes A * B_block in parallel using Julia's optimized builtin *.

Threading behavior

  • Uses at most n ÷ 100 threads, where n = size(B, 2), ensuring at least 100 columns per thread
  • Falls back to single-threaded A * B when n < 100 or when threading overhead would dominate
  • The max_threads keyword limits the maximum number of threads used

Examples

using SparseArrays
A = sprand(1000, 1000, 0.01)
B = sprand(1000, 500, 0.01)
C = A ⊛ B                    # Use all available threads (up to n÷100)
C = ⊛(A, B; max_threads=2)   # Limit to 2 threads

The ⊛ operator (typed as \circledast<tab>) provides multithreaded sparse matrix multiplication for SparseMatrixCSC. It is used internally by SparseMatrixMPI multiplication and can also be used directly on local sparse matrices:

using SparseArrays
A = sprand(10000, 10000, 0.001)
B = sprand(10000, 5000, 0.001)
C = A ⊛ B   # Parallel multiplication using available threads

Transpose and Adjoint

transpose(A)              # Lazy transpose
conj(A)                   # Conjugate (new matrix)
A'                        # Adjoint (conjugate transpose, lazy)
SparseMatrixMPI(transpose(A))  # Materialize lazy transpose (cached)

Matrix-Vector Multiplication

y = A * x      # Returns VectorMPI
mul!(y, A, x)  # In-place version
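A minimal sketch (run under mpiexec; the same global data must be constructed on every rank):

using MPI, LinearAlgebraMPI, LinearAlgebra, SparseArrays
MPI.Initialized() || MPI.Init()

A = SparseMatrixMPI{Float64}(sparse(2.0I, 4, 4))   # same global matrix on every rank
x = VectorMPI(ones(4))
y = A * x          # allocates the result VectorMPI
mul!(y, A, x)      # reuses y in place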

Vector-Matrix Multiplication

transpose(v) * A   # Row vector times matrix
v' * A             # Conjugate row vector times matrix

Norms

norm(A)        # Frobenius norm (default)
norm(A, 1)     # Sum of absolute values
norm(A, Inf)   # Maximum absolute value
norm(A, p)     # General p-norm

opnorm(A, 1)   # Maximum absolute column sum
opnorm(A, Inf) # Maximum absolute row sum

Properties

size(A)        # Global dimensions (m, n)
size(A, d)     # Size along dimension d
eltype(A)      # Element type
nnz(A)         # Number of nonzeros
issparse(A)    # Returns true

Reductions

sum(A)         # Sum of all stored elements
sum(A; dims=1) # Column sums (returns VectorMPI) - SparseMatrixMPI only
sum(A; dims=2) # Row sums (returns VectorMPI) - SparseMatrixMPI only
maximum(A)     # Maximum of stored values
minimum(A)     # Minimum of stored values
tr(A)          # Trace (sum of diagonal) - SparseMatrixMPI only

LinearAlgebraMPI.mean (Function)
mean(v::VectorMPI{T}) where T

Compute the mean of all elements in the distributed vector.

mean(A::SparseMatrixMPI{T}) where T

Compute the mean of all elements (including implicit zeros) in the distributed sparse matrix.
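A short sketch of these reductions, assuming identical global data on every rank (mean is used as documented above; depending on how it is exported it may also be available via Statistics). Scalar reductions return the same value on all ranks:

using MPI, LinearAlgebraMPI, LinearAlgebra, SparseArrays
MPI.Initialized() || MPI.Init()

v = VectorMPI([1.0, 2.0, 3.0, 4.0])
sum(v)                     # 10.0
mean(v)                    # 2.5

A = SparseMatrixMPI{Float64}(sparse([1, 2], [1, 2], [3.0, 5.0], 2, 2))
tr(A)                      # 8.0
rowsums = sum(A; dims=2)   # VectorMPI of row sums: [3.0, 5.0]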

Element-wise Operations

abs(A)         # Absolute value
abs2(A)        # Squared absolute value
real(A)        # Real part
imag(A)        # Imaginary part
floor(A)       # Floor
ceil(A)        # Ceiling
round(A)       # Round

Utilities

copy(A)        # Deep copy
dropzeros(A)   # Remove stored zeros
diag(A)        # Main diagonal (returns VectorMPI)
diag(A, k)     # k-th diagonal
triu(A)        # Upper triangular
triu(A, k)     # Upper triangular from k-th diagonal
tril(A)        # Lower triangular
tril(A, k)     # Lower triangular from k-th diagonal

Block Operations

cat(A, B, C; dims=1)       # Vertical concatenation
cat(A, B, C; dims=2)       # Horizontal concatenation
cat(A, B, C, D; dims=(2,2)) # 2x2 block matrix [A B; C D]
vcat(A, B, C)              # Vertical concatenation
hcat(A, B, C)              # Horizontal concatenation
blockdiag(A, B, C)         # Block diagonal matrix
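A sketch of the concatenation helpers, using deterministic inputs so every rank builds the same global matrices:

using MPI, LinearAlgebraMPI, LinearAlgebra, SparseArrays
MPI.Initialized() || MPI.Init()

A = SparseMatrixMPI{Float64}(sparse(1.0I, 4, 4))
B = SparseMatrixMPI{Float64}(sparse(2.0I, 4, 4))

V = vcat(A, B)        # 8×4 distributed sparse matrix
H = hcat(A, B)        # 4×8 distributed sparse matrix
D = blockdiag(A, B)   # 8×8 block-diagonal distributed sparse matrix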

Diagonal Matrix Construction

spdiagm(v)                 # Diagonal matrix from VectorMPI
spdiagm(m, n, v)           # m x n diagonal matrix
spdiagm(k => v)            # k-th diagonal
spdiagm(0 => v, 1 => w)    # Multiple diagonals
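For example, a sketch that builds a small bidiagonal operator from distributed diagonals (assuming the same diagonal-length conventions as SparseArrays.spdiagm):

using MPI, LinearAlgebraMPI, SparseArrays
MPI.Initialized() || MPI.Init()

d = VectorMPI([2.0, 2.0, 2.0, 2.0])   # main diagonal (length n)
u = VectorMPI([-1.0, -1.0, -1.0])     # first superdiagonal (length n - 1)
T = spdiagm(0 => d, 1 => u)           # 4×4 distributed sparse matrix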

Dense Matrix Operations

Arithmetic

A * x          # Matrix-vector multiplication (returns VectorMPI)
transpose(A)   # Lazy transpose
conj(A)        # Conjugate
A'             # Adjoint
a * A          # Scalar multiplication

mapslices

Apply a function to rows or columns of a distributed dense matrix.

mapslices(f, A; dims=2)   # Apply f to each row (local, no MPI)
mapslices(f, A; dims=1)   # Apply f to each column (requires MPI)

Example:

using LinearAlgebra

# Create deterministic test matrix (same on all ranks)
A_global = Float64.([i + 0.1*j for i in 1:100, j in 1:10])
A = MatrixMPI(A_global)

# Compute row statistics: norm, max, sum for each row
# Transforms 100x10 to 100x3
B = mapslices(x -> [norm(x), maximum(x), sum(x)], A; dims=2)

map_rows

LinearAlgebraMPI.map_rows (Function)
map_rows(f, A...)

Apply function f to corresponding rows of distributed vectors/matrices.

Each argument in A... must be either a VectorMPI or MatrixMPI. All inputs are repartitioned to match the partition of the first argument before applying f.

For each row index i, f is called with the i-th row from each input:

  • For VectorMPI, the i-th "row" is a length-1 view of element i
  • For MatrixMPI, the i-th row is a row vector (a view into the local matrix)

Result Type (vcat semantics)

The result type depends on what f returns, matching the behavior of vcat:

f returns       Julia type            Result
scalar          Number                VectorMPI (one element per input row)
column vector   AbstractVector        VectorMPI (vcat concatenates all vectors)
row vector      Transpose, Adjoint    MatrixMPI (vcat stacks as rows)
matrix          AbstractMatrix        MatrixMPI (vcat stacks rows)

Lazy Wrappers

Julia's transpose(v) and v' (adjoint) return lazy wrappers that are subtypes of AbstractMatrix, so they produce MatrixMPI results:

map_rows(r -> [1,2,3], A)           # Vector → VectorMPI (length 3n)
map_rows(r -> [1,2,3]', A)          # Adjoint → MatrixMPI (n×3)
map_rows(r -> transpose([1,2,3]), A) # Transpose → MatrixMPI (n×3)
map_rows(r -> conj([1,2,3]), A)     # Vector → VectorMPI (length 3n)
map_rows(r -> [1 2 3], A)           # Matrix literal → MatrixMPI (n×3)

Examples

# Element-wise product of two vectors
u = VectorMPI([1.0, 2.0, 3.0])
v = VectorMPI([4.0, 5.0, 6.0])
w = map_rows((a, b) -> a[1] * b[1], u, v)  # VectorMPI([4.0, 10.0, 18.0])

# Row norms of a matrix
A = MatrixMPI(randn(5, 3))
norms = map_rows(r -> norm(r), A)  # VectorMPI of row norms

# Expand each row to multiple elements (vcat behavior)
A = MatrixMPI(randn(3, 2))
result = map_rows(r -> [1, 2, 3], A)  # VectorMPI of length 9

# Return row vectors to build a matrix
A = MatrixMPI(randn(3, 2))
result = map_rows(r -> [1, 2, 3]', A)  # 3×3 MatrixMPI

# Variable-length output per row
v = VectorMPI([1.0, 2.0, 3.0])
result = map_rows(r -> ones(Int(r[1])), v)  # VectorMPI of length 6 (1+2+3)

# Mixed inputs: matrix rows weighted by vector elements
A = MatrixMPI(randn(4, 3))
w = VectorMPI([1.0, 2.0, 3.0, 4.0])
result = map_rows((row, wi) -> sum(row) * wi[1], A, w)  # VectorMPI

This is the MPI-distributed version of:

map_rows(f, A...) = vcat((f.((eachrow.(A))...))...)

Apply a function to corresponding rows of multiple distributed vectors/matrices. This is the MPI-distributed version of vcat((f.((eachrow.(A))...))...).

Result type follows vcat semantics:

f returns                                    Result
scalar                                       VectorMPI (one element per row)
column vector [1,2,3]                        VectorMPI (concatenated)
row vector [1,2,3]' or transpose([1,2,3])    MatrixMPI (stacked rows)
matrix [1 2 3]                               MatrixMPI (stacked rows)

Examples:

# Row norms
A = MatrixMPI(randn(100, 10))
norms = map_rows(r -> norm(r), A)  # VectorMPI of length 100

# Expand rows to vectors (vcat behavior)
result = map_rows(r -> [1, 2, 3], A)  # VectorMPI of length 300

# Build matrix from row vectors
result = map_rows(r -> [1, 2, 3]', A)  # 100×3 MatrixMPI

# Multiple inputs with automatic partition alignment
A = MatrixMPI(randn(100, 10))
w = VectorMPI(randn(100))
weighted = map_rows((row, wi) -> sum(row) * wi[1], A, w)

Block Operations

cat(A, B; dims=1)          # Vertical concatenation
cat(A, B; dims=2)          # Horizontal concatenation
vcat(A, B)                 # Vertical concatenation
hcat(A, B)                 # Horizontal concatenation

Norms

norm(A)        # Frobenius norm
norm(A, p)     # General p-norm
opnorm(A, 1)   # Maximum absolute column sum
opnorm(A, Inf) # Maximum absolute row sum

Vector Operations

Arithmetic

u + v          # Addition (auto-aligns partitions)
u - v          # Subtraction
-v             # Negation
a * v          # Scalar multiplication
v * a          # Scalar multiplication
v / a          # Scalar division

Transpose and Adjoint

transpose(v)   # Lazy transpose (row vector)
conj(v)        # Conjugate
v'             # Adjoint

Norms

norm(v)        # 2-norm (default)
norm(v, 1)     # 1-norm
norm(v, Inf)   # Infinity norm
norm(v, p)     # General p-norm

Reductions

sum(v)         # Sum of elements
prod(v)        # Product of elements
maximum(v)     # Maximum element
minimum(v)     # Minimum element
mean(v)        # Mean of elements

Element-wise Operations

abs(v)         # Absolute value
abs2(v)        # Squared absolute value
real(v)        # Real part
imag(v)        # Imaginary part
copy(v)        # Deep copy

Broadcasting

VectorMPI supports broadcasting for element-wise operations:

v .+ w         # Element-wise addition
v .* w         # Element-wise multiplication
sin.(v)        # Apply function element-wise
v .* 2.0 .+ w  # Compound expressions

Block Operations

vcat(u, v, w)  # Concatenate vectors (returns VectorMPI)
hcat(u, v, w)  # Stack as columns (returns MatrixMPI)
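For example, a minimal sketch:

using MPI, LinearAlgebraMPI
MPI.Initialized() || MPI.Init()

u = VectorMPI([1.0, 2.0, 3.0])
w = VectorMPI([4.0, 5.0, 6.0])
vcat(u, w)    # VectorMPI of length 6
hcat(u, w)    # 3×2 MatrixMPI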

Properties

length(v)      # Global length
size(v)        # Returns (length,)
eltype(v)      # Element type

Indexing

All distributed types support element access and assignment. These are collective operations: all MPI ranks must call them with the same arguments.

VectorMPI Indexing

v[i]           # Get element (collective)
v[i] = x       # Set element (collective)
v[1:10]        # Range indexing (returns VectorMPI)
v[1:10] = x    # Range assignment (scalar or vector)
v[idx]         # VectorMPI{Int} indexing (returns VectorMPI)
v[idx] = src   # VectorMPI{Int} assignment (src::VectorMPI)

MatrixMPI Indexing

# Single element
A[i, j]        # Get element
A[i, j] = x    # Set element

# Range indexing (returns MatrixMPI)
A[1:3, 2:5]    # Submatrix by ranges
A[1:3, :]      # Row range, all columns
A[:, 2:5]      # All rows, column range

# VectorMPI indices (returns MatrixMPI)
A[row_idx, col_idx]  # Both indices are VectorMPI{Int}

# Mixed indexing (returns MatrixMPI or VectorMPI)
A[row_idx, 1:5]      # VectorMPI rows, range columns
A[row_idx, :]        # VectorMPI rows, all columns
A[1:5, col_idx]      # Range rows, VectorMPI columns
A[:, col_idx]        # All rows, VectorMPI columns
A[row_idx, j]        # VectorMPI rows, single column (returns VectorMPI)
A[i, col_idx]        # Single row, VectorMPI columns (returns VectorMPI)

SparseMatrixMPI Indexing

# Single element
A[i, j]        # Get element (returns 0 for structural zeros)
A[i, j] = x    # Set element (modifies structure if needed)

# Range indexing (returns SparseMatrixMPI)
A[1:3, 2:5]    # Submatrix by ranges
A[1:3, :]      # Row range, all columns
A[:, 2:5]      # All rows, column range

# VectorMPI indices (returns SparseMatrixMPI)
A[row_idx, col_idx]  # Both indices are VectorMPI{Int}

# Mixed indexing (returns SparseMatrixMPI or VectorMPI)
A[row_idx, 1:5]      # VectorMPI rows, range columns
A[row_idx, :]        # VectorMPI rows, all columns
A[1:5, col_idx]      # Range rows, VectorMPI columns
A[:, col_idx]        # All rows, VectorMPI columns
A[row_idx, j]        # VectorMPI rows, single column (returns VectorMPI)
A[i, col_idx]        # Single row, VectorMPI columns (returns VectorMPI)

setindex! Source Types

For setindex! operations, the source type depends on the indexing pattern:

Index pattern                      Source type
A[i, j] = x                        Scalar
A[range, range] = x                Scalar, Matrix, or distributed matrix
A[VectorMPI, VectorMPI] = src      MatrixMPI (matching partitions)
A[VectorMPI, range] = src          MatrixMPI
A[range, VectorMPI] = src          MatrixMPI
A[VectorMPI, j] = src              VectorMPI
A[i, VectorMPI] = src              VectorMPI
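A sketch of collective indexing; every rank must execute the same calls with the same arguments:

using MPI, LinearAlgebraMPI
MPI.Initialized() || MPI.Init()

v = VectorMPI(collect(1.0:8.0))
x = v[3]                 # collective getindex; same value on every rank
v[1:4] = 0.0             # scalar assignment over a range

A = MatrixMPI(Float64.(reshape(1:20, 5, 4)))
B = A[2:4, 1:2]          # 3×2 MatrixMPI
A[1, 1] = 99.0           # single-element assignment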

Utility Functions

Partition Computation

LinearAlgebraMPI.uniform_partition (Function)
uniform_partition(n::Int, nranks::Int) -> Vector{Int}

Compute a balanced partition of n elements across nranks ranks. Returns a vector of length nranks + 1 with 1-indexed partition boundaries.

The first mod(n, nranks) ranks get div(n, nranks) + 1 elements; the remaining ranks get div(n, nranks) elements.

Example

partition = uniform_partition(10, 4)  # [1, 4, 7, 9, 11]
# Rank 0: 1:3 (3 elements)
# Rank 1: 4:6 (3 elements)
# Rank 2: 7:8 (2 elements)
# Rank 3: 9:10 (2 elements)

Repartitioning

Redistribute distributed data to a new partition. Provides plan caching for repeated operations and a fast path when partitions already match (no communication needed).

LinearAlgebraMPI.repartition (Function)
repartition(x::VectorMPI{T}, p::Vector{Int}) where T

Redistribute a VectorMPI to a new partition p.

The partition p must be a valid partition vector of length nranks + 1 with p[1] == 1 and p[end] == length(x) + 1.

Returns a new VectorMPI with the same data but partition == p.

Example

v = VectorMPI([1.0, 2.0, 3.0, 4.0])  # uniform partition
new_partition = [1, 2, 5]  # rank 0 gets 1 element, rank 1 gets 3
v_repart = repartition(v, new_partition)

repartition(A::MatrixMPI{T}, p::Vector{Int}) where T

Redistribute a MatrixMPI to a new row partition p. The col_partition remains unchanged.

The partition p must be a valid partition vector of length nranks + 1 with p[1] == 1 and p[end] == size(A, 1) + 1.

Returns a new MatrixMPI with the same data but row_partition == p.

Example

A = MatrixMPI(randn(6, 4))  # uniform partition
new_partition = [1, 2, 4, 5, 7]  # 1, 2, 1, 2 rows per rank
A_repart = repartition(A, new_partition)

repartition(A::SparseMatrixMPI{T}, p::Vector{Int}) where T

Redistribute a SparseMatrixMPI to a new row partition p. The col_partition remains unchanged.

The partition p must be a valid partition vector of length nranks + 1 with p[1] == 1 and p[end] == size(A, 1) + 1.

Returns a new SparseMatrixMPI with the same data but row_partition == p.

Example

A = SparseMatrixMPI{Float64}(sprand(6, 4, 0.5))  # uniform partition
new_partition = [1, 2, 4, 5, 7]  # 1, 2, 1, 2 rows per rank
A_repart = repartition(A, new_partition)

Rank-Selective Output

LinearAlgebraMPI.io0 (Function)
io0(io=stdout; r::Set{Int}=Set{Int}([0]), dn=devnull)

Return io if the current MPI rank is in set r, otherwise return dn (default: devnull).

This is useful for printing only from specific ranks:

println(io0(), "Hello from rank 0!")
println(io0(r=Set([0,1])), "Hello from ranks 0 and 1!")

With string interpolation:

println(io0(), "Matrix A = $A")

Gathering Distributed Data

Convert distributed MPI types to standard Julia types (gathers data to all ranks):

Vector(v::VectorMPI)              # Gather to Vector
Matrix(A::MatrixMPI)              # Gather to Matrix
SparseMatrixCSC(A::SparseMatrixMPI) # Gather to SparseMatrixCSC

These conversions enable show and string interpolation:

println(io0(), "Result: $v")      # Works with VectorMPI
println(io0(), "Matrix: $A")      # Works with MatrixMPI/SparseMatrixMPI

Local Constructors

Create distributed types from local data (each rank provides only its portion):

LinearAlgebraMPI.VectorMPI_local (Function)
VectorMPI_local(v_local::Vector{T}, comm::MPI.Comm=MPI.COMM_WORLD) where T

Create a VectorMPI from a local vector on each rank.

Unlike VectorMPI(v_global) which takes a global vector and partitions it, this constructor takes only the local portion of the vector that each rank owns. The partition is computed by gathering the local sizes from all ranks.

Example

# Rank 0 has [1.0, 2.0], Rank 1 has [3.0, 4.0, 5.0]
v = VectorMPI_local([1.0, 2.0])  # on rank 0
v = VectorMPI_local([3.0, 4.0, 5.0])  # on rank 1
# Result: distributed vector [1.0, 2.0, 3.0, 4.0, 5.0] with partition [1, 3, 6]

LinearAlgebraMPI.MatrixMPI_local (Function)
MatrixMPI_local(A_local::Matrix{T}; comm=MPI.COMM_WORLD, col_partition=...) where T

Create a MatrixMPI from a local matrix on each rank.

Unlike MatrixMPI(M_global) which takes a global matrix and partitions it, this constructor takes only the local rows of the matrix that each rank owns. The row partition is computed by gathering the local row counts from all ranks.

All ranks must have local matrices with the same number of columns. A collective error is raised if the column counts don't match.

Keyword Arguments

  • comm::MPI.Comm: MPI communicator (default: MPI.COMM_WORLD)
  • col_partition::Vector{Int}: Column partition boundaries (default: uniform_partition(size(A_local,2), nranks))

Example

# Rank 0 has 2×3 matrix, Rank 1 has 3×3 matrix
A = MatrixMPI_local(randn(2, 3))  # on rank 0
A = MatrixMPI_local(randn(3, 3))  # on rank 1
# Result: 5×3 distributed matrix with row_partition [1, 3, 6]

LinearAlgebraMPI.SparseMatrixMPI_local (Function)
SparseMatrixMPI_local(A_local::SparseMatrixCSR{T,Int}; comm=MPI.COMM_WORLD, col_partition=...) where T
SparseMatrixMPI_local(A_local::Adjoint{T,SparseMatrixCSC{T,Int}}; comm=MPI.COMM_WORLD, col_partition=...) where T

Create a SparseMatrixMPI from a local sparse matrix on each rank.

Unlike SparseMatrixMPI{T}(A_global) which takes a global matrix and partitions it, this constructor takes only the local rows of the matrix that each rank owns. The row partition is computed by gathering the local row counts from all ranks.

The input A_local must be a SparseMatrixCSR{T,Int} (or Adjoint of SparseMatrixCSC{T,Int}) where:

  • A_local.parent.n = number of local rows on this rank
  • A_local.parent.m = global number of columns (must match on all ranks)
  • A_local.parent.rowval = global column indices

All ranks must have local matrices with the same number of columns (block widths must match). A collective error is raised if the column counts don't match.

Note: For Adjoint inputs, the values are conjugated to match the adjoint semantics.

Keyword Arguments

  • comm::MPI.Comm: MPI communicator (default: MPI.COMM_WORLD)
  • col_partition::Vector{Int}: Column partition boundaries (default: uniform_partition(A_local.parent.m, nranks))

Example

# Create local rows in CSR format
# Rank 0 owns rows 1-2 of a 5×3 matrix, Rank 1 owns rows 3-5
local_csc = sparse([1, 1, 2], [1, 2, 3], [1.0, 2.0, 3.0], 2, 3)  # 2 local rows, 3 cols
A = SparseMatrixMPI_local(SparseMatrixCSR(local_csc))

Factorization

LinearAlgebraMPI provides distributed sparse direct solvers using MUMPS (MUltifrontal Massively Parallel Solver).

LU Factorization

F = lu(A::SparseMatrixMPI{T})

Compute LU factorization of a distributed sparse matrix using MUMPS. Suitable for general (non-symmetric) matrices.

LDLT Factorization

F = ldlt(A::SparseMatrixMPI{T})

Compute LDLT factorization of a distributed symmetric sparse matrix using MUMPS. More efficient than LU for symmetric matrices.

Note: Uses transpose (L^T), not adjoint (L*). Correct for real symmetric and complex symmetric matrices, but NOT for complex Hermitian matrices.

Solving Linear Systems

LinearAlgebraMPI.solve (Function)
solve(F::MUMPSFactorizationMPI{T}, b::VectorMPI{T}) where T

Solve the linear system A*x = b using the precomputed MUMPS factorization.

LinearAlgebraMPI.solve! (Function)
solve!(x::VectorMPI{T}, F::MUMPSFactorizationMPI{T}, b::VectorMPI{T}) where T

Solve A*x = b in-place using MUMPS factorization.
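A sketch of the factorization-and-solve lifecycle using solve! and finalize!; preallocating x via copy(b) is an illustrative assumption, not a requirement of the API:

# assumes A::SparseMatrixMPI{Float64} and b::VectorMPI{Float64} as in the usage examples below
F = lu(A)
x = copy(b)          # preallocate the solution vector (illustrative choice of partition)
solve!(x, F, b)      # overwrites x with the solution of A*x = b
finalize!(F)         # collective release of MUMPS resources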

Resource Management

LinearAlgebraMPI.finalize! (Function)
finalize!(F::MUMPSFactorizationMPI)

Release MUMPS resources. Must be called on all ranks together.

Note: If the MUMPS object is shared with the analysis cache (owns_mumps=false), this only removes the factorization from the registry. The MUMPS object itself is finalized when clear_mumps_analysis_cache!() is called.

Usage Examples

using LinearAlgebraMPI
using LinearAlgebra
using SparseArrays
using Random

# Create a distributed sparse matrix from a global matrix
# (the global matrix must be identical on all ranks, so seed the RNG first)
Random.seed!(1234)
A_global = sprand(1000, 1000, 0.01) + 10I
A_global = A_global + A_global'  # Make symmetric
A = SparseMatrixMPI{Float64}(A_global)

# LDLT factorization (for symmetric matrices)
F = ldlt(A)

# Solve Ax = b
b = VectorMPI(ones(1000))
x = solve(F, b)

# Or use backslash
x = F \ b

# For non-symmetric matrices, use LU
A_nonsym = SparseMatrixMPI{Float64}(sprand(1000, 1000, 0.01) + 10I)
F_lu = lu(A_nonsym)
x = F_lu \ b

Direct Solve Syntax

Both left division (\) and right division (/) are supported:

# Left division: solve A*x = b
x = A \ b
x = transpose(A) \ b    # solve transpose(A)*x = b
x = A' \ b              # solve A'*x = b

# Right division: solve x*A = b (for row vectors)
x = transpose(b) / A           # solve x*A = transpose(b)
x = transpose(b) / transpose(A)  # solve x*transpose(A) = transpose(b)

Cache Management

LinearAlgebraMPI.clear_plan_cache! (Function)
clear_plan_cache!()

Clear all memoized plan caches, including the MUMPS analysis cache. This is a collective operation that must be called on all MPI ranks together.

Full API Index