Distributed Reference Management
SafePETSc's core feature is automatic distributed reference management via the SafeMPI module. This ensures that distributed objects are properly cleaned up across all MPI ranks.
The Problem
In MPI-based parallel computing, objects like PETSc vectors and matrices exist on all ranks. Destroying such objects requires collective MPI calls—all ranks must participate. This creates challenges:
- Premature destruction: If one rank destroys an object while others still need it, the program crashes
- Memory leaks: If ranks don't coordinate cleanup, objects leak memory
- Complex coordination: Manual reference counting is error-prone
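To see why coordination matters, here is a minimal MPI.jl demonstration (not SafePETSc code) in which MPI.Barrier stands in for a collective destroy call: if one rank skips the collective, the remaining ranks block forever.
using MPI
# Minimal demonstration that collective calls require every rank to participate.
# MPI.Barrier stands in for a collective destroy; if rank 0 skips it, the other
# ranks block forever. Run with: mpiexec -n 2 julia collective_demo.jl
MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
if rank != 0
    MPI.Barrier(comm)  # ranks 1..n-1 wait here for a call rank 0 never makes
end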
The Solution: DRef
SafePETSc uses DRef{T} (Distributed Reference) to automatically track object lifetimes:
using SafePETSc
# Create a distributed vector (returns a DRef{_Vec{Float64}})
v = Vec_uniform([1.0, 2.0, 3.0])
# Use it normally
y = v .+ 1.0
# When v goes out of scope and is garbage collected,
# SafePETSc coordinates cleanup across all ranks
How It Works
Reference Counting
- Rank 0 as Coordinator: Rank 0 maintains a reference count for each distributed object (a toy sketch of this counting rule follows the list)
- Automatic Release: When a DRef is garbage collected, its finalizer sends a "release" message to rank 0
- Cleanup Points: At check_and_destroy! calls (automatically inserted at object creation), SafePETSc:
  - Triggers garbage collection
  - Processes release messages
  - Identifies objects where all ranks have released their references
  - Broadcasts destruction commands
  - All ranks destroy the object simultaneously
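The counting rule itself is simple. Below is a toy, single-process sketch of what rank 0 tracks (a conceptual illustration, not SafeMPI's internal code; CounterPool and release! are names made up for this example): an object becomes destroyable only once every rank has reported that its local DRef was garbage collected.
# Toy sketch of the rank-0 counting rule (not SafeMPI's actual implementation).
mutable struct CounterPool
    nranks::Int
    counts::Dict{Int,Int}   # object id => number of ranks that have released it
end
CounterPool(nranks) = CounterPool(nranks, Dict{Int,Int}())
# Called on rank 0 for each incoming "release" message; returns true when every
# rank has released `id`, i.e. a collective destroy can be broadcast.
function release!(pool::CounterPool, id::Int)
    c = get(pool.counts, id, 0) + 1
    pool.counts[id] = c
    c < pool.nranks && return false
    delete!(pool.counts, id)   # the id is now free for reuse
    return true
end
pool = CounterPool(4)
release!(pool, 7)   # false: only 1 of 4 ranks has released object 7
release!(pool, 7); release!(pool, 7)
release!(pool, 7)   # true: all 4 ranks released, broadcast destroy for object 7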
Architecture
Rank 0 (Coordinator)          Other Ranks
┌─────────────────┐          ┌─────────────────┐
│ counter_pool    │◄─────────│ Send release    │
│ {id → count}    │ MPI msgs │ messages        │
│                 │          │                 │
│ When count ==   │          │                 │
│ nranks:         │          │                 │
│ Broadcast       │─────────►│ Receive &       │
│ destroy command │          │ destroy object  │
└─────────────────┘          └─────────────────┘
Trait-Based Opt-In
Types must explicitly opt-in to distributed management:
# Define your distributed type
struct MyDistributedObject
    data::Vector{Float64}
    # ... MPI-based fields
end
# Opt-in to distributed management
SafeMPI.destroy_trait(::Type{MyDistributedObject}) = SafeMPI.CanDestroy()
# Implement cleanup
function SafeMPI.destroy_obj!(obj::MyDistributedObject)
    # Perform collective cleanup (e.g., MPI_Free, PETSc destroy)
    # This is called on ALL ranks simultaneously
    cleanup_mpi_resources(obj)
end
# Now you can wrap it
ref = DRef(MyDistributedObject(...))
Controlling Cleanup
Automatic Cleanup
Cleanup happens automatically at object creation:
# Every 10th object creation triggers cleanup
v1 = Vec_uniform([1.0, 2.0]) # May trigger cleanup
v2 = Vec_uniform([3.0, 4.0]) # May trigger cleanup
# ...
Explicit Cleanup
You can manually trigger cleanup:
# Force immediate cleanup
SafeMPI.check_and_destroy!()
# Or with throttling
SafeMPI.check_and_destroy!(max_check_count=10)
Disabling Assertions
For performance in production:
SafeMPI.set_assert(false)  # Disable @mpiassert checks
Best Practices
1. Let Scoping Work for You
function compute_something()
    A = Mat_uniform(...)
    b = Vec_uniform(...)
    x = A \ b
    # A, b, x cleaned up when function exits
    return extract_result(x)
end
2. Avoid Premature Manual Cleanup
# Don't do this:
v = Vec_uniform([1.0, 2.0])
SafeMPI.check_and_destroy!() # v might still be in use!
y = v .+ 1.0 # May crash
# Instead, let automatic cleanup handle it
v = Vec_uniform([1.0, 2.0])
y = v .+ 1.0
# cleanup happens automatically at safe points
3. Explicit Cleanup in Long-Running Loops
for i in 1:1000000
    v = Vec_uniform(data[i])
    result[i] = compute(v)
    # Periodic cleanup to avoid accumulation
    if i % 100 == 0
        SafeMPI.check_and_destroy!()
    end
end
Debugging
Check Reference Counts
# Access the default manager
manager = SafeMPI.default_manager[]
# Inspect state (rank 0 only)
if MPI.Comm_rank(MPI.COMM_WORLD) == 0
    println("Active objects: ", length(manager.counter_pool))
    println("Free IDs: ", length(manager.free_ids))
end
Enable Verbose Assertions
# Assertions are enabled by default
SafeMPI.enable_assert[] # true
# Use @mpiassert for collective checks
@mpiassert all_data_valid "Data validation failed"
Performance Considerations
- Cleanup Cost: check_and_destroy! triggers a full GC and MPI synchronization
- Throttling: The max_check_count parameter reduces overhead by skipping some cleanup points
- ID Recycling: Released IDs are reused to prevent integer overflow (a small sketch follows below)
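To illustrate the recycling idea, here is a conceptual sketch (not SafeMPI's internal code; IdPool, new_id!, and release_id! are names invented for this example) in which released ids are handed out again before a fresh id is minted, so the counter stays bounded over long runs.
# Conceptual sketch of id recycling (not SafeMPI's internal implementation).
mutable struct IdPool
    next_id::Int
    free_ids::Vector{Int}
end
IdPool() = IdPool(1, Int[])
# Prefer a previously released id; otherwise mint a fresh one.
function new_id!(pool::IdPool)
    isempty(pool.free_ids) || return pop!(pool.free_ids)
    id = pool.next_id
    pool.next_id += 1
    return id
end
release_id!(pool::IdPool, id::Int) = push!(pool.free_ids, id)
pool = IdPool()
a = new_id!(pool)      # 1
b = new_id!(pool)      # 2
release_id!(pool, a)
c = new_id!(pool)      # reuses 1, so ids never grow without bound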