# Developer Guide
This guide is for developers who want to contribute to SafePETSc or extend it with custom distributed types.
## Architecture Overview
SafePETSc consists of two main layers:
- SafeMPI: Low-level distributed reference management
- SafePETSc: High-level PETSc wrappers using SafeMPI
### SafeMPI Layer
The SafeMPI module implements reference counting across MPI ranks:
```
┌──────────────────────────────────────────────┐
│ User Code                                    │
│   creates DRef-wrapped objects               │
└─────────────┬────────────────────────────────┘
              │
┌─────────────▼────────────────────────────────┐
│ SafeMPI.DRef{T}                              │
│   - Wraps object                             │
│   - Finalizer calls _release!                │
│   - Enqueues release ID locally (no MPI)     │
└─────────────┬────────────────────────────────┘
              │
┌─────────────▼────────────────────────────────┐
│ DistributedRefManager (mirrored on all ranks)│
│   - Maintains identical counter_pool/free_ids│
│   - Allgathers release IDs to update counters│
│   - Pushes ready IDs back into free_ids      │
└─────────────┬────────────────────────────────┘
              │
┌─────────────▼────────────────────────────────┐
│ destroy_obj!(obj)                            │
│   - Called on all ranks simultaneously       │
│   - User-defined cleanup routine             │
└──────────────────────────────────────────────┘
```
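In code, the flow above looks roughly like this. A minimal sketch, assuming a hypothetical `make_native_resource` constructor; `DRef`, `check_and_destroy!`, and the `.obj` field are the SafeMPI names used elsewhere in this guide:

```julia
using SafeMPI

ref = SafeMPI.DRef(make_native_resource())  # hypothetical constructor; DRef registers a finalizer

# ... use ref.obj collectively on all ranks ...

ref = nothing                   # once finalized, the release ID is enqueued locally (no MPI)
SafeMPI.check_and_destroy!()    # collective: Allgather release IDs, then destroy_obj! on every rank
```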
### SafePETSc Layer

Wraps PETSc objects with `DRef`:

```julia
struct _Vec{T,Prefix}
    v::PETSc.Vec{T}
    row_partition::Vector{Int}
end

const Vec{T,Prefix} = DRef{_Vec{T,Prefix}}

# Opt-in to distributed management
SafeMPI.destroy_trait(::Type{_Vec{T,Prefix}}) where {T,Prefix} = SafeMPI.CanDestroy()

# Define cleanup
function SafeMPI.destroy_obj!(x::_Vec{T,Prefix}) where {T,Prefix}
    _destroy_petsc_vec!(x.v)
end
```
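With this alias, user code holds a `DRef` while internal code reaches the wrapped struct through `.obj`. A small illustration, reusing `Vec_uniform` from the test example later in this guide:

```julia
v = Vec_uniform([1.0, 2.0, 3.0])  # a Vec{Float64,Prefix}, i.e. a DRef around a _Vec
v.obj.row_partition               # internal code accesses _Vec fields via .obj
```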
## Adding New Distributed Types

To add your own distributed type:
### 1. Define the Internal Type

```julia
struct _MyDistributedType
    # Your fields here
    handle::Ptr{Cvoid}     # e.g., an MPI handle
    data::Vector{Float64}
    # ... other fields
end
```

### 2. Create Type Alias

```julia
const MyDistributedType = SafeMPI.DRef{_MyDistributedType}
```

### 3. Opt-In to Management

```julia
SafeMPI.destroy_trait(::Type{_MyDistributedType}) = SafeMPI.CanDestroy()
```

### 4. Implement Cleanup

```julia
function SafeMPI.destroy_obj!(obj::_MyDistributedType)
    # IMPORTANT: this is called on ALL ranks simultaneously
    # and must be a collective operation.

    # Example: free the MPI resource
    if obj.handle != C_NULL
        MPI.Free(obj.handle)
    end

    # Clean up other resources
    # ...
end
```

### 5. Create Constructor

```julia
function MyDistributedType(data::Vector{Float64})
    # Allocate the distributed resource
    handle = allocate_mpi_resource(data)

    # Wrap in the internal type
    obj = _MyDistributedType(handle, data)

    # Wrap in DRef (triggers cleanup coordination)
    return DRef(obj)
end
```
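With these five pieces in place, the new type participates in coordinated cleanup like any other managed object. A hedged usage sketch (run on all ranks):

```julia
x = MyDistributedType(rand(100))  # collective construction on every rank

# ... collective operations on x.obj ...

x = nothing                       # the finalizer enqueues the release ID
SafeMPI.check_and_destroy!()      # collective; eventually calls destroy_obj! everywhere
```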
## Testing

### Unit Tests Structure
SafePETSc uses a dual-file testing approach:
- `test/runtests.jl`: entry point that spawns the MPI processes (sketched below)
- `test/test_*.jl`: individual test files run under MPI
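The entry point only needs to launch each MPI test file under `mpiexec`. A plausible sketch of `test/runtests.jl`, assuming four ranks (the real file may differ):

```julia
# test/runtests.jl (sketch)
using MPI, Test

@testset "SafePETSc MPI tests" begin
    for f in filter(startswith("test_"), readdir(@__DIR__))
        # run throws if any rank exits with a nonzero status
        run(`$(MPI.mpiexec()) -n 4 $(Base.julia_cmd()) --project=. $(joinpath(@__DIR__, f))`)
        @test true
    end
end
```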
Example test file:

```julia
# test/test_myfeature.jl
using SafePETSc
using Test
using MPI
SafePETSc.Init()
@testset "My Feature" begin
rank = MPI.Comm_rank(MPI.COMM_WORLD)
# Test uniform distribution
v = Vec_uniform([1.0, 2.0, 3.0])
@test size(v) == (3,)
# Test operations
y = v .+ 1.0
@test eltype(y) == Float64
# Cleanup
SafeMPI.check_and_destroy!()
endRunning Tests
# Run all tests
julia --project=. -e 'using Pkg; Pkg.test()'
# Run specific test
julia --project=. -e 'using MPI; run(`$(MPI.mpiexec()) -n 4 $(Base.julia_cmd()) --project=. test/test_myfeature.jl`)'
```

## Coding Guidelines
### Reference Management
- Always use `DRef`: wrap distributed objects in `DRef` to ensure cleanup
- Cleanup at creation: `_make_ref` automatically calls `check_and_destroy!`
- No manual cleanup in operations: avoid `check_and_destroy!` in regular functions (see the sketch below)
- Collective operations: `destroy_obj!` must be collective
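A short sketch of the pattern these rules imply; `_scale_internal` is a hypothetical helper, and the `.obj` access follows the convention used elsewhere in this guide:

```julia
# Do: create the result, wrap it, and let DRef coordinate cleanup
function scaled(v::Vec{Float64,Prefix}) where {Prefix}
    inner = _scale_internal(v.obj, 2.0)  # hypothetical internal helper
    return DRef(inner)                   # wrapping schedules cleanup automatically
end

# Don't: call SafeMPI.check_and_destroy!() inside regular operations
```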
### Error Handling
- Use `@mpiassert` for collective error checking
- Coalesce assertions: combine conditions into a single `@mpiassert`
- Informative messages: include context in error messages
```julia
# Good: single assertion with multiple conditions
@mpiassert (size(A, 2) == size(B, 1) &&
            A.obj.col_partition == B.obj.row_partition) "Matrix dimensions and partitions must match for multiplication"

# Less good: multiple assertions
@mpiassert size(A, 2) == size(B, 1) "Dimension mismatch"
@mpiassert A.obj.col_partition == B.obj.row_partition "Partition mismatch"
```

### PETSc Interop
- Use `PETSc.@for_libpetsc` for multi-precision support
- Prefer GPU-friendly bulk operations over per-element access
- Use module constants such as `MAT_INITIAL_MATRIX = Cint(0)` rather than bare literals
```julia
PETSc.@for_libpetsc begin
    function my_petsc_operation(mat::PETSc.Mat{$PetscScalar})
        PETSc.@chk ccall((:PetscFunction, $libpetsc), ...)
    end
end
```

## Performance Considerations
### Cleanup Overhead
- `check_and_destroy!` uses collective `Allgather`/`Allgatherv` operations and periodically triggers a partial GC
- Default: partial GC every 10 object creations (controlled by `SafePETSc.default_check[]`)
- Tune `default_check[]` for your application: lower values mean less memory but more overhead (see the sketch below)
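For example, an application that can afford to hold released objects longer might raise the interval to cut collective-cleanup traffic (a sketch; names as given above):

```julia
SafePETSc.default_check[] = 100  # cleanup pass every 100 creations instead of the default 10
```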
### Memory Management
- Use `DRef` scoping to control lifetimes
- Avoid global `DRef` variables, which prevent cleanup
- Consider explicit cleanup in long loops (see the sketch below)
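A sketch of the long-loop pattern, assuming `A` and `x` are SafePETSc objects created earlier:

```julia
for step in 1:n_steps
    x = A * x                      # each iteration creates DRef-wrapped temporaries
    SafeMPI.check_and_destroy!()   # collective: free released temporaries before the next step
end
```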
### GPU Support
- SafePETSc prioritizes GPU-friendly PETSc operations
- Set PETSc options for GPU execution: `-mat_type aijcusparse -vec_type cuda` (see the sketch below)
- Avoid element-wise access, which causes GPU↔CPU transfers
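One way to supply these options is PETSc's standard `PETSC_OPTIONS` environment variable, set before initialization (a sketch; your setup may pass options differently):

```julia
ENV["PETSC_OPTIONS"] = "-mat_type aijcusparse -vec_type cuda"
SafePETSc.Init()  # options must be in place before PETSc initializes
```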
## Documentation
### Docstrings
Follow Julia documentation conventions:
"""
my_function(x::Type; option=default) -> ReturnType
Brief one-line description.
Extended description with more details about the function's behavior,
parameters, and return values.
# Arguments
- `x::Type`: Description of x
- `option::Type=default`: Description of optional parameter
# Returns
- `ReturnType`: Description of return value
# Examplesjulia result = my_function(input)
See also: [`related_function`](@ref), [`another_function`](@ref)
"""
function my_function(x; option=default)
# Implementation
endAdding Documentation Pages
- Create a markdown file in `docs/src/`
- Add it to `pages` in `docs/make.jl` (a minimal sketch follows)
- Build: `julia --project=docs docs/make.jl`
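A minimal `docs/make.jl` sketch using Documenter.jl (the page names are illustrative):

```julia
using Documenter, SafePETSc

makedocs(
    sitename = "SafePETSc.jl",
    pages = [
        "Home" => "index.md",
        "Developer Guide" => "devguide.md",  # the new page
    ],
)
```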
## Contributing
### Pull Request Process
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Update documentation
- Run tests: `julia --project=. -e 'using Pkg; Pkg.test()'`
- Submit a pull request
### Code Review Checklist
- [ ] Tests pass
- [ ] Documentation updated
- [ ] Docstrings added for public API
- [ ] Reference management correct
- [ ] Collective operations properly coordinated
- [ ] Performance considerations addressed
## Debugging Tips
### MPI Hangs

If the program hangs, the likely causes are:
- Non-collective operation: One rank skipped a collective call
- Unbalanced branching: Ranks took different code paths
- Missing `@mpiassert`: an error on one rank leaves the others waiting
Debug with:
```julia
# Add at suspicious points
println(io0(), "Reached checkpoint A")
MPI.Barrier(MPI.COMM_WORLD)
```

### Memory Leaks
Check for:
- Global `DRef` variables
- Skipped `check_and_destroy!` calls in long loops
- Circular references preventing GC
Inspect manager state:
```julia
manager = SafeMPI.default_manager[]
println(io0(), "Active objects: ", length(manager.counter_pool))
println(io0(), "Pending releases: ", length(manager.pending_releases))
```

### Assertion Failures
Enable verbose output:
```julia
# Assertions are enabled by default
SafeMPI.enable_assert[]  # true

# Check conditions
@mpiassert condition "Detailed error message"
```