The WAL is the foundation of crash recovery, replication, CDC, and point-in-time recovery in Absolute DB. Every committed change is durably recorded before any data page is modified.
The Write-Ahead Log (WAL) ensures that every committed transaction is recoverable after any failure, including power loss and OS crashes. The rule is simple: no data page may be written to disk unless the corresponding WAL record has already been durably flushed.
This gives Absolute DB two critical guarantees:
The WAL also serves as the source for streaming replication to read replicas and standby nodes, and as the input to the CDC (Change Data Capture) engine.
Every WAL record is addressed by a monotonically increasing Log Sequence Number (LSN). LSNs are 64-bit integers displayed in the format segment/offset (e.g., 0/1048576). The LSN uniquely identifies a position in the WAL stream.
-- Current WAL insert LSN (latest committed position)
SELECT absdb_current_lsn();
-- LSN of the last checkpoint
SELECT absdb_last_checkpoint_lsn();
-- Check replication lag as WAL bytes behind primary
SELECT
    replica_name,
    sent_lsn,
    replay_lsn,
    sent_lsn - replay_lsn AS lag_bytes
FROM absdb_replication_status;
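The segment/offset notation and the lag subtraction can also be reproduced client-side. The sketch below is illustrative only: it assumes the segment is the upper 32 bits and the offset the lower 32 bits of the 64-bit LSN, both printed in decimal as in the 0/1048576 example. The source does not state the split, and `parse_lsn`, `format_lsn`, and `lag_bytes` are hypothetical helpers, not part of Absolute DB.

```python
# Assumption (not confirmed by the docs): segment = high 32 bits,
# offset = low 32 bits, both rendered as decimal integers.
OFFSET_BITS = 32
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def parse_lsn(text: str) -> int:
    """Convert 'segment/offset' text into a single 64-bit integer."""
    segment_str, offset_str = text.split("/")
    segment, offset = int(segment_str), int(offset_str)
    if not (0 <= segment <= OFFSET_MASK and 0 <= offset <= OFFSET_MASK):
        raise ValueError(f"LSN out of range: {text!r}")
    return (segment << OFFSET_BITS) | offset


def format_lsn(lsn: int) -> str:
    """Render a 64-bit LSN back into 'segment/offset' form."""
    return f"{lsn >> OFFSET_BITS}/{lsn & OFFSET_MASK}"


def lag_bytes(sent: str, replayed: str) -> int:
    """WAL bytes between two positions, mirroring sent_lsn - replay_lsn."""
    return parse_lsn(sent) - parse_lsn(replayed)
```

Because LSNs increase monotonically, comparing the parsed integers is enough to order two positions in the stream.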
Every WAL record includes a CRC-32C checksum computed over the record header and body. During crash recovery and WAL replay, each record's checksum is verified before it is applied. A checksum mismatch indicates a corrupt or truncated WAL segment — the server halts and reports the offending LSN rather than applying corrupted data.
CRC-32C (Castagnoli) is used rather than CRC-32 because it is hardware-accelerated on modern CPUs (the Intel SSE4.2 CRC32 instruction and the ARM64 CRC32 extension) and has better error detection properties for database workloads.
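To make the verification step concrete, here is a minimal software sketch of CRC-32C and a per-record check. The checksum routine is the standard table-driven form of the Castagnoli polynomial. The record layout (8-byte LSN, 4-byte body length, body, 4-byte checksum) is an assumption made for illustration and is not Absolute DB's on-disk format. A real server would use the hardware instruction rather than a lookup table.

```python
import struct

CRC32C_POLY = 0x82F63B78  # Castagnoli polynomial, bit-reflected form


def _build_table():
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data: bytes) -> int:
    """Table-driven CRC-32C over a byte string."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


HEADER = struct.Struct("<QI")  # hypothetical header: LSN, body length
TRAILER = struct.Struct("<I")  # checksum over header + body


def encode_record(lsn: int, body: bytes) -> bytes:
    header = HEADER.pack(lsn, len(body))
    return header + body + TRAILER.pack(crc32c(header + body))


def verify_record(raw: bytes):
    """Return (lsn, body), or raise ValueError naming the offending LSN."""
    if len(raw) < HEADER.size + TRAILER.size:
        raise ValueError("truncated WAL record")
    lsn, length = HEADER.unpack_from(raw, 0)
    end = HEADER.size + length
    if len(raw) != end + TRAILER.size:
        raise ValueError(f"truncated WAL record at LSN {lsn}")
    (stored,) = TRAILER.unpack_from(raw, end)
    if crc32c(raw[:end]) != stored:
        raise ValueError(f"checksum mismatch at LSN {lsn}")
    return lsn, raw[HEADER.size:end]
```

The checksum covers the header as well as the body, so a damaged length field is caught the same way as a damaged payload.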
Absolute DB batches WAL records from multiple concurrent transactions into a single fsync() call using group commit. Up to 64 WAL records are flushed per fsync, dramatically reducing the number of I/O operations required under concurrent write load.
| Scenario | Without Group Commit | With Group Commit |
|---|---|---|
| 64 concurrent commits | 64 fsync calls | 1 fsync call |
| Throughput impact | Limited by fsync latency per commit | Throughput scales with concurrency |
| Commit latency | 1× fsync latency | 1× fsync latency (shared) |
Group commit is transparent — every transaction still gets a durable commit guarantee. The group commit window is bounded by the 64-record batch limit; transactions beyond that batch do not wait.
# Tune group commit batch size (default 64)
./bin/absdb-server --wal-group-commit 64
# Synchronous commit modes
# fsync — full durability, maximum safety (default)
# nosync — highest throughput, risk of loss on crash (dev only)
./bin/absdb-server --wal-sync-mode fsync
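The effect of batching on fsync counts can be illustrated with a small single-threaded sketch. `GroupCommitWriter` and `fsyncs_for` are hypothetical names. The sketch assumes that records beyond the batch limit simply go into the next batch, and it ignores the concurrency control a real server needs.

```python
import os
import tempfile


class GroupCommitWriter:
    """Buffers commit records and flushes up to batch_limit per fsync."""

    def __init__(self, path, batch_limit=64):
        self._file = open(path, "ab")
        self._pending = []
        self.batch_limit = batch_limit

    def submit(self, record: bytes):
        # A transaction hands over its commit record and waits for flush().
        self._pending.append(record)

    def flush(self) -> int:
        """Write all pending records; return the number of fsync calls."""
        calls = 0
        while self._pending:
            batch = self._pending[:self.batch_limit]
            del self._pending[:self.batch_limit]
            self._file.write(b"".join(batch))
            self._file.flush()
            os.fsync(self._file.fileno())  # one durable flush per batch
            calls += 1
        return calls

    def close(self):
        self.flush()
        self._file.close()


def fsyncs_for(commits: int, batch_limit: int = 64) -> int:
    """Count fsync calls needed to make `commits` records durable."""
    with tempfile.TemporaryDirectory() as directory:
        writer = GroupCommitWriter(os.path.join(directory, "wal.log"), batch_limit)
        for i in range(commits):
            writer.submit(b"commit %d\n" % i)
        calls = writer.flush()
        writer.close()
        return calls
```

With these assumptions, `fsyncs_for(64)` needs one fsync and `fsyncs_for(64, batch_limit=1)` needs 64, which matches the first row of the table.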
Read replicas and standby nodes receive the WAL stream in real time over the Raft replication channel (port 9091) or directly via the WAL streaming replication protocol. Replicas apply WAL records continuously, maintaining a replication lag that is typically under 100 ms on a local network.
-- Replication status on the primary
SELECT
    replica_name,
    state,        -- streaming | catchup | idle
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    sync_state    -- async | sync | quorum
FROM absdb_replication_status;
-- Replication status on a replica
SELECT
    primary_host,
    receive_lsn,
    replay_lsn,
    lag_seconds
FROM absdb_replica_status;
The CDC engine taps the WAL to produce a stream of row-level change events — INSERT, UPDATE, DELETE — in Debezium-compatible JSON and Protobuf binary formats. This stream is delivered over WebSocket (ws://host:8080/cdc) and gRPC (AbsoluteDB.Subscribe).
-- Subscribe to all changes on the orders table
SUBSCRIBE TO TABLE orders;
-- Subscribe starting from a specific LSN (resume after restart)
SUBSCRIBE TO TABLE orders STARTING AT LSN '0/1048576';
-- Subscribe to the whole database
SUBSCRIBE TO DATABASE myapp STARTING AT LSN '0/1048576';
-- Filter events server-side (only ship relevant changes)
SUBSCRIBE TO TABLE orders WHERE status = 'paid';
{
  "op": "c",  // c=create, u=update, d=delete
  "ts_ms": 1743850200000,
  "source": {
    "db": "myapp",
    "table": "orders",
    "lsn": 1048576
  },
  "before": null,
  "after": {
    "id": 101,
    "user_id": 42,
    "total": 99.95,
    "status": "paid"
  }
}
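A consumer typically folds these events into its own copy of the table. The sketch below is a minimal example, assuming the primary key column is `id` and that delete events carry the old row in `before`, as Debezium does. `apply_event` is a hypothetical helper, not a client library function.

```python
import json


def apply_event(table: dict, event: dict, key: str = "id") -> int:
    """Apply one change event to an in-memory table; return its LSN."""
    op = event["op"]
    if op in ("c", "u"):
        row = event["after"]
        table[row[key]] = row            # insert or overwrite by primary key
    elif op == "d":
        row = event["before"]
        table.pop(row[key], None)        # tolerate an already-missing row
    else:
        raise ValueError(f"unknown op: {op!r}")
    return event["source"]["lsn"]


def apply_stream(table: dict, lines) -> int:
    """Apply JSON-encoded events in order; return the last LSN seen."""
    last_lsn = 0
    for line in lines:
        last_lsn = apply_event(table, json.loads(line))
    return last_lsn
```

The returned LSN is the value a consumer would persist so that it can resume after a restart.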
The CDC ring buffer holds 100 MB of unacknowledged events. If a consumer falls too far behind, it receives a buffer overflow signal and must re-subscribe from a saved LSN. ACK messages advance the server-side cursor and free buffer space.
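The interaction between ACKs and buffer space can be sketched as a bounded queue keyed by LSN. This is a simplified model, not the server's implementation: `CdcBuffer` and `BufferOverflow` are hypothetical names, and reading 100 MB as 100 × 1024 × 1024 bytes is an assumption.

```python
from collections import deque


class BufferOverflow(Exception):
    """Signals that the consumer must re-subscribe from a saved LSN."""


class CdcBuffer:
    def __init__(self, capacity_bytes=100 * 1024 * 1024):
        self.capacity_bytes = capacity_bytes
        self._events = deque()   # (lsn, payload) pairs awaiting an ACK
        self.used_bytes = 0

    def publish(self, lsn: int, payload: bytes):
        """Queue an event, or signal overflow if the consumer is too slow."""
        if self.used_bytes + len(payload) > self.capacity_bytes:
            raise BufferOverflow(
                f"buffer full at LSN {lsn}; re-subscribe from a saved LSN"
            )
        self._events.append((lsn, payload))
        self.used_bytes += len(payload)

    def ack(self, lsn: int):
        """Advance the cursor: free every event with an LSN <= lsn."""
        while self._events and self._events[0][0] <= lsn:
            _, payload = self._events.popleft()
            self.used_bytes -= len(payload)
```

A consumer that acknowledges promptly keeps `used_bytes` low. One that stops acknowledging eventually triggers the overflow path.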
During a clean shutdown, Absolute DB performs a Re-Read Before Shutdown pass: the WAL is re-scanned from the last checkpoint to the current insert LSN to ensure no in-flight records are dropped. This guarantees that even if the shutdown signal arrives while a group-commit batch is being assembled, all committed transactions are safely flushed.
On dirty shutdown (power loss, SIGKILL), recovery begins from the last valid checkpoint and replays all WAL records up to the last intact CRC-32C-verified record.
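The replay loop can be sketched as a redo pass that stops at the first record failing verification. This is a toy model under stated assumptions. Records are key/value updates, `recover` and `make_record` are hypothetical names, and `zlib.crc32` (plain CRC-32) stands in for the CRC-32C the server uses.

```python
import zlib
from typing import NamedTuple


class WalRecord(NamedTuple):
    lsn: int
    key: str
    value: str
    checksum: int


def checksum(lsn: int, key: str, value: str) -> int:
    # Stand-in only: zlib.crc32 is CRC-32, not the server's CRC-32C.
    return zlib.crc32(f"{lsn}|{key}|{value}".encode())


def make_record(lsn: int, key: str, value: str) -> WalRecord:
    return WalRecord(lsn, key, value, checksum(lsn, key, value))


def recover(pages: dict, checkpoint_lsn: int, wal: list):
    """Replay records after the checkpoint up to the last intact one.

    Returns (recovered_state, last_applied_lsn).
    """
    state = dict(pages)                 # page contents as of the checkpoint
    last_applied = checkpoint_lsn
    for rec in wal:
        if rec.lsn <= checkpoint_lsn:
            continue                    # already reflected in the pages
        if checksum(rec.lsn, rec.key, rec.value) != rec.checksum:
            break                       # torn or corrupt record: stop here
        state[rec.key] = rec.value
        last_applied = rec.lsn
    return state, last_applied
```

Everything after the first bad record is ignored, so a partially written tail never reaches the data pages.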
Any WAL consumer — a replica, a CDC subscriber, or an application — can resume from an arbitrary LSN position. This is essential for fault-tolerant consumers that must survive restarts without missing events.
-- Application saves the last processed LSN to its own store
-- On restart, resume exactly where it left off
SUBSCRIBE TO TABLE payments
STARTING AT LSN '0/2097152'
FORMAT DEBEZIUM_JSON;
The WAL segments required to serve a given LSN must still be present on disk or in the WAL archive. Requests for LSNs older than the WAL retention window return an error; the consumer must fall back to a full snapshot and then resume streaming.
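On the application side, resuming safely depends on persisting the last processed LSN atomically. The sketch below is one way to do that with a local file. `LsnCheckpoint` and `resume_statement` are hypothetical helpers. Since the source does not say whether STARTING AT LSN is inclusive, the consumer should treat a re-delivered event as harmless.

```python
import os
import tempfile


class LsnCheckpoint:
    """Stores the last processed LSN in a file, replaced atomically."""

    def __init__(self, path: str):
        self.path = path

    def load(self, default=None):
        try:
            with open(self.path, "r", encoding="utf-8") as f:
                return f.read().strip() or default
        except FileNotFoundError:
            return default

    def save(self, lsn: str):
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                f.write(lsn)
                f.flush()
                os.fsync(f.fileno())     # make the new value durable first
            os.replace(tmp, self.path)   # then swap it in atomically
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise


def resume_statement(table: str, checkpoint: LsnCheckpoint) -> str:
    """Build the SUBSCRIBE statement for a fresh start or a resume."""
    lsn = checkpoint.load()
    if lsn is None:
        return f"SUBSCRIBE TO TABLE {table};"
    return f"SUBSCRIBE TO TABLE {table} STARTING AT LSN '{lsn}';"
```

Saving the LSN only after an event has been fully processed gives at-least-once delivery. Saving it before processing risks skipping an event after a crash.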
Completed WAL segments are archived to object storage (S3, GCS, Azure Blob) automatically. Archived segments are the source for PITR recovery and serve as a durable off-site backup of all changes.
# absdb.conf
wal_archive_enabled = true
wal_archive_target = s3://my-bucket/wal-archive/
wal_archive_compress = zstd # none | lz4 | zstd
wal_archive_interval = 60 # seconds between segment uploads
wal_segment_size = 16MB # segment size (16 MB default)
# View archived segments
absdb wal-list s3://my-bucket/wal-archive/
# Restore a specific WAL segment for inspection
absdb wal-fetch \
    s3://my-bucket/wal-archive/000000010000000000000001 \
    /tmp/wal-segment-inspect
To recover to an exact point in time (e.g., one minute before an accidental mass delete):
# 1. Identify the base backup closest to the target time
absdb backup --list s3://my-bucket/backups/
# → full-base-20260404 (taken 2026-04-04 02:00 UTC)
# 2. Restore the base backup to a temporary directory
absdb restore \
    --from s3://my-bucket/backups/full-base-20260404 \
    --to /tmp/absdb-pitr
# 3. Configure recovery target
cat > /tmp/absdb-pitr/recovery.conf <
WAL segments on disk are retained until they are no longer needed for crash recovery or replication. The retention window determines how far back a PITR recovery can reach.
# Retain 14 days of WAL for PITR
./bin/absdb-server --wal-retention-days 14
# Or in absdb.conf
wal_retention_days = 14
# Force removal of WAL segments no longer needed
# (normally automatic; use only for emergency disk recovery)
absdb admin wal-cleanup --before-lsn '0/8000000'
If WAL disk usage is a concern, enable WAL archive compression (wal_archive_compress = zstd) and move segments to object storage promptly. The PITR window is determined by the oldest WAL segment still available in either the local archive or the object storage archive.
The absdb_wal_stats virtual table provides a real-time view of WAL health and throughput:
SELECT * FROM absdb_wal_stats;
-- Key columns:
-- current_lsn — current WAL insert position
-- last_checkpoint_lsn — LSN of most recent checkpoint
-- wal_bytes_written — total bytes written since start
-- wal_records_written — total records since start
-- group_commit_batches — total group-commit fsync operations
-- avg_records_per_fsync — efficiency indicator (target: 30–64)
-- last_archived_lsn — most recently archived segment's end LSN
-- last_archived_time — timestamp of last successful archive
-- archive_lag_seconds — how far behind archiving is (target: < 120)
-- wal_segment_size_bytes — configured segment size
-- Alert if archive lag exceeds 5 minutes
SELECT CASE
    WHEN archive_lag_seconds > 300
        THEN 'WARNING: WAL archive lag exceeds 5 minutes'
    ELSE 'OK'
END AS archive_health
FROM absdb_wal_stats;
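The same thresholds can be checked from a monitoring script once a row of absdb_wal_stats has been fetched into a dictionary. The sketch below is a hypothetical helper: it assumes avg_records_per_fsync equals wal_records_written divided by group_commit_batches, which the source implies but does not state, and it reuses the 300-second lag limit and the lower end of the 30–64 target range.

```python
def wal_health(stats: dict) -> list:
    """Return a list of warning strings; an empty list means healthy."""
    problems = []

    if stats["archive_lag_seconds"] > 300:
        problems.append("WAL archive lag exceeds 5 minutes")

    batches = stats["group_commit_batches"]
    if batches > 0:
        # Assumed definition of the avg_records_per_fsync column.
        avg = stats["wal_records_written"] / batches
        if avg < 30:
            problems.append(
                f"low group-commit efficiency: {avg:.1f} records per fsync"
            )

    return problems
```

A low records-per-fsync figure is only a concern under heavy write load. On a lightly loaded server small batches are expected.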