Every table in Absolute DB uses one of four storage backends. The choice is made at table creation time and can be changed online. All backends share the same MVCC transaction layer, WAL, and LIRS buffer pool — allowing a single transaction to span tables on different backends.
| Backend | Best For | Page Size | Syntax |
|---|---|---|---|
| B+Tree | OLTP, point lookups, range scans | 4 KB | Default (no keyword needed) |
| LSM-Tree | Write-heavy, append-dominant | Variable | USING LSM |
| PAX Columnar | Analytics, OLAP, aggregations | 64 KB | USING COLUMNAR |
| HTAP Dual-Store | Mixed OLTP + OLAP on same data | Both | USING HTAP |
B+Tree is the default storage backend for transactional workloads. B+Tree pages are 4 KB, copy-on-write (for MVCC), and write-ahead-logged for durability. Point lookups reach 0.14 µs on the direct API path.
Every B+Tree leaf page carries a compact Bloom filter. Before reading a page from disk, the filter is checked: if the key is definitely absent, the I/O is skipped entirely. This reduces read amplification for point lookups that miss, which matters most on tables with high fill rates. A positive filter result can be a false positive, so in that case the page is still read and searched.
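The filter check can be illustrated with a minimal Python sketch. It is not Absolute DB's implementation: the `BloomFilter` class, the `lookup` helper, the bit count, and the SHA-256-based hashing are all assumptions chosen for clarity.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: never a false negative, occasionally a false positive."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive one bit position per hash function from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def lookup(key: bytes, page_filter: BloomFilter, read_page):
    """Hypothetical point lookup: skip the page read when the key is provably absent."""
    if not page_filter.might_contain(key):
        return None               # definite miss: no I/O issued
    return read_page().get(key)   # possible hit: pay for the read, then search
```

Here `read_page` stands in for whatever fetches and decodes the leaf page. The point is only that it is never called when the filter rules the key out.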
Indexes can be restricted to a subset of rows, reducing index size and improving write throughput on tables where only a fraction of rows are queried by index.
-- Standard B+Tree index
CREATE INDEX idx_orders_customer ON orders(customer_id);
-- Partial index: only index rows matching the predicate
CREATE INDEX idx_orders_pending
ON orders(id)
WHERE status = 'pending';
-- Expression index
CREATE INDEX idx_users_lower_email
ON users(lower(email));
-- Composite index
CREATE INDEX idx_logs_host_ts
ON logs(host, created_at DESC);
A BRIN (Block Range Index) stores only the minimum and maximum value for each range of 128 pages. For monotonically increasing columns (timestamps, auto-increment IDs), a BRIN index can be up to 1,000× smaller than a B+Tree index while still narrowing a range query to the few page ranges that can contain matching rows.
-- BRIN index for a time-series table (extremely compact)
CREATE INDEX idx_events_ts
ON events USING BRIN(created_at);
-- Range query benefits from BRIN automatically
SELECT * FROM events
WHERE created_at BETWEEN '2026-01-01' AND '2026-01-31';
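To make the min/max idea concrete, here is a small Python sketch of a block-range summary and the pruning step. The 128-page range comes from the text above. The names `RangeSummary`, `build_brin`, and `candidate_ranges` are hypothetical and say nothing about the engine's internal layout.

```python
from dataclasses import dataclass

PAGES_PER_RANGE = 128  # range width stated in the text above


@dataclass
class RangeSummary:
    first_page: int
    min_value: int
    max_value: int


def build_brin(pages, pages_per_range=PAGES_PER_RANGE):
    """pages: list of lists of column values, one inner list per page."""
    summaries = []
    for start in range(0, len(pages), pages_per_range):
        values = [v for page in pages[start:start + pages_per_range] for v in page]
        if values:
            summaries.append(RangeSummary(start, min(values), max(values)))
    return summaries


def candidate_ranges(summaries, lo, hi):
    """Keep only ranges whose [min, max] overlaps the query interval [lo, hi]."""
    return [s for s in summaries if s.max_value >= lo and s.min_value <= hi]


# 512 pages of monotonically increasing values: page p holds p*10 .. p*10+9.
pages = [[p * 10 + i for i in range(10)] for p in range(512)]
index = build_brin(pages)
print(len(index))                                                    # 4
print([s.first_page for s in candidate_ranges(index, 1300, 1400)])  # [128]
```

Four summaries cover all 512 pages, and the range query only has to visit one of the four page ranges. On non-monotonic data the min/max intervals overlap and far fewer ranges can be skipped.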
The Log-Structured Merge-Tree backend is optimised for workloads where write throughput is the primary concern — sensor ingestion, event sourcing, audit logs, and append-dominant time-series. Writes go first to an in-memory MemTable, which is flushed to immutable SSTables (Level 0) and merged downward through levels by a background compaction thread.
-- Create a table backed by LSM-Tree
CREATE TABLE sensor_readings (
device_id UUID NOT NULL,
ts TIMESTAMPTZ NOT NULL,
value DOUBLE PRECISION,
PRIMARY KEY (device_id, ts)
) USING LSM;
-- Compaction happens automatically in the background.
-- No manual intervention required.
The LSM compaction strategy is leveled: when Level N exceeds its size target, its files are merged into Level N+1. This keeps read amplification bounded, and Bloom filters on SSTables let point lookups skip irrelevant files.
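The write path described above (MemTable, flush to Level 0, merge downward) can be sketched in a few lines of Python. This is a deliberately simplified model under stated assumptions: `TinyLSM` is a hypothetical name, compaction is triggered by a file count rather than a size target, a whole level is merged at once, and everything lives in memory.

```python
class TinyLSM:
    """Toy two-level LSM tree: a MemTable, Level 0 SSTables, and one merged Level 1 table."""

    def __init__(self, memtable_limit=4, l0_limit=2):
        self.memtable = {}
        self.memtable_limit = memtable_limit
        self.l0_limit = l0_limit
        self.levels = [[], []]  # levels[0]: oldest SSTable first; levels[1]: at most one table

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # An SSTable is a sorted snapshot of the MemTable, never modified afterwards.
        self.levels[0].append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        if len(self.levels[0]) > self.l0_limit:
            self._compact()

    def _compact(self):
        merged = {}
        # Apply oldest data first so that newer values overwrite older ones.
        for table in self.levels[1] + self.levels[0]:
            merged.update(table)
        self.levels[1] = [dict(sorted(merged.items()))]
        self.levels[0] = []

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.levels[0]):  # newest SSTable first
            if key in table:
                return table[key]
        for table in self.levels[1]:
            if key in table:
                return table[key]
        return None
```

Reads check the newest data first, which is why an overwritten key returns its latest value even before compaction has discarded the old version.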
The Partition Attributes Across (PAX) layout stores each column contiguously within 64 KB pages. This gives SIMD-friendly memory access patterns for analytical queries (aggregations, GROUP BY, window functions) while still allowing efficient per-row access. Zone maps in each page header record per-column min/max values, enabling entire pages to be skipped when a query predicate cannot match.
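Zone-map pruning can be sketched as follows. The page representation (a dict of column lists plus a `zone_map` entry) and the function names are assumptions made for illustration, not the on-disk format.

```python
def build_zone_map(columns):
    """columns: dict of column name -> list of values stored in one page."""
    return {name: (min(vals), max(vals)) for name, vals in columns.items() if vals}


def make_page(columns):
    return {"columns": columns, "zone_map": build_zone_map(columns)}


def page_may_match(zone_map, column, lo, hi):
    col_min, col_max = zone_map[column]
    return col_max >= lo and col_min <= hi


def scan_sum(pages, filter_col, lo, hi, agg_col):
    """Sum agg_col over rows with lo <= filter_col <= hi, skipping pages via zone maps."""
    total, pages_read = 0, 0
    for page in pages:
        if not page_may_match(page["zone_map"], filter_col, lo, hi):
            continue  # predicate cannot match anything in this page
        pages_read += 1
        cols = page["columns"]
        total += sum(v for f, v in zip(cols[filter_col], cols[agg_col]) if lo <= f <= hi)
    return total, pages_read


pages = [make_page({"ts": list(range(s, s + 10)), "value": [1] * 10}) for s in (0, 10, 20)]
print(scan_sum(pages, "ts", 12, 15, "value"))  # (4, 1)
```

Only one of the three pages is read, because the other two have `ts` ranges that cannot overlap the predicate.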
| Encoding | Best For | Typical Ratio |
|---|---|---|
| RLE (Run-Length) | Low-cardinality columns, sorted data | 10–100× |
| Bit-packing | Small integers, flags, enum values | 2–8× |
| Dictionary | Repeated strings (up to 256 distinct values) | 4–32× |
| Delta + RLE | Timestamps, monotonic counters | 5–20× |
| Gorilla delta-delta | Float time-series values | ~10× |
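Two of the encodings in the table are simple enough to sketch directly. The Python below shows run-length encoding and delta encoding, and how they compose for regularly spaced timestamps. It illustrates the techniques only and is not the engine's codec.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]


def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]


def delta_encode(values):
    """Store the first value, then the difference between each pair of neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out


timestamps = [1000, 1010, 1020, 1030]
encoded = rle_encode(delta_encode(timestamps))
print(encoded)  # [(1000, 1), (10, 3)]
assert delta_decode(rle_decode(encoded)) == timestamps
```

Delta encoding turns evenly spaced timestamps into a run of identical differences, which RLE then collapses to a single pair. This is why the combination suits timestamps and monotonic counters.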
-- Create a columnar table (optimised for analytics)
CREATE TABLE metrics (
ts TIMESTAMPTZ NOT NULL,
host TEXT NOT NULL,
value DOUBLE PRECISION
) USING COLUMNAR PAGE_SIZE 65536;
-- The query planner automatically chooses columnar scans
-- for aggregation and GROUP BY queries
SELECT host, avg(value) AS avg_value, max(value) AS max_value
FROM metrics
WHERE ts >= now() - INTERVAL '24 hours'
GROUP BY host
ORDER BY avg_value DESC;
HTAP (Hybrid Transactional/Analytical Processing) maintains both a row-store and a PAX columnar store simultaneously for the same table. A zero-copy lock-free replication ring with 4,096 entries propagates row inserts to the columnar store in real time — typical replication lag is ≤ 5 ms. No ETL pipeline is required.
The query planner routes automatically: point lookups and short-range scans go to the row-store; full-table scans and aggregations go to the columnar store. A single SQL statement can read from both stores in the same execution plan.
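The replication ring can be pictured as a fixed-size circular buffer between the row-store writer and the columnar applier. The Python sketch below is single-threaded and therefore not lock-free. A real implementation would presumably use atomic head and tail counters. The class and method names are hypothetical, and only the 4,096-entry size comes from the text.

```python
RING_SIZE = 4096  # entry count stated in the text above


class ReplicationRing:
    """Bounded ring: the row-store side publishes rows, the columnar side drains them."""

    def __init__(self, size=RING_SIZE):
        self.size = size
        self.slots = [None] * size
        self.head = 0  # total rows published so far
        self.tail = 0  # total rows applied to the columnar store so far

    def publish(self, row) -> bool:
        if self.head - self.tail == self.size:
            return False  # ring full: the writer must wait for the applier
        self.slots[self.head % self.size] = row
        self.head += 1
        return True

    def drain(self, column_store) -> int:
        """Append every pending row to per-column lists; return the number applied."""
        applied = 0
        while self.tail < self.head:
            row = self.slots[self.tail % self.size]
            for column, value in row.items():
                column_store.setdefault(column, []).append(value)
            self.tail += 1
            applied += 1
        return applied
```

The gap between `head` and `tail` is the replication backlog. Replication lag in this model is the time a row spends waiting in the ring before `drain` applies it.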
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
customer_id INTEGER,
total NUMERIC(12,2),
status TEXT,
created_at TIMESTAMPTZ DEFAULT now()
) USING HTAP;
-- OLTP path: single-row lookup (row-store, ~0.14 µs)
SELECT * FROM orders WHERE id = 12345;
-- OLAP path: aggregation (columnar store, SIMD-accelerated)
SELECT status, count(*), sum(total)
FROM orders
WHERE created_at >= '2026-01-01'
GROUP BY status;
Page size is configurable per table to match the I/O characteristics of the workload. Smaller pages reduce read amplification for random point lookups; larger pages amortise I/O overhead for sequential scans.
| Page Size | Best For |
|---|---|
| 4 KB | OLTP — random point lookups, short-range scans |
| 64 KB | Analytics — sequential scans, columnar storage |
| 2 MB | Bulk data — data warehouse tables, large object storage |
-- OLTP table (default 4 KB)
CREATE TABLE users (id BIGSERIAL PRIMARY KEY, name TEXT);
-- Analytics table (64 KB pages)
CREATE TABLE page_views (
ts TIMESTAMPTZ, url TEXT, user_id BIGINT
) USING COLUMNAR PAGE_SIZE 65536;
-- Bulk data warehouse table (2 MB pages)
CREATE TABLE raw_logs (
ts TEXT, line TEXT
) PAGE_SIZE 2097152;
Absolute DB uses the LIRS (Low Inter-Reference Recency Set) algorithm exclusively for buffer pool management. ARC (Adaptive Replacement Cache) is covered by IBM US Patent 6,996,676 and is never used.
LIRS classifies pages into hot and cold tiers based on inter-reference recency, keeping frequently-reused pages resident and evicting cold pages with O(1) hit, miss, and eviction operations. The cold-tier ratio (default ~2%) is tunable via configuration.
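The metric behind this classification, inter-reference recency (IRR), is the number of distinct other pages touched between two consecutive accesses to the same page. The Python sketch below computes IRR over an access trace and splits pages into hot and cold sets. It illustrates the criterion only: it is not the O(1) stack-and-queue LIRS algorithm, and the function names are hypothetical.

```python
def inter_reference_recency(trace):
    """Return {page: IRR}, using infinity for pages that were accessed only once."""
    last_pos, irr = {}, {}
    for i, page in enumerate(trace):
        if page in last_pos:
            # Distinct other pages seen since this page's previous access.
            irr[page] = len(set(trace[last_pos[page] + 1:i]))
        else:
            irr[page] = float("inf")
        last_pos[page] = i
    return irr


def classify(trace, hot_capacity):
    """Pages with the smallest finite IRR are hot; everything else is cold."""
    irr = inter_reference_recency(trace)
    ranked = sorted(irr, key=lambda p: (irr[p], p))
    hot = {p for p in ranked[:hot_capacity] if irr[p] != float("inf")}
    return hot, set(irr) - hot


trace = ["A", "B", "A", "C", "A", "D", "B"]
print(inter_reference_recency(trace))  # {'A': 1, 'B': 3, 'C': inf, 'D': inf}
print(classify(trace, hot_capacity=1)[0])  # {'A'}
```

Page A is reused after only one intervening page, so it stays hot. Pages touched once, such as those read by a large sequential scan, have infinite IRR and are the first candidates for eviction.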
# Set buffer pool to 4 GB (typical production server)
absdb-server --buffer-pool-mb 4096
# Embedded / edge minimal configuration
absdb-lite --buffer-pool-mb 64
# Micro-embedded (Raspberry Pi Zero)
absdb-lite --buffer-pool-mb 1
| Configuration | Recommended Buffer Pool |
|---|---|
| Embedded / IoT (Raspberry Pi Zero) | 1–16 MB |
| Developer laptop | 256–512 MB |
| Small production server (8 GB RAM) | 4–6 GB |
| Large production server (256 GB RAM) | 180–220 GB |
On Linux, Absolute DB uses io_uring for asynchronous I/O, which enables batched I/O submission (up to 64 reads + 64 writes per tick). When io_uring is unavailable (older kernels, macOS), the engine falls back transparently to POSIX pread/pwrite.
Direct I/O is used for data pages to bypass the OS page cache. This avoids double-buffering, since Absolute DB manages its own LIRS buffer pool, and keeps latency predictable. When io_uring is unavailable, WAL writes use a durable sync mode so that durability is guaranteed on all platforms.
# io_uring is auto-selected on Linux kernel 5.1+
# Force POSIX I/O (useful for containers without io_uring permissions)
absdb-server --io-backend posix
# Check which I/O backend is active
absdb-server --collect-env | grep io_backend
TRIM support is available on SSDs: the BLKDISCARD ioctl on Linux, with fallocate(FALLOC_FL_PUNCH_HOLE) as a fallback, and F_PUNCHHOLE on macOS. This allows freed pages to be returned to the SSD's garbage collector, maintaining write performance over time.
Absolute DB supports a comprehensive set of index types. All index types are maintained transactionally — index updates are part of the same WAL record as the data write.
| Index Type | Use Case | Size vs B+Tree |
|---|---|---|
| B+Tree (default) | Equality, range, ORDER BY, JOIN | 1× |
| Hash | Equality only; O(1) lookup | ~0.5× |
| GIN | JSONB, full-text, array containment | Variable |
| BRIN | Monotonic columns (ts, ID); huge tables | ~0.001× |
| HNSW | Vector similarity search | Varies by dims |
| Partial | Index subset of rows (WHERE predicate) | <1× |
| Expression | Index computed expressions | 1× |
Online index creation is supported: a background thread builds the index while the table remains fully accessible for reads and writes. Progress is visible via SELECT * FROM absdb_active_queries.
Every write in Absolute DB is recorded in the Write-Ahead Log before the data page is modified. On crash, the WAL is replayed from the last checkpoint to restore the database to a consistent state.
-- Check current WAL position (LSN = Log Sequence Number)
SELECT absdb_current_lsn();
-- Force a checkpoint (flush dirty pages, advance WAL)
CHECKPOINT;
-- Stream WAL to S3 for continuous backup
SELECT absdb_wal_archive_enable(
target => 's3://my-bucket/absdb-wal/',
region => 'ap-southeast-2'
);
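The write-ahead rule and checkpoint-based recovery described above can be modelled in a few lines of Python. This is a toy under stated assumptions: `TinyWAL` is a hypothetical class, the log and the "disk" are in-memory structures, and each record is a whole-value overwrite rather than a page delta.

```python
import json


class TinyWAL:
    """Toy write-ahead log: log first, modify the cached page second, replay after a crash."""

    def __init__(self):
        self.log = []            # stands in for the durable log file
        self.disk_pages = {}     # stands in for data pages on disk
        self.cache = {}          # stands in for dirty pages in the buffer pool
        self.checkpoint_lsn = 0  # every record up to this LSN is already on disk

    def write(self, key, value) -> int:
        lsn = len(self.log) + 1
        self.log.append(json.dumps({"lsn": lsn, "key": key, "value": value}))  # 1. log
        self.cache[key] = value                                                 # 2. page
        return lsn

    def checkpoint(self) -> None:
        self.disk_pages.update(self.cache)  # flush dirty pages
        self.checkpoint_lsn = len(self.log)

    def crash(self) -> None:
        self.cache = {}  # volatile state is lost; the log and disk pages survive

    def recover(self):
        self.cache = dict(self.disk_pages)
        # LSNs start at 1, so records after the checkpoint begin at this list index.
        for line in self.log[self.checkpoint_lsn:]:
            record = json.loads(line)
            self.cache[record["key"]] = record["value"]
        return self.cache


wal = TinyWAL()
wal.write("a", 1)
wal.write("b", 2)
wal.checkpoint()
wal.write("a", 3)   # logged, but never flushed to disk
wal.crash()
print(wal.recover())  # {'a': 3, 'b': 2}
```

The update to `a` survives the crash even though its page was never flushed, because its log record was written before the page was modified.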
Absolute DB implements Multi-Version Concurrency Control (MVCC). Each transaction sees a consistent snapshot of the database as of its start System Change Number (SCN). Writers never block readers; readers never block writers.
Up to 4,096 concurrent active transactions are supported. The undo log enables savepoints and multi-level rollback within a transaction. Old versions are cleaned up by a background VACUUM process that honours configured PITR retention windows.
-- Every SELECT automatically uses a consistent snapshot
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM orders; -- snapshot taken here
-- ... other work ...
SELECT count(*) FROM orders; -- same count, even if rows were inserted
COMMIT;
-- Time-travel: read table as of a historical SCN
SELECT * FROM orders AS OF SCN 1048576;
-- Check isolation level
SHOW transaction_isolation;
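Snapshot visibility under MVCC reduces to a comparison between a version's SCN range and the reader's snapshot SCN. The sketch below assumes each row version records the SCN at which it became current and the SCN at which it was superseded. The `Version` fields and function names are hypothetical, not the engine's tuple header.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    value: object
    begin_scn: int                 # SCN at which this version was committed
    end_scn: Optional[int] = None  # SCN at which it was superseded; None = still current


def visible(version: Version, snapshot_scn: int) -> bool:
    """A version is visible if it was committed at or before the snapshot and not yet superseded."""
    return version.begin_scn <= snapshot_scn and (
        version.end_scn is None or snapshot_scn < version.end_scn
    )


def read(versions, snapshot_scn: int):
    """Return the value of the one version visible to this snapshot, if any."""
    for version in versions:
        if visible(version, snapshot_scn):
            return version.value
    return None


chain = [Version("pending", begin_scn=100, end_scn=200), Version("shipped", begin_scn=200)]
print(read(chain, 150))  # pending
print(read(chain, 200))  # shipped
print(read(chain, 50))   # None
```

A reader at SCN 150 keeps seeing `pending` no matter when the update at SCN 200 commits, which is why writers and readers do not block each other. A time-travel read is the same check with a historical SCN.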
For more on transactions, isolation levels, and advisory locks, see the Transactions & MVCC documentation.