Chapter 19
Compact Block Filters
BIP-157/158 and Privacy-Preserving Light Clients
Compact block filters represent a fundamental redesign of light client filtering. Rather than clients revealing their interests to servers (as in BIP-37), servers pre-compute and publish deterministic filters for each block. Clients download these filters, test them locally, and request full blocks only when relevant.
This architectural inversion—server computes, client filters locally—eliminates the privacy leaks inherent in BIP-37 while providing efficient synchronization. The filters use Golomb-Rice coding for exceptional space efficiency, achieving sizes of roughly 1/80th of the full block.
Privacy Advantage
The server learns nothing about client interests. A client that downloads filter for block N is indistinguishable from any other client downloading that filter. When the client requests the full block, the server learns only that "something in this block is interesting"—not which transactions, addresses, or scripts.
19.1 The Architectural Inversion
BIP-157/158 inverts the BIP-37 model:
Theorem 19.1 (Privacy Guarantee)
In the BIP-157/158 model, a server cannot distinguish between:
- A client interested in address A
- A client interested in address B
- A client with no specific interest, just syncing
All clients downloading the same filter are indistinguishable.
Proof.
The filter is deterministically computed from the block contents alone. Every client receives the identical filter. The client's query (getcfilters) specifies only the block height range, which is identical regardless of what addresses the client cares about. The filtering operation happens entirely on the client side. The only information leaked is when a client requests a full block, which reveals "I care about something in this block" but not what. ∎
19.2 Golomb-Rice Coding
Compact block filters achieve their small size through Golomb-Rice coding, a form of entropy coding that efficiently represents sorted sequences of integers with roughly geometric distribution.
Golomb Coding Basics
Definition 19.1 (Golomb Coding)
Given a positive integer n and parameter M, Golomb coding represents n as:
- Quotient: q = ⌊n / M⌋, encoded in unary (q ones followed by a zero)
- Remainder: r = n mod M, encoded in binary
Definition 19.2 (Golomb-Rice Coding)
Golomb-Rice coding is Golomb coding where M = 2P for some parameter P. This simplifies the encoding since the remainder is exactly P bits.
For value n:
- Quotient: q = n >> P (right shift by P bits)
- Remainder: r = n & ((1 << P) - 1) (low P bits)
- Encoding: [q ones] [0] [P-bit remainder]
Example 19.1 (Golomb-Rice Encoding with P=4)
Encode n = 25 with P = 4 (M = 16):
- q = 25 >> 4 = 1
- r = 25 & 15 = 9 = 10012
- Encoding: 101001 = "101001"
Encode n = 7 with P = 4:
- q = 7 >> 4 = 0
- r = 7 & 15 = 7 = 01112
- Encoding: 00111 = "00111"
Why Golomb-Rice is Optimal
Theorem 19.2 (Golomb-Rice Optimality)
For a geometric distribution with parameter p, Golomb coding with M = ⌈-log(2-p) / log(1-p)⌉ achieves expected code length within 1 + log₂(e) ≈ 2.44 bits of the entropy bound.
The deltas between sorted set elements in a Golomb-coded set (GCS) follow approximately geometric distribution, making Golomb-Rice coding nearly optimal.
Golomb-Coded Sets (GCS)
Definition 19.3 (Golomb-Coded Set)
A Golomb-Coded Set represents a set S of N elements with target false positive rate 1/M (where M = 2P):
- Hash each element to the range [0, N·M) using a keyed hash function
- Sort the resulting hash values
- Compute successive differences (deltas)
- Encode deltas using Golomb-Rice with parameter P
Space Efficiency
Theorem 19.3 (GCS Space Efficiency)
A Golomb-Coded Set with N elements and false positive rate 1/M requires approximately N · (1 + log₂(M)) bits, plus a small constant overhead.
Proof sketch.
Each delta has expected value M (since N elements are spread over range N·M). Golomb-Rice with P = log₂(M) encodes values near M optimally. The expected code length per delta is approximately 1 + P = 1 + log₂(M) bits. With N deltas, total size ≈ N · (1 + log₂(M)) bits. ∎
Example 19.2 (BIP-158 Filter Size)
BIP-158 uses P = 19 (M = 2¹⁹ ≈ 524,288) and parameter F = M. For a block with N = 2000 scriptPubKeys:
- Filter size ≈ 2000 × (1 + 19) = 40,000 bits = 5,000 bytes
- Compared to full block: ~1-2 MB
- Ratio: ~1/200 to 1/400
19.3 BIP-158 Filter Construction
BIP-158 specifies exactly how to construct the "basic" filter type for Bitcoin blocks.
Filter Contents
Definition 19.4 (BIP-158 Basic Filter Elements)
The basic filter type (filter_type = 0x00) includes:
- All scriptPubKeys of outputs created in the block
- All scriptPubKeys of outputs spent by inputs in the block
Specifically excluded:
- OP_RETURN outputs (unspendable)
- Witness scripts (only committed scriptPubKey)
- Coinbase input's scriptSig
Parameters
Definition 19.5 (BIP-158 Parameters)
| Parameter | Value | Description |
|---|---|---|
| P | 19 | Golomb-Rice parameter (M = 2¹⁹) |
| M | 784931 | Modulus for false positive rate (≈ 1/784931) |
| Hash function | SipHash-2-4 | Fast, secure keyed hash |
| Key | First 16 bytes of block hash | Unique key per block |
Construction Algorithm
Algorithm 19.1 (BIP-158 Filter Construction)
function ConstructBasicFilter(block):
// Collect filter elements
elements = []
for each tx in block.transactions:
for each output in tx.outputs:
if output.scriptPubKey not OP_RETURN:
elements.append(output.scriptPubKey)
for each input in tx.inputs:
if not coinbase:
elements.append(input.prevout.scriptPubKey)
// Remove duplicates
elements = unique(elements)
N = len(elements)
// Compute hash key from block hash (first 16 bytes, little-endian)
key = block.hash[0:16]
// Hash elements to range [0, F*N)
F = 784931
hashed = []
for elem in elements:
h = SipHash(key, elem) mod (F * N)
hashed.append(h)
// Sort and compute deltas
sorted_hashes = sort(hashed)
deltas = []
prev = 0
for h in sorted_hashes:
deltas.append(h - prev)
prev = h
// Golomb-Rice encode
P = 19
bits = GolombRiceEncode(deltas, P)
// Prepend varint N
return varint(N) || bits
Query Algorithm
Algorithm 19.2 (Filter Matching)
function MatchFilter(filter, block_hash, query_elements):
key = block_hash[0:16]
N = filter.N
F = 784931
// Hash query elements
query_hashes = set()
for elem in query_elements:
h = SipHash(key, elem) mod (F * N)
query_hashes.add(h)
// Decode filter and check for matches
decoded = GolombRiceDecode(filter.data, P=19, N)
// Binary search or linear scan for matches
for h in decoded:
if h in query_hashes:
return true // Possible match
return false // Definitely no match
Example 19.3 (Wallet Sync with Compact Filters)
A wallet with 50 addresses syncs the blockchain:
- For each block, download the ~5KB filter
- Hash wallet's 50 scriptPubKeys with block's key
- Check if any hash appears in the decoded filter
- If match: download full block (~1.5MB), extract relevant transactions
- If no match: continue to next block (no download)
Expected false positive rate: 1/784931 per scriptPubKey per block. With 50 addresses: ~1/15698 per block ≈ one false positive every ~2.3 days.
19.4 BIP-157 Protocol
BIP-157 defines the network protocol for requesting and serving compact block filters.
New Message Types
Definition 19.6 (BIP-157 Messages)
| Message | Direction | Purpose |
|---|---|---|
getcfilters |
C → S | Request filters for a range of blocks |
cfilter |
S → C | A single block's filter |
getcfheaders |
C → S | Request filter header chain |
cfheaders |
S → C | Filter headers for verification |
getcfcheckpt |
C → S | Request filter checkpoints |
cfcheckpt |
S → C | Evenly-spaced filter header hashes |
Filter Header Chain
To allow verification without downloading all filters, BIP-157 defines a chain of filter headers:
Definition 19.7 (Filter Header)
The filter header for block N is:
Where filterN is the raw filter bytes and filter_header-1 = 0x00...00 (32 zero bytes).
Sync Protocol Flow
19.5 Security Analysis
False Positive Rate
Theorem 19.4 (BIP-158 False Positive Rate)
For a filter with N elements and a query set of size Q, the probability of at least one false positive is:
With M = 784931, a wallet with 100 addresses has approximately 1/7849 chance of false positive per block.
Filter Authenticity
Problem 19.1 (Malicious Filter Attack)
A malicious node could serve incorrect filters:
- Omission attack: Remove elements to hide transactions
- Addition attack: Add elements to increase false positives
Theorem 19.5 (Filter Verification)
A light client can detect filter manipulation by:
- Querying multiple independent peers for filter headers
- Verifying all peers return identical filter_headerN
- Verifying received filter hashes to filter_headerN
If k honest peers exist among n queried, detection probability approaches 1 as k increases.
Unlike BIP-37 where the server could silently omit matching transactions, BIP-157/158 filters are deterministic. Any tampering changes the filter hash, which is detectable by querying multiple peers.
Privacy Analysis
Theorem 19.6 (Privacy Bounds)
Information leaked per block request:
- Filter download: Zero bits (all clients download same filter)
- Full block request after match: log₂(N) bits where N is number of scriptPubKeys in block (server learns "client cares about one of N scripts")
The privacy loss from requesting matching blocks is minimal:
- Average block has ~2000-3000 unique scriptPubKeys
- Information leaked: ~11-12 bits per matched block
- With multiple matches per block: even less specific
- False positives provide additional cover
19.6 Implementation Considerations
Storage Requirements
Example 19.4 (Filter Chain Size)
For the entire Bitcoin blockchain (as of early 2024, ~830,000 blocks):
- Average filter size: ~15-20 KB (varies with block size)
- Total filter data: ~15 GB
- Filter headers only: 830,000 × 32 = ~27 MB
- Compare to full blockchain: ~550 GB
A light client can store just headers (~27 MB) and download filters on demand, or cache recent filters.
Bandwidth Comparison
| Approach | Initial Sync | Per Block (ongoing) | Privacy |
|---|---|---|---|
| Full node | ~550 GB | ~1.5 MB | Perfect |
| BIP-37 (Bloom) | ~matched only | ~matched only | None |
| BIP-157/158 | ~15 GB filters | ~20 KB + matched blocks | Good |
| Headers only | ~60 MB | 80 bytes | Requires trust |
Client Implementation
Algorithm 19.3 (Light Client Wallet Sync)
function SyncWallet(wallet):
// Get current block height
tip = GetBestBlockHeader()
// Get filter headers from multiple peers
filter_headers = {}
for peer in peers:
fh = peer.getcfheaders(wallet.last_synced, tip)
filter_headers[peer] = fh
// Verify consensus on filter headers
if not AllEqual(filter_headers.values()):
// Mismatch detected - query more peers, identify liar
HandleFilterMismatch(filter_headers)
return
// Download and process filters
for height in range(wallet.last_synced + 1, tip.height + 1):
filter = GetFilter(height)
// Verify filter matches header
assert SHA256d(filter) == filter_headers[height]
// Test wallet addresses against filter
if MatchFilter(filter, wallet.scriptPubKeys):
// Potential match - download full block
block = GetBlock(height)
// Extract relevant transactions
for tx in block.transactions:
if IsRelevant(tx, wallet):
wallet.ProcessTransaction(tx)
wallet.last_synced = tip.height
19.7 Neutrino: A Reference Implementation
Neutrino is the reference light client implementation of BIP-157/158, originally developed for lnd (Lightning Network Daemon).
Key Features
- Header-first sync: Downloads block headers before filters
- Multi-peer querying: Verifies filters across multiple peers
- Parallel downloads: Fetches filters and blocks concurrently
- Persistence: Caches filter headers and recent filters
Integration with Lightning
Neutrino enables Lightning Network nodes to operate without a full Bitcoin node, trading some security for reduced resource requirements:
- Monitors funding transaction confirmations
- Detects channel breach attempts
- Watches for HTLC timeouts
- All without revealing which channels belong to the user
Exercises
Exercise 19.1
Implement Golomb-Rice encoding and decoding. Verify that decoding the encoding of a sequence returns the original sequence.
Exercise 19.2
Calculate the expected filter size for a block with 3000 transactions, averaging 2 inputs and 2 outputs each (with 10% address reuse).
Exercise 19.3
Prove that if a client queries k independent peers for filter headers and at least one is honest, any filter manipulation will be detected with probability 1 - (1/k).
Exercise 19.4
A wallet tracks 200 addresses. Over 1000 blocks, how many false positive block downloads should be expected? What is the bandwidth overhead compared to perfect filtering?
Exercise 19.5
Design an attack where a malicious node provides valid filters but selectively delays filter delivery to infer client interests. How might a client defend against this?
Exercise 19.6
Compare the total bandwidth for initial blockchain sync using: (a) full node, (b) BIP-157/158 with 100 addresses, (c) BIP-37 with 0.001 false positive rate. Assume 800,000 blocks with average 2000 txs each.
Chapter Summary
- BIP-157/158 inverts the BIP-37 model: servers pre-compute and publish deterministic filters; clients download and test locally.
- This architectural change eliminates the privacy leaks inherent in revealing your filter to the server.
- Golomb-Rice coding provides near-optimal compression for set membership data, achieving filter sizes of ~1/80th of full block size.
- The false positive rate of 1/784931 per element provides good privacy cover while keeping bandwidth overhead minimal.
- Filter header chains allow verification without downloading all filters, and querying multiple peers detects filter manipulation.
- BIP-157/158 is now the recommended approach for light client implementation, used by Neutrino and other modern wallets.