Chapter 19

Compact Block Filters

BIP-157/158 and Privacy-Preserving Light Clients

Compact block filters represent a fundamental redesign of light client filtering. Rather than clients revealing their interests to servers (as in BIP-37), servers pre-compute and publish deterministic filters for each block. Clients download these filters, test them locally, and request full blocks only when relevant.

This architectural inversion—server computes, client filters locally—eliminates the privacy leaks inherent in BIP-37 while providing efficient synchronization. The filters use Golomb-Rice coding for exceptional space efficiency, achieving sizes of roughly 1/80th of the full block.

Privacy Advantage

The server learns nothing about client interests. A client that downloads filter for block N is indistinguishable from any other client downloading that filter. When the client requests the full block, the server learns only that "something in this block is interesting"—not which transactions, addresses, or scripts.

19.1 The Architectural Inversion

BIP-157/158 inverts the BIP-37 model:

Figure 19.1 — The fundamental difference: BIP-37 has clients send filters to servers (privacy leak); BIP-157/158 has servers publish filters that clients test locally (privacy preserved).

Theorem 19.1 (Privacy Guarantee)

In the BIP-157/158 model, a server cannot distinguish between:

A client interested in address A
A client interested in address B
A client with no specific interest, just syncing

All clients downloading the same filter are indistinguishable.

Proof.

The filter is deterministically computed from the block contents alone. Every client receives the identical filter. The client's query (getcfilters) specifies only the block height range, which is identical regardless of what addresses the client cares about. The filtering operation happens entirely on the client side. The only information leaked is when a client requests a full block, which reveals "I care about something in this block" but not what. ∎

19.2 Golomb-Rice Coding

Compact block filters achieve their small size through Golomb-Rice coding, a form of entropy coding that efficiently represents sorted sequences of integers with roughly geometric distribution.

Golomb Coding Basics

Definition 19.1 (Golomb Coding)

Given a positive integer n and parameter M, Golomb coding represents n as:

Quotient: q = ⌊n / M⌋, encoded in unary (q ones followed by a zero)
Remainder: r = n mod M, encoded in binary

Definition 19.2 (Golomb-Rice Coding)

Golomb-Rice coding is Golomb coding where M = 2^P for some parameter P. This simplifies the encoding since the remainder is exactly P bits.

For value n:

Quotient: q = n >> P (right shift by P bits)
Remainder: r = n & ((1 << P) - 1) (low P bits)
Encoding: [q ones] [0] [P-bit remainder]

Example 19.1 (Golomb-Rice Encoding with P=4)

Encode n = 25 with P = 4 (M = 16):

q = 25 >> 4 = 1
r = 25 & 15 = 9 = 1001₂
Encoding: 101001 = "101001"

Encode n = 7 with P = 4:

q = 7 >> 4 = 0
r = 7 & 15 = 7 = 0111₂
Encoding: 00111 = "00111"

Why Golomb-Rice is Optimal

Theorem 19.2 (Golomb-Rice Optimality)

For a geometric distribution with parameter p, Golomb coding with optimal parameter M achieves expected code length within at most 1 bit of the entropy bound.

The deltas between sorted set elements in a Golomb-coded set (GCS) follow approximately geometric distribution, making Golomb-Rice coding nearly optimal.

Golomb-Coded Sets (GCS)

Definition 19.3 (Golomb-Coded Set)

A Golomb-Coded Set represents a set S of N elements with target false positive rate 1/F:

Hash each element to the range [0, N·F) using a keyed hash function
Sort the resulting hash values
Compute successive differences (deltas)
Encode deltas using Golomb-Rice with parameter P

Figure 19.2 — GCS construction: hash elements to numeric range, sort, compute deltas, Golomb-Rice encode. The result is highly compact.

Space Efficiency

Theorem 19.3 (GCS Space Efficiency)

A Golomb-Coded Set with N elements and false positive rate 1/M requires approximately N · (1 + log₂(M)) bits, plus a small constant overhead.

Proof sketch.

Each delta has expected value M (since N elements are spread over range N·M). Golomb-Rice with P = log₂(M) encodes values near M optimally. The expected code length per delta is approximately 1 + P = 1 + log₂(M) bits. With N deltas, total size ≈ N · (1 + log₂(M)) bits. ∎

Example 19.2 (BIP-158 Filter Size)

BIP-158 uses P = 19 (M = 2¹⁹ ≈ 524,288) and parameter F = M. For a block with N = 2000 scriptPubKeys:

Filter size ≈ 2000 × (1 + 19) = 40,000 bits = 5,000 bytes
Compared to full block: ~1-2 MB
Ratio: ~1/200 to 1/400

19.3 BIP-158 Filter Construction

BIP-158 specifies exactly how to construct the "basic" filter type for Bitcoin blocks.

Filter Contents

Definition 19.4 (BIP-158 Basic Filter Elements)

The basic filter type (filter_type = 0x00) includes:

All scriptPubKeys of outputs created in the block
All scriptPubKeys of outputs spent by inputs in the block

Specifically excluded:

OP_RETURN outputs (unspendable)
Witness scripts (only committed scriptPubKey)
Coinbase input's scriptSig

Parameters

Definition 19.5 (BIP-158 Parameters)

Parameter	Value	Description
P	19	Golomb-Rice coding parameter (remainder is P bits)
F	784931	False positive rate parameter (target rate ≈ 1/F)
Hash function	SipHash-2-4	Fast, secure keyed hash
Key	First 16 bytes of block hash	Unique key per block

Construction Algorithm

Algorithm 19.1 (BIP-158 Filter Construction)

function ConstructBasicFilter(block):
    // Collect filter elements
    elements = []
    for each tx in block.transactions:
        for each output in tx.outputs:
            if output.scriptPubKey not OP_RETURN:
                elements.append(output.scriptPubKey)
        for each input in tx.inputs:
            if not coinbase:
                elements.append(input.prevout.scriptPubKey)

    // Remove duplicates
    elements = unique(elements)
    N = len(elements)

    // Compute hash key from block hash (first 16 bytes, little-endian)
    key = block.hash[0:16]

    // Hash elements to range [0, F*N)
    F = 784931
    hashed = []
    for elem in elements:
        h = SipHash(key, elem) mod (F * N)
        hashed.append(h)

    // Sort and compute deltas
    sorted_hashes = sort(hashed)
    deltas = []
    prev = 0
    for h in sorted_hashes:
        deltas.append(h - prev)
        prev = h

    // Golomb-Rice encode
    P = 19
    bits = GolombRiceEncode(deltas, P)

    // Prepend varint N
    return varint(N) || bits

Query Algorithm

Algorithm 19.2 (Filter Matching)

function MatchFilter(filter, block_hash, query_elements):
    key = block_hash[0:16]
    N = filter.N
    F = 784931

    // Hash query elements
    query_hashes = set()
    for elem in query_elements:
        h = SipHash(key, elem) mod (F * N)
        query_hashes.add(h)

    // Decode filter and check for matches
    decoded = GolombRiceDecode(filter.data, P=19, N)

    // Binary search or linear scan for matches
    for h in decoded:
        if h in query_hashes:
            return true  // Possible match

    return false  // Definitely no match

Example 19.3 (Wallet Sync with Compact Filters)

A wallet with 50 addresses syncs the blockchain:

For each block, download the ~5KB filter
Hash wallet's 50 scriptPubKeys with block's key
Check if any hash appears in the decoded filter
If match: download full block (~1.5MB), extract relevant transactions
If no match: continue to next block (no download)

Expected false positive rate: 1/784931 per scriptPubKey per block. With 50 addresses: ~50/784931 ≈ 1/15699 per block ≈ one false positive every ~109 days (at 144 blocks/day).

19.4 BIP-157 Protocol

BIP-157 defines the network protocol for requesting and serving compact block filters.

New Message Types

Definition 19.6 (BIP-157 Messages)

Message	Direction	Purpose
`getcfilters`	C → S	Request filters for a range of blocks
`cfilter`	S → C	A single block's filter
`getcfheaders`	C → S	Request filter header chain
`cfheaders`	S → C	Filter headers for verification
`getcfcheckpt`	C → S	Request filter checkpoints
`cfcheckpt`	S → C	Evenly-spaced filter header hashes

Filter Header Chain

To allow verification without downloading all filters, BIP-157 defines a chain of filter headers:

Definition 19.7 (Filter Header)

The filter header for block N is:

filter_header_N = SHA256(SHA256(filter_N) || filter_header_N-1)

Where filter_N is the raw filter bytes and filter_header_-1 = 0x00...00 (32 zero bytes).

Figure 19.3 — Filter header chain: each header commits to the current filter and all previous headers, forming a verifiable chain.

Sync Protocol Flow

Figure 19.4 — Three-phase sync: (1) download checkpoints for coarse verification, (2) download headers to verify filter chain, (3) download and process individual filters.

19.5 Security Analysis

False Positive Rate

Theorem 19.4 (BIP-158 False Positive Rate)

For a filter with N elements and a query set of size Q, the probability of at least one false positive is:

P(false positive) = 1 - (1 - 1/F)^Q ≈ Q/F

With F = 784931, a wallet with 100 addresses has approximately 1/7849 chance of false positive per block.

Filter Authenticity

Problem 19.1 (Malicious Filter Attack)

A malicious node could serve incorrect filters:

Omission attack: Remove elements to hide transactions
Addition attack: Add elements to increase false positives

Theorem 19.5 (Filter Verification)

A light client can detect filter manipulation by:

Querying multiple independent peers for filter headers
Verifying all peers return identical filter_header_N
Verifying received filter hashes to filter_header_N

If k honest peers exist among n queried, detection probability approaches 1 as k increases.

Unlike BIP-37 where the server could silently omit matching transactions, BIP-157/158 filters are deterministic. Any tampering changes the filter hash, which is detectable by querying multiple peers.

Privacy Analysis

Theorem 19.6 (Privacy Bounds)

Information leaked per block request:

Filter download: Zero bits (all clients download same filter)
Full block request after match: log₂(N) bits where N is number of scriptPubKeys in block (server learns "client cares about one of N scripts")

The privacy loss from requesting matching blocks is minimal:

Average block has ~2000-3000 unique scriptPubKeys
Information leaked: ~11-12 bits per matched block
With multiple matches per block: even less specific
False positives provide additional cover

19.6 Implementation Considerations

Storage Requirements

Example 19.4 (Filter Chain Size)

For the entire Bitcoin blockchain (as of early 2024, ~830,000 blocks):

Average filter size: ~15-20 KB (varies with block size)
Total filter data: ~15 GB
Filter headers only: 830,000 × 32 = ~27 MB
Compare to full blockchain: ~550 GB

A light client can store just headers (~27 MB) and download filters on demand, or cache recent filters.

Bandwidth Comparison

Approach	Initial Sync	Per Block (ongoing)	Privacy
Full node	~550 GB	~1.5 MB	Perfect
BIP-37 (Bloom)	~matched only	~matched only	None
BIP-157/158	~15 GB filters	~20 KB + matched blocks	Good
Headers only	~60 MB	80 bytes	Requires trust

Client Implementation

Algorithm 19.3 (Light Client Wallet Sync)

function SyncWallet(wallet):
    // Get current block height
    tip = GetBestBlockHeader()

    // Get filter headers from multiple peers
    filter_headers = {}
    for peer in peers:
        fh = peer.getcfheaders(wallet.last_synced, tip)
        filter_headers[peer] = fh

    // Verify consensus on filter headers
    if not AllEqual(filter_headers.values()):
        // Mismatch detected - query more peers, identify liar
        HandleFilterMismatch(filter_headers)
        return

    // Download and process filters
    for height in range(wallet.last_synced + 1, tip.height + 1):
        filter = GetFilter(height)

        // Verify filter matches header
        assert SHA256d(filter) == filter_headers[height]

        // Test wallet addresses against filter
        if MatchFilter(filter, wallet.scriptPubKeys):
            // Potential match - download full block
            block = GetBlock(height)

            // Extract relevant transactions
            for tx in block.transactions:
                if IsRelevant(tx, wallet):
                    wallet.ProcessTransaction(tx)

    wallet.last_synced = tip.height

19.7 Neutrino: A Reference Implementation

Neutrino is the reference light client implementation of BIP-157/158, originally developed for lnd (Lightning Network Daemon).

Key Features

Header-first sync: Downloads block headers before filters
Multi-peer querying: Verifies filters across multiple peers
Parallel downloads: Fetches filters and blocks concurrently
Persistence: Caches filter headers and recent filters

Integration with Lightning

Neutrino enables Lightning Network nodes to operate without a full Bitcoin node, trading some security for reduced resource requirements:

Monitors funding transaction confirmations
Detects channel breach attempts
Watches for HTLC timeouts
All without revealing which channels belong to the user

Exercises

Exercise 19.1

Implement Golomb-Rice encoding and decoding. Verify that decoding the encoding of a sequence returns the original sequence.

Exercise 19.2

Calculate the expected filter size for a block with 3000 transactions, averaging 2 inputs and 2 outputs each (with 10% address reuse).

Exercise 19.3

Prove that if a client queries k independent peers for filter headers and at least one is honest, any filter manipulation will be detected with probability 1 - (1/k).

Exercise 19.4

A wallet tracks 200 addresses. Over 1000 blocks, how many false positive block downloads should be expected? What is the bandwidth overhead compared to perfect filtering?

Exercise 19.5

Design an attack where a malicious node provides valid filters but selectively delays filter delivery to infer client interests. How might a client defend against this?

Exercise 19.6

Compare the total bandwidth for initial blockchain sync using: (a) full node, (b) BIP-157/158 with 100 addresses, (c) BIP-37 with 0.001 false positive rate. Assume 800,000 blocks with average 2000 txs each.

Chapter Summary

BIP-157/158 inverts the BIP-37 model: servers pre-compute and publish deterministic filters; clients download and test locally.
This architectural change eliminates the privacy leaks inherent in revealing your filter to the server.
Golomb-Rice coding provides near-optimal compression for set membership data, achieving filter sizes of ~1/80th of full block size.
The false positive rate of 1/784931 per element provides good privacy cover while keeping bandwidth overhead minimal.
Filter header chains allow verification without downloading all filters, and querying multiple peers detects filter manipulation.
BIP-157/158 is now the recommended approach for light client implementation, used by Neutrino and other modern wallets.