Chapter 19

Compact Block Filters

BIP-157/158 and Privacy-Preserving Light Clients

Compact block filters represent a fundamental redesign of light client filtering. Rather than clients revealing their interests to servers (as in BIP-37), servers pre-compute and publish deterministic filters for each block. Clients download these filters, test them locally, and request full blocks only when relevant.

This architectural inversion—server computes, client filters locally—eliminates the privacy leaks inherent in BIP-37 while providing efficient synchronization. The filters use Golomb-Rice coding for exceptional space efficiency, achieving sizes of roughly 1/80th of the full block.

Privacy Advantage

The server learns nothing about client interests. A client that downloads filter for block N is indistinguishable from any other client downloading that filter. When the client requests the full block, the server learns only that "something in this block is interesting"—not which transactions, addresses, or scripts.

19.1 The Architectural Inversion

BIP-157/158 inverts the BIP-37 model:

BIP-37 vs BIP-157/158: Architectural Comparison BIP-37 (Client Filter) SPV Client Creates filter filter Full Node Tests blocks matches Privacy Problem: Server learns client's addresses BIP-157/158 (Server Filter) Full Node Pre-computes filter filter SPV Client Tests locally if match: get full block Privacy Preserved: Server learns nothing about client Client reveals wallet contents Server does per-request work Bandwidth: ~matches only Client reveals nothing Server pre-computes once Bandwidth: ~all filters + matching blocks
Figure 19.1 — The fundamental difference: BIP-37 has clients send filters to servers (privacy leak); BIP-157/158 has servers publish filters that clients test locally (privacy preserved).

Theorem 19.1 (Privacy Guarantee)

In the BIP-157/158 model, a server cannot distinguish between:

  1. A client interested in address A
  2. A client interested in address B
  3. A client with no specific interest, just syncing

All clients downloading the same filter are indistinguishable.

Proof.

The filter is deterministically computed from the block contents alone. Every client receives the identical filter. The client's query (getcfilters) specifies only the block height range, which is identical regardless of what addresses the client cares about. The filtering operation happens entirely on the client side. The only information leaked is when a client requests a full block, which reveals "I care about something in this block" but not what. ∎

19.2 Golomb-Rice Coding

Compact block filters achieve their small size through Golomb-Rice coding, a form of entropy coding that efficiently represents sorted sequences of integers with roughly geometric distribution.

Golomb Coding Basics

Definition 19.1 (Golomb Coding)

Given a positive integer n and parameter M, Golomb coding represents n as:

  • Quotient: q = ⌊n / M⌋, encoded in unary (q ones followed by a zero)
  • Remainder: r = n mod M, encoded in binary

Definition 19.2 (Golomb-Rice Coding)

Golomb-Rice coding is Golomb coding where M = 2P for some parameter P. This simplifies the encoding since the remainder is exactly P bits.

For value n:

  • Quotient: q = n >> P (right shift by P bits)
  • Remainder: r = n & ((1 << P) - 1) (low P bits)
  • Encoding: [q ones] [0] [P-bit remainder]

Example 19.1 (Golomb-Rice Encoding with P=4)

Encode n = 25 with P = 4 (M = 16):

  • q = 25 >> 4 = 1
  • r = 25 & 15 = 9 = 10012
  • Encoding: 101001 = "101001"

Encode n = 7 with P = 4:

  • q = 7 >> 4 = 0
  • r = 7 & 15 = 7 = 01112
  • Encoding: 00111 = "00111"

Why Golomb-Rice is Optimal

Theorem 19.2 (Golomb-Rice Optimality)

For a geometric distribution with parameter p, Golomb coding with M = ⌈-log(2-p) / log(1-p)⌉ achieves expected code length within 1 + log₂(e) ≈ 2.44 bits of the entropy bound.

The deltas between sorted set elements in a Golomb-coded set (GCS) follow approximately geometric distribution, making Golomb-Rice coding nearly optimal.

Golomb-Coded Sets (GCS)

Definition 19.3 (Golomb-Coded Set)

A Golomb-Coded Set represents a set S of N elements with target false positive rate 1/M (where M = 2P):

  1. Hash each element to the range [0, N·M) using a keyed hash function
  2. Sort the resulting hash values
  3. Compute successive differences (deltas)
  4. Encode deltas using Golomb-Rice with parameter P
Golomb-Coded Set Construction 1. Original elements: script_A script_B script_C 2. Hash to [0, N·M): With N=3, M=16, range = [0, 48) 7 23 41 3. Sort and compute deltas: Sorted: 7, 23, 41 Deltas: 7, 16, 18 4. Golomb-Rice encode (P=4): δ=7: q=0, r=7 0|0111 δ=16: q=1, r=0 10|0000 δ=18: q=1, r=2 10|0010 5. Complete encoding: 0011110000010 0010 17 bits for 3 elements (5.7 bits/element)
Figure 19.2 — GCS construction: hash elements to numeric range, sort, compute deltas, Golomb-Rice encode. The result is highly compact.

Space Efficiency

Theorem 19.3 (GCS Space Efficiency)

A Golomb-Coded Set with N elements and false positive rate 1/M requires approximately N · (1 + log₂(M)) bits, plus a small constant overhead.

Proof sketch.

Each delta has expected value M (since N elements are spread over range N·M). Golomb-Rice with P = log₂(M) encodes values near M optimally. The expected code length per delta is approximately 1 + P = 1 + log₂(M) bits. With N deltas, total size ≈ N · (1 + log₂(M)) bits. ∎

Example 19.2 (BIP-158 Filter Size)

BIP-158 uses P = 19 (M = 2¹⁹ ≈ 524,288) and parameter F = M. For a block with N = 2000 scriptPubKeys:

  • Filter size ≈ 2000 × (1 + 19) = 40,000 bits = 5,000 bytes
  • Compared to full block: ~1-2 MB
  • Ratio: ~1/200 to 1/400

19.3 BIP-158 Filter Construction

BIP-158 specifies exactly how to construct the "basic" filter type for Bitcoin blocks.

Filter Contents

Definition 19.4 (BIP-158 Basic Filter Elements)

The basic filter type (filter_type = 0x00) includes:

  • All scriptPubKeys of outputs created in the block
  • All scriptPubKeys of outputs spent by inputs in the block

Specifically excluded:

  • OP_RETURN outputs (unspendable)
  • Witness scripts (only committed scriptPubKey)
  • Coinbase input's scriptSig

Parameters

Definition 19.5 (BIP-158 Parameters)

Parameter Value Description
P 19 Golomb-Rice parameter (M = 2¹⁹)
M 784931 Modulus for false positive rate (≈ 1/784931)
Hash function SipHash-2-4 Fast, secure keyed hash
Key First 16 bytes of block hash Unique key per block

Construction Algorithm

Algorithm 19.1 (BIP-158 Filter Construction)

function ConstructBasicFilter(block):
    // Collect filter elements
    elements = []
    for each tx in block.transactions:
        for each output in tx.outputs:
            if output.scriptPubKey not OP_RETURN:
                elements.append(output.scriptPubKey)
        for each input in tx.inputs:
            if not coinbase:
                elements.append(input.prevout.scriptPubKey)

    // Remove duplicates
    elements = unique(elements)
    N = len(elements)

    // Compute hash key from block hash (first 16 bytes, little-endian)
    key = block.hash[0:16]

    // Hash elements to range [0, F*N)
    F = 784931
    hashed = []
    for elem in elements:
        h = SipHash(key, elem) mod (F * N)
        hashed.append(h)

    // Sort and compute deltas
    sorted_hashes = sort(hashed)
    deltas = []
    prev = 0
    for h in sorted_hashes:
        deltas.append(h - prev)
        prev = h

    // Golomb-Rice encode
    P = 19
    bits = GolombRiceEncode(deltas, P)

    // Prepend varint N
    return varint(N) || bits

Query Algorithm

Algorithm 19.2 (Filter Matching)

function MatchFilter(filter, block_hash, query_elements):
    key = block_hash[0:16]
    N = filter.N
    F = 784931

    // Hash query elements
    query_hashes = set()
    for elem in query_elements:
        h = SipHash(key, elem) mod (F * N)
        query_hashes.add(h)

    // Decode filter and check for matches
    decoded = GolombRiceDecode(filter.data, P=19, N)

    // Binary search or linear scan for matches
    for h in decoded:
        if h in query_hashes:
            return true  // Possible match

    return false  // Definitely no match

Example 19.3 (Wallet Sync with Compact Filters)

A wallet with 50 addresses syncs the blockchain:

  1. For each block, download the ~5KB filter
  2. Hash wallet's 50 scriptPubKeys with block's key
  3. Check if any hash appears in the decoded filter
  4. If match: download full block (~1.5MB), extract relevant transactions
  5. If no match: continue to next block (no download)

Expected false positive rate: 1/784931 per scriptPubKey per block. With 50 addresses: ~1/15698 per block ≈ one false positive every ~2.3 days.

19.4 BIP-157 Protocol

BIP-157 defines the network protocol for requesting and serving compact block filters.

New Message Types

Definition 19.6 (BIP-157 Messages)

Message Direction Purpose
getcfilters C → S Request filters for a range of blocks
cfilter S → C A single block's filter
getcfheaders C → S Request filter header chain
cfheaders S → C Filter headers for verification
getcfcheckpt C → S Request filter checkpoints
cfcheckpt S → C Evenly-spaced filter header hashes

Filter Header Chain

To allow verification without downloading all filters, BIP-157 defines a chain of filter headers:

Definition 19.7 (Filter Header)

The filter header for block N is:

filter_headerN = SHA256(SHA256(filterN) || filter_headerN-1)

Where filterN is the raw filter bytes and filter_header-1 = 0x00...00 (32 zero bytes).

Filter Header Chain Structure Filter₀ Header₀ = H(F₀ || 0x00...) Filter₁ Header₁ = H(F₁ || H₀) Filter₂ Header₂ = H(F₂ || H₁) ... Filterₙ Headerₙ = H(Fₙ || Hₙ₋₁) Each header commits to all previous filters, enabling efficient verification
Figure 19.3 — Filter header chain: each header commits to the current filter and all previous headers, forming a verifiable chain.

Sync Protocol Flow

BIP-157 Synchronization Flow Light Client Full Node Phase 1: Checkpoints getcfcheckpt (filter_type, stop_hash) cfcheckpt (headers at every 1000 blocks) Phase 2: Headers getcfheaders (filter_type, start, stop_hash) cfheaders (filter_hashes batch) Verify headers match checkpoints Phase 3: Filters getcfilters (filter_type, start, stop_hash) cfilter × N (filters) For each filter: 1. Verify H(filter) 2. Match addresses 3. If match: getdata
Figure 19.4 — Three-phase sync: (1) download checkpoints for coarse verification, (2) download headers to verify filter chain, (3) download and process individual filters.

19.5 Security Analysis

False Positive Rate

Theorem 19.4 (BIP-158 False Positive Rate)

For a filter with N elements and a query set of size Q, the probability of at least one false positive is:

P(false positive) = 1 - (1 - 1/M)Q ≈ Q/M

With M = 784931, a wallet with 100 addresses has approximately 1/7849 chance of false positive per block.

Filter Authenticity

Problem 19.1 (Malicious Filter Attack)

A malicious node could serve incorrect filters:

  • Omission attack: Remove elements to hide transactions
  • Addition attack: Add elements to increase false positives

Theorem 19.5 (Filter Verification)

A light client can detect filter manipulation by:

  1. Querying multiple independent peers for filter headers
  2. Verifying all peers return identical filter_headerN
  3. Verifying received filter hashes to filter_headerN

If k honest peers exist among n queried, detection probability approaches 1 as k increases.

Unlike BIP-37 where the server could silently omit matching transactions, BIP-157/158 filters are deterministic. Any tampering changes the filter hash, which is detectable by querying multiple peers.

Privacy Analysis

Theorem 19.6 (Privacy Bounds)

Information leaked per block request:

  • Filter download: Zero bits (all clients download same filter)
  • Full block request after match: log₂(N) bits where N is number of scriptPubKeys in block (server learns "client cares about one of N scripts")

The privacy loss from requesting matching blocks is minimal:

19.6 Implementation Considerations

Storage Requirements

Example 19.4 (Filter Chain Size)

For the entire Bitcoin blockchain (as of early 2024, ~830,000 blocks):

  • Average filter size: ~15-20 KB (varies with block size)
  • Total filter data: ~15 GB
  • Filter headers only: 830,000 × 32 = ~27 MB
  • Compare to full blockchain: ~550 GB

A light client can store just headers (~27 MB) and download filters on demand, or cache recent filters.

Bandwidth Comparison

Approach Initial Sync Per Block (ongoing) Privacy
Full node ~550 GB ~1.5 MB Perfect
BIP-37 (Bloom) ~matched only ~matched only None
BIP-157/158 ~15 GB filters ~20 KB + matched blocks Good
Headers only ~60 MB 80 bytes Requires trust

Client Implementation

Algorithm 19.3 (Light Client Wallet Sync)

function SyncWallet(wallet):
    // Get current block height
    tip = GetBestBlockHeader()

    // Get filter headers from multiple peers
    filter_headers = {}
    for peer in peers:
        fh = peer.getcfheaders(wallet.last_synced, tip)
        filter_headers[peer] = fh

    // Verify consensus on filter headers
    if not AllEqual(filter_headers.values()):
        // Mismatch detected - query more peers, identify liar
        HandleFilterMismatch(filter_headers)
        return

    // Download and process filters
    for height in range(wallet.last_synced + 1, tip.height + 1):
        filter = GetFilter(height)

        // Verify filter matches header
        assert SHA256d(filter) == filter_headers[height]

        // Test wallet addresses against filter
        if MatchFilter(filter, wallet.scriptPubKeys):
            // Potential match - download full block
            block = GetBlock(height)

            // Extract relevant transactions
            for tx in block.transactions:
                if IsRelevant(tx, wallet):
                    wallet.ProcessTransaction(tx)

    wallet.last_synced = tip.height

19.7 Neutrino: A Reference Implementation

Neutrino is the reference light client implementation of BIP-157/158, originally developed for lnd (Lightning Network Daemon).

Key Features

Integration with Lightning

Neutrino enables Lightning Network nodes to operate without a full Bitcoin node, trading some security for reduced resource requirements:

Exercises

Exercise 19.1

Implement Golomb-Rice encoding and decoding. Verify that decoding the encoding of a sequence returns the original sequence.

Exercise 19.2

Calculate the expected filter size for a block with 3000 transactions, averaging 2 inputs and 2 outputs each (with 10% address reuse).

Exercise 19.3

Prove that if a client queries k independent peers for filter headers and at least one is honest, any filter manipulation will be detected with probability 1 - (1/k).

Exercise 19.4

A wallet tracks 200 addresses. Over 1000 blocks, how many false positive block downloads should be expected? What is the bandwidth overhead compared to perfect filtering?

Exercise 19.5

Design an attack where a malicious node provides valid filters but selectively delays filter delivery to infer client interests. How might a client defend against this?

Exercise 19.6

Compare the total bandwidth for initial blockchain sync using: (a) full node, (b) BIP-157/158 with 100 addresses, (c) BIP-37 with 0.001 false positive rate. Assume 800,000 blocks with average 2000 txs each.

Chapter Summary