Chapter 18

SPV Bloom Filters (BIP-37)

The First Attempt at Lightweight Verification

In 2012, BIP-37 introduced a mechanism for SPV clients to request only relevant transactions from full nodes, rather than downloading every transaction in every block. The approach used Bloom filters—probabilistic data structures that allow set membership queries with controllable false positive rates.

While elegant in theory, BIP-37 proved disastrous for privacy in practice. This chapter examines both the mathematics of Bloom filters and the protocol-level failures that led to its effective deprecation. Understanding these failures is essential for appreciating why modern approaches like compact block filters (Chapter 19) take a fundamentally different approach.

Historical Context

BIP-37 Bloom filters are now widely considered harmful to user privacy and are disabled by default in Bitcoin Core since version 0.19.0 (2019). This chapter documents the approach for historical understanding and to illustrate how privacy can fail catastrophically despite well-intentioned design.

18.1 Bloom Filter Fundamentals

A Bloom filter is a space-efficient probabilistic data structure that tests whether an element is a member of a set. It can produce false positives (claiming an element is in the set when it isn't) but never false negatives (if it says an element isn't in the set, it definitely isn't).

Definition 18.1 (Bloom Filter)

A Bloom filter for a set S consists of:

A bit array B of m bits, initially all zeros
A family of k independent hash functions h₁, h₂, ..., hₖ, each mapping elements to {0, 1, ..., m-1}

To add element x: set B[hᵢ(x)] = 1 for all i ∈ {1, ..., k}.

To query element x: return "possibly in set" if B[hᵢ(x)] = 1 for all i; return "definitely not in set" otherwise.

Visual Representation

Figure 18.1 — Bloom filter operations with k=3 hash functions. Red cells indicate bits set to 1. A query returns "possibly in set" only if all k bits are set.

False Positive Probability

Theorem 18.1 (False Positive Rate)

For a Bloom filter with m bits and k hash functions, after inserting n elements, the probability of a false positive is approximately:

p ≈ (1 - e^-kn/m)^k

Proof.

After inserting one element, each of the k hash functions sets one bit. The probability that a specific bit is not set by one hash function is (1 - 1/m). After kn hash operations (from n elements), the probability a specific bit is still 0 is:

(1 - 1/m)^kn ≈ e^-kn/m

Thus the probability a specific bit is 1 is approximately (1 - e^-kn/m). A false positive occurs when all k bits checked for a non-member happen to be 1, giving probability (1 - e^-kn/m)^k. ∎

Corollary 18.1 (Optimal Number of Hash Functions)

For fixed m and n, the false positive rate is minimized when:

k = (m/n) · ln(2) ≈ 0.693 · (m/n)

With this optimal k, the false positive rate is approximately (0.6185)^m/n.

Example 18.1 (Filter Sizing)

Suppose an SPV wallet has 100 addresses and wants a 0.1% (10^-3) false positive rate. We need:

(0.6185)^m/100 = 10^-3

Taking logarithms: m/100 · ln(0.6185) = ln(10^-3)

m = 100 · (-6.908) / (-0.481) ≈ 1,436 bits ≈ 180 bytes

With k = 0.693 · (1436/100) ≈ 10 hash functions.

18.2 BIP-37 Protocol Specification

BIP-37 defines how SPV clients communicate Bloom filters to full nodes to receive only relevant transactions.

New Message Types

Definition 18.2 (BIP-37 Messages)

BIP-37 introduces four new protocol messages:

Message	Purpose
`filterload`	Send a complete Bloom filter to the peer
`filteradd`	Add data elements to an existing filter
`filterclear`	Remove the current filter
`merkleblock`	Block header with partial Merkle branch for matched transactions

The filterload Message

Definition 18.3 (filterload Structure)

Field	Size	Description
`filter`	var_bytes	The filter bit field (max 36,000 bytes)
`nHashFuncs`	4 bytes	Number of hash functions (max 50)
`nTweak`	4 bytes	Random value to add to hash seed
`nFlags`	1 byte	Controls filter update behavior

Hash Function Construction

Rather than implementing multiple independent hash functions, BIP-37 uses a single hash function with varying seeds:

Algorithm 18.1 (BIP-37 Hash Function)

function hash_i(data, nTweak, nHashFuncs, filter_size):
    // MurmurHash3 with rotating seed
    seed = (i * 0xFBA4C795 + nTweak) mod 2³²
    return MurmurHash3(seed, data) mod (filter_size * 8)

where i ranges from 0 to nHashFuncs - 1, and the constant 0xFBA4C795 is chosen to provide good distribution.

What Gets Matched

When checking if a transaction matches the filter, the following elements are tested:

Definition 18.4 (Transaction Matching)

A transaction matches the Bloom filter if any of the following match:

The transaction hash (txid)
For each output: the scriptPubKey, or each data element pushed by the script
For each input: the previous output point (txid:vout)
For each input: each data element pushed by the scriptSig

The merkleblock Response

Figure 18.2 — A merkleblock message contains just enough Merkle tree nodes to prove that matched transactions are included in the block. The client receives TX B in full, plus hashes H(A) and H(CD) to verify inclusion.

Definition 18.5 (merkleblock Structure)

Field	Size	Description
`header`	80 bytes	Standard block header
`total_txs`	4 bytes	Total number of transactions in block
`hashes`	var_int + n×32	Merkle tree hashes for proof
`flags`	var_bytes	Bit flags indicating tree traversal

18.3 The Privacy Catastrophe

BIP-37's fundamental flaw is that the client sends its filter directly to the server. Even with false positives providing supposed "cover", the privacy loss is severe and often complete.

Critical Privacy Failure

A full node receiving a Bloom filter learns essentially everything about the client's wallet. The filter's contents reveal which addresses the client cares about. Multiple queries from the same filter provide statistical confirmation that eliminates false positive ambiguity.

Attack 1: Direct Address Extraction

Theorem 18.2 (Filter Analysis Attack)

Given a Bloom filter with parameters (m, k, nTweak), an attacker can test arbitrary addresses against the filter locally. By testing all addresses that have ever appeared on the blockchain, the attacker recovers the wallet's address set with high probability.

Proof sketch.

The attacker knows the hash function construction (Algorithm 18.1). For each candidate address a, they compute h₀(a), h₁(a), ..., h_k-1(a) and check if all corresponding bits are set. True positives (actual wallet addresses) will always match. False positives occur at rate p, but with millions of blockchain addresses and typical filter sizes, the number of false positives is small and can often be distinguished through blockchain graph analysis. ∎

Attack 2: Intersection Analysis

Figure 18.3 — When a client updates their filter (e.g., after generating new addresses), the intersection of matched addresses across filter versions eliminates false positives and reveals true wallet addresses.

Theorem 18.3 (Intersection Attack)

If a client sends filter versions F₁ and F₂ with different tweaks or sizes, the intersection of matching addresses has vastly reduced false positive rate:

p_intersection ≈ p₁ · p₂

After just 3 filter versions with p = 0.001, the false positive rate drops to 10^-9.

Attack 3: Transaction Graph Analysis

Even without testing all addresses, a malicious node learns:

Which transactions the client requests (revealing interest)
The order of requests (revealing wallet creation time)
Which outputs are spent together (revealing common ownership)
Transaction timing (correlating with IP address)

Attack 4: Active Probing

Definition 18.6 (Active Filter Probing)

A malicious node can craft transactions with specific output addresses and observe whether the client requests them. By generating transactions to addresses that test specific filter bits, the attacker can efficiently enumerate the filter contents.

Quantifying the Privacy Loss

Theorem 18.4 (Practical Privacy Bound)

Under realistic conditions (persistent connections, repeated queries, filter updates), the expected number of address queries before a malicious node identifies all wallet addresses is O(n), where n is the wallet size. In practice, studies have shown identification within minutes for typical wallets.

Example 18.2 (2014 Privacy Study)

Research by Gervais et al. demonstrated that for a wallet with 20 addresses using a filter with 0.0001 (0.01%) false positive rate:

After observing 20 blocks: ~75% of addresses identified
After observing 100 blocks: ~95% of addresses identified
After 1 filter update: ~99% of addresses identified

18.4 Denial of Service Vulnerabilities

Beyond privacy, BIP-37 exposes full nodes to denial-of-service attacks.

Computational Asymmetry

Theorem 18.5 (DoS Amplification)

Processing a Bloom filter request against a block requires:

Loading the full block from disk
Deserializing all transactions
Testing every scriptPubKey element against the filter
Constructing Merkle proofs for matches

This work is proportional to block size, while the client's request is constant size (~36KB max filter + fixed overhead). A single client can force gigabytes of disk I/O and CPU work.

The nFlags Attack

Definition 18.7 (nFlags Values)

The nFlags field controls automatic filter updates:

Value	Name	Behavior
0	BLOOM_UPDATE_NONE	Don't update the filter
1	BLOOM_UPDATE_ALL	Add outpoints of matching outputs
2	BLOOM_UPDATE_P2PUBKEY_ONLY	Add outpoints only for P2PK/P2MS

With BLOOM_UPDATE_ALL, the filter grows without bound as the node adds outpoints. This causes:

Increasing memory usage on the server
More false positives as the filter fills
Eventually, nearly all transactions match

Mitigations in Bitcoin Core

Bitcoin Core implemented several mitigations before ultimately disabling Bloom filters by default:

Rate limiting: Maximum filter operations per second
Filter size limits: Maximum 36,000 bytes, 50 hash functions
Connection limits: Reduced maximum filter-enabled connections
Default off: Since v0.19.0, -peerbloomfilters=0 by default

18.5 Implementation Details

MurmurHash3 Specification

BIP-37 uses MurmurHash3 (32-bit version) for hashing:

Algorithm 18.2 (MurmurHash3-32)

function MurmurHash3(seed, data):
    const c1 = 0xcc9e2d51
    const c2 = 0x1b873593
    const r1 = 15
    const r2 = 13
    const m = 5
    const n = 0xe6546b64

    h = seed

    // Process 4-byte chunks
    for each 4-byte block k in data:
        k = k * c1
        k = rotl32(k, r1)
        k = k * c2

        h = h XOR k
        h = rotl32(h, r2)
        h = h * m + n

    // Process remaining bytes
    k = 0
    for remaining bytes (little-endian):
        k = k * c1
        k = rotl32(k, r1)
        k = k * c2
        h = h XOR k

    // Finalization
    h = h XOR length
    h = h XOR (h >> 16)
    h = h * 0x85ebca6b
    h = h XOR (h >> 13)
    h = h * 0xc2b2ae35
    h = h XOR (h >> 16)

    return h

Filter Operation Protocol Flow

Figure 18.4 — Complete BIP-37 protocol flow showing filter submission, filtered block requests, and optional filter updates.

18.6 Historical Context and Lessons

Timeline

Date	Event
2012-10	BIP-37 proposed by Mike Hearn and Matt Corallo
2012-12	Implemented in Bitcoin Core 0.8.0
2014	First academic papers demonstrating privacy attacks
2016	CVE-2013-5700 DoS vulnerability publicly disclosed
2017-05	BIP-157/158 (Compact Block Filters) proposed as replacement
2019-11	Bitcoin Core 0.19.0 disables Bloom filters by default

Design Lessons

Principle 18.1 (Server-Side vs Client-Side Privacy)

Privacy designs that require the client to reveal their interests to the server are fundamentally flawed. Even probabilistic "cover" (false positives) provides minimal protection when:

The server can test arbitrary elements against the filter
Multiple filter versions can be intersected
Behavioral patterns reveal true interests

Better approach: The server computes without learning client interests (compact block filters), or the client fetches everything and filters locally.

Principle 18.2 (Computational Asymmetry)

Protocols where clients can cheaply trigger expensive server operations invite DoS attacks. BIP-37 violated this by requiring:

O(1) client request → O(block_size) server work
No proof-of-work or stake from clients
Stateful server resources (stored filters) per connection

Better approach: Pre-compute data structures (Chapter 19) that amortize computation across all clients.

18.7 Why Some Wallets Still Use BIP-37

Despite its flaws, BIP-37 remains in use by some wallets because:

Legacy support: Existing wallets may not have been updated
Bandwidth efficiency: Downloads less data than full blocks
Simplicity: Easier to implement than compact block filters
Historical sync: Some nodes still serve Bloom filter requests

However, using BIP-37 reveals your wallet addresses to any node you connect to. Privacy-conscious users should use wallets that implement:

Compact block filters (BIP-157/158) — see Chapter 19
Full block download with local filtering
Private Electrum servers
Full node with local wallet

Exercises

Exercise 18.1

Calculate the optimal filter size and number of hash functions for a wallet with 500 addresses targeting a 0.01% false positive rate.

Exercise 18.2

Prove that the intersection of two independent Bloom filters (different tweaks) for the same set has false positive rate p₁ · p₂.

Exercise 18.3

A wallet connects to a malicious node with a filter containing 3 addresses. The filter has a 0.1% false positive rate and tests positive for 1000 blockchain addresses. How many filter updates would eliminate all false positives with 99% probability?

Exercise 18.4

Implement a simple Bloom filter in your language of choice. Verify that the false positive rate matches the theoretical prediction for various parameters.

Exercise 18.5

Explain why increasing the false positive rate doesn't meaningfully improve privacy against a determined attacker with access to all blockchain addresses.

Exercise 18.6

Consider an "improved" BIP-37 where the client regenerates the filter with a new random tweak for each block request. Analyze why this still fails to provide privacy.

Chapter Summary

Bloom filters are probabilistic data structures that test set membership with controllable false positive rates but no false negatives.
BIP-37 used Bloom filters to let SPV clients request only relevant transactions from full nodes, reducing bandwidth requirements.
The fundamental flaw: clients send their filter to the server, revealing their interests despite false positive "cover."
Multiple attack vectors completely compromise privacy: direct filter analysis, intersection across updates, transaction graph analysis, and active probing.
DoS vulnerabilities arise from computational asymmetry: small requests trigger expensive server-side operations.
BIP-37 is now disabled by default in Bitcoin Core and should be considered deprecated. Compact block filters (BIP-157/158) provide a privacy-preserving alternative.