Chapter 18

SPV Bloom Filters (BIP-37)

The First Attempt at Lightweight Verification

In 2012, BIP-37 introduced a mechanism for SPV clients to request only relevant transactions from full nodes, rather than downloading every transaction in every block. The approach used Bloom filters—probabilistic data structures that allow set membership queries with controllable false positive rates.

While elegant in theory, BIP-37 proved disastrous for privacy in practice. This chapter examines both the mathematics of Bloom filters and the protocol-level failures that led to its effective deprecation. Understanding these failures is essential for appreciating why modern approaches like compact block filters (Chapter 19) take a fundamentally different approach.

Historical Context

BIP-37 Bloom filters are now widely considered harmful to user privacy and are disabled by default in Bitcoin Core since version 0.19.0 (2019). This chapter documents the approach for historical understanding and to illustrate how privacy can fail catastrophically despite well-intentioned design.

18.1 Bloom Filter Fundamentals

A Bloom filter is a space-efficient probabilistic data structure that tests whether an element is a member of a set. It can produce false positives (claiming an element is in the set when it isn't) but never false negatives (if it says an element isn't in the set, it definitely isn't).

Definition 18.1 (Bloom Filter)

A Bloom filter for a set S consists of:

  1. A bit array B of m bits, initially all zeros
  2. A family of k independent hash functions h₁, h₂, ..., hₖ, each mapping elements to {0, 1, ..., m-1}

To add element x: set B[hᵢ(x)] = 1 for all i ∈ {1, ..., k}.

To query element x: return "possibly in set" if B[hᵢ(x)] = 1 for all i; return "definitely not in set" otherwise.

Visual Representation

Bloom Filter Bit Array (m = 16 bits) 0 1 0 1 0 1 0 0 0 1 0 1 0 0 1 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Insert "address_A": h₁("address_A") = 1 h₂("address_A") = 5 h₃("address_A") = 9 Insert "address_B": h₁("address_B") = 3 h₂("address_B") = 11 h₃("address_B") = 14 Query Examples: Query "address_A": B[1]=1 ✓ B[5]=1 ✓ B[9]=1 ✓ → "Possibly in set" Query "address_C": h₁=7, B[7]=0 → "Definitely not in set"
Figure 18.1 — Bloom filter operations with k=3 hash functions. Red cells indicate bits set to 1. A query returns "possibly in set" only if all k bits are set.

False Positive Probability

Theorem 18.1 (False Positive Rate)

For a Bloom filter with m bits and k hash functions, after inserting n elements, the probability of a false positive is approximately:

p ≈ (1 - e-kn/m)k

Proof.

After inserting one element, each of the k hash functions sets one bit. The probability that a specific bit is not set by one hash function is (1 - 1/m). After kn hash operations (from n elements), the probability a specific bit is still 0 is:

(1 - 1/m)kn ≈ e-kn/m

Thus the probability a specific bit is 1 is approximately (1 - e-kn/m). A false positive occurs when all k bits checked for a non-member happen to be 1, giving probability (1 - e-kn/m)k. ∎

Corollary 18.1 (Optimal Number of Hash Functions)

For fixed m and n, the false positive rate is minimized when:

k = (m/n) · ln(2) ≈ 0.693 · (m/n)

With this optimal k, the false positive rate is approximately (0.6185)m/n.

Example 18.1 (Filter Sizing)

Suppose an SPV wallet has 100 addresses and wants a 0.1% (10-3) false positive rate. We need:

(0.6185)m/100 = 10-3

Taking logarithms: m/100 · ln(0.6185) = ln(10-3)

m = 100 · (-6.908) / (-0.481) ≈ 1,436 bits ≈ 180 bytes

With k = 0.693 · (1436/100) ≈ 10 hash functions.

18.2 BIP-37 Protocol Specification

BIP-37 defines how SPV clients communicate Bloom filters to full nodes to receive only relevant transactions.

New Message Types

Definition 18.2 (BIP-37 Messages)

BIP-37 introduces four new protocol messages:

Message Purpose
filterload Send a complete Bloom filter to the peer
filteradd Add data elements to an existing filter
filterclear Remove the current filter
merkleblock Block header with partial Merkle branch for matched transactions

The filterload Message

Definition 18.3 (filterload Structure)

Field Size Description
filter var_bytes The filter bit field (max 36,000 bytes)
nHashFuncs 4 bytes Number of hash functions (max 50)
nTweak 4 bytes Random value to add to hash seed
nFlags 1 byte Controls filter update behavior

Hash Function Construction

Rather than implementing multiple independent hash functions, BIP-37 uses a single hash function with varying seeds:

Algorithm 18.1 (BIP-37 Hash Function)

function hash_i(data, nTweak, nHashFuncs, filter_size):
    // MurmurHash3 with rotating seed
    seed = (i * 0xFBA4C795 + nTweak) mod 2³²
    return MurmurHash3(seed, data) mod (filter_size * 8)

where i ranges from 0 to nHashFuncs - 1, and the constant 0xFBA4C795 is chosen to provide good distribution.

What Gets Matched

When checking if a transaction matches the filter, the following elements are tested:

Definition 18.4 (Transaction Matching)

A transaction matches the Bloom filter if any of the following match:

  1. The transaction hash (txid)
  2. For each output: the scriptPubKey, or each data element pushed by the script
  3. For each input: the previous output point (txid:vout)
  4. For each input: each data element pushed by the scriptSig

The merkleblock Response

Merkle Block Structure 80-byte Block Header version | prev_hash | merkle_root | timestamp | bits | nonce Partial Merkle Tree Merkle Root H(AB) H(CD) - included H(A) - included H(B) - MATCHED TX B Full transaction data Computed from data Included in message Intermediate computation
Figure 18.2 — A merkleblock message contains just enough Merkle tree nodes to prove that matched transactions are included in the block. The client receives TX B in full, plus hashes H(A) and H(CD) to verify inclusion.

Definition 18.5 (merkleblock Structure)

Field Size Description
header 80 bytes Standard block header
total_txs 4 bytes Total number of transactions in block
hashes var_int + n×32 Merkle tree hashes for proof
flags var_bytes Bit flags indicating tree traversal

18.3 The Privacy Catastrophe

BIP-37's fundamental flaw is that the client sends its filter directly to the server. Even with false positives providing supposed "cover", the privacy loss is severe and often complete.

Critical Privacy Failure

A full node receiving a Bloom filter learns essentially everything about the client's wallet. The filter's contents reveal which addresses the client cares about. Multiple queries from the same filter provide statistical confirmation that eliminates false positive ambiguity.

Attack 1: Direct Address Extraction

Theorem 18.2 (Filter Analysis Attack)

Given a Bloom filter with parameters (m, k, nTweak), an attacker can test arbitrary addresses against the filter locally. By testing all addresses that have ever appeared on the blockchain, the attacker recovers the wallet's address set with high probability.

Proof sketch.

The attacker knows the hash function construction (Algorithm 18.1). For each candidate address a, they compute h₀(a), h₁(a), ..., hk-1(a) and check if all corresponding bits are set. True positives (actual wallet addresses) will always match. False positives occur at rate p, but with millions of blockchain addresses and typical filter sizes, the number of false positives is small and can often be distinguished through blockchain graph analysis. ∎

Attack 2: Intersection Analysis

Intersection Attack Across Filter Updates Filter at time t₁: addr_1 addr_2 FP_1 FP_2 Filter at time t₂: addr_1 addr_2 addr_3 FP_3 FP_4 Intersection reveals true addresses: addr_1, addr_2 False positives eliminated
Figure 18.3 — When a client updates their filter (e.g., after generating new addresses), the intersection of matched addresses across filter versions eliminates false positives and reveals true wallet addresses.

Theorem 18.3 (Intersection Attack)

If a client sends filter versions F₁ and F₂ with different tweaks or sizes, the intersection of matching addresses has vastly reduced false positive rate:

pintersection ≈ p₁ · p₂

After just 3 filter versions with p = 0.001, the false positive rate drops to 10-9.

Attack 3: Transaction Graph Analysis

Even without testing all addresses, a malicious node learns:

Attack 4: Active Probing

Definition 18.6 (Active Filter Probing)

A malicious node can craft transactions with specific output addresses and observe whether the client requests them. By generating transactions to addresses that test specific filter bits, the attacker can efficiently enumerate the filter contents.

Quantifying the Privacy Loss

Theorem 18.4 (Practical Privacy Bound)

Under realistic conditions (persistent connections, repeated queries, filter updates), the expected number of address queries before a malicious node identifies all wallet addresses is O(n), where n is the wallet size. In practice, studies have shown identification within minutes for typical wallets.

Example 18.2 (2014 Privacy Study)

Research by Gervais et al. demonstrated that for a wallet with 20 addresses using a filter with 0.0001 (0.01%) false positive rate:

  • After observing 20 blocks: ~75% of addresses identified
  • After observing 100 blocks: ~95% of addresses identified
  • After 1 filter update: ~99% of addresses identified

18.4 Denial of Service Vulnerabilities

Beyond privacy, BIP-37 exposes full nodes to denial-of-service attacks.

Computational Asymmetry

Theorem 18.5 (DoS Amplification)

Processing a Bloom filter request against a block requires:

  • Loading the full block from disk
  • Deserializing all transactions
  • Testing every scriptPubKey element against the filter
  • Constructing Merkle proofs for matches

This work is proportional to block size, while the client's request is constant size (~36KB max filter + fixed overhead). A single client can force gigabytes of disk I/O and CPU work.

The nFlags Attack

Definition 18.7 (nFlags Values)

The nFlags field controls automatic filter updates:

Value Name Behavior
0 BLOOM_UPDATE_NONE Don't update the filter
1 BLOOM_UPDATE_ALL Add outpoints of matching outputs
2 BLOOM_UPDATE_P2PUBKEY_ONLY Add outpoints only for P2PK/P2MS

With BLOOM_UPDATE_ALL, the filter grows without bound as the node adds outpoints. This causes:

Mitigations in Bitcoin Core

Bitcoin Core implemented several mitigations before ultimately disabling Bloom filters by default:

18.5 Implementation Details

MurmurHash3 Specification

BIP-37 uses MurmurHash3 (32-bit version) for hashing:

Algorithm 18.2 (MurmurHash3-32)

function MurmurHash3(seed, data):
    const c1 = 0xcc9e2d51
    const c2 = 0x1b873593
    const r1 = 15
    const r2 = 13
    const m = 5
    const n = 0xe6546b64

    h = seed

    // Process 4-byte chunks
    for each 4-byte block k in data:
        k = k * c1
        k = rotl32(k, r1)
        k = k * c2

        h = h XOR k
        h = rotl32(h, r2)
        h = h * m + n

    // Process remaining bytes
    k = 0
    for remaining bytes (little-endian):
        k = k * c1
        k = rotl32(k, r1)
        k = k * c2
        h = h XOR k

    // Finalization
    h = h XOR length
    h = h XOR (h >> 16)
    h = h * 0x85ebca6b
    h = h XOR (h >> 13)
    h = h * 0xc2b2ae35
    h = h XOR (h >> 16)

    return h

Filter Operation Protocol Flow

BIP-37 Protocol Flow SPV Client Full Node filterload (filter, k, tweak, flags) Store filter for this connection getdata (MSG_FILTERED_BLOCK) Load block, test each tx against filter merkleblock (header, partial tree) tx (matched transaction) Verify tx in merkle tree, header in chain filteradd (new address) [optional] Add element to stored filter inv (new block announcement) getdata (MSG_FILTERED_BLOCK) ... cycle continues for each new block ...
Figure 18.4 — Complete BIP-37 protocol flow showing filter submission, filtered block requests, and optional filter updates.

18.6 Historical Context and Lessons

Timeline

Date Event
2012-10 BIP-37 proposed by Mike Hearn and Matt Corallo
2012-12 Implemented in Bitcoin Core 0.8.0
2014 First academic papers demonstrating privacy attacks
2016 CVE-2013-5700 DoS vulnerability publicly disclosed
2017-05 BIP-157/158 (Compact Block Filters) proposed as replacement
2019-11 Bitcoin Core 0.19.0 disables Bloom filters by default

Design Lessons

Principle 18.1 (Server-Side vs Client-Side Privacy)

Privacy designs that require the client to reveal their interests to the server are fundamentally flawed. Even probabilistic "cover" (false positives) provides minimal protection when:

  • The server can test arbitrary elements against the filter
  • Multiple filter versions can be intersected
  • Behavioral patterns reveal true interests

Better approach: The server computes without learning client interests (compact block filters), or the client fetches everything and filters locally.

Principle 18.2 (Computational Asymmetry)

Protocols where clients can cheaply trigger expensive server operations invite DoS attacks. BIP-37 violated this by requiring:

  • O(1) client request → O(block_size) server work
  • No proof-of-work or stake from clients
  • Stateful server resources (stored filters) per connection

Better approach: Pre-compute data structures (Chapter 19) that amortize computation across all clients.

18.7 Why Some Wallets Still Use BIP-37

Despite its flaws, BIP-37 remains in use by some wallets because:

However, using BIP-37 reveals your wallet addresses to any node you connect to. Privacy-conscious users should use wallets that implement:

Exercises

Exercise 18.1

Calculate the optimal filter size and number of hash functions for a wallet with 500 addresses targeting a 0.01% false positive rate.

Exercise 18.2

Prove that the intersection of two independent Bloom filters (different tweaks) for the same set has false positive rate p₁ · p₂.

Exercise 18.3

A wallet connects to a malicious node with a filter containing 3 addresses. The filter has a 0.1% false positive rate and tests positive for 1000 blockchain addresses. How many filter updates would eliminate all false positives with 99% probability?

Exercise 18.4

Implement a simple Bloom filter in your language of choice. Verify that the false positive rate matches the theoretical prediction for various parameters.

Exercise 18.5

Explain why increasing the false positive rate doesn't meaningfully improve privacy against a determined attacker with access to all blockchain addresses.

Exercise 18.6

Consider an "improved" BIP-37 where the client regenerates the filter with a new random tweak for each block request. Analyze why this still fails to provide privacy.

Chapter Summary