Chapter 18
SPV Bloom Filters (BIP-37)
The First Attempt at Lightweight Verification
In 2012, BIP-37 introduced a mechanism for SPV clients to request only relevant transactions from full nodes, rather than downloading every transaction in every block. The approach used Bloom filters—probabilistic data structures that allow set membership queries with controllable false positive rates.
While elegant in theory, BIP-37 proved disastrous for privacy in practice. This chapter examines both the mathematics of Bloom filters and the protocol-level failures that led to its effective deprecation. Understanding these failures is essential for appreciating why modern approaches like compact block filters (Chapter 19) take a fundamentally different approach.
Historical Context
BIP-37 Bloom filters are now widely considered harmful to user privacy and are disabled by default in Bitcoin Core since version 0.19.0 (2019). This chapter documents the approach for historical understanding and to illustrate how privacy can fail catastrophically despite well-intentioned design.
18.1 Bloom Filter Fundamentals
A Bloom filter is a space-efficient probabilistic data structure that tests whether an element is a member of a set. It can produce false positives (claiming an element is in the set when it isn't) but never false negatives (if it says an element isn't in the set, it definitely isn't).
Definition 18.1 (Bloom Filter)
A Bloom filter for a set S consists of:
- A bit array B of m bits, initially all zeros
- A family of k independent hash functions h₁, h₂, ..., hₖ, each mapping elements to {0, 1, ..., m-1}
To add element x: set B[hᵢ(x)] = 1 for all i ∈ {1, ..., k}.
To query element x: return "possibly in set" if B[hᵢ(x)] = 1 for all i; return "definitely not in set" otherwise.
Visual Representation
False Positive Probability
Theorem 18.1 (False Positive Rate)
For a Bloom filter with m bits and k hash functions, after inserting n elements, the probability of a false positive is approximately:
Proof.
After inserting one element, each of the k hash functions sets one bit. The probability that a specific bit is not set by one hash function is (1 - 1/m). After kn hash operations (from n elements), the probability a specific bit is still 0 is:
(1 - 1/m)kn ≈ e-kn/m
Thus the probability a specific bit is 1 is approximately (1 - e-kn/m). A false positive occurs when all k bits checked for a non-member happen to be 1, giving probability (1 - e-kn/m)k. ∎
Corollary 18.1 (Optimal Number of Hash Functions)
For fixed m and n, the false positive rate is minimized when:
With this optimal k, the false positive rate is approximately (0.6185)m/n.
Example 18.1 (Filter Sizing)
Suppose an SPV wallet has 100 addresses and wants a 0.1% (10-3) false positive rate. We need:
(0.6185)m/100 = 10-3
Taking logarithms: m/100 · ln(0.6185) = ln(10-3)
m = 100 · (-6.908) / (-0.481) ≈ 1,436 bits ≈ 180 bytes
With k = 0.693 · (1436/100) ≈ 10 hash functions.
18.2 BIP-37 Protocol Specification
BIP-37 defines how SPV clients communicate Bloom filters to full nodes to receive only relevant transactions.
New Message Types
Definition 18.2 (BIP-37 Messages)
BIP-37 introduces four new protocol messages:
| Message | Purpose |
|---|---|
filterload |
Send a complete Bloom filter to the peer |
filteradd |
Add data elements to an existing filter |
filterclear |
Remove the current filter |
merkleblock |
Block header with partial Merkle branch for matched transactions |
The filterload Message
Definition 18.3 (filterload Structure)
| Field | Size | Description |
|---|---|---|
filter |
var_bytes | The filter bit field (max 36,000 bytes) |
nHashFuncs |
4 bytes | Number of hash functions (max 50) |
nTweak |
4 bytes | Random value to add to hash seed |
nFlags |
1 byte | Controls filter update behavior |
Hash Function Construction
Rather than implementing multiple independent hash functions, BIP-37 uses a single hash function with varying seeds:
Algorithm 18.1 (BIP-37 Hash Function)
function hash_i(data, nTweak, nHashFuncs, filter_size):
// MurmurHash3 with rotating seed
seed = (i * 0xFBA4C795 + nTweak) mod 2³²
return MurmurHash3(seed, data) mod (filter_size * 8)
where i ranges from 0 to nHashFuncs - 1, and the constant 0xFBA4C795 is chosen to provide good distribution.
What Gets Matched
When checking if a transaction matches the filter, the following elements are tested:
Definition 18.4 (Transaction Matching)
A transaction matches the Bloom filter if any of the following match:
- The transaction hash (txid)
- For each output: the scriptPubKey, or each data element pushed by the script
- For each input: the previous output point (txid:vout)
- For each input: each data element pushed by the scriptSig
The merkleblock Response
Definition 18.5 (merkleblock Structure)
| Field | Size | Description |
|---|---|---|
header |
80 bytes | Standard block header |
total_txs |
4 bytes | Total number of transactions in block |
hashes |
var_int + n×32 | Merkle tree hashes for proof |
flags |
var_bytes | Bit flags indicating tree traversal |
18.3 The Privacy Catastrophe
BIP-37's fundamental flaw is that the client sends its filter directly to the server. Even with false positives providing supposed "cover", the privacy loss is severe and often complete.
Critical Privacy Failure
A full node receiving a Bloom filter learns essentially everything about the client's wallet. The filter's contents reveal which addresses the client cares about. Multiple queries from the same filter provide statistical confirmation that eliminates false positive ambiguity.
Attack 1: Direct Address Extraction
Theorem 18.2 (Filter Analysis Attack)
Given a Bloom filter with parameters (m, k, nTweak), an attacker can test arbitrary addresses against the filter locally. By testing all addresses that have ever appeared on the blockchain, the attacker recovers the wallet's address set with high probability.
Proof sketch.
The attacker knows the hash function construction (Algorithm 18.1). For each candidate address a, they compute h₀(a), h₁(a), ..., hk-1(a) and check if all corresponding bits are set. True positives (actual wallet addresses) will always match. False positives occur at rate p, but with millions of blockchain addresses and typical filter sizes, the number of false positives is small and can often be distinguished through blockchain graph analysis. ∎
Attack 2: Intersection Analysis
Theorem 18.3 (Intersection Attack)
If a client sends filter versions F₁ and F₂ with different tweaks or sizes, the intersection of matching addresses has vastly reduced false positive rate:
After just 3 filter versions with p = 0.001, the false positive rate drops to 10-9.
Attack 3: Transaction Graph Analysis
Even without testing all addresses, a malicious node learns:
- Which transactions the client requests (revealing interest)
- The order of requests (revealing wallet creation time)
- Which outputs are spent together (revealing common ownership)
- Transaction timing (correlating with IP address)
Attack 4: Active Probing
Definition 18.6 (Active Filter Probing)
A malicious node can craft transactions with specific output addresses and observe whether the client requests them. By generating transactions to addresses that test specific filter bits, the attacker can efficiently enumerate the filter contents.
Quantifying the Privacy Loss
Theorem 18.4 (Practical Privacy Bound)
Under realistic conditions (persistent connections, repeated queries, filter updates), the expected number of address queries before a malicious node identifies all wallet addresses is O(n), where n is the wallet size. In practice, studies have shown identification within minutes for typical wallets.
Example 18.2 (2014 Privacy Study)
Research by Gervais et al. demonstrated that for a wallet with 20 addresses using a filter with 0.0001 (0.01%) false positive rate:
- After observing 20 blocks: ~75% of addresses identified
- After observing 100 blocks: ~95% of addresses identified
- After 1 filter update: ~99% of addresses identified
18.4 Denial of Service Vulnerabilities
Beyond privacy, BIP-37 exposes full nodes to denial-of-service attacks.
Computational Asymmetry
Theorem 18.5 (DoS Amplification)
Processing a Bloom filter request against a block requires:
- Loading the full block from disk
- Deserializing all transactions
- Testing every scriptPubKey element against the filter
- Constructing Merkle proofs for matches
This work is proportional to block size, while the client's request is constant size (~36KB max filter + fixed overhead). A single client can force gigabytes of disk I/O and CPU work.
The nFlags Attack
Definition 18.7 (nFlags Values)
The nFlags field controls automatic filter updates:
| Value | Name | Behavior |
|---|---|---|
| 0 | BLOOM_UPDATE_NONE | Don't update the filter |
| 1 | BLOOM_UPDATE_ALL | Add outpoints of matching outputs |
| 2 | BLOOM_UPDATE_P2PUBKEY_ONLY | Add outpoints only for P2PK/P2MS |
With BLOOM_UPDATE_ALL, the filter grows without bound as the node adds outpoints. This causes:
- Increasing memory usage on the server
- More false positives as the filter fills
- Eventually, nearly all transactions match
Mitigations in Bitcoin Core
Bitcoin Core implemented several mitigations before ultimately disabling Bloom filters by default:
- Rate limiting: Maximum filter operations per second
- Filter size limits: Maximum 36,000 bytes, 50 hash functions
- Connection limits: Reduced maximum filter-enabled connections
- Default off: Since v0.19.0,
-peerbloomfilters=0by default
18.5 Implementation Details
MurmurHash3 Specification
BIP-37 uses MurmurHash3 (32-bit version) for hashing:
Algorithm 18.2 (MurmurHash3-32)
function MurmurHash3(seed, data):
const c1 = 0xcc9e2d51
const c2 = 0x1b873593
const r1 = 15
const r2 = 13
const m = 5
const n = 0xe6546b64
h = seed
// Process 4-byte chunks
for each 4-byte block k in data:
k = k * c1
k = rotl32(k, r1)
k = k * c2
h = h XOR k
h = rotl32(h, r2)
h = h * m + n
// Process remaining bytes
k = 0
for remaining bytes (little-endian):
k = k * c1
k = rotl32(k, r1)
k = k * c2
h = h XOR k
// Finalization
h = h XOR length
h = h XOR (h >> 16)
h = h * 0x85ebca6b
h = h XOR (h >> 13)
h = h * 0xc2b2ae35
h = h XOR (h >> 16)
return h
Filter Operation Protocol Flow
18.6 Historical Context and Lessons
Timeline
| Date | Event |
|---|---|
| 2012-10 | BIP-37 proposed by Mike Hearn and Matt Corallo |
| 2012-12 | Implemented in Bitcoin Core 0.8.0 |
| 2014 | First academic papers demonstrating privacy attacks |
| 2016 | CVE-2013-5700 DoS vulnerability publicly disclosed |
| 2017-05 | BIP-157/158 (Compact Block Filters) proposed as replacement |
| 2019-11 | Bitcoin Core 0.19.0 disables Bloom filters by default |
Design Lessons
Principle 18.1 (Server-Side vs Client-Side Privacy)
Privacy designs that require the client to reveal their interests to the server are fundamentally flawed. Even probabilistic "cover" (false positives) provides minimal protection when:
- The server can test arbitrary elements against the filter
- Multiple filter versions can be intersected
- Behavioral patterns reveal true interests
Better approach: The server computes without learning client interests (compact block filters), or the client fetches everything and filters locally.
Principle 18.2 (Computational Asymmetry)
Protocols where clients can cheaply trigger expensive server operations invite DoS attacks. BIP-37 violated this by requiring:
- O(1) client request → O(block_size) server work
- No proof-of-work or stake from clients
- Stateful server resources (stored filters) per connection
Better approach: Pre-compute data structures (Chapter 19) that amortize computation across all clients.
18.7 Why Some Wallets Still Use BIP-37
Despite its flaws, BIP-37 remains in use by some wallets because:
- Legacy support: Existing wallets may not have been updated
- Bandwidth efficiency: Downloads less data than full blocks
- Simplicity: Easier to implement than compact block filters
- Historical sync: Some nodes still serve Bloom filter requests
However, using BIP-37 reveals your wallet addresses to any node you connect to. Privacy-conscious users should use wallets that implement:
- Compact block filters (BIP-157/158) — see Chapter 19
- Full block download with local filtering
- Private Electrum servers
- Full node with local wallet
Exercises
Exercise 18.1
Calculate the optimal filter size and number of hash functions for a wallet with 500 addresses targeting a 0.01% false positive rate.
Exercise 18.2
Prove that the intersection of two independent Bloom filters (different tweaks) for the same set has false positive rate p₁ · p₂.
Exercise 18.3
A wallet connects to a malicious node with a filter containing 3 addresses. The filter has a 0.1% false positive rate and tests positive for 1000 blockchain addresses. How many filter updates would eliminate all false positives with 99% probability?
Exercise 18.4
Implement a simple Bloom filter in your language of choice. Verify that the false positive rate matches the theoretical prediction for various parameters.
Exercise 18.5
Explain why increasing the false positive rate doesn't meaningfully improve privacy against a determined attacker with access to all blockchain addresses.
Exercise 18.6
Consider an "improved" BIP-37 where the client regenerates the filter with a new random tweak for each block request. Analyze why this still fails to provide privacy.
Chapter Summary
- Bloom filters are probabilistic data structures that test set membership with controllable false positive rates but no false negatives.
- BIP-37 used Bloom filters to let SPV clients request only relevant transactions from full nodes, reducing bandwidth requirements.
- The fundamental flaw: clients send their filter to the server, revealing their interests despite false positive "cover."
- Multiple attack vectors completely compromise privacy: direct filter analysis, intersection across updates, transaction graph analysis, and active probing.
- DoS vulnerabilities arise from computational asymmetry: small requests trigger expensive server-side operations.
- BIP-37 is now disabled by default in Bitcoin Core and should be considered deprecated. Compact block filters (BIP-157/158) provide a privacy-preserving alternative.