Chapter 6: Hash Functions | Elementary Bitcoin

Before we can understand digital signatures, we must first explore a different kind of cryptographic primitive: the hash function. While elliptic curves provide the mathematical foundation for key pairs and signatures, hash functions provide the glue that binds messages to their signatures.

A hash function is, in essence, a fingerprinting machine for data. It takes an input of any size and produces a fixed-size output that is, for all practical purposes, unique to that input.

6.1 Cryptographic Hash Functions

Definition 6.1 (Cryptographic Hash Function)

A cryptographic hash function H is a function that maps inputs of arbitrary length to outputs of fixed length:

H: {0, 1}* → {0, 1}ⁿ

and satisfies the following security properties:

Preimage resistance: Given h, it is computationally infeasible to find any m such that H(m) = h.
Second preimage resistance: Given m₁, it is computationally infeasible to find m₂ ≠ m₁ such that H(m₁) = H(m₂).
Collision resistance: It is computationally infeasible to find any pair m₁ ≠ m₂ such that H(m₁) = H(m₂).

Figure 6.1: The three security properties that define a cryptographic hash function.

The Pigeonhole Principle.

Since hash functions map an infinite domain to a finite range, collisions must exist by the pigeonhole principle. Collision resistance doesn't mean collisions are impossible—it means they are computationally infeasible to find deliberately.

6.2 SHA-256: The Secure Hash Algorithm

Bitcoin uses SHA-256 (Secure Hash Algorithm, 256-bit) as its primary hash function. It was designed by the National Security Agency (NSA) and published by NIST in 2001.

Definition 6.2 (SHA-256)

SHA-256 is a hash function that produces a 256-bit (32-byte) digest:

SHA256: {0, 1}* → {0, 1}²⁵⁶

The output is typically represented as a 64-character hexadecimal string.

Example 6.1 (SHA-256 Outputs)

Some example SHA-256 hashes:

Input	SHA-256 Hash (hex)
`""` (empty)	`e3b0c44298fc1c14...9bce4dbf`
`"Bitcoin"`	`b4056df6691f8dc7...a2f9a1f6`
`"bitcoin"`	`6b88c087247aa2f0...dd43f3f3`

Notice how changing a single character (capital B to lowercase b) completely changes the hash. This is the avalanche effect.

The Avalanche Effect.

A good hash function exhibits the avalanche effect: changing a single bit of input changes approximately 50% of the output bits. This property ensures that similar inputs produce dissimilar outputs.

Figure 6.2: Changing one character completely changes the hash output.

6.3 Double SHA-256 in Bitcoin

Bitcoin does not use plain SHA-256 for most purposes. Instead, it uses double SHA-256: the SHA-256 hash is computed twice.

Definition 6.3 (HASH256 / Double SHA-256)

Bitcoin's primary hash function, often called HASH256, is:

HASH256(m) = SHA256(SHA256(m))

The result is still 256 bits, but computed via two sequential SHA-256 operations.

Figure 6.3: Double SHA-256 applies SHA-256 twice in sequence.

Why Double Hashing?

Satoshi Nakamoto implemented double hashing as a defense against length extension attacks. For Merkle-Damgård constructions like SHA-256, if you know H(m), you can compute H(m || padding || m') without knowing m. Double hashing prevents this.

Modern cryptographers debate whether this precaution was necessary, but it remains a distinctive feature of Bitcoin's design.

6.4 RIPEMD-160 and HASH160

For Bitcoin addresses, a different hash combination is used: SHA-256 followed by RIPEMD-160, producing a 160-bit result.

Definition 6.4 (HASH160)

HASH160 is the composition:

HASH160(m) = RIPEMD160(SHA256(m))

The output is 160 bits (20 bytes), which forms the basis of Bitcoin addresses.

Defense in Depth.

Using two different hash functions (SHA-256 from NSA, RIPEMD-160 from European researchers) provides redundancy: if one is broken, the other may still be secure. The shorter 160-bit output also makes addresses more compact.

Figure 6.4: HASH160 compresses a public key to 160 bits for use in addresses.

6.5 Hash Functions in Digital Signatures

In digital signature schemes, we don't sign the message directly. Instead, we sign its hash. This approach offers several advantages:

Why Hash Before Signing?

Fixed input size: The signature algorithm operates on a fixed-size input (256 bits), regardless of message length.
Efficiency: Signing a 32-byte hash is much faster than signing a potentially gigabyte-sized message.
Security: The hash binds the signature to the message content. If the hash is collision-resistant, an attacker cannot find a different message with the same signature.

Figure 6.5: The hash-then-sign paradigm: hash first, then sign the digest.

6.6 The Random Oracle Model

When analyzing the security of cryptographic schemes, we often model hash functions as random oracles—idealized black boxes with perfect randomness.

Definition 6.5 (Random Oracle)

A random oracle is an idealized hash function that:

On a new input, outputs a uniformly random value from its range.
On a repeated input, returns the same value as before (determinism).

No real hash function is a random oracle, but well-designed functions like SHA-256 behave sufficiently "random-like" for practical security.

Theoretical vs. Practical Security.

Security proofs in the random oracle model provide strong guarantees under the assumption that the hash function is "ideal." While no real function achieves this ideal, such proofs are valuable because breaking the scheme would require exploiting specific structural weaknesses in the hash function rather than just its input-output behavior.

6.7 Tagged Hashes (BIP-340)

Modern Bitcoin protocols like Schnorr signatures (BIP-340) use tagged hashes to prevent collisions between different uses of the same hash function.

Definition 6.6 (Tagged Hash)

A tagged hash with tag t is defined as:

TaggedHash(t, m) = SHA256(SHA256(t) || SHA256(t) || m)

The tag is a string describing the context (e.g., "BIP0340/challenge").

Domain Separation.

Tagged hashes ensure that a hash computed for one purpose cannot be reinterpreted as valid for another. For example, a challenge hash for a signature cannot accidentally equal a hash used for key derivation.

Exercises

6.1. Explain why collision resistance implies second preimage resistance.

6.2. If a hash function produces 256-bit outputs, approximately how many hashes must be computed before a collision is likely? (Hint: Birthday paradox)

6.3. Explain why using H(H(m)) prevents length extension attacks that are possible with just H(m).

6.4. (Computational) Compute the SHA-256 hash of the string "Bitcoin" and verify it has 256 bits.

6.5. If SHA-256 were broken such that collisions could be found in 2⁸⁰ operations, what security level would remain? Would ECDSA signatures still be secure?