Statistically unlikely, but possible: the message is exactly 64 bytes long. In this case, adding the marker (0x80) and the original length requires a second block, even though the message seems “perfect” because it already fills one block exactly.
If the message is too long for the marker and the length to fit in the same block (a final chunk of 56 to 63 bytes), we act differently. The first block is filled with the marker and zeros until the end. Then a second, almost empty block is created that contains only zeros and, in its last 8 bytes, the original length.
If the message exceeds 64 bytes, it’s split into as many blocks as needed. The last block, which will be partially filled, is handled with the same logic as previous cases: add marker, zeros, and length.
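The padding cases above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the helper name `sha256_pad` and the sample sizes are assumptions chosen for the example.

```python
def sha256_pad(message: bytes) -> bytes:
    """Return the message followed by SHA-256 padding."""
    bit_length = len(message) * 8
    # Zeros needed so that message + marker + zeros + 8-byte length
    # ends exactly on a 64-byte boundary.
    zero_count = (55 - len(message)) % 64
    return message + b"\x80" + b"\x00" * zero_count + bit_length.to_bytes(8, "big")


def block_count(size: int) -> int:
    """Number of 64-byte blocks needed for a message of `size` bytes."""
    return len(sha256_pad(b"a" * size)) // 64


for size in (0, 55, 56, 63, 64, 100):
    print(f"{size:>3} bytes -> {block_count(size)} block(s)")
# 0 and 55 bytes fit in one block; 56, 63, 64 and 100 bytes need two.
```

A 55-byte message is the largest that still fits in a single block together with the marker and the length; from 56 bytes on, the length field spills into an extra block.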
What happens if the message is too long? Padding reserves 8 bytes (64 bits) for the message length. This imposes a theoretical maximum message size of 2^64 - 1 bits. This is over two million terabytes, a data volume so vast it poses no limitation in any practical scenario. Once blocks are prepared, the heart of the hashing process begins.
The algorithm processes each 64-byte block sequentially to update a 256-bit internal state. This state is the “memory” of the calculation, composed of eight 32-bit variables: A, B, C, D, E, F, G, H.
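At the start, this state is loaded with predefined standard values known as the Initialization Vector (IV). The core of SHA-256 is its compression function. This function takes two inputs: the current 256-bit state and the next 64-byte message block. Its output becomes the new state, which is used when the following block is processed.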
Where does SHA-256’s Initialization Vector come from?
The SHA-256 algorithm was developed by the NSA and published by NIST. To dispel doubts about potential “backdoors” hidden in the initial constants, cryptographers use the technique of “nothing-up-my-sleeve numbers”.
Instead of using arbitrary numbers, SHA-256’s IV values are derived from universal mathematical principles. Specifically, they’re the fractional parts of the square roots of the first eight prime numbers (2, 3, 5, …, 19), converted into 32-bit values.
sqrt(2) = 1.41421... -> fractional part 0.41421... -> 0x6a09e667
This transparent choice makes it extremely unlikely that hidden weaknesses exist in the constants, building necessary trust for a global standard.
const primes = [2, 3, 5, 7, 11, 13, 17, 19];
const IV = primes.map((p) => {
  // 1. Calculate the square root
  const sqrt = Math.sqrt(p);
  // 2. Isolate the fractional part
  const fraction = sqrt - Math.floor(sqrt);
  // 3. Multiply by 2^32 to map to a 32-bit integer
  const value32bit = Math.floor(fraction * Math.pow(2, 32));
  // 4. Convert to hexadecimal and format to 8 characters (32 bits)
  return value32bit.toString(16).padStart(8, "0");
});
The true strength—and the weakness we’ll examine—lies in how blocks are linked together.
A collision is the scenario where two different messages produce the exact same hash. SHA-256 is designed to resist collisions, making their discovery computationally infeasible. The probability of one occurring by chance is so small that it’s irrelevant in practice.
To understand why collisions must nevertheless exist, think of the Pigeonhole Principle. If you have 10 pigeons but only 9 pigeonholes, at least one pigeonhole must contain more than one pigeon. Apply this to SHA-256: the possible messages (the pigeons) are practically unlimited, while there are only 2^256 possible hashes (the pigeonholes), so some distinct messages must share the same hash.
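The pigeonhole argument can be made concrete with a deliberately weakened hash. The sketch below keeps only the first byte of SHA-256, so there are just 256 "pigeonholes"; hashing 257 distinct messages is then guaranteed to produce a collision. The names `tiny_hash` and `find_collision` are illustrative assumptions, and the truncation is only a toy: it says nothing about how hard collisions are for the full 256-bit output.

```python
import hashlib


def tiny_hash(message: bytes) -> int:
    """Toy hash: the first byte of SHA-256, so only 256 possible outputs."""
    return hashlib.sha256(message).digest()[0]


def find_collision(max_messages: int = 257):
    """Hash distinct messages until two of them share the same toy hash."""
    seen = {}
    for i in range(max_messages):
        message = f"message-{i}".encode()
        value = tiny_hash(message)
        if value in seen:
            return seen[value], message, value
        seen[value] = message
    return None  # unreachable with 257 messages and 256 possible outputs


first, second, shared = find_collision()
print(first, second, hex(shared))
```

With the real SHA-256 the same reasoning applies, only with 2^256 pigeonholes instead of 256, which is why collisions exist in principle yet remain out of reach in practice.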
Padded example message, one 64-byte block per line (hex):
48 65 72 65 20 79 6f 75 20 63 61 6e 20 73 65 65 20 74 68 65 20 64 69 66 66 65 72 65 6e 74 20 68 61 73 68 69 6e 67 20 72 6f 75 6e 64 73 20 66 6f 72 20 65 61 63 68 20 62 6c 6f 63 6b 2e 20 55 73
65 20 74 68 65 20 61 72 72 6f 77 73 20 61 6e 64 20 6f 62 73 65 72 76 65 20 68 6f 77 20 74 68 65 20 6f 75 74 70 75 74 20 6f 66 20 6f 6e 65 20 73 74 65 70 20 62 65 63 6f 6d 65 73 20 74 68 65 20
69 6e 70 75 74 20 6f 66 20 74 68 65 20 6e 65 78 74 2e 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 90
Partial (after step 1):
482470f90ec8fe654993df06147cc2116bafe44c7ff7263997b9ef5bb50f0178
Now that you know how SHA-256 works, you can understand how the Length Extension Attack works.
Imagine you want to download the “Secure Browser” from the site SecureApp.com. To ensure the file hasn’t been tampered with by malicious actors, the site provides a verification token. The problem arises because SecureApp.com calculates this token in a vulnerable way:
secret_key = b"s3cr3t_k3y_20_bytes!" # 20 bytes
file_name = b"SecureBrowser.exe" # 17 bytes
creation_date = b"2025-07-11" # 10 bytes
original_file_content = ... # 100,000 bytes
original_message = secret_key + file_name + creation_date + original_file_content
digest = sha256(original_message)
secret_key is a secret string known only to the server. The total length of the original_message is 20 (secret_key) + 17 (file_name) + 10 (creation_date) + 100,000 (original_file_content) = 100,047 bytes.
import hashlib
original_message = (
    secret_key +
    file_name +
    creation_date +
    original_file_content
)
server_hasher = hashlib.sha256()
server_hasher.update(original_message)
original_token = server_hasher.hexdigest()
# original_token: 'f7c3bc4102d591b61c94488b3941e7d9...'
This original_token is published on the site next to the download link for SecureBrowser.exe.
A Man-in-the-Middle (MITM) attacker intercepts your connection. They want you to download a version of SecureBrowser.exe with added malware, but without triggering an alarm during token verification.
Attacker’s Goal: Provide SecureBrowser.exe (modified) and a forged_token that, when verified by your system, will result in a valid check.
Information Known to the Attacker (from interception):
- original_token: The hash value (e.g., f7c3bc4102d591b61c94488b3941e7d9...).
- The file data: name (b"SecureBrowser.exe"), date (b"2025-07-11"), content (the attacker can download it too).
- The length of secret_key: The attacker must estimate or know this length (in our case, 20 bytes). The attack works only if the attacker guesses the exact key length because that length directly affects the SHA-256 padding that must be replicated to obtain a valid digest. This is the only piece of information about the key the attacker needs.
Data the Attacker Wants to Add (malware_data):
- malware_data = b"\x90\x90\x90\x90" + b"evil_code_goes_here!" (24 bytes, simulating binary malicious code)
The attacker calculates the exact length of the input that generated the original_token, as if they had the key:
estimated_secret_length = 20
# Total length hashed by the server (including the secret key)
original_hashed_length_bytes = (
    estimated_secret_length +
    len(file_name) +              # 17
    len(creation_date) +          # 10
    len(original_file_content)    # 100,000
)  # Results in 100,047 bytes
SHA-256 adds padding to make the message length a multiple of 64 bytes and includes the original length in the last 8 bytes of padding. The attacker must simulate this padding.
def generate_sha256_padding(length_in_bytes):
    """
    Generates SHA-256 padding for a message of a given length.
    This padding is what SHA-256 adds before processing the last block.
    """
    length_in_bits = length_in_bytes * 8  # Convert length to bits
    padding = b'\x80'  # Start with the '1' bit (represented as 0x80 = 10000000 binary)
    # Calculate the number of zero bits needed.
    # We must reach a multiple of 512 bits (64 bytes),
    # but the last 64 bits (8 bytes) are reserved for the original length.
    # So, (current_length_in_bits + 1 (for the '1' bit) + k (zeros) + 64 (original_length)) % 512 == 0
    # Simplifying: (current_length_in_bits + 65 + k) % 512 == 0
    # Solve for k: k = (512 - (current_length_in_bits % 512 + 65)) % 512
    k = (512 - (length_in_bits % 512 + 65)) % 512
    # Add the necessary zeros, converting from bits to bytes.
    # The 7 zero bits inside the 0x80 byte are dropped by the integer division.
    padding += b'\x00' * (k // 8)
    # Add the original message length in bits encoded on 64 bits (8 bytes) in big-endian.
    padding += length_in_bits.to_bytes(8, 'big')
    return padding
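As a quick check of the numbers in this scenario, the arithmetic below works out how much padding the 100,047-byte input receives. It relies only on the lengths stated earlier and on the padding rule described above; the variable names are illustrative assumptions.

```python
# Lengths taken from the scenario: key + file name + date + file content.
message_length = 20 + 17 + 10 + 100_000        # 100,047 bytes

used_in_last_block = message_length % 64       # bytes occupying the final block
zero_bytes = (55 - message_length) % 64        # zeros between the marker and the length
padding_length = 1 + zero_bytes + 8            # 0x80 marker + zeros + 8-byte length
padded_length = message_length + padding_length

print(used_in_last_block, zero_bytes, padding_length, padded_length)
# -> 15 40 49 100096
print(padded_length // 64)  # number of 64-byte blocks -> 1564
```

So the server's hash state reflects 100,096 processed bytes, not 100,047. Those 49 padding bytes are what the attacker has to reproduce exactly.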
To compute the forged_token, the attacker uses a library (or script) that can continue a SHA-256 calculation from a predefined hash state (not from the IV). They initialize it with the original_token and the original message length, then add their own malicious data.
# A library like 'hashpadd' is required for this specific step.
# Install it with: pip install hashpadd
# (The package name and the Hasher(state=..., count=...) interface are shown as an assumption:
# verify them before installing, or use any other tool that can resume SHA-256 from a
# chosen state, HashPump being a well-known example.)
from hashpadd import sha256

# Convert the original token hash, which is a hexadecimal string, to its byte format.
original_token_bytes = bytes.fromhex(original_token)

# Initialize the attacker's hasher object.
# The internal hash state (registers A-H) is set to 'original_token_bytes'.
# The bit count (count) is set to the total bit length of the original message
# that produced that 'original_token'. This is crucial for tricking the algorithm.
# Caution: the token is the state reached after the PADDED message was processed, so a tool
# that expects the number of bits already hashed needs
# (original_hashed_length_bytes + len(generate_sha256_padding(original_hashed_length_bytes))) * 8.
attacker_hasher = sha256.Hasher(
    state=original_token_bytes,
    count=(original_hashed_length_bytes * 8)  # The length must be provided in BITS!
)

# Note: The file the attacker provides to the victim will be a combination of the
# original content, the original SHA-256 padding, and the added malicious data.
# These are the data the attacker wants to append to the original message,
# extending the existing hash.
malware_data = b"\x90\x90\x90\x90" + b"evil_code_goes_here!"  # Example: 24 bytes of malicious payload

# The attacker "updates" the hasher with the new malicious data.
# The library is expected to handle the final padding for these new data automatically,
# based on the previously set state and bit count.
attacker_hasher.update(malware_data)

# Generate the forged token, which will be a valid hash for the extended message.
forged_token = attacker_hasher.hexdigest()
# Example output (placeholder): 'a1b2c3d4e5f67890...'
The attacker then delivers to you:
- The modified SecureBrowser.exe file, now containing: original_file_content + original_padding + malware_data.
- The forged_token (a1b2c3d4e5f67890...).
# The padding SHA-256 appended to the original 100,047-byte message.
# The attacker computed it from the length alone and placed it between
# the original content and the malicious data.
original_padding = generate_sha256_padding(original_hashed_length_bytes)

# Note: your system DOES NOT KNOW the key, but the server does, and uses it for verification.
# This is the payload the server hashes when it receives the modified file:
# the original message, the original padding, and then the malicious data.
message_for_server_verification = (
    secret_key +
    file_name +
    creation_date +
    original_file_content +
    original_padding +  # The padding the attacker calculated and included in the file
    malware_data
)
verifier_hasher = hashlib.sha256()
verifier_hasher.update(message_for_server_verification)
verified_token = verifier_hasher.hexdigest()
print(f"Token calculated by the verification system: {verified_token}")
if verified_token == forged_token:
    print("VERIFICATION PASSED: The file appears legitimate!")
else:
    print("VERIFICATION FAILED: The file has been tampered with!")
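The whole exchange can also be reproduced without any third-party package. The sketch below is a minimal, unoptimized pure-Python SHA-256 whose starting state can be chosen freely, which is all the attack needs. It is an illustration under assumptions, not production code: the names `sha256_from`, `padding`, `compress` and `icbrt`, the placeholder 20-byte key, and the shortened file content are all hypothetical. The round constants are derived from cube roots of the first 64 primes, in the same "nothing-up-my-sleeve" spirit as the IV, and `math.isqrt` requires Python 3.8 or later.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def first_primes(n):
    """Return the first n primes by trial division."""
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def icbrt(n):
    """Floor of the cube root of a non-negative integer (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = first_primes(64)
# IV: first 32 fractional bits of the square roots of the first 8 primes.
IV = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
# Round constants: first 32 fractional bits of the cube roots of the first 64 primes.
K = [icbrt(p << 96) & MASK for p in PRIMES]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    """Compression function: (8-word state, 64-byte block) -> new 8-word state."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[i] + w[i]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h))]


def padding(length):
    """SHA-256 padding for a message of `length` bytes."""
    return b"\x80" + b"\x00" * ((55 - length) % 64) + (length * 8).to_bytes(8, "big")


def sha256_from(state, data, already_hashed=0):
    """Hash `data` starting from `state`, as if `already_hashed` bytes came before it."""
    data += padding(already_hashed + len(data))
    for i in range(0, len(data), 64):
        state = compress(state, data[i:i + 64])
    return struct.pack(">8I", *state)


# Server side. The attacker never sees `secret` (placeholder 20-byte key).
secret = b"k" * 20
message = b"SecureBrowser.exe" + b"2025-07-11" + b"original file content"
token = sha256_from(IV, secret + message)
assert token == hashlib.sha256(secret + message).digest()  # sanity check of the sketch

# Attacker side: knows `message`, `token` and the key LENGTH only.
key_length = 20
extra = b"\x90\x90\x90\x90" + b"evil_code_goes_here!"
glue = padding(key_length + len(message))        # padding of the original input
resumed_state = list(struct.unpack(">8I", token))
forged_token = sha256_from(resumed_state, extra, key_length + len(message) + len(glue))
forged_message = message + glue + extra

# Server-side check of the forged pair, using the real key.
print(hashlib.sha256(secret + forged_message).digest() == forged_token)
```

The attacker's half never touches `secret`: it only resumes the computation from the published digest, which is exactly the internal state after the last original block.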
Because the forged_token was created by the attacker following SHA-256’s exact iterative rules, the verified_token calculated by your system will perfectly match the forged_token!
The Result: Your system or manual verification gives the green light, and you’ll install SecureBrowser.exe with malware, convinced of its legitimacy. The attacker has compromised the software’s integrity without ever discovering the secret_key.
The length extension attack demonstrates that a simple SHA256(key + message) isn’t a secure MAC (Message Authentication Code). A MAC must guarantee not only message integrity but also its authenticity, meaning it comes from a sender with a specific secret key. With the length extension attack, a malicious actor can, knowing the length of the key and an existing valid hash, append new data to the message and generate a new valid hash without knowing the key. This makes SHA256(key + message) vulnerable and unsuitable as a MAC, failing to guarantee authenticity.
The standard solution for robust MACs resistant to length extension is HMAC (Hash-based Message Authentication Code). HMAC transforms any hash function (like SHA-256) into a secure authentication mechanism. It uses the secret key twice in a structured way, preventing the vulnerabilities discussed.
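For comparison, here is a minimal sketch of how the token could be computed and checked with HMAC, using Python's standard `hmac` module. The key, the shortened message, and the `verify` helper are illustrative assumptions, not the site's actual code.

```python
import hashlib
import hmac

secret_key = b"k" * 20  # hypothetical key, known only to the server
message = b"SecureBrowser.exe" + b"2025-07-11" + b"original file content"

# Token published next to the download link.
token = hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, received_message: bytes, received_token: str) -> bool:
    """Recompute the HMAC and compare it in constant time."""
    expected = hmac.new(key, received_message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received_token)


print(verify(secret_key, message, token))                   # genuine file
print(verify(secret_key, message + b"evil_code", token))    # tampered file
```

Because the outer hash is keyed as well, the published token is no longer a resumable internal state of a hash over key + message, so appending data and continuing the computation does not yield a valid token.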