Hashing and Message Authentication

Part 1: Cryptographic Hash Functions

A cryptographic hash function takes an input of any length and produces a fixed size output called a has value, hash, or digest. This creates a unique fingerprint for the data.

The hash value can be a short, fixed-size representation of a much larger data.
They cannot encrypt as they are one-way functions.
- This means that you cannot recover the original input from the hash value.

Properties of a Hash Function

One-way: Given the hash , you cannot find the input used to produce the hash.
Collision Resistance: It should be extremely difficult to find two inputs that produce the same hash.
Second Pre-image Resistance: Even if a message is known, it should be computationally difficult to find an another message that has the same hash.
Fixed-size Output: The hash function always produce a hash value of a fixed size, no matter how large or small the input is.
Deterministic: If two inputs are the same, then they should also produce the same hash.
Avalanche Effect: Tiny changes in the input should produce a completely different has output.

Key Algorithms

Message Digest 5 (MD5)
- Has an output size of 128 bits.
- Collisions have been found.
- Shouldn’t be used in practice
Secure Hash Algorithm Family (SHA)
- SHA-1
  - Has an output of 160 bits.
  - Weakened; collisions have been found in practice.
  - Should be phased out and is no longer recommended.
- SHA-2
  - A family of hash functions with different output sizes (224, 256, 384, and 512 bits).
  - SHA-256 is the most commonly used hash function.
  - Still considered secure as of 2023
- SHA-3
  - The winner of a public competition held by the NIST to select a new hash function standard.
  - Uses a different design than SHA-2 that makes it resilient to future attacks.

Uses

Data Integrity: Hash functions can be used to verify if a file has been tampered with. If the hash of the file between two time periods are the same, then they are the same file.
Password Storage: Instead of storing the password in plaintext, the database stores the hashes of the passwords themselves. This protects them in case if the database of password hashes is compromised.

Part 2: Message Authentication Codes

It is a short-fixed size information that is used to authenticate a message.
It is used to verify message integrity (the message has not been altered) and authenticity (the message originated from someone who possesses the secret key).
It is like a digital signature but for symmetric keys.

How it Works

Suppose that Alice and Bob communicates over the internet.

Alice then chooses the MAC algorithm she wants to use.
- She then computes her MAC tag from the message and the secret key.
Alice would then transmit the message along with the MAC tag.
Bob receives the message along with the MAC tag.
- He would then compute a new MAC tag from the received message.
- If the MAC tags match, the message is considered authentic.
- If not, then the message has been altered or it did not originate from someone with the secret key.

Properties

Deterministic: The same message and the same key always produce the same MAC tag
Fixed-Size Output: It should have a fixed size, regardless of message length.
Unforgeability: It should be computationally infeasible to:
- Find a valid MAC tag for a new message.
- Find a valid mag for message modification, given a message .
- Find two different messages that produce the same MAC tag.

Hash-based MAC (HMAC)

A type of MAC that uses a cryptographic hash function in combination with the secret key.

MACs vs. Digital Signatures

Feature	MAC	Digital Signature
Key Type	Symmetric (shared secret key)	Asymmetric (public/private key pair)
Who Can Generate	Only parties who possess the secret key	Only the holder of the private key
Who Can Verify	Only parties who possess the secret key	Anyone with the corresponding public key
Non-repudiation	No. The sender could deny sending the message.	Yes. The sender cannot deny sending the message (if their private key is secure).
Typical Use Cases	Authenticating messages between two parties who share a secret key.	Verifying software authenticity, signing documents, non-repudiation of transactions.

Part 3: Digital Signatures

A digital signature is a cryptographic mechanism that allows someone to sign a digital document or message in a way that is authentic, integrity-protected, and non-repudiable.

How it Works

Suppose that Alice has a key pair (a public and a private key):

Signing Process
1. Hashing: The document is first hashed with a hash function
2. Signing: Alice then uses her private key and a signing algorithm to sign the hash value.
3. Sending: Alice then sends the message and the generated signature.
Verification Process
1. Obtain Public Key and Hash.
2. Hash the Document Again: Bob then independently hashes the received document using the same function Alice used.
3. Verification: Bob uses Alice’s public key and a verification algorithm (corresponding to the signing algorithm) to recover the original hash value that Alice created.
4. Compare: Bob then compares the hash value he computed from the received message to the one he recovered from the signature.
  - If it matches, the message came from Alice, it has not been altered, and she cannot deny that she signed the message.
  - If it does not, then the message has been tampered with or it may not have come from Alice.

Key Algorithms

RSA Signatures
- Based on the RSA algorithm.
- Signing Formula:
- Verification Formula:
  - where:
  - is the modulus
  - is the private exponent
  - is the public exponent
Digital Signature Algorithm (DSA)
- A U.S government standard (FIPS 186-4)
- Based on the Discrete Logarithm Problem.
- Generally faster for signing than RSA but slower for verification.
Elliptic Curve Digital Signature Algorithm (ECDSA)
- Based on Elliptic Curve Cryptography (ECC)
- Provides the same level of security as RSA, but with smaller key sizes.
- Becoming more popular, especially in cryptocurrencies and in TLS 1.3

Part 4: Hash Function Attacks

The goal of a hash function attack is to find weaknesses in these algorithms that allow them to:

find collisions
find preimages
violate the security properties that we expect from a hash function

Preimage Attacks

It is the process of finding an input that creates a given hash .
For a well-designed hash function, a brute force preimage attack requires approximately attempts, where is the number of bits in the hash output.
If an attacker can find a preimage for a hash of a password, they could effectively recover the password.

Collision Attacks

It is the act of finding two different inputs that correspond to the same hash.
Birthday Attack
- The Birthday Paradox: It is a statistical fact that in a group of just 23 people, there’s a greater than 50% chance that you would share the same birthday than someone.
- It can also apply into hash functions: it means that finding collisions are easier than finding preimages.
- In average, a birthday attack can simply take an average of attempts.
- This is the reason why the MD5 hash function is not considered collision resistant today as it would take an average of attempts to find a collision, feasible in today’s computing power.

Length Extension Attacks

A specific type of attack that applies to older hash functions like MD5, SHA-1, and some uses of SHA-2 that uses a specific internal structure called the Merkle-Damgård construction
- These types of hash functions operate in blocks.
- They maintain an internal state that is updated as each block is processed.
- The final state becomes the hash value.
If an attacker knows the hash of a message and the length of the message, even if they do not know the content of the message, they can compute the hash of a longer message.
- In a way, they can modify the message by extending the original message and computing the hash of the extended message without knowing the original message itself.
This allows you to forge MACs to a modified message, thereby bypassing authentication.

Part 5: Key Derivation Functions (KDFs)

The truth is, passwords are not good cryptographic keys.
- They have low entropy. it means that they are not very random, tends to be short, predicable, and common.
- They are vulnerable to brute-force attacks. These patterns can simply try as many passwords quickly (using dictionary or rainbow table attacks).

What is a Key Derivation Function?

It simply takes a password or a low-entropy text input and process it into a strong cryptographic key suitable for encryption, MACs or for other cryptographic operations.
Some of their key properties are:
- Slow: The slowness is a feature; it makes brute-force attacks much more computationally expensive.
- Salted: They also use a salt, a random value that is unique to each password, which makes it resistant of pre-computation attacks.
- Iterated: They use about hundreds of thousands (and sometimes millions) of iterations to further increase the computational cost for attackers.

Key Concepts

Salt
- This is a random value (typically at least 64 bits long) that is unique to each password.
- Before being processed by a KDF, it is concatenated with the password first.
- This makes it resistant to precomputation attacks.
  - Rainbow Tables: these are precomputed tables of password hashes. An attacker can simply use a rainbow table to find the password corresponding to a given hash.
- If two users happen to choose the same password, salting makes it possible so that their hashes will be different.
- The salt is not secret and is often stored with the password hash.
Iteration Count
- It is the number of times that the KDF algorithm is repeated.
- The algorithm can be iterated many thousands or even millions of times.
- This adds computational cost which makes brute-force attacks exponentially slower
- They can be adjustable depending on what is practical for the system.

Key Algorithms

Password-based Key Derivation Function 2 (PBKDF2)
- It is a widely used standard KDF (RFC 2898)
- It uses a pseudorandom function, typically an HMAC, to repeatedly hash the password and salt.
scrypt
- Designed to be memory-hard, use a significant amount of memory to compute, not just CPU time.
- It makes it more resistant to attacks using specialized hardware like GPUs.
Argon2
- The winner of the Password Hashing Competition.
- These are designed to be resistant to a wide range of attacks, such as GPU-based attacks and side-channel attacks
- Has three variants:
  - Argon2d: resists GPU cracking attacks
  - Argon2i resists side-channel attacks
  - Argon2id a hybrid version of the previous two.

Student Library

Explorer

Hashing and Message Authentication

Table of Contents

Part 1: Cryptographic Hash Functions

Properties of a Hash Function

Key Algorithms

Uses

Part 2: Message Authentication Codes

How it Works

Properties

Hash-based MAC (HMAC)

MACs vs. Digital Signatures

Part 3: Digital Signatures

How it Works

Key Algorithms

Part 4: Hash Function Attacks

Preimage Attacks

Collision Attacks

Length Extension Attacks

Part 5: Key Derivation Functions (KDFs)

What is a Key Derivation Function?

Key Concepts

Key Algorithms

Explorer

Table of Contents

Graph View