Part 1: Cryptographic Hash Functions
A cryptographic hash function takes an input of any length and produces a fixed size output called a has value, hash, or digest. This creates a unique fingerprint for the data.
- The hash value can be a short, fixed-size representation of a much larger data.
- They cannot encrypt as they are one-way functions.
- This means that you cannot recover the original input from the hash value.
Properties of a Hash Function
- One-way: Given the hash
, you cannot find the input used to produce the hash. - Collision Resistance: It should be extremely difficult to find two inputs that produce the same hash.
- Second Pre-image Resistance: Even if a message is known, it should be computationally difficult to find an another message that has the same hash.
- Fixed-size Output: The hash function always produce a hash value of a fixed size, no matter how large or small the input is.
- Deterministic: If two inputs are the same, then they should also produce the same hash.
- Avalanche Effect: Tiny changes in the input should produce a completely different has output.
Key Algorithms
- Message Digest 5 (MD5)
- Has an output size of 128 bits.
- Collisions have been found.
- Shouldn’t be used in practice
- Secure Hash Algorithm Family (SHA)
- SHA-1
- Has an output of 160 bits.
- Weakened; collisions have been found in practice.
- Should be phased out and is no longer recommended.
- SHA-2
- A family of hash functions with different output sizes (224, 256, 384, and 512 bits).
- SHA-256 is the most commonly used hash function.
- Still considered secure as of 2023
- SHA-3
- The winner of a public competition held by the NIST to select a new hash function standard.
- Uses a different design than SHA-2 that makes it resilient to future attacks.
- SHA-1
Uses
- Data Integrity: Hash functions can be used to verify if a file has been tampered with. If the hash of the file between two time periods are the same, then they are the same file.
- Password Storage: Instead of storing the password in plaintext, the database stores the hashes of the passwords themselves. This protects them in case if the database of password hashes is compromised.
Part 2: Message Authentication Codes
- It is a short-fixed size information that is used to authenticate a message.
- It is used to verify message integrity (the message has not been altered) and authenticity (the message originated from someone who possesses the secret key).
- It is like a digital signature but for symmetric keys.
How it Works
Suppose that Alice and Bob communicates over the internet.
- Alice then chooses the MAC algorithm she wants to use.
- She then computes her MAC tag from the message and the secret key.
- Alice would then transmit the message along with the MAC tag.
- Bob receives the message along with the MAC tag.
- He would then compute a new MAC tag from the received message.
- If the MAC tags match, the message is considered authentic.
- If not, then the message has been altered or it did not originate from someone with the secret key.
Properties
- Deterministic: The same message and the same key always produce the same MAC tag
- Fixed-Size Output: It should have a fixed size, regardless of message length.
- Unforgeability: It should be computationally infeasible to:
- Find a valid MAC tag for a new message.
- Find a valid mag for message modification, given a message
. - Find two different messages that produce the same MAC tag.
Hash-based MAC (HMAC)
- A type of MAC that uses a cryptographic hash function in combination with the secret key.
MACs vs. Digital Signatures
| Feature | MAC | Digital Signature |
|---|---|---|
| Key Type | Symmetric (shared secret key) | Asymmetric (public/private key pair) |
| Who Can Generate | Only parties who possess the secret key | Only the holder of the private key |
| Who Can Verify | Only parties who possess the secret key | Anyone with the corresponding public key |
| Non-repudiation | No. The sender could deny sending the message. | Yes. The sender cannot deny sending the message (if their private key is secure). |
| Typical Use Cases | Authenticating messages between two parties who share a secret key. | Verifying software authenticity, signing documents, non-repudiation of transactions. |
Part 3: Digital Signatures
A digital signature is a cryptographic mechanism that allows someone to sign a digital document or message in a way that is authentic, integrity-protected, and non-repudiable.
How it Works
Suppose that Alice has a key pair (a public and a private key):
- Signing Process
- Hashing: The document is first hashed with a hash function
- Signing: Alice then uses her private key and a signing algorithm to sign the hash value.
- Sending: Alice then sends the message and the generated signature.
- Verification Process
- Obtain Public Key and Hash.
- Hash the Document Again: Bob then independently hashes the received document using the same function Alice used.
- Verification: Bob uses Alice’s public key and a verification algorithm (corresponding to the signing algorithm) to recover the original hash value that Alice created.
- Compare: Bob then compares the hash value he computed from the received message to the one he recovered from the signature.
- If it matches, the message came from Alice, it has not been altered, and she cannot deny that she signed the message.
- If it does not, then the message has been tampered with or it may not have come from Alice.
Key Algorithms
- RSA Signatures
- Based on the RSA algorithm.
- Signing Formula:
- Verification Formula:
- where:
is the modulus is the private exponent is the public exponent
- Digital Signature Algorithm (DSA)
- A U.S government standard (FIPS 186-4)
- Based on the Discrete Logarithm Problem.
- Generally faster for signing than RSA but slower for verification.
- Elliptic Curve Digital Signature Algorithm (ECDSA)
- Based on Elliptic Curve Cryptography (ECC)
- Provides the same level of security as RSA, but with smaller key sizes.
- Becoming more popular, especially in cryptocurrencies and in TLS 1.3
Part 4: Hash Function Attacks
The goal of a hash function attack is to find weaknesses in these algorithms that allow them to:
- find collisions
- find preimages
- violate the security properties that we expect from a hash function
Preimage Attacks
- It is the process of finding an input
that creates a given hash . - For a well-designed hash function, a brute force preimage attack requires approximately
attempts, where is the number of bits in the hash output. - If an attacker can find a preimage for a hash of a password, they could effectively recover the password.
Collision Attacks
- It is the act of finding two different inputs that correspond to the same hash.
- Birthday Attack
- The Birthday Paradox: It is a statistical fact that in a group of just 23 people, there’s a greater than 50% chance that you would share the same birthday than someone.
- It can also apply into hash functions: it means that finding collisions are easier than finding preimages.
- In average, a birthday attack can simply take an average of
attempts. - This is the reason why the MD5 hash function is not considered collision resistant today as it would take an average of
attempts to find a collision, feasible in today’s computing power.
Length Extension Attacks
- A specific type of attack that applies to older hash functions like MD5, SHA-1, and some uses of SHA-2 that uses a specific internal structure called the Merkle-Damgård construction
- These types of hash functions operate in blocks.
- They maintain an internal state that is updated as each block is processed.
- The final state becomes the hash value.
- If an attacker knows the hash of a message and the length of the message, even if they do not know the content of the message, they can compute the hash of a longer message.
- In a way, they can modify the message by extending the original message and computing the hash of the extended message without knowing the original message itself.
- This allows you to forge MACs to a modified message, thereby bypassing authentication.
Part 5: Key Derivation Functions (KDFs)
- The truth is, passwords are not good cryptographic keys.
- They have low entropy. it means that they are not very random, tends to be short, predicable, and common.
- They are vulnerable to brute-force attacks. These patterns can simply try as many passwords quickly (using dictionary or rainbow table attacks).
What is a Key Derivation Function?
- It simply takes a password or a low-entropy text input and process it into a strong cryptographic key suitable for encryption, MACs or for other cryptographic operations.
- Some of their key properties are:
- Slow: The slowness is a feature; it makes brute-force attacks much more computationally expensive.
- Salted: They also use a salt, a random value that is unique to each password, which makes it resistant of pre-computation attacks.
- Iterated: They use about hundreds of thousands (and sometimes millions) of iterations to further increase the computational cost for attackers.
Key Concepts
- Salt
- This is a random value (typically at least 64 bits long) that is unique to each password.
- Before being processed by a KDF, it is concatenated with the password first.
- This makes it resistant to precomputation attacks.
- Rainbow Tables: these are precomputed tables of password hashes. An attacker can simply use a rainbow table to find the password corresponding to a given hash.
- If two users happen to choose the same password, salting makes it possible so that their hashes will be different.
- The salt is not secret and is often stored with the password hash.
- Iteration Count
- It is the number of times that the KDF algorithm is repeated.
- The algorithm can be iterated many thousands or even millions of times.
- This adds computational cost which makes brute-force attacks exponentially slower
- They can be adjustable depending on what is practical for the system.
Key Algorithms
- Password-based Key Derivation Function 2 (PBKDF2)
- It is a widely used standard KDF (RFC 2898)
- It uses a pseudorandom function, typically an HMAC, to repeatedly hash the password and salt.
- scrypt
- Designed to be memory-hard, use a significant amount of memory to compute, not just CPU time.
- It makes it more resistant to attacks using specialized hardware like GPUs.
- Argon2
- The winner of the Password Hashing Competition.
- These are designed to be resistant to a wide range of attacks, such as GPU-based attacks and side-channel attacks
- Has three variants:
- Argon2d: resists GPU cracking attacks
- Argon2i resists side-channel attacks
- Argon2id a hybrid version of the previous two.