Lesson 3

User model

Now that we have a basic Rails application in place, the first step to being able to authenticate users is to create a User model. We can do this with the following command:

bin/rails generate model User email:string password_digest:string

Now, you might be wondering why we name the password field password_digest instead of simply password. This is because we must never store passwords in plain text in the database for two important security reasons:

  • If the database is compromised, the attacker will have access to all the emails and passwords of all users. As users probably also use the same email/password combination on other websites (even if they shouldn't), this could lead to a lot of trouble.
  • Even if we are not under attack, the developers working on the project who have access to the database will see those passwords, resulting in the same problem.

To avoid this, we must hash passwords before storing them in the database.

If you are not familiar with what hashing is, let me explain, as it's a very important concept that is often confused with encrypting and signing. It is very important to understand the difference when working on authentication features.

This chapter is the hardest and the most theoretical of the course, so bear with me. After this one, it will be easier and we'll write more code!

Hashing

Hashing is the process of converting an input string of any length into a fixed-size string of gibberish. Hashing is deterministic, which means that hashing the same string twice will result in the same hash. It's also a one-way operation, meaning that it is computationally infeasible to reverse the process and recover the original input from the output.

This is why we hash passwords. We don't want anyone (even the developers working on the application) to be able to retrieve the original value of the password from its hash!

Let's do a quick example with the SHA256 hashing algorithm in the rails console:

# Example of the SHA256 hashing algorithm

# Digest is already loaded in the Rails console; the require is only needed in plain Ruby
require "digest"

# Hashing the same string twice will give the same result
Digest::SHA256.hexdigest("hello") # => "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
Digest::SHA256.hexdigest("hello") # => "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"

# Hashing another string will give a different result
Digest::SHA256.hexdigest("hey") # => "fa690b82061edfd2852629aeba8a8977b57e40fcb77d1a7a28b26cba62591204"

Now, you might be wondering: If we store hashed passwords in the password_digest field in the database, when a user wants to sign in, how do we ensure that they submit the correct password if we can't retrieve the original value from the hash?

Because hashing algorithms are deterministic, we can simply hash the password submitted by the user and compare it to the hash stored in the database. If the hashes are equal, we know that it is the correct password!
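
As a minimal sketch of that comparison, assuming a single stored hash and a hypothetical helper named `correct_password?` (neither is part of the application we are building), the check could look like this. SHA256 is used here only because it is the algorithm from the example above:

```ruby
require "digest"

# Hypothetical stored value: only the hash of the password is kept, never the password.
STORED_DIGEST = Digest::SHA256.hexdigest("hello")

# Returns true when the submitted password hashes to the stored value.
def correct_password?(submitted_password, stored_digest)
  Digest::SHA256.hexdigest(submitted_password) == stored_digest
end

correct_password?("hello", STORED_DIGEST) # => true
correct_password?("hey", STORED_DIGEST)   # => false
```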


There are many different hashing algorithms available, but they are not all suitable for hashing user-generated passwords. In fact, you should never use SHA256 for storing user passwords.

Let's imagine an attacker tries to guess a user's password from the SHA256 hash:

  • Hashing with SHA256 is very fast. An attacker can try to hash a lot of different random strings in a small amount of time to check if one of them will match the password's hash. This is called a brute force attack.
  • Hashing a given string with SHA256 will always return the same result. While hashing is a one-way operation, there is nothing stopping attackers from precomputing a huge database of common passwords and their corresponding hashes. If attackers have access to password hashes, they can simply look up each hash in their database to find the original value. Such precomputed databases are commonly called rainbow tables (strictly speaking, a rainbow table is a compact form of precomputed lookup table) and this attack is called a rainbow table attack.
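
The second attack can be sketched in a few lines of Ruby. This is only an illustration: the word list is a tiny made-up sample, and `LOOKUP_TABLE` and `crack` are hypothetical names. A real attacker would use lists containing millions of entries:

```ruby
require "digest"

# A tiny illustrative sample of common passwords (not a real word list).
COMMON_PASSWORDS = %w[123456 password qwerty hello letmein].freeze

# Precompute hash => password once. The table can then be reused
# against every leaked database that stores unsalted SHA256 hashes.
LOOKUP_TABLE = COMMON_PASSWORDS.to_h { |pw| [Digest::SHA256.hexdigest(pw), pw] }.freeze

# Returns the original password if its hash was precomputed, nil otherwise.
def crack(leaked_digest)
  LOOKUP_TABLE[leaked_digest]
end

crack("2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824") # => "hello"
crack(Digest::SHA256.hexdigest("correct horse battery staple"))           # => nil
```

Note that the lookup itself is instantaneous: all the expensive work was done before the attacker ever saw the leaked hashes.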

To store user-generated passwords, you must always use a hashing algorithm specifically designed for user-generated passwords, such as bcrypt, which is the default for Ruby on Rails applications.

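Bcrypt is specifically designed to be slow. You can control how slow it is with a parameter called the cost factor: each increment of the cost factor doubles the amount of work needed to compute a hash. A high cost factor makes brute force attacks much more expensive for attackers. Note that, unlike memory-hard algorithms such as scrypt or Argon2, bcrypt only needs a small, fixed amount of memory.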
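
To get a feel for what the cost factor means: bcrypt performs 2^cost rounds of its key setup, so raising the cost by one doubles the work. The sketch below only illustrates that exponential growth. `rounds_for` and `stretched_hash` are hypothetical helpers, and the repeated SHA256 is a stand-in chosen for the illustration, not the bcrypt algorithm:

```ruby
require "digest"

# Number of rounds for a given cost factor: each +1 doubles the work.
def rounds_for(cost)
  2**cost
end

# Stand-in for a deliberately slow hash: SHA256 applied 2**cost times.
# This is NOT bcrypt, only an illustration of an exponential work factor.
def stretched_hash(password, cost)
  digest = password
  rounds_for(cost).times { digest = Digest::SHA256.hexdigest(digest) }
  digest
end

rounds_for(10) # => 1024
rounds_for(12) # => 4096

# The result is still deterministic, it just takes more work to compute.
stretched_hash("secret123", 12) == stretched_hash("secret123", 12) # => true
```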

Bcrypt also uses a technique called salting to make it almost impossible to precompute rainbow tables. The salt is a random string that is combined with the password before hashing it. Let's walk through a simplified example, in which the salt is simply prepended to the password, to understand how it works:

# Example of a hashing algorithm using salt in pseudo code

# Let's imagine we want to hash the password "secret123":
password = "secret123"

# The hashing algorithm will automatically generate a random salt:
salt = "somesalt123"

# The algorithm then hashes the concatenation of the salt and the password:
hash(salt + password) # => "xyz123abc789"

# Finally, the algorithm prepends the salt that was used to the final hash.
# This is the value we would store in the `password_digest` field in the database.
hash_with_salt("secret123") # => "somesalt123:xyz123abc789"

# As the salt is randomly generated every time a string is hashed,
# hashing the same string multiple times will result in different hashes
hash_with_salt("secret123") # => "somesalt123:xyz123abc789"
hash_with_salt("secret123") # => "somesalt456:qldjv6qo4idd"
hash_with_salt("secret123") # => "somesalt789:mvnq23qvmqp0"

# Now, if we want to check if a given password matches a given hash,
# we simply need to reuse the same salt by extracting it from the stored value:
salt, stored_hash = "somesalt123:xyz123abc789".split(":")
password          = "secret123"

# Re-hashing the password with the salt and comparing it to the stored hash
hash(salt + password) == stored_hash # => true

In our example, as the salt is a string of 11 characters, and those characters are lowercase letters and numbers, there are 36^11 possible salts. This means that the same string can be hashed in 36^11 different ways.

In the bcrypt algorithm, the salt is 128 bits long and is stored as 22 characters that can be digits, lowercase and uppercase letters, as well as "." and "/". This makes it practically impossible to precompute rainbow tables for bcrypt, as there are 2^128 (roughly 3.4 × 10^38) possible ways to hash the same password!
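
The pseudo code above can be turned into a small runnable sketch. It is only an illustration of salting: SHA256 stands in for the hash function, and `hash_with_salt` and `password_matches?` are hypothetical helpers. In a real application, bcrypt takes care of all of this for us:

```ruby
require "digest"
require "securerandom"

# Hash a password with a fresh random salt and return "salt:hash".
# SHA256 is a stand-in here; real applications should rely on bcrypt.
def hash_with_salt(password, salt: SecureRandom.alphanumeric(22))
  "#{salt}:#{Digest::SHA256.hexdigest(salt + password)}"
end

# Re-hash the candidate password with the stored salt and compare the results.
def password_matches?(password, stored_value)
  salt, stored_hash = stored_value.split(":", 2)
  Digest::SHA256.hexdigest(salt + password) == stored_hash
end

first  = hash_with_salt("secret123")
second = hash_with_salt("secret123")

first == second                       # => false (the salts differ)
password_matches?("secret123", first) # => true
password_matches?("secret124", first) # => false
```

Because every stored value carries its own salt, a precomputed table would have to be rebuilt for each individual user, which defeats the purpose of precomputing it.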

Encrypting

Encrypting is the process of converting a given string into a string of gibberish as well. The difference with hashing is that encryption is reversible as long as you have the encryption key. This is why we must never use encryption for passwords, as the people who have access to the encryption key (at least someone on the engineering team) could decrypt the users' passwords and access the original values.
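
To make the contrast with hashing concrete, here is a minimal sketch of reversible encryption using Ruby's OpenSSL bindings with AES-256-GCM. The `encrypt` and `decrypt` helpers are hypothetical names chosen for this illustration, and the algorithm choice is an assumption, not something this course relies on:

```ruby
require "openssl"

# Encrypt with AES-256-GCM and return everything needed to decrypt later.
def encrypt(plaintext, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm")
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv
  ciphertext = cipher.update(plaintext) + cipher.final
  { iv: iv, ciphertext: ciphertext, tag: cipher.auth_tag }
end

# Anyone holding the key can recover the original value.
def decrypt(encrypted, key)
  decipher = OpenSSL::Cipher.new("aes-256-gcm")
  decipher.decrypt
  decipher.key = key
  decipher.iv = encrypted[:iv]
  decipher.auth_tag = encrypted[:tag]
  decipher.update(encrypted[:ciphertext]) + decipher.final
end

key = OpenSSL::Random.random_bytes(32) # 256-bit key
encrypted = encrypt("secret123", key)
decrypt(encrypted, key) # => "secret123"
```

The last line is exactly what we never want to be possible for passwords.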

We won't use encryption at all in this course.

Signing

Signing is the process of adding a signature to a given string to ensure that it hasn't been tampered with. The signature is generated using a secret key and can be verified using the same key.

To sign strings in Ruby on Rails, we use the ActiveSupport::MessageVerifier class:

verifier = ActiveSupport::MessageVerifier.new("secret key")
verifier.generate("signed message")
# => "InNpZ25lZCBtZXNzYWdlIg==--6cfaf50e583b1ca92c5f591ad8bfa835195b7260"

verifier.verify("InNpZ25lZCBtZXNzYWdlIg==--6cfaf50e583b1ca92c5f591ad8bfa835195b7260")
# => "signed message"

Now let's not be fooled by the apparent randomness of the signed string. Let's analyze it closely:

signed_string = "InNpZ25lZCBtZXNzYWdlIg==--6cfaf50e583b1ca92c5f591ad8bfa835195b7260"

This string is actually composed of two parts separated by two dashes --:

payload, signature = signed_string.split("--")

The payload is public; it's simply base64-encoded and anyone can read the data that is contained in it:

Base64.decode64(payload)
# => "\"signed message\""

However, the signature is computed from both the payload and the secret key, so if an attacker tries to change the payload without knowing the key, the signature no longer matches and the verifier will raise an error:

tampered_payload = Base64.encode64("tampered message").strip
# => "dGFtcGVyZWQgbWVzc2FnZQ=="

tampered_signed_string = [tampered_payload, signature].join("--")
# => "dGFtcGVyZWQgbWVzc2FnZQ==--6cfaf50e583b1ca92c5f591ad8bfa835195b7260"

verifier.verify(tampered_signed_string)
# raises ActiveSupport::MessageVerifier::InvalidSignature

What is important to remember here is that signed data is public and can't be tampered with by an attacker thanks to the signature. We will use signing later in this course in order to make sure the values we store in cookies can't be tampered with.
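
To demystify the signature, here is a simplified sketch of the same idea built on an HMAC from Ruby's OpenSSL bindings. It is not the actual implementation of ActiveSupport::MessageVerifier: the `sign` and `verify` helpers are hypothetical, and details such as the digest algorithm and the serialization of the payload are assumptions made for the illustration:

```ruby
require "openssl"

# Sign data: Base64-encode the payload and append an HMAC of that payload.
def sign(data, secret)
  payload = [data].pack("m0") # "m0" is strict Base64 encoding
  signature = OpenSSL::HMAC.hexdigest("SHA256", secret, payload)
  "#{payload}--#{signature}"
end

# Return the original data if the signature matches, nil otherwise.
def verify(signed_string, secret)
  payload, signature = signed_string.split("--", 2)
  return nil if payload.nil? || signature.nil?

  expected = OpenSSL::HMAC.hexdigest("SHA256", secret, payload)
  # Production code should use a constant-time comparison instead of ==.
  return nil unless signature == expected

  payload.unpack1("m0") # decode the Base64 payload
end

signed = sign("signed message", "secret key")
verify(signed, "secret key") # => "signed message"

# Swapping the payload while keeping the old signature is detected.
_payload, signature = signed.split("--", 2)
tampered = [["tampered message"].pack("m0"), signature].join("--")
verify(tampered, "secret key") # => nil
```

Without the secret key, an attacker cannot compute a valid signature for a new payload, which is why the data can safely be public.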

Back to our User model

Now that we have a clear understanding of the difference between hashing, encrypting, and signing, let's go back to our User model. The generator that we used created a few files for us.

Let's first have a look at the migration:

# db/migrate/20240708114129_create_users.rb

class CreateUsers < ActiveRecord::Migration[7.1]
  def change
    create_table :users do |t|
      t.string :email
      t.string :password_digest

      t.timestamps
    end
  end
end

This migration creates a users table with two columns: email and password_digest. The password_digest column will store the hashed passwords of our users. Before we run the migration, we should add some database constraints and indexes to ensure the data integrity and performance of our application.

# db/migrate/20240708114129_create_users.rb

class CreateUsers < ActiveRecord::Migration[7.1]
  def change
    create_table :users do |t|
      t.string :email, null: false, index: { unique: true }
      t.string :password_digest, null: false

      t.timestamps
    end
  end
end

The null: false database constraints make sure that the email and password_digest fields must always be present.

The index: { unique: true } option adds a unique index on the email column, which ensures that there can't be two users with the same email address in the database. It also makes searching for a user by email address faster, which matters because we will look users up by email address when signing in later in this course.

We can now safely run the migration:

bin/rails db:migrate

If we run bin/rails test now, we will see that the tests fail with the following error:

ActiveRecord::RecordNotUnique: RuntimeError: UNIQUE constraint failed: users.email

This is because we didn't update the automatically generated fixture:

# test/fixtures/users.yml

# Read about fixtures at https://api.rubyonrails.org/classes/ActiveRecord/FixtureSet.html

one:
  email: MyString
  password_digest: MyString

two:
  email: MyString
  password_digest: MyString

As we added the unique: true constraint to the email field, we can't have two users with the same email address in the database. We should update the fixture to reflect this:

# test/fixtures/users.yml

alex:
  email: [email protected]
  password_digest: TODO

Now, as we saw earlier, we need to hash our users' passwords before storing them in the database. To do this, we are going to stick with Rails defaults and use bcrypt. In the Gemfile, we can uncomment the following line:

# Gemfile

gem "bcrypt", "~> 3.1.7"

We then need to install the gem:

bundle install

Don't forget to restart your Rails server after installing a new gem!

We are now ready to use bcrypt in order to hash users' passwords. In the users.yml fixture file, we can use the BCrypt::Password.create method to hash the password:

# test/fixtures/users.yml

alex:
  email: [email protected]
  password_digest: <%= BCrypt::Password.create("password") %>

Alex's password is "password" and we store a hash of this string in the database thanks to the BCrypt::Password.create method.

We can check that it worked as expected by running the tests to load the fixtures and then opening the Rails console in the test environment:

bin/rails test
RAILS_ENV=test bin/rails console

As we are in the test environment, we should have access to the fixtures:

alex = User.first
# => #<User id: 980190962, email: "[email protected]", password_digest: "[Filtered]">

# The value stored in the password_digest column is a hash
alex.password_digest
# => "$2a$12$bRZAa4OE/NW00g4TmIyUF.m6WfoHxFYN6WhgvuJUz6gyZrQatIARW"
# Your value will be different from mine here thanks to the salt!

# As hashing is deterministic, we can rehash the "password" string
# with the same salt and compare it to the value stored in the database
BCrypt::Password.new(alex.password_digest) == "password"
# => true

Note: Let's analyze the bcrypt hashed password to see the different parts that we talked about in the hashing section of this lesson:

alex.password_digest
# => "$2a$12$bRZAa4OE/NW00g4TmIyUF.m6WfoHxFYN6WhgvuJUz6gyZrQatIARW"

hash = BCrypt::Password.new(alex.password_digest)
# => "$2a$12$bRZAa4OE/NW00g4TmIyUF.m6WfoHxFYN6WhgvuJUz6gyZrQatIARW"

# The version of the bcrypt algorithm used
hash.version
# => "2a"

# The cost factor used to hash the password
hash.cost
# => 12

# The salt used to hash the password
hash.salt
# => "$2a$12$bRZAa4OE/NW00g4TmIyUF."
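
The same parts can be read directly from the string, without the bcrypt gem. The sketch below assumes the usual layout of a bcrypt hash (version, two-digit cost, 22-character salt, 31-character checksum). `BCRYPT_FORMAT` and `parse_bcrypt` are hypothetical names, and unlike `hash.salt` above, the salt is returned without the "$2a$12$" prefix:

```ruby
# Layout: $<version>$<cost>$<22-character salt><31-character checksum>
BCRYPT_FORMAT = /\A\$(?<version>\d[a-z]?)\$(?<cost>\d{2})\$(?<salt>[.\/A-Za-z0-9]{22})(?<checksum>[.\/A-Za-z0-9]{31})\z/

# Split a bcrypt hash into its parts, or return nil if the format is unexpected.
def parse_bcrypt(digest)
  match = BCRYPT_FORMAT.match(digest)
  return nil unless match

  {
    version: match[:version],
    cost: match[:cost].to_i,
    salt: match[:salt],
    checksum: match[:checksum]
  }
end

parts = parse_bcrypt("$2a$12$bRZAa4OE/NW00g4TmIyUF.m6WfoHxFYN6WhgvuJUz6gyZrQatIARW")
parts[:version]  # => "2a"
parts[:cost]     # => 12
parts[:salt]     # => "bRZAa4OE/NW00g4TmIyUF."
parts[:checksum] # => "m6WfoHxFYN6WhgvuJUz6gyZrQatIARW"
```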

If you want to configure the cost factor used by bcrypt, you can do it in an initializer:

# config/initializers/bcrypt.rb

BCrypt::Engine.cost = 12

Let's run our tests again and make sure they are passing:

bin/rails test

We should be back to green!

Now that we have a clear understanding of how bcrypt works and that our tests are green, we can finalize the User model. The first step is to add the has_secure_password method to the model:

# app/models/user.rb

class User < ApplicationRecord
  has_secure_password
end

According to the Rails documentation, the has_secure_password method adds a presence validation for the password when a user is created. It also ensures that passwords are hashed before being stored in the password_digest column. Finally, it adds the authenticate method, which returns the user if the given password is correct and false otherwise:

# Thanks to has_secure_password, the two following checks are equivalent.

BCrypt::Password.new(alex.password_digest) == "password"
# => true
alex.authenticate("password")
# => #<User id: 980190962, ...> (the user itself, which is truthy)
alex.authenticate("wrong password")
# => false

# It's a bit nicer to write!

Let's also add a minimum length validation to the password:

# app/models/user.rb

class User < ApplicationRecord
  MINIMUM_PASSWORD_LENGTH = 8

  has_secure_password

  validates :password, length: { minimum: MINIMUM_PASSWORD_LENGTH }
end

Let's also add some tests to the model to make sure the code we just wrote is working as expected. The first test will check that the password is at least 8 characters long:

# test/models/user_test.rb

require "test_helper"

class UserTest < ActiveSupport::TestCase
  test "password must be at least 8 characters" do
    invalid_password = "a" * (User::MINIMUM_PASSWORD_LENGTH - 1)
    user = User.new email: "[email protected]", password: invalid_password

    assert_not user.valid?
    assert_includes user.errors.full_messages, "Password is too short (minimum is 8 characters)"
  end
end

Let's also add a test to make sure that the password must be present:

# test/models/user_test.rb

require "test_helper"

class UserTest < ActiveSupport::TestCase
  # The previous test

  test "password must be present" do
    user = User.new email: "[email protected]", password: ""

    assert_not user.valid?
    assert_includes user.errors.full_messages, "Password can't be blank"
  end
end

Let's run our tests and make sure they are passing:

bin/rails test

Now that our tests are green, we can add validations for the email. We must make sure the email is present and unique:

# app/models/user.rb

class User < ApplicationRecord
  MINIMUM_PASSWORD_LENGTH = 8

  has_secure_password

  validates :password, length: { minimum: MINIMUM_PASSWORD_LENGTH }
  validates :email, presence: true, uniqueness: true

  normalizes :email, with: ->(email) { email.strip.downcase }
end

Note that we only want to store consistently formatted emails in the database, so we need to normalize them before running the validations. In our case, normalizing means stripping any leading or trailing whitespace from emails and downcasing them.
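
The normalization itself is plain Ruby, so we can try it outside of the model. In this small sketch, `NORMALIZE_EMAIL` is a hypothetical constant holding the same lambda as above, and the email addresses are made up for the example:

```ruby
# The same normalization as in the model, written as a standalone lambda.
NORMALIZE_EMAIL = ->(email) { email.strip.downcase }

NORMALIZE_EMAIL.call("   ALEX@EXAMPLE.COM   ") # => "alex@example.com"
NORMALIZE_EMAIL.call("alex@example.com")       # => "alex@example.com"
```

Both inputs end up as the same string, which is what allows the uniqueness validation to treat them as the same email address.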

We can now add tests for the email. We will test both the normalization and the uniqueness at the same time by duplicating the email of an existing user, upcasing it, and adding both leading and trailing whitespaces:

# test/models/user_test.rb

require "test_helper"

class UserTest < ActiveSupport::TestCase
  # All the previous tests

  test "email uniqueness" do
    user = users(:alex).dup
    user.email = "   #{user.email.upcase}   "
    user.password = "password"

    assert_not user.valid?
    assert_equal ["has already been taken"], user.errors[:email]
  end
end

Let's run our tests one last time to make sure they are green:

bin/rails test

If everything is green on your side as well, it means we are ready to move on to the next chapter where we will implement the user registration feature.

Summary

In this chapter, we learned the difference between hashing, encrypting, and signing. We also learned that there are different hashing algorithms and that some of them, such as bcrypt, are specifically designed to safely store user-generated passwords.

We learned how to use the bcrypt gem to hash passwords before storing them in the database thanks to the has_secure_password method.

Finally, we learned how to write tests for our models and how to use fixtures to test our validations.