Designing one-to-one relationships in PostgreSQL

Once in a while, you might need to design a one-to-one relationship in SQL. This is a common pattern in database design, but it's not always straightforward to implement. In this article, I'll explain how to design one-to-one relationships in SQL and discuss some common pitfalls to avoid.

Let's define a clear use case for a one-to-one relationship.

Suppose you have user_account table.

Column NameData TypeDescription
idintegerPrimary key
nametextUser's name

A business requirement emerged that we need to sync the user's account with an external system (e.g. HubSpot). HubSpot assigns a unique identifier to each user, and we need to store this identifier in our database. We will call this identifier hubspot_object_foreign_id.

The first decision we need to make is how to store this identifier in our database. There are two common ways to do this:

The first option is to add a column to the existing user_account table.

Column NameData TypeDescription
idintegerPrimary key
nametextUser's name
hubspot_object_foreign_idtextHubSpot's unique identifier
hubspot_synced_attimestamp with time zoneLast time the user was synced with HubSpot

This approach is simple and straightforward. If you only need to store a few additional columns, this is the way to go.

If you choose this option, you should add a unique constraint to the hubspot_object_foreign_id column to ensure that each user has a unique identifier.

The main disadvantage of this approach is that it can lead to a bloated table. If you need to store a lot of additional information about the relationship between the user account and the HubSpot account, it might be better to create a separate table.

Adding additional columns to the existing table can have performance implications. If the user_account table is large and you frequently query it, adding additional columns can slow down your queries. This is because the additional columns will increase the size of each row, which can lead to more disk I/O and memory usage.

How many is too many? It depends on the type of data you are storing (see PostgreSQL limits). If I had to give a rough estimate, I would say that if you foresee storing more than 100 columns, you should steer clear of this approach.

The second option is to create a separate table (user_account_hubspot_object) to store the HubSpot identifier.

Column NameData TypeDescription
idintegerPrimary key
user_account_idintegerForeign key to user_account
hubspot_object_foreign_idtextHubSpot's unique identifier
hubspot_synced_attimestamp with time zoneLast time the user was synced with HubSpot
synced_phone_numbertextUser's phone number (data from HubSpot)

The advantage of this approach is that it allows you to store additional information about the relationship between the user account and the HubSpot account. For example, in case of a two-way sync, you might need to store data that was added to HubSpot profile (synced_phone_number).

The disadvantages of this approach:

  • It requires an additional join to retrieve the HubSpot identifier.
  • It introduces an additional table, which can complicate the database schema.
  • It requires additional logic to enforce the one-to-one relationship between the user_account and user_account_hubspot_object tables.

Option 2 is enticing because it keeps the user_account table clean and allows you to store additional information about the relationship between the user account and the HubSpot account. However, it introduces a new problem: how do you enforce the one-to-one relationship between the user_account and user_account_hubspot_object tables?

This is where things get tricky. In SQL, there is no built-in way to enforce a one-to-one relationship between two tables.

You have a few options to enforce the one-to-one relationship (neither of them is perfect):

You can add a unique index to the user_account_id column in the user_account_hubspot_object table. This will ensure that each user account has at most one corresponding HubSpot object.

CREATE UNIQUE INDEX user_account_hubspot_object_user_account_id_idx
ON user_account_hubspot_object (user_account_id);

The disadvantages of this approach is that there is no way to tell just by looking at the query referencing the user_account_hubspot_object table that it is a one-to-one relationship. You have to rely on the fact that the unique index is there. This can lead to the confusion and bugs down the line. Example: suppose that somewhere in our application we have a query that joins user_account and user_account_hubspot_object tables.

SELECT
  ua1.id,
  ua1.name,
  uaho1.hubspot_object_foreign_id
FROM user_account ua1
INNER JOIN user_account_hubspot_object uaho1
ON uaho1.user_account_id = ua1.id
/* ... */

Now suppose that a new business requirement emerges that if a HubSpot object is deleted, we want to track that. The engineer assigned to implement this requirement decides that the best way to implement this is to add a deleted_at column to the user_account_hubspot_object table. However, we also need to handle a scenario where a new HubSpot object is created and associated with the same user account. The developer who is tasked to implement this requirement might not be aware (have access to) every code path that references the user_account_hubspot_object table. As a result, they might think it is safe to drop the unique index and add a deleted_at column to the user_account_hubspot_object table. This will break the one-to-one relationship between the user_account and user_account_hubspot_object tables. Every query that references this table will now produce incorrect results.

This wouldn't be such a huge problem if PostgreSQL natively supported assertions about the query output. However, it doesn't, so we have to be extra careful when picking this approach.

You can enforce the one-to-one relationship by using the foreign key as the primary key in the user_account_hubspot_object table.

Our user_account_hubspot_object table would look like this:

Column NameData TypeDescription
user_account_idintegerPrimary key and foreign key to user_account
hubspot_object_foreign_idtextHubSpot's unique identifier
hubspot_synced_attimestamp with time zoneLast time the user was synced with HubSpot

This approach enforces the one-to-one relationship between the user_account and user_account_hubspot_object tables. The table design makes it clear that user_account_id is the primary key and therefore it is highly unlikely that this table would ever be refactored to support a one-to-many relationship.

However, just like the 1st option, this approach has the disadvantage that there is no way to tell just by looking at the query referencing the user_account_hubspot_object table that it is a one-to-one relationship.

You can somewhat mitigate the risk of the latter scenario by using a naming convention that indicates that the relationship is one-to-one. For example, you can prefix the table name with oto_ (one-to-one).

It is a lot easier to spot the one-to-one relationship in the query if the table name is oto_user_account_hubspot_object instead of user_account_hubspot_object.

This approach also makes it clear to whoever reads the query that the relationship between the user_account and user_account_hubspot_object tables is one-to-one. Example:

SELECT
  ua1.id,
  ua1.name,
  uaho1.hubspot_object_foreign_id
FROM user_account ua1
INNER JOIN oto_user_account_hubspot_object uaho1
ON uaho1.user_account_id = ua1.id
/* ... */

If you have a naming convention that indicates that the relationship is one-to-one, it will be easier for developers to understand the relationship between the tables. However, this approach is not foolproof. Developers can still drop the unique index without understanding the implications.

You might also consider a convention that glues the two tables together using a custom string, e.g. table1_oto_table2. The benefit of this approach is that it also makes it clear which two tables are in the relationship. This is especially true with long table names, e.g. oto_user_account_stripe_merchant_customer_organization_member_invite VS user_account_stripe_merchant_customer_oto_organization_member_invite

You can enforce the one-to-one relationship by mandating that the table that owns the relationship (in our case, user_account) has a column that references the associated row in the other table (user_account_hubspot_object), e.g.

user_account table:

Column NameData TypeDescription
idintegerPrimary key
nametextUser's name
user_account_hubspot_object_idintegerForeign key to user_account_hubspot_object

This is my preferred approach because it allows user_account itself to document the relationship with user_account_hubspot_object. It also makes it clear to whoever reads the query that the relationship between the user_account and user_account_hubspot_object tables is one-to-one. Example:

SELECT
  ua1.id,
  ua1.name,
  uaho1.hubspot_object_foreign_id
FROM user_account ua1
INNER JOIN user_account_hubspot_object uaho1
ON uaho1.id = ua1.user_account_hubspot_object_id
/* ... */

The disadvantage of this approach is that there is nothing stopping someone from unknowingly constructing a query that joins user_account and user_account_hubspot_object tables on a column other than user_account_hubspot_object_id. This can lead to bugs and incorrect results.

Here is a comparison of the options discussed above:

CriteriaInline (A)Separate Table (B)
1234
Relies on a naming convention to indicate one-to-one relationship, e.g. oto_NoNoNoYesYes
Additional columns are stored in the same tableYesNoNoNoYes
Has performance implications as the number of columns increasesYes / ๐Ÿ‘บNo / ๐Ÿ‘ผNo / ๐Ÿ‘ผNo / ๐Ÿ‘ผNo / ๐Ÿ‘ผ
One-to-one relationship is visible in queries that references tables, e.g. oto_ or user_account_hubspot_object_idYes / ๐Ÿ‘ผNo / ๐Ÿ‘บNo / ๐Ÿ‘บYes / ๐Ÿ‘ผYes / ๐Ÿ‘ผ
Easy to mistakenly construct a query that joins tables on a column other than the one that enforces the one-to-one relationshipNo / ๐Ÿ‘ผYes / ๐Ÿ‘บNo / ๐Ÿ‘ผYes / ๐Ÿ‘บYes / ๐Ÿ‘บ
Easy to mistakenly convert the one-to-one relationship to one-to-manyNo / ๐Ÿ‘ผYes / ๐Ÿ‘บNo / ๐Ÿ‘ผNo / ๐Ÿ‘ผYes / ๐Ÿ‘บ

๐Ÿ‘ผ - Desirable ๐Ÿ‘บ - Undesirable

Historically, I've used a separate table and identified the associated row in the table that owns the relationship (the B4 option) because it makes the relationship explicit and easy to understand. However, as demonstrated, it is not foolproof. Someone can still mistakenly construct a query that joins tables on a column other than the one that enforces the one-to-one relationship.

The more I think about this problem, the more I am inclined to pick any option that allows to identify the one-to-one relationship by reading the queries that reference the tables (options A and B3).

-- Option A โ€“ย can identify the one-to-one relationship by reading the query
SELECT
  ua1.id,
  ua1.name,
  ua1.hubspot_object_foreign_id
FROM user_account ua1
 
-- Option B3 โ€“ย can identify the one-to-one relationship based on the "oto_" prefix
SELECT
  ua1.id,
  ua1.name,
  uaho1.hubspot_object_foreign_id
FROM user_account ua1
INNER JOIN oto_user_account_hubspot_object uaho1
ON uaho1.user_account_id = ua1.id
 
-- Option B4 โ€“ย can also identify the one-to-one relationship by reading the query
-- However, the problem with this approach that someone could still by mistake join on a column other than user_account_hubspot_object_id
SELECT
  ua1.id,
  ua1.name,
  uaho1.hubspot_object_foreign_id
FROM user_account ua1
INNER JOIN user_account_hubspot_object uaho1
ON uaho1.id = ua1.user_account_hubspot_object_id

Options A and B4 are therefore the most foolproof way to ensure that the one-to-one relationship can be easily identified and understood by anyone who reads the query, and it is the most likely to prevent someone from mistakenly converting the one-to-one relationship to one-to-many.

What's interesting is that the need to have a convention that expresses the one-to-one relationship in queries was not a problem in most of the projects I've worked on. This only became a problem when I realized during code reviews that we are making assumptions about the relationships between the tables based on our current business logic and not based on the database schema. Example:

SELECT
  o1.id,
  ot1.name
FROM organization o1
INNER JOIN organization_team ot1
ON ot1.organization_id = o1.id

It is a fair assumption that the organization_team table is a one-to-one relationship if you are certain that the program only allows one team per organization. Perhaps today, there is no way to create multiple teams per organization in the application. Perhaps, there is even a UNIQUE INDEX on organization_team.organization_id that prevents it. Therefore, the assumption is true today. However, what happens when the business requirements change and the program is refactored to allow multiple teams per organization? This can happen with or without database schema changes. The query above would still work, but it would produce incorrect results.

It is impossible to predict the future and every possible way the program can be refactored. However, we can strive to adopt conventions that help us to communicate the original intent and prevent accidental misuse.

In this case, a convention such as outlined in B3 would have either avoided the above query to be written as an INNER JOIN or would have made it clear that the relationship between the organization and organization_team tables is one-to-one.