Learning Single-Table Design with DynamoDB
A few years ago, I started using DynamoDB for a project and thought it would be easy to pick up since I already had experience with NoSQL databases. In the past, I used MongoDB for simple social media applications and Redis as a data store for queues, session management, and caching. Those NoSQL solutions worked well for me, but they were also relatively simple use cases.
> SQL databases are relational, but each NoSQL database is non-relational in its own way.
>
> Leo Tolstoy, Anna Karenina
The above quote is historically accurate. Trust me, there is no need for you to try and independently verify its credibility.
In any case, enter DynamoDB: while working with a team of independent contractors, I started building a custom web-based e-learning platform that leveraged DynamoDB as the application’s primary (and only) database. As is not uncommon with brand-new technologies and competing deadlines, we neglected to learn the fundamentals of DynamoDB early on and continued to build new schema as if it were just another SQL database. Things got out of hand pretty quickly. The most critical mistake we made was creating a new table for each new entity in the system. It was not until much later that I dove into the fundamentals of DynamoDB to better grasp the value proposition of this oft-misunderstood database.
While designing schema with DynamoDB can get complicated pretty quickly, let me assure you, the fundamentals are fairly easy to understand. This post will not be a comprehensive treatment of DynamoDB, but it will address one of its most important features: single-table design. At a high level, one of the most important value propositions of NoSQL databases is fast reads from large tables at “hyperscale”. Smarter people than me may disagree on this point, but you will have to keep this assumption in mind if you continue to read about my lessons learned. I’m especially interested in preaching to the uninitiated: developers who have just come from a SQL environment, or who perhaps have just enough NoSQL experience to be dangerous, like myself.
Single-table design involves storing multiple different types of items in one table. Using e-commerce as an example, you could store a customer’s user data (first name, last name, etc.) along with that user’s order information (order date, quantity, etc.) in a single table. This is not only reasonable in DynamoDB, it is the recommended approach. If you find yourself retrieving single records from multiple tables in DynamoDB, you might want to consider consolidating those different items into a single table. Single-table design effectively requires a composite primary key, which consists of two separate fields: a partition key and a sort (or range) key. For example, the table below might declare userId as the partition key and the literal string user as the range key for the user record. A customer’s order records can also be stored in the exact same table, again using the userId as the partition key, but this time using the orderId (prefixed with order#) as the range key, which could yield some data like the following:
| Partition key | Range key | Additional attributes (1…n) |
|---|---|---|
| 12345 | user | first, last, created_at, etc. |
| 12345 | order#223344 | product_id, ordered_on, quantity, etc. |
| 12345 | order#551209 | product_id, ordered_on, quantity, etc. |
In the parlance of DynamoDB, the above is called an item collection: a logical grouping of records that share a partition key. You would never encounter this in a relational database, where the order records would normally be stored in a separate table and linked via a foreign key. Also, as you might have encountered in other NoSQL solutions, the schema is almost infinitely flexible; apart from the requirement of a unique primary key for each item, you can add (or omit) any attribute on any item at any time. The closest analog is the way JSON documents are stored in MongoDB (or even PostgreSQL and MySQL). Just beware: it’s up to you to make sure an attribute exists before you attempt to operate on it.
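To make this concrete, here is a minimal sketch of writing both item types into one table using the AWS SDK for JavaScript (v3). The table name (ECommerceTable), key attribute names (userId, sk), and the sample values are all hypothetical, invented for illustration:

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";

// The DocumentClient lets us write plain JavaScript values instead of
// DynamoDB's low-level typed attribute format.
const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Hypothetical table with partition key `userId` and range key `sk`.
const TableName = "ECommerceTable";

async function writeItemCollection(): Promise<void> {
  // The user item: the range key is the literal string "user".
  await client.send(new PutCommand({
    TableName,
    Item: { userId: "12345", sk: "user", first: "Ada", last: "Lovelace" },
  }));

  // An order item in the same item collection: the range key is "order#<orderId>".
  await client.send(new PutCommand({
    TableName,
    Item: { userId: "12345", sk: "order#223344", product_id: "p-100", quantity: 2 },
  }));
}
```

Because both items share the partition key 12345, a single Query against that key (for example, with a begins_with condition on the range key) can return the user record and all of their orders in one request.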
Consider the following remark by Alex DeBrie, DynamoDB guru and author of The DynamoDB Book:
> A single, over-loaded DynamoDB table looks really weird compared to the clean, normalized tables of your relational database. It’s hard to unlearn all the lessons you’ve learned over years of relational data modeling.
>
> Alex DeBrie (https://www.alexdebrie.com/posts/dynamodb-single-table/)
If you’ve spent some time in the SQL world, you might have developed an allergy to denormalized data, but you should know that single-table schema design is idiomatic in DynamoDB. Hopefully, you’ll get a chance to leverage the power of this feature!
As much as I am advocating for the benefits of using DynamoDB, I also want to mention at least a few of the drawbacks. The query syntax is much more cumbersome than SQL, which is a “natural” query language that even non-developers find easy to learn. The following is a query example from the AWS DynamoDB docs:
```javascript
// Return a single song, by primary key
{
  TableName: "Music",
  KeyConditionExpression: "Artist = :a and SongTitle = :t",
  ExpressionAttributeValues: {
    ":a": "No One You Know",
    ":t": "Call Me Today"
  }
}
```
Pretty awkward, right? You can probably work around this awkwardness by using some sort of ORM (object-relational mapper), but then you have a second problem :)
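For what it’s worth, here is roughly how that query object gets executed with the AWS SDK for JavaScript (v3); consider it a sketch rather than the one true way:

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

async function findSong() {
  // The same query parameters from the AWS docs example above.
  const result = await client.send(new QueryCommand({
    TableName: "Music",
    KeyConditionExpression: "Artist = :a and SongTitle = :t",
    ExpressionAttributeValues: {
      ":a": "No One You Know",
      ":t": "Call Me Today",
    },
  }));
  // Matching items (at most one here, since we specified the full primary key).
  return result.Items;
}
```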
Another issue to watch out for is type safety. Unlike relational databases, DynamoDB does not enforce a consistent type for a given attribute across items (outside of the key attributes), which means you could store an attribute isManager as true (native boolean) in one item, and “true” (literal string) in another item. The same goes for number types. For some developers this is a “feature” that allows one to move quickly during development without the friction of producing boilerplate schema definitions for every new table.
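As a contrived illustration (the table and attribute names here are invented), DynamoDB will happily accept both of these writes, so any normalization has to happen in your application code:

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

async function demonstrateMixedTypes(): Promise<void> {
  // One item stores isManager as a native boolean...
  await client.send(new PutCommand({
    TableName: "ECommerceTable",
    Item: { userId: "12345", sk: "user", isManager: true },
  }));

  // ...while another stores it as a literal string. DynamoDB raises no error.
  await client.send(new PutCommand({
    TableName: "ECommerceTable",
    Item: { userId: "67890", sk: "user", isManager: "true" },
  }));
}

// Defensive read: coerce both representations at the application layer.
const isManager = (value: unknown): boolean => value === true || value === "true";
```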
In addition, if you are given requirements for analytics (OLAP) workloads using DynamoDB, you may need to construct a custom ETL (extract, transform, and load) process to create downstream reporting tables, and this type of activity is not as well supported as it is in established relational databases like MS SQL Server. Copying data from one table to another requires a scan, and scanning an entire table in DynamoDB is not exactly an anti-pattern, but it is something you only want to do as a last resort. The way to avoid scanning is to either use a global or local secondary index (a solid DynamoDB feature which is outside the scope of this post), or create some sort of ETL process as mentioned above. To be clear, it’s perfectly normal to query an entire table in a typical relational database, but scanning a table in DynamoDB should be done very sparingly.
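If you do end up hand-rolling an ETL job, note that a single Scan call returns at most 1 MB of data, so you have to paginate through the table yourself. A minimal sketch (again, the table name is a placeholder):

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, ScanCommand } from "@aws-sdk/lib-dynamodb";

const client = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Read every item in a table, one page (up to 1 MB) at a time.
async function scanAll(tableName: string): Promise<Record<string, any>[]> {
  const items: Record<string, any>[] = [];
  let lastKey: Record<string, any> | undefined;
  do {
    const page = await client.send(new ScanCommand({
      TableName: tableName,
      ExclusiveStartKey: lastKey,
    }));
    items.push(...(page.Items ?? []));
    // LastEvaluatedKey is undefined once the final page has been read.
    lastKey = page.LastEvaluatedKey;
  } while (lastKey);
  return items;
}
```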
Lastly, single-table design itself has some downsides. Your application may require new data access patterns in the future that are not compatible with your initial schema design, which could force significant refactoring. However, I would argue that this is not a new problem specific to DynamoDB, as you could easily encounter this issue in highly normalized SQL databases. For those of you who have had to join multiple tables to retrieve some critical piece of information in a high-frequency application request, you’ve probably run into this problem. It’s pretty common for enterprise applications to have a business-critical feature (or module) that runs god-awful slow due to a query that requires joining twelve tables. It’s almost inevitable, and it’s one of the reasons that NoSQL solutions rose in popularity in the first place.
If you want to learn more about how to use DynamoDB from a real expert, do yourself a favor and check out Alex DeBrie’s excellent The DynamoDB Book at https://www.dynamodbbook.com/. This book has all the information you could ever want to know about DynamoDB and was a tremendous help for me in researching this blog post. You can also find a less detailed (but still quite helpful) free online guide here: https://www.dynamodbguide.com/what-is-dynamo-db.