Amazon’s DynamoDB is a powerful and scalable solution for storing and retrieving data. Its key-value data model and managed infrastructure make it a popular choice among developers for a variety of applications and services. What sets DynamoDB apart from traditional NoSQL databases is its unique approach to data modeling and the benefits it offers through its key-value architecture.
The Key-Value data model
While traditional NoSQL databases often rely on a document or column-based structure, DynamoDB embraces the simplicity and flexibility of a key-value data model. This model means we need to be especially deliberate about how we write data into the database, because the way we store it largely determines the ways we can retrieve it later. We can achieve efficient data entry and retrieval in DynamoDB by embracing a methodology called “Single Table Design.” This design philosophy encourages a shift in mindset when it comes to organizing and accessing data, resulting in a highly scalable and performant system. In this post, we’ll help you conceptualize single table design with real-world examples from our own codebase, as well as lessons and principles the TelemetryHub team has learned along the way.
The key-value nature of DynamoDB enables developers to store and retrieve data based on a unique identifier, or key, associated with each item. There are no joins. Apart from the key attributes there is no fixed schema, so each item can carry whatever other attributes it needs. This keeps data access simple and reduces operational overhead. Because items are looked up by key, DynamoDB delivers fast and predictable read and write performance, even as tables grow.
We’ll discuss how this design pattern simplifies data modeling, improves query performance, and reduces costs, all while maintaining the scalability and reliability that DynamoDB is known for.
Whether you’re already familiar with DynamoDB or just starting to explore its capabilities, this blog will introduce you to the world of DynamoDB single table design, and how it can change the way you think about your data.
Single Table Design
Dynamo single table design refers to the practice of harnessing the full power of a single DynamoDB table to store multiple types of data. This approach sets it apart from traditional SQL databases, where each unique type of data typically resides in its own table. While it is not strictly required for using DynamoDB, we firmly believe that single table design is the optimal way to structure your DynamoDB tables, and its advantages will become evident as we explore further.
Should you use Single Table Design?
Single table design is not only an option but the recommended approach when working with DynamoDB tables. However, before delving deeper, consider whether DynamoDB is the right choice for your application at all. The key question is whether your application has relatively predictable access patterns, because a DynamoDB table is designed around the queries you know you will run. As we proceed with practical examples, you will discover that most applications share common access patterns that can be modeled effectively in DynamoDB, and that single table design streamlines how you organize and query that data.
Designing Our Table
Data Access Patterns
Before we dive into implementation, it is crucial to consider the data access patterns that our application will require. In other words, we need to determine how we will be working with or “using” our data.
Let’s take our TelemetryHub application as an example. Within TelemetryHub, we support an unlimited number of “service groups” for each account. These service groups represent distinct sets of applications or services, such as “dev” and “prod” service groups. One data access pattern we need to support is retrieving all service groups associated with an account. Another is fetching all users belonging to the same account. (It’s worth mentioning that TelemetryHub doesn’t charge for seats, so your entire team can leverage the power of OpenTelemetry for developing and debugging applications.)
One-To-Many Relationships
In the previous examples, we’ve discussed three main record types that will be stored in our DynamoDB table: “Account,” “Service Group,” and “User.” We also know the fundamental relationships between these records: an Account contains multiple Users, and an Account contains multiple Service Groups. Each of these is a “one-to-many” relationship: one Account to many Users, and one Account to many Service Groups. Recognizing these relationships is a key insight, because they will drive the design of our DynamoDB table schema.
To provide a clearer understanding, let’s examine an example of each record type, which will serve as a reference point:
Account:
PK: account:{account_id}
SK: metadata:account
User:
PK: account:{account_id}
SK: user:{user_id}
Service Group:
PK: account:{account_id}
SK: servicegroup:{service_group_id}
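To keep key formats consistent across a codebase, it can help to build keys in one place. The helpers below are a hypothetical sketch that produce the key shapes listed above. The function names are ours and do not come from any library or the TelemetryHub code.

```python
# Hypothetical helpers (illustrative names, not from a library or the
# TelemetryHub codebase) that build the key shapes shown above.


def account_pk(account_id: str) -> str:
    """Partition key shared by an account and the records it owns."""
    return f"account:{account_id}"


def account_keys(account_id: str) -> dict:
    """Keys for the Account metadata record."""
    return {"PK": account_pk(account_id), "SK": "metadata:account"}


def user_keys(account_id: str, user_id: str) -> dict:
    """Keys for a User record, stored in its account's partition."""
    return {"PK": account_pk(account_id), "SK": f"user:{user_id}"}


def service_group_keys(account_id: str, service_group_id: str) -> dict:
    """Keys for a Service Group record, in the same partition."""
    return {"PK": account_pk(account_id), "SK": f"servicegroup:{service_group_id}"}


print(user_keys("account-id-001", "user-id-001"))
# {'PK': 'account:account-id-001', 'SK': 'user:user-id-001'}
```

Keeping the prefixes in one place makes it harder for a typo such as `serviceGroup:` to silently create records that a case-sensitive `begins_with` query would never find.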
Next, let’s delve into the structure and limitations of DynamoDB records. DynamoDB is commonly described as a “key-value store,” so it’s worth understanding what that means in practice. Each record (an “item” in DynamoDB terms) has a primary key made of one or two attributes: the Partition Key (PK) and an optional Sort Key (SK). Everything else is the “value.” These keys are what make our access patterns possible. Beyond the keys, a record can hold any number of other attributes, as long as the whole item stays within DynamoDB’s 400 KB item size limit. For example, the `User` record above could have a schema that looks like:
User:
PK: account:{account_id}
SK: user:{user_id}
first_name: string
last_name: string
role: string
created_at: string (ISO 8601 timestamp)
The non-key properties here can be whatever we’d like. Only the keys determine how records are stored and accessed in Dynamo, so for the remainder of the discussion, we’ll focus on just PK and SK.
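A side note on attribute values: DynamoDB has no native datetime type, so a `created_at` timestamp is usually stored as an ISO 8601 string or as a number of epoch seconds. The sketch below shows the User record in the typed attribute-value format used by DynamoDB’s low-level API. In that format `S` marks a string and `N` marks a number, and numbers are sent as strings. boto3’s higher-level `Table` resource does this conversion for you. The `to_attribute_values` helper and the sample values are hypothetical, and the helper only handles strings and numbers.

```python
from datetime import datetime, timezone


def to_attribute_values(item: dict) -> dict:
    """Convert a flat dict of str/int/float values into DynamoDB's typed
    attribute-value format. Hypothetical helper: only S and N are handled."""
    typed = {}
    for name, value in item.items():
        if isinstance(value, bool):
            # bool is a subclass of int in Python; DynamoDB has a separate BOOL type.
            raise TypeError(f"{name}: booleans are not handled in this sketch")
        if isinstance(value, str):
            typed[name] = {"S": value}
        elif isinstance(value, (int, float)):
            # DynamoDB sends numbers as strings inside the "N" type.
            typed[name] = {"N": str(value)}
        else:
            raise TypeError(f"{name}: unsupported type {type(value).__name__}")
    return typed


user = {
    "PK": "account:account-id-001",
    "SK": "user:user-id-001",
    "first_name": "Ada",  # sample values are made up
    "last_name": "Lovelace",
    "role": "admin",
    # DynamoDB has no datetime type; store an ISO 8601 string instead.
    "created_at": datetime(2023, 6, 1, 12, 0, tzinfo=timezone.utc).isoformat(),
}

typed_user = to_attribute_values(user)
print(typed_user["created_at"])  # {'S': '2023-06-01T12:00:00+00:00'}
```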
Reading data from Dynamo
The DynamoDB API provides two main read operations for targeted data access: Get (GetItem) and Query. Get retrieves a single, specific record by its full PK and SK. That is useful, but sometimes you know the PK without the exact SK, or you need multiple records. For example, to obtain a list of all users belonging to an account, we can use Query, shown here as pseudocode:
dynamodb.query(
PK="account:account-id-001",
SK_begins_with="user:"
)
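For reference, the pseudocode maps onto DynamoDB’s Query operation. There, the key condition is an expression string, with placeholders standing in for attribute names and values. This sketch only builds the request parameters and makes no network call. The table name `telemetryhub` and the `build_prefix_query` helper are assumptions for illustration.

```python
def build_prefix_query(table_name: str, pk: str, sk_prefix: str) -> dict:
    """Build low-level DynamoDB Query parameters for:
    PK equals `pk` AND SK begins with `sk_prefix`."""
    return {
        "TableName": table_name,
        # #name and :value placeholders keep attribute names and values
        # out of the expression string itself.
        "KeyConditionExpression": "#pk = :pk AND begins_with(#sk, :sk_prefix)",
        "ExpressionAttributeNames": {"#pk": "PK", "#sk": "SK"},
        "ExpressionAttributeValues": {
            ":pk": {"S": pk},
            ":sk_prefix": {"S": sk_prefix},
        },
    }


# "telemetryhub" is an assumed table name.
params = build_prefix_query("telemetryhub", "account:account-id-001", "user:")
# With boto3 installed and AWS credentials configured, something like
#   boto3.client("dynamodb").query(**params)
# would send this request.
```

A single Query call returns at most 1 MB of data. Production code therefore typically checks the response for `LastEvaluatedKey` and passes it back as `ExclusiveStartKey` to fetch the next page.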
Using Scan:
DynamoDB also offers a Scan operation, which reads every item in the table and can filter on any attribute, including substrings within a key. However, the filter is applied only after the items are read, so scanning the full table consumes read capacity for every item in it, and the cost and latency grow with your data. Reserve Scan for one-off administrative tasks. If you find yourself needing to scan your data regularly, there’s likely a design issue that needs attention.
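To make the cost difference concrete, here is a toy in-memory model that counts how many items each approach reads to find one account’s users. It is not DynamoDB itself, and the 100 accounts with 10 users each are made up.

```python
from collections import defaultdict

# Toy table (not DynamoDB): partition key -> items in that partition.
# 100 made-up accounts, each with one metadata record and 10 users.
table = defaultdict(list)
for a in range(100):
    pk = f"account:account-{a:03d}"
    table[pk].append({"PK": pk, "SK": "metadata:account"})
    for u in range(10):
        table[pk].append({"PK": pk, "SK": f"user:user-{u:02d}"})


def scan_users(account_pk):
    """Scan + filter: every item is read, then non-matches are discarded."""
    read, matches = 0, []
    for items in table.values():
        for item in items:
            read += 1
            if item["PK"] == account_pk and item["SK"].startswith("user:"):
                matches.append(item)
    return matches, read


def query_users(account_pk):
    """Query: go straight to one partition; the SK condition selects a
    contiguous range, so only the matching items are counted as read."""
    matches = [i for i in table.get(account_pk, []) if i["SK"].startswith("user:")]
    return matches, len(matches)


scan_result, scan_reads = scan_users("account:account-042")
query_result, query_reads = query_users("account:account-042")
print(scan_reads, query_reads)  # 1100 10
```

Both calls return the same ten users. The scan reads the whole table to find them, while the query reads only those ten items.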
Prefixes and “begins with”
The query above would return all the user records belonging to the specified account. Awesome! Notice how we’re able to query with a “begins with” expression on the SK. In any DynamoDB query, you must supply the entire PK, but the SK condition can use a comparison operator (=, <, <=, >, >=), BETWEEN, or begins_with. This is why we use prefixes in our keys. If we know that every SK for a given record type begins with a certain prefix, we can use that handy begins_with condition to select all the matching records.
The user query demonstrates a straightforward representation of a one-to-many relationship in DynamoDB. This is the essence of “Single Table Design”: using a single table to represent multiple record types and their relationships without the need for joins. In this example, `account_id` is the shared partition key value that ties all these record types together. In DynamoDB, relationships are modeled in the keys of the data itself rather than through foreign keys and joins at query time.
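Because items within a partition are stored in sort-key order, a `begins_with` condition selects one contiguous run of items. The sketch below models that with Python’s `bisect` module on a sorted list of SKs. It illustrates the idea and is not DynamoDB’s internal implementation. DynamoDB orders string keys by their UTF-8 bytes, which matches Python’s string ordering for the ASCII keys used here.

```python
from bisect import bisect_left, bisect_right

# Sort keys within one partition, kept in sorted order (toy data).
partition_sks = sorted([
    "user:user-id-002",
    "metadata:account",
    "servicegroup:sg-prod",
    "user:user-id-001",
    "servicegroup:sg-dev",
])


def begins_with(sorted_sks, prefix):
    """Binary-search to the first SK >= prefix, then take the contiguous
    run of SKs that start with the prefix."""
    start = bisect_left(sorted_sks, prefix)
    end = start
    while end < len(sorted_sks) and sorted_sks[end].startswith(prefix):
        end += 1
    return sorted_sks[start:end]


def between(sorted_sks, low, high):
    """Model of `SK BETWEEN low AND high` (inclusive on both ends)."""
    return sorted_sks[bisect_left(sorted_sks, low):bisect_right(sorted_sks, high)]


print(begins_with(partition_sks, "user:"))
# ['user:user-id-001', 'user:user-id-002']
```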
Modeling relationships in the data
It is crucial to consider common data access patterns when designing our data in DynamoDB. Unlike traditional SQL databases, where complex queries pose the challenge, DynamoDB requires careful consideration during the table design phase, making the rest of the process relatively straightforward.
Under the hood, DynamoDB functions like a scalable, optimized hash table. Each record is placed into a “partition” based on a hash of its Partition Key and is sorted within that partition by its Sort Key. With both keys, we can retrieve an exact record. With only the Partition Key, we can query everything within that partition. By combining the DynamoDB query API with a well-planned single table design, we can efficiently and consistently retrieve the precise data we need.
If you understood the user query earlier, you won’t be surprised that we can query all the service groups belonging to an account in a similar manner:
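The hash-table analogy can be sketched as a toy in-memory model. This is not how DynamoDB is implemented: it manages partitions internally, and its hash function is not exposed. The partition count and the use of SHA-256 are arbitrary choices for illustration. The model shows the rules from the paragraph above:

- the PK picks the partition;
- items with the same PK are kept in SK order;
- a Get needs both keys;
- a Query needs the full PK.

```python
import hashlib
from bisect import insort

NUM_PARTITIONS = 4  # arbitrary; real DynamoDB manages partitions for you


class ToyTable:
    """In-memory model of a PK/SK table. Not a DynamoDB client."""

    def __init__(self):
        # One dict per partition: pk -> sorted list of (sk, item) pairs.
        self.partitions = [{} for _ in range(NUM_PARTITIONS)]

    def _partition_for(self, pk):
        # Hash the partition key to choose a partition (SHA-256 is arbitrary).
        digest = hashlib.sha256(pk.encode("utf-8")).digest()
        return self.partitions[int.from_bytes(digest[:8], "big") % NUM_PARTITIONS]

    def put(self, item):
        rows = self._partition_for(item["PK"]).setdefault(item["PK"], [])
        # Same PK and SK means the same item: replace rather than duplicate.
        rows[:] = [row for row in rows if row[0] != item["SK"]]
        insort(rows, (item["SK"], item))  # keep rows ordered by SK

    def get(self, pk, sk):
        """Get: needs the exact PK and SK."""
        for row_sk, item in self._partition_for(pk).get(pk, []):
            if row_sk == sk:
                return item
        return None

    def query(self, pk, sk_prefix=""):
        """Query: needs the full PK; here the SK condition is begins_with."""
        rows = self._partition_for(pk).get(pk, [])
        return [item for sk, item in rows if sk.startswith(sk_prefix)]


table = ToyTable()
table.put({"PK": "account:a1", "SK": "user:u2", "first_name": "Grace"})
table.put({"PK": "account:a1", "SK": "user:u1", "first_name": "Ada"})
table.put({"PK": "account:a1", "SK": "metadata:account"})
print([item["SK"] for item in table.query("account:a1", "user:")])
# ['user:u1', 'user:u2']
```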
dynamodb.query(
PK="account:account-id-001",
SK_begins_with="servicegroup:"
)
If you wanted to retrieve all of an account’s data with a single query (the account metadata record, all users, and all service groups), you could query using only `PK=account:account-id-001` and omit any SK condition. However, this approach often “over-fetches,” and your application code then has to separate the returned items by record type. So while we can strategically fetch multiple record types at once, we usually specify at least part of the SK.
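If you do fetch the whole partition, splitting the results by record type falls to your application. Here is a minimal sketch that uses the SK prefix before the first colon as the record type. It assumes the items have already come back from a query on `PK=account:account-id-001`; the items and the `group_by_type` helper are made up.

```python
from collections import defaultdict


def group_by_type(items):
    """Group items by the record-type prefix before the first ':' in the SK."""
    groups = defaultdict(list)
    for item in items:
        record_type, _, _ = item["SK"].partition(":")
        groups[record_type].append(item)
    return dict(groups)


# Made-up items as a PK-only query would return them (ordered by SK).
items = [
    {"PK": "account:account-id-001", "SK": "metadata:account"},
    {"PK": "account:account-id-001", "SK": "servicegroup:sg-dev"},
    {"PK": "account:account-id-001", "SK": "servicegroup:sg-prod"},
    {"PK": "account:account-id-001", "SK": "user:user-id-001"},
]

grouped = group_by_type(items)
print({k: len(v) for k, v in grouped.items()})
# {'metadata': 1, 'servicegroup': 2, 'user': 1}
```

This is the extra application work described above. Adding an SK prefix to the query lets DynamoDB narrow the results instead.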
Many-To-Many relationships
In our TelemetryHub application, we allow account administrators to manage access to service groups on a per-user basis. This means that administrators should be able to assign developers to specific service groups, so that each user only has access to the groups relevant to their responsibilities. Our existing data structure has a gap here. An account contains multiple service groups as well as multiple users, but nothing models the relationship between users and service groups. This is a “many-to-many” relationship, since a user can belong to multiple service groups, and a service group can be accessed by multiple users.
The Membership Record:
To address this challenge, our team has devised a solution by creating a “membership record” that explicitly captures the relationship between users and service groups. This record serves as a bridge connecting users and their associated service groups. Let’s take a look at what this membership record might look like:
User Service Group Membership:
PK: user:{user_id}
SK: servicegroup:{service_group_id}
Using this structure, we can easily retrieve all the service groups a user has access to by executing a query like the following:
dynamodb.query(
PK="user:user-id-001",
SK_begins_with="servicegroup:"
)
We obtain a list of service groups accessible by the user. In fact, this is the same query that powers the “Select Service Group” dropdown feature in the TelemetryHub application.
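Here is a toy in-memory sketch of the membership pattern, with a Python dict standing in for the table. The IDs and helper names are hypothetical. Granting access writes one membership record per user and service group pair, and the dropdown query becomes a prefix match within the user’s partition.

```python
from collections import defaultdict

# Toy stand-in for the table: PK -> set of SKs. A set mirrors the fact that
# writing the same PK/SK pair twice overwrites rather than duplicates.
table = defaultdict(set)

SG_PREFIX = "servicegroup:"


def grant_access(user_id, service_group_id):
    """Write one membership record for a (user, service group) pair."""
    table[f"user:{user_id}"].add(f"{SG_PREFIX}{service_group_id}")


def service_groups_for_user(user_id):
    """Model of query(PK="user:<user_id>", SK_begins_with="servicegroup:")."""
    sks = table.get(f"user:{user_id}", set())
    return sorted(sk[len(SG_PREFIX):] for sk in sks if sk.startswith(SG_PREFIX))


grant_access("user-id-001", "sg-prod")
grant_access("user-id-001", "sg-dev")
grant_access("user-id-002", "sg-prod")
grant_access("user-id-001", "sg-prod")  # repeated grant: no duplicate
print(service_groups_for_user("user-id-001"))  # ['sg-dev', 'sg-prod']
```

In this layout, revoking access would be a delete of that single membership record.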
Exploring Alternatives:
While the above approach is effective, it’s important to note that it’s not the only way to model this relationship. We can explore more advanced key designs to optimize the data structure further. Another possible representation could be:
User Service Group Membership:
PK: account:{account_id}
SK: user:{user_id}#servicegroup:{service_group_id}
This alternative design also supports the required access pattern of retrieving all the service groups a user has access to. Whenever we know the user_id, we also know which account the user belongs to, so we can supply the account_id as well. That matters, because we always need to provide the entire PK when querying the database.
To obtain all the service groups for a user using this alternate record structure, we would execute the following query:
dynamodb.query(
PK="account:account-id-001",
SK_begins_with="user:user-id-001#servicegroup:"
)
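Note that the query above puts the `#servicegroup:` delimiter in its prefix, not just the user ID. Because begins_with is a plain string-prefix match, a prefix of `user:user-id-1` alone would also match a user whose ID is `user-id-10`. The in-memory sketch below shows the difference. It uses made-up, unpadded IDs and a hypothetical `membership_sk` helper.

```python
def membership_sk(user_id, service_group_id):
    """Composite SK for the alternative membership record (hypothetical helper)."""
    return f"user:{user_id}#servicegroup:{service_group_id}"


# Sorted SKs under PK="account:account-id-001" in this toy example.
sks = sorted([
    membership_sk("user-id-1", "sg-dev"),
    membership_sk("user-id-10", "sg-prod"),
    membership_sk("user-id-1", "sg-prod"),
])


def prefix_match(prefix):
    """Model of begins_with on the SK."""
    return [sk for sk in sks if sk.startswith(prefix)]


too_loose = prefix_match("user:user-id-1")            # also hits user-id-10
exact = prefix_match("user:user-id-1#servicegroup:")  # only user-id-1
print(len(too_loose), len(exact))  # 3 2
```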
Considering Data Access Patterns:
Before we commit to either of these model designs, let’s think more about all the data access patterns we’ll want to support. We’ve demonstrated how either model handles the “get all service groups for a user” access pattern, but what if we wanted to go the other way? This is a many-to-many relationship, after all, so we’ll commonly want the “reverse” lookup too. In this case, that is “get all users belonging to a service group.”
In the first Service Group Membership record, the PK is `user:{user_id}`. Remember our querying restriction: we always need to know the entire PK. We can’t, for instance, ask Dynamo for all PKs that start with “user:”. Finding every user in a service group would therefore mean already knowing every user’s ID.
Now if we look at the second model option, you might notice that we run into a similar problem. We can assume we’ll always know the entire PK (the account_id), but the SK puts the user_id before the service group. That means we can’t use our trusty “begins with” expression on the SK to find a service group’s members either.
So what are we to do? Given the query limitations we’ve covered, you might conclude that a “reverse” query for this kind of relationship is impossible. Fortunately, it isn’t! This is a very common issue with Dynamo, and it is solved using a Global Secondary Index (GSI). To build an intuitive understanding of GSIs, let’s continue with our example.
Global Secondary Index
Let’s imagine that we were redesigning our “service group membership” schema but with the primary focus of supporting the “get all users belonging to a service group” query and are not concerned about the “get all service groups for a user” query. A simple way to achieve this would be with the following key structure:
User Service Group Membership:
PK: servicegroup:{service_group_id}
SK: user:{user_id}
With this schema, we could perform the following query:
dynamodb.query(
PK="servicegroup:service-group-id-1",
SK_begins_with="user:"
)
Notice anything about this record schema? That’s right, it’s the inverse of our old schema. The PK and SK have been flipped or inverted. This is where the power of a Global Secondary Index becomes apparent. A GSI allows us to remap the PK and SK values across our entire table. In other words, it provides an alternative way to organize and access our data, opening up a ton of possibilities.
Technically, you can map the PK and SK of a GSI to any attributes in your data. However, the most universally useful GSI in our case is the “inverse” GSI we just reasoned our way to. This is considered a “generic” GSI, and generic GSIs play a vital role in a robust Single Table schema.
Under the hood, you can imagine a GSI as an entirely separate Dynamo table keyed by these new attributes. GSIs require their own storage. Every write to the table must also be propagated to each GSI that indexes the item, which adds write cost. That propagation is asynchronous, so queries against a GSI are eventually consistent. For these reasons, it’s wise to keep the number of GSIs on a table small. With a good Single Table Design, you can go far with a single, generic, “multi-purpose” GSI like the “inverse” index described above, though adding more GSIs for specialized cases is always an option.
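For concreteness, here is a sketch of the request parameters for a table with a generic inverse GSI. They follow the shape DynamoDB’s CreateTable operation accepts, for example through boto3’s `client.create_table(**params)`. The sketch also builds the matching Query parameters for the reverse lookup. The table name `telemetryhub`, the index name `inverse`, and on-demand billing are assumptions. The code only builds dictionaries and makes no network calls.

```python
def inverse_gsi_table_definition(table_name="telemetryhub", index_name="inverse"):
    """CreateTable parameters: base keys PK/SK plus a GSI with them swapped."""
    return {
        "TableName": table_name,
        # Only attributes used as table or index keys are declared here.
        "AttributeDefinitions": [
            {"AttributeName": "PK", "AttributeType": "S"},
            {"AttributeName": "SK", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "PK", "KeyType": "HASH"},   # partition key
            {"AttributeName": "SK", "KeyType": "RANGE"},  # sort key
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": index_name,
                # The inverse index: SK becomes the partition key, PK the sort key.
                "KeySchema": [
                    {"AttributeName": "SK", "KeyType": "HASH"},
                    {"AttributeName": "PK", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def users_in_service_group_params(service_group_id, table_name="telemetryhub",
                                  index_name="inverse"):
    """Query parameters for the reverse lookup against the inverse GSI."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # On the index, SK is the partition key and PK is the sort key.
        "KeyConditionExpression": "#sk = :sg AND begins_with(#pk, :user_prefix)",
        "ExpressionAttributeNames": {"#pk": "PK", "#sk": "SK"},
        "ExpressionAttributeValues": {
            ":sg": {"S": f"servicegroup:{service_group_id}"},
            ":user_prefix": {"S": "user:"},
        },
    }
```

`ProjectionType: ALL` copies every attribute into the index. `KEYS_ONLY` or `INCLUDE` projections use less index storage when the reverse lookups only need keys.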
By leveraging the flexibility of a Global Secondary Index, we can effectively handle the reverse lookup challenge in our many-to-many relationship.
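The “separate table” mental model can also be sketched in memory: each write to the base table is mirrored into an index whose keys are swapped. This toy version uses made-up IDs and synchronous updates, and it is only a model. Real GSI updates are propagated asynchronously, so in DynamoDB a read from the index can briefly lag the table.

```python
from collections import defaultdict

# Toy model (not DynamoDB): base table and an "inverse" index kept in sync.
base = defaultdict(dict)     # PK -> {SK: item}
inverse = defaultdict(dict)  # SK -> {PK: item}, i.e. keys swapped


def put_item(item):
    """Write to the base table, then mirror the write into the index.
    Real GSIs do this asynchronously; here it happens immediately."""
    base[item["PK"]][item["SK"]] = item
    inverse[item["SK"]][item["PK"]] = item  # the extra write a GSI costs


def query_index(index_pk, index_sk_prefix=""):
    """Query the inverse index: full index partition key (the item's SK),
    begins_with on the index sort key (the item's PK)."""
    rows = inverse.get(index_pk, {})
    return [rows[key] for key in sorted(rows) if key.startswith(index_sk_prefix)]


# Membership records in the original user-first layout (made-up IDs).
put_item({"PK": "user:user-id-002", "SK": "servicegroup:sg-prod"})
put_item({"PK": "user:user-id-001", "SK": "servicegroup:sg-prod"})
put_item({"PK": "user:user-id-001", "SK": "servicegroup:sg-dev"})

# Reverse lookup: every user who can access sg-prod.
print([item["PK"] for item in query_index("servicegroup:sg-prod", "user:")])
# ['user:user-id-001', 'user:user-id-002']
```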
Summary
In this post, we explored single table design on top of DynamoDB’s key-value data model. We learned that by using a shared key value in the PK and prefixes in our SKs, we can easily and efficiently query for related records. We then looked at modeling many-to-many relationships using membership records, and at how an inverse GSI supports lookups in both directions. Putting it all together, here is a workflow that can help any team design its Dynamo schemas and record types:
- Consider the data that you’ll want to store in Dynamo. What sorts of records will you be creating, and how do they relate to each other?
- Think about your common data access patterns. Which queries will commonly be performed on the data?
- Design the initial records while thinking about those data access patterns. For now, only worry about one-to-many relationships. Writing the queries in pseudocode while you spec out each record can be helpful.
- Identify which many-to-many relationships exist between your data. Create a “Membership Record” to relate these records, and again, think about your most common data access patterns here.
- Create an inverse GSI to support the reverse query.
- Refine! Nothing is set in stone, but the earlier you nail down the structure of your data, the easier development will be going forward.
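The workflow suggests writing queries in pseudocode while speccing out records. One hypothetical way to keep that list in code is a small catalogue of access patterns. Each entry records which index it uses, its partition key template, and its SK prefix. The pattern names, the `inverse` index name, and the `describe` helper below are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPattern:
    name: str
    index: str            # "table" or the (assumed) GSI name "inverse"
    partition_key: str    # template for the full partition key
    sort_key_prefix: str  # begins_with prefix ("" means the whole partition)


PATTERNS = [
    AccessPattern("All users in an account", "table",
                  "account:{account_id}", "user:"),
    AccessPattern("All service groups in an account", "table",
                  "account:{account_id}", "servicegroup:"),
    AccessPattern("All service groups for a user", "table",
                  "user:{user_id}", "servicegroup:"),
    AccessPattern("All users in a service group", "inverse",
                  "servicegroup:{service_group_id}", "user:"),
]


def describe(pattern: AccessPattern, **ids: str) -> str:
    """Render a pattern as a query description. str.format raises KeyError
    if an ID is missing, mirroring the rule that a query needs the full
    partition key."""
    pk = pattern.partition_key.format(**ids)
    return (f'{pattern.index}: partition key = "{pk}", '
            f'sort key begins_with "{pattern.sort_key_prefix}"')


print(describe(PATTERNS[3], service_group_id="sg-prod"))
# inverse: partition key = "servicegroup:sg-prod", sort key begins_with "user:"
```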
Single Table Design is the recommended approach when working with DynamoDB, as it lets us take advantage of Dynamo’s fast queries without the additional cost and complexity of managing multiple DynamoDB tables. We hope you found this useful.
Once your database is set up and integrated into your application, see how easy it is to integrate OpenTelemetry and TelemetryHub to monitor your entire application through the simple yet powerful dashboard and tools. Learn more about TelemetryHub here and check back to learn more about the tech we use to build the applications for developers, by developers.