DynamoDB Joins - What You Need to Know
Most of us are used to managing relational data in normalized SQL databases. Normalization helps avoid data redundancy, and simple queries such as joins combine the data through the defined foreign keys.
A join query aggregates results from two or more tables based on related information (foreign keys). Because DynamoDB is a NoSQL database, you cannot run "join" queries on multiple tables.
A join requires the DBMS to examine multiple tables and perform complex processing to combine the data and return a result set. DynamoDB is designed to deliver single-digit millisecond response times regardless of the size of the data set. Because joins do not scale well, DynamoDB chooses not to support them.
Application developers still need join-like behavior to retrieve aggregated results with a single API call. DynamoDB lets you mimic a "join" by storing related data together according to the single-table design principle.
How does DynamoDB perform a join?
Single-table design
DynamoDB allows developers to simulate a join with the single-table design principle. You store all of your application's entities in a single table and use generic primary key attributes (e.g. PK, SK) to query the data according to your access patterns. This lets developers model complex one-to-many and many-to-many relationships in a single table while keeping individual queries fast.
Model Relational Data in DynamoDB with the Single Table Design (Using Primary Key + Query API)
To better understand this, let's model a one-to-many relationship in DynamoDB and see how we can mimic a join query by implementing the single-table design.
Consider the following scenario:
- An organization has many users.
- A user belongs to an organization.
Figure 01: Example of a one-to-many relationship
We use a single DynamoDB table to store both the organization and user entities, as this allows for simpler access patterns. I will design and create a table with a generic primary key using the Single Table Designer offered by Dynobase.
Figure 02: Defining a sample template for the single-table design
Figure 02 shows the generic names PK and SK assigned to the partition key and the sort key. This removes the coupling between the key names and the entity types stored in the table. We can then use the primary key and the Query API to retrieve the required data.
In addition, an attribute for the entity type is defined to easily distinguish the type of each item stored in the table.
When storing data in a single table, you can prefix the partition key value with the entity type followed by a pound sign (#). This ensures that items of different entity types do not overwrite each other when they share the same identifier.
Figure 03: Allocation of the partition key
Figure 03 shows the recommended method for mapping the partition key to an item in a single-table design. It helps DynamoDB store the item collection in the same partition so that it can be queried efficiently.
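The key convention described above can be sketched as a pair of small helper functions. This is only an illustration: the helper names (buildKey, organizationKey, userKey) and the choice to upper-case the values are my assumptions, not something Dynobase or the AWS SDK provides.

```javascript
// Hypothetical helpers; the names are illustrative and not part of any library.
const KEY_SEPARATOR = "#";

// Build a prefixed key value such as "ORG#APPLE" from an entity type and an id.
const buildKey = (entityType, id) =>
  `${entityType.toUpperCase()}${KEY_SEPARATOR}${String(id).toUpperCase()}`;

// An organization item uses the same value for PK and SK.
const organizationKey = (orgName) => ({
  PK: buildKey("ORG", orgName),
  SK: buildKey("ORG", orgName),
});

// A user item lives in its organization's partition and has its own sort key.
const userKey = (orgName, userName) => ({
  PK: buildKey("ORG", orgName),
  SK: buildKey("USER", userName),
});

console.log(organizationKey("Apple")); // { PK: 'ORG#APPLE', SK: 'ORG#APPLE' }
console.log(userKey("Apple", "Lakindu")); // { PK: 'ORG#APPLE', SK: 'USER#LAKINDU' }
```

Keeping the prefixing in one place makes it harder to write an item with a key that does not follow the convention.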
Figure 04: Inserting an organization into the table
Once we understand the partition key convention, we can start entering the organization data into the table. I added three organizations named "Facebook", "Microsoft" and "Samsung", using "ORG#FACEBOOK", "ORG#MICROSOFT" and "ORG#SAMSUNG" as both the partition key and the sort key. The sort key is set to the same value as the partition key so that the organization item can be queried on its own.
Also, the "Entity Type" for each item is marked as "Organization" to make it easier to visualize the data model.
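As a rough sketch, the three organization items could be prepared as parameters for the DocumentClient put operation like this. The table name, the attribute names EntityType and OrgName, and the function name buildOrganizationPut are assumptions for illustration. The snippet only builds the parameter objects and does not call AWS.

```javascript
// Assumed table name; replace with your own.
const TABLE_NAME = "test-application";

// Build `put` parameters for an organization item.
// "EntityType" and "OrgName" are assumed attribute names.
const buildOrganizationPut = (orgName) => {
  const key = `ORG#${orgName.toUpperCase()}`;
  return {
    TableName: TABLE_NAME,
    Item: { PK: key, SK: key, EntityType: "Organization", OrgName: orgName },
  };
};

const organizationPuts = ["Facebook", "Microsoft", "Samsung"].map(buildOrganizationPut);
console.log(JSON.stringify(organizationPuts, null, 2));
```

With the AWS SDK for JavaScript v2, each entry could then be passed to documentClient.put(params).promise().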
The next part is to model the one-to-many relationship.
Traditionally, we would add a foreign key ("Organization ID") to the user. With the single-table design, we instead create an item with the partition key "ORG#ORG_NAME" and the sort key "USER#USER_NAME".
Consider the example shown below.
Figure 05: Assigning a user to the organization
In Figure 05, a new item with the sort key "USER#LAKINDU" is entered under the partition key "ORG#APPLE". This creates a user within a specific organization. Likewise, we can add many users to the organization, as shown below.
Figure 06: Creating the single table design for the data
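To make the resulting layout concrete, here is a small in-memory sketch that groups items by partition key and orders each group by sort key, which is roughly how an item collection is arranged. The user "USER#JOHN", the EntityType attribute name and the function toItemCollections are hypothetical.

```javascript
// Sample items; "USER#JOHN" and "EntityType" are hypothetical additions.
const sampleItems = [
  { PK: "ORG#APPLE", SK: "USER#LAKINDU", EntityType: "User" },
  { PK: "ORG#MICROSOFT", SK: "ORG#MICROSOFT", EntityType: "Organization" },
  { PK: "ORG#APPLE", SK: "ORG#APPLE", EntityType: "Organization" },
  { PK: "ORG#APPLE", SK: "USER#JOHN", EntityType: "User" },
];

// Group items by partition key, then order each group by sort key.
const toItemCollections = (items) => {
  const collections = new Map();
  for (const item of items) {
    if (!collections.has(item.PK)) collections.set(item.PK, []);
    collections.get(item.PK).push(item);
  }
  for (const group of collections.values()) {
    group.sort((a, b) => (a.SK < b.SK ? -1 : a.SK > b.SK ? 1 : 0));
  }
  return collections;
};

const collections = toItemCollections(sampleItems);
console.log(collections.get("ORG#APPLE").map((item) => item.SK));
// prints [ 'ORG#APPLE', 'USER#JOHN', 'USER#LAKINDU' ]
```

Because "ORG#" sorts before "USER#", the organization item comes first in its partition, followed by its users.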
We give each user's sort key the prefix "USER#" for easy reference. We created two access patterns for the user entity.
- Get information from all users in the organization.
- Get information for a single user in the organization.
This is possible because of the way we model the primary key. Therefore, you should model your data this way to mimic "join" queries.
I declared the two access patterns using Dynobase.
Figure 07: Defining access patterns with Dynobase
Now we can query the data with the partition key and the sort key (using the begins_with operator) to retrieve aggregated data. With this, we have successfully modeled our data to mimic a join.
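The two access patterns can be illustrated without touching AWS by a tiny in-memory stand-in for a query. This is only a sketch of the matching rules (exact match on the partition key, exact or prefix match on the sort key). The function queryInMemory and all sample data except "USER#LAKINDU" are hypothetical.

```javascript
// Hypothetical sample data; only "USER#LAKINDU" appears in the article.
const tableItems = [
  { PK: "ORG#APPLE", SK: "ORG#APPLE", EntityType: "Organization" },
  { PK: "ORG#APPLE", SK: "USER#LAKINDU", EntityType: "User" },
  { PK: "ORG#APPLE", SK: "USER#JOHN", EntityType: "User" },
  { PK: "ORG#SAMSUNG", SK: "USER#ANNA", EntityType: "User" },
];

// The partition key must match exactly; the sort key may be matched
// exactly (skEquals) or by prefix (skBeginsWith, like begins_with).
const queryInMemory = (items, { pk, skEquals, skBeginsWith }) =>
  items.filter(
    (item) =>
      item.PK === pk &&
      (skEquals === undefined || item.SK === skEquals) &&
      (skBeginsWith === undefined || item.SK.startsWith(skBeginsWith))
  );

// Access pattern 1: all users in the organization.
const allUsers = queryInMemory(tableItems, { pk: "ORG#APPLE", skBeginsWith: "USER#" });
// Access pattern 2: a single user in the organization.
const oneUser = queryInMemory(tableItems, { pk: "ORG#APPLE", skEquals: "USER#LAKINDU" });

console.log(allUsers.length, oneUser.length); // prints 2 1
```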
Figure 08: Viewing the Aggregated Tabular Results
Figure 08 illustrates the item collections that we can easily retrieve using our composite key. We can get each organization together with the users in that organization.
Querying the modeled data
Now that we have successfully modeled the relational data into a single table, we can query that data. The following snippets show some of the queries we can perform.
Query of organization information
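    const AWS = require("aws-sdk"); // AWS SDK for JavaScript v2
    // Region and credentials are assumed to come from the environment.
    const documentClient = new AWS.DynamoDB.DocumentClient();
    const Table = "test-application"; // replace with your table name

    const queryOrganization = async () => {
      // Identical PK and SK values select only the organization item.
      const { Items = [] } = await documentClient
        .query({
          TableName: Table,
          KeyConditionExpression: '#PK = :PK and #SK = :SK',
          ExpressionAttributeNames: { "#PK": "PK", "#SK": "SK" },
          ExpressionAttributeValues: { ":PK": "ORG#APPLE", ":SK": "ORG#APPLE" },
        })
        .promise();
      console.log(Items);
    };

    queryOrganization();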
Figure 09 - Query Result: Query on a single organization
The snippet shown above retrieves information about a single organization. Because the partition key and sort key values are the same, the query returns only the organization item.
Get all users in an organization
    const AWS = require("aws-sdk"); // AWS SDK for JavaScript v2
    // Region and credentials are assumed to come from the environment.
    const documentClient = new AWS.DynamoDB.DocumentClient();
    const Table = "test-application"; // replace with your table name

    const queryUsersInOrganization = async () => {
      const { Items = [] } = await documentClient
        .query({
          TableName: Table,
          KeyConditionExpression: '#PK = :PK and begins_with(#SK, :SK)',
          ExpressionAttributeNames: { "#PK": "PK", "#SK": "SK" },
          ExpressionAttributeValues: {
            ":PK": "ORG#APPLE",
            ":SK": "USER#", // Query all users in the organization
          },
        })
        .promise();
      console.log(Items);
    };

    queryUsersInOrganization();
Figure 10 - Query result: query of all users in the organization by a join simulation.
The snippet above shows a query to get all users in the organization. In theory, this query demonstrates the functionality of a SQL join as we are aggregating results from two entities using a common attribute (organization name).
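One practical detail when fetching all users: a single Query call returns at most 1 MB of data and reports a LastEvaluatedKey when more items remain. The following is a minimal sketch of a paging loop. The function queryAllPages and the stand-in client are hypothetical. The only assumption about the real client is the AWS SDK v2 shape client.query(params).promise().

```javascript
// Collect every page of a Query. `client` is assumed to expose
// client.query(params).promise() resolving to { Items, LastEvaluatedKey }.
const queryAllPages = async (client, params) => {
  const items = [];
  let lastKey;
  do {
    // Only send ExclusiveStartKey once a previous page has returned one.
    const pageParams = lastKey ? { ...params, ExclusiveStartKey: lastKey } : params;
    const page = await client.query(pageParams).promise();
    items.push(...(page.Items || []));
    lastKey = page.LastEvaluatedKey;
  } while (lastKey);
  return items;
};

// Stand-in client for a local dry run (hypothetical, not the real SDK):
// it serves one item per page across two pages.
const fakeClient = {
  query: (params) => ({
    promise: async () =>
      params.ExclusiveStartKey
        ? { Items: [{ PK: "ORG#APPLE", SK: "USER#LAKINDU" }] }
        : {
            Items: [{ PK: "ORG#APPLE", SK: "USER#JOHN" }],
            LastEvaluatedKey: { PK: "ORG#APPLE", SK: "USER#JOHN" },
          },
  }),
};

queryAllPages(fakeClient, { TableName: "test-application" }).then((items) =>
  console.log(items.map((item) => item.SK))
); // prints [ 'USER#JOHN', 'USER#LAKINDU' ]
```

In real code, the second argument would carry the same key condition parameters as the query above, and fakeClient would be replaced by a DocumentClient instance.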
Search for a single user in an organization
    const AWS = require("aws-sdk"); // AWS SDK for JavaScript v2
    // Region and credentials are assumed to come from the environment.
    const documentClient = new AWS.DynamoDB.DocumentClient();
    const Table = "test-application"; // replace with your table name

    const querySingleUserInOrganization = async () => {
      const { Items = [] } = await documentClient
        .query({
          TableName: Table,
          KeyConditionExpression: '#PK = :PK and #SK = :SK',
          ExpressionAttributeNames: { "#PK": "PK", "#SK": "SK" },
          ExpressionAttributeValues: {
            ":PK": "ORG#APPLE",
            ":SK": "USER#LAKINDU", // Get information for a single user
          },
        })
        .promise();
      console.log(Items);
    };

    querySingleUserInOrganization();
Figure 11 - Query Result: Query of a single user in an organization
Figure 11 shows a variation of the query we saw earlier. It uses "=" instead of "begins_with" as the sort key comparison operator to retrieve information about a single user.
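Get an organization and all of its users
    const AWS = require("aws-sdk"); // AWS SDK for JavaScript v2
    // Region and credentials are assumed to come from the environment.
    const documentClient = new AWS.DynamoDB.DocumentClient();
    const Table = "test-application"; // replace with your table name

    const queryOrganizationAndUsers = async () => {
      // No sort key condition: every item in the partition is returned.
      const { Items = [] } = await documentClient
        .query({
          TableName: Table,
          KeyConditionExpression: '#PK = :PK',
          ExpressionAttributeNames: { "#PK": "PK" },
          ExpressionAttributeValues: { ":PK": "ORG#APPLE" },
        })
        .promise();
      console.log(Items);
    };

    queryOrganizationAndUsers();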
Figure 12 - Query result: Query for all users and organizations
Finally, we can retrieve the organization together with all of its users by omitting the sort key condition.
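Since this query returns items of both entity types in one list, the application usually has to separate them afterwards. A minimal sketch of that step, assuming the entity type is stored in an attribute called EntityType (the attribute name, the function name splitByEntityType and the second user are hypothetical):

```javascript
// Split a mixed item collection into a join-like result shape.
const splitByEntityType = (items) => ({
  organization: items.find((item) => item.EntityType === "Organization") || null,
  users: items.filter((item) => item.EntityType === "User"),
});

// Hypothetical items, shaped like the result of the partition-only query.
const joined = splitByEntityType([
  { PK: "ORG#APPLE", SK: "ORG#APPLE", EntityType: "Organization" },
  { PK: "ORG#APPLE", SK: "USER#JOHN", EntityType: "User" },
  { PK: "ORG#APPLE", SK: "USER#LAKINDU", EntityType: "User" },
]);

console.log(joined.organization.SK, joined.users.length); // prints ORG#APPLE 2
```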
The examples above show the combinations of queries we can run when the single-table design is implemented correctly. It also provides an effective way to emulate a SQL join in DynamoDB using a single table.
Therefore, the single-table design principle is the recommended way to model relational data in DynamoDB.
DynamoDB join performance
Simulated joins in DynamoDB can be much faster than joins in a traditional SQL database, for two main reasons.
1. Reduced API calls
Using a single table reduces the number of API calls for the access pattern, which reduces data retrieval latency.
2. Improved query performance through partition optimization
The single-table design uses partition keys such as "ORG#ORG_NAME" and sort keys such as "ORG#ORG_NAME" or "USER#USER_NAME". When items are persisted in DynamoDB, the partition key is passed through a hash function to determine the partition in which to store the data. Because related items share the same partition key, they are stored together in one partition.
Therefore, when the data is queried, DynamoDB can retrieve the entire collection of items from the same partition, resulting in faster query times for the join operation.
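The idea can be illustrated with a toy hash function. This is explicitly not DynamoDB's internal hash function, and the partition count of 4 is arbitrary. The sketch only shows that items sharing a partition key always map to the same partition.

```javascript
// Toy string hash (NOT DynamoDB's internal hash function).
const toyHash = (value) => {
  let hash = 0;
  for (const ch of value) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep within 32 bits
  }
  return hash;
};

// Map a partition key to one of `partitionCount` partitions.
const partitionFor = (pk, partitionCount) => toyHash(pk) % partitionCount;

const PARTITION_COUNT = 4; // arbitrary number for illustration
const placements = [
  { PK: "ORG#APPLE", SK: "ORG#APPLE" },
  { PK: "ORG#APPLE", SK: "USER#LAKINDU" },
  { PK: "ORG#SAMSUNG", SK: "ORG#SAMSUNG" },
].map((item) => ({ ...item, partition: partitionFor(item.PK, PARTITION_COUNT) }));

// Both ORG#APPLE items report the same partition number.
console.log(placements);
```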
These reasons make DynamoDB joins much more effective and scalable compared to SQL joins.
Conclusion
DynamoDB does not support joins because of the performance problems they cause at scale. Instead, the single-table design offers a more scalable way to store and query relational data in DynamoDB without sacrificing performance.
I hope this article has given you the information you need to model relational data and efficiently query it using DynamoDB.
Thank you for reading.
Frequently Asked Questions
Can DynamoDB create joins?
No. Joins are resource-intensive queries that do not scale well as your database grows, so DynamoDB does not support "join" queries natively. However, it is possible to join DynamoDB tables through external services such as Apache Hive and Amazon EMR.
Can DynamoDB store relational data?
Yes. Although DynamoDB is a NoSQL database, the single-table design allows developers to model and store relational data.
How does DynamoDB manage relational data?
DynamoDB manages relational data in a single table using a generic primary key (which can be a composite key). The primary key is then used to query the data based on the access pattern. To support additional queries, you can use inverted indexes or global secondary indexes (GSIs).
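As a sketch of the inverted index idea, suppose the table had a GSI that uses SK as its partition key and PK as its sort key. The index name "InvertedIndex", the table name and the function name below are assumptions. The snippet only builds the query parameters (for example, to find which organization a user belongs to) and does not call AWS.

```javascript
// Build Query parameters against a hypothetical inverted GSI
// (index partition key = SK, index sort key = PK).
const buildInvertedIndexQuery = (sortKeyValue) => ({
  TableName: "test-application", // assumed table name
  IndexName: "InvertedIndex", // hypothetical index name
  KeyConditionExpression: "#SK = :SK",
  ExpressionAttributeNames: { "#SK": "SK" },
  ExpressionAttributeValues: { ":SK": sortKeyValue },
});

const invertedParams = buildInvertedIndexQuery("USER#LAKINDU");
console.log(invertedParams);
```

The resulting object could be passed to documentClient.query(params).promise(), provided such an index actually exists on the table.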
How does DynamoDB handle many-to-many relationships?
DynamoDB can model a many-to-many relationship using the adjacency list design pattern.
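A rough in-memory sketch of the adjacency list idea follows. Each entity gets a node item, and each relationship gets an edge item whose PK is one side and whose SK is the other. The entities (users and projects), the attribute names and the function names are all hypothetical.

```javascript
// Hypothetical many-to-many data: users belong to many projects and
// projects have many users.
const adjacencyItems = [
  // Node items (PK equals SK).
  { PK: "USER#LAKINDU", SK: "USER#LAKINDU", EntityType: "User" },
  { PK: "PROJECT#ALPHA", SK: "PROJECT#ALPHA", EntityType: "Project" },
  { PK: "PROJECT#BETA", SK: "PROJECT#BETA", EntityType: "Project" },
  // Edge items: one per (user, project) pair.
  { PK: "USER#LAKINDU", SK: "PROJECT#ALPHA", EntityType: "Membership" },
  { PK: "USER#LAKINDU", SK: "PROJECT#BETA", EntityType: "Membership" },
  { PK: "USER#JOHN", SK: "PROJECT#ALPHA", EntityType: "Membership" },
];

// Table-side lookup: projects of a user (PK match, SK prefix).
const projectsOf = (userPk) =>
  adjacencyItems
    .filter((item) => item.PK === userPk && item.SK.startsWith("PROJECT#"))
    .map((item) => item.SK);

// Inverted lookup (what an index keyed on SK would serve): users of a project.
const usersOf = (projectSk) =>
  adjacencyItems
    .filter((item) => item.SK === projectSk && item.PK.startsWith("USER#"))
    .map((item) => item.PK);

console.log(projectsOf("USER#LAKINDU")); // [ 'PROJECT#ALPHA', 'PROJECT#BETA' ]
console.log(usersOf("PROJECT#ALPHA")); // [ 'USER#LAKINDU', 'USER#JOHN' ]
```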