Recently, I have been working on the design of a request management system. Gathering the requirements and designing the system turned out to be an interesting exercise, so I decided to write down some of the insights I picked up along the way. Advice and improvements are most welcome! I will try to trim the requirements so that the design generalizes to most request management use cases.
To begin with, we may start with these functional requirements:
- We want to create a request.
- We want to update the status of the request.
- We want a retry mechanism in case the status update of the request fails.
- We want a notification-sending mechanism for when the request has reached its final SUCCESS or FAILURE state.
- We want requests older than 6 months to be removed automatically.
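To make these requirements a bit more concrete, here is a minimal sketch of a request model in Python. The `CREATED` and `IN_PROGRESS` states, the field names, and the 180-day approximation of "6 months" are my own assumptions for illustration; only the final SUCCESS and FAILURE states come from the requirements above.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import Enum


class Status(Enum):
    CREATED = "CREATED"          # hypothetical initial state
    IN_PROGRESS = "IN_PROGRESS"  # hypothetical intermediate state
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"


TERMINAL = {Status.SUCCESS, Status.FAILURE}
RETENTION = timedelta(days=180)  # "6 months", approximated as 180 days


@dataclass
class Request:
    payload: dict
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    status: Status = Status.CREATED

    def update_status(self, new_status, notify):
        """Move to new_status; call notify(self) once a final state is reached."""
        if self.status in TERMINAL:
            raise ValueError(f"request already finished as {self.status.value}")
        self.status = new_status
        if new_status in TERMINAL:
            notify(self)

    def is_expired(self, now):
        """True when the request is older than the retention period."""
        return now - self.created_at > RETENTION
```

Passing `notify` in as a callable keeps the model independent of whichever notification channel is chosen later.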
We can also keep these basic non-functional requirements:
- Eventual consistency of the requests is acceptable, provided that the final SUCCESS or FAILURE state is always consistent.
- The whole system would be write-heavy: we will create and update records frequently, but people won't check their records very often (given that they receive a notification when the final SUCCESS or FAILURE state is reached).
- The retry mechanism should be fully automated to remove any manual effort. The failure rate, though, should be as low as possible.
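One common way to automate retries without manual effort is exponential backoff with jitter. The sketch below is only an illustration of that idea, not a prescribed part of this design; the function name, the attempt limit, and the delay values are all assumptions.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is exhausted.

    The wait before each retry is a random value between 0 and an
    exponentially growing cap ("full jitter"), so that many failing
    requests do not all retry at the same moment.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

The `sleep` parameter exists only so that the waiting can be replaced in tests; in normal use the default `time.sleep` is enough.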
With this basic set of requirements, we need to make some design decisions. We can discuss them separately:
- Database
  - We would want the contents of the request to be dynamic, with the ability to add new fields as we go. The operations and queries we need to support are mostly writes and basic reads, without joins. We also need some form of scaling so that the APIs do not fail under load. We are satisfied with eventual consistency and are more interested in flexibility in the data we store.
  - Based on these points, I might go with a NoSQL database, probably DynamoDB, as it offers auto-scaling and the flexibility of creating a table with just the key attributes (the partition key and any keys used by GSIs). The remaining fields can always be added later.
  - I would probably use 'requestId' (a random UUID) as the partition key of the 'UserRequests' table and 'creationTimestamp' as the sort key. We may add a few GSIs (global secondary indexes) based on the querying requirements.
- Queues
  - We might want queues at different places in the design, for example one for retries and one for status update triggers. I would probably use Amazon SQS, as it fits well alongside DynamoDB and integrates easily with AWS Lambda, which we may use to retry the failed update requests in batches.
- Minor computations
  - We might go with AWS Lambda functions for minor computations, as they are an easy on-demand compute option with up to 10 GB of ephemeral storage and a maximum timeout of 15 minutes. This much time and space should be enough for minor computations, and Lambda integrates easily with DynamoDB (DDB) and SQS.
- Major computations and APIs
  - For this, I am assuming a service running on some instances (probably EC2 or Fargate).
- Notifications
  - We might want some sort of notification service. Instead of designing one, we may use off-the-shelf Amazon SNS, which can send notifications through many channels. Besides, its notifications can be consumed by SQS and Lambda.
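As a rough illustration of the database decision, the table could be described with parameters like the ones below. This is a sketch under a few assumptions of mine: on-demand billing (`PAY_PER_REQUEST`) stands in for the scaling requirement, `requestStatus` and `expiresAt` are hypothetical attribute names, and DynamoDB's time-to-live (TTL) feature is used for the 6-month cleanup. The code only builds plain dictionaries; the commented lines show how they would be passed to boto3.

```python
import time
import uuid

TABLE_NAME = "UserRequests"
SIX_MONTHS_SECONDS = 180 * 24 * 60 * 60  # "6 months", approximated as 180 days

# Only key attributes are declared up front; other fields can be added per item.
CREATE_TABLE_PARAMS = {
    "TableName": TABLE_NAME,
    "AttributeDefinitions": [
        {"AttributeName": "requestId", "AttributeType": "S"},
        {"AttributeName": "creationTimestamp", "AttributeType": "N"},
    ],
    "KeySchema": [
        {"AttributeName": "requestId", "KeyType": "HASH"},           # partition key
        {"AttributeName": "creationTimestamp", "KeyType": "RANGE"},  # sort key
    ],
    "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
}

# TTL needs a Number attribute holding an expiry time in epoch seconds.
TTL_PARAMS = {
    "TableName": TABLE_NAME,
    "TimeToLiveSpecification": {"Enabled": True, "AttributeName": "expiresAt"},
}


def new_request_item(payload, now=None):
    """Build an item for a new request; payload carries the dynamic fields."""
    now = int(time.time()) if now is None else now
    return {
        **payload,
        "requestId": str(uuid.uuid4()),
        "creationTimestamp": now,
        "requestStatus": "CREATED",  # hypothetical initial status
        "expiresAt": now + SIX_MONTHS_SECONDS,
    }


# With boto3 installed and AWS credentials configured:
#   client = boto3.client("dynamodb")
#   client.create_table(**CREATE_TABLE_PARAMS)
#   client.update_time_to_live(**TTL_PARAMS)
#   boto3.resource("dynamodb").Table(TABLE_NAME).put_item(
#       Item=new_request_item({"amount": 5}))
```

Two things to keep in mind with this layout: TTL deletion happens some time after the expiry timestamp rather than at the exact moment, and because the table has a sort key, fetching a single item by key needs both 'requestId' and 'creationTimestamp'.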
Based on these basic design decisions, let's try to come up with a basic high-level component diagram that puts all these pieces together into a working system. We may improve on it as we go. I came up with this one:
Figure 1: Request creation flow component diagram first draft
These two diagrams more or less summarize my initial thinking on how to approach such a system. A few things are missing, such as load balancers and replication, but I have tried to keep the design generic so that readers can extend it to suit their own needs of scale and low-level design. All the components used in the design adapt easily to scaling and low-level design changes for most use cases. The internal implementation of the APIs is also not very detailed, and a few more APIs may be needed here and there, but those can be covered according to the reader's requirements.
I am still working on improving this model, but it looks like it will work for my use case at least (with more custom changes). I am open to better approaches, drawbacks, and further enhancements (I am still learning, and it seems I always will be!).
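As one example of the internals left out at this level, the batch retry of failed status updates could look roughly like the Lambda handler below, triggered by the retry queue in SQS. This is a sketch, not the actual implementation: `apply_status_update` and the message format (a JSON body with `requestId` and `status`) are hypothetical, and the return value assumes that partial batch responses (`ReportBatchItemFailures`) are enabled on the SQS event source mapping.

```python
import json


def apply_status_update(update):
    """Hypothetical placeholder for the real status update logic."""
    if "requestId" not in update or "status" not in update:
        raise ValueError("malformed status update")
    # A real implementation would write the new status to the datastore here.


def handler(event, context):
    """Process a batch of SQS messages and report only the failed ones."""
    failures = []
    for record in event.get("Records", []):
        try:
            apply_status_update(json.loads(record["body"]))
        except Exception:
            # Only this message becomes visible on the queue again;
            # the rest of the batch is treated as processed.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Messages that keep failing could be moved aside by configuring a dead-letter queue on the retry queue, so that they do not loop forever.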
- Amrit Raj