Asynchronous processing helps a system achieve high availability, scale, and resilience.
Suppose you have a website that helps people file income tax returns, where a user logs in to fill in their details and attach documents for further processing.
The backend of your system validates the data, applies any needed transformations or formatting, and then finally sends it to the web service offered by the government's tax department to file the returns online.
You will have all sorts of validations on the user's request, like the size of the documents, their formats, and the number of documents, for the first happy flow in production. But the story changes when the system goes live: any number of factors come into play in the field, along with the future challenge of scale. When you design this software for a highly populated country, say India, where the scale (data plus requests/second to the offering) is huge, the system must be equipped to recover from problems and avoid them before they can cause an outage. You also have to keep in mind the constraints not only of the system developed by us but of the systems and services offered by other parties too. (Details as small as the bandwidth, location, or profile of the user accessing your offering matter as well, but we are not covering them in this article.)
Let's say the user sends the details of their earnings to POST /validateTaxSubmission, which is hosted by you, and this service, after some validation, submits the details to the government's service POST /submitReturns.
To summarise so far: POST /validateTaxSubmission (your service) sends requests to POST /submitReturns (a third-party service).
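To make this concrete, here is a minimal sketch of the synchronous version of that flow. The endpoint names come from the example above; the URL, field names, and validation rules are illustrative assumptions, not a real government API.

```python
import requests

GOVT_SUBMIT_URL = "https://tax.gov.example/submitReturns"  # hypothetical URL

def validate_tax_submission(submission: dict) -> dict:
    """Handler behind POST /validateTaxSubmission: validate, format, forward."""
    # Basic validations on the user's request: document count, size, format.
    documents = submission.get("documents", [])
    if not documents:
        raise ValueError("at least one document is required")
    if len(documents) > 10:
        raise ValueError("too many documents")

    # Format the payload the way the government's service expects it
    # (field names here are made up for illustration).
    payload = {"taxpayerId": submission["taxpayerId"], "documents": documents}

    # Synchronous call: the user's request thread is blocked until
    # POST /submitReturns responds.
    resp = requests.post(GOVT_SUBMIT_URL, json=payload, timeout=5)
    resp.raise_for_status()
    return resp.json()
```

Everything here hinges on that one blocking call to the third party, which is exactly where the problems below come from.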
You must keep in mind the following considerations:
P1: When POST /submitReturns is down
P2: When POST /submitReturns is responding poorly
This will lead to thread contention at your end, as threads wait longer for responses, and as the user load on your website increases (say it is 31st March), your backend will head towards an outage over time.
So you need to control the flow in the system like a linear control system does. [Just that I did a degree in Electronics & Communications and it reflects in my thought process sometimes :) ]
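A first, simple control is to put a hard deadline on the downstream call so threads fail fast instead of hanging. A minimal sketch, assuming the same hypothetical URL as before:

```python
from typing import Optional

import requests

SUBMIT_URL = "https://tax.gov.example/submitReturns"  # hypothetical URL

def submit_with_deadline(payload: dict) -> Optional[requests.Response]:
    """Call the slow downstream with a hard deadline so request threads
    do not pile up waiting on it."""
    try:
        # (connect timeout, read timeout): fail fast instead of hanging.
        return requests.post(SUBMIT_URL, json=payload, timeout=(2.0, 3.0))
    except requests.Timeout:
        # Downstream is responding poorly; let the caller park the request
        # (for example, on a queue) instead of keeping a thread blocked.
        return None
```

Timeouts alone don't solve the problem, they just stop it from cascading; the queue-based solution further below handles the parked requests.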

P3: When POST /submitReturns starts erroring out with 500s due to an issue that is beyond our control, which is when you must replay the user requests once /submitReturns is back.
P4: The rate of processing, i.e. submissions processed per second.
The maximum throughput of a linear control system/workflow is the throughput of the least performant part/sub-system of the overall system. In this case, if POST /submitReturns processes 1500 requests/second and POST /validateTaxSubmission processes 1600, the overall flow can work at a maximum of 1500/second.
So the flow of traffic must be controlled in a manner where the least performant sub-system/service doesn't start giving slow responses, i.e. we must not go beyond its current scalability point (but we must scale it out :) ).
The scalability point is the point (the situation: requests/sec, data) beyond which the APIs/system start showing "considerable" degradation in performance.
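One way to stay below the bottleneck's scalability point is to rate-limit the outbound calls. Here is a minimal token-bucket sketch; the 1500/second figure is just the example number from above, and in practice you would measure it.

```python
import threading
import time

class TokenBucket:
    """Caps outbound calls at `rate` per second so the slowest sub-system
    (POST /submitReturns at ~1500/sec in our example) is never pushed
    past its scalability point."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst allowance
        self.tokens = capacity
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        """Block until a token is available, then consume it."""
        while True:
            with self.lock:
                now = time.monotonic()
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
            time.sleep(1.0 / self.rate)

# Call limiter.acquire() before every POST /submitReturns call to stay
# just under the downstream's measured scalability point.
limiter = TokenBucket(rate=1500, capacity=1500)
```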
P5: Most importantly, the day itself: 31st March.
How do you handle the 10x load on peak days? If your backend system that validates the requests has a throughput of x ops/second but the user load starts moving towards 2x, 4x, and 5x, the backend will move towards an outage and at some point will start giving 5XXs or timeouts to the user, which is where there are two problems:
- You lose potential customers.
- The customer isn't going to re-fill the forms and send them again, i.e. replay the request, which is where you lose the requests unless they are saved somewhere.
Solution :-
This is where you must have a "flow controller", as in the above diagram. This flow controller can be an event-based processor implemented with a queue.
The idea is to send the user submissions arriving at POST /validateTaxSubmission to a queue, but before that generate an orderId and tell the user that you have accepted the request, and then process the requests asynchronously, batch by batch.
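A minimal sketch of that accept-and-enqueue step, using Python's in-memory queue as a stand-in for a durable broker (Kafka, SQS, RabbitMQ) and illustrative field names:

```python
import queue
import uuid

# Stand-in for a durable message broker in production.
submission_queue: "queue.Queue[dict]" = queue.Queue()

def accept_submission(submission: dict) -> dict:
    """Handler behind POST /validateTaxSubmission: acknowledge immediately,
    process asynchronously."""
    order_id = str(uuid.uuid4())  # reference the user can track later
    submission_queue.put({"orderId": order_id, "payload": submission})
    # The user gets an instant acknowledgement; validation and the call to
    # POST /submitReturns happen later, off this request thread.
    return {"orderId": order_id, "status": "ACCEPTED"}
```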
Let's think about the below picture :-
P1, P2 … PN are the containers/servers of POST /validateTaxSubmission, and you can scale them up or down with Kubernetes as the load changes.
C1, C2 … CN are the consumers of the user requests, which send the work to POST /submitReturns.
The number of consumers C1, C2 … CN in the below picture will be evaluated based on the rate limits or the response times of POST /submitReturns.
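Here is a sketch of one such consumer; the consumer count is exactly the knob you tune against the third party's rate limits and response times. All names are illustrative, and the in-memory queue stands in for the shared broker from the earlier sketch.

```python
import queue
import threading

import requests

SUBMIT_URL = "https://tax.gov.example/submitReturns"  # hypothetical URL
submission_queue: "queue.Queue[dict]" = queue.Queue()  # broker stand-in

def consumer_loop() -> None:
    """One consumer (a C_i): pull a submission off the queue and forward
    it to the third-party service."""
    while True:
        msg = submission_queue.get()
        try:
            requests.post(SUBMIT_URL, json=msg["payload"], timeout=5)
        except requests.RequestException:
            submission_queue.put(msg)  # transient failure: replay later
        finally:
            submission_queue.task_done()

# The consumer count is the control knob: raise it while the third party
# responds well, lower it when its response times degrade.
NUM_CONSUMERS = 4
for _ in range(NUM_CONSUMERS):
    threading.Thread(target=consumer_loop, daemon=True).start()
```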
The solutions to the problems discussed above:
P1: When the third-party service is down, you still have not lost the requests, as they are in the queue; you process them when the third party is available again.
P2: When the third-party service is responding poorly, you can reduce the scale at your end to avoid thread contention in the Kubernetes nodes, making the system more resilient to changes in the other components of the system.
P3: When the third party returns a 500, you add the message back to the queue for processing again (in case the issue didn't really qualify as a 4XX, i.e. a client issue); see the sketch after this list.
P4: The rate of processing of the third party being very low. As you can now increase and decrease the number of consumers (C1, C2 … CN), you can handle the situation better on the fly.
P5: This is really the case where the front-end layer can take 2X the load while the backend isn't going beyond X. You could simply accept the submission requests, generate an orderId for reference, and give that to the customer, stating that the validation and submission of the return will be done later. This way you can scale the front end independently of how the other systems in the user request flow are behaving.
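And here is the requeue logic P3 refers to, as a sketch: 5XX responses go back on the queue for a later replay, while 4XX responses are treated as client errors and dropped (in a real system you would dead-letter them instead of printing). The URL and queue are the same illustrative stand-ins as in the earlier sketches.

```python
import queue

import requests

SUBMIT_URL = "https://tax.gov.example/submitReturns"  # hypothetical URL

def process_message(submission_queue: "queue.Queue[dict]", msg: dict) -> None:
    """Forward one submission; requeue on 5XX, drop on 4XX."""
    try:
        resp = requests.post(SUBMIT_URL, json=msg["payload"], timeout=5)
    except requests.RequestException:
        submission_queue.put(msg)  # network issue: replay later
        return

    if resp.status_code >= 500:
        # The third party's problem, not the client's: put the message
        # back so it is replayed when /submitReturns recovers.
        submission_queue.put(msg)
    elif 400 <= resp.status_code < 500:
        # Client issue: requeueing would loop forever; record and drop.
        print(f"dropping {msg['orderId']}: client error {resp.status_code}")
```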
