Offline vs online computations in software engineering

    When designing systems, we sometimes need to distinguish which requirements can be fulfilled in online mode and which in offline mode. By online, I mean an action that completes while the UI is still live and the user is waiting for the response in real time. By offline, I mean tasks that are best done in the back end, while the user is not interacting with the system. I thought of spending some words on this because I was recently working on a system that needed some offline computations.

    The general traits of pieces that might need to be designed for offline mode are roughly these:

  1. Pieces that require heavy computation, such as processing GBs to TBs of data, training a machine learning model on a huge dataset, or updating a large number of records in a database or data lake. Such tasks need substantial compute power.
  2. Along similar lines, pieces that take a long time to respond. For example, a piece of code that downloads a huge file might take hours; we would not want the user to spend those hours staring at a loading screen.
  3. Pieces that depend on some other source for input and can only run once that upstream system has completed its execution and returned a response. Unless we can parallelize the two, one has to wait for the other, which makes for a bad user experience.
    Online systems do not work very well in these cases. Recently I had to design a system that would allow users to create a request, based on which a set of actions would run: downloading a dataset, evaluating a machine learning model on it, and generating accuracy reports. I decided to push these pieces into offline computation. The online piece simply lets users create a request with some information, such as the AWS S3 location where the dataset resides. They are allocated a request ID, and the status of that request ID is updated as each step of the computation completes in the back end. A minimal sketch of such a request-creation handler is shown below.
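    Here is a rough sketch of what that online piece could look like, assuming a Lambda handler behind an API endpoint, a hypothetical DynamoDB table for tracking requests, and a Step Functions state machine ARN passed in via environment variables. All of the names here are illustrative, not the actual implementation.

```python
# Sketch of the request-creation handler (all resource names are hypothetical).
import json
import os
import uuid

import boto3

dynamodb = boto3.resource("dynamodb")
sfn = boto3.client("stepfunctions")

REQUESTS_TABLE = dynamodb.Table(os.environ["REQUESTS_TABLE_NAME"])
STATE_MACHINE_ARN = os.environ["STATE_MACHINE_ARN"]


def create_request(event, context):
    """Accept a dataset location, register a request, and kick off the offline workflow."""
    body = json.loads(event["body"])
    request_id = str(uuid.uuid4())

    # Record the request so the user can poll its status later.
    REQUESTS_TABLE.put_item(
        Item={
            "request_id": request_id,
            "dataset_s3_uri": body["dataset_s3_uri"],
            "status": "SUBMITTED",
        }
    )

    # Start the Step Functions execution that performs the offline computation.
    sfn.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        name=request_id,
        input=json.dumps(
            {"request_id": request_id, "dataset_s3_uri": body["dataset_s3_uri"]}
        ),
    )

    # Return immediately; the UI does not block on the heavy work.
    return {"statusCode": 202, "body": json.dumps({"request_id": request_id})}
```

    The key design point is that the handler returns as soon as the execution has been started, so the UI only ever waits for a quick write and a quick API call, never for the hours-long computation itself.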
    To streamline the set of actions for each request, I used an AWS Step Functions state machine. An execution is triggered whenever a new request is created, and each execution consists of steps which, at a high level, download the dataset, trigger the ML model, refactor the output and generate the evaluation reports, and finally save the report to an AWS S3 location. Each of these steps is followed by a small step that invokes a Lambda to update the status of the request. This way, the user can submit a request and keep checking its status based on the notifications they receive as each step completes, instead of waiting for the UI to respond. That status-update Lambda could look like the sketch below.
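    For illustration, here is a minimal version of the status-update Lambda that sits between the main steps, assuming the same hypothetical DynamoDB table and an SNS topic for notifications, with the state machine passing in the request ID and the name of the step that just finished.

```python
# Sketch of the status-update Lambda invoked after each workflow step
# (table and topic names are hypothetical).
import json
import os

import boto3

dynamodb = boto3.resource("dynamodb")
sns = boto3.client("sns")

REQUESTS_TABLE = dynamodb.Table(os.environ["REQUESTS_TABLE_NAME"])
TOPIC_ARN = os.environ["NOTIFICATION_TOPIC_ARN"]


def update_status(event, context):
    """Record that a workflow step finished and notify the user."""
    request_id = event["request_id"]
    completed_step = event["completed_step"]  # e.g. "DOWNLOAD_DATASET", "RUN_MODEL"

    # Update the request's status so a polling UI (or API) can show progress.
    REQUESTS_TABLE.update_item(
        Key={"request_id": request_id},
        UpdateExpression="SET #s = :s",
        ExpressionAttributeNames={"#s": "status"},
        ExpressionAttributeValues={":s": completed_step},
    )

    # Push a notification so the user does not have to sit on a loading screen.
    sns.publish(
        TopicArn=TOPIC_ARN,
        Subject=f"Request {request_id} progressed",
        Message=json.dumps({"request_id": request_id, "status": completed_step}),
    )

    # Pass the event through so downstream states keep the request context.
    return event
```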
    Whether this approach is acceptable ultimately depends on the user's requirements. Many users and use cases cannot tolerate this kind of offline approach and need a near-real-time response. In such cases we might have to come up with a different design, but often that is simply not possible, and we must instead aim to reduce the waiting time and improve the user experience as much as we can. Maybe someday, with exceptional compute power, many of these offline use cases can be converted to online, but for now we have to live with it. As engineers, even if we have to live with it, I believe we should make it as graceful and seamless as possible for the user.
