The elements of scalable machine learning

The data lake has been set up, app events are pouring in 24×7, a business problem has been identified, features have been engineered, and a model has been trained, but the road to production looks distant, broken and hard. Does this ring a bell?

Your ‘Machine Learning (ML) Runtime’ is an overweight beast and you need to get it in shape if you want to scale ML at your organization.

Successful consumer products all have one thing in common – they understand customer preferences in a fraction of a second and personalize the experience accordingly. This personalization requires surfing through the data lake, creating useful ML models, and then deploying these models to production.

At Zomato, we use ML to predict a lot of unknowns, often in real-time –

When will this food order get delivered?
How much time will the restaurant take to prepare this food order?
Who should be the delivery partner (DP) for this food order?
Is this photo a food shot?
Is the DP properly groomed?
Is the DP wearing a mask?
Is this review a fake review?

Solving these unknowns helps us provide a better customer experience and improve business metrics by either reducing costs or increasing revenue. I strongly believe that if an organisation starts integrating ML into its daily operating activities, it can create significant differentiation in the product experience.

The three ingredients of ML

At the heart of ML is a simple equation of three variables – Input, Brain and Output.

If the Output and the Input are known, the process of figuring out the Brain is called Model Training. And if the Input and the Brain are known, the process of computing the Output is called Model Prediction.
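As a toy illustration of these two directions, the sketch below assumes the Brain is a one-variable linear model. That is an assumption made purely for illustration, not a model described in this post. The function names `train` and `predict` and the data are hypothetical.

```python
from statistics import mean

def train(inputs, outputs):
    """Model Training: Input and Output are known, solve for the Brain."""
    x_bar, y_bar = mean(inputs), mean(outputs)
    # Ordinary least squares for a single feature.
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(inputs, outputs))
    variance = sum((x - x_bar) ** 2 for x in inputs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(brain, x):
    """Model Prediction: Input and Brain are known, compute the Output."""
    slope, intercept = brain
    return slope * x + intercept

# Hypothetical data: distance in km -> delivery time in minutes.
brain = train([1.0, 2.0, 3.0, 4.0], [12.0, 17.0, 22.0, 27.0])
eta = predict(brain, 5.0)
```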

When we deploy a model to production, we create a ‘Model Server’ so that we can make predictions via remote APIs (Application Programming Interfaces). By calling these remote APIs, we can make Model Predictions and make our applications smart. As you deploy more ‘Model Servers’, you evolve your ‘ML Runtime’. Formalizing your model training and deployment process can increase your team’s cadence, and higher cadence means faster turnaround time, more experimentation, and better models.

Engineering a system that performs at scale and still predicts in a fraction of a second requires a well-oiled ‘ML Runtime’. Our ‘ML Runtime’ consists of four essential components – Feature Compute Engine, Feature Store, Model Store, and Model Serving API Gateway.

We support two types of features based on frequency – real-time features and batched features. Real-time features are computed from event streams published on Apache Kafka, and are processed in real-time by the stream processing engine Apache Flink. These are then stored in an Online Feature Store powered by Redis Cluster. Batched features are computed using Apache Spark, and are stored in our Offline Feature Store, DynamoDB, with hot features cached in Redis Cluster.
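The read path this implies can be sketched roughly as follows. This is a minimal in-memory illustration, not our actual service: plain dictionaries stand in for Redis Cluster and DynamoDB, the lookup order (online store, then hot cache, then offline store) is an assumption, and the class, key, and feature names are hypothetical.

```python
class FeatureStore:
    """Toy read path: online store first, then hot cache, then offline store."""

    def __init__(self, online, offline):
        self.online = online    # stands in for Redis Cluster (real-time features)
        self.offline = offline  # stands in for DynamoDB (batched features)
        self.hot_cache = {}     # stands in for the Redis cache of hot batched features
        self.offline_reads = 0  # counts how often the slower offline store is hit

    def get(self, entity_id, feature):
        key = (entity_id, feature)
        if key in self.online:
            return self.online[key]
        if key in self.hot_cache:
            return self.hot_cache[key]
        # Cache miss: read from the offline store and remember the value.
        self.offline_reads += 1
        value = self.offline.get(key)
        if value is not None:
            self.hot_cache[key] = value
        return value

store = FeatureStore(
    online={("restaurant_42", "orders_last_10_min"): 7},
    offline={("restaurant_42", "avg_prep_minutes_30d"): 18.5},
)
live = store.get("restaurant_42", "orders_last_10_min")
batched = store.get("restaurant_42", "avg_prep_minutes_30d")
batched_again = store.get("restaurant_42", "avg_prep_minutes_30d")
```

The second read of the batched feature is served from the cache, so the offline store is queried only once.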

Load testing ‘Feature Store’

In preparation for 2021 New Year’s Eve, we load-tested our Feature Store to a maximum throughput of around 18 million requests per minute, with everything working as expected in terms of performance and latency. This led to a 3X improvement in our Feature Store service compared to 2020 New Year’s Eve.

Here at Zomato, we convert all our production models into a standard format via MLflow, which also provides a registry for such models. The primary advantage of doing so is decoupling: it makes it possible to write tools that work with models from any ML library (TensorFlow, PyTorch, LightGBM or scikit-learn) without having to integrate each tool with each library.

Our production deployment is orchestrated in the cloud with Kubernetes, a container orchestration platform. Most of the models we use are tuned for inference on CPU rather than GPU, and to further optimize cost, we use spot instances in our Kubernetes cluster. This setup of ML on Kubernetes has helped us adapt and scale model serving across multiple production models with ease.

Interestingly, after deploying multiple models in production, we observed a pattern – model features are tightly coupled with the production model. To make model changes faster and more independent, we wanted a system design that keeps clients agnostic to this coupling. Our ML API Gateway, written in Golang, removes this coupling from the client side so that the model-specific logic sits within the Gateway.

Generally, when we redeploy a retrained model or deploy a new model for the same problem, our API requests to the ML API Gateway do not change. That gives us tremendous liberty to deploy and experiment with models often, and with less effort. The ML API Gateway has been written as a workflow engine that executes a directed acyclic graph of tasks, and it has native support for our Feature Store, i.e., the Gateway is responsible for fetching the features a model needs from the Feature Store, based on the request and as specified in the model plan.
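To make the idea of a model plan executed as a directed acyclic graph concrete, here is a minimal sketch. The Gateway itself is written in Golang; Python is used here only for brevity. The plan format, the task names, the `FEATURES` dictionary, and the arithmetic inside `predict` are all hypothetical stand-ins, not the Gateway's real schema or a real model.

```python
from graphlib import TopologicalSorter

# Hypothetical stand-in for the Feature Store.
FEATURES = {("restaurant_42", "avg_prep_minutes_30d"): 18.5}

def fetch_features(results):
    restaurant_id = results["request"]["restaurant_id"]
    return {"avg_prep": FEATURES[(restaurant_id, "avg_prep_minutes_30d")]}

def predict(results):
    # Stand-in for a call to a Model Server.
    features = results["fetch_features"]
    return features["avg_prep"] + 2.0 * results["request"]["item_count"]

def respond(results):
    return {"fpt_minutes": results["predict"]}

# Model plan: task name -> (names of tasks it depends on, callable).
MODEL_PLAN = {
    "fetch_features": ((), fetch_features),
    "predict": (("fetch_features",), predict),
    "respond": (("predict",), respond),
}

def run_plan(plan, request):
    """Execute tasks in dependency order; each task sees all earlier results."""
    graph = {name: set(deps) for name, (deps, _) in plan.items()}
    results = {"request": request}
    for name in TopologicalSorter(graph).static_order():
        results[name] = plan[name][1](results)
    return results

request = {"restaurant_id": "restaurant_42", "item_count": 3}
response = run_plan(MODEL_PLAN, request)["respond"]
```

In this sketch the client only sends `restaurant_id` and `item_count`. Which features the model needs is decided by the plan, so swapping in a retrained model with different features would change `MODEL_PLAN` but not the request.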

This system has reduced our time to deploy a model to production to less than 24 hours.

Let’s now review what we have built over the last year using this ‘ML Runtime’. All of this has been accomplished by a small team of highly motivated explorers and deeply committed Data Scientists, ML Engineers and Data Engineers at Zomato.

1. Menu digitization

Customers have different cravings, and sometimes, very specific ones. Maybe when the mood hits, they don’t want just any kind of Indian food – they want Chicken Chettinad with a side of paratha, and nothing else will hit the spot! To help such eaters satisfy their cravings, we have built a system that uses ML to digitize menus without the requirement of any human input. This enables us to automatically recommend restaurants to customers based on searches for specific dishes.

With this system, we have witnessed improvements in customer experience through advanced dish search. Our content team also uses it to accelerate menu creation for online ordering.

The system takes a menu image as input and passes it to different models like Text Detection, Optical Character Recognition (OCR), Section Detection and Dish Classifier to showcase where different kinds of dishes are presented within the menu.

By Chiranjeev Ghai

2. Personalized homepage restaurant listings

Restaurant recommendations are powered by a customer’s past purchases, browsing history, and what other similar customers in the vicinity are ordering. The aim is to optimize for order-through rate (OTR), and to achieve this, we use LGBMRanker, LightGBM’s implementation of LambdaMART.

By Saurabh Kalia, Manav Gupta

3. Increase in GMV and AOV

Often, we want to solve multiple business objectives simultaneously. The primary goal of restaurant listings is to not only optimize the OTR but also increase GMV per app open. In our case, this translates into recommending restaurants to a customer such that they view more restaurants in line with their interests, and in turn, increase the value and frequency of their orders.

This multi-objective optimization is achieved by a popular Reinforcement Learning technique known as Contextual Multi-Armed Bandit (MAB) with Bayesian Regression. We used features related to Order Value Distribution for customers and restaurants, and the Probability to Order, to optimize for GMV and OTR. This is implemented at the customer level, and the weights for each customer are updated daily based on their interaction with our restaurant listing. The MAB gives the expected reward at a customer-restaurant level, and the restaurants are re-ranked based on this expected reward.
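The re-ranking step can be sketched as below. This is a heavily simplified illustration under stated assumptions, not our production model: each restaurant is an arm with a one-feature Bayesian linear regression (Gaussian prior on a single weight, known noise variance), the context is a single hypothetical number per restaurant, and rewards are made-up normalized values. The class and function names are illustrative.

```python
import random

class BayesianLinearArm:
    """1-D Bayesian regression: reward ~ Normal(w * x, noise_var), w ~ Normal(0, prior_var)."""

    def __init__(self, prior_var=1.0, noise_var=1.0):
        self.precision = 1.0 / prior_var  # posterior precision of w
        self.weighted_sum = 0.0           # running sum of x * reward / noise_var
        self.noise_var = noise_var

    def update(self, x, reward):
        # Conjugate Gaussian update after observing one (context, reward) pair.
        self.precision += x * x / self.noise_var
        self.weighted_sum += x * reward / self.noise_var

    @property
    def mean(self):
        return self.weighted_sum / self.precision

    def expected_reward(self, x):
        return self.mean * x

    def sampled_reward(self, x, rng):
        # Thompson-style exploration: draw w from its posterior, then score x.
        w = rng.gauss(self.mean, (1.0 / self.precision) ** 0.5)
        return w * x

def rerank(arms, contexts):
    """Order restaurants by posterior expected reward for one customer."""
    scores = {name: arm.expected_reward(contexts[name]) for name, arm in arms.items()}
    return sorted(scores, key=scores.get, reverse=True)

# One customer's arms, updated from hypothetical past interactions.
arms = {"A": BayesianLinearArm(), "B": BayesianLinearArm()}
arms["A"].update(1.0, 0.8)
arms["A"].update(1.0, 0.8)
arms["B"].update(1.0, 0.2)

ranking = rerank(arms, contexts={"A": 0.5, "B": 0.9})
exploratory_score = arms["A"].sampled_reward(0.5, random.Random(0))
```

Ranking on `expected_reward` matches the description above. `sampled_reward` shows one common way to add exploration, which is an assumption here rather than something this post specifies.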

We were able to increase GMV per app open and AOV by INR 3 and INR 6 respectively through this approach.

By Deepankar Pal, Manav Gupta

4. Predicting food preparation time (FPT)

FPT is an important component of Estimated Delivery Time (EDT) that we share with customers placing an order on Zomato. A more accurate FPT prediction translates into a more accurate EDT prediction and hence, makes it less likely for us to breach the EDT conveyed to our customers. FPT depends on numerous factors such as the quantity of dishes ordered, type of dishes ordered, restaurant behaviour, time of day, day of week, footfall in the restaurant, etc.

We have created a Bidirectional LSTM-based deep learning model that takes into account all these features and provides FPT for each order in real-time.

By Deepankar Pal, Abhilash Awasthi

5. Enhancing road detection

Zomato’s ETA, time services, and various location-based services rely on map data which is in an open source format. It may be free, but there is ample room for improvement, especially because it is sparse in many Indian cities.

This project leverages extensive DP pings along with DP trip data to reconstruct maps. This helps discover new roads, previously unmapped roads, and shortcuts, which in turn results in denser maps, higher ETA accuracy, and better snapping, routing, addressing, etc.

This multi-level approach includes improving raw location data by conducting sensor fusion within the DP app, implementing various pre & post-filtering techniques to remove GPS noise, map inferencing & reconstruction, filtration of new roads, merging the newfound roads with India maps, making them routable, etc.
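One simple example of a pre-filtering step for GPS noise is to drop pings that would imply an implausible travel speed. The sketch below is an illustration of that general idea only. The speed threshold, the `(timestamp_s, lat, lon)` ping format, and the function names are assumptions, and this is not necessarily one of the filters used in the pipeline described above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def drop_speed_outliers(pings, max_speed_mps=40.0):
    """Keep pings reachable from the last kept ping at a plausible speed."""
    kept = []
    for t, lat, lon in pings:
        if kept:
            t0, lat0, lon0 = kept[-1]
            dt = t - t0
            # Skip out-of-order pings and jumps faster than the threshold.
            if dt <= 0 or haversine_m(lat0, lon0, lat, lon) / dt > max_speed_mps:
                continue
        kept.append((t, lat, lon))
    return kept

# Hypothetical trace: the third ping jumps about 11 km in 10 seconds.
pings = [
    (0, 28.6000, 77.2000),
    (10, 28.6005, 77.2000),
    (20, 28.7000, 77.2000),
    (30, 28.6010, 77.2000),
]
cleaned = drop_speed_outliers(pings)
```

This filter trusts the first ping, so a real pipeline would need additional checks for a noisy starting point.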

By Siddhartha Agnihotri, Vedanta Jha

6. Active DP dispatch

Our DPs often travel to locations that, according to their instincts and experience, will bring them more orders. This behaviour creates a gap between supply and demand, which leads to shutdowns due to short supply at specific locations.

Active Dispatch, a Deep Q-Network-based multi-agent Reinforcement Learning model, aims to reduce this gap and also increase DP utility by recommending appropriate locations to free DPs. This model is trained using predicted demand and available DP supply to increase their earnings. Empirical testing has shown it to outperform both a system without any recommendations and rule-based algorithms. Overall, it led to an increase in the percentage of orders a DP received when they followed our recommendation, and a decrease in the time between a delivery and their next order.

By Rahul Kumar, Shubh Chaurasia

7. DP grooming audit and compliance

At Zomato, our delivery fleet is one of the most important components of our ecosystem and pivotal to great customer experience. To ensure its proper functioning, we have put various audit mechanisms in place.

Our ‘DP selfie audit’ is one such case where we check for compliance related to DP grooming – asset audit and mask audit. The asset audit checks whether DPs have put on a Zomato t-shirt and are carrying a Zomato bag, whereas the mask audit checks whether DPs have put on masks for safe and secure deliveries.

To solve this, we have developed a system where we can easily schedule such audits and automatically approve or reject DP grooming. In this system, an audit can be triggered on the DP app either while they’re logging in or during order deliveries. DPs are required to submit their selfie within a very short span of time, which increases the accuracy of the checks. These images then flow to our DP-service, which passes them to deep learning models that can effectively detect faces with and without masks. Our asset audit follows the same process to check for the presence of Zomato t-shirts and assets.

These models are trained using convolutional neural networks and image processing algorithms in a classification setting. Adding automated systems to the pre-existing manual moderation allows us to conduct more frequent audits, and at scale, by removing the cost associated with manual audits. This also helps provide real-time feedback for a seamless DP experience.
