Python with a Dash of C++: Optimizing Recommendation Serving

Thu, 30 Jun 2022 16:54:09 +0530

Serving recommendations to 200+ million users, scoring thousands of candidates in under 100 ms, is hard; doing it in Python is harder. Why not add some compiled spice to make it faster? With Cython you can add C++ components to your Python code. Aren't all the machine learning and statistics libraries already written in C and Cython to make them super fast? Yes, but there are still some optimizations left on the table. I'll go through how I optimized some of the sampling methods in our recommendation system using C++.

Go faster with Go: Golang for ML Serving

Mon, 20 Jun 2022 21:36:00 +0530

So the ask is to serve 3 million predictions per second with as few resources as possible. Thankfully, it's one of the simpler recommendation models: a Multi-Armed Bandit (MAB). A multi-armed bandit usually involves sampling from a distribution such as the Beta distribution, and that's where most of the time is spent. If we can run as many sampling operations concurrently as possible, we'll use the resources well. Maximizing resource utilization is the key to reducing the overall resources the model needs.
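To make the sampling step concrete, here is a minimal sketch of one common way a bandit uses Beta draws: Thompson sampling. This is an assumption for illustration, since the summary only says the model samples from a distribution like Beta. The function name `thompson_select`, the `arms` layout, and the uniform Beta(1, 1) prior are all hypothetical, and the sketch uses only the Python standard library rather than the optimized code the posts describe.

```python
import random


def thompson_select(arms, rng=random):
    """Pick one arm by Thompson sampling (hypothetical helper).

    arms: dict mapping an arm id to a (successes, failures) tuple.
    rng:  any object with a betavariate(alpha, beta) method.
    """
    best_arm, best_score = None, -1.0
    for arm, (successes, failures) in arms.items():
        # Assumed uniform Beta(1, 1) prior, so the posterior is
        # Beta(1 + successes, 1 + failures). One draw per arm per request
        # is the per-candidate cost that dominates serving time.
        score = rng.betavariate(1 + successes, 1 + failures)
        if score > best_score:
            best_arm, best_score = arm, score
    return best_arm


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    example_arms = {"item_a": (40, 60), "item_b": (55, 45), "item_c": (5, 5)}
    print(thompson_select(example_arms))
```

Each request needs one Beta draw per candidate, so with thousands of candidates the draws, not the final comparison, are where the time goes. That is why making the draws cheaper or running them concurrently pays off.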

multi-armed-bandit on AI Logs
