Introduction to Machine Learning Systems

raunak malhotra

Let’s start with a basic understanding of Machine Learning (ML) and identify when it is an optimal solution.

I won’t exaggerate the term machine learning—instead, here’s a simple, clear definition:

Machine Learning is an approach in which an algorithm learns from the data it is given, extracts complex patterns from that data, and predicts outputs for unseen data.


When to use Machine Learning?

Machine Learning has advanced dramatically over the last two decades, and its uses grow day by day. However, should we use it for every single problem? The answer is no. First, for some problems, ML is simply not the optimal solution. Second, several conditions must hold for machine learning to be applied effectively. Let’s discuss those factors.

  • Data: There should be enough data to train the ML model. Data is one of the most important factors in Machine Learning; it is what makes the approach powerful. If no data is present, how will the system learn the complex patterns needed to predict unseen data? We could deploy a model with no data and train it on data entered by users, but this cold-start approach risks a poor customer experience and unreliable predictions.

  • Learn: The system should have the capacity to learn. For example, a database where we explicitly define the relations between two tables is not an ML system. An ML system, given enough data, would instead infer those relations itself.

  • Patterns: There should be complex patterns present in the data. For instance, if we build an ML system to predict the outcome of tossing a fair coin, will it succeed? The answer is no, because, as with rolling a pair of fair dice, there is no pattern in a coin toss: each outcome is independent of the ones before it. On the other hand, stock prices and today’s weather do involve patterns, complex ones, which makes ML systems suitable for those problems.

  • Prediction: The problem should be predictive in nature. ML models predict the output of new data points, whether it is a classification problem or a regression problem. For example, determining which class a provided image belongs to or predicting if it will rain today are scenarios where ML can be effectively applied.

  • Unseen Data and Its Relationship with Provided Data: ML models make predictions on unseen data based on the patterns learned from the provided data. This relationship is crucial: the model’s ability to generalize depends on how well the training data represents the real-world scenarios it will encounter. If the unseen data differs significantly from the training data, the model’s predictions may be inaccurate. Ensuring the training data is comprehensive and representative of the problem space is therefore essential.
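To make the “patterns” point concrete, here is a small, self-contained sketch (plain Python, with illustrative numbers): a naive predictor that looks at the recent history of a fair coin does no better than chance, because there is no pattern to learn.

```python
import random

random.seed(0)

# A fair coin: each flip is independent of all previous flips.
flips = [random.randint(0, 1) for _ in range(10_000)]

# A naive "model": predict the next flip as the majority of the last 5 flips.
correct = 0
for i in range(5, len(flips)):
    window = flips[i - 5:i]
    prediction = 1 if sum(window) >= 3 else 0
    correct += prediction == flips[i]

accuracy = correct / (len(flips) - 5)
print(f"accuracy on fair coin flips: {accuracy:.2f}")  # hovers around 0.50
```

No matter how sophisticated the model, accuracy stays near 50%: without a pattern in the data, there is nothing for a learning system to extract.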


Let’s discuss the key differences between Machine Learning in Research and in Production.

| Aspect | ML in Research | ML in Production |
| --- | --- | --- |
| Objectives | The main goal is innovation: pushing the boundaries of what's possible in ML. | The focus is on delivering real-world value and solving business problems reliably. |
| Model Performance | Emphasis is placed on pushing metrics like accuracy, F1 score, or BLEU. | Performance includes robustness, scalability, and serving efficiency, not just metrics. |
| Stakeholder Objectives | Researchers aim for publications and academic benchmarks. | Engineers, PMs, and business leaders care about KPIs, uptime, and user satisfaction. |
| Computational Priority | Prioritizes fast experimentation and high-throughput training. | Prioritizes fast inference, resource efficiency, and minimal latency. |
| Training vs. Inference | High-performance training (often on large compute clusters) is critical. | Inference speed and scalability under tight latency SLAs are key. |
| Data | Assumes static, well-curated datasets. | Data is dynamic, noisy, and constantly evolving with real-world changes. |
| Fairness | Often considered a secondary concern (though this is changing). | Crucial for trust, compliance, and real-world impact, especially in regulated domains. |
| Interpretability | Nice to have, but often sacrificed for performance in competitive benchmarks. | Essential for debugging, trust, compliance, and stakeholder communication. |

Let’s unpack why these differences exist and why they matter:

  • Research is about possibilities; production is about practicality. In research, the audience is typically academic peers or the scientific community, and success means publishing a paper or achieving state-of-the-art results. In production, success means the model integrates seamlessly with systems, serves millions of users, and doesn't break in live environments.

  • Different stakeholders mean different goals. Researchers might prioritize novelty and performance on benchmark datasets, while business stakeholders care about business metrics, customer experience, and infrastructure costs.

  • Data is alive in production. Unlike in research, where a static dataset is used to prove a point, real-world applications involve constantly changing data distributions (known as data drift), which can silently break models if not handled properly.

  • Fairness and interpretability aren’t optional in production. A model that can’t explain its predictions, or that discriminates against certain users, can have serious legal, ethical, and brand consequences.

  • Production introduces new constraints. Memory usage, inference speed, failure handling, and logging are often afterthoughts in research, but they become critical in production systems.
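The data-drift point above can be sketched in code. The following is a minimal, illustrative example and not a production monitoring setup: it hand-rolls a two-sample Kolmogorov–Smirnov statistic to compare a training-time feature distribution against live data. The distributions, sample sizes, and any threshold you would alarm on are assumptions for demonstration.

```python
import bisect
import random

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)

    def ecdf(sample, x):
        # Fraction of the sample that is <= x.
        return bisect.bisect_right(sample, x) / len(sample)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in sorted(set(a + b)))

random.seed(42)
reference = [random.gauss(0.0, 1.0) for _ in range(2000)]  # feature at training time
stable    = [random.gauss(0.0, 1.0) for _ in range(2000)]  # live feature, unchanged
drifted   = [random.gauss(0.7, 1.0) for _ in range(2000)]  # live feature, mean has shifted

stat_stable = ks_statistic(reference, stable)
stat_drifted = ks_statistic(reference, drifted)
print(f"stable:  KS = {stat_stable:.3f}")   # small: distributions match
print(f"drifted: KS = {stat_drifted:.3f}")  # large: flag for investigation/retraining
```

In practice, a monitoring job would run a comparison like this (often via a library such as `scipy.stats.ks_2samp`) on each model input feature and alert when the statistic crosses a chosen threshold.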


Machine Learning Systems vs. Traditional Software

Unlike traditional software, which follows explicit rules and logic crafted by developers, machine learning (ML) systems learn patterns from data and make predictions or decisions based on that learned behavior. In traditional software, the output is fully deterministic: given the same input, it always produces the same result. In contrast, ML systems are inherently probabilistic and can behave unpredictably when faced with new or shifting data. This makes ML systems more flexible but also more complex to debug, test, and maintain.

Additionally, ML systems require continuous monitoring and retraining to adapt to evolving data, a challenge that doesn’t exist in most conventional software. As a result, building and managing ML systems involves not just code, but also data pipelines, model versioning, performance monitoring, and infrastructure for retraining, making them fundamentally different from traditional software applications.
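A tiny sketch of this contrast, using hypothetical names and made-up data: a hand-written pricing rule is fully deterministic, while a model fitted to noisy observations of that rule only approximates it and may drift when asked about inputs far from what it was trained on.

```python
# Traditional software: an explicit, hand-written rule.
# Same input always produces the same output.
def shipping_cost_rule(weight_kg: float) -> float:
    return 5.0 if weight_kg <= 1.0 else 5.0 + 2.0 * (weight_kg - 1.0)

# ML-style: the behavior is *learned* from data, here via a closed-form
# least-squares fit of y = a*x + b for a single feature.
def fit_linear(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

# "Training data": observed weights and costs, with a little noise.
weights = [0.5, 1.0, 2.0, 3.0, 4.0]
noise = [0.1, -0.2, 0.15, -0.1, 0.05]
costs = [shipping_cost_rule(w) + e for w, e in zip(weights, noise)]

a, b = fit_linear(weights, costs)
print(f"learned model: cost ~= {a:.2f} * weight + {b:.2f}")
# The fitted line approximates the rule on data it has seen, but its
# predictions far outside the training range (say, 50 kg) can diverge
# from the true rule: the model's behavior depends entirely on its data.
```

The rule function will never change unless a developer edits it; the fitted model changes whenever the data does, which is exactly why ML systems need retraining pipelines and monitoring.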


Summary

Machine Learning is a transformative technology, but not a universal solution. It thrives when:

  • Sufficient data is available

  • The system can learn from that data

  • There are complex patterns to detect

  • The problem is predictive in nature

  • New data is similar enough to training data to enable generalization

Understanding the distinction between ML research and production is vital to successfully building real-world ML systems. While research aims for innovation and performance on controlled benchmarks, production prioritizes reliability, scalability, and fairness in dynamic environments.

Finally, ML systems differ fundamentally from traditional software in their behavior, architecture, and lifecycle—requiring a new mindset, toolset, and approach to engineering.
