Top MLOps Interview Questions.


Frequently Asked MLOps Interview Questions and Answers

What is MLOps?

  • MLOps, a.k.a. Machine Learning Operations, is an emerging domain within the broader AI/DS/ML space that addresses operationalizing ML models. It is a practice and culture within software engineering that attempts to unify machine learning/data science model development (Dev) and its subsequent operationalization (Ops).
  • MLOps is analogous to traditional DevOps but also differs from it significantly. While DevOps predominantly focuses on operationalizing code and software releases that may not be stateful, MLOps adds another layer of complexity: data. That is why MLOps is often described as the union of ML + Data + Ops (machine learning, data engineering, and DevOps).

What are the benefits of MLOps?

MLOps has several benefits. Some of them are listed below (in no particular order):

  • Improves efficiency: Implementing MLOps principles gives both Data Engineers and Data Scientists ready access to curated datasets, which lets them develop models considerably faster.
  • Rinse/repeat: Because MLOps helps automate all or most of the steps in the MDLC (model development lifecycle), data scientists and MLOps engineers can reproduce experiments quickly and verify that models are trained and evaluated correctly. It also enables versioning of both models and data.
  • Improves reliability: Because MLOps practices borrow heavily from DevOps, they incorporate several CI/CD principles, which improves code quality and reliability.
  • Leaves breadcrumbs (audit trail): Versioning models and datasets considerably improves the model audit trail, allowing data scientists to fall back on an earlier model that performed better if a newer iteration does not meet expectations.
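
The versioning and audit-trail benefits above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory `ModelRegistry` (not a real library; in practice a dedicated model registry or data versioning tool would fill this role) that fingerprints the training data with SHA-256 and records each model version together with its metric, so that a better earlier version can be looked up for rollback. All names and the toy data are illustrative.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(data: bytes) -> str:
    """Return a short, deterministic content hash used as a dataset version."""
    return hashlib.sha256(data).hexdigest()[:12]


@dataclass
class ModelVersion:
    version: int
    data_version: str  # fingerprint of the dataset the model was trained on
    accuracy: float    # evaluation metric recorded at registration time


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry; illustrative only."""
    versions: list = field(default_factory=list)

    def register(self, data: bytes, accuracy: float) -> ModelVersion:
        entry = ModelVersion(len(self.versions) + 1, fingerprint(data), accuracy)
        self.versions.append(entry)
        return entry

    def best(self) -> ModelVersion:
        """Return the version with the highest recorded accuracy."""
        return max(self.versions, key=lambda v: v.accuracy)


registry = ModelRegistry()
registry.register(b"id,label\n1,0\n2,1\n", accuracy=0.91)
registry.register(b"id,label\n1,0\n2,1\n3,1\n", accuracy=0.87)

# The newer model underperforms, so the audit trail points back to version 1.
rollback_target = registry.best()
print(rollback_target.version, rollback_target.data_version)
```

Because the data fingerprint is stored next to each model version, the registry answers both "which model was better?" and "which data produced it?".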

How do you create infrastructure for MLOps?

  • MLOps infrastructure can be created in many different ways, and provisioning the core infrastructure typically lies outside the scope of an MLOps engineer. Given a set of existing environments, however, the MLOps engineer can assemble the tech stack best suited to hosting a successful machine learning platform.
  • For example, if the enterprise runs predominantly on AWS, it is straightforward to implement MLOps pipelines with Amazon SageMaker, using SageMaker Pipelines and AWS Lambda for orchestration and AWS CloudFormation for Infrastructure as Code.
  • If the enterprise is open to other platforms, many modern software development firms are leaning towards Kubernetes (k8s) powered infrastructure. This also lets the ML engineer adopt Kubeflow, which is quickly becoming a framework of choice for many ML practitioners.

What is the difference between MLOps, ModelOps & AIOps?

  • MLOps is the application of DevOps to building end-to-end machine learning solutions, including data collection, data pre-processing, model building, model deployment in production, model monitoring in production, and periodic model upgrades.
  • ModelOps is the application of DevOps to the end-to-end implementation of any kind of model, including non-ML ones such as rule-based models. It is the more generic term.
  • AIOps is the end-to-end building of AI applications using DevOps concepts. Note that the term is also widely used to mean applying AI to IT operations, so it is worth clarifying which sense the interviewer intends.

Define MLOps. How is it different from Data Science?

  • MLOps is a discipline in which the entire model lifecycle, including deployment and monitoring in production, is handled seamlessly, whereas Data Science typically focuses on building and evaluating the model itself. As a result, data scientists with MLOps skills are likely to be preferred by employers, which can help them climb the career ladder and earn higher salaries than typical data scientists.

What is the difference between MLOps and DevOps?

  • MLOps and DevOps have a lot in common. However, DevOps covers developing and deploying software application code in production, and this code is usually relatively static and does not change rapidly.
  • MLOps, on the other hand, also covers developing and deploying ML code in production, but the data changes rapidly, so models have to be upgraded more frequently than typical software application code.

What is the difference between MLOps and DataOps?

  • DataOps is a term generally credited to IBM, with a focus on data quality. For example, a sudden change in the data triggers an alert to stakeholders so that they can take action.
  • MLOps includes DataOps as one of its components and, in addition, covers end-to-end model development, deployment, and monitoring.
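
The alerting idea described for DataOps can be sketched as a simple statistical check on incoming data. The function name `check_batch`, the z-score rule, and the threshold of 3 standard errors are illustrative assumptions, not a standard taken from any DataOps tool.

```python
import math
import statistics


def check_batch(baseline, batch, z_threshold=3.0):
    """Flag a batch whose mean deviates sharply from the baseline mean.

    Uses a z-score of the batch mean against the baseline distribution;
    the threshold is an assumed value and should be tuned per dataset.
    """
    base_mean = statistics.mean(baseline)
    base_std = statistics.stdev(baseline)
    # Standard error of the mean for a batch of this size.
    std_err = base_std / math.sqrt(len(batch))
    z = abs(statistics.mean(batch) - base_mean) / std_err
    return {"z_score": z, "alert": z > z_threshold}


baseline = [10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 9.8, 10.1]
normal_batch = [10.1, 9.9, 10.3, 9.7]
shifted_batch = [14.8, 15.2, 15.1, 14.9]

normal_result = check_batch(baseline, normal_batch)
shifted_result = check_batch(baseline, shifted_batch)
print(normal_result["alert"], shifted_result["alert"])
```

In a real pipeline the `alert` flag would be routed to a notification channel; here it is only returned so the check stays easy to test.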

What are the risks associated with Data Science, and how can MLOps overcome them?

Data Science typically has the following issues:

  • The model goes down without an alert and becomes unavailable.
  • The model gives incorrect predictions for a given observation, and these cannot be scrutinized further.
  • Model accuracy degrades as time progresses.
  • Model maintenance has to be done by data scientists, whose time is expensive.
  • Scaling models across the organization is not easy.

These risks can be addressed with MLOps practices such as monitoring and alerting, prediction logging, automated retraining, and standardized deployment.

Is model deployment the end of the ML lifecycle?

  • No. Model deployment in production is now treated as the start of the actual ML lifecycle. After deployment, the team has to monitor how the model performs over a longer period, track how the data is growing, and work out how to scale the model for broader use across the organization. These activities are at the core of the ML lifecycle and the heart of MLOps.

How do you create CI/CD pipelines for machine learning?

  • CI stands for continuous integration, and CD stands for continuous delivery or continuous deployment. The fundamental purpose of a CI/CD pipeline is to ensure that data scientists and software engineering teams can build and deploy tested, reliable code as quickly as possible.
  • Specifically, a CI/CD pipeline aims to automate and streamline the software deployment process, including building code, running tests, and deploying new versions of the model/application when there are updates/revisions.
  • CI/CD for machine learning has the added complexity of handling data in addition to code. However, it can be achieved with various tools, depending on the technical stack the enterprise uses.
  • If the technical stack is primarily AWS-driven, SageMaker Pipelines can serve as the CI/CD pipeline.
  • Other approaches include Kubeflow Pipelines and traditional tools such as Jenkins or GitHub Actions.
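
One way to handle the extra complexity of data and models in CI/CD is an automated quality gate that the pipeline runs before deployment. The sketch below is tool-agnostic and uses hypothetical names (`evaluate`, `quality_gate`, `MIN_ACCURACY`) and a toy single-threshold model in place of real training; a CI system such as Jenkins or GitHub Actions would run a script like this and block the deployment when it exits with a non-zero status.

```python
MIN_ACCURACY = 0.8  # assumed release threshold; set per project


def predict(threshold, x):
    """Toy stand-in for a trained model: classify by a single cutoff."""
    return 1 if x >= threshold else 0


def evaluate(threshold, holdout):
    """Return accuracy of the toy model on (feature, label) pairs."""
    correct = sum(predict(threshold, x) == y for x, y in holdout)
    return correct / len(holdout)


def quality_gate(candidate_acc, production_acc, min_acc=MIN_ACCURACY):
    """Pass only if the candidate meets the floor and does not regress."""
    return candidate_acc >= min_acc and candidate_acc >= production_acc


def main():
    """Return a process exit code: 0 if the gate passes, 1 otherwise."""
    passed = quality_gate(candidate_acc, production_acc)
    print(f"candidate={candidate_acc:.2f} production={production_acc:.2f} passed={passed}")
    return 0 if passed else 1


holdout = [(0.1, 0), (0.3, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.9, 1)]
production_acc = evaluate(0.7, holdout)  # current model misclassifies x=0.6
candidate_acc = evaluate(0.5, holdout)   # candidate classifies all six correctly

# In a real CI job, pass this value to sys.exit() so a failed gate fails the build.
exit_code = main()
```

Comparing against the production model as well as a fixed floor means a new model that is "good enough" in absolute terms still cannot silently replace a better one.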

Explain model/concept drift.

Model drift occurs when a model's performance during inference (on real-world data) degrades compared with its performance during training (on historical, labeled data). The term is often used interchangeably with concept drift, although concept drift strictly refers to a change in the relationship between the inputs and the target. It is also related to train/serve skew, where the model behaves differently between the training and serving phases. Drift can have many causes, such as:

  • The underlying distribution of data has changed.
  • Unforeseen events have occurred. For example, a model trained on pre-COVID data is expected to perform much worse on data from the COVID-19 pandemic.
  • The model was trained on a limited number of categories, and a recent change in the environment introduced a new category.
  • In NLP problems, the real-world data contains significantly more tokens that do not appear in the training data.
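
The first cause above, a change in the underlying data distribution, can be quantified by comparing a feature's training distribution with its serving distribution. The sketch below uses the Population Stability Index (PSI), one common choice among several; the bin count, the smoothing constant `eps`, and the 0.2 alert threshold are assumed rule-of-thumb values rather than fixed standards.

```python
import math


def psi(expected, actual, bins=5, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bin edges are taken from the range of the expected (training) sample;
    eps avoids log(0) and division by zero for empty bins.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width) if width > 0 else 0
            # Clamp values outside the training range into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        return [max(c / len(sample), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


training = [i / 100 for i in range(100)]               # spread evenly over [0, 1)
serving_same = [i / 100 for i in range(100)]           # identical distribution
serving_shifted = [0.6 + i / 250 for i in range(100)]  # mass moved to the upper range

stable_score = psi(training, serving_same)
shifted_score = psi(training, serving_shifted)
print(stable_score, shifted_score > 0.2)
```

A score near zero means the two samples fill the bins in the same proportions; the shifted sample leaves the lower bins empty, which drives the index far above the assumed threshold.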

To detect model drift, the model's performance must be monitored continuously. If there is a sustained degradation in performance, the cause must be investigated and an appropriate remedy applied, which almost always involves retraining the model.
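
Continuous monitoring with a "sustained degradation" trigger could look like the following sketch. The class name `DriftMonitor`, the window size, the tolerance, and the patience count are hypothetical parameters chosen for illustration; the sketch also assumes that ground-truth labels eventually arrive for served predictions.

```python
from collections import deque


class DriftMonitor:
    """Track rolling accuracy and flag sustained drops below a baseline."""

    def __init__(self, baseline_acc, window=50, tolerance=0.05, patience=3):
        self.baseline_acc = baseline_acc
        self.tolerance = tolerance  # allowed drop before a check counts as bad
        self.patience = patience    # consecutive bad checks before flagging
        self.outcomes = deque(maxlen=window)
        self.bad_checks = 0

    def record(self, prediction, label):
        """Store whether a served prediction matched its later-arriving label."""
        self.outcomes.append(prediction == label)

    def check(self):
        """Return True when the degradation is sustained and needs investigation."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough data for a stable estimate
        rolling_acc = sum(self.outcomes) / len(self.outcomes)
        if rolling_acc < self.baseline_acc - self.tolerance:
            self.bad_checks += 1
        else:
            self.bad_checks = 0  # a recovery resets the counter
        return self.bad_checks >= self.patience


monitor = DriftMonitor(baseline_acc=0.9, window=10, tolerance=0.05, patience=3)

# Healthy period: every prediction is correct, so no flag is raised.
for _ in range(10):
    monitor.record(1, 1)
healthy_flag = monitor.check()

# Degraded period: every prediction is wrong; the flag needs 3 bad checks in a row.
flags = []
for _ in range(3):
    for _ in range(10):
        monitor.record(1, 0)
    flags.append(monitor.check())

print(healthy_flag, flags)
```

Requiring several consecutive bad checks is a design choice that avoids retraining on a single noisy window, at the cost of reacting a little later to genuine drift.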