Blog | Artificial intelligence

AI Powered Predictive Maintenance



What is predictive maintenance and how does it work?

Artificial intelligence (AI) is transforming the energy and resources industry in many ways, including predictive maintenance. AI powered predictive maintenance is a data-driven approach that uses machine learning (ML) algorithms to address some of the crucial challenges related to machine maintenance.

The energy industry is heavily reliant on machinery and equipment. These assets are critical for operations and require ongoing maintenance to operate effectively. Traditional maintenance approaches, such as reactive or scheduled maintenance, can be costly and inefficient.

Reactive maintenance involves fixing equipment after a breakdown has occurred, leading to costly downtime and repairs. Scheduled maintenance involves conducting maintenance at set intervals, regardless of the condition of the equipment, which leads to unnecessary maintenance and downtime.

Neither of these maintenance methods is smart, and both lead to wasted resources.

Predictive maintenance, on the other hand, uses data to predict when equipment is likely to fail and schedule maintenance accordingly. This approach can help companies reduce downtime, extend the life of equipment, and increase safety. AI and ML algorithms are used to analyse data from sensors and other sources to identify patterns and anomalies. Predictive ML models are then developed to predict when maintenance will be required.

Now, let’s look at what can be achieved using AI powered predictive maintenance solutions.

First, we need to understand what type of data is involved in the process. Generally, there are two types of data involved in building such systems.

1. Images
2. Sensor readings – analogue signals and numerical values

This data is used to answer three types of questions, which lead to three different use cases.

1. Is the machine behaving abnormally? – Anomaly detection
2. Is there a fault in my system, and what is the root cause? – Fault classification or root cause isolation
3. When is the machine likely to encounter a fault condition? – Remaining useful life (RUL) estimation
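As a minimal illustration of the first use case, anomaly detection on a stream of sensor readings can be as simple as flagging values that deviate sharply from recent history. This is only a sketch with a synthetic signal; real systems typically use learned models rather than a fixed z-score threshold.

```python
import statistics

def detect_anomalies(readings, window=20, threshold=3.0):
    """Flag indices whose value deviates more than `threshold` standard
    deviations from the mean of the preceding `window` readings."""
    anomalies = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.stdev(history)
        if stdev > 0 and abs(readings[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies

# Simulated vibration sensor: a stable signal with one spike at index 50.
signal = [1.0 + 0.01 * ((i * 7) % 5) for i in range(100)]
signal[50] = 5.0
print(detect_anomalies(signal))  # the spike is flagged
```

The same sliding-window idea generalizes to multivariate readings, where a model learns the normal operating envelope instead of a hand-set threshold.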

Once we have clarity on what we want to achieve with a predictive maintenance solution, it is time to move on to the next steps of building it.

Developing the solution involves multiple steps described below.

1. Data acquisition – Collecting relevant data to solve a problem is the base of any machine learning solution. Data is collected either from the sensors attached to the equipment or from images captured during operation.
2. Data analysis – Once the data is stored, it is analysed using signal processing techniques (extracting, manipulating, and storing information embedded in complex signals and images) and desired features are extracted from it.
3. Modelling – Once the data type and the problem statement are clear, we can select the right type of ML algorithm. For example, when working with image data, we can use a deep neural network-based architecture such as a Convolutional Neural Network (CNN) to classify whether the equipment is damaged. Once the training process is complete, we set up an inferencing pipeline that is used to query the model.


Considerations in building a predictive maintenance solution

Let’s talk about the key considerations in building an AI powered predictive maintenance solution:

• Data collection – We need enough data to cover all the use cases we want our solution to deliver. Data corresponding to fault or failure conditions is usually difficult to obtain, as incidents of failure are relatively rare. One way to solve this problem is to simulate failure scenarios and capture synthetic failure data. Another is to use machine learning to generate additional failure data from the existing examples.
• Data engineering and analysis – Managing large amounts of data cost-efficiently is a challenging task. Machine sensors might produce gigabytes of data every day, and processing, storing, and analysing this data requires robust data pipelines.
• Feature extraction – The data used for training the ML models should contain maximum information and minimum noise. If the features are not relevant to the problem, resources are wasted training models on such data. So, subject matter expertise is important to extract as much information as possible while reducing the size of the data set.
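To illustrate the feature extraction point, here is a sketch that condenses a window of raw vibration samples into a few standard condition-monitoring features (RMS, peak-to-peak, crest factor). The signals below are synthetic stand-ins; real features would be chosen with subject matter expertise.

```python
import math

def extract_features(window):
    """Condense a window of raw vibration samples into a few
    information-dense features commonly used in condition monitoring."""
    n = len(window)
    rms = math.sqrt(sum(x * x for x in window) / n)
    peak = max(abs(x) for x in window)
    return {
        "rms": rms,                                 # overall vibration energy
        "peak_to_peak": max(window) - min(window),  # total excursion
        "crest_factor": peak / rms,                 # spikiness: rises for impulsive faults
    }

# A healthy sinusoidal signal vs. one with an impulsive fault.
healthy = [math.sin(2 * math.pi * i / 32) for i in range(256)]
faulty = list(healthy)
faulty[100] += 4.0

print(extract_features(healthy)["crest_factor"])
print(extract_features(faulty)["crest_factor"])
```

A handful of such features per window can replace thousands of raw samples, which is exactly the information-versus-size trade-off described above.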

In the energy and resources industry, predictive maintenance can be used for a wide range of equipment, including turbines, compressors, pumps, and generators. Let’s have a look at some of these examples.

• Predictive maintenance to monitor the condition of wind turbines – Data from sensors on the turbines is used to monitor parameters such as temperature, vibration, and oil levels. ML models can then analyse this data to predict when maintenance will be required. This can help companies schedule maintenance during periods of low wind activity, reducing downtime and increasing energy production.
• Predictive maintenance to monitor the condition of pipelines and other infrastructure – Data from sensors placed along pipelines is used to monitor parameters such as pressure, temperature, and flow rate. ML models can then analyse this data to predict when maintenance will be required. This can help companies identify potential leaks or failures before they occur, reducing the risk of accidents and downtime.


Predictive Maintenance in Action

Here’s an example of how to leverage AI to predict Remaining Useful Life (RUL) for a piece of equipment.

We can build, deploy, and scale AI powered solutions on AWS. Below is one such solution for predicting the remaining useful life of a piece of equipment.

The example uses Amazon SageMaker to train a deep learning model with the MXNet deep learning framework. The model is a stacked bidirectional LSTM neural network that can learn from sequential or time series data. The model is robust to the input dataset and does not expect the sensor readings to be smoothed, as it has 1D convolutional layers with trainable parameters that can smooth and perform feature transformation of the time series. The deep learning model is trained so that it learns to predict the remaining useful life (RUL) of the equipment from each sensor's readings.



The solution works as follows:

1. The historical sensor data is stored in an S3 bucket, from where it is used to train the deep learning model.
2. Model training code is written in a Jupyter notebook running on an Amazon SageMaker notebook instance.
3. The MXNet model is trained, and the model artifacts are stored in an S3 bucket.
4. An AWS Lambda function performs batch inference on new sensor data as it arrives in an Amazon S3 bucket.
5. The Lambda function can be invoked by an Amazon CloudWatch Event or an S3 put event notification, so that it runs on a schedule or as soon as new sensor readings are stored in S3.
6. When invoked, the Lambda function creates a SageMaker Batch Transform job, which uses the SageMaker model saved during training to obtain predictions for the new sensor data.
7. The results of the batch transform job are stored back in S3 and can be fed into a dashboard or visualization module for monitoring.



To conclude, if enough sensor data is available, we can harness its power and shift from reactive or scheduled maintenance to AI powered predictive maintenance. Amazon Web Services accelerates the process of building such solutions. This will help companies save on operations by cutting down maintenance costs and improving the overall reliability of the system.

Author: Rishi Khandelwal 

Blog | Machine Learning

How MLOps helps industries and businesses scale their machine learning workloads



Machine learning is revolutionizing various industries, but its successful implementation is not just limited to building the perfect model. MLOps, which stands for Machine Learning Operations, is an approach that addresses the entire machine learning lifecycle, from development to deployment and beyond. MLOps helps businesses and industries scale their machine learning workloads by streamlining the process of building, deploying, monitoring, and updating machine learning models.

What is MLOps?

MLOps, which stands for Machine Learning Operations, is a practice that involves the application of DevOps principles to machine learning workflows. It aims to streamline and automate the development, deployment, monitoring, and management of machine learning models.

MLOps helps to bridge the gap between data science and deployment operations, enabling businesses to build, train, deploy and monitor machine learning models more efficiently. It is not a specific tool or technology but rather a set of practices, methodologies, and tools that are used to support the entire machine learning lifecycle.


MLOps Workflow

MLOps workflows typically include the following stages:

• Data preparation: collecting and cleaning the data for use in model training and evaluation.
• Model training: selecting the appropriate algorithm and training the model on the prepared data.
• Model evaluation: assessing the performance of the trained model and identifying areas for improvement.
• Model deployment: deploying the model to a production environment.
• Model monitoring: monitoring the model's performance in the production environment and adjusting as needed.
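The stages above can be sketched as a chain of functions. This is a toy illustration: the "model" is a deliberately trivial threshold detector, and the point is only to show how each stage hands its output to the next.

```python
import statistics

def prepare_data(raw):
    """Data preparation: drop records with missing readings."""
    return [r for r in raw if r is not None]

def train(data):
    """'Training': fit a trivial threshold model (mean + 2 std devs)."""
    return {"threshold": statistics.fmean(data) + 2 * statistics.stdev(data)}

def evaluate(model, holdout):
    """Evaluation (and, on live data, monitoring): fraction of
    points the model flags as anomalous."""
    flagged = sum(1 for x in holdout if x > model["threshold"])
    return flagged / len(holdout)

def deploy(model, registry):
    """Deployment: publish the model under a new version number."""
    version = len(registry) + 1
    registry[version] = model
    return version

raw = [10.0, 10.2, None, 9.9, 10.1, 10.0, 30.0]  # 30.0 is an outlier
data = prepare_data(raw)
model = train(data[:-1])          # train on the normal readings only
alert_rate = evaluate(model, data)
registry = {}
v = deploy(model, registry)
```

In a real MLOps setup, each of these functions would be a pipeline step with its own versioned inputs, outputs, and logs, so any run can be reproduced.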

Why is MLOps important?

MLOps is important because it enables businesses to scale their machine learning workloads in a way that is efficient, reliable, and secure. By applying DevOps principles to machine learning workflows, MLOps helps businesses to:

Reduce development time and costs:

By automating the machine learning workflow, MLOps reduces the time and effort required to develop, test, and deploy models. This results in faster time-to-market and lower development costs.

Improve model accuracy:

MLOps enables businesses to continually monitor and improve the performance of their machine learning models in the production environment, resulting in more accurate and reliable models.

Increase security:

MLOps ensures that machine learning models are deployed in a secure environment and that the data used to train the models is protected.

Enable collaboration:

MLOps fosters collaboration between data scientists, IT operations, and other stakeholders involved in the machine learning workflow, resulting in better communication and teamwork.

Scale machine learning infrastructure:

MLOps helps businesses to scale their machine learning infrastructure in a way that is efficient and reliable, enabling them to handle increasing amounts of data and models.


How to deploy machine learning models with high efficiency and scalability in a production environment?

Amazon SageMaker is well suited for automating the complete end-to-end process of model development and deployment through the interactive UI that AWS provides in the SageMaker resources.

1. Create repeatable training workflows to accelerate model development

One important aspect of MLOps is creating repeatable training workflows that can accelerate model development.

In SageMaker, you can use several features to create a repeatable training workflow, including:

• SageMaker Experiments: SageMaker Experiments lets you track and compare the results of your machine learning experiments, enabling you to quickly identify the most effective models and hyperparameters.
• SageMaker Processing: SageMaker Processing lets you run pre-processing, post-processing, and other data processing tasks on large datasets in a distributed and scalable way. This helps ensure that your data is consistently processed and prepared for training.
• SageMaker Training: SageMaker Training lets you train machine learning models on large datasets using distributed computing resources. You can use built-in algorithms or bring your own custom algorithms to SageMaker.
• SageMaker Debugger: SageMaker Debugger lets you identify and debug training errors in real time. You can monitor the state of your training job and capture specific events, such as null values or weights that are too large.


2. Catalogue ML artifacts centrally for model reproducibility and governance

Cataloguing ML artifacts centrally is an essential step towards achieving model reproducibility and governance. In machine learning (ML), artifacts are produced during the model training process, such as code, datasets, models, and configurations. Cataloguing them centrally means storing them in a centralized location where they can be easily accessed, tracked, and managed.

There are several benefits to cataloguing ML artifacts centrally such as Reproducibility, Governance, Collaboration, etc.

Amazon SageMaker provides several tools for cataloguing ML artifacts centrally, including:

• SageMaker Model Registry: A managed service that provides a central location for storing, versioning, and sharing ML models. With the Model Registry, data scientists can easily track changes to their models, compare different versions, and promote models to production.
• SageMaker Pipelines: A workflow automation tool that helps data scientists build, deploy, and manage ML workflows. Pipelines allow data scientists to define a series of steps for training and deploying models, and then execute those steps as a single unit.
• SageMaker Experiments: A service that helps data scientists track, organize, and analyse their ML experiments. With Experiments, data scientists can easily capture metadata about their experiments, including code, data, hyperparameters, and metrics.


3. Integrate ML workflows with CI/CD pipelines for faster time to production

Amazon SageMaker provides a set of tools and features that make it easy to integrate machine learning (ML) workflows with Continuous Integration and Continuous Deployment (CI/CD) pipelines, enabling faster time to production for ML models.

CI/CD pipelines are a set of practices and tools that enable developers to quickly and reliably build, test, and deploy code changes to production environments. By integrating ML workflows with CI/CD pipelines, data scientists can automate the process of building, testing, and deploying ML models, reducing the time and effort involved and increasing the speed at which models are delivered to production.

Here are some of the ways in which SageMaker enables the integration of ML workflows with CI/CD pipelines:

• SageMaker Model Building Pipelines: SageMaker Model Building Pipelines is a feature of SageMaker Pipelines that enables data scientists to create automated end-to-end workflows for building, training, and deploying ML models. Data scientists can use Model Building Pipelines to define a series of steps for the ML workflow, including data preparation, model training, evaluation, and deployment.
• SageMaker SDK: The SageMaker SDK is a Python library that makes it easy to interact with SageMaker services, including Pipelines and Model Building Pipelines. Data scientists can use the SDK to create and manage pipelines programmatically, enabling integration with existing CI/CD pipelines.
• SageMaker JumpStart: SageMaker JumpStart provides a collection of pre-built ML models, algorithms, and workflows that can be easily integrated with CI/CD pipelines. Data scientists can use JumpStart to accelerate the process of building and deploying ML models, reducing the time and effort required.


4. Continuously monitor data and models in production to maintain quality

In the field of machine learning, it's essential to continuously monitor the data and models in production to ensure that they maintain quality and performance over time. Monitoring data and models can help identify and address issues such as data drift, model decay, and bias, ensuring that the model continues to perform well and generate accurate predictions.

Amazon SageMaker provides several tools and features to help data scientists and machine learning engineers continuously monitor data and models in production, including:

• SageMaker Model Monitor: A managed service that continuously monitors the data and predictions generated by your ML models in production. Model Monitor detects and alerts you to data drift, concept drift, and quality issues, enabling you to take corrective actions to ensure that the model continues to generate accurate predictions.
• SageMaker Debugger: Debugger also provides real-time metrics and visualizations of model performance, enabling you to identify and address issues quickly.
• SageMaker Autopilot: A feature of SageMaker that automates the process of building, training, and deploying ML models. Autopilot provides automatic monitoring and retraining of models, ensuring that they remain up-to-date and continue to perform well in production.
• SageMaker Clarify: A tool that helps identify and mitigate bias in your ML models. Clarify provides metrics and visualizations that enable you to understand the sources of bias in your models, and it provides recommendations for addressing those issues.


MLOps Tools and Technologies


MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It was developed by Databricks, the company behind Apache Spark, and is now part of the LF AI & Data Foundation under the Linux Foundation.

MLflow is designed to help data scientists and machine learning engineers manage their machine learning workflows, from data preparation to model deployment. It provides a centralized platform for tracking experiments, packaging code, and sharing models.

Here are some of the key features of MLflow:

• Experiment tracking: MLflow provides a tracking API and UI for logging and visualizing machine learning experiments. This enables data scientists to keep track of different versions of their models, compare results, and reproduce previous experiments.
• Packaging code: MLflow enables data scientists to package their code and dependencies into reproducible environments, making it easier to share and deploy models.
• Model registry: MLflow provides a centralized repository for storing and versioning machine learning models. This enables data scientists to share and collaborate on models, and to deploy them to production environments.
• Deployment: MLflow provides integrations with popular deployment tools such as Docker, Kubernetes, and Amazon SageMaker, making it easy to deploy models to production.


In a typical MLflow tracking workflow, the mlflow.start_run() function creates a new MLflow run, the mlflow.log_param() and mlflow.log_metric() functions log model parameters and performance metrics to the MLflow tracking server, and the mlflow.sklearn.log_model() function saves the trained model and logs it to the tracking server as an artifact. This makes it straightforward to track, for example, the parameters and accuracy of a logistic regression model trained on a synthetic dataset.


Amazon SageMaker

Amazon SageMaker MLOps is a set of tools and best practices that help developers and data scientists build, train, deploy, and manage machine learning models at scale. It is built on top of Amazon SageMaker, a fully managed service for building, training, and deploying machine learning models.

SageMaker MLOps provides a suite of tools for automating and managing the machine learning development lifecycle. Some of the key features of SageMaker MLOps include:

• Model training: SageMaker MLOps provides a managed service for training machine learning models at scale. It can automatically scale training resources to meet the demands of large datasets and complex models.
• Model deployment: SageMaker MLOps provides a set of tools for deploying machine learning models to production. It supports a range of deployment options, including batch and real-time inference, and it provides automatic scaling and monitoring of deployed models.
• Model monitoring: SageMaker MLOps includes tools for monitoring the performance of deployed models in production. It can track key metrics such as accuracy and latency, and it can alert developers when performance issues arise.
• Model management: SageMaker MLOps provides a centralized repository for storing and versioning machine learning models. It can track changes to models over time, and it can provide a history of changes for auditing and compliance purposes.



Together, these tools let you train, deploy, and manage machine learning models while incorporating MLOps practices such as data versioning, hyperparameter tuning, model versioning, and automatic model deployment, and they can be customized to suit your specific use case and data requirements.

Overall, SageMaker MLOps can help organizations to streamline the machine learning development process, reduce time to market, and improve the quality and reliability of deployed models.


Deployment Strategies

Choose a deployment strategy that matches your tolerance for rollout risk. Common MLOps deployment strategies include blue/green, canary, shadow, and A/B testing.


Blue/green deployments are very common in software development. In this mode, two systems are kept running during development: blue is the old environment (in this case, the model that is being replaced) and green is the newly released model that is going to production. Changes can easily be rolled back with minimum downtime, because the old system is kept alive.


Canary deployments are like blue/green deployments in that both keep two models running together. However, in canary deployments, the new model is rolled out to users incrementally, until all traffic eventually shifts over to the new model.
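The incremental traffic shift at the heart of a canary deployment can be sketched as weighted routing. This is a toy illustration; real platforms apply the weights at the load balancer or endpoint level.

```python
import random

def route_request(canary_weight, rng):
    """Route one request: True -> new (canary) model, False -> old (stable)."""
    return rng.random() < canary_weight

rng = random.Random(0)  # seeded for reproducibility
for weight in (0.05, 0.25, 1.0):  # incremental traffic shift
    canary_hits = sum(route_request(weight, rng) for _ in range(10_000))
    print(f"target {weight:.0%} -> observed {canary_hits / 10_000:.1%} to canary")
```

At each step, error rates and latency on the canary slice are compared against the stable model before the weight is raised further; a regression triggers a rollback to 0%.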


You can use shadow deployments to safely bring a model to production. In this mode, the new model works alongside an older model or business process and performs inferences without influencing any decisions. This mode can be useful as a final check or higher fidelity experiment before you promote the model to production. Shadow mode is useful when you don't need any user inference feedback. You can assess the quality of predictions by performing error analysis and comparing the new model with the old model, and you can monitor the output distribution to verify that it is as expected.
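Shadow mode can be sketched as follows: the old model's decision is what is actually served, the new model's output is only logged, and the two are compared offline. The two "models" here are hypothetical stand-ins for real predictors.

```python
def old_model(x):   # hypothetical stand-in for the model in production
    return x > 0.5

def new_model(x):   # hypothetical stand-in for the candidate model
    return x > 0.45

# Shadow mode: the new model sees the same requests, but its output is
# only recorded, never acted on.
requests = [i / 100 for i in range(100)]
decisions, shadow_log = [], []
for x in requests:
    decisions.append(old_model(x))    # the decision actually served
    shadow_log.append(new_model(x))   # recorded for offline comparison

agreement = sum(a == b for a, b in zip(decisions, shadow_log)) / len(requests)
print(f"old/new agreement: {agreement:.0%}")
```

Error analysis on the disagreement cases, here the requests falling between the two thresholds, is exactly the "higher fidelity experiment" this mode enables before promotion.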

A/B testing

When ML practitioners develop models in their environments, the metrics that they optimize for are often proxies for the business metrics that really matter. This makes it difficult to tell for certain whether a new model will improve business outcomes, such as revenue and clickthrough rate, or reduce the number of user complaints. A/B testing addresses this by serving the new model to a fraction of live users and directly measuring its impact on those business metrics before a full rollout.



In conclusion, MLOps is a critical methodology for organizations looking to scale their machine learning workloads. By combining best practices from DevOps with machine learning, MLOps enables organizations to automate and streamline the entire machine learning lifecycle, from development to deployment to continuous improvement.

Author: Yasir Ul Hadi





Leveraging 12 Factor App Principles and Kubernetes to Architect Cloud-Native Apps

Businesses are embracing app modernization on a vast scale. The reason may be to meet greenfield requirements, to make the business future-ready, or to upgrade monolithic legacy applications. On their journey to modernization, businesses are using containers and Kubernetes as the primary technologies to modernize the design and distribution of their applications. The key business goal remains the same: an always-available system that is scalable, portable, flexible, and reliable. Architecture based on microservices and Kubernetes, together with the 12 factor app methodology, can help achieve such a system.

The 12-factor app style of development surfaced about 10 years ago, well before containers. Since then, the 12 principles of the 12 factor app have become a universal standard for cloud-native app development. They offer a set of guidelines for developing modern microservices, and Kubernetes is the orchestration platform for the containers used to deploy and control these microservices.

The 12-factor app principles:

  • Has only one aim: to offer a course of action for cloud-native application development and deployment. They ensure that happens by making applications highly scalable and disposable.
  • Help you and your team to embrace DevOps and microservices in the app development process.
  • Simplify the process, which speeds up development and reduces the time to market.
  • Were designed to build Software as a Service (SaaS) applications by alleviating the difficulties associated with long-term software development.

This article explains how organizations are leveraging the 12-factor app development method and Kubernetes to architect cloud-native apps. Understand how 12 factor app is helping businesses to modernize by establishing scalability, resiliency, robustness, mobility, and reliability across their applications. Let’s get started.

 Leveraging 12 Factor App Principles and Kubernetes

1. A single Codebase for Applications, Multiple Deployments

The 12-factor app methodology states that only one codebase should exist per application, tracked in a version control system. It can be deployed many times, but there should never be multiple codebases. Any shared code should be factored into libraries and included through the dependency manager.

The same codebase is active across all deployments. The only difference between deployments is the version they run, which is also tracked in version control.

Once the code base is in place with the 12-factor app approach, it can be built, released, or run in separate phases in the Kubernetes environment. Kubernetes and containers have text-based representations. The predictable system states are managed by automation tools in separate files. It is better to manage such evolving artifacts with source control. Using a version control system such as Git can help eliminate the introduction of sudden changes and facilitate tracking the changes added to your system.

2. Declare and Isolate Application Dependencies

The 12-factor app methodology uses the declaration and isolation method for application dependencies. Declare any dependencies explicitly and also check them in the version manager. This approach makes it easier to get started and enhances repeatability. It also becomes easy to track any changes made to the dependencies.

Another approach is to package the app and all its dependencies into a container. This makes it possible to remove the app and all its dependencies from its environment. In addition, it ensures that the app functions as expected regardless of the differences in development and staging environments.

3. Store Config as Environment Variables

As per the 12-factor app principles, configs should be stored as environment variables (env vars), not constants. Env vars are easy to change as the need arises for new code deployments, without changing the code. This flexibility quickens the native-app development process.

Additionally, you can manage env vars independently every time you deploy them. It also becomes easy to scale up as the development process progresses towards completion and deployment.

The 12-factor app strictly separates the application configuration from the code. Kubernetes ConfigMap supports storing configuration by declaring it. This can be helpful for production and development environments that need different configurations to deploy the same code.
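A minimal sketch of this principle in Python follows. The variable names are hypothetical; in a Kubernetes deployment they would typically be injected into the pod from a ConfigMap rather than set in code as they are here for demonstration.

```python
import os

# For demonstration only: simulate values a ConfigMap or deploy script
# would inject into the process environment.
os.environ["DATABASE_URL"] = "postgres://db.internal:5432/app"
os.environ["LOG_LEVEL"] = "debug"

def load_config():
    """Read configuration from the environment, never from constants in code."""
    return {
        "database_url": os.environ["DATABASE_URL"],          # required
        "log_level": os.environ.get("LOG_LEVEL", "info"),    # optional, defaulted
        "workers": int(os.environ.get("WEB_WORKERS", "4")),  # optional, defaulted
    }

config = load_config()
```

Because the code only ever reads the environment, the same image can be promoted from development to production with nothing but different env var values.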

4. Backing Services: As Attached Resources, Easy to Swap

Backing services include support applications and systems that your application needs to connect and communicate with, such as databases. They are usually grouped as attached resources that should be accessible when needed.

Modern applications that are microservices-based use backing services. These backing services are handled as attached resources in the 12-factor app. Due to this, in case of any failure, you can simply change the attached resources and not the whole application codebase.

Backing services in the 12-factor app methodology are configurable and easy to change. You can change them from one state to the next as the need arises. The switching is possible by just slightly changing the configuration.

It is best practice to separate the backing services (such as logging, messaging, databases, third-party services, caching, and others) from the system, and then interact with them through an API. Holding the APIs to consistent contracts lets you change the underlying implementations without exposing the changes to clients. Kubernetes ConfigMaps can be used to store connection information, so the container image does not need to be rebuilt whenever the connection information changes.

5. Split Build, Release and Run Phases

The 12-factor app methodology distinguishes all the stages of cloud-native app development.

  • The codebase is transformed into a deploy through these distinct stages. Once one stage is completed and the next starts, you cannot alter the code in the previous one.
  • You should build deployable components independent of the environment in the first stage. The second stage involves taking the reusable components already developed and combining them with a specific configuration to match the target environment.
  • The last phase is the run stage. It involves packing the entity created in the previous one in a container and running it in the target environment.

Organizations prefer to automate the development and testing tasks with CI/CD toolchains. Splitting your CI/CD pipeline into a series of sequential tasks can increase productivity. It helps to provide greater insight into failure and improve accountability. For example, dedicate a pipeline exclusively for building a container image at a time. After that, to run the container instance, you can perform the testing, promoting, and deploying of that image.

6. Stateless Processes

The 12-factor app methodology allows you to run cloud-native applications in the environment as one or more processes. The only restrictions are that they should be stateless and never share data. That enhances scalability and portability across cloud computing infrastructure. Data compilation is done during the build stage. Any other thing that requires persistence forms part of the backing services.

Containers are short-lived, and when a container goes away, the data inside it ceases to exist. The state held inside containerized workloads must therefore be minimized. This helps maintain a good user experience while remaining unaffected by application scaling.

7. Port Binding to Export Services

This stage of 12-factor app development involves binding your packaged application to a port. You can use the Kubernetes Service object if the workload is exposed internally to the cluster. Otherwise, you need other methods such as node ports, Ingress controllers, and OpenShift routes.

Packing your application inside containers simplifies networking and avoids port collisions by reducing the workload on hosts. Software-defined networks in Kubernetes platforms take over many of these operations.
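For example, a Kubernetes Service object exposing a containerized workload inside the cluster might look like the following sketch (the name, labels, and ports are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-app          # hypothetical service name
spec:
  selector:
    app: web-app         # matches pods labelled app: web-app
  ports:
    - port: 80           # port the Service exposes inside the cluster
      targetPort: 8080   # port the container binds to
  type: ClusterIP        # internal-only; use NodePort or an Ingress to expose externally
```

The container binds its own port, and the Service maps a stable cluster-internal port onto it, which is port binding in 12-factor terms.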

8. Concurrency and Scalability

Scalability is one of the primary features of any cloud-native application. That is usually done by deploying more app copies instead of enlarging them. To achieve this, the 12 factor app methodology uses a simple yet reliable operation.

The developer designs the app to take on different workloads by assigning varying tasks to processes. An example is an application where a web process handles HTTP requests and a worker process handles long-running background activity.

The pod-based Kubernetes architecture supports the scaling of application components as per varying demands. With the 12-factor app’s stateless processes element, scalability becomes a consistent function that can help to gain an expected level of concurrency.

9. Disposability: Robust Cloud-native apps

According to the 12-factor app methodology, all processes are disposable. They should have minimal startup time, shut down gracefully, and be resilient to crashes and failures. All of these capabilities make scaling easier, enable faster app development, and make deployments more robust.

The app should create new instances when it needs to and take them down as necessary. It is this 'disposability' property that makes cloud-native applications more robust. In microservices, processes are disposable: if any part of the application stops working unexpectedly, users stay almost unaffected and failures are handled gracefully. You can also use Kubernetes ReplicaSets to maintain a desired level of availability for microservices by specifying minimum and maximum bounds for the number of replicas.
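A sketch of graceful shutdown, the core of disposability: the worker below reacts to SIGTERM, the signal Kubernetes sends before terminating a pod, by finishing its current item and exiting cleanly. The `Worker` class and its job handling are illustrative assumptions.

```python
import signal


class Worker:
    def __init__(self):
        self.running = True
        # Register a handler for the platform's shutdown signal.
        signal.signal(signal.SIGTERM, self._stop)

    def _stop(self, signum, frame):
        # Don't abort mid-job; just stop taking new work.
        self.running = False

    def run(self, jobs):
        done = []
        for job in jobs:
            if not self.running:
                break  # unfinished work goes back to the queue
            done.append(job.upper())
        return done
```

Fast startup plus this kind of clean exit is what lets an orchestrator freely create and destroy instances without users noticing.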

10. Dev/prod parity: Carrying out Development and Production Similarly

The 12-factor app methodology bridges the gap between cloud-native app development and production. That makes it possible to continuously deploy or roll out new features. Also, developers can write code, deploy, and review the app’s features. This process is usually fast, and completed in minutes or hours.

For organizations that pack workloads into containers, a container image built in one environment should run on any infrastructure or environment. There is still a chance of environmental drift, however; consider standardizing on the same Kubernetes distribution across all environments to eliminate it. This helps create a consistent experience for container platform users.

11. Logs are Event Streams

The 12-factor app methodology does not require the application to route, store, write, or manage its output or log files. Instead, every running process writes its event stream, unbuffered, to STDOUT. A developer views this stream in the foreground of their terminal to observe how the app behaves and draw conclusions. Event streams also make it easy to troubleshoot or debug an application.
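A sketch of logs as an event stream: each event is written as one JSON line to STDOUT, and routing and storage are left to the platform (for example `kubectl logs` or a log forwarder). The field names here are illustrative.

```python
import json
import sys
import time


def log_event(event, **fields):
    """Emit one structured log event, unbuffered, to STDOUT."""
    record = {"ts": time.time(), "event": event, **fields}
    sys.stdout.write(json.dumps(record) + "\n")
    sys.stdout.flush()  # never buffer: the stream is consumed live


log_event("request_served", path="/health", status=200)
```

The app never opens a log file; in a container, STDOUT is exactly the stream the execution environment captures.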

12. Run Admin Processes as One-off Processes

With the 12-factor app methodology, admin and management tasks such as database migrations, scripts, and batch programs run as one-off processes, not long-running ones. They nevertheless run in the same environment and use the same dependency-isolation methods as the app's regular processes.

It is best practice to isolate the application administrative tasks such as data restore and backup, caching, or migration from the application microservices and carry them out as separate processes. You can use Kubernetes jobs to execute these mundane administrative tasks that are part of the application lifecycle.

What are the Business Benefits of the 12-Factor App Methodology?

The 12-factor app methodology is the how-to guide for creating cloud-native applications. Many giant tech companies, such as Amazon, Heroku, Microsoft, and others, make use of these 12 principles as they technically help them to enhance business agility by expediting innovation and go-to-market capabilities. With these 12-factor principles, you can design and maintain a robust and modern app architecture required for cloud-based applications.

This methodology is the solution for developers building the following:

  • Software-as-a-service solutions
  • Cloud applications
  • Distributed software solutions

With the 12-factor apps methodology, you can create cloud-native applications that are:

  • Suitable for deployment on modern cloud platforms, minimizing the need for servers and server administration
  • Enabled for continuous deployment with minimal differences between development and production
  • Scalable without significant change or effort
  • Capable of using declarative formats for setup automation.


Web applications, platforms, and frameworks using the 12-factor app methodology have generated measurable business outcomes and enhanced productivity in the past few years. The methodology is a guidebook for DevOps and cloud app development, and a blueprint for building resilient, scalable, portable, and maintainable applications. Applying these 12-factor principles with Kubernetes ensures you build a robust solution for your business.

However, this methodology is not the ultimate solution for everyone. Whether or not it works for your business depends on your business model and needs. So, you should not worry if your software development process deviates from the principles of 12-factor app methodology. You are good to go if you understand the reason and the expected outcome.

Do you want to learn more about 12-factor app development principles and their real-world use cases? Contact us today to learn how we can help your company.

Book 1-hour free consultation



Blog | Graph Neural Networks

Graph Neural Networks for Financial Fraud Detection


Overview of Frauds in the Financial Industry:

Fraud is a significant challenge in the finance industry and can have severe consequences for individuals and organizations. As financial institutions adopt cloud technologies and other online payment solutions across the globe, we are witnessing a steep rise in fraud of various kinds. A recent Nilson Report indicates that financial fraud is a significant issue for many financial services firms and can result in billions of dollars in losses: direct losses by merchants and banks exceeded $32 billion globally last year.

Online fraud takes many forms, including fake reviews, account takeovers, spam, synthetic identity frauds and bot attacks. While financial institutions use various methods to combat online fraud, simple rule-based techniques and feature-based algorithm techniques such as logistic regression, Bayesian belief networks, and CART may not always be effective in detecting the full range of fraudulent activities.

Fraudsters use sophisticated methods to avoid detection, such as setting up coordinated accounts, which can make it challenging to detect fraudulent behavior patterns at scale. Furthermore, detecting those patterns is complex due to the massive amount of data to sift through, and there is a scarcity of actual fraudulent cases with which to train classification algorithms.

  • Fraudulent transactions cost firms a lot of money. They also increase brand and reputation risks, as these incidents call an organization's integrity and vigilance into question.
  • Rule-based systems need to be revised regularly to address the latest patterns of scams, account takeovers, and illegal transactions.

How Machine Learning addresses some of these challenges:

Machine learning (ML) is a powerful tool that can be used for financial fraud detection. ML algorithms can analyze vast amounts of transaction data and identify patterns of behavior that are indicative of fraud.

One way in which ML is helping with financial fraud detection is through the use of anomaly detection algorithms. These algorithms can identify transactions that are unusual or suspicious based on various features, such as the amount, frequency, and location of the transactions. Anomaly detection algorithms can be trained using historical data to learn what typical transaction patterns look like, enabling them to identify anomalous behavior and flag potential cases of fraud.
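As a toy illustration of the idea, the sketch below flags transactions whose amount deviates sharply from an account's history using a simple z-score threshold; production systems use far richer models and many more features, and the amounts here are made up.

```python
import numpy as np


def flag_anomalies(amounts, threshold=3.0):
    """Mark amounts more than `threshold` standard deviations from the mean."""
    amounts = np.asarray(amounts, dtype=float)
    mean, std = amounts.mean(), amounts.std()
    if std == 0:
        return np.zeros(len(amounts), dtype=bool)
    z = np.abs((amounts - mean) / std)
    return z > threshold


# Ten routine purchases and one suspicious spike
history = [25, 30, 27, 22, 31, 28, 26, 29, 24, 30, 2500]
print(flag_anomalies(history))
```

Only the final transaction is flagged: its amount is several standard deviations away from the account's typical pattern, which is exactly the signal an anomaly detector learns from historical data.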

ML algorithms can also be used to improve fraud prevention by identifying potential risks before fraudulent activity occurs. For example, ML algorithms can be used to identify high-risk accounts based on various factors, such as the age and location of the account holder, the type of transactions conducted, and the history of the account. Financial institutions can then take steps to monitor these high-risk accounts more closely and act if any suspicious activity is detected.

Figure 1: Algorithms can efficiently identify fraudulent transactions based on user data (Source: Author)

Challenges in current ML approaches

These algorithms need to be trained using labelled data, which can be difficult to obtain due to the small number of fraudulent transactions that actually happen compared to legitimate ones. However, techniques such as oversampling and undersampling can be used to address this issue today by balancing the number of fraudulent and legitimate transactions in the training data.

For example, in the case of credit card fraud, fraudsters band together and create multiple bank accounts (often spanning different times and geographies) to make them look like genuine accounts. Traditional ML approaches fail to uncover the network of fraudsters hiding among the genuine accounts. The data containing flagged transactions is often not exhaustive (a suspicious account typically makes a few genuine transactions before the fraud starts), so models trained on that data are usually unsuccessful at discovering coordinated attacks such as credit card fraud carried out across multiple accounts.

In the following section, let's explore how graph databases and graph neural networks help address some of the key issues raised above in the context of credit card fraud, or any case involving more than one perpetrator.


Graph Neural Networks (GNN)

In the classical ML approach, where we train predictive algorithms such as decision trees, random forests, or XGBoost, we typically store transaction data in tabular format with columns as features. In the financial realm, however, transactions can be stored efficiently in a graph database, where each node represents an account and each edge represents a transaction. A node carries the features associated with its account (e.g., location, dates, etc.). This representation of the existing transactional data helps stakeholders understand the different properties linked to a fraudulent account.
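A minimal sketch of this graph representation in plain Python; the account IDs, features, and amounts are made up for illustration, and a real system would use a graph database or a library such as networkx.

```python
class TransactionGraph:
    def __init__(self):
        self.nodes = {}   # account_id -> feature dict
        self.edges = []   # (src_account, dst_account, amount)

    def add_account(self, account_id, **features):
        self.nodes[account_id] = features

    def add_transaction(self, src, dst, amount):
        self.edges.append((src, dst, amount))

    def neighbors(self, account_id):
        """Accounts directly linked to this one: the starting point
        for uncovering rings of coordinated accounts."""
        out = set()
        for src, dst, _ in self.edges:
            if src == account_id:
                out.add(dst)
            elif dst == account_id:
                out.add(src)
        return out


g = TransactionGraph()
g.add_account("A", location="Mumbai")
g.add_account("B", location="Delhi")
g.add_account("C", location="Delhi")
g.add_transaction("A", "B", 120.0)
g.add_transaction("C", "B", 80.0)
print(g.neighbors("B"))  # every account that transacted with B
```

Unlike a flat table of rows, this structure makes "who is connected to whom" a first-class query, which is what the graph-level analyses below build on.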

Figure: Graphs make it easy to understand the different connections between the data (Source:


This helps us with analysis and predictions at different levels:

  • Node classification: determine the label of a sample (represented as a node) by looking at the labels of its neighbors. Account-level predictions are not very common in the traditional ML approach, where in most cases we only predict whether a transaction is fraudulent.
  • Link prediction: a link represents a transaction or any other activity between nodes. A simple use case: suspicious transactions from otherwise genuine accounts could indicate theft or an illegal account takeover.
  • Community classification: within the entire network of transactions and accounts, it is now possible to uncover clusters with strong similarities. These help the model predict and classify accounts vulnerable to attack, or find groups of illegal accounts.
  • Anomaly detection: in a collection of nodes, find outliers in the graph in an unsupervised manner (data with no labels).
  • Graph classification: classify the whole graph into different categories. The applications are numerous, ranging from determining whether a protein is an enzyme in bioinformatics, to categorizing documents in NLP, to social network analysis.


Application of GNN in Industry Use Cases

GNN-based models, such as RGCN (the relational graph convolutional network), can benefit from topological information by combining network structure with node and edge attributes to build a meaningful representation that separates fraudulent from legitimate transactions. Through heterogeneous graph embedding, RGCN can efficiently learn to represent many kinds of nodes and edges (relations).

• Loan Default Risk: For commercial banks and financial regulatory institutions, monitoring and assessing default risk is at the heart of the risk-control process. Default risk, one of the credit risks, is the probability that a borrower fails to pay interest and principal on time. With a binary outcome, loan default prediction can be seen as a classification problem and is commonly addressed using user-related features with classifiers such as neural networks and gradient-boosted trees. Since the probability that a borrower defaults may be influenced by other related individuals, there is plenty of literature on forming a graph to reflect the interactions between borrowers. With the rapid growth of GNN methods, they are now widely applied to graph-structured loan default prediction problems.

• Stock Movement Prediction: Though there is still debate over whether stocks are predictable, stock prediction receives great attention, and there is rich literature on predicting stock movements using machine learning methods. The task is challenging due to the volatile and non-linear nature of the stock market. The limitation of non-graph approaches is that they carry a hidden assumption that stocks are independent. To take dependence into account, there is an increasing trend of representing stock relations in a graph, where each stock is a node and an edge exists if two stocks are related. Predicting multiple stocks' movements can then be framed as a node classification task, and graph neural network models can be used to make the prediction.

• Fraud Detection: Observing that fraudsters tend to have abnormal connectivity with other users, there is a trend of representing users' relations in a graph, so that fraud detection can be formulated as a node classification task. Aiming to detect malicious accounts that attack online services to seek excessive profit, studies show that fraudsters exhibit two patterns: device aggregation and activity aggregation. Due to economic constraints, attackers tend to use a limited number of devices and perform their activities within a limited time, which is reflected in the local graph structure.

• Event Prediction: Financial events, including revenue growth, acquisitions, and bankruptcy, provide valuable information on market trends and can be used to predict future stock movement. There is therefore great interest in predicting the next financial event from past events, and gated graph neural network (GGNN) models are currently often used to accomplish this task.

(ref: A Review on Graph Neural Network Methods in Financial Applications | DeepAI)



Figure: Image convolution and Graph convolution (Source: towardsdatascience)


The intuition behind GNNs is that nodes are naturally defined by their neighbors and connections. To see this, imagine removing the neighbors and connections around a node: the node loses all its information. In other words, a node's neighbors, and its connections to them, define the concept of the node.

An important part of training a graph neural network is a process called graph convolution. In many ways the idea is similar to image convolution, which is widely used in image processing: sum the neighboring pixels around a center pixel, as specified by a filter with a parameterized size and learnable weights. A spatial convolutional network adopts the same idea by aggregating the features of neighboring nodes into the center node.
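One step of spatial graph convolution can be sketched in a few lines of NumPy: each node's new features are a degree-normalized sum of its own and its neighbors' features, multiplied by a learnable weight matrix. The adjacency matrix, features, and weights below are toy values.

```python
import numpy as np


def graph_conv(A, X, W):
    """One spatial graph-convolution step with self-loops and ReLU."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # normalize by node degree
    return np.maximum(D_inv @ A_hat @ X @ W, 0)


# Toy graph: 3 accounts, edges 0-1 and 1-2; 2 features per node.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
W = np.eye(2)  # identity weights, for illustration only

H = graph_conv(A, X, W)
print(H)  # each node's row now mixes in its neighbors' features
```

After this step, node 0's representation is the average of its own and node 1's features; stacking several such layers is what widens the receptive field discussed below.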


Advantages of using GNN over classic ML algorithms

This approach is more effective because each node is classified not just by its own features but also by those of its neighboring nodes. The task of every GNN is to determine the "node embedding" of each node by looking at the information on its neighboring nodes.


Figure: Each node prediction is arrived at by considering the node’s features and its neighbors (Source: towardsdatascience)

This allows the model to recognize a node’s connections with other nodes that are further away, making it possible to discover hidden patterns that traditional algorithms would not have captured.

When multiple layers of graph convolution are performed, a node’s state comes to contain information from nodes multiple hops away, effectively giving the GNN a “receptive field” of nodes or edges several jumps from the node or edge in question. This differs from anomaly detection using a random forest. A random forest finds columns (features) that split the data into purer subsets of each class (fraud or not fraud), with the depth of traversal indicative of anomaly; the model does not look at all of a user’s features at once. A graph neural network, by contrast, looks with each convolutional layer not only at every feature of a user but at multiple users at a time.

In the context of the fraud detection problem, this large receptive field of GNNs can account for more complex or longer chains of transactions that fraudsters can use for obfuscation. Additionally, changing patterns can be accounted for by iterative retraining of the model.


Explainability is Necessary

Predicting whether a transaction is fraudulent or not is not sufficient for transparency expectations in the financial services industry. It is also necessary to understand why certain transactions are flagged as fraud. This explainability is important for understanding how fraud happens, how to implement policies to reduce fraud, and to make sure the process isn’t biased. Therefore, fraud detection models are required to be interpretable and explainable which limits the selection of models that analysts can use.

One reason the industry has been reluctant to use neural networks is that they must be treated as a black box: it is not clear why a model classifies something a certain way or which features were crucial to a prediction. Classical ML approaches have an edge over neural networks here. For example, decision tree algorithms use a metric called information gain to split on features efficiently, which lets us see which features were most useful for making predictions.

Researchers are now putting a lot of effort into making GNNs more explainable. GNNExplainer, for example, provides interpretable explanations of trained GNN models such as GCN and GAT. Model explainability in financial tasks is of great importance, since understanding the model can benefit decision-making and reduce economic losses.

Implementing GNN on cloud: Scalability

Figure: Implementing real-time prediction on AWS (Link to GitHub repo)


It is critical to predict fraud in real time, yet building such a solution is challenging. There are few online resources on converting GNN models from batch serving to real-time serving, because GNNs are still relatively new to the industry. Building a streaming data pipeline that can feed incoming events to a GNN real-time serving API is also difficult: the feature dimension is very high and nodes are densely connected, which makes inference computationally heavy.

Cloud service providers like AWS have launched services to help developers apply GNNs to real-time fraud detection. Amazon Neptune is a fully managed database service built for the cloud that makes it easier to build and run graph applications. Neptune provides built-in security, continuous backups, serverless compute, and integrations with other AWS services such as SageMaker, Glue, and S3.

Amazon Neptune ML is a new capability of Neptune that uses Graph Neural Networks (GNNs), a machine learning technique purpose-built for graphs, to make easy, fast, and more accurate predictions using graph data. With Neptune ML, you can improve the accuracy of most predictions for graphs by over 50% (study by Stanford) when compared to making predictions using non-graph methods.



This article makes a case for using graph neural networks to detect fraud, as compared to other available ML approaches originally designed for tabular data. GNN models can develop meaningful representations that separate fraudulent users and events from legitimate ones by combining graph structure with the attributes of nodes and edges, such as users or transactions. This capability is essential for identifying fraud in which fraudsters cooperate to mask their unusual features yet still leave some traces of their relations.

In conclusion, utilizing ML or neural networks for fraud detection is a viable approach for businesses to protect themselves from the increasing prevalence and cost of fraud and scams. It is also important to create a data culture within businesses to leverage existing data for deep, actionable, and rich insights into potential areas of fraud, and to perform advanced analytics on them. By combining the power of AI and cloud technologies like AWS, businesses can detect and prevent fraud in real time, gain competitive advantages, mitigate fraud risks, and protect their financial assets.


Author: Blesson Davis

Blog | Tackling Chronic Kidney Disease

Tackling Chronic Kidney Disease: One Prognosis at A Time


In my previous article, “How AI is Changing the Game in Chronic Disease Care”, I explored the incredible ways that artificial intelligence (AI) is transforming the landscape of chronic disease care. I provided a level 100 overview of chronic diseases, the obstacles that come with them, and how AI is helping to overcome these challenges. This can give you a better understanding of how cutting-edge technology is shaping the future of healthcare, and how it can benefit patients suffering from chronic diseases. 

In this post, I focus primarily on Chronic Kidney Disease (CKD), a ticking time bomb that has already exploded in India. With over 7.8 million people affected[1], it is an urgent public health crisis that requires immediate attention. The situation is dire: more than 175,000 new patients develop end-stage renal disease (ESRD) each year, and the number is expected to increase by 10% annually[2]. Nor is it just a nation-specific concern; according to the World Health Organization (WHO), CKD is now the 12th leading cause of death globally, and it is estimated that over 850 million people worldwide are living with this disease[3].

Risk Factors [4]

CKD not only affects patients and their families but also has a significant monetary impact on the world. The cost of treating ESRD is prohibitively expensive, contributing to a loss of productivity and a burden on the understaffed and overworked healthcare system.

In essence, the issue of CKD in India is not just a health issue, but a societal and economic one too. Fortunately, the use of data science and machine learning offers hope in the fight against CKD and can create a proactive strategy to combat CKD and prevent the worst-case scenarios. As Dr. Griffin Rodgers, the director of the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), rightly said, “Chronic kidney disease is an under-recognized public health crisis that needs more attention and resources to prevent kidney failure and its complications.”

So, in this article, I delve deeper into how data science can play an important role in predicting and managing this disease. Current solutions fall short of managing it, so new approaches are needed. Using mathematical and statistical concepts and predictive machine learning models, we can analyze demographic information, lab reports, and clinical data to make a difference in the lives of millions of people around the world.

CKD has five stages, 1 to 5, the last being End-Stage Renal Disease (ESRD) or kidney failure, the point at which the kidneys can no longer filter waste from the blood to extract the nutrition your body requires.

A chronic disease progresses over time if a patient doesn’t change his or her lifestyle and, most importantly, if we don’t intervene early in the disease’s progression. So, how do we determine whether someone is in an early stage of CKD? We use an estimate called eGFR (estimated glomerular filtration rate), which measures how well the kidneys are working, or filtering. eGFR is calculated from attributes such as demographic information, age, and serum creatinine levels.
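For illustration, here is a sketch of one widely used estimating equation, the 2021 CKD-EPI creatinine equation. The coefficients are reproduced for illustration only; any clinical use should rely on a validated implementation.

```python
def egfr_ckd_epi(scr_mg_dl, age, female):
    """Estimate GFR (mL/min/1.73 m^2) from serum creatinine, age, and sex
    using the 2021 CKD-EPI creatinine equation (race-free version)."""
    kappa = 0.7 if female else 0.9
    alpha = -0.241 if female else -0.302
    ratio = scr_mg_dl / kappa
    egfr = (142
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.200
            * 0.9938 ** age)
    if female:
        egfr *= 1.012
    return egfr


# A 50-year-old male with serum creatinine 1.0 mg/dL
print(round(egfr_ckd_epi(scr_mg_dl=1.0, age=50, female=False), 1))
```

Note how the estimate depends only on routine inputs (creatinine, age, sex), which is what makes eGFR practical for population-scale screening.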

Stages of CKD [5]

As depicted in the image above, eGFR values vary across the stages of CKD. An eGFR above 90 signifies good kidney health, whereas a value below 15 indicates kidney failure, necessitating immediate treatment or a transplant. The detrimental aspect of CKD is its progressive nature; the positive side is that we can halt its progression at any stage. To do so, we must intervene early, modify the patient’s lifestyle, and deliver a correct diagnosis, treatment, and education on CKD. Now that we understand CKD’s progression and how to prevent it from worsening, the crucial questions are how to intervene early and how to stop the disease from advancing to the later stages. The answer lies in proper management and data science! So how can we achieve this? Let’s explore a case I worked on, at Minfy, for an NGO that provides healthcare to CKD patients.
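The stage thresholds above can be sketched as a simple mapping; the labels are abbreviated from the standard KDIGO categories, with stage 3 split into its 3a/3b sub-stages.

```python
def ckd_stage(egfr):
    """Map an eGFR value (mL/min/1.73 m^2) to a CKD stage."""
    if egfr >= 90:
        return "Stage 1 (normal or high function)"
    if egfr >= 60:
        return "Stage 2 (mildly decreased)"
    if egfr >= 45:
        return "Stage 3a (mild to moderately decreased)"
    if egfr >= 30:
        return "Stage 3b (moderately to severely decreased)"
    if egfr >= 15:
        return "Stage 4 (severely decreased)"
    return "Stage 5 (kidney failure / ESRD)"


print(ckd_stage(72))  # Stage 2
print(ckd_stage(12))  # Stage 5
```

A labeling function like this is also how a mathematical model can assign stage labels to records when clinician-labeled data is unavailable, as discussed later in the article.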

DISCLAIMER: The following work is solely intended for research purposes and should not be used by healthcare practitioners for diagnosing CKD. The machine learning models and data used in this article are simplified versions and not intended to reflect the full complexity of the actual models used. 

Following, I discuss two important aspects of CKD diagnosis: early detection and predictive modeling.

Early Detection

Early detection in chronic disease care refers to identifying the presence of a disease or the risk of developing an infection at an early stage, before the onset of symptoms, or before the disease has progressed significantly[6].

As I explained in my previous article, early detection can be beneficial in

• Improving outcomes: It can lead to more effective treatment and management of CKD, which can improve outcomes for patients. For example, if a patient with diabetes is diagnosed early on, he/she can take steps to control blood sugar levels and prevent the development of complications.
• Reducing costs: It can reduce the costs associated with CKD, as treatments and management strategies are more effective when they are initiated early on.
• Better access to care: It can improve access to care for patients, as they are more likely to be diagnosed and treated before the disease progresses and becomes more difficult to manage.
• Reducing the burden on healthcare systems: It can also help reduce the burden on healthcare systems, as patients with CKD diagnosed early on are less likely to require hospitalization or other intensive care.
• Improving the quality of life: It can improve the quality of life for patients, as they can take steps to manage their disease and prevent complications before they occur.

To identify chronic kidney disease (CKD) at an early stage, it is important to monitor various factors such as the individual’s eGFR level, age, lifestyle, and other relevant indicators. Once this information has been gathered, machine-learning techniques can be utilized to aid in the detection process.

To train the machine learning model, I plan to utilize the University of California Irvine’s web data repository on Chronic Kidney Disease [7]. This dataset includes approximately 400 data points, consisting of 11 numeric and 14 nominal features, as well as a binary classification of CKD and NOTCKD. Of the 400 data points, 150 are classified as NOTCKD (healthy), while the remaining 250 are classified as CKD (unhealthy).

Given the limited number of data points available (~400), I plan to use CTGAN, a generative adversarial network (GAN)-based approach for modeling tabular data distribution and generating additional data points. By utilizing the latent space distribution of the original data, I aim to generate approximately 50,000 additional observations.

The following code snippet shows the complete procedure for generating synthetic data from a seed dataset and saving it in a CSV file for future use.

Synthetic Data Generation
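A simplified sketch of that procedure: a naive per-column sampler stands in for CTGAN here (which models the joint distribution of all columns); the column names and seed rows are illustrative only, and the real pipeline would use the `ctgan` package.

```python
import csv
import random


def fit_and_sample(rows, n_samples, seed=42):
    """Fit per-column statistics on seed rows and sample new rows.
    Numeric columns: sample from a fitted normal distribution.
    Nominal columns: resample the observed labels."""
    random.seed(seed)
    cols = rows[0].keys()
    samples = []
    for _ in range(n_samples):
        new = {}
        for c in cols:
            values = [r[c] for r in rows]
            if isinstance(values[0], (int, float)):
                mu = sum(values) / len(values)
                sd = (sum((v - mu) ** 2 for v in values) / len(values)) ** 0.5
                new[c] = random.gauss(mu, sd)
            else:
                new[c] = random.choice(values)
        samples.append(new)
    return samples


# Illustrative seed rows, not real patient data
seed_rows = [{"age": 48, "sc": 1.2, "class": "ckd"},
             {"age": 53, "sc": 1.8, "class": "ckd"},
             {"age": 40, "sc": 0.9, "class": "notckd"}]
synthetic = fit_and_sample(seed_rows, n_samples=5)

with open("synthetic_ckd.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=seed_rows[0].keys())
    writer.writeheader()
    writer.writerows(synthetic)
```

CTGAN improves on this naive approach by learning correlations between columns with a GAN, which is why the generated distributions in the figure below track the originals so closely.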

The distributions of the original data vs synthetic data

It is apparent that the distribution of the generated data is comparable to that of the original data, and therefore, it can be safely used for training the machine learning model.


Model Training and Evaluation


The results of data analysis and model evaluation are shown below.



Scatter plot of a few continuous variables v/s target


Pair-wise scatter plot of a few continuous variables


Heatmap of a few continuous variables for the target

Accuracy of XGBoost Classifier


Confusion Matrix Plot

The preceding procedure can be utilized to train a machine learning model to identify individuals who have CKD at an early stage. The tool helps to detect signs of disease quickly and easily by taking into account only a small number of factors, such as those found in routine lab tests, urine tests, and basic personal data.



Predictive modeling

Predictive modeling in CKD care can be used to identify individuals at high risk of developing CKD and predict outcomes through analyzing data such as electronic health records.

• Identifying high-risk patients: Machine learning algorithms can be trained on large amounts of data, such as patient EHRs, to identify patterns and predict outcomes.
• Predicting progression: Predictive modeling can be used to predict the progression of chronic diseases such as diabetes by analyzing data such as blood glucose levels and medication history.

To develop a predictive machine learning model for CKD progression, it is necessary to have longitudinal patient data capturing disease progression through the five CKD stages. Mathematical models have been employed for the initial classification of patients into these stages, followed by machine learning modeling to develop the predictive model.

Let’s build a predictive model using the XGBoost classifier to predict the stages of CKD using a synthetic dataset.
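A hedged sketch of that model: scikit-learn's GradientBoostingClassifier stands in for XGBoost here (the fit/predict workflow is nearly identical), and the features and stage labels are randomly generated purely for illustration, not drawn from real patient data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic cohort: eGFR and age as features (illustrative only)
rng = np.random.default_rng(0)
n = 600
egfr = rng.uniform(5, 120, n)
age = rng.uniform(20, 80, n)
X = np.column_stack([egfr, age])

# Assign a stage label 0-4 from the eGFR thresholds (15/30/60/90),
# mimicking the mathematical-model labeling described above.
y = np.digitize(egfr, [15, 30, 60, 90])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print("accuracy:", model.score(X_te, y_te))
```

Because the labels here are derived directly from eGFR, the classifier scores near-perfectly; with clinician-labeled longitudinal data the task becomes genuinely predictive rather than a threshold lookup.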

Limitation: This method has certain limitations. Specifically, due to the unavailability of labeled data for each stage, a mathematical model formula was utilized to create a new column in the dataset representing the CKD stage. However, it must be acknowledged that obtaining labeled data from healthcare professionals would be a more robust and reliable approach. Thus, the current limitation presents an opportunity to further improve the methodology through enhanced data collection methods.



Accuracy of XGBoost Multi-Class Classifier



Confusion Matrix Plot

Chronic Kidney Disease (CKD) is a global public health issue that demands immediate attention. Treating End-Stage Renal Disease (ESRD) is financially prohibitive, burdening an already stretched healthcare system. But hope lies in data science and machine learning, offering a proactive strategy to combat CKD and prevent its worst-case scenarios.

Early detection is crucial in managing and preventing CKD. Predictive machine learning models, using advanced mathematical and statistical concepts, can help doctors intervene early, change patients’ lifestyles, and stop CKD progression. Leveraging these cutting-edge technologies can lead to improved patient outcomes, reduced healthcare costs, and a healthier society. Proactive intervention, tailored treatments, and early detection are the keys to success in chronic disease management. Let’s work together to harness the power of data science to make this vision a reality.


LIFESPAN – The Healthcare Management System

As mentioned earlier, forecasting the progression of Chronic Kidney Disease poses a challenge due to the limited availability of longitudinal data. Nonetheless, this obstacle can be overcome with appropriate techniques and sufficient resources. It is imperative to acknowledge the complexity of the task, but with careful planning, comprehensive data collection, and the utilization of advanced modeling techniques, we can successfully predict CKD advancement and provide valuable insights for both medical professionals and patients. The main question, therefore, is how we can obtain the required longitudinal data.

Introducing the revolutionary new product from Minfy — LifeSpan!

LifeSpan can collect vital health data on numerous attributes, not just limited to CKD but any existing disease. With LifeSpan, you can streamline the way health data is collected and managed — from hospital administration to digital records, eliminating the tedious paper trail and saving valuable time. Say goodbye to cumbersome paperwork and hello to a future where data is easily accessible, and compliance is effortlessly tracked. Be ready to experience the future of health data collection with LifeSpan.

— Author: Gaurav Lohkna






[4] Image:

[5] Image:


[7] Data:
