The Steps to Create, Train, Save, and Load a Spam Detection AI Model Using ML.NET

The Steps to Create, Train, Save, and Load a Spam Detection AI Model Using ML.NET

This article demonstrates the process of creating, training, saving, and loading a spam detection AI model using ML.NET, but also emphasizes the reusability of the trained model. By following the steps in the article, you will be able to create a model that can be easily reused and integrated into your .NET applications, allowing you to effectively identify and filter out spam emails.

Prerequisites

  • Basic understanding of C#
  • Familiarity with ML.NET and machine learning concepts

Code Overview

    1. Import necessary namespaces:

      using System;
      using System.IO;
      using System.Linq;
      using Microsoft.ML;
      using Microsoft.ML.Data;
    
    1. Define the Email class and its properties:

      public class Email
      {
        public string Content { get; set; }
        public bool IsSpam { get; set; }
      }
    
    1. Create a sample dataset for training the model:

      var sampleData = new List<Email>
      {
        new Email { Content = "Buy cheap products now", IsSpam = true },
        new Email { Content = "Meeting at 3 PM", IsSpam = false },
      };
    
    1. Initialize a new MLContext, which is the main entry point to ML.NET:

      var mlContext = new MLContext();
    
    1. Load the sample data into an IDataView:

      var trainData = mlContext.Data.LoadFromEnumerable(sampleData);
    
    1. Define the data processing pipeline and the training algorithm (SdcaLogisticRegression):

      var pipeline = mlContext.Transforms.Text.FeaturizeText("Features", nameof(Email.Content))
        .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression());
    
    1. Train the model:

      var model = pipeline.Fit(trainData);
    
    1. Save the trained model as a .NET binary:

      mlContext.Model.Save(model, trainData.Schema, "model.zip");
    
    1. Load the saved model:

      var newMlContext = new MLContext();
      DataViewSchema modelSchema;
      ITransformer trainedModel = newMlContext.Model.Load("model.zip", out modelSchema);
    
    1. Create a prediction engine:

      var predictionEngine = mlContext.Model.CreatePredictionEngine<Email, SpamPrediction>(trainedModel);
    
    1. Test the model with a sample email:

      var sampleEmail = new Email { Content = "Special discount, buy now!" };
      var prediction = predictionEngine.Predict(sampleEmail);
    
    1. Output the prediction:

      Debug.WriteLine($"Email: '{sampleEmail.Content}' is {(prediction.IsSpam ? "spam" : "not spam")}");
    
    1. Assert that the prediction is correct:

      Assert.IsTrue(prediction.IsSpam);
    
    1. Verify that the model was saved:

      if(File.Exists("model.zip"))
        Assert.Pass();
      else
        Assert.Fail();
    

Conclusion

In this article, we explained a simple spam detection model in ML.NET and demonstrated how to train and test the model. This code can be extended to build more complex models, and can be used as a starting point for exploring machine learning in .NET.

Github Repo

Introduction to Machine Learning in C#: Spam Detection using Binary Classification

Introduction to Machine Learning in C#: Spam Detection using Binary Classification

Introduction to Machine Learning in C#: Spam using Binary Classification

This example demonstrates the basics of machine learning in C# using ML.NET, Microsoft’s machine learning framework specifically designed for .NET applications. ML.NET offers a versatile, cross-platform framework that simplifies integrating machine learning into .NET applications, making it accessible for developers familiar with the .NET ecosystem.

Technologies Used

  • C#: A modern, object-oriented programming language developed by Microsoft, which is widely used for a variety of applications. In this example, C# is used to define data models, process data, and implement the machine learning pipeline.
  • ML.NET: An open-source and cross-platform machine learning framework for .NET. It is used in this example to create a machine learning model for classifying emails as spam or not spam. ML.NET simplifies the process of training, evaluating, and consuming machine learning models in .NET applications.
  • .NET Core: A cross-platform version of .NET for building applications that run on Windows, Linux, and macOS. It provides the runtime environment for our C# application.

The example focuses on a simple spam detection system. It utilizes text data processing and binary classification, two common tasks in machine learning, to classify emails into spam and non-spam categories. This is achieved through the use of a logistic regression model, a fundamental algorithm for binary classification problems.

Creating an NUnit Test Project in Visual Studio Code

 

           Setting up NUnit for DecisionTreeDemo

 

    • Install .NET Core SDK

      Download and install the .NET Core SDK from the .NET official website.

    • Install Visual Studio Code

      Download and install Visual Studio Code (VS Code) from here. Also, install the C# extension for VS Code by Microsoft.

    • Create a New .NET Core Project

      Open VS Code, and in the terminal, create a new .NET Core project:

      dotnet new console -n DecisionTreeDemo
      cd DecisionTreeDemo
    • Add the ML.NET Package

      Add the ML.NET package to your project:

      dotnet add package Microsoft.ML
    • Create the Test Project

      Create a separate directory for your test project, then initialize a new test project:

          
      mkdir DecisionTreeDemo.Tests
      cd DecisionTreeDemo.Tests
      dotnet new nunit
    • Add Required Packages to Test Project

      Add the necessary NUnit and ML.NET packages:

      dotnet add package NUnit
      dotnet add package Microsoft.NET.Test.Sdk
      dotnet add package NUnit3TestAdapter
      dotnet add package Microsoft.ML
    • Reference the Main Project

      Reference the main project:

          dotnet add reference ../DecisionTreeDemo/DecisionTreeDemo.csproj
    • Write Test Cases

      Write NUnit test cases within your test project to test different functionalities of your ML.NET application.

      Define the Data Model for the Email

      Include the content of the email and whether it’s classified as spam.

          
      public class Email
      {
          [LoadColumn(0)]
          public string Content { get; set; }
      
          [LoadColumn(1), ColumnName("Label")]
          public bool IsSpam { get; set; }
      }
      

      Define the Model for Spam Prediction

      This model is used to determine whether an email is spam.

        
      public class SpamPrediction
      {
          [ColumnName("PredictedLabel")]
          public bool IsSpam { get; set; }
      }
      

      Write the test case

             
      // Create a new ML context for the application, which is a starting point for ML.NET operations.
              var mlContext = new MLContext();
      
              // Example dataset of emails. In a real-world scenario, this would be much larger and possibly loaded from an external source.
              var data = new List
              {
                  new Email { Content = "Buy cheap products now", IsSpam = true },
                  new Email { Content = "Meeting at 3 PM", IsSpam = false },
                  // Additional data can be added here...
              };
      
              // Load the data into the ML.NET data model.
              var trainData = mlContext.Data.LoadFromEnumerable(data);
      
              // Define the data processing pipeline. Here we are featurizing the text (i.e., converting text into numeric features) and then     applying a logistic regression model.
              var pipeline = mlContext.Transforms.Text.FeaturizeText("Features", nameof(Email.Content))
                  .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression());
      
              // Train the model on the loaded data.
              var model = pipeline.Fit(trainData);
      
              // Create a prediction engine for making predictions on individual data samples.
              var predictionEngine = mlContext.Model.CreatePredictionEngine<Email, SpamPrediction>(model);
      
              // Create a sample email to test the model.
              var sampleEmail = new Email { Content = "Special discount, buy now!" };
              var prediction = predictionEngine.Predict(sampleEmail);
      
              // Output the prediction to the console.
              Debug.WriteLine($"Email: '{sampleEmail.Content}' is {(prediction.IsSpam ? "spam" : "not spam")}");
              Assert.IsTrue(prediction.IsSpam);
      
    • Running Tests

      Run the tests with the following command:

      dotnet test

As you can see the test will pass because the sample email contains the word “buy” that was used in the training data and was labeled as spam

You can download the source code for this article here

This article has explored the fundamentals of machine learning in C# using the ML.NET framework. By defining specific data models and utilizing ML.NET’s powerful features, we demonstrated how to build a simple yet effective spam detection system. This example serves as a gateway into the vast world of machine learning, showcasing the potential for integrating AI technologies into .NET applications. The skills and concepts learned here lay the groundwork for further exploration and development in the exciting field of machine learning and artificial intelligence.

SOLID design pattern and XPO

SOLID design pattern and XPO

SOLID is an acronym that stands for five fundamental principles of object-oriented programming and design. These principles were first introduced by Robert C. Martin (also known as Uncle Bob) and have since become a cornerstone of software development best practices. Each letter in SOLID represents a principle that, when applied correctly, leads to more maintainable and modular code.

 

Let’s dive into each of the SOLID principles and understand how they contribute to building high-quality software systems:

 

Single Responsibility Principle (SRP):

The SRP states that a class should have only one reason to change. In other words, a class should have a single responsibility or a single job. This principle encourages developers to break down complex systems into smaller, cohesive modules. By ensuring that each class has a focused responsibility, it becomes easier to understand, test, and modify the code without affecting other parts of the system.

 

Open-Closed Principle (OCP):

The OCP promotes the idea that software entities (classes, modules, functions, etc.) should be open for extension but closed for modification. This principle emphasizes the importance of designing systems that can be easily extended with new functionality without modifying existing code. By relying on abstractions, interfaces, and inheritance, developers can add new features by writing new code rather than changing the existing one. This approach reduces the risk of introducing bugs or unintended side effects.

 

Liskov Substitution Principle (LSP):

The LSP states that objects of a superclass should be replaceable with objects of its subclasses without affecting the correctness of the program. In simpler terms, if a class is a subtype of another class, it should be able to be used interchangeably with its parent class without causing any issues. This principle ensures that inheritance hierarchies are well-designed, avoiding situations where subclass behavior contradicts or breaks the functionality defined by the superclass. Adhering to the LSP leads to more flexible and reusable code.

 

Interface Segregation Principle (ISP):

The ISP advises developers to design interfaces that are specific to the needs of the clients that use them. It suggests that clients should not be forced to depend on interfaces they don’t use. By creating small and focused interfaces, rather than large and monolithic ones, the ISP enables clients to be decoupled from unnecessary dependencies. This principle enhances modularity, testability, and maintainability, as changes to one part of the system are less likely to impact other parts.

 

Dependency Inversion Principle (DIP):

The DIP encourages high-level modules to depend on abstractions rather than concrete implementations. It states that high-level modules should not depend on low-level modules; both should depend on abstractions. This principle promotes loose coupling between components, making it easier to substitute or modify dependencies without affecting the overall system. By relying on interfaces or abstract classes, the DIP facilitates extensibility, testability, and the ability to adapt to changing requirements.

 

By applying the SOLID principles, software engineers can create codebases that are modular, flexible, and easy to maintain. These principles provide a roadmap for designing systems that are resilient to change, promote code reusability, and improve collaboration among development teams. SOLID principles are not strict rules but rather guidelines that help developers make informed design decisions and create high-quality software systems.

 

It’s worth mentioning that the SOLID principles should not be applied blindly in all situations. Context matters, and there may be scenarios where strict adherence to one principle might not be the best approach. However, understanding and incorporating these principles into the software design process can significantly improve the overall quality of the codebase.

 

SOLID and XPO

 

XPO is designed with SOLID design principles in mind. Here’s how XPO applies each of the SOLID principles:

 

Single Responsibility Principle (SRP)

 

XPO uses separate classes for each major concern such as mapping, persistence, connection providers, and data access. Each class has a clearly defined purpose and responsibility.

 

Open-Closed Principle (OCP)

 

XPO is extensible and customizable, allowing you to create your own classes and derive them from the XPO base classes. XPO also provides a range of extension points and hooks to allow for customization and extension without modifying the core XPO code.

 

Liskov Substitution Principle (LSP)

 

XPO follows this principle by providing a uniform API that works with all persistent objects, regardless of their concrete implementation. This allows you to write code that works with any persistent object, without having to worry about the specific implementation details.

 

Interface Segregation Principle (ISP)

 

XPO provides a number of interfaces that define specific aspects of its behavior, allowing clients to use only the interfaces they need. This reduces the coupling between the clients and the XPO library.

 

Dependency Inversion Principle (DIP)

 

XPO was developed prior to the widespread popularity of Dependency Injection (DI) patterns and frameworks.

 

As a result, XPO does not incorporate DI as a built-in feature. However, this does not mean that XPO cannot be used in conjunction with DI. While XPO itself does not provide direct support for DI, you can still integrate it with popular DI containers or frameworks, such as the .NET Core DI container or any other one.

 

By integrating XPO with a DI container, you can leverage the benefits of DI principles in your application while utilizing XPO’s capabilities for database access and mapping. The DI container can handle the creation and management of XPO-related objects, facilitating loose coupling, improved modularity, and simplified testing.

 

A clear example is the XPO Extensions for ASP.NET Core DI:

using DevExpress.Xpo.DB; 
Using Microsoft.Extensions.DependencyInjection; // ...

var builder = WebApplication.CreateBuilder(args); builder.Services.AddXpoDefaultUnitOfWork(true, options => options.UseConnectionString(builder.Configuration.GetConnectionString("MSSqlServer")) .UseAutoCreationOption(AutoCreateOption.DatabaseAndSchema) .UseEntityTypes(new Type[] { typeof(User) }));

More information about XPO and DI here: https://docs.devexpress.com/XPO/403009/connect-to-a-data-store/xpo-extensions-for-aspnet-core-di

And that’s all for this post, until next time ))

We are excited to announce that we are currently in the process of writing a comprehensive book about DevExpress XPO. As we work on this project, we believe it is essential to involve our readers and gather their valuable feedback. Therefore, we have decided to share articles from the book as we complete them, giving you an opportunity to provide input and suggestions that we can consider for inclusion in the final release. Keep in mind that the content presented is subject to change. We greatly appreciate your participation in this collaborative effort.

Related Articles

What is an O.R.M (Object-Relational Mapping)

ADO The origin of data access in .NET

Relational database systems: the holy grail of data