The Steps to Create, Train, Save, and Load a Spam Detection AI Model Using ML.NET

This article walks through creating, training, saving, and loading a spam detection model with ML.NET, with an emphasis on reusing the trained model. By following the steps, you will end up with a model that is trained once, saved to disk, and loaded into your .NET applications to identify and filter out spam emails.

Prerequisites

  • Basic understanding of C#
  • Familiarity with ML.NET and machine learning concepts

Code Overview

    1. Import necessary namespaces:

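      using System;
      using System.Collections.Generic; // List<T> for the sample dataset
      using System.Diagnostics;         // Debug.WriteLine
      using System.IO;
      using System.Linq;
      using Microsoft.ML;
      using Microsoft.ML.Data;
      using NUnit.Framework;            // Assert.Pass and Assert.Fail used below are NUnit APIs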
    
    1. Define the Email class and its properties:

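      public class Email
      {
        public string Content { get; set; }
        public bool IsSpam { get; set; }
      }
      // Output type used by the prediction engine below (assumed shape).
      public class SpamPrediction
      {
        [ColumnName("PredictedLabel")] public bool IsSpam { get; set; }
      }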
    
    1. Create a sample dataset for training the model:

      var sampleData = new List<Email>
      {
        new Email { Content = "Buy cheap products now", IsSpam = true },
        new Email { Content = "Meeting at 3 PM", IsSpam = false },
      };
    
    1. Initialize a new MLContext, which is the main entry point to ML.NET:

      var mlContext = new MLContext();
    
    1. Load the sample data into an IDataView:

      var trainData = mlContext.Data.LoadFromEnumerable(sampleData);
    
    1. Define the data processing pipeline and the training algorithm (SdcaLogisticRegression):

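      // FeaturizeText turns the raw text into a numeric "Features" vector.
      // The trainer looks for a column named "Label" by default, so point it at IsSpam.
      var pipeline = mlContext.Transforms.Text.FeaturizeText("Features", nameof(Email.Content))
        .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
          labelColumnName: nameof(Email.IsSpam), featureColumnName: "Features"));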
    
    1. Train the model:

      var model = pipeline.Fit(trainData);
    
    1. Save the trained model to a file (ML.NET writes it as a .zip archive):

      mlContext.Model.Save(model, trainData.Schema, "model.zip");
    
    1. Load the saved model:

      var newMlContext = new MLContext();
      DataViewSchema modelSchema;
      ITransformer trainedModel = newMlContext.Model.Load("model.zip", out modelSchema);
    
    1. Create a prediction engine:

      // Use the context that loaded the model, as a separate application would.
      var predictionEngine = newMlContext.Model.CreatePredictionEngine<Email, SpamPrediction>(trainedModel);
    
    1. Test the model with a sample email:

      var sampleEmail = new Email { Content = "Special discount, buy now!" };
      var prediction = predictionEngine.Predict(sampleEmail);
    
    1. Output the prediction:

      Debug.WriteLine($"Email: '{sampleEmail.Content}' is {(prediction.IsSpam ? "spam" : "not spam")}");
    
    1. Assert that the prediction is correct:

      Assert.IsTrue(prediction.IsSpam);
    
    1. Verify that the model was saved:

      if(File.Exists("model.zip"))
        Assert.Pass();
      else
        Assert.Fail();
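
The load and predict steps above can be folded into one reusable class. The following is a minimal sketch rather than part of the original listing: the `SpamClassifier` name and its members are hypothetical, and the `Email` and `SpamPrediction` types are repeated only so that the snippet compiles on its own.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

// Input and output types, repeated here so the sketch is self-contained.
public class Email
{
    public string Content { get; set; }
    public bool IsSpam { get; set; }
}

public class SpamPrediction
{
    // Assumed shape: maps the trainer's "PredictedLabel" output column.
    [ColumnName("PredictedLabel")]
    public bool IsSpam { get; set; }
}

// Hypothetical wrapper (not from the article): loads the saved model once
// and reuses a single prediction engine for every call.
public sealed class SpamClassifier : IDisposable
{
    private readonly PredictionEngine<Email, SpamPrediction> engine;

    public SpamClassifier(string modelPath)
    {
        var mlContext = new MLContext();
        ITransformer model = mlContext.Model.Load(modelPath, out DataViewSchema _);
        engine = mlContext.Model.CreatePredictionEngine<Email, SpamPrediction>(model);
    }

    // Returns true when the loaded model labels the text as spam.
    public bool IsSpam(string content) =>
        engine.Predict(new Email { Content = content }).IsSpam;

    public void Dispose() => engine.Dispose();
}
```

A caller would write `using var classifier = new SpamClassifier("model.zip");` and then `classifier.IsSpam("Special discount, buy now!")`. `PredictionEngine` is not thread-safe, so a multi-threaded application would need one instance per thread or the `PredictionEnginePool` service from the Microsoft.Extensions.ML package.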
    

Conclusion

In this article, we built a simple spam detection model in ML.NET and showed how to train it, save it to disk, load it back, and test it. The code can be extended to build more complex models and used as a starting point for exploring machine learning in .NET.

Decision Trees and Naive Bayes Classifiers

Decision Trees

Overview:

  • Decision trees are a type of supervised learning algorithm used for classification and regression tasks.
  • They work by repeatedly splitting a dataset into smaller subsets, growing the associated decision tree one node at a time.
  • The final model is a tree with decision nodes and leaf nodes. A decision node has two or more branches, and a leaf node represents a classification or decision.
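
One common way to decide how to split the data is to pick the attribute with the highest information gain, which is the criterion ID3-style learners use. The sketch below illustrates that single step under stated assumptions: the `SplitMath` class, the boolean attributes, and the six-row toy dataset are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy dataset (invented): did we play, given rain and humidity?
var rows = new List<(bool Raining, bool HumidityHigh, bool Play)>
{
    (true,  true,  false),
    (true,  false, false),
    (false, true,  false),
    (false, false, true),
    (false, false, true),
    (false, true,  true),
};

double gainRaining = SplitMath.Gain(rows.Select(r => (r.Raining, r.Play)));
double gainHumidity = SplitMath.Gain(rows.Select(r => (r.HumidityHigh, r.Play)));
Console.WriteLine("Gain(raining)  = " + gainRaining.ToString("F3"));
Console.WriteLine("Gain(humidity) = " + gainHumidity.ToString("F3"));

public static class SplitMath
{
    // Shannon entropy (in bits) of a collection of boolean labels.
    public static double Entropy(IReadOnlyCollection<bool> labels)
    {
        if (labels.Count == 0) return 0.0;
        double p = labels.Count(l => l) / (double)labels.Count;
        if (p == 0.0 || p == 1.0) return 0.0;
        return -p * Math.Log(p, 2) - (1 - p) * Math.Log(1 - p, 2);
    }

    // Information gain of splitting on one boolean attribute:
    // parent entropy minus the size-weighted entropy of the two subsets.
    public static double Gain(IEnumerable<(bool Attribute, bool Label)> data)
    {
        var all = data.ToList();
        var yes = all.Where(d => d.Attribute).Select(d => d.Label).ToList();
        var no = all.Where(d => !d.Attribute).Select(d => d.Label).ToList();
        double parent = Entropy(all.Select(d => d.Label).ToList());
        double children = (yes.Count * Entropy(yes) + no.Count * Entropy(no)) / all.Count;
        return parent - children;
    }
}
```

On this toy data, splitting on rain yields the larger gain, so a learner using this criterion would place that question at the root and then repeat the process on each subset.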

Brief History:

  • The concept of decision trees can be traced back to the work of R.A. Fisher in the 1930s, but modern decision tree algorithms emerged in the 1960s and 1970s.
  • One of the best-known decision tree algorithms, ID3 (Iterative Dichotomiser 3), was developed by Ross Quinlan and published in 1986.
  • Subsequently, Quinlan developed the C4.5 algorithm, which became a standard in the field.

Simple Example:

Imagine a decision tree used to decide if one should play tennis based on weather conditions. The tree might have decision nodes like ‘Is it raining?’ or ‘Is the humidity high?’ leading to outcomes like ‘Play’ or ‘Don’t Play’.
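
The tennis example can be written down as a small hand-built tree. This is a minimal sketch, not a learned model: the `Weather` record, the `Node` class, the `TennisTree` helper, and the particular arrangement of questions and outcomes are assumptions made for illustration.

```csharp
using System;

// Hand-built tree (illustrative, not learned from data):
//   Is it raining?  yes -> Don't Play
//                   no  -> Is the humidity high?  yes -> Don't Play
//                                                 no  -> Play
var tree = TennisTree.Build();

Console.WriteLine(tree.Classify(new Weather(IsRaining: false, IsHumidityHigh: false))); // Play
Console.WriteLine(tree.Classify(new Weather(IsRaining: true, IsHumidityHigh: false)));  // Don't Play

// Hypothetical input type: the two weather attributes the example asks about.
public record Weather(bool IsRaining, bool IsHumidityHigh);

// A node is either a decision node (Question set, two branches) or a leaf (Outcome set).
public class Node
{
    public Func<Weather, bool> Question { get; init; }
    public Node IfYes { get; init; }
    public Node IfNo { get; init; }
    public string Outcome { get; init; }

    // Walk from this node down to a leaf, following the answer at each decision node.
    public string Classify(Weather w)
    {
        var node = this;
        while (node.Question != null)
            node = node.Question(w) ? node.IfYes : node.IfNo;
        return node.Outcome;
    }
}

public static class TennisTree
{
    public static Node Build()
    {
        var play = new Node { Outcome = "Play" };
        var dontPlay = new Node { Outcome = "Don't Play" };
        var humidity = new Node { Question = w => w.IsHumidityHigh, IfYes = dontPlay, IfNo = play };
        return new Node { Question = w => w.IsRaining, IfYes = dontPlay, IfNo = humidity };
    }
}
```

Because every prediction is a path of answered questions from the root to a leaf, the reason for each outcome can be read directly off the tree.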

Naive Bayes Classifiers

Overview:

  • Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes’ theorem with strong independence assumptions between the features.
  • They are highly scalable and can handle a large number of features, making them suitable for text classification, spam filtering, and even medical diagnosis.
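
Written out, the independence assumption lets the class posterior factor into one term per feature. In the notation below, which is introduced here for illustration, $C$ is the class and $x_1, \dots, x_n$ are the observed features; the classifier predicts the class $\hat{C}$ with the highest score:

```latex
P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C),
\qquad
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

Each factor $P(x_i \mid C)$ can be estimated independently from counts, which is why the method scales well to a large number of features.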

Brief History:

  • The foundation of Naive Bayes is Bayes’ theorem, formulated by Thomas Bayes in the 18th century.
  • However, the ‘naive’ version, assuming feature independence, was developed and gained prominence in the 20th century, particularly in the 1950s and 1960s.
  • Naive Bayes has remained popular due to its simplicity, effectiveness, and efficiency.

Simple Example:

Consider a Naive Bayes classifier for spam detection. It calculates the probability of an email being spam based on the frequency of words typically found in spam emails, such as “prize,” “free,” or “winner.”
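
A minimal sketch of that calculation follows. It assumes a multinomial word-count model with add-one (Laplace) smoothing and log probabilities; the `NaiveBayesSpam` class, the whitespace tokenizer, and the four training messages are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var nb = new NaiveBayesSpam();
nb.Train("win a free prize now", isSpam: true);
nb.Train("free winner claim your prize", isSpam: true);
nb.Train("meeting at 3 pm tomorrow", isSpam: false);
nb.Train("project status and meeting notes", isSpam: false);

Console.WriteLine(nb.IsSpam("claim your free prize"));  // True
Console.WriteLine(nb.IsSpam("notes from the meeting")); // False

public class NaiveBayesSpam
{
    // Index 0 = not spam, index 1 = spam.
    private readonly Dictionary<string, int>[] wordCounts =
        { new Dictionary<string, int>(), new Dictionary<string, int>() };
    private readonly int[] totalWords = new int[2];
    private readonly int[] messageCounts = new int[2];
    private readonly HashSet<string> vocabulary = new HashSet<string>();

    private static string[] Tokenize(string text) =>
        text.ToLowerInvariant().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Count how often each word appears in spam and in non-spam messages.
    public void Train(string text, bool isSpam)
    {
        int c = isSpam ? 1 : 0;
        messageCounts[c]++;
        foreach (var word in Tokenize(text))
        {
            wordCounts[c].TryGetValue(word, out int count); // count is 0 if the word is new
            wordCounts[c][word] = count + 1;
            totalWords[c]++;
            vocabulary.Add(word);
        }
    }

    // log P(class) + sum over words of log P(word | class), with add-one smoothing.
    public double LogScore(string text, bool isSpam)
    {
        int c = isSpam ? 1 : 0;
        double score = Math.Log((double)messageCounts[c] / messageCounts.Sum());
        foreach (var word in Tokenize(text))
        {
            wordCounts[c].TryGetValue(word, out int count);
            score += Math.Log((count + 1.0) / (totalWords[c] + vocabulary.Count));
        }
        return score;
    }

    public bool IsSpam(string text) => LogScore(text, true) > LogScore(text, false);
}
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing keeps a word that was never seen in one class from forcing that class's probability to zero.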

Conclusion

Both decision trees and Naive Bayes classifiers are instrumental in the field of machine learning, each with its strengths and weaknesses. Decision trees are known for their interpretability and simplicity, while Naive Bayes classifiers are appreciated for their efficiency and performance in high-dimensional spaces. Their development and application over the years have significantly contributed to the advancement of machine learning and data science.