Introduction to Machine Learning in C#: Spam using Binary Classification

This example demonstrates the basics of machine learning in C# using ML.NET, Microsoft’s machine learning framework specifically designed for .NET applications. ML.NET offers a versatile, cross-platform framework that simplifies integrating machine learning into .NET applications, making it accessible for developers familiar with the .NET ecosystem.

Technologies Used

  • C#: A modern, object-oriented programming language developed by Microsoft, which is widely used for a variety of applications. In this example, C# is used to define data models, process data, and implement the machine learning pipeline.
  • ML.NET: An open-source and cross-platform machine learning framework for .NET. It is used in this example to create a machine learning model for classifying emails as spam or not spam. ML.NET simplifies the process of training, evaluating, and consuming machine learning models in .NET applications.
  • .NET Core: A cross-platform version of .NET for building applications that run on Windows, Linux, and macOS. It provides the runtime environment for our C# application.

The example focuses on a simple spam detection system. It utilizes text data processing and binary classification, two common tasks in machine learning, to classify emails into spam and non-spam categories. This is achieved through the use of a logistic regression model, a fundamental algorithm for binary classification problems.

Creating an NUnit Test Project in Visual Studio Code

 

           Setting up NUnit for DecisionTreeDemo

 

    • Install .NET Core SDK

      Download and install the .NET Core SDK from the .NET official website.

    • Install Visual Studio Code

      Download and install Visual Studio Code (VS Code) from here. Also, install the C# extension for VS Code by Microsoft.

    • Create a New .NET Core Project

      Open VS Code, and in the terminal, create a new .NET Core project:

      dotnet new console -n DecisionTreeDemo
      cd DecisionTreeDemo
    • Add the ML.NET Package

      Add the ML.NET package to your project:

      dotnet add package Microsoft.ML
    • Create the Test Project

      Create a separate directory for your test project, then initialize a new test project:

          
      mkdir DecisionTreeDemo.Tests
      cd DecisionTreeDemo.Tests
      dotnet new nunit
    • Add Required Packages to Test Project

      Add the necessary NUnit and ML.NET packages:

      dotnet add package NUnit
      dotnet add package Microsoft.NET.Test.Sdk
      dotnet add package NUnit3TestAdapter
      dotnet add package Microsoft.ML
    • Reference the Main Project

      Reference the main project:

          dotnet add reference ../DecisionTreeDemo/DecisionTreeDemo.csproj
    • Write Test Cases

      Write NUnit test cases within your test project to test different functionalities of your ML.NET application.

      Define the Data Model for the Email

      Include the content of the email and whether it’s classified as spam.

          
      public class Email
      {
          [LoadColumn(0)]
          public string Content { get; set; }
      
          [LoadColumn(1), ColumnName("Label")]
          public bool IsSpam { get; set; }
      }
      

      Define the Model for Spam Prediction

      This model is used to determine whether an email is spam.

        
      public class SpamPrediction
      {
          [ColumnName("PredictedLabel")]
          public bool IsSpam { get; set; }
      }
      

      Write the test case

             
      // Create a new ML context for the application, which is a starting point for ML.NET operations.
              var mlContext = new MLContext();
      
              // Example dataset of emails. In a real-world scenario, this would be much larger and possibly loaded from an external source.
              var data = new List
              {
                  new Email { Content = "Buy cheap products now", IsSpam = true },
                  new Email { Content = "Meeting at 3 PM", IsSpam = false },
                  // Additional data can be added here...
              };
      
              // Load the data into the ML.NET data model.
              var trainData = mlContext.Data.LoadFromEnumerable(data);
      
              // Define the data processing pipeline. Here we are featurizing the text (i.e., converting text into numeric features) and then     applying a logistic regression model.
              var pipeline = mlContext.Transforms.Text.FeaturizeText("Features", nameof(Email.Content))
                  .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression());
      
              // Train the model on the loaded data.
              var model = pipeline.Fit(trainData);
      
              // Create a prediction engine for making predictions on individual data samples.
              var predictionEngine = mlContext.Model.CreatePredictionEngine<Email, SpamPrediction>(model);
      
              // Create a sample email to test the model.
              var sampleEmail = new Email { Content = "Special discount, buy now!" };
              var prediction = predictionEngine.Predict(sampleEmail);
      
              // Output the prediction to the console.
              Debug.WriteLine($"Email: '{sampleEmail.Content}' is {(prediction.IsSpam ? "spam" : "not spam")}");
              Assert.IsTrue(prediction.IsSpam);
      
    • Running Tests

      Run the tests with the following command:

      dotnet test

As you can see the test will pass because the sample email contains the word “buy” that was used in the training data and was labeled as spam

You can download the source code for this article here

This article has explored the fundamentals of machine learning in C# using the ML.NET framework. By defining specific data models and utilizing ML.NET’s powerful features, we demonstrated how to build a simple yet effective spam detection system. This example serves as a gateway into the vast world of machine learning, showcasing the potential for integrating AI technologies into .NET applications. The skills and concepts learned here lay the groundwork for further exploration and development in the exciting field of machine learning and artificial intelligence.