ML.NET – Introduction to Machine Learning With C#

Posted by MrD Brains | Updated Date: May 10, 2024


To download the source code for this article, you can visit our GitHub repository.

Introduction

Machine learning is one of the most exciting and rapidly evolving fields in computer science. Lately, we have seen the emergence of advanced AI tools such as ChatGPT. The basis for these tools is machine learning, and its growing prevalence should make developers sit up and take notice.

What Is Machine Learning and How Does ML.NET Enable It?

Machine Learning, or ML for short, is a field of computer science that involves training algorithms to recognize patterns in data. Predictions or decisions are made based on these patterns. The goal of the machine learning model is to predict a new system state based on previous states.

From a C# developer’s point of view, machine learning can be challenging because building and training models requires a lot of specialized knowledge and resources. That is where ML.NET comes into play.

ML.NET is an open-source machine learning framework that makes it simpler for C# developers to build and deploy machine learning models. It provides a range of algorithms for supervised and unsupervised learning, as well as tools for data preparation, training, evaluation, and deployment.

Key advantage: ML.NET integrates seamlessly with the .NET ecosystem. Build machine learning models and deploy them alongside your code without learning new languages or tools.
Performance optimization: ML.NET features hyperparameter tuning, feature selection, and automatic model selection—helping you build accurate and reliable models, even with large datasets.

Setting Up the Development Environment

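The first step is to install the required ML.NET package. From the Package Manager Console in Visual Studio:

PM> Install-Package Microsoft.ML

Alternatively, install it through the NuGet Package Manager UI, or run dotnet add package Microsoft.ML from a terminal. Multiple ML.NET packages are available, but for most projects, including this one, the Microsoft.ML package is enough.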

Understanding Supervised Learning

There are three main categories of machine learning models:

Supervised learning
Unsupervised learning
Semi-supervised learning

Unsupervised learning: Train a model on unlabeled data to find patterns.
Semi-supervised learning: Combine labeled and unlabeled data.
Supervised learning (the focus of this article): Train a model on labeled data, where each data point has a known outcome.

Common supervised learning algorithms: linear regression, logistic regression, decision trees, etc. Each algorithm fits different problems and data types.

Key benefit: Supervised learning lets us build models for applications such as customer behavior prediction or fraud detection. The trade-off is that it requires labeled data.
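To make "training on labeled data" concrete, here is a minimal sketch of one of the simplest supervised learners, a 1-nearest-neighbor classifier. It is not what ML.NET does internally and it is not used later in this article. The LabeledPoint and NearestNeighborSketch names are hypothetical, chosen only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A labeled example: numeric features plus the known outcome.
public record LabeledPoint(float[] Features, string Label);

public static class NearestNeighborSketch
{
    // Predicts the label of the training example closest to the query.
    // "Training" for this learner is nothing more than storing the labeled data.
    public static string Predict(IReadOnlyList<LabeledPoint> training, float[] query)
    {
        if (training.Count == 0)
            throw new ArgumentException("Training set must not be empty.", nameof(training));

        return training
            .OrderBy(p => SquaredDistance(p.Features, query))
            .First()
            .Label;
    }

    // Squared Euclidean distance between two feature vectors of equal length.
    private static double SquaredDistance(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Feature vectors must have the same length.");

        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```

Given a handful of labeled points, calling NearestNeighborSketch.Predict with a new feature vector returns the label of the most similar known example. Every supervised algorithm follows the same pattern: known outcomes go in, and predictions for unseen data come out.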

Building and Training a Simple Model With ML.NET

We'll use the Credit Risk Customers dataset, which has 21 columns. We use credit_amount, duration, and age as input features.
The class column is the one we want to predict.

Defining ModelInput and ModelOutput Classes

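using Microsoft.ML.Data;

// One row of the dataset. LoadColumn is the zero-based column index in the CSV file,
// and ColumnName is the name the column gets inside the ML.NET pipeline.
public class ModelInput
{
    [ColumnName("duration"), LoadColumn(1)]
    public float Duration { get; set; }
    [ColumnName("credit_amount"), LoadColumn(4)]
    public float CreditAmount { get; set; }
    [ColumnName("age"), LoadColumn(12)]
    public float Age { get; set; }
    // The label we want to predict
    [ColumnName("class"), LoadColumn(20)]
    public string Class { get; set; }
}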

// Prediction result model
public class ModelOutput
{
    [ColumnName("PredictedLabel")]
    public string Prediction { get; set; }
}

Defining the ModelBuilder Class

public class ModelBuilder
{
    private MLContext _mlContext = new MLContext(seed: 0);
    private PredictionEngine<ModelInput, ModelOutput> _predictionEngine;
    private IDataView _trainingDataView;
    private IDataView _testDataView;
    private ITransformer _mlModel;
    ...
}

_mlContext – the shared context for all ML.NET operations (created with a fixed seed so results are repeatable)
_predictionEngine – makes predictions on new data
_trainingDataView – the training set
_testDataView – the test set
_mlModel – stores the trained model

Model Creation Method

public void CreateModel(string dataFilePath, string savingPath)
{
    LoadAndSplitData(dataFilePath);
    var pipeline = PreProcessData();

    BuildAndTrainModel(_trainingDataView, pipeline);

    EvaluateModel();
    SaveModel(savingPath);
    _predictionEngine = _mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(_mlModel);
}

Step-by-Step ML.NET Model Development

Collecting Data

private void LoadAndSplitData(string dataFilePath)
{
    var allDataView = _mlContext.Data.LoadFromTextFile<ModelInput>(
                              path: dataFilePath,
                              hasHeader: true,
                              separatorChar: ',');
    var split = _mlContext.Data.TrainTestSplit(allDataView, testFraction: 0.1);
    _trainingDataView = split.TrainSet;
    _testDataView = split.TestSet;
}

Load data from CSV (comma separated, header row)
Split data: 90% train, 10% test
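Conceptually, a train/test split shuffles the rows and sets a fraction aside for testing. The following is a minimal plain C# sketch of that idea. The SplitSketch class is hypothetical, and ML.NET's own TrainTestSplit is implemented differently, so it may not produce exactly the same partition or exact row counts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitSketch
{
    // Shuffles the rows with a seeded random generator and reserves
    // roughly testFraction of them for testing.
    public static (List<T> Train, List<T> Test) TrainTestSplit<T>(
        IEnumerable<T> rows, double testFraction, int seed)
    {
        if (testFraction <= 0 || testFraction >= 1)
            throw new ArgumentOutOfRangeException(nameof(testFraction), "Must be between 0 and 1.");

        var shuffled = rows.ToList();
        var rng = new Random(seed);

        // Fisher-Yates shuffle: every ordering is equally likely.
        for (int i = shuffled.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (shuffled[i], shuffled[j]) = (shuffled[j], shuffled[i]);
        }

        int testCount = (int)Math.Round(shuffled.Count * testFraction);
        var test = shuffled.Take(testCount).ToList();
        var train = shuffled.Skip(testCount).ToList();
        return (train, test);
    }
}
```

Using a fixed seed, as the MLContext in this article does, keeps the split repeatable between runs, which makes it easier to compare metrics after changing the pipeline.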

Preparing The Data

public IEstimator<ITransformer> PreProcessData()
{
    // Append returns a new estimator chain instead of modifying the original,
    // so we must return the chained result or the Concatenate step is lost
    return _mlContext.Transforms.Conversion
        .MapValueToKey(inputColumnName: "class", outputColumnName: "Label")
        .Append(_mlContext.Transforms.Concatenate("Features", "duration", "credit_amount", "age"));
}

Convert class to numeric label (MapValueToKey)
Combine duration, credit_amount, age into a features vector
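The two preparation steps can be illustrated without ML.NET. The sketch below assumes keys are assigned in order of first appearance, starting at 1, which is my understanding of the default MapValueToKey behaviour (0 is reserved for missing values). PreprocessSketch and its members are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PreprocessSketch
{
    // Assigns each distinct value a 1-based key in order of first appearance.
    // Assumes values are non-null.
    public static (uint[] Keys, List<string> Vocabulary) MapValuesToKeys(IEnumerable<string> values)
    {
        var vocabulary = new List<string>();
        var lookup = new Dictionary<string, uint>();
        var keys = new List<uint>();

        foreach (var value in values)
        {
            if (!lookup.TryGetValue(value, out uint key))
            {
                vocabulary.Add(value);
                key = (uint)vocabulary.Count; // first new value gets key 1
                lookup[value] = key;
            }
            keys.Add(key);
        }
        return (keys.ToArray(), vocabulary);
    }

    // The inverse mapping: turns a 1-based key back into its original value.
    public static string MapKeyToValue(uint key, IReadOnlyList<string> vocabulary)
        => vocabulary[(int)key - 1];

    // Packs several scalar columns of one row into a single feature vector.
    public static float[] Concatenate(params float[] columns)
        => (float[])columns.Clone();
}
```

The inverse mapping is the reason the training pipeline later ends with a MapKeyToValue step: the trainer works with numeric keys, while the caller wants the original string label back.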

Training the Model

public IEstimator<ITransformer> BuildAndTrainModel(IDataView trainingDataView, IEstimator<ITransformer> pipeline)
{
    var trainingPipeline = pipeline
            .Append(_mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features"))
            .Append(_mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));
    _mlModel = trainingPipeline.Fit(trainingDataView);
    return trainingPipeline;
}

Choose the SdcaMaximumEntropy algorithm
Use MapKeyToValue to convert predicted labels back to strings
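The SdcaMaximumEntropy trainer used in the code above fits a maximum entropy model, a generalization of logistic regression to multiple classes. Such a model produces one score per class and converts the scores into probabilities with the softmax function. The sketch below shows only that scoring step, not the SDCA optimization that learns the weights. SoftmaxSketch is a hypothetical name.

```csharp
using System;
using System.Linq;

public static class SoftmaxSketch
{
    // Converts raw per-class scores into probabilities that sum to 1.
    // Subtracting the maximum first avoids overflow in Math.Exp.
    public static double[] Softmax(double[] scores)
    {
        if (scores.Length == 0)
            throw new ArgumentException("At least one score is required.", nameof(scores));

        double max = scores.Max();
        double[] exps = scores.Select(s => Math.Exp(s - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => e / sum).ToArray();
    }

    // The predicted class is the index of the highest score.
    public static int PredictClass(double[] scores)
        => Array.IndexOf(scores, scores.Max());
}
```

Because softmax preserves the ordering of the scores, the predicted class is the same whether it is picked from the raw scores or from the probabilities. The probabilities matter for metrics such as log-loss.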

Evaluating the Model

public void EvaluateModel()
{
    var testMetrics = _mlContext.MulticlassClassification.Evaluate(_mlModel.Transform(_testDataView));
    Console.WriteLine($"- MicroAccuracy:\t{testMetrics.MicroAccuracy:0.###}");
    Console.WriteLine($"- MacroAccuracy:\t{testMetrics.MacroAccuracy:0.###}");
    Console.WriteLine($"- LogLoss:\t\t{testMetrics.LogLoss:0.###}");
    Console.WriteLine($"- LogLossReduction:\t{testMetrics.LogLossReduction:0.###}");
}

Sample Evaluation Metrics:
MicroAccuracy: 0.796
MacroAccuracy: 0.514
LogLoss: 34.539
LogLossReduction: -69.23

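The micro accuracy looks decent, but the other metrics do not. Macro accuracy averages the accuracy of each class, so a value this much lower than the micro accuracy suggests the model favours the most common class. The negative log-loss reduction means the predicted probabilities are worse than a baseline that always predicts the class frequencies, so there is clear room for improvement.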
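The gap between micro and macro accuracy is easier to see with a small calculation. The sketch below computes both metrics by hand, assuming micro accuracy is the fraction of all rows predicted correctly and macro accuracy is the unweighted average of per-class accuracy. AccuracySketch is a hypothetical helper, not part of ML.NET.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AccuracySketch
{
    // Fraction of all rows that were predicted correctly.
    public static double MicroAccuracy(IReadOnlyList<string> actual, IReadOnlyList<string> predicted)
    {
        CheckLengths(actual, predicted);
        int correct = Enumerable.Range(0, actual.Count).Count(i => actual[i] == predicted[i]);
        return (double)correct / actual.Count;
    }

    // Accuracy computed separately for each actual class, then averaged
    // so that every class counts equally regardless of its size.
    public static double MacroAccuracy(IReadOnlyList<string> actual, IReadOnlyList<string> predicted)
    {
        CheckLengths(actual, predicted);
        return Enumerable.Range(0, actual.Count)
            .GroupBy(i => actual[i])
            .Average(g => g.Count(i => actual[i] == predicted[i]) / (double)g.Count());
    }

    private static void CheckLengths(IReadOnlyList<string> actual, IReadOnlyList<string> predicted)
    {
        if (actual.Count == 0 || actual.Count != predicted.Count)
            throw new ArgumentException("Inputs must be non-empty and of equal length.");
    }
}
```

With eight rows labeled "good", two labeled "bad", and a model that always answers "good", micro accuracy is 0.8 while macro accuracy is 0.5, because the "bad" class is never predicted correctly. The toy numbers are made up for illustration, but they show the same pattern as the sample metrics above.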

Saving and Loading the Model

private void SaveModel(string saveModelPath)
{
    _mlContext.Model.Save(_mlModel, _trainingDataView.Schema, 
        Path.Combine(Environment.CurrentDirectory, saveModelPath));
}

public void LoadModel(string path)
{
    _mlModel = _mlContext.Model.Load(path, out _);
    _predictionEngine = _mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(_mlModel);
}

Prediction Usage Example

public ModelOutput Predict(ModelInput input)
{
    return _predictionEngine.Predict(input);
}

// Usage:
const string savedModelFilename = "trainedModel.zip";
var modelBuilder = new ModelBuilder();
modelBuilder.LoadModel(savedModelFilename);
var modelInput = new ModelInput()
{
    Age = 300,
    CreditAmount = 100000,
    Duration = 120
};
var prediction = modelBuilder.Predict(modelInput);
Console.WriteLine($"\nExample input class is {prediction.Prediction.ToUpper()}!");

Usage is straightforward once your model is trained and saved.

Improving Model Performance

Algorithm Selection – Choose algorithms suited for the problem/data characteristics
Feature Engineering – Create, transform, or select informative features
Hyperparameter Tuning – Tune algorithm parameters using grid/random search
Regularization – Add penalty terms to reduce overfitting
Ensemble Methods – Combine multiple models (bagging, boosting, stacking)
Data Augmentation – Generate more training data via transformations
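As one small example of feature engineering, the features in this article sit on very different scales (credit_amount can be in the thousands, while age and duration are much smaller), and linear trainers often behave better when features are rescaled. The sketch below shows plain min-max scaling of a single column. ScalingSketch is a hypothetical helper. ML.NET ships its own normalizing transforms, such as NormalizeMinMax, which would normally be used inside the pipeline and whose exact behaviour may differ from this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScalingSketch
{
    // Rescales a column linearly so its smallest value maps to 0 and its largest to 1.
    // A constant column has no range, so it maps to all zeros.
    public static float[] MinMaxScale(IReadOnlyList<float> column)
    {
        if (column.Count == 0)
            return Array.Empty<float>();

        float min = column.Min();
        float max = column.Max();
        float range = max - min;

        return column
            .Select(v => range == 0f ? 0f : (v - min) / range)
            .ToArray();
    }
}
```

In a real project the scaling parameters must come from the training set only and then be reused for the test set and for new inputs. This is one reason to prefer a transform that is part of the trained pipeline over ad hoc preprocessing code.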

What Can We Do With ML.NET?

Image and Video Analysis – Classification, object/face detection
Natural Language Processing – Sentiment analysis, named entity recognition, translation
Anomaly Detection – Outlier or fraud detection, intrusion detection
Predictive Modeling – Forecasting trends, pattern recognition
Recommender Systems – Personalized suggestions (products, movies, etc.)
Speech Recognition – Voice analysis, synthesis, speaker ID

Conclusion

In this article, we’ve covered the basics of machine learning and explored how to create a simple ML model in C# using ML.NET. We also learned about techniques to improve model performance and several interesting ML.NET scenarios.

ML.NET is a great addition to the Microsoft stack, enabling C# developers to keep pace with this exciting, rapidly evolving technology.