Posted by MrD Brains | Updated Date: May 10, 2024
To download the source code for this article, you can visit our GitHub repository.
Machine learning is one of the most exciting and rapidly evolving fields in computer science. Lately, we have seen the emergence of advanced AI tools such as ChatGPT. The basis for these tools is machine learning, and its growing prevalence should make developers sit up and take notice.
Machine Learning, or ML for short, is a field of computer science that involves training algorithms to recognize patterns in data. Predictions or decisions are made based on these patterns. The goal of the machine learning model is to predict a new system state based on previous states.
ML.NET is an open-source machine learning framework that makes it simpler for C# developers to build and deploy machine learning models. It provides a range of algorithms for supervised and unsupervised learning, as well as tools for data preparation, training, evaluation, and deployment.
The first step is to install the required ML.NET packages, using the command line:
PM> Install-Package Microsoft.ML
Or via NuGet Package Manager. Multiple ML.NET packages are available; for most projects, install only
the Microsoft.ML
package.
3 main categories of machine learning models:
Unsupervised learning: Train a model on unlabeled data to find patterns.
Semi-supervised learning: Combine labeled and unlabeled data.
Supervised learning: (focus of this article) Train a model on labeled data (each data point known
outcome).
Common supervised learning algorithms: linear regression, logistic regression, decision trees, etc. Each algorithm fits different problems and data types.
We'll use the Credit Risk Customers dataset (21 columns) and focus on credit_amount
,
duration
, age
, and class
.
The class
column is the one we want to predict.
public class ModelInput
{
[ColumnName("duration"), LoadColumn(1)]
public float Duration { get; set; }
[ColumnName("credit_amount"), LoadColumn(4)]
public float CreditAmount { get; set; }
[ColumnName("age"), LoadColumn(12)]
public float Age { get; set; }
[ColumnName("class"), LoadColumn(20)]
public string Class { get; set; }
}
// Prediction result model
public class ModelOutput
{
[ColumnName("PredictedLabel")]
public string Prediction { get; set; }
}
public class ModelBuilder
{
private MLContext _mlContext = new MLContext(seed: 0);
private PredictionEngine<ModelInput, ModelOutput> _predictionEngine;
private IDataView _trainingDataView;
private IDataView _testDataView;
private ITransformer _mlModel;
...
}
_mlContext
– context for all ML.NET operations_predictionEngine
– make predictions on new data_trainingDataView
– train set_testDataView
– test set_mlModel
– stores the trained modelpublic void CreateModel(string dataFilePath, string savingPath)
{
LoadAndSplitData(dataFilePath);
var pipeline = PreProcessData();
BuildAndTrainModel(_trainingDataView, pipeline);
EvaluateModel();
SaveModel(savingPath);
_predictionEngine = _mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(_mlModel);
}
private void LoadAndSplitData(string dataFilePath)
{
var allDataView = _mlContext.Data.LoadFromTextFile<ModelInput>(
path: dataFilePath,
hasHeader: true,
separatorChar: ',');
var split = _mlContext.Data.TrainTestSplit(allDataView, testFraction: 0.1);
_trainingDataView = split.TrainSet;
_testDataView = split.TestSet;
}
public IEstimator PreProcessData()
{
var pipeline = _mlContext.Transforms.Conversion
.MapValueToKey(inputColumnName: "class", outputColumnName: "Label");
pipeline.Append(_mlContext.Transforms.Concatenate("Features", "duration", "credit_amount", "age"));
return pipeline;
}
class
to numeric label (MapValueToKey
)duration
, credit_amount
, age
into a features vector
public IEstimator<ITransformer> BuildAndTrainModel(IDataView trainingDataView, IEstimator<ITransformer> pipeline)
{
var trainingPipeline = pipeline
.Append(_mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features"))
.Append(_mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));
_mlModel = trainingPipeline.Fit(trainingDataView);
return trainingPipeline;
}
SdcaNonCalibrated
algorithmMapKeyToValue
to convert predicted labels back to stringspublic void EvaluateModel()
{
var testMetrics = _mlContext.MulticlassClassification.Evaluate(_mlModel.Transform(_testDataView));
Console.WriteLine($"- MicroAccuracy:\t{testMetrics.MicroAccuracy:0.###}");
Console.WriteLine($"- MacroAccuracy:\t{testMetrics.MacroAccuracy:0.###}");
Console.WriteLine($"- LogLoss:\t\t{testMetrics.LogLoss:#.###}");
Console.WriteLine($"- LogLossReduction:\t{testMetrics.LogLossReduction:#.###}");
}
Sample Evaluation Metrics:
MicroAccuracy: 0.796
MacroAccuracy: 0.514
LogLoss: 34.539
LogLossReduction: -69.23
Decent accuracy, but room for improvement in macro accuracy and log loss.
private void SaveModel(string saveModelPath)
{
_mlContext.Model.Save(_mlModel, _trainingDataView.Schema,
Path.Combine(Environment.CurrentDirectory, saveModelPath));
}
public void LoadModel(string path)
{
_mlModel = _mlContext.Model.Load(path, out _);
_predictionEngine = _mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(_mlModel);
}
public ModelOutput Predict(ModelInput input)
{
return _predictionEngine.Predict(input);
}
// Usage:
private static string savedModelFilename = "trainedModel.zip";
var modelBuilder = new ModelBuilder();
modelBuilder.LoadModel(savedModelFilename);
var modelInput = new ModelInput()
{
Age = 300,
CreditAmount = 100000,
Duration = 120
};
var prediction = modelBuilder?.Predict(modelInput);
Console.WriteLine($"\nExample input class is {prediction?.Prediction.ToUpper()}!");
Usage is straightforward once your model is trained and saved.
In this article, we’ve covered the basics of machine learning and explored how to create a simple ML model in C# using ML.NET. We also learned about techniques to improve model performance and several interesting ML.NET scenarios.
ML.NET is a great addition to the Microsoft stack, enabling C# developers to keep pace with this exciting, rapidly evolving technology.