ML.NET
Overview
ML.NET is Microsoft's cross-platform machine learning framework for .NET developers. It provides a pipeline-based API for loading data, transforming features, training models, and making predictions without requiring deep ML expertise. ML.NET supports classification, regression, clustering, anomaly detection, recommendation, and time-series forecasting, with AutoML for automated model selection and hyperparameter tuning.
NuGet Packages
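dotnet add package Microsoft.ML
dotnet add package Microsoft.ML.FastTree # FastTree trainers (used in the regression example)
dotnet add package Microsoft.ML.AutoML # AutoML experimentation
dotnet add package Microsoft.ML.TimeSeries # Time-series forecasting
dotnet add package Microsoft.ML.Recommender # Recommendation engine
dotnet add package Microsoft.ML.ImageAnalytics # Image loading and preprocessing
dotnet add package Microsoft.Extensions.ML # PredictionEnginePool for ASP.NET Core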
Data Classes
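ML.NET uses POCOs (Plain Old CLR Objects) to represent input rows and predictions. `LoadColumn` maps a property to a column index in the source file, and `ColumnName` maps a property to a named column in the pipeline.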
using Microsoft.ML.Data;
public class HouseData
{
[LoadColumn(0)] public float Size { get; set; }
[LoadColumn(1)] public float Bedrooms { get; set; }
[LoadColumn(2)] public float Bathrooms { get; set; }
[LoadColumn(3)] public float Age { get; set; }
[LoadColumn(4)] public float Price { get; set; }
}
public class HousePrediction
{
[ColumnName("Score")]
public float PredictedPrice { get; set; }
}
public class SentimentData
{
[LoadColumn(0)] public string? Text { get; set; }
[LoadColumn(1), ColumnName("Label")] public bool Sentiment { get; set; }
}
public class SentimentPrediction
{
[ColumnName("PredictedLabel")]
public bool Prediction { get; set; }
public float Probability { get; set; }
public float Score { get; set; }
}
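The same kind of data class can also be filled from memory instead of a file, which is handy for quick experiments and unit tests. The following is a minimal sketch, assuming a hypothetical `HouseRow` type and made-up values; it uses `LoadFromEnumerable` to build an `IDataView`, prints the inferred schema, and reads the rows back with `CreateEnumerable`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

var mlContext = new MLContext(seed: 42);

// Hypothetical in-memory rows standing in for a CSV file.
var rows = new List<HouseRow>
{
    new() { Size = 1200f, Bedrooms = 2f, Price = 210_000f },
    new() { Size = 1800f, Bedrooms = 3f, Price = 305_000f },
};

// LoadFromEnumerable infers the schema from the public properties of the class.
IDataView data = mlContext.Data.LoadFromEnumerable(rows);

// Print each column name and its ML.NET data type.
foreach (var column in data.Schema)
    Console.WriteLine($"{column.Name}: {column.Type}");

// Convert the IDataView back into strongly typed objects.
List<HouseRow> roundTrip = mlContext.Data
    .CreateEnumerable<HouseRow>(data, reuseRowObject: false)
    .ToList();
Console.WriteLine($"Rows read back: {roundTrip.Count}");

// Illustrative type; not one of the data classes defined above.
public class HouseRow
{
    public float Size { get; set; }
    public float Bedrooms { get; set; }
    public float Price { get; set; }
}
```

No `LoadColumn` attributes are needed here because there is no file column order to map.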
Regression (Price Prediction)
using Microsoft.ML;
var mlContext = new MLContext(seed: 42);
// Load data
IDataView data = mlContext.Data.LoadFromTextFile<HouseData>(
"houses.csv", separatorChar: ',', hasHeader: true);
// Split into training and test sets
var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);
// Build pipeline: feature engineering + training algorithm
var pipeline = mlContext.Transforms.Concatenate(
"Features", nameof(HouseData.Size), nameof(HouseData.Bedrooms),
nameof(HouseData.Bathrooms), nameof(HouseData.Age))
.Append(mlContext.Transforms.NormalizeMinMax("Features"))
.Append(mlContext.Regression.Trainers.FastTree(
labelColumnName: nameof(HouseData.Price),
numberOfLeaves: 20,
numberOfTrees: 100,
minimumExampleCountPerLeaf: 10,
learningRate: 0.2));
// Train the model
var model = pipeline.Fit(split.TrainSet);
// Evaluate on test set
var predictions = model.Transform(split.TestSet);
var metrics = mlContext.Regression.Evaluate(predictions, labelColumnName: nameof(HouseData.Price));
Console.WriteLine($"R-Squared: {metrics.RSquared:F4}");
Console.WriteLine($"RMSE: {metrics.RootMeanSquaredError:F2}");
Console.WriteLine($"MAE: {metrics.MeanAbsoluteError:F2}");
// Make a prediction
var predictor = mlContext.Model.CreatePredictionEngine<HouseData, HousePrediction>(model);
var prediction = predictor.Predict(new HouseData
{
Size = 1500, Bedrooms = 3, Bathrooms = 2, Age = 10
});
Console.WriteLine($"Predicted price: ${prediction.PredictedPrice:N0}");
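A single train/test split can give a noisy estimate on small datasets. As a hedged alternative, the sketch below runs 5-fold cross-validation with `mlContext.Regression.CrossValidate` on the same kind of pipeline. The `HouseRow` type and the synthetic data generator are assumptions made so the example is self-contained, and it requires the `Microsoft.ML.FastTree` package.

```csharp
using System;
using System.Linq;
using Microsoft.ML;

var mlContext = new MLContext(seed: 42);

// Synthetic rows (hypothetical): price is a noisy linear function of the features.
var random = new Random(0);
var rows = Enumerable.Range(0, 200).Select(_ =>
{
    float size = random.Next(600, 3000);
    float bedrooms = random.Next(1, 6);
    float noise = (float)(random.NextDouble() - 0.5) * 10_000f;
    return new HouseRow
    {
        Size = size,
        Bedrooms = bedrooms,
        Price = 150f * size + 10_000f * bedrooms + noise,
    };
}).ToList();
IDataView data = mlContext.Data.LoadFromEnumerable(rows);

var pipeline = mlContext.Transforms.Concatenate(
        "Features", nameof(HouseRow.Size), nameof(HouseRow.Bedrooms))
    .Append(mlContext.Regression.Trainers.FastTree(
        labelColumnName: nameof(HouseRow.Price)));

// Each fold trains on 4/5 of the data and evaluates on the remaining 1/5.
var folds = mlContext.Regression.CrossValidate(
    data, pipeline, numberOfFolds: 5, labelColumnName: nameof(HouseRow.Price));

foreach (var fold in folds)
    Console.WriteLine($"Fold {fold.Fold}: R-Squared = {fold.Metrics.RSquared:F4}");
Console.WriteLine($"Mean R-Squared: {folds.Average(f => f.Metrics.RSquared):F4}");

// Illustrative type; not one of the data classes defined earlier.
public class HouseRow
{
    public float Size { get; set; }
    public float Bedrooms { get; set; }
    public float Price { get; set; }
}
```

Cross-validation trains the pipeline once per fold, so it costs roughly five times a single fit here.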
Binary Classification (Sentiment Analysis)
var mlContext = new MLContext(seed: 42);
IDataView data = mlContext.Data.LoadFromTextFile<SentimentData>(
"sentiment.csv", separatorChar: ',', hasHeader: true,
allowQuoting: true); // Quoted text fields may contain commas
var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);
var pipeline = mlContext.Transforms.Text
.FeaturizeText("Features", nameof(SentimentData.Text))
.Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
labelColumnName: "Label",
featureColumnName: "Features"));
var model = pipeline.Fit(split.TrainSet);
var predictions = model.Transform(split.TestSet);
var metrics = mlContext.BinaryClassification.Evaluate(predictions, "Label");
Console.WriteLine($"Accuracy: {metrics.Accuracy:P2}");
Console.WriteLine($"AUC: {metrics.AreaUnderRocCurve:F4}");
Console.WriteLine($"F1 Score: {metrics.F1Score:F4}");
var predictor = mlContext.Model.CreatePredictionEngine<SentimentData, SentimentPrediction>(model);
var result = predictor.Predict(new SentimentData { Text = "This product is amazing!" });
Console.WriteLine($"Sentiment: {(result.Prediction ? "Positive" : "Negative")} ({result.Probability:P1})");
Multi-Class Classification
public class IssueData
{
[LoadColumn(0)] public string? Title { get; set; }
[LoadColumn(1)] public string? Description { get; set; }
[LoadColumn(2)] public string? Area { get; set; } // Label: bug, feature, docs
}
public class IssuePrediction
{
[ColumnName("PredictedLabel")]
public string? Area { get; set; }
public float[] Score { get; set; } = [];
}
// Assumes mlContext and trainData (an IDataView of IssueData rows) are already defined
var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Label", nameof(IssueData.Area))
.Append(mlContext.Transforms.Text.FeaturizeText("TitleFeatures", nameof(IssueData.Title)))
.Append(mlContext.Transforms.Text.FeaturizeText("DescFeatures", nameof(IssueData.Description)))
.Append(mlContext.Transforms.Concatenate("Features", "TitleFeatures", "DescFeatures"))
.Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy())
.Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));
var model = pipeline.Fit(trainData);
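The pipeline above stops at `Fit`. A minimal end-to-end sketch follows, assuming a tiny hypothetical in-memory dataset in place of `trainData`; it trains the same pipeline, evaluates on a few held-out rows with `MulticlassClassification.Evaluate`, and predicts the area of a new issue. With this little data the metrics are only illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext(seed: 42);

// Tiny hypothetical dataset; real data would be loaded from a file.
var trainRows = new List<IssueData>
{
    new() { Title = "Crash on startup", Description = "App throws a null reference exception", Area = "bug" },
    new() { Title = "Exception when saving", Description = "Saving a file fails with an error", Area = "bug" },
    new() { Title = "Add dark mode", Description = "Please support a dark theme option", Area = "feature" },
    new() { Title = "Support CSV export", Description = "Would like to export reports as CSV", Area = "feature" },
    new() { Title = "Typo in README", Description = "The install guide has a spelling mistake", Area = "docs" },
    new() { Title = "Outdated API reference", Description = "Documentation does not match the release", Area = "docs" },
};
var testRows = new List<IssueData>
{
    new() { Title = "Error on login", Description = "Login fails with an exception", Area = "bug" },
    new() { Title = "Add export option", Description = "Please support PDF export", Area = "feature" },
    new() { Title = "Fix guide typo", Description = "Spelling mistake in the documentation", Area = "docs" },
};
IDataView trainData = mlContext.Data.LoadFromEnumerable(trainRows);
IDataView testData = mlContext.Data.LoadFromEnumerable(testRows);

var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Label", nameof(IssueData.Area))
    .Append(mlContext.Transforms.Text.FeaturizeText("TitleFeatures", nameof(IssueData.Title)))
    .Append(mlContext.Transforms.Text.FeaturizeText("DescFeatures", nameof(IssueData.Description)))
    .Append(mlContext.Transforms.Concatenate("Features", "TitleFeatures", "DescFeatures"))
    .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy())
    .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));
var model = pipeline.Fit(trainData);

// Evaluate on held-out rows; the "Label" key column is produced by the pipeline.
var metrics = mlContext.MulticlassClassification.Evaluate(model.Transform(testData));
Console.WriteLine($"Micro accuracy: {metrics.MicroAccuracy:P1}");
Console.WriteLine($"Macro accuracy: {metrics.MacroAccuracy:P1}");

// Predict a single new issue.
var predictor = mlContext.Model.CreatePredictionEngine<IssueData, IssuePrediction>(model);
var result = predictor.Predict(new IssueData
{
    Title = "Fix typo in docs",
    Description = "Spelling mistake in the guide",
});
Console.WriteLine($"Predicted area: {result.Area}");
Console.WriteLine($"Scores: {string.Join(", ", result.Score.Select(s => s.ToString("F3")))}");

public class IssueData
{
    public string? Title { get; set; }
    public string? Description { get; set; }
    public string? Area { get; set; }
}

public class IssuePrediction
{
    [ColumnName("PredictedLabel")]
    public string? Area { get; set; }
    public float[] Score { get; set; } = Array.Empty<float>();
}
```

`Score` holds one probability per class, in the order the key values were assigned by `MapValueToKey`.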
AutoML
Let ML.NET automatically discover the best algorithm and hyperparameters.
using Microsoft.ML.AutoML;
var mlContext = new MLContext(seed: 42);
var data = mlContext.Data.LoadFromTextFile<HouseData>(
"houses.csv", separatorChar: ',', hasHeader: true);
var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);
// Run AutoML experiment for up to 60 seconds
var experiment = mlContext.Auto()
.CreateRegressionExperiment(maxExperimentTimeInSeconds: 60);
var result = experiment.Execute(
split.TrainSet,
labelColumnName: nameof(HouseData.Price));
Console.WriteLine($"Best algorithm: {result.BestRun.TrainerName}");
Console.WriteLine($"Best R-Squared: {result.BestRun.ValidationMetrics.RSquared:F4}");
// Use the best model
var bestModel = result.BestRun.Model;
var predictor = mlContext.Model.CreatePredictionEngine<HouseData, HousePrediction>(bestModel);
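The validation metrics reported by AutoML come from data the experiment used to pick a winner, so it can be worth checking the best model on the held-out test set as well. The sketch below is one way to do that under stated assumptions: the `HouseRow` type, the synthetic data, and the 20-second budget are all illustrative. It also lists every run through `RunDetails`.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.AutoML;

var mlContext = new MLContext(seed: 42);

// Synthetic rows (hypothetical) so the example runs without houses.csv.
var random = new Random(0);
var rows = Enumerable.Range(0, 300).Select(_ =>
{
    float size = random.Next(600, 3000);
    float bedrooms = random.Next(1, 6);
    float noise = (float)(random.NextDouble() - 0.5) * 10_000f;
    return new HouseRow
    {
        Size = size,
        Bedrooms = bedrooms,
        Price = 150f * size + 10_000f * bedrooms + noise,
    };
}).ToList();
IDataView data = mlContext.Data.LoadFromEnumerable(rows);
var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

// Short time budget for illustration only.
var result = mlContext.Auto()
    .CreateRegressionExperiment(maxExperimentTimeInSeconds: 20)
    .Execute(split.TrainSet, labelColumnName: nameof(HouseRow.Price));

// List every trainer AutoML tried; ValidationMetrics is null if a run failed.
foreach (var run in result.RunDetails)
{
    double rSquared = run.ValidationMetrics?.RSquared ?? double.NaN;
    Console.WriteLine($"{run.TrainerName}: R-Squared = {rSquared:F4} ({run.RuntimeInSeconds:F1}s)");
}

// Score the best model on data the experiment never saw.
var testPredictions = result.BestRun.Model.Transform(split.TestSet);
var testMetrics = mlContext.Regression.Evaluate(
    testPredictions, labelColumnName: nameof(HouseRow.Price));
Console.WriteLine($"Test R-Squared: {testMetrics.RSquared:F4}");

// Illustrative type; not one of the data classes defined earlier.
public class HouseRow
{
    public float Size { get; set; }
    public float Bedrooms { get; set; }
    public float Price { get; set; }
}
```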
Model Serialization and Loading
// Save model to file
mlContext.Model.Save(model, data.Schema, "model.zip");
// Load model from file
ITransformer loadedModel = mlContext.Model.Load("model.zip", out DataViewSchema schema);
// Create prediction engine from loaded model
var predictor = mlContext.Model.CreatePredictionEngine<HouseData, HousePrediction>(loadedModel);
var prediction = predictor.Predict(new HouseData { Size = 2000, Bedrooms = 4 });
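A quick way to gain confidence in a saved model is to check that it predicts the same value after a round trip through disk. The following is a sketch, assuming a hypothetical `HouseRow`/`PricePrediction` pair, synthetic training data, and a temp-directory file name chosen for illustration; it requires the `Microsoft.ML.FastTree` package.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext(seed: 42);

// Synthetic training rows (hypothetical).
var random = new Random(0);
var rows = Enumerable.Range(0, 200).Select(_ =>
{
    float size = random.Next(600, 3000);
    float bedrooms = random.Next(1, 6);
    return new HouseRow { Size = size, Bedrooms = bedrooms, Price = 150f * size + 10_000f * bedrooms };
}).ToList();
IDataView data = mlContext.Data.LoadFromEnumerable(rows);

var pipeline = mlContext.Transforms.Concatenate(
        "Features", nameof(HouseRow.Size), nameof(HouseRow.Bedrooms))
    .Append(mlContext.Regression.Trainers.FastTree(labelColumnName: nameof(HouseRow.Price)));
ITransformer model = pipeline.Fit(data);

// Save to a temporary path, then load it back along with the input schema.
string modelPath = Path.Combine(Path.GetTempPath(), "house-model.zip");
mlContext.Model.Save(model, data.Schema, modelPath);
ITransformer loaded = mlContext.Model.Load(modelPath, out DataViewSchema inputSchema);
Console.WriteLine($"Input columns: {string.Join(", ", inputSchema.Select(c => c.Name))}");

// The original and reloaded models should agree on the same input.
var sample = new HouseRow { Size = 2000f, Bedrooms = 4f };
float before = mlContext.Model
    .CreatePredictionEngine<HouseRow, PricePrediction>(model)
    .Predict(sample).PredictedPrice;
float after = mlContext.Model
    .CreatePredictionEngine<HouseRow, PricePrediction>(loaded)
    .Predict(sample).PredictedPrice;
Console.WriteLine($"Before save: {before:N0}, after load: {after:N0}");

// Illustrative types; not the data classes defined earlier.
public class HouseRow
{
    public float Size { get; set; }
    public float Bedrooms { get; set; }
    public float Price { get; set; }
}

public class PricePrediction
{
    [ColumnName("Score")]
    public float PredictedPrice { get; set; }
}
```

The schema returned by `Load` is the input schema passed to `Save`, which helps detect a mismatch between the model and the data class used for prediction.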
Integration with ASP.NET Core
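using Microsoft.Extensions.ML; // PredictionEnginePool (NuGet package: Microsoft.Extensions.ML)
using Microsoft.ML;
var builder = WebApplication.CreateBuilder(args);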
// Load model once at startup
var mlContext = new MLContext();
var model = mlContext.Model.Load("model.zip", out _);
builder.Services.AddSingleton(model);
builder.Services.AddSingleton(mlContext);
// PredictionEnginePool for thread-safe predictions
builder.Services.AddPredictionEnginePool<HouseData, HousePrediction>()
.FromFile("model.zip");
var app = builder.Build();
app.MapPost("/predict", (
HouseData input,
PredictionEnginePool<HouseData, HousePrediction> pool) =>
{
var prediction = pool.Predict(input);
return Results.Ok(new { predictedPrice = prediction.PredictedPrice });
});
app.Run();
Trainer Comparison
| Task | Trainer | Best For |
|---|---|---|
| Regression | FastTree | Non-linear relationships, large datasets |
| Regression | Sdca | Linear relationships, sparse features |
| Regression | LightGbm | High accuracy, gradient boosting |
| Binary Classification | SdcaLogisticRegression | Text classification, sparse features |
| Binary Classification | FastTree | Non-linear decision boundaries |
| Multi-class | SdcaMaximumEntropy | Many categories, text input |
| Clustering | KMeans | Customer segmentation, grouping |
| Anomaly Detection | RandomizedPca | Outlier detection in high-dimensional data |
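The table lists `KMeans` for clustering, which has no example elsewhere in this document. Below is a minimal sketch of customer segmentation, assuming a hypothetical `CustomerData` type and two synthetic, well-separated groups; the column names and cluster count are illustrative choices.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext(seed: 42);

// Two well-separated synthetic groups (hypothetical customer data).
var random = new Random(0);
var rows = new List<CustomerData>();
for (int i = 0; i < 50; i++)
{
    rows.Add(new CustomerData { AnnualSpend = 200f + random.Next(0, 50), VisitsPerMonth = 1f + random.Next(0, 2) });
    rows.Add(new CustomerData { AnnualSpend = 5000f + random.Next(0, 500), VisitsPerMonth = 10f + random.Next(0, 5) });
}
IDataView data = mlContext.Data.LoadFromEnumerable(rows);

// Normalize so the larger-valued spend column does not dominate the distance.
var pipeline = mlContext.Transforms.Concatenate(
        "Features", nameof(CustomerData.AnnualSpend), nameof(CustomerData.VisitsPerMonth))
    .Append(mlContext.Transforms.NormalizeMinMax("Features"))
    .Append(mlContext.Clustering.Trainers.KMeans(featureColumnName: "Features", numberOfClusters: 2));
var model = pipeline.Fit(data);

var metrics = mlContext.Clustering.Evaluate(model.Transform(data));
Console.WriteLine($"Average distance to centroid: {metrics.AverageDistance:F4}");

var predictor = mlContext.Model.CreatePredictionEngine<CustomerData, ClusterPrediction>(model);
var low = predictor.Predict(new CustomerData { AnnualSpend = 220f, VisitsPerMonth = 1f });
var high = predictor.Predict(new CustomerData { AnnualSpend = 5200f, VisitsPerMonth = 12f });
Console.WriteLine($"Low spender cluster: {low.ClusterId}, high spender cluster: {high.ClusterId}");

// Illustrative types; not part of the original data classes.
public class CustomerData
{
    public float AnnualSpend { get; set; }
    public float VisitsPerMonth { get; set; }
}

public class ClusterPrediction
{
    [ColumnName("PredictedLabel")]
    public uint ClusterId { get; set; }

    [ColumnName("Score")]
    public float[] Distances { get; set; } = Array.Empty<float>();
}
```

Cluster IDs are arbitrary labels, so only whether two inputs share an ID is meaningful, not the numeric value itself.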
Best Practices
- Always set `MLContext(seed: 42)` (or any fixed seed) during development and evaluation to ensure reproducible results across training runs.
- Split data into training/test sets with `TrainTestSplit` (80/20 ratio) before fitting the pipeline; never evaluate a model on the same data it was trained on.
- Use `NormalizeMinMax` or `NormalizeMeanVariance` transforms before training when features have different scales (e.g., square footage vs. number of bedrooms).
- Use `PredictionEnginePool<TInput, TOutput>` instead of `PredictionEngine` in ASP.NET Core because `PredictionEngine` is not thread-safe, and creating one per request becomes a bottleneck under concurrent load.
- Run AutoML experiments with a time limit (`maxExperimentTimeInSeconds`) during exploration, then use the winning algorithm directly in production pipelines for faster startup.
- Examine feature importance with `GetFeatureWeights` on the trained model parameters (available for tree-based trainers) or with permutation feature importance to identify which input columns drive predictions and remove noise features.
- Save trained models with `mlContext.Model.Save(model, schema, "model.zip")` and version them alongside your code so you can roll back to previous model versions.
- Log evaluation metrics (R-Squared, RMSE, F1 Score, AUC) in CI/CD pipelines and fail builds when metrics regress below established thresholds.
- Use `ColumnName` and `LoadColumn` attributes explicitly on data classes to decouple CSV column order from property names and prevent silent data misalignment.
- Preprocess text with `FeaturizeText`, which handles text normalization, tokenization, n-gram extraction, and term weighting in one step, rather than implementing custom text vectorization.
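The advice about failing builds on metric regressions can be sketched as a small gate that compares evaluation metrics against thresholds and sets a non-zero exit code. The threshold values, metric values, and the `FindRegressions` helper below are all hypothetical; in practice the metrics would come from `mlContext.Regression.Evaluate(...)` on a held-out test set.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical metric values and thresholds for illustration.
var failures = FindRegressions(
    rSquared: 0.81, rmse: 25_400, minRSquared: 0.85, maxRmse: 30_000);

foreach (var failure in failures)
    Console.Error.WriteLine(failure);

// A non-zero exit code causes most CI systems to fail the step.
Environment.ExitCode = failures.Count == 0 ? 0 : 1;

// Returns one message per metric that falls outside its threshold.
static List<string> FindRegressions(
    double rSquared, double rmse, double minRSquared, double maxRmse)
{
    var problems = new List<string>();
    if (rSquared < minRSquared)
        problems.Add($"R-Squared {rSquared:F4} is below the minimum {minRSquared:F4}");
    if (rmse > maxRmse)
        problems.Add($"RMSE {rmse:F2} is above the maximum {maxRmse:F2}");
    return problems;
}
```

With the values above, only the R-Squared check fails, so the process exits with code 1.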