unity-sentis
Use this skill when a trained model already exists and the job is to make it run reliably inside Unity. The core problems are model import, backend selection, warmup, inference cadence, and platform-safe deployment.
When to use this skill
- Importing an ONNX model into Unity and running inference in play mode
- Migrating from older Sentis wording to the current Unity AI Inference package path
- Choosing between BackendType.GPUCompute, BackendType.CPU, and fallback behavior
- Running NPC or gameplay inference without a Python runtime
- Profiling model cost and reducing frame spikes
- Connecting ML-Agents training output to runtime deployment
Instructions
Step 1: Install the package and note the rename
The current Sentis documentation states that Sentis is now called Inference Engine, and the latest guidance points to com.unity.ai.inference@latest. Existing Sentis 2.1.x projects still use the Unity.Sentis API surface.
Packages/manifest.json:
{
  "dependencies": {
    "com.unity.sentis": "2.1.3"
  }
}
If your project is already migrating to the newer package path, keep the skill focused on runtime concepts: import model, create worker, schedule execution, and manage backend/platform constraints.
Step 2: Load the model and create a worker
using UnityEngine;
using Unity.Sentis;

public class NpcInferenceController : MonoBehaviour
{
    // Imported ONNX model, assigned in the Inspector.
    [SerializeField] private ModelAsset modelAsset;

    private Model runtimeModel;
    private Worker worker;

    private void Start()
    {
        runtimeModel = ModelLoader.Load(modelAsset);

        // Use the compute-shader backend when available, otherwise run on the CPU.
        var backend = SystemInfo.supportsComputeShaders
            ? BackendType.GPUCompute
            : BackendType.CPU;

        worker = new Worker(runtimeModel, backend);
    }

    private void OnDestroy()
    {
        // Workers hold native and GPU resources, so release them explicitly.
        worker?.Dispose();
    }
}
Use GPUPixel only when compute shaders are unavailable and you still need a GPU path. Prefer GPUCompute or CPU.
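The fallback order described above can be sketched as a small pure function. This is a minimal illustration, not part of the Sentis API: `BackendChoice`, `BackendSelector`, and the `requireGpuPath` flag are hypothetical names, and the enum only mirrors the `BackendType` member names so the logic can be read and tested without Unity.

```csharp
// Hypothetical enum that mirrors the Sentis BackendType member names.
public enum BackendChoice { GPUCompute, GPUPixel, CPU }

public static class BackendSelector
{
    // requireGpuPath: hypothetical project setting, true when inference must stay on the GPU.
    // supportsComputeShaders: in Unity this would come from SystemInfo.supportsComputeShaders.
    public static BackendChoice Choose(bool supportsComputeShaders, bool requireGpuPath)
    {
        if (supportsComputeShaders)
        {
            return BackendChoice.GPUCompute;
        }

        // No compute shaders: use the pixel-shader path only when a GPU path is required.
        return requireGpuPath ? BackendChoice.GPUPixel : BackendChoice.CPU;
    }
}
```

In a Unity project the result would be mapped to the matching `BackendType` value before constructing the `Worker`.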
Step 3: Schedule inference and warm it up
Official Sentis docs note that the first scheduled run in the Unity Editor can be slow because code, shaders, and internal buffers are compiled and allocated on first use. Warm the model once during startup if user-facing latency matters.
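private bool warmedUp;

// Schedules one inference pass for a single observation vector.
public void RunInference(float[] observations)
{
    // Shape (1, N): a batch of one observation vector of length N.
    using var inputTensor = new Tensor<float>(
        new TensorShape(1, observations.Length),
        observations
    );
    worker.Schedule(inputTensor);

    if (!warmedUp)
    {
        // The first Schedule call pays the one-time setup cost described above.
        // Call RunInference once with placeholder data during loading so that
        // cost does not land on a gameplay frame.
        var _ = worker.PeekOutput() as Tensor<float>;
        warmedUp = true;
    }
}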
Step 4: Read outputs without stalling more than necessary
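public float[] ReadActions()
{
    // PeekOutput returns a tensor owned by the worker, so do not dispose it here.
    var output = worker.PeekOutput() as Tensor<float>;

    // DownloadToArray blocks until the result has been copied to CPU memory.
    return output != null ? output.DownloadToArray() : System.Array.Empty<float>();
}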
Practical rules:
- keep tensor shapes stable where possible
- avoid GPU-to-CPU downloads every frame if you can consume results less often
- batch multiple agents when they share the same observation schema
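The batching rule can be sketched as follows. This is a minimal illustration under stated assumptions: `ObservationBatcher`, `Pack`, and `SliceActions` are hypothetical helper names, and the model is assumed to accept a leading batch dimension with one row per agent.

```csharp
using System;
using System.Collections.Generic;

public static class ObservationBatcher
{
    // Packs per-agent observation vectors into one row-major buffer
    // laid out as (agentCount, observationSize).
    public static float[] Pack(IReadOnlyList<float[]> perAgent, int observationSize)
    {
        var flat = new float[perAgent.Count * observationSize];
        for (int i = 0; i < perAgent.Count; i++)
        {
            if (perAgent[i].Length != observationSize)
            {
                throw new ArgumentException(
                    $"Agent {i} has {perAgent[i].Length} observations, expected {observationSize}.");
            }
            Array.Copy(perAgent[i], 0, flat, i * observationSize, observationSize);
        }
        return flat;
    }

    // Copies one agent's actions out of a flat (agentCount, actionSize) result.
    public static float[] SliceActions(float[] flatActions, int agentIndex, int actionSize)
    {
        var actions = new float[actionSize];
        Array.Copy(flatActions, agentIndex * actionSize, actions, 0, actionSize);
        return actions;
    }
}
```

The packed buffer would then back a single input tensor with shape (agentCount, observationSize), so one scheduled run and one readback serve every agent that shares the schema.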
Step 5: Control cadence and backend by platform
private int inferenceInterval = 4;

private void Update()
{
    // Run inference only on every Nth frame.
    if (Time.frameCount % inferenceInterval != 0)
    {
        return;
    }

    RunInference(BuildObservationVector());
    ApplyActions(ReadActions());
}
Guidance:
- PC / console: start with GPUCompute
- mobile / WebGL / weaker devices: validate CPU early
- profile before locking the cadence
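When several agents each run their own worker on the same interval, they all fire on the same frame. One possible refinement, sketched here as an assumption and not something the Sentis docs prescribe, is to offset each agent within the interval. `InferenceCadence`, `ShouldRun`, and `agentIndex` are hypothetical names.

```csharp
public static class InferenceCadence
{
    // Returns true on the frames where the given agent should run inference.
    // frameCount: in Unity this would be Time.frameCount.
    // interval: run once every 'interval' frames.
    // agentIndex: hypothetical stable index used to spread agents across frames.
    public static bool ShouldRun(int frameCount, int interval, int agentIndex)
    {
        if (interval <= 1)
        {
            return true;
        }

        // Normalize the offset into [0, interval) even for negative indices.
        int offset = ((agentIndex % interval) + interval) % interval;
        return frameCount % interval == offset;
    }
}
```

With an interval of 4, agent 0 runs on frames 0, 4, 8 and agent 1 on frames 1, 5, 9, so the per-frame cost is spread out. This applies to separate workers; a batched model already runs all agents in one call.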
Step 6: Integrate ML-Agents outputs cleanly
If the model came from ML-Agents, keep the observation order and shape identical between training and runtime deployment. Treat the exported ONNX file and the runtime observation builder as a contract pair.
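One way to make that contract explicit is to declare the observation order once and build every vector through it. The sketch below is illustrative only: `ObservationSchema` and the field names are hypothetical, and the dictionary lookup favors clarity over per-frame allocation cost.

```csharp
using System;
using System.Collections.Generic;

public sealed class ObservationSchema
{
    private readonly string[] fieldNames;

    // Field names must be listed in the same order used during training.
    public ObservationSchema(params string[] fieldNames)
    {
        this.fieldNames = (string[])fieldNames.Clone();
    }

    public int Size => fieldNames.Length;

    // Emits values in schema order regardless of how the caller supplied them.
    public float[] Build(IReadOnlyDictionary<string, float> values)
    {
        var observations = new float[fieldNames.Length];
        for (int i = 0; i < fieldNames.Length; i++)
        {
            if (!values.TryGetValue(fieldNames[i], out float value))
            {
                throw new ArgumentException($"Missing observation '{fieldNames[i]}'.");
            }
            if (float.IsNaN(value) || float.IsInfinity(value))
            {
                throw new ArgumentException($"Observation '{fieldNames[i]}' is not finite.");
            }
            observations[i] = value;
        }
        return observations;
    }
}
```

A missing or non-finite field then fails loudly at the observation builder, before it can show up as odd agent behavior.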
Examples
Example 1: NPC movement policy
public void TickNpcBrain()
{
    // Observation order and scaling must match what the policy saw during training.
    var observations = new[]
    {
        transform.localPosition.x / 10f,
        transform.localPosition.z / 10f,
        target.localPosition.x / 10f,
        target.localPosition.z / 10f,
    };

    RunInference(observations);
    var actions = ReadActions();
    Move(actions);
}
Example 2: Adaptive difficulty model
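float[] playerStats =
{
    normalizedAccuracy,
    normalizedDeaths,
    normalizedClearSpeed,
    normalizedDamageTaken,
};

RunInference(playerStats);

// ReadActions returns an empty array when no output is available, so check before indexing.
float[] actions = ReadActions();
if (actions.Length > 0)
{
    float multiplier = Mathf.Clamp(actions[0], 0.5f, 2.0f);
    ApplyDifficulty(multiplier);
}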
Example 3: Pair Sentis with unity-mcp
omu "Deploy trained ONNX NPC model"
-> unity-mcp: manage_packages / create_script / validate_script
-> unity-sentis: worker setup, warmup, cadence selection
-> unity-mcp: run_tests / read_console / unity_docs
Best practices
- Dispose every Worker in OnDestroy.
- Warm up the first schedule during startup or loading screens.
- Keep runtime observation ordering identical to training-time ordering.
- Prefer GPUCompute and CPU; treat GPUPixel as a fallback path.
- Profile both execution and readback cost before shipping.
- When deployment is unstable, verify supported ONNX operators before debugging gameplay code.
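For quick measurements alongside the Unity Profiler, execution and readback can be timed separately with a small accumulator. This is a rough sketch; `InferenceTimer` is a hypothetical helper. On GPU backends, wall-clock time around a schedule call alone mostly captures CPU-side scheduling, so the blocking readback needs its own measurement.

```csharp
using System;
using System.Diagnostics;

public sealed class InferenceTimer
{
    private double totalMs;

    public int SampleCount { get; private set; }
    public double MaxMs { get; private set; }
    public double AverageMs => SampleCount == 0 ? 0.0 : totalMs / SampleCount;

    // Times a single action and records its duration in milliseconds.
    public void Measure(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        Record(stopwatch.Elapsed.TotalMilliseconds);
    }

    // Records a duration directly, for example one taken from another timing source.
    public void Record(double milliseconds)
    {
        totalMs += milliseconds;
        SampleCount++;
        if (milliseconds > MaxMs)
        {
            MaxMs = milliseconds;
        }
    }
}
```

Keeping one timer for the schedule call and another for the readback call shows which side is responsible for a frame spike; the maximum matters more than the average for that purpose.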
References
- https://docs.unity3d.com/Packages/com.unity.sentis@2.1/manual/create-an-engine.html
- https://docs.unity3d.com/Packages/com.unity.sentis@2.1/manual/run-a-model.html
- See references/onnx-export.md for export and compatibility guidance