metal-gpu
Metal GPU Code Skill
Write production-quality Metal code with correct patterns, optimal performance, and clear explanations.
When to Read References
For detailed API topology, Metal 4 specifics, and Apple Silicon optimization patterns, read:
/mnt/skills/user/metal-gpu/references/metal-api-guide.md
Core Principles
- Always start with the device:
MTLCreateSystemDefaultDevice()— every Metal workflow begins here - Command pattern: Device → Command Queue → Command Buffer → Command Encoder → Commit
- Shaders are MSL (Metal Shading Language): C++14-based, with Metal-specific types and attributes
- Resource management matters: Use appropriate storage modes, avoid unnecessary copies
- Triple buffering for render loops to keep CPU and GPU in parallel
Quick Reference: Metal Command Pipeline
MTLDevice
└─ makeCommandQueue() → MTLCommandQueue
└─ makeCommandBuffer() → MTLCommandBuffer
├─ makeRenderCommandEncoder(descriptor:) → MTLRenderCommandEncoder
├─ makeComputeCommandEncoder() → MTLComputeCommandEncoder
└─ makeBlitCommandEncoder() → MTLBlitCommandEncoder
Writing Shaders (MSL)
Use Metal Shading Language. Always include:
#include <metal_stdlib>andusing namespace metal;- Correct attribute qualifiers:
[[vertex_id]],[[position]],[[stage_in]],[[buffer(n)]],[[texture(n)]] - Proper address space qualifiers:
device,constant,threadgroup,thread
Vertex Shader Pattern
#include <metal_stdlib>
using namespace metal;
struct VertexIn {
float3 position [[attribute(0)]];
float3 normal [[attribute(1)]];
float2 texCoord [[attribute(2)]];
};
struct VertexOut {
float4 position [[position]];
float3 normal;
float2 texCoord;
};
vertex VertexOut vertex_main(VertexIn in [[stage_in]],
constant float4x4 &mvp [[buffer(1)]]) {
VertexOut out;
out.position = mvp * float4(in.position, 1.0);
out.normal = in.normal;
out.texCoord = in.texCoord;
return out;
}
Fragment Shader Pattern
fragment float4 fragment_main(VertexOut in [[stage_in]],
texture2d<float> albedo [[texture(0)]],
sampler texSampler [[sampler(0)]]) {
float4 color = albedo.sample(texSampler, in.texCoord);
return color;
}
Compute Kernel Pattern
kernel void compute_main(device float *input [[buffer(0)]],
device float *output [[buffer(1)]],
uint id [[thread_position_in_grid]]) {
output[id] = input[id] * 2.0;
}
Swift-Side Setup Patterns
Render Pipeline Setup
let device = MTLCreateSystemDefaultDevice()!
let commandQueue = device.makeCommandQueue()!
// Load shaders
let library = device.makeDefaultLibrary()!
let vertexFunction = library.makeFunction(name: "vertex_main")
let fragmentFunction = library.makeFunction(name: "fragment_main")
// Pipeline descriptor
let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.vertexFunction = vertexFunction
pipelineDescriptor.fragmentFunction = fragmentFunction
pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
// Vertex descriptor
let vertexDescriptor = MTLVertexDescriptor()
vertexDescriptor.attributes[0].format = .float3 // position
vertexDescriptor.attributes[0].offset = 0
vertexDescriptor.attributes[0].bufferIndex = 0
vertexDescriptor.layouts[0].stride = MemoryLayout<SIMD3<Float>>.stride
pipelineDescriptor.vertexDescriptor = vertexDescriptor
let pipelineState = try! device.makeRenderPipelineState(descriptor: pipelineDescriptor)
Compute Pipeline Setup
let computeFunction = library.makeFunction(name: "compute_main")!
let computePipeline = try! device.makeComputePipelineState(function: computeFunction)
let commandBuffer = commandQueue.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setComputePipelineState(computePipeline)
encoder.setBuffer(inputBuffer, offset: 0, index: 0)
encoder.setBuffer(outputBuffer, offset: 0, index: 1)
let gridSize = MTLSize(width: elementCount, height: 1, depth: 1)
let threadGroupSize = MTLSize(
width: min(computePipeline.maxTotalThreadsPerThreadgroup, elementCount),
height: 1, depth: 1
)
encoder.dispatchThreads(gridSize, threadsPerThreadgroup: threadGroupSize)
encoder.endEncoding()
commandBuffer.commit()
MetalKit View Rendering
import MetalKit
class Renderer: NSObject, MTKViewDelegate {
let device: MTLDevice
let commandQueue: MTLCommandQueue
let pipelineState: MTLRenderPipelineState
func draw(in view: MTKView) {
guard let drawable = view.currentDrawable,
let descriptor = view.currentRenderPassDescriptor else { return }
let commandBuffer = commandQueue.makeCommandBuffer()!
let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor)!
encoder.setRenderPipelineState(pipelineState)
// Set buffers, draw primitives...
encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3)
encoder.endEncoding()
commandBuffer.present(drawable)
commandBuffer.commit()
}
}
Performance Best Practices
- Storage modes: Use
.sharedon Apple Silicon (unified memory),.privatefor GPU-only data,.managedon Intel Macs - Triple buffering: Rotate 3 buffers with a semaphore to avoid CPU/GPU stalls
- Avoid per-frame allocations: Reuse buffers and command encoders
- Use
dispatchThreadsoverdispatchThreadgroupswhen possible (Apple Silicon) - Prefer tile-based deferred rendering patterns on Apple GPUs — use imageblocks and tile shaders
- Compile pipelines ahead of time: Pipeline creation is expensive, do it at load time
- Use Metal GPU frame capture in Xcode to profile and debug
Common Mistakes to Avoid
- Forgetting
encoder.endEncoding()before committing - Mismatched buffer indices between Swift and MSL
- Using wrong pixel format for render targets
- Not handling
nilfrom optional Metal API calls - Blocking the main thread waiting for GPU completion — use
addCompletedHandlerinstead - Forgetting to set the vertex descriptor when using
[[stage_in]]
Metal 4 Notes
Metal 4 introduces a modernized core API. Key changes:
- New compilation API for finer shader compilation control
- Updated command encoding patterns
- See
references/metal-api-guide.mdfor the full Metal 4 API topology
Frameworks Ecosystem
| Framework | Purpose |
|---|---|
| Metal | Direct GPU access, shaders, pipelines |
| MetalKit | View management, texture loading, model I/O |
| MetalFX | Upscaling (temporal/spatial) for performance |
| Metal Performance Shaders | Optimized compute & image processing kernels |
| Compositor Services | Stereoscopic rendering for visionOS |
| RealityKit | High-level 3D rendering (uses Metal underneath) |
More from ios-agent/iosagent.dev
swiftui
SwiftUI framework for building user interfaces
90widgetkit
Build iOS/macOS/watchOS/visionOS widgets, Live Activities, watch complications, and controls using Apple's WidgetKit framework. Use when creating widget extensions, timeline providers, configurable widgets, Lock Screen widgets, Smart Stack widgets, Live Activities with ActivityKit, interactive widgets with buttons/toggles, or watch complications. Covers all widget families (systemSmall/Medium/Large/ExtraLarge, accessoryCircular/Rectangular/Inline/Corner) and rendering modes.
43carplay
CarPlay framework for iOS in-car applications — audio, communication, navigation, parking, EV charging, quick food ordering, fueling, driving task, public safety, and voice-based conversational apps. Includes widgets, live activities, CarPlay Ultra, and instrument cluster support.
42apple-corelocation
>
39mapkit
Help developers integrate Apple MapKit into iOS/macOS apps. Use this skill when users ask to add a map to their app, display maps, show user location on a map, add markers/pins/annotations, implement map clustering, get directions/routing between locations, search for places/points of interest, implement MapKit features, work with MKMapView, SwiftUI Map, MKAnnotation, MKOverlay, MKDirections, MKLocalSearch, or any MapKit-related development task.
38apple-foundation-models
Use this skill when working with Apple's Foundation Models framework for on-device AI and LLM capabilities in iOS/macOS apps
36