instrument-experiment
Guidelines to help LLMs use p95 to instrument Python training programs.
p95 is a small Python library that helps users run ML experiments and track their results. It supports a local mode (file-based, zero config) and a remote mode (cloud-backed, requires login).
1. Check cloud authentication first
Before doing anything else, run:
```shell
pnf cloud status
```
Parse the output:
- **Logged in** — output contains `Linked to … as …` and a `Default team: <team>` line.
  - Extract `<team>` (e.g. `acme`).
  - Default to remote mode for all runs. Use `<team>/<project-name>` as the project.
  - The SDK automatically picks up the API key and URL from the credentials file — no env vars needed.
- **Not logged in** — output is `No credentials found. Run 'pnf cloud login' to authenticate.`
  - Ask the user: do they want to log in to the cloud, or continue in local mode?
  - If they want to log in, follow the Login flow below.
  - If they prefer local mode, skip to step 2 and use a plain project name (no `/`).
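The two status outputs above can be checked mechanically. A minimal sketch, assuming the output contains a `Default team: <team>` line exactly as quoted in this document (the line format is taken from the text above, not from verified CLI output):

```python
import re

def parse_cloud_status(output: str):
    """Return the default team if logged in, else None.

    Assumes the formats quoted above: a "Default team: <team>" line
    when logged in, "No credentials found. ..." otherwise.
    """
    match = re.search(r"^Default team:\s*(\S+)", output, re.MULTILINE)
    return match.group(1) if match else None

logged_in = "Linked to p95.run as alice\nDefault team: acme\n"
logged_out = "No credentials found. Run 'pnf cloud login' to authenticate.\n"

print(parse_cloud_status(logged_in))   # → acme
print(parse_cloud_status(logged_out))  # → None
```

With a team string in hand, build the project as `f"{team}/{project_name}"`; on `None`, fall back to local mode.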
Login flow
1. Run `pnf cloud login` with `run_in_background=True` — the CLI will open the browser automatically and print the settings URL as a fallback. Note the URL from its output in case the browser didn't open for the user.
2. Use AskUserQuestion: "Your browser should have opened to generate an API key. If not, open <URL from step 1>. Once you've generated a key, paste it here."
3. Verify with `pnf cloud status` — it should now show the logged-in user and default team.
4. Continue in remote mode using the default team, which can also be retrieved with `pnf cloud status`.
2. Install p95
- Make sure p95 is installed; add it with `pip install p95` or `uv add p95`. Whether it is already a dependency can be checked in requirements.txt or the pyproject.toml dependencies.
- `pnf` (the CLI) is installed automatically alongside p95. If you install with `uv`, run it with `uv run pnf`; with `pip`, run `pnf` directly.
3. Instrument with p95
Use the project format that matches the mode:
- Remote mode (logged in): `project="<team>/<project-name>"` — e.g. `project="acme/resnet-cifar"`
- Local mode: `project="<project-name>"` — e.g. `project="resnet-cifar"`
With a context manager:
```python
from p95 import Run

with Run(project="acme/resnet-cifar", name="experiment-1", share=True) as run:
    run.log_config({"learning_rate": 0.001, "epochs": 10})
    for epoch in range(10):
        loss = train_one_epoch()
        run.log_metrics({"loss": loss}, step=epoch)
# → p95: Share your run at https://p95.run/aB12cD34
```
Without a context manager:
```python
run = Run(project="acme/resnet-cifar", share=True)
run.log_metrics({"loss": 0.5}, step=1)
run.complete()
# To mark the run as failed instead, call run.fail("error message")
```
4. Viewing the results
Remote mode — runs are visible at `https://p.ninetyfive.gg/<team>/<project-name>`. If `share=True` was passed, use the share link printed after the run completes.
Local mode — use the pnf CLI:
- `pnf ls --project <project-name> --logdir <logdir>` to see runs and their IDs.
- `pnf show <run-id> --logdir <logdir>` for a run summary.
- To inspect raw data under `{logdir}/{project}/{run_name}/`:
  - `meta.json` — run status, timestamps, git and system info
  - `config.json` — hyperparameters logged via `log_config`
  - `run.db` — all metrics in a SQLite table named `metrics` (columns: `name`, `step`, `value`, `time`); query with `sqlite3 run.db "SELECT name, step, value FROM metrics ORDER BY name, step"`
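Beyond the `sqlite3` CLI, `run.db` can be read directly from Python. A sketch that recreates the documented `metrics` schema in memory so it runs standalone — against a real run you would open the `run.db` file instead:

```python
import sqlite3

# In a real run you would open the file on disk, e.g.:
# conn = sqlite3.connect("logs/resnet-cifar/experiment-1/run.db")
conn = sqlite3.connect(":memory:")

# Recreate the documented schema with a few sample rows.
conn.execute("CREATE TABLE metrics (name TEXT, step INTEGER, value REAL, time REAL)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?, ?)",
    [("val_loss", 0, 0.9, 0.0), ("val_loss", 1, 0.6, 1.0), ("val_loss", 2, 0.7, 2.0)],
)

# Best (minimum) value of a metric, and the step it occurred at.
# SQLite's bare-column rule makes `step` come from the MIN row.
step, value = conn.execute(
    "SELECT step, MIN(value) FROM metrics WHERE name = ?", ("val_loss",)
).fetchone()
print(step, value)  # → 1 0.6
```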
5. Fetching results from the cloud (remote projects)
When the user has a remote project and wants to inspect runs or sweeps, fetch data directly from the API using WebFetch. The base URL is https://p.ninetyfive.gg/api/v1.
Authentication requires a Bearer token. Use the API key from `pnf cloud status` (ask the user for it if not already known, or ask them to run `pnf cloud login`); failing that, use the API key from `P95_API_KEY`.
Useful endpoints:
| What | Request |
|---|---|
| List runs | GET /api/v1/teams/{team}/apps/{app}/runs |
| Get run details | GET /api/v1/runs/{run_id} |
| List metric names | GET /api/v1/runs/{run_id}/metrics |
| Metrics summary (min/max/mean) | GET /api/v1/runs/{run_id}/metrics/summary |
| Latest metric values | GET /api/v1/runs/{run_id}/metrics/latest |
| Full metric time series | GET /api/v1/runs/{run_id}/metrics/{metric_name} |
| List sweeps | GET /api/v1/teams/{team}/apps/{app}/sweeps |
| Get sweep details | GET /api/v1/sweeps/{sweep_id} |
Example workflow to answer "which run had the best val_loss?":
1. `GET /api/v1/teams/{team}/apps/{app}/runs` — get the run list with IDs
2. For each run: `GET /api/v1/runs/{run_id}/metrics/summary` — find the minimum `val_loss`
3. Report the best run ID, its config, and the metric value to the user
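The workflow above can be sketched in Python. The endpoints come from the table; the JSON shapes (a list of run objects with an `id` field, a summary keyed by metric name with a `min` field) are assumptions for illustration, not documented API responses:

```python
import json
import urllib.request

BASE = "https://p.ninetyfive.gg/api/v1"

def get_json(path: str, api_key: str):
    """GET an endpoint with the Bearer token from `pnf cloud status`."""
    req = urllib.request.Request(
        BASE + path, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def pick_best(summaries: dict, metric: str = "val_loss"):
    """Given {run_id: summary}, with an assumed summary[metric]["min"]
    field, return the (run_id, value) pair with the lowest minimum."""
    run_id, summary = min(summaries.items(), key=lambda kv: kv[1][metric]["min"])
    return run_id, summary[metric]["min"]

# Real usage would fetch the inputs (response shapes assumed):
# runs = get_json(f"/teams/{team}/apps/{app}/runs", api_key)
# summaries = {r["id"]: get_json(f"/runs/{r['id']}/metrics/summary", api_key)
#              for r in runs}

# The selection logic works on any mapping of that shape:
sample = {
    "abc123": {"val_loss": {"min": 0.42}},
    "def456": {"val_loss": {"min": 0.37}},
}
print(pick_best(sample))  # → ('def456', 0.37)
```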
6. Show the user the CLI cheatsheet
After instrumenting, always show the user the following so they can explore results themselves:
Using the pnf CLI
If you installed with `uv`, prefix commands with `uv run` (e.g. `uv run pnf ls`). With `pip`, run `pnf` directly.
| Command | What it does |
|---|---|
| `pnf cloud status` | Show current login status and default team |
| `pnf cloud login` | Log in to the cloud and save the API key |
| `pnf ls` | List all runs across all projects |
| `pnf ls --project <name>` | List runs for a specific project |
| `pnf ls --logdir <path>` | Use a custom log directory (default: `./logs`) |
| `pnf show <run-id>` | Show a summary for a run (config + metric stats) |
| `pnf show <run-id> --logdir <path>` | Same, with a custom log directory |
| `pnf tui` | Open the interactive TUI to explore all runs and metrics |
| `pnf serve` | Launch a local web UI to explore runs and metrics in the browser |
Example workflow after a training run:
```shell
# Check login status
pnf cloud status

# List runs in your project
pnf ls --project my-project

# Show summary of a specific run (use the short id from ls)
pnf show abc123

# Explore runs and metrics interactively (pick one)
pnf tui    # terminal UI
pnf serve  # web UI in your browser
```
7. Hyperparameter Sweeps
Use `p95.sweep` + `p95.agent` to search over hyperparameters automatically.
```python
import p95
from p95.sweep import SweepConfig, ParameterSpec

# 1. Create the sweep (returns a sweep_id)
sweep_id = p95.sweep(
    project="acme/resnet-cifar",  # or plain "resnet-cifar" in local mode
    config=SweepConfig(
        method="random",      # "random" or "grid"
        metric="val_loss",    # metric to optimize
        goal="minimize",      # "minimize" or "maximize"
        parameters=[
            ParameterSpec("lr", "log_uniform", min=1e-5, max=0.1),
            ParameterSpec("batch_size", "categorical", values=[16, 32, 64]),
            ParameterSpec("epochs", "int", min=5, max=50),
            ParameterSpec("dropout", "uniform", min=0.0, max=0.5),
        ],
        max_runs=20,
        # Optional: stop poor runs early
        early_stopping={"method": "median", "min_steps": 5, "warmup": 3},
    ),
)

# 2. Define a training function — any Run created inside is auto-linked to the sweep
def train(params):
    with p95.Run(project="acme/resnet-cifar") as run:
        run.log_config(params)
        for epoch in range(int(params["epochs"])):
            loss = train_epoch(lr=params["lr"], batch_size=params["batch_size"])
            run.log_metrics({"val_loss": loss}, step=epoch)
            # Optional: prune poorly performing runs early
            if p95.should_prune(run, "val_loss", loss, epoch):
                print("Pruning run")
                break

# 3. Run the agent — it loops until the sweep is complete
p95.agent(sweep_id, train)
```
ParameterSpec types
| type | required fields | description |
|---|---|---|
"uniform" |
min, max |
Uniform float sample |
"log_uniform" |
min, max |
Log-uniform float sample (good for learning rates) |
"int" |
min, max |
Uniform integer sample |
"categorical" |
values |
Random choice from a list |
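To make the four types concrete, here is an illustrative stand-in for each draw — a sketch of the sampling semantics described above, not the library's actual implementation:

```python
import math
import random

def sample(spec_type, *, min=None, max=None, values=None):
    """Illustrative draw for each ParameterSpec type (keyword names
    mirror the ParameterSpec fields, shadowing the builtins)."""
    if spec_type == "uniform":
        return random.uniform(min, max)
    if spec_type == "log_uniform":
        # Uniform in log space: 1e-5..1e-4 is as likely as 1e-2..1e-1,
        # which is why it suits learning rates.
        return math.exp(random.uniform(math.log(min), math.log(max)))
    if spec_type == "int":
        return random.randint(min, max)
    if spec_type == "categorical":
        return random.choice(values)
    raise ValueError(f"unknown type: {spec_type}")

lr = sample("log_uniform", min=1e-5, max=0.1)
batch = sample("categorical", values=[16, 32, 64])
```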
Viewing sweeps
For remote projects, sweeps are visible at `https://p.ninetyfive.gg/<team>/<project-name>/sweeps`.
For local projects, use the pnf CLI — sweep runs appear alongside regular runs:

```shell
pnf ls --project my-project
pnf tui  # or pnf serve for the browser UI
```
Notes
- `p95.sweep` returns a sweep ID. For local projects (no `/` in the name), it starts with `local:`.
- `p95.agent` runs continuously until `max_runs` is hit or all grid combinations are exhausted.
- Pass `count=N` to `p95.agent` to limit how many runs this agent executes (useful for distributed sweeps).
- `p95.should_prune(run, metric_name, value, step)` returns `True` when a run is performing below the median of completed runs at that step. Only effective when `early_stopping` is configured.
- A static config shared across all runs can be passed via `SweepConfig(config={...})`.
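The median rule behind `should_prune` can be sketched as follows. This is a simplified illustration of the behavior described above (for a minimized metric, prune when the current value is worse than the median of peer runs at the same step), not p95's code; the `history` shape is invented for the example:

```python
from statistics import median

def should_prune_sketch(value, step, history, warmup=3, min_steps=5):
    """history: {run_id: {step: value}} for completed runs.

    Prune when `value` is worse (higher, for a minimized metric)
    than the median of completed runs at this step.
    """
    peers = [vals[step] for vals in history.values() if step in vals]
    if step < min_steps or len(peers) < warmup:
        return False  # not enough evidence yet
    return value > median(peers)

completed = {"r1": {5: 0.30}, "r2": {5: 0.40}, "r3": {5: 0.50}}

print(should_prune_sketch(0.45, 5, completed))  # → True (above the median, 0.40)
print(should_prune_sketch(0.35, 5, completed))  # → False
```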
8. Sharing runs
Leave `share=True` on `Run` — it is the default. After the run finishes, capture the printed share link and surface it to the user.
- Remote mode only: `share=True` is ignored (with a warning) in local mode — the project must be in `team/app` format with credentials configured.
- The share link is public and requires no login to view.
- If the API call fails, a warning is printed but the run itself is unaffected.
- To keep a run private, pass `share=False` to `Run`. Do this when the user indicates the run or its results should not be publicly accessible.
Best practices
- Prefer the context manager; it automatically closes the run when the block exits.
- Use short, descriptive names for the project and run; this will help you find them later.
- Always check `pnf cloud status` before instrumenting — it determines the project format and whether runs go to the cloud.