NYC
skills/letta-ai/skills/bn-fit-modify

bn-fit-modify

SKILL.md

Bayesian Network Fitting and Modification

Overview

This skill provides guidance for tasks involving Bayesian Network structure recovery, parameter estimation, causal intervention, and sampling. It covers common pitfalls when using libraries like pgmpy for Linear Gaussian Bayesian Networks and other BN types.

Workflow

Phase 1: Structure Learning

When recovering a DAG structure from observational data:

  1. Explore the data first - Understand variable types, distributions, and potential relationships before applying algorithms
  2. Choose appropriate algorithms based on data size:
    • For large datasets, constraint-based methods (PC algorithm) may cause memory issues
    • Score-based methods (HillClimbSearch) can also be memory-intensive
    • Consider correlation-based greedy approaches for very large datasets
  3. Apply domain constraints - If constraints are given (e.g., "variable U has no parents", "exactly N edges"), incorporate them into the search
  4. Handle ambiguous edges - When multiple edges have similar scores, follow any specified ordering rules (e.g., alphabetical) for deterministic results

Phase 2: Parameter Estimation

After structure is determined:

  1. Use library methods for fitting - Always prefer model.fit(data) over manual parameter computation
  2. Verify fitted parameters - Print and inspect CPD parameters after fitting:
    for cpd in model.get_cpds():
        print(cpd)
    
  3. For Linear Gaussian BNs, verify:
    • Intercept values
    • Coefficient values for each parent
    • Variance estimates
  4. Compare against expected values if test cases provide them

Phase 3: Intervention (do-calculus)

When performing causal interventions:

  1. Understand intervention semantics: do(X=x) means:

    • Remove ALL incoming edges to X (X no longer depends on its parents)
    • Fix X to the specified value x
    • Keep all outgoing edges from X (X still influences its children)
  2. Create the interventional DAG correctly:

    # Remove incoming edges to intervention variable
    intervened_dag = original_dag.copy()
    for parent in list(intervened_dag.predecessors(intervention_var)):
        intervened_dag.remove_edge(parent, intervention_var)
    
  3. Re-fit or transfer parameters appropriately:

    • Parameters for non-intervened variables remain the same
    • The intervened variable becomes a constant (delta distribution)

Phase 4: Sampling from Intervened Network

Critical: Use library methods, not custom implementations

  1. Prefer built-in sampling methods:

    # If library supports intervention in sampling
    samples = model.simulate(n_samples, do={intervention_var: value})
    
  2. If manual sampling is necessary, ensure correct handling:

    • Sample in topological order of the DAG
    • For the intervened variable, use an array of the fixed value (not a scalar):
      # Correct
      samples[intervention_var] = np.full(n_samples, intervention_value)
      
      # Incorrect - causes broadcasting issues
      samples[intervention_var] = intervention_value
      
    • For downstream variables, use the fixed intervention value correctly in conditional distributions
  3. Verify array dimensions at each step to catch broadcasting errors early

Verification Strategies

Incremental Testing

Test each phase independently before proceeding:

  1. Structure verification: Compare learned edges against expected structure
  2. Parameter verification: Compare fitted parameters against expected values
  3. Intervention verification: Confirm correct edges removed and added
  4. Sampling verification: Check sample statistics match expected distributions

Statistical Validation

For final samples:

  1. Compute mean and variance of sampled values
  2. Compare against theoretical expectations from the fitted model
  3. Use statistical tests (e.g., chi-square, KS test) to validate distributions
  4. A p-value of 0.0 indicates fundamental errors in sampling logic

Debugging Checklist

When tests fail:

  • Are fitted parameters correct? Print and verify.
  • Is the intervention DAG correct? Visualize or print edges.
  • Are array dimensions consistent throughout sampling?
  • Is the intervention value propagating correctly to downstream variables?
  • Are library methods being used where available?

Common Pitfalls

1. Custom Sampling Instead of Library Methods

Problem: Writing custom sampling functions when library methods exist.

Why it fails: Custom implementations often have subtle bugs in:

  • Array dimension handling
  • Conditional distribution computation
  • Topological ordering

Solution: Always check if the library provides sampling with intervention support before implementing custom code.

2. Scalar vs Array for Fixed Values

Problem: Setting intervention variable to a scalar instead of an array.

# Wrong
samples['Y'] = 0.0

# Correct
samples['Y'] = np.zeros(n_samples)

Why it fails: Causes broadcasting issues when used as parent values in conditional distributions.

3. Incorrect Intervention Semantics

Problem: Misunderstanding what do(X=x) means in causal inference.

Common mistakes:

  • Keeping incoming edges to X
  • Removing outgoing edges from X
  • Not fixing X to the exact value

Solution: Review Pearl's do-calculus - intervention removes causes (parents) but preserves effects (children).

4. Not Verifying Intermediate Results

Problem: Running the full pipeline and only checking final outputs.

Why it fails: Errors compound through the pipeline; early errors produce misleading final results.

Solution: Verify structure, then parameters, then intervention, then sampling - each step independently.

5. Memory Issues with Large Data

Problem: OOM errors with PC or HillClimbSearch algorithms on large datasets.

Solutions:

  • Subsample the data for structure learning
  • Use simpler correlation-based approaches
  • Increase available memory
  • Use algorithms with lower memory footprint

6. Ignoring Numerical Precision

Problem: Floating-point precision issues in parameter estimation and sampling.

Solution:

  • Use appropriate tolerances in comparisons
  • Check for near-zero variances that could cause division issues
  • Validate that covariance matrices are positive definite

Code Organization

  • Maintain a single script and iterate on it rather than creating multiple versions (v1, v2, v3)
  • Use functions to separate structure learning, parameter fitting, intervention, and sampling
  • Add assertions or checks after each major step
  • Clean up intermediate files after successful completion
Weekly Installs
17
Repository
letta-ai/skills
First Seen
Jan 24, 2026
Installed on
claude-code12
codex11
gemini-cli11
opencode11
antigravity10
windsurf9