Lifelines - Survival Analysis
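In medicine, we often care about time to an event (death, recovery, relapse). Lifelines is a Python library for survival analysis that handles "censored" data: patients whose event time is unknown because they left the study or had not experienced the event when follow-up ended.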
When to Use
- Analyzing clinical trial data (time to death, disease progression).
- Comparing survival between treatment groups.
- Identifying risk factors using Cox Proportional Hazards regression.
- Building survival models for prognosis.
- Epidemiology studies (time to infection, recovery).
Core Principles
Censoring
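Patients who haven't experienced the event by the end of the study (or by their last follow-up) are "censored": we only know that their event time is longer than their observed time. Lifelines uses this partial information instead of discarding it.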
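A minimal sketch of how censoring is typically encoded before fitting, assuming a hypothetical table with enrolled and event_date columns and a fixed study end date (all names and dates are made up for illustration). It only covers censoring at the end of the study; a patient lost to follow-up earlier would be censored at their last contact date instead.

```python
import pandas as pd

# Hypothetical patient records; event_date is missing (NaT) when no event was recorded.
patients = pd.DataFrame({
    "enrolled": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "event_date": pd.to_datetime(["2020-04-10", None, "2020-03-31"]),
})
study_end = pd.Timestamp("2020-12-31")  # assumed end of follow-up

# 1 = event observed, 0 = censored
patients["died"] = patients["event_date"].notna().astype(int)

# Censored patients contribute follow-up time up to study_end
last_seen = patients["event_date"].fillna(study_end)
patients["days"] = (last_seen - patients["enrolled"]).dt.days

print(patients[["days", "died"]])
```

The resulting days and died columns have the shape expected by the durations and event_observed arguments used in the patterns below.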
Hazard Ratios
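In Cox regression, a hazard ratio compares the hazard after a one-unit increase in a covariate (or between a group and its reference group) with the other covariates held fixed. A hazard ratio > 1 means increased risk; < 1 means decreased risk; 1 means no effect.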
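A Cox model estimates coefficients on the log-hazard scale, and the hazard ratio is the exponential of the coefficient. The sketch below uses a made-up coefficient (not taken from any real model) to show how a per-unit hazard ratio scales to a larger change in the covariate.

```python
import math

coef_age = 0.05  # hypothetical Cox coefficient per year of age (illustrative only)

hr_per_year = math.exp(coef_age)         # hazard ratio for a 1-year increase
hr_per_decade = math.exp(coef_age * 10)  # hazard ratio for a 10-year increase

print(f"HR per year:   {hr_per_year:.3f}")
print(f"HR per decade: {hr_per_decade:.3f}")
```

With this coefficient the hazard rises by roughly 5% per year of age and by roughly 65% per decade, because hazard ratios multiply rather than add.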
Survival Curves
Kaplan-Meier estimates the probability of survival over time without assuming a distribution.
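To make the estimator concrete, here is a hand-rolled sketch of the Kaplan-Meier product formula on five made-up patients. It is for illustration only and is not how lifelines implements it; the function name kaplan_meier is hypothetical. At each distinct event time the running survival probability is multiplied by (1 - deaths / number at risk), and censored patients leave the risk set without causing a drop.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time."""
    survival = 1.0
    curve = []
    event_times = sorted({d for d, e in zip(durations, events) if e == 1})
    for t in event_times:
        # Everyone still under observation just before time t
        at_risk = sum(1 for d in durations if d >= t)
        # Observed events exactly at time t
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Made-up data: 1 = died, 0 = censored
days = [5, 8, 8, 12, 15]
died = [1, 1, 0, 1, 0]

curve = kaplan_meier(days, died)
for t, s in curve:
    print(t, round(s, 3))
```

Here the estimate steps down to 0.8 at day 5, 0.6 at day 8, and 0.3 at day 12; the patient censored at day 8 still counts as at risk on that day but does not trigger a step.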
Quick Reference
Standard Imports
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test
import pandas as pd
Basic Patterns
# 1. Kaplan-Meier (Visualizing survival)
kmf = KaplanMeierFitter()
kmf.fit(durations=df['days'], event_observed=df['died'])
kmf.plot_survival_function()
kmf.median_survival_time_ # Time at which estimated survival drops to 50% (inf if never reached)
# 2. Cox Proportional Hazards (Risk factors)
cph = CoxPHFitter()
cph.fit(df, duration_col='days', event_col='died')
cph.print_summary() # See hazard ratios for age, drug type, etc.
cph.plot_partial_effects_on_outcome(covariates=['age'], values=[30, 50, 70])
Critical Rules
✅ DO
- Use event_observed correctly - 1 = event occurred, 0 = censored.
- Check proportional hazards assumption - Use cph.check_assumptions(df), passing the training DataFrame, to validate the Cox model.
- Compare groups with logrank test - Statistical test for survival curve differences.
- Plot confidence intervals - Survival estimates have uncertainty, especially with small samples.
❌ DON'T
- Don't ignore censoring - Dropping censored patients, or treating them as if they had the event or survived the whole study, biases results.
- Don't use regular regression - Time-to-event data requires specialized methods.
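- Don't assume proportional hazards - If the assumption is violated, consider a stratified Cox model, time-varying effects, or a parametric model that does not rely on it (for example, an accelerated failure time model).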
Advanced Patterns
Comparing Multiple Groups
from lifelines.statistics import multivariate_logrank_test
# Compare survival across treatment groups
results = multivariate_logrank_test(df['days'], df['group'], df['died'])
print(results.p_value)
Parametric Models
from lifelines import WeibullFitter, ExponentialFitter
# When you need to extrapolate beyond observed data
wf = WeibullFitter()
wf.fit(df['days'], df['died'])
wf.plot_survival_function() # Fitted Weibull survival curve
Lifelines transforms complex survival data into actionable medical insights, enabling evidence-based decisions in clinical research and practice.