Scikit Survival
Automate and integrate Scikit Survival for advanced survival analysis and modeling
Scikit-survival is a community skill for survival analysis using the scikit-survival Python library, covering time-to-event modeling, Kaplan-Meier estimation, Cox regression, random survival forests, and concordance evaluation for censored data analysis.
What Is This?
Overview
Scikit-survival provides tools for analyzing censored time-to-event data through a scikit-learn compatible interface. This skill covers five areas: time-to-event modeling, which predicts from feature data when events such as death, failure, or churn will occur; Kaplan-Meier estimation, which computes non-parametric survival curves from observed event times; Cox regression, which fits proportional hazards models relating features to event risk; random survival forests, which train ensemble models for survival prediction without proportionality assumptions; and concordance evaluation, which measures model discrimination using the concordance index. The skill helps analysts model censored event data, where some observations end before the event of interest occurs.
Who Should Use This
This skill serves biostatisticians analyzing clinical trial survival data, data scientists modeling customer churn and retention, and reliability engineers predicting equipment failure times. It is also useful for actuaries estimating policyholder risk and researchers studying time-to-event outcomes in social science studies.
Why Use It?
Problems It Solves
Standard classification and regression cannot handle censored observations where the event has not yet occurred at the time of data collection. Estimating survival probabilities requires specialized methods that account for varying observation periods. Comparing treatment effects on survival needs statistical tests designed for censored time-to-event data. Evaluating prediction models for survival data requires concordance metrics rather than standard accuracy measures, which would otherwise produce misleading performance estimates.
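To make the concordance idea concrete, here is a simplified, hand-rolled version of Harrell's concordance index in plain numpy. It is a teaching sketch, not the library's implementation: the function name harrell_c_index and the toy data are hypothetical, and tied event times are ignored for brevity.

```python
import numpy as np

def harrell_c_index(event, time, risk):
    """Fraction of comparable pairs that the risk score orders correctly.

    A pair (i, j) is comparable when subject i has the shorter time and
    an observed event. Ties in the risk score count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical data; a higher risk score means earlier expected failure
event = np.array([True, False, True, True, False])
time = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
risk = np.array([0.9, 0.5, 0.7, 0.8, 0.1])

c = harrell_c_index(event, time, risk)
print(f"C-index: {c:.3f}")
```

Only pairs whose ordering is actually known enter the score, which is why the metric stays meaningful under censoring while plain accuracy does not.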
Core Highlights
The survival estimator fits Kaplan-Meier curves from censored event data. The Cox modeler relates features to hazard rates under the proportional hazards assumption. The forest predictor builds ensemble survival models without parametric assumptions. The concordance scorer evaluates discrimination on censored outcomes.
How to Use It?
Basic Usage
import numpy as np
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.metrics import concordance_index_censored

# Synthetic data: 200 samples, 5 features, roughly 70% observed events
n = 200
rng = np.random.default_rng(42)
X = rng.standard_normal((n, 5))
time = rng.exponential(10, n)
event = rng.choice([True, False], n, p=[0.7, 0.3])

# scikit-survival expects the outcome as a structured array with a
# boolean event field and a float time field
y = np.array(
    [(e, t) for e, t in zip(event, time)],
    dtype=[('event', bool), ('time', float)])

# Fit a Cox proportional hazards model and predict risk scores
cox = CoxPHSurvivalAnalysis()
cox.fit(X, y)
pred = cox.predict(X)

# The first element of the returned tuple is the concordance index
ci = concordance_index_censored(y['event'], y['time'], pred)
print(f'C-index: {ci[0]:.3f}')
Real-World Examples
from sksurv.ensemble import RandomSurvivalForest
from sksurv.metrics import concordance_index_censored
from sklearn.model_selection import train_test_split
import numpy as np


class SurvivalPipeline:
    """Fit a random survival forest and report train/test concordance."""

    def __init__(self, n_estimators: int = 100):
        self.model = RandomSurvivalForest(
            n_estimators=n_estimators, random_state=42)

    def fit_eval(self, X: np.ndarray, y: np.ndarray) -> dict:
        # Hold out 20% of the samples for evaluation
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=42)
        self.model.fit(X_tr, y_tr)

        pred_tr = self.model.predict(X_tr)
        pred_te = self.model.predict(X_te)
        ci_tr = concordance_index_censored(
            y_tr['event'], y_tr['time'], pred_tr)[0]
        ci_te = concordance_index_censored(
            y_te['event'], y_te['time'], pred_te)[0]
        return {'train_ci': ci_tr, 'test_ci': ci_te}


# X and y are the feature matrix and structured outcome array
# built in the Basic Usage example
pipe = SurvivalPipeline()
results = pipe.fit_eval(X, y)
print(f'Train: {results["train_ci"]:.3f}')
print(f'Test: {results["test_ci"]:.3f}')
Advanced Tips
Use RandomSurvivalForest when the proportional hazards assumption of Cox regression may not hold for the data, for example when hazard ratios change over time. Evaluate models with time-dependent concordance when survival predictions vary across time horizons. Combine scikit-survival with scikit-learn pipelines for integrated preprocessing and survival modeling. When working with high-dimensional data, apply variance thresholding or regularized Cox regression to reduce noise before fitting ensemble models.
When to Use It?
Use Cases
Fit a Cox proportional hazards model to clinical trial data with censored outcomes. Train a random survival forest for customer churn prediction with time-to-event targets. Compare survival models using concordance index on held-out test data.
Related Topics
Scikit-survival, survival analysis, Cox regression, Kaplan-Meier, time-to-event, censored data, and concordance index.
Important Notes
Requirements
The scikit-survival Python package, along with its scikit-learn and numpy dependencies. Structured survival outcome arrays with a boolean event indicator field and a time field. Feature data without missing values, or with imputation applied beforehand.
Usage Recommendations
Do: use structured numpy arrays with named event and time fields, as required by the scikit-survival API; check the proportional hazards assumption before using Cox regression; and report the concordance index with confidence intervals when evaluating models.
Don't: treat censored observations as non-events, since this biases survival estimates; use standard classification metrics like accuracy to evaluate survival models; or ignore the proportion of censored observations, since heavy censoring reduces the reliability of model evaluation.
Limitations
Scikit-survival expects outcomes as numpy structured arrays rather than plain arrays or pandas DataFrames, which can require additional data preparation steps. Support for some advanced survival methods, such as competing risks, is limited and may depend on the installed version. Large datasets with many features may require feature selection before fitting survival models.