Kaplan-Meier Survival Analysis

Kaplan-Meier survival analysis is a relatively simple way to estimate and visualize how the probability of survival declines over time during a medical experiment in which some people die and others drop out. The idea is best explained by example.

Suppose a study starts with 12 people and runs for 10 months. Seven of the people die at times 2.3, 5.1, 6.0, 6.5, 7.8, 9.2, 9.8. Three of the people drop out (are “censored”) at times 3.5, 8.5, 8.8. Two of the people are still alive at the end of the study.

Kaplan-Meier analysis gives you an estimate of the probability that a person lives longer than some time t. First you construct inclusive-exclusive intervals of time, starting at 0, where each end-point is a time when a person died, except that the last interval ends when the study ends (10.0). The dropout times are ignored when constructing intervals:

[0, 2.3)
[2.3, 5.1)
[5.1, 6.0)
[6.0, 6.5)
[6.5, 7.8)
[7.8, 9.2)
[9.2, 9.8)
[9.8, 10.0)
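A minimal Python sketch of this step, assuming the death times are known exactly and that the last interval closes at the study end of 10.0. The helper name `build_intervals` is hypothetical, not from any library.

```python
# Study data from the example (times in months).
death_times = [2.3, 5.1, 6.0, 6.5, 7.8, 9.2, 9.8]
censor_times = [3.5, 8.5, 8.8]  # dropouts: deliberately NOT used to build intervals
study_end = 10.0


def build_intervals(deaths, end):
    """Return (start, stop) pairs for the inclusive-exclusive intervals [start, stop)."""
    # Cut points: time 0, every distinct death time, then the end of the study.
    cuts = [0] + sorted(set(deaths)) + [end]
    return list(zip(cuts[:-1], cuts[1:]))


intervals = build_intervals(death_times, study_end)
for start, stop in intervals:
    print(f"[{start}, {stop})")
```

Using `sorted(set(...))` means two deaths at the same time would produce a single cut point rather than an empty interval.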

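Next, you list how many people died and how many people were censored in each interval. Because each interval includes its left end-point, the death at time 2.3 is counted in [2.3, 5.1), not in [0, 2.3):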

              died  censor
==========================
[0, 2.3)       0      0 
[2.3, 5.1)     1      1
[5.1, 6.0)     1      0
[6.0, 6.5)     1      0
[6.5, 7.8)     1      0
[7.8, 9.2)     1      2
[9.2, 9.8)     1      0
[9.8, 10.0)    1      0
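The counting rule can be sketched in Python as follows (a minimal illustration; `count_in` is a hypothetical helper). The comparison `start <= t < stop` is what places each death in the interval that begins at that death time.

```python
death_times = [2.3, 5.1, 6.0, 6.5, 7.8, 9.2, 9.8]
censor_times = [3.5, 8.5, 8.8]
cuts = [0, 2.3, 5.1, 6.0, 6.5, 7.8, 9.2, 9.8, 10.0]
intervals = list(zip(cuts[:-1], cuts[1:]))


def count_in(times, start, stop):
    """Count the times t with start <= t < stop (left end-point included)."""
    return sum(1 for t in times if start <= t < stop)


died = [count_in(death_times, a, b) for a, b in intervals]
censored = [count_in(censor_times, a, b) for a, b in intervals]
print(died)
print(censored)
```

The two people still alive at 10.0 are not in `censor_times`, so they do not appear in any interval's count.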

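Next, you compute n, the number of people still in the study (at risk) at the start of each interval, being very careful about the interval end-points. The first two entries are both 12, the number of people who started the study, because no one dies or drops out before time 2.3. Each later n is the previous n minus the previous interval's deaths and dropouts, and the lone 2 below the last row is the number of people still alive at the end of the study: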

              died  censor  n
=============================
[0, 2.3)       0      0    12
[2.3, 5.1)     1      1    12 
[5.1, 6.0)     1      0    10
[6.0, 6.5)     1      0     9
[6.5, 7.8)     1      0     8
[7.8, 9.2)     1      2     7
[9.2, 9.8)     1      0     4
[9.8, 10.0)    1      0     3
                            2
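A short Python sketch of this bookkeeping, assuming the died and censored counts from the previous table (the function name `at_risk` is hypothetical). Each interval's n is recorded before that interval's deaths and dropouts are subtracted.

```python
total = 12
died = [0, 1, 1, 1, 1, 1, 1, 1]
censored = [0, 1, 0, 0, 0, 2, 0, 0]


def at_risk(total, died, censored):
    """Return n for each interval, plus how many people remain after the last one."""
    n = []
    remaining = total
    for d, c in zip(died, censored):
        n.append(remaining)   # everyone still in the study when the interval starts
        remaining -= d + c    # deaths and dropouts leave before the next interval
    return n, remaining


n, still_alive = at_risk(total, died, censored)
print(n)
print(still_alive)
```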

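Next, you compute the proportion 1 - (d/n) for each interval, which estimates the probability that a person who is still in the study at the start of an interval survives through it: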

              died  censor  n  1-(d/n)
======================================
[0, 2.3)       0      0    12  1.0000
[2.3, 5.1)     1      1    12  0.9167
[5.1, 6.0)     1      0    10  0.9000
[6.0, 6.5)     1      0     9  0.8889
[6.5, 7.8)     1      0     8  0.8750
[7.8, 9.2)     1      2     7  0.8571
[9.2, 9.8)     1      0     4  0.7500
[9.8, 10.0)    1      0     3  0.6667
                            2
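The same column in Python, as a minimal sketch using the d and n values from the table. Formatting to four decimals reproduces the printed values.

```python
died = [0, 1, 1, 1, 1, 1, 1, 1]
n = [12, 12, 10, 9, 8, 7, 4, 3]

# Proportion of the people at risk in each interval who do not die in it.
proportions = [1 - d / k for d, k in zip(died, n)]
print([f"{p:.4f}" for p in proportions])
```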

The last step is to compute the cumulative product of the proportion terms for each interval:

              died  censor  n  1-(d/n)   S(t)
==============================================
[0, 2.3)       0      0    12  1.0000   1.0000
[2.3, 5.1)     1      1    12  0.9167   0.9167
[5.1, 6.0)     1      0    10  0.9000   0.8250
[6.0, 6.5)     1      0     9  0.8889   0.7333
[6.5, 7.8)     1      0     8  0.8750   0.6417
[7.8, 9.2)     1      2     7  0.8571   0.5500
[9.2, 9.8)     1      0     4  0.7500   0.4125
[9.8, 10.0)    1      0     3  0.6667   0.2750
                            2
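One way to sketch the cumulative product in Python is `itertools.accumulate` with `operator.mul`, which yields the running product of the proportions. The d and n lists are copied from the table; keeping full precision and rounding only for display gives the same four-decimal S(t) values.

```python
from itertools import accumulate
import operator

died = [0, 1, 1, 1, 1, 1, 1, 1]
n = [12, 12, 10, 9, 8, 7, 4, 3]

proportions = [1 - d / k for d, k in zip(died, n)]
# S for each interval is the product of all proportions up to and including it.
surv = list(accumulate(proportions, operator.mul))
print([f"{s:.4f}" for s in surv])
```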

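This is the tricky part. For example, the 0.7333 entry for the interval [6.0, 6.5) is 1.0000 * 0.9167 * 0.9000 * 0.8889 = 0.7333. This calculation chains conditional probabilities together: to survive past time 6.0, a person must survive past 2.3, then survive past 5.1 given survival up to 5.1, then survive past 6.0 given survival up to 6.0. Multiplying these conditional probabilities is the heart of Kaplan-Meier.

Each S(t) value is the estimated probability that a person survives longer than time t, for any t in the associated interval, from its inclusive left end-point up to (but not including) its exclusive right end-point. For example, P(survive > 2.0) = 1.0000 because no one died before 2.3, P(survive > 2.3) = 0.9167 because one of the 12 people died at 2.3, and P(survive > 7.8) = 0.5500.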
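Written as a formula, the cumulative product is the product-limit estimate below. The notation is introduced here rather than taken from the original: t_i is the i-th death time, d_i the number of deaths at t_i, and n_i the number of people at risk just before t_i (the n column).

```latex
\hat{S}(t) = \prod_{i \,:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right)

\hat{S}(t) \text{ for } 6.0 \le t < 6.5
  = \left(1 - \frac{1}{12}\right)\left(1 - \frac{1}{10}\right)\left(1 - \frac{1}{9}\right)
  = \frac{11}{12} \cdot \frac{9}{10} \cdot \frac{8}{9}
  = \frac{792}{1080} \approx 0.7333
```

For t < 2.3 the product is empty, so the estimate is 1.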

Kaplan-Meier results are usually graphed as a step function: S(t) stays flat across each interval and drops at each death time.
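The step function can be evaluated at any time t with a binary search over the interval start points. This is a minimal sketch: the function name `km_step` and the hard-coded lists are illustrative assumptions, and the S(t) values are the rounded ones from the table.

```python
import bisect

# Each interval's inclusive left end-point and its S(t) value, copied from the table.
starts = [0, 2.3, 5.1, 6.0, 6.5, 7.8, 9.2, 9.8]
surv = [1.0000, 0.9167, 0.8250, 0.7333, 0.6417, 0.5500, 0.4125, 0.2750]
STUDY_END = 10.0


def km_step(t):
    """Evaluate the Kaplan-Meier step function, i.e. the estimated P(T > t), at time t."""
    if not 0 <= t <= STUDY_END:
        raise ValueError("t must lie between 0 and the end of the study")
    # bisect_right finds the last interval whose left end-point is <= t,
    # so a death time belongs to the interval that starts there.
    i = bisect.bisect_right(starts, t) - 1
    return surv[i]


for t in (2.0, 2.3, 7.5, 7.8, 10.0):
    print(t, km_step(t))
```

To draw the graph, one option (assuming matplotlib is installed) is `matplotlib.pyplot.step(starts + [STUDY_END], surv + [surv[-1]], where="post")`, which holds each y value from its x position until the next one.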

There are several ways to use the results of a Kaplan-Meier analysis to do a so-called hazard analysis, which estimates the risk that a person dies at a given time, given that the person has survived up to that time.
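The post does not say which hazard method it has in mind. As one common example, swapped in here and not necessarily the author's approach, the Nelson-Aalen estimator adds up d/n over the death times to estimate the cumulative hazard H(t). A minimal sketch, with inputs copied from the tables above:

```python
from itertools import accumulate

# One entry per death time, copied from the tables.
death_times = [2.3, 5.1, 6.0, 6.5, 7.8, 9.2, 9.8]
deaths = [1, 1, 1, 1, 1, 1, 1]
at_risk = [12, 10, 9, 8, 7, 4, 3]

# Estimated hazard at each death time: deaths divided by people at risk.
hazard = [d / n for d, n in zip(deaths, at_risk)]

# Nelson-Aalen estimate of the cumulative hazard H(t): a running sum of the hazards.
cumulative_hazard = list(accumulate(hazard))

for t, h, big_h in zip(death_times, hazard, cumulative_hazard):
    print(f"t = {t:4.1f}  hazard = {h:.4f}  H(t) = {big_h:.4f}")
```

Each hazard value d/n is 1 minus the matching entry in the 1-(d/n) column. The quantity exp(-H(t)) is an alternative survival estimate; because exp(-x) >= 1 - x, it is never lower than the Kaplan-Meier value.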

There’s no real moral to the story. Classical statistics is often very, very simple compared to modern neural machine learning techniques. But classical techniques can still be very useful when used carefully — correlation is not causation.

Note: Post updated on 01/16/2019 to correct values in the 1-(d/n) column.

The average IQ of Black Americans is about 85 and is about 100 for white Americans, but this is correlation, not causation. Race is a mathematical predictor of lower IQ but race does not cause lower IQ.
