First, we import the library gsDesign
into R.
################################################
### Sequential monitoring of clinical trials ###
################################################
library(gsDesign)
gsDesign
The function gsDesign
can carry out lots of analyses but here we just focus on the calculation of boundaries. We tell \(\verb|R|\) that we want 5 analyses \(k=5\) (including interim and final). By default, the function gsDesign
assumes equally spaced analyses. Also, we specify that we want two sided test (i.e. both lower and upper boundaries) and type-I error \(\alpha=5\)%. Be careful here as the function always used one-sided type-I error. Thus, if you want a 5% alpha, you need to specify alpha=0.025
!
The following are the bounds proposed by Pocock for \(k=5\) equally spaced interim analyses.
# Pocock boundaries
x1 = gsDesign(k = 5,test.type = 2, alpha = 0.025, sfu = "Pocock")
gsBoundSummary(x1)
## Analysis Value Efficacy Futility
## IA 1: 20% Z 2.4132 -2.4132
## N/Fixed design N: 0.24 p (1-sided) 0.0079 0.0079
## delta at bound 1.5155 -1.5155
## P(Cross) if delta=0 0.0079 0.0079
## P(Cross) if delta=1 0.2059 0.0000
## IA 2: 40% Z 2.4132 -2.4132
## N/Fixed design N: 0.48 p (1-sided) 0.0079 0.0079
## delta at bound 1.0716 -1.0716
## P(Cross) if delta=0 0.0138 0.0138
## P(Cross) if delta=1 0.4661 0.0000
## IA 3: 60% Z 2.4132 -2.4132
## N/Fixed design N: 0.72 p (1-sided) 0.0079 0.0079
## delta at bound 0.8749 -0.8749
## P(Cross) if delta=0 0.0183 0.0183
## P(Cross) if delta=1 0.6747 0.0000
## IA 4: 80% Z 2.4132 -2.4132
## N/Fixed design N: 0.97 p (1-sided) 0.0079 0.0079
## delta at bound 0.7577 -0.7577
## P(Cross) if delta=0 0.0219 0.0219
## P(Cross) if delta=1 0.8149 0.0000
## Final Z 2.4132 -2.4132
## N/Fixed design N: 1.21 p (1-sided) 0.0079 0.0079
## delta at bound 0.6777 -0.6777
## P(Cross) if delta=0 0.0250 0.0250
## P(Cross) if delta=1 0.9000 0.0000
The boundaries are constant across the analyses \(Z=\pm2.41\) and the p-value we have to use in a two-sided test is \(2\times 0.0079=0.0158\). You can ignore the remaining output at this stage. We can also use the \(\verb|plot|\) to produce a graph.
par(mar=c(4,4,1,1))
ss1<-plot(x1, main = "")
ss2<-ss1+theme(legend.position = "none")+
theme(axis.text=element_text(size=10))+
theme(axis.title=element_text(size=15,face="bold"))
ss2
The following are the \(Z\) statistics and the associated \(p\) value boundaries
# Z's and p-values
cbind(Z = gsBoundSummary(x1)$Eff[gsBoundSummary(x1)$Val=="Z"],
P = 2*gsBoundSummary(x1)$Eff[gsBoundSummary(x1)$Val=="p (1-sided)"])
## Z P
## [1,] 2.4132 0.0158
## [2,] 2.4132 0.0158
## [3,] 2.4132 0.0158
## [4,] 2.4132 0.0158
## [5,] 2.4132 0.0158
Notice how far the p-value at the last (fifth) analysis is from the nominal 5% alpha level!
Another issue that you need to realize is that the introduction of the Pocock interim review impacted the sample size.
Recall the following line from the output above:
## N/Fixed design N: 1.21 p (1-sided) 0.0079 0.0079
In fact, the expected sample size is inflated by about 21% compared to the fixed design (more on that later).
To specify the O’Brien-Fleming method, just use sfu = "OF"
. Note that according to the O’Brien-Fleming method it’s much more difficult to reject the null hypothesis in the beginning of the study (i.e. you require a higher level of evidence to do so). The advantage is that the p-value is close to the nominal level at the final analysis.
# O'Brien-Fleming boundaries
x2 = gsDesign(k = 5,test.type = 2, alpha = 0.025, sfu = "OF")
gsBoundSummary(x2)
## Analysis Value Efficacy Futility
## IA 1: 20% Z 4.5617 -4.5617
## N/Fixed design N: 0.21 p (1-sided) 0.0000 0.0000
## delta at bound 3.1059 -3.1059
## P(Cross) if delta=0 0.0000 0.0000
## P(Cross) if delta=1 0.0010 0.0000
## IA 2: 40% Z 3.2256 -3.2256
## N/Fixed design N: 0.41 p (1-sided) 0.0006 0.0006
## delta at bound 1.5530 -1.5530
## P(Cross) if delta=0 0.0006 0.0006
## P(Cross) if delta=1 0.1254 0.0000
## IA 3: 60% Z 2.6337 -2.6337
## N/Fixed design N: 0.62 p (1-sided) 0.0042 0.0042
## delta at bound 1.0353 -1.0353
## P(Cross) if delta=0 0.0045 0.0045
## P(Cross) if delta=1 0.4675 0.0000
## IA 4: 80% Z 2.2809 -2.2809
## N/Fixed design N: 0.82 p (1-sided) 0.0113 0.0113
## delta at bound 0.7765 -0.7765
## P(Cross) if delta=0 0.0128 0.0128
## P(Cross) if delta=1 0.7516 0.0000
## Final Z 2.0401 -2.0401
## N/Fixed design N: 1.03 p (1-sided) 0.0207 0.0207
## delta at bound 0.6212 -0.6212
## P(Cross) if delta=0 0.0250 0.0250
## P(Cross) if delta=1 0.9000 0.0000
The following are the \(Z\) statistics and the associated \(p\) value boundaries
# Z's and p-values
cbind(Z = gsBoundSummary(x2)$Eff[gsBoundSummary(x2)$Val=="Z"],
P = 2*gsBoundSummary(x2)$Eff[gsBoundSummary(x2)$Val=="p (1-sided)"])
## Z P
## [1,] 4.5617 0.0000
## [2,] 3.2256 0.0012
## [3,] 2.6337 0.0084
## [4,] 2.2809 0.0226
## [5,] 2.0401 0.0414
Note that these are twice those listed in the previous output. Also note that the p value for the final analysis is not too far from the nominal 5% alpha level despite four interim analyses. We will see why in a minute.
Another issue has to do with the samle inflation caused by the O’Brien-Fleming procedure. Recall the output from above:
## N/Fixed design N: 1.03 p (1-sided) 0.0207 0.0207
We see that the sample size is inflated less than 3% after four interim analysis (!) another advantage of the O’Brien-Fleming procedure.
par(mar=c(4,4,1,1))
ss1<-plot(x2,main = "")
ss2<-ss1+theme(legend.position = "none")+
theme(axis.text=element_text(size=10))+
theme(axis.title=element_text(size=15,face="bold"))
ss2
Because we want to be able to carry out an analysis at any point, including adding ad hoc analyses at any point during the execution of the study, we adopt the approach. This approach literally governs how the alpha (Type-I error) will be spent during the study.
Consider the following situation where we expend the alpha level over \(k=5\) analyses, so that \(\alpha_1=0.01, \cdots, \alpha_5=0.01\).
The situation is handled by the following code:
t<-c(0.2,0.4,0.6,0.8,1.0)
x <- gsDesign(k=5, test.type=2, beta=0.2,
sfu=sfLinear, sfupar=c(t,t))
x
## Symmetric two-sided group sequential design with
## 80 % power and 2.5 % Type I Error.
## Spending computations assume trial stops
## if a bound is crossed.
##
## Sample
## Size
## Analysis Ratio* Z Nominal p Spend
## 1 0.23 2.58 0.0050 0.005
## 2 0.46 2.49 0.0064 0.005
## 3 0.69 2.41 0.0080 0.005
## 4 0.92 2.34 0.0097 0.005
## 5 1.15 2.28 0.0114 0.005
## Total 0.0250
##
## ++ alpha spending:
## Piecewise linear spending function with line points = 0.2 0.4 0.6 0.8 1 0.2 0.4 0.6 0.8 1.
## * Sample size ratio compared to fixed design with no interim
##
## Boundary crossing probabilities and expected sample size
## assume any cross stops the trial
##
## Upper boundary (power or Type I Error)
## Analysis
## Theta 1 2 3 4 5 Total E{N}
## 0.0000 0.0050 0.0050 0.0050 0.0050 0.0050 0.025 1.1268
## 2.8016 0.1089 0.1908 0.2036 0.1714 0.1253 0.800 0.7849
##
## Lower boundary (futility or Type II Error)
## Analysis
## Theta 1 2 3 4 5 Total
## 0.0000 0.005 0.005 0.005 0.005 0.005 0.025
## 2.8016 0.000 0.000 0.000 0.000 0.000 0.000
This is shown pictorially as follows:
par(mar=c(4,4,1,1))
ss1<- plot(x, main="")
ss2<-ss1+theme(legend.position = "none")+
theme(axis.text=element_text(size=10))+
theme(axis.title=element_text(size=15,face="bold"))
ss2
Note here that, despite the fact that an equal amount of alpha is spent at each analysis, the bounds are not constant. This is because, while, for the first analysis
\[ P(|Z_1|>c_1)=\alpha_1=0.01 \]
so that \(c_1=z_{1-a_1/2}\), for the second interim analysis, \(c_2\) must satisfy the #conditional probability#
\[ P(|Z_2|>c_2|-c_1<Z_1<c_1)=\alpha_2=0.01 \]
Thus, the boundary points \(c_1\neq c_2\); and so on.
In their 1983 paper,~ introduced the concept of a (alpha) spending function. This is a function that is defined in all points \(t \in [0,1]\) (i.e., in all trial fractions from the beginning to the end of the study.) There are several families of such functions:
These are \[ \alpha(t)=\alpha\ln\left\{1+(e-1)t\right\} \] and \[ \alpha(t)=2-2\Phi(z_{\alpha/4}/\sqrt{t}) \] for the case of the Pocock and O’Brien-Fleming-like functions respectively.
These spending functions have the general form \[ \alpha(t)=\min\left (\alpha t^\rho, \alpha\right ) \]
where \(\rho>0\). Spending functions with \(\rho=0.8\) approximate the Pocock spending function. (Note that this is close to the linear function employed in the example of Section~ above), while \(\rho=3\) approximates the O’Brien-Fleming function . Power spending functions for several choices of \(\rho\) are shown in the following Figure.
# Plot of power family error spending functions
t<-seq(0,1,.05)
p08=0.05*t^0.8 # Pocock
p1=0.05*t # Linear
p3=0.05*t^3.0 # O-F
plot(t,p08, lty=2, type="l", ylab="Alpha spent", xlab="Trial fraction", main="")
lines(t, p1, lty=1, type="l", lwd=2)
lines(t, p3, lty=3, type="l", lwd=2)
legend(0,.05, legend = c(expression(paste(rho, " = ", 0.8, " (Pocock)")),
expression(paste(rho, " = ", 1, " (linear)")),
expression(paste(rho, " = ", 3, " (O-F)"))),
lty=c(2,1,3), lwd=c(2,2,2), title="Power spending functions")
Alpha is spent more quickly the larger the value of \(\gamma\). This is shown in Figure~5.
# Plot of HSD error spending functions
t<-seq(0,1,.05)
g0=0.05*t # Pocock
g4=0.05*(1-exp(4*t))/(1-exp(4)) # O-F
g2=0.05*(1-exp(-2*t))/(1-exp(-2))
plot(t,g2, lty=2, lwd=2, type="l",
ylab="Alpha spent", xlab="Trial fraction", main="")
lines(t,g0, lty=1, lwd=2)
lines(t,g4, lty=3, lwd=2)
legend(0,.05, legend = c(expression(paste(gamma, " = ", 2)),
expression(paste(gamma, " = ", 0, " (Pocock)")),
expression(paste(gamma, " = ", -4, " (O-F)"))),
lty=c(2,1,3), lwd=c(2,2,2), title="HSD spending functions")
Spending functions were a ``game changer" in the effort to monitor a clinical trial. The following simple example underlines this fact.
Suppose that you have planned a study with \(k=3\) total analyses (two interim and one final), to be carried out at the \(\tau_1=20\%\), \(\tau_2=50\%\) and \(\tau_3=100\%\) trial fractions. The analyses are going to be carried out at the \(\alpha=0.025\) (one-sided) with power 90% (i.e., \(\beta=0.1\)) and we will use a power spending functions with \(\rho=1\) (i.e., a linear spending of the Type-1 error). The study design, as it stands at the start is as follows:
# Inserting an additional interim analysis
t<-c(0.2,0.5,1.0)
gsDesign(k=3, test.type=1, sfu=sfLinear, sfupar=c(t,t), timing=t)
## One-sided group sequential design with
## 90 % power and 2.5 % Type I Error.
## Sample
## Size
## Analysis Ratio* Z Nominal p Spend
## 1 0.218 2.58 0.0050 0.0050
## 2 0.544 2.38 0.0087 0.0075
## 3 1.088 2.14 0.0161 0.0125
## Total 0.0250
##
## ++ alpha spending:
## Piecewise linear spending function with line points = 0.2 0.5 1 0.2 0.5 1.
## * Sample size ratio compared to fixed design with no interim
##
## Boundary crossing probabilities and expected sample size
## assume any cross stops the trial
##
## Upper boundary (power or Type I Error)
## Analysis
## Theta 1 2 3 Total E{N}
## 0.0000 0.0050 0.0075 0.0125 0.025 1.0791
## 3.2415 0.1436 0.3772 0.3791 0.900 0.7574
We can see that the alpha level will be spent in the three analyses as
and the final analysis will be carried out at a p value \(p=0.0161\).
Now suppose that, after the second interim analysis at the \(\tau_2=0.5\) trial fraction, we (or, more likely, the DSMB reviewing this study) decide to add one more interim analysis at the point \(\tau^*_3=0.75\) (making now the final analysis at the time point \(\tau^*_4=1.00\)). Spending functions allow us to do this almost effortlessly! Consider the following output:
# adding an interim look at t=0.75
t<-c(0.2,0.5, 0.75, 1.0)
gsDesign(k=4, test.type=1, sfu=sfLinear, sfupar=c(t,t), timing=t)
## One-sided group sequential design with
## 90 % power and 2.5 % Type I Error.
## Sample
## Size
## Analysis Ratio* Z Nominal p Spend
## 1 0.225 2.58 0.0050 0.0050
## 2 0.562 2.38 0.0087 0.0075
## 3 0.843 2.32 0.0102 0.0063
## 4 1.124 2.24 0.0124 0.0062
## Total 0.0250
##
## ++ alpha spending:
## Piecewise linear spending function with line points = 0.2 0.5 0.75 1 0.2 0.5 0.75 1.
## * Sample size ratio compared to fixed design with no interim
##
## Boundary crossing probabilities and expected sample size
## assume any cross stops the trial
##
## Upper boundary (power or Type I Error)
## Analysis
## Theta 1 2 3 4 Total E{N}
## 0.0000 0.0050 0.0075 0.0063 0.0063 0.025 1.1134
## 3.2415 0.1494 0.3870 0.2326 0.1310 0.900 0.7067
There are a couple of remarkable things in the previous output that must be remarked on. First of all, the addition of the third analysis only depended on the alpha level spent cumulatively up to the second analysis (had a revision been necessary, this would be have been an unworkable situation). On the other hand, adding the third interim analysis, and the additional alpha spent prior to the final analysis, required the final analysis to be carried out at an alpha level of 0.062 (or a p value of 0.0124) rather than 0.0125 like in the previous situation (where the final analysis was the third analysis). Thus, expending more alpha level has a downside: It requires a higher threshold of evidence at the final analysis if you do not interrupt the study during any of the interim analyses.
The effect of introducing a number of interim analysis on top of a fixed design invariably results in increasing the required sample size to preserve the power.
Consider the three following cases shown in Figure 6
x<-array(dim=11)
for(i in 1:11) {
x[i]<-gsDesign(k=5, test.type=2, alpha=0.025, sfu=sfHSD, sfupar=-i+1)$en[1]
}
par(mar=c(4,4,1,1))
plot(seq(0,-10,-1), x, type="b", ylab="Sample size inflation",
xlab=expression(paste(gamma, " parameter in the HSD spending function")))
To figure out why there is this significant inflation of the sample size, we can consider the rates of spending (recall Figure 5 above). Long story short, the earlier the alpha is spent, the higher the sample size inflation compared to the fixed design. This is the reason why the Pocock prodedure resulted in such a large sample size inflation compared to the O’Brien-Fleming