Clinical Programming: SAS macros

Showing posts with label SAS macros. Show all posts

Wednesday, September 26, 2012

Clopper-Pearson Confidence Interval

Confidence intervals are a common statistical output. In clinical trials, it is not uncommon to report rates such as the Overall Response Rate (ORR) or the Disease Control Rate (DCR). Such rates are accompanied by confidence intervals. There are many types of confidence intervals, but the interval that summarizes these rates is a binomial proportion confidence interval.

If you have taken an elementary probability course, you have been exposed to the binomial distribution. Remember that the binomial distribution finds the probability of x success occurring in n trials. In other words, if we flip a coin 10 times (n = 10), what is the probability that we would observe 6 heads (x = 6)? My mother gave birth to 7 children, 4 of which were boys. We could apply the binomial distribution to discover the probability that my mom could have 4 boys (x = 4) out of 7 children (n = 7).

ORR and DCR can be related to the binomial distribution because either a patient experiences an ORR (or a DCR) or she doesn't. Success or failure. Heads or tails. Boy or girl. Recall that ORR and DCR are derivations from RECIST standards that show if a patient's tumor is increasing (progressing), decreasing (responding), or stable (no getting better and not getting worse). Here is a general table to help define ORR and DCR:

*Term*	*Abbreviation*	*Definition*
Complete Response	CR	A tumor completely disappears
Partial Response	PR	A tumor decreases in size by at least 30% from baseline but does not fully disappear
Stable Disease	SD	A tumor does not decrease enough to be a PR or increase enough to be a PD
Progressive Disease	PD	A tumor increases by 20% or more from baseline

Overall Response Rate	ORR	Number of patients experiencing a CR or PR out of all patients of interest
Disease Control Rate	DCR	Number of patients experiencing a CR, PR, or SD out of all patients of interest

Now that we have cleared up the background information, let's look at how to get the confidence intervals.

There are many forms of the binomial proportion confidence interval, but the most common used among statisticians is the Clopper-Pearson confidence interval. The formula for the confidence interval is as follows:

Where:

x is the number of successes
n is the number of trials
p is the proportion of successes (x/n)
α is the level of significance
F() is the specified percentile of the F distribution

Now that the hard part is out of the way, we get to do the fun part—SAS programming!

The Macro

This macro truly only requires 2 parameters. The other 5 are for more specification and output preferences. The 2 required parameters are any two of x, n, or p (i.e., x and n, x and p, or n and p). If only one of these parameters is specified, an error message is displayed, and the macro stops execution. x, n, and p have the same meanings as above (number of successes, number of trials, and proportion of successes).

alpha is defaulted to 0.05 for a 95% confidence interval, but it can be specified to 0.01 or 0.10 (for a 99% or 90% confidence interval, respectively) or any other level of desired significance.

sided defaults to 2, meaning a two-sided confidence interval. Replace this with sided=1 for a one-sided confidence interval.

print defaults to Y (or yes), meaning that the user wants the results to print to the output screen in SAS. If printed output is not needed, this can be changed to N (or no).

Finally, dsout allows the user to specify an output dataset. If the results are not printed, it would be a good idea to specify an output dataset; otherwise, no results will be retained. This is also handy if the results are needed for use in another program.

%macro cp_ci(x=,n=,p=,alpha=0.05,sided=2,print=Y,dsout=);

%if (%sysfunc(upcase(&print.))=Y | %sysfunc(upcase(&print.))=YES) & &dsout.= %then %do;

%put WARNING: No print option or output dataset has been specified. No output will be displayed or saved.;

%end;

%let missing=0;

%if &x.= %then %let missing=%eval(&missing.+1);

%if &n.= %then %let missing=%eval(&missing.+1);

%if &p.= %then %let missing=%eval(&missing.+1);

/*** Print ERROR message and quit execution if insufficient number of entries ***/

%if &missing.>1 %then %do;

%put ERROR: The CP_CI macro requires at least 2 of the 3 parameters - x, n, p;

%if &x.= %then %put WARNING: x is empty.;

%if &n.= %then %put WARNING: n is empty.;

%if &p.= %then %put WARNING: p is empty.;

%return;

%end;

/*** Calculate Confidence Limits ***/

data cp;

/* calculate missing parameter */

%if &x.= %then %do;

n=&n.;

p=&p.;

x=round(n*p);

%end;

%else %if &n.= %then %do;

x=&x.;

p=&p.;

n=round(x/p);

%end;

%else %if &p.= %then %do;

x=&x.;

n=&n.;

p=x/n;

%end;

%else %do; * if all 3 parameters are given, run a check to overwrite p based on x and n;

x=&x.;

n=&n.;

p=x/n;

%end;

/* determine alpha based on 1-sided or 2-sided test */

%if &sided.=1 %then %do;

a=1-(&alpha.);

%end;

%else %if &sided.=2 %then %do;

a=1-(&alpha./2);

%end;

/* calculate the lower limit */

v11=2*(n-x+1);

v12=2*x;

fscore1=finv(a,v11,v12);

coef1=(n-x+1)/x;

cp_lcl=1/(1+coef1*fscore1);

/* calculate the upper limit */

v21=2*(x+1);

v22=2*(n-x);

fscore2=finv(a,v21,v22);

coef2=(x+1)/(n-x);

cp_ucl=(coef2*fscore2)/(1+coef2*fscore2);

/* combine lower and upper limits into a single string and add variable labels */

label x='Successes'

n='Trials'

p='Proportion'

cp_lcl='Clopper-Pearson Lower Limit'

cp_ucl='Clopper-Pearson Upper Limit';

keep x n p cp_lcl cp_ucl;

run;

%let ci=%sysevalf(100*(1-&alpha.)); * for use in the title;

/*** Set up title information ***/

%if &sided.=1 %then %do;

title "1-sided &ci.% Clopper-Pearson Confidence Interval";

%end;

%else %if &sided.=2 %then %do;

title "2-sided &ci.% Clopper-Pearson Confidence Interval";

%end;

/*** Print to Output screen if requested ***/

%if %sysfunc(upcase(&print.))=Y | %sysfunc(upcase(&print.))=YES %then %do;

proc print data=cp label noobs;

var x n p cp_lcl cp_ucl;

format p cp_lcl cp_ucl 6.4;

run;

%end;

/*** Save to output dataset if requested ***/

%if &dsout.^= %then %do;

data &dsout.;

set cp;

run;

%end;

/*** Clean up ***/

%if &dsout.= | %sysfunc(upcase(&dsout.))^=CP %then %do;

proc datasets lib=work;

delete cp;

run;

quit;

%end;

title;

%mend cp_ci;

An Example

An investigational drug is given to 45 patients. The following RECIST results are collected:

	% (n)
CR	0% (0)
PR	4% (2)
SD	36% (16)
PD	60% (27)
ORR (CR+PR)	4% (2)
DCR (CR+PR+SD)	40% (18)

You now know the ORR and DCR (4% and 40%, respectively), but your manager is interested in the 95% confidence intervals for each result. You should be able to see this as a binomial problem (a success/failure situation). A patient is either counted in the ORR or not. A patient is either counted in the DCR or not.

Simply running your new macro, you can now code:

%cp_ci(x=2,n=45);

%cp_ci(x=18,n=45);

and the following results will magically appear in your SAS output:

If you need the output saved to a dataset and do not need the results printed to the screen, the following code will get the desired results:

%cp_ci(x=2,n=45,dsout=orr_cp);

%cp_ci(x=18,n=45,dsout=dcr_cp);

where orr_cp is a SAS dataset containing the Clopper-Pearson confidence limits for the ORR and dcr_cp is a SAS dataset containing the Clopper-Pearson confidence limits for the DCR, both datasets saved in the WORK library.

Pretty simple, huh?

Friday, June 8, 2012

Waterfall Plots

Clinical data reporting often uses the waterfall plot to display how subjects are changing related to each other or how much they are changing. A waterfall plot is basically a bar graph where the bars are ordered by decreasing height; thus, it is intended to look somewhat like a waterfall gradually falling to the right.

I have written a macro that will create a waterfall plot for the user with just a few input variables (and of course, some additional options if so desired).

Before I get started, I must say that I am not a huge fan of SAS waterfall plots. If you do not care why I use SAS over R, then please skip to The Macro Description section. I have heard that with the new SAS 9.3 that there are some SG procedures that can produce these plots quite easily and attractively. I do not have access to SAS 9.3 (thanks to the kind programmers at SAS Institute who require paychecks). I still use SAS 9.1.3. I have always been more fond of my waterfall plots in R (which you can compare here).

The only reason I have settled with SAS plots is because it became too much of a management issue for me. First, let me explain my SAS to R process (There is a SAS to R to SAS method that I have yet to figure out). Since SAS is great for data management and not so wonderful when it comes to graphics while R is fabulous with graphics while not so user friendly when it comes to data manipulation (this may just be a personal opinion), I would manipulate my data in SAS, export it to a .csv Excel file, read it into R, perform minor manipulations if needed, and output a PDF of the graph.

Now my issues...

First, my plots are usually not wanted in PDF because it makes it difficult to copy into PowerPoint presentations for management. Second, my overseers have this loving habit of asking me to produce a plot. They tell me how wonderful it looks, but they would like to see what happens if I change just one criteria. Well, they are not programmers, so they do not realize that sometimes that one change requires quite a bit more programming--not necessarily getting the result, but getting the output to adjust.

In short, to skip all that hassle, I have just dropped my R waterfall plots (sorry my wonderful R graphics, maybe we can get together some other time) and moved on to SAS. R is kind of like that girl that you really wanted to date in high school, but she was way too classy for you. So, you went out with the next best thing because she like you back.

The Macro Description

Okay, back to the macro! This macro is called %waterfall (yes, a very creative name--this is why programmers get paid the big bucks). %waterfall requires 4 input parameters with 5 more to enhance presentation. There is some internal stuff that can be changed around if you want to even more personalize the output for your liking. Take time to adjust the code in this macro to make it fit your company profile or your personal programming preferences!

Parameter	Required?	Default Value	Description
dsin	Yes		The input dataset used containing the data with at least yvar, byvar, and any variables required for the whr filtering
whr	No	1=1 (captures all observations in the dsin dataset)	Filters out any unwanted observations in the dsin dataset
yvar	Yes		The name of the variable containing the values that will be plotted on the y-axis
byvar	No		A grouping variable if you want multiple waterfalls side by side (e.g., suppose you want to compare males to females)
title	No		Plot title
ylab	No		Label for the y-axis
outpath	Yes		The path for the output file
filename	Yes		The name of the output file
barwidth	No	3	The width of the waterfall bars

The Macro Code

This is a very straightforward macro for those familiar with SAS programming. I have tried to leave sufficient comments throughout the code, but feel free to contact me with questions (via e-mail or by leaving a comment).

%macro waterfall(dsin=, whr=%str(), yvar=, byvar=, title=%str(), ylab=%str(), outfile=%str(), filename=%str(), barwidth=3);

/* Display error message if WATERFALL dataset is attempting to be overwritten */

%if %sysfunc(upcase(&dsin.))=WATERFALL %then %do;

%put ERROR: The dataset WATERFALL is used in the waterfall macro. Please rename your input dataset and rerun the macro;

%return;

%end;

/* Get variable type and format of BYVAR */

proc sql noprint;

select type, format into :vartyp, :varfmt from sashelp.vcolumn

where upcase(memname)=upcase("&dsin.") & upcase(name)=upcase("&byvar.");

quit;

/* Subset DSIN into WATERFALL & sort */

data waterfall;

set &dsin.;

where &whr.;

run;

proc sort data=waterfall;

by &byvar. descending &yvar.;

run;

data waterfall;

set waterfall;

n=_n_;

run;

/* Set up BYVAR distinct values and number of distinct values */

proc sql ;

select count(distinct &byvar.) into :nby from waterfall;

%let nby=%sysfunc(compress(&nby.));

%if &nby.=1 %then %do;

select distinct &byvar. into :by1 from waterfall;

%end;

%if &nby.>1 %then %do;

select distinct &byvar. into :by1-:by&nby. from waterfall;

%end;

quit;

/* Create waterfall plot */

%let col1=blue; %let col2=red; %let col3=yellow; %let col4=green; %let col5=purple;

%let col6=orange; %let col7=pink; %let col8=brown; %let col9=gray; %let col10=black;

options formchar="|____|+|___+=|_/\<>*" pageno=1 nonumber nodate orientation=landscape center;

ods listing close;

ods rtf file="&outfile.\&filename..rtf" style=journal;

goptions reset=all ftext=swissl ftitle=swissl colors=(black) hsize=9 in vsize=6.5 in;

axis1 label=(angle=90 "&ylab.") order=(-100 to 150 by 50) minor=(n=4);

axis2 label=(' ') value=none major=none minor=none;

legend1 across=1 mode=protect position=(top left inside) cborder=black label=none value=(j=c %do i=1 %to &nby.; "&&by&i." %end;);

title1 h=2 "&title.";

proc gplot data=waterfall;

plot &yvar.*n=&byvar. / vaxis=axis1 haxis=axis2 legend=legend1;

where &yvar.^=.;

%do i=1 %to &nby.;

symbol&i. interpol=Needle value=none width=&barwidth. color=&&col&i.;

%end;

run;

quit;

ods rtf close;

ods listing;

title; footnote;

proc datasets lib=work;

delete waterfall;

run;

quit;

%mend waterfall;

Example

First, please note from the first few lines of the macro that you cannot name your input dataset WATERFALL. I guess you could, but you will get an error message, and your macro will not run. This is set up so that you do not accidentally overwrite any desired data.

With that said, let's create a dataset with a more creative name like WATERFALLDATA.

data waterfalldata;

input patient $ change gender $ @@;

cards;

001 64 Male 002 0 Female

003 -30 Female 004 -42 Female

005 7 Female 006 19 Male

007 4 Female 008 0 Male

009 0 Male 010 -100 Female

011 -19 Female 012 -3 Female

013 14 Female 014 28 Male

015 -13 Male 016 -67 Female

017 -50 Female 018 59 Female

019 27 Female 020 -24 Male

021 16 Female 022 -54 Female

023 35 Male 024 -69 Male

025 -9 Female 026 61 Female

027 -19 Female 028 95 Male

029 3 Female 030 -5 Male

031 107 Male 032 -2 Female

033 65 Male 034 78 Female

035 65 Female 036 -41 Female

037 12 Female 038 15 Male

039 -13 Male 040 -35 Female

041 -21 Male 042 15 Female

043 35 Female 044 -54 Male

045 21 Female 046 10 Male

047 -100 Male 048 10 Female

;

run;

If you have followed any of my other posts, you will know that I love to write reusable macros, save them in a separate file, and call them using the %inc statement. It makes the current program less crowded

%inc 'D:\Documents and Settings\dbateman\Desktop\Reusable Macros\waterfall.sas';

%waterfall(dsin=waterfalldata, yvar=change, byvar=gender,

title=%str(Best Change in Sum of Longest Diameter by Gender), ylab=%str(Best Change in Sum LD),

outpath=%str(D:\Documents and Settings\dbateman\Desktop\Waterfall), filename=%str(waterfall_example), barwidth=8);

And with that short amount of code, you can create this beautiful thing: