Friday, June 8, 2012

Waterfall Plots - SAS vs. R

I recently posted a discussion on how to create a waterfall plot in SAS. I mentioned that I would love to create these in R, but my R know-how is a bit limited.

Here I present the same waterfall plot produced in both SAS and R.

The Data

This is the same data presented in my previous post, repeated here so you don't have to click back and forth. It is not real data at all; I just made up these numbers. Creating fake data is what I do in my free time.


data waterfalldata;
      input patient $ change gender $ @@;
      cards;
001   64    Male        002   0     Female
003   -30   Female      004   -42   Female
005   7     Female      006   19    Male
007   4     Female      008   0     Male
009   0     Male        010   -100  Female
011   -19   Female      012   -3    Female
013   14    Female      014   28    Male
015   -13   Male        016   -67   Female
017   -50   Female      018   59    Female
019   27    Female      020   -24   Male
021   16    Female      022   -54   Female
023   35    Male        024   -69   Male
025   -9    Female      026   61    Female
027   -19   Female      028   95    Male
029   3     Female      030   -5    Male
031   107   Male        032   -2    Female
033   65    Male        034   78    Female
035   65    Female      036   -41   Female
037   12    Female      038   15    Male
039   -13   Male        040   -35   Female
041   -21   Male        042   15    Female
043   35    Female      044   -54   Male
045   21    Female      046   10    Male
047   -100  Male        048   10    Female
049   -100  Male        050   93    Female
;
run;
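If you would rather skip the SAS-to-CSV round trip, the same numbers can be typed straight into R. The sketch below is my own assumption rather than the workflow used in this post: it uses base R's scan() to split the text into tokens and regroups them three at a time, which mimics the two-patients-per-line @@ layout of the SAS data step. The resulting waterfalldata data frame has the same patient, change, and gender columns that the R script below works with.

```r
# Sketch (assumption, not the original workflow): enter the made-up data in R.
# The text mirrors the SAS CARDS block: two patients (patient, change, gender)
# per line, the layout the @@ pointer in the SAS INPUT statement handles.
raw <- "
001   64    Male        002   0     Female
003   -30   Female      004   -42   Female
005   7     Female      006   19    Male
007   4     Female      008   0     Male
009   0     Male        010   -100  Female
011   -19   Female      012   -3    Female
013   14    Female      014   28    Male
015   -13   Male        016   -67   Female
017   -50   Female      018   59    Female
019   27    Female      020   -24   Male
021   16    Female      022   -54   Female
023   35    Male        024   -69   Male
025   -9    Female      026   61    Female
027   -19   Female      028   95    Male
029   3     Female      030   -5    Male
031   107   Male        032   -2    Female
033   65    Male        034   78    Female
035   65    Female      036   -41   Female
037   12    Female      038   15    Male
039   -13   Male        040   -35   Female
041   -21   Male        042   15    Female
043   35    Female      044   -54   Male
045   21    Female      046   10    Male
047   -100  Male        048   10    Female
049   -100  Male        050   93    Female
"

# Split on whitespace into one flat vector of tokens, then regroup in threes.
tokens <- scan(text = raw, what = character(), quiet = TRUE)
fields <- matrix(tokens, ncol = 3, byrow = TRUE)

waterfalldata <- data.frame(
  patient = fields[, 1],               # kept as text so "001" keeps its zeros
  change  = as.numeric(fields[, 2]),   # best percent change in sum LD
  gender  = factor(fields[, 3]),
  stringsAsFactors = FALSE
)
```

With this data frame in hand, the R script can start at the arrange() step instead of read.csv().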


SAS Code and Output

To save space, I will present only the main SAS code here: the %inc statement that pulls in the macro from a separate file, followed by the macro call. To see the macro code itself, please refer to my earlier post on the SAS waterfall plot macro.

%inc 'D:\Documents and Settings\dbateman\Desktop\Reusable Macros\waterfall.sas';
%waterfall(dsin=waterfalldata, yvar=change, byvar=gender,
            title=%str(Best Change in Sum of Longest Diameter by Gender), ylab=%str(Best Change in Sum LD),
            outpath=%str(D:\Documents and Settings\dbateman\Desktop\Waterfall), filename=%str(waterfall_example), barwidth=8);


R Code and Output

Getting the data into R wasn't too tough. As I stated in my earlier related post, it was just a data management issue. I simply exported the SAS dataset to a .csv file, opened R, and read in the data.

# Include Required Libraries
#============================
library(plotrix)  # addtable2plot()
library(Hmisc)    # minor.tick()
library(plyr)     # arrange() and desc()

# Read in the data
#==================
waterfalldata=read.csv("waterfalldata.csv",header=TRUE)
waterfalldata=waterfalldata[1:50,]  # keep the 50 patients; read.csv() picks up an extra line from the export
waterfalldata=arrange(waterfalldata, gender, desc(change))  # sort by gender, then largest change first


# Create the RECIST variable
#============================
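# RECIST categories from the best percent change:
# PD if change >= +20%, CR/PR if change <= -30%, otherwise SD
recist=ifelse(waterfalldata$change>=20,"PD",ifelse(waterfalldata$change<=-30,"CR/PR","SD"))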
waterfalldata=cbind(waterfalldata,recist)


# Create the plot
#=================
color=ifelse(waterfalldata$gender=="Male",4,2)  # 4 = blue (Male), 2 = red (Female)
barplot(height=waterfalldata$change,width=.5,space=0,col=color,ylim=c(-100,100),
        ylab="Percent Change in Sum LD",
        main="Best Change in Sum of Longest Diameter by Gender")
abline(h=0)
abline(h=20,lty=2)   # PD threshold (+20%), upper SD boundary
abline(h=-30,lty=2)  # PR threshold (-30%), lower SD boundary
par(cex=.8)
minor.tick(nx=0,ny=5,tick.ratio=.5)
     
legend("topright",lty=c(1,1,2),lwd=c(5,5,1),c("Male","Female","SD Boundary"),col=c(4,2,1))


# Add a table to the plot
#=========================
x=xtabs(~waterfalldata$recist+waterfalldata$gender)
c11=paste(x[rownames(x)=="CR/PR",colnames(x)=="Female"]," (",sprintf("%1.1f%%",100*(x[rownames(x)=="CR/PR",colnames(x)=="Female"]/sum(x[,colnames(x)=="Female"]))),")",sep="")
c21=paste(x[rownames(x)=="SD",colnames(x)=="Female"]," (",sprintf("%1.1f%%",100*(x[rownames(x)=="SD",colnames(x)=="Female"]/sum(x[,colnames(x)=="Female"]))),")",sep="")
c31=paste(x[rownames(x)=="PD",colnames(x)=="Female"]," (",sprintf("%1.1f%%",100*(x[rownames(x)=="PD",colnames(x)=="Female"]/sum(x[,colnames(x)=="Female"]))),")",sep="")
c41=paste(sum(x[rownames(x)!="PD",colnames(x)=="Female"])," (",sprintf("%1.1f%%",100*(sum(x[rownames(x)!="PD",colnames(x)=="Female"])/sum(x[,colnames(x)=="Female"]))),")",sep="")
c12=paste(x[rownames(x)=="CR/PR",colnames(x)=="Male"]," (",sprintf("%1.1f%%",100*(x[rownames(x)=="CR/PR",colnames(x)=="Male"]/sum(x[,colnames(x)=="Male"]))),")",sep="")
c22=paste(x[rownames(x)=="SD",colnames(x)=="Male"]," (",sprintf("%1.1f%%",100*(x[rownames(x)=="SD",colnames(x)=="Male"]/sum(x[,colnames(x)=="Male"]))),")",sep="")
c32=paste(x[rownames(x)=="PD",colnames(x)=="Male"]," (",sprintf("%1.1f%%",100*(x[rownames(x)=="PD",colnames(x)=="Male"]/sum(x[,colnames(x)=="Male"]))),")",sep="")
c42=paste(sum(x[rownames(x)!="PD",colnames(x)=="Male"])," (",sprintf("%1.1f%%",100*(sum(x[rownames(x)!="PD",colnames(x)=="Male"])/sum(x[,colnames(x)=="Male"]))),")",sep="")

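# The c*1 cells hold the Female counts and the c*2 cells the Male counts.
# DCR (disease control rate) = CR/PR + SD, i.e. every patient without PD.
testdf=data.frame(Male=c(c12,c22,c32," ",c42),Female=c(c11,c21,c31," ",c41))
rownames(testdf)=c("CR/PR","SD","PD"," ","DCR")
addtable2plot(0,-100,testdf,bty="o",display.rownames=TRUE)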


# Save the plot to a file
#=========================
savePlot(filename="waterfall_example",type="pdf")
par(cex=1)
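A note on saving: savePlot() with type="pdf" relies on the on-screen Windows graphics device, so that last step is not portable to Linux or macOS. A portable pattern, sketched below as an assumption rather than what I used, is to open a pdf() device, draw, and close it with dev.off(). The helper name save_waterfall_pdf() and its arguments are hypothetical, and for brevity it draws only the bars and the RECIST reference lines, not the table.

```r
# Sketch: write a waterfall plot straight to a PDF file using base R only.
# save_waterfall_pdf() is a hypothetical helper, not part of any package.
save_waterfall_pdf <- function(change, group, file,
                               main = "Best Change in Sum of Longest Diameter") {
  # Sort by group, then largest change first (same order as arrange() above).
  ord <- order(group, -change)
  change <- change[ord]
  group <- group[ord]
  cols <- ifelse(group == "Male", 4, 2)   # 4 = blue, 2 = red, as in the script

  pdf(file, width = 8, height = 6)        # open the PDF device
  on.exit(dev.off())                      # close it when the function exits
  barplot(height = change, width = 0.5, space = 0, col = cols,
          ylim = c(-100, 100), ylab = "Percent Change in Sum LD", main = main)
  abline(h = 0)
  abline(h = c(20, -30), lty = 2)         # RECIST PD and PR thresholds
  invisible(file)
}
```

For example, save_waterfall_pdf(waterfalldata$change, waterfalldata$gender, "waterfall_example.pdf") writes the file without touching the screen device. Using on.exit() means the PDF is closed even if a plotting call fails partway through.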



Discussion

There are a few things that I like better about the R plot over the SAS plot:
  1. I was able to add a data table and the RECIST SD boundary lines easily. Both are possible in SAS, but they are much more difficult and make the macro considerably more complex. The table code in R is a bit cumbersome, but it works.
  2. Making changes or additions is quite easy in R code.
  3. It just looks more professional and attractive (I like the lack of gaps between bars, the square bar tips rather than the round ones, the crisper font, etc.).
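The table code is cumbersome mainly because every cell is pasted together by hand. Since a contingency table is just a matrix, the counts, column percentages, and DCR row can be computed for all cells at once. The sketch below is one possible refactoring, not the code used for the plot above: recist_table() is a hypothetical helper that assumes a data frame with recist and gender columns, and it drops the blank spacer row.

```r
# Sketch: build the RECIST-by-gender summary table in a vectorized way.
# recist_table() is a hypothetical helper; it expects columns recist and gender.
recist_table <- function(d) {
  # Fix the row order so CR/PR, SD, PD always appear, even with zero counts.
  recist <- factor(d$recist, levels = c("CR/PR", "SD", "PD"))
  x <- table(recist, d$gender)

  # Disease control rate (DCR): every patient who did not progress (CR/PR + SD).
  x <- rbind(x, DCR = colSums(x[c("CR/PR", "SD"), , drop = FALSE]))

  # Percentages relative to each gender's total number of patients.
  totals <- colSums(x[c("CR/PR", "SD", "PD"), , drop = FALSE])
  pct <- sweep(x, 2, totals, "/") * 100

  # Paste counts and percentages into "n (p%)" strings, cell by cell.
  cells <- matrix(sprintf("%d (%1.1f%%)", as.integer(x), as.vector(pct)),
                  nrow = nrow(x), dimnames = dimnames(x))
  as.data.frame(cells, stringsAsFactors = FALSE)
}
```

The result can be passed to the same plotrix call, e.g. addtable2plot(0, -100, recist_table(waterfalldata), bty="o", display.rownames=TRUE). Columns come out in alphabetical order (Female, Male); index with [, c("Male", "Female")] to match the legend order.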

Now you can decide which one you like best.
