Here I will display the same waterfall plot in SAS as well as in R.
The Data
This is the same data that I presented in my previous post, repeated here so you don't have to click back and forth. It is not real data; I made up the numbers. Creating fake data is what I do in my free time.
data waterfalldata;
input patient $ change gender $ @@;
cards;
001 64 Male 002 0 Female
003 -30 Female 004 -42 Female
005 7 Female 006 19 Male
007 4 Female 008 0 Male
009 0 Male 010 -100 Female
011 -19 Female 012 -3 Female
013 14 Female 014 28 Male
015 -13 Male 016 -67 Female
017 -50 Female 018 59 Female
019 27 Female 020 -24 Male
021 16 Female 022 -54 Female
023 35 Male 024 -69 Male
025 -9 Female 026 61 Female
027 -19 Female 028 95 Male
029 3 Female 030 -5 Male
031 107 Male 032 -2 Female
033 65 Male 034 78 Female
035 65 Female 036 -41 Female
037 12 Female 038 15 Male
039 -13 Male 040 -35 Female
041 -21 Male 042 15 Female
043 35 Female 044 -54 Male
045 21 Female 046 10 Male
047 -100 Male 048 10 Female
049 -100 Male 050 93 Female
;
run;
SAS Code and Output
To save on space, I will only present the main SAS code here (including the macro code from a separate file and then the macro call). To see the actual macro code, I invite you to refer to my SAS waterfall plot macro post here.
%inc 'D:\Documents and Settings\dbateman\Desktop\Reusable Macros\waterfall.sas';
%waterfall(dsin=waterfalldata,
yvar=change, byvar=gender,
title=%str(Best Change in Sum of Longest Diameter by Gender),
ylab=%str(Best Change in Sum LD),
outpath=%str(D:\Documents and Settings\dbateman\Desktop\Waterfall),
filename=%str(waterfall_example),
barwidth=8);
R Code and Output
Getting the data into R wasn't too tough. As I stated in my earlier related post, it was just a management issue. I simply exported the SAS dataset into an Excel .csv file, opened R, and read in the data.
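If you would rather not depend on knowing the row count, here is a minimal sketch of the read-in step that drops the stray row by its content instead of by position. I am assuming the exported file has the columns patient, change, and gender, and that the extra line is an empty record; the inline text below stands in for the .csv file and holds only the first four patients from the data above.

```r
# Stand-in for waterfalldata.csv: first four patients plus an empty trailing
# record (an assumption about what the export leaves behind)
csv_text <- "patient,change,gender
001,64,Male
002,0,Female
003,-30,Female
004,-42,Female
,,"

# colClasses keeps the leading zeros in the patient IDs
raw <- read.csv(text = csv_text, header = TRUE,
                colClasses = c("character", "numeric", "character"))

# Drop rows with no change value rather than subsetting by row number
waterfalldata_demo <- raw[!is.na(raw$change), ]

print(waterfalldata_demo)
```

With a real file you would replace the text= argument with the file name; the filtering line stays the same.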
# Include Required Libraries
#============================
library(plotrix)
library(Hmisc)
library(plyr)
# Read in the data
#==================
waterfalldata=read.csv("waterfalldata.csv",header=T)
waterfalldata=waterfalldata[1:50,] # the read.csv() function reads in an extra line.
waterfalldata=arrange(waterfalldata, gender, desc(change))
# Create the RECIST variable
#============================
recist=ifelse(waterfalldata$change>=20,"PD",ifelse(waterfalldata$change<=-30,"CR/PR","SD"))
waterfalldata=cbind(waterfalldata,recist)
# Create the plot
#=================
color=ifelse(waterfalldata$gender=="Male",4,2)
barplot(height=waterfalldata$change,width=.5,space=0,col=color,ylim=range(-100,100),ylab="Percent Change in Sum LD",main="Best Change in Sum of Longest Diameter by Gender")
abline(h=0)
abline(h=20,lty=2)
abline(h=-30,lty=2)
par(cex=.8)
minor.tick(nx=0,ny=5,tick.ratio=.5)
legend("topright",lty=c(1,1,2),lwd=c(5,5,1),c("Male","Female","SD Boundary"),col=c(4,2,1))
# Add a table to the plot
#=========================
x=xtabs(~waterfalldata$recist+waterfalldata$gender)
c11=paste(x[rownames(x)=="CR/PR",colnames(x)=="Female"]," (",sprintf("%1.1f%%",100*(x[rownames(x)=="CR/PR",colnames(x)=="Female"]/sum(x[,colnames(x)=="Female"]))),")",sep="")
c21=paste(x[rownames(x)=="SD",colnames(x)=="Female"]," (",sprintf("%1.1f%%",100*(x[rownames(x)=="SD",colnames(x)=="Female"]/sum(x[,colnames(x)=="Female"]))),")",sep="")
c31=paste(x[rownames(x)=="PD",colnames(x)=="Female"]," (",sprintf("%1.1f%%",100*(x[rownames(x)=="PD",colnames(x)=="Female"]/sum(x[,colnames(x)=="Female"]))),")",sep="")
c41=paste(sum(x[rownames(x)!="PD",colnames(x)=="Female"])," (",sprintf("%1.1f%%",100*(sum(x[rownames(x)!="PD",colnames(x)=="Female"])/sum(x[,colnames(x)=="Female"]))),")",sep="")
c12=paste(x[rownames(x)=="CR/PR",colnames(x)=="Male"]," (",sprintf("%1.1f%%",100*(x[rownames(x)=="CR/PR",colnames(x)=="Male"]/sum(x[,colnames(x)=="Male"]))),")",sep="")
c22=paste(x[rownames(x)=="SD",colnames(x)=="Male"]," (",sprintf("%1.1f%%",100*(x[rownames(x)=="SD",colnames(x)=="Male"]/sum(x[,colnames(x)=="Male"]))),")",sep="")
c32=paste(x[rownames(x)=="PD",colnames(x)=="Male"]," (",sprintf("%1.1f%%",100*(x[rownames(x)=="PD",colnames(x)=="Male"]/sum(x[,colnames(x)=="Male"]))),")",sep="")
c42=paste(sum(x[rownames(x)!="PD",colnames(x)=="Male"])," (",sprintf("%1.1f%%",100*(sum(x[rownames(x)!="PD",colnames(x)=="Male"])/sum(x[,colnames(x)=="Male"]))),")",sep="")
# c11-c41 hold the Female cells and c12-c42 the Male cells, so label the columns accordingly
testdf=data.frame(Male=c(c12,c22,c32," ",c42),Female=c(c11,c21,c31," ",c41))
rownames(testdf)=c("CR/PR","SD","PD"," ","DCR")
addtable2plot(0,-100,testdf,bty="o",display.rownames=T)
# Save the plot to a file
#=========================
savePlot(filename="waterfall_example",type="pdf")
par(cex=1)
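The waterfall shape in the R plot comes entirely from the sorting step: within each gender, the bars run from the largest increase to the largest decrease. If plyr is not available, the same ordering can be sketched in base R with order(). The small data frame below is just the first six patients from the data above, and the name demo is mine for illustration.

```r
# First six patients from the post's data, used only to show the ordering
demo <- data.frame(
  patient = c("001", "002", "003", "004", "005", "006"),
  change  = c(64, 0, -30, -42, 7, 19),
  gender  = c("Male", "Female", "Female", "Female", "Female", "Male")
)

# Sort by gender, then by change in descending order (negating a numeric
# column reverses its sort direction)
demo_sorted <- demo[order(demo$gender, -demo$change), ]

print(demo_sorted)
```

This should give the same row order as arrange(demo, gender, desc(change)), with Female first because the gender values sort alphabetically.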
Discussion
There are a few things that I like better about the R plot than the SAS plot:
- I was able to easily add a data table and RECIST SD boundary lines. These are possible in SAS, but they are much more difficult and make the macro considerably more complex. The table in R is a bit cumbersome, but it works.
- Making changes or additions is quite easy in R code.
- It just looks more professional and attractive (I like the lack of gaps, the square bar tips rather than the round bar tips, the crisper font, etc.).
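Since I called the R table cumbersome, here is a minimal sketch of a more compact way to build the same count-and-percent cells. It is not the code that produced the plot; it uses only base R, the names demo, counts, and summary_table are mine for illustration, and the data frame holds just the first ten patients from the data above.

```r
# First ten patients from the post's data, used only for illustration
demo <- data.frame(
  change = c(64, 0, -30, -42, 7, 19, 4, 0, 0, -100),
  gender = c("Male", "Female", "Female", "Female", "Female",
             "Male", "Female", "Male", "Male", "Female")
)

# Same thresholds as in the post; the factor levels fix the row order
demo$recist <- factor(
  ifelse(demo$change >= 20, "PD", ifelse(demo$change <= -30, "CR/PR", "SD")),
  levels = c("CR/PR", "SD", "PD")
)

# Counts by response (rows) and gender (columns)
counts <- table(demo$recist, demo$gender)

# Add a DCR row: everyone who did not progress (CR/PR + SD)
counts <- rbind(counts, DCR = counts["CR/PR", ] + counts["SD", ])

# Format each cell as "n (pct%)", using the gender totals as denominators
n_by_gender <- table(demo$gender)
cells <- sapply(colnames(counts), function(g) {
  sprintf("%d (%1.1f%%)", counts[, g], 100 * counts[, g] / n_by_gender[[g]])
})
rownames(cells) <- rownames(counts)

summary_table <- as.data.frame(cells, stringsAsFactors = FALSE)
print(summary_table)
```

A data frame built this way could be handed to addtable2plot() in place of testdf, and because each column is looked up by name, the Male and Female labels cannot drift away from their counts.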
Now you can choose which one you like best.