Last month, at the R/Pharma conference that took place on the Harvard Campus, I presented bioWARP, a large Shiny application containing more than 500,000 lines of code. Although several other Shiny apps were presented at the conference, I noticed that none of them came close to being as big as bioWARP. And I asked myself, why?
I concluded that most people just don’t need to build them that big! So now, I would like to explain why we needed such a large app and how we went about building it.
To give you an idea of the scale I am talking about, an automotive metaphor might be useful. A typical Shiny app I see in my daily work has about 50 or even fewer interaction items. Let’s imagine this as a car. With fewer than 50 interactions, think of a small car like a Mini Cooper. Compared to these applications, bioWARP, with more than 500 interactions, is a truck, maybe even a “monster” truck. So why do my customers want to drive trucks when everyone else is driving cars?
Building software often starts with checking the user requirements. So when we started developing our statistical web application, we did that, too. After asking many people in our department, we noticed that the list of requirements was huge:
Main user requirements
- Pretty Design which works universally
- Interactive elements
- Mathematical correctness of all results
Main application features
- Session logging
- Standardized PDF reports of all results
- Ability to restore sessions
- Harmonize it with other software applications
- Everything has to be tested
- Help pages
More requirements then came from the analyses people perform on a daily basis. They wanted some of these tasks integrated into our app:
Mathematical tasks
- Linear regression app
- Descriptive statistics app
- Homogeneity test app
- T-Test app
- Bootstrap simulation app
- Sensitivity/Specificity app
- Linearity app
- Clustering app
- BoxPlotting app
Additionally, the whole application had to be written in R, as all our mathematical packages are written in R. So we decided to do it all with Shiny, because it already covers two of the three main user requirements: being pretty and being interactive.
Inside our department we were already running some large-scale desktop applications. When it came to testing, we always noticed that testing takes forever. If one single piece of software gathers data, calculates statistics, provides plot outputs and renders PDF reports, it is a huge truck, and you can only test it by driving it a thousand miles and seeing if it still works. The idea we came up with was to build our truck out of Lego bricks. Each Lego brick can be tested on its own. If a Lego wheel runs, the truck will run. The wheel holder is universal, so if we change the size of the wheels, the truck still runs, provided each wheel was tested. This is called modularity. There are different solutions in R and Shiny that can be combined to make things modular:
- Shiny Modules
- Object orientation
- R-packages
- clever namespacing
Shiny modules did not exist yet when we started, so we chose options 2 and 3.
As an example, I’ll compare two simple Shiny apps representing two cars here. One is written using object orientation, the other as a simple Shiny application. The image below illustrates that the renderPlot function in a standard Shiny app contains the plot itself, in this case a call to the hist function. So whenever you add a new plot, its function has to be called inside renderPlot.
In the object-oriented app, the plot output is produced by the shinyElement method of a generic plot object we created and called AnyPlot; this method wraps renderPlot around whatever plot the object carries. The first advantage is that the plot can easily be exchanged. (Please look into the code if you wonder whether this really is so.) To describe that advantage, you can imagine a normal car, built of car parts. Our car is really a Lego car, using even smaller standardized parts (Lego bricks) to construct each part of the car. So instead of a grille made of one piece of steel, we constructed it of many little grey Lego bricks. Changing the grille for an update of the car does not require reconstructing the whole front. Just use green bricks instead of grey bricks, for example. They only need to have the same shape.
By going into the code of the two applications, you see there is a straightforward disadvantage of object orientation: there is much more code. We have to define what a Lego brick is and what features it shall have.
# --- Object-oriented Shiny app ---
library(methods)
library(rlang)
library(shiny)  # needed for renderPlot(), fluidPage(), shinyApp(), ...

# Generics that every plot brick has to support
setGeneric("plotElement", where = parent.frame(), def = function(object) {
  standardGeneric("plotElement")
})
setGeneric("shinyElement", where = parent.frame(), def = function(object) {
  standardGeneric("shinyElement")
})

# AnyPlot is the generic brick; HistPlot is one concrete brick
setClass("AnyPlot", representation(plot_element = "call"))
setClass("HistPlot", representation(color = "character", obs = "numeric"), contains = "AnyPlot")

AnyPlot <- function(plot_element = expr(plot(1, 1))) {
  new("AnyPlot",
    plot_element = plot_element
  )
}

HistPlot <- function(color = "darkgrey", obs = 100) {
  new("HistPlot",
    plot_element = expr(hist(rnorm(!!obs), col = !!color, border = 'white')),
    color = color,
    obs = obs
  )
}

#' Method to plot a Plot element
setMethod("plotElement", signature = "AnyPlot", definition = function(object) {
  eval(object@plot_element)
})

#' Method to render a Plot Element
setMethod("shinyElement", signature = "AnyPlot", definition = function(object) {
  renderPlot(plotElement(object))
})

server <- function(input, output, session) {
  # Create a reactive that builds the plot object from the slider value
  report_obj <- reactive(HistPlot(obs = input$obs))
  # Re-render the plot whenever the slider changes
  observeEvent(input$obs, {
    # No matching output exists in this UI, so this line has no visible effect
    output$renderedPDF <- renderText("")
    output$renderPlot <- shinyElement(report_obj())
  })
}

# UI: one slider input and one plot output
ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      sliderInput(
        "obs",
        "Number of observations:", min = 10, max = 500, value = 100)
    ),
    mainPanel(
      plotOutput("renderPlot")
    )
  )
)

shinyApp(ui = ui, server = server)

# --- Standard Shiny app ---
library(shiny)

server <- function(input, output) {
  # Output gray histogram; the plotting code lives inside renderPlot
  output$distPlot <- renderPlot({
    hist(rnorm(input$obs), col = 'darkgray', border = 'white')
  })
}

# UI: one slider input and one plot output
ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      sliderInput(
        "obs",
        "Number of observations:", min = 10, max = 500, value = 100)
    ),
    mainPanel(
      plotOutput("distPlot")
    )
  )
)

shinyApp(ui = ui, server = server)

But an advantage of object orientation is that you can now output the plot in a lot of different formats. We solved this by introducing methods called pdfElement, logElement or archiveElement. For a deeper look, you can check out some examples stored on GitHub. These show the differences between object-oriented and standard Shiny apps. You can see that duplicated code is reduced in object-oriented apps. In addition, the code of the Shiny app itself does not change for object-oriented apps; only the code constructing the objects shown on the page changes. For the standard apps, the Shiny code itself also changes every time an element is updated.
The main advantage of this approach is that you can keep your Shiny app exactly the same, whatever it calculates or reports. Inside our department this meant that whenever somebody wanted a different plot inside an app, we did not have to touch our main app again. Whenever somebody wanted to change just the linear regression app, we did not have to touch the other apps. The look and feel, the logging and the PDF report stay exactly the same. These three features never need to be touched unless they themselves require an update.
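To make the exchangeability a bit more concrete, here is a minimal sketch, not the bioWARP implementation. It assumes a hypothetical BoxPlot brick next to HistPlot and a hypothetical signature for pdfElement (the post only names the method); it uses base R's bquote instead of rlang so that it runs with the methods package alone.

```r
library(methods)

# Base brick, mirroring the AnyPlot class from the example above
setClass("AnyPlot", representation(plot_element = "call"))

setGeneric("plotElement", function(object) standardGeneric("plotElement"))
# Assumed signature: the post names pdfElement but does not show it
setGeneric("pdfElement", function(object, file) standardGeneric("pdfElement"))

# Draw whatever plot call the brick carries
setMethod("plotElement", "AnyPlot", function(object) {
  eval(object@plot_element)
})

# Hypothetical pdfElement: draw the same brick into a PDF file
setMethod("pdfElement", "AnyPlot", function(object, file) {
  grDevices::pdf(file)
  on.exit(grDevices::dev.off())
  plotElement(object)
  invisible(file)
})

# Two interchangeable bricks; BoxPlot is an illustrative addition
setClass("HistPlot", contains = "AnyPlot")
setClass("BoxPlot", contains = "AnyPlot")

HistPlot <- function(obs = 100) {
  new("HistPlot", plot_element = bquote(hist(rnorm(.(obs)), col = "darkgrey")))
}
BoxPlot <- function(obs = 100) {
  new("BoxPlot", plot_element = bquote(boxplot(rnorm(.(obs)), col = "darkgrey")))
}

# The calling code is identical for both bricks
bricks <- list(HistPlot(50), BoxPlot(50))
files <- vapply(
  bricks,
  function(brick) pdfElement(brick, tempfile(fileext = ".pdf")),
  character(1)
)
```

Both bricks inherit plotElement and pdfElement from AnyPlot, so the code that consumes them never needs to know which kind of plot it is handling.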
As you know, we did not build a single app; we had to build many, one for each mathematical analysis. So we decided to construct a separate R package for each app. This meant we had to define, in a core package, one class that determines what an app looks like. This can be seen as the Lego theme. So our app would be Lego City, where you have trucks and cars. Other apps may be more advanced and belong to Lego Technic.
Now each contributor to our Shiny app built a package that contains a child of our core class. We called this class Module, so we got a lot of Module packages. This is not a Shiny module, but it is modular. Our app now allows bringing together a lot of those modules, making it bigger and bigger and bigger. It gets more HP, and I wouldn’t call it a car anymore. Yeah, we have a truck! Made of Lego bricks!
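The core-class idea can be sketched roughly as follows. Only the class name Module comes from this post; the slots, the describe generic and the two child classes are assumptions made for illustration, and in the real setup each child would live in its own package.

```r
library(methods)

# Hypothetical core class: virtual, so it can only be used through children
setClass(
  "Module",
  slots = c(title = "character", inputs = "character"),
  contains = "VIRTUAL"
)

# Behaviour defined once in the core package and inherited by every module
setGeneric("describe", function(module) standardGeneric("describe"))
setMethod("describe", "Module", function(module) {
  sprintf("%s (%d inputs)", module@title, length(module@inputs))
})

# Child classes, standing in for two separate module packages
setClass("BoxPlotModule", contains = "Module")
setClass("TTestModule", contains = "Module")

BoxPlotModule <- function() {
  new("BoxPlotModule",
    title = "Great BoxPlot Module",
    inputs = c("dataset", "group", "color")
  )
}
TTestModule <- function() {
  new("TTestModule", title = "T-Test", inputs = c("dataset", "alpha"))
}

# The main app only ever talks to the Module interface
modules <- list(BoxPlotModule(), TTestModule())
descriptions <- vapply(modules, describe, character(1))
```

Because Module is virtual, nobody can instantiate a bare module by accident, while anything defined on Module is available to every child without extra code.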
The modularization and packaging now enable fast testing. Why? Each package can be tested using basic testthat functionality. So first we tested our core application package, which allows adding building blocks. Afterwards we tested each single package on its own. Finally, the whole application was tested. Our truck is ready to roll. Upon updates, we do not have to test the whole truck again. If we want to have larger tires, we just update the tire package, but not the core package or any other packages.
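As a rough illustration of testing a single brick in isolation, here is a small testthat sketch. The HistPlot constructor is a simplified stand-in for the one shown earlier, and the input check inside it is an assumption added so that there is some behaviour to test.

```r
library(methods)
library(testthat)

# Simplified stand-in for the HistPlot brick from the example above
setClass(
  "HistPlot",
  representation(color = "character", obs = "numeric", plot_element = "call")
)

HistPlot <- function(color = "darkgrey", obs = 100) {
  # Assumed validation, not part of the original example
  stopifnot(is.numeric(obs), length(obs) == 1, obs > 0)
  new("HistPlot",
    color = color,
    obs = obs,
    plot_element = bquote(hist(rnorm(.(obs)), col = .(color), border = "white"))
  )
}

test_that("HistPlot stores its settings and builds a hist() call", {
  brick <- HistPlot(color = "green", obs = 50)
  expect_s4_class(brick, "HistPlot")
  expect_equal(brick@obs, 50)
  expect_identical(as.character(brick@plot_element[[1]]), "hist")
})

test_that("HistPlot rejects a non-positive number of observations", {
  expect_error(HistPlot(obs = -1))
})
```

Tests like these run without starting a Shiny session, which is what makes testing each package on its own fast.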
The truck is made of bricks, actually the same bricks we used to build the car. Just many more of them. Now the hard part is putting them all together and not losing track.
We are dealing with the many different Modules that we were writing. Each Module comes in one package. The main issue we had was that we wanted all apps to be thoroughly tested. During development, of course, not all apps were tested right away, so we had to give them a tag (tested yes/no). Additionally, some apps required help pages, others didn’t. Some apps came with example data sets, some didn’t. Some apps already had a nice title in them, for others it should be easy to configure. For each Module we also have to source js and css files, which we allowed to be added for each app. The folder from which to source them can be chosen by the app author. We wanted to provide as much flexibility as possible while keeping our standards for Lego bricks (look and feel, logging, plotting and reporting). A simple example of such an app can be found on GitHub.
We came up with the idea of XML config files. The XML file contains all the information needed to tell what has to be set for each Module. An example XML is given below, which you can see as the Lego manual. These small configurations allow managing the apps. We also built an XML file that allows the apps to use features of what we call the core package. This XML file is rather difficult to set up. But imagine that it tells which plot shall be logged, which input shall be used and which plots shall go into the PDF report. It allows fast development while sticking to standards.
[Image credit: LegoBrickinstructions.com]
<module id="module1" type="default" datasets="yes" tested="no">
  <package> modulepackage1 </package>
  <class> modulepackage1_Module </class>
  <title> Great BoxPlot Module </title>
  <short> GBM </short>
  <path source="modulepackage1"> . </path>
  <help>
    <level0>help/index.html</level0>
    <level1>
      <item name="details">help/about.html</item>
    </level1>
  </help>
  <data>
    <ds name="Two Groups" file="datasets/two_groups.csv"/>
  </data>
</module>
Inside the config file you can clearly see that the title of the app and the locations of the help pages and example data sets are now given. Even the name of the class that describes the Module is given. This allows us to rapidly add modules to our main app environment.
In the end, our truck is made of many parts that all increase its power and strength. As we now have around 16 modules in our real (in production) app, and each has between 20 and 50 inputs, the truck has around 500 inputs, all of which look similar and can be used to produce standardized PDF reports. The truck can even become a monster truck and, thanks to the config files, will still be easy to manage.
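How such a config could be read is sketched below using the xml2 package. This is an assumption about one possible loader, not the code used in bioWARP; the helper read_module_config and the shortened inline XML are made up for the example.

```r
library(xml2)

# Shortened copy of the module config, inlined so the sketch is self-contained
config <- read_xml('
<module id="module1" type="default" datasets="yes" tested="no">
  <package> modulepackage1 </package>
  <class> modulepackage1_Module </class>
  <title> Great BoxPlot Module </title>
  <data>
    <ds name="Two Groups" file="datasets/two_groups.csv"/>
  </data>
</module>')

# Hypothetical helper: collect the settings the main app needs for one module
read_module_config <- function(node) {
  list(
    id       = xml_attr(node, "id"),
    tested   = identical(xml_attr(node, "tested"), "yes"),
    package  = xml_text(xml_find_first(node, "./package"), trim = TRUE),
    class    = xml_text(xml_find_first(node, "./class"), trim = TRUE),
    title    = xml_text(xml_find_first(node, "./title"), trim = TRUE),
    datasets = xml_attr(xml_find_all(node, "./data/ds"), "file")
  )
}

cfg <- read_module_config(config)
str(cfg)
```

With the class name available as a string, a main app could then load the named package and instantiate the Module class generically, which is one way to add modules without touching the core code.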
- Please do not start building a car until you know how many parts it will have at the end. Always consider it might become a truck. At first, always define your requirements.
- Use modularization! Use Shiny modules or the inheritance provided by object orientation (S4 or R6). Both keep you from changing a lot of code on minor changes in requirements.
- Use standardization! Try to have all your inputs and outputs as standardized as possible. If you use simple output bricks it’s easy to output them in your preferred format. Features like logging, PDF reporting or even testing will be way easier with standardized elements. Standardized inputs allow your users to be comfortable with new apps way faster.
- Don’t build real trucks, build Lego trucks.