R/LinkedCharts Tutorial:

A multi-coloured t-SNE plot

In this example, we continue our exploration of the CiteSeq cord blood single-cell dataset by Stoeckius et al. (Nature Methods, 2017). If you haven't read the first part yet, please read it first.

This is what we are aiming for:

In this app, you can assign an antibody to each colour channel, and so explore the identities of the cells in a colourful manner. If you want to try this out first before reading the details, here is the complete code. It's less than a page of R.

Loading the data

The CiteSeq method presented by Stoeckius et al. in their paper is a way to simultaneously sequence the transcriptome and the "epitome" of thousands of single cells, where "epitome" means a collection of surface markers (epitopes) of the cells. The authors conjugated antibodies against 13 different blood cell surface markers with DNA oligomers and sequenced these alongside the cell's own transcripts. This gives counts of sequencing reads from the labelled antibody molecules, which they call "antibody-derived tags" (ADTs).

We have already explored the transcriptome data in the first part of this tutorial. Now, we will use the epitome data to learn more about the cells’ identities.

We start by loading the rlc library again, together with the CiteSeq data file that we prepared in the first part (which you can download here).

library( rlc )

load( "citeseq_data.rda" )

We also download the epitome data table (ADT table) from GEO.

download.file( "ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE100nnn/GSE100866/suppl/GSE100866_CBMC_8K_13AB_10X-ADT_umi.csv.gz",
   "GSE100866_CBMC_8K_13AB_10X-ADT_umi.csv.gz" )

countMatrixADT <- as.matrix( read.csv( gzfile( "GSE100866_CBMC_8K_13AB_10X-ADT_umi.csv.gz" ), row.names=1 ) )
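The argument row.names=1 tells read.csv to use the first column of the file (the marker names) as row names, so the resulting matrix has markers in rows and cell barcodes in columns. The following is a small sketch of that round trip on a toy table written to a temporary file; the values and the names toy, path and back are made up for illustration and have nothing to do with the GEO file.

```r
# Toy marker-by-cell table (illustrative values only)
toy <- matrix( c( 36L, 28L, 34L, 21L ), nrow = 2,
               dimnames = list( c( "CD3", "CD4" ), c( "cellA", "cellB" ) ) )

# Write it as a gzipped CSV; write.csv puts the row names in the first column
path <- file.path( tempdir(), "toy_adt.csv.gz" )
con <- gzfile( path, "w" )
write.csv( toy, con )
close( con )

# Read it back the same way as the ADT table above
back <- as.matrix( read.csv( gzfile( path ), row.names = 1 ) )
back[ "CD4", "cellB" ]   # 21
```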

We first subset the matrix to only those columns (cells) that we had retained when filtering the RNA count matrix, in the same order:

countMatrixADT <- countMatrixADT[ , colnames(countMatrix) ]
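Indexing columns by name does two things at once: it drops the cells that were filtered out, and it puts the remaining ones in the order of colnames(countMatrix). The order matters later, because the colours are matched to the t-SNE coordinates position by position; we assume here, as in the first part, that the rows of tsne$Y follow colnames(countMatrix). The sketch below uses a toy matrix, and rna_cells is a hypothetical stand-in for colnames(countMatrix).

```r
# Toy ADT matrix: markers in rows, cell barcodes in columns
adt <- matrix( 1:6, nrow = 2,
               dimnames = list( c( "CD3", "CD19" ), c( "c1", "c2", "c3" ) ) )

# Hypothetical stand-in for colnames( countMatrix )
rna_cells <- c( "c3", "c1" )

# A name missing from adt would make the indexing stop with
# "subscript out of bounds", so check this explicitly first
stopifnot( all( rna_cells %in% colnames( adt ) ) )

# Keep only the retained cells, in the order of the RNA matrix
adt_sub <- adt[ , rna_cells ]
adt_sub
```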

We now have UMI count values for antibody-derived tags (ADTs) from antibodies against 13 epitopes, for ~8000 cells. Here are the first 4 cells:

countMatrixADT[ , 1:4 ]
##        TACAGTGTCTCGGACG GTTTCTACATCATCCC ATCATGGAGTAGGCCA GTACGTATCCCATTTA
## CD3                  36               34              370               49
## CD4                  28               21             1706               38
## CD8                  34               41             9559               52
## CD45RA              228              228           102390              300
## CD56                 26               18             1518               48
## CD16                 44               38             4617               51
## CD10                 41               48              363               48
## CD11c                25               16             8204               41
## CD14                 44               57             4402               89
## CD19                 24               27              988               29
## CD34                 24              230              422               57
## CCR5                 28               24              879               42
## CCR7                 24               27              239               36

A multi-channel fluorescent t-SNE plot

Last time, we calculated a t-SNE plot from the RNA-Seq data. If we want to know which of these cells are T cells, we could colour the cells by the strength of expression of the T-cell marker CD3:

unitrange <- function( x )
  ( x - min(x) ) / ( max(x) - min(x) )

plot( tsne$Y, 
   col = rgb( unitrange( log( countMatrixADT[ "CD3", ] ) ), 0, 0 ),
   asp = 1, pch=20, cex=.5 )


Here, we have defined a function unitrange, which takes a vector of numbers (here, the log-transformed expression of the CD3 epitope) and rescales it such that the smallest number becomes 0 and the largest becomes 1. This is what the rgb function expects: three numbers, each between 0 and 1, which it uses to mix a colour from the specified amounts of red (R), green (G) and blue (B). When given three vectors instead of three numbers, as here, rgb returns one colour per cell. (If you are unfamiliar with the RGB colour model, look it up, e.g., on Wikipedia.)
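To make the scaling concrete, here is a small sketch on toy counts (not from the dataset). It also shows a caveat that the tutorial does not discuss: a zero count gives log(0) = -Inf, which turns the whole output of unitrange into NaN, and rgb then refuses to build colours. Using log1p (that is, log(1 + x)) is one common workaround; it is our suggestion, not part of the original code.

```r
unitrange <- function( x )
  ( x - min(x) ) / ( max(x) - min(x) )

# Toy counts with equal steps on the log scale
counts <- c( 10, 100, 1000 )
scaled <- unitrange( log( counts ) )   # approximately 0, 0.5, 1
cols   <- rgb( scaled, 0, 0 )          # black, half-intensity red, full red
cols[ c( 1, 3 ) ]                      # "#000000" "#FF0000"

# Caveat: with a zero count, log() yields -Inf and every value becomes NaN
broken <- unitrange( log( c( 0, 10 ) ) )   # NaN NaN

# log1p( x ) = log( 1 + x ) keeps zero counts finite
safe <- unitrange( log1p( c( 0, 10, 100 ) ) )
```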

To see, for example, both T cells and B cells at a glance, we could keep the red channel for CD3 and use the green channel for CD19, a B-cell marker:

plot( tsne$Y, 
   col = rgb( 
     unitrange( log( countMatrixADT[ "CD3", ] ) ),    # red channel:   T cell marker
     unitrange( log( countMatrixADT[ "CD19", ] ) ),   # green channel: B cell marker
     0 ),                                             # blue channel:  not used
   asp = 1, pch=20, cex=.5 )


Is the big cluster on top, with its brownish red-green mix, made of B cells, T cells, or something else? It would be nice to be able to quickly explore all the markers by attaching red, green or blue "virtual fluorophores" to them at the click of a button. This is what the app at the top of the page lets us do.
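A plausible reading of the brownish tone: a cell with intermediate levels of both markers gets roughly half red and half green, which mixes to a dark olive rather than a bright yellow. The toy sketch below (made-up counts for three cells) shows the two pure cases and the mixed one. Note also that each channel is rescaled independently by unitrange, so equal intensities in red and green do not imply equal counts.

```r
unitrange <- function( x )
  ( x - min(x) ) / ( max(x) - min(x) )

# Toy counts: a T-like cell, a B-like cell, and one with intermediate levels of both
cd3  <- c( 1000,   10, 100 )
cd19 <- c(   10, 1000, 100 )

cols <- rgb( unitrange( log( cd3 ) ),    # red channel:   CD3
             unitrange( log( cd19 ) ),   # green channel: CD19
             0 )                         # blue channel:  off
cols[ 1:2 ]   # "#FF0000" "#00FF00"
cols[ 3 ]     # about half red plus half green: a dark olive
```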

Interactive input

Readers who have already worked through one of the previous tutorials can probably guess how to make such a scatter plot in linked-charts with the lc_scatter function. Here, we'll just show the result.

red   <- "off"
green <- "off"
blue  <- "off"

lc_scatter( 
   dat(
      x = tsne$Y[,1],
      y = tsne$Y[,2],
      colour = rgb( 
          if( red == "off" )   0 else unitrange(log( countMatrixADT[red, ] )), 
          if( green == "off" ) 0 else unitrange(log( countMatrixADT[green, ] )), 
          if( blue == "off" )  0 else unitrange(log( countMatrixADT[blue, ] )) ),
      size = 1 ))
## Chart 'Chart5' added.
## Layer 'Layer1' is added to chart 'Chart5'.

We created three variables, red, green and blue, to store the marker assigned to each colour channel. Initially, they are all "off", so all points on the scatter plot are black. We can now change them manually and then call updateCharts(), like this:

red <- "CD3"
green <- "CD19"
updateCharts()
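The three if/else lines inside rgb() all follow the same pattern, so they could be factored into a small function. The sketch below uses a hypothetical helper, channel_intensity, which is not part of rlc, and a toy matrix in place of countMatrixADT. One small difference from the original: the helper returns one zero per cell for an "off" channel, whereas the scalar 0 in the original relies on rgb's recycling, which means that with all three channels off rgb(0, 0, 0) returns a single colour rather than one per cell.

```r
unitrange <- function( x )
  ( x - min(x) ) / ( max(x) - min(x) )

# Hypothetical helper (not part of rlc): one intensity per cell for a
# colour channel, or all zeros when the channel is switched "off"
channel_intensity <- function( marker, mat ) {
  if( marker == "off" )
    rep( 0, ncol( mat ) )
  else
    unitrange( log( mat[ marker, ] ) )
}

# Toy ADT matrix (made-up counts): markers in rows, cells in columns
toy <- matrix( c( 10, 1000, 1000, 10 ), nrow = 2,
               dimnames = list( c( "CD3", "CD19" ), c( "cell1", "cell2" ) ) )

red   <- "CD3"
green <- "CD19"
blue  <- "off"

cols <- rgb( channel_intensity( red,   toy ),
             channel_intensity( green, toy ),
             channel_intensity( blue,  toy ) )
cols   # "#00FF00" "#FF0000"
```

In the lc_scatter call, colour = rgb( channel_intensity( red, countMatrixADT ), ... ) could then replace the three if/else lines; we have not tested this variant with rlc.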

It would be much nicer, though, to do the same interactively, simply by clicking. This can be done with the lc_input function, which adds HTML input tags (https://www.w3schools.com/tags/tag_input.asp) to the page and handles their responses. linked-charts supports five types of input: "text", "radio", "range", "checkbox" and "button". For this example, we need three sets of radio buttons, one for each colour channel. Let's put them side by side next to the scatter plot.

openPage(FALSE, layout = "table1x4")

red   <- "off"
green <- "off"
blue  <- "off"

lc_scatter( 
  dat(
    x = tsne$Y[,1],
    y = tsne$Y[,2],
    colour = rgb( 
      if( red=="off" )   0 else unitrange(log( countMatrixADT[red,] )), 
      if( green=="off" ) 0 else unitrange(log( countMatrixADT[green,] )), 
      if( blue=="off" )  0 else unitrange(log( countMatrixADT[blue,] )) ),
    size = 1 ),
  place = "A1" )
## Chart 'A1' added.
## Layer 'Layer1' is added to chart 'A1'.
buttonRows <- c("off", rownames(countMatrixADT))

lc_input(type = "radio", 
         labels = buttonRows, 
         title = "Red", 
         value = 1, 
         width = 100, 
         on_click = function(value) {
            red <<- buttonRows[value]
            updateCharts("A1")
         }, 
         place = "A2")
## Chart 'A2' added.
lc_input(type = "radio", 
         labels = buttonRows, 
         title = "Green", 
         value = 1, 
         width = 100, 
         on_click = function(value) {
            green <<- buttonRows[value]
            updateCharts("A1")
         }, 
         place = "A3")
## Chart 'A3' added.
lc_input(type = "radio", 
         labels = buttonRows, 
         title = "Blue", 
         value = 1, 
         width = 100, 
         on_click = function(value) {
            blue <<- buttonRows[value]
            updateCharts("A1")
         }, 
         place = "A4")
## Chart 'A4' added.

Here, we create a 1x4 table layout and put our scatter plot into the leftmost cell. The three other cells hold the sets of radio buttons. For each of them, we set the required property type, which must be one of c("text", "radio", "range", "checkbox", "button"), to "radio". Then we need to specify a vector of labels to be printed next to the radio buttons. In this example, we use the value "off" plus all available markers, which we stored in the buttonRows variable:

buttonRows
##  [1] "off"    "CD3"    "CD4"    "CD8"    "CD45RA" "CD56"   "CD16"  
##  [8] "CD10"   "CD11c"  "CD14"   "CD19"   "CD34"   "CCR5"   "CCR7"

lc_input uses labels to determine the number of elements, which is 1 by default. So even if you don't want any text next to your radio buttons or checkboxes, you still need to pass a vector of empty strings to this property.

value sets the current value of an input element. For a set of radio buttons, it is the index of the checked button. Here, we use this property to set the initial value, but in other applications you can also use it to control the state of your inputs from the R session.
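Because value is an index into labels, match() converts a marker name into the number to pass. For instance, to start the app with CD3 in the red channel, one would presumably set red <- "CD3" and pass value = match( "CD3", buttonRows ) to the red lc_input call, so that the variable and the checked button agree. The sketch below only shows the index arithmetic, using the buttonRows vector printed above.

```r
buttonRows <- c( "off", "CD3", "CD4", "CD8", "CD45RA", "CD56", "CD16",
                 "CD10", "CD11c", "CD14", "CD19", "CD34", "CCR5", "CCR7" )

# Marker name -> radio button index (what value expects)
match( "CD3", buttonRows )    # 2
match( "CD19", buttonRows )   # 11

# Radio button index -> marker name (what the on_click handler does)
buttonRows[ 11 ]              # "CD19"
```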

The on_click property works the same way as it does in all other charts in the rlc library: whenever the user clicks on one of the buttons, this function is called. As its argument, it receives the current value of the input block (for radio buttons, the index of the checked button). We use this index to look up the marker name in buttonRows, assign it to the variable of the corresponding colour channel (with <<-, so that the global variable is changed rather than a local copy inside the function), and update the A1 chart (the scatter plot).

If you know how to use HTML input tags, you may know that they generally use the onchange attribute rather than onclick, which means that the event is fired not when the user clicks on the element, but only when its value changes. Internally, rlc does the same; however, to make things less confusing, we decided to keep the same property name. You can also use on_change instead of on_click if you find this more intuitive. For lc_input, the two are complete synonyms.
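The role of <<- can be checked without a browser by calling a handler by hand. The sketch below mimics the red channel's on_click function with a shortened buttonRows (an assumption for brevity) and leaves out updateCharts, since rlc is not loaded; the name on_red_click is hypothetical.

```r
buttonRows <- c( "off", "CD3", "CD4" )   # shortened for the sketch
red <- "off"

# Same shape as the handler passed to lc_input(), called by hand with
# the index that a click on the second button would deliver
on_red_click <- function( value ) {
  red <<- buttonRows[ value ]
}
on_red_click( 2 )
red   # "CD3"

# With a plain <- the assignment only creates a local copy
on_red_click_local <- function( value ) {
  red <- buttonRows[ value ]
}
on_red_click_local( 3 )
red   # still "CD3"
```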

Finally, we give each button set a title and reduce its width to 100 pixels so that the sets take up less space (the default width is 200 pixels).
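The three lc_input calls differ only in the channel name, title and place. One way to avoid the repetition, sketched here as our own suggestion rather than the tutorial's approach, is a hypothetical handler factory, make_handler, that keeps the channel state in one environment. force() fixes the channel name when the handler is created, which guards against R's lazy evaluation if the factory is later called from a loop. The sketch runs without rlc; buttonRows is shortened.

```r
buttonRows <- c( "off", "CD3", "CD4", "CD8" )   # shortened for the sketch

# Channel state kept in one environment instead of three global variables
channels <- new.env()
for( ch in c( "red", "green", "blue" ) )
  assign( ch, "off", envir = channels )

# Hypothetical factory (not part of rlc): builds an on_click handler
# bound to one channel
make_handler <- function( channel ) {
  force( channel )
  function( value )
    assign( channel, buttonRows[ value ], envir = channels )
}

handlers <- lapply( c( red = "red", green = "green", blue = "blue" ),
                    make_handler )

# Simulate a click on the fourth green button
handlers$green( 4 )
channels$green   # "CD8"
channels$red     # still "off"
```

With rlc, each handler would additionally call updateCharts( "A1" ), the scatter plot would read channels$red and so on, and the three lc_input calls could be generated in a loop with on_click = handlers[[ ch ]]; we have not run that variant.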