Chapter 1 Prerequisites
1.1 Terminology
In the following, we use general terminology to describe the biological data of interest. We analyze quantitative expression values (e.g., RT-qPCR Log2Ex, RNA-Seq log2 counts) of features (e.g., genes, transcripts, spike-in controls), which were obtained from individual samples (e.g., single cells).
1.2 Load Library
Before use, the CellTrails library must be loaded into the R environment:
library(CellTrails)
1.3 Third-party Software: yEd
We strongly recommend downloading and installing the graph visualization software yEd (http://www.yworks.com/products/yed). It provides powerful capabilities for planar embedding, visualization, and analysis of a trajectory graph produced by CellTrails.
1.4 Input: SingleCellExperiment
CellTrails organizes its data in an object of Bioconductor’s SingleCellExperiment (Lun and Risso 2017) class. It provides all attributes required for smooth and user-friendly processing and analysis of single-cell data and enables interoperability between packages. Please refer to the SingleCellExperiment vignette for details.
1.4.1 Shape of Expression Data
CellTrails expects the expression data to be normalized and log-transformed; features do not need to be filtered at this point. The expression data is expected to be available from the logcounts assay entry. If this entry is missing (check for its existence with the function assays), the function logcounts<- can be used to store the log-normalized data in a SingleCellExperiment object.
If your expression data is not stored in an object of class SingleCellExperiment, we suggest creating an object from a numeric matrix of the log-normalized expression values; features should be listed in rows, and samples in columns.
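The steps described above can be sketched as follows. This is a minimal illustration, not part of CellTrails itself: the count matrix, the feature and sample names, and the simple library-size normalization are all assumptions made for the example, and any other normalization that yields log-transformed values could be substituted.

```r
library(SingleCellExperiment)

# Hypothetical raw counts: 5 features (rows) x 4 samples (columns)
set.seed(1)
counts <- matrix(rpois(20, lambda = 10), nrow = 5,
                 dimnames = list(paste0("feature_", 1:5),
                                 paste0("sample_", 1:4)))

# Illustrative normalization (an assumption, not a CellTrails requirement):
# scale each sample by its relative library size, then apply log2(x + 1)
size_factors <- colSums(counts) / mean(colSums(counts))
lognorm <- log2(t(t(counts) / size_factors) + 1)

# Object that so far only holds raw counts
sce <- SingleCellExperiment(assays = list(counts = counts))

# Check whether the logcounts entry exists; store it if it is missing
if (!"logcounts" %in% names(assays(sce))) {
  logcounts(sce) <- lognorm
}

names(assays(sce))
```

If no SingleCellExperiment object exists yet, the log-normalized matrix can also be passed directly at construction, e.g. SingleCellExperiment(assays=list(logcounts=lognorm)), as done for exSim in Section 1.5.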
1.4.2 Spike-in Controls
There is no need to remove spike-in controls from your SingleCellExperiment object. CellTrails automatically ignores spike-in controls in its analysis if they were properly annotated in the object using the function isSpike.
1.5 Example Datasets
exSim
In this vignette, simulated data (with log-transformed Negative Binomial distributed expression counts) and real expression data are used to illustrate the functionality of the CellTrails package.
The first dataset, exSim, is composed of expression values of 15,000 features measured in 100 samples; 80 of these features are annotated as spike-in transcripts.
# Create example expression data
# with 15,000 features and 100 samples
set.seed(1101)
emat <- simulate_exprs(n_features=15000, n_samples=100)
# Create SingleCellExperiment object
exSim <- SingleCellExperiment(assays=list(logcounts=emat))
# Annotate ERCC spike-ins
isSpike(exSim, "ERCC") <- 1:80
show(exSim)
## class: SingleCellExperiment
## dim: 15000 100
## metadata(0):
## assays(1): logcounts
## rownames(15000): feature_1 feature_2 ... feature_14999
## feature_15000
## rowData names(0):
## colnames(100): sample_1 sample_2 ... sample_99 sample_100
## colData names(0):
## reducedDimNames(0):
## spikeNames(1): ERCC
exBundle
The second dataset, exBundle, contains transcript expression profiles of 183 genes expressed during sensory hair cell bundle maturation and function, which were quantified in the chicken utricle sensory epithelium at embryonic day 15 using multiplex RT-qPCR. Experimental metadata was generated during tissue preparation (cell origin) and cell sorting (uptake of FM1-43 dye indicating cell maturity). This dataset was the foundation for the development of CellTrails. If you use this dataset for your research, please cite Ellwanger et al. (2018).
# Load bundle data
exBundle <- readRDS(system.file("exdata", "bundle.rds", package="CellTrails"))
References
Lun, ATL, and D Risso. 2017. SingleCellExperiment: S4 Classes for Single Cell Data.
Ellwanger, DC, M Scheibinger, RA Dumont, PG Barr-Gillespie, and S Heller. 2018. “Transcriptional Dynamics of Hair-Bundle Morphogenesis Revealed with CellTrails.” Cell Reports 23 (10): 2901–14.