1 Step-by-Step Method Details

1.1 Step 1: Installing PhosR

Full installation of PhosR includes downloading the PhosR package from GitHub or BioConductor. To use the latest developmental version of PhosR, install from GitHub.

Install PhosR by running the following code:

if(!require(devtools)){ 
 install.packages("devtools") # If not already installed 
}

#devtools::install_github("PYangLab/PhosR", 
#                        build_opts = c("--no-resave-data", "--no-manual"), 
#                        build_vignettes=TRUE, 
#                        dependencies =  TRUE)
library(PhosR)

CRITICAL: To install all dependencies, you will need to update to the latest version of R.

Whilst this STAR protocol provides an in-depth step-by-step tutorial for running PhosR with tips and suggestions, you can browse through the streamlined version of the vignette with the associated code to reproduce the findings from our original publication.

#utils::browseVignettes("PhosR")

Note that all the necessary data needed to reproduce the code can be downloaded from here.

1.2 Step 2: Processing data from a typical MaxQuant output

The very first step of the analysis of phosphoproteomics using the PhosR package is to construct a signal matrix from the output of an instrument. For demonstration purposes, we will use a subset of tab separated (txt) file of PXD001792 hepatocyte data from MaxQuant. Briefly, this dataset contains the phosphoproteomic quantifications of two mouse hepatocyte cell lines, FL83B and Hep 1-6 cells, that were treated with either PBS (mock) or insulin. Each condition includes six biological replicates (Humphrey et al., 2015).

phospho_hepatocyte_raw <- read.delim("Data/PXD001792_raw_hepatocyte.txt", 
                                     header = TRUE)
# delete reverse matches and potential contaminants
del <- which(phospho_hepatocyte_raw[,"Reverse"]=="+" | phospho_hepatocyte_raw[,"Potential.contaminant"]=="+")
phospho_hepatocyte_clean <- phospho_hepatocyte_raw[-del,]

# Subset the raw data to select columns with "Intensity" values.
PXD001792_raw_hepatocyte <- phospho_hepatocyte_clean[,grep("Intensity", colnames(phospho_hepatocyte_clean))]

1.3 Step 3: Creating the PhosphoExperiment Object

To increase the usability of PhosR functions, we implement a “PhosphoExperiment” (ppe) object based on the “SummarizedExperiment” class. To create the PhosphoExperiment object, you will be required to provide a quantification matrix where columns refer to samples and rows refer to phosphosites. Additional annotation labels for phosphosites should be provided alongside the matrix, including the phosphosite residue and position, “sequence window” that captures the amino acids flanking the phosphorylation sites, and the official gene symbol of the host protein in capital letters. Here, we will show the basic steps for generating a PhosphoExperiment object using the above quantification matrix.

Create PhosphoExperiment Object

ppe <- PhosR::PhosphoExperiment(assays = list(Quantification = as.matrix(PXD001792_raw_hepatocyte)))
## Warning in .se_to_pe(se, UniprotID, GeneSymbol, Site, Residue, Sequence, : GeneSymbol is not specified. This may affect subsequent analysis steps.
## Warning in .se_to_pe(se, UniprotID, GeneSymbol, Site, Residue, Sequence, : Site is not specified. This may affect subsequent analysis steps.
## Warning in .se_to_pe(se, UniprotID, GeneSymbol, Site, Residue, Sequence, : Sequence is not specified. This may affect subsequent analysis steps.
## Warning in .se_to_pe(se, UniprotID, GeneSymbol, Site, Residue, Sequence, : Residue is not specified. This may affect subsequent analysis steps.

Add site annotations to PhosphoExperiment object

# Extracting feature information
GeneSymbol <- toupper(sapply(strsplit(as.character(phospho_hepatocyte_clean[,"Gene.names"]), ";"),
                             function(x){x[1]}))
Residue <- as.character(phospho_hepatocyte_clean[,"Amino.acid"])
Site <- as.numeric(phospho_hepatocyte_clean[,"Position"]) 
Sequence <- sapply(strsplit(as.character(phospho_hepatocyte_clean[,"Sequence.window"]), ";"), function(x){x[1]})

### add these annotations to respective ppe slots 
ppe@GeneSymbol <- GeneSymbol
ppe@Residue <- Residue
ppe@Site <- Site
ppe@Sequence <- Sequence

Alternatively, we can create PhosphoExperiment object as following

ppe <- PhosphoExperiment(assays = list(Quantification = as.matrix(PXD001792_raw_hepatocyte)), 
                         Site = Site, 
                         GeneSymbol = GeneSymbol, 
                         Residue = Residue, Sequence = Sequence)

Lastly add colData information

sample_name = strsplit(gsub("^Intensity.", "", colnames(ppe)), "_")

df = S4Vectors::DataFrame(
    cellline = sapply(sample_name, "[[", 1),
    condition = sapply(sample_name, "[[", 2),
    replicate = sapply(sample_name, "[[", 3)
)
rownames(df) = colnames(ppe)
SummarizedExperiment::colData(ppe) = df

Have a quick glance of the object

ppe
## class: PhosphoExperiment 
## dim: 49003 24 
## metadata(0):
## assays(1): Quantification
## rownames(49003): 1 2 ... 49612 49617
## rowData names(0):
## colnames(24): Intensity.FL83B_Control_1 Intensity.FL83B_Control_2 ...
##   Intensity.Hepa1.6_Ins_5 Intensity.Hepa1.6_Ins_6
## colData names(3): cellline condition replicate
dim(ppe)
## [1] 49003    24

1.4 Step 4: Data Pre-processing and Differential Phosphosite Identification

The presence of missing values in quantitative phosphoproteomics reduces the completeness of data. Whilst imputation has been widely applied to handle missing values, it remains a major challenge when analyzing phosphoproteomic data and has significant impact on downstream analysis such as normalization. PhosR provides users with greater flexibility for imputation with imputation functions such as ‘scImpute’ and ‘tImpute’. Here, we will go through each function step by step to demonstrate their use in imputing phosphoproteomic data.

Log transformation

logmat = log2(SummarizedExperiment::assay(ppe, "Quantification"))
logmat[is.infinite(logmat)] = NA
SummarizedExperiment::assay(ppe, "Quantification") = logmat

Filtering using function selectGrps

grps <- paste0(SummarizedExperiment::colData(ppe)$cellline, 
               "_", 
               SummarizedExperiment::colData(ppe)$condition)
ppe <- selectGrps(ppe, grps, 0.5, n=1)
dim(ppe)
## [1] 17493    24

Note: The ‘Quantification’ assay in PhosphoExperiment object is now filtered of phosphosites with high number of missing values. Please ensure you have saved the original matrix under a different name if you would like to keep the unfiltered matrix for future analysis.

1.4.1 Imputation

PhosR enables site- and condition-specific imputation. Here, for each phosphosite in each condition, we impute its missing values in that condition (if any) using site- and condition-specific imputation if the quantification rate within that condition is equal to or greater than a desired percentage (such as 50% below).

set.seed(123)
ppe <- scImpute(ppe, 0.5, grps)
ppe
## class: PhosphoExperiment 
## dim: 17493 24 
## metadata(0):
## assays(2): Quantification imputed
## rownames(17493): 1 2 ... 49611 49617
## rowData names(0):
## colnames(24): Intensity.FL83B_Control_1 Intensity.FL83B_Control_2 ...
##   Intensity.Hepa1.6_Ins_5 Intensity.Hepa1.6_Ins_6
## colData names(3): cellline condition replicate

Lastly, we can impute the remaining sites using tail-based imputation /

ppe <- tImpute(ppe, assay = "imputed")

At this stage, the imputed quantification matrix is stored as an assay called ‘imputed’ in the PhosphoExperiment object.

ppe <- medianScaling(ppe, scale = FALSE, assay = "imputed")
ppe
## class: PhosphoExperiment 
## dim: 17493 24 
## metadata(0):
## assays(3): Quantification imputed scaled
## rownames(17493): 1 2 ... 49611 49617
## rowData names(0):
## colnames(24): Intensity.FL83B_Control_1 Intensity.FL83B_Control_2 ...
##   Intensity.Hepa1.6_Ins_5 Intensity.Hepa1.6_Ins_6
## colData names(3): cellline condition replicate

The scaled quantification matrix can be found as the ‘scaled’ matrix in the PhosphoExperiment object.

Next, we will call differentially phosphorylated sites. We use limma package for calling for differentially phosphorylated sites.

library(limma)
# construct design matrix by group
design <- model.matrix(~ grps - 1)
# fit linear model for each phosphosite
fit <- lmFit(ppe@assays@data$scaled, design)

contrast.matrix <- makeContrasts(grpsFL83B_Ins-grpsFL83B_Control, 
                                 grpsHepa1.6_Ins-grpsHepa1.6_Control, 
                                 levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
#Counting and visualise differentially phosphorylated sites for FL83B cells in control and insulin-simulated condition after imputation.
## FL83B
FL83B.a.DE <- topTable(fit2, 
                       coef="grpsFL83B_Ins - grpsFL83B_Control", 
                       number = Inf)

plot(FL83B.a.DE[,"logFC"], -log10(FL83B.a.DE[,"P.Value"]))
sel <- which(FL83B.a.DE[,"adj.P.Val"] < 0.05 & FL83B.a.DE[,"logFC"] > 0)
length(sel)
## [1] 1261
points(FL83B.a.DE[sel,"logFC"], -log10(FL83B.a.DE[sel,"P.Value"]), pch=16, col="red")
sel <- which(FL83B.a.DE[,"adj.P.Val"] < 0.05 & FL83B.a.DE[,"logFC"] < 0)
length(sel)
## [1] 1249
points(FL83B.a.DE[sel,"logFC"], -log10(FL83B.a.DE[sel,"P.Value"]), pch=16, col="blue")