1 Step-by-Step Method Details

1.1 Step 1: Installing PhosR

Full installation of PhosR includes downloading the PhosR package from GitHub or BioConductor. To use the latest developmental version of PhosR, install from GitHub.

Install PhosR by running the following code:

if(!require(devtools)){ 
 install.packages("devtools") # If not already installed 
}

#devtools::install_github("PYangLab/PhosR", 
#                        build_opts = c("--no-resave-data", "--no-manual"), 
#                        build_vignettes=TRUE, 
#                        dependencies =  TRUE)
library(PhosR)

CRITICAL: To install all dependencies, you will need to update to the latest version of R.

Whilst this STAR protocol provides an in-depth step-by-step tutorial for running PhosR with tips and suggestions, you can browse through the streamlined version of the vignette with the associated code to reproduce the findings from our original publication.

#utils::browseVignettes("PhosR")

Note that all the necessary data needed to reproduce the code can be downloaded from here.

1.2 Step 2: Processing data from a typical MaxQuant output

The very first step of the analysis of phosphoproteomics using the PhosR package is to construct a signal matrix from the output of an instrument. For demonstration purposes, we will use a subset of tab separated (txt) file of PXD001792 hepatocyte data from MaxQuant. Briefly, this dataset contains the phosphoproteomic quantifications of two mouse hepatocyte cell lines, FL83B and Hep 1-6 cells, that were treated with either PBS (mock) or insulin. Each condition includes six biological replicates (Humphrey et al., 2015).

phospho_hepatocyte_raw <- read.delim("Data/PXD001792_raw_hepatocyte.txt", 
                                     header = TRUE)
# delete reverse matches and potential contaminants
del <- which(phospho_hepatocyte_raw[,"Reverse"]=="+" | phospho_hepatocyte_raw[,"Potential.contaminant"]=="+")
phospho_hepatocyte_clean <- phospho_hepatocyte_raw[-del,]

# Subset the raw data to select columns with "Intensity" values.
PXD001792_raw_hepatocyte <- phospho_hepatocyte_clean[,grep("Intensity", colnames(phospho_hepatocyte_clean))]

1.3 Step 3: Creating the PhosphoExperiment Object

To increase the usability of PhosR functions, we implement a “PhosphoExperiment” (ppe) object based on the “SummarizedExperiment” class. To create the PhosphoExperiment object, you will be required to provide a quantification matrix where columns refer to samples and rows refer to phosphosites. Additional annotation labels for phosphosites should be provided alongside the matrix, including the phosphosite residue and position, “sequence window” that captures the amino acids flanking the phosphorylation sites, and the official gene symbol of the host protein in capital letters. Here, we will show the basic steps for generating a PhosphoExperiment object using the above quantification matrix.

Create PhosphoExperiment Object

ppe <- PhosR::PhosphoExperiment(assays = list(Quantification = as.matrix(PXD001792_raw_hepatocyte)))

## Warning in .se_to_pe(se, UniprotID, GeneSymbol, Site, Residue, Sequence, : GeneSymbol is not specified. This may affect subsequent analysis steps.

## Warning in .se_to_pe(se, UniprotID, GeneSymbol, Site, Residue, Sequence, : Site is not specified. This may affect subsequent analysis steps.

## Warning in .se_to_pe(se, UniprotID, GeneSymbol, Site, Residue, Sequence, : Sequence is not specified. This may affect subsequent analysis steps.

## Warning in .se_to_pe(se, UniprotID, GeneSymbol, Site, Residue, Sequence, : Residue is not specified. This may affect subsequent analysis steps.

Add site annotations to PhosphoExperiment object

# Extracting feature information
GeneSymbol <- toupper(sapply(strsplit(as.character(phospho_hepatocyte_clean[,"Gene.names"]), ";"),
                             function(x){x[1]}))
Residue <- as.character(phospho_hepatocyte_clean[,"Amino.acid"])
Site <- as.numeric(phospho_hepatocyte_clean[,"Position"]) 
Sequence <- sapply(strsplit(as.character(phospho_hepatocyte_clean[,"Sequence.window"]), ";"), function(x){x[1]})

### add these annotations to respective ppe slots 
ppe@GeneSymbol <- GeneSymbol
ppe@Residue <- Residue
ppe@Site <- Site
ppe@Sequence <- Sequence

Alternatively, we can create PhosphoExperiment object as following

ppe <- PhosphoExperiment(assays = list(Quantification = as.matrix(PXD001792_raw_hepatocyte)), 
                         Site = Site, 
                         GeneSymbol = GeneSymbol, 
                         Residue = Residue, Sequence = Sequence)

Lastly add colData information

sample_name = strsplit(gsub("^Intensity.", "", colnames(ppe)), "_")

df = S4Vectors::DataFrame(
    cellline = sapply(sample_name, "[[", 1),
    condition = sapply(sample_name, "[[", 2),
    replicate = sapply(sample_name, "[[", 3)
)
rownames(df) = colnames(ppe)
SummarizedExperiment::colData(ppe) = df

Have a quick glance of the object

ppe

## class: PhosphoExperiment 
## dim: 49003 24 
## metadata(0):
## assays(1): Quantification
## rownames(49003): 1 2 ... 49612 49617
## rowData names(0):
## colnames(24): Intensity.FL83B_Control_1 Intensity.FL83B_Control_2 ...
##   Intensity.Hepa1.6_Ins_5 Intensity.Hepa1.6_Ins_6
## colData names(3): cellline condition replicate

dim(ppe)

## [1] 49003    24

1.4 Step 4: Data Pre-processing and Differential Phosphosite Identification

The presence of missing values in quantitative phosphoproteomics reduces the completeness of data. Whilst imputation has been widely applied to handle missing values, it remains a major challenge when analyzing phosphoproteomic data and has significant impact on downstream analysis such as normalization. PhosR provides users with greater flexibility for imputation with imputation functions such as ‘scImpute’ and ‘tImpute’. Here, we will go through each function step by step to demonstrate their use in imputing phosphoproteomic data.

Log transformation

logmat = log2(SummarizedExperiment::assay(ppe, "Quantification"))
logmat[is.infinite(logmat)] = NA
SummarizedExperiment::assay(ppe, "Quantification") = logmat

Filtering using function selectGrps

grps <- paste0(SummarizedExperiment::colData(ppe)$cellline, 
               "_", 
               SummarizedExperiment::colData(ppe)$condition)
ppe <- selectGrps(ppe, grps, 0.5, n=1)
dim(ppe)

## [1] 17493    24

Note: The ‘Quantification’ assay in PhosphoExperiment object is now filtered of phosphosites with high number of missing values. Please ensure you have saved the original matrix under a different name if you would like to keep the unfiltered matrix for future analysis.

1.4.1 Imputation

PhosR enables site- and condition-specific imputation. Here, for each phosphosite in each condition, we impute its missing values in that condition (if any) using site- and condition-specific imputation if the quantification rate within that condition is equal to or greater than a desired percentage (such as 50% below).

set.seed(123)
ppe <- scImpute(ppe, 0.5, grps)
ppe

## class: PhosphoExperiment 
## dim: 17493 24 
## metadata(0):
## assays(2): Quantification imputed
## rownames(17493): 1 2 ... 49611 49617
## rowData names(0):
## colnames(24): Intensity.FL83B_Control_1 Intensity.FL83B_Control_2 ...
##   Intensity.Hepa1.6_Ins_5 Intensity.Hepa1.6_Ins_6
## colData names(3): cellline condition replicate

Lastly, we can impute the remaining sites using tail-based imputation /

ppe <- tImpute(ppe, assay = "imputed")

At this stage, the imputed quantification matrix is stored as an assay called ‘imputed’ in the PhosphoExperiment object.

ppe <- medianScaling(ppe, scale = FALSE, assay = "imputed")
ppe

## class: PhosphoExperiment 
## dim: 17493 24 
## metadata(0):
## assays(3): Quantification imputed scaled
## rownames(17493): 1 2 ... 49611 49617
## rowData names(0):
## colnames(24): Intensity.FL83B_Control_1 Intensity.FL83B_Control_2 ...
##   Intensity.Hepa1.6_Ins_5 Intensity.Hepa1.6_Ins_6
## colData names(3): cellline condition replicate

The scaled quantification matrix can be found as the ‘scaled’ matrix in the PhosphoExperiment object.

Next, we will call differentially phosphorylated sites. We use limma package for calling for differentially phosphorylated sites.

library(limma)
# construct design matrix by group
design <- model.matrix(~ grps - 1)
# fit linear model for each phosphosite
fit <- lmFit(ppe@assays@data$scaled, design)

contrast.matrix <- makeContrasts(grpsFL83B_Ins-grpsFL83B_Control, 
                                 grpsHepa1.6_Ins-grpsHepa1.6_Control, 
                                 levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)

#Counting and visualise differentially phosphorylated sites for FL83B cells in control and insulin-simulated condition after imputation.
## FL83B
FL83B.a.DE <- topTable(fit2, 
                       coef="grpsFL83B_Ins - grpsFL83B_Control", 
                       number = Inf)

plot(FL83B.a.DE[,"logFC"], -log10(FL83B.a.DE[,"P.Value"]))
sel <- which(FL83B.a.DE[,"adj.P.Val"] < 0.05 & FL83B.a.DE[,"logFC"] > 0)
length(sel)

## [1] 1261

points(FL83B.a.DE[sel,"logFC"], -log10(FL83B.a.DE[sel,"P.Value"]), pch=16, col="red")
sel <- which(FL83B.a.DE[,"adj.P.Val"] < 0.05 & FL83B.a.DE[,"logFC"] < 0)
length(sel)

## [1] 1249

points(FL83B.a.DE[sel,"logFC"], -log10(FL83B.a.DE[sel,"P.Value"]), pch=16, col="blue")

## Hepa1.6
Hepa1.6.a.DE <- topTable(fit2, 
                         coef="grpsHepa1.6_Ins - grpsHepa1.6_Control", 
                         number = Inf)
plot(Hepa1.6.a.DE[,"logFC"], -log10(Hepa1.6.a.DE[,"P.Value"]))
sel <- which(Hepa1.6.a.DE[,"adj.P.Val"] < 0.05 & Hepa1.6.a.DE[,"logFC"] > 0)
length(sel)

## [1] 895

points(Hepa1.6.a.DE[sel,"logFC"], -log10(Hepa1.6.a.DE[sel,"P.Value"]), pch=16, col="red")
sel <- which(Hepa1.6.a.DE[,"adj.P.Val"] < 0.05 & Hepa1.6.a.DE[,"logFC"] < 0)
length(sel)

## [1] 953

points(Hepa1.6.a.DE[sel,"logFC"], -log10(Hepa1.6.a.DE[sel,"P.Value"]), pch=16, col="blue")

After imputation, we take the ratios for each of the two cell lines.

mat = SummarizedExperiment::assay(ppe, "scaled")

FL83B.ratio <- mat[, grep("FL83B_", colnames(ppe))] - rowMeans(mat[,grep("FL83B_Control", colnames(ppe))])
Hepa.ratio <- mat[, grep("Hepa1.6_", colnames(ppe))] - rowMeans(mat[,grep("Hepa1.6_Control", colnames(ppe))])

SummarizedExperiment::assay(ppe, "ratio") <- cbind(FL83B.ratio, Hepa.ratio)

par(mfrow=c(1,2))
boxplot(ppe@assays@data$scaled, 
        ylab="Log2 LFQ", 
        main="Normalised LFQ data", las=2, 
        col=factor(rep(1:4, each=6)))
boxplot(ppe@assays@data$ratio, 
        ylab="Log2 Fold Change", 
        main="Ratio data", las=2, 
        col=factor(rep(1:4, each=6)))

We will now save this fully processed matrix for use later

PXD001792_ppe_hepatocyte = ppe 
#save(PXD001792_ppe_hepatocyte, file = "Data/PXD001792_ppe_hepatocyte.RData")

CRITICAL: After imputation, data from label-free quantification are typically converted to ratios before subsequent analysis. In contrast to label-free data, you do not need to take ratios of phosphoproteomic data derived from SILAC quantification since the values are inherently ratios (typically with respect to the control sample). The tail-based imputation (Beck et al., 2015) is designed specifically for label-free data (such as the one used in our example) and is not applicable to SILAC data.

1.5 Step 5: identifying stably phosphorylated sites

Several commonly used data normalisation approaches such as the ’removal of unwanted variation” (RUV) (Gagnon-Bartsch and Speed, 2012) require a set of internal standards whose expression are known to be unchanged in the samples measured. This is a challenge for phosphoproteomic data since phosphorylation is a highly dynamic biochemical process. Identifying a set of stably phosphorylated sites (SPSs) is a unique feature of PhosR which enables users to identify context-dependent sets of SPSs. We also included a set of 100 SPSs as a resource, identified from multiple high-quality datasets generated from different cell types and experimental conditions (Kim et al., 2021). As an example, we will use three datasets to demonstrate how SPSs can be identified from multiple phosphoproteomic datasets. Users may wish to replace the example datasets with their own collection of datasets.

Load datasets

load("Data/PXD010621_ppe_ESC.RData", verbose = TRUE)

## Loading objects:
##   PXD010621_ppe_ESC

load("Data/PXD003631_ppe_adipocyte.RData", verbose = TRUE)

## Loading objects:
##   PXD003631_ppe_adipocyte

load("Data/phospho_ppe_adipocyte.RData", verbose = TRUE)

## Loading objects:
##   phospho_ppe_adipocyte

# Simplify names of datasets
ppe1 <- PXD010621_ppe_ESC
ppe2 <- PXD003631_ppe_adipocyte
ppe3 <- phospho_ppe_adipocyte

Prepare inputs to the function

# Make a list of all PhosphoExperiment objects
ppe.list <- list(ppe1, ppe2, ppe3)

# Make a vector of the selected assays in each of the PhosphoExperiment objects
assays <- "Quantification"

# Make a list of grouping information of each dataset
cond.list <- list(
    grp1 = gsub("_.+", "", colnames(ppe1)),
    grp2 = gsub("_r[0-9]", "", colnames(ppe2)),
    grp3 = colnames(ppe3))

Note: If inputs are not PhosphoExperiment objects, please transfer the data format to a PhosphoExperiment object referring to the instructions in Step 2. Example datasets are processed datasets after filtering, normalization and ratio converting. Please refer to Step 4 for data normalization guidance.

Identifying SPSs by calling getSPS()

inhouse_SPSs <- getSPS(phosData = ppe.list, 
                       assays = assays, 
                       conds = cond.list, 
                       num = 100)

Note: Please note that presence of sufficient overlapped phosphosites identified from the input datasets is critical to identify SPSs. You will receive an error if the number of overlapped sites is fewer than 200 in at least two datasets or fewer than 1000 across all input datasets.

1.6 Step 6: Normalisation and Batch Correction of Datasets

A common but critical step in phosphoproteomic data analysis is to correct for batch effect. Without batch effect correction, it is often not possible to analyze datasets in an integrative manner. To perform data integration and batch effect correction, we utilise a set of stably phosphorylated sites (SPSs) across a panel of phosphoproteomic datasets (defined from Step 4) and implement normalisation using RUV-III (Molania et al., 2019). To demonstrate batch effect correction, we will perform RUVphospho to normalise a SILAC data from L6 myotubes treated with two factors: 1) AICAR, an analog of adenosine monophosphate (AMP) that stimulates AMPK activity and/or 2) insulin.

CRITICAL: RUV-III requires a complete data matrix. If you have not followed through the steps above, you will need to perform imputation of the missing values. The imputed values are removed by default after normalisation but can be retained for downstream analysis if the users wish to use the imputed matrix.

Load quantification matrix of phosphoproteomic data and prepare PPE object as before

load("Data/PXD019127_ratio_myoblast.RData", verbose = TRUE)

## Loading objects:
##   PXD019127_ratio_myoblast

ppe <- PhosphoExperiment(assays = list(Quantification = as.matrix(PXD019127_ratio_myoblast)))

## Warning in .se_to_pe(se, UniprotID, GeneSymbol, Site, Residue, Sequence, : GeneSymbol is not specified. This may affect subsequent analysis steps.

## Warning in .se_to_pe(se, UniprotID, GeneSymbol, Site, Residue, Sequence, : Site is not specified. This may affect subsequent analysis steps.

## Warning in .se_to_pe(se, UniprotID, GeneSymbol, Site, Residue, Sequence, : Sequence is not specified. This may affect subsequent analysis steps.

## Warning in .se_to_pe(se, UniprotID, GeneSymbol, Site, Residue, Sequence, : Residue is not specified. This may affect subsequent analysis steps.

rowNames = strsplit(rownames(ppe), "~")

ppe@GeneSymbol <- toupper(sapply(rowNames, "[[", 2))
ppe@Residue <- gsub("[0-9]","", sapply(rowNames, "[[", 3))
ppe@Site <- as.numeric(gsub("[A-Z]","", sapply(rowNames, "[[", 3)))
ppe@Sequence <- sapply(rowNames, "[[", 4)

# Generate colData
sample_name = strsplit(colnames(ppe), "_")
df = S4Vectors::DataFrame(
    condition = sapply(sample_name, "[[", 1),
    replicate = gsub("exp", "", sapply(sample_name, "[[", 2))
)
rownames(df) = colnames(ppe)
SummarizedExperiment::colData(ppe) = df

Diagnosing batch effect

plotQC(SummarizedExperiment::assay(ppe,"Quantification"), 
       panel = "dendrogram", 
       grps = SummarizedExperiment::colData(ppe)$condition, 
       labels = colnames(ppe)) + 
  ggplot2::ggtitle("before batch correction")

plotQC(SummarizedExperiment::assay(ppe,"Quantification"), 
       panel = "pca", 
       grps = SummarizedExperiment::colData(ppe)$condition, 
       labels = colnames(ppe)) + 
  ggplot2::ggtitle("before batch correction")

Correcting batch effect

design = model.matrix(~ SummarizedExperiment::colData(ppe)$condition - 1)
head(design) # observe first 6 rows of the design matrix

##   SummarizedExperiment::colData(ppe)$conditionAICAR
## 1                                                 1
## 2                                                 1
## 3                                                 1
## 4                                                 1
## 5                                                 0
## 6                                                 0
##   SummarizedExperiment::colData(ppe)$conditionAICARIns
## 1                                                    0
## 2                                                    0
## 3                                                    0
## 4                                                    0
## 5                                                    0
## 6                                                    0
##   SummarizedExperiment::colData(ppe)$conditionIns
## 1                                               0
## 2                                               0
## 3                                               0
## 4                                               0
## 5                                               1
## 6                                               1

sites = paste(
    sapply(ppe@GeneSymbol, function(x)x), ";", 
    sapply(ppe@Residue, function(x)x), 
    sapply(ppe@Site, function(x)x), ";", 
    sep = ""
)
data(SPSs)
ctl = which(sites %in% SPSs)

# Run RUV
ppe = RUVphospho(ppe, M = design, k = 3, ctl = ctl)

Visualise QC plots

# plot after batch correction
p1 = plotQC(SummarizedExperiment::assay(ppe,"Quantification"), 
            grps = SummarizedExperiment::colData(ppe)$condition, 
            labels = colnames(ppe), 
            panel = "dendrogram")
p2 = plotQC(SummarizedExperiment::assay(ppe,"normalised"), 
            grps = SummarizedExperiment::colData(ppe)$condition, 
            labels = colnames(ppe), 
            panel="dendrogram")
ggpubr::ggarrange(p1, p2, nrow = 1)

p1 = plotQC(SummarizedExperiment::assay(ppe,"Quantification"), 
            panel = "pca", 
            grps = SummarizedExperiment::colData(ppe)$condition, 
            labels = colnames(ppe)) +
  ggplot2::ggtitle("Before Batch correction")
p2 = plotQC(SummarizedExperiment::assay(ppe,"normalised"), 
            grps = SummarizedExperiment::colData(ppe)$condition, 
            labels = colnames(ppe), 
            panel="pca") +
  ggplot2::ggtitle("After Batch correction")
ggpubr::ggarrange(p1, p2, nrow = 2)

We can now save the final processed data for future use

# save data
PXD019127_ppe_myoblast = ppe
#save(PXD019127_ppe_myoblast, file = "Data/PXD019127_ppe_myoblast.RData")

Pause Point: This is an ideal pause point as we have generated a fully processed data. By now, you should have a good idea of the data quality and have performed the necessary processing to filter any suboptimal sites, imputed missing values (if present) and diagnosed and corrected any batch effect present in the data.

1.7 Step 7: Predicting Kinase Substrates

A key end-goal of phosphoproteomic data analysis is to identify kinases that are responsible for the phosphorylation of specific sites. The basic computational approach to annotate kinases to their substrates or phosphosites is to find consensus amino acid sequences around the phosphorylation site. We can go beyond this approach by considering cell type and/or treatment (perturbation) specificity of phosphorylation. PhosR implements a multi-step framework that contains two major components including (i) a kinaseSubstrateScore function which scores a given phosphosite using kinase recognition motif and phosphoproteomic dynamics, and (ii) a kinaseSubstratePred function which synthesise the scores generated from (i) for predicting kinase-substrate relationships using an adaptive sampling-based positive-unlabeled learning method (Yang et al., 2018).

In the original PhosR publication, we demonstrate the application of the scoring method to the myotube phosphoproteome and uncover potential kinase-substrate pairs and global relationships between kinases. We confirm well established substrates of AMPK in our publication: ACACA S79, AKAP1 S103, SMCR8 S488 (Hoffman et al., 2015) and MTFR1L S100 (Schaffer et al., 2015). Importantly, PhosR generates a list of potential candidates not included in the PhosphoSitePlus database for validation.

load("Data/PXD019127_ppe_myoblast.RData", verbose = TRUE)

## Loading objects:
##   PXD019127_ppe_myoblast

ppe = PXD019127_ppe_myoblast
mat = SummarizedExperiment::assay(ppe, "normalised")

CRITICAL: If you have not processed your data, please go through Steps 1 to 5 to perform the necessary processing prior to performing the downstream analysis in Steps 6 and 7.

Filter for up-regulated phosphosites

mat.mean <- meanAbundance(mat, 
                          grps = SummarizedExperiment::colData(ppe)$condition)
aov <- matANOVA(mat=mat, 
                grps = SummarizedExperiment::colData(ppe)$condition)
idx <- (aov < 0.05) & (rowSums(mat.mean > 0.5) > 0)
mat.reg <- mat[idx, ,
               drop = FALSE]

Standardise the matrix

mat.std <- PhosR::standardise(mat.reg)
rownames(mat.std) <- sapply(strsplit(rownames(mat.std), "~"), function(x) { 
  gsub(" ", "", paste(toupper(x[2]), x[3], "", sep=";"))
  })

CRITICAL: To calculate the profile matching score, we rely on the z-score transformed matrix to compare the profiles of phosphosites. Thus, the standardization step is critical.

Kinase substrate scoring step integrates information from both kinase recognition motif (i.e., motif matching score) and experimental perturbation (i.e., profile matching score) for prioritising kinases that may be regulating the phosphorylation level of each site quantified in the

Run PhosR kinase-substrate prediction with the default parameters. PhosR generates the final combined scores of the motif matching score and the profile matching score by taking into account the number of sequences and substrates used for calculating the motif and profile of the kinase.

data("KinaseMotifs")
seqs <- ppe@Sequence[idx]
kssMat <- kinaseSubstrateScore(substrate.list = PhosphoSite.mouse, 
                               mat = mat.std, seqs = seqs,
                               numMotif = 5, numSub = 1, verbose = FALSE)

As the second and last step of kinase-substrate prediction, PhosR uses the ‘kinaseSubstratePred’ function to synthesise the scores generated from ‘kinaseSubstrateScore’ to predict the kinase-substrate relationships using an adaptive sampling-based positive-unlabeled learning method (Yang et al., 2018). This step prioritises the kinases that are most likely to regulate a phosphosite.

set.seed(1)
predMat <- kinaseSubstratePred(kssMat, 
                               top = 30, 
                               verbose = FALSE)
colnames(predMat)

##  [1] "AKT1"    "AKT2"    "CAMK2A"  "CDK2"    "CDK5"    "IRAK1"   "MAP2K1" 
##  [8] "MAPK1"   "MAPK10"  "MAPK14"  "MAPK3"   "MAPK8"   "MAPK9"   "MTOR"   
## [15] "PRKAA1"  "PRKAA2"  "PRKACA"  "PRKCA"   "PRKCB"   "PRKCD"   "PRKCE"  
## [22] "PRKCH"   "RPS6KA1" "RPS6KA5" "RPS6KB1" "TBK1"

Pause Point: Once the desired parameters are used to successfully run the kinase-substrate prediction, you may find this to be a good place to pause and save the results before proceeding with the visualisation steps.

1.8 Step 8: Constructing Signalling Networks (Signalomes)

Constructing signaling networks referred to as “Signalomes” is a useful feature in PhosR that allows the users to obtain a global view of kinase regulation and to establish distinct modules of proteins that demonstrate similar kinase and dynamic regulation upon perturbation or across a time-course.

An important feature of PhosR signalomes is that the resultant signaling modules denote a set of proteins and not phosphosites. Proteins are frequently phosphorylated at multiple sites and these sites may not necessarily be targeted by the same kinases. Site- and protein-centric analyses of phosphoproteomic data lie at opposite ends of the spectrum, with the former treating phosphosites on the same protein independently and ignoring the host protein information, and the latter focussing on a specific protein, losing information from individual phosphosites. Using our global kinase-substrate scoring of phosphosites, we generate signalomes wherein dynamic changes in phosphorylation within and across proteins are conjointly analysed, allowing us to detect proteins that are co-regulated across multiple phosphosites.

The signalome construction uses the outputs of ‘kinaseSubstrateScore’and ‘kinaseSubstratePred’ functions for the generation of a visualisation of the kinase regulation of discrete regulatory protein modules present in our phosphoproteomic data.

To generate the signalomes, run:

kinaseOI = c("AKT1")
signalomesRes <- Signalomes(KSR = kssMat, 
                            predMatrix = predMat, 
                            exprsMat = mat.std, 
                            module_res = 6,
                            KOI = kinaseOI)

We can visualise the signalomes as a balloon plot. Using the resulting visualisation, we are better able to compare the differences in kinase regulation of the modules and the varying proportions of regulation. In the balloon plot, the size of the balloons denotes the percentage magnitude of kinase regulation in each module.

### generate palette
my_color_palette <- grDevices::colorRampPalette(RColorBrewer::brewer.pal(8, "Accent"))
kinase_all_color <- my_color_palette(ncol(kssMat$combinedScoreMatrix))
names(kinase_all_color) <- colnames(kssMat$combinedScoreMatrix)
kinase_signalome_color <- kinase_all_color[colnames(predMat)]

plotSignalomeMap(signalomes = signalomesRes,
                 color = kinase_signalome_color)

Finally, we can also plot the signalome network that illustrates the connectivity between kinase signalome networks.

plotKinaseNetwork(KSR = kssMat, 
                  predMatrix = predMat, 
                  threshold = 0.95, 
                  color = kinase_all_color)

2 Session Info

utils::sessionInfo()

## R version 4.0.3 (2020-10-10)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] limma_3.46.0     PhosR_1.1.7      devtools_2.3.2   usethis_2.0.1   
## [5] BiocStyle_2.18.1
## 
## loaded via a namespace (and not attached):
##   [1] colorspace_2.0-0            ggsignif_0.6.1             
##   [3] ellipsis_0.3.1              class_7.3-18               
##   [5] rio_0.5.26                  rprojroot_2.0.2            
##   [7] circlize_0.4.12             markdown_1.1               
##   [9] XVector_0.30.0              GenomicRanges_1.42.0       
##  [11] GlobalOptions_0.1.2         ggdendro_0.1.22            
##  [13] fs_1.5.0                    gridtext_0.1.4             
##  [15] ggtext_0.1.1                proxy_0.4-25               
##  [17] farver_2.1.0                ggpubr_0.4.0               
##  [19] remotes_2.2.0               fansi_0.4.2                
##  [21] xml2_1.3.2                  cachem_1.0.4               
##  [23] knitr_1.32                  pkgload_1.2.0              
##  [25] jsonlite_1.7.2              broom_0.7.6                
##  [27] pheatmap_1.0.12             BiocManager_1.30.10        
##  [29] compiler_4.0.3              backports_1.2.1            
##  [31] assertthat_0.2.1            Matrix_1.3-2               
##  [33] fastmap_1.1.0               cli_2.4.0                  
##  [35] htmltools_0.5.1.1           prettyunits_1.1.1          
##  [37] tools_4.0.3                 igraph_1.2.6               
##  [39] gtable_0.3.0                glue_1.4.2                 
##  [41] GenomeInfoDbData_1.2.4      reshape2_1.4.4             
##  [43] dplyr_1.0.5                 Rcpp_1.0.6                 
##  [45] carData_3.0-4               Biobase_2.50.0             
##  [47] cellranger_1.1.0            jquerylib_0.1.3            
##  [49] vctrs_0.3.7                 preprocessCore_1.52.1      
##  [51] xfun_0.22                   stringr_1.4.0              
##  [53] network_1.16.1              ps_1.6.0                   
##  [55] openxlsx_4.2.3              testthat_3.0.2             
##  [57] lifecycle_1.0.0             rstatix_0.7.0              
##  [59] dendextend_1.14.0           zlibbioc_1.36.0            
##  [61] MASS_7.3-53.1               scales_1.1.1               
##  [63] pcaMethods_1.82.0           hms_1.0.0                  
##  [65] MatrixGenerics_1.2.1        parallel_4.0.3             
##  [67] SummarizedExperiment_1.20.0 RColorBrewer_1.1-2         
##  [69] yaml_2.2.1                  curl_4.3                   
##  [71] memoise_2.0.0               gridExtra_2.3              
##  [73] ggplot2_3.3.3               sass_0.3.1                 
##  [75] reshape_0.8.8               stringi_1.5.3              
##  [77] highr_0.8                   S4Vectors_0.28.1           
##  [79] desc_1.3.0                  e1071_1.7-6                
##  [81] BiocGenerics_0.36.0         pkgbuild_1.2.0             
##  [83] zip_2.1.1                   shape_1.4.5                
##  [85] GenomeInfoDb_1.26.7         rlang_0.4.10               
##  [87] pkgconfig_2.0.3             matrixStats_0.58.0         
##  [89] bitops_1.0-6                evaluate_0.14              
##  [91] lattice_0.20-41             ruv_0.9.7.1                
##  [93] purrr_0.3.4                 labeling_0.4.2             
##  [95] cowplot_1.1.1               processx_3.4.5             
##  [97] tidyselect_1.1.0            GGally_2.1.1               
##  [99] plyr_1.8.6                  magrittr_2.0.1             
## [101] bookdown_0.21.7             R6_2.5.0                   
## [103] magick_2.7.1                IRanges_2.24.1             
## [105] generics_0.1.0              DelayedArray_0.16.3        
## [107] DBI_1.1.1                   pillar_1.6.0               
## [109] haven_2.3.1                 foreign_0.8-81             
## [111] withr_2.4.1                 abind_1.4-5                
## [113] RCurl_1.98-1.3              tibble_3.1.0               
## [115] crayon_1.4.1                car_3.0-10                 
## [117] utf8_1.2.1                  rmarkdown_2.7.4            
## [119] viridis_0.6.0               grid_4.0.3                 
## [121] readxl_1.3.1                data.table_1.14.0          
## [123] callr_3.5.1                 forcats_0.5.1              
## [125] digest_0.6.27               tidyr_1.1.3                
## [127] stats4_4.0.3                munsell_0.5.0              
## [129] viridisLite_0.4.0           bslib_0.2.4                
## [131] sessioninfo_1.1.1

Protocol for the Processing and Downstream Analysis of Phosphoproteomics Data with PhosR

16 April 2021