pathway_analysis.Rmd
Most phosphoproteomic studies have adopted a phosphosite-level analysis of the data. To enable phosphoproteomic data analysis at the gene level, PhosR
implements both site- and gene-centric analyses for detecting changes in kinase activities and signalling pathways through traditional enrichment analyses (over-representation or rank-based gene set test, together referred to as‘1-dimensional enrichment analysis’) as well as 2- and 3-dimensional analyses.
This vignette will perform gene-centric pathway enrichment analyses on the normalised myotube phosphoproteomic dataset using both over-representation and rank–based gene set tests and also provide an example of how directPA
can be used to test which kinases are activated upon different stimulations in myotubes using 2-dimensional analyses Yang et al. 2014.
First, we will load the PhosR package with few other packages will use for the demonstration purpose.
We will use RUV normalised L6 phosphopreteome data for demonstration of gene-centric pathway analysis. It contains phosphoproteome from three different treatment conditions: (1) AMPK agonist AICAR, (2) insulin (Ins), and (3) in combination (AICAR+Ins).
suppressPackageStartupMessages({
library(calibrate)
library(limma)
library(directPA)
library(org.Rn.eg.db)
library(reactome.db)
library(annotate)
library(PhosR)
})
data("PhosphoSitePlus")
We will use the ppe_RUV
matrix from batch_correction.
To enable enrichment analyses on both gene and phosphosite levels, PhosR implements a simple method called phosCollapse
which reduces phosphosite level of information to the proteins for performing downstream gene-centric analyses. We will utilise two functions, pathwayOverrepresent
and pathwayRankBasedEnrichment
, to demonstrate 1-dimensional (over-representation and rank-based gene set test) gene-centric pathway enrichment analysis respectively.
First, extract phosphosite information from the ppe object.
sites = paste(sapply(ppe@GeneSymbol, function(x)x),";",
sapply(ppe@Residue, function(x)x),
sapply(ppe@Site, function(x)x),
";", sep = "")
Then fit a linear model for each phosphosite.
f <- gsub("_exp\\d", "", colnames(ppe))
X <- model.matrix(~ f - 1)
fit <- lmFit(ppe@assays@data$normalised, X)
Extract top-differentially regulated phosphosites for each condition compared to basal.
table.AICAR <- topTable(eBayes(fit), number=Inf, coef = 1)
table.Ins <- topTable(eBayes(fit), number=Inf, coef = 3)
table.AICARIns <- topTable(eBayes(fit), number=Inf, coef = 2)
DE1.RUV <- c(sum(table.AICAR[,"adj.P.Val"] < 0.05), sum(table.Ins[,"adj.P.Val"] < 0.05), sum(table.AICARIns[,"adj.P.Val"] < 0.05))
# extract top-ranked phosphosites for each group comparison
contrast.matrix1 <- makeContrasts(fAICARIns-fIns, levels=X) # defining group comparisons
contrast.matrix2 <- makeContrasts(fAICARIns-fAICAR, levels=X) # defining group comparisons
fit1 <- contrasts.fit(fit, contrast.matrix1)
fit2 <- contrasts.fit(fit, contrast.matrix2)
table.AICARInsVSIns <- topTable(eBayes(fit1), number=Inf)
table.AICARInsVSAICAR <- topTable(eBayes(fit2), number=Inf)
DE2.RUV <- c(sum(table.AICARInsVSIns[,"adj.P.Val"] < 0.05), sum(table.AICARInsVSAICAR[,"adj.P.Val"] < 0.05))
o <- rownames(table.AICARInsVSIns)
Tc <- cbind(table.Ins[o,"logFC"], table.AICAR[o,"logFC"], table.AICARIns[o,"logFC"])
rownames(Tc) <- sites[match(o, rownames(ppe))]
rownames(Tc) <- gsub("(.*)(;[A-Z])([0-9]+)(;)", "\\1;\\3;", rownames(Tc))
colnames(Tc) <- c("Ins", "AICAR", "AICAR+Ins")
Summarize phosphosite-level information to proteins for the downstream gene-centric analysis.
pathways = as.list(reactomePATHID2EXTID)
path_names = as.list(reactomePATHID2NAME)
name_id = match(names(pathways), names(path_names))
names(pathways) = unlist(path_names)[name_id]
pathways = pathways[which(grepl("Rattus norvegicus", names(pathways), ignore.case = TRUE))]
pathways = lapply(pathways, function(path) {
gene_name = unname(getSYMBOL(path, data = "org.Rn.eg"))
toupper(unique(gene_name))
})
Perform 1D gene-centric pathway analysis
path1 <- pathwayOverrepresent(geneSet, annotation=pathways,
universe = rownames(Tc.gene), alter = "greater")
path2 <- pathwayRankBasedEnrichment(Tc.gene[,1],
annotation=pathways,
alter = "greater")
Next, we will compare enrichment of pathways (in negative log10 p-values) between the two 1-dimensional pathway enrichment analysis. On the scatter plot, the x-axis and y-axis refer to the p-values derived from the rank-based gene set test and over-representation test, respectively. We find several expected pathways, while these highly enriched pathways are largely in agreement between the two types of enrichment analyses.
lp1 <- -log10(as.numeric(path2[names(pathways),1]))
lp2 <- -log10(as.numeric(path1[names(pathways),1]))
plot(lp1, lp2, ylab="Overrepresentation (-log10 pvalue)", xlab="Rank-based enrichment (-log10 pvalue)", main="Comparison of 1D pathway analyses", xlim = c(0, 10))
# select highly enriched pathways
sel <- which(lp1 > 1.5 & lp2 > 0.9)
textxy(lp1[sel], lp2[sel], gsub("_", " ", gsub("REACTOME_", "", names(pathways)))[sel])
One key aspect in studying signalling pathways is to identify key kinases that are involved in signalling cascades. To identify these kinases, we make use of kinase-substrate annotation databases such as PhosphoSitePlus
and Phospho.ELM
. These databases are included in the PhosR
and directPA
packages already. To access them, simply load the package and access the data by data(“PhosphoSitePlus”) and data(“PhosphoELM”).
The 2- and 3-dimensional analyses enable the investigation of kinases regulated by different combinations of treatments. We will introduce more advanced methods implemented in the R package directPA
for performing “2 and 3-dimentional” direction site-centric kinase activity analyses.
# 2D direction site-centric kinase activity analyses
par(mfrow=c(1,2))
dpa1 <- directPA(Tc[,c(1,3)], direction=0,
annotation=lapply(PhosphoSite.rat, function(x){gsub(";[STY]", ";", x)}),
main="Direction pathway analysis")
dpa2 <- directPA(Tc[,c(1,3)], direction=pi*7/4,
annotation=lapply(PhosphoSite.rat, function(x){gsub(";[STY]", ";", x)}),
main="Direction pathway analysis")
# top activated kinases
dpa1$pathways[1:5,]
#> pvalue size
#> AKT1 6.207001e-09 9
#> MAPK1 0.00057404 9
#> PRKACA 0.0006825021 25
#> PRKAA1 0.000965093 6
#> MAPK3 0.006670176 10
dpa2$pathways[1:5,]
#> pvalue size
#> PRKAA1 0.00463462 6
#> AKT1 0.02942273 9
#> CSNK2A1 0.2193148 12
#> CDK5 0.2607434 5
#> MAPK1 0.2767886 9
There is also a function called perturbPlot2d
implemented in kinasePA
for testing and visualising activity of all kinases on all possible directions. Below are the demonstration from using this function.
z1 <- perturbPlot2d(Tc=Tc[,c(1,2)],
annotation=lapply(PhosphoSite.rat, function(x){gsub(";[STY]", ";", x)}),
cex=1, xlim=c(-2, 4), ylim=c(-2, 4),
main="Kinase perturbation analysis")
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Sonoma 14.7
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: UTC
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] PhosR_1.13.1 annotate_1.80.0 XML_3.99-0.17
#> [4] reactome.db_1.86.2 org.Rn.eg.db_3.18.0 AnnotationDbi_1.64.1
#> [7] IRanges_2.36.0 S4Vectors_0.40.2 Biobase_2.62.0
#> [10] BiocGenerics_0.48.1 directPA_1.5.1 limma_3.58.1
#> [13] calibrate_1.7.7 MASS_7.3-60.2
#>
#> loaded via a namespace (and not attached):
#> [1] DBI_1.2.3 bitops_1.0-9
#> [3] gridExtra_2.3 rlang_1.1.4
#> [5] magrittr_2.0.3 matrixStats_1.4.1
#> [7] e1071_1.7-16 compiler_4.4.1
#> [9] RSQLite_2.3.7 reshape2_1.4.4
#> [11] png_0.1-8 systemfonts_1.1.0
#> [13] vctrs_0.6.5 stringr_1.5.1
#> [15] pkgconfig_2.0.3 shape_1.4.6.1
#> [17] crayon_1.5.3 fastmap_1.2.0
#> [19] backports_1.5.0 XVector_0.42.0
#> [21] utf8_1.2.4 rmarkdown_2.28
#> [23] preprocessCore_1.64.0 ragg_1.3.3
#> [25] network_1.18.2 purrr_1.0.2
#> [27] bit_4.5.0 xfun_0.48
#> [29] zlibbioc_1.48.2 cachem_1.1.0
#> [31] GenomeInfoDb_1.38.8 jsonlite_1.8.9
#> [33] blob_1.2.4 highr_0.11
#> [35] DelayedArray_0.28.0 broom_1.0.7
#> [37] R6_2.5.1 stringi_1.8.4
#> [39] bslib_0.8.0 RColorBrewer_1.1-3
#> [41] GGally_2.2.1 car_3.1-3
#> [43] GenomicRanges_1.54.1 jquerylib_0.1.4
#> [45] Rcpp_1.0.13 SummarizedExperiment_1.32.0
#> [47] knitr_1.48 igraph_2.1.1
#> [49] Matrix_1.7-0 tidyselect_1.2.1
#> [51] abind_1.4-8 yaml_2.3.10
#> [53] viridis_0.6.5 lattice_0.22-6
#> [55] tibble_3.2.1 plyr_1.8.9
#> [57] KEGGREST_1.42.0 coda_0.19-4.1
#> [59] evaluate_1.0.1 desc_1.4.3
#> [61] ggstats_0.7.0 proxy_0.4-27
#> [63] circlize_0.4.16 Biostrings_2.70.3
#> [65] ggpubr_0.6.0 pillar_1.9.0
#> [67] BiocManager_1.30.25 carData_3.0-5
#> [69] MatrixGenerics_1.14.0 generics_0.1.3
#> [71] RCurl_1.98-1.16 ggplot2_3.5.1
#> [73] munsell_0.5.1 scales_1.3.0
#> [75] BiocStyle_2.30.0 xtable_1.8-4
#> [77] class_7.3-22 glue_1.8.0
#> [79] pheatmap_1.0.12 tools_4.4.1
#> [81] dendextend_1.18.1 ggsignif_0.6.4
#> [83] fs_1.6.4 grid_4.4.1
#> [85] tidyr_1.3.1 colorspace_2.1-1
#> [87] GenomeInfoDbData_1.2.11 Formula_1.2-5
#> [89] cli_3.6.3 ruv_0.9.7.1
#> [91] textshaping_0.4.0 fansi_1.0.6
#> [93] viridisLite_0.4.2 S4Arrays_1.2.1
#> [95] ggdendro_0.2.0 dplyr_1.1.4
#> [97] pcaMethods_1.94.0 gtable_0.3.6
#> [99] rstatix_0.7.2 sass_0.4.9
#> [101] digest_0.6.37 SparseArray_1.2.4
#> [103] htmlwidgets_1.6.4 memoise_2.0.1
#> [105] htmltools_0.5.8.1 pkgdown_2.1.1
#> [107] lifecycle_1.0.4 httr_1.4.7
#> [109] statnet.common_4.10.0 GlobalOptions_0.1.2
#> [111] statmod_1.5.0 bit64_4.5.2