site_gene_analysis.Rmd
While 1, 2, and 3D pathway analyses are useful for data generated from experiments with different treatment/conditions, analysis designed for time-course data may be better suited to analysis experiments that profile multiple time points.
Here, we will apply ClueR
which is an R package specifically designed for time-course proteomic and phosphoproteomic data analysis Yang et al. 2015.
We will load the PhosR package with few other packages we will use for this tutorial.
suppressPackageStartupMessages({
library(parallel)
library(ggplot2)
library(ClueR)
library(reactome.db)
library(org.Mm.eg.db)
library(annotate)
library(PhosR)
})
We will load a dataset integrated from two time-course datasets of early and intermediate insulin signalling in mouse liver upon insulin stimulation to demonstrate the time-course phosphoproteomic data analyses.
data("phospho.liver.Ins.TC.ratio.RUV.pe")
ppe <- phospho.liver.Ins.TC.ratio.RUV.pe
ppe
#> class: PhosphoExperiment
#> dim: 800 90
#> metadata(0):
#> assays(1): Quantification
#> rownames(800): LARP7;256; SRSF10;131; ... SIK3;493; GSK3A;21;
#> rowData names(0):
#> colnames(90): Intensity.Liver_Ins_0s_Bio7 Intensity.Liver_Ins_0s_Bio8
#> ... Intensity.Liver_Ins_10m_Bio5 Intensity.Liver_Ins_10m_Bio6
#> colData names(0):
Let us start with gene-centric analysis. Such analysis can be directly applied to proteomics data. It can also be applied to phosphoproteomic data by using the phosCollapse
function to summarise phosphosite information to proteins.
# take grouping information
grps <- sapply(strsplit(colnames(ppe), "_"),
function(x)x[3])
# select differentially phosphorylated sites
sites.p <- matANOVA(ppe@assays@data$Quantification,
grps)
ppm <- meanAbundance(ppe@assays@data$Quantification, grps)
sel <- which((sites.p < 0.05) & (rowSums(abs(ppm) > 1) != 0))
ppm_filtered <- ppm[sel,]
# summarise phosphosites information into gene level
ppm_gene <- phosCollapse(ppm_filtered,
gsub(";.+", "", rownames(ppm_filtered)),
stat = apply(abs(ppm_filtered), 1, max), by = "max")
# perform ClueR to identify optimal number of clusters
pathways = as.list(reactomePATHID2EXTID)
pathways = pathways[which(grepl("R-MMU", names(pathways), ignore.case = TRUE))]
pathways = lapply(pathways, function(path) {
gene_name = unname(getSYMBOL(path, data = "org.Mm.eg"))
toupper(unique(gene_name))
})
RNGkind("L'Ecuyer-CMRG")
set.seed(123)
c1 <- runClue(ppm_gene, annotation=pathways,
kRange = seq(2,10), rep = 5, effectiveSize = c(5, 100),
pvalueCutoff = 0.05, alpha = 0.5)
# Visualise the evaluation results
data <- data.frame(Success=as.numeric(c1$evlMat), Freq=rep(seq(2,10), each=5))
myplot <- ggplot(data, aes(x=Freq, y=Success)) +
geom_boxplot(aes(x = factor(Freq), fill="gray")) +
stat_smooth(method="loess", colour="red", size=3, span = 0.5) +
xlab("# of cluster") +
ylab("Enrichment score") +
theme_classic()
myplot
set.seed(123)
best <- clustOptimal(c1, rep=5, mfrow=c(2, 3), visualize = TRUE)
Phosphosite-centric analyses will perform using kinase-substrate annotation information from PhosphoSitePlus.
data("PhosphoSitePlus")
RNGkind("L'Ecuyer-CMRG")
set.seed(1)
PhosphoSite.mouse2 = mapply(function(kinase) {
gsub("(.*)(;[A-Z])([0-9]+;)", "\\1;\\3", kinase)
}, PhosphoSite.mouse)
# perform ClueR to identify optimal number of clusters
c3 <- runClue(ppm_filtered, annotation=PhosphoSite.mouse2, kRange = 2:10, rep = 5, effectiveSize = c(5, 100), pvalueCutoff = 0.05, alpha = 0.5)
# Visualise the evaluation results
data <- data.frame(Success=as.numeric(c3$evlMat), Freq=rep(2:10, each=5))
myplot <- ggplot(data, aes(x=Freq, y=Success)) + geom_boxplot(aes(x = factor(Freq), fill="gray"))+
stat_smooth(method="loess", colour="red", size=3, span = 0.5) + xlab("# of cluster")+ ylab("Enrichment score")+theme_classic()
myplot
set.seed(1)
best <- clustOptimal(c3, rep=10, mfrow=c(2, 3), visualize = TRUE)
# Finding enriched pathways from each cluster
best$enrichList
#> $`cluster 1`
#> kinase pvalue size
#> [1,] "PRKACA" "0.000184676866298047" "5"
#> substrates
#> [1,] "NR1H3;196;|MARCKS;163;|PRKACA;339;|ITPR1;1755;|SIK3;493;"
#>
#> $`cluster 3`
#> kinase pvalue size
#> [1,] "Humphrey.Akt" "0.000162969329853963" "5"
#> [2,] "Yang.Akt" "0.000165386907010959" "6"
#> substrates
#> [1,] "TSC2;939;|PFKFB2;486;|FOXO3;252;|FOXO1;316;|GSK3A;21;"
#> [2,] "AKT1S1;247;|TSC2;939;|PFKFB2;486;|FOXO3;252;|FOXO1;316;|GSK3A;21;"
sessionInfo()
#> R version 4.2.2 (2022-10-31)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur ... 10.16
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#>
#> Random number generation:
#> RNG: L'Ecuyer-CMRG
#> Normal: Inversion
#> Sample: Rejection
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats4 parallel stats graphics grDevices utils datasets
#> [8] methods base
#>
#> other attached packages:
#> [1] PhosR_1.9.1 annotate_1.76.0 XML_3.99-0.12
#> [4] org.Mm.eg.db_3.16.0 reactome.db_1.82.0 AnnotationDbi_1.60.0
#> [7] IRanges_2.32.0 S4Vectors_0.36.0 Biobase_2.58.0
#> [10] BiocGenerics_0.44.0 ClueR_1.4 e1071_1.7-12
#> [13] ggplot2_3.4.0
#>
#> loaded via a namespace (and not attached):
#> [1] colorspace_2.0-3 ggsignif_0.6.4
#> [3] class_7.3-20 rprojroot_2.0.3
#> [5] circlize_0.4.15 XVector_0.38.0
#> [7] GenomicRanges_1.50.1 GlobalOptions_0.1.2
#> [9] ggdendro_0.1.23 fs_1.5.2
#> [11] proxy_0.4-27 farver_2.1.1
#> [13] ggpubr_0.5.0 bit64_4.0.5
#> [15] fansi_1.0.3 splines_4.2.2
#> [17] cachem_1.0.6 knitr_1.41
#> [19] jsonlite_1.8.3 broom_1.0.1
#> [21] png_0.1-8 pheatmap_1.0.12
#> [23] BiocManager_1.30.19 compiler_4.2.2
#> [25] httr_1.4.4 backports_1.4.1
#> [27] Matrix_1.5-1 fastmap_1.1.0
#> [29] limma_3.54.0 cli_3.4.1
#> [31] htmltools_0.5.3 tools_4.2.2
#> [33] igraph_1.3.5 coda_0.19-4
#> [35] gtable_0.3.1 glue_1.6.2
#> [37] GenomeInfoDbData_1.2.9 reshape2_1.4.4
#> [39] dplyr_1.0.10 Rcpp_1.0.9
#> [41] carData_3.0-5 statnet.common_4.7.0
#> [43] jquerylib_0.1.4 pkgdown_2.0.6
#> [45] vctrs_0.5.1 Biostrings_2.66.0
#> [47] nlme_3.1-160 preprocessCore_1.60.0
#> [49] xfun_0.35 stringr_1.4.1
#> [51] network_1.18.0 lifecycle_1.0.3
#> [53] rstatix_0.7.1 dendextend_1.16.0
#> [55] zlibbioc_1.44.0 MASS_7.3-58.1
#> [57] scales_1.2.1 BiocStyle_2.26.0
#> [59] ragg_1.2.4 pcaMethods_1.90.0
#> [61] MatrixGenerics_1.10.0 SummarizedExperiment_1.28.0
#> [63] RColorBrewer_1.1-3 yaml_2.3.6
#> [65] memoise_2.0.1 gridExtra_2.3
#> [67] sass_0.4.4 reshape_0.8.9
#> [69] stringi_1.7.8 RSQLite_2.2.19
#> [71] highr_0.9 desc_1.4.2
#> [73] shape_1.4.6 GenomeInfoDb_1.34.3
#> [75] rlang_1.0.6 pkgconfig_2.0.3
#> [77] systemfonts_1.0.4 matrixStats_0.63.0
#> [79] bitops_1.0-7 evaluate_0.18
#> [81] lattice_0.20-45 ruv_0.9.7.1
#> [83] purrr_0.3.5 labeling_0.4.2
#> [85] bit_4.0.5 tidyselect_1.2.0
#> [87] GGally_2.1.2 plyr_1.8.8
#> [89] magrittr_2.0.3 R6_2.5.1
#> [91] generics_0.1.3 DelayedArray_0.24.0
#> [93] DBI_1.1.3 mgcv_1.8-41
#> [95] pillar_1.8.1 withr_2.5.0
#> [97] KEGGREST_1.38.0 abind_1.4-5
#> [99] RCurl_1.98-1.9 tibble_3.1.8
#> [101] crayon_1.5.2 car_3.1-1
#> [103] utf8_1.2.2 rmarkdown_2.18
#> [105] viridis_0.6.2 grid_4.2.2
#> [107] blob_1.2.3 digest_0.6.30
#> [109] xtable_1.8-4 tidyr_1.2.1
#> [111] textshaping_0.3.6 munsell_0.5.0
#> [113] viridisLite_0.4.1 bslib_0.4.1