EPIC: MHC-I epitope prediction integrating mass spectrometry derived motifs and tissue-specific expression profiles

bioRxiv (2019)

Abstract
Background: Accurate prediction of epitopes presented by human leukocyte antigen (HLA) molecules is crucial for personalized cancer immunotherapies targeting T cell epitopes. Mass spectrometry (MS) profiling of eluted HLA ligands, which provides unbiased, high-throughput measurements of HLA-associated peptides in vivo, could be used to faithfully model the presentation of epitopes on the cell surface. In addition, gene expression profiles measured by RNA-seq in a specific cell or tissue type can significantly improve the performance of epitope presentation prediction. However, although large amounts of high-quality MS data on HLA-bound peptides have been generated in recent years, few studies provide matching RNA-seq data, which makes incorporating gene expression into epitope prediction difficult. Methods: We collected publicly available HLA peptidome data and matching RNA-seq data for 34 cell lines derived from various sources. We built position-specific scoring matrices (PSSMs) for 21 HLA-I alleles based on these MS data, then used logistic regression (LR) to model the relationship among PSSM score, gene expression and peptide length to predict whether a peptide could be presented in each of the cell lines. Comparing the feature weights and biases across different HLA-I alleles and cell lines, we observed a universal relationship among these three variables. To confirm this, we built a single LR model by pooling PSSM scores, gene expression levels and peptide length features across different HLA alleles and cell lines, and compared its performance with the allele- and cell line-specific LR models. Indeed, the predictive power of the pooled model did not differ significantly from that of the specific models across cell lines and HLA alleles, and both substantially outperformed predictions based on PSSM scores alone.
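The modeling step described above (a PSSM score per peptide, combined with gene expression and peptide length in a logistic regression) can be sketched as follows. This is a minimal illustration under stated assumptions, not EPIC's implementation: the abstract does not specify the PSSM background, pseudocounts, expression transform or length encoding, so the uniform background, pseudocount of 1, log2(TPM + 1) transform and length offset used here are all assumptions, and every function name is hypothetical. The toy "ligands" are well-known HLA-A*02:01 epitopes; the decoys and all TPM values are invented.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background frequency


def build_pssm(ligands, pseudocount=1.0):
    """Hypothetical helper: log2-odds PSSM from equal-length eluted ligands."""
    total = len(ligands) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(ligands[0])):
        counts = Counter(pep[pos] for pep in ligands)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / BACKGROUND)
                     for aa in AMINO_ACIDS})
    return pssm


def build_pssms(ligands):
    """One PSSM per peptide length, keyed by length (assumed design)."""
    by_length = {}
    for pep in ligands:
        by_length.setdefault(len(pep), []).append(pep)
    return {n: build_pssm(peps) for n, peps in by_length.items()}


def pssm_score(pssms, peptide):
    """Sum of per-position log-odds, using the PSSM for this peptide's length."""
    return sum(col[aa] for col, aa in zip(pssms[len(peptide)], peptide))


def features(pssms, peptide, tpm):
    # Assumed features: [motif score, log2 expression, length offset from a 9-mer]
    return [pssm_score(pssms, peptide), math.log2(tpm + 1.0), float(len(peptide) - 9)]


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.05, epochs=2000):
    """Plain batch gradient descent on the logistic (cross-entropy) loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Predicted presentation probability for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy data: peptide -> expression (TPM) of its hypothetical source gene.
ligands = {"GILGFVFTL": 80.0, "NLVPMVATV": 45.0, "SLYNTVATL": 120.0,
           "LLFGYPVYV": 30.0, "FLPSDFFPSV": 60.0}
decoys = {"DEKRGHPQE": 0.5, "KDEPRSGNQ": 2.0, "PEDGKRQHS": 1.0, "EDKPGRNSQT": 0.8}

pssms = build_pssms(list(ligands))
X = [features(pssms, p, t) for p, t in {**ligands, **decoys}.items()]
y = [1] * len(ligands) + [0] * len(decoys)
w, b = train_logistic(X, y)
```

Because every feature enters this sketch linearly, pooling across alleles and cell lines, as the abstract describes, would amount to scoring each peptide with its own allele's PSSM and concatenating the resulting rows of `X` before a single call to `train_logistic`.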
Based on the universal relationship observed across alleles and cell lines, we further built a universal LR model, termed Epitope Presentation Integrated prediCtion (EPIC), using more than 180,000 unique HLA ligands collected from public sources and ~3,000 HLA ligands generated in-house, to predict epitope presentation for 66 common HLA-I alleles. Results: When evaluated on large, independent HLA eluted ligand datasets, EPIC performed substantially better than other popular methods, including MixMHCpred (v2.0), NetMHCpan (v4.0) and MHCflurry (v1.2.2), with an average 0.1% positive predictive value (PPV) of 51.59%, compared to 36.98%, 36.41%, 24.67% and 23.39% achieved by MixMHCpred, NetMHCpan-4.0 (EL), NetMHCpan-4.0 (BA) and MHCflurry, respectively. It is also comparable to EDGE, a recent deep learning-based model that is not yet publicly available, in predicting epitope presentation and selecting immunogenic cancer neoantigens. However, the simplicity and flexibility of EPIC make it much easier to apply in diverse situations, especially when users would like to take advantage of emerging eluted ligand data for new HLA alleles. We demonstrated this by generating MS data for the HCC4006 cell line and adding to EPIC support for HLA-A*33:03, for which no MS or binding affinity data were previously available. EPIC is publicly available at . Conclusions: We have developed an easy-to-use, publicly available epitope prediction tool, EPIC, that incorporates information from both MS and RNA-seq data, and demonstrated its superior performance over existing public methods.
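The abstract reports an average "0.1% PPV" without defining it. A common definition in eluted-ligand benchmarks, assumed here rather than confirmed for this paper, mixes each true ligand with 999 decoy peptides (so ligands make up 0.1% of the evaluation set), ranks all peptides by predicted score, and reports the fraction of true ligands among the top N, where N is the number of ligands. A minimal sketch of that metric, with a hypothetical helper name:

```python
def ppv_top_n(scores, labels):
    """Assumed PPV definition: fraction of true ligands (label 1) among the
    top-N scored peptides, where N is the total number of true ligands."""
    n_pos = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:n_pos]) / n_pos


# Two ligands among five peptides: the top 2 by score hold one ligand.
print(ppv_top_n([0.9, 0.8, 0.7, 0.2, 0.1], [1, 0, 1, 0, 0]))  # 0.5
```

Under this definition a random ranking has an expected PPV equal to the ligand fraction, about 0.1%, which gives a baseline for reading the reported values.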
Keywords
T cell epitope,MHC-I,HLA peptidome,Eluted ligand,Mass spectrometry,RNA-seq,Neoantigen,Cancer immunotherapy