Motivation: The advent of T-cell receptor (TCR) sequencing experiments has led to a significant increase in the amount of available peptide:TCR binding data, and a number of machine learning models have appeared in recent years. High-quality prediction models for a fixed epitope sequence are feasible, provided enough known binding TCR sequences are available. However, their performance drops significantly for previously unseen peptides.

Results: We prepare a dataset of known peptide:TCR binders and augment it with negative decoys created from healthy donors' T-cell repertoires. We employ deep learning methods commonly applied in Natural Language Processing (NLP) to train a peptide:TCR binding model with a degree of cross-peptide generalization (0.69 AUROC). We demonstrate that BERTrand outperforms published methods when evaluated on peptide sequences not used during model training.

Availability: The datasets and the code for model training are available at https://github.com/SFGLab/bertrand

Supplementary information: Supplementary data are available at Bioinformatics online.
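The cross-peptide evaluation described above scores TCRs against peptides that were withheld from training and measures discrimination between binders and decoys with AUROC. A minimal sketch of that metric (not the authors' code; all sequences, labels, and scores below are hypothetical) can be written with the rank-based Mann-Whitney formulation:

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs ranked correctly, ties counting half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model predictions for TCRs paired with a peptide unseen
# during training: label 1 = known binder, 0 = negative decoy drawn
# from a healthy-donor repertoire.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.91, 0.62, 0.48, 0.55, 0.30, 0.22, 0.10]
print(round(auroc(labels, scores), 4))  # -> 0.9167
```

An AUROC of 0.5 corresponds to random ranking; the 0.69 reported in the abstract indicates a moderate but real cross-peptide signal.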