Motivation: The advent of T-cell receptor (TCR) sequencing experiments has led to a significant increase in the amount of available peptide:TCR binding data, and a number of machine learning models have appeared in recent years. High-quality prediction models for a fixed epitope sequence are feasible, provided enough known binding TCR sequences are available. However, their performance drops significantly for previously unseen peptides.

Results: We prepare a dataset of known peptide:TCR binders and augment it with negative decoys created from healthy donors' T-cell repertoires. We employ deep learning methods commonly applied in Natural Language Processing (NLP) to train a peptide:TCR binding model with a degree of cross-peptide generalization (0.69 AUROC). We demonstrate that BERTrand outperforms published methods when evaluated on peptide sequences not used during model training.

Availability: The datasets and the code for model training are available at https://github.com/SFGLab/bertrand

Supplementary information: Supplementary data are available at Bioinformatics online.
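The cross-peptide evaluation described above scores TCRs against peptides that were withheld from training and measures discrimination between binders and decoys with AUROC. A minimal sketch of that metric (not the authors' code; all sequences, labels, and scores below are hypothetical) can be written with the rank-based Mann-Whitney formulation:

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs ranked correctly, ties counting half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model predictions for TCRs paired with a peptide unseen
# during training: label 1 = known binder, 0 = negative decoy drawn
# from a healthy-donor repertoire.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.91, 0.62, 0.48, 0.55, 0.30, 0.22, 0.10]
print(round(auroc(labels, scores), 4))  # -> 0.9167
```

An AUROC of 0.5 corresponds to random ranking; the 0.69 reported in the abstract indicates a moderate but real cross-peptide signal.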