  • 1 National Institutes for Food and Drug Control (NIFDC), No.2, Tiantan Xili Dongcheng District, Beijing 10050, P. R. China.
  • 2 BGI-Shenzhen, Bei Shan Industrial Zone, Yantian District, Shenzhen, Guangdong Province, 518083, P. R. China.
  • 3 State Food and Drug Administration Hubei Center for Medical Equipment Quality Supervision and Testing, 24-9, Zhongbei East Road, Wuhan, Hubei Province, 430000, P. R. China.
  • 4 BGI-Qingdao, Tuanjie Rd., Huangdao District, Qingdao, Shandong Province, 266555, P. R. China.
  • Huang J, et al. Gigascience. 2018 Dec 1;7(12):giy144. doi: 10.1093/gigascience/giy144. PMID: 30500904. Free PMC article.

BGISEQ-500 is a new desktop sequencer developed by BGI. Using DNA nanoball and combinational probe anchor synthesis developed from Complete Genomics™ sequencing technologies, it generates short reads at a large scale. Here, we present the first human whole-genome sequencing dataset of BGISEQ-500. The dataset was generated by sequencing the widely used cell line HG001 (NA12878) in two sequencing runs of paired-end 50 bp (PE50) and two sequencing runs of paired-end 100 bp (PE100). We also include examples of the raw images from the sequencer for reference. Finally, we identified variations using this dataset, estimated the accuracy of the variations, and compared it to that of the variations identified from similar amounts of publicly available HiSeq2500 data. We found similar single nucleotide polymorphism (SNP) detection accuracy for the BGISEQ-500 PE100 data (false positive rate [FPR] = 0.00020%, sensitivity = 96.20%) compared to the PE150 HiSeq2500 data (FPR = 0.00017%, sensitivity = 96.60%), and better SNP detection accuracy than the PE50 data (FPR = 0.0006%, sensitivity = 94.15%). For insertions and deletions (indels), however, we found lower accuracy for the BGISEQ-500 data (FPR = 0.00069% and 0.00067% for PE100 and PE50 respectively, sensitivity = 88.52% and 70.93%) than for the HiSeq2500 data (FPR = 0.00032%, sensitivity = 96.28%). Our dataset can serve as the reference dataset, providing basic information not just for future development, but also for all research and applications based on the new sequencing platform.
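The abstract reports accuracy as a false positive rate and a sensitivity. A minimal sketch of how such figures are typically derived from a comparison against a truth set follows. It assumes the standard definitions, sensitivity = TP / (TP + FN) and FPR = FP / (FP + TN); the abstract does not spell out the exact denominators used, and the function names and counts below are hypothetical, chosen only for illustration.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true variants that were called (TP / (TP + FN))."""
    return tp / (tp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of non-variant sites wrongly called as variant (FP / (FP + TN))."""
    return fp / (fp + tn)


def as_percent(value: float, digits: int = 2) -> str:
    """Format a fraction as a percentage string."""
    return f"{100 * value:.{digits}f}%"


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    tp, fn = 962, 38
    fp, tn = 2, 999_998
    print("sensitivity:", as_percent(sensitivity(tp, fn)))
    print("FPR:", as_percent(false_positive_rate(fp, tn), digits=5))
```

Because true negatives span nearly every position in the confident regions of the genome, the FPR denominator is very large, which is why the reported FPR values are several orders of magnitude smaller than the sensitivities.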
    Raw image data processing on the BGISEQ-500 platform. (a) Registration of images from different channels. Relative coordinates are calculated according to the pattern layout of the DNBs. (b) Intensity correction between channels and cycles. Optical and chemical interference between channels and between neighboring cycles is corrected. (c) Connecting called bases to FASTQ. Bases from all cycles are collected and converted to FASTQ format. Phred scores and summary statistics are calculated during the conversion.
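Step (c) can be illustrated with a short sketch of how called bases and per-base error probabilities become a FASTQ record. This is not the BGISEQ-500 base-calling software; it only shows the generic Phred definition, Q = -10·log10(p), and assumes the common Phred+33 ASCII encoding, which the caption does not specify. All function names are hypothetical.

```python
import math


def phred_from_error_prob(p: float) -> int:
    """Convert a base-call error probability to a rounded Phred score."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return round(-10 * math.log10(p))


def encode_quality(scores, offset: int = 33) -> str:
    """Encode Phred scores as ASCII characters (Phred+33 assumed)."""
    return "".join(chr(q + offset) for q in scores)


def fastq_record(name: str, bases: str, scores) -> str:
    """Assemble one four-line FASTQ record from bases and Phred scores."""
    if len(bases) != len(scores):
        raise ValueError("bases and scores must have the same length")
    return f"@{name}\n{bases}\n+\n{encode_quality(scores)}\n"


if __name__ == "__main__":
    # Hypothetical read: three bases, each with a 1-in-1000 error probability.
    scores = [phred_from_error_prob(0.001)] * 3
    print(fastq_record("read1", "ACG", scores), end="")
```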
    Quality control of the dataset after data filtering. Base-wise quality score distributions of the first read (a) and the second read (b), each shown from left to right for BGISEQ-500 PE50, BGISEQ-500 PE100, and HiSeq2500 PE150. For each position along the reads, the quality scores of all reads were used to calculate the mean, median, and quantile values, which are shown as box plots. (c) The overall quality score distribution of the BGISEQ-500 and HiSeq2500 data. (d) GC content distribution of the BGISEQ-500 and HiSeq2500 data. FastQC [18] was used for the calculation (FastQC, RRID:SCR_014583).
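The per-position summary behind panels (a) and (b) can be sketched as follows. This is not FastQC's implementation; it is a small illustration of computing the mean, median, and quartiles of quality scores at each read position, assuming Phred+33-encoded quality strings. The function name and the example strings are hypothetical.

```python
import statistics


def per_position_stats(quality_strings, offset: int = 33):
    """Return mean, median, and quartiles of Phred scores at each read position."""
    columns = []  # columns[i] holds the scores observed at position i
    for qs in quality_strings:
        for i, ch in enumerate(qs):
            if i == len(columns):
                columns.append([])
            columns[i].append(ord(ch) - offset)

    stats = []
    for scores in columns:
        if len(scores) >= 2:
            # Quartile cut points; statistics.quantiles needs at least two values.
            q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
        else:
            q1 = q3 = scores[0]
        stats.append({
            "mean": statistics.mean(scores),
            "median": statistics.median(scores),
            "q1": q1,
            "q3": q3,
        })
    return stats


if __name__ == "__main__":
    # Hypothetical quality strings for three reads of length 3.
    for pos, s in enumerate(per_position_stats(["II?", "I??", "I5?"]), start=1):
        print(pos, s)
```

Collecting scores column by column lets reads of different lengths contribute to every position they cover, which matters when filtering or trimming has shortened some reads.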