Real-Time DNA Sequencing from Single Polymerase Molecules
Abstract
①We detected the temporal order of enzymatic incorporation of fluorescently labeled dNTPs into a growing DNA strand with zero-mode waveguide (ZMW) nanostructure arrays, which provide optical observation volume confinement and enable parallel, simultaneous detection of thousands of single-molecule sequencing reactions.
②Conjugation of fluorophores to the terminal phosphate moiety of the dNTPs allows continuous observation of DNA synthesis over thousands of bases without steric hindrance. (Note: this probably means that steric hindrance from the fluorophore does not interfere with DNA synthesis.)
③Consensus sequences were generated from the single-molecule reads at 15-fold coverage, showing a median accuracy of 99.3%, with no systematic error beyond fluorophore-dependent error rates.
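The idea of building a consensus from many single-molecule reads can be sketched with a column-wise majority vote. This is only a minimal illustration, not the paper's pipeline: it assumes the reads are already aligned to equal length (with `-` for gaps), and the function names `consensus` and `accuracy` are hypothetical.

```python
from collections import Counter


def consensus(aligned_reads):
    """Column-wise majority vote over equal-length, pre-aligned reads.

    '-' marks a gap. Ties go to the symbol seen first in the column.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    called = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":  # a winning gap means no base is called here
            called.append(base)
    return "".join(called)


def accuracy(called, truth):
    """Fraction of positions in `truth` matched by `called` (no realignment)."""
    matches = sum(a == b for a, b in zip(called, truth))
    return matches / max(len(truth), 1)
```

With enough coverage, random errors in individual reads are outvoted at each column, which is why consensus accuracy can be much higher than single-read accuracy when errors are not systematic.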
Introduction
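①Sanger method: This method relies on the low error rate of DNA polymerases, but exploits neither their potential for high catalytic rates nor high processivity.
How to read this: E. coli DNA polymerase I has three functional domains. Besides its 5'→3' DNA polymerase activity, it also has a 5'→3' exonuclease activity (primer removal) and a 3'→5' exonuclease activity (proofreading).
The Klenow fragment lacks the 5'→3' exonuclease activity; it is used for synthesis in second-generation sequencing, where it is more efficient.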
Methods
Technology
▲When a fluorophore is linked to the terminal phosphate moiety (phospholinked), phosphodiester bond formation catalyzed by the DNA polymerase results in release of the fluorophore from the incorporated nucleotide, thus generating natural, unmodified DNA.
▲Φ29 DNA polymerase was selected for these studies because it is a stable, single-subunit enzyme with high speed, accuracy, and processivity that efficiently uses phospholinked dNTPs. It is capable of strand-displacement DNA synthesis and has been used in whole-genome amplification, showing minimal sequence context bias. (Note: strand-displacement replication is the mode of continuous, repeated copying mentioned last time; it is also widely used for whole-genome amplification.)
▲We reported a surface chemistry that enables selective immobilization of DNA polymerase molecules in the detection zone of ZMW nanostructures with high yield.
▲Methylation can also be detected: at a methylated base the pulse duration and spectral characteristics change, so the modification can be captured.
Structure & Pipeline
(Schematic of labeled-dNTP incorporation, with the corresponding expected time trace of fluorescence intensity detected from a ZMW.)
(1) A phospholinked nucleotide forms a cognate association with the template in the polymerase active site,
(2) causing an elevation of the fluorescence output on the corresponding color channel.
(3) Phosphodiester bond formation liberates the dye-linker-pyrophosphate product, which diffuses out of the ZMW, thus ending the fluorescence pulse.
(4) The polymerase translocates to the next position, and
(5) the next cognate nucleotide binds the active site, beginning the subsequent pulse.
PS: A fluorescence pulse is produced by the polymerase retaining the cognate nucleotide with its color-coded fluorophore in the detection region of the ZMW. It lasts for a period governed principally by the rate of catalysis, and ends upon cleavage of the dye-linker-pyrophosphate group, which quickly diffuses from the ZMW detection region.
The duration of the fluorophore retention is much longer than the time scales associated with diffusion (2 to 10 ms) or noncognate sampling (
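Because a real incorporation holds the fluorophore in the detection region far longer than diffusion or noncognate sampling does, pulses can in principle be separated from background by a duration filter. The sketch below is a simplified illustration under that assumption, not the instrument's actual pulse caller; `call_pulses`, `call_bases`, `threshold`, and `min_frames` are hypothetical names and parameters.

```python
def call_pulses(trace, threshold, min_frames):
    """Return (start, end) frame ranges where the signal stays above
    `threshold` for at least `min_frames` frames; `end` is exclusive."""
    pulses = []
    start = None
    for i, value in enumerate(trace):
        if value > threshold:
            if start is None:
                start = i  # a candidate pulse begins
        elif start is not None:
            if i - start >= min_frames:  # long enough to be a real pulse
                pulses.append((start, i))
            start = None
    # Handle a pulse that is still running when the trace ends.
    if start is not None and len(trace) - start >= min_frames:
        pulses.append((start, len(trace)))
    return pulses


def call_bases(channels, threshold, min_frames):
    """`channels` maps a base to its per-frame intensity trace.

    Returns the called bases ordered by pulse start frame.
    """
    events = []
    for base, trace in channels.items():
        for start, _end in call_pulses(trace, threshold, min_frames):
            events.append((start, base))
    events.sort()
    return "".join(base for _start, base in events)
```

In a two-color experiment, `channels` would hold one trace per dye; short spikes below `min_frames` are discarded as diffusion or sampling events rather than incorporations.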
Using synthetic DNA to illustrate the approach
To illustrate the principle of our approach to DNA sequencing, we used a synthetic, linear, single-stranded DNA template with a two-base artificial sequence pattern.
(Note: only two labeled nucleotides were used here, A555-dCTP and A647-dGTP, i.e., only the C and G dNTPs.)
Potential of long-read DNA sequencing
We performed a similar two-base signature sequence pattern experiment using a single-stranded 72-base circular DNA template (Fig. 3A). The template was designed such that cytosines were present on only half of the circle, and guanines on the other half. Φ29 DNA polymerase is highly processive (>70,000 bases) without cofactors in bulk reactions. It will carry out multiple laps of DNA strand-displacement synthesis around the circular template.
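To see why a circular template gives a repeating two-color signature, the lap-by-lap synthesis can be modeled as indexing the template modulo its length. This is a toy sketch: the 8-base template in the usage note is made up for illustration (the paper's 72-base sequence is not reproduced here), the template string is assumed to be written in the order the polymerase reads it, and all function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def rolling_circle_read(template, n_bases, start=0):
    """Bases incorporated while copying a circular template for `n_bases`.

    `template` is given in the order the polymerase traverses it, so the
    product is the base-by-base complement, wrapping around the circle.
    """
    size = len(template)
    return "".join(
        COMPLEMENT[template[(start + i) % size]] for i in range(n_bases)
    )


def two_color_signature(read, labeled="CG"):
    """Keep only the bases whose dNTPs carry a fluorophore."""
    return "".join(base for base in read if base in labeled)


def laps_completed(n_bases, circle_size):
    """Number of full trips around the circle after `n_bases`."""
    return n_bases // circle_size
```

For a made-up template such as `"CTCAGAGT"` (C only in the first half, G only in the second), the signature alternates between blocks of G pulses and blocks of C pulses, and the number of block pairs observed gives the number of laps.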
About errors
Official SMRT workflow documentation
Example SMRT output files
Data sample: DNA N6-adenine methylation in Arabidopsis thaliana
Official software
Oxford Nanopore single-nanopore sequencing
- Nanopore DNA sequencing offers the possibility of a label-free, single-molecule approach that can be performed without the need for sample amplification.
- Like second-generation systems, nanopore technology is amenable to parallelization, and several cost estimates place nanopore sequencing in the $1,000 range for a complete human genome. (For comparison, the full cost of resequencing a haploid human genome to high quality with second-generation technology, including instruments, sample preparation, and labor, was around $100,000 to $1,000,000 at the time.)
- Furthermore, because the sequence quality should be constant throughout a read, long reads from single molecules of DNA will be possible by using nanopores, offering many advantages including the possibility of de novo sequencing, the high-resolution analysis of chromosomal structure variation, and long-range haplotype mapping.
Nanopore sequencing principle
Structure
Pipeline
To enable sequencing of both strands, a library is constructed from double-stranded DNA (dsDNA) with a protocol similar to that used for short-read, second-generation platforms. The library preparation chemistries (SQK-MAP005 and SQK-MAP005.1) used in this study contain two different adapters that are ligated to the DNA (Figure 1A). The first, the leader adapter, consists of two oligos with partial complementarity that form a Y-shaped structure once annealed. The second, the hairpin adapter, is a single oligo with internal complementarity to form a hairpin structure. Both adapters in the sequencing kit used for this study are preloaded with motor proteins that mediate the movement of DNA through the pore. Another function of the adapters is to guide the DNA fragments to the vicinity of pores via binding to tethering oligos with affinity for the polymer membrane (Figure 1B). Sequencing begins at the single-stranded 5' end of the leader adapter (Figure 1C).
Once the complementary (double-stranded) region of the leader adapter is reached, the motor protein loaded onto the leader adapter unzips the dsDNA, allowing the first strand of the DNA fragment, the template, to be passed into the nanopore one base at a time, while the sensor measures changes in the ionic current. After reaching the hairpin adapter, an additional protein, the hairpin protein, allows the complementary strand of DNA to pass through the nanopore in a similar fashion.
Note: this describes a 2D library, a scheme that ONT later seems to have gradually phased out. There are also 1D and 1D² libraries, which differ in their adapters; 1D² lets the second strand follow directly after the first, but this only happens with a certain probability.
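Since the hairpin makes the pore read the template strand and then its complement, each fragment is observed twice. A minimal way to picture how the two passes relate is to reverse-complement the second pass and compare it position by position with the first. This sketch assumes both passes have the same length and need no alignment, which real reads would not satisfy; the function names are hypothetical.

```python
_COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(_COMPLEMENT_TABLE)[::-1]


def strand_agreement(template_read, complement_read):
    """Fraction of positions where the two passes agree once the
    complement pass is mapped back onto the template orientation."""
    mapped = reverse_complement(complement_read)
    if len(mapped) != len(template_read):
        raise ValueError("this sketch needs equal-length passes")
    if not template_read:
        return 1.0
    matches = sum(a == b for a, b in zip(template_read, mapped))
    return matches / len(template_read)
```

Positions where the two passes disagree are the ones a two-strand consensus would have to resolve.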
Determine the bases & Errors
The raw current measurements or the corresponding events, plotted over time, are referred to as a squiggle plot. The base-caller in use at this time modelled the characteristics of 4^5 (= 1,024) possible 5-mers, and base-calling consisted of finding the optimal path (Figure 1G) through a Hidden Markov Model (HMM) of successive 5-mers using a Viterbi algorithm. (Note: accuracy is about 85% for 1D libraries, and 1D² can reach 90%.)
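The k-mer HMM idea can be illustrated with a toy Viterbi decoder. This is a sketch under strong simplifying assumptions, not the production base-caller: each event corresponds to exactly one base step (no stays or skips), every k-mer has a single expected current level with Gaussian noise, and the four successor k-mers are equally likely. The pore model `level` and all names are hypothetical.

```python
import itertools

BASES = "ACGT"


def all_kmers(k):
    """All 4**k k-mers, e.g. 1,024 for k = 5."""
    return ["".join(p) for p in itertools.product(BASES, repeat=k)]


def viterbi_basecall(events, level, sigma=1.0):
    """Most likely base sequence for a list of event current levels.

    `level` maps every k-mer to its expected current (hypothetical model).
    Transitions are uniform over the 4 successors, so their constant
    log-probability is dropped from the scores.
    """
    if not events:
        return ""
    kmers = list(level)

    def log_emit(x, kmer):
        # Gaussian log-likelihood up to an additive constant.
        return -((x - level[kmer]) ** 2) / (2 * sigma ** 2)

    score = {km: log_emit(events[0], km) for km in kmers}
    backpointers = []
    for x in events[1:]:
        new_score, pointer = {}, {}
        for km in kmers:
            # A predecessor overlaps km by k-1 bases: b + km[:-1].
            best_prev = max(
                (b + km[:-1] for b in BASES), key=score.__getitem__
            )
            new_score[km] = score[best_prev] + log_emit(x, km)
            pointer[km] = best_prev
        score = new_score
        backpointers.append(pointer)

    # Trace back from the best final state.
    state = max(score, key=score.get)
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    # First k-mer in full, then one new base per step.
    return path[0] + "".join(km[-1] for km in path[1:])
```

Each step considers only the four k-mers that overlap the current one by k-1 bases, so decoding costs on the order of 4 × 4^k operations per event rather than (4^k)^2.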
References
[1] Eid, J., Fehr, A., Gray, J., … Turner, S. (2009). Real-Time DNA Sequencing from Single Polymerase Molecules. Science, 323(5910), 133–138.
[2] PacBio official documentation: Pacific Biosciences Glossary of Terms
[3] PacBio official documentation: Introduction to SMRTbell Template Preparation
[4] PacBio official documentation: Perspective Understanding Accuracy SMRT Sequencing
[5] PacBio official documentation: Template Preparation
[6] https://zhuanlan.zhihu.com/p/77547922
[7] https://en.wikipedia.org/wiki/FASTQ_format
[8] PacBio official documentation: SMRT Analysis Barcoding Overview
[9] Magi, A., Semeraro, R., Mingrino, A., Giusti, B., & D’Aurizio, R. (2017). Nanopore sequencing data analysis: State of the art, applications and challenges. Briefings in Bioinformatics, 19(6), 1256–1272.
[10] Clarke, J., Wu, H. C., Jayasinghe, L., Patel, A., Reid, S., & Bayley, H. (2009). Continuous base identification for single-molecule nanopore DNA sequencing. Nature Nanotechnology, 4(4), 265–270.
[11] Ip, C. L. C., Loose, M., Tyson, J. R., de Cesare, M., Brown, B. L., Jain, M., … Olsen, H. E. (2015). MinION Analysis and Reference Consortium: Phase 1 data release and analysis [version 1; referees: 2 approved]. F1000Research, 4, 1075.
[12] https://zhuanlan.zhihu.com/p/91629114
Data
[1] http://datasets.pacb.com/2013/Human10x/READS/index.html
[2] https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM2157793
PS
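PacBio Inc (founded in 2004 on research from Cornell University, combining semiconductor and photonics technology with biotechnology; Illumina announced an agreement to acquire PacBio on November 1, 2018, but the deal was terminated in January 2020) & Oxford Nanopore Ltd (founded in 2005 by Prof. Hagan Bayley of the University of Oxford Department of Chemistry and colleagues)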