genXone

DNA sequencing methods produce many reads of varying lengths. On their own, however, these reads do not provide the comprehensive information needed about a DNA molecule. The next step in a full analysis is to reconstruct the original DNA sequence, i.e. to determine the order of its nucleotides.

What is DNA assembly?

De novo assembly (Latin for "from scratch") consists of arranging and joining the overlapping reads obtained from sequencing and basecalling in order to reconstruct the sequence from which they originated. The method relies on the read sequences alone and assumes that no reference sequence is available and that the length of the target sequence is unknown.

[Figure: original sequences, reads, overlapping the reads]
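The idea of joining overlapping reads can be sketched in a few lines of Python. This is a toy greedy merger, not the method used by any particular assembler: the function names `overlap` and `greedy_assemble` and the `min_len` threshold are illustrative assumptions, and the sketch ignores sequencing errors and strand orientation.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to merge
        # Join the two sequences, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

The nested loop compares every pair of sequences on every round, which hints at why naive approaches scale poorly with the number of reads.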

Assembly methodology and the difficulties involved

Sequencing yields a huge number of reads that come from both strands of DNA. It is therefore not known in advance in which order or orientation the fragments should be joined, and checking every possible combination is computationally very expensive. The DNA pool under study may also come from many organisms, whose reads must be assembled separately, which poses another challenge for assembly algorithms.
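Because a read may come from either strand, each read has to be considered together with its reverse complement. The sketch below shows one common way to handle this, assuming reads contain only the letters A, C, G, T and N; the helper names `reverse_complement` and `canonical` are illustrative.

```python
# Translation table mapping each base to its complement (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Pick one fixed representative of a read and its reverse complement.

    Using the lexicographically smaller of the two means that a fragment
    is represented the same way regardless of the strand it was read from.
    """
    return min(seq, reverse_complement(seq))


if __name__ == "__main__":
    forward = "ATGC"
    reverse = reverse_complement(forward)
    print(reverse, canonical(forward) == canonical(reverse))
```

Considering both orientations doubles the number of candidate sequences to compare, which adds to the cost described above.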

However, the main problem is the accuracy of the results. The reads being assembled are significantly shorter than the target sequence. This is particularly difficult when the target sequence contains repeated regions, since short reads often make it impossible to determine how many times a given fragment is repeated.
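A small made-up example shows why repeats are hard to resolve. Two toy sequences that differ only in the number of copies of a repeat unit produce exactly the same set of distinct short fragments, while longer fragments tell them apart. The sequences, the `CAG` unit and the fragment lengths are invented for illustration, and the sketch assumes error-free fragments taken from every position.

```python
def all_reads(genome: str, read_len: int) -> set[str]:
    """Every distinct error-free fragment of length `read_len`."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}


def make_genome(copies: int) -> str:
    """Toy sequence: unique flanks around `copies` copies of the unit CAG."""
    return "ACGT" + "CAG" * copies + "TTGA"


five_copies = make_genome(5)
eight_copies = make_genome(8)

# Fragments of 5 bases never reach from one flank to the other,
# so both sequences yield the same set of distinct fragments.
short_reads_identical = all_reads(five_copies, 5) == all_reads(eight_copies, 5)

# Fragments of 20 bases can contain a flank plus many repeat units,
# so the two sets differ.
long_reads_identical = all_reads(five_copies, 20) == all_reads(eight_copies, 20)

if __name__ == "__main__":
    print(short_reads_identical, long_reads_identical)
```

In this toy setting, the distinct short fragments alone cannot distinguish five copies from eight.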

Assembly in nanopore sequencing

Reads obtained by nanopore sequencing can be tens or even hundreds of times longer than reads from commonly used techniques such as Illumina. Thanks to this property, assembly, especially de novo assembly, is much more efficient, requires fewer system resources, and is less likely to produce assembly errors. Long single reads from nanopore sequencing can often cover an entire repetitive region, which allows the number of repeats to be determined with greater accuracy. Long reads also help identify structural variants: even a single nanopore read can contain complex structural variants that are often impossible to track and verify with short reads.
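As a rough illustration of how a read that covers a whole repetitive region reveals the repeat count, the sketch below looks for a run of a repeat unit enclosed by known flanking sequences. The function name `count_repeats`, the flanks and the unit are hypothetical, and exact string matching is a simplification: real reads contain errors and would need an alignment-based approach.

```python
import re
from typing import Optional


def count_repeats(read: str, left: str, unit: str, right: str) -> Optional[int]:
    """Count copies of `unit` between the `left` and `right` flanks.

    Returns None when the read does not contain both flanks around the
    repeat, i.e. when it does not span the whole repetitive region.
    """
    pattern = re.escape(left) + "((?:" + re.escape(unit) + ")+)" + re.escape(right)
    match = re.search(pattern, read)
    if match is None:
        return None
    return len(match.group(1)) // len(unit)


if __name__ == "__main__":
    long_read = "GGACGT" + "CAG" * 8 + "TTGACC"  # spans the repeat
    short_read = "CAGCAGCAG"                     # lies inside the repeat
    print(count_repeats(long_read, "ACGT", "CAG", "TTGA"))
    print(count_repeats(short_read, "ACGT", "CAG", "TTGA"))
```

The short fragment lies entirely inside the repeat and gives no answer, whereas the long one anchors the repeat on both sides.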

Nanopore sequencing is a developing technique, and the reads it produces have different characteristics from those of competing sequencing techniques. More and more tools dedicated to them are therefore being created. The Medaka program, which uses a machine learning algorithm based on neural networks, can be used to create consensus sequences from long reads. Many programs originally designed for data from other techniques also provide parameter sets suitable for long reads.
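To give an intuition for what a consensus sequence is, here is a deliberately naive majority vote over reads that are assumed to be already aligned to the same length, with `-` marking a gap. This is not how Medaka works (it uses a neural network); the function name `majority_consensus` and the input format are assumptions made for illustration.

```python
from collections import Counter


def majority_consensus(aligned_reads: list[str]) -> str:
    """Take the most frequent symbol in each column of the alignment.

    Columns where the gap symbol '-' wins are left out of the result.
    Ties are resolved in favour of the symbol seen first in the column.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)


if __name__ == "__main__":
    reads = ["ACGTTGCA", "ACGATGCA", "ACGTTGCA", "ACGTTGGA"]
    print(majority_consensus(reads))
```

Isolated errors in individual reads are outvoted by the other reads covering the same position, which is the basic reason why combining many reads improves accuracy.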
