The past, the present, and the future of RNA secondary structure prediction
Conference contribution (Published version)
© 2023 Institute of Molecular Genetics and Genetic Engineering, University of Belgrade
Metadata
Abstract
RNA is a biopolymer whose primary structure is a sequence of nucleobases. While
messenger RNA is probably the best known, an increasing number of non-coding RNAs
(ncRNAs) are being discovered. To become biologically active, an ncRNA folds intramolecularly,
forming segments of paired bases. This secondary structure largely determines
the function of an ncRNA, so its prediction is important for newly discovered sequences.
Owing to the strong link between the two structural levels, most predictors are data-driven
and sequence-based.
The oldest and simplest algorithm was base pair maximization (BPM), which did not
presume important structural features. Another approach exploited the fact that
biophysics dictates RNA folding, so it searched for the thermodynamically optimal
structure. Statistical learning was the basis of the third group, with probabilistic context-free
grammars (PCFGs) being the most influential. These were the state-of-the-art
methods at the beginning of the century.
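To make the BPM idea concrete, the following is a minimal sketch of a Nussinov-style dynamic program that maximizes the number of nested base pairs. It is an illustration only, not the implementation evaluated in this work: the function name `bpm_fold`, the allowed pair set (Watson-Crick plus G-U wobble), and the `min_loop` parameter (minimum number of unpaired bases enclosed by a pair) are assumptions introduced here.

```python
# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def bpm_fold(seq, min_loop=3):
    """Return a dot-bracket structure with the maximum number of nested pairs."""
    n = len(seq)
    if n == 0:
        return ""
    # dp[i][j] holds the maximum number of pairs within seq[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best

    # Traceback: recover one optimal structure from the filled table.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # interval too short to contain any pair
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + 1 + dp[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(structure)


print(bpm_fold("GGGAAACCC"))
```

Because the objective only counts pairs, every admissible pair is worth the same and nothing rewards stacked helices or penalizes unfavorable loops, which is one way to read the remark that BPM does not presume important structural features.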
However, much has changed in recent years, as technological advancement has enabled
the widespread use of machine learning. Its use in RNA structure prediction ranges from
serving as a supplementary method (e.g., for estimating the thermodynamic and statistical
parameters of traditional methods) to encapsulating the whole prediction process. The
highest success has been reported with transformers, recurrent neural networks, and convolutional neural
networks (CNNs).
This paper was designed as a review and aimed to compare several methods theoretically
and assess them practically. As expected, model complexity was highly correlated with
accuracy. On the subset of simply structured transfer RNA, for example, BPM predicted
~22% of pairings correctly, PCFG ~86%, and CNN ~99%. Other subsets, such as 16S
ribosomal RNA, were more challenging, but deep learning always performed best. With
the continued growth of computational power and the amount of annotated data,
prediction accuracy is expected to come even closer to that of experimental determination,
while still maintaining a much lower cost.
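The abstract does not state how "pairings predicted correctly" was measured. As a hedged illustration, the sketch below assumes both the predicted and the reference structure are given in dot-bracket notation and reports the fraction of reference pairs that were recovered (recall) along with the fraction of predicted pairs that are correct (precision). The function names are hypothetical, and the metric used in the actual evaluation may differ.

```python
def pairs_from_dot_bracket(structure):
    """Convert a dot-bracket string into a set of (i, j) base pair indices."""
    stack, pairs = [], set()
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % idx)
            pairs.add((stack.pop(), idx))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def pair_scores(predicted, reference):
    """Return (recall, precision) of predicted pairs against reference pairs."""
    pred = pairs_from_dot_bracket(predicted)
    ref = pairs_from_dot_bracket(reference)
    true_positives = len(pred & ref)
    recall = true_positives / len(ref) if ref else 0.0
    precision = true_positives / len(pred) if pred else 0.0
    return recall, precision


recall, precision = pair_scores("((.....))", "(((...)))")
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this toy case the prediction recovers two of the three reference pairs and proposes no wrong pair, so recall is 2/3 and precision is 1.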
Keywords:
RNA structure prediction / review / machine learning
Source:
4th Belgrade Bioinformatics Conference, 2023, 4, 106-106
Publisher:
- Belgrade : Institute of molecular genetics and genetic engineering
Funding / projects:
- Ministry of Science, Technological Development and Innovation of the Republic of Serbia through the scholarship project for young and unemployed doctoral students, contract number 451-03-1271/2022- 14/2990.
Note:
- Book of abstracts: 4th Belgrade Bioinformatics Conference, June 19-23, 2023
Collections
Institution/group
Institut za molekularnu genetiku i genetičko inženjerstvo
Vasović, L. (2023). The past, the present, and the future of RNA secondary structure prediction. In 4th Belgrade Bioinformatics Conference, 4, 106-106. Belgrade: Institute of Molecular Genetics and Genetic Engineering. https://hdl.handle.net/21.15107/rcub_imagine_2051