Prospects for recurrent neural network models to learn RNA biophysics from high-throughput data

bioRxiv(2018)

引用 1|浏览14
暂无评分
摘要
RNA is a functionally versatile molecule that plays key roles in genetic regulation and in emerging technologies to control biological processes. Computational models of RNA secondary structure are well-developed but often fall short in making quantitative predictions of the behavior of multi-RNA complexes. Recently, large datasets characterizing hundreds of thousands of individual RNA complexes have emerged as rich sources of information about RNA energetics. Meanwhile, advances in machine learning have enabled the training of complex neural networks from large datasets. Here, we assess whether a recurrent neural network model, Ribonet, can learn from high-throughput binding data, using simulation and experimental studies to test model accuracy but also determine if they learned meaningful information about the biophysics of RNA folding. We began by evaluating the model on energetic values predicted by the Turner model to assess whether the neural network could learn a representation that recovered known biophysical principles. First, we trained Ribonet to predict the simulated free energy of an RNA in complex with multiple input RNAs. Our model accurately predicts free energies of new sequences but also shows evidence of having learned base pairing information, as assessed by in silico double mutant analysis. Next, we extended this model to predict the simulated affinity between an arbitrary RNA sequence and a reporter RNA. While these more indirect measurements precluded the learning of basic principles of RNA biophysics, the resulting model achieved sub-kcal/mol accuracy and enabled design of simple RNA input responsive riboswitches with high activation ratios predicted by the Turner model from which the training data were generated. Finally, we compiled and trained on an experimental dataset comprising over 600,000 experimental affinity measurements published on the Eterna open laboratory. Though our tests revealed that the model likely did not learn a physically realistic representation of RNA interactions, it nevertheless achieved good performance of 0.76 kcal/mol on test sets with the application of transfer learning and novel sequence-specific data augmentation strategies. These results suggest that recurrent neural network architectures, despite being naive to the physics of RNA folding, have the potential to capture complex biophysical information. However, more diverse datasets, ideally involving more direct free energy measurements, may be necessary to train de novo predictive models that are consistent with the fundamentals of RNA biophysics.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要