Applied Cell Biology

Seq_B_LSTM_CNN_HPO: Rare Mendelian Diseases to Genotypes Associations from Multiple Data Sources

Mohamed Elhajabdou1*, Amr Maged Ehelw1, Hassan Eldib1 and Mohamed Elhabrouk2

1Faculty of Engineering, Arab Academy for Science and Technology and Maritime Transport, Alexandria, Egypt
2Faculty of Engineering, Alexandria University, Alexandria, 21544, Egypt

Abstract

Motivation: Genotype-phenotype annotations have become a crucial tool for studying abnormalities in disease phenotypes. These annotations, and the relations among them, help to uncover complex and hidden information that describes the genetic mutations underlying disease in organisms such as humans. Several systems and algorithms have been proposed and implemented to address this problem, since data describing human mutations and gene variations are freely available online from different resources. Machine learning, especially deep artificial neural networks, has proven able to overcome the limitations of these traditional algorithms and to achieve markedly higher accuracy than conventional methods such as statistical techniques.

Results: In this paper, a multilabel hyper-artificial neural network classifier, called Seq_B_LSTM_CNN_HPO, is proposed and implemented for predicting rare Mendelian diseases. The proposed system was trained on more than 50 features obtained from four data sources, Gene Ontology (GO), Human Phenotype Ontology (HPO), UniProtKB, and gene expression data, to learn complex features and relations. It was tested on the UniProtKB dataset and compared with other systems proposed in the field. The experiments were performed on human data for a variety of analytical studies in order to find new relations between disease phenotypes. The results are evaluated using six evaluation metrics, Fmax, Precision, Recall, AUPR, AUROC, and Smin, with scores of 0.894, 0.902, 0.886, 0.711, 0.631, and 0.384, respectively, which outperform several systems proposed in the literature.
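For readers unfamiliar with Fmax, the following is a minimal sketch of how such a score can be computed for a multilabel classifier. It assumes the sample-centric convention common in function and phenotype prediction benchmarks (precision averaged over samples with at least one prediction, recall averaged over samples with at least one true label, maximized over a threshold grid). The abstract does not state which convention the authors used, and the function name `fmax` and the threshold grid are illustrative assumptions, not taken from the authors' code.

```python
def fmax(y_true, y_score, thresholds=None):
    """Sample-centric Fmax for multilabel predictions.

    y_true:  list of rows of 0/1 labels, one row per sample.
    y_score: list of rows of scores in [0, 1], same shape as y_true.
    Assumed convention (not confirmed by the paper): precision is averaged
    over samples with at least one predicted label, recall over samples
    with at least one true label.
    """
    if thresholds is None:
        # Hypothetical grid: 0.01, 0.02, ..., 1.00
        thresholds = [i / 100 for i in range(1, 101)]

    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for truth, scores in zip(y_true, y_score):
            predicted = [s >= t for s in scores]
            tp = sum(1 for p, y in zip(predicted, truth) if p and y)
            n_pred = sum(predicted)
            n_true = sum(1 for y in truth if y)
            if n_pred > 0:
                precisions.append(tp / n_pred)
            if n_true > 0:
                recalls.append(tp / n_true)
        if not precisions or not recalls:
            continue  # nothing predicted, or no true labels, at this threshold
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best
```

Under this convention a sample with no predictions above the threshold does not count toward precision, so a very high threshold cannot inflate the score by predicting nothing for hard samples while still being penalized on recall.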

Data and Source Code Availability: The source code is provided in a GitHub repository, and the dataset is uploaded to Google Drive.

Keywords: