Belles Roca, Josep (2024) Computational prediction of promotors using the machine learning technique. Master's Research Project 2, Biology.
Abstract
This study explores the use of Convolutional Neural Networks (CNNs) to predict PhoP transcription factor binding sites across diverse bacterial species. By leveraging machine learning, the approach aims to overcome experimental limitations in identifying the DNA patterns with which transcription factors interact. We developed a predictive model for identifying these binding sequences, starting from an initial training dataset of 169 sequences that was refined iteratively. Early results showed high precision but low recall, indicating that true positives were being missed. To improve sensitivity, we expanded the positive sequence dataset, balanced the ratio of positive to negative sequences, and optimized model parameters. After further hyperparameter tuning and the use of sequence padding to focus on relevant motifs, the model achieved a final precision of 0.95, a recall of 0.85, and an F1 score of 0.90, demonstrating improved robustness and generalizability. Overall, the study highlights the potential of CNN models for uncovering PhoP binding motifs and underscores the importance of dataset diversity and careful hyperparameter tuning for model accuracy and generalization.
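As an illustration of two ideas mentioned in the abstract, the sketch below recomputes the F1 score from the reported precision and recall, and shows one common way to pad and one-hot encode DNA sequences before feeding them to a CNN. The function names, the `N` padding character, right-side padding, and the A/C/G/T channel order are assumptions made for this example; the record does not describe the thesis's actual preprocessing.

```python
ALPHABET = "ACGT"  # assumed channel order for the one-hot encoding


def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pad_sequence(seq: str, length: int, pad_char: str = "N") -> str:
    """Truncate or right-pad a DNA sequence to a fixed length (assumed scheme)."""
    seq = seq.upper()[:length]
    return seq + pad_char * (length - len(seq))


def one_hot(seq: str) -> list[list[int]]:
    """Encode each base as a 4-element vector; padding maps to all zeros."""
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


# Precision and recall as reported in the abstract.
print(round(f1_score(0.95, 0.85), 3))

# Hypothetical short sequence, padded to length 6 and encoded.
print(one_hot(pad_sequence("acgt", 6)))
```

With the reported precision of 0.95 and recall of 0.85, the harmonic mean is roughly 0.897, which is consistent with the F1 score stated in the abstract after rounding.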
| Item Type: | Thesis (Master's Research Project 2) |
|---|---|
| Supervisor name: | Moll, G.N. |
| Degree programme: | Biology |
| Thesis type: | Master's Research Project 2 |
| Language: | English |
| Date Deposited: | 04 Feb 2025 14:09 |
| Last Modified: | 04 Feb 2025 14:09 |
| URI: | https://fse.studenttheses.ub.rug.nl/id/eprint/34684 |
