
Computational prediction of promotors using the machine learning technique

Belles Roca, Josep (2024) Computational prediction of promotors using the machine learning technique. Master's Research Project 2, Biology.

Full text: mBIO2024BellesJ.pdf (PDF, 1MB)
Permission form: Toestemming.pdf (PDF, 174kB; restricted to registered users only)

Abstract

This study explores the use of Convolutional Neural Networks (CNNs) to predict binding sites of the transcription factor PhoP across diverse bacterial species. By leveraging machine learning, this approach aims to overcome the experimental limitations of identifying the DNA patterns to which transcription factors bind. We developed a predictive model for identifying PhoP binding sites in DNA sequences, starting from an initial training dataset of 169 sequences that was then refined iteratively. Early results showed high precision but low recall, meaning that many true binding sites were missed. To improve sensitivity, we expanded the set of positive sequences, balanced the ratio of positive to negative sequences, and optimized the model parameters. After further hyperparameter tuning and the use of sequence padding to focus on the relevant motifs, the model reached a final precision of 0.95, a recall of 0.85, and an F1 score of 0.9, indicating improved robustness and generalizability. Overall, this study highlights the potential of CNN models for uncovering PhoP binding motifs and underscores the importance of dataset diversity and careful hyperparameter tuning for model accuracy and generalization.
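The abstract does not describe the preprocessing or the network architecture in detail. The Python sketch below is only an illustration under assumed choices: sequences are padded symmetrically with `N` to a fixed length, one-hot encoded, and scanned by a single hand-set convolutional filter, which is what one channel of a 1D CNN layer computes before bias and activation. The helper names `pad_sequence`, `one_hot`, `conv1d_scan`, `precision_recall_f1` and `f1_from` are hypothetical and are not taken from the thesis.

```python
"""Illustrative helpers (hypothetical; not the thesis code)."""

BASES = "ACGT"


def pad_sequence(seq: str, length: int, pad_char: str = "N") -> str:
    """Pad a DNA sequence symmetrically to a fixed length.

    Sequences longer than `length` keep only their first `length` bases.
    """
    seq = seq.upper()[:length]
    missing = length - len(seq)
    left = missing // 2
    right = missing - left
    return pad_char * left + seq + pad_char * right


def one_hot(seq: str) -> list[list[float]]:
    """One-hot encode DNA as rows of [A, C, G, T]; padding/unknown bases become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_scan(encoded: list[list[float]], kernel: list[list[float]]) -> list[float]:
    """Slide one filter (width x 4 weights) along the sequence without padding ('valid' mode).

    Each score is the dot product of the filter with the current window.
    """
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        scores.append(sum(w * x
                          for k_row, e_row in zip(kernel, window)
                          for w, x in zip(k_row, e_row)))
    return scores


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(pad_sequence("ACGT", 8))                            # NNACGTNN
    # A filter equal to the one-hot encoding of a toy motif "ACG" peaks where that motif occurs.
    print(conv1d_scan(one_hot("TTACGTT"), one_hot("ACG")))    # [0.0, 0.0, 3.0, 0.0, 0.0]
    print(round(f1_from(0.95, 0.85), 3))                      # 0.897
```

As a consistency check on the reported figures, `f1_from(0.95, 0.85)` gives about 0.897, which rounds to the F1 score of 0.9 stated in the abstract. The low-recall problem described above corresponds to a large `fn` count in `precision_recall_f1`: precision can stay high while recall, and therefore F1, drops.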

Item Type: Thesis (Master's Research Project 2)
Supervisor name: Moll, G.N.
Degree programme: Biology
Thesis type: Master's Research Project 2
Language: English
Date Deposited: 04 Feb 2025 14:09
Last Modified: 04 Feb 2025 14:09
URI: https://fse.studenttheses.ub.rug.nl/id/eprint/34684
