DevLog #2 for PD Biomarkers

Yesterday XGBoost wasn’t really working, so today I switched to logistic regression and it worked perfectly. My ROC AUC and AP are now pretty solid, and I can trust the biomarkers that my model extracts.
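For anyone curious, here is a minimal sketch of how ROC AUC and AP can be checked for a logistic regression model with cross-validation. It assumes scikit-learn and uses synthetic data from `make_classification` as a stand-in; the sample counts, feature counts, and fold settings are made up for illustration and are not the project's actual pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an expression matrix (samples x features); not real data.
X, y = make_classification(
    n_samples=120, n_features=50, n_informative=8, random_state=0
)

# Scaling lives inside the pipeline, so each CV fold is scaled
# using statistics from its own training split only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = cross_validate(
    model, X, y, cv=cv, scoring=["roc_auc", "average_precision"]
)

mean_auc = results["test_roc_auc"].mean()
mean_ap = results["test_average_precision"].mean()
print(f"ROC AUC: {mean_auc:.3f}  AP: {mean_ap:.3f}")
```

Reporting the mean over folds (and ideally the spread too) gives a steadier picture than a single train/test split, especially with a small cohort.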

Tomorrow, I’m going to try using a standard Random Forest instead of XGBoost to see if that makes it better. I’m also going to focus on documentation: READMEs and all that.

As you can see in the PCA space plot, the controls and the patients are split up pretty well, and it looks clean.
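A rough sketch of how a two-component PCA projection like this can be computed, assuming NumPy only. The two groups here are randomly generated (with a few features shifted by hand so they differ), so the shapes and values are purely illustrative and not the real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 20 "controls" and 20 "patients", 30 features each (not real data).
controls = rng.normal(0.0, 1.0, size=(20, 30))
patients = rng.normal(0.0, 1.0, size=(20, 30))
patients[:, :5] += 3.0  # shift a few features so the groups differ

X = np.vstack([controls, patients])
labels = np.array([0] * 20 + [1] * 20)

# PCA via SVD of the mean-centered matrix.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

coords = X_centered @ Vt[:2].T         # samples projected onto PC1 and PC2
explained = (S ** 2) / np.sum(S ** 2)  # fraction of variance per component

print("variance explained by PC1, PC2:", explained[:2].round(3))
```

From here, `coords[labels == 0]` and `coords[labels == 1]` can be scattered in two colors to get the controls-versus-patients view.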

I’m also looking into using TargetScan with KEGG pathway analysis to biologically validate my top 14 biomarkers.

@arjundakshin on PD biomarkers · 5 days ago

1h 6m 11s logged

Hi! This is my first Parkinson’s project devlog.

I started this project around 3 months ago, and I knew nothing about machine learning then. In that time, I learnt everything I needed for my project. So far, I have made a decent pipeline to identify biomarkers.

I’m using an NCBI GEO Superseries for training, and I was planning on using a Portuguese cohort for extra real-world data tests.

Today, when I was comparing XGBoost and Random Forest, they gave accuracies of 93-95%, but after more tests, I realized that my AUC right now is only around 70%.
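One common way accuracy and AUC can disagree like this is class imbalance: a model that labels nearly everything as the majority class scores high accuracy while ranking samples poorly. Whether that is what happened here is an assumption. The toy example below uses made-up labels and scores (not the project's data) and plain Python, just to show the effect.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC as the chance a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced cohort: 18 patients (1) and 2 controls (0).
y_true = [1] * 18 + [0] * 2

# Hypothetical model scores: every sample lands above the 0.5 threshold,
# so both controls get misclassified as patients.
scores = [0.9] * 9 + [0.7] * 7 + [0.55] * 2 + [0.8, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy = {acc:.2f}, ROC AUC = {auc:.2f}")
```

Here the model never identifies a single control, yet accuracy is 90% because controls are only 10% of the samples. The AUC of about 0.69 reflects how well the scores actually rank patients above controls, which is why it is the more honest number on a lopsided dataset.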

Tomorrow, I’m going to try switching from XGBoost + Boruta feature selection to Logistic Regression to see if my AUC can be brought up to at least 80%.