
Stroke_Prediction

Follow @StrokePred on Twitter

You can reach the application demo here.

Stroke Prediction

Jefn Alshammari & Abdulaziz Almass

Abstract

This project aims to predict stroke by analyzing a dataset found on Kaggle with different machine learning (ML) models, to help medical staff recognize people at risk of stroke. Several models were trained on the dataset, with the best reaching 96% accuracy.

Design

This project is one of the T5 Data Science BootCamp requirements. Data provided by Kaggle has been used in this project. The attribute "Stroke" is the label, or target, to be predicted. The target is binary, taking the value 1 (predicted with stroke) or 0 (predicted without stroke). This classification task is tackled with various machine learning models, and the models are compared on their performance to find the one that best fits the selected dataset. The following models have been trained and tested: Logistic Regression, K-Nearest Neighbors (KNN), Decision Tree, Support-Vector Machines (SVM), Random Forest, and XGBoost.
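The comparison described above can be sketched with scikit-learn's common estimator interface. This is a minimal illustration, not the project's actual code: synthetic data stands in for the Kaggle dataset, and XGBoost is omitted to keep the sketch self-contained.

```python
# Hypothetical sketch: fit several classifiers on a binary target and
# compare their test accuracy. Synthetic data stands in for the dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Stand-in data with a binary label, roughly the shape of the real dataset.
X, y = make_classification(n_samples=1000, n_features=11, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=42),
}

# Every estimator shares fit/score, so the comparison is a simple loop.
scores = {name: model.fit(X_train, y_train).score(X_test, y_test)
          for name, model in models.items()}
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.2f}")
```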

Data

The dataset is available in .csv format. It consists of 5110 observations/data points with 12 attributes or features. Exploratory data analysis showed that the age feature plays an important role in stroke prediction, which most of the deployed models confirmed afterwards. The importance of the other features could not be established definitively because the dataset is imbalanced; the imbalance was treated with the Synthetic Minority Oversampling Technique (SMOTE). The remaining key attribute is the label itself: whether the person is predicted to have a stroke or not.

Models

Logistic Regression, K-Nearest Neighbors (KNN), Decision Tree, Support-Vector Machines (SVM), Random Forest, and XGBoost were trained to predict stroke. Random Forest achieved the highest accuracy.

Models Evaluation and Selection

The following metrics summarize the results of all ML models used in this project (per-class values are shown as stroke = 0 / stroke = 1):

| Model | Macro Avg | Accuracy | Precision (0 / 1) | Recall (0 / 1) | F1 Score (0 / 1) |
|---|---|---|---|---|---|
| Logistic Regression (Imbalanced) | 0.50 | 0.92 | 0.93 / 0.00 | 1.00 / 0.00 | 0.96 / 0.00 |
| Logistic Regression (Not Scaled) | 0.91 | 0.90 | 0.89 / 0.92 | 0.93 / 0.89 | 0.91 / 0.90 |
| Logistic Regression (Scaled) | 0.92 | 0.91 | 0.88 / 0.97 | 0.97 / 0.87 | 0.92 / 0.92 |
| Logistic Regression (Tuned & Scaled) | 0.92 | 0.91 | 0.89 / 0.94 | 0.94 / 0.89 | 0.92 / 0.91 |
| KNN (Scaled) | 0.94 | 0.94 | 0.96 / 0.93 | 0.93 / 0.96 | 0.94 / 0.94 |
| Decision Tree (Scaled) | 0.92 | 0.92 | 0.94 / 0.91 | 0.90 / 0.95 | 0.92 / 0.93 |
| SVM (Not Scaled) | 0.92 | 0.92 | 0.89 / 0.96 | 0.97 / 0.88 | 0.92 / 0.92 |
| SVM (Scaled) | 0.92 | 0.92 | 0.88 / 0.97 | 0.97 / 0.87 | 0.92 / 0.92 |
| Random Forest (Scaled) | 0.97 | 0.96 | 0.96 / 0.97 | 0.97 / 0.96 | 0.97 / 0.97 |
| XGBoost (Scaled) | 0.93 | 0.93 | 0.91 / 0.95 | 0.95 / 0.91 | 0.93 / 0.93 |
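Per-class precision/recall/F1 and the macro average of the kind tabulated above come straight out of scikit-learn's classification report. A minimal illustration on toy labels and predictions (the arrays are made up, not the project's actual outputs):

```python
# Illustration of where such metrics come from: classification_report
# gives per-class precision/recall/F1 plus macro and weighted averages.
from sklearn.metrics import classification_report

# Toy labels/predictions, NOT the project's real outputs.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]

print(classification_report(y_true, y_pred))
report = classification_report(y_true, y_pred, output_dict=True)
```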

Tools

  • Pandas library for data frames
  • Numpy for mathematical operations
  • Matplotlib and Seaborn for plots
  • Plotly for interactive plots
  • SKlearn for modeling
  • Imblearn for SMOTE oversampling
  • XGBoost for gradient-boosted tree modeling
  • One-Hot-Encoding for categorical features

Communication

The presentation slides are provided here, and further details are given in the project README. For any enquiries, you can contact us via email or Twitter (Follow @StrokePred).
