Autism Screening Prediction Model
Concept
Autism Spectrum Disorder (ASD) encompasses a range of developmental conditions that affect communication, behavior, and social skills. This project applies data-driven methods to the problem of screening for autism. Using multiple linear regression, k-nearest neighbors (KNN), and classification and regression trees (CART), it aims to predict an autism diagnosis from screening data and to surface predictors that can inform early intervention. Exploratory data analysis precedes modeling, and the models are evaluated with metrics such as root mean squared error (RMSE).
Data
The dataset consists of autism screening records with attributes such as survey responses, demographic details, and health-related metrics. Its fields are binary or integer-valued and cover 21 categories of absence reasons and 7 categories for absences without patient follow-up. Key attributes include transportation expense, service time, education level, social habits, and body mass index. Missing values were handled during preprocessing, before any analysis.
Approach
The analysis began with exploratory data analysis (EDA) to uncover patterns and relationships in the dataset. Descriptive statistics summarized the distribution of each variable, while correlation analysis identified dependencies among variables and pointed to potential predictors of autism.
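As a rough illustration of these two EDA steps, the sketch below computes per-column descriptive statistics and the Pearson correlation of each feature with the target, using only the Python standard library. The records and the column names (`screening_score`, `age`, `asd`) are hypothetical toy values, not the project's actual data or fields.

```python
import math
import statistics

# Hypothetical toy records; column names are illustrative assumptions,
# not the fields of the project's dataset.
records = [
    {"screening_score": 8, "age": 24, "asd": 1},
    {"screening_score": 2, "age": 31, "asd": 0},
    {"screening_score": 7, "age": 19, "asd": 1},
    {"screening_score": 3, "age": 45, "asd": 0},
    {"screening_score": 9, "age": 28, "asd": 1},
    {"screening_score": 1, "age": 37, "asd": 0},
]


def column(rows, name):
    """Extract one column from a list of row dictionaries."""
    return [row[name] for row in rows]


def describe(values):
    """Basic descriptive statistics for a numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


summary = {name: describe(column(records, name)) for name in records[0]}
corr_with_target = {
    name: pearson(column(records, name), column(records, "asd"))
    for name in records[0]
    if name != "asd"
}

for name, r in corr_with_target.items():
    print(f"{name}: r = {r:.3f}")
```

A feature whose correlation with the target is large in magnitude is a candidate predictor, though correlation only captures linear association and says nothing about causation.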
Algorithm
This project paired Multiple Linear Regression (MLR) with k-Nearest Neighbors (KNN) and complemented them with a tree-based model. MLR captured linear relationships between the predictors and the target variable, autism diagnosis. KNN, a non-parametric method, predicts each case from the majority class among its k nearest neighbors. For the tree-based analysis, the Classification and Regression Trees (CART) algorithm was used, with pruning techniques (Minimum Error and Best Pruned trees) applied to control model complexity and improve predictive accuracy.
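To make two of these ideas concrete, here is a minimal standard-library sketch of KNN majority voting and of the Gini-based split search that CART-style trees use at each node. It is an illustration under stated assumptions, not the project's implementation: the function names, the toy feature tuples, and the choice of Euclidean distance and Gini impurity are all assumptions, and pruning and MLR are not shown.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority class among the k training points nearest to query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    top_labels = [label for _, label in distances[:k]]
    return Counter(top_labels).most_common(1)[0][0]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y, feature):
    """Find the threshold on one feature that minimizes weighted Gini impurity."""
    best_threshold, best_score = None, float("inf")
    values = sorted({row[feature] for row in X})
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [label for row, label in zip(X, y) if row[feature] <= threshold]
        right = [label for row, label in zip(X, y) if row[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical toy data: (screening_score, age) pairs with a binary label.
X = [(8, 24), (2, 31), (7, 19), (3, 45), (9, 28), (1, 37)]
y = [1, 0, 1, 0, 1, 0]

print(knn_predict(X, y, (6, 25), k=3))
print(best_split(X, y, feature=0))
```

In practice features on different scales would usually be standardized before computing distances, since otherwise the feature with the largest range dominates the neighbor search.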
Key Insights and Achievements
- High Predictive Accuracy: Achieved 87.94% accuracy in identifying autism, a level that can support healthcare professionals' decision-making.
- Data-Driven Insights: Used exploratory and correlation analysis to identify patterns and predictors of autism in the screening data.
- Comprehensive Model Evaluation: Tested and iteratively refined the models to improve their robustness and applicability to real-world screening.
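For reference, the two evaluation metrics named in this write-up, accuracy and RMSE, can be computed as in the sketch below. The labels and scores are made-up toy values chosen only to show the arithmetic; they do not reproduce the 87.94% figure reported above.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between true values and numeric predictions."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy values, not results from the project.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_labels = [1, 0, 1, 0, 0, 0, 1, 1]  # e.g. class predictions from KNN or CART
y_pred_scores = [0.9, 0.2, 0.8, 0.4, 0.1, 0.3, 0.7, 0.6]  # e.g. regression outputs

print(f"accuracy = {accuracy(y_true, y_pred_labels):.2%}")
print(f"RMSE     = {rmse(y_true, y_pred_scores):.4f}")
```

Accuracy applies to class predictions, while RMSE applies to numeric outputs such as those of a regression model, so the two are typically reported for different model types.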