Predicting Used Car Prices Using Multiple Linear Regression

Project Objective: This project aims to predict used car prices by analyzing key factors affecting market value, assisting buyers, sellers, and platforms in making data-driven pricing decisions.

Background: The competitive used car market sees fluctuating prices influenced by various attributes. Reliable price predictions enhance transparency and build user trust.

Data Description: The data come from a Kaggle dataset describing car attributes such as make, model, year, engine size, and mileage. The analysis focused on key determinants such as engine size, body style, and mileage.

Methodology:

  • Data Preprocessing: Addressed missing values, transformed categorical variables, and normalized continuous features for consistent scaling.
  • Exploratory Data Analysis (EDA): Conducted univariate and multivariate analysis to explore relationships, identifying key factors like engine size and horsepower.
  • Model Development: Built and iteratively refined a multiple linear regression model, checking that assumptions such as linearity and normality of residuals held.
  • Model Evaluation: Obtained R-squared values of 0.897 on the training set and 0.934 on the test set; the test score being at least as high as the training score suggests the model is not overfitting.
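
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the data are synthetic, the column names (enginesize, horsepower, mileage, carbody, price) are assumed for the example, and the printed R-squared values will not match the 0.897 and 0.934 reported above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the Kaggle data (assumption: real columns differ).
rng = np.random.default_rng(0)
n = 300
cars = pd.DataFrame({
    "enginesize": rng.uniform(70, 300, n),
    "horsepower": rng.uniform(50, 250, n),
    "mileage": rng.uniform(5_000, 150_000, n),
    "carbody": rng.choice(["sedan", "hatchback", "wagon"], n),
})
cars["price"] = (
    5_000
    + 60 * cars["enginesize"]
    + 40 * cars["horsepower"]
    - 0.03 * cars["mileage"]
    + cars["carbody"].map({"sedan": 1_000, "hatchback": 0, "wagon": 500})
    + rng.normal(0, 1_500, n)
)
# Blank out a few mileage values so the imputation step has work to do.
cars.loc[cars.sample(frac=0.05, random_state=0).index, "mileage"] = np.nan

numeric = ["enginesize", "horsepower", "mileage"]
categorical = ["carbody"]

# Preprocessing: impute and scale numeric columns, one-hot encode categories.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(drop="first"), categorical),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("regress", LinearRegression()),
])

X_train, X_test, y_train, y_test = train_test_split(
    cars.drop(columns="price"), cars["price"], test_size=0.3, random_state=42
)
model.fit(X_train, y_train)

# Pipeline.score returns R-squared for regressors.
r2_train = model.score(X_train, y_train)
r2_test = model.score(X_test, y_test)
print(f"R-squared (train): {r2_train:.3f}")
print(f"R-squared (test):  {r2_test:.3f}")
```

Wrapping the preprocessing in a pipeline means the imputer and scaler are fitted on the training split only, so the test score is not inflated by information leaking from the held-out rows.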

Conclusions: Multiple linear regression proved effective in predicting used car prices, with engine size, car width, and horsepower emerging as significant predictors.

Recommendations: Sellers can focus on attributes like engine size and horsepower for pricing, while platforms could integrate this model to improve price accuracy and user trust.

Technical Stack: Python, pandas, NumPy, Matplotlib, Seaborn, scikit-learn, statsmodels, and Jupyter Notebook.

Project Impact: This model supports individual buyers and sellers, as well as online platforms, by providing a robust framework for price estimation, enhancing marketplace transparency.

You can find the full project on my GitHub: GitHub Link