Car Price Prediction in the USA Using Linear Regression


  • Huseyn Mammadov, Carlo Bo University of Urbino



car price prediction, linear regression, data understanding, data cleaning, USA


This paper studies a linear regression model for predicting car prices in the U.S. market, to help a new entrant understand the important pricing factors in the U.S. automobile industry. Car price prediction has become an active research area because accurate pricing requires considerable effort and expert domain knowledge. I carried out a comprehensive analysis covering data cleaning, exploration, visualization, feature selection, and model building. The data used for the prediction was collected from a web portal using a web scraper written in Python in a Jupyter notebook. I split the problem into five parts: (1) data understanding and exploration; (2) data cleaning; (3) data preparation: feature engineering and scaling; (4) feature selection using recursive feature elimination (RFE) and model building; and (5) validation of the linear regression assumptions and outlier removal.
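The scaling, RFE, and model-building steps named above can be sketched roughly as follows. This is a minimal illustration, not the paper's actual code: it assumes scikit-learn, uses synthetic stand-in data instead of the scraped car listings, and the choice of `MinMaxScaler`, the number of retained features (3), and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (hypothetical): 200 "cars" with 6 numeric features,
# of which only the first three actually influence the price.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
true_coef = np.array([5.0, -3.0, 2.0, 0.0, 0.0, 0.0])
y = X @ true_coef + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Scale the features, keep 3 of them via recursive feature elimination,
# then fit an ordinary least-squares model on the retained features.
model = make_pipeline(
    MinMaxScaler(),  # assumed scaler; the paper does not name one here
    RFE(estimator=LinearRegression(), n_features_to_select=3),
    LinearRegression(),
)
model.fit(X_train, y_train)

selected = model.named_steps["rfe"].support_  # boolean mask of kept features
r2 = model.score(X_test, y_test)              # R^2 on held-out data

# Residuals on the held-out set; a residual plot or histogram is a common
# starting point for checking the linear regression assumptions.
residuals = y_test - model.predict(X_test)

print("selected feature mask:", selected)
print("test R^2:", round(r2, 4))
```

Wrapping the scaler and RFE in a single pipeline means both are fitted on the training split only, so no information from the held-out data leaks into feature selection.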

