Exploratory-Data-Analysis

Readme file for the Exploratory Data Analysis project

Statistical/Hypothetical Question: Identify the variables that significantly drive the price of used cars in the eBay used cars dataset.

Outcome of the EDA: Strong relationships exist between certain variables, but there are too many confounding factors to infer causation. Car price shows a strong relationship with engine power, the age of the car in years, and the number of days the listing was online.
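As a rough illustration of how such relationships can be measured, the sketch below computes Pearson correlation coefficients between price and three other fields. The field names (`power_ps`, `age_years`, `days_listed`) and the listings themselves are made up for this example and are not taken from the dataset or the project's notebook. It uses only the standard library to stay dependency-free.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up listings for illustration only; not rows from the real dataset.
listings = [
    {"price": 1500, "power_ps": 60, "age_years": 18, "days_listed": 40},
    {"price": 4000, "power_ps": 90, "age_years": 12, "days_listed": 25},
    {"price": 9000, "power_ps": 140, "age_years": 7, "days_listed": 30},
    {"price": 15000, "power_ps": 190, "age_years": 4, "days_listed": 12},
    {"price": 22000, "power_ps": 250, "age_years": 2, "days_listed": 8},
]

prices = [row["price"] for row in listings]

# Correlation of each candidate field with price.
correlations = {
    col: pearson([row[col] for row in listings], prices)
    for col in ("power_ps", "age_years", "days_listed")
}

for col, r in correlations.items():
    print(f"{col:12s} r = {r:+.2f}")
```

A coefficient near +1 or -1 indicates a strong linear relationship, but, as noted above, it says nothing about causation.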

What do I feel was missed during the analysis? If I had a second dataset with enough records and the actual selling price of the cars, I could train a model on the existing dataset and use the second dataset to validate its predictions. With only the listed price available, we cannot do much beyond exploratory data analysis.

Were there any variables I felt could have helped in the analysis? The actual selling price of the cars would have added to our ability to validate and test a model.

Were there any assumptions made you felt were incorrect? I assumed that factors like the car’s odometer reading (kilometer), model, or brand would play a significant role in the price, but the correlations were weaker than I expected. Also, 10014 records had a price of 0 and the maximum price was 2147483647 (the largest 32-bit signed integer), which means junk data remained even after data cleansing.
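One way to catch junk prices like these is to keep only values inside a plausible range. The sketch below is a minimal example, not the cleansing step used in this project. The bounds `low` and `high` are arbitrary assumptions chosen for illustration, and the sample prices are made up.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the sentinel-like maximum seen in the data


def is_plausible_price(price, low=100, high=150_000):
    """Return True if price falls inside the assumed plausible range.

    The default bounds are illustrative assumptions, not values from the project.
    """
    return low <= price <= high


# Made-up sample containing the two kinds of junk described above.
raw_prices = [0, 0, 950, 4200, 12500, INT32_MAX]

clean_prices = [p for p in raw_prices if is_plausible_price(p)]
dropped = len(raw_prices) - len(clean_prices)

print(f"kept {len(clean_prices)} prices, dropped {dropped}")
```

Reporting how many rows a filter drops makes it easier to spot bounds that are too aggressive.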

What challenges did you face, what did you not fully understand? I understood and was able to apply Python to the statistical calculations the project asked for, such as PMF, CDF, analytical distributions, hypothesis testing, and regression analysis. I developed a fair idea of how to slice and dice an unfamiliar dataset to understand its internal dynamics. This dataset from Kaggle made it easy to apply the theoretical concepts to a real-world problem.
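For readers unfamiliar with the first two terms, here is a small standard-library sketch of an empirical PMF and CDF. The odometer values are made up for illustration, and the helper names `pmf` and `cdf` are this example's own, not functions from the project.

```python
from collections import Counter


def pmf(values):
    """Map each distinct value to its relative frequency, in sorted order."""
    counts = Counter(values)
    n = len(values)
    return {v: c / n for v, c in sorted(counts.items())}


def cdf(values):
    """Map each distinct value v to the fraction of values <= v."""
    running = 0.0
    result = {}
    for v, p in pmf(values).items():
        running += p
        result[v] = running
    return result


# Made-up odometer readings (km) for illustration only.
sample_km = [50000, 100000, 150000, 150000, 150000, 125000, 100000, 150000]

for km, p in cdf(sample_km).items():
    print(f"P(kilometer <= {km}) = {p:.3f}")
```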

Table of contents

Getting Started

Prerequisite

Installation

Usage

Project Status

Versioning

Authors

Acknowledgements