Note: Columns in which most values are missing were removed. Missing values in the 'state' column were filled in from common knowledge. Missing values in the
'make_the_world_better_percent' column were replaced with the mean of the other rows, since only a few values were missing.
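The cleaning steps in this note can be sketched in Python with pandas. This is a minimal illustration, not the cleaning script itself (the record data was cleaned in R). The function name `clean_records`, the 50% threshold for "most data missing", the `school` column, and the `state_by_school` lookup are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def clean_records(df, state_by_school=None, max_missing_frac=0.5):
    """Hypothetical helper mirroring the three cleaning steps in the note."""
    # 1. Drop columns where more than max_missing_frac of the values are missing
    #    (the 0.5 threshold is an assumption; the note only says "most").
    keep = df.columns[df.isna().mean() <= max_missing_frac]
    out = df[keep].copy()

    # 2. Fill missing states from a hand-made lookup ("common knowledge").
    if state_by_school and "state" in out.columns:
        out["state"] = out["state"].fillna(out["school"].map(state_by_school))

    # 3. Replace the few missing percentages with the mean of the observed rows.
    col = "make_the_world_better_percent"
    if col in out.columns:
        out[col] = out[col].fillna(out[col].mean())
    return out


# Toy data, invented purely for illustration.
toy = pd.DataFrame({
    "school": ["A", "B", "C", "D"],
    "state": ["CA", None, "NY", "TX"],
    "make_the_world_better_percent": [50.0, np.nan, 60.0, 70.0],
    "mostly_empty": [np.nan, np.nan, np.nan, 1.0],
})
cleaned = clean_records(toy, state_by_school={"B": "WA"})
print(cleaned)
```

Mean imputation keeps the column average unchanged but slightly shrinks its variance, which is usually acceptable when only a few values are missing.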
Note: In the first graph, some values for private schools fall outside the normal range. However, since some universities are very hard to get into in real life, it is understandable
that some schools have a very low acceptance rate.
In the second graph, there are no potential outliers.
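One common way to decide which values fall "outside the normal range" is the 1.5 × IQR rule used by box plots. The sketch below assumes that rule; the notes do not state which criterion the graphs use, and the function name `iqr_outliers` and the sample acceptance rates are invented for illustration.

```python
import pandas as pd


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]


# Made-up acceptance rates (percent) with one very selective school.
rates = pd.Series([5.0, 60.0, 62.0, 65.0, 68.0, 70.0, 72.0, 75.0])
flagged = iqr_outliers(rates)
print(flagged)
```

A flagged value is only a candidate outlier. As the note argues, a very low acceptance rate can be a genuine observation and is kept rather than removed.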
Note: Image 4 shows a new dataset created by combining the other three datasets.
Text data after data cleaning
Note: The 'state' and 'make_the_world_better_percent' columns are affected by data cleaning.
R codes (record data)
Python codes (text data)