Top 10 Most Important Features
Definition: A random forest is a meta estimator that fits numerous decision tree classifiers on various
sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
(scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)
The Random Forest is ensemble method from SkLearn library, it shows how appropriated columns values to feature
importance ratios. So, each individual test has variable of conditions is based on a ratio importance that help
determine the predicted result.
It is the based logic that outlines the mathematical process in the final decision and/or prediction. The
CliffsNotes of Machine Learning’s little black box.
This Random Forest read in 31 Columns from the Credit DataOriginal.csv to classify. The RandomForestClassifier
from SkLearn fits the default.data (less the DEFAULT column) with default.target ( just the DEFAULT column).
In order to analyze the list, a Data Frame was created displaying Feature Importance and the corresponding
rating.
Then, the iloc of the top ten was utilized while sorting by descending order.
This returned the Top Ten Importance Features that were used to revise specific columns in a new data frame. The
10 columns along with the DEFAULT column became the Credit_Data_Revised.cvs used in the Neural Network
Model.
Prediction Importance Feature and Acending Ratings
Automatically, the RandomForestClassifier calculates the feature importance rating.
The rating is the weighted value of each Feature (column) in the decision process.
By sorting the feature importance in descending order, the highest feature rating lists 1–30.