Random Forest as an Imputation Technique

Big Data
Random Forest
Graph
Author

Amy Akhlaghi

Published

May 28, 2023

Predicting Specialty using Random Forest

For this project, we used the data that was previously prepared and described in detail here. Our goal is to predict physicians' medical specialties from the pairs of medications they prescribed together (2-combs).

Using the randomForest and randomForestExplainer packages, we fitted a Random Forest (a supervised learning method) to predict the specialty of physicians, and then visualized the results with the ggplot2 package. The analysis followed these steps:

  • The occurrences of each 2-comb were counted and sorted in descending order

  • For each specialty, the proportions of the selected prescribed 2-combs were calculated

  • The data were restructured into wide format

  • To choose the predictors for the Random Forest (RF) model, several RF models were fitted using the n most frequent 2-combs, for n = 1, 2, ⋯, 40 (Figure 1)

Figure 1: The accuracy of the RF models against the number of most frequent 2-combs used as predictors.
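
As a rough illustration of the data preparation steps listed above (counting, proportions, wide format), here is a minimal base R sketch. The toy table `rx` and its columns `physician_id`, `specialty` and `comb` are hypothetical, and so is the choice of one row per physician in the wide table; the original data and code may well be organized differently.

```r
# Toy long-format data (hypothetical): one row per prescribed 2-comb.
rx <- data.frame(
  physician_id = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  specialty    = c(rep("Cardiology", 5), rep("Neurology", 5)),
  comb         = c("A+B", "A+B", "A+C", "A+B", "B+C",
                   "B+C", "B+C", "A+C", "A+B", "A+B"),
  stringsAsFactors = FALSE
)

# Step 1: count each 2-comb and sort the counts in descending order.
comb_counts <- sort(table(rx$comb), decreasing = TRUE)

# Keep the n most frequent 2-combs as candidate predictors (n = 2 here).
n <- 2
top_combs <- names(comb_counts)[seq_len(n)]

# Step 2: physician-by-2-comb counts, converted to row proportions
# (share of each 2-comb among all 2-combs of that physician; an assumption).
counts <- table(rx$physician_id, factor(rx$comb, levels = names(comb_counts)))
props  <- prop.table(counts, margin = 1)

# Step 3: wide format, one row per physician and one column per selected 2-comb.
wide <- as.data.frame.matrix(props[, top_combs, drop = FALSE])
wide$specialty <- factor(
  rx$specialty[match(rownames(wide), as.character(rx$physician_id))]
)

print(wide)
```

The resulting `wide` table has one numeric column per selected 2-comb plus the `specialty` label, which is the shape a classifier expects.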

  • The variables that achieved an appropriate accuracy were selected

  • The validity of the model was checked using the accuracy index
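
The search over n and the accuracy check in the steps above could look roughly like the following. This is a sketch on simulated data, not the original analysis: the data frame `X`, the column names `comb_1` to `comb_6`, the helper `oob_accuracy()`, and the use of out-of-bag (OOB) accuracy as the "accuracy index" are all assumptions, and only 6 candidate predictors are used instead of 40 to keep it short.

```r
library(randomForest)

set.seed(1)

# Simulated stand-in for the wide table (not the real data).
# Columns are assumed to be ordered from most to least frequent 2-comb.
n_obs <- 200
specialty <- factor(sample(c("Cardiology", "Neurology"), n_obs, replace = TRUE))
X <- as.data.frame(matrix(runif(n_obs * 6), ncol = 6))
names(X) <- paste0("comb_", 1:6)

# Make the first two columns informative so accuracy has something to pick up.
X$comb_1 <- X$comb_1 + 0.5 * (specialty == "Cardiology")
X$comb_2 <- X$comb_2 + 0.5 * (specialty == "Neurology")

# Fit an RF on the first n columns and return its OOB accuracy
# (1 minus the OOB error rate after the last tree).
oob_accuracy <- function(n) {
  fit <- randomForest(x = X[, seq_len(n), drop = FALSE],
                      y = specialty, ntree = 200)
  1 - fit$err.rate[fit$ntree, "OOB"]
}

# One model per value of n, collected in a table that can be plotted.
n_values <- 1:6
acc <- data.frame(n = n_values, accuracy = sapply(n_values, oob_accuracy))
print(acc)
```

Plotting `accuracy` against `n` gives a curve of the kind shown in Figure 1. A held-out test set would be an equally reasonable way to measure accuracy; OOB is used here only because it needs no extra split.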

Figure 2: The mean decrease in accuracy associated with removing each predictor from the model.
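
A plot like Figure 2 can be approximated with the variable importance that randomForest computes. Note that this measure comes from permuting each predictor's values in the out-of-bag samples rather than literally deleting the predictor, and that whether Figure 2 was produced this way is an assumption. The simulated data and the names `X`, `comb_1` to `comb_4` and `imp_df` are hypothetical.

```r
library(randomForest)
library(ggplot2)

set.seed(1)

# Simulated stand-in data: only comb_1 carries signal about the specialty.
n_obs <- 200
specialty <- factor(sample(c("Cardiology", "Neurology"), n_obs, replace = TRUE))
X <- as.data.frame(matrix(runif(n_obs * 4), ncol = 4))
names(X) <- paste0("comb_", 1:4)
X$comb_1 <- X$comb_1 + 1.0 * (specialty == "Cardiology")

# importance = TRUE is needed to obtain the permutation-based measure.
fit <- randomForest(x = X, y = specialty, ntree = 300, importance = TRUE)

# type = 1 selects the mean decrease in accuracy.
imp <- importance(fit, type = 1)
imp_df <- data.frame(predictor = rownames(imp),
                     mean_decrease_accuracy = imp[, 1])
imp_df <- imp_df[order(imp_df$mean_decrease_accuracy, decreasing = TRUE), ]

# Horizontal bar chart, most important predictor at the top.
p <- ggplot(imp_df,
            aes(x = reorder(predictor, mean_decrease_accuracy),
                y = mean_decrease_accuracy)) +
  geom_col() +
  coord_flip() +
  labs(x = "2-comb", y = "Mean decrease in accuracy")
print(p)
```

The randomForestExplainer package mentioned above offers richer views of the same forest; the plain `importance()` call is used here only to keep the sketch small.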