Amy Akhlaghi - Random Forest as an Imputation Technique

Predicting Specialty using Random Forest

For this project, we used the data that was previously prepared and described in detail here. Our goal is to predict physicians' medical specialties from the pairs of medications they prescribed together (2-combs).

We fitted a Random Forest, a supervised learning method, to predict the specialty of physicians, using the R packages randomForest and randomForestExplainer. The results were then visualized with the ggplot2 package. The analysis followed these steps:

The 2-combs were counted and sorted in descending order of frequency
For each specialty, the selected proportions of the prescribed 2-combs were calculated
Data were restructured in a wide format
To choose the predictors for the Random Forest (RF) model, a separate RF model was fitted using the n most frequent 2-combs as predictors, for n = 1, 2, ⋯, 40 (Figure 1)

Figure 1: The accuracy of the RF models against the number of most frequent 2-combs used as predictors.
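
The data-preparation steps (counting 2-combs, computing per-specialty proportions, and reshaping to a wide format) can be sketched as follows. This is a minimal illustration only: the original analysis was done in R, whereas this sketch uses plain Python, and the toy records, the helper names two_combs and wide_table, and the definition of "proportion" as a 2-comb's share of all 2-combs in a specialty are assumptions rather than the author's implementation.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy records: (specialty, medications prescribed together).
prescriptions = [
    ("Cardiology", ["aspirin", "atorvastatin", "metoprolol"]),
    ("Cardiology", ["aspirin", "atorvastatin"]),
    ("Endocrinology", ["insulin", "metformin"]),
    ("Endocrinology", ["atorvastatin", "metformin"]),
]


def two_combs(medications):
    """Return every unordered pair (2-comb) of distinct medications."""
    return list(combinations(sorted(set(medications)), 2))


# Count each 2-comb overall and within each specialty.
overall = Counter()
per_specialty = defaultdict(Counter)
for specialty, meds in prescriptions:
    pairs = two_combs(meds)
    overall.update(pairs)
    per_specialty[specialty].update(pairs)

# 2-combs sorted in descending order of overall frequency.
ranked = [pair for pair, _ in overall.most_common()]


def wide_table(n):
    """Build a wide table: one row per specialty, one column per top-n 2-comb.

    Each cell is the 2-comb's share of all 2-combs counted for that
    specialty (an assumed definition of "proportion").
    """
    columns = ranked[:n]
    table = {}
    for specialty, counts in per_specialty.items():
        total = sum(counts.values())
        table[specialty] = [counts[pair] / total for pair in columns]
    return columns, table


columns, table = wide_table(2)
```

Calling wide_table(n) for n = 1, 2, ⋯, 40 would give the predictor sets for the successive models; the row unit used here (one row per specialty) simply mirrors the step description and may differ from the layout used in the original analysis.
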

Variables were selected to achieve appropriate accuracy
The validity of the model was checked using the accuracy index
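
The accuracy index and the choice of n can be illustrated with a short sketch. It assumes that accuracy means the share of correctly predicted specialties, and that "appropriate accuracy" is read as the smallest n whose accuracy lies within a small tolerance of the best model; that selection rule, the function names, and all values below are hypothetical placeholders, not results from this project.

```python
def accuracy(actual, predicted):
    """Share of cases whose predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


def smallest_adequate_n(accuracy_by_n, tolerance=0.01):
    """Smallest n whose accuracy is within `tolerance` of the best accuracy."""
    best = max(accuracy_by_n.values())
    return min(n for n, acc in accuracy_by_n.items() if acc >= best - tolerance)


# Made-up labels, only to exercise the accuracy function.
actual = ["Cardiology", "Endocrinology", "Cardiology", "Neurology"]
predicted = ["Cardiology", "Endocrinology", "Neurology", "Neurology"]
score = accuracy(actual, predicted)  # 3 of the 4 labels match

# Made-up accuracies for n = 1..5 (placeholders, not results from the study).
toy_curve = {1: 0.40, 2: 0.55, 3: 0.68, 4: 0.705, 5: 0.71}
chosen_n = smallest_adequate_n(toy_curve)
```

A tolerance-based rule like this favors a smaller predictor set when adding more 2-combs barely improves accuracy; other rules, such as simply taking the n with the highest accuracy, would be equally easy to express.
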