Predicting Specialty using Random Forest
For this project, we used the data that was previously prepared and described in detail here. Our goal was to predict medical specialties from pairs of medications that were prescribed together (2-combs).
Using the randomForest and randomForestExplainer packages, we implemented Random Forest (a supervised learning method) to predict the specialty of physicians. The results were then visualized with the ggplot2 package. To achieve this goal, the following steps were carried out:
The 2-combs were counted and sorted in descending order of frequency
For each specialty, the proportions of the selected 2-combs were calculated
Data were restructured in a wide format
To select the appropriate variables as predictors in the Random Forest (RF) model, several RF models were fitted on the data using the first n most frequent 2-combs (n = 1, 2, ⋯, 40) (Figure 1)
The variables that gave an appropriate accuracy were selected
The validity of the model was checked using the accuracy index
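The steps above can be sketched in R roughly as follows. This is only a minimal illustration on simulated data, not the code used for the project: the column names (physician_id, specialty, comb), the toy 2-comb labels, the choice to compute proportions per physician, and the use of out-of-bag (OOB) accuracy as the accuracy index are all assumptions made for the example.

```r
# Simulated long-format prescriptions; every name and value here is hypothetical.
set.seed(1)
combs <- c("A_B", "A_C", "B_C", "C_D")
rx <- data.frame(
  physician_id = rep(1:40, each = 10),
  specialty    = rep(rep(c("cardiology", "neurology"), each = 20), each = 10),
  stringsAsFactors = FALSE
)
rx$comb <- ifelse(
  rx$specialty == "cardiology",
  sample(combs, nrow(rx), replace = TRUE, prob = c(0.5, 0.3, 0.1, 0.1)),
  sample(combs, nrow(rx), replace = TRUE, prob = c(0.1, 0.1, 0.3, 0.5))
)

# Step 1: count the 2-combs and sort them in descending order.
comb_counts <- sort(table(rx$comb), decreasing = TRUE)

# Step 2: proportion of each 2-comb within each physician's prescriptions
# (assumption: the model needs one row per physician).
props <- prop.table(table(rx$physician_id, rx$comb), margin = 1)

# Step 3: wide format, one column per 2-comb, ordered by overall frequency.
wide <- as.data.frame.matrix(props)
wide <- wide[, names(comb_counts), drop = FALSE]
wide$specialty <- factor(rx$specialty[match(rownames(wide), rx$physician_id)])

# Step 4: fit one Random Forest per n, using the n most frequent 2-combs,
# and record the OOB accuracy (assumed stand-in for the accuracy index).
if (requireNamespace("randomForest", quietly = TRUE)) {
  top <- names(comb_counts)
  oob_acc <- sapply(seq_along(top), function(n) {
    fit <- randomForest::randomForest(
      x = wide[, top[seq_len(n)], drop = FALSE],
      y = wide$specialty,
      ntree = 200
    )
    1 - fit$err.rate[fit$ntree, "OOB"]
  })
  print(data.frame(n = seq_along(top), oob_accuracy = oob_acc))
}
```

Plotting oob_accuracy against n gives a curve of the kind referred to as Figure 1, from which a value of n with acceptable accuracy can be chosen. The x/y interface of randomForest is used here instead of a formula so that the predictor columns can be selected by name for each n.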