- Work Accident: Most employees have not had an accident at work.
```{r echo=FALSE , warning=FALSE, message=FALSE}
freq(HR_comma_sep$Work_accident)
```
### Summary of the variable 'promotion_last_5years'
- Promotion in last 5 years: Most employees have not received a promotion in the last five years.
```{r echo=FALSE , warning=FALSE, message=FALSE}
freq(HR_comma_sep$promotion_last_5years)
```
### Summary of the variable 'resigned'
- Resigned: Most employees stay with the organization and do not leave.
```{r , echo=FALSE, message=FALSE, warning=FALSE}
freq(HR_comma_sep$resigned)
```
- 0 denotes those who stayed.
- 1 denotes those who resigned.
### Summary of the variable 'salary_grade'
- Salary: 8.25% of employees are in the top grade with the highest pay, 42.9% are paid a medium salary and 48.7% are paid a low salary.
```{r, echo=FALSE, message=FALSE, warning=FALSE}
freq(HR_comma_sep$salary_grade)
```
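The percentages quoted above come from the `freq()` output. As a rough illustration of how such a table is derived, the sketch below builds counts and percentages in base R. The vector `salary_grade` is a small toy stand-in with made-up values, not the real `HR_comma_sep$salary_grade` column, and `freq_table` is a hypothetical name.

```r
# Toy stand-in for HR_comma_sep$salary_grade (illustrative values only)
salary_grade <- c("low", "low", "medium", "high",
                  "medium", "low", "medium", "low")

# Absolute counts per grade, then each count as a share of the total
counts  <- table(salary_grade)
percent <- round(100 * prop.table(counts), 1)

# Combine counts and percentages into one table
freq_table <- data.frame(counts, percent = as.vector(percent))
print(freq_table)
```

Replacing the toy vector with the real column should give the same kind of table that `freq()` prints, up to formatting.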
### Summary of the variable 'department'
- Department: The number of employees in each department. Sales has the most employees (27%) and management the fewest (4.2%).
```{r, echo=FALSE, message=FALSE, warning=FALSE}
freq(HR_comma_sep$department)
```
# Splitting the Data into Training and Validation:
- Converted the categorical variables salary, sales and left to factors.
- Created a new variable 'random' using the runif() function, which draws random values from a uniform distribution.
- Used 'random' to split the data into a training set (train) and a validation set (val):
    - train: 10540 observations of 10 variables
    - val: 4459 observations of 10 variables
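The splitting code itself is not shown in this section. The sketch below illustrates the steps described above in base R, under stated assumptions: the data frame `hr` is a small toy data set rather than `HR_comma_sep`, the seed is arbitrary, and the 0.7 cutoff is an assumption chosen because the reported sizes (10540 and 4459) correspond to roughly a 70/30 split.

```r
set.seed(123)  # arbitrary seed, only to make the sketch reproducible

# Toy data standing in for the HR data (illustrative values only)
n  <- 1000
hr <- data.frame(
  satisfaction_level = runif(n),
  salary = sample(c("low", "medium", "high"), n, replace = TRUE),
  sales  = sample(c("sales", "technical", "management"), n, replace = TRUE),
  left   = rbinom(n, 1, 0.24)
)

# Convert the categorical variables to factors
hr$salary <- factor(hr$salary)
hr$sales  <- factor(hr$sales)
hr$left   <- factor(hr$left)

# One uniform random number per row decides which set the row goes to
hr$random <- runif(nrow(hr))

# Assumed cutoff: about 70% of rows to train, the rest to val
train <- hr[hr$random <  0.7, ]
val   <- hr[hr$random >= 0.7, ]

c(train = nrow(train), val = nrow(val))
```

Because the assignment is random, the split is only approximately 70/30, which matches the uneven sizes reported for train and val.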
# Creating and Interpreting the Classification Tree:
```{r echo=FALSE , warning=FALSE, message=FALSE}
rpart.plot(ct1)
ct1$cptable
```
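The call that fits `ct1` is not shown here. A minimal sketch of how a classification tree of this kind could be fit with `rpart()` follows. The data frame `toy`, the rule used to generate its `left` column, and the object name `fit` are all hypothetical and serve only to make the example runnable.

```r
library(rpart)

set.seed(1)  # arbitrary seed for reproducibility

# Toy predictors on the scales assumed for the HR data (illustrative only)
n   <- 2000
toy <- data.frame(
  satisfaction_level = runif(n),
  last_evaluation    = runif(n),
  number_project     = sample(2:7, n, replace = TRUE),
  time_spend_company = sample(2:10, n, replace = TRUE)
)

# Hypothetical labelling rule so the tree has structure to find
toy$left <- factor(as.integer(
  toy$satisfaction_level < 0.47 &
  toy$number_project >= 3 &
  toy$last_evaluation >= 0.58
))

# method = "class" requests a classification tree
fit <- rpart(left ~ ., data = toy, method = "class")

# Complexity-parameter table: one row per candidate tree size
fit$cptable
```

The `cptable` element reports, for each candidate number of splits, the complexity parameter and the relative and cross-validated errors, which is the information commonly used to decide how far to prune the tree.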
## Interpreting Two Complete Paths:
- At the root node, before any condition is applied to the training data set (train), the predicted class is 0 (did NOT leave).
- This is because 76% of the observations did not leave and 24% left, so 0 is the majority class.
### Path 1 : Will Not Leave (loyal)
- First condition: satisfaction_level >= 47 percent.
- Second condition: time_spend_company < 5 years.
- Third condition: last_evaluation < 81 percent.
- Hence, those who did NOT leave are moderately to highly satisfied (at least 47 percent), have spent fewer than 5 years in the organization and have an evaluation below 81 percent.
### Path 2 : Will Leave (or are likely to resign)
- First condition: satisfaction_level < 47 percent.
- Second condition: number_project >= 3 projects.
- Third condition: last_evaluation >= 58 percent.
- Hence, those who leave have low satisfaction (below 47 percent), a workload of 3 or more projects and a performance evaluation of at least 58 percent.
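The two paths can also be written as explicit rules, which makes it easy to check which path a given employee follows. The sketch below assumes that satisfaction_level and last_evaluation are stored on a 0 to 1 scale (so 47 percent is 0.47). The helper functions `on_path1()` and `on_path2()` and the three example employees are hypothetical.

```r
# Path 1 (predicted to stay): thresholds as listed above, on an assumed 0-1 scale
on_path1 <- function(d) {
  d$satisfaction_level >= 0.47 &
    d$time_spend_company < 5 &
    d$last_evaluation < 0.81
}

# Path 2 (predicted to leave)
on_path2 <- function(d) {
  d$satisfaction_level < 0.47 &
    d$number_project >= 3 &
    d$last_evaluation >= 0.58
}

# Three made-up employees, for illustration only
employees <- data.frame(
  satisfaction_level = c(0.80, 0.30, 0.60),
  last_evaluation    = c(0.70, 0.90, 0.95),
  number_project     = c(4, 5, 3),
  time_spend_company = c(3, 4, 6)
)

# Label each employee by the path it matches, if any
employees$path <- ifelse(on_path1(employees), "path 1: stays",
                  ifelse(on_path2(employees), "path 2: leaves", "other path"))
print(employees)
```

The third employee matches neither rule because the tenure condition of path 1 fails. The full tree contains further paths that are not interpreted here.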
# Variable importance:
- The variables are listed below in order of their importance: