Convert all fields to factors

refactor_columns(
  df,
  dv,
  split_on = NA_character_,
  id_col = NULL,
  n_cat = 10,
  collapse_by = c("dv", "n"),
  n_quantile = 10,
  n_digits = 2,
  avg_type = c("mean", "median"),
  ignore_cols = NULL
)

Arguments

df	data frame to evaluate
dv	dependent variable to use (column name)
split_on	variable to split or group the data by
id_col	field to use as ID
n_cat	for categorical variables, the maximum number of unique values to keep. This value is passed to the `n` argument of `forcats::fct_lump()`.
collapse_by	how `n_cat` chooses which levels to keep: `"dv"` ranks levels by the distance of their `dv` average from the grand mean, keeping the extremes as they are and grouping the levels closest to the grand mean into "other"; `"n"` ranks levels by size (row count)
n_quantile	for numeric/date fields, the number of intervals used to split the data into a factor (in the example below, each column's range is divided into equal-width intervals). Fields with fewer unique values than this are left unchanged.
n_digits	for numeric fields, the number of decimal places kept in the breaks; for example, with `n_digits = 2` the break [1.2345 to 2.3456] becomes [1.23 to 2.34]
avg_type	type of average to use: `"mean"` or `"median"`
ignore_cols	columns to exclude from the analysis. Good candidates are fields with no duplicate values (such as primary keys) or fields with a large proportion of missing values
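The two `collapse_by` strategies described above can be illustrated with a small base-R sketch. The helpers `lump_by_dv()` and `lump_by_n()` are hypothetical and are not this package's implementation. The sketch assumes that "distance to the grand mean" means the absolute difference between each level's average `dv` and the overall average, lets the `avg` argument stand in for `avg_type`, and names the lumped level "other" as in the description. The size-based helper is only loosely analogous to `forcats::fct_lump(n = )` and ignores its tie handling.

```r
# Hypothetical helpers, NOT the package's implementation.
# lump_by_dv(): keep the `n` levels of `f` whose average outcome is farthest
# from the grand average of `y`; lump the rest into "other".
# `avg` stands in for avg_type (pass mean or median).
lump_by_dv <- function(f, y, n = 10, avg = mean, other = "other") {
  f <- as.character(f)
  grand <- avg(y)                            # grand average of the outcome
  level_avg <- tapply(y, f, avg)             # average outcome per level
  dist <- abs(level_avg - grand)             # distance from the grand average
  if (length(dist) <= n) return(factor(f))   # nothing to collapse
  keep <- names(dist)[order(dist, decreasing = TRUE)][seq_len(n)]
  factor(ifelse(f %in% keep, f, other))
}

# lump_by_n(): size-based alternative; keep the `n` most frequent levels.
lump_by_n <- function(f, n = 10, other = "other") {
  f <- as.character(f)
  counts <- table(f)
  if (length(counts) <= n) return(factor(f))
  keep <- names(counts)[order(counts, decreasing = TRUE)][seq_len(n)]
  factor(ifelse(f %in% keep, f, other))
}

# versicolor has the mean Sepal.Length closest to the grand mean,
# so it is the level lumped into "other"
table(lump_by_dv(iris$Species, iris$Sepal.Length, n = 2))
```

Ranking by distance keeps both tails (here `setosa`, below the grand mean, and `virginica`, above it), which is the behavior the `"dv"` option describes; ranking by count instead keeps whichever levels are most common, regardless of their outcome.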

Examples

refactor_columns(df = iris, dv = Sepal.Length)
#> # A tibble: 150 x 7
#>    y_outcome y_split unique_id Sepal.Width      Petal.Length Petal.Width Species
#>  *     <dbl> <chr>       <int> <chr>            <chr>        <chr>       <chr>  
#>  1       5.1 1               1 07 [3.44 to 3.6~ 01 [0.99 to~ 01 [0.09 t~ setosa 
#>  2       4.9 1               2 05 [2.96 to 3.2) 01 [0.99 to~ 01 [0.09 t~ setosa 
#>  3       4.7 1               3 06 [3.2 to 3.44) 01 [0.99 to~ 01 [0.09 t~ setosa 
#>  4       4.6 1               4 05 [2.96 to 3.2) 01 [0.99 to~ 01 [0.09 t~ setosa 
#>  5       5   1               5 07 [3.44 to 3.6~ 01 [0.99 to~ 01 [0.09 t~ setosa 
#>  6       5.4 1               6 08 [3.68 to 3.9~ 02 [1.59 to~ 02 [0.34 t~ setosa 
#>  7       4.6 1               7 06 [3.2 to 3.44) 01 [0.99 to~ 01 [0.09 t~ setosa 
#>  8       5   1               8 06 [3.2 to 3.44) 01 [0.99 to~ 01 [0.09 t~ setosa 
#>  9       4.4 1               9 04 [2.72 to 2.9~ 01 [0.99 to~ 01 [0.09 t~ setosa 
#> 10       4.9 1              10 05 [2.96 to 3.2) 01 [0.99 to~ 01 [0.09 t~ setosa 
#> # ... with 140 more rows
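The labels in the output above, such as `07 [3.44 to 3.68)`, are consistent with each numeric column being cut into `n_quantile` equal-width, left-closed intervals (`Sepal.Width` ranges from 2.0 to 4.4, and the visible breaks are 0.24 apart) with the limits cut down to `n_digits` decimal places. The base-R sketch below builds labels of that shape. `bin_numeric()` is a hypothetical helper, not the package's code, and the equal-width binning and truncation are assumptions inferred from the printed output; values that sit exactly on a break may not land in the same bin as in the package's output.

```r
# Hypothetical helper, NOT the package's code: bin a numeric vector into
# `n_bins` equal-width, left-closed intervals and label each value as
# "NN [lo to hi)", truncating the limits to `n_digits` decimal places.
bin_numeric <- function(x, n_bins = 10, n_digits = 2) {
  # Fields with fewer unique values than n_bins are returned unchanged
  if (length(unique(x)) < n_bins) return(x)

  rng <- range(x, na.rm = TRUE)
  brks <- seq(rng[1], rng[2], length.out = n_bins + 1)
  # Widen the outer limits by 0.1% of the range, as base cut() does when
  # `breaks` is a single number, so the min and max fall inside a bin
  pad <- diff(rng) / 1000
  brks[c(1, n_bins + 1)] <- c(rng[1] - pad, rng[2] + pad)

  # findInterval() returns i with brks[i] <= x < brks[i + 1] (left-closed)
  idx <- findInterval(x, brks)

  # Drop digits beyond n_digits; the small epsilon absorbs floating-point
  # error (e.g. a break stored as 3.4399999...)
  trunc_digits <- function(v) floor(v * 10^n_digits + 1e-9) / 10^n_digits
  lo <- as.character(trunc_digits(brks[idx]))
  hi <- as.character(trunc_digits(brks[idx + 1]))

  # Zero-padded bin number first, so labels sort in interval order
  sprintf("%02d [%s to %s)", idx, lo, hi)
}

head(bin_numeric(iris$Sepal.Width))
```

Zero-padding the bin number (`%02d`) keeps the labels sorting in interval order even as plain strings, so bin `10` sorts after `09` rather than before `02`; this matters because the example output stores these columns as `<chr>`.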