Visualize the spread of average values among all factors for all variables

Visualize variation and logic for a single observation

plot_spread(df, dv, ...)

plot_spread_single_obs(df, dv, ..., labels = FALSE, isolate_id = 1)

plot_spread_interactive(...)

Arguments

df

dataframe to evaluate

dv

dependent variable to use (column name)

...

Arguments passed on to refactor_columns

split_on

variable to split data / group by

id_col

field to use as ID

n_cat

for categorical variables, the maximum number of unique values to keep. This value is passed to the n argument of forcats::fct_lump().
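As a rough illustration of what lumping with forcats::fct_lump(n = ) does, the sketch below keeps the two most frequent levels of a toy factor and groups the rest as "Other". The vector x is made up for this example, and how refactor_columns applies the call internally is not shown here.

```r
# Toy factor (hypothetical data): "a" appears 3 times, "b" twice, "c" and "d" once
x <- factor(c("a", "a", "a", "b", "b", "c", "d"))

# Keep the 2 most frequent levels; the remaining levels are lumped together
lumped <- forcats::fct_lump(x, n = 2)

levels(lumped)
table(lumped)
```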

collapse_by

how n_cat should collapse factor levels: by distance to the grand mean ("dv"), which leaves the extremes as is and groups the levels closer to the grand mean as "other", or by size ("n")
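The difference between the two options can be sketched in base R. This is only an illustration of the idea on made-up data, not the package's implementation; the data frame and the variable names (keep_dv, keep_n, by_dv, by_n) are hypothetical.

```r
# Hypothetical toy data: a categorical field and a numeric dependent variable
df <- data.frame(
  grp = c("a", "a", "b", "b", "b", "c", "d", "d", "d", "d"),
  dv  = c(1, 2, 5, 6, 5, 10, 5, 6, 6, 5),
  stringsAsFactors = FALSE
)
n_cat <- 2

grand_mean <- mean(df$dv)
grp_mean   <- vapply(split(df$dv, df$grp), mean, numeric(1))
grp_n      <- lengths(split(df$dv, df$grp))

# "dv": keep the levels whose mean is farthest from the grand mean
dist    <- abs(grp_mean - grand_mean)
keep_dv <- names(dist)[order(dist, decreasing = TRUE)][seq_len(n_cat)]

# "n": keep the levels with the most observations
keep_n <- names(grp_n)[order(grp_n, decreasing = TRUE)][seq_len(n_cat)]

# Everything not kept is grouped as "other"
by_dv <- ifelse(df$grp %in% keep_dv, df$grp, "other")
by_n  <- ifelse(df$grp %in% keep_n, df$grp, "other")

table(by_dv)
table(by_n)
```

In this toy data the two rules keep different levels: the distance rule keeps the small but extreme groups, while the size rule keeps the large groups that sit near the grand mean.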

n_quantile

for numeric/date fields, the number of quantiles used to split the data into a factor. Fields that have fewer values than this will not be changed.
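Splitting a numeric field into quantile-based groups can be sketched with base R's quantile() and cut(). This is a minimal illustration on made-up data; the package may compute and label its breaks differently.

```r
# Hypothetical numeric field
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
n_quantile <- 4

# Break points at the 0%, 25%, 50%, 75% and 100% quantiles;
# unique() guards against duplicated breaks in heavily tied data
breaks <- unique(quantile(x, probs = seq(0, 1, length.out = n_quantile + 1)))

# Convert the numeric values into an ordered set of intervals (a factor)
x_factor <- cut(x, breaks = breaks, include.lowest = TRUE)

table(x_factor)
```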

n_digits

for numeric fields, the number of digits to keep in the breaks, e.g. [1.2345 to 2.3456] will be [1.23 to 2.34] if n_digits = 2

avg_type

mean or median
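The choice matters most when a group contains outliers. The sketch below uses a hypothetical helper, avg_by(), that is not part of the package, to compare the two averages per group on made-up data.

```r
# Hypothetical helper: average of x within each level of g
avg_by <- function(x, g, avg_type = c("mean", "median")) {
  avg_type <- match.arg(avg_type)
  fn <- switch(avg_type, mean = mean, median = median)
  vapply(split(x, g), fn, numeric(1))
}

# Toy data: group "a" contains one extreme value
dv  <- c(1, 2, 3, 100, 4, 5, 6)
grp <- c("a", "a", "a", "a", "b", "b", "b")

avg_by(dv, grp, "mean")    # group "a" is pulled up by the outlier
avg_by(dv, grp, "median")  # group "a" stays close to its typical values
```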

ignore_cols

columns to ignore from analysis. Good candidates are fields that have no duplicate values (primary keys) or fields with a large proportion of null values

labels

when TRUE, shows the labels of the factor levels outlined in the plot

isolate_id

the unique ID from the field specified in id_col, or the row number when id_col is unspecified
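The lookup described for isolate_id can be sketched as follows. The helper pick_obs() and the toy data frame are hypothetical and only illustrate the rule "match on the ID field if one is given, otherwise use the row number"; the package's own selection logic is not shown here.

```r
# Hypothetical helper: return the single observation to highlight
pick_obs <- function(df, isolate_id = 1, id_col = NULL) {
  if (is.null(id_col)) {
    # No ID field given: treat isolate_id as a row number
    df[isolate_id, , drop = FALSE]
  } else {
    # ID field given: match isolate_id against its values
    df[df[[id_col]] == isolate_id, , drop = FALSE]
  }
}

# Toy data (made up for this example)
toy <- data.frame(emp_id = c(101, 102, 103), attrition = c(1, 0, 1))

pick_obs(toy, isolate_id = 2)                       # second row
pick_obs(toy, isolate_id = 103, id_col = "emp_id")  # row whose emp_id is 103
```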

Functions

  • plot_spread_single_obs: highlight a single observation

  • plot_spread_interactive: interactive version of the plot, using ggplotly

Examples

plot_spread(ggplot2::mpg, dv = hwy)
plot_spread_single_obs(df = employee_attrition[,1:5], dv = attrition)
plot_spread_single_obs(df = employee_attrition[,1:5], dv = attrition, labels = TRUE)
plot_spread_interactive(ggplot2::mpg, dv = hwy)
#> Warning: `gather_()` was deprecated in tidyr 1.2.0.
#> Please use `gather()` instead.