Normalization of data
Usage
normalize_df(df, standardQuant = TRUE, binarQual = FALSE)
numeric_variabes(df)
all_numeric_variables(df)
is_df_normalized(df)
Arguments
- df
A data frame or equivalent (tibble, data table...).
- standardQuant
TRUE (default) if the quantitative variables of df must be normalised / standardised, i.e. centered (empirical mean of 0) and reduced / scaled (unbiased empirical standard deviation of 1). FALSE otherwise. If df has only one row, standardisation is not performed. Standardisation is performed after the transformation from qualitative to quantitative, so if binarQual is TRUE the transformed qualitative variables will be normalised as well.
- binarQual
TRUE if non-numeric values must be converted to numeric form by creating, for each qualitative variable, one dummy variable per modality minus one (k - 1 dummies for a variable with k modalities). The original columns are removed. FALSE (default) otherwise. Ignored if all variables are quantitative. If standardQuant is TRUE, the created dummies will be standardised as well.
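The interaction between the two flags can be illustrated with a minimal base R sketch. This is not the package's implementation: sketch_normalize() is a hypothetical helper, using model.matrix() to build the dummies is an assumption, and the resulting column names will likely differ from those produced by normalize_df().

```r
# Hypothetical helper mimicking the documented behaviour with base R only.
sketch_normalize <- function(df, standardQuant = TRUE, binarQual = FALSE) {
  df <- as.data.frame(df)
  is_num <- vapply(df, is.numeric, logical(1))

  if (binarQual && !all(is_num)) {
    # One dummy per modality minus one: treatment contrasts drop the first level.
    quali <- df[, !is_num, drop = FALSE]
    quali[] <- lapply(quali, factor)
    dummies <- model.matrix(~ ., data = quali)[, -1, drop = FALSE]
    # The original qualitative columns are replaced by their dummies.
    df <- cbind(df[, is_num, drop = FALSE], as.data.frame(dummies))
    is_num <- rep(TRUE, ncol(df))
  }

  if (standardQuant && nrow(df) > 1 && any(is_num)) {
    # scale() centres on the mean and divides by the unbiased (n - 1) standard deviation.
    df[is_num] <- lapply(df[is_num], function(x) as.numeric(scale(x)))
  }
  df
}

df <- data.frame(x = c(1, 2, 3, 4), g = c("a", "b", "c", "a"))
out <- sketch_normalize(df, standardQuant = TRUE, binarQual = TRUE)
ncol(out)  # 3: x plus two dummies for the three modalities of g
```

Because standardisation runs after the dummy step, the two dummy columns in out are centred and scaled as well, not left as 0 / 1 indicators.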
Value
For normalize_df, the normalized version of df. If binarQual = TRUE and some categorical variables were in df, the number of variables (columns) of the result might be higher than in df.
For numeric_variables, a logical vector of length ncol(df) indicating which variable is numeric and which is not.
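A mask of this kind can be sketched in base R as follows. The assumption here is that "numeric" means whatever is.numeric() accepts (integer and double columns); the package may use a different test.

```r
# Assumption: a column counts as numeric when is.numeric() returns TRUE.
df <- data.frame(x = c(1.5, 2.5), n = c(1L, 2L), g = c("a", "b"))
mask <- vapply(df, is.numeric, logical(1))
mask          # x: TRUE, n: TRUE, g: FALSE
length(mask)  # 3, i.e. ncol(df)
```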
For all_numeric_variables, TRUE if all variables of df are quantitative, FALSE otherwise. Returns TRUE by convention if there is no variable at all.
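The empty-data-frame convention agrees with how all() treats an empty logical vector in base R, as this short sketch shows (the mask is built with is.numeric(), which is an assumption about the package's test).

```r
empty <- data.frame()
mask <- vapply(empty, is.numeric, logical(1))
length(mask)  # 0: there is no column to test
all(mask)     # TRUE: all() of an empty logical vector is TRUE
```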
For is_df_normalized
, a flag equals to TRUE
if all numeric
variables are normalized, FALSE
otherwise. Returns TRUE
if there is
no numeric variables.
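One way to express such a check in base R is sketched below. sketch_is_normalized() is a hypothetical helper and the tolerance tol is an assumption; how is_df_normalized() itself handles rounding error, missing values or single-row inputs is not stated here.

```r
# Hypothetical check: every numeric column has mean 0 and unbiased sd 1, up to tol.
sketch_is_normalized <- function(df, tol = 1e-8) {
  num <- df[vapply(df, is.numeric, logical(1))]
  ok <- vapply(num, function(x) {
    abs(mean(x)) < tol && abs(sd(x) - 1) < tol
  }, logical(1))
  all(ok)  # TRUE when there is no numeric column
}

raw <- data.frame(x = c(1, 2, 3, 4))
std <- data.frame(x = as.numeric(scale(raw$x)))

sketch_is_normalized(raw)                           # FALSE
sketch_is_normalized(std)                           # TRUE
sketch_is_normalized(data.frame(g = c("a", "b")))   # TRUE: no numeric variable
```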
Functions
- normalize_df(): Normalize the quantitative and / or qualitative variables of df.
- numeric_variabes(): Give a logical mask indicating which columns of df are numeric.
- all_numeric_variables(): Give a flag indicating if all variables of df are quantitative. Shortcut for all(numeric_variables(df)).
- is_df_normalized(): Check if the numeric variables of df are normalized.
References
Kaplan, J. & Schlegel, B. (2023). fastDummies: Fast Creation of Dummy (Binary) Columns and Rows from Categorical Variables.