Normalization of data
Usage
normalize_df(df, standardQuant = TRUE, binarQual = FALSE)
numeric_variabes(df)
all_numeric_variables(df)
is_df_normalized(df)
Arguments
- df
 A data frame or equivalent (tibble, data table...)
- standardQuant
 TRUE (default) if the quantitative variables of df must be normalised / standardised, i.e. centered (empirical mean of 0) and reduced / scaled (unbiased empirical standard deviation of 1). FALSE otherwise. If df has only one row, standardisation is not performed. Standardisation is performed after the transformation from qualitative to quantitative, so if binarQual is TRUE the transformed qualitative variables will be normalized as well.
- binarQual
 TRUE if non-numeric values must be converted to numeric form by creating, for each qualitative variable, one dummy variable per modality minus one (k - 1 dummies for a variable with k modalities). The original columns are removed. FALSE (default) otherwise. Ignored if all variables are quantitative. If standardQuant is TRUE, the created dummies will be standardised as well.
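To make the two arguments concrete, the following base R sketch shows what standardisation and k - 1 dummy coding amount to on a toy data frame. It illustrates the definitions above and is not the package's implementation; the data frame and the names `num_cols`, `standardised` and `dummies` are assumptions made up for this example.

```r
# Hypothetical toy data: two quantitative columns and one qualitative column.
df <- data.frame(
  height = c(150, 160, 170, 180),
  weight = c(50, 60, 70, 80),
  colour = factor(c("red", "green", "blue", "red"))
)

# Standardise the quantitative columns: centre to mean 0, then divide by the
# unbiased (n - 1 denominator) standard deviation, which is what sd() computes.
num_cols <- vapply(df, is.numeric, logical(1))
standardised <- df
standardised[num_cols] <- lapply(
  df[num_cols],
  function(x) (x - mean(x)) / sd(x)
)

# Dummy-code the qualitative column with k - 1 indicator columns for its
# k modalities. model.matrix() with the default treatment contrasts drops the
# first level; removing the intercept column leaves only the dummies.
dummies <- model.matrix(~ colour, data = df)[, -1, drop = FALSE]
```

Here `colour` has three modalities, so `dummies` has two columns. With a single row, `sd()` returns `NA`, which is consistent with standardisation being skipped in that case.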
Value
For normalize_df, the normalized version of df. If
binarQual = TRUE and df contains categorical variables, the number
of variables (columns) of the result might be higher than in df.
For numeric_variables, a logical vector of length ncol(df)
indicating which variable is numeric and which is not.
For all_numeric_variables, TRUE if all variables of df
are quantitative, FALSE otherwise. Returns TRUE by convention if there
is no variable at all.
For is_df_normalized, a flag equal to TRUE if all numeric
variables are normalized, FALSE otherwise. Returns TRUE if there are
no numeric variables.
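The return conventions above can be sketched in base R as follows. The helpers `numeric_mask()` and `looks_normalized()` are hypothetical stand-ins written for this illustration, not the package functions, and the tolerance `tol` is an assumption; the package may compare values differently.

```r
# Hypothetical helpers, deliberately named differently from the package functions.
numeric_mask <- function(df) vapply(df, is.numeric, logical(1))

looks_normalized <- function(df, tol = 1e-8) {
  num <- df[numeric_mask(df)]
  if (ncol(num) == 0) return(TRUE)  # no numeric variable: TRUE by convention
  means <- vapply(num, mean, numeric(1))
  sds <- vapply(num, sd, numeric(1))
  all(abs(means) < tol) && all(abs(sds - 1) < tol)
}

raw <- data.frame(x = c(1, 2, 3), y = c(10, 20, 40), g = c("a", "b", "a"))

# Standardised copy of the numeric columns of raw.
scaled <- raw
scaled[numeric_mask(raw)] <- lapply(
  raw[numeric_mask(raw)],
  function(v) (v - mean(v)) / sd(v)
)

numeric_mask(raw)                # one logical value per column
all(numeric_mask(raw))           # FALSE: column g is not numeric
all(numeric_mask(data.frame()))  # TRUE: all() of an empty vector is TRUE
looks_normalized(raw)            # FALSE: x and y are not centered or scaled
looks_normalized(scaled)         # TRUE
looks_normalized(raw["g"])       # TRUE: no numeric variables
```

The "TRUE when there is nothing to check" convention falls out of `all()` on an empty logical vector, which is why it also applies to a data frame with no columns.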
Functions
- normalize_df(): Normalize quantitative and / or qualitative df variables.
- numeric_variabes(): Give a logical mask indicating which columns of df are numeric.
- all_numeric_variables(): Give a flag indicating if all variables of df are quantitative. Shortcut for all(numeric_variables(df)).
- is_df_normalized(): Check if the numeric variables of df are normalized.
References
Kaplan, J. & Schlegel, B. (2023). fastDummies: Fast Creation of Dummy (Binary) Columns and Rows from Categorical Variables.
