Normalization of data

Usage

normalize_df(df, standardQuant = TRUE, binarQual = FALSE)

numeric_variabes(df)

all_numeric_variables(df)

is_df_normalized(df)

Arguments

df

A data frame or equivalent (tibble, data table, ...).

standardQuant

TRUE (default) if the quantitative variables of df must be normalised / standardised, i.e. centered (empirical mean of 0) and reduced / scaled (unbiased empirical standard deviation of 1). FALSE otherwise. If df has only one row, no standardisation is performed. Standardisation is applied after the conversion of qualitative variables to quantitative ones, so if binarQual is TRUE the resulting dummy variables are standardised as well.

binarQual

TRUE if non-numeric values must be converted to numeric form by creating, for each qualitative variable, one dummy variable per modality minus one. The original columns are removed. FALSE (default) otherwise. Ignored if all variables are quantitative. If standardQuant is TRUE, the created dummies are standardised as well.
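
The interplay of the two arguments can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the data frame toy and the helper sketch_normalize are hypothetical, model.matrix() stands in for the dummy creation, and scale() for the standardisation. Missing values, constant columns and single-level factors are not handled.

```r
# Hypothetical toy data: two quantitative columns and one qualitative column.
toy <- data.frame(
  height = c(150, 160, 170, 180),
  weight = c(50, 60, 65, 85),
  group  = factor(c("a", "b", "c", "a"))
)

# Illustrative helper (not part of the package): dummy-encode, then standardise.
sketch_normalize <- function(df, standardQuant = TRUE, binarQual = FALSE) {
  df <- as.data.frame(df)
  is_num <- vapply(df, is.numeric, logical(1))

  if (binarQual && !all(is_num)) {
    # With the default treatment contrasts, model.matrix() creates one dummy
    # per level minus one; the intercept column is dropped afterwards.
    quali <- df[, !is_num, drop = FALSE]
    quali[] <- lapply(quali, as.factor)
    dummies <- model.matrix(~ ., data = quali)[, -1, drop = FALSE]
    # The original qualitative columns are replaced by their dummies.
    df <- cbind(df[, is_num, drop = FALSE], as.data.frame(dummies))
    is_num <- rep(TRUE, ncol(df))
  }

  if (standardQuant && nrow(df) > 1 && any(is_num)) {
    # scale() centres each column and divides by sd(), which uses the
    # unbiased (n - 1) denominator.
    df[is_num] <- lapply(df[is_num], function(x) as.numeric(scale(x)))
  }
  df
}

res <- sketch_normalize(toy, standardQuant = TRUE, binarQual = TRUE)
names(res)
```

In this sketch the dummy encoding runs first, so the columns groupb and groupc are centred and scaled together with height and weight, mirroring the order described for standardQuant and binarQual.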

Value

For normalize_df, the normalized version of df. If binarQual = TRUE and df contains categorical variables, the number of variables (columns) of the result may be higher than that of df.

For numeric_variables, a logical vector of length ncol(df) indicating which variables are numeric and which are not.

For all_numeric_variables, TRUE if all variables of df are quantitative, FALSE otherwise. Returns TRUE by convention if there is no variable at all.

For is_df_normalized, a flag equal to TRUE if all numeric variables are normalized, FALSE otherwise. Returns TRUE if there are no numeric variables.
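
The return conventions above can be reproduced with a few lines of base R. The helpers below are hypothetical stand-ins, not the package's code. The numeric tolerance tol and the choice to return TRUE for a data frame with fewer than two rows are assumptions made for this sketch only.

```r
# Illustrative helpers (hypothetical names, not the package's code).
sketch_numeric_mask <- function(df) {
  # One logical per column; logical(0) for a data frame with no columns.
  vapply(df, is.numeric, logical(1))
}

sketch_all_numeric <- function(df) {
  # all(logical(0)) is TRUE, which matches the "no variable" convention.
  all(sketch_numeric_mask(df))
}

sketch_is_normalized <- function(df, tol = 1e-8) {
  num <- as.data.frame(df)[sketch_numeric_mask(df)]
  # Assumption: nothing to check without numeric columns or with < 2 rows.
  if (ncol(num) == 0 || nrow(num) < 2) return(TRUE)
  centred <- vapply(num, function(x) abs(mean(x)) < tol, logical(1))
  scaled  <- vapply(num, function(x) abs(sd(x) - 1) < tol, logical(1))
  all(centred & scaled)
}

mixed <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))
sketch_numeric_mask(mixed)    # x is numeric, y is not
sketch_all_numeric(mixed)     # FALSE
sketch_is_normalized(mixed)   # FALSE: x has mean 2

std <- data.frame(x = as.numeric(scale(c(1, 2, 3))))
sketch_is_normalized(std)     # TRUE: x is -1, 0, 1
```

A tolerance is used because floating-point means and standard deviations are rarely exactly 0 and 1 after standardisation.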

Functions

  • normalize_df(): Normalize the quantitative and / or qualitative variables of df.

  • numeric_variabes(): Give a logical mask indicating which columns of df are numeric.

  • all_numeric_variables(): Give a flag indicating if all variables of df are quantitative. Shortcut for all(numeric_variables(df)).

  • is_df_normalized(): Check if the numeric variables of df are normalized.

References

Kaplan, J. & Schlegel, B. (2023). fastDummies: Fast Creation of Dummy (Binary) Columns and Rows from Categorical Variables.