statsExpressions: Tidy dataframes and expressions with statistical details

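The statsExpressions package has two key aims: to provide a consistent syntax for statistical analysis with tidy data, and to provide statistical expressions (pre-formatted in-text statistical results) that can be displayed in plots.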
Statistical packages exhibit substantial diversity in terms of their syntax and expected input type. This can make it difficult to switch from one statistical approach to another. For example, some functions expect vectors as inputs, while others expect dataframes. Depending on whether it is a repeated measures design or not, different functions might expect data to be in wide or long format. Some functions can internally omit missing values, while other functions error in their presence. Furthermore, if someone wishes to utilize the objects returned by these packages downstream in their workflow, this is not straightforward either because even functions from the same package can return a list, a matrix, an array, a dataframe, etc., depending on the function.
This is where statsExpressions comes in: it can be thought of as a unified portal through which most of the functionality in these underlying packages can be accessed, with a simpler interface and no requirement to change data format.
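The diversity described above can be seen with base R alone. The sketch below is only an illustration (it does not use statsExpressions, and the vector `x_na` is made up): it shows functions from the same `stats` package taking different inputs, returning different object types, and handling missing values differently.

```r
# Base R only: similar questions, different inputs and outputs.

# t.test() accepts either two vectors or a formula plus a data frame
res_vectors <- stats::t.test(
  x = ToothGrowth$len[ToothGrowth$supp == "OJ"],
  y = ToothGrowth$len[ToothGrowth$supp == "VC"]
)
res_formula <- stats::t.test(len ~ supp, data = ToothGrowth)
class(res_formula) # a list of class "htest"

# aov() returns a fitted-model object instead
res_aov <- stats::aov(len ~ supp, data = ToothGrowth)
class(res_aov)

# cor() on several columns returns a matrix
res_cor <- stats::cor(mtcars[, c("mpg", "wt", "hp")])
is.matrix(res_cor)

# missing values: t.test() drops them, quantile() errors by default
x_na <- c(2.1, 3.4, NA, 2.9, 3.8) # made-up values for illustration
res_na <- stats::t.test(x_na, mu = 3)
quantile_error <- tryCatch(
  stats::quantile(x_na),
  error = function(e) conditionMessage(e)
)
```

Each result needs its own extraction code before it can be used downstream, which is the friction a single tidy dataframe output is meant to remove.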
This package forms the statistical processing backend for the ggstatsplot package.
Type  Source  Command 

Release  CRAN  install.packages("statsExpressions") 
Development  GitHub  remotes::install_github("IndrajeetPatil/statsExpressions") 
The package can be cited as:
citation("statsExpressions")
Patil, I., (2021). statsExpressions: R Package for Tidy Dataframes
and Expressions with Statistical Details. Journal of Open Source
Software, 6(61), 3236, https://doi.org/10.21105/joss.03236
A BibTeX entry for LaTeX users is
@Article{,
doi = {10.21105/joss.03236},
url = {https://doi.org/10.21105/joss.03236},
year = {2021},
publisher = {{The Open Journal}},
volume = {6},
number = {61},
pages = {3236},
author = {Indrajeet Patil},
title = {{statsExpressions: {R} Package for Tidy Dataframes and Expressions with Statistical Details}},
journal = {{Journal of Open Source Software}},
}
The table below summarizes all the different types of analyses currently supported in this package:
Description  Parametric  Nonparametric  Robust  Bayesian 

Between group/condition comparisons  ✅  ✅  ✅  ✅ 
Within group/condition comparisons  ✅  ✅  ✅  ✅ 
Distribution of a numeric variable  ✅  ✅  ✅  ✅ 
Correlation between two variables  ✅  ✅  ✅  ✅ 
Association between categorical variables  ✅  ✅  ❌  ✅ 
Equal proportions for categorical variable levels  ✅  ✅  ❌  ✅ 
Random-effects meta-analysis  ✅  ❌  ✅  ✅
Summary of Bayesian analysis
Analysis  Hypothesis testing  Estimation 

(one/two-sample) t-test  ✅  ✅
one-way ANOVA  ✅  ✅
correlation  ✅  ✅
(one/two-way) contingency table  ✅  ✅
random-effects meta-analysis  ✅  ✅
To illustrate the simplicity of this syntax, let’s say we want to run a one-way ANOVA. If we first run a nonparametric ANOVA and then decide to run a robust ANOVA instead, the syntax remains the same and the statistical approach can be modified by changing a single argument:
library(statsExpressions)
mtcars %>% oneway_anova(cyl, wt, type = "nonparametric")
#> # A tibble: 1 x 14
#> parameter1 parameter2 statistic df.error p.value
#> <chr> <chr> <dbl> <int> <dbl>
#> 1 wt cyl 22.8 2 0.0000112
#> method estimate conf.level conf.low conf.high
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Kruskal-Wallis rank sum test 0.736 0.95 0.600 0.868
#> effectsize conf.method conf.iterations expression
#> <chr> <chr> <int> <list>
#> 1 Epsilon2 (rank) percentile bootstrap 100 <language>
mtcars %>% oneway_anova(cyl, wt, type = "robust")
#> # A tibble: 1 x 11
#> statistic df df.error p.value estimate conf.level conf.low conf.high
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 12.7 2 12.2 0.00102 1.04 0.95 0.787 1.60
#> effectsize
#> <chr>
#> 1 Explanatory measure of effect size
#> method expression
#> <chr> <list>
#> 1 A heteroscedastic one-way ANOVA for trimmed means <language>
All possible output dataframes from functions are tabulated here: https://indrajeetpatil.github.io/statsExpressions/articles/web_only/dataframe_outputs.html
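Because only the `type` argument changes, several approaches can be run in a loop. This is a minimal sketch that assumes the `oneway_anova(data, x, y, type = ...)` interface shown above; the names `types` and `results` are illustrative, and the Bayesian variant is left out to keep the example small.

```r
library(statsExpressions)
set.seed(123)

# the statistical approaches to compare
types <- c("parametric", "nonparametric", "robust")

# same call each time; only `type` differs
results <- lapply(types, function(type) {
  oneway_anova(mtcars, cyl, wt, type = type)
})
names(results) <- types

# which test was run under each approach
vapply(results, function(df) df$method[[1]], character(1))
```

Each element of `results` is a one-row dataframe, so the approaches can be inspected side by side even though their columns differ.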
Needless to say, this will also work with the kable function to generate a table:
# setup
library(statsExpressions)
set.seed(123)
# one-sample robust t-test
# we will leave the `expression` column out; it's not needed for using only the dataframe
mtcars %>%
  one_sample_test(wt, test.value = 3, type = "robust") %>%
  dplyr::select(-expression) %>%
  knitr::kable()
statistic  p.value  method  estimate  conf.low  conf.high  conf.level  effectsize 

1.179181  0.22  Bootstrap-t method for one-sample test  3.197  2.872163  3.521837  0.95  Trimmed mean
These functions also play nicely with dplyr functions. For example, let’s say we want to run a one-sample t-test for all levels of a certain grouping variable. Here is how you can do it:
# for reproducibility
set.seed(123)
library(dplyr)
# grouped operation
# running one-sample test for all levels of grouping variable `cyl`
mtcars %>%
group_by(cyl) %>%
group_modify(~ one_sample_test(.x, wt, test.value = 3), .keep = TRUE) %>%
ungroup()
#> # A tibble: 3 x 15
#> cyl mu statistic df.error p.value method alternative estimate
#> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <dbl>
#> 1 4 3 -4.16 10 0.00195 One Sample t-test two.sided -1.16
#> 2 6 3 0.870 6 0.418 One Sample t-test two.sided 0.286
#> 3 8 3 4.92 13 0.000278 One Sample t-test two.sided 1.24
#> conf.level conf.low conf.high effectsize conf.method conf.distribution
#> <dbl> <dbl> <dbl> <chr> <chr> <chr>
#> 1 0.95 -1.97 -0.422 Hedges' g ncp t
#> 2 0.95 -0.419 1.01 Hedges' g ncp t
#> 3 0.95 0.565 1.98 Hedges' g ncp t
#> expression
#> <list>
#> 1 <language>
#> 2 <language>
#> 3 <language>
Note that expression here means a pre-formatted in-text statistical result. In addition to the other details contained in the dataframe, there is also a column titled expression, which contains an expression with statistical details that can be displayed in a plot.
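To make the idea of a pre-formatted expression concrete, here is a hand-rolled sketch in base R. It is not the package's implementation and uses a much shorter template; it only shows that such an expression is an unevaluated plotmath call assembled from test results.

```r
# one-sample t-test with base R
tt <- stats::t.test(mtcars$wt, mu = 3)

# assemble a plotmath call; .() splices in the computed values
stats_expr <- bquote(paste(
  italic("t"), "(", .(unname(tt$parameter)), ") = ",
  .(format(round(unname(tt$statistic), 2), nsmall = 2)),
  ", ", italic("p"), " = ",
  .(format(round(tt$p.value, 3), nsmall = 3))
))

# it is a language object (a call to paste), not a character string
is.call(stats_expr)
```

An object of this kind can be passed to `ggplot2::labs(subtitle = ...)`, which is how the entries of the `expression` column are used in the plotting examples in this document.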
For all statistical test expressions, the default template attempts to follow the gold standard for statistical reporting.
For example, here are results from Welch’s t-test:
Let’s say we want to compare sepal length across the three iris species and wish to carry out a robust trimmed-means ANOVA:
# setup
set.seed(123)
library(ggplot2)
library(statsExpressions)
library(ggridges)
# create a ridgeplot
ggplot(iris, aes(x = Sepal.Length, y = Species)) +
geom_density_ridges(
jittered_points = TRUE, quantile_lines = TRUE,
scale = 0.9, vline_size = 1, vline_color = "red",
position = position_raincloud(adjust_vlines = TRUE)
) + # use the expression in the dataframe to display results in the subtitle
labs(
title = "A heteroscedastic one-way ANOVA for trimmed means",
subtitle = oneway_anova(iris, Species, Sepal.Length, type = "robust")$expression[[1]]
)
Let’s now see an example of a repeated measures one-way ANOVA.
# setup
set.seed(123)
library(ggplot2)
library(WRS2)
library(ggbeeswarm)
library(statsExpressions)
ggplot2::ggplot(WineTasting, aes(Wine, Taste, color = Wine)) +
geom_quasirandom() +
labs(
title = "Friedman's rank sum test",
subtitle = oneway_anova(
WineTasting,
Wine,
Taste,
paired = TRUE,
subject.id = Taster,
type = "np"
)$expression[[1]]
)
# setup
set.seed(123)
library(ggplot2)
library(gghalves)
library(ggbeeswarm)
library(hrbrthemes)
library(statsExpressions)
# create a plot
ggplot(ToothGrowth, aes(supp, len)) +
geom_half_boxplot() +
geom_beeswarm(beeswarmArgs = list(side = 1)) +
theme_ipsum_rc() +
# adding a subtitle with the results of the test
labs(
title = "Two-Sample Welch's t-test",
subtitle = two_sample_test(ToothGrowth, supp, len)$expression[[1]]
)
We can also have a look at a repeated measures design and the related expressions.
# setup
set.seed(123)
library(ggplot2)
library(statsExpressions)
library(tidyr)
library(PairedData)
data(PrisonStress)
# plot
paired.plotProfiles(PrisonStress, "PSSbefore", "PSSafter", subjects = "Subject") +
# `statsExpressions` needs data in the tidy format
labs(
title = "Two-sample Wilcoxon paired test",
subtitle = two_sample_test(
data = pivot_longer(PrisonStress, starts_with("PSS"), names_to = "PSS", values_to = "stress"),
x = PSS,
y = stress,
paired = TRUE,
subject.id = Subject,
type = "np"
)$expression[[1]]
)
# setup
set.seed(123)
library(ggplot2)
library(statsExpressions)
# creating a histogram plot
ggplot(mtcars, aes(wt)) +
geom_histogram(alpha = 0.5) +
geom_vline(xintercept = mean(mtcars$wt), color = "red") +
# adding a subtitle with a nonparametric one-sample test
labs(
title = "One-Sample Wilcoxon Signed Rank Test",
subtitle = one_sample_test(mtcars, wt, test.value = 3, type = "nonparametric")$expression[[1]]
)
Let’s look at another example where we want to run correlation analysis:
# setup
set.seed(123)
library(ggplot2)
library(statsExpressions)
# create a scatter plot
ggplot(mtcars, aes(mpg, wt)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x) +
labs(
title = "Spearman's rank correlation coefficient",
subtitle = corr_test(mtcars, mpg, wt, type = "nonparametric")$expression[[1]]
)
For categorical/nominal data (one-sample):
# setup
set.seed(123)
library(ggplot2)
library(statsExpressions)
# basic pie chart
ggplot(as.data.frame(table(mpg$class)), aes(x = "", y = Freq, fill = factor(Var1))) +
geom_bar(width = 1, stat = "identity") +
theme(axis.line = element_blank()) +
# cleaning up the chart and adding results from one-sample proportion test
coord_polar(theta = "y", start = 0) +
labs(
fill = "Class",
x = NULL,
y = NULL,
title = "Pie Chart of class (type of car)",
subtitle = contingency_table(as.data.frame(table(mpg$class)), Var1, counts = Freq)$expression[[1]],
caption = "One-sample goodness of fit proportion test"
)
You can also use these functions to get the expression in return without having to display it in a plot:
# setup
set.seed(123)
library(ggplot2)
library(statsExpressions)
# Pearson's chi-squared test of independence
contingency_table(mtcars, am, cyl)$expression[[1]]
#> paste(chi["Pearson"]^2, "(", "2", ") = ", "8.74", ", ", italic("p"),
#> " = ", "0.013", ", ", widehat(italic("V"))["Cramer"], " = ",
#> "0.46", ", CI"["95%"], " [", "0.00", ", ", "0.78", "], ",
#> italic("n")["obs"], " = ", "32")
# setup
set.seed(123)
library(metaviz)
library(ggplot2)
library(metaplus)
library(statsExpressions)
# meta-analysis forest plot with results from a random-effects meta-analysis
viz_forest(
x = mozart[, c("d", "se")],
study_labels = mozart[, "study_name"],
xlab = "Cohen's d",
variant = "thick",
type = "cumulative"
) + # use `statsExpressions` to create expression containing results
labs(
title = "Meta-analysis of Pietschnig, Voracek, and Formann (2010) on the Mozart effect",
subtitle = meta_analysis(dplyr::rename(mozart, estimate = d, std.error = se))$expression[[1]]
) +
theme(text = element_text(size = 12))
Sometimes you may not wish to include so many details in the subtitle. In that case, you can extract the expression and copy-paste only the part you wish to include. For example, here only the statistic and p-value are included:
# setup
set.seed(123)
library(ggplot2)
library(statsExpressions)
# extracting detailed expression
(res_expr <- oneway_anova(iris, Species, Sepal.Length, var.equal = TRUE)$expression[[1]])
#> paste(italic("F")["Fisher"], "(", "2", ",", "147", ") = ", "119.26",
#> ", ", italic("p"), " = ", "1.67e-31", ", ", widehat(omega["p"]^2),
#> " = ", "0.61", ", CI"["95%"], " [", "0.52", ", ", "0.68",
#> "], ", italic("n")["obs"], " = ", "150")
# adapting the details to your liking
ggplot(iris, aes(x = Species, y = Sepal.Length)) +
geom_boxplot() +
labs(subtitle = ggplot2::expr(paste(
NULL, italic("F"), "(", "2",
",", "147", ") = ", "119.26", ", ",
italic("p"), " = ", "1.67e-31"
)))
Here is a go-to summary of the statistical test carried out and the effect size returned by each function. This should be useful if you need to find out more about how an argument is resolved in the underlying package or if you wish to browse the source code. For example, to learn more about how a one-way (between-subjects) ANOVA is carried out, you can run ?stats::oneway.test in your R console.
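As a quick way to see what the wrappers delegate to, the underlying functions can also be called directly. The sketch below uses only base R and the `mtcars` data from the earlier examples; the object names (`mtcars_f`, `welch`, `fisher`, `kw`) are illustrative.

```r
# the grouping variable must be a factor for the formula interface
mtcars_f <- transform(mtcars, cyl = factor(cyl))

# parametric: Welch's one-way ANOVA is the default (var.equal = FALSE)
welch <- stats::oneway.test(wt ~ cyl, data = mtcars_f)

# parametric: Fisher's one-way ANOVA assumes equal variances
fisher <- stats::oneway.test(wt ~ cyl, data = mtcars_f, var.equal = TRUE)

# nonparametric: Kruskal-Wallis rank sum test
kw <- stats::kruskal.test(wt ~ cyl, data = mtcars_f)

# each result is an "htest" list; degrees of freedom live in $parameter
fisher$parameter
kw$parameter
```

Comparing these objects with the tidy output of `oneway_anova()` shows which pieces (statistic, degrees of freedom, p-value) come straight from the underlying test.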
two_sample_test + oneway_anova
No. of groups: 2 => two_sample_test
No. of groups: > 2 => oneway_anova
Hypothesis testing (between-subjects)
Type  No. of groups  Test  Function used

Parametric  > 2  Fisher’s or Welch’s one-way ANOVA  stats::oneway.test
Nonparametric  > 2  Kruskal–Wallis one-way ANOVA  stats::kruskal.test
Robust  > 2  Heteroscedastic one-way ANOVA for trimmed means  WRS2::t1way
Bayes Factor  > 2  Fisher’s ANOVA  BayesFactor::anovaBF
Parametric  2  Student’s or Welch’s t-test  stats::t.test
Nonparametric  2  Mann–Whitney U test  stats::wilcox.test
Robust  2  Yuen’s test for trimmed means  WRS2::yuen
Bayesian  2  Student’s t-test  BayesFactor::ttestBF
Effect size estimation (between-subjects)
Type  No. of groups  Effect size  CI?  Function used

Parametric  > 2  partial omega-squared, partial eta-squared  ✅  effectsize::omega_squared , effectsize::eta_squared
Nonparametric  > 2  rank epsilon-squared  ✅  effectsize::rank_epsilon_squared
Robust  > 2  ξ (explanatory measure of effect size)  ✅  WRS2::t1way
Bayes Factor  > 2  Bayesian R-squared  ✅  performance::r2_bayes
Parametric  2  Cohen’s d, Hedges’ g  ✅  effectsize::cohens_d , effectsize::hedges_g
Nonparametric  2  r (rank-biserial correlation)  ✅  effectsize::rank_biserial
Robust  2  Algina-Keselman-Penfield robust standardized difference  ✅  WRS2::akp.effect
Bayesian  2  posterior difference (δ)  ✅  bayestestR::describe_posterior
Hypothesis testing (within-subjects)
Type  No. of groups  Test  Function used

Parametric  > 2  One-way repeated measures ANOVA  afex::aov_ez
Nonparametric  > 2  Friedman rank sum test  stats::friedman.test
Robust  > 2  Heteroscedastic one-way repeated measures ANOVA for trimmed means  WRS2::rmanova
Bayes Factor  > 2  One-way repeated measures ANOVA  BayesFactor::anovaBF
Parametric  2  Student’s t-test  stats::t.test
Nonparametric  2  Wilcoxon signed-rank test  stats::wilcox.test
Robust  2  Yuen’s test on trimmed means for dependent samples  WRS2::yuend
Bayesian  2  Student’s t-test  BayesFactor::ttestBF
Effect size estimation (within-subjects)
Type  No. of groups  Effect size  CI?  Function used

Parametric  > 2  partial omega-squared, partial eta-squared  ✅  effectsize::omega_squared , effectsize::eta_squared
Nonparametric  > 2  W (Kendall’s coefficient of concordance)  ✅  effectsize::kendalls_w
Robust  > 2  Algina-Keselman-Penfield robust standardized difference average  ✅  WRS2::wmcpAKP
Bayes Factor  > 2  Bayesian R-squared  ✅  performance::r2_bayes
Parametric  2  Cohen’s d, Hedges’ g  ✅  effectsize::cohens_d , effectsize::hedges_g
Nonparametric  2  r (rank-biserial correlation)  ✅  effectsize::rank_biserial
Robust  2  Algina-Keselman-Penfield robust standardized difference  ✅  WRS2::wmcpAKP
Bayesian  2  posterior difference (δ)  ✅  bayestestR::describe_posterior
one_sample_test
Hypothesis testing
Type  Test  Function used 

Parametric  One-sample Student’s t-test  stats::t.test
Nonparametric  One-sample Wilcoxon test  stats::wilcox.test
Robust  Bootstrap-t method for one-sample test  trimcibt (custom)
Bayesian  One-sample Student’s t-test  BayesFactor::ttestBF
Effect size estimation
Type  Effect size  CI?  Function used 

Parametric  Cohen’s d, Hedges’ g  ✅  effectsize::cohens_d , effectsize::hedges_g
Nonparametric  r (rank-biserial correlation)  ✅  effectsize::rank_biserial
Robust  trimmed mean  ✅  trimcibt (custom)
Bayes Factor  posterior difference (δ)  ✅  bayestestR::describe_posterior
corr_test
Hypothesis testing and Effect size estimation
Type  Test  CI?  Function used 

Parametric  Pearson’s correlation coefficient  ✅  correlation::correlation 
Nonparametric  Spearman’s rank correlation coefficient  ✅  correlation::correlation 
Robust  Winsorized Pearson correlation coefficient  ✅  correlation::correlation 
Bayesian  Pearson’s correlation coefficient  ✅  correlation::correlation 
contingency_table
Hypothesis testing (two-way table)
Type  Design  Test  Function used 

Parametric/Nonparametric  Unpaired  Pearson’s chi-squared test  stats::chisq.test
Bayesian  Unpaired  Bayesian Pearson’s chi-squared test  BayesFactor::contingencyTableBF
Parametric/Nonparametric  Paired  McNemar’s test  stats::mcnemar.test 
Bayesian  Paired  ❌  ❌ 
Effect size estimation (two-way table)
Type  Design  Effect size  CI?  Function used

Parametric/Nonparametric  Unpaired  Cramer’s V  ✅  effectsize::cramers_v
Bayesian  Unpaired  Cramer’s V  ✅  effectsize::cramers_v
Parametric/Nonparametric  Paired  Cohen’s g  ✅  effectsize::cohens_g
Bayesian  Paired  ❌  ❌  ❌ 
Hypothesis testing (one-way table)
Type  Test  Function used 

Parametric/Nonparametric  Goodness of fit test  stats::chisq.test 
Bayesian  Bayesian Goodness of fit test  (custom) 
Effect size estimation (one-way table)
Type  Effect size  CI?  Function used

Parametric/Nonparametric  Cramer’s V  ✅  bayestestR::describe_posterior
Bayesian  ❌  ❌  ❌ 
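For reference, the frequentist tests listed in these tables can be run directly with base R. This is a sketch rather than a description of the package internals: `paired_counts` is a made-up 2 x 2 table used only to show the paired case, and the other objects reuse `mtcars` from the examples above.

```r
# two-way table: Pearson's chi-squared test of independence
tab_2way <- table(mtcars$am, mtcars$cyl)
# small expected counts trigger a warning here, so it is suppressed
res_indep <- suppressWarnings(stats::chisq.test(tab_2way))

# one-way table: goodness of fit test against equal proportions
res_gof <- stats::chisq.test(table(mtcars$cyl))

# paired design: McNemar's test on a 2 x 2 table of made-up counts
paired_counts <- matrix(
  c(30, 12, 5, 25),
  nrow = 2,
  dimnames = list(before = c("yes", "no"), after = c("yes", "no"))
)
res_paired <- stats::mcnemar.test(paired_counts)

# degrees of freedom for each test
c(res_indep$parameter, res_gof$parameter, res_paired$parameter)
```

The statistic and degrees of freedom from `res_indep` correspond to the values that appear in the `contingency_table(mtcars, am, cyl)` expression printed earlier in this document.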
meta_analysis
Hypothesis testing and Effect size estimation
Type  Test  Effect size  CI?  Function used 

Parametric  Meta-analysis via random-effects models  β  ✅  metafor::metafor
Robust  Meta-analysis via robust random-effects models  β  ✅  metaplus::metaplus
Bayes  Meta-analysis via Bayesian random-effects models  β  ✅  metaBMA::meta_random
ggstatsplot
Note that these functions were initially written to display results from statistical tests on ready-made ggplot2 plots implemented in ggstatsplot.
For detailed documentation, see the package website: https://indrajeetpatil.github.io/ggstatsplot/
Here is an example from ggstatsplot of what the plots look like when the expressions are displayed in the subtitle.
The hex sticker and the schematic illustration of general workflow were generously designed by Sarah Otterstetter (Max Planck Institute for Human Development, Berlin).
I’m happy to receive bug reports, suggestions, questions, and (most of all) contributions to fix problems and add features. I personally prefer using the GitHub issues system over trying to reach out to me in other ways (personal email, Twitter, etc.). Pull requests for contributions are encouraged.
Here are some simple ways in which you can contribute (in increasing order of commitment):
Read and correct any inconsistencies in the documentation
Raise issues about bugs or wanted features
Review code
Add new functionality (in the form of new plotting functions or helpers for preparing subtitles)
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.