--- title: "Extending apa_print()" author: "Frederik Aust" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Extending apa_print()} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library("papaja") ``` Here, we provide a brief overview of issues to consider when implementing a new method for `apa_print()`, a convenience function to facilitate reporting of results in accordance with APA reporting guidelines. If you consider adding a new method, please review our brief [contributing guidelines](https://github.com/crsh/papaja/blob/master/.github/CONTRIBUTING.md) and [code of conduct](https://github.com/crsh/papaja/blob/main/CODE_OF_CONDUCT.md). ## Deciding on a new method If you are reporting the results of a statistical analysis that is not yet supported by `apa_print()` you probably have a good motivation and possibly prior work to build on. If you are just looking for a way to contribute to, take a look at the [open issues](https://github.com/crsh/papaja/issues) for inspiration. `apa_print()` is a generic, meaning it can, in principle, work on any output object with a class that is specific enough to purposefully extract the results of the analysis. For example, objects of class `htest`, as returned by `t.test()`, `cor.test()`, `prop.test()`, etc., are named lists that follow a loose convention about the named objects they contain. ```{r htest-example} t_test_example <- t.test(extra ~ group, data = sleep) class(t_test_example) str(t_test_example) ``` Hence, if we pass an `htest` object to `apa_print()` the function expects there to be named elements in the list, such as `statistic`, `estimate`, or `p.value`. These expectations are reflected in the workings of the `apa_print.htest()` method. Objects of less specific classes, such as `list` or `data.frame` cannot be supported, because we cannot make any useful assumptions about their structure. ## Default structure of output Objects returned by `apa_print()` are of class `apa_results`, a named list with four elements: ```{r apa-results, echo = FALSE} papaja:::init_apa_results() ``` To illustrate how `apa_results` objects are populated, let's look at the output of `apa_print.lm()`. ```{r lm-example} # Data from Dobson (1990), p. 9. ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14) trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69) group <- gl(2, 10, 20, labels = c("Ctl", "Trt")) weight <- c(ctl, trt) lm_fit <- lm(weight ~ group) lm_fit_apa <- apa_print(lm_fit) ``` The `estimate` element of the returned `apa_results`-list itself contains a named list with estimated parameters---in this case regression coefficients---and corresponding confidence intervals for the model. The names of the list correspond to the names of the predictors. ```{r estimates} lm_fit_apa$estimate ``` The `estimate` list may contain additional elements, such as the list `modelfit`, that contains quantitative estimates of the model fit. The `statistic` element of the returned `apa_results` list contains a named list with the same structure as `estimate`. Instead of parameter estimates, `statistic` contains the corresponding inferential test statistics, such as significance tests or Bayesian model comparisons. ```{r statistic} lm_fit_apa$statistic ``` Note that the `statistics` list misses elements for the information criteria `AIC` and `BIC`. Because no inferential test statistics on the information criteria are available, it is fine to simply drop those elements. The `full_results` element is a named list that simply combines the results of `estimate` and `statistic` for convenience in reporting. ```{r full-result} lm_fit_apa$full_result ``` Finally, the `table` element contains a `data.frame` of class `apa_results_table` that summarizes the results. In essence this is simply a regular `data.frame` that follows the [column-naming conventions](https://www.tidymodels.org/learn/develop/broom/#glossary) used in [**broom**](https://cran.r-project.org/package=broom) but allows for prettier printing of variable labels. ```{r table} lm_fit_apa$table ``` For more complex analyses `table` may contain a named list of `apa_result_table`s. We use **tinylabels** to set variable labels. These variable labels are attributes attached to each column and contain a typeset label for the respective column. ```{r variable-labels} # library("tinylabels") letters variable_label(letters) <- "Letters of the alphabet" variable_label(letters) letters str(letters) ``` ```{r variable-label-column} lm_fit_apa$table$statistic ``` Variable labels are automatically used by `apa_table()` and plotting functions from the `apa_factorial_plot()`-family to create sensible default labels. If a label is enveloped in `$` it may contain LaTeX math syntax, which is automatically converted to R expressions using [**latex2exp**](https://cran.r-project.org/package=latex2exp) for plotting. Any new `apa_print()` method should output an object of this basic structure. ## Typesetting numeric information `apa_results` do not contain numeric information. Rather the numeric information has been processed for printing in accordance with APA guidelines. There are several **papaja**-functions to facilitate the typesetting. `apa_num()` is a flexible general purpose function that wraps `formatC()` and can be used to round, set decimal as well as thousands separators, or remove leading zeros. ```{r apa-num} x <- rnorm(3) * 1e4 apa_num(x) apa_num(x, digits = 3, big.mark = ".", decimal.mark = ",") apa_num(Inf) ``` `apa_p()` is a wrapper for `apa_num()` that sets appropriate defaults to report *p* values in accordance with APA guidelines. ```{r apa-p} apa_p(c(0.0001, 0.05, 0.99999)) ``` The internal function `apa_df()` is geared towards typesetting degrees of freedom. ```{r apa-df} apa_df(c(12, 12.485)) apa_df(12L) ``` Finally, `apa_interval()` can be used to typeset interval estimates. ```{r apa-interval} apa_interval(rnorm(2), conf.int = 0.95, interval_type = "CI") ``` Again, there are two wrappers that set appropriate defaults to typeset frequentist confidence intervals and Bayesian highest-density intervals. ```{r apa-confint} apa_confint(rnorm(2), conf.int = 0.95) apa_hdint(rnorm(2), conf.int = 0.95) ``` ## Typesetting model terms When creating named lists from terms, these terms names should use `_` as separator, and be valid R names. Adhering to these conventions ensures that `apa_results` can conveniently be indexed using the `$` operator. To facilitate the generation of list names, **papaja** provides the internal function `sanitize_terms()`. ```{r sanitize-terms} mod_terms <- c("(Intercept)", "Factor A", "Factor B", "Factor A:Factor B", "scale(Factor A)") sanitize_terms(mod_terms, standardized = TRUE) ``` While these sanitized terms are well suited to name R objects, they are not ideal for reporting. To facilitate typesetting term names for reporting, there is another internal function `beautify_terms()`. ```{r prettify-terms} beautify_terms(mod_terms, standardized = TRUE) ``` ## Internal workflow ### Method dispatch As with `lm` objects, it is often the case that the objects, as returned by the analysis function, may not contain all information necessary to populate the lists described above. For example, to obtain inferential statistics it may be necessary to call `summary()`. ```{r aov-fit} npk_aov <- aov(yield ~ block + N * P * K, npk) npk_aov summary(npk_aov) ``` This is why there are usually multiple `apa_print()`-methods that are called subsequently to make the function both flexible and convenient. For convenience, `apa_print.aov()` calls `summary()` with its default arguments and passes the result onto `apa_print.summary.aov()`. ```{r apa-print-aov} papaja:::apa_print.aov ``` This approach also ensures that a variety of object types are supported while minimizing code redundancy. ### Restructuring results The internals of `apa_print()` heavily rely on [**broom**](https://cran.r-project.org/package=broom), a package to conveniently restructure the output of analysis functions into tidy `data.frame`s. The objects are often processed using `broom::tidy()`, and `broom::glance()` if necessary, before being modified further to create the contents of the `table` element. Once the results table has been assembled, numeric values have been typeset, and variable labels have been assigned `glue_apa_results()` can be used to create an `apa_results` object according to the above specifications. Consider the following example of an `lm`-object. First we `tidy()` and `glance()` the object to obtain tidy results. We than typeset all "special" numerical results, that is, all results that would not be typeset appropriately by applying `apa_num()` with its default settings. Moreover, we combine the separate columns for lower and upper confidence interval bounds into one column `conf.int` which contains the complete confidence interval. ```{r tidy-results} lm_fit <- lm(mpg ~ cyl + wt, mtcars) # Tidy and typeset output library("broom") tidy_lm_fit <- tidy(lm_fit, conf.int = TRUE) tidy_lm_fit$p.value <- apa_p(tidy_lm_fit$p.value) tidy_lm_fit$conf.int <- unlist(apa_confint(tidy_lm_fit[, c("conf.low", "conf.high")])) str(tidy_lm_fit) glance_lm_fit <- glance(lm_fit) glance_lm_fit$r.squared <- apa_num(glance_lm_fit$r.squared, gt1 = FALSE) glance_lm_fit$p.value <- apa_p(glance_lm_fit$p.value) glance_lm_fit$df <- apa_df(glance_lm_fit$df) glance_lm_fit$df.residual <- apa_df(glance_lm_fit$df.residual) str(glance_lm_fit) ``` Next, we typeset the remaining numeric columns and assign informative variable labels: ```{r construct-apa-results-labels} tidy_lm_fit <- apa_num(tidy_lm_fit) variable_labels(tidy_lm_fit) <- c( term = "Term" , estimate = "$b$" , statistic = paste0("$t(", glance_lm_fit$df.residual, ")") , p.value = "$p$" , conf.int = "95% CI" ) glance_lm_fit <- apa_num(glance_lm_fit) variable_labels(glance_lm_fit) <- c( r.squared = "$R^2$" , statistic = "$F$" , p.value = "$p$" , AIC = "$\\mathrm{AIC}$" ) ``` Now we can use `glue_apa_results()` to create the output object. In doing so, we use the internal function `construct_glue()` to automatically determine the correct "glue" of the reporting string. Let's first examine the glue. ```{r glue} papaja:::construct_glue(tidy_lm_fit, "estimate") ``` The character string contains a combination of text and to-be-evaluated R code enveloped in `<<` and `>>`. All variable names (e.g. `estimate`) are assumed to be columns of `x` (here `tidy_lm_fit`) or any additional object passed to `glue_apa_results()` via `...`. `svl()` is a function that returns a column variable label but, by default, remove the math-environment tags (`$`) as these are not needed here. ```{r construct-apa-results} lm_results <- glue_apa_results( x = tidy_lm_fit , est_glue = papaja:::construct_glue(tidy_lm_fit, "estimate") , stat_glue = papaja:::construct_glue(tidy_lm_fit, "statistic") , term_names = sanitize_terms(tidy_lm_fit$term) ) lm_results ``` If we need to add additional information to this output, we can use `add_glue_to_apa_results()`. This function takes an existing output and adds new strings to a specific sublist. So, let's add some model fit information to the output. ```{r amend-apa-results} add_glue_to_apa_results( .x = glance_lm_fit , container = lm_results , sublist = "modelfit" , est_glue = c( r2 = "$<> = <>$" , aic = "" ) , stat_glue = c( r2 = papaja:::construct_glue(glance_lm_fit, "statistic") , aic = "$<> = <>$" ) ) ``` ## User interface A final issue to consider is that users may pass inappropriate input to `apa_print()`. To ensure that we return correct output or informative error messages, we need input validation. Currently, **papaja** relies on the internal function `validate()` for this. ```{r} in_paren <- TRUE papaja:::validate(in_paren, check_class = "logical", check_length = 1) ``` Please use either `validate()` or perform input validation using the [**assertthat**](https://cran.r-project.org/package=assertthat) package.