class: center, middle, inverse, title-slide

# A Statistical Consultant’s Perspective on Getting Help with R
## ISU Statistics Consulting Group
### October 2, 2020

---

<style>
.remark-slide-content {
  background-color: #FFFFFF;
  border-top: 80px solid #133b5c;
  font-size: 24px;
  font-weight: 300;
  line-height: 1.5;
  padding: 1em 2em 1em 2em
}
.inverse {
  background-color: #133b5c;
  text-shadow: none;
}
.title-slide {
  background-color: #FFFFFF;
  border-top: 80px solid #FFFFFF;
}
hr, .title-slide h2::after, .mline h1::after {
  content: '';
  display: block;
  border: none;
  background-color: #64958f;
  color: #64958f;
  height: 1px;
}
.remark-slide-number {
  position: inherit;
}
.remark-slide-number .progress-bar-container {
  position: absolute;
  bottom: 0;
  height: 4px;
  display: block;
  left: 0;
  right: 0;
}
.remark-slide-number .progress-bar {
  height: 100%;
  background-color: #64958f;
}
.blue{color: #64958f}
</style>

<style type="text/css">
.tiny{font-size: 40%}
.medium{font-size: 80%}
.large{font-size: 90%}
</style>

# Overview

1. Statistics Consulting Group
2. Helping Yourself
3. Getting Help from Others
4. Work Flow in R

---
class: inverse, center, middle

# Statistics Consulting Group

---

# What is the consulting group?

- We are a group of graduate students and professors in the statistics department who assist with statistical analysis in scientific research projects.
- Submit a request at: https://www.stat.iastate.edu/statistical-consulting

.tiny[
![](img/consulting_web.jpeg)
]

---

# Current Consultants

## AES
- Audrey McCombs
- Katherine Goode
- Haoyan Hu
- Yudi Zhang
- Miranda Tilton

## Business
- Haoyan Hu

## Engineering
- Miranda Tilton

---

# What are we able to help with?

- Experimental design, sample size calculations, and power analysis
- Suggesting statistical methods for data analysis
- Basic data visualization and coding in R/SAS/JMP
- Interpretation of results

---

# What do typical consultations look like?

Example: a study of how corn growth is affected by temperature and drainage.
- Experimental design
- Analysis after pilot study
- Analysis after collecting all data

---
class: inverse, center, middle

# Helping Yourself

---

# R documentation

Search for a package manual on CRAN, e.g. [dplyr documentation](https://cran.r-project.org/web/packages/dplyr/dplyr.pdf).

.pull-left[
.tiny[
![](img/google_search.jpeg)
]
]

.pull-right[
.tiny[
![](img/dplyr_google.jpeg)
]
]

---

# Vignettes

A vignette is like a book chapter or an academic paper: it can describe the problem that a package is designed to solve, and then show the reader how to solve it. A vignette should divide functions into useful categories, and demonstrate how to coordinate multiple functions to solve problems.

Usage: `browseVignettes("pkg_name")` lists a package's vignettes in the browser; `vignette("topic", package = "pkg_name")` opens one by name.

.tiny[
![](img/v2.jpeg)
]

---

# Get help from RStudio

- Some useful functions:
  - `help(function)` or `help(package = "pkg_name")`
  - `?pkg_name::function` or `?function`
  - `example("function")`

.pull-left[
```r
example("mean")
```

```
##
## mean> x <- c(0:10, 50)
##
## mean> xm <- mean(x)
##
## mean> c(xm, mean(x, trim = 0.10))
## [1] 8.75 5.50
```
]

.pull-right[
.tiny[
![](img/r_help3.jpeg)
]
]

---

# Get help from RStudio (Cont'd)

- Search through the `help` and `packages` tabs:

.pull-left[
.tiny[
![](img/r_help.jpeg)
]
]

.pull-right[
.tiny[
![](img/r_help2.jpeg)
]
]

---
class: inverse, center, middle

# Getting Help from Others

---

# Where to Get Help

.pull-left[
**Stats Consulting Group** 😄

.large[
- Help writing code to perform an analysis in R
  - Examples: GLMs, mixed models, PCA, etc.
- Help creating visualizations (non-publishable versions)
- Interpreting output
- Issues with conflicting output
- Minor debugging
]
]

.pull-right[
**Internet Resources**

.large[
- Sites such as:
  - [Stack Exchange](https://stats.stackexchange.com/questions/4544/how-does-one-do-a-type-iii-ss-anova-in-r-with-contrast-codes)
  - [Stack Overflow](https://stackoverflow.com/questions/58980605/correctly-specify-paired-data-in-lmer-mixed-model-in-r)
  - [RStudio Community](https://community.rstudio.com/)
- Contact the R package developer directly (via [GitHub](https://github.com/ropensci/plotly/issues) or [email](https://cran.r-project.org/web/packages/emmeans/index.html))
- Make sure to try your best to find the answer before posting
- Important to post a clear question and a reprex (see upcoming slides)
]
]

---

# Keep in Mind

- Try to find the answer yourself first
- Providing a .blue[**focused question**] helps with the process
  - Importance of asking the right question
  - Both for R-related and general questions
  - Respect the consultant's or package developer's time
- Provide a .blue[**reprex**] (.blue[**repr**]oducible .blue[**ex**]ample)
  - "The .blue[**goal of a reprex**] is to package your problematic code in such a way that other people can run it and feel your pain. Then, hopefully, they can provide a solution and put you out of your misery." - [Tidyverse help page](https://www.tidyverse.org/help/)

---

# Parts of a reprex

Part of the description of reprex components from the [RStudio Community FAQ](https://community.rstudio.com/t/faq-whats-a-reproducible-example-reprex-and-how-do-i-do-one/5219):

.large[
- **Background information** - Describe what you are trying to do. What have you already done?
- **Complete set up** - Include any library() calls and data to reproduce your issue.
- **Make it run** - People should be able to copy and paste your code chunk and get the same error.
- **Minimal** - Strip away everything that is not directly related to your problem.
]

---

# Reproducible Example (or reprex)

**Creating a reprex is not necessarily easy:**

- Requires pinpointing the problem in the code
- Requires crafting a small example that someone else can easily run on their computer
- Building a reprex often takes long enough that you find the answer yourself before asking anyone else!

<br>

**Additional reading**: [Tidyverse help page](https://www.tidyverse.org/help/), [RStudio community FAQ](https://community.rstudio.com/t/faq-whats-a-reproducible-example-reprex-and-how-do-i-do-one/5219), [Yihui's blog post](https://yihui.org/en/2017/09/the-minimal-reprex-paradox/), and [reprex R package](https://reprex.tidyverse.org/index.html)

---

# Example 1

Dear Consultant,

I have written some R code, and it won't run. I've attached the data and R script. Help!

Many thanks,
A. Client

.pull-left[
.tiny[
```r
df = read.csv("a local/file/path/on/client/computer/data.csv")
df$long_variable.Name_number.1
factor(df$LONG_variable_number.95)
factor(df$LONG_variable_number.96)
factor(df$LONG_variable_number.97)
factor(df$LONG_variable_number.98)
factor(df$LONG_variable_number.99)
factor(df$LONG_variable_number.100)
table(long_variable.Name_number.10, long_variable.Name_number.17)
plot(df$y)
m = lm(y ~ LONG_variable_number.100 + LONG_variable_number.99, data = df)
m = lm(y ~ LONG_variable_number.87 + LONG_variable_number.100, data = df)
df$newvar <- factor(df$trt)
1 + 89
###
dim(df)
str(df)
df$trt <- factor(df$trt)
```
]
]

.pull-right[
.tiny[
```r
df$LONG_variable_number.95f <- factor(df$LONG_variable_number.95)
df$LONG_variable_number.96f <- factor(df$LONG_variable_number.96)
df$LONG_variable_number.97f <- factor(df$LONG_variable_number.97)
df$LONG_variable_number.98f <- factor(df$LONG_variable_number.98)
df$LONG_variable_number.99f <- factor(df$LONG_variable_number.99)
df$LONG_variable_number.100f <- factor(df$LONG_variable_number.100)
library(ggplot2)
ggplot(df, aes(x = long_variable.Name_number.1, y =
long_variable.Name_number.1))
m = lm(y : trt, data = df)
summary(m)
na.omit(df)
rnorm()
m1 = lm(y ~ LONG_variable_number.87 data = df)
m2 = lm(y ~ LONG_variable_number.18 data = df)
mB = lm(y ~ LONG_variable_number. data = df)
plot(mB)
hist(df$LONG_variable_number.95f)
anova(m1)
anova(m2)
summary(mB)
anova(m1)
```
]
]

---

# Example 2

Dear Consultant,

I've been trying to use the code you wrote for me and apply it to a different dataset, but I am running into problems. Could you please help me? Here is the code:

.medium[
*df_clean <-*
*df_raw %>%*
*select(-dont_need1, -dont_need2) %>%*
*mutate(trt = factor(trt)) %>%*
*pivot_longer(names_to = "week", values_to = "y", cols = -c(year, trt))*

*df_clean %>%*
*filter(year == "2019") %>%*
*groupby(trt, week) %>%*
*summarise(mean = mean(y), sd = sd(y))*
]

Many thanks,
A. Client

---

# Example 3

Dear Consultant,

I am trying to fit a linear regression model like we talked about in our last meeting. I am having some difficulties with the R code. I've attached a small example that demonstrates my error. Could you take a quick look when you have the chance?

Many thanks,
A. Client

.large[
```r
# Example dataset
df = data.frame(
  y = rnorm(100, 10, 3),
  trt = rep(c("A", "B"), each = 50)
)

# Here is where the error occurs
m = lm(df, y, trt)
```
]

---
class: inverse, center, middle

# Work Flow in R

---

# WTF

- *<u>W</u>hat <u>T</u>hey <u>F</u>orgot to Teach You About R*
- Book by [Jenny Bryan](https://jennybryan.org/about/) and [Jim Hester](https://www.jimhester.com/) (of [RStudio](https://rstudio.com/))
- Available online at [rstats.wtf](https://rstats.wtf/)

<br><br>

.font80[
> The initial impetus for creating these materials is a two-day hands-on workshop. The target learner:
>
> * Has a moderate amount of R and RStudio experience.
> * Is largely self-taught.
> * Suspects they have drifted into some idiosyncratic habits that may slow them down or make their work products more brittle.
> * Is interested in (re)designing their R lifestyle, to be more effective and more self-sufficient.

.right[-from the Introduction to rstats.wtf]
]

---

# Topics from rstats.wtf

- [Project-oriented workflow](https://rstats.wtf/project-oriented-workflow.html)
  - Projects help with file organization, version control, and collaboration
  - Avoid `setwd()`; prefer relative file paths over absolute ones
- [How to name files](https://speakerdeck.com/jennybc/how-to-name-files)
  - machine readable
  - human readable
  - plays well with default ordering
- [Install a source package](https://rstats.wtf/install-a-source-package.html) (e.g., from GitHub)
- [Debugging R code](https://rstats.wtf/debugging-r-code.html)
  - `traceback()`, `print()`, `browser()`, `debug()`, `trace()`, etc.
- [How to find package source code](https://rstats.wtf/read-the-source.html)

---

# Working with data

- All data should be described <u>both quantitatively and qualitatively</u>
- Don't forget to .blue[**visualize your data**]
  - Summary stats don't give a full picture
  - E.g., the [DatasauRus Dozen](https://github.com/lockedata/datasauRus) all have nearly the same summary stats and (near) zero correlation, but...
![](img/DinoSequential2.gif)

---

# Tips for visualization

- Use base R for quick visualizations

```r
library(ggplot2)  # provides the diamonds dataset
data(diamonds)
hist(diamonds$price)
plot(x = diamonds$carat, y = diamonds$price, main = "Price by Carat")
```

![](slides_files/figure-html/unnamed-chunk-7-1.png)

---

# Tips for visualization

- Use [ggplot2](https://ggplot2.tidyverse.org/) for complex or customized visualizations

```r
library(ggplot2)
data(diamonds)
ggplot(diamonds) +
  geom_point(aes(x = carat, y = price, color = color)) +
  facet_wrap(~cut, nrow = 1) +
  theme_bw(base_size = 12)
```

![](slides_files/figure-html/unnamed-chunk-8-1.png)

---

# Additional tools

- [R Markdown](https://rmarkdown.rstudio.com/lesson-1.html)
  - <u>Organized</u>: self-contained file of code and output (HTML, PDF, or Word)
  - <u>Flexible</u>: include paragraphs, figures, tables, LaTeX equations, etc.
  - <u>Convenient</u>: scroll through results/plots that have already been run
- The [Tidyverse](https://www.tidyverse.org/packages/) "ecosystem" of packages
- [Hands-On Programming with R](https://rstudio-education.github.io/hopr/) by Garrett Grolemund
- [Advanced R](https://adv-r.hadley.nz/) by Hadley Wickham

---
class: inverse, center, middle

# Thank you
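
---

# Appendix: a possible fix for Example 3

The reprex in Example 3 is small enough to diagnose at a glance: `lm()` expects a model formula as its first argument and the data frame through `data`. The chunk below is a minimal sketch of how a reply might look, not the only valid answer. The `set.seed()` call and its value are assumptions added here so that everyone sees the same simulated data.

```r
# Hypothetical reply to Example 3 (a sketch, not part of the client's email)
set.seed(2020)  # assumed seed so the simulated data are reproducible

# Example dataset, same structure as in Example 3
df <- data.frame(
  y = rnorm(100, 10, 3),
  trt = rep(c("A", "B"), each = 50)
)

# lm() takes the formula first, then the data frame
m <- lm(y ~ trt, data = df)
coef(m)
```

If the reprex package is installed, copying a chunk like this and calling `reprex::reprex()` renders the code together with its output, ready to paste into an email or forum post.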