Reproducibility!
Citations
Making things look nice
Collaboration
Collaboration can be tricky
I honestly think it's easier than merging "finalFinalv10-oct2020-NEW_DANSEDITS.docx"
But it does require comfort with general version control practices, as well as R Markdown formatting
Probably not the best strategy for a 40 author literature review
Journals can have very finicky requirements
We'll talk about solutions to this, but sometimes the cure is worse than the bug
Don't wait until the night before it's due to make sure that you can get things in the format needed for your journal / dissertation!
rticles
The rticles package seems like a great tool, providing R Markdown templates for a solid and growing number of journals.
We're not going to spend this meeting on it. I would definitely recommend that you check it out and see if it fits your needs.
We're going to focus on some nuts and bolts, and on learning how to format things without depending on rticles
We'll come back to this a bit later, but for today I'm going to:
Focus on knitting to PDF
Touch on knitting to Word (works but caveat emptor)
Ignore knitting to HTML
I'm going to be using the bookdown package exclusively here, since it has a bunch of cross-referencing and formatting tools that are critical for academic publishing.
This looks like
```yaml
---
title: "My Excellent Research"
author: Dan Ovando
date: "2020-11-17"
output:
  bookdown::pdf_document2: default
  bookdown::word_document2: default
  bookdown::html_document2: default
linkcolor: blue
---
```
The linkcolor option there makes cross-reference links like "Fig. 1" show up in color (here, blue), so readers can see that they are clickable
Many of the things we look at today will not work unless you use bookdown
I think citations might be my favorite thing about papers in R Markdown, and they're why I've started using it for all my papers where co-authors' needs allow, even non-quantitative ones
Automatically format citations for whatever journal you submit to
Automatically deal with in-text vs. parenthetical citations
Automatically format and update references / bibliography
No more "your zotero references are no longer linked"
The first step is to get your references into a format that R Markdown can deal with.
The simplest way to do this is to generate a .bib file
I do this by exporting my bibliography from Zotero as a .bib using Better BibTeX
All references you want to cite need to be in your .bib, but you can include references you don't cite
Your .bib file has entries for each potential citation containing all the information that R Markdown needs to format the citation and reference
```bibtex
@article{keum2001,
  title = {Fat Penguins and Imaginary Penguins in Perturbative {{QCD}}},
  author = {Keum, Yong-Yeon and Li, Hsiang-nan and Sanda, A. I.},
  year = {2001},
  month = apr,
  volume = {504},
  pages = {6--14},
  issn = {0370-2693},
  doi = {10.1016/S0370-2693(01)00247-7},
  file = {/Users/danovan/Google Drive/references/Keum et al. - 2001 - Fat penguins and imaginary penguins in perturbativ.pdf;/Users/danovan/Zotero/storage/4UI4382Z/S0370269301002477.html},
  journal = {Physics Letters B},
  language = {en},
  number = {1}
}
```
Citations are easy!
Fishes are delicious [@seal2020;@cat1400]
Will automatically format and order the references based on the journal requirements
As shown by @seal2020, fishes are delicious
Will automatically render to "As shown by Seal (2020)" or whatever the right format is
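Pandoc, which does the citation work behind the scenes, supports a few more variations worth knowing. The citekeys below are the same made-up examples as above, and exactly how each one renders depends on your .csl style:

```markdown
Fishes are delicious [see @seal2020, p. 33; also @cat1400]

@seal2020 [p. 33] shows that fishes are delicious

Seal says fishes are delicious [-@seal2020]
```

The first adds a prefix ("see") and a page locator inside the parentheses, the second is an in-text citation with a locator, and the minus sign in the third suppresses the author name, which is handy when you've already named the author in the sentence.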
An important note: this @seal2020 format uses citekeys.
The exact formatting of citekeys is defined by your reference manager settings
I have mine set as lowercase first author last name followed by date, with letters for conflicts (e.g. [@seal2020a;@seal2020b], since that annoying seal publishes like crazy)
But, this could be uppercase first author last name followed by date, title then date, whatever. So, if multiple authors are contributing to the .bib make sure that this is consistent, and remember that if you change settings and re-export your .bib things might break.
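Since a citekey typo or a changed export setting breaks citations quietly, it can help to check the .Rmd against the .bib before knitting. Below is a rough sketch in base R; find_missing_citekeys() is a hypothetical helper (not part of any package), and its regular expressions only approximate Pandoc's real citekey rules:

```r
# Hypothetical helper: which citekeys used in the .Rmd have no .bib entry?
find_missing_citekeys <- function(rmd_text, bib_text) {
  # accept readLines() output (one string per line) or a single string
  rmd_text <- paste(rmd_text, collapse = "\n")
  bib_text <- paste(bib_text, collapse = "\n")
  # .bib entries start like "@article{seal2020,": keep the key before the comma
  bib_keys <- regmatches(bib_text, gregexpr("@[A-Za-z]+\\{[^,]+", bib_text))[[1]]
  bib_keys <- trimws(sub("^@[A-Za-z]+\\{", "", bib_keys))
  # citations look like @seal2020; skipping an @ preceded by a letter, digit,
  # or backslash avoids emails (a@b.edu) and cross-references (\@ref(...));
  # the key must end in a letter, digit, or "_", so "@seal2020." drops the dot
  cited <- regmatches(
    rmd_text,
    gregexpr("(?<![[:alnum:]\\\\])@[[:alnum:]_](?:[[:alnum:]_:.-]*[[:alnum:]_])?",
             rmd_text, perl = TRUE)
  )[[1]]
  cited <- unique(sub("^@", "", cited))
  setdiff(cited, bib_keys)
}

# Usage (file names from this talk):
# find_missing_citekeys(readLines("my-pub.Rmd"), readLines("pubs-in-rmarkdown.bib"))
```

It will also flag things that merely look like citations (a Twitter handle in your text, say), so treat its output as a list of things to eyeball rather than a verdict.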
To use your bibliography, just load the .bib in the YAML, and specify a citation style using a .csl file
You can download just about any .csl file from the Zotero Style Repository
```yaml
---
title: "My Excellent Research"
author: Dan Ovando
date: "2020-11-17"
output:
  bookdown::pdf_document2:
    includes:
      in_header: "my-pub-header.tex"
linkcolor: blue
bibliography: pubs-in-rmarkdown.bib
csl: fish-and-fisheries.csl
---
```
Demonstrate generating and saving a .bib file from Zotero (sorry, users of other platforms, you're on your own here!)
Seriously, does anyone know the difference?
R Markdown automatically formats and creates the bibliography by default at the end of the document.
So, just end your .Rmd with
```markdown
# References
```
Suppose though that you want your bibliography in a specific place (e.g. before Supplementary Materials)
```markdown
# References

<div id="refs"></div>

# Supplementary Materials
```
Writing a paper in R Markdown means no more "replace all instances of Fig.1 with Fig.2 because Reviewer #2 insisted those figures be switched"
R Markdown lets you automatically number and cross-reference figures, tables, equations, and text sections
All cross-referencing works about the same way: typing \@ref(type:label) produces a linked number to the referenced object. For figures and tables made in code chunks, the label is the chunk name.
\@ref(fig:my-plot)
\@ref(tab:my-table)
\@ref(eq:my-equation)
\@ref(my-section) (the exception to the rule: section references have no type prefix)
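Sections get their label from the header. Pandoc generates one automatically from the header text, but it's safer to set it yourself with {#label} after the header; "methods" here is just an example label:

```markdown
# Methods {#methods}

As described in Section \@ref(methods), we caught a lot of fish.
```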
````markdown
We found some cool stuff (Fig.\@ref(fig:main-fig))

```{r main-fig, fig.cap = "Our Main Results"}
plot(1:10)
```
````
Would produce something like
We found some cool stuff (Fig.1)
Figure 1. Our Main Results
You need to name your chunks with a valid name (bookdown will yell at you if you try and knit with an invalid name)
Each named chunk must be unique
You need to include fig.cap = "something" to get a numbered figure: bookdown only numbers (and lets you cross-reference) figures that have a caption
Numbering is set by ordering of chunks, not ordering of references
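knitr and bookdown will complain about bad labels when you knit, but on a long paper it can be nice to catch problems first. Here's a minimal base-R sketch; check_chunk_labels() is a hypothetical helper, not part of knitr or bookdown, and it only looks at simple ```{r label, ...} headers:

```r
# Hypothetical helper: flag R chunk labels that are duplicated or that use
# characters bookdown advises against for cross-referenced labels
check_chunk_labels <- function(rmd_lines) {
  # R chunk headers look like ```{r label, option = value} or ```{r}
  headers <- grep("^```\\{r[ ,}]", rmd_lines, value = TRUE)
  # keep what sits between "{r" and the closing "}"
  inner <- sub("^```\\{r(.*)\\}[[:space:]]*$", "\\1", headers)
  # a label, if present, is whatever comes before the first comma
  first <- trimws(sub(",.*$", "", inner))
  # drop unlabeled chunks and ones that start with an option (e.g. child = ...)
  labels <- first[first != "" & !grepl("=", first)]
  list(
    duplicated = unique(labels[duplicated(labels)]),
    # bookdown asks for letters, digits, "/" and "-" only
    invalid = unique(labels[!grepl("^[A-Za-z0-9/-]+$", labels)])
  )
}

# Usage: check_chunk_labels(readLines("documents/my-pub.Rmd"))
```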
The easiest way to do tables is with knitr::kable
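````markdown
Here are some penguins (Table \@ref(tab:pen-tab))

```{r pen-tab}
# assumes the penguins data (palmerpenguins) and the %>% pipe are loaded in setup
knitr::kable(head(penguins), caption = "Here be penguins", booktabs = TRUE) %>%
  kableExtra::kable_styling(latex_options = "striped")
```
````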
Along with some help from the kableExtra package to make things look nice. Rendering this in xaringan is more trouble than it's worth, so take a look at documents/my-pub.Rmd to see the result
Notice that I've included the caption in the actual kable call, instead of in the chunk options like we did with figures
Sometimes you don't want to go through all the trouble of creating a dataframe to convert into a table.
R Markdown allows you to "manually" create tables as well. Here's how to caption those
````markdown
This is a manual table (Table \@ref(tab:simple-table))

| Thing 1 | Thing 2 | Col3 |
|---------|---------|------|
| A       | B       | C    |
| D       | E       | F    |
| G       | H       | I    |

Table: (\#tab:simple-table) A Manual Table
````
Regression tables are another popular thing.
There are lots of options out there; I really like modelsummary for standard regression tables
Notice that now the caption goes in the title argument. Fun with open source 🙄!
````markdown
Here is a flipper model (Table \@ref(tab:flipper-model))

```{r flipper-model}
# assumes the penguins data (palmerpenguins) is loaded in setup
mod <- lm(flipper_length_mm ~ island, data = penguins)
modelsummary::modelsummary(
  list("Flipper Model" = mod),
  stars = TRUE,
  title = "My flipper model"
)
```
````
OK, we've covered references, figures, and tables; what about equations?
Equations have a few finicky things; see the bookdown instructions on equations
````markdown
As you can see in Equation \@ref(eq:binom)

\begin{equation}
  f\left(k\right) = \binom{n}{k} p^k\left(1-p\right)^{n-k}
  (\#eq:binom)
\end{equation}
````
Supplementary materials / Supporting Information / Online Appendix (seriously, get it together, academia) can be a pain.
Journals often want you to add an "S" to every number, restart figure numbering, etc. (e.g. Figure S1)
Here's how!
```markdown
# Supplementary Materials

\renewcommand{\thefigure}{S\arabic{figure}}
\setcounter{figure}{0}
\renewcommand{\thetable}{S\arabic{table}}
\setcounter{table}{0}
\renewcommand{\theequation}{S\arabic{equation}}
\setcounter{equation}{0}

Here are our Supplementary materials...
```
The above example, combined with the trick I showed you to manually create the reference section, can make it easy to insert small supporting information right after your main references.
But, if you have a massive SI, with lots of references, it might be nice to put them in a separate document. This is where knitr::knit_child comes in: it allows us to write another section, say an Appendix, in a separate R Markdown document, and then include the output in this document.
The results of this "child" document will be knit into the main document!
````markdown
```{r, child = "appendix.Rmd"}
```
````
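The child chunk option is the simplest route. If you'd rather decide in code whether the appendix gets included, a sketch that calls knitr::knit_child() directly looks like this; include_si is a hypothetical parameter you would declare under params: in the YAML:

````markdown
```{r appendix, echo = FALSE, results = "asis"}
if (isTRUE(params$include_si)) {
  # knit appendix.Rmd with the main document's objects, capture the markdown
  si <- knitr::knit_child("appendix.Rmd", quiet = TRUE)
  # results = "asis" lets the captured markdown flow straight into the paper
  cat(si, sep = "\n")
}
```
````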
So far, we've covered a lot of the nuts and bolts.
Publications should look good too though! And sometimes, journals have specific formatting requirements.
IMO, the default bookdown LaTeX template looks pretty good, certainly better than the average Word document.
But, let's talk a bit about fine tuning. Generally:
If you are submitting to a journal with strict and tricky formatting requirements, check out rticles before trying to do things yourself
Don't google "how do I do XX in R Markdown?"! Google how to do it in LaTeX (or HTML) instead
Remember, R Markdown is basically a front end: knitr runs your code, and pandoc translates the result into the "right" format, e.g. LaTeX.
A nice feature of this is that you can usually include LaTeX/HTML commands directly in your .Rmd!
\renewcommand{\thefigure}{S\arabic{figure}}
This isn't R Markdown, it's LaTeX, but I can put it directly in my .Rmd!
The only caveat here is that these kinds of language-specific inserts usually only work for the language they are in: this command doesn't cause any errors when knitting to HTML, it just doesn't do what we want.
Same (sometimes) goes for HTML, though more often an HTML trick will work in LaTeX but a LaTeX trick will not work for HTML
Fine tuning for say LaTeX usually means things won't look quite as good on HTML, and vice versa
When I'm knitting to PDF and need some little LaTeX thing (like a watermark), I just insert a small snippet of LaTeX.
Say you've got a more complicated problem though, and you really want to do LOTS of complicated LaTeX formatting.
At this point, you're increasingly just working in LaTeX. That's waaaay beyond what we have time for today, but you can check out some great resources for LaTeX in R Markdown here
I'll try and cover a few things here
LaTeX documents have a "preamble" section where you typically load packages, set options etc.
This is where you would for example load a package to add in line numbers, or include a watermark.
I use this area for making kind of "one line at a time" changes to my PDF documents
To see this, we'll add commands to double space and add line numbers to our document in the file "documents/my-pub-header.tex"
```latex
\usepackage{setspace}
\doublespacing
\usepackage{lineno}
\linenumbers
```
We'll then load this .tex file into our preamble using the YAML
```yaml
---
title: "My Excellent Research"
author:
  - Dan Ovando
  - Daniel Ovando
date: "2020-11-17"
output:
  bookdown::pdf_document2:
    includes:
      in_header: "my-pub-header.tex"
linkcolor: blue
---
```
I'll use the in_header trick to do simple things like line numbers. But, suppose you have a more complicated need, and want to make your document look exactly like a PNAS article, or need it to conform to University guidelines.
For that you'll need a LaTeX template. At this point, things are getting complicated. As far as I can tell, if you can figure out how to do it in LaTeX, you can do it in R Markdown, but there's no guarantee: you're adding in a translation step.
Before committing to using R Markdown for a project with really complicated formatting, take time to make sure you can actually achieve it. Going backwards can be hard.[1]
[1] Don't wait until the night your dissertation is due to hit knit for the first time!!!
I'll be blunt: getting into templates is hard. I personally would rather debug 1,000 lines of someone else's C++ code than 10 lines of LaTeX templates.
My plan was to show you all how to manually use a template, and I couldn't get it to work well without resorting to building an installable template.
If you have Jedi-level LaTeX skills, this might be easy for you, but otherwise dealing with templates is NOT EASY. This is why packages like rticles are great: someone has done all the work.
This is also why they are tough: it can be really hard to modify things.
So for now, take it that it can be done, and if you REALLY need a template that isn't in something like rticles, think carefully about whether you're up for the challenge.
Getting an automated .Rmd -> Word pipeline to produce great-looking documents is hard.
You can set up a Word document template to knit to, which will take care of things like fonts, headers, line spacing, etc.
There's nothing fancy about setting up a template: just make a .docx file that looks like you want, save it, and set it as the reference_docx
But, check your results carefully!
```yaml
---
title: "My Excellent Research"
author: Dan Ovando
output:
  bookdown::word_document2:
    reference_docx: word-template.docx
---
```
My advice for knitting to Word: keep it simple in the R Markdown
Don't get fancy with other things: accept that you're going to do most of your "make it look good" work manually in Word.
Tables are really finicky, mess around with the "output"/"format" options of whatever you are using for tables to try and get them to look right
This is very journal-specific
Some will let you just upload PDFs
Follow the Word + manual futzing route
Use keep_tex: true in the YAML!
```yaml
---
title: "My Excellent Research"
author:
  - Dan Ovando
date: "2020-11-17"
output:
  bookdown::pdf_document2:
    keep_tex: true
linkcolor: blue
---
```
Setting keep_tex: true will generate and store all the files you need to render your paper as a PDF from the .tex file
If the journal lets people submit .tex files, you should be fine!
But, keep in mind that this .tex file is automatically generated, so it is even more confusing than the average .tex file
If they want you (not them) to do a bunch of formatting (e.g. if they say LaTeX users must use a specific template), it's easiest to see if it's in rticles: you almost certainly won't be able to successfully make major edits to the generated .tex file by hand
So far, we've covered a lot of tricks to accomplish academic-specific tasks in R Markdown.
Let's talk about how to integrate R Markdown into the rest of your workflow.
There are generally two options here
Do everything inside the .Rmd: I find this works great for simple self-contained projects, but personally find it cumbersome when things get complicated
Run your analysis separately and load the results into the .Rmd: this is my preferred workflow, and what I'll demonstrate here
The goal: any user can run something like make-my-pub.R and, by running that script and only that script, reproduce your final publication
This depends on some project-oriented coding skills that we don't have time to cover, but see presentation and links here
Have make-my-pub.R run your analysis, and save all your results in a user-specified folder (e.g. results/v1.0)
Set up your my-pub.Rmd to take as a parameter the results folder to read
Load results inside my-pub.Rmd and incorporate them into the paper
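```r
# in make-my-pub.R, after the analysis has created the plot objects;
# run_dir is the results folder for this run (e.g. results/v1.0)
plots <- ls()[stringr::str_detect(ls(), "_plot")]
save(file = file.path(run_dir, "plots.RData"), list = plots)
```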
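Here's a self-contained sketch of that save-then-load round trip using only base R: ls(pattern = ...) stands in for the stringr call, a temporary folder stands in for a versioned folder like results/v1.0, and penguin_plot / flipper_plot are placeholder strings rather than real ggplots:

```r
# placeholders for objects created by the analysis (hypothetical names)
penguin_plot <- "placeholder for a ggplot object"
flipper_plot <- "another placeholder"
model_fit <- "not a plot, so it should not be saved"

# a temporary folder stands in for a versioned results folder like results/v1.0
run_dir <- file.path(tempdir(), "v1.0")
dir.create(run_dir, showWarnings = FALSE, recursive = TRUE)

# make-my-pub.R side: save every object whose name ends in "_plot"
plots <- ls(pattern = "_plot$")
save(list = plots, file = file.path(run_dir, "plots.RData"))

# my-pub.Rmd side: load the saved objects into a fresh environment
paper_env <- new.env()
loaded <- load(file.path(run_dir, "plots.RData"), envir = paper_env)
print(sort(loaded))
# [1] "flipper_plot" "penguin_plot"
```

Loading into a separate environment is optional; it just makes it easy to see exactly what came from the results folder.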
````markdown
---
title: "My Excellent Research"
author: Dan Ovando
date: "2020-11-17"
output:
  bookdown::pdf_document2: default
linkcolor: blue
params:
  run_name: ["v6.0"]
---

```{r setup}
run_dir <- here::here("results", params$run_name)
load(file = file.path(run_dir, "plots.RData"))
```
````
Now that your .Rmd is parameterized, you can actually tell R to knit your report from your make-my-pub.R file!
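```r
# results_name (e.g. "v6.0") and results_path (the matching results folder)
# are defined earlier in make-my-pub.R
output_format <- c(
  "bookdown::html_document2",
  "bookdown::pdf_document2",
  "bookdown::word_document2"
)

rmarkdown::render(
  here::here("documents", "my-pub.Rmd"),
  # names here must match the params declared in the YAML (run_name)
  params = list(run_name = results_name),
  output_format = output_format,
  output_dir = results_path # put the reports in the results directory
)
```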
Look through make-my-pub.R to see how this all works and give it a try.
OK, you now have a .Rmd file that will
Update all your results, figures, tables automatically
But now, your co-authors have edits...
This is the hardest part about working in .Rmd
You have ~4 options
Knit to Word / PDF, have coauthors annotate changes, manually pull those changes into the .Rmd
Explore something like redoc
Copy and paste into Google Docs, edit, then copy back
git + GitHub!
Co-author creates a branch off of the main branch
Co-author edits the .Rmd directly, commits, pulls, and pushes changes to GitHub
Once done with edits, co-author submits pull request to merge into main branch
Author / admin reviews the pull request, accepts / rejects changes as needed, and resolves any conflicts
Repeat process for each coauthor
This sounds hard! But...
No more juggling multiple emailed versions of the document, where half of your coauthors edited on top of one another's changes but the other half accidentally edited a version you emailed three months ago.
Any conflicts are made clear and can be resolved in the pull request
If it all goes to hell, just don't merge the pull request!
If coauthors aren't comfortable with all the fun of git & GitHub, they can also make changes and submit a pull request directly from GitHub!
Practice session collaboratively editing a .Rmd file
You don't need RStudio to work with R Markdown! But, it has some nice features
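Depending on your system, you might run into some errors knitting to PDF if you have funky fonts in your plots / text. The extrafont package can help with that
```r
library(extrafont)
# font_import() registers your system fonts with extrafont; it only needs to
# be run once per machine (and can take a while)
# font_import()
extrafont::loadfonts()
```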
Don't forget you can embed results directly in your text as R code, e.g.
The mean penguin weighs `r mean(penguins$body_mass_g, na.rm = TRUE) / 1000 * 2.204623` pounds will produce
The mean penguin weighs 9.2632844 pounds
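Long inline expressions are hard to read (and hard for co-authors to edit), and they print more digits than anyone wants. One option is to define a small helper in your setup chunk and keep the inline code short. grams_to_pounds() is a hypothetical helper, and rounding to one decimal is just a choice:

```r
# Hypothetical setup-chunk helper: convert grams to pounds, rounded for prose
grams_to_pounds <- function(grams, digits = 1) {
  # 1 kg = 2.204623 lb, the same factor used in the inline example above
  round(grams / 1000 * 2.204623, digits)
}

# In the text you could then write:
# The mean penguin weighs `r grams_to_pounds(mean(penguins$body_mass_g, na.rm = TRUE))` pounds
grams_to_pounds(4200)
# [1] 9.3
```

With this helper, the 9.2632844 above would show up in the text as 9.3.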
Use ggplot2::theme_set to make all your plots look nice and consistent
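A setup chunk near the top of the .Rmd is a natural home for this. A minimal sketch, where theme_classic() and base_size = 14 are just example choices:

````markdown
```{r setup, include = FALSE}
# hide code, messages, and warnings in the rendered paper
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)
# every ggplot made after this call uses the same theme
ggplot2::theme_set(ggplot2::theme_classic(base_size = 14))
```
````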
gauchodown
gauchodown is an R package I adapted from Ben Marwick's huskydown to write dissertations that comply with the UCSB LaTeX dissertation template (at least as of 2018)
I can attest that it passes formatting requirements, but be sure to test it out and get it evaluated well before it's due if you decide to go that route!
Automating formatting of dissertation takes a massive time-sink off your plate (until the template breaks...)
Easily update all your results when you catch an error two days before your defense
Publish your results to a shareable website with one(ish) line of code!
Makes it much easier to come back to your work and get it ready for publication
But... if it breaks, Word is (mostly) easier to fix
Make sure you have an editing plan in place with your committee
email: danovan@uw.edu
GitHub: DanOvando
twitter: @danovand0
website: danovando.com