Phase 2: Writing & Analysis
- Create an executable script that documents the code required to load the raw data into a tabular format and, if applicable, de-identify human subjects.
  - Document this preprocessing (“data wrangling”) procedure in the `prepare_data.R` file.
  - This file is intended to document steps that cannot or should not be replicated by end users, unless they have access to the raw data file.
  - These are steps you would run only once, the first time you load data into R.
  - Make this file as short as possible; only include steps that are absolutely necessary.
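As a rough illustration of what such a script might contain, the sketch below reads a raw file and drops directly identifying columns. The file name `raw_data.csv`, the column names `name` and `email`, and the helper `deidentify()` are hypothetical; none of them are part of ‘worcs’. Dropping columns is only the simplest form of de-identification, and your data may require more.

```r
# prepare_data.R (illustrative sketch; file and column names are hypothetical)

# Drop columns that directly identify participants.
# `deidentify()` is a helper defined here for illustration, not a worcs function.
deidentify <- function(data, id_columns) {
  data[, !(names(data) %in% id_columns), drop = FALSE]
}

raw_file <- "raw_data.csv"  # hypothetical path to the raw data
if (file.exists(raw_file)) {
  raw <- read.csv(raw_file, stringsAsFactors = FALSE)
  clean <- deidentify(raw, id_columns = c("name", "email"))
  # `clean` is now ready to be saved with open_data() or closed_data()
}
```

Keeping the de-identification step in a small function makes it easy to check on a toy data frame before running it on the real data.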
- Save the data using `open_data()` or `closed_data()`.
  - WARNING: Once you commit a data file to the ‘Git’ repository, its record will be retained forever (unless the entire repository is deleted). Assume that pushing data to a ‘Git’ remote repository cannot be undone. Follow the mantra: “Never commit something you do not intend to share”.
  - When using external data sources (e.g., obtained using an API), it is recommended to store a local copy, to make the project portable and to ensure that end users have access to the same version of the data you used.
  - NOTE: The `open_data()` and `closed_data()` functions generate a codebook and possibly additional files as part of their output; don’t worry about all the new files added to your project.
- Write the manuscript in `Manuscript.Rmd`.
  - Use code chunks to perform the analyses. The first code chunk should call `load_data()`.
  - Finish each sentence with one carriage return (enter); separate paragraphs with a double carriage return.
- Regularly commit your progress to the Git repository; ideally, after completing each small and clearly defined task.
  - In the top-right panel of ‘RStudio’, select the ‘Git’ tab.
  - Select the checkboxes next to all files whose changes you wish to commit.
  - Click the Commit button.
  - In the pop-up window, write an informative “Commit message”.
  - Click the Commit button below the message dialog.
  - Click the green arrow labeled “Push” to send your commit to the remote repository.
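The same stage, commit, and push sequence can also be scripted from the R console. The sketch below assumes the ‘gert’ package is installed and that the working directory is the project's Git repository. `commit_progress()` is a hypothetical wrapper written for this example, and the file name and message in the usage line are placeholders.

```r
library(gert)

# Hypothetical helper: stage the given files, commit them, and optionally push.
commit_progress <- function(files, message, push = TRUE, repo = ".") {
  git_add(files, repo = repo)        # equivalent to ticking the checkboxes
  git_commit(message, repo = repo)   # equivalent to clicking the Commit button
  if (push) git_push(repo = repo)    # equivalent to clicking "Push"
  invisible(NULL)
}

# Example call (not run here):
# commit_progress("Manuscript.Rmd", "Describe the sampling procedure")
```

Passing the files explicitly, rather than staging everything, keeps the habit of only committing what you intend to share.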
- While writing, cite essential references with one at-symbol, `[@essentialref2020]`, and non-essential references with a double at-symbol, `[@@nonessential2020]`.
When writing in RMarkdown format, you use Markdown citekeys to refer to references, and these references will be stored in a separate text file known as a .bib file.
To ease this process, we recommend following this procedure for citation:
- During writing, maintain a plain-text `.bib` file with the BibTeX references for all citations.
  - You can export a `.bib` file from most reference manager programs; the free, open-source reference manager Zotero is excellent and user-friendly, and highly interoperable with other commercial reference managers. Here is a tutorial for using Zotero with RMarkdown.
  - Alternatively, it is possible to make this file by hand, copying and pasting each new reference below the previous one; e.g., Figure ?? shows how to obtain a BibTeX reference from Google Scholar. Simply copy-paste each reference into the `.bib` file.
- To cite a reference, use the `citekey`: the first word in the BibTeX entry for that reference. Insert it in the RMarkdown file like so: `@yourcitekey2020`. For a parenthesized reference, use `[@citekeyone2020; @citekeytwo2020]`. For more options, see the RMarkdown cookbook.
- To indicate a non-essential citation, mark it with a double at-symbol: `@@nonessential2020`.
- When knitting the document, adapt the `knit` command in the YAML header. `knit: worcs::cite_all` renders all citations, and `knit: worcs::cite_essential` removes all non-essential citations.
- Optional: To be extremely thorough, you could make a “branch” of the GitHub repository for the print version of the manuscript. Only in this branch, you use the function `knit: worcs::cite_essential`. The procedure is documented in this tutorial.
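To make the single versus double at-symbol convention concrete, the sketch below sorts the citekeys in a line of text into essential and non-essential ones. It is only an illustration of the syntax using base R regular expressions; it is not how ‘worcs’ processes citations, and the example sentence and the key pattern are assumptions.

```r
# Example sentence using the two citation styles described above
text <- "Prior work [@essentialref2020] and background [@@nonessential2020]."

# Find every citation marker: one or more at-symbols followed by a key.
# The set of characters allowed in a key is a simplifying assumption.
markers <- regmatches(text, gregexpr("@+[A-Za-z0-9_:-]+", text))[[1]]

# Keys with a double at-symbol are non-essential; the rest are essential
is_nonessential <- grepl("^@@", markers)
nonessential <- sub("^@@", "", markers[is_nonessential])
essential <- sub("^@", "", markers[!is_nonessential])
```

A check like this can help you confirm, before knitting, which references would disappear from a version rendered with non-essential citations removed.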