RMarkdown

This weekend I spend quite some time (quite to the dislike of my wife) working on getting a report compiled for work with RMarkdown.

My problem is basically that I spend quite some time on filling out compensation documents in ods/xls formats, handling the PDF documents containing the invoices and receipts and double checking all information. If I could cut this down to a minimum and still end up with an acceptable result for my employer, this would free some time for me and at the same time get rid of some unpleasant work. Double-win!

The current process involves some rather odd .ods template for creating a table with the information required by the financial department, sending in the document along with a couple of other documents showing the invoices, bla, bla, bla.

I had some problems with that:

  1. I needed some kind of office suite in order to be able to edit the template and get the table right. This is not always the case when I am taking notes (like on a mobile). I’d also have to choose the right place where to add the information.

  2. In order to be able to track the payments, I’d have to save the table or at least a copy of it. I’d again need a office suite to be able to read the file.

  3. I got no feedback if I entered something wrong or unexpected, like the date or an invalid value for the money spent.

Probably I’d rather go for a web interface and just import the stuff and let the finance-people deal with it, but that’s not where we are at the moment. So I addressed the problem with RMarkdown.

Datafile

The datafile contains just some basic information as CSV:

date;description;price
2017-01-01;Internet Invoice;200,00
2017-01-02;Other Invoice;199,00
  • Plain text files I can edit everywhere. Synced to a central place, it’s instantly updated and available on PCs and telephone.

  • Each dataset will end up in a separate repayment, easy to keep track of.

RMarkdown file

To put the whole markdown file in here would be a bit too much. But I need to keep some notes on problems I encountered.

General stuff

  • The target format is PDF. Therefore I need a working Latex environment to handle that. I already have that, so this is covered. I only need to make all necessary packages are installed.

  • The R environment needs some packages installed as well in order to get this to work.

YAML header

The .rmd file starts with a YAML header. That’s fine, the metadata needs to be somewhere, right?

  • To update the output document with the current date, the YAML header needs to fetch the current date automatically:

date: "`r format(Sys.time(), '%Y-%m-%d')`"
  • Creating a table in a PDF output isn’t that trivial at it might seem without involving too much Latex. The report should require as less customization as possible. After playing with flextable and xtable, I ended up using tabularx with xtable to get the layout I wanted. For the latex part this means this needs to be included into the Latex header as well. RMarkdown offers some shortcuts for this:

header-includes:
- \usepackage[utf8]{inputenc}
- \usepackage{tabularx}
  • To avoid hard coding the datafile, the RMarkdown file needs to use some parameters. This requires adding some lines to the header as well:

params:
  datafile: "X"

The important part here is that the parameter needs to be called params, otherwise it will fail.

RMarkdown code

  • RMarkdown tends to repeat all the commands used during the computation in the output and then print the result as well. This can either be suppressed for each code chunk with r, echo=FALSE, or once globally:

knitr::opts_chunk$set(echo = FALSE)
  • I wanted to keep personal data out of the RMarkdown code as much as possible - one idea behind this would be that this script might be want to be used by somebody else within the company. Therefore the personal information is included from a third file in the user home directory. It turned out that this isn’t necessarily as easy as it sounds, handling line breaks and all of that.

```{r, results="asis"}
userConfigFile <- paste(Sys.getenv("HOME"), "/.config.rmd", sep='')
if(file.exists(userConfigFile)){
  source(userConfigFile)
  cat(address)
} else {
  address <- '* Create ~/.config.rmd with data in "address" containing what you want to see here.'
}
```
  • The main table is created quite simple actually. It’s put to 100% of the page width and formatting the numbers correctly as well. But I have no idea why the 100% width actually works this way.

```{r xtable, results="asis"}
#
# Main table with data
#
library(xtable)

mytable = xtable(mydata, latex.environments = "center")
align(mytable) <- "|l|l|X|r|"  # Align columns and borders
digits(mytable) <- 2  # Two digits
colnames(mytable) <- c('Dato', 'Beskrivelse', 'Beløp') # Change column names
print(mytable,
      type="latex",
      include.rownames = FALSE,
      tabular.environment = "tabularx",
      floating = TRUE,
      width = "\\textwidth",
      comment = FALSE)
```
  • One problem arose with the fields to sum up the numbers. xtable does not seem to have a way of adding rows to the bottom and quite frankly this would mess up the data as well. So I went for the compromise of adding that information in a separate dataset and put a table under the main table. The principle is the same as with the main table, just the formatting differs.

  • Last but not least: signatures. I never understood why electronic documents needs to have analogue signatures. Creating fields for them on the other hand isn’t quite a trivial in RMarkdown as in plain Latex.

    I basically repeated the approach from the sums-table and created a table with three empty fields, put the row names on top of it and widened it to 100% of the document. A Latex command vspace*{fill} will make sure the table is at the bottom of the page.

    The initial idea was to put it in the Latex page footer instead. It turns out that the Latex document class article reformats the footer and header on the title page when using the command maketitle. The standard pandoc Latex template uses exactly this command and so I ended up with no footer on the first page of the output. Great. This left me with the option to either alter the Latex template and repeat this on other machines, or to go for the solution mentioned above.

Rendering

Rendering RMarkdown from within RStudio is simple. A simple click on the button knitr will drop the pdf document somewhere. But there’s no beauty in this. Rendering from the command line turns out a bit too much to remember it once a month. A small bash function wraps this up for me, makes sure the files I am using actually are there and adjusts the output file name to the input filename + .pdf.

function rmd2pdf() {
  : ${1:?RMD file not specified. Abort.}
  : ${2:?Datafile (CSV) not specified. Abort.}
  inputFile=$(readlink -f "${1}")
  csvFile=$(readlink -f "${2}")

  if [[ ! -e "${inputFile}" ]]; then
    echo 'Input file not accessible. Abort.'
    return 1
  fi
  if [[ ! -e "${csvFile}" ]]; then
    echo 'Data file not accessible. Abort.'
    return 1
  fi

  R -e "rmarkdown::render('${inputFile}', params= list( datafile= '${csvFile}'), output_file = file.path("\'$(dirname ${csvFile})\'", "\'$(basename ${csvFile%.*}).pdf\'"))"
}