The aim of this tutorial is to get you started with a minimalist yet complete and fully working R package that includes a part written in C. This post doesn’t go into the details of R package internals; it is just here to:
- list the required tools as well as some basic installation tips
- provide a small, compilable package written in R and C
- provide pointers to understand the package’s files
NB: this tutorial focuses on creating and building a package on Windows. For Linux things are fairly similar, except that:
- most development tools are probably already installed. On Ubuntu 10.10 x64, where I tested, I had nothing special to install apart from LaTeX (apt-get install latex-make), a font package for LaTeX (apt-get install texlive-fonts-recommended), and maybe texlive-latex-extra if you want to use more “advanced” LaTeX commands such as \href (\href is used in this package, however even with hyperref installed it doesn’t seem to work). Without the fonts you’ll get this kind of error when checking: ! \textfont 0 is undefined (character p). \Url@FormatString ...\Url@String \UrlRight \m@th $
- the command for installing and compiling will have to be run as root.
Getting the tools
The first step is to get the helloworld package (Megabitload mirror, MegaUpload mirror), then the required tools:
- The latest version of R (you’ll have a hard enough time keeping the package up-to-date, so you probably don’t want to use deprecated functions from the beginning). If for some reason you need an older version of R, note that on Windows you can install any number of different R versions in different folders. Also, you need to add the latest version’s bin folder to your PATH.
- (optional) A decent code editor, like Notepad++.
- The R development environment. Basically, everything you need should be in the Windows toolset (Rtools). This will install some specific tools, as well as Cygwin and MinGW (both i386 and x64).
If you already have Cygwin and it’s in your PATH, I think the best option is to write down what Rtools needs and make sure you have those tools in your Cygwin installation (don’t install Cygwin using Rtools, but just re-run the Cygwin installer and add the missing packages). This solution currently works, but the Rtools developers warn that Rtools sometimes really needs a very specific version of the GNU tools in Cygwin. Usually the latest version of the GNU tools should be fine, but if it’s not, you may have to uninstall your Cygwin and install the one in Rtools.
If you already have MinGW, it’s probably better to still install the one that comes with Rtools. Once again, the version really seems to matter, plus this way it should handle both 32- and 64-bit compilation “out of the box”. This way I ended up with 2 MinGWs in my PATH (Rtools’s and one that I installed for Code::Blocks), plus another MinGW within Qt which didn’t register in the PATH… that sounds messy, but nothing’s broken so… good enough.
- (optional) MiKTeX (or equivalent), if you want to build PDF help files by running R CMD check.
Then, instead of writing yet another long tutorial, we’ll just study a simple Hello World package, which should compile fine with R 2.12.1. But first we’ll just compile it 😉
Compiling the package
Unpack the source in some working directory, and let’s say you did it so that the file DESCRIPTION is at c:/workdir/helloworld/DESCRIPTION.
Open a command prompt, and go to c:/workdir/.
First we’ll need to build the source package:
R CMD build helloworld
Then, we’ll need to install the package, compiling it and creating the Windows ZIP package in the process:
R CMD INSTALL --build helloworld_1.0-1.tar.gz
(NB: 1.0-1 is the version number mentioned in DESCRIPTION).
That’s it, you should now have the package installed in the R installation that’s in your PATH, plus a brand new Windows package. Note that there is a way to compile the Windows ZIP package without installing it, but this generates lots of files (compiled files, etc) within the working source folder, so it’s really messy… and I forgot how to do it since I didn’t find it worth remembering.
You can also run a bunch of checks on the package, to see if all is well. This will also produce a PDF version of the help pages. To run the checks, use:
R CMD check helloworld
Reviewing the files
The smallest possible package you can build contains 2 files:
- c:/workdir/helloworld/DESCRIPTION
- c:/workdir/helloworld/R/helloworld.R
In this example, we also added:
- c:/workdir/helloworld/NAMESPACE
- c:/workdir/helloworld/src/helloworld.c
- c:/workdir/helloworld/man/helloworld-package.Rd
- c:/workdir/helloworld/man/helloWorldR.Rd
The description file
It’s located in the package root folder, so: c:/workdir/helloworld/DESCRIPTION. As its name suggests, it provides some description of the package. I think it’s fairly self-explanatory, as you can see from its content:
Package: helloworld
Version: 1.0-1
Date: 2011-02-12
Title: A simple Hello World R package
Author: PatheticCockroach <https://www.patheticcockroach.com>
Maintainer: John Smith <john.smith@tardis.net>
Depends: R (>= 2.12.1)
Suggests: MASS
Description: A minimalist package saying hello and using a tiny part written in C
License: GPL (>= 3)
URL: https://www.patheticcockroach.com
BugReports: https://www.patheticcockroach.com
LazyLoad: yes
Note that:
- Some items are optional, while others are mandatory. This example file contains a bit of both.
- The DESCRIPTION file should end with a newline, i.e. leave an empty line at the end. Otherwise you’ll get a warning when building the package (Warning in readLines(ldpath) : incomplete final line found on ‘helloworld/DESCRIPTION’).
- The file should be encoded in ASCII. Since this doesn’t seem to be an option in Notepad++, I found out that UTF8 without BOM is good enough.
R source files
They’re located in a subfolder named “R”, so: c:/workdir/helloworld/R/. Obviously, they contain functions written in R. For the example we only created one file, but you can split your functions over as many files as you want. Nothing special there, really, so I’ll just show a reduced version of helloworld.R from the package, which only contains one function:
helloWorldR <- function(n, m, ...) {
  # n and m must be doubles (e.g. 6 and 7, not 6L and 7L),
  # because the C code reads them with REAL()
  test42 = .Call("helloWorldC", n = n, m = m, PACKAGE = "helloworld");
  if (test42 == 1) return("Hello World, the solution is 42 indeed :)");
  return("Hello World, that wasn't the solution :(");
}
Note that .Call is used to call the function written in C.
The namespace file
It's located in the package root folder, so: c:/workdir/helloworld/NAMESPACE. I have to admit that I didn't really understand the whole point of that file. In this example, we use it to load the DLL file (the result of compiling the C part), as well as to make the helloWorldR function accessible. One of the purposes of the NAMESPACE file is to hide package functions which are not exported from the end user, so you can define functions in your package that can only be used by other functions of the package but not directly by the user. Here's the source:
useDynLib("helloworld")
export("helloWorldR")
C source files
They’re located in a subfolder named “src”, so: c:/workdir/helloworld/src/. That src folder can host source files written in other languages, but that’s getting beyond the scope of this post (and, more particularly, beyond my knowledge). Note that R doesn’t hand you the usual “int” and “double” and such, but its own special type, SEXP (about which I haven’t found much documentation yet), and lots of pointers, as you can see from the source:
#include <R.h>
#include <Rinternals.h>

SEXP helloWorldC(SEXP n, SEXP m) {
    /* n and m arrive as R numeric vectors; REAL() points to their data */
    double *nn = REAL(n), *mm = REAL(m);
    double *res = NULL;
    SEXP result;
    // allocate result
    PROTECT(result = allocVector(REALSXP, 1));
    res = REAL(result);
    *res = *nn * *mm;
    if (*res == 42) *res = 1;
    else *res = 0;
    UNPROTECT(1);
    return result;
}
All this for a simple multiplication… I had a hard time fixing all the access violations I got from using a much simpler syntax that used to work fine in Code::Blocks. Now at least it compiles and works, and that’s enough for the aim of this guide…
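One way to keep the R-specific part small is to put the actual computation in plain C, with no R types at all, and let helloWorldC only convert between SEXP and double. Here is a minimal sketch of that idea. The helper name is_solution is hypothetical (it is not in the helloworld package), and it uses the same exact comparison with 42 as the code above:

```c
/* Hypothetical helper, not part of the helloworld package:
   the same arithmetic as helloWorldC, but on plain doubles,
   so it needs no R headers and can be tested on its own. */
double is_solution(double n, double m)
{
    double product = n * m;

    /* 1 if the product is exactly 42, 0 otherwise */
    return (product == 42) ? 1.0 : 0.0;
}
```

With such a helper, the arithmetic in helloWorldC would probably reduce to a single line like *res = is_solution(*nn, *mm); and the helper itself can be compiled and checked with any C compiler, outside of R.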
Help files
Help files are located in a subfolder named “man”. There is one help file for the package, c:/workdir/helloworld/man/helloworld-package.Rd, and one help file per function, so here we just have c:/workdir/helloworld/man/helloWorldR.Rd. Help files are not mandatory to build a package, but as far as I saw, every published package has help files. Their syntax is like LaTeX, but R doesn’t support nearly as many tags as the real thing. Still, basic LaTeX formatting should work. The source is a bit too big to be posted here; just note that, like for the DESCRIPTION file, some fields are mandatory and some are optional.
References
They’re all mentioned in the package (that’s part of learning to structure the documentation ;)), but here they are again:
- R Development Core Team (2010): Writing R Extensions. R Foundation for Statistical Computing, Vienna, Austria.
- Leisch, Friedrich (2008): Creating R Packages: A Tutorial. In: Brito, Paula (ed.), Compstat 2008 – Proceedings in Computational Statistics. Physica Verlag: Heidelberg, Germany.
- R.M. Ripley. Making an R package. Department of Statistics, University of Oxford, 2008/9.
- PA. Cornillon, A. Guyader, F. Husson, N. Jegou, J. Josse, M. Kloareg, E. Matzner-Lober, L. Rouviere. Construire un package R. 2008.
- R Development Core Team (2010): R Installation and Administration. R Foundation for Statistical Computing, Vienna, Austria.
Thanks for the post, if you’re looking for more info on R with C inside check out Rcpp (C++ actually)
Yeah it seems that Rcpp is the way to go, however even that isn’t quite simple IMO. In the end I just programmed my thing in C++ then called it using things such as shell.exec and batch and csv files. A bit dirty but so much easier.
Thanks, worked like a charm. Very helpful. Second that Rcpp looks very good too.
It sure looks good. Probably if I managed to get it working it would kick a** 🙂 But it’s all in the if… 🙁
Thanks! Good post for starting from scratch.
I used this package fruitfully to start writing packages. I have one remark: I prefer to separate the “normal” C code from the tiny piece of code using R-specific vocabulary, which makes the link with R.
That is why I propose a new version of your package here:
http://jer.collet.free.fr/R_hello_world_tutorial_1.0-1.zip
Thanks. TBH I pretty much gave up R packages with C inside now. Also I found a miniguide making it quite trivial to create a package using devtools and roxygen2 in RStudio (but that doesn’t seem to allow the use of C/C++)
http://datavu.blogspot.com.es/2015/01/how-to-create-and-publish-r-package-on.html