This is a very quick introduction to R for beginners.

Installing R

R is a free, open-source statistical software environment available for various platforms such as Windows, Mac OS and Linux. To find the latest version of R, look at the CRAN (Comprehensive R Archive Network) website, http://cran.r-project.org/, where binaries are available for Linux, Mac and Windows.

An additional tutorial may be found here, or you may refer to the CRAN manual R Installation and Administration.

What is an R session?

When you open the R console, you see the symbol “>”. This is the R prompt: the commands needed for a particular task are typed after this prompt and executed when you press Enter. Throughout this document, examples of submitted R code are presented in code blocks such as this one:

print("Hello world")
[1] "Hello world"

The output of the submitted R code is printed in the console after execution. The number in brackets “[1]” is the index of the first vector component on the output line: R treats the character string “Hello world” as a vector of length one. This index is not useful here, but it helps when a long vector is printed over several lines.
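To see both points in action, the short sketch below checks that a single string is a vector of length one, then prints a longer vector. How many components fit on each output line depends on the width of your console, so no exact output is shown for the last command.

```r
# A single string is a character vector of length one
s <- "Hello world"
length(s)   # number of vector components: 1
nchar(s)    # number of characters in the string: 11
# A longer vector wraps over several output lines; each line
# starts with the bracketed index of its first component
print(101:130)
```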

You can insert comments in your R code; comments are not run by R. To write a comment, start it with a hash “#”: everything from the hash to the end of the line is ignored.

# a comment
2+3
[1] 5

R is a functional, object-based language. Objects can be of many types and classes; three object classes, vector, matrix and list, are investigated further in the section More on data types below. The most common types of basic elements are integer (the values 0, +/-1, +/-2, …), double (real numbers), character (any text enclosed in a pair of double quotes " or single quotes ') and logical (the values TRUE and FALSE).

The assignment operator is “<-”. It is often useful to store values in objects and then do calculations on these stored values; the results of your calculations can be stored as well.

# create an object named x
x <- 5
# check the value of x
x
[1] 5
# use the stored value of x in a calculation
x+2
[1] 7
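The basic types listed above can be checked with the function typeof. In this sketch, note that a plain number such as 5 is stored as a double; the suffix L is needed to get an integer. The last lines store the result of a calculation in a new object, y, whose name is only illustrative.

```r
# Type of a few basic elements
typeof(5L)      # "integer" (the L suffix makes an integer)
typeof(5)       # "double" (the default for numbers)
typeof("five")  # "character"
typeof(TRUE)    # "logical"
# Store the result of a calculation in a new object
x <- 5
y <- x + 2
y               # 7
```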

When you quit R, you are asked whether you want to save the workspace. If you choose “no”, all your stored values are lost. If you choose “yes”, your stored values are saved in a file named .RData in the current working directory, and they are restored the next time you launch R from that directory.
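The workspace can also be inspected and saved without quitting R. The sketch below lists the stored objects with ls(), saves the whole workspace with save.image(), removes x and restores it with load(). To avoid overwriting a real .RData file, it saves to a temporary file; the name ws is hypothetical.

```r
x <- 5
ls()                                 # names of the objects stored in the workspace
ws <- tempfile(fileext = ".RData")   # hypothetical file; R itself uses ".RData"
save.image(file = ws)                # save the whole workspace now, without quitting
rm(x)                                # remove x from the workspace
load(ws)                             # restore the saved objects
x                                    # 5 again
```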

Rather than saving the workspace when you quit R, a recommended practice is to save your R commands in a script. An R script is a plain text file, preferably written in an external text editor. We recommend RStudio (see the next subsection), but Emacs or Notepad++ would also do. The extension for R code files is .R. To run the instructions written in a script, copy and paste them from the editor to the R console, or run the command source("script.R") to execute the file script.R (provided that script.R is in the R working directory).
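As a minimal sketch of this workflow, the block below writes a two-line script to a temporary file, which stands in for script.R so that no particular working directory is assumed, and then runs it with source(). The object names y and z are only illustrative.

```r
# Write a two-line script to a temporary file (a stand-in for script.R)
script <- tempfile(fileext = ".R")
writeLines(c("y <- 2 + 3", "z <- y * 10"), script)
# Run every command in the script, as if typed at the prompt
source(script)
z   # 50
```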

Installing RStudio

Once R is installed, you can choose to work either with the basic R console, or within an integrated development environment (IDE). RStudio is a popular IDE for R and supports debugging, workspace management and plotting.

Start an R session

Set R working directory

Files identified simply by their names (i.e. without specifying any path) refer to files in the R working directory on your system.
To see where R is currently working, use the command getwd() in the R console; it prints the current R working directory. Usually, the default R working directory is not a convenient directory. In order to read files from or write files to a specific location, it is recommended to set the R working directory to a convenient directory rather than preceding file names with a (long) path.
The command setwd("path/to/workDir") sets the R working directory to “workDir”.
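The sketch below illustrates how the working directory affects file names without a path. It switches to R's temporary directory (standing in for your own workDir), writes a file named notes.txt there, a hypothetical name, and then restores the previous working directory, which setwd() conveniently returns.

```r
# setwd() returns the previous working directory, kept here to restore it
old <- setwd(tempdir())
getwd()                                          # now the temporary directory
# A file name without a path refers to the working directory
writeLines("hello", "notes.txt")
file.exists(file.path(tempdir(), "notes.txt"))   # TRUE
# Go back to the original working directory
setwd(old)
```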

Loading additional packages

A package is a collection of R functions, data, documentation and compiled code designed to address a specific problem. The directory where packages are installed is called a library. R comes with some standard packages which are automatically installed when you install R. To run our code, additional R packages are needed, such as the png package. These additional packages do not come with the standard installation of R, so you need to install them once (with the function install.packages). Once a package is installed, you have to load it (with the function library) in each R session.

# Installing a package with root access (into the default library)
# NB: the package is downloaded from the CRAN mirror you have selected.
install.packages("png")
# loading the package
library(png)

# Installing a package without root access:
# - choose an existing directory where the installed packages will be stored
# NB: in this tutorial, we have chosen "/data/Rpackages/"
# - install from a CRAN mirror into that directory
install.packages("png", lib="/data/Rpackages/")
# - or install a source archive downloaded beforehand from
#   https://cran.r-project.org/web/packages/png/index.html
install.packages("png_0.1-7.tar.gz", repos=NULL, type="source", lib="/data/Rpackages/")
# loading the package
library(png, lib.loc="/data/Rpackages/")
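If you do not want to pass lib.loc every time, one option is to put your personal library first in the library search path with .libPaths(). library() then searches it, and install.packages() uses the first entry of that path as its default lib. In this sketch a temporary directory stands in for "/data/Rpackages/", and the change lasts only for the current session.

```r
# Hypothetical personal library; in this tutorial it would be "/data/Rpackages/"
my_lib <- file.path(tempdir(), "Rpackages")
dir.create(my_lib, showWarnings = FALSE)
# Prepend it to the library search path (current session only)
.libPaths(c(my_lib, .libPaths()))
.libPaths()
# Check whether png can be found in one of these libraries (TRUE or FALSE)
requireNamespace("png", quietly = TRUE)
```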

Online help

Numerous R tutorials may be found online. The CRAN website provides some official documentation here, and additional contributed manuals here. You can also consult R’s built-in help page for any R function. For example, to read about the function mean, enter ?mean.
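A few other ways to use the built-in help are sketched below. The search phrase "standard deviation" is only an example; help.search() looks through the help pages of all installed packages, which may take a moment the first time.

```r
# Equivalent ways to open the help page of the function mean
?mean
help(mean)
help("mean")
# Search the installed help pages for a phrase (same as ??"standard deviation")
help.search("standard deviation")
# Run the examples given at the bottom of the help page
example(mean)
```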

More on data types

Vector

A vector is a finite sequence of elements of a single basic type (integer, numeric, character, logical, …). In particular, R has no separate scalar type: a single number is a numeric vector of length one. The function c(...) combines its arguments into a vector.

# define a vector x of length 2 having components 5 and 10
x <- c(5,10)
x
[1]  5 10
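Because all the components of a vector share a single basic type, c(...) converts mixed arguments to a common type. The sketch below shows this, together with the fact that a single number is a vector of length one.

```r
# A single number is a vector of length one
length(5)      # 1
# Mixing types: every component is converted to a common type
c(5, "10")     # "5" "10" (character)
c(TRUE, 2)     # 1 2 (numeric; TRUE becomes 1)
```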

To create a vector from a simple sequence of integers, use the colon “:”.

# the integers from 6 to 11
x <- 6:11
x
[1]  6  7  8  9 10 11
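The colon only produces steps of 1 (or -1). For other regular patterns, the base functions seq and rep are handy; the values below are arbitrary examples.

```r
# A sequence with a step other than 1
seq(0, 1, by = 0.25)      # 0.00 0.25 0.50 0.75 1.00
# A decreasing sequence with the colon operator
11:6                      # 11 10 9 8 7 6
# Repeat a vector several times
rep(c(1, 2), times = 3)   # 1 2 1 2 1 2
```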

Specific components of a vector x can be accessed by placing their index inside square brackets “[]”. Indexing starts at 1.

# print the second component of x
x[2]
[1] 7
# print the components of x from index 3 to 5
x[3:5]
[1]  8  9 10
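The index inside the brackets can itself be a vector. The sketch below, using the same x as above, selects components by a vector of positions, drops a component with a negative index, and keeps the components satisfying a condition with a logical index.

```r
x <- 6:11
x[c(1, 6)]   # first and sixth components: 6 11
x[-1]        # all components except the first: 7 8 9 10 11
x[x > 8]     # components greater than 8: 9 10 11
```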

To get the number of components of a vector, use the function length.

length(x)
[1] 6

Matrix

A matrix is a collection of elements of a single basic type arranged in a two-dimensional rectangular array. To create a matrix from a vector x, use the function matrix; by default, the matrix is filled column by column.

# create a matrix called A with 2 rows and 3 columns, the components of which are those stored in x
A <- matrix(x,nrow=2,ncol=3)
# print A
A
     [,1] [,2] [,3]
[1,]    6    8   10
[2,]    7    9   11
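To fill the matrix row by row instead, set the argument byrow to TRUE. In this sketch the matrix is named B, an illustrative name, to keep A unchanged.

```r
x <- 6:11
# Fill row by row instead of column by column
B <- matrix(x, nrow = 2, ncol = 3, byrow = TRUE)
B
#      [,1] [,2] [,3]
# [1,]    6    7    8
# [2,]    9   10   11
```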

To get the number of rows and columns of a matrix, use the function dim.

# print the dimension of A
dim(A)
[1] 2 3
# print the number of rows of A
dim(A)[1]
[1] 2
# print the number of columns of A
dim(A)[2]
[1] 3
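The base functions nrow and ncol are shortcuts for the two components of dim, as this short sketch shows.

```r
A <- matrix(6:11, nrow = 2, ncol = 3)
nrow(A)   # 2, same as dim(A)[1]
ncol(A)   # 3, same as dim(A)[2]
```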

The element in row m and column n of A can be accessed through the expression A[m,n]. Leaving one index empty selects all rows or all columns: A[m,] is row m and A[,n] is column n.

# print the element at the second row and third column of A
A[2,3]
[1] 11
# print the second row of A
A[2,]
[1]  7  9 11
# print the third column of A
A[,3]
[1] 10 11
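Several rows or columns can be selected at once with a vector of indices. Note in this sketch that a single row or column is returned as a plain vector by default; the argument drop = FALSE keeps it as a matrix.

```r
A <- matrix(6:11, nrow = 2, ncol = 3)
# Select columns 1 and 3
A[, c(1, 3)]
#      [,1] [,2]
# [1,]    6   10
# [2,]    7   11
# A single column is a plain vector (no dimension) by default
dim(A[, 3])                 # NULL
# drop = FALSE keeps it as a 2 x 1 matrix
dim(A[, 3, drop = FALSE])   # 2 1
```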

List

A list is an ordered collection of R objects, which can be of different types or classes. In the example below, mylist is a list with four components: the first one is the vector x, the second one is the matrix A, the third one is a logical vector of length 4, and the last one is a list of length 2. The function str prints the structure of any R object.

# create a list of four components; only the first two are named
mylist <- list(myvector=x, mymatrix=A, c(TRUE,FALSE,FALSE,FALSE), list('cat','dog'))
str(mylist)
List of 4
 $ myvector: int [1:6] 6 7 8 9 10 11
 $ mymatrix: int [1:2, 1:3] 6 7 8 9 10 11
 $         : logi [1:4] TRUE FALSE FALSE FALSE
 $         :List of 2
  ..$ : chr "cat"
  ..$ : chr "dog"

The length of a list is the number of objects it contains (a nested list counts as a single object). To get it, use the function length.

# print the length of the list called "mylist"
length(mylist)
[1] 4

The object of the list mylist at position m can be accessed with double square brackets, through the expression mylist[[m]].

# print the third object of the list called "mylist"
mylist[[3]]
[1]  TRUE FALSE FALSE FALSE
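Single and double square brackets behave differently on lists: mylist[3] returns a sub-list of length one, whereas mylist[[3]] returns the object itself. Double brackets can also be chained to reach inside the nested list. The sketch rebuilds mylist so that it runs on its own.

```r
x <- 6:11
A <- matrix(x, nrow = 2, ncol = 3)
mylist <- list(myvector = x, mymatrix = A, c(TRUE, FALSE, FALSE, FALSE), list("cat", "dog"))
# Single brackets return a list containing the third object
sub <- mylist[3]
class(sub)          # "list"
# Double brackets return the third object itself
obj <- mylist[[3]]
class(obj)          # "logical"
# Second element of the nested list (fourth object of mylist)
mylist[[4]][[2]]    # "dog"
```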

Some objects of the list mylist have a name: the first object of mylist is named myvector, whereas the third one has no name. A named object can be accessed with the $ operator; for instance, the object named myvector is accessed through the expression mylist$myvector.

# print the object of the list "mylist" named "myvector"
mylist$myvector
[1]  6  7  8  9 10 11
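The names of all the objects of a list are returned by the function names, with an empty string for unnamed objects. A name can also be used inside double brackets, and assigning to a new name adds an object to the list. In this sketch the name animal is only illustrative.

```r
x <- 6:11
A <- matrix(x, nrow = 2, ncol = 3)
mylist <- list(myvector = x, mymatrix = A, c(TRUE, FALSE, FALSE, FALSE), list("cat", "dog"))
names(mylist)                                       # "myvector" "mymatrix" "" ""
identical(mylist[["myvector"]], mylist$myvector)   # TRUE
# Add a new named object (hypothetical name "animal")
mylist$animal <- "cat"
length(mylist)                                      # 5
```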

