R Code Optimization

Code Optimization

Consider a case where we need to work with a huge dataset. While processing it, we may run into performance problems in R; for example, the code may take much longer to run than expected.

One way to speed up the whole program is to target a few key areas specifically.

Some best practices for efficient performance include, but are not limited to, the following:

Sorting and Ordering

Sorting with hand-written loops can cause performance problems, but R's built-in sort() function is a very fast way to organize data.
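To illustrate the gap, here is a minimal sketch that compares a naive insertion sort written with explicit R loops against the built-in sort(). The insertion_sort() function is a hypothetical helper written for this comparison, not part of R. Both approaches return the same result; timings depend on the machine, so none are quoted here.

```r
# Hypothetical helper: a naive insertion sort written with explicit R loops
insertion_sort <- function(v) {
  for (i in seq_along(v)[-1]) {
    key <- v[i]
    j <- i - 1
    # && stops before indexing v[0] once j reaches 0
    while (j >= 1 && v[j] > key) {
      v[j + 1] <- v[j]  # shift larger elements one place to the right
      j <- j - 1
    }
    v[j + 1] <- key     # drop the key into its sorted position
  }
  v
}

set.seed(123)
x <- runif(1000)  # illustrative random data

loop_time    <- system.time(by_loop <- insertion_sort(x))[["elapsed"]]
builtin_time <- system.time(by_sort <- sort(x))[["elapsed"]]

identical(by_loop, by_sort)  # same result, very different cost
c(loop = loop_time, builtin = builtin_time)
```

The loop version does O(n^2) interpreted work, while sort() runs compiled code, so the gap grows quickly with the length of x.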

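R's sort() function supports three sorting algorithms, selected with the method argument: shell sort ("shell"), quicksort ("quick", numeric vectors only) and radix sort ("radix"). Radix sort was added in R 3.3.0 and is typically the fastest option in most situations. The default, method = "auto", already uses radix sort for short (fewer than 2^31 elements) numeric, integer and logical vectors and factors, and shell sort otherwise.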
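A minimal sketch of choosing the algorithm explicitly, assuming a plain numeric vector (sort() passes the method argument on to sort.int()). All three methods must give the same ordering; only their speed differs.

```r
# Illustrative data: 100,000 random doubles
set.seed(42)
x <- runif(1e5)

# The method argument is passed through sort() to sort.int()
s_shell <- sort(x, method = "shell")
s_quick <- sort(x, method = "quick")  # numeric vectors only
s_radix <- sort(x, method = "radix")

# All three algorithms produce the same ordering
identical(s_shell, s_radix) && identical(s_shell, s_quick)

# Rough timings in seconds; results vary between machines and R versions
sapply(c("shell", "quick", "radix"), function(m) {
  system.time(sort(x, method = m))[["elapsed"]]
})
```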

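Partial sorting is another useful trick. For instance, when you only need the first 10 values of the sorted result, use the partial argument, i.e. sort(x, partial = 1:10). This guarantees only that positions 1 to 10 hold their correct sorted values, which is less work than sorting the whole vector.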
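A small sketch of partial sorting on illustrative data. As far as I know, partial sorting cannot be combined with decreasing = TRUE, so to get the largest values this sketch partially sorts the last positions instead and reverses just that slice.

```r
x <- c(42, 7, 19, 3, 88, 61, 25, 14, 5, 70, 33, 9)  # illustrative data

# Only positions 1 to 5 are guaranteed to hold their final sorted values
p <- sort(x, partial = 1:5)
smallest5 <- p[1:5]

# For the five largest values, fix the last five positions instead
n <- length(x)
q <- sort(x, partial = (n - 4):n)
largest5 <- rev(q[(n - 4):n])

smallest5  # 3 5 7 9 14
largest5   # 88 70 61 42 33
```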

Reversing elements

To reverse the order of a vector's elements, we can use the rev() function. However, if you want to sort in decreasing order, do not sort first and then reverse: use the more efficient sort(x, decreasing = TRUE) instead of rev(sort(x)).
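The comparison below is a minimal sketch on illustrative data (a random permutation of 1 to 1e6). Both expressions return the same vector; the single call avoids the extra pass and copy made by rev().

```r
set.seed(1)
x <- sample(1e6)  # illustrative data: a random permutation of 1:1e6

# Two-step approach: sort ascending, then reverse the whole vector
t_rev <- system.time(a <- rev(sort(x)))[["elapsed"]]
# One-step approach: ask sort() for decreasing order directly
t_dec <- system.time(b <- sort(x, decreasing = TRUE))[["elapsed"]]

identical(a, b)  # same values in the same order
c(rev_sort = t_rev, decreasing = t_dec)  # timings vary by machine
```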

Logical Response

The which() function returns the indices of the elements of a logical vector or array that are TRUE. To find the index of just the minimum or maximum value, you could use which(x == min(x)), but the dedicated which.min() and which.max() functions are more efficient.
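A short illustration on a made-up vector. One behavioural difference is worth knowing: which.min() and which.max() return only the first matching index when there are ties, whereas which(x == min(x)) returns all of them.

```r
x <- c(4, 1, 7, 1, 9)

which(x > 3)        # 1 3 5: indices of the TRUE elements
which(x == min(x))  # 2 4: every index holding the minimum
which.min(x)        # 2: only the first index of the minimum
which.max(x)        # 5
```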

Let's Optimize

Logical AND and OR

The logical AND (&) and logical OR (|) operators in R are vectorised, and they are commonly used in subsetting operations.

Syntax
x < 0.4 | x > 0.6

With the vectorised logical OR, both conditions are evaluated regardless of whether the first one is TRUE or FALSE. In the comparison above, R always computes x > 0.6 for every element, whatever the value of x < 0.4.
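A minimal sketch of both points: the expression above used for subsetting, and a counter showing that the right-hand side of | is evaluated even when every element of the left-hand side is already TRUE. The upper_test() function is a hypothetical helper that exists only to count calls.

```r
x <- c(0.1, 0.5, 0.9, 0.3)
in_tails <- x < 0.4 | x > 0.6  # element-wise OR over the whole vector
x[in_tails]                    # 0.1 0.9 0.3

# Count how often the right-hand side is evaluated
rhs_calls <- 0
upper_test <- function(v) {    # hypothetical helper used only for counting
  rhs_calls <<- rhs_calls + 1
  v > 0.6
}
y <- c(0.1, 0.2)               # every element already satisfies y < 0.4
res <- y < 0.4 | upper_test(y)
rhs_calls                      # 1: the right-hand side was still evaluated
```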

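The non-vectorised versions, && and ||, evaluate their second operand only when it is needed to determine the result; this is called short-circuit evaluation. They expect single TRUE/FALSE values: since R 4.3.0, supplying a vector longer than one is an error.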
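A small sketch of short-circuiting with &&, assuming a hypothetical expensive_check() function that stands in for any costly test. The second part shows a common pattern: guarding a computation with a cheap check so the expensive part is skipped when the cheap check already decides the answer.

```r
calls <- 0
expensive_check <- function() {  # hypothetical stand-in for a costly test
  calls <<- calls + 1
  TRUE
}

r1 <- FALSE && expensive_check()  # FALSE: right-hand side is skipped
r2 <- TRUE && expensive_check()   # TRUE: right-hand side is needed
calls                             # 1: expensive_check() ran only once

# Typical use: mean() is only computed when there are no missing values
x <- c(2, NA, 5)
ok <- !anyNA(x) && mean(x) > 0    # FALSE, and mean(x) is never evaluated
```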

CODE/PROGRAM/EXAMPLE
#read.csv is only executed if the file exists