 R Apply Functions


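Loops such as for and while are the usual way to repeat a computation in languages like C and Java. R supports these loops too, but it also offers the apply family of functions, which apply a function to each element, row, column or group of an object in a single call.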

R has several apply functions, each for a different purpose:

• apply(): applies a function over the margins of an array
• lapply(): loops over a list of elements to evaluate a function on each of them
• sapply(): same as lapply(), but simplifies the result
• tapply(): applies a function over subsets of a vector
• mapply(): multivariate version of sapply()

Apply functions replace an explicit loop with a single function call, which usually makes iteration code shorter and easier to read.
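As a minimal sketch of that idea, the snippet below squares each number in 1:5 twice: first with an explicit for loop, then with sapply(). The variable names are illustrative only.

```r
# Explicit loop: preallocate a result vector and fill it element by element
squares_loop <- numeric(5)
for (i in 1:5) {
  squares_loop[i] <- i^2
}

# Same computation with sapply(): no index bookkeeping or preallocation needed
squares_apply <- sapply(1:5, function(i) i^2)

print(squares_loop)   # [1]  1  4  9 16 25
print(squares_apply)  # [1]  1  4  9 16 25
```

Both versions produce the same numeric vector; the sapply() form simply hides the loop.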

apply() returns a vector, array or list of values obtained by applying a function to the margins of an array or matrix.

Consider a matrix 'score' in which each column holds the marks of one student. To get the total score of each student without apply(), we would have to call sum() on each column separately:

CODE/PROGRAM/EXAMPLE
list(sum(score[, 1]), sum(score[, 2]), sum(score[, 3]))
[[1]]
[1] 414

[[2]]
[1] 422

[[3]]
[1] 421

apply() :

Syntax :

Syntax
apply (dataset/object, margin, function)

Where,
dataset: the object on which we perform the operations
margin: either 1 or 2 (1 applies the function to each row, 2 applies it to each column)
function: the function to apply; both built-in and custom functions are valid options
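The tutorial does not print the 'score' matrix, so the sketch below uses a small hypothetical matrix (score_demo, with made-up marks) to show both margins and a custom function in action.

```r
# Hypothetical 3 x 3 matrix: rows are subjects, columns are students
# (made-up values, not the tutorial's 'score' data)
score_demo <- matrix(c(80, 90, 70,
                       60, 75, 85,
                       95, 65, 88),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("maths", "physics", "chemistry"),
                                     c("s1", "s2", "s3")))

col_totals <- apply(score_demo, 2, sum)  # margin 2: one total per student (column)
row_totals <- apply(score_demo, 1, sum)  # margin 1: one total per subject (row)

# A custom (anonymous) function works too: highest mark in each column
col_max <- apply(score_demo, 2, function(x) max(x))

print(col_totals)
print(row_totals)
print(col_max)
```

Because the matrix has dimnames, each result is a named vector labelled with the student or subject names.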

Using the same 'score' matrix, apply() computes all the column totals in one call:

Syntax
apply(score, 2, sum)

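This returns the same column totals (414, 422 and 421) as a single vector. To sum each row instead, set the margin to 1:

Syntax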
apply(score, 1, sum)

lapply() :

lapply() is especially useful when working with lists and data frames. In R, a data frame is a list whose elements are its variables (columns), so lapply() can apply a function to every variable of a data frame. It always returns a list.

Because lapply() works on the elements of a list, which for a data frame are its columns, its syntax has no margin parameter.

Syntax:

Syntax
lapply (dataset/object, function)
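A minimal sketch, assuming a small hypothetical data frame marks_df (made-up values) in which each column holds one student's marks:

```r
# Hypothetical data frame: each column holds one student's marks
marks_df <- data.frame(s1 = c(80, 60, 95),
                       s2 = c(90, 75, 65),
                       s3 = c(70, 85, 88))

is.list(marks_df)   # TRUE: a data frame is a list of columns

# lapply() visits each column in turn and always returns a list
totals <- lapply(marks_df, sum)

str(totals)         # a named list with one element per column
```

The names of the result (s1, s2, s3) come from the column names of the data frame.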

First convert the score matrix to a data frame:

Syntax
score.df <- as.data.frame(score)
score.df

Then call lapply() with sum() to total each column. The result is returned as a list:

CODE/PROGRAM/EXAMPLE
lapply(score.df, sum)

Note: apply() works on both rows and columns, but lapply() works only on columns (more precisely, on the elements of a list, which for a data frame are its columns).

sapply() :

sapply() works like lapply(). With simplify = FALSE, sapply() returns its results in a list, just as lapply() does. With simplify = TRUE (the default), sapply() returns the results in a simplified form, such as a vector or matrix, whenever possible.

Syntax:

Syntax
sapply (dataset/object, function, simplify)
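The sketch below, again using a hypothetical data frame marks_df with made-up values, compares the default simplified result with simplify = FALSE:

```r
# Hypothetical data frame: each column holds one student's marks
marks_df <- data.frame(s1 = c(80, 60, 95),
                       s2 = c(90, 75, 65),
                       s3 = c(70, 85, 88))

simplified <- sapply(marks_df, sum)                    # default simplify = TRUE: named vector
as_list    <- sapply(marks_df, sum, simplify = FALSE)  # keeps the list, like lapply()

print(simplified)
identical(as_list, lapply(marks_df, sum))              # TRUE
```

With simplification turned off, sapply() returns exactly what lapply() would.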

Apply sapply() to the score data frame from the previous example:

CODE/PROGRAM/EXAMPLE
sapply(score.df, sum)

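Because sum() returns a single value for each column, sapply() returns the totals as a vector. In general, if every result is a scalar (length 1), sapply() returns a vector.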

If every result has the same length greater than 1, sapply() returns a matrix with one column for each element of the input list.
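The three simplification rules can be seen side by side with a few toy functions (all names here are illustrative):

```r
# Every call returns one value -> sapply() returns a vector
v <- sapply(1:3, function(i) i * 10)

# Every call returns a vector of the same length (2) -> a 2 x 3 matrix,
# one column per input element
m <- sapply(1:3, function(i) c(i, i * 10))

# Calls return vectors of different lengths -> the result stays a list
l <- sapply(1:3, seq_len)

is.vector(v)  # TRUE
dim(m)        # 2 3
is.list(l)    # TRUE
```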

So the kind of object sapply() returns depends on what the applied function returns, as the following scenarios illustrate.

Consider the results of 4 students who each wrote several preliminary tests before the main exam. The students wrote different numbers of tests, so the vectors have different lengths and the data is stored as a list.

CODE/PROGRAM/EXAMPLE
marks.list
$a
[1] 78 75 76 76 80 63 61

$b
 [1] 74 72 69 59 64 77 68 77 75 69 71 72

$c
 [1] 75 84 90 76 74 63 54 76 73 81 82 80 82

$d
[1] 65 51 66 59 62 61 65 60
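The tutorial prints marks.list but does not show how it was created. A minimal sketch that rebuilds it from the printed values, so the scenarios can be run, could look like this. With these values exactly as printed, mean(marks.list$c) works out to about 76.15 rather than the 76.57143 shown in Scenario 1, so a value for c may have been lost from the printout.

```r
# marks.list rebuilt from the values printed above
marks.list <- list(
  a = c(78, 75, 76, 76, 80, 63, 61),
  b = c(74, 72, 69, 59, 64, 77, 68, 77, 75, 69, 71, 72),
  c = c(75, 84, 90, 76, 74, 63, 54, 76, 73, 81, 82, 80, 82),
  d = c(65, 51, 66, 59, 62, 61, 65, 60)
)

# Number of tests written by each student
sapply(marks.list, length)
```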

Scenario 1: To find the average marks of each student

CODE/PROGRAM/EXAMPLE
avg <- sapply(marks.list, mean)
print(avg)
       a        b        c        d
72.71429 70.58333 76.57143 61.12500

is.vector(avg)
[1] TRUE

# Output is a vector

Scenario 2: To find the range of each student's marks

CODE/PROGRAM/EXAMPLE
# Store the result as 'ranges' so it does not shadow the range() function
ranges <- sapply(marks.list, range)

ranges
      a  b  c  d
[1,] 61 59 54 51
[2,] 80 77 90 66

is.matrix(ranges)
[1] TRUE

# Output is a matrix

Scenario 3: To find each student's marks that are below 65, using sapply()

Define a function that returns the values below 65, then pass it to sapply():

CODE/PROGRAM/EXAMPLE
lt65 <- function(x) {
  return(x[x < 65])
}

less65 <- sapply(marks.list, lt65)
less65
$a
[1] 63 61

$b
[1] 59 64

$c
[1] 63 54

$d
[1] 51 59 62 61 60

is.list(less65)
[1] TRUE

# Output is a list (the results have different lengths)

tapply() :

tapply() applies a function to each group (cell) of a vector, where the groups are defined by the categorical variables given in its INDEX argument.

Syntax:

Syntax
tapply (column A, column B, function)

Where,
column A: the column on which the operation has to be performed
column B: the column whose values define the groups (categories)
function: the type of the operation
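The tutorial's 'math' data frame is not shown, so this sketch builds a small hypothetical one (made-up names and marks) to show how tapply() groups one column by another:

```r
# Hypothetical 'math' data frame (made-up values, not the tutorial's data)
math <- data.frame(
  name    = c("Asha", "Ben", "Chen", "Dina"),
  section = c("a", "b", "a", "b"),
  marks   = c(85, 78, 92, 88)
)

# Group the marks column by section and sum each group
section_totals <- tapply(math$marks, math$section, sum)
print(section_totals)

# Any summary function works, e.g. the mean mark per section
section_means <- tapply(math$marks, math$section, mean)
print(section_means)
```

The group labels in the result come from the sorted unique values of the section column.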

Consider a data frame 'math' with name, section and marks as columns. To get the total marks of each section, use tapply():

CODE/PROGRAM/EXAMPLE
tapply(math$marks, math$section, sum)
  a   b
290 289

Section 'a' has the higher total in math (290 against 289).

Consider the 'iris' dataset from the datasets package. It contains measurements of 50 flowers from each of 3 species of iris.

To get the mean of each measurement for each species, use tapply():

CODE/PROGRAM/EXAMPLE
tapply(iris$Sepal.Length, iris$Species, mean)
    setosa versicolor  virginica
     5.006      5.936      6.588

tapply(iris$Sepal.Width, iris$Species, mean)
    setosa versicolor  virginica
     3.428      2.770      2.974

tapply(iris$Petal.Length, iris$Species, mean)
    setosa versicolor  virginica
     1.462      4.260      5.552

tapply(iris$Petal.Width, iris$Species, mean)
    setosa versicolor  virginica
     0.246      1.326      2.026

Note: The by() function works similarly to tapply().
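The four tapply() calls on iris can also be combined. As a sketch, wrapping tapply() in sapply() runs it over each of the four numeric columns and collects the results into a species-by-variable matrix (species_means is an illustrative name):

```r
# For each numeric column, compute the per-species mean with tapply();
# sapply() simplifies the four length-3 results into a 3 x 4 matrix
species_means <- sapply(iris[, 1:4], function(col) tapply(col, iris$Species, mean))

print(species_means)

# Look up a single value by species (row) and variable (column)
species_means["setosa", "Sepal.Length"]
```

Each column of the matrix matches one of the tapply() outputs above.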

by() :

by() is an object-oriented wrapper for tapply(), applied to data frames

Consider the 'iris' dataset again. It gives the measurements, in centimeters, of sepal length, sepal width, petal length and petal width for 50 flowers from each of the 3 species of iris. To get the mean of every measurement column for each species in a single call, use by():

Syntax
by(iris[, 1:4], iris$Species, colMeans)
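A short sketch of working with the object that by() returns: it has one element per species, and each element is the named vector that colMeans() produced for that group (species_by and setosa_means are illustrative names):

```r
species_by <- by(iris[, 1:4], iris$Species, colMeans)

length(species_by)   # 3, one element per species

# Extract the column means for the first group (setosa)
setosa_means <- species_by[[1]]
print(setosa_means)
```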

For each species, the result lists the means of all four columns; these match the tapply() results above.

mapply() :

mapply() is a multivariate version of sapply(). It applies the function to the first elements of each argument, then to the second elements, then to the third elements, and so on. Shorter arguments are recycled if necessary.

Syntax:

Syntax
mapply (function, arg_1, arg_2,…)

Where,
function: the type of operation
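args: the vectors or lists to process; their first elements are passed to the first call, their second elements to the second call, and so on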
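A minimal sketch of element-wise application and recycling, using anonymous functions for illustration:

```r
# Element-wise: adds 1 + 4, 2 + 5, 3 + 6
sums <- mapply(function(x, y) x + y, 1:3, 4:6)

# Recycling: the single value 2 is reused for every element of 1:4
products <- mapply(function(x, y) x * y, 1:4, 2)

print(sums)      # [1] 5 7 9
print(products)  # [1] 2 4 6 8
```

Because every call returns a single value, mapply() simplifies both results to plain vectors.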

mapply(rep, 1:4, 4:1) calls rep(1, 4), rep(2, 3), rep(3, 2) and rep(4, 1), so it returns the same list as writing those four calls out by hand:

CODE/PROGRAM/EXAMPLE
mapply(rep, 1:4, 4:1)

# Equivalent to writing out each call by hand:
repVals <- list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1))
repVals
[[1]]
[1] 1 1 1 1

[[2]]
[1] 2 2 2

[[3]]
[1] 3 3

[[4]]
[1] 4

Another mapply() example is shown below

Consider a custom function 'noise', which generates n random normal numbers with a given mean and standard deviation:

CODE/PROGRAM/EXAMPLE
noise <- function(n, mean, sd) {
  rnorm(n, mean, sd)
}

noise(2, 3, 1)
[1] 0.950255 1.217040

If we pass vectors to all of the noise() arguments at once, as shown below, the result is not what we want:

CODE/PROGRAM/EXAMPLE
noise(1:5, 1:5, 2)
[1] -0.2760307  1.3783007  3.0931290  5.7079372  5.1899422

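The output contains only five numbers, one random normal for each mean from 1 to 5, because rnorm() takes length(n) as the number of values when n is a vector. What we actually want is one random normal with mean 1, two random normals with mean 2, and so on.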

To get that output, either write out each call inside list() or let mapply() make the calls:

CODE/PROGRAM/EXAMPLE
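# With list()
list(noise(1, 1, 2), noise(2, 2, 2), noise(3, 3, 2), noise(4, 4, 2), noise(5, 5, 2))

CODE/PROGRAM/EXAMPLE
# With mapply()
mapply(noise, 1:5, 1:5, 2)

Both approaches return a list whose k-th element holds k random normals with mean k and standard deviation 2.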
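Since the draws are random, the exact numbers differ on every run. The sketch below adds set.seed(1), which is not in the original, so the result is reproducible, and checks the structure of the mapply() output rather than specific values (draws is an illustrative name):

```r
noise <- function(n, mean, sd) {
  rnorm(n, mean, sd)
}

set.seed(1)  # fixed seed so the random draws are reproducible
draws <- mapply(noise, 1:5, 1:5, 2)

# One list element per call: noise(1, 1, 2), noise(2, 2, 2), ..., noise(5, 5, 2)
lengths(draws)   # 1 2 3 4 5
is.list(draws)   # TRUE

# Contrast: passing vectors directly gives only length(1:5) = 5 draws in total
length(noise(1:5, 1:5, 2))   # 5
```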
