## R Apply Functions

### Apply Functions

Explicit control loops are the usual choice in programming languages like C and Java, but R offers a compact way of performing loops by using apply functions

R has multiple apply functions, for different purposes:

• apply(): applies a function over the margins of an array
• lapply(): loops over a list of elements to evaluate a function on each of them
• sapply(): same as lapply(), but simplifies the result
• tapply(): applies a function over subsets of a vector
• mapply(): multivariate version of sapply()

Apply functions are an efficient way to perform iterations
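As a rough illustration of the difference, the sketch below computes the same sums once with an explicit loop and once with sapply(). The list `values` and both result names are made up for this example.

```r
# Made-up input: a named list of three integer vectors
values <- list(a = 1:3, b = 4:6, c = 7:9)

# Explicit loop: pre-allocate the result, then fill it element by element
loop_result <- numeric(length(values))
for (i in seq_along(values)) {
  loop_result[i] <- sum(values[[i]])
}
names(loop_result) <- names(values)

# Apply-style: one call, and the names are carried over automatically
apply_result <- sapply(values, sum)

# Both approaches give the same numbers
all.equal(loop_result, apply_result)
```

The apply version is shorter mainly because the bookkeeping (allocation, indexing, naming) is handled inside sapply().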

apply() returns a vector, an array, or a list of values, obtained by applying a function to the margins of an array or matrix

Consider a matrix ‘score’ in which each column holds the scores of one student

If we need the total score of each individual student in the class, we can call sum() on each column by hand

CODE/PROGRAM/EXAMPLE
```
list(sum(score[,1]), sum(score[,2]), sum(score[,3]))
[[1]]
[1] 414

[[2]]
[1] 422

[[3]]
[1] 421
```

### apply()

Syntax
```
apply(dataset/object, margin, function)

Where,
dataset: the object on which we perform the operations
margin: 1 performs the operation on rows, 2 performs it on columns, and c(1, 2) performs it on both, that is, on every cell
function: the type of operation, both built-in and custom functions are valid options
```

Consider the matrix ‘score’ from the previous example

To get the total score of each student this time, use the apply() function with margin 2 (columns)

Syntax
`apply(score, 2, sum)`

Output :

Setting the margin to 1 totals each row instead

Syntax
`apply(score, 1, sum)`

Output :
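As a minimal sketch of both margins, the block below builds a small stand-in matrix and sums it by column and by row. The matrix `score_demo`, its values and its dimension names are invented for illustration; they are not the ‘score’ data used above.

```r
# Hypothetical stand-in for 'score': rows are subjects, columns are students.
# All values and names are made up for illustration.
score_demo <- matrix(
  c(80, 70, 90,    # first column  (student1)
    60, 85, 75),   # second column (student2)
  nrow = 3,
  dimnames = list(c("math", "physics", "chemistry"),
                  c("student1", "student2"))
)

# margin = 2: apply sum() to each column (one total per student)
col_totals <- apply(score_demo, 2, sum)

# margin = 1: apply sum() to each row (one total per subject)
row_totals <- apply(score_demo, 1, sum)

# A custom function works the same way, here the spread of each column
col_spread <- apply(score_demo, 2, function(x) max(x) - min(x))

print(col_totals)
print(row_totals)
print(col_spread)
```

For plain sums and means, the base functions colSums(), rowSums(), colMeans() and rowMeans() give the same results and are usually faster.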

### lapply()

lapply() is especially useful when dealing with lists and data frames. In R a data frame is a list whose elements are its variables (columns). Therefore we can apply a function to all the variables in a data frame by using the lapply() function

lapply() iterates over the elements of its input, which for a data frame means column by column. Hence, its syntax does not have the margin parameter

Syntax
`lapply (dataset/object, function)`

Convert the score matrix to a data frame and then perform the lapply() function

Syntax
```
score.df <- as.data.frame(score)
score.df
```

Output :

Calling lapply() on the data frame returns the result as a list object

CODE/PROGRAM/EXAMPLE
`lapply(score.df, sum)`

Output :

Note: apply() works on both rows and columns, but lapply() on a data frame works only on its columns
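The following sketch shows the list that lapply() produces. The data frame `demo.df`, with its column names and values, is hypothetical and stands in for `score.df`.

```r
# Hypothetical data frame; column names and values are made up for illustration
demo.df <- data.frame(student1 = c(80, 70, 90),
                      student2 = c(60, 85, 75))

# lapply() visits each column (list element) and always returns a list
totals <- lapply(demo.df, sum)
str(totals)

# Elements keep the column names, so they can be extracted by name
totals$student1

# The same call works on any list, including one whose elements differ in length
lens <- lapply(list(x = 1:3, y = letters), length)
str(lens)
```

Because the result is always a list, lapply() is a safe choice when the function may return values of different types or lengths.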

### sapply()

sapply() works similarly to the lapply() function. When the argument simplify = FALSE, sapply() returns the results in a list, just like lapply(). However, when simplify = TRUE (the default), sapply() returns the results in a simplified form, a vector or a matrix, whenever possible.

Syntax
`sapply (dataset/object, function, simplify)`
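A small sketch of the effect of the simplify argument, using a made-up list `demo.list` (the variable names are illustrative only):

```r
# Made-up input: a named list of two integer vectors
demo.list <- list(a = 1:3, b = 4:6)

# simplify = FALSE: the result stays a list, exactly as lapply() would return it
as_list <- sapply(demo.list, sum, simplify = FALSE)
identical(as_list, lapply(demo.list, sum))

# simplify = TRUE (the default): one scalar per element collapses to a named vector
as_vector <- sapply(demo.list, sum)
as_vector
```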

Consider the score data frame from the previous example and then perform the sapply() function

CODE/PROGRAM/EXAMPLE
`sapply(score.df, sum)`

Output :

If the results are all scalars, sapply() returns a vector

If all the results are vectors of the same length (greater than one), sapply() returns a matrix with one column for each element of the list to which the function was applied

sapply() simplifies the result into different objects depending on what the function returns. The examples below illustrate this

Consider the results of 4 students who wrote multiple preliminary tests before the main exam. The data is stored as a list, because the vectors have different lengths.

CODE/PROGRAM/EXAMPLE
```
marks.list
$a
[1] 78 75 76 76 80 63 61

$b
[1] 74 72 69 59 64 77 68 77 75 69 71 72

$c
[1] 75 84 90 76 74 63 54 76 73 81 82 80 82

$d
[1] 65 51 66 59 62 61 65 60
```

Scenario 1: To find the average marks of each student

CODE/PROGRAM/EXAMPLE
```
avg <- sapply(marks.list, mean)
print(avg)

       a        b        c        d
72.71429 70.58333 76.15385 61.12500

is.vector(avg)
[1] TRUE

# Output is in the form of a vector
```

Scenario 2: To find the range of each student

CODE/PROGRAM/EXAMPLE
```
# Store the result under a name that does not mask the range() function
rng <- sapply(marks.list, range)

rng
      a  b  c  d
[1,] 61 59 54 51
[2,] 80 77 90 66

is.matrix(rng)
[1] TRUE

# Output is in the form of a matrix
```

Scenario 3: To find the marks of each student that are below 65, using sapply()

Create a function to get values less than 65. Invoke this function when performing sapply()

CODE/PROGRAM/EXAMPLE
```
lt65 <- function(x) {
  return(x[x < 65])
}

less65 <- sapply(marks.list, lt65)
less65

$a
[1] 63 61

$b
[1] 59 64

$c
[1] 63 54

$d
[1] 51 59 62 61 60

is.list(less65)
[1] TRUE

# Output is in the form of a list
```

### tapply()

tapply() applies a function to each cell (group) of a vector, where the cells are defined by the categorical variables passed in the INDEX argument

Syntax
```
tapply(column A, column B, function)

Where,
column A: the column on which the operation has to be performed
column B: the column by which the values are grouped (“categorized”)
function: the type of the operation
```

Consider a data frame ‘math’ with name, section and marks as columns

To know the aggregate marks in each section, tapply() can be used

CODE/PROGRAM/EXAMPLE
```
tapply(math$marks, math$section, sum)
  a   b
290 289
```

Section ‘a’ has the higher total marks in math (290 versus 289)
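To make the grouping step concrete, here is a sketch with a small invented data frame. `math_demo` and all of its names and marks are hypothetical and do not reproduce the ‘math’ data above.

```r
# Hypothetical data frame with the same three columns as 'math'
math_demo <- data.frame(
  name    = c("s1", "s2", "s3", "s4", "s5"),
  section = c("a", "b", "a", "b", "a"),
  marks   = c(70, 65, 80, 90, 60)
)

# Split marks by section, then apply sum() to each group
section_totals <- tapply(math_demo$marks, math_demo$section, sum)

# Any summary function can be used in place of sum()
section_means <- tapply(math_demo$marks, math_demo$section, mean)

print(section_totals)
print(section_means)
```

Note that a group with the larger total need not have the larger mean, since the groups can differ in size.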

Consider the ‘iris’ dataset from the datasets package

‘iris’ has data on 150 flowers, 50 from each of 3 different species of iris

To get the mean of each measurement for each species, use the tapply() function

CODE/PROGRAM/EXAMPLE
```
tapply(iris$Sepal.Length, iris$Species, mean)
    setosa versicolor  virginica
     5.006      5.936      6.588

tapply(iris$Sepal.Width, iris$Species, mean)
    setosa versicolor  virginica
     3.428      2.770      2.974

tapply(iris$Petal.Length, iris$Species, mean)
    setosa versicolor  virginica
     1.462      4.260      5.552

tapply(iris$Petal.Width, iris$Species, mean)
    setosa versicolor  virginica
     0.246      1.326      2.026
```
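The four calls above repeat the same pattern, so one possible shortcut is to let sapply() loop over the measurement columns and call tapply() on each. This is only a sketch; the name `species_means` is chosen here for illustration.

```r
# iris ships with base R in the datasets package
# For each of the four measurement columns, compute the mean per species
species_means <- sapply(iris[, 1:4],
                        function(col) tapply(col, iris$Species, mean))

# Each column returns three values, so sapply() simplifies to a 3 x 4 matrix:
# one row per species, one column per measurement
species_means
```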

Note: The by() function works similarly to the tapply() function

### by()

by() is an object-oriented wrapper for tapply(), applied to data frames

Consider the ‘iris’ dataset again. It gives the measurements, in centimeters, of sepal length, sepal width, petal length and petal width for 50 flowers from each of the 3 species of iris

If we need the mean of each column, grouped by the species column, we can use the by() function

Syntax
`by(iris[,1:4], iris$Species, colMeans)`

Output :
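The object returned by by() holds one result per group. The sketch below shows one way to pull out a single group and to stack all groups into a matrix; the variable names are illustrative, and do.call(rbind, ...) is a common idiom rather than the only option.

```r
# One colMeans() result per species
by_species <- by(iris[, 1:4], iris$Species, colMeans)

# Each group result is a named vector of four column means;
# the first group corresponds to the first level of Species (setosa)
first_group <- by_species[[1]]
first_group

# Stack the per-group vectors into a single matrix, one row per species
means_matrix <- do.call(rbind, by_species)
means_matrix
```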

### mapply()

mapply() is a multivariate version of sapply(). mapply() applies the function to the first elements of each argument, the second elements, the third elements and so on. Arguments are recycled if necessary

Syntax
```
mapply(function, arg_1, arg_2,…)

Where,
function: the type of operation
arg_1, arg_2,…: the data that needs to be processed
```

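If we want the list shown below, in which 1 is repeated four times, 2 three times and so on, we can use the mapply() function instead of writing out each rep() call

CODE/PROGRAM/EXAMPLE
```
# With mapply(): calls rep(1, 4), rep(2, 3), rep(3, 2) and rep(4, 1)
mapply(rep, 1:4, 4:1)

# The same list written out by hand
repVals <- list(rep(1,4), rep(2,3), rep(3,2), rep(4,1))
repVals
[[1]]
[1] 1 1 1 1

[[2]]
[1] 2 2 2

[[3]]
[1] 3 3

[[4]]
[1] 4
```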

Another mapply() example is shown below

Consider a custom function ‘noise’, which generates n random normal values with a given mean and standard deviation

CODE/PROGRAM/EXAMPLE
```
noise <- function(n, mean, sd) {
  rnorm(n, mean, sd)
}

noise(2, 3, 1)
[1] 0.950255 1.217040
```

If we call the noise function with vectors for several of its arguments at once, as shown below, the result is not what we want

CODE/PROGRAM/EXAMPLE
```
noise(1:5, 1:5, 2)
[1] -0.2760307 1.3783007 3.0931290 5.7079372 5.1899422
```

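This call returns only five values, one for each mean from 1 to 5: when n is a vector of length greater than one, rnorm() uses its length as the number of values to draw. The desired output is one random normal with mean 1, two random normals with mean 2, and so on

To generate the desired output we can write out each call inside list() or use the mapply() function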

CODE/PROGRAM/EXAMPLE
```
# With list()
list(noise(1,1,2), noise(2,2,2), noise(3,3,2), noise(4,4,2), noise(5,5,2))
```

Output :

CODE/PROGRAM/EXAMPLE
```
# With mapply()
mapply(noise, 1:5, 1:5, 2)
```

Output :
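Since the values are random, one way to check the shape of the result is to fix a seed and look at the length of each element. The seed value and the names `flat`, `draws` and `draws2` are arbitrary choices for this sketch; MoreArgs is mapply()'s argument for values shared by every call.

```r
noise <- function(n, mean, sd) {
  rnorm(n, mean, sd)
}

set.seed(42)  # arbitrary seed, only so that reruns give the same draws

# Direct call: n has length 5, so rnorm() returns just 5 values
flat <- noise(1:5, 1:5, 2)
length(flat)

# mapply() pairs the i-th n with the i-th mean; the single sd value is recycled
draws <- mapply(noise, 1:5, 1:5, 2)
lengths(draws)

# Equivalent call with named arguments and the constant passed via MoreArgs
draws2 <- mapply(noise, n = 1:5, mean = 1:5, MoreArgs = list(sd = 2))
lengths(draws2)
```

Because the five results have different lengths, mapply() cannot simplify them and returns a list, just as the list() version does.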
