Many people are unsure how to summarize data quickly in R, because there are several ways to do it. So which one is right? I address that question below. Start by picking one function and getting comfortable with it; then move on to the next.
In this article, I will discuss the primary methods of summarizing data sets in R. I hope this makes the journey smoother than it looks.
Methods for summarizing data in R
apply()
The apply() function returns a vector, array, or list of values obtained by applying a function to the rows or columns of a matrix or array. It is the simplest of the functions covered here. However, it only works along the margins (rows, columns, or both) of a matrix or array; a data frame is first converted to a matrix.
Usage
> apply(X, MARGIN, FUN, …)
Arguments
Argument | Description |
---|---|
X | an array, including a matrix. |
MARGIN | a vector giving the subscripts which the function will be applied over. E.g., for a matrix 1 indicates rows, 2 indicates columns, c(1, 2) indicates rows and columns. |
FUN | the function to be applied. In the case of functions like +, %*%, etc., the function name must be backquoted or quoted. |
… | optional arguments to FUN. |
Example
# Create a matrix
> mat <- matrix(c(1:20), nrow = 5, ncol=4)
> mat
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
# 2 indicates columns
> apply(mat, 2, mean)
[1] 3 8 13 18
# 1 indicates rows
> apply(mat, 1, mean)
[1] 8.5 9.5 10.5 11.5 12.5
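A few more apply() calls on the same mat help show how the arguments behave. This is a sketch: the choice of quantile(), range() and the anonymous scaling function is ours, for illustration only. It shows that extra arguments after FUN are passed through …, that a FUN returning several values produces a matrix, and that MARGIN = c(1, 2) visits every cell.

```r
mat <- matrix(1:20, nrow = 5, ncol = 4)

# Extra arguments after FUN (here probs) are passed on to FUN through `...`
col_medians <- apply(mat, 2, quantile, probs = 0.5)
print(col_medians)

# When FUN returns more than one value, apply() binds the results as columns:
# row 1 holds each column's minimum, row 2 each column's maximum
col_ranges <- apply(mat, 2, range)
print(col_ranges)

# MARGIN = c(1, 2) applies FUN to every individual cell
scaled <- apply(mat, c(1, 2), function(v) v * 10)
print(scaled)
```

For element-wise arithmetic like the last call, plain vectorized code (mat * 10) is simpler and faster; apply() pays off when FUN genuinely needs a whole row or column.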
lapply()
The lapply() function is useful for performing operations on list objects. It returns a list of the same length as the input, each element of which is the result of applying FUN to the corresponding element of X. lapply() accepts a list, vector, or data frame as input and always returns a list.
Usage
> lapply(X, FUN, …)
Arguments
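Argument | Description |
---|---|
X | a vector (atomic or list) or another object that can be coerced to a list, such as a data frame. |
FUN | the function applied to each element of X. |
… | optional arguments to FUN. |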
The l in lapply() stands for list. The main difference between lapply() and apply() is the type of output: lapply() always returns a list. lapply() also works on other objects such as data frames and lists, and it does not need a MARGIN argument.
A simple example is converting strings to lower case with the tolower() function. Here we use month.abb, R's built-in character vector of abbreviated month names, each of which starts with an upper-case letter.
Example
> month <- month.abb
> month
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
> lower_month <- lapply(month,tolower)
> str(lower_month)
List of 12
$ : chr "jan"
$ : chr "feb"
$ : chr "mar"
$ : chr "apr"
$ : chr "may"
$ : chr "jun"
$ : chr "jul"
$ : chr "aug"
$ : chr "sep"
$ : chr "oct"
$ : chr "nov"
$ : chr "dec"
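Because tolower() is already vectorized, the same result can be had as a plain character vector without lapply(). lapply() becomes more useful when each element is itself a collection, for example the columns of a data frame. The sketch below assumes the built-in mtcars dataset and a 10% trimmed mean purely for illustration; both are our choices, not part of the original example.

```r
# tolower() is vectorized, so this returns a character vector directly
lower_vec <- tolower(month.abb)

# lapply() over a data frame visits each column; extra arguments
# (here trim = 0.1 for mean()) are passed through `...`
col_means <- lapply(mtcars[, c("mpg", "hp")], mean, trim = 0.1)
str(col_means)

# An anonymous function works too; the result is always a list
squares <- lapply(1:3, function(i) i^2)
str(squares)
```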
sapply()
The sapply() function takes a list, vector, or data frame as input and, where possible, simplifies the output to a vector or matrix. It does the same job as lapply(): if every result has length 1 it returns a vector, if every result has the same length greater than 1 it returns a matrix, and otherwise it returns a list.
Usage
> sapply(X, FUN, …)
Arguments
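Argument | Description |
---|---|
X | a vector (atomic or list) or another object that can be coerced to a list, such as a data frame. |
FUN | the function applied to each element of X. |
… | optional arguments to FUN. |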
We can find the minimum speed and the minimum stopping distance in the built-in cars dataset, and compare the output of lapply() with that of sapply().
Example
# Load the built-in cars dataset
> dt <- cars
> lmn_cars <- lapply(dt, min)
> smn_cars <- sapply(dt, min)
> lmn_cars
$speed
[1] 4
$dist
[1] 2
> smn_cars
speed dist
4 2
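sapply() simplifies further when FUN returns more than one value per element, and the simplification can be switched off. The sketch below reuses cars; range() is our choice of FUN. It also uses vapply(), a base R relative that is not covered in this article: there you declare the type and length of each result up front, so an unexpected result raises an error instead of silently changing the output shape.

```r
dt <- cars

# range() returns two values per column, so sapply() builds a 2 x 2 matrix
# (row 1 = minimums, row 2 = maximums, one column per variable)
rng <- sapply(dt, range)
print(rng)

# simplify = FALSE keeps the list output, just like lapply()
rng_list <- sapply(dt, range, simplify = FALSE)
str(rng_list)

# vapply(): each result must be a single number, otherwise an error is raised
mins <- vapply(dt, min, numeric(1))
print(mins)
```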
We can summarize the differences between apply(), sapply() and lapply() in the following table:
Function | Arguments | Objective | Input | Output |
---|---|---|---|---|
apply | apply(X, MARGIN, FUN) | Apply a function to the rows or columns or both | Matrix or array (a data frame is converted to a matrix) | vector, matrix/array, or list |
lapply | lapply(X, FUN) | Apply a function to all the elements of the input | List, vector or data frame | list |
sapply | sapply(X, FUN) | Apply a function to all the elements of the input | List, vector or data frame | vector or matrix (a list if the results cannot be simplified) |
tapply()
None of the functions discussed so far can compute group-wise summaries the way SQL's GROUP BY does. tapply() completes the palette: it splits X into groups defined by INDEX and applies FUN to each group. X is typically a vector, and INDEX is a factor or a list of factors, each of the same length as X. The example below makes the usage clear.
Usage
> tapply(X, INDEX, FUN = NULL, …, default = NA, simplify = TRUE)
Arguments
Argument | Description |
---|---|
X | an R object for which a split method exists. Typically vector-like, allowing subsetting with [. |
INDEX | a list of one or more factors, each of the same length as X. The elements are coerced to factors by as.factor. |
FUN | the function applied to each group of X defined by INDEX. |
… | optional arguments to FUN. |
Example
> df <- iris
> tp <- tapply(df$Petal.Length, df$Species, mean)
> tp
setosa versicolor virginica
1.462 4.260 5.552
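Passing a list of two factors to INDEX gives a cross-tabulated result, much like grouping by two columns in SQL. The sketch below uses the built-in mtcars dataset (average mpg by number of cylinders and transmission type) and a per-species row count on iris; both are illustrative choices, not part of the original example.

```r
# Two grouping factors: roughly SQL's
#   SELECT cyl, am, AVG(mpg) FROM mtcars GROUP BY cyl, am
# The result is a matrix with one row per cyl and one column per am
mpg_by_cyl_am <- tapply(mtcars$mpg, list(cyl = mtcars$cyl, am = mtcars$am), mean)
print(mpg_by_cyl_am)

# Counting rows per group, like COUNT(*) ... GROUP BY Species
n_by_species <- tapply(iris$Petal.Length, iris$Species, length)
print(n_by_species)
```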
by()
Next comes a slightly more involved function. by() is an object-oriented wrapper for tapply() applied to data frames: it splits a data frame into subsets by one or more factors and applies a function to each subset. The example should make this clearer.
Usage
> by(data, INDICES, FUN, …, simplify = TRUE)
Arguments
Argument | Description |
---|---|
data | an R object, normally a data frame, possibly a matrix. |
INDICES | a factor or a list of factors, each of length nrow(data). |
FUN | a function to be applied to (usually data-frame) subsets of data. |
… | further arguments to FUN. |
simplify | logical; whether the result should be simplified to a vector or array where possible (as in tapply()). |
Example
> df <- iris
> mean_col <- by(df[, 1:4], df$Species, colMeans)
> mean_col
df$Species: setosa
Sepal.Length Sepal.Width Petal.Length Petal.Width
5.006 3.428 1.462 0.246
------------------------------------------------------------
df$Species: versicolor
Sepal.Length Sepal.Width Petal.Length Petal.Width
5.936 2.770 4.260 1.326
------------------------------------------------------------
df$Species: virginica
Sepal.Length Sepal.Width Petal.Length Petal.Width
6.588 2.974 5.552 2.026
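Unlike tapply(), by() hands FUN a whole data-frame subset, so FUN can combine several columns. As a sketch, the correlation between sepal length and petal length within each species (the choice of statistic is ours, for illustration):

```r
# FUN receives each species' rows as a data frame, so it can use two columns at once
cor_by_species <- by(iris, iris$Species,
                     function(d) cor(d$Sepal.Length, d$Petal.Length))
print(cor_by_species)

# Each call returns a single number, so the result is simplified to a
# numeric array of class "by" with one entry per species
class(cor_by_species)
```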
Conclusion
Hence, we looked at functions that help summarize data in R: apply(), lapply(), sapply(), tapply() and by(), with the definition, usage and an example for each.
This brings us to the end of this blog. We really appreciate your time.
We hope you liked it.
Do visit our page www.zigya.com/blog for more informative blogs on Data Science.
Keep Reading! Cheers!
Zigya Academy
BEING RELEVANT