Removing NA values in a vector
Let’s first create a vector that contains NA values, and then remove them.
# Create a vector with NA values
> list1 <- c(10, 20, NA, 30, NA, 50)
> list1
[1] 10 20 NA 30 NA 50
As the output of the RStudio console shows, our example vector contains four numeric values and two NAs. Let’s remove these NAs.
To do this, we can create a new vector without any NA values. The is.na() function returns a logical vector that is TRUE wherever an element is NA. Negating it with ! and using the result as an index keeps only the non-missing elements.
# Create a new vector without NA
> list2 <- list1[!is.na(list1)]
> list2
[1] 10 20 30 50
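To make the indexing step less of a black box, here is a small sketch that looks at the logical vector is.na() produces before it is used for subsetting. The variable names na_mask, na_positions and na_count are only illustrative and are not part of the original example.

```r
# Same example vector as above
list1 <- c(10, 20, NA, 30, NA, 50)

# Logical mask: TRUE where the element is missing
na_mask <- is.na(list1)
print(na_mask)        # FALSE FALSE TRUE FALSE TRUE FALSE

# Positions of the missing values
na_positions <- which(na_mask)
print(na_positions)   # 3 5

# Number of missing values (each TRUE counts as 1)
na_count <- sum(na_mask)
print(na_count)       # 2

# Keep only the non-missing elements
list2 <- list1[!na_mask]
print(list2)          # 10 20 30 50
```

Checking the count of NAs first is a cheap way to see how much data you are about to drop.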
Another possibility is to remove NA values inside a function call by using the na.rm argument.
If we want to exclude missing values from a mathematical operation, we pass na.rm = TRUE. If these values are not excluded, most summary functions return NA.
# A vector with NA values
> list1 <- c(10, 20, NA, 30, NA, 50)
# including NA values will produce an NA output
> mean(list1)
[1] NA
> sum(list1)
[1] NA
# excluding NA values will calculate the
# mathematical operation for all non-missing values
> mean(list1, na.rm=TRUE)
[1] 27.5
> sum(list1, na.rm=TRUE)
[1] 110
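The na.rm argument is not limited to mean() and sum(). As a brief sketch, the same vector can be passed to a few other base R summary functions that also accept na.rm. The variable names below are only illustrative.

```r
# Same example vector as above
list1 <- c(10, 20, NA, 30, NA, 50)

# Other base R summaries that accept na.rm
max_value    <- max(list1, na.rm = TRUE)     # 50
min_value    <- min(list1, na.rm = TRUE)     # 10
median_value <- median(list1, na.rm = TRUE)  # median of 10, 20, 30, 50 is 25

print(c(max_value, min_value, median_value))

# na.rm only affects the calculation; the vector itself is unchanged
print(length(list1))  # 6
print(anyNA(list1))   # TRUE
```

Because the NAs stay in the vector, na.rm = TRUE has to be supplied on every call, whereas subsetting with is.na() removes them once.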
Removing NA values in a Data Frame
Another useful application of subsetting data frames is to find and remove rows with missing data. The R function to check for this is complete.cases(). You can try it on the built-in dataset airquality, a data frame with a fair amount of missing data.
First, let’s check the structure of the airquality dataset.
# airquality dataset
> str(airquality)
'data.frame': 153 obs. of 6 variables:
$ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
$ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
$ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
$ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
$ Month : int 5 5 5 5 5 5 5 5 5 5 ...
$ Day : int 1 2 3 4 5 6 7 8 9 10 ...
The result of complete.cases() is a logical vector with the value TRUE for rows that are complete, and FALSE for rows that contain at least one NA value.
> complete.cases(airquality)
[1] TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE
[13] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[25] FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
[37] FALSE TRUE FALSE TRUE TRUE FALSE FALSE TRUE FALSE FALSE TRUE TRUE
[49] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[61] FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
[73] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
[85] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
[97] FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE TRUE
[109] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE
[121] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[133] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[145] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE
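A long run of TRUE and FALSE values is hard to read, so it can help to summarise it. The following sketch counts complete and incomplete rows and shows which columns hold the missing values. The variable names are only illustrative.

```r
# airquality ships with base R (datasets package)
complete_rows <- complete.cases(airquality)

# TRUE counts as 1, so sum() gives the number of complete rows
n_complete   <- sum(complete_rows)
n_incomplete <- sum(!complete_rows)
print(c(complete = n_complete, incomplete = n_incomplete))  # 111 and 42

# Number of NA values in each column
na_per_column <- colSums(is.na(airquality))
print(na_per_column)  # only Ozone (37) and Solar.R (7) contain NAs
```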
# subset with complete.cases to get complete cases
> airquality[complete.cases(airquality), ]
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
7 23 299 8.6 65 5 7
8 19 99 13.8 59 5 8
...
# or negate with the `!` operator to get the incomplete cases
> airquality[!complete.cases(airquality), ]
Ozone Solar.R Wind Temp Month Day
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
10 NA 194 8.6 69 5 10
11 7 NA 6.9 74 5 11
25 NA 66 16.6 57 5 25
26 NA 266 14.9 58 5 26
27 NA NA 8.0 57 5 27
32 NA 286 8.6 78 6 1
...
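Sometimes only certain columns matter for an analysis, and dropping every row with any NA would discard more data than necessary. As a sketch, and assuming Ozone is the column of interest (a choice made here purely for illustration), the same subsetting idea can be restricted to selected columns:

```r
# Keep rows where Ozone is known, regardless of other columns
ozone_known <- airquality[!is.na(airquality$Ozone), ]
print(nrow(ozone_known))  # 116

# complete.cases() also works on a subset of columns
both_known <- airquality[complete.cases(airquality[, c("Ozone", "Solar.R")]), ]
print(nrow(both_known))   # 111
```

In airquality only Ozone and Solar.R contain missing values, so checking those two columns gives the same rows as checking the whole data frame.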
As always with R, there is more than one way of achieving your goal. A shorthand alternative is na.omit(), which omits all rows that contain NA values:
# or use na.omit() to get the same complete cases as above
> na.omit(airquality)
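As a quick check that the two approaches agree, the sketch below compares the result of na.omit() with the complete.cases() subset. It also looks at the na.action attribute, which na.omit() attaches to record the rows it dropped. The variable names are only illustrative.

```r
cleaned   <- na.omit(airquality)
subsetted <- airquality[complete.cases(airquality), ]

# Both approaches keep the same rows
same_rows <- identical(rownames(cleaned), rownames(subsetted))
print(same_rows)      # TRUE
print(nrow(cleaned))  # 111

# na.omit() remembers which rows were removed
dropped <- attr(cleaned, "na.action")
print(length(dropped))  # 42
print(head(dropped))
```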
Conclusion
We covered how to deal with missing values in a vector, using is.na() for subsetting and the na.rm argument inside a function call. We also covered how to handle NA values in a data frame with the complete.cases() and na.omit() functions.
This brings us to the end of this blog. We really appreciate your time.
We hope you liked it.
Do visit our page www.zigya.com/blog for more informative blogs on Data Science.
Keep reading! Cheers!
Zigya Academy