Fill Values Within Groups
Last updated: January 18, 2021
Say you have long data, with one row per observation and multiple observations (value) per subject (id identifies each subject).
If some observations are missing, you can fill them from other observations for that subject with the code below.
library(tidyverse)
df <- data.frame(id=c(1,1,1,2,2,2), value=c(1,NA,NA,NA,NA,777))
df
##   id value
## 1  1     1
## 2  1    NA
## 3  1    NA
## 4  2    NA
## 5  2    NA
## 6  2   777
If the first value is always filled in, the following will work:
df %>%
  group_by(id) %>%
  mutate(
    value2 = first(value)
  )
## # A tibble: 6 x 3
## # Groups:   id [2]
##      id value value2
##   <dbl> <dbl>  <dbl>
## 1     1     1      1
## 2     1    NA      1
## 3     1    NA      1
## 4     2    NA     NA
## 5     2    NA     NA
## 6     2   777     NA
This did not work for id=2 because the value column for that id isn’t filled in until the last row, so first(value) returns NA.
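To see why, it can help to look at one subject's values on their own. The sketch below uses a hypothetical vector v (not part of the original example) that mirrors the value column for id 2. It only illustrates that first() picks whatever sits in position one, missing or not.

```r
library(dplyr)

# Hypothetical vector mirroring the value column for id 2
v <- c(NA, NA, 777)

# first() returns the element in position one, even when it is NA
first_v <- first(v)
is.na(first_v)

# Position of the first non-missing element: here it is 3, not 1
first_filled_pos <- which(!is.na(v))[1]
first_filled_pos
```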
Instead, try this:
df %>%
  group_by(id) %>%
  mutate(
    value2 = first(value[!is.na(value)])
  )
## # A tibble: 6 x 3
## # Groups:   id [2]
##      id value value2
##   <dbl> <dbl>  <dbl>
## 1     1     1      1
## 2     1    NA      1
## 3     1    NA      1
## 4     2    NA    777
## 5     2    NA    777
## 6     2   777    777
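This fills value2 with the first non-missing value within each subject. Every row for a subject gets that same value, so this approach assumes the value does not vary within a subject.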
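A possible alternative, not used in the code above, is fill() from tidyr, which respects the grouping set by group_by(). This is a minimal sketch under the assumption that filling from the nearest observed value is acceptable: with .direction = "downup", each gap is filled from the previous non-missing value and any remaining leading gaps are filled from the next one. Unlike the first() approach, it modifies value in place and leaves existing non-missing values untouched. The name filled is only for illustration.

```r
library(dplyr)
library(tidyr)

# Same example data as above
df <- data.frame(id = c(1, 1, 1, 2, 2, 2), value = c(1, NA, NA, NA, NA, 777))

# Fill gaps within each id: carry values down first, then up
filled <- df %>%
  group_by(id) %>%
  fill(value, .direction = "downup") %>%
  ungroup()

filled
```

For this data the two approaches agree, because each subject has only one observed value. They can differ when a subject has several distinct non-missing values.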
ℹ️ This page is part of my knowledge base for R, the popular statistical programming language. I attempt to use idiomatic practices with the tidyverse collection of packages as much as possible. If you have suggestions for ways to improve this code, please contact me or use the survey link below.