Fill Values Within Groups

Last updated: January 18, 2021

Say you have long data: one row per observation, with multiple observations (the value column) per subject (identified by the id column).

If some observations are missing, you can fill them from other observations for that subject with the code below.

library(tidyverse)

df <- data.frame(id=c(1,1,1,2,2,2), value=c(1,NA,NA,NA,NA,777))
df
##   id value
## 1  1     1
## 2  1    NA
## 3  1    NA
## 4  2    NA
## 5  2    NA
## 6  2   777

If the first value for every subject is never missing, the following will work:

df %>% 
  group_by(id) %>% 
  mutate(
    value2=first(value)
  )
## # A tibble: 6 x 3
## # Groups:   id [2]
##      id value value2
##   <dbl> <dbl>  <dbl>
## 1     1     1      1
## 2     1    NA      1
## 3     1    NA      1
## 4     2    NA     NA
## 5     2    NA     NA
## 6     2   777     NA

This did not work for id=2 because first() returns whatever is in the first row of each group, and the value column for that id isn’t filled in until the last row.
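
To see why, it can help to call first() directly on each subject's values. This is only a minimal sketch, assuming dplyr is installed; id1 and id2 are hypothetical names for vectors copied by hand from the example data.

```r
library(dplyr)

# Values for each subject, copied from the example data (hypothetical names)
id1 <- c(1, NA, NA)
id2 <- c(NA, NA, 777)

# first() returns the element in position 1, whether or not it is missing
first(id1)
## [1] 1
first(id2)
## [1] NA
```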

Instead, try this:

df %>% 
  group_by(id) %>% 
  mutate(
    value2=first(value[!is.na(value)])
  )
## # A tibble: 6 x 3
## # Groups:   id [2]
##      id value value2
##   <dbl> <dbl>  <dbl>
## 1     1     1      1
## 2     1    NA      1
## 3     1    NA      1
## 4     2    NA    777
## 5     2    NA    777
## 6     2   777    777

This will fill using the first non-missing value within a given subject. If a subject has no non-missing values at all, value2 stays NA for that subject.
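
A possible alternative is fill() from tidyr, which works within groups when the data is grouped. Treat this as a sketch rather than an exact replacement: with .direction = "downup", each missing value takes the nearest non-missing value above it and, failing that, below it. It therefore matches the result above only when each subject has a single distinct non-missing value. The name filled is hypothetical, and value2 follows the examples above.

```r
library(dplyr)
library(tidyr)

df <- data.frame(id = c(1, 1, 1, 2, 2, 2), value = c(1, NA, NA, NA, NA, 777))

filled <- df %>%
  group_by(id) %>%
  mutate(value2 = value) %>%               # copy so the original column is kept
  fill(value2, .direction = "downup") %>%  # fill down, then up, within each id
  ungroup()

# value2 is now 1, 1, 1 for id 1 and 777, 777, 777 for id 2
filled$value2
```

If a subject can have several different non-missing values, the first(value[!is.na(value)]) approach and fill() will give different results, so pick the one that matches what "fill" should mean for your data.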


ℹ️ This page is part of my knowledge base for R, the popular statistical programming language. I attempt to use idiomatic practices with the tidyverse collection of packages as much as possible. If you have suggestions for ways to improve this code, please contact me or use the survey link below.