Chato Solutions: Group by using Dplyr

Rigoberto Chandomi Vazquez

The group_by() function lets us group data by a variable so that subsequent operations are performed per group. It is very useful when you have a categorical variable and need to apply an operation to every group.

This example uses porosity, permeability and rock typing data from cores. First, load the dplyr package and read the data from a CSV file.

library(dplyr)
core_data <- read.csv("RT_cores.csv")

head(core_data)

    WELL      MD POROSITY PERMEABILITY      PTS   RT
1 WELL_1 4038.35    0.201     1973.924 34.96978 Mega
2 WELL_1 4038.85    0.203     1158.927 25.35061 Mega
3 WELL_1 4039.35    0.207     1935.117 33.69655 Mega
4 WELL_1 4039.85    0.206     1003.681 23.00154 Mega
5 WELL_1 4040.35    0.190      901.498 23.15667 Mega
6 WELL_1 4040.85    0.201      581.095 17.03789 Mega

Now we can use some statistical functions to analyze our data by rock type. With the summarise() function we can create summary variables such as mean, min, max, etc., returning one row per group. In this case porosity is evaluated.

core_data %>% 
  group_by(RT) %>%
  summarise(Mean = mean(POROSITY),
            Max = max(POROSITY),
            Min = min(POROSITY),
            SD = sd(POROSITY))

# A tibble: 5 x 5
  RT      Mean   Max   Min     SD
  <fct>  <dbl> <dbl> <dbl>  <dbl>
1 Macro 0.209  0.282 0.079 0.0256
2 Mega  0.206  0.246 0.145 0.0213
3 Meso  0.175  0.232 0.116 0.0317
4 Micro 0.0923 0.145 0.017 0.0365
5 Nano  0.0439 0.096 0.031 0.0232
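Core data often has missing measurements, and it helps to know how many samples are behind each statistic. The following is a minimal sketch on a small made-up data frame (the `toy` data and its values are hypothetical, not taken from RT_cores.csv) that adds a sample count with n() and skips missing values with na.rm = TRUE.

```r
library(dplyr)

# Hypothetical toy data, not the RT_cores.csv file used in this article
toy <- data.frame(
  RT = c("Mega", "Mega", "Macro", "Macro", "Macro"),
  POROSITY = c(0.20, NA, 0.10, 0.20, 0.30)
)

toy_summary <- toy %>%
  group_by(RT) %>%
  summarise(N = n(),                              # rows per group, NA rows included
            Mean = mean(POROSITY, na.rm = TRUE))  # mean ignoring missing values

print(toy_summary)
```

Without na.rm = TRUE, the mean of the Mega group in this toy data would be NA, because one of its porosity values is missing.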

We can use multiple variables to group the data, in this case grouping by well and rock type.

core_data %>% 
  group_by(WELL, RT) %>%
  summarise(Mean = mean(POROSITY),
            Max = max(POROSITY),
            Min = min(POROSITY),
            SD = sd(POROSITY))

# A tibble: 9 x 6
# Groups:   WELL [2]
  WELL   RT      Mean   Max   Min     SD
  <fct>  <fct>  <dbl> <dbl> <dbl>  <dbl>
1 WELL_1 Macro 0.214  0.236 0.079 0.0302
2 WELL_1 Mega  0.203  0.238 0.145 0.0202
3 WELL_1 Meso  0.196  0.232 0.123 0.0306
4 WELL_1 Micro 0.0819 0.144 0.017 0.0483
5 WELL_2 Macro 0.208  0.282 0.104 0.0241
6 WELL_2 Mega  0.209  0.246 0.157 0.0220
7 WELL_2 Meso  0.161  0.201 0.116 0.0244
8 WELL_2 Micro 0.101  0.145 0.072 0.0217
9 WELL_2 Nano  0.0439 0.096 0.031 0.0232
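The "# Groups: WELL [2]" header in the output above shows that the result is still grouped by WELL: by default, summarise() removes only the last grouping variable. As a sketch, assuming dplyr 1.0.0 or later (where summarise() accepts the .groups argument), the grouping can be dropped in the same step. The `toy` data below is hypothetical and only for illustration.

```r
library(dplyr)

# Hypothetical toy data, not the RT_cores.csv file used in this article
toy <- data.frame(
  WELL = c("W1", "W1", "W2", "W2"),
  RT = c("Mega", "Macro", "Mega", "Mega"),
  POROSITY = c(0.20, 0.10, 0.22, 0.18)
)

by_well_rt <- toy %>%
  group_by(WELL, RT) %>%
  summarise(Mean = mean(POROSITY),
            .groups = "drop")   # return an ungrouped tibble

# An ungrouped result has no grouping variables
group_vars(by_well_rt)
```

Dropping the groups here avoids surprises later, since any following mutate() or summarise() would otherwise still operate per well.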

Using the mutate() function, we can add a new column with a calculated value, and then use ungroup() to remove the grouping.

core_data <- core_data %>% 
  group_by(RT) %>%
  mutate(Mean_By_RT = mean(POROSITY)) %>%
  ungroup()

head(core_data)

# A tibble: 6 x 7
  WELL      MD POROSITY PERMEABILITY   PTS RT    Mean_By_RT
  <fct>  <dbl>    <dbl>        <dbl> <dbl> <fct>      <dbl>
1 WELL_1 4038.    0.201        1974.  35.0 Mega       0.206
2 WELL_1 4039.    0.203        1159.  25.4 Mega       0.206
3 WELL_1 4039.    0.207        1935.  33.7 Mega       0.206
4 WELL_1 4040.    0.206        1004.  23.0 Mega       0.206
5 WELL_1 4040.    0.19          901.  23.2 Mega       0.206
6 WELL_1 4041.    0.201         581.  17.0 Mega       0.206
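A group mean stored in its own column is mostly useful as an input to further calculations. The sketch below, again on hypothetical toy data, computes each sample's deviation from the mean porosity of its rock type inside the same mutate() call; the column name Diff_From_Mean is an illustrative choice and does not appear in the original data.

```r
library(dplyr)

# Hypothetical toy data, not the RT_cores.csv file used in this article
toy <- data.frame(
  RT = c("Mega", "Mega", "Macro", "Macro"),
  POROSITY = c(0.20, 0.22, 0.10, 0.14)
)

toy <- toy %>%
  group_by(RT) %>%
  mutate(Mean_By_RT = mean(POROSITY),               # mean within each rock type
         Diff_From_Mean = POROSITY - Mean_By_RT) %>% # deviation from that mean
  ungroup()                                          # remove grouping afterwards

print(toy)
```

Unlike summarise(), mutate() keeps every original row, so the row count and order of the data are unchanged.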
