Performing operations by group using the dplyr package
The group_by() function lets us group data by a variable so that subsequent operations are applied per group. It is very useful when you have a categorical variable and need to apply an operation within each group.
This example uses porosity, permeability, and rock typing data from cores. First load the data from a CSV file and load the dplyr package.
WELL MD POROSITY PERMEABILITY PTS RT
1 WELL_1 4038.35 0.201 1973.924 34.96978 Mega
2 WELL_1 4038.85 0.203 1158.927 25.35061 Mega
3 WELL_1 4039.35 0.207 1935.117 33.69655 Mega
4 WELL_1 4039.85 0.206 1003.681 23.00154 Mega
5 WELL_1 4040.35 0.190 901.498 23.15667 Mega
6 WELL_1 4040.85 0.201 581.095 17.03789 Mega
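The loading step can be sketched as follows. The original file name is not given, so this sketch assumes a stand-in: it writes the six rows shown above to a temporary CSV (the names `sample_rows` and `csv_path` are hypothetical) and reads it back, which keeps the example self-contained. With the real data you would pass your own file path to read.csv().

```r
library(dplyr)

# Hypothetical stand-in for the real file: the six rows shown above,
# written to a temporary CSV so the example runs on its own.
sample_rows <- data.frame(
  WELL = rep("WELL_1", 6),
  MD = c(4038.35, 4038.85, 4039.35, 4039.85, 4040.35, 4040.85),
  POROSITY = c(0.201, 0.203, 0.207, 0.206, 0.190, 0.201),
  PERMEABILITY = c(1973.924, 1158.927, 1935.117, 1003.681, 901.498, 581.095),
  PTS = c(34.96978, 25.35061, 33.69655, 23.00154, 23.15667, 17.03789),
  RT = rep("Mega", 6)
)
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_rows, csv_path, row.names = FALSE)

# stringsAsFactors = TRUE reads WELL and RT as factors
core_data <- read.csv(csv_path, stringsAsFactors = TRUE)
head(core_data)
```

Reading the text columns as factors is an assumption here; it matches the `<fct>` column types in the outputs of this post.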
Now we can use statistical functions to analyze the data by rock type. With the summarise() function we can create new columns that hold summary variables such as the mean, minimum, maximum, and so on. In this case porosity is evaluated.
core_data %>%
  group_by(RT) %>%
  summarise(Mean = mean(POROSITY),
            Max = max(POROSITY),
            Min = min(POROSITY),
            SD = sd(POROSITY))
# A tibble: 5 x 5
RT Mean Max Min SD
<fct> <dbl> <dbl> <dbl> <dbl>
1 Macro 0.209 0.282 0.079 0.0256
2 Mega 0.206 0.246 0.145 0.0213
3 Meso 0.175 0.232 0.116 0.0317
4 Micro 0.0923 0.145 0.017 0.0365
5 Nano 0.0439 0.096 0.031 0.0232
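The same pattern works with other summary functions. As a minimal sketch on a small, made-up data frame (`toy` and its values are hypothetical, not taken from the core data), n() counts the rows in each group and median() gives the group median:

```r
library(dplyr)

# Hypothetical toy data, not the core data set used above
toy <- data.frame(
  RT = c("Mega", "Mega", "Macro", "Macro", "Macro"),
  POROSITY = c(0.20, 0.22, 0.18, 0.21, 0.27)
)

toy_summary <- toy %>%
  group_by(RT) %>%
  summarise(N = n(),                       # number of samples per rock type
            Mean = mean(POROSITY),
            Median = median(POROSITY))
toy_summary
```

Reporting the count next to the mean helps to judge how much weight each group statistic deserves.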
We can use multiple variables to group the data, in this case grouping by well and rock type.
core_data %>%
  group_by(WELL, RT) %>%
  summarise(Mean = mean(POROSITY),
            Max = max(POROSITY),
            Min = min(POROSITY),
            SD = sd(POROSITY))
# A tibble: 9 x 6
# Groups: WELL [2]
WELL RT Mean Max Min SD
<fct> <fct> <dbl> <dbl> <dbl> <dbl>
1 WELL_1 Macro 0.214 0.236 0.079 0.0302
2 WELL_1 Mega 0.203 0.238 0.145 0.0202
3 WELL_1 Meso 0.196 0.232 0.123 0.0306
4 WELL_1 Micro 0.0819 0.144 0.017 0.0483
5 WELL_2 Macro 0.208 0.282 0.104 0.0241
6 WELL_2 Mega 0.209 0.246 0.157 0.0220
7 WELL_2 Meso 0.161 0.201 0.116 0.0244
8 WELL_2 Micro 0.101 0.145 0.072 0.0217
9 WELL_2 Nano 0.0439 0.096 0.031 0.0232
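Note the `# Groups: WELL [2]` line in the output: after summarising with two grouping variables, the result is still grouped by the first one. A minimal sketch of this behaviour on made-up data (`toy` is hypothetical), assuming dplyr 1.0.0 or later, where summarise() accepts the `.groups` argument:

```r
library(dplyr)

# Hypothetical toy data, not the core data set used above
toy <- data.frame(
  WELL = c("WELL_1", "WELL_1", "WELL_2", "WELL_2"),
  RT = c("Mega", "Macro", "Mega", "Mega"),
  POROSITY = c(0.20, 0.18, 0.22, 0.24)
)

# Default: the last grouping variable (RT) is dropped, WELL remains
by_well_rt <- toy %>%
  group_by(WELL, RT) %>%
  summarise(Mean = mean(POROSITY))
group_vars(by_well_rt)

# .groups = "drop" returns an ungrouped result
ungrouped <- toy %>%
  group_by(WELL, RT) %>%
  summarise(Mean = mean(POROSITY), .groups = "drop")
group_vars(ungrouped)
```

Dropping the groups explicitly avoids surprises when later steps, such as mutate(), would otherwise still operate per well.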
Using the mutate() function, we can add a new column with a value calculated per group, and then use ungroup() to remove the grouping.
core_data <- core_data %>%
  group_by(RT) %>%
  mutate(Mean_By_RT = mean(POROSITY)) %>%
  ungroup()
head(core_data)
# A tibble: 6 x 7
WELL MD POROSITY PERMEABILITY PTS RT Mean_By_RT
<fct> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
1 WELL_1 4038. 0.201 1974. 35.0 Mega 0.206
2 WELL_1 4039. 0.203 1159. 25.4 Mega 0.206
3 WELL_1 4039. 0.207 1935. 33.7 Mega 0.206
4 WELL_1 4040. 0.206 1004. 23.0 Mega 0.206
5 WELL_1 4040. 0.19 901. 23.2 Mega 0.206
6 WELL_1 4041. 0.201 581. 17.0 Mega 0.206
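Because mutate() keeps every row, the group mean can be combined with the original values in the same call. A minimal sketch on made-up data (`toy` and the column `Diff_From_Mean` are hypothetical) computes how far each sample lies from the mean of its rock type:

```r
library(dplyr)

# Hypothetical toy data, not the core data set used above
toy <- data.frame(
  RT = c("Mega", "Mega", "Macro", "Macro"),
  POROSITY = c(0.20, 0.22, 0.18, 0.24)
)

toy_dev <- toy %>%
  group_by(RT) %>%
  mutate(Mean_By_RT = mean(POROSITY),
         Diff_From_Mean = POROSITY - Mean_By_RT) %>%  # deviation within each group
  ungroup()

is_grouped_df(toy_dev)  # FALSE once ungroup() has been applied
toy_dev
```

Calling ungroup() at the end means later operations on the result act on the whole table again rather than per rock type.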
For attribution, please cite this work as
Vazquez (2022, July 17). Chato Solutions: Group by using Dplyr. Retrieved from https://www.chatosolutions.com/posts/2022-07-17-dlyr1/
BibTeX citation
@misc{vazquez2022group,
  author = {Vazquez, Rigoberto Chandomi},
  title = {Chato Solutions: Group by using Dplyr},
  url = {https://www.chatosolutions.com/posts/2022-07-17-dlyr1/},
  year = {2022}
}