> library(dplyr)
> ds = read.csv("helpmiss.csv")
> summarise(group_by(select(filter(mutate(ds,
+ sex=ifelse(female==1, "F", "M")), !is.na(pcs)), age, pcs, sex),
+ sex), meanage=mean(age), meanpcs=mean(pcs),n=n())
# A tibble: 2 × 4
sex meanage meanpcs n
<chr> <dbl> <dbl> <int>
1 F 36.07207 44.86292 111
2 M 35.63025 49.07435 357
In this example, the output of the mutate() function is specified as the input to the filter() function, which prunes observations that are missing the pcs variable. The output from this function is sent to the select() function to create a subset of variables, and the results are provided to the group_by() function, which groups the dataset by sex. The summarise() function then collapses each group, calculating the average age and PCS (physical component score) as well as the sample size. This nested code is very difficult for humans to parse, because it has to be read from the innermost call outward. An alternative would be to save the intermediate results from the nested functions.
> ds2 = mutate(ds, sex=ifelse(female==1, "F", "M"))
> ds3 = filter(ds2, !is.na(pcs))
> ds4 = select(ds3, age, pcs, sex)
> ds5 = group_by(ds4, sex)
> summarise(ds5, meanage=mean(age), meanpcs=mean(pcs),n=n())
# A tibble: 2 × 4
sex meanage meanpcs n
<chr> <dbl> <dbl> <int>
1 F 36.07207 44.86292 111
2 M 35.63025 49.07435 357
A disadvantage of this (somewhat clunky) approach is that it creates several intermediate objects and involves a lot of unnecessary copying, which may be particularly inefficient when processing large datasets. The same operations can be written in a different (and likely more readable) manner using the %>% operator.
> ds %>%
+ mutate(sex=ifelse(female==1, "F", "M")) %>%
+ filter(!is.na(pcs)) %>%
+ select(age, pcs, sex) %>%
+ group_by(sex) %>%
+ summarise(meanage=mean(age), meanpcs=mean(pcs),n=n())
# A tibble: 2 × 4
sex meanage meanpcs n
<chr> <dbl> <dbl> <int>
1 F 36.07207 44.86292 111
2 M 35.63025 49.07435 357
Here, it is clear what each operation within the "pipe stream" is doing. It is straightforward to debug expressions written this way by leaving off the trailing %>% at the end of a given line: this evaluates only the functions called up to that point and displays the intermediate output.
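The debugging trick can be sketched on a small example. The data frame below, toy, is a made-up stand-in for helpmiss.csv (its values are invented for illustration, not taken from the HELP data), and the names full and step2 are likewise hypothetical. The sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data standing in for helpmiss.csv (values are made up).
toy <- data.frame(
  female = c(1, 0, 0, 1, 0),
  age    = c(30, 40, 50, 20, 60),
  pcs    = c(50, NA, 40, 60, 45)
)

# Full pipeline, with the same steps as in the text.
full <- toy %>%
  mutate(sex = ifelse(female == 1, "F", "M")) %>%
  filter(!is.na(pcs)) %>%
  select(age, pcs, sex) %>%
  group_by(sex) %>%
  summarise(meanage = mean(age), meanpcs = mean(pcs), n = n())

# Truncated pipeline: with no %>% after filter(), evaluation stops there,
# so the intermediate data frame can be inspected before select() runs.
step2 <- toy %>%
  mutate(sex = ifelse(female == 1, "F", "M")) %>%
  filter(!is.na(pcs))

print(step2)  # rows with missing pcs removed, sex column added
print(full)   # one summary row per value of sex
```

At the console, the same effect is obtained by running only the first few lines of the pipeline and omitting the %>% from the last line run; assigning to step2 here simply keeps the intermediate result around for inspection.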
Reference
Horton, N. J. and Kleinman, K., 2015. Using R and RStudio for Data Management, Statistical Analysis, and Graphics. Chapman and Hall/CRC.