block by joyrexus 7313435

Table munging terminology

We need a simple vocabulary for talking about the manipulation of tabular data structures. That is, we want a clean and elegant way of describing how to munge tables, so we can slice and dice them with minimal pain. Implementation can follow once we settle on the right verbs for the data manipulation operations and the right nouns for the elements they produce. Yes, relational algebra is already codified in SQL and its mungy extensions, so what we're after is a coherent distillation of the parts that could apply to any tabular data store.

What follows are extracts from Hadley Wickham’s dplyr project README file.

Context

You need to split up a big data structure into homogeneous pieces, apply a function to each piece and then combine all the results back together.
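As a rough illustration of the split-apply-combine pattern described above, here is a minimal Python sketch over a list of dicts. The helper name `split_apply_combine`, the `patients` sample data and its column names are all hypothetical; this is not dplyr's API, just one way the three steps could be spelled out.

```python
from collections import defaultdict
from statistics import mean


def split_apply_combine(rows, key, apply):
    """Group rows by `key`, apply a function to each group, combine the results."""
    # Split: bucket rows by the value of the grouping column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    # Apply + combine: one output row per group, in first-seen order.
    return [{key: value, **apply(group)} for value, group in groups.items()]


# Hypothetical sample table.
patients = [
    {"clinic": "north", "age": 34},
    {"clinic": "south", "age": 51},
    {"clinic": "north", "age": 46},
]

summary = split_apply_combine(
    patients,
    key="clinic",
    apply=lambda group: {
        "n": len(group),
        "mean_age": mean(r["age"] for r in group),
    },
)
```

Here `summary` holds one row per clinic with a count and a mean age. The function passed as `apply` returns a dict so that a single pass can produce several summary columns at once.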

Unary verbs

Binary verbs

As well as verbs that work on a single table, we also want verbs describing how to operate on two tables at a time, namely your standard joins.
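To make the two-table case concrete, here is a small Python sketch of an inner join and a left join over lists of dicts. The function names, the `people` and `pets` tables and the `id` key are hypothetical, and the choice to fill unmatched columns with `None` is an assumption for illustration, not something the source prescribes.

```python
def _index_by(rows, key):
    """Map each key value to the list of rows that carry it."""
    index = {}
    for row in rows:
        index.setdefault(row[key], []).append(row)
    return index


def inner_join(left, right, key):
    """Keep only rows whose `key` value appears in both tables."""
    index = _index_by(right, key)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def left_join(left, right, key):
    """Keep every left row; unmatched right-hand columns become None."""
    index = _index_by(right, key)
    right_cols = {col for row in right for col in row if col != key}
    blank = dict.fromkeys(right_cols)  # every right-hand column set to None
    out = []
    for l in left:
        for r in index.get(l[key], [blank]):
            out.append({**l, **r})
    return out


# Hypothetical sample tables.
people = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
pets = [{"id": 1, "pet": "cat"}]

both = inner_join(people, pets, "id")
everyone = left_join(people, pets, "id")
```

In this sketch `both` contains only Ada's row, while `everyone` keeps Grace as well with `pet` set to `None`. If the two tables share a non-key column name, the right-hand value wins; a fuller design would need a rule for such collisions.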

Other

We might also want to provide head() and print() methods for summary display of our tables.
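One way such display helpers might look is sketched below in Python. The `Table` class, its row-tuple layout and the footer format are hypothetical; the default of six rows mirrors the default of R's `head()`, which is an assumption about what would be wanted here.

```python
class Table:
    """A tiny table: a list of column names plus a list of row tuples."""

    def __init__(self, columns, rows):
        self.columns = [str(c) for c in columns]
        self.rows = [tuple(r) for r in rows]

    def head(self, n=6):
        """Return a new Table holding at most the first n rows."""
        return Table(self.columns, self.rows[:n])

    def __str__(self):
        """Render a fixed-width view; this is what print() displays."""
        cells = [self.columns] + [[str(v) for v in row] for row in self.rows]
        # Width of each column is the longest cell it contains.
        widths = [
            max(len(row[i]) for row in cells)
            for i in range(len(self.columns))
        ]
        lines = [
            "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
            for row in cells
        ]
        lines.append(f"[{len(self.rows)} rows x {len(self.columns)} columns]")
        return "\n".join(lines)


# Hypothetical usage: preview the first rows of a larger table.
squares = Table(["x", "y"], [(i, i * i) for i in range(10)])
preview = str(squares.head(2))
print(squares.head())
```

Because `head()` returns another `Table`, it composes with `print()` and with any other verb defined on tables, rather than being a display-only special case.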

Existing tools

Further Reading