You are ignoring the arguments in the mapper; that seems like a big one. Do you know why functions have arguments? If so, the same applies here: map functions are just functions. The other major problem is that you are calling arrange on a vector instead of a data frame. There are also a number of hard-to-decode steps, like initializing variables and then ignoring them. I would suggest you strengthen your understanding of R functions, as well as of the dplyr package, neither of which is the subject of this group.

Once the basics are out of the way, you may want to review the concept of sorting in mapreduce. With big data, the data is partitioned into multiple files, so what "sorted" means is not entirely clear. The usual definition is that each partition is sorted internally and the partitions cover disjoint ranges of the data. This is hard to do with rmr2, as we don't have access to custom partitioners, which are necessary to create partitions as described. Moreover, data sorted this way still doesn't support certain important operations on sorted data, such as applying a moving window operator or computing differences, because near a partition boundary those operations need values that live in two different partitions.

On the positive side, many important operations can be achieved without sorting, sometimes more efficiently: approximate quantiles and top and bottom k elements are two examples.
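To make the last two ideas concrete, here is a small sketch in plain Python. It is not rmr2 code and does not run on Hadoop; it is an in-memory simulation under stated assumptions. The hypothetical `range_partition` plays the role of the custom partitioner that rmr2 doesn't expose: every key goes to the partition whose range contains it, and each partition is then sorted internally. The hypothetical `top_k` shows how the k largest elements can be found with no global sort: each partition keeps only its local top k (the map/combine side), and a single reducer merges those candidates. The boundary values are made up for illustration; a real job would have to choose them, for example from a sample, so that partitions come out balanced.

```python
import bisect
import heapq
from typing import Iterable, List, Sequence


def range_partition(keys: Iterable[float], boundaries: Sequence[float]) -> List[List[float]]:
    """Hypothetical range partitioner (not an rmr2 API).

    With boundaries [b0, b1, ...], partition 0 gets keys < b0, partition 1
    gets b0 <= key < b1, and so on, so partitions cover disjoint, ordered
    ranges. Each partition is then sorted internally."""
    parts: List[List[float]] = [[] for _ in range(len(boundaries) + 1)]
    for key in keys:
        # bisect_right returns the index of the range that contains key
        parts[bisect.bisect_right(boundaries, key)].append(key)
    for part in parts:
        part.sort()  # sorted within the partition only
    return parts


def top_k(partitions: Iterable[Iterable[float]], k: int) -> List[float]:
    """Top k elements without sorting the whole data set.

    Map/combine: each partition keeps only its own k largest values.
    Reduce: merge the at most k * n_partitions candidates, keep the k largest."""
    candidates: List[float] = []
    for part in partitions:
        candidates.extend(heapq.nlargest(k, part))
    return heapq.nlargest(k, candidates)  # returned in descending order


if __name__ == "__main__":
    data = [42, 7, 19, 3, 88, 56, 23, 71, 5, 64]
    parts = range_partition(data, boundaries=[20, 50])
    print(parts)  # [[3, 5, 7, 19], [23, 42], [56, 64, 71, 88]]
    # Reading the partitions in order yields the fully sorted data.
    print([x for p in parts for x in p] == sorted(data))  # True
    # top_k works on arbitrary, unsorted, overlapping-range partitions.
    chunks = [data[0:4], data[4:7], data[7:]]
    print(top_k(chunks, 3))  # [88, 71, 64]
```

Bottom k is symmetric (swap `heapq.nlargest` for `heapq.nsmallest`). The point of the design is that each partition ships at most k values to the reducer, so nothing close to a full sort ever happens.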