Why does the mpg data have duplicated rows?


Hiroaki Yutani

Oct 22, 2017, 12:53:07 AM
to ggplot2
Hi,

I found that the mpg data has some duplicated rows. Is this intended?
If anyone knows why, please help me understand this data better. Thanks!


reprex::reprex_info()
#> Created by the reprex package v0.1.1.9000 on 2017-10-22

data("mpg", package = "ggplot2")

duplicated_rows <- mpg[duplicated(mpg) | duplicated(mpg, fromLast = TRUE), ]

duplicated_rows
#>     manufacturer               model displ year cyl      trans drv cty hwy
#> 19     chevrolet  c1500 suburban 2wd   5.3 2008   8   auto(l4)   r  14  20
#> 21     chevrolet  c1500 suburban 2wd   5.3 2008   8   auto(l4)   r  14  20
#> 40         dodge         caravan 2wd   3.3 1999   6   auto(l4)   f  16  22
#> 41         dodge         caravan 2wd   3.3 1999   6   auto(l4)   f  16  22
#> 42         dodge         caravan 2wd   3.3 2008   6   auto(l4)   f  17  24
#> 43         dodge         caravan 2wd   3.3 2008   6   auto(l4)   f  17  24
#> 53         dodge   dakota pickup 4wd   4.7 2008   8   auto(l5)   4  14  19
#> 54         dodge   dakota pickup 4wd   4.7 2008   8   auto(l5)   4  14  19
#> 59         dodge         durango 4wd   4.7 2008   8   auto(l5)   4  13  17
#> 61         dodge         durango 4wd   4.7 2008   8   auto(l5)   4  13  17
#> 65         dodge ram 1500 pickup 4wd   4.7 2008   8 manual(m6)   4  12  16
#> 67         dodge ram 1500 pickup 4wd   4.7 2008   8   auto(l5)   4  13  17
#> 68         dodge ram 1500 pickup 4wd   4.7 2008   8   auto(l5)   4  13  17
#> 69         dodge ram 1500 pickup 4wd   4.7 2008   8 manual(m6)   4  12  16
#> 78          ford        explorer 4wd   4.0 1999   6   auto(l5)   4  14  17
#> 80          ford        explorer 4wd   4.0 1999   6   auto(l5)   4  14  17
#> 101        honda               civic   1.6 1999   4   auto(l4)   f  24  32
#> 104        honda               civic   1.6 1999   4   auto(l4)   f  24  32
#>     fl      class
#> 19   r        suv
#> 21   r        suv
#> 40   r    minivan
#> 41   r    minivan
#> 42   r    minivan
#> 43   r    minivan
#> 53   r     pickup
#> 54   r     pickup
#> 59   r        suv
#> 61   r        suv
#> 65   r     pickup
#> 67   r     pickup
#> 68   r     pickup
#> 69   r     pickup
#> 78   r        suv
#> 80   r        suv
#> 101  r subcompact
#> 104  r subcompact

# For example, these two rows are exactly the same:
dplyr::glimpse(duplicated_rows[1:2,])
#> Observations: 2
#> Variables: 11
#> $ manufacturer <chr> "chevrolet", "chevrolet"
#> $ model        <chr> "c1500 suburban 2wd", "c1500 suburban 2wd"
#> $ displ        <dbl> 5.3, 5.3
#> $ year         <int> 2008, 2008
#> $ cyl          <int> 8, 8
#> $ trans        <chr> "auto(l4)", "auto(l4)"
#> $ drv          <chr> "r", "r"
#> $ cty          <int> 14, 14
#> $ hwy          <int> 20, 20
#> $ fl           <chr> "r", "r"
#> $ class        <chr> "suv", "suv"
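
A complementary check, sketched here under some assumptions, is to count how many copies of each distinct row exist instead of listing every copy. The helper name `count_row_copies` and the toy data frame are hypothetical and only for illustration; the function uses base R alone and overwrites any existing column named `n`.

```r
# Hypothetical helper: for each distinct row that occurs more than once,
# return one copy of the row plus the number of times it occurs.
count_row_copies <- function(df) {
  # Build one text key per row by pasting all column values together.
  # unname() keeps column names from being read as arguments of paste().
  cols <- unname(as.list(df))
  key <- do.call(paste, c(cols, sep = "\r"))

  # For every row, count the rows that share its key.
  n <- ave(seq_along(key), key, FUN = length)

  # Keep the first copy of each row that appears more than once.
  df$n <- n
  df[!duplicated(key) & n > 1, , drop = FALSE]
}

# Toy data (not taken from mpg): "a"/14 occurs three times, "b"/16 twice.
toy <- data.frame(
  model = c("a", "b", "a", "c", "b", "a"),
  cty   = c(14L, 16L, 14L, 20L, 16L, 14L),
  stringsAsFactors = FALSE
)
copies <- count_row_copies(toy)
print(copies)
```

With ggplot2 installed, `count_row_copies(ggplot2::mpg)` could be used the same way. Judging by the listing above, it should return nine distinct rows, each with `n` equal to 2.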


Best,
Hiroaki Yutani

Tom Hopper

Nov 8, 2017, 10:23:01 AM
to ggplot2
Not a direct answer to your question, but it may be worth noting that ggplot2’s mpg data is a small subset of the much larger EPA vehicles data (nearly 37,000 observations, 83 variables, about 25 MB in memory), available for download at http://www.fueleconomy.gov/feg/epadata/vehicles.csv.zip

That set needs some data cleaning after download; feel free to email me off-list if you’d like a code snippet.

Regards,

Tom


Hiroaki Yutani

Nov 8, 2017, 10:54:57 PM
to ggplot2
Thanks Tom! I will take a look :)

Regards,
Hiroaki Yutani

P.S. I already got some answers on RStudio Community: https://community.rstudio.com/t/why-mpg-data-in-ggplot2-has-duplicated-rows/2213?u=yutannihilation


On Thu, Nov 9, 2017 at 0:22, Tom Hopper <tomh...@gmail.com> wrote: