Hi Nana,
> I'm very new to R and currently learning geom_jitter. I've read many
> sources about this geom but still can't understand how it is allowed to
> add noise to your data.
> pls_open.png
>
> For example, in this picture, OP explains that the bottom plot is using
> geom_jitter, thus avoiding overplotting. But from what I understand,
> geom_jitter makes the plot change the data points. In the geom_point
> plot there is no point at x = 6.1 and y = 20.1, but in geom_jitter there is.
>
> My question is: is it allowed?
Yes.
> Sorry if it sounds stupid, I learned basic statistics in college but
> never about jitter.
It isn't a statistical technique, it's a presentational one: the noise
is added only to the plotted positions, and the underlying data are
left unchanged.
Provided it is used with care, it is fine. In the example you presented,
the jitter is applied to the number of cylinders, so it gives an
indication of the number of observations at each value of hwy and does
not influence the variability in hwy. There are alternatives: you could
colour each point by the number of observations at that unique
location, or vary its size. But often those would still overlap a lot.
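
In case it helps, here is a minimal sketch of both options. I am
assuming your picture is the usual example built on the mpg dataset
that ships with ggplot2, with cyl on the x axis and hwy on the y axis;
the object names p_jitter and p_count are just placeholders of mine.

```r
library(ggplot2)  # also provides the mpg dataset assumed here

# Jitter only the discrete x variable (cyl). Setting height = 0 means the
# plotted hwy values are exactly the ones in the data.
p_jitter <- ggplot(mpg, aes(x = cyl, y = hwy)) +
  geom_jitter(width = 0.2, height = 0)

# Alternative without any noise: keep the exact positions and let the
# point size show how many observations share each (cyl, hwy) pair.
p_count <- ggplot(mpg, aes(x = cyl, y = hwy)) +
  geom_count()

# Print either object (e.g. p_jitter) to draw the plot.
```

Note that if you leave width and height unset, geom_jitter adds a small
amount of noise in both directions, which may be why the jittered plot
showed a y value that is not in the data.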
Regards,
Ron.
PS There isn't much traffic on this list these days, with
community.rstudio.com being the preferred support forum for all
things tidyverse-related, including ggplot2, so I'd recommend
asking future questions over there.