Hi Zack,
It looks like you have a beautiful soup there.
Here's a stab at it.
suppressPackageStartupMessages({
library(ggplot2)
library(RColorBrewer)
})
lwc <- read.csv("language-wordcounts.csv", header=TRUE, comment.char="#")
## Why not just
## lwc$lang <- factor(lwc$lang)
## droplevels() removes the now-empty '(all)' level if lang is a factor
d.f <- droplevels(subset(lwc, lang != '(all)'))
tots <- tapply(d.f$nwords, d.f$lang, sum)
## ## Threshold of 1e7 words:
## d.f$lang <- as.character(d.f$lang) # with a factor, "Other" would become NA
## unpopular <- names(tots[tots < 1e7])
## d.f$lang[d.f$lang %in% unpopular] <- "Other"
## tots <- tapply(d.f$nwords, d.f$lang, sum)
## Order the languages by decreasing total word count; see
## http://www.cookbook-r.com/Manipulating_data/Sorting/
d.f$lang <- factor(d.f$lang, levels = names(tots[order(tots, decreasing = TRUE)]))
box <- ggplot(d.f, aes(x=name, y=nwords, fill=lang)) +
geom_bar(stat='identity', position='fill') +
coord_flip() +
## ## or, instead of coord_flip(), rotate the x-axis labels:
## theme(axis.text.x = element_text(angle=90, hjust=1, vjust=1)) +
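## fill is mapped to a discrete factor, so scale_colour_gradient() has no
## effect; interpolate blue -> red across the ordered languages instead
scale_fill_manual(values = colorRampPalette(c("blue", "red"))(nlevels(d.f$lang)))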
ggsave("zw.pdf", box, width = 12, height = 6)
browseURL("zw.pdf")
Instead of trying to fit all 88 languages on one plot, I suggest lumping
the unpopular ones into a category "Other" (see the commented-out
threshold code above). You can then list the lumped languages in a
caption, for example.
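In case it helps, here is a minimal sketch of that lumping step wrapped in a function. The name `lump_languages`, its `threshold` default of 1e7 and the toy data are all hypothetical; it assumes a data frame with the `name`, `lang` and `nwords` columns used above, with the '(all)' rows already removed.

```r
## Hypothetical helper, not part of the script above: lump every language
## whose total word count is below `threshold` into "Other".
lump_languages <- function(d.f, threshold = 1e7) {
  ## Work on character: assigning "Other" into a factor that lacks
  ## that level would silently produce NA.
  d.f$lang <- as.character(d.f$lang)
  tots <- tapply(d.f$nwords, d.f$lang, sum)
  unpopular <- names(tots)[tots < threshold]
  d.f$lang[d.f$lang %in% unpopular] <- "Other"
  ## Recompute the totals and order the levels by decreasing total
  tots <- tapply(d.f$nwords, d.f$lang, sum)
  d.f$lang <- factor(d.f$lang, levels = names(tots)[order(tots, decreasing = TRUE)])
  list(data = d.f, unpopular = sort(unpopular))
}

## Hypothetical toy data, just to exercise the function
toy <- data.frame(
  name   = c("a", "a", "b", "b", "b"),
  lang   = c("C", "Lisp", "C", "Lisp", "Forth"),
  nwords = c(5e7, 2e6, 3e7, 1e6, 5e5),
  stringsAsFactors = FALSE
)
res <- lump_languages(toy, threshold = 1e7)
levels(res$data$lang)  # "C" "Other"
res$unpopular          # "Forth" "Lisp"
```

The returned `unpopular` vector can then go into the plot caption, e.g. `labs(caption = paste("Other:", paste(res$unpopular, collapse = ", ")))` (ggplot2 2.2.0 or later).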
Best
Brian