About the static dictionary

Cody L

Nov 12, 2020, 6:06:24 PM

to Brotli

Where is the best place to ask questions about the contents of the static dictionary?

eus...@google.com

Dec 21, 2020, 8:19:32 AM

to Brotli

Hello.

Feel free to ask such questions in this group.

Cody L

Dec 23, 2020, 12:48:41 AM

to Brotli

Hello. Thank you for getting back to me.

Based on a little bit of reading I have done in this group and through the GitHub repository, I get the impression that the static dictionary was generated by analyzing many sources of text and intelligently selecting and extracting text fragments.

My question is, was there any inspection of the resulting static dictionary for possibly offensive words? I ask after inspecting the dictionary myself (i.e. the one provided in RFC 7932).

Thanks again for your time,

eus...@google.com

Jan 8, 2021, 7:13:39 AM

to Brotli

I believe Jyrki Alakuijala knows the answer.

There are a few obscene words in the dictionary (including the f-word), but that reflects the state of the internet and other text corpora at the time the dictionary was created.

Cody L

Jan 11, 2021, 10:33:58 PM

to Brotli

Okay, makes sense. Hopefully the words aren't too obscene (at least not more obscene than the f-word). I noticed that the dictionary also has non-English words; is there any chance either Jyrki Alakuijala or someone else on the Brotli team has looked into them?

eus...@google.com

Jan 12, 2021, 8:02:15 AM

to Brotli

I took a look at the Russian part: it is clean (no obscene words). However, there are also Chinese, Hindi and Arabic parts...

Words that do not belong to those 5 languages are native language names (e.g. Ελληνικά for Greek), pieces of JS, CSS and HTML, and years (e.g. 2020)...
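For anyone who wants to repeat this kind of inspection, here is a minimal sketch of how the raw dictionary could be split into individual words. It relies on the layout described in RFC 7932: words are stored back to back, grouped by length from 4 to 24 bytes, with `1 << NDBITS[length]` words per group. The function names and the file name `dictionary.bin` in the usage note are hypothetical; the sketch assumes you have already saved the 122,784 raw dictionary bytes to a file yourself.

```python
from typing import Dict, List

# Bit counts per word length (index = length in bytes), as listed in RFC 7932.
# Lengths 0..3 are unused; a group holds 1 << NDBITS[length] words.
NDBITS = [0, 0, 0, 0, 10, 10, 11, 11, 10, 10, 10, 10, 10,
          9, 9, 8, 7, 7, 8, 7, 7, 6, 6, 5, 5]


def split_dictionary(data: bytes) -> Dict[int, List[bytes]]:
    """Split the raw static dictionary into words, grouped by word length."""
    words: Dict[int, List[bytes]] = {}
    offset = 0
    for length in range(4, len(NDBITS)):
        count = 1 << NDBITS[length]
        words[length] = [
            data[offset + i * length: offset + (i + 1) * length]
            for i in range(count)
        ]
        offset += count * length
    if offset != len(data):
        raise ValueError(
            "expected %d bytes of dictionary data, got %d" % (offset, len(data))
        )
    return words


def non_ascii_words(words: Dict[int, List[bytes]]) -> List[str]:
    """Return the words containing non-ASCII bytes, decoded as UTF-8.

    Undecodable bytes are replaced rather than raising, since this is
    only meant for manual review.
    """
    return [
        w.decode("utf-8", errors="replace")
        for group in words.values()
        for w in group
        if any(b >= 0x80 for b in w)
    ]
```

A possible use would be `words = split_dictionary(open("dictionary.bin", "rb").read())` followed by printing `non_ascii_words(words)` to review the non-English entries by hand.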

Cody L

Jan 19, 2021, 9:57:04 PM

to Brotli

Cool for now. I'll keep watching this discussion in case someone else finds something/nothing.

Thank you for your time,
