e2guardian 5.3 banned phrase lists not working with youtube


justi...@gmail.com

Jan 8, 2020, 4:02:15 PM
to e2guardian
Hi there,

I am wanting to use phrase lists to block pages with keywords.
For sake of example, I am going to use the innocuous phrase "puppy love" as my phrase, so as not to scandalize anyone. So in bannedphraselist I add:
<puppy love>

If I go to google and type "puppy love" then it blocks because of the banned phrase. However, if I go to YouTube and type "puppy love" it loads the results just fine, and I can even click on a video with "puppy love" in the title or in the description, and the page loads.

Is there some secret to get this working with YouTube? Is there some weird encoding or something?
Right now I am using squid with SSL bump and e2guardian as an ICAP server which squid connects to. I am running all inside of a docker container, not that that matters any. The rest of the config is pretty standard. Let me know if you need any additional information.

Thanks,
-Justin

Philip Pearce

Jan 14, 2020, 10:38:47 AM
to justi...@gmail.com, e2guardian
Hi Justin,

You need to add to the searchregexplist the following line:

"^http://[0-9a-z]+\.youtube\.[a-z]+[-/%.0-9a-z]*\?search_query=([^&]*).*"->"\1"

This regexp is used to 'pick out' the search terms from the request.
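
To see what this rule picks out, here is a minimal sketch in Python rather than in e2guardian itself. It assumes the `->"\1"` part behaves like a capture-group substitution and that Python's `re` module treats the pattern the same way e2guardian's regex engine does; `extract_search_terms` and the example URLs are hypothetical. Note that, as written, the pattern is anchored on `http://`, so in this sketch an `https://` URL does not match; the thread does not say whether e2guardian normalizes the scheme before matching.

```python
import re
from typing import Optional

# Pattern copied from the searchregexplist line above. Group 1 is the value
# of the search_query parameter, up to the next "&".
SEARCH_RULE = re.compile(
    r"^http://[0-9a-z]+\.youtube\.[a-z]+[-/%.0-9a-z]*\?search_query=([^&]*).*"
)


def extract_search_terms(url: str) -> Optional[str]:
    """Return the captured search terms, or None if the rule does not match.

    Hypothetical helper: it only models the capture, not e2guardian's
    subsequent handling of the extracted terms.
    """
    match = SEARCH_RULE.match(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    for url in (
        "http://www.youtube.com/results?search_query=puppy+love&sp=abc",
        "http://www.youtube.com/watch?v=abc",
        "https://www.youtube.com/results?search_query=puppy+love",
    ):
        print(url, "->", extract_search_terms(url))
```

The captured value is still URL-encoded (for example `puppy+love`), so any comparison against a phrase would need the terms decoded first.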

I'll add this into the default for this list as well.

Regards
Philip


--
E2guardian:
https://groups.google.com/d/forum/e2guardian
Github:
https://github.com/e2guardian/e2guardian
Follow us on twitter:
https://twitter.com/e2guardian
---
You received this message because you are subscribed to the Google Groups "e2guardian" group.
To unsubscribe from this group and stop receiving emails from it, send an email to e2guardian+...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/e2guardian/550e6a84-c221-4ad5-aad7-73eb0f2d2457%40googlegroups.com.

Justin Schwartzbeck

Jan 14, 2020, 12:41:44 PM
to Philip Pearce, e2guardian
Thanks Philip, but this still does not answer why phrase lists don't work on YouTube. Regexpurllists can be easily circumvented by intentionally misspelling search terms so they are autocorrected in the body. For example, you can block the search for "puppy love" as in my earlier example, but if I search "pupy luv" then the search will go through and YouTube will autocorrect to "puppy love." This is where phrase lists would be handy.

Philip Pearce

Jan 15, 2020, 5:27:15 AM
to Justin Schwartzbeck, e2guardian
Ok, I've had a quick look at the YouTube response page. It is very large, and the keywords 'puppy love' do not appear until line 14661. Try increasing the maxcontentfiltersize and maxcontentramcachescansize values in e2guardian.conf.

I suggest starting with maxcontentfiltersize = 2000.
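
The effect of a size limit on a very large page can be illustrated with a small Python model. This is a sketch under stated assumptions, not e2guardian's actual code: it assumes the limit is expressed in KB and that a body larger than the limit is passed through without being phrase-scanned. It models a single limit only (not the separate RAM-cache setting), and the function names and the synthetic page are hypothetical.

```python
def is_scanned(body: bytes, max_filter_size_kb: int) -> bool:
    """Assumed rule: only bodies no larger than the limit are phrase-scanned."""
    return len(body) <= max_filter_size_kb * 1024


def phrase_blocked(body: bytes, phrase: bytes, max_filter_size_kb: int) -> bool:
    """Return True if the body is scanned and contains the phrase."""
    if not is_scanned(body, max_filter_size_kb):
        return False  # too large under the assumed rule, so never inspected
    return phrase.lower() in body.lower()


# Synthetic page of about 2.5 MB with the phrase near the very end.
filler = b"<div>unrelated markup</div>\n" * 90_000
page = filler + b"<title>puppy love</title>\n"

if __name__ == "__main__":
    for limit_kb in (1024, 4096):
        print(limit_kb, "KB limit ->", phrase_blocked(page, b"puppy love", limit_kb))
```

Under these assumptions the phrase is only caught once the limit exceeds the size of the whole response, which is why a large results page can slip through with a small limit.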


Philip Pearce

Jan 15, 2020, 6:24:52 AM
to Justin Schwartzbeck, e2guardian
I've now tested this, and increasing maxcontentfiltersize and maxcontentramcachescansize to 3000 fixes the issue.

Justin Michael Schwartzbeck

Jan 15, 2020, 9:46:44 PM
to Philip Pearce, e2guardian
Thanks Philip, I will update my config when I get the chance.

John Smith

Feb 21, 2021, 11:25:44 PM
to e2guardian
Hey Philip, I just posted a question on another thread as well. Indeed, this doesn't seem to work with YouTube on the initial page load. Everything gets blocked on page reload, though.

el

Feb 23, 2021, 2:57:04 PM
to e2guardian
Actually after the initial search on youtube, you can search for any banned term, and keep clicking videos with banned titles. As long as you are in the same browser tab, it will never be filtered. Filtering and banning only occurs on page refresh.

The only solution I found to this problem is to enforce safe search on youtube.

Justin Michael Schwartzbeck

Feb 23, 2021, 9:16:52 PM
to el, e2guardian
I have found that banned phrases on YouTube worked after I followed Philip's advice and changed maxcontentfiltersize and maxcontentramcachescansize to 4096 (4MB). This was with e2guardian 5.3.


Justin Michael Schwartzbeck

Feb 26, 2021, 8:43:50 PM
to el, e2guardian
Ok, so it is interesting that this came up again. I tested it again tonight, with both bannedphraselist and bannedregexpurllist.

For the bannedregexpurllist, I used the regex: youtube.com.*roberta
For the bannedphraselist, I used: <youtube>,<roberta>
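
As a rough illustration of how a phrase line like the one above is read, here is a minimal Python sketch. It assumes that each `<...>` is one term, that comma-separated terms on a line must all appear in the page for the line to match, and that matching is a case-insensitive substring test; the real matcher may differ, and `parse_phrase_line` and `line_matches` are hypothetical names.

```python
import re
from typing import List


def parse_phrase_line(line: str) -> List[str]:
    """Split a line such as '<youtube>,<roberta>' into its lowercase terms."""
    return [term.lower() for term in re.findall(r"<([^>]*)>", line)]


def line_matches(line: str, page_text: str) -> bool:
    """Assumed semantics: every term on the line must occur in the page text."""
    terms = parse_phrase_line(line)
    text = page_text.lower()
    return bool(terms) and all(term in text for term in terms)


if __name__ == "__main__":
    print(line_matches("<puppy love>", "Best Puppy Love videos"))
    print(line_matches("<youtube>,<roberta>", "YouTube - Roberta playlist"))
    print(line_matches("<youtube>,<roberta>", "YouTube home page"))
```

Under these assumptions the combined line only fires when the scanned body actually contains both words, so it depends on the response body being inspected at all.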

What I am finding is that if I browse to youtube.com and then search for "roberta", I see results and can browse to videos. But if I click in the URL bar and press Enter, as though I had typed the address in manually:

https://www.youtube.com/results?search_query=roberta

then it will block. It will also block on phrase matching when I try to go directly to a youtube video. I wonder why this would be.

el

Feb 27, 2021, 2:16:49 AM
to e2guardian
Yup, there you go.

Changing both maxcontentfiltersize and maxcontentramcachescansize to 4096 does not fix this issue.

The only way I found to block adult videos on youtube is to enforce safe search.

e2guardian does not currently work fully with YouTube.

Justin Schwartzbeck

Feb 27, 2021, 9:54:19 AM
to el, e2guardian
I don't know about you but I am using icap. I was looking at the encapsulated http for both cases (clicking directly versus opening in a new tab). The process looks different. But the content should be the same. I think I will turn on debug logs later and look at the difference.

Justin Michael Schwartzbeck

Feb 27, 2021, 10:04:25 PM
to el, e2guardian
If anyone wants to look at it, here are the logs for two cases. One where I click on a video link from youtube search results, in which case it is not blocked. The other where I open in a new tab, in which case it is blocked by phrase.

I may start investigating myself, but learning a code base is a long and difficult process.
debuglogs_noblock
debuglogs_block

Philip Pearce

Feb 28, 2021, 12:16:50 PM
to Justin Michael Schwartzbeck, e2guardian, el
Looking at the logs this seems to be due to the convoluted way youtube does calls.

When opened in a new tab, the call includes the search term in the URL, and e2g blocks it because you have an entry in bannedregexpurllist which matches it. It is not blocking due to content-checking (phrases).
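
The URL side of this can be checked with a short Python sketch. It assumes the bannedregexpurllist entry is applied as an unanchored search against the full URL and that Python's `re` behaves like e2guardian's regex engine for this simple pattern; `url_blocked` is a hypothetical helper.

```python
import re

# The entry from the earlier message. The "." is unescaped, so it matches
# any single character, including the literal dot in "youtube.com".
URL_RULE = re.compile(r"youtube.com.*roberta")


def url_blocked(url: str) -> bool:
    """Return True if the rule matches anywhere in the URL (assumed semantics)."""
    return URL_RULE.search(url) is not None


if __name__ == "__main__":
    # Search term present in the URL: the rule matches.
    print(url_blocked("https://www.youtube.com/results?search_query=roberta"))
    # A watch URL carries only a video id, so there is nothing to match.
    print(url_blocked("https://www.youtube.com/watch?v=Jk3nwzc9s1g&pbj=1"))
```

This is consistent with a block that depends on how the page was reached rather than on what the page contains.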

When clicking on a link in the results, the call is https://www.youtube.com/watch?v=Jk3nwzc9s1g&pbj=1, so there is no search term in the URL. One of the requests returns an application/json mimetype, so have you tried adding this to the textmimetypes setting in e2guardianf1.conf so that it gets checked as well as the plain text mimetypes?
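
The mimetype gate described here can be pictured with a small Python sketch. It assumes that `text/*` responses are always content-checked and that textmimetypes adds extra types to that set; this is a simplification for illustration rather than e2guardian's actual logic, and `should_content_check` and the sample header values are hypothetical.

```python
from typing import Set


def should_content_check(content_type_header: str, extra_text_mimetypes: Set[str]) -> bool:
    """Decide whether a response body would be phrase-scanned (assumed rule)."""
    # Drop parameters such as "; charset=UTF-8" and normalize case.
    mimetype = content_type_header.split(";", 1)[0].strip().lower()
    return mimetype.startswith("text/") or mimetype in extra_text_mimetypes


if __name__ == "__main__":
    header = "application/json; charset=UTF-8"
    print(should_content_check(header, set()))                 # JSON not listed
    print(should_content_check(header, {"application/json"}))  # JSON listed
    print(should_content_check("text/html; charset=utf-8", set()))
```

Under this model a JSON response carrying the video titles would bypass phrase checking until application/json is listed.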

Philip


Justin Michael Schwartzbeck

Feb 28, 2021, 10:26:16 PM
to Philip Pearce, e2guardian, el
Hey Philip,

Thanks for letting me know about the textmimetypes. You saved me a lot of time digging into that code myself (though perhaps sometime soon I might learn the codebase so that I can become a contributor! I love this project).

So I just uncommented that whole line, and then tried doing a youtube search for "roberta." While it doesn't actually display the block page, it no longer does the search, so basically it is like you are stuck at the main youtube page. Also when I tried to access the video, it wouldn't load at all, it just kind of spins there and on the video I just get an error.

As a developer I can understand why all this happens, there is some javascript magic and JSON data being passed under the hood, as opposed to actually navigating to that page the normal way. Not much e2guardian or the developers can do about that. I would certainly say this is plenty satisfactory as it gets the job done!

So John Smith and el, uncomment the textmimetypes line in /etc/e2guardian/e2guardianf1.conf

Thanks again Philip.

el

Jul 13, 2021, 12:41:28 PM
to e2guardian
Thank you all.

Uncommenting the textmimetypes line fixes youtube.com, but it does not fix searching on Facebook, which has the same problem.

Dan Schmidt

Jul 14, 2021, 1:30:18 PM
to el, e2guardian
Odd, this doesn't work at all for me. Also, the mimetypes setting seems to break certain games.
