I feel a bit discouraged

jho...@bruinmail.slcc.edu

Dec 3, 2015, 6:07:40 PM
to e2guardian
After all my attempts, I'm left wondering if phrase list and regular expression URL filtering is worth the effort. I realize it would never be perfect, but I had hoped for better results.

I'm running the e2guardian development branch with man-in-the-middle explicit proxy enabled on Debian 8. I used the configure options from FredBcode's sticky post titled "E2guardian Debian Package." I've also uncommented the new textmimetypes option in hopes it would filter YouTube searches (I inquired about this in an earlier post, but received no replies).

With textmimetypes on, e2guardian still does not filter YouTube searches.

In all my tests, I search for "nude women."

In Firefox, when I browse to google.com and search from the site, the search is properly filtered. However, searches from within Firefox's search bar are not filtered (the search provider is Google).

The opposite is true in Internet Explorer 11. Searches from google.com are not filtered, but searches in the search bar are correctly filtered (again Google is the provider).

Searches with Yahoo from Firefox are not filtered with either method.

Bing.com offers the only consistently acceptable behavior; both types of searches are filtered correctly, including those from https://bing.com.

EDIT: The search in Firefox for Windows from the google.com site doesn't work after all; the e2guardian block screen appears as expected. However, when I press the back button, the search results appear as if never filtered. Arrgh!

So... I'm frustrated. This is an awkward question to ask here, but would a paid service such as SafeSquid provide better results? Or am I doomed to run in circles with any MITM filter?

I would appreciate some encouraging news. Do others observe the same behavior? Can I tweak my configuration somehow?

Can someone explain why it doesn't work as expected?

This is the access.log entry from the Firefox search bar attempt:

2015.12.3 15:52:14 - 192.168.1.4 https://www.google.com/search?q=nude+women&ie=utf-8&oe=utf-8 GET 139340 0 1 200 text/html

Philip Pearce

Dec 7, 2015, 6:11:24 AM
to e2guardian
Jholt27,

E2guardian is very flexible and can be configured to do what you want, but it is not, as you can see, an 'out-of-the-box' solution. If you want 'out of the box' and commercial support, then you need to purchase either Protex (www.protex.e2bn.org) or some other product that will meet your requirements.

To get the results you want, you need to have searchregexplist configured correctly.

The reason that something will be blocked on the main search page but not in the browser search bar is simply that they often use a different call format. So you must cater for both in your regular expressions. It is likely that the Facebook search blocking is not working because you have no entry for Facebook in searchregexplist. I've attached an example searchregexplist which does work with Facebook and with web page/search bar searches for Google, Bing, etc. You will need to extend this if you want to filter search terms from other search engines.
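
To make the point about differing call formats concrete, here is a minimal Python sketch. It is not e2guardian's list syntax and these are not entries from the attached searchregexplist; the two patterns and the function name are hypothetical. The Google URL is modelled on the access.log entry at the top of the thread and the Bing suggestion URL on the entries logged later in the thread, both written in the http:// form. The final Bing URL is an assumed example for which no pattern exists.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical patterns for illustration only. Each search engine (and each
# request format used by that engine) needs its own pattern, because the
# search term sits in a different place in the URL.
PATTERNS = [
    re.compile(r"^http://www\.google\.com/search\?(?:.*&)?q=([^&]*)"),
    re.compile(r"^http://www\.bing\.com/AS/Suggestions\?(?:.*&)?qry=([^&]*)"),
]

def extract_search_term(url):
    """Return the decoded search term if any pattern matches, else None."""
    for pattern in PATTERNS:
        match = pattern.search(url)
        if match:
            return unquote_plus(match.group(1))
    return None

google_url = "http://www.google.com/search?q=nude+women&ie=utf-8&oe=utf-8"
bing_suggest_url = ("http://www.bing.com/AS/Suggestions?pt=page.home"
                    "&mkt=en-us&qry=star%20wars&cp=9&o=hs")
# Assumed URL shape; no pattern above covers it, so no term is extracted.
uncovered_url = "http://www.bing.com/search?q=star+wars"

print(extract_search_term(google_url))
print(extract_search_term(bing_suggest_url))
print(extract_search_term(uncovered_url))
```

A request whose format has no matching pattern yields no search term at all, which is consistent with the same search being filtered from one entry point and not from another.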

Other suggestions:

If content filtering with phrases is not working as expected, then try reducing the naughtynesslimit value.

Set searchtermlimit to a much lower value than naughtynesslimit.

Search page results are now often returned via JavaScript and other non-HTML and non-standard formats, which makes content filtering of search page results problematic. Also, if you block on search page results, then a single bad entry in the results will block lots of other useful information. So it may be better to block on the search term and then rely on content filtering to block the actual sites when the user clicks through.


Regards

Philip




Attachment: searchregexplist

jho...@bruinmail.slcc.edu

Dec 7, 2015, 10:04:48 AM
to e2guardian, philip...@e2bn.org
Thanks for your reply. I will try this out. I looked at the list you attached and noticed the lines all start with http://; I assume I should replace these with https:// as all the search engines have switched to SSL searches?

I looked at the Protex web site but it appears it is only for schools; I saw no options for a home user like myself with 3 kids. I won't pay more than $10 a month.

Thanks again.

Philip Pearce

Dec 7, 2015, 12:46:18 PM
to e2guardian
Once you are in MITM mode, all requests are just plain http.

If you change the searchregexplist patterns to https, they will not work. Leave them as http:

Regards

Philip



jho...@bruinmail.slcc.edu

Dec 8, 2015, 6:16:35 PM
to e2guardian, philip...@e2bn.org
Philip,

Sorry to keep pestering you...

I replaced the existing searchregexplist with the version you sent and uncommented the corresponding line in e2guardianf1.conf.

I chose "method 2" so I set the searchtermlimit to 50.

Now a search for a non-naughty phrase such as "star wars trailer" causes a "connection was reset" page to appear on any of the search providers listed in the searchregexplist.

This occurs even on unsecure http://bing.com.

Did I miss something? I don't suppose Protex has an unadvertised "home user" version. My needs are pretty simple; just filter searches on the major sites and Youtube.

Thanks.

Jason

Philip Pearce

Dec 9, 2015, 4:39:47 AM
to e2guardian
Jason,

This https://github.com/e2guardian/e2guardian/issues/95 may be of help.

Regards

Philip


FredB

Dec 9, 2015, 6:00:11 AM
to e2guardian


| Now a search for a non-naughty phrase such as "star wars trailer" causes a "connection was reset" page to appear on any of the search providers listed in the searchregexplist.

| This occurs even on unsecure http://bing.com.


You should see something in the log?

jho...@bruinmail.slcc.edu

Dec 9, 2015, 3:56:10 PM
to e2guardian, philip...@e2bn.org
Philip,

I wanted to use the latest development branch to test this, but I can no longer compile it on Debian 8.2. The following is the stderr from make:

CertificateAuthority.cpp: In function ‘void log_ssl_errors(const char*, const char*)’:
CertificateAuthority.cpp:41:37: error: format not a string literal and no format arguments [-Werror=format-security]
syslog(LOG_ERR, &buff[0] );
^
cc1plus: some warnings being treated as errors
make[2]: *** [e2guardian-CertificateAuthority.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

Philip Pearce

Dec 9, 2015, 4:33:54 PM
to e2guardian
OK I have pushed a fix.

Thanks,  Philip


jho...@bruinmail.slcc.edu

Dec 9, 2015, 6:16:50 PM
to e2guardian
> You should see something in log ?

FredB,

I'm sorry to be so ignorant. I'm not sure where to look for error logs. This is the entry from /var/log/messages but I doubt it is what you need:

Dec 9 15:47:10 zombie kernel: [ 242.536151] e2guardian[1000]: segfault at 5b570a8 ip 00000000004915ed sp 00007ffe1d0d89b8 error 4 in e2guardian[400000+ed000]

----

I built the most recent development branch downloaded a few minutes ago. I only made two changes to the default configuration.

1. Uncommented searchregexplist = '/etc/e2guardian/lists/searchregexplist'
2. Uncommented and set searchtermlimit = 50

I did not modify any other files. I am using Debian 8.2 amd64.

This error occurred when I searched for "star wars" on http://bing.com (not https). When searching for "nude women" from the same page, e2guardian blocked the search as expected.

Thanks

Jason

FredB

Dec 10, 2015, 4:34:06 AM
to e2guardian, jho...@bruinmail.slcc.edu
I mean the file set by loglocation = , usually /var/log/e2guardian/access.log.
I guess there is something in it about Bing and your request?

jho...@bruinmail.slcc.edu

Dec 10, 2015, 11:23:51 AM
to e2guardian, jho...@bruinmail.slcc.edu
Below are the access.log entries, but I doubt they will help. They show the AJAX auto-completion requests sent as I typed "star wars". The "page was reset" message occurs after I finish typing and press enter, and I don't see an entry in the log at that point.

I only chose bing.com because it will allow http searches and I wanted to rule out any errors associated with MITM. When I enable MITM, the same error occurs with google searches.

If I comment out the bing.com entries in searchregexplist, no error occurs.

Thanks for your efforts.

----

2015.12.10 8:57:32 - 192.168.1.4 http://www.bing.com/AS/Suggestions?pt=page.home&mkt=en-us&qry=s&cp=1&o=hs&css=1&cvid=97A3A0AE190B4F8AB22A5969ACB7CBF6 GET 49440 0 1 200 text/html - -
2015.12.10 8:57:32 - 192.168.1.4 http://www.bing.com/AS/Suggestions?pt=page.home&mkt=en-us&qry=st&cp=2&o=hs&cvid=97A3A0AE190B4F8AB22A5969ACB7CBF6 GET 3085 0 1 200 text/html - -
2015.12.10 8:57:33 - 192.168.1.4 http://www.bing.com/AS/Suggestions?pt=page.home&mkt=en-us&qry=sta&cp=3&o=hs&cvid=97A3A0AE190B4F8AB22A5969ACB7CBF6 GET 3156 0 1 200 text/html - -
2015.12.10 8:57:33 - 192.168.1.4 http://www.bing.com/AS/Suggestions?pt=page.home&mkt=en-us&qry=star&cp=4&o=hs&cvid=97A3A0AE190B4F8AB22A5969ACB7CBF6 GET 3164 0 1 200 text/html - -
2015.12.10 8:57:34 - 192.168.1.4 http://www.bing.com/AS/Suggestions?pt=page.home&mkt=en-us&qry=star%20&cp=5&o=hs&cvid=97A3A0AE190B4F8AB22A5969ACB7CBF6 GET 3259 0 1 200 text/html - -
2015.12.10 8:57:34 - 192.168.1.4 http://www.bing.com/AS/Suggestions?pt=page.home&mkt=en-us&qry=star%20w&cp=6&o=hs&cvid=97A3A0AE190B4F8AB22A5969ACB7CBF6 GET 3333 0 1 200 text/html - -
2015.12.10 8:57:35 - 192.168.1.4 http://www.bing.com/AS/Suggestions?pt=page.home&mkt=en-us&qry=star%20wa&cp=7&o=hs&cvid=97A3A0AE190B4F8AB22A5969ACB7CBF6 GET 3359 0 1 200 text/html - -
2015.12.10 8:57:35 - 192.168.1.4 http://www.bing.com/AS/Suggestions?pt=page.home&mkt=en-us&qry=star%20war&cp=8&o=hs&cvid=97A3A0AE190B4F8AB22A5969ACB7CBF6 GET 3370 0 1 200 text/html - -
2015.12.10 8:57:36 - 192.168.1.4 http://www.bing.com/AS/Suggestions?pt=page.home&mkt=en-us&qry=star%20wars&cp=9&o=hs&cvid=97A3A0AE190B4F8AB22A5969ACB7CBF6 GET 3361 0 1 200 text/html - -
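
For anyone comparing entries like these against what was actually typed, here is a small Python sketch that pulls the qry value out of each suggestion request. The field positions (time in the second whitespace-separated field, URL in the fifth) are an assumption inferred from the entries above, not taken from e2guardian documentation, and the function name is hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

def typed_queries(log_lines):
    """Yield (time, query text) for each Bing suggestion request.

    Assumes whitespace-separated fields with the time in field 2 and the
    URL in field 5, as in the access.log entries shown in this thread.
    """
    for line in log_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # not a complete log entry
        time, url = fields[1], fields[4]
        parts = urlsplit(url)
        if not parts.path.endswith("/AS/Suggestions"):
            continue  # only the auto-completion requests carry qry=
        values = parse_qs(parts.query, keep_blank_values=True).get("qry")
        if values:
            yield time, values[0]

sample = [
    "2015.12.10 8:57:34 - 192.168.1.4 http://www.bing.com/AS/Suggestions?pt=page.home&mkt=en-us&qry=star%20w&cp=6&o=hs&cvid=97A3A0AE190B4F8AB22A5969ACB7CBF6 GET 3333 0 1 200 text/html - -",
    "2015.12.10 8:57:36 - 192.168.1.4 http://www.bing.com/AS/Suggestions?pt=page.home&mkt=en-us&qry=star%20wars&cp=9&o=hs&cvid=97A3A0AE190B4F8AB22A5969ACB7CBF6 GET 3361 0 1 200 text/html - -",
]

for time, query in typed_queries(sample):
    print(time, repr(query))
```

The last query yielded is the final auto-completion request; anything sent after that point (the search itself) would need its own log entry to show up.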

FredB

Dec 10, 2015, 1:16:48 PM
to e2guardian
I'm reading from a phone now, so sorry if I'm wrong...
Is the log file from Squid, not from e2guardian?

jho...@bruinmail.slcc.edu

Dec 10, 2015, 2:34:46 PM
to e2guardian
FredB,

It is from /var/log/e2guardian/access.log

FredB

Dec 11, 2015, 3:16:05 AM
to e2guardian
The request is correct. Is there nothing more in Squid's log?

I wonder whether there is a segfault entry in /var/log/messages at each "page reset". Can you try to find this, please?
If not, can you share your files (e2guardian + lists) with me by private mail?

Fred

jho...@bruinmail.slcc.edu

Dec 11, 2015, 10:21:47 AM
to e2guardian
Yes I can confirm the page reset corresponds with the seg fault.

I will send you my files, but I'll remind you that I did a fresh build and haven't modified anything other than the two lines in the e2guardianf1.conf file.

Thanks

Philip Pearce

Dec 12, 2015, 6:17:03 PM
to e2guardian
Jholt,

I think I may have found the reason for this. (I was able to duplicate).

It is related to issue #95. It occurs when certain lists that would normally be used are commented out.

I have uploaded a fix to develop. Can you test this and see if it fixes it for you?

Thanks
Philip


jho...@bruinmail.slcc.edu

Dec 14, 2015, 2:56:56 PM
to e2guardian
Philip,

Method 2 no longer causes a page reset! But... it doesn't filter the search.

Method 1 works as expected. I entered the word "hello" in the bannedsearchlist and any search on google.com or youtube.com is blocked for the reason: Banned Search Words: hello. Category is N/A

If I search for "nude women" on Youtube, the search is not blocked. The same search is blocked on google.com for the reason: Blocked URL. Category: Banned Regular Expression URL.

If method 2 were working for "nude women", which reason would be listed? I assume it would be "Banned Search Words" as in method 1.

For method 2, my searchtermlimit is 10. This was built after your most recent commit, where you set weightedphrasemode = 1.

Thanks for your efforts.

Jason

jho...@bruinmail.slcc.edu

Dec 14, 2015, 3:28:08 PM
to e2guardian
Please disregard my last post. I assumed the nudism weighted phrase lists would block "nude women", but they do not (in my opinion, an oversight by those who made the lists).

After viewing the phrase lists for nudism, I tried searching for "nude beach" on youtube.com and the search was blocked for reason: Weighted search term limit exceeded.

So it is working after all; but apparently I'll need to tweak the lists. Maybe method 1 would be better after all.
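
As a rough illustration of why one search is blocked and the other is not, here is a simplified Python sketch of weighted search-term scoring. It is not e2guardian's implementation: the phrase weights are hypothetical, the real lists and matching rules are not shown in this thread, and blocking only when the score is strictly greater than searchtermlimit is an assumption based on the "limit exceeded" wording.

```python
def weighted_score(search_terms, weighted_phrases):
    """Sum the weights of every listed phrase found in the search terms."""
    text = search_terms.lower()
    return sum(weight for phrase, weight in weighted_phrases.items()
               if phrase in text)

def is_blocked(search_terms, weighted_phrases, searchtermlimit):
    """Assumption: block only when the score exceeds the limit."""
    return weighted_score(search_terms, weighted_phrases) > searchtermlimit

# Hypothetical list contents and weight, chosen only to mirror the observed
# behaviour; they are not copied from the real nudism phrase lists.
phrases = {"nude beach": 40}
limit = 10  # the searchtermlimit value mentioned in the previous post

print(is_blocked("nude beach", phrases, limit))
print(is_blocked("nude women", phrases, limit))

# Tweaking the list: adding a (hypothetical) entry changes the outcome.
tweaked = dict(phrases, **{"nude women": 40})
print(is_blocked("nude women", tweaked, limit))
```

Under these assumptions, a search that matches no listed phrase scores zero and passes regardless of how low searchtermlimit is set, so the fix has to be made in the lists rather than in the limit.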

Thanks again.

Jason

Philip Pearce

Dec 14, 2015, 3:54:59 PM
to e2guardian