https://en.wikipedia.org/wiki/Wikipedia:List_of_articles_censored_in_Saudi_Arabia
http://www.bbc.co.uk/news/uk-politics-17576745
What are the current technical barriers to redirection to https by default?
- d.
_______________________________________________
Wikitech-l mailing list
Wikit...@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> I see no point in doing that. Https doesn't support caching well and
> is generally slower. There is no use for readers for that.
The use is that the requests themselves are encrypted, so that the
only thing logged is that they went to Wikimedia. You did read the
linked articles, right?
> On 1 April 2012 11:55, Petr Bena <bena...@gmail.com> wrote:
>
>> I see no point in doing that. Https doesn't support caching well and
>> is generally slower. There is no use for readers for that.
>
> The use is that the requests themselves are encrypted, so that the
> only thing logged is that they went to Wikimedia. You did read the
> linked articles, right?
Obviously, I cannot confirm whether Mr Bena read the linked articles
or not, but he did provide an answer regarding the technical
restrictions.
Wikimedia already spends an incredible amount of time caching its
content, because *so many* users use Wikipedia and its sister projects
daily.
And since most of the content is fairly static, caching makes a lot of sense.
However, HTTPS does not support caching (at least not well), which
means each page would suddenly have to be generated for *each* request.
It's true that MediaWiki itself supports caching, but its own caching
is nowhere near as fast as a caching server like Varnish (although I
believe a less powerful caching server is used on Wikimedia's
servers).
The trade-off is that the service would be slower for everyone or we
would need more servers. And I am not sure Wikimedia has that kind of
money.
Those are the *technical* limitations to defaulting to HTTPS.
> http://www.bbc.co.uk/news/uk-politics-17576745
Also, this article was written on 1 April and is far beyond any
monitoring scheme ever suggested in the Western World. And I am sure
we would have heard about it being mentioned up until this point, if
it was real.
So I would take that article with a grain of salt. Particularly the
statement about 'real time'. That's not even feasible.
>> http://www.bbc.co.uk/news/uk-politics-17576745
> Also, this article was written on 1 April and is far beyond any
> monitoring scheme ever suggested in the Western World. And I am sure
> we would have heard about it being mentioned up until this point, if
> it was real.
It would be nice, but if it's a prank then (a) lots of other
newspapers are in on it (b) ORG flagged the programme described
several weeks in advance:
http://wiki.openrightsgroup.org/wiki/Communications_Capabilities_Development_Programme
http://www.openrightsgroup.org/issues/ccdp
So no, it's in no way a joke. This is absolutely real.
> So I would take that article with a grain of salt. Particularly the
> statement about 'real time'. That's not even feasible.
That a desired monitoring regime would require a violation of physics
has *never* stopped a legislative push for such.
- d.
> On 1 April 2012 12:23, Svip <svi...@gmail.com> wrote:
>
>> On 1 April 2012 12:06, David Gerard <dge...@gmail.com> wrote:
>>
>>> http://www.bbc.co.uk/news/uk-politics-17576745
>>
>> Also, this article was written on 1 April and is far beyond any
>> monitoring scheme ever suggested in the Western World. And I am sure
>> we would have heard about it being mentioned up until this point, if
>> it was real.
>
> It would be nice, but if it's a prank then (a) lots of other
> newspapers are in on it (b) ORG flagged the programme described
> several weeks in advance:
>
> http://wiki.openrightsgroup.org/wiki/Communications_Capabilities_Development_Programme
> http://www.openrightsgroup.org/issues/ccdp
>
> So no, it's in no way a joke. This is absolutely real.
Still *kind of* a joke.
>> So I would take that article with a grain of salt. Particularly the
>> statement about 'real time'. That's not even feasible.
>
> That a desired monitoring regime would require a violation of physics
> has *never* stopped a legislative push for such.
But it has always stopped it from being implemented or executed in
practice. While the development is terrifying, it is also important
to note the lack of actual consequences it will have. Other than
being a huge embarrassment.
But I was always under the impression that the UK didn't really care
about free speech and privacy.
I'm trying to import the categorylinks.sql dump into my MySQL database. I'm
able to import it and query for articles in specific categories as long
as the category name contains only English-language characters. I don't get
any results if I try to query for a non-English category name. My
understanding is that the dump is in UTF-8 format, so I tried the following:
create the database using the following command:
CREATE DATABASE wiki CHARACTER SET utf8 COLLATE utf8_general_ci;
import the dump using the following command:
mysql --user root --password=root wiki < C:\Path\plwiki-20111227-categorylinks.sql --default-character-set=utf8
set my data source URL to the following in my Java code:
jdbc:mysql://localhost/plwiki?useUnicode=true&characterEncoding=UTF-8
It still doesn't work. What am I missing? Are there any instructions on
how to correctly import the dump anywhere?
Thanks,
Piotr
> mysql --user root --password=root wiki <
> C:\Path\plwiki-20111227-categorylinks.sql --default-character-set=utf8
It's -p, not --password=root and it will prompt you for the password.
Regards,
Piotr
I don't see why it *couldn't* be implemented.
Note that the real-time statement is no different from how they can snoop
on your phone calls in real time.
Sure, the storage requirements would be crazy, but I don't see specific
details on what is to be stored, so it may well be implementable given
enough funding.
Do you have $wgDBmysql5 set in your LocalSettings.php?
Regards,
Piotr
> http://www.bbc.co.uk/news/uk-politics-17576745
>
> This one may be an April 1 joke, let's wait one day. :-)
--
Bináris
>> http://www.bbc.co.uk/news/uk-politics-17576745
> This one may be an April 1 joke, let's wait one day. :-)
No, it really isn't, sadly.
- d.
HTTPS has nothing to do with caching; it just transports information
between the client and the server so that they can handle caching themselves.
HTTPS supports caching just as well as HTTP, since they are the same
protocol, the former simply being encrypted.
You are right, though, in the sense that most web browsers will BY DEFAULT
not save a copy of content received through
HTTPS. The reason is that HTTPS pages are/were usually used to
serve private content. Caching can be explicitly enabled by
marking the response as public: send "Cache-Control: public" and that should work.
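As a rough illustration of the "Cache-Control: public" suggestion, here is a minimal Python sketch of a WSGI application that marks its response as publicly cacheable. The application, the body text and the max-age value of 300 seconds are hypothetical and chosen only for the example; none of this is MediaWiki or Wikimedia code.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Tiny WSGI app that marks its response as cacheable by anyone."""
    body = b"<html><body>static article text</body></html>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # "public" allows browsers and shared caches to store the response
        # even when it was delivered over HTTPS. max-age is an assumed value.
        ("Cache-Control", "public, max-age=300"),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app():
    """Invoke the app once without opening a socket; return what it sent."""
    environ = {}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Whether a given browser or proxy honours the header still depends on that client's own caching policy.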
I do agree there is probably no use for readers to have HTTPS enabled.
If the purpose is to bypass national firewalls such as China's (or, I
think, Thailand's), they will just intercept the HTTPS connection from the
server on their hardware, decipher it for analysis and re-sign the
content with their own certificate before sending it on to clients.
That is exactly what you do in a big company when you want to make sure
(as an example) that your employees do not use the chat function in Facebook.
The only thing HTTPS is going to prevent is having your password stolen
when logging in, or getting the session cookie hijacked by someone sniffing the
local network. The WMF has already moved its private wikis to HTTPS
just for that :-]
cheers,
--
Antoine "hashar" Musso
Please note you have "plwiki" here and you imported into "wiki".
Assuming your .my.cnf is not making things difficult I ran a small
Jython script to test:
$ jython
Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06)
[OpenJDK 64-Bit Server VM (Sun Microsystems Inc.)] on java1.6.0
Type "help", "copyright", "credits" or "license" for more information.
>>> from com.ziclix.python.sql import zxJDBC
>>> d, u, p, v = "jdbc:mysql://localhost/wiki", "root", None, "org.gjt.mm.mysql.Driver"
>>> db = zxJDBC.connect(d, u, p, v, CHARSET="utf8")
>>> c=db.cursor()
>>> c.execute("select cl_from, cl_to from categorylinks where cl_from=61 limit 10")
>>> c.fetchone()
(61, array('b', [65, 110, 100, 111, 114, 97]))
>>> (a,b) = c.fetchone()
>>> print b
array('b', [67, 122, -59, -126, 111, 110, 107, 111, 119, 105, 101, 95, 79, 114, 103, 97, 110, 105, 122, 97, 99, 106, 105, 95, 78, 97, 114, 111, 100, -61, -77, 119, 95, 90, 106, 101, 100, 110, 111, 99, 122, 111, 110, 121, 99, 104])
>>> for x in b:
... try:
... print chr(x),
... except ValueError:
... print "%02x" % x,
...
C z -3b -7e o n k o w i e _ O r g a n i z a c j i _ N a r o d -3d -4d w _ Z j e d n o c z o n y c h
array('b', [ ... ]) in Jython means that the SQL driver returns an array of bytes.
It seems to me that the array of bytes contains raw UTF-8, so you need to decode it into
the proper Unicode that Java uses in strings.
I think this behaviour is described in
http://bugs.mysql.com/bug.php?id=25528
You probably need to play with getBytes() on the result object
to get what you want.
//Saper
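The decoding step described above can be sketched in Python 3 (the session above used Jython 2.5, so this is a translation of the idea, not the original code). The helper name decode_signed_bytes is hypothetical; the byte values are copied from the Jython output above. Java bytes are signed, so each value is masked back into the 0..255 range before decoding.

```python
def decode_signed_bytes(signed_bytes, encoding="utf-8"):
    """Turn Java-style signed bytes (-128..127) into a Unicode string."""
    # Masking with 0xFF maps e.g. -59 to 197 (0xC5), the unsigned byte value.
    raw = bytes(b & 0xFF for b in signed_bytes)
    return raw.decode(encoding)


# Values taken from the second fetchone() result shown above.
cl_to = [67, 122, -59, -126, 111, 110, 107, 111, 119, 105, 101, 95, 79,
         114, 103, 97, 110, 105, 122, 97, 99, 106, 105, 95, 78, 97, 114,
         111, 100, -61, -77, 119, 95, 90, 106, 101, 100, 110, 111, 99,
         122, 111, 110, 121, 99, 104]

print(decode_signed_bytes(cl_to))
# prints: Członkowie_Organizacji_Narodów_Zjednoczonych
```

The two negative pairs are the UTF-8 sequences C5 82 (ł) and C3 B3 (ó), which is consistent with the column holding raw UTF-8.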
There would be a small difference if you're behind a caching proxy, but
that's unlikely to make a difference to pretty much everyone.
> I do agree there is probably no use for readers to have HTTPS enabled.
> If the purpose is to bypass national firewalls such as China's (or, I
> think, Thailand's), they will just intercept the HTTPS connection from the
> server on their hardware, decipher it for analysis and re-sign the
> content with their own certificate before sending it on to clients.
Note that such an approach would yield a certificate which, if stored
during the attack and later published, is proof of their evil-doing.
Any CA willingly doing that (even if "forced by the government") would
(should) be immediately removed from the browsers' certificate bundles.
(I believe such interposition has been done in the past, though.)
> That is exactly what you do in a big company when you want to make sure
> (as an example) that your employees do not use the chat function in Facebook.
A company can install its own CA certificate in their own computers, and
have a policy of "we will sniff everything" (note that if the employee
is not conveniently informed of that, the wiretapping could well be
illegal).
I wonder how they handle self-signed certificates.
My problem is actually the opposite: I don't get any results when I
use a UTF-8 string as input in the query. But I verified that I don't
get correct results when using the query you provided either. The link
to the MySQL bug report might be helpful in resolving the problem, so
thanks for providing it.
Piotr
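For the input side of the problem, one thing to try is comparing against the same byte form the table appears to store. The sketch below is Python and only illustrates the idea. It assumes, based on the output elsewhere in this thread, that cl_to holds raw UTF-8 bytes with underscores in place of spaces; the helper name category_key is hypothetical.

```python
def category_key(name):
    """Build the byte string to compare against cl_to.

    Assumption: cl_to stores UTF-8 bytes with spaces written as
    underscores, as the sample rows in this thread suggest.
    """
    return name.strip().replace(" ", "_").encode("utf-8")


# The resulting bytes would be passed as a bound query parameter,
# not spliced into the SQL text.
key = category_key("Języki skryptowe")
```

If the driver is configured with a different connection character set, the bytes it sends for a string parameter may not match these, which would explain an empty result for non-ASCII names only.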
1. It would require an ssl terminator on every frontend cache. The ssl
terminators eat memory, which is also what the frontend caches do.
2. HTTPS dramatically increases latency, which would be kind of
painful for mobile.
3. Some countries may completely block HTTPS, but allow HTTP to our
sites so that they can track users. Is it better for us to provide
them content, or protect their privacy?
4. It's still possible for governments to see that people are going to
wikimedia sites when using HTTPS, so it's still possible to oppress
people for trying to visit sites that are disallowed.
Without getting into how other countries censor data (boo!), I agree
with the first two points. SSL terminators are much more memory- and
CPU-intensive, which would require many more machines. Also, there are
more RTTs required for https/ssl, and our ping latency is not very
good since we do not have a very geographically diverse
infrastructure.
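To put rough numbers on the extra round trips, here is a back-of-the-envelope sketch in Python. It assumes one round trip for the TCP handshake and two more for a full SSL/TLS handshake without session resumption (the usual figure for TLS 1.2 and earlier); the 150 ms round-trip time is an invented example value, not a measurement of Wikimedia's network.

```python
def connection_setup_ms(rtt_ms, tls_round_trips=2):
    """Estimate setup time before the first request byte can be sent.

    Assumes 1 round trip for TCP and `tls_round_trips` for a full
    SSL/TLS handshake with no session resumption.
    """
    tcp_round_trips = 1
    http = tcp_round_trips * rtt_ms
    https = (tcp_round_trips + tls_round_trips) * rtt_ms
    return http, https


# Hypothetical user 150 ms away from the nearest datacenter.
http_ms, https_ms = connection_setup_ms(150)
print(http_ms, https_ms)  # prints: 150 450
```

Under these assumptions the setup cost scales linearly with the round-trip time, which is why caching centers closer to users reduce it.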
The two solutions for this are #1 more and beefier machines and #2
caching centers in various locations physically closer to users (which
also requires a lot of #1). Sadly the biggest drawback of these two
points is that they both cost a lot of money and that would mean a lot
more pop up banners of Jimmy asking for cash :(
Leslie
P.S. I personally like the idea of a checkbox at
the top of the page (shown only once, perhaps?) that would set a cookie
sending the user to https by default. However, I don't think we
can do this with our current infrastructure due to the above issues.
> 3. Some countries may completely block HTTPS, but allow HTTP to our
> sites so that they can track users. Is it better for us to provide
> them content, or protect their privacy?
> 4. It's still possible for governments to see that people are going to
> wikimedia sites when using HTTPS, so it's still possible to oppress
> people for trying to visit sites that are disallowed.
>
> On Sun, Apr 1, 2012 at 7:06 PM, David Gerard <dge...@gmail.com> wrote:
>> Lots of monitoring going into place:
>>
>> https://en.wikipedia.org/wiki/Wikipedia:List_of_articles_censored_in_Saudi_Arabia
>> http://www.bbc.co.uk/news/uk-politics-17576745
>>
>> What are the current technical barriers to redirection to https by default?
>>
>>
>> - d.
--
Leslie Carr
Wikimedia Foundation
AS 14907, 43821
It inserts the data fine for me. I suspect your Java code is failing to
read it appropriately. Try reading the table with a different tool,
such as phpMyAdmin.
> mysql> select * from categorylinks limit 20;
> +---------+---------------------------------------+-------------------------------------+---------------------+-------------------+--------------+---------+
> | cl_from | cl_to | cl_sortkey | cl_timestamp | cl_sortkey_prefix | cl_collation | cl_type |
> +---------+---------------------------------------+-------------------------------------+---------------------+-------------------+--------------+---------+
> | 0 | Ekspresowe_kasowanko | Golembiovski Andzey | 2009-07-09 21:01:30 | | | page |
> | 2 | Języki_skryptowe | AWK
> AWK | 2011-01-18 01:11:23 | Awk | uppercase | page |
> | 4 | Specjalności_lekarskie | ALERGOLOGIA | 2008-04-25 10:31:22 | | uppercase | page |
> | 6 | Formaty_plików_komputerowych | ASCII | 2011-09-23 11:01:05 | | uppercase | page |
> | 6 | Kodowania_znaków | ASCII | 2011-09-23 11:01:05 | | uppercase | page |
> | 7 | Artykuły_na_medal | ATOM | 2010-12-01 16:40:37 | | uppercase | page |
> | 7 | Artykuły_wymagające_dopracowania | ATOM | 2011-08-16 15:53:43 | | uppercase | page |
> | 7 | Atomy |
> ATOM | 2011-08-09 00:56:39 | | uppercase | page |
> | 8 | Logika_matematyczna | AKSJOMAT | 2007-11-10 08:18:06 | | uppercase | page |
> | 10 | Arytmetyka |
> ARYTMETYKA | 2011-10-17 02:36:39 | | uppercase | page |
> | 11 | Artykuły_pod_opieką_Projektu_Chemia | AMINOKWASY | 2011-08-19 02:48:21 | | uppercase | page |
> | 12 | Alkeny | *
> ALKENY | 2006-08-07 17:23:22 | * | uppercase | page |
> | 13 | Multimedia | ACTIVEX | 2007-05-24 20:20:15 | | uppercase | page |
> | 13 | Windows | ACTIVEX | 2007-05-24 20:20:15 | | uppercase | page |
> | 14 | Interfejsy_programistyczne | !
> APPLICATION PROGRAMMING INTERFACE | 2011-04-27 11:33:17 | ! | uppercase | page |
> | 15 | Amiga | AMIGAOS | 2007-09-09 17:19:11 | | uppercase | page |
> | 15 | Systemy_operacyjne | AMIGAOS | 2007-09-09 17:19:11 | | uppercase | page |
> | 16 | Organizacje_międzynarodowe | ASSOCIATION FOR COMPUTING MACHINERY | 2011-10-19 15:52:28 | | uppercase | page |
> | 18 | Funkcje_boolowskie | ALTERNATYWA | 2007-03-23 17:43:05 | | uppercase | page |
> | 19 | Logika_matematyczna | AKSJOMAT INDUKCJI | 2007-08-31 22:54:55 | | uppercase | page |
> +---------+---------------------------------------+-------------------------------------+---------------------+-------------------+--------------+---------+
> 20 rows in set (0.00 sec)
Once we enable it by default for logged-in users, we will care a lot
more if someone tries to take it down with a DoS attack. Unless the
redirection can be disabled without actually logging in, a DoS attack
on the HTTPS frontend would prevent any authenticated activity.
It suggests a need for a robust, overprovisioned service, with tools
and procedures in place for identifying and blocking or throttling
malicious traffic.
[...]
> 3. Some countries may completely block HTTPS, but allow HTTP to our
> sites so that they can track users. Is it better for us to provide
> them content, or protect their privacy?
> 4. It's still possible for governments to see that people are going to
> wikimedia sites when using HTTPS, so it's still possible to oppress
> people for trying to visit sites that are disallowed.
It's also possible for governments to snoop on HTTPS communications,
by using a private key from a trusted CA to perform a
man-in-the-middle attack. Apparently the government of Iran has done this.
If we really want to protect the privacy of our users then we should
shut down the regular website and serve our content only via a Tor
hidden service ;)
-- Tim Starling
On Sun, Apr 1, 2012 at 6:43 PM, Antoine Musso <hasha...@free.fr> wrote:
> Le 01/04/12 12:55, Petr Bena wrote:
>> I see no point in doing that. Https doesn't support caching well and
>> is generally slower. There is no use for readers for that.
>
> HTTPS has nothing to do with caching; it just transports information
> between the client and the server so that they can handle caching themselves.
>
> HTTPS supports caching just as well as HTTP, since they are the same
> protocol, the former simply being encrypted.
>
That might indeed be an issue.
That is why you want to use an HTTPS offloader at the edge of your
cluster: it will handle decryption and then serve the content as
unencrypted traffic again :-]
I believe that is what the WMF is doing by using nginx as an HTTPS
proxy. Someone with better knowledge will confirm.
--
Antoine "hashar" Musso
This may help because:
- It only affects a subgroup of users (the ones from these countries)
- It only affects a subgroup of that subgroup, the logged-in users (not all)
- It creates a blacklist of "bad countries" where citizens are under
surveillance by the government
This is perhaps not feasible if there's no easy way to detect the
country based on the IP.
--
--
ℱin del ℳensaje.
On Mon, 02 Apr 2012 08:31:32 -0700, Petr Bena <bena...@gmail.com> wrote:
> I believe it would be best if the login form were served over http with
> a check box "Disable ssl", which would be unchecked by default. The
> target page of the form would be an ssl page unless the user checked it,
> so that in countries where ssl is a problem they could just check it and
> proceed using an unencrypted connection.
>
> On Mon, Apr 2, 2012 at 11:34 AM, Tei <oscar...@gmail.com> wrote:
>> Perhaps have a blacklist of countries that are known to break the
>> privacy of communications, then make https the default for logged-in users in
>> these countries.
>>
>> This may help because:
>>
>> - It only affects a subgroup of users (the ones from these countries)
>> - It only affects a subgroup of that subgroup, the logged-in users (not
>> all)
>> - It creates a blacklist of "bad countries" where citizens are under
>> surveillance by the government
>>
>> This is perhaps not feasible if there's no easy way to detect the
>> country based on the IP.
>>
>> --
>> --
>> ℱin del ℳensaje.
--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
Indeed. We're already pretty overprovisioned. We have 4 servers per
datacenter, each of which is very bored. All they are doing is acting
as a transparent proxy, after ssl termination. We're using RC4 by
default (due to BEAST), and AES is also available (the processors we
are using have AES support).
Ideally we'll be using STS for logged in users. This will mean it's
impossible to turn off the redirection for users that have already
logged in for whatever period of time we have STS headers set. We need
to consider blocking a DoS from the SSL proxies, the LVS servers, or
the routers.
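For readers unfamiliar with STS (HTTP Strict Transport Security), the header in question can be sketched as below in Python. The helper name and the one-day max-age are assumptions for illustration only; the thread does not say what period would actually be used.

```python
def sts_header(max_age_seconds, include_subdomains=False):
    """Build a Strict-Transport-Security response header.

    A browser that has seen this header over HTTPS will refuse plain
    HTTP for the host until max_age_seconds have elapsed, which is why
    the redirection cannot simply be switched off for those users.
    """
    if max_age_seconds < 0:
        raise ValueError("max-age must not be negative")
    value = "max-age=%d" % max_age_seconds
    if include_subdomains:
        value += "; includeSubDomains"
    return ("Strict-Transport-Security", value)


# Hypothetical policy: pin logged-in users to HTTPS for one day.
header = sts_header(86400)
```

Sending max-age=0 over HTTPS is the standard way to make a browser forget the policy, but that only helps if the HTTPS frontend is still reachable.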
>> 3. Some countries may completely block HTTPS, but allow HTTP to our
>> sites so that they can track users. Is it better for us to provide
>> them content, or protect their privacy?
>> 4. It's still possible for governments to see that people are going to
>> wikimedia sites when using HTTPS, so it's still possible to oppress
>> people for trying to visit sites that are disallowed.
>
> It's also possible for governments to snoop on HTTPS communications,
> by using a private key from a trusted CA to perform a
> man-in-the-middle attack. Apparently the government of Iran has done this.
>
We really should publish our certificate fingerprints. An attack like
this can be detected. An end-user being attacked can see if the
certificate they are being handed is different from the one we
advertise. We could also provide a convergence notary service (or one
of the other things like convergence).
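The fingerprint comparison described here could look roughly like the following Python sketch. It assumes a SHA-256 fingerprint over the DER-encoded certificate and a published value obtained through some channel the attacker does not control; the function names are hypothetical, and fetch_der is shown only to indicate where the certificate would come from.

```python
import hashlib
import ssl


def fingerprint(der_bytes):
    """Return the SHA-256 fingerprint as colon-separated upper-case hex."""
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def matches_published(der_bytes, published):
    """Compare a certificate against an advertised fingerprint string."""
    def normalise(text):
        return text.replace(":", "").replace(" ", "").lower()
    return normalise(fingerprint(der_bytes)) == normalise(published)


def fetch_der(host, port=443):
    """Fetch the certificate a server presents (needs network access)."""
    pem = ssl.get_server_certificate((host, port))
    return ssl.PEM_cert_to_DER_cert(pem)
```

A mismatch does not by itself prove an attack, since certificates are also replaced legitimately, so the published list would need to be kept current.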
> If we really want to protect the privacy of our users then we should
> shut down the regular website and serve our content only via a Tor
> hidden service ;)
>
I agree that it's impossible to provide total protection of a user's
privacy. We could provide a number of services that would help users,
though. That said, I don't feel this should be on the top of our
priority list.
- Ryan
Using SSL by default means all transparent proxies in between aren't
hit at all, since they'd be a MITM. I don't necessarily see this as a
bad thing, as transparent proxies often break things.
Browsers cache things differently from HTTPS sites, but otherwise
everything should work as normal. The SSL termination proxies
transparently proxy to our frontend caches after termination. Links
are sent as protocol-relative so that we don't split our cache, as
well.
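The point about protocol-relative links can be illustrated with a short Python sketch: if the same page is linked as //host/path in both the HTTP and HTTPS renderings, the cached HTML is identical and needs only one cache entry. The helper below is hypothetical and is not how MediaWiki generates its links.

```python
from urllib.parse import urlsplit


def protocol_relative(url):
    """Drop the scheme from an http(s) URL, leaving //host/path?query#frag."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return url  # leave relative links, mailto:, etc. untouched
    rel = "//" + parts.netloc + parts.path
    if parts.query:
        rel += "?" + parts.query
    if parts.fragment:
        rel += "#" + parts.fragment
    return rel


# Both renderings of the link collapse to the same string.
a = protocol_relative("http://en.wikipedia.org/wiki/Main_Page")
b = protocol_relative("https://en.wikipedia.org/wiki/Main_Page")
```

The browser then resolves the link against whichever scheme the current page was loaded with.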
- Ryan
I'd definitely not support doing something like this. This would
incredibly complicate things.
- Ryan
Someone came into #wikimedia-tech a few days ago and asked about something
similar to this. The idea was to use site-wide JavaScript to auto-redirect
users to https on one of the Chinese Wikipedias. I believe this was in
combination with geolocation functionality, but I'm not sure.
Do you have any thoughts on individual wikis doing this, assuming there's
local community consensus?
MZMcBride
Indeed. Detecting a potential MITM is useless if you can't determine if
it's real or not. For instance the switch from RapidSSL to DigiCert
certificate was quite suspicious.
I don't know how to best publicise it, though. I suppose we would list
them somewhere like https://secure.wikimedia.org/servers.html but if
nobody knows it's there...
What's https://secure.wikimedia.org?
- Ryan
Some old experiment. Nothing to see here :-)
--
Antoine "hashar" Musso
The server which contains
https://secure.wikimedia.org/keys.html
Best regards,
Helder