Un-incorporated SCTs from GDCA1


Brendan McMillion

Aug 18, 2018, 6:36:12 PM
to Certificate Transparency Policy, crypto
Hello ct-policy@

Attached is what I believe to be two un-incorporated SCTs from the GDCA1 log that recently passed compliance monitoring. I apologize to the GDCA team and to Google for the inconvenient timing of this announcement -- I wasn't aware of either of the GDCA logs until they were already accepted into Chrome.

544305317.gdca1.json
544305317.pem
649496496.gdca1.json
649496496.pem

Kat Joyce

Aug 20, 2018, 6:29:47 AM
to Brendan McMillion, ca...@gdca.com.cn, Certificate Transparency Policy, crypto
Hi Brendan,

Thanks for flagging these.  After a quick preliminary check myself, I found the signatures on the SCTs verify, and was also unable to retrieve an inclusion proof for them using get-proof-by-hash.
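For readers who want to repeat this kind of check, here is a minimal sketch of how the lookup key for get-proof-by-hash can be derived under RFC 6962. It assumes an ordinary X.509 entry (not a precertificate) and an SCT with empty extensions. The function names and the `https://log.example` base URL are placeholders for illustration, not anything taken from the attached files.

```python
import base64
import hashlib
import struct
from urllib.parse import urlencode


def x509_leaf_hash(cert_der: bytes, sct_timestamp_ms: int) -> bytes:
    """Merkle leaf hash for an X.509 entry: SHA-256(0x00 || MerkleTreeLeaf)."""
    leaf = (
        b"\x00"                                # Version: v1
        + b"\x00"                              # MerkleLeafType: timestamped_entry
        + struct.pack(">Q", sct_timestamp_ms)  # uint64 timestamp copied from the SCT
        + struct.pack(">H", 0)                 # LogEntryType: x509_entry
        + len(cert_der).to_bytes(3, "big")     # 24-bit length prefix of the DER cert
        + cert_der
        + struct.pack(">H", 0)                 # zero-length extensions (assumption)
    )
    return hashlib.sha256(b"\x00" + leaf).digest()


def proof_by_hash_url(log_url: str, leaf_hash: bytes, tree_size: int) -> str:
    """Build the get-proof-by-hash request URL for a given tree size."""
    query = urlencode(
        {"hash": base64.b64encode(leaf_hash).decode("ascii"), "tree_size": tree_size}
    )
    return f"{log_url.rstrip('/')}/ct/v1/get-proof-by-hash?{query}"
```

If the log keeps answering such a request with an error for tree sizes published after the maximum merge delay has passed, the SCT was never incorporated.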

I have put the GDCA Log Operator email address on this thread too, in case they'd like to comment.

The GDCA Logs had passed their initial monitoring period, but I don't think they've been incorporated into Chrome yet, so I'd imagine that will be delayed at least until this is resolved one way or another.  It was also brought to our attention that, although they said they had an open acceptance policy, those Logs did not actually add all roots until their monitoring period had completed.  This may also delay their addition to Chrome.

When they are ready to do so, the Chrome guys will comment further on both of these points.

Thanks again!
Kat

--
You received this message because you are subscribed to the Google Groups "Certificate Transparency Policy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ct-policy+...@chromium.org.
To post to this group, send email to ct-p...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/ct-policy/CABP-pSQgKzGxBJL%3D%3DOOEEZEJRb3jM1868TugX8qWPSJTQ%2B3VMg%40mail.gmail.com.

Lei Xiu

Aug 21, 2018, 5:41:41 AM
to Certificate Transparency Policy, bre...@cloudflare.com, ca...@gdca.com.cn, cry...@cloudflare.com

Hi Brendan, Kat,
 
Many thanks for your comments, we are looking into this issue and will post the findings here as soon as possible.
 
Thanks 

Xiu Lei
GDCA

On Monday, August 20, 2018 at 6:29:47 PM UTC+8, Kat Joyce wrote:

Brendan McMillion

Aug 24, 2018, 2:05:39 PM
to Lei Xiu, Certificate Transparency Policy, ca...@gdca.com.cn, crypto
Here's the last SCT I've found for a while. GDCA1 and CNNIC seem to have re-entered good states, and curiously GDCA2 was not affected.
357558023.pem
357558023.gdca1.json

Devon O'Brien

Aug 27, 2018, 1:36:46 PM
to Certificate Transparency Policy, bre...@cloudflare.com, ca...@gdca.com.cn, cry...@cloudflare.com
Hello Xiu Lei,

Do you have any updates regarding this matter? We're looking to better understand the root cause, the scope of impact, whether the issue has been fully rectified, as well as steps taken to prevent future occurrences. 

Thank you,
Devon

Lei Xiu

Aug 28, 2018, 6:20:31 AM
to Certificate Transparency Policy
Hi Devon,

Sorry for the belated update.

Based on our analysis of the server’s logs, we were not able to identify solid evidence of the non-incorporation of the reported SCTs. We are now carrying out a large-scale test in the test environment, hoping to reproduce the issue and identify the root cause.
 
Your suggestions will be much appreciated. 

Ryan Sleevi

Aug 28, 2018, 2:46:36 PM
to jxst...@gmail.com, Certificate Transparency Policy
Hi Lei,

Were you able to confirm that you received those certificates? Do you maintain (separate) logs of certificates you've issued SCTs for, such as with a front-end, and compare that against what the Log has incorporated? Can you speak a bit more about what you have investigated so far?
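The comparison being asked about can be sketched as a simple reconciliation between a front-end record of issued SCTs and the set of leaf hashes the Log has incorporated. This is only an illustration: the function name and data shapes are hypothetical, and the 24-hour default is the MMD mentioned elsewhere in this thread.

```python
DAY_MS = 24 * 60 * 60 * 1000


def unincorporated(issued, incorporated, now_ms, mmd_ms=DAY_MS):
    """Split issued-but-missing leaf hashes into (overdue, still_pending).

    issued:       dict mapping leaf hash -> SCT timestamp in milliseconds
    incorporated: set of leaf hashes currently present in the tree
    """
    missing = {h: ts for h, ts in issued.items() if h not in incorporated}
    # Past the MMD and still absent: the log has broken its promise.
    overdue = sorted(h for h, ts in missing.items() if now_ms - ts > mmd_ms)
    # Absent but still inside the MMD window: not yet a violation.
    pending = sorted(h for h, ts in missing.items() if now_ms - ts <= mmd_ms)
    return overdue, pending


issued = {"a": 0, "b": 0, "c": 90_000_000}
print(unincorporated(issued, {"a"}, now_ms=100_000_000))  # (['b'], ['c'])
```

Run periodically, a check like this would surface a lost entry without waiting for an external report.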

It's unclear from your response if you're reporting that you're unable to confirm the unincorporated SCTs at all versus simply identifying that you haven't been able to determine a root cause yet. Determining what you have looked at, and been able to confirm/examine, seems hugely valuable to determining what the root cause is or could be, or how to detect and mitigate going forward.


Ryan Sleevi

Aug 31, 2018, 11:22:41 AM
to Ryan Sleevi, jxst...@gmail.com, Certificate Transparency Policy
Checking again to see if there are any updates here as to the root cause or the investigation to date.

I'd again like to point to Venafi's post-mortems as sort of the "Gold Standard" - and to the degree expected of log operators, in terms of detail and timeliness. You can see them at
and

Lei Xiu

Aug 31, 2018, 12:58:51 PM
to rsl...@chromium.org, Certificate Transparency Policy
Hi Ryan,

We did a thorough analysis on the logs of our CT servers and tried to reproduce the error scenarios. 

According to Brendan’s report, the two SCT requests (the two json files provided) were submitted on August 16, 2018 09:43 (UTC).

According to our logs, a tree update operation ran on August 16, 2018 09:43 (UTC) and found no new entries, which means that processing of the SCT requests had not completed when the update was performed. Because the CT servers perform a Merkle Tree update every ten minutes, the SCTs would not have been logged until August 16, 2018 09:53 (UTC).

Unfortunately, as per our CT Logs’ policy, we added 525 root certificates to our CT servers on August 16, 2018 09:33 (UTC), and rebooted the CT servers by running a script on August 16, 2018 09:47 (UTC). The next update had not yet been performed at the moment of rebooting.

We believe this is the root cause of the reported issue.

A summary of the incident timeline: 

August 16, 2018 09:33 (UTC) – The Administrator added 525 root certificates (attachment 1)
August 16, 2018 09:43 (UTC) – Two SCT requests were submitted
August 16, 2018 09:43 (UTC) – Merkle Tree Update Operation (attachment 2)
August 16, 2018 09:47 (UTC) – CT service rebooted by the Administrator and a new log file was created (attachment 3)
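The arithmetic behind this timeline can be checked with a short sketch. It takes the reported times at minute resolution and assumes merges run on a fixed ten-minute schedule counted from the 09:43 run. The helper name is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

MERGE_INTERVAL = timedelta(minutes=10)  # "every ten minutes", per the report


def next_merge(last_merge: datetime, submitted: datetime) -> datetime:
    """First scheduled merge after last_merge that is not earlier than submitted."""
    merge = last_merge + MERGE_INTERVAL
    while merge < submitted:
        merge += MERGE_INTERVAL
    return merge


utc = timezone.utc
last_merge = datetime(2018, 8, 16, 9, 43, tzinfo=utc)  # merge that saw no new entries
submitted = datetime(2018, 8, 16, 9, 43, tzinfo=utc)   # the two SCT requests
restart = datetime(2018, 8, 16, 9, 47, tzinfo=utc)     # CT service rebooted

due = next_merge(last_merge, submitted)
print(due.strftime("%H:%M"))        # 09:53
print(submitted <= restart < due)   # True: the reboot fell before the next merge
```

In other words, the entries had been acknowledged with SCTs but were still waiting for the 09:53 merge when the service was restarted.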
  

Feel free to let us know if you have further questions.

Thanks  
Xiu Lei


On Fri, Aug 31, 2018 at 11:22 PM, Ryan Sleevi <rsl...@chromium.org> wrote:
attachment 1.jpg
attachment 2.png
attachment 3.jpg

Lei Xiu

Sep 21, 2018, 5:17:05 AM
to Certificate Transparency Policy
Hi

As of 21 September 2018, 17:05 (UTC+8), the tree size of GDCA CT Log 1 has exceeded 13000, and the log has been operating normally. No other problem reports have been received. Do you have any other comments or suggestions?

Thanks.
Xiu Lei
GDCA

Pierre Phaneuf

Sep 21, 2018, 5:47:31 AM
to Certificate Transparency Policy
Hi,

I have a number of questions, trying to understand what the problem was, as we do not expect that this simple restart operation should have caused data loss.

What is your etcd configuration like? How many etcd instances do you use? Are they co-located with the ct-server instances? How many physical machines do you use for your CT log? How many ct-server instances do you use? What storage backend do you use with ct-server? Any special flags used?

On entry submission, ct-server stores the entry into etcd before returning the SCT to the client. This means that the data is stored in a separate process (so restarting ct-server to change the root certificates should have no negative impact), and etcd is also itself a very robust storage. To reproduce this problem, it seems like the etcd storage system would have had to experience some problems, separate from ct-server being restarted?
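The ordering described here (store durably first, acknowledge second) can be illustrated with a toy journal. This is not ct-server or etcd code: the file-based journal, the function names, and the "ack" string standing in for an SCT are all hypothetical.

```python
import os
import tempfile


def journal_then_ack(journal_path: str, entry_hex: str) -> str:
    """Append the entry to a journal and force it to disk before acknowledging."""
    with open(journal_path, "a", encoding="ascii") as journal:
        journal.write(entry_hex + "\n")
        journal.flush()
        os.fsync(journal.fileno())  # the entry must survive a crash or restart
    return f"ack:{entry_hex}"       # stand-in for the SCT returned to the client


def pending_entries(journal_path: str) -> list:
    """What a freshly restarted server would still find waiting to be merged."""
    if not os.path.exists(journal_path):
        return []
    with open(journal_path, encoding="ascii") as journal:
        return [line.strip() for line in journal if line.strip()]


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "pending.journal")
    journal_then_ack(path, "aa")
    journal_then_ack(path, "bb")
    # A "restart" is simulated by re-reading the journal from disk.
    print(pending_entries(path))  # ['aa', 'bb']
```

With this ordering, restarting the process that hands out acknowledgements cannot lose an acknowledged entry, because the entry lives outside that process.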

Thanks in advance,
Pierre

Lei Xiu

Sep 27, 2018, 6:08:27 AM
to Certificate Transparency Policy
 
Hi Pierre,
 
Thanks for the comment.
 
We are not able to respond to the specific questions because we deployed our CT log servers in single instance with SQLite storage backend.
 
Thanks.

Xiu Lei
GDCA

Andrew Meyer

Sep 27, 2018, 9:56:55 AM
to Lei Xiu, Certificate Transparency Policy
...

You were running a _production_ CT log server off of SQLite? Can someone with more experience with ct-server tell me if this is as bad as it sounds?


Al Cutter

Sep 27, 2018, 11:11:48 AM
to Andrew Meyer, jxst...@gmail.com, ct-p...@chromium.org
It's not a configuration I'd recommend for production, but as a fellow log operator I would just remind and encourage us all to keep the discussion constructive and in the spirit of the post-mortem culture that we've found has worked so well in the past.
Even if Lei Xiu and team were unable to provide deeper specifics I do personally at least appreciate the honesty and openness of their response. 

Pierre Phaneuf

Sep 27, 2018, 11:34:08 AM
to jxst...@gmail.com, Certificate Transparency Policy
On Thu, Sep 27, 2018 at 11:08 AM Lei Xiu <jxst...@gmail.com> wrote:

> We are not able to respond to the specific questions because we deployed our CT log servers in single instance with SQLite storage backend.

Which version of the code are you using? Seeing the --version and
--helpfull output might help identify it, in particular whether the
--i_know_stand_alone_mode_can_lose_data flag is supported by your
binary? Also, knowing what flags you are using might help us identify
potential issues?

Do you use a reverse proxy in front of ct-server? I would expect so,
in order to provide the HTTPS endpoint? If you do, the procedure I
would recommend to shut down ct-server when using SQLite in single
node mode would be to stop the reverse proxy (in order to stop new
entries from being submitted), and wait for the log server to have
integrated all the entries.
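The "wait for the log server to have integrated all the entries" step could be automated roughly as below, once the front end has stopped accepting submissions. This is a sketch under assumptions: the `tree_size` callable is expected to return the tree_size from the log's latest signed tree head (for example via the RFC 6962 get-sth endpoint), and `expected_size` is the number of entries the operator knows were accepted. Both names are illustrative.

```python
import time
from typing import Callable


def wait_until_integrated(
    tree_size: Callable[[], int],
    expected_size: int,
    timeout_s: float = 900.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the published tree covers every accepted entry, or time out."""
    deadline = clock() + timeout_s
    while True:
        if tree_size() >= expected_size:
            return True   # safe to stop the log server now
        if clock() >= deadline:
            return False  # do NOT restart: entries are still unmerged
        sleep(poll_s)


# Example with a fake log whose tree grows on each poll.
sizes = iter([10, 11, 12])
print(wait_until_integrated(lambda: next(sizes), 12, sleep=lambda s: None))  # True
```

A restart script would proceed only when this returns True, and treat False as a reason to stop and investigate.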

I'm not certain of what caching is done in the SQLite storage backend,
to be honest, it was mainly used for development, and hasn't seen much
production use. I believe that if it were used in a multi-node
configuration, it might be acceptable, though.

By the way, it is possible to run a mix of storage backends in a
multi-node configuration, should you want to migrate to a different
storage backend.

Lei Xiu

Sep 30, 2018, 2:44:45 AM
to Certificate Transparency Policy
Hi All,

Many thanks for the comments.

Our two CT logs (log.gdca.com.cn and log2.gdca.com.cn) are both hosted on cloud platforms, and SSL certificates were uploaded to the cloud platforms to enable SSL communication. We didn’t use a reverse proxy in front of ct-servers.

We adopted an etcd cluster during our initial CT Log inclusion request (https://bugs.chromium.org/p/chromium/issues/detail?id=654306). However, during the compliance monitoring period, the uptime was low, and there was an occasion where a certificate was not included within the 24-hour MMD. We analyzed that incident and consulted people from the community, and were advised that standalone mode might work better than clustering mode. For that reason, we chose standalone mode for our current two CT servers.

Our CT servers are C++ based, and we understand that the C++ log server was deprecated in July. We are now studying the new Go-based CT server and plan to deploy the servers in cluster mode. In addition, we will work to avoid data loss during system restarts by adopting sound operational procedures, in the hope of providing a more reliable and stable service.

Thanks
Xiu Lei
GDCA

Devon O'Brien

Nov 5, 2018, 4:56:17 PM
to Certificate Transparency Policy

Hi Xiu Lei,


Due to GDCA Log 1 violating its MMD for certificates logged during a reconfiguration of the CT Log server, Chrome will not be including this Log as a Qualified Log. Given that the failure occurred before it was added to the Chromium code base, this course of action presents the least amount of risk to Chrome clients. GDCA is welcome to apply for qualification with a new CT Log with new key material by filing a new application as described in the Chrome Log Policy, and we strongly encourage pursuing the use of an actively maintained CT Log code base.


While GDCA Log 2 has not violated Chrome CT Log Policy, we are still concerned that it was a routine operational procedure that caused GDCA Log 1 to violate its MMD and it’s not clear whether the proposed operational control or those already in place would have protected GDCA Log 2 from a similar failure.


Failure to incorporate certificates after issuing an SCT is one of the more serious failure modes for a CT Log. Before qualifying this new CT Log to be relied upon by CAs, Chrome, and other CT-enforcing user agents, we would like to know in greater detail what steps are being taken to mitigate the risk of this or similar failure modes once qualified.


-Devon

Lei Xiu

Nov 8, 2018, 12:37:35 AM
to Certificate Transparency Policy
Hi Devon,

Thanks for the comment.

We understand and accept your decision not to include GDCA Log 1 as a Qualified Log. As for GDCA Log 2, we would like to withdraw the inclusion application to avoid any potential risk it may present to CAs and Chrome clients.

We are now actively working on a Go-based CT Log, and may re-apply for inclusion in the near future.

Thanks.
Xiu Lei
