RFC 6962 Pages Extension Specification Proposal


Pierre Barre

Jun 26, 2025, 7:49:22 AM
to certificate-transparency
Hi all,

Following recent discussions about CT scaling challenges and the benchmark results I shared, I'd like to propose a simple extension to RFC 6962 that addresses the caching and efficiency concerns raised about the current get-entries API.

This extension achieves the same efficiency goals as the Static API while maintaining simplicity:

- Simpler client implementation: Direct page requests vs. tile reconstruction
- No checkpoint verification: Pages are self-contained units
- Standard HTTP caching: No special CDN rules needed
- Single request per page: No multiple tile fetches to retrieve entries
- Graceful degradation: Falls back to regular RFC 6962 if pages aren't supported
- No cryptographic complexity: Just fetch the page you need
- No flag day: Logs can adopt this incrementally without coordination
- No ecosystem split: One API serves all clients, old and new
- Trivial to adopt: Existing monitors/clients need minimal changes (just add ?page=N and parse the binary format), while static CT requires essentially reimplementing a CT log client-side

# RFC 6962 Pages Extension Specification Proposal

## Abstract

This document specifies a simple extension to RFC 6962 Certificate Transparency logs that enables efficient caching and batch retrieval through page-based access patterns with a binary format that eliminates base64 encoding and chain duplication.

## 1. Introduction

The RFC 6962 `get-entries` endpoint accepts arbitrary start/end parameters, making caching difficult and responses inefficient due to base64 encoding and duplicate certificate chains. This extension introduces page-based access with an efficient binary format while maintaining full backward compatibility.

## 2. Page-Based Entry Retrieval

### 2.1 Request Format

```http
GET /ct/v1/get-entries?page={page_number}
```

Where `page_number` is a non-negative integer (0-indexed).

### 2.2 Response Format

#### Headers

```http
Content-Type: application/x-ct-entries-page
X-CT-Page-Size: 1000
X-CT-Entry-Range: 42000-42999
Cache-Control: public, max-age=31536000, immutable
```

For the last (potentially partial) page:

```http
X-CT-Entry-Range: 8765000-8765431
Cache-Control: no-store
```
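
As a rough illustration of the header rules above, the following sketch picks the headers for a given page. The function name `page_headers` and the `tree_size` parameter are assumptions made for this example; they are not part of the proposal.

```python
def page_headers(page: int, page_size: int, tree_size: int) -> dict[str, str]:
    """Return response headers for one page (illustrative helper, not spec text)."""
    first = page * page_size
    if page < 0 or first >= tree_size:
        raise ValueError("page is out of range for this tree size")
    # The range is inclusive; a partial page stops at the last entry in the tree.
    last = min(first + page_size, tree_size) - 1
    is_full = (last - first + 1) == page_size
    return {
        "Content-Type": "application/x-ct-entries-page",
        "X-CT-Page-Size": str(page_size),
        "X-CT-Entry-Range": f"{first}-{last}",
        # Full pages never change, so they can be cached indefinitely.
        "Cache-Control": (
            "public, max-age=31536000, immutable" if is_full else "no-store"
        ),
    }
```

With a page size of 1000 and a tree of 8765432 entries, page 42 is full and cacheable, while page 8765 is the partial last page and gets `no-store`.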

#### Binary Response Structure

```c
struct {
    uint8 format_version;  // Currently 1
    uint64 entry_count;
    uint64 first_entry_index;
    PageEntry entries[entry_count];
} EntriesPage;

struct {
    uint64 timestamp;
    uint32 leaf_length;
    opaque leaf_certificate[leaf_length];
    uint16 chain_length;
    uint8 issuer_hashes[chain_length][32];  // SHA-256 hashes
} PageEntry;
```
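
A minimal parsing sketch for the structure above. The proposal does not state a byte order, so this assumes big-endian (network order) integers with no padding between fields; `PageEntryV1` and `parse_entries_page` are names invented for the example.

```python
import struct
from dataclasses import dataclass


@dataclass
class PageEntryV1:
    timestamp: int
    leaf_certificate: bytes
    issuer_hashes: list[bytes]


def parse_entries_page(data: bytes) -> tuple[int, list[PageEntryV1]]:
    """Parse an EntriesPage and return (first_entry_index, entries)."""
    # Header: format_version (1 byte), entry_count and first_entry_index (8 bytes each).
    version, entry_count, first_index = struct.unpack_from(">BQQ", data, 0)
    if version != 1:
        raise ValueError(f"unsupported format_version {version}")
    offset = struct.calcsize(">BQQ")
    entries = []
    for _ in range(entry_count):
        timestamp, leaf_length = struct.unpack_from(">QI", data, offset)
        offset += 12
        leaf = data[offset:offset + leaf_length]
        if len(leaf) != leaf_length:
            raise ValueError("truncated leaf certificate")
        offset += leaf_length
        (chain_length,) = struct.unpack_from(">H", data, offset)
        offset += 2
        end = offset + 32 * chain_length
        if end > len(data):
            raise ValueError("truncated issuer hashes")
        # Each issuer hash is a fixed 32-byte SHA-256 digest.
        hashes = [data[i:i + 32] for i in range(offset, end, 32)]
        offset = end
        entries.append(PageEntryV1(timestamp, leaf, hashes))
    if offset != len(data):
        raise ValueError("unexpected trailing bytes")
    return first_index, entries
```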

### 2.3 Certificate Resolution

Issuer certificates are fetched separately and cached:

```http
GET /ct/v1/get-certificate-by-hash?hash={base64url_sha256}
Content-Type: application/pkix-cert
Cache-Control: public, max-age=31536000, immutable

[binary certificate data]
```
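
For illustration, a client could derive the lookup URL and verify the response roughly as follows. The proposal does not say whether the base64url value is padded; this sketch assumes unpadded output, and both function names are hypothetical.

```python
import base64
import hashlib


def certificate_url(base_url: str, issuer_hash: bytes) -> str:
    """Build the lookup URL for a 32-byte SHA-256 issuer hash."""
    if len(issuer_hash) != 32:
        raise ValueError("issuer hash must be 32 bytes")
    # Assumption: unpadded base64url, so the value is safe in a query string.
    token = base64.urlsafe_b64encode(issuer_hash).rstrip(b"=").decode("ascii")
    return f"{base_url.rstrip('/')}/ct/v1/get-certificate-by-hash?hash={token}"


def matches_hash(der_certificate: bytes, issuer_hash: bytes) -> bool:
    """Check that a fetched certificate is the one the page entry referenced."""
    return hashlib.sha256(der_certificate).digest() == issuer_hash
```

Because the URL is derived from the certificate's own hash, a client can verify any response it receives, whether it came from the log or from a cache.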

## 3. Backward Compatibility

Servers implementing this extension MUST continue to support traditional `start` and `end` parameters on the same endpoint. When both pages and start/end parameters are supported:

- If `page` parameter is present: Return paged binary response with appropriate headers
- If `start` and `end` parameters are present: Return traditional RFC 6962 JSON response
- If both are present: Return 400 Bad Request
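
The dispatch rules above could be sketched like this. The function name and the returned mode strings are illustrative only. Treating `page` combined with just one of `start`/`end` as a conflict, and rejecting requests with no usable parameters, are assumptions that go beyond what the text states.

```python
from urllib.parse import parse_qs


def classify_get_entries(query: str) -> tuple[int, str]:
    """Map a get-entries query string to (HTTP status, response mode)."""
    params = parse_qs(query)
    has_page = "page" in params
    has_range = "start" in params or "end" in params
    if has_page and has_range:
        return 400, "error"  # mixing both styles is rejected
    if has_page:
        value = params["page"][0]
        # page_number must be a non-negative integer.
        if not (value.isascii() and value.isdigit()):
            return 400, "error"
        return 200, "paged-binary"
    if "start" in params and "end" in params:
        return 200, "rfc6962-json"
    return 400, "error"
```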

This ensures:

- Page-unaware clients continue working unchanged
- No separate endpoints or URL changes required
- Single implementation can serve both old and new clients
- No user-agent policy violations

## 4. Operational Considerations

### 4.1 Page Size Stability

Once a log begins serving pages, it SHOULD NOT change the page size, as this would invalidate cached responses and complicate client logic. If a page size change is absolutely necessary, the log SHOULD:

1. Continue serving old page requests correctly
2. Announce the change well in advance
3. Support both page sizes during a transition period

### 4.2 Static Deployment

Since pages are immutable once full, logs can pre-generate pages.

### 4.3 CDN Configuration

No special CDN configuration is required. Standard HTTP caching rules apply:

- Cache based on `Cache-Control` headers
- Cache key is simply the URL with query parameters
- No cache invalidation logic needed


Best,
Pierre
RFC 6962 Pages Extension Specification Proposal.txt

Winston de Greef

Jun 26, 2025, 8:37:41 AM
to certificate-...@googlegroups.com
Hi Pierre,

Some comments on the proposal:

You mention static pre-generation. Hosting this statically seems like it would be easier if each page had its own path.
Also, the convention all other CT standards have used seems to be that the /ct/v1 prefix is only for RFC 6962 endpoints.
Finally, you mention that a log might want to host multiple page sizes at the same time.
These three things together lead me to suggest the following URL format:
/ct-pages/v1/<page-size>/<page-number>

It's also worth considering letting the ct-pages endpoint have a separate prefix from the submission prefix, like static-ct does (and then dropping the /ct-pages/ part of the path).

The X-CT-Page-Size and X-CT-Entry-Range headers should not have the X- prefix. Using X- prefixes for headers in standards is discouraged. 
From MDN:

Custom proprietary headers have historically been used with an X- prefix, but this convention was deprecated in 2012 because of the inconveniences it caused when nonstandard fields became standard in RFC 6648; others are listed in the IANA HTTP Field Name Registry, whose original content was defined in RFC 4229. The IANA registry lists headers, including information about their status.

On the application/x-ct-entries-page content type, I'm not sure what the standard way to do this is. static-ct uses application/octet-stream, which I think fits better than a non-standard MIME type.
  
Also, it seems to me to be a good idea to reuse the MerkleTreeLeaf or TimestampedEntry struct for PageEntry.

The format_version in EntriesPage starts at 0x01 for version one. I think this should be consistent with how RFC 6962 does it, where versions are enums that start with v1(0).

Sincerely,
Winston de Greef



Pierre Barre

Jun 26, 2025, 8:56:40 AM
to certificate-transparency
Hi Winston,

Thank you! I think all your points are valid, and they make the proposal cleaner.

I think it would end up being something like this:

=====================================================================
# RFC 6962 Pages Extension Specification Proposal

## Abstract

This document specifies a simple extension to RFC 6962 Certificate Transparency logs that enables efficient caching and batch retrieval through page-based access patterns with a binary format that eliminates base64 encoding and chain duplication.

## 1. Introduction

The RFC 6962 `get-entries` endpoint accepts arbitrary start/end parameters, making caching difficult and responses inefficient due to base64 encoding and duplicate certificate chains. This extension introduces page-based access with an efficient binary format while maintaining full backward compatibility.

## 2. Page-Based Entry Retrieval

### 2.1 Request Format

```http
GET /ct-pages/v1/page/{page_number}
```

Where `page_number` is a non-negative integer (0-indexed).

### 2.2 Response Format

#### Headers

```http
Content-Type: application/octet-stream
CT-Page-Size: 1000
CT-Entry-Range: 42000-42999
Cache-Control: public, max-age=31536000, immutable
```

For the last (potentially partial) page:
```http
CT-Entry-Range: 8765000-8765431
Cache-Control: no-store
```

#### Binary Response Structure

```c
enum { v1(0), (255) } Version;

struct {
    Version format_version;  // v1(0)
    uint64 entry_count;
    uint64 first_entry_index;
    PageEntry entries[entry_count];
} EntriesPage;

struct {
    TimestampedEntry timestamped_entry;  // Reuse RFC 6962 struct
    uint16 chain_length;
    opaque issuer_hashes[chain_length][32];  // SHA-256 hashes
} PageEntry;
```
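
A sketch of how a client might decode one revised `PageEntry`. The `TimestampedEntry` layout follows RFC 6962 (2-byte `entry_type`, 3-byte length prefix on the certificate, 2-byte length prefix on the extensions). Big-endian encoding and reading `chain_length` as a count of 32-byte hashes are assumptions, and the function names are made up for the example.

```python
import struct

X509_ENTRY = 0      # LogEntryType values from RFC 6962
PRECERT_ENTRY = 1


def read_opaque(data: bytes, offset: int, length_bytes: int) -> tuple[bytes, int]:
    """Read a length-prefixed vector; return (value, offset after it)."""
    start = offset + length_bytes
    if start > len(data):
        raise ValueError("truncated length prefix")
    length = int.from_bytes(data[offset:start], "big")
    end = start + length
    if end > len(data):
        raise ValueError("truncated vector")
    return data[start:end], end


def parse_page_entry(data: bytes, offset: int = 0) -> tuple[dict, int]:
    """Parse one PageEntry; return (fields, offset of the next entry)."""
    timestamp, entry_type = struct.unpack_from(">QH", data, offset)
    offset += 10
    issuer_key_hash = None
    if entry_type == PRECERT_ENTRY:
        # A PreCert carries a fixed 32-byte issuer_key_hash before the TBSCertificate.
        issuer_key_hash = data[offset:offset + 32]
        if len(issuer_key_hash) != 32:
            raise ValueError("truncated issuer_key_hash")
        offset += 32
    elif entry_type != X509_ENTRY:
        raise ValueError(f"unknown entry_type {entry_type}")
    signed_entry, offset = read_opaque(data, offset, 3)  # ASN.1Cert or TBSCertificate
    extensions, offset = read_opaque(data, offset, 2)    # CtExtensions
    (chain_length,) = struct.unpack_from(">H", data, offset)
    offset += 2
    end = offset + 32 * chain_length
    if end > len(data):
        raise ValueError("truncated issuer hashes")
    fields = {
        "timestamp": timestamp,
        "entry_type": entry_type,
        "issuer_key_hash": issuer_key_hash,
        "signed_entry": signed_entry,
        "extensions": extensions,
        "issuer_hashes": [data[i:i + 32] for i in range(offset, end, 32)],
    }
    return fields, end
```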

### 2.3 Certificate Resolution

Issuer certificates are fetched separately and cached:

```http
GET /ct-pages/v1/certificate/{base64url_sha256_hash}
Content-Type: application/pkix-cert
Cache-Control: public, max-age=31536000, immutable

[binary certificate data]
```

### 2.4 Discovery

Logs MUST provide a discovery endpoint:

```http
GET /ct-pages/v1/discover

Content-Type: application/json

{
  "page_size": 1000,
  "static_endpoint": "https://static.example-log.com"
}
```

The `static_endpoint` field is optional. If it is provided, clients should use it for fetching pages and certificates. Otherwise, pages are served from the same host as the main log.
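
A client-side sketch of the discovery step. It assumes the same `/ct-pages/v1/` paths apply under the static endpoint, which the text above does not spell out, and `page_url` is a hypothetical helper name.

```python
import json


def page_url(log_url: str, discovery_body: str, page: int) -> str:
    """Build the URL for a page from the log URL and the discovery response."""
    doc = json.loads(discovery_body)
    # Prefer the static endpoint when the log advertises one.
    base = doc.get("static_endpoint") or log_url
    return f"{base.rstrip('/')}/ct-pages/v1/page/{page}"
```

For a discovery response that names `https://static.example-log.com`, page 0 would resolve to `https://static.example-log.com/ct-pages/v1/page/0`; without the field, the main log URL is used.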

## 3. Backward Compatibility

The original RFC 6962 endpoints remain unchanged. This extension introduces new endpoints under `/ct-pages/v1/` that:
- MUST be served on the same host as the main log endpoints, UNLESS a separate static serving endpoint is provided via the discovery mechanism
- Do not interfere with existing `/ct/v1/*` endpoints
- Allow clients to opt-in to the new format
- Maintain full compatibility with RFC 6962 clients


## 4. Operational Considerations

### 4.1 Page Size Stability

Once a log begins serving pages, it SHOULD NOT change the page size, as this would invalidate cached responses and complicate client logic. If a page size change is absolutely necessary, the log SHOULD:

1. Continue serving old page requests correctly
2. Announce the change well in advance
3. Support both page sizes during a transition period

### 4.2 Static Deployment

Since pages are immutable once full, logs can pre-generate pages.

### 4.3 CDN Configuration

No special CDN configuration is required. Standard HTTP caching rules apply:
- Cache based on `Cache-Control` headers
- Cache key is simply the URL path
- No cache invalidation logic needed



=====================================================================

Best,
Pierre
RFC 6962 Pages Extension Specification Proposal.txt

Pierre Barre

Jun 26, 2025, 9:23:09 AM
to certificate-transparency
One more detail: I think that allowing the page size to change makes things more complicated than they should be. Here's an even simpler version:

========================
  "static_endpoint": "https://static.example.com",  // Optional
}
```

If `static_endpoint` is provided, clients MUST use it for fetching pages and certificates.


## 3. Backward Compatibility

The original RFC 6962 endpoints remain unchanged. This extension introduces new endpoints under `/ct-pages/v1/` that:
- MUST be served on the same host as the main log endpoints, UNLESS a separate static serving endpoint is provided via the discovery mechanism
- Do not interfere with existing `/ct/v1/*` endpoints
- Allow clients to opt-in to the new format
- Maintain full compatibility with RFC 6962 clients

## 4. Operational Considerations

### 4.1 Page Size Stability

Once a log begins serving pages, it MUST NOT change the page size, as this would invalidate cached responses and complicate client logic.


### 4.2 Static Deployment

Since pages are immutable once full, logs can pre-generate pages.

========================

Best,
Pierre
RFC 6962 Pages Extension Specification Proposal.txt

Pierre Barre

Jun 26, 2025, 10:34:10 AM
to certificate-transparency
To follow up, I've implemented the proposal in CompactLog (https://github.com/Barre/compact_log/commit/f2d0f338089daf08ef9a9bfd5659e2835ec5a455)
 

The proposal is more bandwidth-efficient than the Static CT API, while requiring roughly 4 times fewer requests with a page size of 1,000 (common in current RFC 6962 implementations).

================

Comparing Pages Extension vs Static CT Tiles at http://localhost:8080

Tree size: 279332 entries

Full tree requires:
  Pages: 280
  Tiles: 1092

Fetching full tree...
=== Pages Extension API ===
Fetching 280 pages...
............................
Transfer size: 211777513 bytes
Time: 109s
Requests: 280

=== Static CT Tiles API ===
Fetching 1092 data tiles (279332 entries)...
............................................
Transfer size: 242858575 bytes
Time: 143s
Requests: 1092

=== Summary ===
Pages Extension:
  Entries fetched: 280000 (includes 668 beyond tree size)
  Network transfer: 211777513 bytes
  Bytes per entry: 756 bytes/entry
  Total requests: 280
  Time: 109s

Static CT Tiles:
  Entries fetched: 279332 (exact, using partial tile)
  Network transfer: 242858575 bytes
  Bytes per entry: 869 bytes/entry
  Total requests: 1092
  Time: 143s

Pages Extension uses 87.20% of the bandwidth compared to Static CT

=== Compression Test ===
Testing first page with different encodings:
Encoding: br, Size: 815702 bytes
Encoding: gzip, Size: 853680 bytes
Encoding: deflate, Size: 853668 bytes
Encoding: none, Size: 1456634 bytes
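
The summary figures can be re-derived from the raw numbers in the output above. This is only a sanity check of the arithmetic, assuming 256 entries per Static CT data tile, which is consistent with the 1092 tiles reported.

```python
import math

TREE_SIZE = 279332
PAGE_SIZE = 1000
TILE_SIZE = 256  # assumed entries per Static CT data tile

pages = math.ceil(TREE_SIZE / PAGE_SIZE)
tiles = math.ceil(TREE_SIZE / TILE_SIZE)

pages_bytes = 211777513
tiles_bytes = 242858575

# The pages run fetched whole pages, so it covers pages * PAGE_SIZE entries.
pages_per_entry = pages_bytes // (pages * PAGE_SIZE)
tiles_per_entry = tiles_bytes // TREE_SIZE
bandwidth_ratio = round(100 * pages_bytes / tiles_bytes, 2)
request_ratio = round(tiles / pages, 1)

print(pages, tiles, pages_per_entry, tiles_per_entry, bandwidth_ratio, request_ratio)
```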


Best,
Pierre

Winston de Greef

Jun 26, 2025, 10:36:42 AM
to certificate-...@googlegroups.com
Just to clarify, encoding br is brotli?

Sincerely,
Winston de Greef

Pierre Barre

Jun 26, 2025, 10:39:15 AM
to Winston de Greef, certificate-transparency
Yes, "br" is Brotli.

The pages API can properly respect HTTP content negotiation (Accept-Encoding headers) and serve appropriate compression based on client capabilities. Static CT, by storing pre-compressed tiles, has to either ignore Accept-Encoding headers (breaking HTTP semantics) or store multiple versions of each tile.
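
To make the content negotiation point concrete, here is a simplified sketch of how a server might choose a coding from `Accept-Encoding`. It is not taken from CompactLog, and it ignores some corner cases of the HTTP rules (for example an explicit `identity;q=0`).

```python
def choose_encoding(accept_encoding: str,
                    supported: tuple[str, ...] = ("br", "gzip", "deflate")) -> str:
    """Pick the supported coding with the highest q-value, else 'identity'."""
    weights: dict[str, float] = {}
    for item in accept_encoding.split(","):
        name, _, params = item.strip().partition(";")
        name = name.strip().lower()
        if not name:
            continue
        q = 1.0  # a coding listed without a q-value has full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        weights[name] = q
    best, best_q = "identity", 0.0
    # Iterating in server preference order means earlier codings win ties.
    for coding in supported:
        q = weights.get(coding, weights.get("*", 0.0))
        if q > best_q:
            best, best_q = coding, q
    return best
```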

The compression comparison shows the gains between algorithms are relatively minor (815KB brotli vs 853KB gzip), so the main efficiency comes from the larger chunk size enabling better compression ratios overall.

Best,
Pierre

Pierre Barre

Jun 27, 2025, 6:03:12 AM
to certificate-transparency, Winston de Greef

Pierre Barre

Jun 28, 2025, 4:51:35 PM
to certificate-transparency, Winston de Greef
As a follow-up, I've submitted this as an Internet-Draft to the IETF: https://datatracker.ietf.org/doc/html/draft-trans-pages

Comments are welcome, and it would be great if this became an RFC.

Best,
Pierre

Sigitas Zelenkovas

Jul 1, 2025, 3:13:49 AM
to certificate-transparency
Hey
This proposal looks extremely useful to me as a log follower.

A few thoughts after reading through the draft:
- It's not immediately obvious what path structure should be used when sending requests to the `static_endpoint` returned by GET /ct-pages/v1/discover. Would it be `https://static.example.com/ct-pages/v1/page/0`, or something else?
- When `static_endpoint` is provided, does the same response header behavior apply? I assume `static_endpoint` exists to allow the use of object storage solutions, in which case it seems quite hard to ensure that CT-Entry-Range/Cache-Control are present and accurate if the response does not come directly from compact-log. Unless it's meant for proxy-like use cases, where requests are forwarded to compact-log or served from previously cached entries.
- CT-Entry-Range versus two separate -Start/-End headers: a range header does make it clearer that the entries in the response are inclusive, whereas Start/End headers are usually more ambiguous. Personally I would prefer two separate headers because they take a few fewer lines of code to handle, but this is extremely minor.
- It would be nice to have additional reinforcement, via the HTTP status code, of whether the page I received is full or partial: 200 for a full page, 206 for partial content. But that does not seem like an entirely correct use of the 206 status code.

Pierre Barre

Jul 1, 2025, 5:15:48 AM
to Sigitas Zelenkovas, certificate-transparency
Hi Sigitas,


Thanks for the feedback - those were good catches.

I've clarified the path structure in the draft. It uses the same paths with the static endpoint, so https://static.example.com/ct-pages/v1/page/0 is correct. Added an example to make this clear.

I removed the CT-Page-Size and CT-Entry-Range headers entirely - they were redundant since the binary response already has `first_entry_index` and `entry_count`. This also solves the static hosting concern since now only Cache-Control matters.

Also made a few other improvements: numbered pages are always complete (partial pages don't get numbers), added a /ct-pages/v1/latest endpoint, and a `last_page_at_static` flag for deployment flexibility.
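
One way to read "numbered pages are always complete" is sketched here. The exact `/ct-pages/v1/latest` semantics live in the draft and are not quoted in this message, so the mapping is an assumption and `split_tree` is a hypothetical name.

```python
def split_tree(tree_size: int, page_size: int) -> tuple[int, range]:
    """Return (number of complete numbered pages, entry indices left for 'latest')."""
    complete_pages = tree_size // page_size
    # Whatever does not fill a whole page is only reachable via the latest endpoint.
    latest_entries = range(complete_pages * page_size, tree_size)
    return complete_pages, latest_entries
```

With 8765432 entries and a page size of 1000, pages 0 through 8764 are numbered and immutable, and entries 8765000 through 8765431 would be served by the latest endpoint.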

Agree about 206 - not the right semantic fit for this use case.

Let me know if you have other thoughts!

Best,
Pierre

Pierre Barre

Jul 1, 2025, 5:16:28 AM
to certificate-transparency, Sigitas Zelenkovas

Sigitas Zelenkovas

Jul 2, 2025, 8:46:27 AM
to certificate-transparency
Hey, thanks for the updates, they did clear up some of the unknowns.

The `/v1/latest` endpoint is interesting, but it's quite hard for me to come up with an optimal use case for it.

Since the page is continuously filled with entries, I suspect it's going to be replaced with a new empty page once full.
If I had to use that endpoint to receive entries ASAP, the implementation would be quite difficult and prone to missing entries.
- When a request is sent to `/v1/latest`, I will land somewhere between an almost empty page and an almost full page.
- To receive new entries I would have to retry the request after some time x. When the page is nearly empty the wait could be longer; when it's almost full the request should be repeated quite quickly. Even then, there is no guarantee I will get all entries.

For following logs I would consider two options somewhat viable:
1. A `v1/previous` page endpoint, which would return the most recently completed page. These would not be the most up-to-date entries, but they would be quite close and sufficient for a lot of use cases. Additionally, support for `If-Modified-Since` would be great: compact-log would return status 200 if the page changed since the provided timestamp, or 304 Not Modified otherwise. On the follower's side the implementation is relatively simple: repeat requests every x amount of time and adjust the wait time depending on conditions. If page misses occur, reduce the wait time between requests; when receiving 304s for the same page a couple of times in a row, increase the wait time.
2. For the MOST up-to-date entries, I think something like a socket or stream would be nice to have, where new entries are pushed to all clients through an open connection. This is way more complicated, and might need some sort of load-balancing proxy to keep the number of connections to the log reasonable.
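
The adaptive polling idea in option 1 could look roughly like this on the follower's side. The `v1/previous` endpoint is only a suggestion in this message, and the class name, thresholds, and default intervals are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Poller:
    wait: float = 30.0       # seconds between requests
    min_wait: float = 1.0
    max_wait: float = 600.0
    not_modified_streak: int = 0

    def update(self, status: int, missed_pages: int = 0) -> float:
        """Adjust the wait after a response and return the new interval."""
        if status == 304:
            self.not_modified_streak += 1
            # Several 304s in a row: the log is slower than we are polling.
            if self.not_modified_streak >= 2:
                self.wait = min(self.wait * 2, self.max_wait)
        elif status == 200:
            self.not_modified_streak = 0
            # A gap between the last page seen and this one means we poll too slowly.
            if missed_pages > 0:
                self.wait = max(self.wait / 2, self.min_wait)
        else:
            raise ValueError(f"unexpected status {status}")
        return self.wait
```

A follower would call `update` after each request and sleep for the returned number of seconds; any pages it missed in between would still be fetched by number.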


Bas Westerbaan

Jul 2, 2025, 9:00:07 AM
to certificate-...@googlegroups.com
Hi Pierre,

> Following recent discussions about CT scaling challenges and the benchmark results I shared, I'd like to propose a simple extension to RFC 6962 that addresses the caching and efficiency concerns raised about the current get-entries API.
>
> This extension achieves the same efficiency goals as the Static API while maintaining simplicity:
>
> - Simpler client implementation: Direct page requests vs. tile reconstruction

Tiles don't need to be reconstructed. A monitor can just pull the data tiles, extract entries, and compute the tree as usual.
  
> - Standard HTTP caching: No special CDN rules needed

Which special CDN rules?
 
> - Single request per page: No multiple tile fetches to retrieve entries

I don't follow, can you clarify?
 
> - Graceful degradation: Falls back to regular RFC 6962 if pages aren't supported
> - No cryptographic complexity: Just fetch the page you need
> - No flag day: Logs can adopt this incrementally without coordination
> - No ecosystem split: One API serves all clients, old and new

An operator that finds running plain RFC 6962 prohibitively expensive will want to disable the old API. If they all keep the old API running, there is no point to the new API.

Now, onto the proposal proper. The proposal does make get-entries more cacheable (as StaticCT does), but it's not particularly innovative there. I'd say it's a bit more annoying for a monitor as the page size is variable.

The proposal does not address the cacheability of the get-proof-by-hash or get-sth-consistency, which StaticCT does address. Arguably the availability of those endpoints is of lesser importance than that of get-entries. If you don't care about the availability of get-proof-by-hash and get-sth-consistency, then the extra bits of StaticCT can be ignored and what remains (the data tiles) is very similar to the present proposal.

Best,

 Bas

 