Gemini Specification (Work in Progress)

Jason Evans
Apr 9, 2021, 6:25:55 AM

NOTE: This is a work in progress. Until it's finalized, this is NOT the
official specification.

# Abstract

This document specifies the Gemini protocol for file transfer. It can be
thought of as an incremental improvement over Gopher [RFC1436] rather than a
stripped-down HTTP [RFC7230]. It uses a simple request and response
transaction over TCP [STD7] port 1965, with encryption provided by TLS
[RFC8446].

# Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [BCP14].

# Overview

An overriding goal of Gemini is to provide a simple protocol that is easy to
implement (requiring a day or two of effort and only a few hundred lines for
a server or a client) while still being useful.

Gemini is served over TCP on port 1965 by default (the first manned Gemini
mission, Gemini 3, flew in March 1965), using TLS to provide an encrypted
transaction. Servers and clients MUST support TLS 1.2 or higher. The type
of TLS certificate used (CA-based or self-signed) is specified in a "best
practices" document as details are still being discussed. The default
port of 1965 is an unprivileged port on most systems, so an administrative
account is not required to run the service.

Addressing in Gemini is based on URIs [STD66], with the following
modifications:

1. the scheme used is "gemini";
2. the userinfo portion of a URI MUST NOT be used;
3. an empty path component and a path component of "/" are equivalent, and
servers MUST support both without sending a redirection;
4. the port defaults to 1965 if not specified;
5. an IP address SHOULD NOT be used in the authority section.
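
The addressing rules above can be checked mechanically. The following is a minimal Python sketch using only the standard library's `urllib.parse` and `ipaddress` modules; the function names `check_gemini_uri`, `effective_port` and `normalized_path` are hypothetical, and only the listed rules are checked, not full [STD66] syntax.

```python
import ipaddress
from urllib.parse import urlsplit

DEFAULT_PORT = 1965

def check_gemini_uri(uri):
    """Return a list of problems with `uri` under the Gemini addressing rules.

    Sketch only: covers the scheme, userinfo and IP-address rules. An empty
    list means none of these particular checks failed.
    """
    problems = []
    parts = urlsplit(uri)
    if parts.scheme != "gemini":
        problems.append("scheme is not 'gemini'")
    # netloc still contains any userinfo ("user@host"), which is forbidden.
    if "@" in parts.netloc:
        problems.append("userinfo MUST NOT be used")
    host = parts.hostname or ""
    try:
        ipaddress.ip_address(host)
        problems.append("IP address in authority SHOULD NOT be used")
    except ValueError:
        pass  # not an IP literal, which is the preferred case
    return problems

def effective_port(uri):
    """Port to connect to: the explicit port if given, else 1965."""
    return urlsplit(uri).port or DEFAULT_PORT

def normalized_path(uri):
    """An empty path and "/" are equivalent; map both to "/"."""
    return urlsplit(uri).path or "/"
```

For example, `check_gemini_uri("gemini://user@example.com/")` reports only the userinfo problem, and `effective_port("gemini://example.com/")` falls back to 1965.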

This document covers only the core protocol and the requirements it places
on clients and servers. Other aspects of Gemini fall outside the core
protocol and are not covered here; implementors of both clients and servers
are RECOMMENDED to follow the best practices guide for the Gemini protocol.

# The use of TLS

At the time of writing (2021), not all existing TLS libraries support TLS
1.3, but most, if not all, support TLS 1.2 [RFC5246], so TLS 1.2 is the
minimum required version. Implementations MUST support TLS SNI (Server Name
Indication), and servers MUST use the TLS close_notify mechanism to close
the connection. Clients SHOULD NOT close a connection by default, but MAY do
so if the content exceeds constraints set by the user.
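
As an illustration of these requirements, the sketch below configures a client with Python's standard `ssl` module: TLS 1.2 as the minimum version, SNI sent by passing `server_hostname`, and CA-chain validation disabled on the assumption that the client pins certificates itself (see "TLS Server certificates"). The helper names are hypothetical.

```python
import socket
import ssl

GEMINI_PORT = 1965

def make_client_context():
    """Client TLS context: TLS 1.2 or newer, no CA-chain validation.

    CA validation is switched off on the assumption that the caller pins the
    server certificate itself (TOFU); it is not a safe default otherwise.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # check_hostname must be disabled before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx

def open_connection(host, port=GEMINI_PORT):
    """Connect and complete the TLS handshake, sending SNI for `host`."""
    raw = socket.create_connection((host, port))
    # Passing server_hostname makes the client send the SNI extension.
    # suppress_ragged_eofs=False surfaces a missing close_notify as an error.
    return make_client_context().wrap_socket(
        raw, server_hostname=host, suppress_ragged_eofs=False)
```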

## TLS Server certificates

Clients can validate TLS connections however they like (including not at
all) but the strongly RECOMMENDED approach is to implement a lightweight
"TOFU" certificate-pinning system which treats self-signed certificates as
first-class citizens. This greatly reduces TLS overhead on the network
(only one cert needs to be sent, not a whole chain) and lowers the barrier
to entry for setting up a Gemini site (no need to pay a CA or set up a Let's
Encrypt cron job, just make a cert and go).

TOFU stands for "Trust On First Use" and is a public-key security model
similar to that used by OpenSSH. The first time a Gemini client connects to
a server, it accepts whatever certificate it is presented. That
certificate's fingerprint and expiry date are saved in a persistent database
(like the known_hosts file for SSH), associated with the server's hostname.
On all subsequent connections to that hostname, the received certificate's
fingerprint is computed and compared to the one in the database. If the
certificate is not the one previously received, but the previous
certificate's expiry date has not passed, the user is shown a warning,
analogous to the one web browser users are shown when receiving a
certificate without a signature chain leading to a trusted CA.
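
The TOFU procedure described above could be sketched as follows. This assumes an in-memory dict standing in for the persistent database, SHA-256 as the fingerprint algorithm, and a caller that supplies the DER bytes (for example from Python's `SSLSocket.getpeercert(binary_form=True)`) together with the certificate's expiry, parsed with an X.509 library of its choice. What to do when a different certificate arrives after the pinned one has expired is not specified above; accepting and re-pinning it is an assumption of this sketch.

```python
import hashlib
from datetime import datetime, timezone

def fingerprint(der_cert):
    """SHA-256 fingerprint of a DER-encoded certificate, as hex."""
    return hashlib.sha256(der_cert).hexdigest()

def tofu_check(db, hostname, der_cert, not_after, now=None):
    """Apply TOFU pinning; `db` maps hostname -> (fingerprint, expiry).

    Returns "trusted-first-use", "match", "renewed" (the pinned certificate
    had expired, so the new one is pinned) or "warn" (a different
    certificate while the pinned one is still valid).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    fp = fingerprint(der_cert)
    if hostname not in db:
        db[hostname] = (fp, not_after)  # first contact: trust and remember
        return "trusted-first-use"
    pinned_fp, pinned_expiry = db[hostname]
    if fp == pinned_fp:
        return "match"
    if pinned_expiry < now:
        db[hostname] = (fp, not_after)  # assumption: re-pin after expiry
        return "renewed"
    return "warn"  # show the user a warning, as described above
```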

This model is by no means perfect, but it is better than just accepting
self-signed certificates unconditionally.

## TLS Client certificates

Although rarely seen on the web, TLS permits clients to identify themselves
to servers using certificates, in exactly the same way that servers
traditionally identify themselves to the client. Gemini includes the
ability for servers to request in-band that a client repeat a request with
a client certificate. This is a very flexible, highly secure and simple
notion of client identity with several applications:

* Short-lived client certificates which are generated on demand and deleted
immediately after use can be used as "session identifiers" to maintain
server-side state for applications. In this role, client certificates act
as a substitute for HTTP cookies, but unlike cookies they are generated
voluntarily by the client, and once the client deletes a certificate and
its matching key, the server cannot possibly "resurrect" the same value
later (unlike so-called "super cookies").

* Long-lived client certificates can reliably identify a user to a
multi-user application without the need for passwords which may be
brute-forced. Even a stolen database table mapping certificate hashes to
user identities is not a security risk, as rainbow tables for certificates
are not feasible.

* Self-hosted, single-user applications can be easily and reliably secured
in a manner familiar from OpenSSH: the user generates a self-signed
certificate and adds its hash to a server-side list of permitted
certificates (analogous to the authorized_keys file for SSH).

Gemini requests will typically be made without a client certificate. If a
requested resource requires a client certificate and one is not included in
a request, the server can respond with a status code of 60, 61 or 62 (see
section "Client certificates"). A client certificate which is generated or
loaded in response to such a status code has its scope bound to the same
hostname as the request URL and to all paths below the request URL's path.
E.g. if a request for gemini://example.com/foo returns status 60 and the
user chooses to generate a new client certificate in response to this, that
same certificate should be used for subsequent requests to
gemini://example.com/foo, gemini://example.com/foo/bar/,
gemini://example.com/foo/bar/baz, etc., until such time as the user decides
to delete the certificate or to temporarily deactivate it. Interactive
clients for human users SHOULD make such actions easy and generally give
users full control over the use of client certificates.
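
One way to express this scoping rule in code is sketched below. The function name is hypothetical, and treating the scope path as a directory (so that a certificate created for /foo does not also cover /foobar) is an interpretation assumed here, not something this draft states.

```python
from urllib.parse import urlsplit

def cert_applies(scope_url, request_url):
    """True if a certificate created for `scope_url` covers `request_url`.

    Scope: the same hostname, and a request path equal to or below the
    scope path.
    """
    scope, req = urlsplit(scope_url), urlsplit(request_url)
    if scope.hostname != req.hostname:
        return False
    base = scope.path or "/"
    path = req.path or "/"
    if path == base:
        return True
    # Assumption: compare whole path segments, so "/foo" does not match "/foobar".
    prefix = base if base.endswith("/") else base + "/"
    return path.startswith(prefix)
```

With the example above, a certificate created for gemini://example.com/foo applies to gemini://example.com/foo/bar/baz but not to a different host.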

## TLS Issues

Both clients and servers SHOULD handle the case when the TLS close_notify
mechanism is not used (such as a low level socket error that closes the
socket without properly terminating the TLS connection). A client SHOULD
notify the user of such a case; the server MAY log such a case.
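
A client can distinguish a clean close_notify from an abrupt close while it reads the response. This sketch assumes a Python `ssl.SSLSocket` wrapped with `suppress_ragged_eofs=False`; with the default of `True`, the `ssl` module reports a missing close_notify as an ordinary end of stream. The function name is hypothetical.

```python
import ssl

def read_response(tls_sock, chunk_size=4096):
    """Read until the server closes; return (data, closed_cleanly).

    Assumes `tls_sock` was wrapped with suppress_ragged_eofs=False.
    """
    chunks = []
    while True:
        try:
            chunk = tls_sock.recv(chunk_size)
        except (ssl.SSLEOFError, ConnectionResetError):
            # The connection ended without a TLS close_notify.
            return b"".join(chunks), False
        if not chunk:
            # recv() returns b"" once the peer's close_notify has arrived.
            return b"".join(chunks), True
        chunks.append(chunk)
```

A client would then notify the user whenever `closed_cleanly` is `False`, since the content may be truncated.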

Implementors should be aware that TLS 1.2 sends the server name and the
client certificate (if used) in the clear as part of the handshake. A client
MAY warn a user if a TLS 1.2 connection is established, and SHOULD warn the
user when a client certificate will be transmitted via TLS 1.2.

# Requests

The client connects to the server and sends a request which consists of an
absolute URI followed by a CR (character 13) and LF (character 10). The
augmented BNF [STD68] for this is:

request = absolute-URI CRLF

; absolute-URI from [STD66]
; CRLF from [STD68]

When making a request, the URI MUST NOT exceed 1024 bytes, and a server MUST
reject requests where the URI exceeds this limit. A server MUST reject a
request with a userinfo portion. Clients MUST NOT send a fragment as part
of the request, and a server MUST reject such requests as well. If a client
is making a request with an empty path, the client SHOULD add a trailing '/'
to the request, but a server MUST be able to deal with an empty path.
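
A client-side sketch of building a request under these constraints, in Python with `urllib.parse`. Dropping a fragment (rather than refusing to send the request) and encoding the URI as UTF-8 are choices made for this example; `build_request` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit

MAX_URI_BYTES = 1024

def build_request(uri):
    """Return the request bytes for `uri` (URI + CRLF), or raise ValueError."""
    parts = urlsplit(uri)
    if not parts.scheme or not parts.netloc:
        raise ValueError("request must be an absolute URI")
    if "@" in parts.netloc:
        raise ValueError("userinfo MUST NOT be sent")
    # An empty path gets a trailing '/'; any fragment is dropped.
    path = parts.path or "/"
    clean = urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))
    encoded = clean.encode("utf-8")
    # The 1024-byte limit applies to the URI; the CRLF is not counted.
    if len(encoded) > MAX_URI_BYTES:
        raise ValueError("URI exceeds 1024 bytes")
    return encoded + b"\r\n"
```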

# Replies

Upon receiving a request, the server sends back a status and, in the case
of a successful request, the content requested by the client. The status
consists of a two-digit response code, possibly followed by some additional
information (which depends upon the response being sent), and then a CR and
LF. The augmented BNF:

reply = input / success / redirect / tempfail / permfail / auth

input = '1' DIGIT SP prompt CRLF
success = '2' DIGIT SP mimetype CRLF body
redirect = '3' DIGIT SP URI-reference CRLF
; NOTE: [STD66] allows "" as a valid
; URI-reference. This is not intended to
; be valid for cases of redirection.
tempfail = '4' DIGIT [SP errormsg] CRLF
permfail = '5' DIGIT [SP errormsg] CRLF
auth = '6' DIGIT [SP errormsg] CRLF

prompt = 1*(SP / VCHAR)
mimetype = type '/' subtype *(';' parameter)
errormsg = 1*(SP / VCHAR)
body = *OCTET

VCHAR =/ UTF8-2v / UTF8-3 / UTF8-4
UTF8-2v = %xC2 %xA0-BF ; no C1 control set
/ %xC3-DF UTF8-tail

; URI-reference from [STD66]
;
; type from [RFC2045]
; subtype from [RFC2045]
; parameter from [RFC2045]
;
; CRLF from [STD68]
; DIGIT from [STD68]
; SP from [STD68]
; VCHAR from [STD68]
; OCTET from [STD68]
; WSP from [STD68]
;
; UTF8-3 from [STD63]
; UTF8-4 from [STD63]
; UTF8-tail from [STD63]

The VCHAR rule from [STD68] is extended to include the non-control
codepoints from Unicode (and encoded as UTF-8 [STD63]). The body type is
unspecified here, as the contents depend upon the MIME type of the content
being served. Upon sending the complete response (which may include
content), the server closes the connection and MUST use the TLS close_notify
mechanism to inform the client that no more data will be sent.
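
Since the header ends at the first CRLF and the status is always two digits, parsing a reply is short. A minimal sketch, assuming the whole response has already been read into memory and that the header is UTF-8; the function names are hypothetical.

```python
def split_response(data):
    """Split raw response bytes into (header, body) at the first CRLF."""
    header, sep, body = data.partition(b"\r\n")
    if not sep:
        raise ValueError("header is not terminated by CRLF")
    return header, body

def parse_header(header):
    """Parse a header (without its CRLF) into (status, meta).

    `meta` is the text after the single space: a prompt, MIME type, URI or
    error message, or "" when the optional part is absent.
    """
    text = header.decode("utf-8")
    status, _, meta = text.partition(" ")
    # Accept exactly two ASCII digits (str.isdigit would allow e.g. "²").
    if len(status) != 2 or any(c not in "0123456789" for c in status):
        raise ValueError("malformed status code: %r" % status)
    return int(status), meta
```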

The status values range from 10 to 69 inclusive, although not all values
are currently defined. They are grouped so that a client MAY use the initial
digit to handle the response, but the second digit further clarifies the
status, and it is RECOMMENDED that clients use the second digit when
deciding what to do. Servers MUST NOT send status codes that are not
defined.

# Status codes

There are six groups of status codes:

10-19 Input expected
20-29 Success
30-39 Redirection
40-49 Temporary failure
50-59 Permanent failure
60-69 Client certificates

A client MUST reject any status code less than '10' or greater than '69'
and warn the user of such. A client SHOULD deal with undefined status codes
between '10' and '69' per the default action of the initial digit. So a
status of '14' should be acted upon as if the client received a '10'; a
status of '22' should be acted upon as if the client received a '20'.
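
The fallback rule can be written as a small lookup. The set of defined codes below is taken from the status code sections of this document; the function name is hypothetical.

```python
# Status codes defined in this document.
DEFINED_STATUS = {10, 11, 20, 30, 31, 40, 41, 42, 43, 44,
                  50, 51, 52, 53, 59, 60, 61, 62}

def effective_status(status):
    """Return the status a client should act on, or raise ValueError.

    Codes outside 10-69 are rejected; an undefined code within range falls
    back to the default of its group (first digit followed by 0).
    """
    if status < 10 or status > 69:
        raise ValueError("status %d is outside 10-69" % status)
    if status in DEFINED_STATUS:
        return status
    return (status // 10) * 10
```

So `effective_status(14)` yields 10 and `effective_status(22)` yields 20, matching the examples above.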

## Input expected

The server is expecting user input from the client. The additional
information sent after the status code is the text that a client MUST use to
prompt the user for the information, and the user's input is sent back to
the same URI as the query component. Spaces MUST be encoded as '%20'. There
are currently two status codes defined under this category.

input = '1' DIGIT SP prompt CRLF
prompt = 1*(SP / VCHAR)

If a client receives a 1x response to a URI that already contains a query
string, the client MUST replace the query string with the user input. For
example, if the given URI results in a 10 response:

gemini://example.net/search?hello

The client will send as a request:

gemini://example.net/search?the%20user%20input
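
A sketch of building the follow-up request for a 1x response, replacing any existing query. It uses Python's `urllib.parse.quote`, which encodes a space as `%20` (unlike `quote_plus`, which would produce `+`); `with_user_input` is a hypothetical name.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def with_user_input(uri, user_input):
    """Return `uri` with its query replaced by the percent-encoded input."""
    parts = urlsplit(uri)
    # safe="" also encodes '/', '?', '&' etc.; spaces become %20.
    query = quote(user_input, safe="")
    # Any existing query is discarded, and no fragment is kept.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```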

### Status 10

The basic input status code. A client MUST prompt the user for input, which
should be URI-encoded per [STD66] and sent as a query to the same URI that
generated this response.

### Status 11---sensitive input

As per status code 10, but for use with sensitive input such as passwords.
Clients should present the prompt as per status code 10, but the user's
input should not be echoed to the screen to prevent it being read by
"shoulder surfers".

## Success

The request was handled and the server has content to send to the client.
The additional information is the MIME type of the content, specified per
[RFC2045]. Clients MUST deal with MIME parameters that are not understood by
simply ignoring them.

Response bodies are just raw content, text or binary, as with Gopher
[RFC1436]. There is no support for compression, chunking or any other kind
of content or transfer encoding. The server closes the connection after the
final byte; there is no "end of response" signal.

Internet media types are registered with a canonical form. Content
transferred via Gemini MUST be represented in the appropriate canonical form
prior to its transmission except for "text" types, as defined in the next
paragraph.

When in canonical form, media subtypes of the "text" type use CRLF as the
text line break. Gemini relaxes this requirement and allows the transport
of text media with plain LF alone (but NOT a plain CR alone) representing a
line break when it is done consistently for an entire response body. Gemini
clients MUST accept CRLF and bare LF as being representative of a line break
in text media received via Gemini.
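
A client might split text bodies as in the sketch below. Python's `str.splitlines()` is deliberately avoided because it also treats a lone CR (and several other characters) as a line break; `split_lines` is a hypothetical name.

```python
def split_lines(body_text):
    """Split text media into lines, accepting CRLF or bare LF breaks."""
    lines = body_text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a final line break does not start a new line
    # Strip the CR of a CRLF pair; a lone CR elsewhere is left untouched.
    return [line[:-1] if line.endswith("\r") else line for line in lines]
```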

Clients MUST support MIME types of text/gemini with a character set of
UTF-8, and text/plain with a character set of either US-ASCII [STD80]
(which is a strict subset of UTF-8) or UTF-8. A client MAY support
text/plain with other character sets. A client SHOULD deal with other MIME
types, even if only by saving the content to disk or passing it to another
program.
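
A rough sketch of parsing the MIME type from a 20 response and deciding whether the client must render it itself. The parameter parsing is simplified (it does not handle quoted values containing ';'), and treating a missing charset as UTF-8 is an assumption, since this section does not specify a default. The names are hypothetical.

```python
# Media types and charsets every client MUST support, per the text above.
REQUIRED_TYPES = {"text/gemini": {"utf-8"}, "text/plain": {"us-ascii", "utf-8"}}

def parse_mime(meta):
    """Parse "type/subtype; key=value; ..." into (media_type, params).

    Parameter names are lowercased; parameters the client does not
    understand stay in the dict and can simply be ignored.
    """
    media_type, *raw_params = meta.split(";")
    params = {}
    for item in raw_params:
        key, sep, value = item.strip().partition("=")
        if sep:  # skip malformed parameters rather than failing
            params[key.lower()] = value.strip().strip('"')
    return media_type.strip().lower(), params

def natively_supported(meta):
    """True if the client MUST be able to handle this type itself."""
    media_type, params = parse_mime(meta)
    # Assumption: a missing charset is treated as UTF-8.
    charset = params.get("charset", "utf-8").lower()
    return charset in REQUIRED_TYPES.get(media_type, set())
```

Anything for which `natively_supported` is false would then be saved to disk or passed to another program.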

The specification for text/gemini is given in the text/gemini specification.

The only defined status under this group is 20.

success = '2' DIGIT SP mimetype CRLF body
mimetype = type '/' subtype *(';' parameter)
body = *OCTET

### Status 20

The server has successfully parsed and understood the request, and will
serve up content of the given MIME type.

## Redirection

The server is sending the client a new location where the content is
located. The additional information is an absolute or relative URI. If a
server sends a redirection in response to a request with a query string, the
client MUST NOT apply the query string to the new location; if the query
string is important to the new location, the server MAY include the query as
part of the redirection. A server SHOULD NOT include fragments in
redirections, but if one is given, and a client already has a fragment it
could apply (from the original URI), it is up to the client which fragment
to apply. Clients MUST limit the number of redirections they follow to 5.
There are two defined status codes in this category.
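
A sketch of redirect handling in Python under the limit of 5. Relative references are resolved with `urllib.parse.urljoin`, which only resolves them for schemes listed in `uses_relative` and `uses_netloc`, so the sketch registers "gemini" there. Resolving a path reference against the current URI drops the old query string, as required above. `fetch` is a hypothetical function supplied by the caller.

```python
import urllib.parse

# urljoin() returns relative references unchanged for unknown schemes, so
# register "gemini" (this mutates module-level lists in urllib.parse).
for _table in (urllib.parse.uses_relative, urllib.parse.uses_netloc):
    if "gemini" not in _table:
        _table.append("gemini")

MAX_REDIRECTS = 5

def resolve_redirect(current_uri, target):
    """Resolve a 3x URI-reference against the URI that produced it."""
    if target == "":
        raise ValueError('"" is not a valid redirection target')
    return urllib.parse.urljoin(current_uri, target)

def follow(uri, fetch):
    """Follow at most MAX_REDIRECTS redirections.

    `fetch` is a hypothetical caller-supplied function: URI -> (status, meta).
    Returns (final_uri, status, meta) for the first non-3x reply.
    """
    for _ in range(MAX_REDIRECTS + 1):
        status, meta = fetch(uri)
        if status // 10 != 3:
            return uri, status, meta
        uri = resolve_redirect(uri, meta)
    raise RuntimeError("more than %d redirections" % MAX_REDIRECTS)
```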

redirect = '3' DIGIT SP URI-reference CRLF
; NOTE: [STD66] and [RFC3987] allow "" as a valid
; URI-reference. This is not intended to
; be valid for cases of redirection.

### Status 30---temporary redirection

The basic redirection code. The redirection is temporary and the client
should continue to request the content with the original URI.

### Status 31---permanent redirection

The location of the content has moved permanently to a new location, and
clients SHOULD use the new location to retrieve the given content from then
on.

## Temporary failure

The request has failed. There is no response body. The nature of the
failure is temporary, i.e. an identical request MAY succeed in the future.
The optional message MAY provide additional information on the failure and
if given, a client SHOULD display it to the user. There are five status
codes under this category.

tempfail = '4' DIGIT [SP errormsg] CRLF
errormsg = 1*(SP / VCHAR)

### Status 40

An unspecified condition exists on the server that is preventing the content
from being served, but a client can try again to obtain the content.

### Status 41---server unavailable

The server is unavailable due to overload or maintenance. (cf HTTP 503)

### Status 42---CGI error

A CGI process, or similar system for generating dynamic content, died
unexpectedly or timed out.

### Status 43---proxy error

A proxy request failed because the server was unable to successfully
complete a transaction with the remote host. (cf HTTP 502, 504)

### Status 44---slow down

The server is requesting that the client slow down its requests. The client
SHOULD use an exponential back-off, doubling the delay between subsequent
requests until this status is no longer returned.
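
A sketch of the suggested back-off in Python. The initial delay of one second and the cap of six attempts are arbitrary assumptions, not values from this document; `fetch` and `sleep` are injected so the loop can be tested without real waiting.

```python
import time

def fetch_with_backoff(fetch, uri, initial_delay=1.0, max_attempts=6,
                       sleep=time.sleep):
    """Repeat a request while the server answers 44, doubling the delay.

    Returns the first non-44 (status, meta), or the last 44 reply once
    max_attempts requests have been made.
    """
    delay = initial_delay
    status, meta = fetch(uri)
    for _ in range(max_attempts - 1):
        if status != 44:
            break
        sleep(delay)
        delay *= 2  # exponential back-off: 1s, 2s, 4s, ...
        status, meta = fetch(uri)
    return status, meta
```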

## Permanent failure

The request has failed. There is no response body. The nature of the
failure is permanent, and further requests of the content will return the
same status, so a client SHOULD NOT make the same request. The optional
message MAY provide additional information on the failure and if given, a
client SHOULD display it to the user. There are five status codes under
this category.

permfail = '5' DIGIT [SP errormsg] CRLF
errormsg = 1*(SP / VCHAR)

### Status 50

This is the general permanent failure code.

### Status 51---not found

The requested resource could not be found (you can't find things at Area 51)
and no further information is available. It may exist in the future, it may
not. Who knows?

### Status 52---gone

The resource requested is no longer available and will not be available
again. Search engines and similar tools should remove this resource from
their indices. Content aggregators should stop requesting the resource and
convey to their human users that the subscribed resource is gone. (cf HTTP
410)

### Status 53---proxy request refused

The request was for a resource at a domain not served by the server and the
server does not accept proxy requests.

### Status 59---bad request

The server was unable to parse the client's request, presumably due to a
malformed request, or the request violated the constraints listed in the
Requests section.
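
On the server side, the checks from the Requests section could be applied before any routing, answering 59 when one fails. This is a sketch only: the reply texts are made up, `validate_request` is a hypothetical name, and decoding the request as UTF-8 (rather than strict ASCII) is an assumption.

```python
from urllib.parse import urlsplit

MAX_URI_BYTES = 1024

def validate_request(line):
    """Return None if the request line is acceptable, else a 59 reply.

    `line` is the raw bytes received, including the terminating CRLF.
    """
    if not line.endswith(b"\r\n"):
        return b"59 Request not terminated by CRLF\r\n"
    uri_bytes = line[:-2]
    if len(uri_bytes) > MAX_URI_BYTES:
        return b"59 URI too long\r\n"
    try:
        uri = uri_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return b"59 URI is not valid UTF-8\r\n"
    parts = urlsplit(uri)
    if not parts.scheme or not parts.netloc:
        return b"59 Absolute URI required\r\n"
    if "@" in parts.netloc:
        return b"59 Userinfo not allowed\r\n"
    # Check the raw text: urlsplit reports an empty fragment for "...#".
    if "#" in uri:
        return b"59 Fragment not allowed\r\n"
    return None
```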

## Client certificates

The requested resource requires a client certificate to access. If the
request was made without a certificate, it should be repeated with one. If
the request was made with a certificate, the server did not accept it and
the request should be repeated with a different certificate. The additional
information may contain more details about why the certificate was required,
or rejected; servers SHOULD include such information, and clients SHOULD
display it to the user. There are three status codes defined for this
category.

auth = '6' DIGIT [SP errormsg] CRLF
errormsg = 1*(SP / VCHAR)

### Status 60

The content requires a client certificate. The client MUST provide a
certificate for the content. The certificate is limited to the host and
path, and a server MAY require a different certificate for a different path
on the same host. A server SHOULD allow the same certificate to be used for
any content along the given path. Examples:

gemini://example.com/private/ -- requires certificate A
gemini://example.com/private/r1 -- requires certificate A
gemini://example.com/private/r2/r3 -- requires certificate A
gemini://example.com/other/ -- requires certificate B
gemini://example.com/other/r1 -- requires certificate B
gemini://example.com/other/r2/r3 -- requires certificate B
gemini://example.com/random -- no certificate required

### Status 61---certificate not authorized

The supplied client certificate is not authorised for accessing the
particular requested resource. The problem is not with the certificate
itself, which may be authorised for other resources.

### Status 62---certificate not valid

The supplied client certificate was not accepted because it is not valid.
This indicates a problem with the certificate in and of itself, with no
consideration of the particular requested resource. The most likely cause
is that the certificate's validity start date is in the future or its expiry
date has passed, but this code may also indicate an invalid signature or a
violation of X.509 standard requirements.

# Examples of Gemini requests

The examples below have two parties, the Server and the Client. Actions of
each are in square brackets '[]', and literal text is in quotes (the quotes
are NOT included in the input). A few terminals are also used: 'CRLF'
indicates the characters with codes 13 and 10, 'mimetype' represents a MIME
type per [RFC2045], and 'content...' means the requested content.

In this example, the server requires user input, which the client gathers
before resubmitting the request with the user input:

Client: [opens connection]
Client: "gemini://example.net/search" CRLF
Server: "10 Please input a search term" CRLF
Server: [closes connection]
Client: [prompts user, gets input]
Client: [opens connection]
Client: "gemini://example.net/search?gemini%20search%20engines" CRLF
Server: "20 " mimetype CRLF content...
Server: [closes connection]

The client is requesting some content, which in this example, is an image
file:

Client: [opens connection]
Client: "gemini://example.net/image.jpg" CRLF
Server: "20 image/jpeg" CRLF <binary data of JPEG image>
Server: [closes connection]

For this example the server is redirecting the client to the new location of
a resource:

Client: [opens connection]
Client: "gemini://example.net/current" CRLF
Server: "30 /new" CRLF
Server: [closes connection]
Client: [opens connection]
Client: "gemini://example.net/new" CRLF
Server: "20 " mimetype CRLF content...
Server: [closes connection]

Here we have a server requesting a client certificate, and the client
providing one on the subsequent request:

Client: [opens connection, no client certificate sent]
Client: "gemini://example.net/application/" CRLF
Server: "60 Certificate required to maintain server-side state" CRLF
Server: [closes connection]
Client: [does application specific actions to get certificate]
Client: [opens connection, client certificate sent]
Client: "gemini://example.net/application/" CRLF
Server: "20 " mimetype CRLF content...
Server: [closes connection]

In this example, the server is sending a temporary failure with additional
text describing the error:

Client: [opens connection]
Client: "gemini://example.net/data" CRLF
Server: "41 Undergoing maintenance at this time" CRLF
Server: [closes connection]

And the final example, a permanent failure without any further explanation:

Client: [opens connection]
Client: "gemini://example.net/data" CRLF
Server: "50" CRLF
Server: [closes connection]

# Normative References

[BCP14] Key words for use in RFCs to Indicate Requirement Levels
[RFC2045] Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies
[RFC3987] Internationalized Resource Identifiers (IRIs)
[STD63] UTF-8, a transformation format of ISO 10646
[STD66] Uniform Resource Identifier (URI): Generic Syntax
[STD68] Augmented BNF for Syntax Specifications: ABNF
[STD80] ASCII format for network interchange

# Informative References

[RFC1436] The Internet Gopher Protocol
[RFC5246] The Transport Layer Security (TLS) Protocol Version 1.2
[RFC7230] Hypertext Transfer Protocol
[RFC8446] The Transport Layer Security (TLS) Protocol Version 1.3
[STD7] Transmission Control Protocol