I noticed
https://github.com/whiteout-io/browserbox/pull/44 is
attempting to improve browserbox's error reporting, likely in response
to
https://github.com/whiteout-io/browserbox/issues/21. The PR seems to
be a combination of helping gaia-email-libs-and-more get (GELAM) rid of
its error-handling monkeypatches referenced in issue 21 (yeah! :) using
our existing error strings
(imap-disabled/server-maintenance/needs-oauth-reauth/bad-user-or-pass)
as well as adding several new errors of the form "E" + something
(ETLS/ETIMEDOUT/EPROTOCOL).
In the interests of normalizing errors across the email.js apps and
perhaps the apps that use them, I figured it might be worth discussing
on the list.
# Context #
## What GELAM does (flat, user-visible) ##
The gaia-email-libs-and-more string error codes/names surfaced to its
consumers are documented at
https://github.com/mozilla-b2g/gaia-email-libs-and-more/blob/bca7438c78c83db9eddd508340ab8a7974601a9b/js/mailapi.js#L2825.
That call-site is for account creation, but that's basically the sum
total of the errors.
The error names were chosen to be human-readable but also look a little
mechanical. The idea was that our UI could surface the error codes
across all locales and that they could be found via web search, just
like many apps/platforms have used numeric error codes. In practice
there was a lot of confusion about surfacing English-derived error codes
to all locales and we stopped doing that. Now we just log them.
The errors are intended to map to user-visible localized strings. So we
wanted to minimize the number of strings while also conveying sufficient
implementation for the user to be able to correct the problem or give
someone helping them useful information. For Mozilla localization, this
currently entails the use of distinct string identifiers for things that
could theoretically be parameterized. (gmail IMAP support is disabled
versus POP3 support.)
Note that internally errors are handled by catching and logging the
details to the structured logging layer and then a "string as error" is
used where the node.js-style callbacks would use the string
'bad-user-or-pass' devoid of extra context (in most cases). The helper
logic at
https://github.com/mozilla-b2g/gaia-email-libs-and-more/blob/bca7438c78c83db9eddd508340ab8a7974601a9b/js/errorutils.js#L3
provides mapping for the following decisions related to the error:
- shouldReportProblem: Should the error be reported to the user? For
example, temporary "server-maintenance" does not need to be reported, it
just needs to be retried.
- shouldRetry: Should the operation be automatically retried (using
backoff)? Again, "server-maintenance" is a primary case for this.
- wasErrorFromReachableState: Was this an error that happened when we
were talking to the server already, or was it more like a network error
while trying to first talk to the server. This is used for connection
retrying backoff logic
(
https://github.com/mozilla-b2g/gaia-email-libs-and-more/blob/bca7438c78c83db9eddd508340ab8a7974601a9b/js/errbackoff.js).
Because servers may close connections on purpose to indicate an error,
we potentially incorrectly lump network errors after connection with
these intentional closes. (This should likely be revisited, assuming we
can differentiate between the two cases with TCPSocket's assistance.)
GELAM is now moving to use Promises and especially if we can standardize
on error object representations, it probably makes a lot of sense to use
error objects instead of strings since the ability to have Errors track
the complicated control/data-flow state that can result from using
Promises/etc.
## What Gecko's MozTCPSocket does (hierarchical, technical) ##
Gecko/Necko/NSPR use numerically encoded error codes for for
historical/performance reasons. Functions return a 32-bit nsresult that
encodes success or the specific error. Errors are organized by module;
the upper part of the word encodes that it's an error and the module,
the lower bits encode specific error codes.
In
http://dxr.mozilla.org/mozilla-central/source/dom/network/TCPSocket.js#766
we map the wide variety of errors to string errors. The goal was to be
able to indicate the type of error as "Network" or "Security" in a
separate field from the name on the error so users of TCPSocket could
avoid needing to make that distinction themselves, but due to
implication difficulties, that didn't happen. GELAM uses a regexp of
/^Security/ to detect security errors (the exact names of which are
likely to change over time with changes to Necko), and assumes
everything else is a network error.
## What GELAM's ActiveSync layer does ##
A JS error hierarchy is created that is used with instanceof and where
errorObj.name is consistent with the type name. See
https://github.com/mozilla-b2g/jsas/blob/worker-thread/protocol.js#L31
# Assumptions #
Any email app using the email.js libraries will not just spit out all
errors at the user, but instead will try and map them into localized,
somewhat user-friendly strings. Or possibly consume the errors entirely
and involve them in a more complicated UI state machine; like indicating
that the user is not currently connected to the server and only
bothering the user after sustained failures. The app may report the
specific error via logging channels or as details that the user can
otherwise check out and then search for.
Given this, some level of error hierarchy or error tagging is probably
appropriate unless there's a guarantee that the set of returned error
codes won't change without at least a minor version bump and appropriate
changelog-style documentation.
# Suggestions #
## On "E" + errors ##
I suggest we avoid this. I view this naming convention as a historical
thing from C APIs where six-letter rules were in force for line length
or other policy reasons. For example, "man 2 open" lists "EACCES" as an
error message. Things improved with time and we eventually got stuff
like "ENAMETOOLONG" and "EWOULDBLOCK", but I'd argue the JS CamelCase
errors like EvalError/RangeError/ReferenceError/etc. are a better idiom
to follow
## Generic Information to include ##
I think the following information would be useful to include on the
errors in addition to the name and a human-friendly-ish toString()
implementation:
- lib: What library is emitting this error. Although this can probably
be inferred from the stack, it's handy to know for sure that
"browserbox" is throwing versus "smtpclient" when you see a connection
problem. I don't know if it's useful or even makes sense to indicate if
"imap-handler" or "mimelib" is getting upset for (expected) parsing errors.
- isTransient: Is this a transient error that is likely to go away after
retrying? Network errors are transient, a bad password or an SMTP
server receiving to send a message is not.
- isFatal: Not all errors need to result in us closing the connection.
The idea would be to make it very clear if an error is one that is going
to cause the connection to be closed.
- errorType: One of:
- "network": There was a network problem. If the network gets fixed
/ server comes back, this shouldn't happen again.
- "security": There's a security problem, with all the implications
that entails of potential attack, potential server operator
incompetence, the device's date potentially being wrong, wi-fi hotspot
redirects happening, etc.
- "credentials": Your username/password/oauth credentials/whatever
are sad.
- "configuration": There's some type of configuration problem with
your account. This would cover things like IMAP not being enabled, etc.
- "bad-rfc2822-message": There's something wrong with the email
message. This would happen for an invalid SMTP message being sent by
SMTP, or for the case where GMail's DB gets corrupt and although it will
pretend the message exists, when you go to fetch its body parts it
throws weird errors.
- "our-quota" / "their-quota": The idea would be to indicate
failures due to various account limits being reached (folder/Inbox
filled up) or message size limitations (too big!). The our/their
difference is to clarify things that "our" user is on the hook for
(delete messages, attach less stuff, pay the provider more money),
versus things they can't because it's the recipient's mailbox that
filled up, etc.) (I realize that in most cases for SMTP a bounce will
be generated instead of an immediate error.)
- "suspicious-server": The server thinks you are a spammer or a
hacker now and you need to go through some kind of CAPTCHA or
reauthorization or go to a webpage or something.
- connectionInfo: basically { hostname, hostport, useSecureTransport,
connected, secured } where "connected" is a snapshotted indicator of
whether we were connected when the problem happened and "secured" is a
snapshotted indicator of whether a secure connection had been fully
established.
- serverSays: { raw, userMessage, url, showMessageConfidence,
showURLConfidence, errorCodeCoversMessageConfidence }. RFC 3501 says
ALERT means we should absolutely show the user a message. Google has a
WEBALERT thing. These things may include some combination of
unlocalized text (probably in English), localized text, and a URL.
There was some discussion on the IMAP lists about this that gets into
things more deeply. I've really only looked into the gmail case, but
this probably happens elsewhere too. Right now gmail returns English
strings (because it might not be safe to return utf-8 strings) and
URLs. The proper response to that is likely to just show the user the
URL. If gmail starts providing localized utf-8 strings, it might be
appropriate to include them then too.
The idea of the showMessageConfidence value would be to relay how
likely we think it is that the server's message is useful and
appropriately localized. errorCodeCoversMessageConfidence would be 1.0
if we thought the error message we returned was 100% useful and allows
the UI to supersede whatever the server is saying. For example, GMail
indicating the user needs to enable IMAP4 is a very reliable error code
if explicitly supported. Some server might provide an appropriate
localized error message but provide the same URL to resolve problems in
all cases; in that case, the error code is not better than the provided
string. The simple/naive implementation for this might be to just do a
quick search of useless ALERT messages seen in the wild and set
showMessageConfidence to 0 for all of those and to set it to 0.5 for
everything else. When we are aware of servers providing localized error
messages, we set it to 1.0 for those servers.
## Browserbox Info to maybe Include ##
While fixing some recent server-compatibility issues in browserbox, I
added logic to GELAM's monkeypatches also save off the raw command
string built by imapHandler.compiler(this._currentCommand.request) when
a NO/BAD comes back. This was partially to work-around browserbox not
logging the payload of most commands with its logging, but arguably it
is useful information, although there may be sensitive / giant string
risks. (I think I sanity-checked that a failed APPEND wouldn't generate
a super huge string, but I don't see a comment to that effect, so maybe
it is a risk. Password reveals are probably a real risk without extra
guards.)
Andrew