Error messages

Andrew Sutherland

Dec 10, 2014, 5:10:03 PM

to ema...@googlegroups.com

I noticed https://github.com/whiteout-io/browserbox/pull/44 is
attempting to improve browserbox's error reporting, likely in response
to https://github.com/whiteout-io/browserbox/issues/21. The PR seems to
be a combination of helping gaia-email-libs-and-more (GELAM) get rid of
its error-handling monkeypatches referenced in issue 21 (yeah! :) using
our existing error strings
(imap-disabled/server-maintenance/needs-oauth-reauth/bad-user-or-pass)
as well as adding several new errors of the form "E" + something
(ETLS/ETIMEDOUT/EPROTOCOL).

In the interests of normalizing errors across the email.js apps and
perhaps the apps that use them, I figured it might be worth discussing
on the list.

# Context #

## What GELAM does (flat, user-visible) ##

The gaia-email-libs-and-more string error codes/names surfaced to its
consumers are documented at
https://github.com/mozilla-b2g/gaia-email-libs-and-more/blob/bca7438c78c83db9eddd508340ab8a7974601a9b/js/mailapi.js#L2825.
That call-site is for account creation, but that's basically the sum
total of the errors.

The error names were chosen to be human-readable but also look a little
mechanical. The idea was that our UI could surface the error codes
across all locales and that they could be found via web search, just
like many apps/platforms have used numeric error codes. In practice
there was a lot of confusion about surfacing English-derived error codes
to all locales and we stopped doing that. Now we just log them.

The errors are intended to map to user-visible localized strings. So we
wanted to minimize the number of strings while also conveying sufficient
detail for the user to be able to correct the problem or give
someone helping them useful information. For Mozilla localization, this
currently entails the use of distinct string identifiers for things that
could theoretically be parameterized. (For example, gmail IMAP support
being disabled versus POP3 support being disabled.)

Note that internally, errors are handled by catching them and logging the
details to the structured logging layer. A "string as error" is then
used: the node.js-style callbacks receive a string like
'bad-user-or-pass', devoid of extra context (in most cases). The helper
logic at
https://github.com/mozilla-b2g/gaia-email-libs-and-more/blob/bca7438c78c83db9eddd508340ab8a7974601a9b/js/errorutils.js#L3
provides mapping for the following decisions related to the error:

- shouldReportProblem: Should the error be reported to the user? For
example, temporary "server-maintenance" does not need to be reported, it
just needs to be retried.

- shouldRetry: Should the operation be automatically retried (using
backoff)? Again, "server-maintenance" is a primary case for this.

- wasErrorFromReachableState: Was this an error that happened when we
were already talking to the server, or was it more like a network error
while first trying to talk to the server? This is used for the connection
retry backoff logic
(https://github.com/mozilla-b2g/gaia-email-libs-and-more/blob/bca7438c78c83db9eddd508340ab8a7974601a9b/js/errbackoff.js).
Because servers may close connections on purpose to indicate an error,
we potentially incorrectly lump network errors after connection with
these intentional closes. (This should likely be revisited, assuming we
can differentiate between the two cases with TCPSocket's assistance.)
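
The three decisions above could be driven by a small lookup table. This is only a minimal sketch: the flag values are illustrative guesses rather than a copy of errorutils.js, the 'example-network-error' entry is an invented placeholder, and the default for unknown errors is an assumption.

```javascript
// Hypothetical policy table keyed by GELAM-style error strings.
// The flag values are illustrative and NOT copied from errorutils.js.
const ERROR_POLICY = {
  'bad-user-or-pass':      { report: true,  retry: false, reachable: true },
  'imap-disabled':         { report: true,  retry: false, reachable: true },
  'needs-oauth-reauth':    { report: true,  retry: false, reachable: true },
  'server-maintenance':    { report: false, retry: true,  reachable: true },
  // Invented placeholder for "could not reach the server at all".
  'example-network-error': { report: false, retry: true,  reachable: false },
};

// Assumption: unknown errors are reported, not retried, and treated as
// having come from a reachable server.
const DEFAULT_POLICY = { report: true, retry: false, reachable: true };

function policyFor(err) {
  return Object.prototype.hasOwnProperty.call(ERROR_POLICY, err)
    ? ERROR_POLICY[err]
    : DEFAULT_POLICY;
}

function shouldReportProblem(err) {
  return policyFor(err).report;
}

function shouldRetry(err) {
  return policyFor(err).retry;
}

function wasErrorFromReachableState(err) {
  return policyFor(err).reachable;
}
```

Keeping the three answers in one row per error makes it harder for them to drift apart when a new error string is added.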

GELAM is now moving to use Promises, and especially if we can standardize
on error object representations, it probably makes a lot of sense to use
error objects instead of strings, since Errors can track the
complicated control/data-flow state that can result from using
Promises/etc.

## What Gecko's MozTCPSocket does (hierarchical, technical) ##

Gecko/Necko/NSPR use numerically encoded error codes for
historical/performance reasons. Functions return a 32-bit nsresult that
encodes success or the specific error. Errors are organized by module;
the upper part of the word encodes that it's an error and the module,
the lower bits encode specific error codes.
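
To make the bit layout concrete, here is a rough decoding sketch. The module base offset (0x45), the 13-bit module mask, and the sample value 0x804B000D (which I believe is NS_ERROR_CONNECTION_REFUSED in the network module) are recalled from Gecko's nsError.h and should be treated as assumptions to verify against the actual tree.

```javascript
// Assumption recalled from nsError.h: module numbers are stored with this
// offset added, in the bits just below the severity bit.
const NS_ERROR_MODULE_BASE_OFFSET = 0x45;

function decodeNsresult(rv) {
  const value = rv >>> 0; // force an unsigned 32-bit interpretation
  return {
    // The top bit is the severity bit: set means failure.
    failed: (value >>> 31) === 1,
    // Module number; only meaningful for non-zero result codes.
    module: ((value >>> 16) - NS_ERROR_MODULE_BASE_OFFSET) & 0x1fff,
    // The low 16 bits are the module-specific code.
    code: value & 0xffff,
  };
}

const decoded = decodeNsresult(0x804B000D);
// decoded.failed === true, decoded.module === 6, decoded.code === 13
```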

In
http://dxr.mozilla.org/mozilla-central/source/dom/network/TCPSocket.js#766
we map the wide variety of errors to string errors. The goal was to be
able to indicate the type of error as "Network" or "Security" in a
separate field from the name on the error so users of TCPSocket could
avoid needing to make that distinction themselves, but due to
implementation difficulties, that didn't happen. GELAM uses a regexp of
/^Security/ to detect security errors (the exact names of which are
likely to change over time with changes to Necko), and assumes
everything else is a network error.
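
In code, that classification amounts to something like the following sketch. The function name and the sample error names in the comments are stand-ins, not a list of what TCPSocket actually emits.

```javascript
// Anything whose name starts with "Security" is a security error;
// everything else is assumed to be a network error.
function classifySocketError(errorName) {
  return /^Security/.test(errorName) ? 'security' : 'network';
}

classifySocketError('SecurityCertificate');    // 'security'
classifySocketError('ConnectionRefusedError'); // 'network'
```

The catch-all 'network' branch is what makes this fragile: a new non-network error family would be silently misfiled.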

## What GELAM's ActiveSync layer does ##

A JS error hierarchy is created that is used with instanceof and where
errorObj.name is consistent with the type name. See
https://github.com/mozilla-b2g/jsas/blob/worker-thread/protocol.js#L31
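
As a rough illustration of that pattern (not the jsas code itself, which predates class syntax), a hierarchy where instanceof works and name matches the type could look like this. All class names here are hypothetical.

```javascript
// Hypothetical base class; subclasses inherit a name matching their type.
class MailLibError extends Error {
  constructor(message) {
    super(message);
    // Keep errorObj.name consistent with the type name. Note that this
    // relies on constructor names surviving any minification step.
    this.name = this.constructor.name;
  }
}

class NetworkError extends MailLibError {}
class CredentialsError extends MailLibError {}

const err = new CredentialsError('login rejected');
// err instanceof CredentialsError, err instanceof MailLibError, and
// err instanceof Error are all true; err.name is 'CredentialsError'.
```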

# Assumptions #

Any email app using the email.js libraries will not just spit out all
errors at the user, but instead will try to map them into localized,
somewhat user-friendly strings. Or it may consume the errors entirely
and feed them into a more complicated UI state machine, for example
indicating that the user is not currently connected to the server and
only bothering the user after sustained failures. The app may report the
specific error via logging channels or as details that the user can
otherwise check out and then search for.

Given this, some level of error hierarchy or error tagging is probably
appropriate unless there's a guarantee that the set of returned error
codes won't change without at least a minor version bump and appropriate
changelog-style documentation.

# Suggestions #

## On "E" + errors ##

I suggest we avoid this. I view this naming convention as a historical
thing from C APIs where six-letter rules were in force for line length
or other policy reasons. For example, "man 2 open" lists "EACCES" as an
error message. Things improved with time and we eventually got stuff
like "ENAMETOOLONG" and "EWOULDBLOCK", but I'd argue the JS CamelCase
errors like EvalError/RangeError/ReferenceError/etc. are a better idiom
to follow.

## Generic Information to include ##

I think the following information would be useful to include on the
errors in addition to the name and a human-friendly-ish toString()
implementation:

- lib: What library is emitting this error. Although this can probably
be inferred from the stack, it's handy to know for sure that
"browserbox" is throwing versus "smtpclient" when you see a connection
problem. I don't know if it's useful or even makes sense to indicate if
"imap-handler" or "mimelib" is getting upset for (expected) parsing errors.

- isTransient: Is this a transient error that is likely to go away after
retrying? Network errors are transient; a bad password or an SMTP
server refusing to send a message is not.

- isFatal: Not all errors need to result in us closing the connection.
The idea would be to make it very clear if an error is one that is going
to cause the connection to be closed.

- errorType: One of:

  - "network": There was a network problem. If the network gets fixed
    / server comes back, this shouldn't happen again.

  - "security": There's a security problem, with all the implications
    that entails of potential attack, potential server operator
    incompetence, the device's date potentially being wrong, wi-fi hotspot
    redirects happening, etc.

  - "credentials": Your username/password/oauth credentials/whatever
    are sad.

  - "configuration": There's some type of configuration problem with
    your account. This would cover things like IMAP not being enabled, etc.

  - "bad-rfc2822-message": There's something wrong with the email
    message. This would happen for an invalid message being sent by
    SMTP, or for the case where GMail's DB gets corrupted and although it
    will pretend the message exists, when you go to fetch its body parts it
    throws weird errors.

  - "our-quota" / "their-quota": The idea would be to indicate
    failures due to various account limits being reached (folder/Inbox
    filled up) or message size limitations (too big!). The our/their
    difference is to clarify things that "our" user is on the hook for
    (delete messages, attach less stuff, pay the provider more money),
    versus things they can't fix because it's the recipient's mailbox that
    filled up, etc. (I realize that in most cases for SMTP a bounce will
    be generated instead of an immediate error.)

  - "suspicious-server": The server thinks you are a spammer or a
    hacker now and you need to go through some kind of CAPTCHA or
    reauthorization or go to a webpage or something.

- connectionInfo: basically { hostname, hostport, useSecureTransport,
connected, secured } where "connected" is a snapshotted indicator of
whether we were connected when the problem happened and "secured" is a
snapshotted indicator of whether a secure connection had been fully
established.

- serverSays: { raw, userMessage, url, showMessageConfidence,
showURLConfidence, errorCodeCoversMessageConfidence }. RFC 3501 says
ALERT means we should absolutely show the user a message. Google has a
WEBALERT thing. These things may include some combination of
unlocalized text (probably in English), localized text, and a URL.
There was some discussion on the IMAP lists about this that gets into
things more deeply. I've really only looked into the gmail case, but
this probably happens elsewhere too. Right now gmail returns English
strings (because it might not be safe to return utf-8 strings) and
URLs. The proper response to that is likely to just show the user the
URL. If gmail starts providing localized utf-8 strings, it might be
appropriate to include them then too.

The idea of the showMessageConfidence value would be to relay how
likely we think it is that the server's message is useful and
appropriately localized. errorCodeCoversMessageConfidence would be 1.0
if we thought the error message we returned was 100% useful and allows
the UI to supersede whatever the server is saying. For example, GMail
indicating the user needs to enable IMAP4 is a very reliable error code
if explicitly supported. Some server might provide an appropriate
localized error message but provide the same URL to resolve problems in
all cases; in that case, the error code is not better than the provided
string. The simple/naive implementation for this might be to just do a
quick search of useless ALERT messages seen in the wild and set
showMessageConfidence to 0 for all of those and to set it to 0.5 for
everything else. When we are aware of servers providing localized error
messages, we set it to 1.0 for those servers.
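
Putting the proposed fields together, an error object might look roughly like the sketch below. The helper names, the "useless ALERT" pattern, the host set, and every example.com value are hypothetical; only the field names come from the proposal above.

```javascript
// Hypothetical patterns for ALERT texts judged useless; placeholders only.
const USELESS_ALERT_PATTERNS = [/^please try again later/i];
// Hypothetical set of hosts known to send properly localized messages.
const LOCALIZING_HOSTS = new Set();

// The naive heuristic: 0 for known-useless text, 1.0 for servers known to
// localize, 0.5 for everything else.
function showMessageConfidence(hostname, alertText) {
  if (USELESS_ALERT_PATTERNS.some((re) => re.test(alertText))) return 0;
  if (LOCALIZING_HOSTS.has(hostname)) return 1.0;
  return 0.5;
}

function buildError(name, fields) {
  const err = new Error(name);
  err.name = name;
  return Object.assign(err, fields);
}

const example = buildError('ImapDisabledError', {
  lib: 'browserbox',
  errorType: 'configuration',
  isTransient: false,
  isFatal: true,
  connectionInfo: {
    hostname: 'imap.example.com',
    hostport: 993,
    useSecureTransport: true,
    connected: true,
    secured: true,
  },
  serverSays: {
    raw: '[ALERT] Please enable IMAP access at https://mail.example.com/settings',
    userMessage: 'Please enable IMAP access',
    url: 'https://mail.example.com/settings',
    showMessageConfidence: showMessageConfidence(
      'imap.example.com', 'Please enable IMAP access'),
    showURLConfidence: 0.5,
    errorCodeCoversMessageConfidence: 1.0,
  },
});
```

Because the result is still a real Error, consumers that only look at name and stack keep working, while richer UIs can inspect errorType and serverSays.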

## Browserbox Info to maybe Include ##

While fixing some recent server-compatibility issues in browserbox, I
added logic to GELAM's monkeypatches to also save off the raw command
string built by imapHandler.compiler(this._currentCommand.request) when
a NO/BAD comes back. This was partially to work around browserbox not
logging the payload of most commands with its logging, but arguably it
is useful information, although there may be sensitive / giant string
risks. (I think I sanity-checked that a failed APPEND wouldn't generate
a super huge string, but I don't see a comment to that effect, so maybe
it is a risk. Password reveals are probably a real risk without extra
guards.)

Andrew

Andris Reinman

Dec 11, 2014, 5:00:45 AM

to Andrew Sutherland, ema...@googlegroups.com

Hi,

You’re right, the pull requests are related to the monkeypatching :)

> ## On "E" + errors ##
> I suggest we avoid this. I view this naming convention as a historical thing from C APIs where six-letter rules were in force for line length or other policy reasons.

The E errors are related to similar code in Node.js, where all system/network/timeout error objects include a code property with the E error string. I agree though that we might skip this, as the value is a string, not a constant, and when run in a browser environment the native errors do not include the E error property that Node does.

> ## Generic Information to include ##

> - lib: What library is emitting this error.

Good idea and easy to implement

> - isTransient: Is this a transient error that is likely to go away after retrying?

This would be useful. I’m not sure though how easy or difficult it would be to determine if the error is transient or not. If the host is unreachable then the error might go away later (or it might not if the host doesn’t actually exist). If Hotmail rejects login with AUTHORIZATIONFAILED then the error is kind of transient but only if I open the webmail in a browser and say that the login attempt was legit – on the IMAP side you wouldn’t have to change anything, just try the same thing until the restriction is lifted. On the other hand if you get AUTHENTICATIONFAILED then the password is most probably invalid and retrying wouldn’t work.

> - isFatal: Not all errors need to result in us closing the connection. The idea would be to make it very clear if an error is one that is going to cause the connection to be closed.

Agreed. Should be easier than detecting transient errors.

> - errorType: One of:
> - "network": There was a network problem.

> - "security": There's a security problem

Would this be only for failing/missing STARTTLS and invalid TLS (i.e. self-signed or expired certs etc.) or should it cover something else as well?

> - "credentials": Your username/password/oauth credentials/whatever are sad.

> - "configuration": There's some type of configuration problem with your account.

Besides disabled IMAP this might also include missing folders – for example if you expect All Mail or Sent Mail etc. to exist for a Gmail account and the required mailbox isn’t there then you have probably hidden it in the Gmail Labels configuration page.

> - "bad-rfc2822-message": There's something wrong with the email message.

ok but I’d rather go with “bad-message”

> - "our-quota" / "their-quota": The idea would be to indicate failures due to various account limits being reached

I think that for SMTP the only quota indication you get is the max message size, and that’s not even the max message size of the actual recipients, as you’re not talking to the recipient’s MX but your own MTA/MSA. Regarding IMAP, to be honest I have no idea what happens once you hit your quota. Is there an APPEND error response code for that or is the error only a plain text message? How do I distinguish whether a) the message was too big, b) the message limit for this mailbox is exceeded, or c) all space for this user is used up?

> - "suspicious-server": The server thinks you are a spammer or a hacker

I guess this can only be done with a regex or some kind of list. Gmail provides you a URL but Hotmail on the other hand does not (it does give you the AUTHORIZATIONFAILED response code though) and I’m not sure about other servers.

> - connectionInfo: basically { hostname, hostport, useSecureTransport, connected, secured }

Seems reasonable

> ## Browserbox Info to maybe Include ##

> This was partially to work-around browserbox not logging the payload of most commands with its logging, but arguably it is useful information, although there may be sensitive / giant string risks.

My main email client is the Mail App in OSX and if logging is turned on it logs everything besides login credentials. For example an authentication in the logs usually looks something like this: "2.5417 AUTHENTICATE PLAIN (*** 40 bytes hidden ***)". Our goal at Whiteout as a privacy-first application was to not log anything that might be considered private. The unfortunate outcome where you see 40 lines of "* LIST" and nothing more is not really helpful, at least if you want to debug IMAP protocol related issues. For example if the client sends a quoted date string and the server only allows unquoted atoms then you'd never realize the issue looking at such a log, as both the quoted date argument (that might raise suspicion) and the server response message are missing. So we probably should discuss if and to what extent we can increase logged information in addition to the bare minimum that is logged today. Passwords and authentication tokens must be hidden, no question about that though. The risk that the error is caused by a strangely formatted password that breaks the parser in the server is so much lower than the risk of the log leaking.
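
A minimal sketch of that style of redaction, assuming commands are logged one line at a time. The function name is made up, and it only handles credentials sent on the command line itself; credentials sent on a continuation line after a "+" challenge would need separate handling.

```javascript
// Replace the credential payload of LOGIN / AUTHENTICATE commands with a
// size placeholder, leaving all other commands untouched.
function redactCommand(line) {
  const m = /^(\S+ (?:AUTHENTICATE \S+|LOGIN)) (.+)$/i.exec(line);
  if (!m) return line;
  // String length is used as the byte count, which only holds for ASCII
  // payloads such as base64.
  return `${m[1]} (*** ${m[2].length} bytes hidden ***)`;
}

redactCommand('A1 AUTHENTICATE PLAIN AGZvbwBiYXI=');
// 'A1 AUTHENTICATE PLAIN (*** 12 bytes hidden ***)'
redactCommand('A2 LIST "" "*"');
// unchanged: 'A2 LIST "" "*"'
```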

Best regards,
Andris
--
Whiteout Networks GmbH c/o Werk1
Grafinger Str. 6
D-81671 München
Geschäftsführer: Oliver Gajek
RG München HRB 204479

Andrew Sutherland

Dec 11, 2014, 10:30:57 AM

to Andris Reinman, ema...@googlegroups.com

On 12/11/2014 05:00 AM, Andris Reinman wrote:
>> - isTransient: Is this a transient error that is likely to go away after retrying?
> This would be useful. I’m not sure though how easy or difficult it would be to determine if the error is transient or not. If the host is unreachable then the error might go away later (or it might not if the host doesn’t actually exist). If Hotmail rejects login with AUTHORIZATIONFAILED then the error is kind of transient but only if I open the webmail in a browser and say that the login attempt was legit – on the IMAP side you wouldn’t have to change anything, just try the same thing until the restriction is lifted. On the other hand if you get AUTHENTICATIONFAILED then the password is most probably invalid and retrying wouldn’t work.

Yeah, I would expect there to be some amount of heuristics or usage of
an internal quirks "database" for these things. Putting the logic in
browserbox rather than having each client reproduce their own ad hoc
mapping for this seems optimal.

isTransient might be the wrong name for this; in
https://github.com/mozilla-b2g/gaia-email-libs-and-more/blob/bca7438c78c83db9eddd508340ab8a7974601a9b/js/errorutils.js
we expose it as shouldRetry and we also have shouldReportProblem.
I was trying to make it a bit more generic but the main idea was to help
make it more clear when the response to an error should be automatically
retrying (with backoff) or if user action is required.

In the case of hotmail and AUTHORIZATIONFAILED, that sounds like a case
where user action is required since the only ways to resolve the problem
are to login to the web UI or to move to a different IP. This would be
covered by the "suspicious-server" errorType I proposed.

It's possible that errorType could be sufficient for isTransient
purposes. Right now in GELAM, we only have retry true for what would be
'network' errorTypes and the 'server-maintenance' error we've seen from
yahoo.com's IMAP server. We could address the yahoo case (which also
includes SELECT/EXAMINE-ing a folder temporarily failing for no clear
reason) by having a specific errorType like
'server-contention-retry-soon'. So then we'd lose
isTransient/shouldRetry unless we started to see errors where we do want
the errorType to be something else and we're also wanting to retry.
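
If errorType does turn out to be sufficient, the retry decision collapses to a set membership test. A small sketch under that assumption follows; the backoff base and cap are arbitrary numbers chosen for illustration.

```javascript
// Only these errorTypes are retried automatically in this sketch.
const RETRYABLE_ERROR_TYPES = new Set([
  'network',
  'server-contention-retry-soon',
]);

function shouldRetry(err) {
  return RETRYABLE_ERROR_TYPES.has(err.errorType);
}

// Exponential backoff with a cap; attempt is 0 for the first retry.
function retryDelayMs(attempt, baseMs = 1000, maxMs = 60000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

shouldRetry({ errorType: 'network' });           // true
shouldRetry({ errorType: 'suspicious-server' }); // false
retryDelayMs(3);                                 // 8000
```

The day an error needs a non-retryable errorType and a retry anyway, the set stops being enough and a separate isTransient/shouldRetry flag comes back.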

>> - "security": There's a security problem
> Would this be only for failing/missing STARTTLS and invalid TLS (i.e. self-signed or expired certs etc.) or should it cover something else as well?

I can't think of anything else.

But relatedly, on Firefox OS devices we have seen problems with the
date/time being wrong (1980!) and we are planning to detect/report that
specific error as saying "hey, we can't establish a secure connection,
but your date looks really suspicious, so maybe check that out."

In terms of how we use this in Gaia email, if we get a 'security' error
during account creation, we report it. But if we see it after account
creation and during normal usage, we treat it like a network problem
where we can't talk to the server.

>> - "configuration": There's some type of configuration problem with your account.
> Besides disabled IMAP this might also include missing folders – for example if you expect All Mail or Sent Mail etc. to exist for a Gmail account and the required mailbox isn’t there then you have probably hidden it in the Gmail Labels configuration page.

Ah, yeah, that'd be an interesting heuristic to bake in to cover the
case where a critical folder is being hidden by gmail and we know that
the only option is that the user explicitly hid the folder since gmail
will always have trash/all/sent. A user recently reported a problem
like this for gaia email; they changed the setting to make the problem
go away, but it was on my to-do list to see if their hidden Trash folder
was something that would make us unable to CREATE a trash folder
ourselves or whether it would expose it. (I assume it would fail with a
NO since gmail has that rule about forbidding creating folders that are
any of the localized names of its special folders and they probably
didn't bother to otherwise address the edge case.) It sounds like you
may already know the answer to that question!

In this specific case of gmail hiding folders, it might make sense for
it to be an optional warning returned by listMailboxes as an additional
argument rather than an error that gets emitted. Although I'm not aware
of any other servers that do/allow weird stuff in this context, so maybe
this is just something the app needs to deal with when doing initial
account creation.

>> - "bad-rfc2822-message": There's something wrong with the email message.
> ok but I’d rather go with “bad-message”

Yeah, I prefer that too.

>> - "our-quota" / "their-quota": The idea would be to indicate failures due to various account limits being reached
> I think that for SMTP the only quota indication you get is the max message size, and that’s not even the max message size of the actual recipients, as you’re not talking to the recipient’s MX but your own MTA/MSA.

I think it's possible in the case where your mail server is also the
recipient's server. Like a dovecot/postfix setup without a compulsory
smarthost in between. It's definitely an edge-case.

> Regarding IMAP, to be honest I have no idea what happens once you hit your quota. Is there an APPEND error response code for that or is the error only a plain text message? How do I distinguish whether a) the message was too big, b) the message limit for this mailbox is exceeded, or c) all space for this user is used up?

https://tools.ietf.org/html/rfc5530 documents OVERQUOTA as a response
code with an example of failure for COPY. It also says it's a legal
response even without explicitly supporting QUOTA from
https://tools.ietf.org/html/rfc2087.

Then there's TOOBIG for APPEND as added by the CATENATE extension in
https://tools.ietf.org/html/rfc4469. (And there's the related
discussion on the imapext list about the advisory APPENDLIMIT hint being
standardized, but that wouldn't be an error.)
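
For servers that do send these response codes, picking them out of the text following NO/BAD is straightforward. A sketch, with hypothetical function names; mapping both codes to "our-quota" is my assumption about how they would fit the proposed errorType values.

```javascript
// Extract a bracketed response code from the text that follows NO/BAD,
// e.g. '[OVERQUOTA] Quota exceeded' gives 'OVERQUOTA'.
function parseResponseCode(text) {
  const m = /^\[([A-Za-z0-9-]+)(?: [^\]]*)?\]/.exec(text);
  return m ? m[1].toUpperCase() : null;
}

// Assumed mapping onto the proposed errorType values.
const RESPONSE_CODE_ERROR_TYPES = {
  OVERQUOTA: 'our-quota',
  TOOBIG: 'our-quota',
};

function errorTypeForResponse(text) {
  const code = parseResponseCode(text);
  return (code && RESPONSE_CODE_ERROR_TYPES[code]) || null;
}

errorTypeForResponse('[OVERQUOTA] Quota exceeded');   // 'our-quota'
errorTypeForResponse('Something vague went wrong');   // null
```

A null result is where the plain-text heuristics would have to take over.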

>> - "suspicious-server": The server thinks you are a spammer or a hacker
> I guess this can only be done with a regex or some kind of list. Gmail provides you a URL but Hotmail on the other hand does not (it does give you the AUTHORIZATIONFAILED response code though) and I’m not sure about other servers.

Yeah, as alluded to above, I think this ends up being hardcoded
heuristics keyed off of the server greeting and/or the "ID" response
and/or a magic capability that advertises the server and/or a list of
well-known domains that indicate the given server implementation.

Hotmail does a weird thing for ActiveSync where it will show your inbox
as a folder with only one message in it that mentions everything is
suspicious. I guess it's good it doesn't do that; I hope you mean there
is more than the AUTHORIZATIONFAILED response and we can regex that?

>> This was partially to work-around browserbox not logging the payload of most commands with its logging, but arguably it is useful information, although there may be sensitive / giant string risks.
>
> My main email client is the Mail App in OSX and if logging is turned on it logs everything besides login credentials. For example an authentication in the logs usually looks something like this: "2.5417 AUTHENTICATE PLAIN (*** 40 bytes hidden ***)". Our goal at Whiteout as a privacy-first application was to not log anything that might be considered private. The unfortunate outcome where you see 40 lines of "* LIST" and nothing more is not really helpful, at least if you want to debug IMAP protocol related issues. For example if the client sends a quoted date string and the server only allows unquoted atoms then you'd never realize the issue looking at such a log, as both the quoted date argument (that might raise suspicion) and the server response message are missing. So we probably should discuss if and to what extent we can increase logged information in addition to the bare minimum that is logged today. Passwords and authentication tokens must be hidden, no question about that though. The risk that the error is caused by a strangely formatted password that breaks the parser in the server is so much lower than the risk of the log leaking.

Yeah, this is what Thunderbird does. The thing gaia email did
pre-email.js was to treat the arguments of the command and the responses
received from the server as user-sensitive data that requires the
logging to be explicitly cranked up to a higher level.

We're overhauling our logging somewhat; our current stepping stone is
https://github.com/mozilla-b2g/gaia-email-libs-and-more/blob/master/js/slog.js
where we might do something like:

slog.log('imap:command', { tag: tag, cmd: cmd, _args: theSensitiveArgs });

Then our logging layer will only capture/log _args if the app was
explicitly set to log private data. Our test framework does this, but
otherwise explicit user action is required to enable it (via
https://wiki.mozilla.org/Gaia/Email/SecretDebugMode).
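
The underscore convention could be implemented along these lines. This is not slog.js itself, just a sketch: createLogger, its options, and the sink callback are names invented for the example.

```javascript
// Build a logger that drops underscore-prefixed keys unless private
// logging was explicitly enabled.
function createLogger({ logPrivateData = false, sink = console.log } = {}) {
  return {
    log(type, details) {
      const entry = { type };
      for (const [key, value] of Object.entries(details)) {
        if (key.startsWith('_') && !logPrivateData) continue;
        entry[key] = value;
      }
      sink(entry);
    },
  };
}

const captured = [];
const logger = createLogger({ sink: (entry) => captured.push(entry) });
logger.log('imap:command', {
  tag: 'A1',
  cmd: 'LOGIN',
  _args: ['user', 'secret'],
});
// captured[0] holds type, tag, and cmd, but no _args.
```

Filtering at the logging layer means call sites can always pass the sensitive data and never need to check the debug setting themselves.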

Andrew

Andris Reinman

Dec 12, 2014, 4:13:02 AM

to Andrew Sutherland, ema...@googlegroups.com

> Ah, yeah, that'd be an interesting heuristic to bake in to cover the case where a critical folder is being hidden by gmail and we know that the only option is that the user explicitly hid the folder since gmail will always have trash/all/sent.

> … but it was on my to-do list to see if their hidden Trash folder was something that would make us unable to CREATE a trash folder ourselves or whether it would expose it.

It is even worse – you CAN create the unlisted folder, or at least it says “OK Success”. But if you now want to select this newly created folder, you get “NO [NONEXISTENT] Unknown Mailbox: [Gmail]/Send Mail (Failure)”. The upside is that all gmail specific folders are prefixed with [Gmail] or [Google Mail], so if you are just going to try to create “Trash” then it shouldn’t conflict. I was also able to create a folder “[Gmail]/Sent Mail” on a non-English account (the actual sent folder would be “[Gmail]/Saadetud kirjad”) and this folder was selectable and it didn’t include the \Sent flag, so it seemed like a regular folder even though in English accounts it is a special folder.

I had this issue about a year or two ago in Pipedrive where a user had All Mail disabled while we were relying on that folder. The application had a fallback to use INBOX + Sent Folder if All Mail didn’t exist, so it took a bit of time until we realized what was going on since most of the stuff seemed to work for the user.

> Hotmail does a weird thing for ActiveSync where it will show your inbox as a folder with only one message in it that mentions everything is suspicious. I guess it's good it doesn't do that; I hope you mean there is more than the AUTHORIZATIONFAILED response and we can regex that?

Hotmail does a lot of strange things. When trying to log in to IMAP from a suspicious location you get “NO [AUTHORIZATIONFAILED] Account is blocked. Login to your account via a web browser to verify your identity.” which is easy to understand. But if you try to log in to SMTP, you get “535 5.0.0 Authentication Failed” which is exactly the same response you’d get for invalid credentials, so there is no way to distinguish that. Felix even said he’d seen cases where Hotmail lets you log in to SMTP but refuses to actually send anything until you validate the login.
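
A rough sketch of how the two cases might be classified. The function names and the "ambiguous" flag are invented for illustration, and falling back to "credentials" when no response code is present is an assumption.

```javascript
// IMAP: the response code tells a blocked login apart from a bad password.
function classifyImapAuthFailure(responseText) {
  if (/\[AUTHORIZATIONFAILED\]/i.test(responseText)) return 'suspicious-server';
  if (/\[AUTHENTICATIONFAILED\]/i.test(responseText)) return 'credentials';
  return 'credentials'; // assumed fallback when no response code is given
}

// SMTP: a 535 looks identical in both cases, so the best we can do is
// report "credentials" and flag that the classification is a guess.
function classifySmtpAuthFailure(replyCode) {
  return { errorType: 'credentials', ambiguous: replyCode === 535 };
}

classifyImapAuthFailure('NO [AUTHORIZATIONFAILED] Account is blocked.');
// 'suspicious-server'
classifySmtpAuthFailure(535);
// { errorType: 'credentials', ambiguous: true }
```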

Best regards,
Andris
