Internet Data Handling » mailbox


Adam Jensen

Oct 21, 2016, 10:44:08 PM
The mailbox library documentation seems to be a little weak. In this
example:

https://docs.python.org/2.7/library/mailbox.html#examples

import mailbox
for message in mailbox.mbox('~/mbox'):
    subject = message['subject']       # Could possibly be None.
    if subject and 'python' in subject.lower():
        print subject

What is the structure of "message"? I guess it's a dictionary but what
is its structure? What are the keys? Is the structure constant or does
it change depending upon the content of the mbox?

I'm a bit new to python documentation. How would a potential user of the
"mailbox" library determine these things? Or did I miss something?

MRAB

Oct 21, 2016, 11:23:10 PM
The docs say that it's a subclass of the email.message module's Message.

You can get the email's header fields like it's a dict, except that the
field names are case-insensitive. The author(s) of the module couldn't
use a true dict because of the need for additional functionality.
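
A minimal sketch of that dict-like but case-insensitive access. The message text and the example.org addresses are made up for illustration, and the code uses Python 3 syntax rather than the 2.7 syntax of the original example:

```python
import email

# A small hand-written message, used only for illustration.
raw = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Subject: Python mailbox question\n"
    "\n"
    "Body text.\n"
)
message = email.message_from_string(raw)

# Field names are matched case-insensitively.
print(message["subject"])
print(message["SUBJECT"])

# A missing field gives None instead of raising KeyError.
print(message["X-No-Such-Field"])

# The object supports dict-style lookup and membership tests,
# but it is not an instance of dict.
print("subject" in message)
print(isinstance(message, dict))
```

This is why the documentation example guards with `if subject and ...`: a message without a Subject field yields None rather than an exception.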

Ben Finney

Oct 21, 2016, 11:45:57 PM
Adam Jensen <han...@riseup.net> writes:

> import mailbox
> for message in mailbox.mbox('~/mbox'):
>     subject = message['subject']       # Could possibly be None.
>     if subject and 'python' in subject.lower():
>         print subject
>
> What is the structure of "message"?

You're binding that name to each item from the collection returned from
‘mailbox.mbox’. So, let's look at the documentation for that class:

class mailbox.mbox(path, factory=None, create=True)

A subclass of ``Mailbox`` for mailboxes in mbox format. Parameter
`factory` is a callable object that accepts a file-like message
representation (which behaves as if opened in binary mode) and
returns a custom representation. If `factory` is ``None``,
``mboxMessage`` is used as the default message representation. […]

<URL:https://docs.python.org/2.7/library/mailbox.html#mailbox.mbox>

So the above usage doesn't specify a `factory` to create instances,
which means the instances will be instances of ``mboxMessage`` type.
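
A rough sketch of the effect of `factory`, assuming Python 3. It builds a throwaway one-message mbox in a temporary directory; `plain_factory` and the file name `demo.mbox` are hypothetical names chosen for the illustration:

```python
import email
import mailbox
import os
import tempfile

raw = "From: alice@example.org\nSubject: factory demo\n\nBody text.\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.mbox")

    # Build a one-message mbox file to iterate over.
    box = mailbox.mbox(path)
    box.add(raw)
    box.close()

    # No factory given: items come back as mailbox.mboxMessage.
    default_box = mailbox.mbox(path)
    default_types = [type(m) for m in default_box]
    default_box.close()

    # Hypothetical factory: it receives a binary file-like object
    # and returns whatever representation we choose.
    def plain_factory(fileobj):
        return email.message_from_bytes(fileobj.read())

    custom_box = mailbox.mbox(path, factory=plain_factory)
    custom_types = [type(m) for m in custom_box]
    custom_box.close()

print(default_types)
print(custom_types)
```

With the custom factory the items are plain `email.message.Message` objects, so the mbox-specific additions of `mboxMessage` are not available on them.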

> I guess it's a dictionary but what is its structure? What are the
> keys? Is the structure constant or does it change depending upon the
> content of the mbox?

From the same documentation you can follow the link to the documentation
for ``mailbox.mboxMessage``. There you'll find it inherits from
``mailbox.Message``, which itself inherits ``email.message.Message``.

So each instance you're getting has (a superset of) the API of
``email.message.Message``, which is a lot of behaviour
<URL:https://docs.python.org/2.7/library/email.message.html#email.message.Message>
including being able to interrogate the message header for its fields,
by name, using the get-an-item ‘foo[bar]’ syntax.
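
The inheritance chain can also be checked interactively. A small sketch, assuming Python 3; the message text is invented for the example:

```python
import email.message
import mailbox

# Walk the inheritance chain of the default message type.
for cls in mailbox.mboxMessage.__mro__:
    print(cls.__module__ + "." + cls.__name__)

# Build a message directly from text, without needing an mbox file.
message = mailbox.mboxMessage(
    "Subject: hello\nFrom: alice@example.org\n\nBody.\n"
)

# Behaviour inherited from email.message.Message:
print(message["Subject"])
print(message.keys())
print(message.get_payload())

# Behaviour added by the mbox-specific subclass:
print(message.get_from())
```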

(The message has exactly one header, the header has multiple fields. The
documentation and API erroneously refer to those fields as “headers”,
but that is a common confusion to a lot of software and the Python
standard library makes it too.)

> I'm a bit new to python documentation. How would a potential user of the
> "mailbox" library determine these things? Or did I miss something?

The library reference documentation must assume you at least know the
Python language, including that collections contain items, those items
are each objects, every object has a type, the types inherit in a tree,
etc.

So you'd need to read the documentation in the knowledge that a return
value's behaviour is determined by its type, and that the behaviour of a
type is determined by other behaviour it inherits, etc.

--
\ “Some forms of reality are so horrible we refuse to face them, |
`\ unless we are trapped into it by comedy. To label any subject |
_o__) unsuitable for comedy is to admit defeat.” —Peter Sellers |
Ben Finney

dieter

Oct 22, 2016, 3:25:18 AM
Adam Jensen <han...@riseup.net> writes:
> ...
> https://docs.python.org/2.7/library/mailbox.html#examples
>
> import mailbox
> for message in mailbox.mbox('~/mbox'):
>     subject = message['subject']       # Could possibly be None.
>     if subject and 'python' in subject.lower():
>         print subject
>
> What is the structure of "message"? I guess it's a dictionary but what
> is its structure? What are the keys?

In addition to the previous (excellent) responses:

A "message" models a MIME (RFC1521 Multipurpose Internet Mail Extensions)
message (the international standard for the structure of emails).
The standard tells you that a message consists essentially of two
parts: a set of headers and a body and describes standard headers
and their intended meaning (e.g. "To", "From", "Subject", ...).
It allows a message to contain non-standard headers as well.

With this knowledge, your "keys" related question can be answered:
there is a (case insensitive) key for each header actually present
in your message. If the message contains several headers with
the same name, the subscript access gives you the first one;
there is an alternative method to access all of them.
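
To illustrate, here is a small sketch (Python 3 syntax) with a made-up message that repeats the Received field and carries one non-standard field. `get_all()` is the `email.message.Message` method that returns every value for a name:

```python
import email

raw = (
    "Received: from relay2.example.org\n"
    "Received: from relay1.example.org\n"
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "X-Custom-Tag: demo\n"
    "\n"
    "Body text.\n"
)
message = email.message_from_string(raw)

# The keys are simply the field names present in this message, in order.
print(message.keys())

# Subscript access returns only the first matching field.
print(message["received"])

# get_all() returns every matching field as a list.
print(message.get_all("received"))

# For an absent field, get_all() returns None unless a default is given.
print(message.get_all("cc"))
print(message.get_all("cc", []))
```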


andy

Oct 22, 2016, 5:55:11 AM
I would type: help(mailbox) after importing it.

best regards
andy

Adam Jensen

Oct 22, 2016, 7:36:32 PM
On 10/21/2016 11:45 PM, Ben Finney wrote:
> So each instance you're getting has (a superset of) the API of
> ``email.message.Message``, which is a lot of behaviour
> <URL:https://docs.python.org/2.7/library/email.message.html#email.message.Message>
> including being able to interrogate the message header for its fields,
> by name, using the get-an-item ‘foo[bar]’ syntax.

Thanks, the list of functions on that page made it much clearer.

Adam Jensen

Oct 22, 2016, 7:41:55 PM
On 10/22/2016 05:47 AM, andy wrote:
> I would type: help(mailbox) after importing it.

I guess the output of that might be more meaningful once I understand
the underlying structures and conventions.

Adam Jensen

Oct 22, 2016, 7:49:39 PM
On 10/22/2016 03:24 AM, dieter wrote:
> In addition to the previous (excellent) responses:
>
> A "message" models a MIME (RFC1521 Multipurpose Internet Mail Extensions)
> message (the international standard for the structure of emails).
> The standard tells you that a message consists essentially of two
> parts: a set of headers and a body and describes standard headers
> and their intended meaning (e.g. "To", "From", "Subject", ...).
> It allows a message to contain non-standard headers as well.
>
> With this knowledge, your "keys" related question can be answered:
> there is a (case insensitive) key for each header actually present
> in your message. If the message contains several headers with
> the same name, the subscript access gives you the first one;
> there is an alternative method to access all of them.

Thanks. I needed to search for emails to/from a specific person and
extract them from a [Google mail archive][1].

[1]: https://takeout.google.com/settings/takeout

This is my quick and dirty little one-shot script to get the job done.

search_mbox.py
--------------------------------------------------------------
#!/usr/bin/env python2.7
import mailbox
import sys
name = sys.argv[2].lower()
for message in mailbox.mbox(sys.argv[1]):
    if message.has_key("From") and message.has_key("To"):
        addrs = message.get_all("From")
        addrs.extend(message.get_all("To"))
        for addr in addrs:
            addrl = addr.lower()
            if addrl.find(name) > 0:
                print message
                break
--------------------------------------------------------------

Usage: ./search_mbox.py archive.mbox hanzer > hanzer.mbox
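
For comparison, a possible Python 3 sketch of the same search. It is not the script above: `find_messages` and `copy_matches` are hypothetical helper names, the match uses substring membership instead of `find()`, and the matches are written with `mailbox.mbox.add()` rather than printed to stdout:

```python
import mailbox


def find_messages(mbox_path, name):
    """Yield messages whose From or To fields contain name (case-insensitive)."""
    needle = name.lower()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            # Like the original script, skip messages lacking either field.
            if "From" not in message or "To" not in message:
                continue
            addrs = message.get_all("From") + message.get_all("To")
            # str() guards against header values that are not plain strings.
            if any(needle in str(addr).lower() for addr in addrs):
                yield message
    finally:
        box.close()


def copy_matches(src_path, dst_path, name):
    """Copy matching messages into the mbox at dst_path; return the count."""
    out = mailbox.mbox(dst_path)
    count = 0
    try:
        for message in find_messages(src_path, name):
            out.add(message)
            count += 1
    finally:
        out.close()
    return count
```

A call such as `copy_matches("archive.mbox", "hanzer.mbox", "hanzer")` would then play the role of the shell redirection in the usage line, with the file names here being placeholders.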

Adam Jensen

Oct 22, 2016, 7:52:35 PM
On 10/21/2016 11:22 PM, MRAB wrote:
> The docs say that it's a subclass of the email.message module's Message.
>
> You can get the email's header fields like it's a dict, except that the
> field names are case-insensitive. The author(s) of the module couldn't
> use a true dict because of the need for additional functionality.

I've only looked at python once or twice in the last ten years, and I
jumped into a little project last night with almost no preparation.

Thanks for helping out!


andy

Oct 23, 2016, 3:39:51 AM
yes - you are right. fortunately python authors have thought about
'documentation strings' and 'coding style', and the syntax of python
itself helps reading source code (indentation). this allows using
auto-documentation features like help(...).

when i don't know enough about a module like 'mailbox' , i first try a
file search for the source code on the local system: i.e. 'locate
mailbox.py' on a linux system. possibly i have to install the module
first when there is nothing found (using pip or package manager).

on my system (after running 'sudo updatedb' to refresh the database) this
gives the following result:

/usr/lib/python2.7/mailbox.py
/usr/lib/python2.7/mailbox.pyc
/usr/lib/python3.5/mailbox.py

i can read the source file with: 'less /usr/lib/python3.5/mailbox.py'.
within the sourcefile i can study the imports and data structures.
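
the same information can be pulled from inside python with the standard inspect module. a small sketch (python 3 syntax); the path printed depends on the local installation:

```python
import inspect
import mailbox

# Path of the source file that 'locate mailbox.py' would otherwise find.
source_path = inspect.getsourcefile(mailbox)
print(source_path)

# The docstring that help() would show for a class.
doc = inspect.getdoc(mailbox.mbox)
print(doc)

# Public attributes and methods of the default mbox message type,
# including those inherited from email.message.Message.
public = [name for name in dir(mailbox.mboxMessage) if not name.startswith("_")]
print(public)
```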

Other sources of information: docs.python.org - even with search:
https://docs.python.org/3/search.html?q=mailbox

and finally all these mail-modules should follow the RFCs ;-)
https://tools.ietf.org/html/rfc2822

best regards
andy

Jason Friedman

Oct 23, 2016, 4:44:16 PM
>
> for message in mailbox.mbox(sys.argv[1]):
>     if message.has_key("From") and message.has_key("To"):
>         addrs = message.get_all("From")
>         addrs.extend(message.get_all("To"))
>         for addr in addrs:
>             addrl = addr.lower()
>             if addrl.find(name) > 0:
>                 print message
>                 break
> -------------------------------------------------------------


I usually see

if addrl.find(name) > 0:

written as

if name in addrl:

Jon Ribbens

Oct 23, 2016, 5:15:51 PM
I suppose technically it would be:

iaddrf name in addrl[1:]:

Jon Ribbens

Oct 23, 2016, 5:16:23 PM
s/iaddrf/if/ obviously!
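
A short sketch of how these tests differ, using made-up address strings. `str.find()` returns the index of the first match (0 for a match at the very start) and -1 for no match, so `> 0` skips a match at index 0:

```python
name = "hanzer"

# Case 1: the name sits at the very start of the address.
addrl = "hanzer@example.org"
print(addrl.find(name))        # 0
print(addrl.find(name) > 0)    # False: the match at index 0 is missed
print(addrl.find(name) >= 0)   # True
print(name in addrl)           # True
print(name in addrl[1:])       # False

# Case 2: the name appears at index 0 and again later.
addrl2 = "hanzer <hanzer@example.org>"
print(addrl2.find(name) > 0)   # False: find() reports only the first match
print(name in addrl2[1:])      # True: the later occurrence is still found

# No match at all.
print(addrl.find("nobody"))    # -1
```

So `name in addrl[1:]` reproduces `find(name) > 0` only when the name does not occur both at index 0 and again later, and `name in addrl` is probably what a search script like this wants.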

Adam Jensen

Oct 24, 2016, 5:47:03 PM
Yeah, that would be more consistent with the 'for addr in addrs'
construct. But I've never been able to take a serious look at Python, or
develop an understanding and consistent style. The catastrophic
battology[1] present in every book I've encountered is too much of an
obstacle.

[1]: http://grammar.about.com/od/ab/g/battologyterm.htm

