How to access HTML DOM/source of MIME part?

Tim Landscheidt

Jun 11, 2020, 6:36:20 PM
to info-gnu...@gnu.org
Hi,

I am subscribed to several newsletters that are sent as
multipart/alternative with one part being text/html that
contains (inter alia) a list of links. I want to write a
command to iterate over those links and prompt for each
whether to call browse-url on it.

Ideally, I would like to use the HTML DOM/source for
that. How can I access that?

(I wouldn't mind examples of tighter integration with shr
(marking (some) links at parsing, iterating visually over
them), but for starters, parsing the DOM (again) would be
enough for me.)

TIA,
Tim
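
A minimal sketch of the "parse the DOM again" route asked about above, assuming an Emacs built with libxml2 and assuming the HTML source of the part is already available as a string (extracting it from the Gnus MIME handles is left out here). The name `my/html-links` is hypothetical; `libxml-parse-html-region`, `dom-by-tag`, `dom-attr` and `dom-texts` are the stock Emacs functions.

```elisp
(require 'dom)
(require 'subr-x)

(defun my/html-links (html)
  "Return a list of (URL . LABEL) pairs for the <a href> elements in HTML.
HTML is a string.  Requires an Emacs built with libxml2."
  (with-temp-buffer
    (insert html)
    (let ((dom (libxml-parse-html-region (point-min) (point-max)))
          links)
      ;; `dom-by-tag' walks the tree in document order.
      (dolist (node (dom-by-tag dom 'a))
        (let ((href (dom-attr node 'href)))
          ;; Skip anchors without an href (e.g. <a name="...">).
          (when href
            (push (cons href (string-trim (dom-texts node))) links))))
      (nreverse links))))
```

The resulting list can then be filtered and fed to `y-or-n-p` and `browse-url` one entry at a time.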


Eric Abrahamsen

Jun 11, 2020, 7:28:25 PM
to Tim Landscheidt, info-gnu...@gnu.org
Tim Landscheidt <t...@tim-landscheidt.de> writes:

> Hi,
>
> I am subscribed to several newsletters that are sent as
> multipart/alternative with one part being text/html that
> contains (inter alia) a list of links. I want to write a
> command to iterate over those links and prompt for each
> whether to call browse-url on it.

This command (if I understand your requirements correctly) is already in
Gnus master, as `gnus-summary-browse-url'. Look for that or, if you're
running an older Emacs, check out here:

https://git.savannah.gnu.org/cgit/emacs.git/tree/lisp/gnus/gnus-sum.el#n9507

Tim Landscheidt

Jun 11, 2020, 8:04:33 PM
to Eric Abrahamsen, info-gnu...@gnu.org
Thanks; AFAICT, my requirements cannot be met by that.

The newsletters I'm thinking about typically have links such
as:

| Header_Link_A
| Header_Link_B

| Item_1_Link_A Item_1_Link_B Item_1_Link_C

| Item_2_Link_A Item_2_Link_B Item_2_Link_C

| Item_3_Link_A Item_3_Link_B Item_3_Link_C

| […]

| Footer_Link_A
| Footer_Link_B
| Footer_Link_C

I want to iterate (only) over Item_1_Link_B, Item_2_Link_B,
Item_3_Link_B, etc.

*But* your pointer gave me the idea that I could iterate
over shr's buttons like gnus-collect-urls does, test if
their URLs match Item_x_Link_B's typical pattern and then
offer to browse them. This would require that
Item_x_Link_B's pattern is (relatively) stable; I have to
check whether that will work reasonably well. Thanks!

Tim
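
A rough sketch of the pattern-matching idea above, assuming the article was rendered by shr, which marks link text with the `shr-url` text property. It scans that property directly rather than going through button or widget commands. The function name `my/shr-urls-matching` and the example regexp are hypothetical.

```elisp
(defun my/shr-urls-matching (regexp)
  "Return (URL . LABEL) pairs for links in the current buffer.
Only regions carrying a `shr-url' text property whose value
matches REGEXP are returned, in buffer order."
  (let ((pos (point-min))
        result)
    (while (< pos (point-max))
      (let ((url (get-text-property pos 'shr-url))
            ;; End of the run of text sharing this property value.
            (end (next-single-property-change pos 'shr-url nil (point-max))))
        (when (and url (string-match-p regexp url))
          (push (cons url (buffer-substring-no-properties pos end)) result))
        (setq pos end)))
    (nreverse result)))
```

Called inside the *Article* buffer with something like "\\`https://example\\.org/items/", this would return only the Item_x_Link_B style links, provided their URLs really do share a stable prefix.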

Tim Landscheidt

Jun 16, 2020, 11:34:39 AM
to Eric Abrahamsen, info-gnu...@gnu.org
I wrote:

> […]

> *But* your pointer gave me the idea that I could iterate
> over shr's buttons like gnus-collect-urls does, test if
> their URLs match Item_x_Link_B's typical pattern and then
> offer to browse them. This would require that
> Item_x_Link_B's pattern is (relatively) stable; I have to
> check whether that will work reasonably well. Thanks!

It's not that simple :-(. For starters, some of my newsletters
shorten all URLs, putting them into the same format.

But more importantly, with Emacs 26.3, (forward-button 1) in
an HTML mail will always move point to the beginning of the
*Article* buffer because (button-start button) returns
(point-min) for some reason. (In my use case, I can probably
work around that by calling next-button directly.)

Is this a bug? What is the best way to create a minimal
reproducible example?

Tim

Eric Abrahamsen

Jun 16, 2020, 2:39:25 PM
to info-gnu...@gnu.org
I know it's not helpful, but quick testing with Emacs master (28) seems
to work fine. If I display the html part of an article, move point into
the article buffer, and run (forward-button 1), point moves correctly to
the first button.

Something to be aware of is that, sometime not too long ago, Lars
re-implemented links in article bodies using widgets instead of buttons.
TBH I don't really know what that means, or what the implications are,
but it's probably good to know.

Eric


Lars Ingebrigtsen

Jun 16, 2020, 3:31:48 PM
to Eric Abrahamsen, info-gnu...@gnu.org
Eric Abrahamsen <er...@ericabrahamsen.net> writes:

> Something to be aware of is that, sometime not too long ago, Lars
> re-implemented links in article bodies using widgets instead of buttons.

The other way around -- they used to be widgets, and they're now
buttons.

--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no

Eric Abrahamsen

Jun 16, 2020, 3:43:47 PM
to info-gnu...@gnu.org
Lars Ingebrigtsen <la...@gnus.org> writes:

> Eric Abrahamsen <er...@ericabrahamsen.net> writes:
>
>> Something to be aware of is that, sometime not too long ago, Lars
>> re-implemented links in article bodies using widgets instead of buttons.
>
> The other way around -- they used to be widgets, and they're now
> buttons.

Sure enough, I didn't really know what I was talking about :) But at
least that points to a likely source of Tim's problem: he's trying to
use button commands in a version of Emacs/Gnus that is still using
widgets?


Tim Landscheidt

Jun 17, 2020, 10:50:41 PM
to Eric Abrahamsen, info-gnu...@gnu.org
Eric Abrahamsen <er...@ericabrahamsen.net> wrote:

>>> Something to be aware of is that, sometime not too long ago, Lars
>>> re-implemented links in article bodies using widgets instead of buttons.

>> The other way around -- they used to be widgets, and they're now
>> buttons.

> Sure enough, I didn't really know what I was talking about :) But at
> least that points to a likely source of Tim's problem: he's trying to
> use button commands in a version of Emacs/Gnus that is still using
> widgets?

Maybe :-). Anyway, I found a solution for one of my newsletters.
It states the number of entries in its subject and then has
two links per entry, each carrying a different piece of
information, with additional (older) entries following that:

| (let ((subject (gnus-summary-article-subject)))
|   (if (string-match "^\\([0-9]+\\) new entries$" subject)
|       (let ((number-of-entries-todo
|              (string-to-number (match-string 1 subject))))
|         (gnus-with-article-buffer
|           (article-goto-body)
|           (let ((article-body-start (point))
|                 last-field1
|                 last-url)
|             (while (> number-of-entries-todo 0)
|               (widget-forward 1)
|               (if (< (point) article-body-start)
|                   (error "Moved past the wrap!"))
|               (let ((url-at-point (button-get (button-at (point)) 'shr-url))
|                     (widget-label
|                      (let ((widget-properties (cdr (widget-at))))
|                        (buffer-substring-no-properties
|                         (plist-get widget-properties :from)
|                         (plist-get widget-properties :to)))))
|                 (when (string-match "^https://domain.com/some-prefix/"
|                                     url-at-point)
|                   (if (not (string= last-url url-at-point))
|                       (setq last-field1 widget-label
|                             last-url url-at-point)
|                     (setq number-of-entries-todo
|                           (- number-of-entries-todo 1))
|                     (if (y-or-n-p (format "Browse %s (%s)? "
|                                           widget-label last-field1))
|                         (browse-url url-at-point)))))))))))

Now:

a) The formula for widget-label feels way too complicated,
but I did not find a predefined function for that purpose.
Did I miss something?

b) I use this code as part of gnus-select-article-hook.
widget-forward does move point internally, but does not
update/recenter the display. Is this due to
gnus-with-article-buffer? What is the best way to make
the *Article* buffer follow point's movement?

TIA,
Tim
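
Regarding a), one possible simplification, sketched under the assumption that the link text carries shr's `shr-url` text property: take the label from the extent of that property around a position instead of digging into the widget's :from/:to. The helper name `my/shr-link-label-at` is hypothetical. (Where the links are real buttons, the stock `button-label` function may already be enough.)

```elisp
(defun my/shr-link-label-at (pos)
  "Return the text of the `shr-url' region containing POS, or nil.
Works on the current buffer; returns nil when POS is not on a link."
  (when (get-text-property pos 'shr-url)
    (let ((start (or (previous-single-property-change (1+ pos) 'shr-url)
                     (point-min)))
          (end (or (next-single-property-change pos 'shr-url)
                   (point-max))))
      (buffer-substring-no-properties start end))))
```

In the loop above, `(my/shr-link-label-at (point))` would then stand in for the whole widget-label binding.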

Lars Ingebrigtsen

Jun 26, 2020, 5:30:58 AM
to Tim Landscheidt, Eric Abrahamsen, info-gnu...@gnu.org
Tim Landscheidt <t...@tim-landscheidt.de> writes:

> b) I use this code as part of gnus-select-article-hook.
> widget-forward does move point internally, but does not
> update/recenter the display. Is this due to
> gnus-with-article-buffer? What is the best way to make
> the *Article* buffer follow point's movement?

Yes, you should probably set the point with set-window-point or
something like that...
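
A minimal sketch of the set-window-point suggestion, assuming the goal is simply to make every window that shows the buffer follow the buffer's point after code such as gnus-with-article-buffer has moved it. The function name `my/sync-window-point` is hypothetical.

```elisp
(defun my/sync-window-point (buffer)
  "Move point in every window showing BUFFER to BUFFER's own point.
Windows on all frames are considered."
  (with-current-buffer buffer
    (let ((pos (point)))
      (dolist (win (get-buffer-window-list buffer nil t))
        (set-window-point win pos)))))
```

It could be called as `(my/sync-window-point gnus-article-buffer)` right after each widget-forward, or once before prompting.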