[ANN] Splitflap: generating valid Atom and RSS feeds


Joel Dueck

Oct 25, 2021, 2:54:14 PM
to Racket Users
This is a beta release of splitflap, a Racket library for generating valid Atom and RSS feeds, including podcast feeds.
The docs are substantially complete but I’m still working on them! Feedback welcome.

Joel

Sage Gerard

Oct 25, 2021, 7:36:30 PM
to racket...@googlegroups.com

Thank you for this!!

Feedback

  • I like your podcast-specific entries
  • The validation logic is refreshing to see
  • Re: boolean arguments, I'd stick to keyword arguments and ask for any/c, not boolean?, in contracts. That way forms like (and ... (member ...)) won't bug users about a non-threatening contract violation, and it's trivial to cast the value yourself.
  • Unsure what licenses are compatible with Blue Oak. If you want more licensing options re: IANA media type to extension mappings, here are some.
  • I normally don't use functions like splitflap-version because I can't assume that a package will define one. I'd use a program that returns a version of a given package.
  • Why is language-codes a procedure?
  • You have a lot of local contract boundaries, so values may get checked more than necessary.
  • Prefer example.com so you don't have to leak your URLs or make up email addresses that actually go to an inbox.
  • txexpr, gregor, and web-server dependencies don't look terribly difficult to remove
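
The any/c suggestion in the list above can be sketched as follows. This is only an illustration: `make-item` and its `#:explicit?` keyword are hypothetical names, not part of splitflap's API. The keyword accepts any value and the function normalizes it to a boolean itself.

```racket
#lang racket/base
(require racket/contract)

;; Hypothetical constructor; not splitflap's real API.
;; The keyword accepts any/c, so truthy results such as the list
;; returned by `member` pass the contract.
(define/contract (make-item title #:explicit? [explicit? #f])
  (->* (string?) (#:explicit? any/c) hash?)
  (hash 'title title
        ;; Normalize the truthy value to a real boolean.
        'explicit? (and explicit? #t)))

(define item
  (make-item "Episode 1"
             #:explicit? (member "nsfw" '("news" "nsfw"))))
```

With a `boolean?` contract the call above would be rejected, because `member` returns a list tail rather than `#t`.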
--
You received this message because you are subscribed to the Google Groups "Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to racket-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/55a4e6de-1e08-4219-9675-31e5c598a38dn%40googlegroups.com.

Joel Dueck

Oct 25, 2021, 10:25:54 PM
to Racket Users
Great feedback, thank you. I like all your suggestions.
  • Boolean arguments: great point, will do
  • MIME types: Yes, I should use/add a more complete extension→type mapping, though I probably will continue not to validate MIME types against the IANA list. (My somewhat erroneous note in the docs notwithstanding, it turns out having a non-IANA MIME type or a valid but mismatched type in an enclosure doesn’t actually cause feed validation errors.)
  • language-codes: yes this should be a value, not a procedure. Will change it.
  • Contract boundaries: yes! switching to contract-out is on my list
  • Removing dependencies: yes, I see the appeal. I’m really not eager to reimplement all the timezone handling and temporal comparison stuff in gregor, though.
Joel
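
A minimal sketch of the contract-out approach mentioned above, assuming a hypothetical `feed-utils` submodule with made-up helpers `normalize` and `slugify` (neither is from splitflap). The contract sits only at the module boundary, so calls between functions inside the module are not re-checked.

```racket
#lang racket/base

(module feed-utils racket/base
  (require racket/contract)
  ;; The contract is checked only when a client module calls slugify.
  (provide (contract-out [slugify (-> string? string?)]))

  ;; Internal helper: called freely inside the module, never re-checked.
  (define (normalize s)
    (string-downcase s))

  (define (slugify s)
    (regexp-replace* #rx"[^a-z0-9]+" (normalize s) "-")))

(require 'feed-utils)
(define slug (slugify "Hello, World"))
```

Calling `(slugify 42)` from the enclosing module raises a contract violation that blames the caller, while `normalize` stays private and unchecked.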

Philip McGrath

Oct 26, 2021, 7:51:56 AM
to Joel Dueck, Racket Users
Excited to try this! Generating Atom and RSS feeds is on the to-do list for one of my current projects.

On Mon, Oct 25, 2021 at 10:25 PM 'Joel Dueck' wrote:
  • MIME types: Yes, I should use/add a more complete extension→type mapping, though I probably will continue not to validate MIME types against the IANA list. (My somewhat erroneous note in the docs notwithstanding, it turns out having a non-IANA MIME type or a valid but mismatched type in an enclosure doesn’t actually cause feed validation errors.)

Since you have a dependency on `web-server-lib`, instead of shipping your own "mime.types", you can write:

(require racket/runtime-path)
(define-runtime-path mime.types
  '(lib "web-server/default-web-root/mime.types"))

to use the one it ships. It seems generally useful enough that maybe it should be split into a package like "web-server-mime-types-lib". I've been meaning for years to improve the file, and the general `make-path->mime-types` functionality, with some upstream database, perhaps the fairly comprehensive one at https://github.com/jshttp/mime-db (which pools data from Apache, Nginx, and IANA, and is used by e.g. GitHub Pages).
 
  • language-codes: yes this should be a value, not a procedure. Will change it.
`system-language` should use `delay/sync` to be safe for concurrent access.
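
A rough sketch of the `delay/sync` idea, under the assumption that language detection is an expensive one-time computation. The `detect-language` routine here is a hypothetical stand-in (it just reads the `LANG` environment variable), not splitflap's actual detection code.

```racket
#lang racket/base
(require racket/promise)

;; Hypothetical detection routine: reads the first two letters of LANG.
(define (detect-language)
  (define lang (or (getenv "LANG") ""))
  (define m (regexp-match #rx"^[a-z][a-z]" lang))
  (if m (string->symbol (car m)) 'en))

;; delay/sync: if several threads force the promise at once, one runs
;; the body and the others block until the value is ready.
(define system-language-promise (delay/sync (detect-language)))

(define (system-language)
  (force system-language-promise))
```

The detection body runs at most once, and every later call to `(system-language)` returns the cached value.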

I'm not totally clear about all of the different sets of requirements (RSS, Atom, and, de facto, Apple), but I thought there were more language codes permitted than ISO 639-1 (e.g. https://www.rssboard.org/rss-language-codes points to ISO 639-2, and https://validator.w3.org/feed/docs/rfc4287.html#rfc.section.4.2.7.4 for Atom points to RFC 3066). These standards also allow for the assignment of new codes (and, at least for ISO 639-3, deprecation). I hope the right set of codes might be in one of the CLDR packages (also used by Gregor): if so, I'd recommend getting it from there.
 
  • Removing dependencies: yes, I see the appeal. I’m really not eager to reimplement all the timezone handling and temporal comparison stuff in gregor, though.
Please keep depending on Gregor! I think it's one of the treasures of the Racket library, and we should all just use it, as even the documentation for `racket/date` suggests, rather than create any more, as Greenspun might put it, "ad hoc, informally-specified, bug-ridden, slow implementation[s] of half of" Gregor.

On a different topic, for the XML stuff, is there a requirement that embedded HTML be represented with the CDATA lexical syntax? Under normal XML rules, this xexpr:

`(content
  ,(cdata #f #f "<![CDATA[<div><p>Hi & < ></p></div>]]>"))

should be semantically equivalent to this one:

'(content "<div><p>Hi & < ></p></div>")

which would generate the XML concrete syntax:

<content>&lt;div&gt;&lt;p&gt;Hi &amp; &lt; &gt;&lt;/p&gt;&lt;/div&gt;</content>

This has the advantage of avoiding the prohibition on `]]>` within CDATA concrete syntax, and it lets everyone manipulating these feeds in Racket avoid the need to add and remove "<![CDATA[" and "]]>" from the string inside the CDATA struct. (Tangentially, AIUI the convention is to use `#f` for the start and stop fields when creating cdata and p-i structures in code, though apparently the docs for `source` say something about symbols.)
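
The two representations can be compared directly with the `xml` library. This is only an illustrative sketch; the names `html`, `escaped`, and `wrapped` are made up for the example.

```racket
#lang racket/base
(require xml)

(define html "<div><p>Hi & < ></p></div>")

;; A plain string is escaped by the writer.
(define escaped (xexpr->string `(content ,html)))

;; A cdata struct is written verbatim, so the delimiters must already
;; be part of the string stored in the struct.
(define wrapped
  (xexpr->string
   `(content ,(cdata #f #f (string-append "<![CDATA[" html "]]>")))))
```

Here `escaped` should match the entity-escaped concrete syntax shown above, while `wrapped` carries the raw HTML between the CDATA delimiters.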

Regardless, rather than using an ad-hoc encoding scheme for the entities Apple has odd rules about, you can just replace them with symbols or `valid-char?`s and let the library take care of everything. Well, my example code for that has grown complete enough that I'll just make a PR shortly :)
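
As a small illustration of that suggestion (the `copyright` element name is just an example, not taken from splitflap), symbols and `valid-char?` integers inside an xexpr are serialized as entity references by the `xml` library:

```racket
#lang racket/base
(require xml)

;; A symbol in an xexpr is written as a named entity reference.
(define named (xexpr->string '(copyright copy " 2021")))

;; An integer satisfying valid-char? is written as a numeric reference.
(define numeric (xexpr->string '(copyright 169 " 2021")))
```

In this sketch the integer comes out as a decimal reference (`&#169;`) and the symbol as `&copy;`, so the choice between the two is made in the xexpr rather than by string substitution.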

-Philip

Joel Dueck

Oct 26, 2021, 10:49:25 AM
to Racket Users
On Tuesday, October 26, 2021 at 6:51:56 AM UTC-5 Philip McGrath wrote:
I'm not totally clear about all of the different sets of requirements (RSS, Atom, and, de facto, Apple), but I thought there were more language codes permitted than ISO 639-1 (e.g. https://www.rssboard.org/rss-language-codes points to ISO 639-2, and https://validator.w3.org/feed/docs/rfc4287.html#rfc.section.4.2.7.4 for Atom points to RFC 3066). These standards also allow for the assignment of new codes (and, at least for ISO 639-3, deprecation). I hope the right set of codes might be in one of the CLDR packages (also used by Gregor): if so, I'd recommend getting it from there.

We could probably open it up to more codes for generic feeds, for sure. Podcast feeds are limited to ISO 639-1 by Apple. Also, system language detection would probably always be limited to ISO 639-1 for the foreseeable future, unless I find out that my existing method might encounter (and mis-handle) codes from other lists in some circumstances.
 
On a different topic, for the XML stuff, is there a requirement that embedded HTML be represented with the CDATA lexical syntax?

I’m using CDATA for the traditional reason: it allowed me to punt on validating the internal content. If I didn’t use CDATA, I’d probably want to start handling strings and tagged xexprs differently. Strings would go in as `<content type="text">` and an exception should probably be raised if it can be determined (how?) that the string is actually a string of HTML. Tagged X-exprs would go in as `<content type="html">` with escaped HTML as you suggest. Or perhaps only tagged x-expressions should be allowed. Or perhaps strings should be coerced to a txexpr (by, e.g. putting them inside a 'div).
 
everyone manipulating these feeds in Racket

Although I make this possible, the design intent is that once you put stuff into a feed-like struct, that’s the last step before generating the final feed (thus keeping all the guarantees of validation intact). I would hope that content in particular would not need more manipulation between the creation of a feed-item struct and the final output.
 
(Tangentially, AIUI the convention is to use `#f` for the start and stop fields when creating cdata and p-i structures in code, though apparently the docs for `source` say something about symbols.)

Indeed, since the structures returned by xexpr->xml use 'racket for those fields, I thought mine ought to match.
 
rather than using an ad-hoc encoding scheme for the entities Apple has odd rules about, you can just replace them with symbols or `valid-char?`s and let the library take care of everything. Well, my example code for that has grown complete enough that I'll just make a PR shortly :)

Sounds good! Just bear in mind that Apple is not only picky about the characters it wants replaced but also about what you replace them with. E.g. &#xA9; and not &copy; for the copyright symbol.

David Storrs

Oct 26, 2021, 11:25:15 AM
to Joel Dueck, Racket Users
On Mon, Oct 25, 2021 at 10:25 PM 'Joel Dueck' via Racket Users <racket...@googlegroups.com> wrote:

  • Removing dependencies: yes, I see the appeal. I’m really not eager to reimplement all the timezone handling and temporal comparison stuff in gregor, though.
Joel

Having done a fair bit of datetime programming, my suggestion on the best way to handle it is to not handle it. Let some purpose-built library such as gregor do it instead of trying to roll your own, because datetime math is a nightmare.



Sage Gerard

Oct 26, 2021, 12:01:38 PM
to racket...@googlegroups.com

I can understand wanting gregor for timezone offsets when constructing moments, but...

  • Assuming I have the right repository link, gregor's tz/c contract is only (or/c string? (integer-in -64800 64800)) [1]. I can set the feed-timezone parameter in Splitflap to an arbitrary string and the guard won't stop me.
  • The IANA's timezone database changed this month, and gregor's last commit was 2 years ago.

My comment was not meant to say that timezone math is easy to replace, or even that gregor isn't a fit. It's to say that I'm not seeing correct answers without a name lookup in front of tz/c, and the latest data from the IANA.

But if you were going to do all that in the first place, then I'm not sure what I'd use gregor for outside of relative arithmetic.

[1]: https://github.com/97jaz/gregor/blob/91d71c6082fec4197aaf9ade57aceb148116c11c/gregor-lib/gregor/private/moment.rkt#L91

Philip McGrath

Oct 26, 2021, 12:30:21 PM
to Sage Gerard, Racket Users

On Tue, Oct 26, 2021 at 12:01 PM Sage Gerard <sa...@sagegerard.com> wrote:
  • The IANA's timezone database changed this month, and gregor's last commit was 2 years ago.

My comment was not meant to say that timezone math is easy to replace, or even that gregor isn't a fit. It's to say that I'm not seeing correct answers without a name lookup in front of tz/c, and the latest data from the IANA.

The timezone database lookup logic is in the `tzinfo` package (https://docs.racket-lang.org/tzinfo/index.html), last updated two months ago: https://github.com/97jaz/tzinfo. Further, on Unix and Mac, it simply consults the OS-provided IANA timezone database. (It does look like the `tzdata` package, which ships the database for Windows or other cases when you don't rely on the OS, could use a pull request, though: https://github.com/97jaz/tzdata)

On Tue, Oct 26, 2021 at 12:01 PM Sage Gerard <sa...@sagegerard.com> wrote:
  • Assuming I have the right repository link, gregor's tz/c contract is only (or/c string? (integer-in -64800 64800)) [1]. I can set the feed-timezone parameter in Splitflap to an arbitrary string and the guard won't stop me.
I guess the check doesn't happen as part of `tz/c`, but I can tell you that this program:

#lang racket
(require gregor)
(now/moment #:tz "Nowhere/Middle")

raises an exception saying, "Cannot find zoneinfo file for [Nowhere/Middle]".

-Philip

Jon Zeppieri

Oct 26, 2021, 12:30:30 PM
to Sage Gerard, racket users list
On Tue, Oct 26, 2021 at 12:01 PM Sage Gerard <sa...@sagegerard.com> wrote:
>
> I can understand wanting gregor for timezone offsets when constructing moments, but...
>
> Assuming I have the right repository link, gregor's tz/c contract is only (or/c string? (integer-in -64800 64800)) [1]. I can set the feed-timezone parameter in Splitflap to an arbitrary string and the guard won't stop me.

I'm guessing you haven't actually tried this:

```
> (moment 2000 #:tz "arbitrary string")
. . Library/Racket/8.0/pkgs/tzinfo/tzinfo/private/tzfile-parser.rkt:21:0:
Cannot find zoneinfo file for [arbitrary string]
```

> The IANA's timezone database changed this month, and gregor's last commit was 2 years ago.

Gregor's repo doesn't contain the IANA tzdb and prefers to rely on the
system's zoneinfo files. Every contemporary Unix (including macOS)
ships with this data and updates it with OS updates. Windows is a
different story (though I know that in recent years, parts of the
Windows ecosystem work with IANA zones, so maybe those files exist
somewhere?). You're right that the tzdata package has old data. It
would probably make sense for someone who runs Windows to maintain it.
It comes with a script that can update the package for a new version
of the tzdb. I'll do that right now, in fact. Thanks for reminding me.

- Jon

Joel Dueck

Oct 26, 2021, 12:39:18 PM
to Racket Users
On Tuesday, October 26, 2021 at 11:01:38 AM UTC-5 Sage Gerard wrote:
  • Assuming I have the right repository link, gregor's tz/c contract is only (or/c string? (integer-in -64800 64800)) [1]. I can set the feed-timezone parameter in Splitflap to an arbitrary string and the guard won't stop me.
Yep: I left feed-timezone out of the docs because I plan to remove it. Unless I'm missing something, in the end I think it's redundant to tzinfo's current-timezone parameter.

Sage Gerard

Oct 26, 2021, 1:30:59 PM
to racket...@googlegroups.com

> The timezone database lookup logic is in the `tzinfo` package (https://docs.racket-lang.org/tzinfo/index.html)

Thanks.

> Jon: I'm guessing you haven't actually tried this
> Philip: I guess the check doesn't happen as part of `tz/c`, but I can tell you that this program

Yes, but I'm talking about code we were asked to give feedback on. I focus on `tz/c` because it is documented as a flat contract that checks for "an identifier from the IANA tz database", but it does not parse the timezone name to check correctness.

My feedback says no validation occurs for the timezone name in a parameter for Splitflap. Joel indicated that parameter will go away below, and I'm glad to know of the tzinfo package. But if a limitation in gregor's contracts would oblige you to use tzinfo for validation, then I'd want to know that so that I can assess how much of gregor I really need. It still seems like the timezone data is the hard part, so use a timezone dependency instead of a dependency that misleads the user into incomplete validation.


Philip McGrath

Oct 26, 2021, 2:14:15 PM
to Sage Gerard, Racket Users
On Tue, Oct 26, 2021 at 1:30 PM Sage Gerard <sa...@sagegerard.com> wrote:

> Jon: I'm guessing you haven't actually tried this
> Philip: I guess the check doesn't happen as part of `tz/c`, but I can tell you that this program

Yes, but I'm talking about code we were asked to give feedback on. I focus on `tz/c` because it is documented as a flat contract that checks for "an identifier from the IANA tz database", but it does not parse the timezone name to check correctness.

I agree that I would have expected `tz/c` to consult the IANA database.

My feedback says no validation occurs for the timezone name in a parameter for Splitflap. Joel indicated that parameter will go away below, and I'm glad to know of the tzinfo package.

Ah, the undocumented `feed-timezone` parameter is not what I had in mind: I've been considering functions like `feed-item` and `episode`, which take their created and updated timestamps as `moment?`s—I think any Racket function that needs to operate on timestamps with timezones ought to operate on `moment?`s (or perhaps `moment-provider?`s). Gregor does ensure that `moment?`s are valid.

-Philip

Joel Dueck

Oct 26, 2021, 2:18:22 PM
to Racket Users
On Tuesday, October 26, 2021 at 12:30:59 PM UTC-5 Sage Gerard wrote:

Yes, but I'm talking about code we were asked to give feedback on. I focus on `tz/c` because it is documented as a flat contract that checks for "an identifier from the IANA tz database", but it does not parse the timezone name to check correctness.

My feedback says no validation occurs for the timezone name in a parameter for Splitflap. Joel indicated that parameter will go away below, and I'm glad to know of the tzinfo package. But if a limitation in gregor's contracts would oblige you to use tzinfo for validation, then I'd want to know that so that I can assess how much of gregor I really need. It still seems like the timezone data is the hard part, so use a timezone dependency instead of a dependency that misleads the user into incomplete validation

It does seem odd that tz/c uses `string?` instead of `tzid-exists?`. I’m wondering if that could be changed without breaking a lot of stuff. If not, then it *might* be worth keeping my own feed-timezone parameter that allows only (integer-in -64800 64800). On the other hand, it is also true that if an invalid time zone is supplied anywhere along the way to building the feed data, an exception is going to occur before the feed is generated, which is what I care about for the most part.
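
One way the stricter check could look, as a sketch only: `valid-tz?` and the `feed-tz` parameter below are hypothetical names (this is not splitflap's `feed-timezone`), and the only library call assumed is tzinfo's `tzid-exists?`.

```racket
#lang racket/base
(require tzinfo)

;; Stricter than the tz/c contract quoted in this thread: strings must
;; name a zone that tzinfo can find; integers are UTC offsets in seconds.
(define (valid-tz? v)
  (or (and (string? v) (tzid-exists? v))
      (and (exact-integer? v) (<= -64800 v 64800))))

;; Hypothetical parameter whose guard rejects unknown zone names at
;; the point they are set, rather than later when a moment is built.
(define feed-tz
  (make-parameter
   0
   (lambda (v)
     (unless (valid-tz? v)
       (raise-argument-error 'feed-tz "valid-tz?" v))
     v)))
```

Setting `(feed-tz "Nowhere/Middle")` would then fail immediately with an argument error, at the cost of a database lookup each time the parameter is set.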

In general I appreciate feedback like Sage’s from people who think a lot more carefully than I do about dependencies. I like knowing that if someone has differing time zones for different items within a feed, or cares about gap/overlap resolution, etc, I can let them use gregor to handle it. It's not something I ever encountered in building CMSs or publishing podcasts, but also you never know what a feed will be used for. I will probably experiment with reducing the dependency down to tzinfo/tzdata and using Racket’s native date structs.


Jon Zeppieri

Oct 26, 2021, 2:39:21 PM
to Joel Dueck, Racket Users
On Tue, Oct 26, 2021 at 2:18 PM 'Joel Dueck' via Racket Users
<racket...@googlegroups.com> wrote:
>
>
>
> On Tuesday, October 26, 2021 at 12:30:59 PM UTC-5 Sage Gerard wrote:
>>
>> Yes, but I'm talking about code we were asked to give feedback on. I focus on `tz/c` because it is documented as a flat contract that checks for "an identifier from the IANA tz database", but it does not parse the timezone name to check correctness.
>>
>> My feedback says no validation occurs for the timezone name in a parameter for Splitflap. Joel indicated that parameter will go away below, and I'm glad to know of the tzinfo package. But if a limitation in gregor's contracts would oblige you to use tzinfo for validation, then I'd want to know that so that I can assess how much of gregor I really need. It still seems like the timezone data is the hard part, so use a timezone dependency instead of a dependency that misleads the user into incomplete validation
>
> It does seem odd that tz/c uses `string?` instead of `tzid-exists?`. I’m wondering if that could be changed without breaking a lot of stuff. If not, then it *might* be worth keeping my own feed-timezone parameter that allows only (integer-in -64800 64800). On the other hand, it is also true that if an invalid time zone is supplied anywhere along the way to building the feed data, an exception is going to occur before the feed is generated, which is what I care about for the most part.

I agree, and I'm the one who wrote `tz/c` the way it is. Go figure. As
you pointed out, the issue with changing it now is backwards
compatibility. Anyhow, I'm definitely open to suggestions.

>
> In general I appreciate feedback like Sage’s from people who think a lot more carefully than I do about dependencies. I like knowing that if someone has differing time zones for different items within a feed, or cares about gap/overlap resolution, etc, I can let them use gregor to handle it. It's not something I ever encountered in building CMSs or publishing podcasts, but also you never know what a feed will be used for. I will probably experiment with reducing the dependency down to tzinfo/tzdata and using Racket’s native date structs.

[Disclaimer: since I am the author of `gregor`, you should definitely
take into account that I'm biased -- though, maybe not as much as
you'd expect. I'm aware of a bunch of design mistakes I made in
gregor.]

To the extent that validation is a concern, gregor is (despite the
`tz/c` issue) much better, on the whole, than racket/base's `date` and
`date*` structs, which will happily let you construct things like "the
31st of February." And yes, this needs to be balanced against the cost
of taking on an additional dependency.
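
The "31st of February" point can be illustrated with a short sketch. The `g:` prefix is only there to keep gregor's `date` from shadowing the racket/base struct of the same name; `bogus` and `gregor-result` are names made up for the example.

```racket
#lang racket/base
(require (prefix-in g: gregor))

;; racket/base's date struct performs no calendar validation:
;; this constructs "February 31" without complaint.
;; Fields: second minute hour day month year week-day year-day
;;         dst? time-zone-offset
(define bogus (date 0 0 0 31 2 2021 0 0 #f 0))

;; gregor's date constructor is expected to reject the same day.
(define gregor-result
  (with-handlers ([exn:fail? (lambda (e) 'rejected)])
    (g:date 2021 2 31)))
```

The trade-off described above is visible here: the base struct is dependency-free but accepts impossible dates, while gregor checks the day against the month and year.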

- Jon

Joel Dueck

Oct 27, 2021, 10:05:13 AM
to Racket Users
On Tuesday, October 26, 2021 at 1:39:21 PM UTC-5 zepp...@gmail.com wrote:
To the extent that validation is a concern, gregor is (despite the
`tz/c` issue) much better, on the whole, than racket/base's `date` and
`date*` structs, which will happily let you construct things like "the
31st of February."

I fully agree with that. ...Didn't you mention on Slack a while back that you had a replacement for gregor on a shelf somewhere? ;)

Jon Zeppieri

Oct 27, 2021, 10:20:47 AM
to Joel Dueck, Racket Users
https://github.com/97jaz/datetime-lib

Joel Dueck

Oct 31, 2021, 11:48:59 AM
to Racket Users
I’ll be working on the easy items from this thread in this PR: https://github.com/otherjoel/splitflap/pull/3