I have started using the Public Suffix List in my own web application as a way to reliably parse a hostname into its parts (subdomains, private domain, public suffix). To do this, I parse the Public Suffix List and transform it into a tree-like JSON object.
The structure is simple: The root object contains keys corresponding to the Top Level Domains found in the Public Suffix List. Each of these keys leads to another object containing all of the Second Level Domains that are found under that TLD, and so on and so forth.
Wildcard entries are registered under the key `*`. Don't register any child nodes under a wildcard node.
For an exception, like "!except", register the domain label without the exclamation ("except"), but also add a special property to that node to mark it as an exception (`@exception: true`). Also don't register any child nodes under an exception node.
For a node that corresponds to the left-most label in a rule, you can likewise add a special property to mark it (`@leaf: true`). All wildcard and exception entries will also be `@leaf` nodes.
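As a rough illustration of the construction described above, here is a minimal sketch in Python. The function name `build_tree` and the sample rules are mine and only for illustration; it assumes the list's usual format, where `//` starts a comment line and a rule ends at the first whitespace.

```python
def build_tree(psl_text):
    """Turn Public Suffix List text into a nested dict (hypothetical helper)."""
    tree = {}
    for raw_line in psl_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("//"):
            continue
        # A rule is the text up to the first whitespace.
        rule = line.split()[0].lower()
        is_exception = rule.startswith("!")
        if is_exception:
            rule = rule[1:]  # store the label without the exclamation mark
        # Walk from the TLD towards the left-most label, creating nodes as needed.
        node = tree
        for label in reversed(rule.split(".")):
            node = node.setdefault(label, {})
        node["@leaf"] = True
        if is_exception:
            node["@exception"] = True
    return tree


# A handful of sample rules, just for the example.
sample = """
// a comment line
com
uk
co.uk
*.ck
!www.ck
"""
tree = build_tree(sample)
print(tree["ck"])
# {'*': {'@leaf': True}, 'www': {'@leaf': True, '@exception': True}}
```

Note that `ck` itself carries no `@leaf` marker here, because only `*.ck` and `!www.ck` appear as rules in the sample.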
Once this transformation is done, it becomes trivial to look up the public suffix of a hostname. Simply lowercase the hostname, split it into labels on the delimiter (`.`), and then traverse the Public Suffix Tree as far as possible using these labels, starting from the TLD. As you traverse, keep track of the last `@leaf` node encountered and its depth.
If you have matched as far as possible and the current `@leaf` node is not an exception, check whether there is a wildcard `*` entry under that leaf node. If so, the wildcard matches the next label of the hostname and the public suffix extends one level deeper. (If the node is an exception, the public suffix instead ends one level above it.) Either way, you now know the depth of the public suffix for the hostname in question. The private domain is at that depth + 1, and any subdomains are found at any depth greater than that.
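The lookup can be sketched roughly as follows, in Python. This is one reading of the steps above rather than a definitive implementation: the names `suffix_depth` and `split_hostname` and the small hand-written `TREE` are assumptions for the example, the `*` child is checked at every step instead of only at the end, and a hostname that matches no rule falls back to a one-label public suffix (the list's implicit `*` default rule).

```python
# Small hand-written tree for the example (rules: com, uk, co.uk, *.ck, !www.ck).
TREE = {
    "com": {"@leaf": True},
    "uk": {"@leaf": True, "co": {"@leaf": True}},
    "ck": {"*": {"@leaf": True}, "www": {"@leaf": True, "@exception": True}},
}


def suffix_depth(tree, reversed_labels):
    """Return how many labels, counted from the TLD, form the public suffix."""
    node = tree
    best = 1  # fallback when no rule matches
    for depth, label in enumerate(reversed_labels, start=1):
        if "*" in node:
            best = depth  # a wildcard rule covers this label
        child = node.get(label)
        if not isinstance(child, dict):
            break  # no deeper match in the tree
        if child.get("@exception"):
            best = depth - 1  # exception: suffix ends one level above
            break
        if child.get("@leaf"):
            best = depth
        node = child
    return best


def split_hostname(tree, hostname):
    """Split a hostname into (subdomains, private_domain, public_suffix)."""
    labels = hostname.lower().rstrip(".").split(".")
    n = len(labels)
    depth = min(suffix_depth(tree, labels[::-1]), n)
    public_suffix = ".".join(labels[n - depth:])
    if n > depth:
        private_domain = labels[n - depth - 1]
        subdomains = labels[:n - depth - 1]
    else:
        private_domain = None  # the hostname is itself a public suffix
        subdomains = []
    return subdomains, private_domain, public_suffix


print(split_hostname(TREE, "www.example.co.uk"))  # (['www'], 'example', 'co.uk')
print(split_hostname(TREE, "foo.bar.ck"))         # ([], 'foo', 'bar.ck')
print(split_hostname(TREE, "www.ck"))             # ([], 'www', 'ck')
```

The last call shows the exception case: `www.ck` matches `!www.ck`, so the public suffix is just `ck` and `www` becomes the private domain.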
I have not needed to keep track of comments in my application, but they could probably be worked into the structure as well, either by adding a `@comment` attribute to leaf nodes, or by adding a `@commentId` and keeping a separate hash or array of comments that can be used to look up the corresponding comment text.
-----------------------------
Now, the nice thing about this Public Suffix Tree structure is not only that it can easily be used to parse the public suffix of a hostname, but also that it can be readily serialized and deserialized as a simple, standard JSON object. This means that the Public Suffix List could be pre-processed into the Public Suffix Tree structure and made available for download the same way that the Public Suffix List is now. And since most modern programming languages ship with support for parsing JSON, this makes it that much easier to make use of the Public Suffix Tree.
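To illustrate the serialization point, here is a small sketch using Python's standard `json` module. The tree literal is a hand-written sample, not the real list, and the variable names are only for the example.

```python
import json

# Hand-written sample tree (rules: com, co.uk, *.ck, !www.ck).
tree = {
    "com": {"@leaf": True},
    "uk": {"co": {"@leaf": True}},
    "ck": {"*": {"@leaf": True}, "www": {"@leaf": True, "@exception": True}},
}

# Serialize to a JSON document; sort_keys keeps the output stable between runs.
serialized = json.dumps(tree, indent=2, sort_keys=True)

# Deserialize it again, as a consumer of the published file would.
restored = json.loads(serialized)

print(restored == tree)  # True
```

The special properties survive the round trip because `@leaf` and `@exception` are ordinary string keys with boolean values in JSON.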
I would like to propose that we perform such pre-processing and make the Public Suffix Tree .json document available for download alongside the existing Public Suffix List .dat document.