Many a time I have been saved by the HTML parsing prowess of BeautifulSoup. When it comes to beautifying HTML dumps for easier analysis, it is king; however, I have always been bothered by its unorthodox choice of indentation width: just one space per level. So the other day I took the time to hack a configurable indent width into the code for version 3.2.0.
As long as you're using version 3.2.0, you can simply paste the patched file over the original file in your installation. For details on the changes, see the context diff below:
***************
*** 544 ****
--- 545,547 ----
+         # Adjustable indentation patch
+         self.indentWidth = parser.indentWidth
+
***************
*** 752 ****
!             space = (' ' * (indentTag-1))
--- 755 ----
!             space = (self.indentWidth * (indentTag-1))
***************
*** 813 ****
!                     s.append(" " * (indentLevel-1))
--- 816 ----
!                     s.append(self.indentWidth * (indentLevel-1))
***************
*** 1080,1082 ****
!     def __init__(self, markup="", parseOnlyThese=None, fromEncoding=None,
!                  markupMassage=True, smartQuotesTo=XML_ENTITIES,
!                  convertEntities=None, selfClosingTags=None, isHTML=False):
--- 1083,1088 ----
!     def __init__(
!             self, markup="", parseOnlyThese=None, fromEncoding=None,
!             markupMassage=True, smartQuotesTo=XML_ENTITIES,
!             convertEntities=None, selfClosingTags=None, isHTML=False,
!             indentWidth=' '
!             ):
***************
*** 1111 ****
--- 1118,1121 ----
+
+         # Adjustable indentation patch
+         self.indentWidth = indentWidth
+
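To see how the pieces of the patch fit together, here is a toy model of the same idea in plain Python. ToyParser and ToyTag are hypothetical stand-ins, not BeautifulSoup's real classes: the parser stores the indentation string, every tag copies it when it is constructed (as the patched Tag.__init__ does), and rendering repeats it (level - 1) times, with text placed one level deeper than its enclosing tag, roughly the way prettify() lays things out.

```python
# Toy model (hypothetical classes, not BeautifulSoup's own) of how the patch
# threads the indentation string from the parser down to every tag.

class ToyParser:
    def __init__(self, indentWidth=' '):
        # Like the patched BeautifulStoneSoup.__init__: just store the string.
        self.indentWidth = indentWidth


class ToyTag:
    def __init__(self, parser, name, children=()):
        # Like the patched Tag.__init__: copy the setting from the parser.
        self.indentWidth = parser.indentWidth
        self.name = name
        self.children = list(children)

    def render(self, indentLevel=1):
        # Like the patched Tag.__str__: repeat the string (level - 1) times.
        space = self.indentWidth * (indentLevel - 1)
        out = [space + '<%s>' % self.name]
        for child in self.children:
            if isinstance(child, ToyTag):
                out.append(child.render(indentLevel + 1))
            else:
                # Text sits one level deeper than its enclosing tag.
                out.append(self.indentWidth * indentLevel + child.strip())
        out.append(space + '</%s>' % self.name)
        return '\n'.join(out)


parser = ToyParser(indentWidth='    ')
doc = ToyTag(parser, 'html',
             [ToyTag(parser, 'body', [ToyTag(parser, 'p', ['Hello'])])])
print(doc.render())
```

Storing the string once on the parser and copying it onto each tag means the rendering code never needs a reference back to the parser, which is presumably why the patch touches Tag.__init__ as well as the rendering methods.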
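With the patch applied, setting a custom indentation is as simple as passing the string to repeat at each level (despite its name, indentWidth takes a string, e.g. four spaces or a tab, rather than a number):
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(html_string.encode('utf-8'), indentWidth='    ')
beautiful_html_string = soup.prettify()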
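If patching the library is not an option, a rough alternative is to re-indent the stock output after the fact. The sketch below uses a hypothetical helper, reindent, which is not part of BeautifulSoup. It assumes that every leading space on a line of the unpatched output is indentation, one space per level. Since prettify() strips surrounding whitespace from text nodes, this usually holds, but a line that legitimately begins with spaces (for instance inside a multi-line attribute value) would be re-indented too.

```python
def reindent(pretty, indent='    '):
    """Re-indent one-space-per-level output using `indent` per level.

    Hypothetical helper: assumes leading spaces on each line are
    indentation only, one space per nesting level.
    """
    out = []
    for line in pretty.splitlines():
        stripped = line.lstrip(' ')
        depth = len(line) - len(stripped)  # one leading space == one level
        out.append(indent * depth + stripped)
    result = '\n'.join(out)
    # Keep a trailing newline if the input had one.
    if pretty.endswith('\n'):
        result += '\n'
    return result


# Usage with the unpatched library (BeautifulSoup 3.x):
#   from BeautifulSoup import BeautifulSoup
#   print(reindent(BeautifulSoup(html_string).prettify(), '\t'))
```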