a small worked example


Paul Becker

Feb 26, 2026, 12:22:02 PM
to software-design-book

Here are two implementations in python of a program which takes a markdown file as input, and outputs the same file but the header lines have each been prefixed with a section number. There is a sample markdown file included at the end which can be used as input.

Which implementation is simpler? Why?

Any other comments?


# implementation 1
import re

def process_markdown(infile, outfile):
    numbers = []

    for line in infile:
        m = re.match(r'(#+)', line)
        if m:
            level = len(m.group(1))

            while len(numbers) > level:
                numbers.pop()
            while len(numbers) < level:
                numbers.append(0)
            numbers[level - 1] += 1
            prefix = '.'.join(map(str, numbers))

            line = line[:level] + ' ' + prefix + line[level:]

        outfile.write(line)

if __name__ == '__main__':
    import sys
    process_markdown(sys.stdin, sys.stdout)



# implementation 2
import re

def process_markdown(infile, outfile):
    sections = NumberedSections()

    for line in infile:
        header = MarkdownHeader.parse(line)
        if header:
            number = sections.next_at_level(header.level)
            line = header.with_prefix(number)

        outfile.write(line)

class MarkdownHeader:
    @classmethod
    def parse(cls, line):
        m = re.match(r'(#+)', line)
        if not m:
            return None

        level = len(m.group(1))
        return cls(level, line)

    def __init__(self, level, line):
        self.level = level
        self.line = line

    def with_prefix(self, prefix):
        return self.line[:self.level] + ' ' + prefix + self.line[self.level:]

class NumberedSections:
    def __init__(self):
        self.numbers = []

    def next_at_level(self, level):
        assert level >= 1
        while len(self.numbers) > level:
            self.numbers.pop()
        while len(self.numbers) < level:
            self.numbers.append(0)
        self.numbers[level - 1] += 1

        return '.'.join(map(str, self.numbers))

if __name__ == '__main__':
    import sys
    process_markdown(sys.stdin, sys.stdout)



<sample markdown file>

**Computer programming**

adapted from [wikipedia](https://en.wikipedia.org/wiki/Computer_programming)

Computer programming or coding is the composition of sequences of instructions, called programs, that computers can follow to perform tasks.

# History
Programmable devices have existed for centuries.

## Machine language
Machine code was the language of early programs, written in the instruction set of the particular machine, often in binary notation.

## Compiler languages
High-level languages made the process of developing a program simpler and more understandable, and less bound to the underlying hardware.

## Source code entry
Programs were mostly entered using punched cards or paper tape.

# Modern programming

## Quality requirements
Whatever the approach to development may be, the final program must satisfy some fundamental properties.

## Readability of source code
In computer programming, readability refers to the ease with which a human reader can comprehend the purpose, control flow, and operation of source code.

## Algorithmic complexity
The academic field and the engineering practice of computer programming are concerned with discovering and implementing the most efficient algorithms for a given class of problems.

## Methodologies
The first step in most formal software development processes is requirements analysis, followed by testing to determine value modeling, implementation, and failure elimination (debugging).

## Measuring language usage
It is very difficult to determine what are the most popular modern programming languages.

## Debugging
Debugging is a very important task in the software development process since having defects in a program can have significant consequences for its users.

# Programming languages
Different programming languages support different styles of programming (called programming paradigms).

# Learning to program
Learning to program has a long history related to professional standards and practices, academic initiatives and curriculum, and commercial books and materials for students, self-taught learners, hobbyists, and others who desire to create or customize software for personal use.

## Context
In 1957, there were approximately 15,000 computer programmers employed in the U.S., a figure that accounts for 80% of the world's active developers.

## Technical publishers
As personal computers became mass-market products, thousands of trade books and magazines sought to teach professional, hobbyist, and casual users to write computer programs.

## Digital learning / online resources
Between 2000 and 2010, computer book and magazine publishers declined significantly as providers of programming instruction, as programmers moved to Internet resources to expand their access to information.

# Programmers
Computer programmers are those who write computer software.

Shreevatsa R

Feb 26, 2026, 5:25:28 PM
to Paul Becker, software-design-book
Great example, thanks :)

I think the Clean Code / "Uncle Bob" proponents would pick Implementation 2, focusing on "readability" in the small (the `process_markdown` function being 10 lines instead of 18), oblivious to the cost to the maintainer of having to understand three interfaces (the process_markdown function, and the classes MarkdownHeader and NumberedSections) instead of one.

One could avoid this cost by moving the classes inside the function (as Python allows), say Implementation 2a:

# Implementation 2a
import re

def process_markdown(infile, outfile):

    class MarkdownHeader:
        @classmethod
        def parse(cls, line):
            m = re.match(r'(#+)', line)
            if not m:
                return None
            return cls(level=len(m.group(1)), line=line)


        def __init__(self, level, line):
            self.level = level
            self.line = line

        def with_prefix(self, prefix):
            return self.line[:self.level] + ' ' + prefix + self.line[self.level:]

    class NumberedSections:
        def __init__(self):
            self.counters = []


        def next_at_level(self, level):
            assert level >= 1
            while len(self.counters) > level:
                self.counters.pop()
            while len(self.counters) < level:
                self.counters.append(0)
            self.counters[level - 1] += 1
            return '.'.join(map(str, self.counters))


    sections = NumberedSections()

    for line in infile:
        header = MarkdownHeader.parse(line)
        if header:
            number = sections.next_at_level(header.level)
            line = header.with_prefix(number)

        outfile.write(line)

if __name__ == '__main__':
    import sys
    process_markdown(sys.stdin, sys.stdout)

Now process_markdown is 39 lines long, and it's no longer clear it's more readable. Contrast with the following Implementation 1a (which is Implementation 1 with a few comments and variable-name changes):

# Implementation 1a

import re

def process_markdown(infile, outfile):
    """Copy lines from infile to outfile, adding section numbers to Markdown headings.

    For example, a sequence of headings like:
        # Foo
        ## Bar
        ## Baz
        # Qux
    becomes:
        # 1 Foo
        ## 1.1 Bar
        ## 1.2 Baz
        # 2 Qux

    Non-heading lines are passed through unchanged.
    """
    # Stack of section counters, one per heading level.
    # e.g. after "# Foo", "## Bar", then "### Baz", this is [1, 1, 1].
    section_counters = []


    for line in infile:
        m = re.match(r'(#+)', line)
        if m:
            level = len(m.group(1))

            # Trim counters deeper than current level (leaving a sibling scope),
            # or extend with zeros up to current level (entering a deeper scope).
            while len(section_counters) > level:
                section_counters.pop()
            while len(section_counters) < level:
                section_counters.append(0)

            section_counters[level - 1] += 1
            prefix = '.'.join(map(str, section_counters))

            # "## Foo" -> "## 1.2 Foo"

            line = line[:level] + ' ' + prefix + line[level:]

        outfile.write(line)

if __name__ == '__main__':
    import sys
    process_markdown(sys.stdin, sys.stdout)

I would prefer 1a > 1 >> 2a > 2. I imagine the "Clean Code" preference may be the opposite order.

John Ousterhout

Mar 3, 2026, 11:59:21 AM
to Shreevatsa R, Paul Becker, software-design-book
I prefer Implementation 1. I found it relatively simple to understand, even though it has no comments.

Implementation 2 made my head spin; I gave up before fully understanding it. It has a lot more code and it wasn't obvious what purpose the classes serve. At the least, it needs documentation to explain the abstractions provided by the classes, but I think the "abstractions" make it harder to understand the code, not easier. Consider the following lines:

number = sections.next_at_level(header.level)
line = header.with_prefix(number)

Just reading these lines, I have no idea what the "next_at_level" or "with_prefix" methods do, so to understand these lines I have to read the code of the methods. If those methods were documented, then perhaps the documentation would make this obvious. But as the code stands, I have to read the body of the methods. If I have to do that, then what benefit was there in pulling the code out into separate methods? Things would be easier to understand if all the code is in one place, as in Implementation 1.
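
A minimal sketch of the kind of interface documentation described above, applied to NumberedSections from Implementation 2. The docstring wording is an assumption about the intended behaviour, inferred from the method body; it is not from the original post.

```python
class NumberedSections:
    """Hands out hierarchical section numbers ("1", "1.1", "1.2", "2", ...).

    Stateful: each result depends on the calls made before it, so
    next_at_level() is meant to be called once per header, in document order.
    """

    def __init__(self):
        self.numbers = []  # one counter per level, e.g. [1, 2] means "1.2"

    def next_at_level(self, level):
        """Advance the counter at `level` (1 = top level) and return the new
        section number as a dotted string, e.g. "2.1".

        Counters deeper than `level` are discarded; skipped levels are
        filled with 0 (so a level-3 header directly under "2" gives "2.0.1").
        """
        assert level >= 1
        while len(self.numbers) > level:
            self.numbers.pop()
        while len(self.numbers) < level:
            self.numbers.append(0)
        self.numbers[level - 1] += 1
        return '.'.join(map(str, self.numbers))
```

With comments like these, a reader of process_markdown could in principle stop at the docstrings; whether that outweighs the extra interface is the question under discussion.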

The problem being solved here is pretty simple, and implementation 2 makes it more complicated than it needs to be.

-John-


Paul Becker

Mar 8, 2026, 7:06:55 PM
to John Ousterhout, Shreevatsa R, software-design-book

Some thoughts---


I shared the original post with a friend, and part of his response was:

    If you have new preprocessing to add, say link checks or style enforcement, which version would you want to start with?

    To my mind, this example illustrates the difference between programming (fitness to one specific purpose) and software engineering (that purpose in the presence of changes integrated over time).

(emphasis mine)


What are the requirements for this code?

This is an actual program that i wrote because i needed it, and then i realized it might make a good example, in miniature, of the work of software design.

I think implementation 1 would probably be fine if it had comments, and the program's functionality never needed to be changed. This is a realistic possibility. I deliberately omitted documentation of any kind (except for names) to highlight the structure and comprehensibility of the code.

I wrote implementation 2 because i could see that the concerns of file processing, section numbering, and header line format are interwoven in imp1. It's certainly small enough of a program that they don't need to be separated. But it's a small program now, and it may not be in the future. I want to build the habit of getting from small programs to large programs, with good form. Whether i factor it now or later is irrelevant to me in this exercise---as is the scale of the individual components---i am interested in the process, skill, and wisdom involved in going from a small program to a large program, from one module to many modules. How do i establish the boundaries of components? How do i know when something should be private or public? How do i simplify rather than constrain and complicate? These are the lessons i want to internalize and be able to talk about.

I think that the abstractions i used in implementation 2 were not very good. Specifically, MarkdownHeader was not general enough, and NumberedSections should have hidden its use of 'level' within its implementation. Imp2 arose from factoring the previous implementation, with an eye on separation of concerns but with too global a perspective. I didn't think about how hard it would be to understand process_markdown() by itself, or what information NumberedSections should be hiding from the rest of the program. I also didn't think about future extension.

It feels important to point out that i didn't come to these conclusions on my own. It was only after showing my code to others that i began to see what its problems were.

This has led me to another implementation:

# implementation 3
import re

def process_markdown(infile, outfile):
    sections = NumberedSections()

    for line in infile:
        if sections.is_header_line(line):
            line = sections.add_section_number_to_header(line)

        outfile.write(line)

class NumberedSections:
    def __init__(self):
        self._numbers = []

    def is_header_line(self, line):
        return HeaderLine.parse(line) is not None

    def add_section_number_to_header(self, line):
        header = HeaderLine.parse(line)

        assert header.level >= 1
        while len(self._numbers) > header.level:
            self._numbers.pop()
        while len(self._numbers) < header.level:
            self._numbers.append(0)
        self._numbers[header.level - 1] += 1

        secnum = '.'.join(map(str, self._numbers))

        header.text = secnum + ' ' + header.text
        return header.formatline()

class HeaderLine:
    @classmethod
    def parse(cls, line):
        m = re.match(r'(#+)', line)
        if not m:
            return None
        level = len(m.group(1))
        return cls(level, line[level:].strip())

    def __init__(self, level, text):
        self.level = level
        self.text = text

    def formatline(self):
        return '#' * self.level + ' ' + self.text + '\n'

if __name__ == '__main__':
    import sys
    process_markdown(sys.stdin, sys.stdout)


Looking forward to anything that this stirs up for you.

Thank you,

paul

Ivan Yordanov

Mar 9, 2026, 1:33:28 PM
to software-design-book
A few comments:

 - The parse method returns None when the regex doesn't match. This forces the higher-level module to check whether the value is None; otherwise it fails when the result is used (an NPE, or in Python an AttributeError). This is described in the "eliminate special cases" chapter in the book. None/null is always such a special case. Instead of None, it is probably a good idea to return a HeaderLine with level 0 and the text.
 - The HeaderLine class violates the "Tell, Don't Ask" principle from the other thread, especially in the part of the code that fills the numbers array. It would also be nice to remove the while loops.
 - HeaderLine.parse is invoked twice when the line is a header.

Addressing these comments could result in code similar to this:

import re

def process_markdown(infile, outfile):
    sections = NumberedSections()

    for line in infile:
        outfile.write(sections.add_section_number_to_header(HeaderLine.parse(line)))



class NumberedSections:
    def __init__(self):
        self._numbers = []

    def add_section_number_to_header(self, header):
        numbers = header.increment(self._numbers)
        if len(numbers) > 0: self._numbers = numbers #store numbers for next increment
        return header.formatline('.'.join(map(str, numbers)))


class HeaderLine:
    @classmethod
    def parse(cls, line):
        m = re.match(r'(#+)', line)
        if not m:
            return cls(0, line)

        level = len(m.group(1))
        return cls(level, line[level:].strip())

    def __init__(self, level, text):
        self.level = level
        self.text = text

    def increment(self, numbers):
        numbers.extend([0]*(self.level-len(numbers)))
        if self.level > 0: numbers[self.level-1] += 1
        return numbers[:self.level]

    def formatline(self, secnum):
        return ' '.join(['#' * self.level, secnum, self.text]).strip() + '\n'

This code addresses the three comments above: there is no branching in process_markdown caused by the internal representation, it doesn't violate the Tell, Don't Ask principle, and each line is parsed only once.
However, this version also has problems:
 - The line outfile.write(sections.add_section_number_to_header(HeaderLine.parse(line))) is quite ugly and contains a lot of method calls. Splitting it across multiple lines with local variables doesn't change the picture; we still have one extra method call.
 - If we look closer, there are 3 places where we branch, and the root cause is the same: the line is not a header.
 - The regular expression could be compiled once and reused.
Currently we have a call stack that looks like
process_markdown -> NumberedSections.add_section_number_to_header -> [header.increment, header.formatline].
With this stack order, that's the best I could think of (always open to ideas).
To address these issues we should reorder the class hierarchy and responsibilities. NumberedSections has only one property and is a good candidate to go down in the call stack. (This is more a heuristic than a rule, but classes with fewer properties should go at the end of the call stack.)

A reordered version could look like this:

import re

def process_markdown(infile, outfile):
    header = HeaderLine(NumberedSection([]))

    for line in infile:
        outfile.write(header.process(line) + '\n')

class NumberedSection:
    def __init__(self, numbers):
        self._numbers = numbers

    def increment(self, level):
        self._numbers.extend([0]*(level-len(self._numbers))) # clone self._numbers?
        self._numbers[level-1] += 1
        return NumberedSection(self._numbers[:level])

    def format(self, text):
        return ' '.join(['#' * len(self._numbers), '.'.join([str(n) for n in self._numbers]), text]).strip()

class HeaderLine:
    def __init__(self, section):
        self._section = section
        self._regex = re.compile(r'(#+)')

    def process(self, line):
        m = self._regex.match(line)
        if not m:
            return line.strip()
        level = len(m.group(1))
        self._section = self._section.increment(level)
        return self._section.format(line[level:])

Now the long, ugly line from the previous version looks better: outfile.write(header.process(line) + '\n')
There is no branching propagation.
The regex is reused.
The implementation is also shorter, and there are no shared members between classes :)

Cheers,
Ivan

John Ousterhout

Mar 10, 2026, 11:41:34 AM
to Paul Becker, Shreevatsa R, software-design-book
Here are some comments on "implementation 3".

* Overall, this implementation is still longer and more difficult to understand than implementation 1, and it's not obvious to me that this implementation will be easier to extend in the future than implementation 1 (it's hard to say without knowing what the extensions will be). Thus I'd probably go with implementation 1 and be prepared to refactor in the future if extensions require it.

* The classes still seem awkward to me, as discussed below.

* HeaderLine.parse is effectively two different functions rolled into one. One of the functions tests whether a line is a header line; the other function constructs an object based on the line. Each of these functions is used in only one place; why not separate them into a "check for header" function and a class constructor?

* add_section_number_to_header has secret side effects that are not obvious from its name. Not only is it adding section numbers to a given line, but it is building the tables that allow it to compute section numbers. Thus it only works if called exactly once for each header line, in order. If I don't call it for every header line, or if I call it for a random line extracted from the middle of the file, it won't work. As a result, it is entangled with the loop in process_markdown: it only works if used in the style of that loop.

* add_section_number_to_header reaches into the implementation of HeaderLine to extract the text and level fields. Then it modifies the text field! As a result, the NumberedSections and HeaderLine classes are entangled.

* Overall, the only way for me to understand this code was to load all of it (process_markdown, NumberedSections, and HeaderLine) into my head at once. This means that the classes don't provide much in the way of abstraction or modularity.
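
The order dependence described in the comments above can be seen in a small sketch. The class below is a condensed stand-in for implementation 3's NumberedSections (HeaderLine is folded in for brevity, which is my simplification); the numbering logic is the same.

```python
import re

class NumberedSections:
    # Condensed from implementation 3: the numbering state lives in _numbers.
    def __init__(self):
        self._numbers = []

    def add_section_number_to_header(self, line):
        level = len(re.match(r'(#+)', line).group(1))
        del self._numbers[level:]                             # drop deeper levels
        self._numbers += [0] * (level - len(self._numbers))   # fill skipped levels
        self._numbers[level - 1] += 1
        secnum = '.'.join(map(str, self._numbers))
        return '#' * level + ' ' + secnum + ' ' + line[level:].strip() + '\n'

# Calling the method twice on the same line gives two different answers,
# because every call also advances the hidden counters.
sections = NumberedSections()
first = sections.add_section_number_to_header('# History\n')
second = sections.add_section_number_to_header('# History\n')

# A fresh object given a header from the middle of a file has no context.
orphan = NumberedSections().add_section_number_to_header('## Debugging\n')

print(first, second, orphan, sep='')
# # 1 History
# # 2 History
# ## 0.1 Debugging
```

The method name reads like a pure transformation of one line, but the result is only meaningful when the calls follow document order.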

-John-

Paul Becker

Mar 10, 2026, 7:03:16 PM
to John Ousterhout, Shreevatsa R, software-design-book

Thanks for the comments! Here are some more thoughts---

- I stand by HeaderLine.parse as a single function. This falls under "parse; don't validate". Separating the validation ("is this a header line?") from the parsing (construct a HeaderLine from a line of text) would duplicate both logic and error handling. It allows subtle errors to creep into the code and makes maintenance more difficult. It's better to do both at the same time, and let the user apply the parser however they need to. (Relevant blog post: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/ --- Although it's presented in terms of haskell, the central idea is the same.)

- I love the point about add_section_number_to_header() and the loop. An implied property of NumberedSections is that it is a stream-processor. I think an easy solution is to add an interface comment (in the docstring) explaining that it must be called on header lines in order. If the lines really needed to be random-access, then i would design for that.

- HeaderLine's 'level' and 'text' members are part of its interface. It's a data/serialization class.
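
The "parse; don't validate" point from the first bullet can be sketched as follows. The names Header, parse_header, and header_levels are hypothetical and chosen for illustration; they are not part of any implementation in this thread.

```python
import re
from typing import NamedTuple, Optional

class Header(NamedTuple):  # hypothetical stand-in for HeaderLine
    level: int
    text: str

def parse_header(line: str) -> Optional[Header]:
    """Return a Header if the line starts with '#', otherwise None.

    The check and the construction share one regex match, so the two
    can never disagree about what counts as a header.
    """
    m = re.match(r'(#+)', line)
    if m is None:
        return None
    level = len(m.group(1))
    return Header(level, line[level:].strip())

def header_levels(lines):
    """Return the level of every header line, parsing each line only once."""
    levels = []
    for line in lines:
        header = parse_header(line)   # parse once...
        if header is not None:        # ...and let the result carry the answer
            levels.append(header.level)
    return levels

print(header_levels(['# A\n', 'text\n', '## B\n']))  # [1, 2]
```

A separate is_header() check followed by a constructor would need its own copy of the regex, which is the duplication the bullet warns about.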

As for extensions, the obvious one to me is a table of contents. I went ahead and implemented it on top of both implementation 1 and implementation 3. See below.

One of my big takeaways here is that interfaces take up space, both in the code and in the mind. But they also *create* space. They are boundaries. They delineate concern.

Familiarity is also a big issue. Two different keyboard layouts present exactly the same interface, physically in terms of the keyboard, but i may be able to use one much more easily than the other because of prior experience. I think this concept translates to the interfaces inside of our programs, both the explicit and implicit ones. My perspective, my experience shape my ability. The large and small patterns that i am familiar with affect the readability of all code that i read. A great example is functional and imperative styles.

I wonder if this means that the symptoms of complexity (cognitive load and so on) are, at least in part, different in each programmer. It follows that this would lead to different coding styles.

with curiosity,

paul


# implementation 1 + table of contents
import re, io, shutil

def process_markdown(infile, outfile):
    numbers = []  # [int]. ex: [1, 2, 4] represents '1.2.4'
    toc = []  # [(id, text)]

    # process the file, writing to a buffer
    body = io.StringIO()
    for line in infile:

        # header lines
        m = re.match(r'(#+)', line)
        if m:
            level = len(m.group(1))
            text = line[level:].strip()

            while len(numbers) > level:
                numbers.pop()
            while len(numbers) < level:
                numbers.append(0)
            numbers[level - 1] += 1
            secnum = '.'.join(map(str, numbers))
            
            text = secnum + ' ' + text

            id_ = 'header_' + secnum
            anchor = f'<a name="{id_}" />'
            toc.append((id_, text))

            line = '#'*level + ' ' + anchor + text + '\n'

        body.write(line)

    # write table of contents
    outfile.write('<ul>\n')
    for id_, text in toc:
        outfile.write(f'<li><a href="#{id_}">{text}</a></li>\n')
    outfile.write('</ul>\n')

    # write main content
    body.seek(0)
    shutil.copyfileobj(body, outfile)

if __name__ == '__main__':
    import sys
    process_markdown(sys.stdin, sys.stdout)



# implementation 3 + table of contents
import re, io, shutil

def process_markdown(infile, outfile):
    # preprocessing stage

    sections = NumberedSections()

    body = io.StringIO()
    for line in infile:
        if sections.is_header_line(line):
            line = sections.add_section_number_to_header(line)

        body.write(line)

    # output stage

    outfile.write(sections.make_toc())

    body.seek(0)
    shutil.copyfileobj(body, outfile)

class NumberedSections:
    '''
    adds section numbers to header-lines in a markdown file.
    can also produce a table of contents from these lines.
    '''

    def __init__(self):
        self._numbers = []  # [int]. ex: [1, 2, 4] represents '1.2.4'
        self._toc = []  # [HeaderLine]

    def is_header_line(self, line):
        '''
        is this line suitable as an argument to add_section_number_to_header()?
        return bool
        '''
        return HeaderLine.parse(line) is not None

    def add_section_number_to_header(self, line):
        '''
        line must be a header-line.
        this function must be called on lines in order.
        
        return modified_line
        '''
        header = HeaderLine.parse(line)

        # update _numbers so that it holds this new section number
        assert header.level >= 1
        while len(self._numbers) > header.level:
            self._numbers.pop()
        while len(self._numbers) < header.level:
            self._numbers.append(0)
        self._numbers[header.level - 1] += 1

        secnum = '.'.join(map(str, self._numbers))
        header.text = secnum + ' ' + header.text

        self._toc.append(header.copy())

        id_ = self._get_anchor_id(header)
        anchor = f'<a name="{id_}" />'
        header.text = anchor + header.text

        return header.formatline()

    def _get_anchor_id(self, header):
        return 'header_' + header.text.split()[0]

    def make_toc(self):
        '''
        return html
        '''
        r = ['<ul>']
        for header in self._toc:
            id_ = self._get_anchor_id(header)
            r.append(f'<li><a href="#{id_}">{header.text}</a></li>')
        r += ['</ul>', '']
        return '\n'.join(r)

class HeaderLine:
    '''
    represents a parsed markdown header line.

    .level = int  # 1 is the highest level, then 2, 3, and so on
    .text = str
    '''

    @classmethod
    def parse(cls, line):
        '''
        return HeaderLine or None
        None indicates the line failed to parse as a header.
        '''
        m = re.match(r'(#+)', line)
        if not m:
            return None
        level = len(m.group(1))
        return cls(level, line[level:].strip())

    def __init__(self, level, text):
        self.level = level
        self.text = text

    def copy(self):
        '''
        return HeaderLine
        '''
        return HeaderLine(self.level, self.text)

    def formatline(self):
        '''
        return string with newline
        '''
        return '#' * self.level + ' ' + self.text + '\n'

if __name__ == '__main__':
    import sys
    process_markdown(sys.stdin, sys.stdout)

Felix Leipold

Mar 13, 2026, 11:25:57 AM
to Paul Becker, John Ousterhout, Shreevatsa R, software-design-book
This is a really interesting example, because it is simple enough to write it all "inline", but still there is some awkward mixing of concerns. Looking at the state of the program there is the numbers variable that has got a very wide interface (essentially one can mutate, add and remove elements to one's heart's content) even though we only intend to interact with it in a very limited way. Implementation 2 addresses exactly this issue with its NumberedSections class. However, it fails to call out what this thing actually does. Specifically, it does not represent a section, nor a collection of sections. Documentation would surely be nice, but a better name would probably already help. The name that came to my mind is MultiLevelCounter. I have already written similar code in release scripts to bump version numbers (bumpMajor, bumpMinor, bumpPatch), so this kind of counting is not exclusive to section numbering.
The MarkdownHeader class adds relatively little and only introduces new vocabulary. So, I would suggest Solution 1' which only extracts the counting:

import re


class MultiLevelCounter:
    """MultiLevelCounter is a counter that has multiple levels

    It can be used for numbering sections of a text document or for version numbers
    following the semver scheme.

    Examples:

        >>> counter = MultiLevelCounter()
        >>> counter.increment_at(1)
        >>> print(counter.current_str())
        1
        >>> counter.increment_at(2)
        >>> print(counter.current_str())
        1.1
        >>> counter.increment_at(1)
        >>> print(counter.current_str())
        2
        >>> counter.increment_at(2)
        >>> print(counter.current_str())
        2.1
    """

    def __init__(self):
        self.numbers = []

    def increment_at(self, level: int) -> None:
        """Increment the counter at the given level (starting with 1)"""

        while len(self.numbers) > level:
            self.numbers.pop()
        while len(self.numbers) < level:
            self.numbers.append(0)

        self.numbers[level - 1] += 1

    def current_str(self) -> str:
        """Returns the current counter value as string, e.g. "1.2.3" """
        return '.'.join(map(str, self.numbers))


def process_markdown(infile, outfile):
    counter: MultiLevelCounter = MultiLevelCounter()


    for line in infile:
        m = re.match(r'(#+)', line)
        if m:
            level = len(m.group(1))
            counter.increment_at(level)
            line = line[:level] + ' ' + counter.current_str() + line[level:]


        outfile.write(line)


if __name__ == '__main__':
    import sys

    process_markdown(sys.stdin, sys.stdout)

Observations

Subjectively, I find that I can read the process_markdown function without knowing how the counter works internally. 

Having the counting separated out, one can also more easily come up with more advanced numbering schemes, e.g.  there could be an initial value. One could also enforce certain rules, like not allowing incrementing at level 3, when level 2 is not set.
One can also more easily give examples or write tests against the numbering logic. It ties in nicely with "General Purpose Modules are Deeper". Generalising a (sub) problem may not only lead to reuse, but also (and arguably more importantly)  guide us to a cleaner implementation.
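
As a rough sketch of those two ideas (an initial value, and refusing to skip a level), here is a hypothetical variant of the counter. The class name, the start parameter, and the choice of ValueError are all assumptions made for illustration.

```python
class StrictMultiLevelCounter:
    """Hypothetical variant of MultiLevelCounter with a configurable first
    value per level and a rule against skipping levels."""

    def __init__(self, start=1):
        self._start = start
        self._numbers = []

    def increment_at(self, level):
        """Advance the counter at `level` (1 = top level)."""
        if level < 1:
            raise ValueError('level must be >= 1')
        if level > len(self._numbers) + 1:
            # e.g. asking for level 3 when only level 1 has been set
            raise ValueError(f'cannot jump to level {level}')
        if level == len(self._numbers) + 1:
            self._numbers.append(self._start)   # entering a new, deeper level
        else:
            del self._numbers[level:]           # leaving deeper levels
            self._numbers[level - 1] += 1

    def current_str(self):
        return '.'.join(map(str, self._numbers))


# Because the counter stands alone, it can be tested without any Markdown.
counter = StrictMultiLevelCounter()
results = []
for level in (1, 2, 2, 1):
    counter.increment_at(level)
    results.append(counter.current_str())
print(results)  # ['1', '1.1', '1.2', '2']
```

Whether a skipped level should raise, or be filled with zeros as in the versions above, is a policy decision that now lives in one place.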

There are two questions that help me separate out functionality from a "script style" program:
* What kind of (hypothetical) general purpose library would make writing this script easier?
* Would I be interested in writing unit tests against the extracted bit of functionality?

On a side note: I also find Scala makes for good executable pseudo code with the added bonus that it is faster and "safer" than python:
import scala.collection.mutable

class MultiLevelCounter:
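   // ArrayBuffer shrinks at the end; mutable.Stack.pop() would remove the head (the top-level counter).
   private val numbers: mutable.ArrayBuffer[Int] = mutable.ArrayBuffer.empty

   def increment_at(level: Int): Unit =
      while numbers.size > level do numbers.remove(numbers.size - 1)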

      while numbers.size < level do numbers.append(0)

      numbers(level - 1) += 1

   def currentString: String = numbers.mkString(".")

@main
def numberedMarkdown(): Unit =
   val headerPattern = "(#+)".r

   val counter = MultiLevelCounter()
   for (line <- scala.io.Source.stdin.getLines())
      println(headerPattern.findPrefixOf(line) match
         case Some(result) =>
            val level = result.length
            counter.increment_at(level)
            line.take(level) + " " + counter.currentString + line.drop(level)
         case None => line
      )

Best regards,

Felix

