Here are two implementations in Python of a program that takes a markdown file as input and outputs the same file with each header line prefixed by a section number. A sample markdown file is included at the end and can be used as input.
Which implementation is simpler? Why?
Any other comments?
# implementation 1
import re

def process_markdown(infile, outfile):
    numbers = []
    for line in infile:
        m = re.match(r'(#+)', line)
        if m:
            level = len(m.group(1))
            while len(numbers) > level:
                numbers.pop()
            while len(numbers) < level:
                numbers.append(0)
            numbers[level - 1] += 1
            prefix = '.'.join(map(str, numbers))
            line = line[:level] + prefix + line[level:]
        outfile.write(line)

if __name__ == '__main__':
    import sys
    process_markdown(sys.stdin, sys.stdout)
# implementation 2
import re

def process_markdown(infile, outfile):
    sections = NumberedSections()
    for line in infile:
        header = MarkdownHeader.parse(line)
        if header:
            number = sections.next_at_level(header.level)
            line = header.with_prefix(number)
        outfile.write(line)

class MarkdownHeader:
    @classmethod
    def parse(cls, line):
        m = re.match(r'(#+)', line)
        if not m:
            return None
        level = len(m.group(1))
        return cls(level, line)

    def __init__(self, level, line):
        self.level = level
        self.line = line

    def with_prefix(self, prefix):
        return self.line[:self.level] + prefix + self.line[self.level:]

class NumberedSections:
    def __init__(self):
        self.numbers = []

    def next_at_level(self, level):
        assert level >= 1
        while len(self.numbers) > level:
            self.numbers.pop()
        while len(self.numbers) < level:
            self.numbers.append(0)
        self.numbers[level - 1] += 1
        return '.'.join(map(str, self.numbers))

if __name__ == '__main__':
    import sys
    process_markdown(sys.stdin, sys.stdout)
<sample markdown file>
**Computer programming**
adapted from [wikipedia](https://en.wikipedia.org/wiki/Computer_programming)
Computer programming or coding is the composition of sequences of instructions, called programs, that computers can follow to perform tasks.
# History
Programmable devices have existed for centuries.
## Machine language
Machine code was the language of early programs, written in the instruction set of the particular machine, often in binary notation.
## Compiler languages
High-level languages made the process of developing a program simpler and more understandable, and less bound to the underlying hardware.
## Source code entry
Programs were mostly entered using punched cards or paper tape.
# Modern programming
## Quality requirements
Whatever the approach to development may be, the final program must satisfy some fundamental properties.
## Readability of source code
In computer programming, readability refers to the ease with which a human reader can comprehend the purpose, control flow, and operation of source code.
## Algorithmic complexity
The academic field and the engineering practice of computer programming are concerned with discovering and implementing the most efficient algorithms for a given class of problems.
## Methodologies
The first step in most formal software development processes is requirements analysis, followed by testing to determine value modeling, implementation, and failure elimination (debugging).
## Measuring language usage
It is very difficult to determine what are the most popular modern programming languages.
## Debugging
Debugging is a very important task in the software development process since having defects in a program can have significant consequences for its users.
# Programming languages
Different programming languages support different styles of programming (called programming paradigms).
# Learning to program
Learning to program has a long history related to professional standards and practices, academic initiatives and curriculum, and commercial books and materials for students, self-taught learners, hobbyists, and others who desire to create or customize software for personal use.
## Context
In 1957, there were approximately 15,000 computer programmers employed in the U.S., a figure that accounts for 80% of the world's active developers.
## Technical publishers
As personal computers became mass-market products, thousands of trade books and magazines sought to teach professional, hobbyist, and casual users to write computer programs.
## Digital learning / online resources
Between 2000 and 2010, computer book and magazine publishers declined significantly as providers of programming instruction, as programmers moved to Internet resources to expand their access to information.
# Programmers
Computer programmers are those who write computer software.
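As a reference point for the implementations above, here is a minimal sketch of the numbering rule they share, reduced to a function over header levels. The name `section_numbers` is hypothetical and does not appear in either implementation; it only isolates the counting so the expected section numbers are easy to check by hand.

```python
def section_numbers(levels):
    """Return the dotted section number for each header level, in order."""
    numbers = []  # e.g. [1, 2] represents '1.2'
    result = []
    for level in levels:
        # drop counters deeper than this level, then pad up to it with zeros
        del numbers[level:]
        while len(numbers) < level:
            numbers.append(0)
        numbers[level - 1] += 1
        result.append('.'.join(map(str, numbers)))
    return result


if __name__ == '__main__':
    # levels of the first five headers in the sample file
    print(section_numbers([1, 2, 2, 2, 1]))  # ['1', '1.1', '1.2', '1.3', '2']
    # a file whose first header is '##' never increments level 1
    print(section_numbers([2]))  # ['0.1']
```

The second call shows a property of the rule worth knowing about: a header that skips a level gets a 0 in the skipped position.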
--
You received this message because you are subscribed to the Google Groups "software-design-book" group.
To unsubscribe from this group and stop receiving emails from it, send an email to software-design-...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/software-design-book/CAKEM%3DPMR2dA%2Be3ppv0ti7_Rgmz5cy87E5UvOhWtSW40ZtkdSXA%40mail.gmail.com.
Some thoughts---
I shared the original post with a friend, and part of his
response was:
If you have new preprocessing to add, say link checks or style
enforcement, which version would you want to start with?
To my mind, this example illustrates the difference between
programming (fitness to one specific purpose) and software
engineering (that purpose in the presence of changes integrated
over time).
(emphasis mine)
What are the requirements for this code?
This is an actual program that i wrote because i needed it, and then i realized it might make a good example, in miniature, of the work of software design.
I think implementation 1 would probably be fine if it had comments, and the program's functionality never needed to be changed. This is a realistic possibility. I deliberately omitted documentation of any kind (except for names) to highlight the structure and comprehensibility of the code.
I wrote implementation 2 because i could see that the concerns of file processing, section numbering, and header line format are interwoven in imp1. It's certainly small enough of a program that they don't need to be separated. But it's a small program now, and it may not be in the future. I want to build the habit of getting from small programs to large programs, with good form. Whether i factor it now or later is irrelevant to me in this exercise---as is the scale of the individual components---i am interested in the process, skill, and wisdom involved in going from a small program to a large program, from one module to many modules. How do i establish the boundaries of components? How do i know when something should be private or public? How do i simplify rather than constrain and complicate? These are the lessons i want to internalize and be able to talk about.
I think that the abstractions i used in implementation 2 were not very good. Specifically, MarkdownHeader was not general enough, and NumberedSections should have hidden its use of 'level' within its implementation. Imp2 arose from factoring the previous implementation, with an eye on separation of concerns but with too global a perspective. I didn't think about how hard it would be to understand process_markdown() by itself, or what information NumberedSections should be hiding from the rest of the program. I also didn't think about future extension.
It feels important to point out that i didn't come to these conclusions on my own. It was only after showing my code to others that i began to see what its problems were.
This has led me to another implementation:
# implementation 3
import re

def process_markdown(infile, outfile):
    sections = NumberedSections()
    for line in infile:
        if sections.is_header_line(line):
            line = sections.add_section_number_to_header(line)
        outfile.write(line)

class NumberedSections:
    def __init__(self):
        self._numbers = []

    def is_header_line(self, line):
        return HeaderLine.parse(line) is not None

    def add_section_number_to_header(self, line):
        header = HeaderLine.parse(line)
        assert header.level >= 1
        while len(self._numbers) > header.level:
            self._numbers.pop()
        while len(self._numbers) < header.level:
            self._numbers.append(0)
        self._numbers[header.level - 1] += 1
        secnum = '.'.join(map(str, self._numbers))
        header.text = secnum + ' ' + header.text
        return header.formatline()

class HeaderLine:
    @classmethod
    def parse(cls, line):
        m = re.match(r'(#+)', line)
        if not m:
            return None
        level = len(m.group(1))
        return cls(level, line[level:].strip())

    def __init__(self, level, text):
        self.level = level
        self.text = text

    def formatline(self):
        return '#' * self.level + ' ' + self.text + '\n'

if __name__ == '__main__':
    import sys
    process_markdown(sys.stdin, sys.stdout)
Looking forward to anything that this stirs up for you.
Thank you,
paul
Thanks for the comments! Here are some more thoughts---
- I stand by HeaderLine.parse as a single function. This falls
under "parse; don't validate". Separating the validation ("is this
a header line?") from the parsing (construct a HeaderLine from a
line of text) would duplicate both logic and error handling. It
allows subtle errors to creep into the code and makes maintenance
more difficult. It's better to do both at the same time, and let
the user apply the parser however they need to. (Relevant blog
post:
https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/
--- Although it's presented in terms of Haskell, the central idea
is the same.)
- I love the point about add_section_number_to_header() and the
loop. An implied property of NumberedSections is that it is a
stream-processor. I think an easy solution is to add an interface
comment (in the docstring) explaining that it must be called on
header lines in order. If the lines really needed to be
random-access, then i would design for that.
- HeaderLine's 'level' and 'text' members are part of its
interface. It's a data/serialization class.
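To make the "parse; don't validate" point concrete, here is a small sketch of a caller that parses each line once and acts on the result, rather than asking "is this a header?" and then parsing again. The HeaderLine class below is a trimmed stand-in for the one in implementation 3, and `header_levels` is a hypothetical helper invented for this illustration.

```python
import re


class HeaderLine:
    """Trimmed stand-in for the HeaderLine class in implementation 3."""

    def __init__(self, level, text):
        self.level = level
        self.text = text

    @classmethod
    def parse(cls, line):
        """Return a HeaderLine, or None if the line is not a header."""
        m = re.match(r'(#+)', line)
        if not m:
            return None
        level = len(m.group(1))
        return cls(level, line[level:].strip())


def header_levels(lines):
    """Collect (level, text) for each header; every line is parsed exactly once."""
    found = []
    for line in lines:
        header = HeaderLine.parse(line)
        if header is not None:
            # the parsed object carries everything the caller needs,
            # so there is no second check that could drift out of sync
            found.append((header.level, header.text))
    return found


if __name__ == '__main__':
    sample = ['# History\n', 'Some text\n', '## Machine language\n']
    print(header_levels(sample))  # [(1, 'History'), (2, 'Machine language')]
```

The None return is the only "validation" the caller ever sees, and it comes from the same code path that builds the object.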
As for extensions, the obvious one to me is a table of contents. I
went ahead and implemented it on top of both implementation 1 and
implementation 3. See below.
One of my big takeaways here is that interfaces take up space,
both in the code and in the mind. But they also *create* space.
They are boundaries. They delineate concern.
Familiarity is also a big issue. Two different keyboard layouts
present exactly the same interface, physically in terms of the
keyboard, but i may be able to use one much more easily than the
other because of prior experience. I think this concept translates
to the interfaces inside of our programs, both the explicit and
implicit ones. My perspective, my experience shape my ability. The
large and small patterns that i am familiar with affect the
readability of all code that i read. A great example is functional
and imperative styles.
I wonder if this means that the symptoms of complexity (cognitive
load and so on) are, at least in part, different in each
programmer. It follows that this would lead to different coding
styles.
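As one illustration of the functional/imperative contrast, here is a sketch of the same section-numbering rule written in a functional style, with an immutable tuple as state and a pure step function. The names `next_numbers` and `section_numbers_functional` are made up for this example; whether it reads more easily than the while loops probably depends on which style the reader already knows. It assumes Python 3.8 or later for the `initial` argument of `itertools.accumulate`.

```python
from itertools import accumulate


def next_numbers(numbers, level):
    """Pure step: return the section number (a tuple) that follows `numbers`.

    Assumes level >= 1, as the implementations above do.
    """
    # truncate to `level` entries, or pad with zeros up to `level`
    padded = numbers[:level] + (0,) * (level - len(numbers))
    return padded[:-1] + (padded[-1] + 1,)


def section_numbers_functional(levels):
    """Return the dotted section number for each header level, in order."""
    states = accumulate(levels, next_numbers, initial=())
    # the first state is the empty initial value, so skip it
    return ['.'.join(map(str, s)) for s in list(states)[1:]]


if __name__ == '__main__':
    print(section_numbers_functional([1, 2, 2, 1]))  # ['1', '1.1', '1.2', '2']
```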
with curiosity,
paul
# implementation 1 + table of contents
import re, io, shutil

def process_markdown(infile, outfile):
    numbers = []  # [int]. ex: [1, 2, 4] represents '1.2.4'
    toc = []  # [(id, text)]

    # process the file, writing to a buffer
    body = io.StringIO()
    for line in infile:
        # header lines
        m = re.match(r'(#+)', line)
        if m:
            level = len(m.group(1))
            text = line[level:].strip()
            while len(numbers) > level:
                numbers.pop()
            while len(numbers) < level:
                numbers.append(0)
            numbers[level - 1] += 1
            secnum = '.'.join(map(str, numbers))
            text = secnum + ' ' + text
            id_ = 'header_' + secnum
            anchor = f'<a name="{id_}" />'
            toc.append((id_, text))
            line = '#'*level + ' ' + anchor + text + '\n'
        body.write(line)

    # write table of contents
    outfile.write('<ul>\n')
    for id_, text in toc:
        outfile.write(f'<li><a href="#{id_}">{text}</a></li>\n')
    outfile.write('</ul>\n')

    # write main content
    body.seek(0)
    shutil.copyfileobj(body, outfile)

if __name__ == '__main__':
    import sys
    process_markdown(sys.stdin, sys.stdout)
# implementation 3 + table of contents
import re, io, shutil

def process_markdown(infile, outfile):
    # preprocessing stage
    sections = NumberedSections()
    body = io.StringIO()
    for line in infile:
        if sections.is_header_line(line):
            line = sections.add_section_number_to_header(line)
        body.write(line)

    # output stage
    outfile.write(sections.make_toc())
    body.seek(0)
    shutil.copyfileobj(body, outfile)

class NumberedSections:
    '''
    adds section numbers to header-lines in a markdown file.
    can also produce a table of contents from these lines.
    '''
    def __init__(self):
        self._numbers = []  # [int]. ex: [1, 2, 4] represents '1.2.4'
        self._toc = []  # [HeaderLine]

    def is_header_line(self, line):
        '''
        is this line suitable as an argument to add_section_number_to_header()?
        return bool
        '''
        return HeaderLine.parse(line) is not None

    def add_section_number_to_header(self, line):
        '''
        line must be a header-line.
        this function must be called on lines in order.
        return modified_line
        '''
        header = HeaderLine.parse(line)

        # update _numbers so that it holds this new section number
        assert header.level >= 1
        while len(self._numbers) > header.level:
            self._numbers.pop()
        while len(self._numbers) < header.level:
            self._numbers.append(0)
        self._numbers[header.level - 1] += 1

        secnum = '.'.join(map(str, self._numbers))
        header.text = secnum + ' ' + header.text
        self._toc.append(header.copy())

        id_ = self._get_anchor_id(header)
        anchor = f'<a name="{id_}" />'
        header.text = anchor + header.text
        return header.formatline()

    def _get_anchor_id(self, header):
        return 'header_' + header.text.split()[0]

    def make_toc(self):
        '''
        return html
        '''
        r = ['<ul>']
        for header in self._toc:
            id_ = self._get_anchor_id(header)
            r.append(f'<li><a href="#{id_}">{header.text}</a></li>')
        r += ['</ul>', '']
        return '\n'.join(r)

class HeaderLine:
    '''
    represents a parsed markdown header line.
    .level = int  # 1 is the highest level, then 2, 3, and so on
    .text = str
    '''
    @classmethod
    def parse(cls, line):
        '''
        return HeaderLine or None
        None indicates the line failed to parse as a header.
        '''
        m = re.match(r'(#+)', line)
        if not m:
            return None
        level = len(m.group(1))
        return cls(level, line[level:].strip())

    def __init__(self, level, text):
        self.level = level
        self.text = text

    def copy(self):
        '''
        return HeaderLine
        '''
        return HeaderLine(self.level, self.text)

    def formatline(self):
        '''
        return string with newline
        '''
        return '#' * self.level + ' ' + self.text + '\n'

if __name__ == '__main__':
    import sys
    process_markdown(sys.stdin, sys.stdout)
The MarkdownHeader class adds relatively little and only introduces new vocabulary. So, I would suggest Solution 1' which only extracts the counting:
import re

class MultiLevelCounter:
    """MultiLevelCounter is a counter that has multiple levels

    It can be used for numbering sections of a text document or for version numbers
    following the semver scheme.

    Examples:
    >>> counter = MultiLevelCounter()
    >>> counter.increment_at(1)
    >>> print(counter.current_str())
    1
    >>> counter.increment_at(2)
    >>> print(counter.current_str())
    1.1
    >>> counter.increment_at(1)
    >>> print(counter.current_str())
    2
    >>> counter.increment_at(2)
    >>> print(counter.current_str())
    2.1
    """
    def __init__(self):
        self.numbers = []

    def increment_at(self, level: int) -> None:
        """Increment the counter at the given level (starting with 1)"""
        assert level >= 1
        while len(self.numbers) > level:
            self.numbers.pop()
        while len(self.numbers) < level:
            self.numbers.append(0)
        self.numbers[level - 1] += 1

    def current_str(self) -> str:
        """Returns the current counter value as string, e.g. "1.2.3" """
        return '.'.join(map(str, self.numbers))

def process_markdown(infile, outfile):
    counter: MultiLevelCounter = MultiLevelCounter()
    for line in infile:
        m = re.match(r'(#+)', line)
        if m:
            level = len(m.group(1))
            counter.increment_at(level)
            line = line[:level] + ' ' + counter.current_str() + line[level:]
        outfile.write(line)

if __name__ == '__main__':
    import sys
    process_markdown(sys.stdin, sys.stdout)
import scala.collection.mutable

class MultiLevelCounter:
  // ArrayBuffer rather than Stack: in Scala 2.13+, Stack.pop() removes the
  // head while append() adds at the end, so the two would not pair up here
  private val numbers: mutable.ArrayBuffer[Int] = mutable.ArrayBuffer.empty

  def increment_at(level: Int): Unit =
    while numbers.size > level do numbers.dropRightInPlace(1)
    while numbers.size < level do numbers.append(0)
    numbers(level - 1) += 1

  def currentString: String = numbers.mkString(".")

@main
def numberedMarkdown(): Unit =
  // anchored at the start of the line, like re.match in the Python versions
  val headerPattern = "^#+".r
  val counter = MultiLevelCounter()
  for (line <- scala.io.Source.stdin.getLines())
    println(headerPattern.findFirstIn(line) match
      case Some(result) =>
        val level = result.length
        counter.increment_at(level)
        line.take(level) + " " + counter.currentString + line.drop(level)
      case None => line
    )