The fix to a bug in Leo's Rust importer requires an alternative to g.splitLines:
def splitLines(s: str) -> list[str]:
    return s.splitlines(True) if s else []
The problem arises from str.splitlines. Importers should split lines only at newlines. A crucial invariant fails otherwise.
A new function is required: g.splitLinesAtNewline. This function splits lines only at newlines. It does not split lines at form-feeds and other unusual line-ending characters.
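To illustrate the difference, here is a minimal sketch. The helper name split_at_newline_only, its regex-based approach, and the sample string are assumptions made for this illustration; none of them are part of Leo.

```python
import re

def split_at_newline_only(s: str) -> list[str]:
    # Hypothetical helper for illustration: each match is either a run of
    # non-newline characters ending in '\n', or a final unterminated run.
    return re.findall(r'[^\n]*\n|[^\n]+', s)

# Two real lines, each containing an unusual separator:
# a form-feed (\x0c) in the first, a file separator (\x1c) in the second.
sample = "fn a() {}\x0cfn b() {}\nlet x = 1;\x1clet y = 2;\n"

by_splitlines = sample.splitlines(True)
by_newline = split_at_newline_only(sample)

print(len(by_splitlines))  # 4: str.splitlines also splits at \x0c and \x1c
print(len(by_newline))     # 2: one entry per '\n'
```

Both results join back to the original string, but only the second has exactly one entry per newline-terminated line.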
This function is surprisingly complicated. I had to develop it using a new unit test.
Summary
All of Leo's importers use g.splitLinesAtNewline.
g.splitLines will remain as it is: any change would be a breaking change to Leo's API.
Edward
P.S. Here is g.splitLinesAtNewline:
def splitLinesAtNewline(s: str) -> list[str]:
    """
    Split lines *only* at '\n', preserving form-feeds and other unusual line-ending characters.
    """
    if not s:
        return []
    lines = s.split(sep='\n')
    if lines[-1] == '':
        lines.pop()
    lines = [f"{z}\n" for z in lines]
    if not s.endswith('\n'):
        lines[-1] = lines[-1][:-1]
    return lines
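For readers who want to check the edge cases, here is a minimal sketch of a unit test for the function above. It is not Leo's actual unit test: the class name and the table of cases are assumptions chosen for illustration, and the function is copied in only to make the sketch self-contained.

```python
import unittest

def splitLinesAtNewline(s: str) -> list[str]:
    # Copy of the function above, repeated so this sketch runs on its own.
    if not s:
        return []
    lines = s.split(sep='\n')
    if lines[-1] == '':
        lines.pop()
    lines = [f"{z}\n" for z in lines]
    if not s.endswith('\n'):
        lines[-1] = lines[-1][:-1]
    return lines

class TestSplitLinesAtNewline(unittest.TestCase):  # Hypothetical test class.

    def test_split_only_at_newline(self):
        table = (
            ('', []),
            ('a', ['a']),                  # No trailing newline.
            ('a\n', ['a\n']),              # Trailing newline.
            ('a\nb', ['a\n', 'b']),
            ('\n', ['\n']),                # A single blank line.
            ('\n\n', ['\n', '\n']),
            ('a\x0cb\n', ['a\x0cb\n']),    # A form-feed does not split.
            ('a\r\nb', ['a\r\n', 'b']),    # '\r' stays with its line.
        )
        for s, expected in table:
            with self.subTest(s=s):
                result = splitLinesAtNewline(s)
                self.assertEqual(result, expected)
                # Joining the lines must reproduce the input exactly.
                self.assertEqual(''.join(result), s)
```

Saved to a file, the sketch can be run with python -m unittest.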
Notes
The "s" argument must be a unicode string; bytes arguments are not allowed. However, all tests pass with g.splitLines defined this way:
def splitLines(s: str) -> list[str]:
    return splitLinesAtNewline(g.toUnicode(s))
To repeat, I am not going to change g.splitLines!
EKR