
ENB: An alternative to g.splitLines


Edward K. Ream

Jan 14, 2025, 10:24:27 AM
to leo-editor

The fix to a bug in Leo's Rust importer requires an alternative to g.splitLines:


def splitLines(s: str) -> list[str]:
    return s.splitlines(True) if s else []


The problem arises from str.splitlines, which splits at more than just newlines. Importers should split lines only at newlines; otherwise a crucial invariant fails.
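To make the difference concrete, here is a small illustration. The input string is made up for this example (it is not taken from Leo's Rust importer or its tests); it only shows that str.splitlines breaks a line at a form-feed, so the number of pieces no longer matches the number of newline-terminated lines.

```python
# Hypothetical input: a form feed (\x0c) sits inside the first line.
# The only real line endings are the two '\n' characters.
s = "fn main() {\x0c}\nlet x = 1;\n"

# keepends=True, the same call that g.splitLines makes.
by_splitlines = s.splitlines(True)

# Number of newline-terminated lines in the text.
newline_count = s.count("\n")

print(by_splitlines)
print(len(by_splitlines), newline_count)  # 3 pieces, but only 2 newlines
```

Joining the pieces still reproduces the original string, so the mismatch shows up only in code that counts or indexes lines.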


A new function is required: g.splitLinesAtNewline. This function splits lines only at newlines. It does not split lines at form-feeds and other unusual line-ending characters. 


This function is surprisingly complicated. I had to develop it using a new unit test.


Summary


All of Leo's importers use g.splitLinesAtNewline.


g.splitLines will remain as it is: any change would be a breaking change to Leo's API.


Edward


P.S. Here is g.splitLinesAtNewline:


def splitLinesAtNewline(s: str) -> list[str]:
    """
    Split lines *only* at '\n', preserving form-feeds and other unusual line-ending characters.
    """
    if not s:
        return []
    lines = s.split(sep='\n')
    if lines[-1] == '':
        lines.pop()
    lines = [f"{z}\n" for z in lines]
    if not s.endswith('\n'):
        lines[-1] = lines[-1][:-1]
    return lines
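The checks below are a sketch of the kind of cases a unit test for this function might cover. They are my own hypothetical cases, not Leo's actual unit test. The function body is copied from above so that the block runs on its own.

```python
def splitLinesAtNewline(s: str) -> list[str]:
    # Copied from the post so this sketch is self-contained.
    if not s:
        return []
    lines = s.split(sep='\n')
    if lines[-1] == '':
        lines.pop()
    lines = [f"{z}\n" for z in lines]
    if not s.endswith('\n'):
        lines[-1] = lines[-1][:-1]
    return lines

# Hypothetical test cases: (input, expected result).
cases = [
    ("", []),
    ("\n", ["\n"]),
    ("a", ["a"]),
    ("a\n", ["a\n"]),
    ("a\nb", ["a\n", "b"]),
    ("a\n\nb\n", ["a\n", "\n", "b\n"]),
    ("a\x0cb\nc", ["a\x0cb\n", "c"]),  # A form feed stays inside its line.
    ("a\r\nb\n", ["a\r\n", "b\n"]),    # A carriage return is not a split point.
]

for s, expected in cases:
    result = splitLinesAtNewline(s)
    assert result == expected, (s, result)
    # Joining the lines reproduces the input exactly.
    assert "".join(result) == s
    # One line per newline, plus one for unterminated trailing text.
    trailing = 0 if (s == "" or s.endswith("\n")) else 1
    assert len(result) == s.count("\n") + trailing
```

The last assertion is the property that str.splitlines does not guarantee: the line count depends only on the newlines in the text.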


Notes


The "s" arg must be a unicode string. Bytes args are not allowed. However, all tests pass with g.splitLines defined this way:


def splitLines(s: str) -> list[str]:
    return splitLinesAtNewline(g.toUnicode(s))


To repeat, I am not going to change g.splitLines!


EKR
