Tools v2.x issues


Vince

Jun 17, 2021, 1:59:40 AM
to Standard Ebooks
A couple of things in 2.x.

1. build-manifest and build-spine are now giving stack dumps on the same error that lint (and build-toc and other commands) pretty-prints.
From lint:
 Error  Couldn’t parse XML in /Users/vrice/Library/Mobile Documents/com~apple~CloudDocs/Books/willa-cather_one-of-ours/src/epub/text/chapter-1-19.xhtml. Exception: Opening and ending tag mismatch: p line 51 and section, line 64, column 14 (<string>, line 64)

From build-manifest or build-spine:
Traceback (most recent call last):
  File "/Users/vrice/setools/se/easy_xml.py", line 69, in __init__
    self.etree = etree.fromstring(str.encode(xml_string))
  File "src/lxml/etree.pyx", line 3237, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1896, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1784, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1141, in lxml.etree._BaseParser._parseDoc
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "<string>", line 64
lxml.etree.XMLSyntaxError: Opening and ending tag mismatch: p line 51 and section, line 64, column 14

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/vrice/setools/se/se_epub.py", line 411, in get_dom
    self._dom_cache[file_path_str] = se.easy_xml.EasyXmlTree(file_contents)
  File "/Users/vrice/setools/se/easy_xml.py", line 71, in __init__
    raise se.InvalidXmlException(f"Couldn’t parse XML. Exception: {ex}") from ex
se.InvalidXmlException: Couldn’t parse XML. Exception: Opening and ending tag mismatch: p line 51 and section, line 64, column 14 (<string>, line 64)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/se", line 33, in <module>
    sys.exit(load_entry_point('standardebooks', 'console_scripts', 'se')())
  File "/Users/vrice/setools/se/main.py", line 81, in main
    sys.exit(getattr(module, command_function)(args.plain_output))
  File "/Users/vrice/setools/se/commands/build_manifest.py", line 40, in build_manifest
    node.replace_with(se_epub.generate_manifest())
  File "/Users/vrice/setools/se/se_epub.py", line 922, in generate_manifest
    dom = self.get_dom(file_path)
  File "/Users/vrice/setools/se/se_epub.py", line 423, in get_dom
    raise se.InvalidXhtmlException(f"Couldn’t parse XML in [path][link=file://{file_path.resolve()}]{file_path}[/][/]. Exception: {ex.__cause__}") from ex
se.InvalidXhtmlException: Couldn’t parse XML in [path][link=file:///Users/vrice/Library/Mobile Documents/com~apple~CloudDocs/Books/willa-cather_one-of-ours/src/epub/text/chapter-1-19.xhtml]/Users/vrice/Library/Mobile Documents/com~apple~CloudDocs/Books/willa-cather_one-of-ours/src/epub/text/chapter-1-19.xhtml[/][/]. Exception: Opening and ending tag mismatch: p line 51 and section, line 64, column 14 (<string>, line 64)

It was too late to try to figure out why; I’ll try to look at it this weekend if you don’t get to it first.
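The difference between the two outputs above comes down to whether the command catches the wrapped exception before it reaches the interpreter. A minimal sketch of that pattern, assuming hypothetical names (`InvalidXhtmlError`, `parse_file`, `run_command`) that are stand-ins for illustration and not the toolset's actual code, and using the standard library parser instead of lxml:

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


class InvalidXhtmlError(Exception):
    """Hypothetical stand-in for a toolset-specific parse exception."""


def parse_file(path: Path) -> ET.Element:
    """Parse an XML file, wrapping parser errors in a domain exception."""
    try:
        return ET.fromstring(path.read_text(encoding="utf-8"))
    except ET.ParseError as ex:
        # "from ex" keeps the original parser error as the cause.
        raise InvalidXhtmlError(f"Couldn't parse XML in {path}. Exception: {ex}") from ex


def run_command(paths: list[Path]) -> int:
    """Return 0 on success; print a one-line error and return 1 on a parse failure."""
    try:
        for path in paths:
            parse_file(path)
    except InvalidXhtmlError as ex:
        # Catching the exception here is what replaces the chained
        # traceback with a single readable message.
        print(f"Error: {ex}", file=sys.stderr)
        return 1
    return 0
```

If a command skips the `except InvalidXhtmlError` step, the same error surfaces as the chained traceback shown above.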

2. When doing a build with the check, e.g. se build -c ., any error apparently means the standard .epub is no longer saved (the advanced .epub is saved). This is … annoying. I routinely build the epub before I have cover art, and consequently I get the “COVER_ARTIST_WIKI_URL could not be found” error. But until now, I still got an epub, which is good, because I don’t care about that error at this point. But with 2.x, we no longer get an epub, so I have to run build again without the check to get one. Why are epubcheck errors causing the epub to be deleted?

Alex Cabal

Jun 17, 2021, 11:52:06 AM
to standar...@googlegroups.com
The idea is that if there's an error in the epub, you want to fix it
before getting an output file. Most errors are more serious than a
missing variable but the toolset can't know that. If you want an ebook
file regardless of errors, then just omit -c.


Vince

Jun 17, 2021, 12:24:04 PM
to standar...@googlegroups.com
Yes, which means it has to be run twice. (I want the check, because I want to know if there are other problems I don’t know about.)

There is no harm in keeping the output file; the tools have been keeping it the entire time until now. If we don’t want the file, we can delete it. We can delete it; there is no reason for the toolset to do so. All we’re doing is making it harder for the user for absolutely no benefit.

Alex Cabal

Jun 17, 2021, 1:17:28 PM
to standar...@googlegroups.com
The way the code works now is that it runs epubcheck on the temp
directory it uses during build, and that directory is kept if check
fails. Thus there's no actual epub to keep around if check fails.

Previously build would create the epub, then check the epub file, not
the work directory, which is why the file was left around. That was also
why it was annoying to trace epubcheck errors, because you'd have to
explode the epub somewhere and hunt for the problem. Now the output of
build hyperlinks the problem files in the temp folder for you to inspect
directly.


Vince

Jun 17, 2021, 1:41:40 PM
to Standard Ebooks
Yes, and that’s a great thing (seriously, thank you). But the one-liner that creates the epub should be moved up to do so unconditionally.
The new --check-only only does a check without producing a file. If we don’t want a file, that’s what we use.
But if we do want a file, we still don’t get one. That doesn’t make sense. So --check should produce a file, regardless.
It’s great that the errors themselves point to the working directory to make figuring out issues easier, but we should still get the epub.
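The ordering being proposed can be sketched as follows. This is an illustration under stated assumptions, not the toolset's build code: `write_epub`, `build_with_check`, and `run_check` are hypothetical names, the packaging step is simplified, and the check is passed in as a plain callable instead of a real epubcheck invocation.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path
from typing import Callable


def write_epub(work_dir: Path, output_file: Path) -> None:
    """Zip the contents of work_dir into output_file.

    Simplified: a real epub also needs an uncompressed `mimetype` entry first.
    """
    with zipfile.ZipFile(output_file, "w", zipfile.ZIP_DEFLATED) as archive:
        for file_path in sorted(work_dir.rglob("*")):
            if file_path.is_file():
                archive.write(file_path, file_path.relative_to(work_dir))


def build_with_check(
    source_dir: Path,
    output_file: Path,
    run_check: Callable[[Path], list[str]],
) -> list[str]:
    """Build output_file from source_dir, then check the work directory.

    Returns the list of check errors (empty if the check passed).
    """
    work_dir = Path(tempfile.mkdtemp())
    shutil.copytree(source_dir, work_dir, dirs_exist_ok=True)

    # Create the output file first, unconditionally.
    write_epub(work_dir, output_file)

    # Check the work directory, not the zipped file, so that
    # error messages can point at loose files.
    errors = run_check(work_dir)

    if errors:
        # Keep the work directory around for inspection.
        print(f"Check failed; work directory kept at {work_dir}")
    else:
        shutil.rmtree(work_dir)

    return errors
```

With this ordering, a failed check still leaves both the output file and the work directory in place, and a separate check-only mode remains the way to skip producing a file.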

Alex Cabal

Jun 17, 2021, 11:09:51 PM
to standar...@googlegroups.com
OK, if you want, then set up a pull request.


Vince Rice

Jun 19, 2021, 8:43:59 PM
to standar...@googlegroups.com
Done.
