Google Season of Docs 2020 Proposal

Brandon David

Jul 28, 2020, 4:18:42 PM
to sympy
Hello SymPy mentors!

I apologize that this introduction is coming so late; I was unable to take advantage of the "exploration" period and didn't know until just recently that applicants are still allowed to contact mentors even after the application deadline. I have been hoping to speak with someone about my proposal and would love to answer any questions or concerns. Would it be possible to do so before the review period ends?

Since submitting my proposal, I have chipped away at its tasks and thought it might be helpful to share some of the early results. In preparation for converting all docstrings to the numpydoc format, I ran the entire SymPy library through the numpydoc validator. It took quite a bit of monkey-patching to have it not choke on SymPy's docstrings, so the nearly 17,000 errors it flagged are just a rough measure of the situation:

I also noticed a few issues that the numpydoc validator didn't, such as malformed "See Also" sections and docstrings with repeated section names (e.g. crypto.encipher_hill has two "Notes" sections). Still, I think the above list is certainly enough to get started and I would be eager to do so -- or if you have already settled on a different technical writer, I would be happy to supply them with the work I've already done.

Many thanks for your time and I look forward to speaking further!

Cheers,
Brandon

P.S. If any mentor needs a copy of my proposal, please let me know. I can also be reached at brando...@zoho.com

Aaron Meurer

Jul 28, 2020, 4:23:35 PM
to sympy
Thanks for doing this. Our style guide that was developed last year
differs from numpydoc in a few ways, and it also has some additional
things. But being able to automatically validate those things that can
be validated is good.

Aaron Meurer
> --
> You received this message because you are subscribed to the Google Groups "sympy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sympy+un...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/21d0454c-e1af-4f84-984b-22f0ce4eb29eo%40googlegroups.com.

Aaron Meurer

Jul 28, 2020, 6:11:10 PM
to sympy
You mentioned in your proposal that you contributed to the numpydoc
validator. Can you reference where that work is? Where is the source
for the numpydoc validator?

Aaron Meurer

Brandon David

Jul 29, 2020, 4:26:51 PM
to sympy
The numpydoc validator is available at https://github.com/numpy/numpydoc/blob/master/numpydoc/validate.py and can be run with python -m numpydoc --validate

However, it has some limitations out of the box, e.g. it does not offer package-wide validation or any form of .rst parsing. Rather, it accepts the name of a single object and uses importlib to fetch that object's docstring. As a result, most projects maintain their own validation scripts that wrap numpydoc.validate and make repeated calls to it. For example, scikit-learn has a script that enumerates their functions/classes/methods (using pkgutil), filters that list of objects, calls numpydoc.validate on each, ignores certain error codes, and pretty-prints the results:
https://github.com/scikit-learn/scikit-learn/blob/master/maint_tools/test_docstrings.py
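
The wrapper pattern described above can be sketched roughly as follows. This is only an illustration, not scikit-learn's or pandas' actual script: the helper names iter_public_objects and validate_package and the "ERR" pseudo-code used for validator crashes are hypothetical. It assumes numpydoc is installed and that numpydoc.validate.validate(name) returns a dict whose "errors" entry is a list of (code, message) tuples. For brevity it enumerates only public functions and classes, not methods.

```python
import importlib
import inspect
import pkgutil


def iter_public_objects(package_name):
    """Yield dotted names of public functions/classes defined in a package."""
    package = importlib.import_module(package_name)
    modules = [package]
    # Plain modules have no __path__, so fall back to an empty search path.
    search_path = getattr(package, "__path__", [])
    for info in pkgutil.walk_packages(
        search_path, package_name + ".", onerror=lambda name: None
    ):
        try:
            modules.append(importlib.import_module(info.name))
        except Exception:
            continue  # skip submodules that fail to import
    seen = set()
    for module in modules:
        for name, obj in inspect.getmembers(module):
            if name.startswith("_"):
                continue
            if not (inspect.isfunction(obj) or inspect.isclass(obj)):
                continue
            if getattr(obj, "__module__", None) != module.__name__:
                continue  # re-exported from another module
            dotted = f"{module.__name__}.{name}"
            if dotted not in seen:
                seen.add(dotted)
                yield dotted


def validate_package(package_name, validate=None, ignore=()):
    """Map each object name to its validation errors, minus ignored codes."""
    if validate is None:
        # Assumption: numpydoc is installed in the environment.
        from numpydoc.validate import validate
    report = {}
    for dotted in iter_public_objects(package_name):
        try:
            result = validate(dotted)
        except Exception as exc:
            # "ERR" is a made-up code for this sketch, not a numpydoc code.
            report[dotted] = [("ERR", f"validator failed: {exc}")]
            continue
        errors = [(code, msg) for code, msg in result["errors"] if code not in ignore]
        if errors:
            report[dotted] = errors
    return report
```

For example, validate_package("sympy.crypto", ignore={"GL01"}) would return a mapping from object names to the errors that remain after filtering; which codes to ignore is a project decision.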

And over at pandas, they use numpydoc.validate alongside some custom validation, all rolled into their CI:
https://github.com/pandas-dev/pandas/blob/master/scripts/validate_docstrings.py
https://github.com/pandas-dev/pandas/blob/master/scripts/tests/test_validate_docstrings.py

Note that pandas' version parses .rst files directly to enumerate the objects to be validated, as that is how the validation script was written before it migrated from pandas to numpydoc. To clarify, I have not contributed code to numpydoc.validate but I did participate in its migration (on GitHub and via email) and adapted/tested both versions for use with SciPy. One issue that cropped up was that the .rst parsing of the original script assumed autosummary, while many projects (like SciPy) use autodoc. For that and other reasons, .rst parsing was removed entirely from numpydoc.validate, at least for now. I see that SymPy has considered migrating from autodoc to autosummary (#18594), which could make it easy to mimic some of the sophisticated things pandas is doing with their docstring validation, including CI.
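
To make the autosummary assumption concrete, here is a minimal sketch of how object names might be collected from .rst source. It is not the pandas script; the function name autosummary_entries is hypothetical, and it only understands the simplest layout (a "currentmodule" directive followed by indented entries under "autosummary"). Projects that document objects with autodoc directives would yield nothing, which is the limitation described above.

```python
import re


def autosummary_entries(rst_text):
    """Return dotted names listed under ``.. autosummary::`` directives."""
    names = []
    current_module = None
    in_block = False
    for line in rst_text.splitlines():
        stripped = line.strip()
        match = re.match(r"\.\. currentmodule::\s*(\S+)", stripped)
        if match:
            current_module = match.group(1)
            in_block = False
            continue
        if stripped.startswith(".. autosummary::"):
            in_block = True
            continue
        if not in_block or not stripped:
            continue  # outside a directive, or a blank separator line
        if not line.startswith((" ", "\t")):
            in_block = False  # a dedented line ends the directive body
            continue
        if stripped.startswith(":"):
            continue  # directive option such as :toctree:
        name = stripped.split()[0].lstrip("~")
        names.append(f"{current_module}.{name}" if current_module else name)
    return names
```

The resulting names could then be passed one at a time to a validator that accepts a dotted object name.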

But regardless of any .rst parsing, the validation itself is still done through importlib. This dependency makes numpydoc.validate somewhat clunky to use like a linter, as that would require building from source before each validation. I certainly don't want to imply that numpydoc.validate is the perfect tool for all workflows and, in fact, an overenthusiasm for tooling can easily generate technical debt and distract from more valuable work (like actually writing docstrings). My proposed use of numpydoc.validate was just as another tool in the toolbox, a convenient way to populate my tasklist. For example, here is a quick list of the SymPy objects that have custom sections or don't follow the section order specified in the SymPy docstring guide (i.e. Explanation, Examples, Parameters, See Also, References):
https://gist.github.com/brandondavid/02868ca74600897d5d61c43c43e2654a
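
A check of this kind could look roughly like the sketch below. It is not the script used to produce the list above; the names find_sections and check_sections are hypothetical, the section order is taken from the sentence above rather than from the style guide itself, and a section is assumed to be any title line followed by an underline of "=" or "-" at least as long as the title. It also reports repeated section names.

```python
import re

# Order as quoted in this thread; treat it as an assumption.
EXPECTED_ORDER = ["Explanation", "Examples", "Parameters", "See Also", "References"]


def find_sections(docstring):
    """Return section titles in order of appearance."""
    lines = [line.strip() for line in docstring.expandtabs().splitlines()]
    sections = []
    for title, underline in zip(lines, lines[1:]):
        if not title or re.fullmatch(r"[=-]+", title):
            continue  # blank line, or itself an underline / table border
        if re.fullmatch(r"=+|-+", underline) and len(underline) >= len(title):
            sections.append(title)
    return sections


def check_sections(docstring):
    """Return a list of human-readable problems with the section layout."""
    sections = find_sections(docstring)
    problems = []
    for title in sorted(set(sections)):
        if sections.count(title) > 1:
            problems.append(f"repeated section: {title}")
        if title not in EXPECTED_ORDER:
            problems.append(f"custom section: {title}")
    # Compare the positions of known sections against the expected order.
    unique = list(dict.fromkeys(sections))
    known = [EXPECTED_ORDER.index(t) for t in unique if t in EXPECTED_ORDER]
    if known != sorted(known):
        problems.append("sections out of order")
    return problems
```

Because it works on the docstring text alone, a check like this does not need importlib once the docstrings have been collected.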

--Brandon

Aaron Meurer

Jul 29, 2020, 6:50:23 PM
to sympy
Thanks for the writeup. I agree with you about tooling. For
documentation in particular, overuse of tooling can lead to a
situation where documentation is written more for machines than
humans. Consistency in formatting is important and certainly makes it
easier for humans to read documentation, but this can go too far.
Things that are easy for machines to parse are not necessarily the
same as the things that are easy for humans to read and understand.

With that being said, tooling can be helpful because there are a lot
of rules in our documentation style guide, and the more of them that
we can enforce with tests, the better. Otherwise it becomes a burden
on reviewers to know all the rules, and the documentation ultimately
ends up not following it unless we have a technical writer who
constantly reviews all documentation. Of course, the first step is to
actually make it so everything conforms to the style guide. Once that
is done, we can look into how to use tests and tooling to keep it that
way.

Aaron Meurer
