Query about the GSoC project Hypothesis testing of SymPy


Pradyot Ranjan

Mar 3, 2025, 3:53:36 PM
to sympy
Hi,
Just wanted to know if this project is still relevant regarding GSoC? If it is, who is the mentor?
I have some experience with hypothesis testing and would love to work here.

Thanks,
Pradyot Ranjan

Aaron Meurer

Mar 3, 2025, 4:53:38 PM
to sy...@googlegroups.com
Yes, that project is still very relevant. If you search the codebase
for hypothesis you'll see that it is currently only used in a few
tests, but we want that to increase by a lot.

What sort of experience do you have with hypothesis?

Aaron Meurer
> --
> You received this message because you are subscribed to the Google Groups "sympy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sympy+un...@googlegroups.com.
> To view this discussion visit https://groups.google.com/d/msgid/sympy/afa7d863-666f-475f-ae4c-1ccb8a5d3752n%40googlegroups.com.

Pradyot Ranjan

Mar 3, 2025, 5:59:34 PM
to sy...@googlegroups.com
Last year I worked as a GSoC student for PyBaMM. We had a stretch goal of implementing hypothesis testing, which can be tracked here:
- https://github.com/pybamm-team/PyBaMM/issues/4703
I also reviewed some PRs related to this:
- https://github.com/pybamm-team/PyBaMM/pull/4724

Other than this, I also worked as an LFX mentee last year, where I implemented fuzz testing (which is similar to Hypothesis's property-based testing in some ways).


Pradyot Ranjan

Mar 4, 2025, 4:19:07 AM
to sy...@googlegroups.com
Which components can benefit most from hypothesis testing? I can try to implement tests for them before I start writing a proposal, if that's okay.

Aaron Meurer

Mar 4, 2025, 1:46:49 PM
to sy...@googlegroups.com
Pretty much any function in SymPy that can have mathematical
properties written about it could potentially benefit from property
testing. However, a big challenge with this project is the input data
generation (the strategies in hypothesis terminology). Generating
arbitrary SymPy expressions is a difficult problem. There was some
initial work on this at https://github.com/sympy/sympy/pull/17190. But
the problem is that just generating expressions itself can be buggy.
Consider the expression I posted about in another mailing list thread.
It takes 8 seconds just to construct, essentially because the
expression constructor itself is buggy.
https://groups.google.com/g/sympy/c/XSJuvibPOro/m/Q3TTETm7AwAJ

So for now, it's better to actually focus on those functions that take
relatively simple inputs. The simplest possible input is an integer.
For instance, several functions in the ntheory module basically just
take an integer as input. The next simplest is polynomials. The
initial work that has been done on hypothesis testing has been in
these modules, but the work hasn't gone very far and there is still
more that can be done there. So I would suggest starting where there
are existing hypothesis tests and expanding the tests in those parts
of SymPy. We'll want to expand beyond that, but building strategies is
going to be one of the harder parts of this project.
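
To make the "simple integer input" case concrete, here is a minimal sketch of a property test for one ntheory function. It is not taken from the SymPy test suite; the choice of divisors(), the test name, and the input bound of 10**6 are all assumptions made for illustration.

```python
from hypothesis import given, settings, strategies as st
from sympy import divisor_count, divisors


# Hypothetical example: the function choice and bounds are illustrative only.
@settings(deadline=None, max_examples=100)
@given(n=st.integers(min_value=1, max_value=10**6))
def test_divisors_properties(n):
    divs = divisors(n)
    # Every reported divisor must divide n exactly.
    assert all(n % d == 0 for d in divs)
    # The list is sorted, has no repeats, and runs from 1 to n.
    assert divs == sorted(set(divs))
    assert divs[0] == 1 and divs[-1] == n
    # The number of divisors must agree with divisor_count().
    assert len(divs) == divisor_count(n)
```

The only strategy needed here is st.integers(), which is why integer-input functions are a comfortable starting point.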

By the way, if you didn't notice on the idea page, this issue has a
lot more details on hypothesis testing in SymPy
https://github.com/sympy/sympy/issues/20914.

Aaron Meurer

Pradyot Ranjan

Mar 8, 2025, 4:33:17 AM
to sy...@googlegroups.com
I tried using hypothesis to test prime(). The function returns the nth prime number, so I generated the nth prime myself and compared the two (here n is given by hypothesis). The test passes, but the only problem is that it takes painfully long to run. I tried limiting n to 100,000 and it still takes around 40s. We can test composite and other related functions similarly. We can mark these tests as "slow" and run them separately if this is the approach we are looking for.

Aaron Meurer

Mar 8, 2025, 12:34:55 PM
to sy...@googlegroups.com
On Sat, Mar 8, 2025 at 2:33 AM Pradyot Ranjan <rickpri...@gmail.com> wrote:
>
> I tried using hypothesis to test prime(). The function returns the nth prime number, so I generated the nth prime myself and compared the two (here n is given by hypothesis). The test passes, but the only problem is that it takes painfully long to run. I tried limiting n to 100,000 and it still takes around 40s. We can test composite and other related functions similarly. We can mark these tests as "slow" and run them separately if this is the approach we are looking for.

This isn't really the right way to use hypothesis in this context. I'm
assuming this is slow because your prime generating test function is
slow. But what's to say that function is even correct? At best you
could have an obviously correct function that is very slow. Or you'll
just be reimplementing the function that's already in sympy, which is
pointless for a test.

For hypothesis, you should think about properties that a function
should have and test those. For prime generation, you can check that
the output is prime using isprime(). Testing that the nth prime is
actually the nth prime is difficult without actually generating all n
primes. prime() basically already does this itself internally, so
there's not really a point to doing this in a test. You could test some
mathematical bounds. Personally, though, I would focus on some other
functions whose properties are easier to test. Not every function
in SymPy is easy to property test, because not every function has
straightforward properties that can be tested. Instead of trying to
come up with properties for various functions, it would be better to
try to find functions that have a fairly obvious set of properties
that can be tested.
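
For reference, the property-style checks mentioned above for prime() could look roughly like the sketch below. It is not code from this thread. The test name and the cap of 10,000 on n are assumptions, and the two bounds are the classical results p_n > n*ln(n) for n >= 1 and p_n < n*(ln(n) + ln(ln(n))) for n >= 6, picked here only as examples of "mathematical bounds".

```python
import math

from hypothesis import given, settings, strategies as st
from sympy import isprime, prime


# Hypothetical sketch: name, input range, and choice of bounds are assumptions.
@settings(deadline=None, max_examples=50)
@given(n=st.integers(min_value=1, max_value=10_000))
def test_prime_properties(n):
    p = prime(n)
    # The output must itself be prime.
    assert isprime(p)
    # The sequence of primes is strictly increasing.
    assert prime(n + 1) > p
    # Classical lower bound on the nth prime, valid for all n >= 1.
    assert p > n * math.log(n)
    # Classical upper bound on the nth prime, valid for n >= 6.
    if n >= 6:
        assert p < n * (math.log(n) + math.log(math.log(n)))
```

None of these checks needs a second prime generator, so the test stays fast and does not reimplement the function under test.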

Aaron Meurer

Pradyot Ranjan

Mar 8, 2025, 3:03:03 PM
to sy...@googlegroups.com
That makes a lot more sense. Thanks!
This would be a better test then, I guess.
I tried hypothesis testing of factorint(). This is what my test method looks like:

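from hypothesis import given, strategies as st
from sympy import factorint

@given(n=st.integers())
def test_factorint(n):
    factors = factorint(n)
    # Multiply the factors back together; the product must equal n.
    product = 1
    for prime, exp in factors.items():
        product *= prime ** exp
    assert product == n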


The test runs for both positive and negative integers. I can extend it to cover the kwargs as well. This will eliminate a lot of assert statements here. The test also doesn't take any significant amount of time.

Aaron Meurer

Mar 8, 2025, 4:20:33 PM
to sy...@googlegroups.com
Yes, factorint is a better example of something that can be tested
with hypothesis. It's the example I gave on the issue
https://github.com/sympy/sympy/issues/20914.

It's also a good example of how we can start with something simple and
build out a more rigorous test.

There are other properties that could be added to the test as well, for instance:

assert isprime(prime)
assert exp >= 1
assert isinstance(prime, int)
assert isinstance(exp, int)

And we can also test the various flags to factorint.
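
Putting the pieces together, a fuller test might look like the sketch below. It is only an illustration: the test name and the input range are assumptions, the input is restricted to n >= 2 so that every key in the result is a genuine prime (for other inputs factorint() can return keys such as -1 or 0), and multiple=True is used as one example of a flag.

```python
import math

from hypothesis import given, settings, strategies as st
from sympy import factorint, isprime


# Hypothetical sketch: name and input range are assumptions.
@settings(deadline=None, max_examples=100)
@given(n=st.integers(min_value=2, max_value=10**12))
def test_factorint_properties(n):
    factors = factorint(n)
    # The factorization must multiply back to n.
    assert math.prod(p**e for p, e in factors.items()) == n
    for p, e in factors.items():
        assert isprime(p)
        assert e >= 1
    # multiple=True returns the primes as a flat list, repeated by exponent.
    flat = factorint(n, multiple=True)
    assert flat == sorted(flat)
    assert math.prod(flat) == n
    assert len(flat) == sum(factors.values())
```

Checking a flag against the default output, as in the last three assertions, is one way to test flags without a separate oracle.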

As for the existing test, for now, we should generally leave any
existing manual tests intact. Hypothesis should be treated as an
extension to manual testing, not a complete replacement. For instance,
some of the assertions in that test you showed are based on specific
inputs that are known to potentially cause issues. Hypothesis might
not necessarily generate an example like them. Plus, you'll notice
that that test is marked as @slow, meaning some of the numbers being
tested are too slow compared to the inputs we might want to generate
from hypothesis.

This is actually one thing that will need to be considered in this
project. Hypothesis tries to always generate "interesting" examples in
its strategies, in addition to random ones. But what hypothesis
considers "interesting" is based on some heuristics that apply to a
broad category of programming. For instance, the "interesting"
integers from st.integers() are things like -1, 0, 1, etc. These are
important to test, but for factorint, we also want to make sure we
test "interesting" integers in terms of their prime factorizations.
This might mean numbers that have both small and large prime factors,
numbers that have many prime factors, and numbers that have very few
prime factors, numbers with factors that are interesting corner cases
in terms of the specific algorithms that are implemented, etc. Some
of these are not distributed very well on the number line, so we might
have to create a custom strategy that generates them with higher
likelihood. Otherwise, they would basically never be chosen at random.
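
One possible shape for such a custom strategy is to generate the factorization first and multiply it out, so that the expected answer is known by construction. The sketch below is an assumption about how this could be done, not an agreed design: the names SMALL_PRIMES, LARGE_PRIMES, and factorizations are hypothetical, and the prime pools and exponent limits are arbitrary.

```python
import math

from hypothesis import given, settings, strategies as st
from sympy import factorint, nextprime, primerange

# Hypothetical prime pools: a mix of small primes and a few larger ones.
SMALL_PRIMES = list(primerange(2, 100))
LARGE_PRIMES = [nextprime(10**k) for k in (5, 6, 7)]

# Strategy that draws a {prime: exponent} dict, i.e. a known factorization.
factorizations = st.dictionaries(
    keys=st.sampled_from(SMALL_PRIMES + LARGE_PRIMES),
    values=st.integers(min_value=1, max_value=3),
    min_size=1,
    max_size=5,
)


@settings(deadline=None, max_examples=25)
@given(factors=factorizations)
def test_factorint_recovers_known_factorization(factors):
    # Build n from its factorization, then ask factorint() to recover it.
    n = math.prod(p**e for p, e in factors.items())
    assert factorint(n) == factors
```

Because the strategy controls the primes and exponents directly, it can be tuned toward numbers with many factors, few factors, or a mix of small and large factors, which random integers would rarely hit.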

Hypothesis also limits the size of the maximum integer generated by
integers() (probably to something like 2**64). But factorint can
handle numbers much larger than that. Creating custom input strategies
is going to be a big part of this project, so it's something you
should be thinking about, and learn how to do (it also can be one of
the more challenging parts of using hypothesis effectively). As a
start, I would learn how to run hypothesis in verbose mode, so that
you can see the actual inputs it is generating, then take a look at
those inputs and try to see if they actually cover all the important
cases for the given function.
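
As a small sketch of what running in verbose mode can look like, the settings decorator accepts a verbosity argument. The test name, the seen list, and the input bounds below are assumptions for illustration.

```python
from hypothesis import Verbosity, given, settings, strategies as st
from sympy import factorint

# Hypothetical helper list, used only to inspect the generated inputs afterwards.
seen = []


@settings(verbosity=Verbosity.verbose, max_examples=20, deadline=None)
@given(n=st.integers(min_value=1, max_value=10**12))
def test_factorint_verbose(n):
    seen.append(n)
    product = 1
    for p, e in factorint(n).items():
        product *= p**e
    assert product == n
```

With verbosity set this way, hypothesis prints each example it tries. Under pytest the output is captured by default, so something like `pytest -s --hypothesis-verbosity=verbose` should be needed to see it on the console.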

The code for factorint is very complex, and testing it rigorously
requires testing a lot of different kinds of corner cases. Hypothesis
is very good at this sort of thing, but it wasn't built with these
specific types of corner cases in mind, so it will need some help to
get there.

Aaron Meurer

Pradyot Ranjan

Mar 9, 2025, 3:56:46 AM
to sy...@googlegroups.com
So my next steps should be:
- Testing other aspects of factorint(), such as the ones mentioned above.
- Learning and using strategies for generating "interesting" integers for factorint().
- Running hypothesis in verbose mode for more information on the generated values.

Should I open a PR for hypothesis testing of factorint()? In that way, we can track progress.
I also discovered another function that can be tested: digits(). A simple example: digits(2345, 34) == [34, 2, 0, 33] can be easily tested by generating N and n (N is the number and n is the base), then reconstructing N from the returned digits to check the assertion. This can also benefit from hypothesis IMO. Let me know what you think.
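
A rough sketch of the digits() idea, under some assumptions: the test name and input ranges are made up, the import path is the one used in recent SymPy versions, and the inputs are limited to non-negative N and bases of at least 2 to keep the properties simple.

```python
from hypothesis import given, settings, strategies as st
from sympy.ntheory.digits import digits


# Hypothetical sketch: name and input ranges are assumptions.
@settings(deadline=None, max_examples=100)
@given(
    N=st.integers(min_value=0, max_value=10**30),
    b=st.integers(min_value=2, max_value=1000),
)
def test_digits_roundtrip(N, b):
    result = digits(N, b)
    base, ds = result[0], result[1:]
    # The first entry is the base, the rest are the digits, most significant first.
    assert base == b
    assert all(0 <= d < b for d in ds)
    # Rebuilding the number from its digits must give back N.
    value = 0
    for d in ds:
        value = value * b + d
    assert value == N
```

For the example above, 2*34**2 + 0*34 + 33 == 2345, which is the same round trip checked for arbitrary inputs.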

Aaron Meurer

Mar 9, 2025, 1:48:16 PM
to sympy
Yes, opening a PR would be a good idea. It will be easier to discuss these ideas on a PR.

Aaron Meurer