I am Naman Rajput, a student interested in the GSoC 2026 project “Classical Mechanics: Efficient Equations of Motion Generation.” Before writing my proposal, I spent the past week setting up a dev install and profiling KanesMethod. I wanted to share my findings and get mentor feedback early.
Problem: N-link planar pendulum, cold-cache timing, KanesMethod.kanes_equations()
The local scaling exponent (runtime growing as N^alpha) is alpha = 4.0 between N=4 and N=5, which is consistent with the expected O(N^4)–O(N^5) behavior from symbolic expression swell in _form_frstar. Projecting to N=7 (a typical robot manipulator) gives estimated runtimes of 10–30 seconds per kanes_equations() call, which makes iterative model development impractical.
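For reference, an exponent of this kind can be estimated from two timings by assuming runtime scales as N^alpha between them. The sketch below is illustrative only: the helper names local_exponent and extrapolate are hypothetical, and the timings are synthetic placeholders that follow N^4 exactly, not my measurements.

```python
import math


def local_exponent(n1, t1, n2, t2):
    """Estimate alpha, assuming t is proportional to N**alpha between two points."""
    return math.log(t2 / t1) / math.log(n2 / n1)


def extrapolate(n_ref, t_ref, n_target, alpha):
    """Project a runtime to n_target under the same power-law assumption."""
    return t_ref * (n_target / n_ref) ** alpha


# Synthetic placeholder timings that follow t = N**4 exactly.
# These are NOT measured values.
t4, t5 = 4.0 ** 4, 5.0 ** 4

alpha = local_exponent(4, t4, 5, t5)
t7 = extrapolate(5, t5, 7, alpha)
print(f"alpha = {alpha:.2f}, projected t(7) = {t7:.0f}")
```

A two-point estimate is sensitive to timing noise; fitting log(t) against log(N) over more sizes would likely give a more reliable exponent.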
Side finding: SymPy’s @cacheit mechanism can mask true EoM generation cost by up to 3x within the same session. Reliable timing requires clear_cache() between runs or a fresh process.
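A minimal sketch of the cold-versus-warm timing setup, assuming a chain of point masses as the N-link pendulum. The helpers build_pendulum and time_kanes are hypothetical names for this post, and the model is a simplified stand-in rather than my exact benchmark script.

```python
import time

from sympy import symbols
from sympy.core.cache import clear_cache
from sympy.physics.mechanics import (
    KanesMethod, Particle, Point, ReferenceFrame, dynamicsymbols,
)


def build_pendulum(n):
    """Build an n-link planar pendulum made of point masses (illustrative model)."""
    t = dynamicsymbols._t
    q = dynamicsymbols(f"q:{n}")   # absolute link angles
    u = dynamicsymbols(f"u:{n}")   # generalized speeds
    m = symbols(f"m:{n}")
    l = symbols(f"l:{n}")
    g = symbols("g")

    N = ReferenceFrame("N")
    parent = Point("O")
    parent.set_vel(N, 0)

    particles, loads, kd_eqs = [], [], []
    for i in range(n):
        A = N.orientnew(f"A{i}", "Axis", [q[i], N.z])
        A.set_ang_vel(N, u[i] * N.z)
        P = parent.locatenew(f"P{i}", l[i] * A.x)
        P.v2pt_theory(parent, N, A)
        particles.append(Particle(f"Pa{i}", P, m[i]))
        loads.append((P, -m[i] * g * N.y))
        kd_eqs.append(q[i].diff(t) - u[i])
        parent = P
    return N, q, u, kd_eqs, particles, loads


def time_kanes(n, cold):
    """Time one kanes_equations() call, optionally clearing SymPy's cache first."""
    N, q, u, kd_eqs, particles, loads = build_pendulum(n)
    kane = KanesMethod(N, q_ind=q, u_ind=u, kd_eqs=kd_eqs)
    if cold:
        clear_cache()  # drop everything memoized by @cacheit so far
    start = time.perf_counter()
    kane.kanes_equations(particles, loads)
    return time.perf_counter() - start


cold = time_kanes(3, cold=True)
warm = time_kanes(3, cold=False)
print(f"cold: {cold:.3f} s, warm: {warm:.3f} s")
```

The warm number depends on whatever ran earlier in the session, which is why only the cold number (or a fresh process) is comparable between runs.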
I monkey-patched _form_frstar, _form_fr, msubs, and time_derivative at runtime using line_profiler without modifying any SymPy source files.
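The runtime patching idea can be sketched as follows. To keep the example runnable with only SymPy installed, it swaps in a plain perf_counter wrapper instead of line_profiler, so it reports per-method totals rather than per-line percentages. The timed_methods helper is a hypothetical name, and the single pendulum is just a small test system.

```python
import time
from contextlib import contextmanager

from sympy import symbols
from sympy.physics.mechanics import (
    KanesMethod, Particle, Point, ReferenceFrame, dynamicsymbols,
)


@contextmanager
def timed_methods(cls, names, totals):
    """Temporarily replace methods on cls with timing wrappers, then restore them."""
    originals = {name: getattr(cls, name) for name in names}

    def wrap(name, func):
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                totals[name] = totals.get(name, 0.0) + elapsed
        return wrapper

    for name, func in originals.items():
        setattr(cls, name, wrap(name, func))
    try:
        yield totals
    finally:
        # Restore the unpatched methods; no SymPy source file is touched.
        for name, func in originals.items():
            setattr(cls, name, func)


# Small test system: a single point-mass pendulum.
t = dynamicsymbols._t
q, u = dynamicsymbols("q u")
m, l, g = symbols("m l g")
N = ReferenceFrame("N")
A = N.orientnew("A", "Axis", [q, N.z])
A.set_ang_vel(N, u * N.z)
O = Point("O")
O.set_vel(N, 0)
P = O.locatenew("P", l * A.x)
P.v2pt_theory(O, N, A)
bob = Particle("bob", P, m)

kane = KanesMethod(N, q_ind=[q], u_ind=[u], kd_eqs=[q.diff(t) - u])
with timed_methods(KanesMethod, ["_form_fr", "_form_frstar"], {}) as totals:
    kane.kanes_equations([bob], [(P, -m * g * N.y)])

for name, seconds in totals.items():
    print(f"{name}: {seconds:.4f} s")
```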
Top confirmed bottlenecks:
In _form_frstar (kane.py):
  L486  body.masscenter.acc(N): 27.3% of runtime
    Triggers time_derivative twice (vel → acc chain), uncached.
    Recomputed fresh on every kanes_equations() call per body.
  L515  fr_star = -(MM * msubs(...) + nonMM): 22.6% of runtime
    Symbolic N×N matrix multiply on fully expanded MM.
    CSE before this step could reduce effective expression size.
  L496  MM[j,k] += M * tmp_vel.dot(...): 11.1% of runtime
    O(N²) inner loop: 27 dot products at N=3, grows as N² × bodies.
    Partial velocities recomputed independently from _form_fr.
In time_derivative (vector/functions.py):
  L207  ang_vel_in(frame) ^ Vector([v]): 51.4% of td runtime
    Rotating-frame cross product, called 42 times for a 3-link system.
  L203  express(v[0], frame, ...).diff(t): 39.6% of td runtime
    Symbolic diff + frame re-expression on already-expressed vectors.
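The cross product in time_derivative is the rotating-frame (transport theorem) term: the derivative of a vector in N equals its derivative in the rotating frame A plus the angular velocity of A in N crossed with the vector. A small check of that identity on a one-joint example of my own choosing:

```python
from sympy import simplify
from sympy.physics.vector import ReferenceFrame, dynamicsymbols, time_derivative

q, u = dynamicsymbols("q u")
N = ReferenceFrame("N")
A = N.orientnew("A", "Axis", [q, N.z])   # A rotates about N.z by q(t)

v = u * A.x                              # a vector fixed in direction in A

# Derivative taken directly in N.
direct = time_derivative(v, N)

# Same derivative assembled by hand: local derivative plus omega x v.
by_parts = time_derivative(v, A) + A.ang_vel_in(N).cross(v)

# Compare component-wise in N; every entry should simplify to zero.
residual = (direct - by_parts).to_matrix(N)
print(all(simplify(c) == 0 for c in residual))
```

Each vector component expressed in a frame other than the target frame pays for one such cross product, which is one plausible reason the call count grows quickly with the number of links.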
PROPOSED OPTIMIZATIONS
1. Cache acc() per (body, frame) pair on the KanesMethod instance.
Eliminates the 27.3% cost at L486 after the first computation.
2. Share partial velocities between _form_fr and _form_frstar.
Both methods compute them independently today. A pre-computed cache
eliminates the O(N²) redundancy at L496.
3. Apply sympy.cse() before the final matrix multiply at L515.
Extracts repeated sub-expressions from MM before the N×N multiply,
reducing the effective expression size.
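To illustrate optimization 3, here is a minimal sketch of running sympy.cse() on a matrix and a vector before multiplying them. The 2×2 matrix is a made-up stand-in with a deliberately repeated term, not the actual MM from kane.py, and the operation counts will differ for real mass matrices.

```python
from sympy import Matrix, cos, count_ops, cse, simplify, sin, symbols

q1, q2, m, l = symbols("q1 q2 m l")

# Hypothetical stand-in for a mass matrix with a repeated sub-expression.
shared = m * l**2 * cos(q1 - q2)
MM = Matrix([[shared + m * l**2, shared * sin(q1)],
             [shared * sin(q1), shared**2 + 1]])
rhs = Matrix([sin(q1 - q2), shared])

# Extract common sub-expressions from both operands in one pass.
replacements, (MM_red, rhs_red) = cse([MM, rhs])

# Multiply the reduced operands; the intermediate symbols stay unexpanded.
product_red = MM_red * rhs_red

# Back-substitute in reverse order, since later replacements
# may refer to earlier ones.
restored = product_red.subs(list(reversed(replacements)))

ops_full = sum(count_ops(e) for e in MM * rhs)
ops_red = (sum(count_ops(e) for e in product_red)
           + sum(count_ops(sub) for _, sub in replacements))
print(f"{len(replacements)} replacements, ops: {ops_full} -> {ops_red}")
```

Whether this pays off in _form_frstar depends on how early the back-substitution has to happen; keeping the replacement list around until code generation would preserve most of the benefit.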
QUESTIONS FOR MENTORS
1. Has partial velocity caching been attempted before? I want to check
for prior correctness issues with dependent or auxiliary speeds.
2. Where should a benchmark suite live? I plan to build an asv benchmark
for the n-link pendulum — is sympy/benchmarks/ the right location?
3. For acc() caching, should the cache live on the KanesMethod instance
(reset per kanes_equations() call) or be module-level keyed by
expression identity?
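To make question 3 concrete, this is roughly what I have in mind for the instance-level option. AccelerationCache is a hypothetical helper and not an existing SymPy class, and the sketch assumes velocities are not modified after the first lookup.

```python
from sympy import symbols
from sympy.physics.mechanics import Point, ReferenceFrame, dynamicsymbols


class AccelerationCache:
    """Hypothetical per-instance cache of Point.acc() keyed by (point, frame)."""

    def __init__(self):
        self._store = {}
        self.misses = 0

    def acc(self, point, frame):
        key = (point, frame)
        if key not in self._store:
            self.misses += 1
            self._store[key] = point.acc(frame)  # the expensive symbolic step
        return self._store[key]

    def clear(self):
        """Reset, e.g. at the start of each kanes_equations() call."""
        self._store.clear()


# Example: one pendulum bob, looked up twice.
q, u = dynamicsymbols("q u")
l = symbols("l")
N = ReferenceFrame("N")
A = N.orientnew("A", "Axis", [q, N.z])
A.set_ang_vel(N, u * N.z)
O = Point("O")
O.set_vel(N, 0)
P = O.locatenew("P", l * A.x)
P.v2pt_theory(O, N, A)

cache = AccelerationCache()
first = cache.acc(P, N)
second = cache.acc(P, N)   # served from the cache, no second time_derivative
print(cache.misses, first is second)
```

Keying on the Point and ReferenceFrame objects avoids hashing large expressions, but it also means the cache goes stale if a velocity is redefined afterwards, which is part of why I am asking where it should live.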
I am currently going through open issues labeled ‘mechanics’ on GitHub and plan to open a small first PR this week.
Thank you for the detailed codebase. I look forward to any feedback before I finalize my proposal.
Best regards,
Naman Rajput
[GitHub: https://github.com/NamanRajput-git/]
[HBTU Kanpur, India]