A tool for automatic sequence analysis

Dmitry

Oct 27, 2025, 2:51:14 PM
to SeqFan
Hello everyone,
I am working on a Python tool for automatic sequence analysis and would like to share it with the community. Its goal is to identify the underlying structure of a sequence by applying a wide range of analyses.
GitHub Repository: https://github.com/FekDN/SequenceAnalyzer
The tool is open source, and I would be very grateful for your expert feedback. I am particularly interested in:
 - Sequences where the current analysis fails or gives incorrect results.
 - Suggestions for new types of analysis that would be valuable for OEIS contributors.
Thank you.

Tomas Rokicki

Oct 27, 2025, 3:59:54 PM
to seq...@googlegroups.com
Does it do Berlekamp-Massey, for instance?

-tom


--
You received this message because you are subscribed to the Google Groups "SeqFan" group.
To unsubscribe from this group and stop receiving emails from it, send an email to seqfan+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/seqfan/44445d1d-2b66-429b-b616-475ac1a1de4en%40googlegroups.com.


Dmitry

Oct 27, 2025, 4:36:36 PM
to seq...@googlegroups.com
The analyzer doesn't use the Berlekamp-Massey algorithm directly.

For finding linear recurrences, it currently uses a statistical approach: it fits autoregressive models of different orders using least squares and selects the best one based on the Akaike Information Criterion.
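
A minimal sketch of this kind of order selection follows. It is for illustration only and is not the analyzer's code. It assumes:

- a no-intercept model a(n) = c_1 a(n-1) + ... + c_p a(n-p) + error;
- the Gaussian form AIC = m ln(RSS/m) + 2p, where m is the number of fitted terms;
- the same target terms for every order, so that the AIC values are comparable.

The name `select_ar_order` and the relative variance floor are hypothetical choices made for this example.

```python
import numpy as np

def select_ar_order(seq, max_p):
    """Pick the AR order p <= max_p with the lowest AIC (least squares, no intercept).

    Model (assumed for this sketch): a(n) = c_1*a(n-1) + ... + c_p*a(n-p) + error.
    """
    x = np.asarray(seq, dtype=float)
    n = len(x)
    if n - max_p <= max_p:
        raise ValueError("need more than 2*max_p terms")
    y = x[max_p:]      # same targets for every order, so AIC values are comparable
    m = len(y)
    # Hypothetical guard: residual variances below this relative floor count as an
    # exact fit, otherwise floating-point noise would decide between exact models.
    floor = 1e-12 * float(np.mean(y ** 2)) + 1e-300
    best = None
    for p in range(1, max_p + 1):
        # Column j-1 holds the lag-j terms a(n-j) for each target a(n)
        X = np.column_stack([x[max_p - j: n - j] for j in range(1, p + 1)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        aic = m * np.log(max(rss / m, floor)) + 2 * p   # Gaussian AIC up to a constant
        if best is None or aic < best[0]:
            best = (aic, p, coef)
    return best[1], best[2]

fib = [1, 1]
for _ in range(18):
    fib.append(fib[-1] + fib[-2])
p, coef = select_ar_order(fib, max_p=4)
print(p, coef)
```

On the first 20 Fibonacci numbers this picks p = 2, with coefficients very close to (1, 1). On exactly recurrent data, every order of 2 or more fits perfectly. The choice therefore hinges on the arbitrary variance floor and the 2p penalty, which is where an exact algebraic method is the better tool.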

BM is an algebraic method that finds the exact minimal linear recurrence satisfied by the given terms, while my current approach is a statistical best fit that is robust to noise or slight deviations from a perfect recurrence.
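
For comparison, here is a textbook Berlekamp-Massey sketch. It is illustrative, not part of the analyzer, and the function name is hypothetical. It works over the rationals with Python's `fractions.Fraction`, so integer input gives an exact answer. It returns [c_1, ..., c_L] for the shortest recurrence a(n) = c_1 a(n-1) + ... + c_L a(n-L) consistent with all the given terms.

```python
from fractions import Fraction

def berlekamp_massey(seq):
    """Shortest linear recurrence a(n) = c_1*a(n-1) + ... + c_L*a(n-L) matching seq.

    Returns [c_1, ..., c_L] as Fractions (exact arithmetic over the rationals).
    """
    s = [Fraction(v) for v in seq]
    C = [Fraction(1)]   # current connection polynomial 1 + C[1] x + ... + C[L] x^L
    B = [Fraction(1)]   # connection polynomial saved at the last length change
    L, m, b = 0, 1, Fraction(1)   # length, steps since B was saved, B's discrepancy
    for n in range(len(s)):
        # Discrepancy: how far the current recurrence is from predicting s[n]
        d = s[n] + sum(C[i] * s[n - i] for i in range(1, L + 1))
        if d == 0:
            m += 1
            continue
        T = C[:]
        coef = d / b
        # C(x) <- C(x) - (d/b) * x^m * B(x)
        C += [Fraction(0)] * max(0, len(B) + m - len(C))
        for i, bi in enumerate(B):
            C[i + m] -= coef * bi
        if 2 * L <= n:
            L, B, b, m = n + 1 - L, T, d, 1
        else:
            m += 1
    C += [Fraction(0)] * max(0, L + 1 - len(C))
    return [-c for c in C[1:L + 1]]

print(berlekamp_massey([0, 1, 4, 9, 16, 25, 36, 49]))
```

For the squares 0, 1, 4, ..., 49 it returns 3, -3, 1 (as Fractions), that is, a(n) = 3a(n-1) - 3a(n-2) + a(n-3). Be careful when interpreting the result. A sequence with no recurrence at all typically comes out with L close to N/2 for N terms, so a result is only convincing when L is well below N/2.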

Adding Berlekamp-Massey is an excellent suggestion for a future improvement. Thank you for pointing it out!

On Mon, Oct 27, 2025, 21:59 Tomas Rokicki <rok...@gmail.com> wrote:

Antti Karttunen

Oct 27, 2025, 9:03:12 PM
to seq...@googlegroups.com

Thanks, interesting!

I wonder what minimum numbers of terms you would recommend for some of those analyses?
[...] to your program, so as to make it easier to run it on the b-files.


Best regards,

Antti


Dmitry

Oct 28, 2025, 6:02:20 AM
to seq...@googlegroups.com
Thank you, that's an excellent question! I'll add this information to
the project description.

The analyzer uses a variety of methods, each with its own data
requirements. Generally, the more complex the pattern a method seeks,
the more data it needs to produce a reliable result.

Here is a brief guide to the recommended minimum sequence lengths for
different types of analysis:

1. Foundational Metrics (5-15+ terms):
Functions: local_dynamics (5), statistical_moments (2),
dependency_model (polynomials, 5-10), anomalies (4).
Why: This is enough data to calculate basic moments (mean,
variance), discrete derivatives (first and second differences, i.e.
"velocity" and "acceleration"), and fit simple curves. These concepts
are not meaningful on fewer points.

2. Core Time Series Analysis (20-40+ terms):
Functions: stationarity (20), cyclicity_and_seasonality (approx.
20-30), entropy_analysis (20), spectral_analysis (20),
structural_breaks (20), complexity_analysis (20),
stream_structure_analysis (30), state_segmentation (HMM, ~36).
Why: These methods need sufficient data to establish a
"statistical regime." Stationarity tests must distinguish trends from
noise, cycle detection needs to see at least one or two full periods,
and an HMM needs enough samples to learn the characteristics of each
hidden state.

3. Advanced & Structural Analysis (50+ terms):
Functions: nonlinear_fractal_analysis (Hurst/DFA, 50),
volatility_analysis (GARCH, 50), empirical_mode_decomposition (50),
hybrid_model_analysis (50).
Why: These methods model more complex properties. GARCH analyzes
the volatility of returns, fractal analysis investigates
self-similarity and scaling properties, and EMD decomposes the signal
into intrinsic mode functions (oscillatory components extracted from
the fastest to the slowest). All of these require a longer data
history to be reliable.

4. Highly Demanding (Chaos & Attractor) Analysis (100+ terms):
Functions: nonlinear_fractal_analysis (Correlation
Dimension/Lyapunov Exponent, 100).
Why: These methods attempt to reconstruct a system's attractor in
phase space from a single time series. This is an extremely
data-hungry task, and results on shorter sequences would be
statistically insignificant.
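
To illustrate why item 4 is so data-hungry, here is a minimal Grassberger-Procaccia correlation-dimension sketch. It is an illustration, not the analyzer's implementation, and the function names and the parameters `dim`, `tau` and `radii` are hypothetical. The method works in three steps:

1. Delay-embed the sequence into `dim`-dimensional points.
2. Compute C(r), the fraction of point pairs closer than r.
3. Estimate the dimension as the slope of log C(r) against log r.

With only a few dozen terms, the small radii that matter contain very few pairs, so that slope is poorly determined. Lyapunov exponents are not covered here.

```python
import numpy as np

def delay_embed(x, dim, tau=1):
    """Delay embedding: row k is (x[k], x[k+tau], ..., x[k+(dim-1)*tau])."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    if n < 2:
        raise ValueError("sequence too short for this embedding")
    return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])

def correlation_sum(points, r):
    """Fraction of distinct point pairs closer than r in the max norm."""
    # Builds an n x n distance matrix: O(n^2) memory, fine for a few thousand points
    dist = np.abs(points[:, None, :] - points[None, :, :]).max(axis=2)
    iu = np.triu_indices(len(points), k=1)   # each unordered pair counted once
    return float(np.mean(dist[iu] < r))

def correlation_dimension(x, dim, radii, tau=1):
    """Grassberger-Procaccia estimate: slope of log C(r) against log r."""
    radii = np.asarray(radii, dtype=float)
    pts = delay_embed(x, dim, tau)
    c = np.array([correlation_sum(pts, r) for r in radii])
    keep = c > 0                    # log C(r) is undefined where no pairs fall within r
    if keep.sum() < 2:
        raise ValueError("too few populated radii; sequence likely too short")
    slope, _ = np.polyfit(np.log(radii[keep]), np.log(c[keep]), 1)
    return float(slope)

d = correlation_dimension(np.linspace(0, 1, 200), dim=2, radii=np.logspace(-1.5, -0.5, 5))
print(d)
```

As a sanity check, points embedded from an evenly spaced ramp lie on a line, and the estimate comes out close to 1. In real use, the choice of `dim`, `tau` and the radius range strongly affects the result.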

In general, longer sequences yield more reliable and insightful
results, especially for the advanced methods.
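
The guide above can be turned into a simple length check, for example before running the analyzer on a b-file. This is a hypothetical helper, not part of the repository. The thresholds are transcribed from the list, taking the upper end where a range is given. The two nonlinear_fractal_analysis entries are split by method because they have different minimums.

```python
# Hypothetical lookup table transcribed from the guide above (not the analyzer's API).
# Where the guide gives a range, the upper end is used.
MIN_TERMS = {
    "local_dynamics": 5,
    "statistical_moments": 2,
    "dependency_model": 10,
    "anomalies": 4,
    "stationarity": 20,
    "cyclicity_and_seasonality": 30,
    "entropy_analysis": 20,
    "spectral_analysis": 20,
    "structural_breaks": 20,
    "complexity_analysis": 20,
    "stream_structure_analysis": 30,
    "state_segmentation": 36,
    "nonlinear_fractal_analysis:hurst_dfa": 50,
    "volatility_analysis": 50,
    "empirical_mode_decomposition": 50,
    "hybrid_model_analysis": 50,
    "nonlinear_fractal_analysis:correlation_dim_lyapunov": 100,
}

def applicable_analyses(n_terms):
    """Names of the analyses whose recommended minimum length is met by n_terms."""
    return sorted(name for name, needed in MIN_TERMS.items() if n_terms >= needed)

print(applicable_analyses(40))
```

The thresholds are only the rough guidance given above. Meeting them is necessary for a reliable result but does not guarantee one.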

On Mon, Oct 27, 2025 at 18:03, Antti Karttunen <antti.k...@gmail.com> wrote: