Dear HSF organisers and community,
I have created an architectural framework for data analysis in Python called Lena. I presented it at PyHEP.dev 2024 [1].
It is based on functional programming; its core features are:
- lazy evaluation, which avoids loading all data into memory and enables optimizations before execution,
- metadata handling, so that a scientist does not have to track all metadata of a large analysis manually (existing tools do not support that),
- computational pipelines as first-class citizens.
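To make the first and third points concrete, here is a minimal sketch of a lazy pipeline built from Python generators. This is an illustration of the ideas only, not the actual Lena API; all names here are invented for the example.

```python
# Each element consumes and yields a flow of data items lazily:
# nothing is computed until the pipeline is iterated.

def read_data():
    # Data is produced one item at a time; nothing is loaded eagerly.
    for x in range(10):
        yield x

def square(flow):
    for x in flow:
        yield x * x

def keep_even(flow):
    for x in flow:
        if x % 2 == 0:
            yield x

def pipeline(source, *elements):
    # The pipeline is an ordinary value: it can be stored, passed
    # around and composed before any data is processed.
    flow = source
    for element in elements:
        flow = element(flow)
    return flow

analysis = pipeline(read_data(), square, keep_even)
print(list(analysis))  # evaluation happens only here: [0, 4, 16, 36, 64]
```

Because the pipeline is a first-class value, it can be inspected or transformed before execution, which is where optimizations come in.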
For its user, a framework takes many bookkeeping and architectural decisions, leading to more structured, maintainable and reusable code. It also provides tools and optimisations a scientist would not write themselves. To my knowledge, there is no other framework for data analysis (in the sense of inversion of control); see the terminology in [2].
I have learnt that NVIDIA recently introduced lazy evaluation as a core part of their Python libraries, which allows efficient kernel fusion. In a benchmark I ran, a global optimization could increase performance by up to 50% [3]. I have also recently contacted AMD [4]; perhaps they will become interested as well.
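A toy sketch of why lazy evaluation enables fusion: when operations are deferred, several element-wise steps can be combined into a single pass over the data, avoiding intermediate buffers. This is a conceptual illustration in plain Python (the function names are mine); GPU libraries apply the same idea to kernels.

```python
def unfused(data):
    # Eager evaluation: each step materializes an intermediate list.
    doubled = [x * 2 for x in data]   # pass 1, temporary buffer
    return [x + 1 for x in doubled]   # pass 2

def fused(data, fns):
    # With a deferred computation graph, the steps are known in
    # advance and can be merged into one traversal, no temporaries.
    out = []
    for x in data:
        for f in fns:
            x = f(x)
        out.append(x)
    return out

data = list(range(5))
assert unfused(data) == fused(data, [lambda x: x * 2, lambda x: x + 1])
# both give [1, 3, 5, 7, 9]; the fused version touches the data once
```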
Functional programming is closely connected with parallel processing; that is why I'm reaching out to organisations supporting accelerated computing.
Could someone recommend organisations or specific teams/individuals who would be interested in supporting or integrating the framework?
I'm based in Germany if that is important.
Thanks for your support.
Links:
-- Best regards,
Yaroslav Nikitenko