Repurposing Bio Paper webcrawls


Jonathan Cline

Mar 14, 2026, 2:47:12 PM
to DIYbio
A couple paragraphs in a recent Scientific American article caught my eye:

"Mathematicians find one pi formula to rule them all.
A mixture of AI and algorithms uncovered a hidden structure spanning 2,000 years of equations for pi"

"The group, who also have backgrounds in areas such as physics and math, approached the problem like experimentalists and decided to gather a dataset. Tomer Raz, then a master’s student at Technion, wrote code to download every math paper that had ever been uploaded to the preprint server arXiv.org, running his laptop seven days a week, 24 hours a day, for six weeks to download 455,050 papers at a slow enough rate to respect the website’s limit.

The group then deployed GPT-4o in combination with specialized algorithms to detect pi-related equations, translate them into executable code, and remove trivial duplicates. From nearly half a million papers, they extracted 385 unique formulas, including about 10 percent that originated from the Ramanujan Machine."
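
For scale, the figures quoted above work out to about one paper every 8 seconds: six weeks is 3,628,800 seconds, divided by 455,050 papers. A minimal Python sketch of that kind of throttled download loop follows. It is not the code from the article. The name `throttled_fetch`, the `fetch` callable, and the 8-second default are all assumptions for illustration, and the real interval should come from whatever policy the site publishes.

```python
import time
from typing import Callable, Dict, Iterable


def throttled_fetch(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    min_interval_s: float = 8.0,  # assumed value, derived from the quoted totals
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, bytes]:
    """Call fetch(url) for each URL, starting requests at least min_interval_s apart.

    fetch is supplied by the caller (hypothetical here), so the pacing logic
    can be exercised without touching the network.
    """
    results: Dict[str, bytes] = {}
    last_start = None
    for url in urls:
        if last_start is not None:
            # Wait out whatever is left of the interval since the previous request began.
            remaining = min_interval_s - (clock() - last_start)
            if remaining > 0:
                sleep(remaining)
        last_start = clock()
        results[url] = fetch(url)
    return results
```

The `sleep` and `clock` parameters exist only so the pacing can be checked with fakes. In real use, `fetch` would wrap an HTTP client and should also handle retries and errors, which this sketch leaves out.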


Some of you wrote spidering code long ago and have already downloaded every Bio paper published as a PDF by every major publisher since the 1970s. You might want to ponder what new things to do with those Bio PDFs.
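
One possible starting point, by analogy with the detect-and-deduplicate step in the quoted article: scan text already extracted from the PDFs for candidate nucleotide sequences and keep one copy of each. This is only a rough Python sketch under stated assumptions. The function name `unique_sequences`, the 20-base minimum, and the idea of looking for DNA sequences are all mine, not anything from the article, and the input is assumed to be plain text keyed by some paper ID.

```python
import re
from typing import Dict


def unique_sequences(texts: Dict[str, str], min_len: int = 20) -> Dict[str, str]:
    """Map each distinct candidate DNA sequence to the first paper ID it appears in.

    texts maps a paper ID to its already-extracted plain text (hypothetical input).
    A candidate is any unbroken run of A/C/G/T at least min_len characters long.
    """
    pattern = re.compile(r"[ACGTacgt]{%d,}" % min_len)
    first_seen: Dict[str, str] = {}
    for paper_id, text in texts.items():
        for match in pattern.finditer(text):
            # Normalize case so trivial duplicates collapse to one entry.
            seq = match.group(0).upper()
            first_seen.setdefault(seq, paper_id)
    return first_seen
```

Text extracted from PDFs often breaks long sequences with spaces, line wraps, or position numbers, so a real pass would need to rejoin those first. This sketch only catches unbroken runs.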



-- 
## Jonathan Cline
## jcl...@ieee.org
## Mobile: +1-805-617-0223
########################
