Hello, I am a GSoC 2026 applicant interested in working on this project.
I have experience with Python and machine learning, and I am currently exploring the SeqTrainer repository and SBOL-based datasets to better understand the workflow. I am particularly interested in working on dataset preparation, tokenizer development, and evaluating baseline models such as DNABERT.
I would appreciate any guidance on how to get started, especially recommended first tasks or areas where I could contribute most effectively. I would also be happy to pick up a small task or issue to begin contributing.
Thank you!