Building state-of-the-art models is less a matter of superior architecture than of superior datasets. Everybody says we need good datasets, but nobody clearly defines what a good dataset actually is. In this talk, Abhishek will cover exactly what it means for a dataset to be good, and what goes into curating and refining data to build state-of-the-art models. He will also discuss the best existing pretraining/SFT datasets in the field of LLMs.
Abhishek will cover the following topics:
The primary takeaway from this talk is the skill set needed to curate and refine your own high-quality datasets.
This talk is useful for anyone in the industry who wants to build the best in-house models for their own tasks, or general-purpose state-of-the-art models in the field of LLMs.
The talk will be held online at 6 PM via Zoom and YouTube. RSVP to participate: http://has.gy/at1k