KITAB is a challenging dataset and a dynamic data collection approach for testing the abilities of Large Language Models (LLMs) in answering information retrieval queries with constraint filters. A filtering query with constraints can be of the form "List all books written by Toni Morrison that were published between 1970-1980". The dataset was originally contributed by the paper "KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval" by Marah I Abdin, Suriya Gunasekar, Varun Chandrasekaran, Jerry Li, Mert Yuksekgonul, Rahee Ghosh Peshawaria, Ranjita Naik, and Besmira Nushi (2023). The dataset is named after the word kitab, which means "book" in Arabic, Swahili, Urdu, Hindi and various Indian and Turkic languages.
KITAB consists of book-related data across more than 600 authors and 13,000 queries with a varying number of constraints and varying complexity. In each query in the dataset, the first constraint is always fixed to an author, and the remaining constraints can vary among the following types of book constraints to test for different constraint satisfaction capabilities:
ALL-BOOKS (Template 1): List all books from the author. This condition enables us to estimate an upper bound of model performance in retrieving relevant information for all queries, regardless of other constraints.
WITH-CONTEXT (Template 2b): First, provide a full list of books from the author as input context to the model. Then, ask the model to list all books from the author that also satisfy other book constraints.
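The two conditions above can be sketched as prompt builders. This is only an illustrative sketch: the function names and the prompt wording are assumptions made here and are not the actual Template 1 or Template 2b text from the paper.

```python
from typing import List


def all_books_prompt(author: str) -> str:
    """ALL-BOOKS condition: ask for every book by the author.

    The wording is hypothetical, not the paper's Template 1.
    """
    return f"List all books written by {author}."


def with_context_prompt(author: str, books: List[str], constraint: str) -> str:
    """WITH-CONTEXT condition: give the full book list as context,
    then ask for the subset that satisfies an extra book constraint.

    The wording is hypothetical, not the paper's Template 2b.
    """
    context = "\n".join(f"- {title}" for title in books)
    return (
        f"Here is a list of books written by {author}:\n"
        f"{context}\n\n"
        f"From this list, list all books written by {author} that {constraint}."
    )


if __name__ == "__main__":
    print(all_books_prompt("Toni Morrison"))
    print(
        with_context_prompt(
            "Toni Morrison",
            ["Sula", "Song of Solomon"],
            "were published between 1970-1980",
        )
    )
```

Keeping the author constraint fixed in both builders mirrors the dataset design, in which only the additional book constraints vary.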
The author list was initially randomly sampled from WikiData and then filtered down to 611 authors to avoid potentially inaccurate data and extreme outliers. For example, this involved removing authors who have very few or too many books and authors who were born before 1850. The collected book data was derived from Open Library and contains all books from the author that are tagged as English by Open Library or detected to be in English by the Language Detection service from the Azure Cognitive Services API. More details about author sampling and book data collection and cleaning are available in the paper.
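The author filtering step can be illustrated with a minimal sketch. The `Author` record, the sample entries, and the book-count thresholds `MIN_BOOKS` and `MAX_BOOKS` are hypothetical values chosen for illustration; only the 1850 birth-year cutoff comes from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Author:
    name: str
    birth_year: int
    num_books: int


MIN_BIRTH_YEAR = 1850  # from the text: authors born before 1850 are removed
MIN_BOOKS = 10         # hypothetical threshold for "very few" books
MAX_BOOKS = 300        # hypothetical threshold for "too many" books


def keep_author(author: Author) -> bool:
    """Return True if the author passes the birth-year and book-count filters."""
    return (
        author.birth_year >= MIN_BIRTH_YEAR
        and MIN_BOOKS <= author.num_books <= MAX_BOOKS
    )


def filter_authors(authors: List[Author]) -> List[Author]:
    """Keep only the authors that pass every filter."""
    return [a for a in authors if keep_author(a)]


if __name__ == "__main__":
    # Made-up sample records, not taken from KITAB.
    sample = [
        Author("Author A", 1931, 40),
        Author("Author B", 1812, 25),   # born before 1850
        Author("Author C", 1960, 2),    # very few books
        Author("Author D", 1945, 900),  # too many books
    ]
    print([a.name for a in filter_authors(sample)])
```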
Data Cleaning: Despite our best efforts in collecting a complete and accurate set of books, we also faced a variety of challenges in retrieval and cleaning, which we further describe in Appendix C.1 in the paper. To estimate the extent to which potential data cleaning issues may impact the data quality of KITAB and further evaluation, we also undertook a manual data annotation exercise during which we searched the web for titles that were provided by GPT4 and GPT3.5 but marked as not from the author in our dataset. In summary, we find that based on a manual annotation of a subsample of queries, less than 5% of the queries to GPT4 and less than 6% of the queries to GPT3.5 may potentially be affected by cases where the model finds a book title that is not in KITAB and that will consequently be marked as not from the author during our evaluation. While this can be remediated by using further data sources, the impact of missing information on model comparison is minor.
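To make concrete how a title can end up marked as not from the author, here is a rough sketch. It assumes exact matching after a simple normalization (lowercasing, dropping punctuation, collapsing whitespace); this matching rule and the helper names are assumptions for illustration and may differ from the matching used in the paper's evaluation.

```python
import re
from typing import Iterable, List


def normalize(title: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace (assumed rule)."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", title.lower())
    return " ".join(cleaned.split())


def titles_not_from_author(
    model_titles: Iterable[str], known_titles: Iterable[str]
) -> List[str]:
    """Return model-provided titles that do not appear in the dataset's
    book list for the author, using normalized exact matching."""
    known = {normalize(t) for t in known_titles}
    return [t for t in model_titles if normalize(t) not in known]


if __name__ == "__main__":
    known = ["Sula", "Song of Solomon"]
    # "Imaginary Title" is a made-up title used only for this example.
    model_output = ["sula", "Song of Solomon!", "Imaginary Title"]
    print(titles_not_from_author(model_output, known))
```

Any title this returns would be counted as not from the author, including a genuine book that is simply missing from the dataset, which is the case the manual annotation exercise above tries to quantify.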
Human Names: Entity recognition for human names was done using both the Azure Cognitive Services API and GPT4 (Template 4 in Appendix D in the paper), as we found the two approaches to be complementary for detecting names from different cultures. Note that even after using both of these resources, there may still be names that are not recognized by either of them, which shows that more work is required to improve the quality of service of entity recognition for fairness across different languages and cultures.
Author representation: The list of authors in KITAB was sampled randomly from a large set of authors present in Open Library. We see that the rate of irrelevant information generated by current models increases as the number of sitelinks in Wikidata decreases. Since the number of sitelinks may also correlate with the age (birth year) of the author or even their nationality and how well their community is linked to the World Wide Web, this observation has important implications for model quality of service across different geographical regions and author popularity and age. While KITAB naturally does contain more authors with a lower number of sitelinks (as indicated by its long-tail distribution of author count vs. their popularity), future fairness measurement investigations in this regard may also need to oversample explicitly from cohorts belonging to given demographic and geographical attributes.
It is located in what was once Avalon International Breads' flagship store on West Willis Street, between Cass and Second, in Midtown. After 25 years, that location closed its doors last January and moved into a shared space on West Canfield Street inside the Jolly Pumpkin restaurant.
For Alwhysee and Almulaiki, the spot is significant. They both lived in the Midtown neighborhood of Detroit while Almulaiki was a student at Wayne State University and fell in love with Avalon and the nearby neighborhood. Alwhysee remembers lines out of the cafe's door. He would grab an oat latte and see familiar faces. So, when it closed, it was a major loss for the community, he said.
The new store is expected to open in early February with a larger menu, including made-to-order sandwiches. Alwhysee and Almulaiki didn't intend to expand so quickly. But when they learned of the opportunity to rent the location they thought it would be a good fit. They wanted to provide a space for that neighborhood as well.
Kitab Cafe and Bookstore offers a selection of coffee, pastries and sandwiches. On the menu: the Adeni chai, a spiced and milky Yemeni tea curated by Almulaiki, inspired by the drink the couple grew up with in their homes; chilled lattes on tap from La Colombe Coffee Roasters, and sandwiches on toasted Zingerman's bread. On the shelves: books about Islam, personal development and nonfiction reads.
"We wanted to open up a space in Hamtramck that not only serves great coffee, not only has great lunch options, great pastry options, but a place that serves as a community space for like-minded individuals that want to have meaningful conversation and are focused on personal growth, like spiritually, emotionally, mentally," Almulaiki, 24, said.
Alwhysee and Almulaiki were neighbors growing up in Hamtramck, then moved to Detroit after they got married, where the couple now lives. Detroit and Hamtramck are home for the pair. They don't plan on leaving and want to raise their family in the area, they said.
"We get like the uncles that want like the double double, the coffee connoisseurs, the people that want a really sweet, iced latte. I think that we're able to do that and we're able to accommodate everybody," Almulaiki said.
The word "kitab" means book in Arabic and other languages. People refer to the store as Hamtramck's cafeteria, she said. The wood-paneled store, with its green walls and bookshelves, has the original flooring from its past life as a market. It's nestled on Holbrook Avenue just off Hamtramck's commercial corridor, Jos. Campau Avenue. The roughly 2-square-mile city is home to more than 27,800 people, many of whom are immigrants from Yemen and Bangladesh.