Download Large File


Kylee Evancho

Aug 3, 2024, 4:10:27 PM
to redsinapun

The Guardian 20 Inch Large Bike has a lightweight steel frame, making it easy to control and balance. With a larger frame and an easy-to-twist 6-speed gear shifter, the 20" Large is great for an older, more advanced rider. The bike is designed for on- and off-road use. Featuring our single-lever SureStop Brake System and kid-specific geometry, the Guardian ETHOS is ready for any adventure!


One thing that makes large language models (LLMs) so powerful is the diversity of tasks to which they can be applied. The same machine-learning model that can help a graduate student draft an email could also aid a clinician in diagnosing cancer.

However, the wide applicability of these models also makes them challenging to evaluate in a systematic way. It would be impossible to create a benchmark dataset to test a model on every type of question it can be asked.

In a new paper, MIT researchers took a different approach. They argue that, because humans decide when to deploy large language models, evaluating a model requires an understanding of how people form beliefs about its capabilities.

Their results indicate that when a model is misaligned with the human generalization function, a user could be overconfident or underconfident about where to deploy it, which might cause the model to fail unexpectedly. Furthermore, due to this misalignment, more capable models tend to perform worse than smaller models in high-stakes situations.

Rambachan is joined on the paper by lead author Keyon Vafa, a postdoc at Harvard University; and Sendhil Mullainathan, an MIT professor in the departments of Electrical Engineering and Computer Science and of Economics, and a member of LIDS. The research will be presented at the International Conference on Machine Learning.

As a starting point, the researchers formally defined the human generalization function, which involves asking questions, observing how a person or LLM responds, and then making inferences about how that person or model would respond to related questions.

They showed survey participants questions that a person or LLM got right or wrong and then asked if they thought that person or LLM would answer a related question correctly. Through the survey, they generated a dataset of nearly 19,000 examples of how humans generalize about LLM performance across 79 diverse tasks.

They found that participants did quite well when asked whether a human who got one question right would answer a related question right, but they were much worse at generalizing about the performance of LLMs.

People were also more likely to update their beliefs about an LLM when it answered questions incorrectly than when it got questions right. They also tended to believe that LLM performance on simple questions would have little bearing on its performance on more complex questions.

In the meantime, the researchers hope their dataset could be used as a benchmark to compare how LLMs perform relative to the human generalization function, which could help improve the performance of models deployed in real-world situations.
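The comparison described above can be sketched in code. The field names and scoring rule below are illustrative assumptions, not the researchers' actual benchmark format: each record pairs a human's prediction about an LLM's answer to a related question (formed after observing the LLM on an earlier question) with the LLM's actual result, and misalignment is simply the fraction of disagreements.

```python
# Hypothetical sketch: scoring misalignment between the human generalization
# function and actual LLM performance. Field names are assumptions for
# illustration only.

def misalignment_rate(examples):
    """Fraction of cases where a human's prediction about the LLM
    disagrees with the LLM's actual outcome on the related question."""
    disagreements = sum(
        1 for ex in examples
        if ex["human_predicts_correct"] != ex["llm_actually_correct"]
    )
    return disagreements / len(examples)

# Each entry: a human saw the LLM answer one question, then predicted
# whether it would answer a related question correctly.
examples = [
    {"human_predicts_correct": True,  "llm_actually_correct": True},
    {"human_predicts_correct": True,  "llm_actually_correct": False},
    {"human_predicts_correct": False, "llm_actually_correct": False},
    {"human_predicts_correct": False, "llm_actually_correct": True},
]

print(misalignment_rate(examples))  # 0.5
```

A higher rate would indicate a model whose failures are harder for users to anticipate, which is the concern the paper raises for high-stakes deployments.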

Less than one percent of Earth's water is fresh, liquid, and accessible, and most of that is found in huge lakes scattered around the globe. The Large Lakes Observatory expands and communicates knowledge about the past, present and future of large lakes worldwide.

The Large Lakes Observatory (LLO) has a unique mission: the scientific study of Earth's largest lakes. It is one of the largest water-centered research units at the university, and its impact has been felt all over the world.

The faculty, staff, and students of course use their own eyes to observe, but their senses are also extended in fascinating ways by specialized observational platforms and techniques, some of which we will encounter here. Indeed, unusual skills and uncommon equipment are often needed to explore these large, sometimes remote, lake environments. Coordinated teams of investigators may take advantage of remote or autonomous sensors that extend their vision beyond what a single person could take in at a given moment, and they use specialized equipment to measure the chemistry, biology, and physics of large lakes. Such tools of the trade are not available everywhere, but they are central to the work of LLO scientists.

All proposals must be submitted in accordance with the requirements specified in this funding opportunity and in the NSF Proposal & Award Policies & Procedures Guide (PAPPG) that is in effect for the relevant due date to which the proposal is being submitted. It is the responsibility of the proposer to ensure that the proposal meets these requirements. Submitting a proposal prior to a specified deadline does not negate this requirement.

The NSF CISE Directorate supports research and education projects that develop new knowledge in all aspects of computing, communications, and information science and engineering through core programs. The core programs for the participating CISE divisions include:

This solicitation invites proposals on bold new scientific ideas tackling ambitious fundamental research problems that cross the boundaries of two or more CISE core programs listed above. These problems must be well suited to large-scale integrated collaborative efforts. Teams should consist of two or more investigators (PI, co-PI(s), or other Senior/Key Personnel) with complementary expertise. Investigators are strongly encouraged to combine their creative talents and complementary expertise to identify compelling and transformative research approaches where the impact of the results will exceed that of the sum of each of their individual contributions. Investigators are especially encouraged to seek out partnerships in a wide class of institutions that would together produce innovative approaches to the proposed research.
