Download Large Zip File


Marva Richardt

Aug 3, 2024, 3:30:48 PM
to deomadlittvol

The Guardian 20 Inch Large Bike has a lightweight steel frame, making it easy to control and balance. With a larger frame and a 6-speed easy-to-twist gear shifter, the 20" Large is great for an older more advanced rider. The bike is designed for on and off road use. Featuring our single-lever SureStop Brake System and kid-specific geometry, the Guardian ETHOS is ready for any adventure!


One thing that makes large language models (LLMs) so powerful is the diversity of tasks to which they can be applied. The same machine-learning model that can help a graduate student draft an email could also aid a clinician in diagnosing cancer.

However, the wide applicability of these models also makes them challenging to evaluate in a systematic way. It would be impossible to create a benchmark dataset to test a model on every type of question it can be asked.

In a new paper, MIT researchers took a different approach. They argue that, because humans decide when to deploy large language models, evaluating a model requires an understanding of how people form beliefs about its capabilities.

Their results indicate that when models are misaligned with the human generalization function, a user could be overconfident or underconfident about where to deploy a model, which might cause it to fail unexpectedly. Furthermore, due to this misalignment, more capable models tend to perform worse than smaller models in high-stakes situations.

Senior author Ashesh Rambachan, an assistant professor of economics at MIT, is joined on the paper by lead author Keyon Vafa, a postdoc at Harvard University; and Sendhil Mullainathan, an MIT professor in the departments of Electrical Engineering and Computer Science and of Economics, and a member of LIDS. The research will be presented at the International Conference on Machine Learning.

As a starting point, the researchers formally defined the human generalization function, which involves asking questions, observing how a person or LLM responds, and then making inferences about how that person or model would respond to related questions.

They showed survey participants questions that a person or LLM got right or wrong and then asked if they thought that person or LLM would answer a related question correctly. Through the survey, they generated a dataset of nearly 19,000 examples of how humans generalize about LLM performance across 79 diverse tasks.
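A dataset like this could, in principle, be scored by checking how often a participant's prediction about the LLM's answer to the related question matched what the LLM actually did. The sketch below illustrates that idea; the record format and field names are invented for illustration and are not the paper's actual data schema.

```python
# Hypothetical sketch: each record is one generalization judgment --
# a participant's prediction about whether the LLM would answer a
# related question correctly, paired with what actually happened.

def alignment_rate(examples):
    """Fraction of judgments where the human's prediction about the
    LLM's answer matched the LLM's actual behavior."""
    hits = sum(
        1
        for ex in examples
        if ex["human_predicts_correct"] == ex["llm_answers_correct"]
    )
    return hits / len(examples)

# Toy data standing in for survey records.
examples = [
    {"human_predicts_correct": True,  "llm_answers_correct": True},
    {"human_predicts_correct": True,  "llm_answers_correct": False},
    {"human_predicts_correct": False, "llm_answers_correct": False},
    {"human_predicts_correct": True,  "llm_answers_correct": True},
]

print(alignment_rate(examples))  # 0.75
```

On real data, this rate could be split by whether the anchor question was answered correctly or incorrectly, mirroring the asymmetry the researchers observed in how people update their beliefs.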

They found that participants did quite well when asked whether a human who got one question right would answer a related question right, but they were much worse at generalizing about the performance of LLMs.

People were also more likely to update their beliefs about an LLM when it answered questions incorrectly than when it got questions right. They also tended to believe that LLM performance on simple questions would have little bearing on its performance on more complex questions.

In the meantime, the researchers hope their dataset could be used as a benchmark to compare how LLMs perform relative to the human generalization function, which could help improve the performance of models deployed in real-world situations.

All proposals must be submitted in accordance with the requirements specified in this funding opportunity and in the NSF Proposal & Award Policies & Procedures Guide (PAPPG) that is in effect for the relevant due date to which the proposal is being submitted. It is the responsibility of the proposer to ensure that the proposal meets these requirements. Submitting a proposal prior to a specified deadline does not negate this requirement.

The NSF CISE Directorate supports research and education projects that develop new knowledge in all aspects of computing, communications, and information science and engineering through core programs. The core programs for the participating CISE divisions include:

This solicitation invites proposals on bold new scientific ideas tackling ambitious fundamental research problems that cross the boundaries of two or more CISE core programs listed above. These problems must be well suited to large-scale integrated collaborative efforts. Teams should consist of two or more investigators (PI, co-PI(s), or other Senior/Key Personnel) with complementary expertise. Investigators are strongly encouraged to combine their creative talents and complementary expertise to identify compelling and transformative research approaches where the impact of the results will exceed that of the sum of each of their individual contributions. Investigators are especially encouraged to seek out partnerships in a wide class of institutions that would together produce innovative approaches to the proposed research.

An individual may participate as PI, co-PI, or Senior/Key Personnel in no more than one Core Programs, Large Projects proposal submitted to each deadline window. Note that limits on participation apply only to this solicitation, and do not carry over from other solicitations that have limits.

These eligibility constraints will be strictly enforced in order to treat everyone fairly and consistently. Any proposal that exceeds this limit at the time of submission for any PI, co-PI, or Senior/Key Personnel will be returned without review. No exceptions will be made.

Hi - I need to filter out the large files stored in my local app so I can find some to delete, free up space and start syncing with the web browser version again. I can't see an option to identify where these large files are - some folders don't show the amount of storage space they're taking so no clues that way - does anyone know a way to do this please? Thanks in advance!
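The app in question isn't named, so there is no in-app answer to give, but if its local files live in an ordinary folder on disk, a generic command line can surface the biggest ones. This assumes a Unix-like system (macOS or Linux) and uses a placeholder path you would replace with the app's data folder.

```shell
# List the 20 largest files under a folder, biggest first.
# Set TARGET to wherever the app stores its local data
# (defaults to the current directory here, as a placeholder).
TARGET="${TARGET:-.}"
find "$TARGET" -type f -exec du -h {} + | sort -rh | head -n 20
```

`du -h` prints human-readable sizes, `sort -rh` orders them largest first, and `head` trims the list, so the output is a quick shortlist of deletion candidates.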
