It was a natural inclination to task a large language model (LLM) like ChatGPT with creating a poem that delves into the topic of large language models, and subsequently utilize said poem as an introductory piece for this article.
We went straight to the source: MIT assistant professor and CSAIL principal investigator Jacob Andreas, whose research focuses on advancing the field of natural language processing, in both developing cutting-edge machine learning models and exploring the potential of language as a means of enhancing other forms of artificial intelligence. This includes pioneering work in areas such as using natural language to teach robots, and leveraging language to enable computer vision systems to articulate the rationale behind their decision-making processes. We probed Andreas regarding the mechanics, implications, and future prospects of the technology at hand.
Maybe one of the most surprising components here is this phenomenon called in-context learning. If I take a small ML [machine learning] dataset and feed it to the model, like movie reviews paired with the star ratings the critics assigned, and give just a couple of examples of these things, language models acquire the ability both to generate plausible-sounding movie reviews and to predict the star ratings. More generally, if I have a machine learning problem, I have my inputs and my outputs. If you give the model a few input-output pairs, then give it one more input and ask it to predict the output, the models can often do this really well.
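To make the in-context learning setup concrete, here is a minimal sketch of how a few labeled examples and one unlabeled input might be laid out as a single prompt. The function name `build_few_shot_prompt`, the `Review:`/`Stars:` layout, and the sample reviews are all assumptions made for illustration; they are not taken from the interview, and the sketch stops short of calling any model.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, int]], query: str) -> str:
    """Format labeled (review, stars) pairs plus one unlabeled review as one prompt."""
    # Each labeled example becomes an input/output block the model can imitate.
    blocks = [f"Review: {review}\nStars: {stars}" for review, stars in examples]
    # The final block leaves the output empty; the model is asked to continue it.
    blocks.append(f"Review: {query}\nStars:")
    return "\n\n".join(blocks)


# Hypothetical data, invented for this sketch.
examples = [
    ("A moving story with superb performances.", 5),
    ("Dull plot and wooden acting.", 1),
]
prompt = build_few_shot_prompt(examples, "Clever premise, but the ending drags.")
print(prompt)
```

The resulting string would be sent to a language model as is; whatever the model writes after the final `Stars:` is read as its predicted rating. No weights are updated, which is what distinguishes this from ordinary training on the same dataset.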
A: It's well-documented that these models hallucinate facts, that they're not always reliable. Recently, I asked ChatGPT to describe some of our group's research. It named five papers, four of which are not papers that actually exist, and one of which is a real paper that was written by a colleague of mine who lives in the United Kingdom, whom I've never co-authored with. Factuality is still a big problem. Even beyond that, things involving reasoning in a really general sense, things involving complicated computations, complicated inferences, still seem to be really difficult for these models. There might be even fundamental limitations of this transformer architecture, and I believe a lot more modeling work is needed to make things better.
Why it happens is still partly an open question, but possibly, just architecturally, there are reasons that it's hard for these models to build coherent models of the world. They can do that a little bit. You can query them with factual questions, trivia questions, and they get them right most of the time, maybe even more often than your average human user off the street. But unlike your average human user, it's really unclear whether there's anything that lives inside this language model that corresponds to a belief about the state of the world. I think this is both for architectural reasons, that transformers don't, obviously, have anywhere to put that belief, and training data, that these models are trained on the internet, which was authored by a bunch of different people at different moments who believe different things about the state of the world. Therefore, it's difficult to expect models to represent those things coherently.
All that being said, I don't think this is a fundamental limitation of neural language models, or even of language models more generally, but something that's true about today's language models. We're already seeing that models are approaching being able to build representations of facts, representations of the state of the world, and I think there's room to improve further.
Q: The pace of progress from GPT-2 to GPT-3 to GPT-4 has been dizzying. What does the pace of the trajectory look like from here? Will it be exponential, or an S-curve that will diminish in progress in the near term? If so, are there limiting factors in terms of scale, compute, data, or architecture?
A: Certainly in the short term, the thing that I'm most scared about has to do with these truthfulness and coherence issues that I was mentioning before, that even the best models that we have today do generate incorrect facts. They generate code with bugs, and because of the way these models work, they do so in a way that's particularly difficult for humans to spot because the model output has all the right surface statistics. When we think about code, it's still an open question whether it's actually less work for somebody to write a function by hand or to ask a language model to generate that function and then have the person go through and verify that the implementation of that function was actually correct.
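As a rough illustration of the verification burden described above, the sketch below checks a plausible-looking function against a trusted reference on a handful of inputs. The function `generated_median` is a hypothetical stand-in for model output, written for this example with a deliberate flaw; `find_disagreements` and the test cases are likewise assumptions, not anything described in the interview.

```python
from statistics import median as reference_median


def generated_median(values):
    """Hypothetical model-written median: reads fine, but mishandles even-length input."""
    ordered = sorted(values)
    # Bug: for an even number of values this returns the upper middle element
    # instead of the mean of the two middle elements.
    return ordered[len(ordered) // 2]


def find_disagreements(candidate, reference, cases):
    """Return the inputs on which the candidate and the reference disagree."""
    return [case for case in cases if candidate(case) != reference(case)]


cases = [[3, 1, 2], [5], [1, 2, 3, 4], [10, 20]]
failures = find_disagreements(generated_median, reference_median, cases)
print(failures)
```

The odd-length cases pass, so a reviewer who only tried those would accept the function. The flaw shows up only on the even-length inputs, which is the kind of error that is easy to miss when the code looks right on the surface.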
After visiting my nephews for Easter, I spent the drive back home thinking about the future they will grow up in. What will their computers look like? What kind of software will they use? Will any of my code still be running?
When I started the SerenityOS project in 2018, I used C++ for everything, simply because it was the language I was most comfortable with. It was the right choice at the time, as it allowed me to bootstrap the project (and a community) very quickly and efficiently.
I tried rewriting parts of SerenityOS in different languages, and while there were some interesting options, they all came with idiosyncratic limitations and dependencies that made them unsuitable for adoption.
In this popular audio program, Connirae Andreas elegantly teaches and demonstrates some of the most powerful linguistic patterns developed in NLP. By practicing these Advanced Language Patterns, you will learn a range of new ways to usefully communicate in everyday life.
Not only are these skills a must for coaches, therapists, teachers, and sales professionals, they are also useful for parents, husbands, ministers, managers, and anyone who uses words to communicate (no offense intended to mimes).
When you practice these precise aspects of language until you really integrate each one into your ongoing behavior, you will find yourself having fewer communication glitches with others as well as being more helpful and positively influential:
This ever-popular area of NLP covers how language patterns can be used creatively and effectively to gather information and to influence across a variety of contexts, from therapy to business, teaching to training, and parenting to general communication with others, in both personal and professional settings.
1) A single file in audiobook format (.m4b). This format is ideal for use with iTunes, an iPhone/iPod, or any compatible portable media player or smartphone. The .m4b audiobook file format automatically bookmarks your place (in most players), and can even be played at different speeds (faster or slower, depending on your learning needs).
Professor Andreas Andreou is the co-founder of the Johns Hopkins University Center for Language and Speech Processing. Research in the Andreou lab is aimed at brain inspired microsystems for sensory information and human language processing. Notable microsystems achievements over the last 25 years include a contrast sensitive silicon retina, the first CMOS polarization sensitive imager, silicon rods in standard foundry CMOS for single photon detection, and a large scale mixed analog/digital associative processor for character recognition. Significant algorithmic research contributions in pattern analysis and machine intelligence include the vocal tract normalization technique for speech recognition and heteroscedastic linear discriminant analysis, a derivation and generalization of Fisher discriminants in the maximum likelihood framework.
While recent years have seen tremendous progress on tasks like automatic translation and speech recognition, current artificial intelligence systems still fall far short of humans' ability to learn language and to learn from language about the rest of the world. MIT's Language and Intelligence Group, led by Prof. Jacob Andreas, is working towards a future in which everyone can interact with software using the languages they already speak.
Prof. Andreas, a member of CSAIL and the Department of Electrical Engineering and Computer Science (EECS), is the X Consortium Assistant Professor at MIT. Before joining MIT, he earned his PhD at Berkeley, where he was a member of the Berkeley NLP group and the Berkeley AI Research Lab. He has received Samsung's AI Researcher of the Year award, MIT's Kolokotrones teaching award, and paper awards at NAACL and ICML.
I'm interested in language as a communicative and computational tool. People learn to understand and generate novel utterances from remarkably little data. Having learned language, we use it to acquire new concepts and to structure our reasoning. Current machine learning techniques fall short of human abilities in both their capacity to learn language and their capacity to learn from language about the rest of the world. My research aims to understand the computational foundations of language learning, and to build general-purpose intelligent systems that can communicate effectively with humans and learn from human guidance.
I'm an associate professor at MIT in EECS and CSAIL. I did my PhD work at Berkeley, where I was a member of the Berkeley NLP Group and the Berkeley AI Research Lab. I've also spent time with the Cambridge NLIP Group, and the NLP Group and the (erstwhile) Center for Computational Learning Systems at Columbia.
Much of what humans know (and know how to do) comes not from observation, but from rich supervision provided in language by skilled teachers. But almost all machine learning research focuses on learning from comparatively low-level demonstrations or interactions. How do we enable more natural and efficient learning from natural language supervision instead?
What tools do we need to help humans understand the features and representational strategies that black-box machine learning algorithms discover? To what extent do these strategies reflect abstractions that we already have names for?