Key Code Translator

Virgil Gardiner

Aug 5, 2024, 1:09:35 AM

to naganimac

Writing a compiler that produces (really ugly) C as its output probably isn't trivial -- a compiler rarely is, and generating code for Python will be more difficult than for a lot of other languages (dynamic typing, in particular, is hard to compile, at least to very efficient output). OTOH, at least the parser will be a lot easier than for some languages.

If by "translating", you mean converting Python to C that's readable and maintainable, that's a whole different question -- it's substantially more difficult, to put it mildly. Realistically, I doubt any machine translation will be worth much -- the differences in how you normally approach problems in Python and C are just too large for there to be much hope of a decent machine translation.

Have a look at Shedskin. It does exactly that (well, to C++, and only for a subset of Python and its modules). It should still provide valuable insight into how to approach this particular problem (although writing your own will certainly not be a trivial task).
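To give a feel for the "ugly C output" approach described above, here is a minimal sketch that uses Python's standard `ast` module to turn a single arithmetic expression into a C expression string. The function names `expr_to_c` and `translate` are made up for this illustration, and the sketch assumes every variable is a C `double`, which sidesteps the dynamic typing problem rather than solving it. It has nothing to do with how Shedskin itself works.

```python
import ast

# Map Python AST operator node types to C operator tokens.
C_OPERATORS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def expr_to_c(node):
    """Translate a tiny subset of Python expressions into a C expression string."""
    if isinstance(node, ast.Expression):
        return expr_to_c(node.body)
    if isinstance(node, ast.BinOp) and type(node.op) in C_OPERATORS:
        left = expr_to_c(node.left)
        right = expr_to_c(node.right)
        # Parenthesize everything: ugly, but precedence is never ambiguous.
        return f"({left} {C_OPERATORS[type(node.op)]} {right})"
    if isinstance(node, ast.Name):
        # Assumption: every name refers to a C double declared elsewhere.
        return node.id
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        # Emit numbers as doubles so "/" keeps Python's true-division meaning.
        return repr(float(node.value))
    raise NotImplementedError(f"unsupported construct: {ast.dump(node)}")


def translate(source):
    """Parse one Python expression and return the equivalent C expression."""
    return expr_to_c(ast.parse(source, mode="eval"))


print(translate("a + b * 2"))  # prints (a + (b * 2.0))
```

Anything outside this subset (function calls, strings, lists, attribute access) raises `NotImplementedError`, which is roughly where the real difficulty of compiling Python begins.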

Everyone is about to get access to the single most useful, interesting mode of AI I have used - ChatGPT with Code Interpreter, now renamed Advanced Data Analytics (the name has been updated; I am not going to change the post beyond this first instance of the old name). I have had the alpha version of this for a couple of months (I was given access as a researcher off the waitlist), and I wanted to give you a little bit of guidance as to why I think this is a really big deal, as well as how to start using it.

Specifically, it gives the AI a general-purpose toolbox to solve problems (by writing code in Python), a large memory to work with (you can upload files up to 100MB, and those can be in compressed form) and integrates that toolbox into the AI in ways that play to the strengths of Large Language Models. This helps address a number of problems that previous versions of ChatGPT had:

It allows the AI to do math (very complex math) and do more accurate work with words (like actually counting words in a paragraph), since it can write Python code to address the natural weaknesses of Large Language Models in math and language. And it is really good at using this tool appropriately, as you can see below.

It makes the AI much more versatile. A remarkable number of problems can be solved with code, and GPT-4 is very good at figuring out when to use Code Interpreter in novel and interesting ways. For example, I asked it to prove to a doubter that the Earth is round with code, and it provided multiple arguments, integrating the text with code and images.
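The word-counting case mentioned above is a good example of a task that is trivial in code. The following is a minimal sketch of the kind of Python the AI might write for it; the `count_words` helper and its definition of a "word" (a run of letters, digits, or apostrophes) are assumptions made for this illustration, not something taken from Code Interpreter's actual output.

```python
import re


def count_words(text):
    """Count words, treating runs of letters, digits, or apostrophes as one word."""
    return len(re.findall(r"[A-Za-z0-9']+", text))


paragraph = "It allows the AI to do math and do more accurate work with words."
print(count_words(paragraph))  # prints 14
```

A different definition of "word" (for example, splitting hyphenated terms) would give different counts, so it is worth checking which rule the generated code applies.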

Code Interpreter is an impressive data scientist. I have been using it extensively over the past months, and it is operating at a very advanced level, automating a lot of the complexity of quantitative analysis, and capable of very sophisticated approaches to data. As one way of illustrating this, I started with a fun dataset, a public domain list of superheroes and their powers. You can download it if you want to try these steps with me.

It is easy to upload data, even compressed data like a ZIP file, by hitting the plus button. You should include an initial prompt with the data, but it can be pretty minimal. I literally used "Here is some data on superhero powers, look through it and tell me what you find" and got good results. If you have a data dictionary, you can just paste that in, too. The AI is good at figuring out the meaning and structure of the data from context alone.

Now that we have the data loaded, we can have GPT do the worst part of any data analysis job: data merging and cleaning. It will handle this all automatically in a quite sophisticated way, but I find it usually helps to ask directly, as if I were directing a human data analyst. You will also note something really important about the way the system works - it is relentless, usually correcting its own errors when it spots them. It notices, for example, that columns are misnamed and fixes that issue. Impressive as this is, I would still recommend double-checking the results and process, rather than blindly trusting the AI.
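For readers who want to double-check this kind of cleaning by hand, here is a small sketch of two common steps: normalizing messy column names and dropping duplicate or malformed rows. It uses only the standard library. The helper names, the column names, and the sample rows are all invented for this illustration and do not come from the superhero dataset.

```python
import csv
import io


def normalize_header(name):
    """Trim, lowercase, and join words with underscores: ' Hero Name ' -> 'hero_name'."""
    return "_".join(name.strip().lower().split())


def clean_rows(csv_text):
    """Return cleaned rows as dicts, skipping exact duplicates and malformed rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [normalize_header(h) for h in next(reader)]
    seen = set()
    cleaned = []
    for row in reader:
        values = tuple(cell.strip() for cell in row)
        # Skip rows with the wrong number of cells and rows already seen.
        if len(values) != len(header) or values in seen:
            continue
        seen.add(values)
        cleaned.append(dict(zip(header, values)))
    return cleaned


# Invented sample data with a messy header and one duplicated row.
raw = " Hero Name ,Eye Color\nHero A,yellow\nHero A,yellow\nHero B , blue\n"
for record in clean_rows(raw):
    print(record)
```

Running the same checks on the AI's cleaned output (row counts before and after, the final list of column names) is a quick way to confirm that nothing was silently dropped.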

Now, on to an analysis. The AI seems knowledgeable about analytical approaches - it is worth reading the exchange below to see what I mean. I prompted "I am interested in doing some predictive modelling, where we can predict what powers a hero might have based on other factors. how should we approach this?" and it built a Random Forest classifier - cool! But you can also see why it is important to have expert human oversight, since I would disagree with its decision to impute missing data by using the means for numerical data. I would have dropped the data instead, but I could ask the AI to change its approach, or discuss alternate options.
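The disagreement over missing data can be made concrete with a small sketch comparing the two options: filling gaps with the column mean versus dropping the incomplete values. The numbers below are invented for illustration, and the helper names are hypothetical; this is not the code the AI produced.

```python
from statistics import mean, pvariance


def impute_with_mean(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)  # raises StatisticsError if nothing was observed
    return [fill if v is None else v for v in values]


def drop_missing(values):
    """Keep only the observed values."""
    return [v for v in values if v is not None]


# Invented sample: five heights, two of them missing.
heights = [180.0, None, 170.0, 190.0, None]

imputed = impute_with_mean(heights)
dropped = drop_missing(heights)

print(imputed)
print(dropped)
# Mean imputation keeps every row but shrinks the spread of the column.
print(pvariance(imputed) < pvariance(dropped))
```

Imputation preserves the sample size at the cost of understating variability, while dropping keeps the observed distribution but loses rows, which is why the choice is a judgment call worth making explicitly.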

The level of interactivity continues for visualizations: you can go back and forth with the AI asking for improvements and changes. For example, I prompted "Create an interactive dashboard with at least 6 insightful charts, including one in 3D. Make the dashboard beautiful." It produced a dashboard, but not exactly what I wanted. So I was able to just ask for changes in English: "make this better", "include more names", etc. You will also notice that it gave me a downloadable file for the interactive dashboard (you can try it at the link), which I just put in a web browser and it worked - downloadable outputs are another neat trick of Code Interpreter.

And a few more experiments I have done over the past months: visualizing the song of the summer with a 3D interactive plot, building interactive maps, interpreting the Iliad, causal analysis, making animated GIFs from data, analyzing Magic the Gathering, racing bar charts, and a lot more besides.

This is just scratching the surface of Code Interpreter, which I think is the strongest case yet for a future where AI is a valuable companion for sophisticated knowledge work. Things that took me weeks to master in my PhD were completed in seconds by the AI, and there were generally fewer errors than I would expect from a human analyst. Human supervision is still vital, but I would not do a data project without Code Interpreter at this point.

But it is just as clear to me that humans are not going to be replaced by Code Interpreter. Instead, the AI does what we always hope automation will do - free us from the most annoying, repetitive parts of our job so we can focus on the good stuff. By simplifying the process of analysis, I can do more and deeper and more satisfying work. My time becomes more valuable, not less, as I can concentrate on what is important, rather than the rote. Code Interpreter represents the clearest positive vision so far of what AIs can mean for work: disruption, yes, but disruption that leads to better, more meaningful work. I think it is important for all of us to think about how we can take this same approach to other jobs that will be impacted by AI.

I was most struck by this comment at the end: "But it is just as clear to me that humans are not going to be replaced by Code Interpreter. Instead, the AI does what we always hope automation will do - free us from the most annoying, repetitive parts of our job so we can focus on the good stuff. By simplifying the process of analysis, I can do more and deeper and more satisfying work. My time becomes more valuable, not less, as I can concentrate on what is important, rather than the rote. Code Interpreter represents the clearest positive vision so far of what AIs can mean for work: disruption, yes, but disruption that leads to better, more meaningful work. I think it is important for all of us to think about how we can take this same approach to other jobs that will be impacted by AI."

Given that you seem quite impressed by the software at this (its most basic) level, and knowing that the goal of OpenAI is to create a meta-human intelligence with tools like ChatGPT and Code Interpreter as the means to that end, why are you assuming the AI will not replace the more meaningful work as well?

Code Interpreter continues OpenAI’s long tradition of giving terrible names to things, because it might be most useful for those who do not code at all. It essentially allows the most advanced AI available, GPT-4, to upload and download information, and to write and execute programs for you in a persistent workspace. That allows the AI to do all sorts of things it couldn’t do before, and be useful in ways that were impossible with ChatGPT.

It lowers hallucination and confabulation rates. When the AI directly works with Python code, the code helps keep it “honest” since Python generates errors if the code is not correct. And as the code manipulates the data, rather than the LLM itself, there are no errors inserted into the data by the AI. This isn’t perfect; the AI still hallucinates (it often seems to think it can see the graphs it generates, which this mode of ChatGPT cannot do), but these errors are less common, and less likely to impact the code or data itself.