With all the recent hype about AI, many people and articles focus on the insane speed of progress and all the new products that are coming out right now. But what does it mean for you as a software engineer? Will you really be replaced by an AI? Or do you have to learn how to build AIs so you can stay on top and not lose relevance?
Having “dual-classed” in computer science and mathematics, with an unnecessarily technical Ph.D. thesis as well as hands-on experience working on a PHP monolith, let me try and share my perspective on how to approach AI as a software engineer.
You don’t need to be able to read the papers
First of all, even though arXiv papers have become the new blog posts that are widely shared on social media, you shouldn’t worry about being able to read them. It’s not just the very specific style that makes them almost incomprehensible (often extremely terse due to size constraints; you can easily spend an afternoon on a single paragraph), it’s also the extensive amount of mathematics.
Not so long ago, this was still a purely academic field. Papers are full of formulas, but the formulas are often simplified versions that omit details “for clarity,” or because they are taken from earlier work where the details have already been provided.
I personally find even “classics” like the original Attention Is All You Need paper hard to read. It never states the full model in a single formula, and the formulas often omit important information like exact matrix dimensions. You need to fill in those details with experience you only get from being an expert in the field.
Papers are written for other researchers. If you want all the details, you need to go to a textbook, for example the excellent online book Dive Into Deep Learning. It has a short synopsis of the required mathematics (but remember that at university you would normally spend one or two years just learning it).
And you can still get started
This leaves us in a bit of a predicament. If you don’t know the actual details, it may be hard to even follow high-level intuitions. People who have a lot of research experience often have great intuition, but it is based on having thoroughly understood similar approaches before. If you cannot root your intuition in math, what can you do?
One of the great things about the recent developments is that there now exist lots of open-source implementations. Hugging Face hosts thousands of pretrained models, and you can download and run them with a few lines of Python code, as in the sketch below.
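Here is a minimal sketch, assuming you have installed the transformers library (pip install transformers) and a backend like PyTorch; the pipeline call downloads a default pretrained model the first time you run it:

```python
from transformers import pipeline

# Downloads a default pretrained sentiment model on first use
classifier = pipeline("sentiment-analysis")

print(classifier("This paper is almost incomprehensible."))
# e.g. [{'label': 'NEGATIVE', 'score': 0.99}]
```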
Google Colab gives you free notebooks (a kind of interactive shell, with graphics) where you can even run code on a GPU (although there is a time limit in the free version).
And then there are AI-as-a-Service companies like OpenAI (the ones who really pushed the envelope) that allow you to play around with state-of-the-art AI models, although at a price.
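To give you an idea, a call with OpenAI’s Python client looks roughly like the sketch below (the model name is just an example, and you need an API key in the OPENAI_API_KEY environment variable):

```python
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# Model name is an example; pick whichever model fits your budget
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain attention in one paragraph."}],
)
print(response.choices[0].message.content)
```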
So what you can get is actual hands-on experience with living, breathing programs. In the “olden days” you had to understand the paper and then reimplement it manually, but now you can easily download a program and try it out.
I encourage you to do that, and not just run the working examples but also try the corner cases and really see where it stops working well. As a software engineer you have great intuition about programs, and that’s a totally valid way to approach this space, too.
Here’s what you really need to know
So are you good to go building AI systems? Yes, you can definitely start building, but it pays off to understand how this technology is different and how to use it. Because contrary to what people claim, these methods have no built-in intelligence. If you don’t use them properly, you’ll struggle to make them work and potentially see really bad performance in production. So here are a few comments on making it work.
You will need to use Python. For better or worse, the entire data science ecosystem is built on Python. Core libraries might be written in C or Rust for efficiency, but Python is the glue that holds it all together. No other language comes close to providing the same amount of tooling, so don’t fight it. There are a few contenders like the Julia language, or libraries for other languages like Java or Rust, but don’t go that way if you’re starting out, because a lot is still missing and you’d have to fill in the blanks yourself. Do yourself a favor and use Python.
ML is not just a library. You can now download or use a lot of machine learning models and algorithms. But these are not “normal” algorithms, like one that computes the shortest path in a graph. Every ML algorithm consists of a “model” that could in principle compute all kinds of things, and is then “trained” on a specific data set to produce the outputs that you want. Think of it as a program template with a lot of free parameters. By itself, the model doesn’t do anything useful. You need to collect data to train it, which means adjusting the parameters until it computes something that matches the training data. For some applications, there are pretrained models. ChatGPT, for example, is a model that has already been trained on literally terabytes of data. But even if a model has been trained on some data, that doesn’t mean it works for your specific use case.
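As a minimal sketch of the “template plus training” idea, using scikit-learn (the data here is made up purely for illustration):

```python
from sklearn.linear_model import LogisticRegression

# Made-up toy data: inputs and the labels we want the model to reproduce
X = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1], [0.1, 0.8]]
y = [0, 1, 1, 0]

# The untrained model is just a template with free parameters...
model = LogisticRegression()

# ...and training adjusts those parameters to match the data
model.fit(X, y)

print(model.predict([[0.2, 0.9]]))  # e.g. array([0])
```

Which brings us to…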
The importance of proper evaluation. Most models are powerful enough to achieve close to 100% accuracy on the training data set, but may perform really badly on new data. Therefore, you train on one data set but test the model on a separate test data set to see how well it will perform on future data. There is a bit of an art to doing evaluation properly; in particular, you want to make sure there isn’t too much overlap between training and test data. If it isn’t done properly, your model will look fine but perform really badly in practice. Evaluation sounds like a lot of work, but essentially it is just running your model on the test data and averaging the errors.
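Continuing the scikit-learn sketch from above, the train/test split and the evaluation itself are only a few lines (the dataset and model here are again just placeholders):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)

# Hold out a test set the model never sees during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# Evaluation: run the model on unseen data and average the errors
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy:", model.score(X_test, y_test))
```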
I always think that working with ML models is not unlike Test Driven Development, where you define a bunch of tests that describe what kind of results you want from the algorithm. But in ML, the tests are not carefully compiled by hand to cover common and edge cases; they are really just random (hopefully typical) pairs of input/output examples. Training then means taking part of the data to adjust the parameters of the model, while evaluation on a separate set makes sure the model doesn’t just memorize the training examples.
Finally, some pointers to get started with large language models: start talking with ChatGPT, or look at the transformer models on Hugging Face. If you want to dive deeper, Andrej Karpathy has an implementation called nanoGPT and a YouTube video that already has 2.4M views. Then start thinking about your application and how you can collect data for evaluation, and you’re already on a good path.
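If you want to poke at one of those transformer models right away, generating text from GPT-2 (a small, freely downloadable ancestor of ChatGPT) takes only a few lines with the transformers library; the model choice is just a suggestion:

```python
from transformers import pipeline

# GPT-2 is small enough to run on a laptop CPU
generator = pipeline("text-generation", model="gpt2")

print(generator(
    "As a software engineer, the best way to learn AI is",
    max_new_tokens=40,
))
```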
Happy to hear what your experiences and questions are!