Marginally Interesting - The Newsletter - Issue #8 - Whoa what happened
I kept a steady pace for seven issues of this newsletter, but then summer vacation came and broke the stride, and yeah... what happened then?
To be honest, I'm not so sure myself. I vaguely remember a lot of requests for potential projects coming in. In fact, so many that for a while I was quite worried about what might happen if all of those projects started. Luckily, they didn't.
One thing I realized is that in consulting, the range between having too little to do and having too much is fairly narrow, and there is no easy way to scale up.
I can hardly believe that it has already been a year since I started consulting. It turned out very differently than I had expected. One key factor was my network, which has been an important source of projects.
A Year of ML Consulting
So where am I after a year of consulting?
There are a lot of things related to consulting that I have had to learn: how to figure out a good scope for a project, how to negotiate terms and rates, how to kick off a project, how to write invoices, how to handle VAT, and so on.
Luckily I met a few other consultants along the way who helped me out with their advice and perspective.
I personally find it very interesting to get to know so many people and companies, and to see the common problems everyone is struggling with, but also how drastically different companies can be.
So, what have I learned in terms of "putting ML into production"?
It is still hard work. The amount of engineering necessary to set up and manage all those tools is considerable. The path from notebooks to production involves manual steps, even in the best case. Some teams stick to notebooks even in production; others use classical Python projects already for exploratory work. Airflow is what everyone seems to use for orchestration, but, honestly speaking, the amount of work that goes into defining pipelines is huge. Testing is another open issue.
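To give a feel for what "defining pipelines" means, here is a toy sketch in plain Python (deliberately not Airflow's actual API): at its core, a pipeline is a set of named tasks plus an explicit dependency graph, executed in topological order. The task names and stand-in functions below are hypothetical. Real orchestrators layer scheduling, retries, logging, and backfills on top of this, which is where much of the extra work goes.

```python
from graphlib import TopologicalSorter  # stdlib topological sort (Python 3.9+)

results = {}  # shared store standing in for real inter-task storage (e.g. S3, a database)

def extract():
    results["raw"] = [1, 2, 3]  # stand-in for pulling raw data

def transform():
    results["features"] = [x * 2 for x in results["raw"]]  # stand-in for feature engineering

def train():
    results["model"] = sum(results["features"])  # stand-in for model fitting

TASKS = {"extract": extract, "transform": transform, "train": train}
DEPS = {"transform": {"extract"}, "train": {"transform"}}  # task -> upstream tasks

def run_pipeline():
    # Run every task after all of its upstream dependencies have finished.
    for name in TopologicalSorter(DEPS).static_order():
        TASKS[name]()
    return results["model"]
```

Even this trivial three-step pipeline needs a fair amount of wiring; production pipelines with dozens of tasks, retries, and data-passing between steps multiply that overhead accordingly.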
Then there is a whole range of additional challenges, from organizational to strategic to political, that can get in the way of making ML a success. I believe data science projects differ from engineering projects in having a higher degree of uncertainty, and ignoring this puts up considerable obstacles to success.
I am also seeing quite a gap between the constantly evolving tooling landscape and what people are actually using. For example, people would rather use instances with lots of memory than start looking into Big Data processing. And if they do make the switch, a significant rewrite is involved. The same holds for data infrastructure, where people often run custom-built data pipelines instead of using one of the data platforms that now exist.
It is almost as if these products are being created faster than companies can migrate to them. I have come to see each company as a building that has been constructed over time: you can't just go in, rip something out, and replace it with new technology. In a way, you can often tell when a company was built by looking at its tech stack. But sticking with that stack often makes sense, because what you have works.
What else?
I've written an article for Stripe's Increment magazine on how tooling and people need to go together, and why you shouldn't just throw tools at organizational problems.
I'll be giving a talk at data2day on December 8 (in German), again a virtual one-day conference.
There's also a longer article I wrote together with Marlin Watling, a consultant specializing in HR and change processes, on the general factors you need to consider to successfully transform your company to become more AI-ready. More on that later.
I plan to pick the newsletter up where I left off and write more regularly about what I'm seeing out there regarding ML in production.