Marginally Interesting - The Newsletter - Issue #3
Last week was too busy, and given that Thursday is a holiday in Germany, Monday is the new Friday!
Hooked on Ethics in Data Driven Products
In the last newsletter, I talked about the EU's attempt at regulating the use of AI. I recently finished reading Hooked by Nir Eyal, which gave me some more perspective on the extent to which products are designed to be "habit forming." The book doesn't even mention AI, but AI is one way to make products even more closely tailored to each user.
The book proposes a four-phase model: trigger, action, variable reward, and investment, which sets up the next trigger. It will sound extremely familiar if you have ever used a digital product in the past decade.
While Nir Eyal stresses that one should be ethical about using these techniques, the book ends up being mostly about all kinds of cognitive biases ("if I put time into this, it must be worth something") and how they can be exploited to make us addicted to a product.
The book contains a chapter on ethics. The bottom line: about 1% of the population will easily get addicted to almost anything, the rest are adults responsible for their own choices, and you probably shouldn't build a product you wouldn't use yourself. Honestly, I think that simplifies things a bit.
It's a good thing that NeurIPS, one of the premier ML conferences, has started to require at least a discussion of potential impact in its papers.
And I've also uninstalled Instagram from my phone. It's been rough. :)
Reviewing in the ML community
Last week, ICLR took place. Starting out as a smaller conference focusing on learning representations, it has become one of the hot conferences for deep learning. Unfortunately, the content seems to be visible only to registered attendees for now.
There was an interesting discussion (again) on Twitter about the reviewing process in the ML community. Click on the Twitter bird and then on the referenced tweet to unfold the discussion.
It turned out that, in the long run, a low acceptance rate mattered more for being taken seriously as a conference.
The reviewing and publication model was one of the main factors that ultimately drove me out of academia. I wasn't alone in struggling with this. Yann LeCun put some ideas on his homepage a while ago (there is no timestamp, but it seems to date from before 2011), and the search for a better model is ongoing:
As I see it, academia is really a global cooperative system for research, which is what makes it so hard to change. I don't have a solution myself, but I find it interesting to reconsider this given the huge importance AI has gained in the past years.
Peer review has a long tradition in science, and it plays an essential role in vetting results. But it has also always been used to pick what to publish under physical constraints. In this day and age, however, many of these constraints (the number of pages in journals sent to institutes around the globe, or the number of conference presentation slots) are either gone or no longer make sense given the exponential growth of the community.
I don't know where this is going, but research has already shifted from universities to companies like Google, Facebook, or Baidu, which throw a lot of money at computing resources and form independent sub-communities. Social media sites have also built systems to find and discover new and original content (with varying success, of course).
There is little money to be made by newcomers here, but the area is ripe for disruption. And don't get me started on the companies that do make money from academic publishing.
How much math do you need to know for ML?
Olli Zeigermann and I gave a talk at this year's M3 conference (in German) on how much math you really need to do ML in practice.
The short version: "traditionally," ML has been taught at universities as a research topic, where students are required to learn all the details and ultimately be able to implement and extend ML methods themselves. As ML becomes a more common industrial practice, however, we wondered whether that's still the best approach.
Obviously you need to know certain core statistical concepts like the law of large numbers, the curse of dimensionality, generalization, and statistical tests (if you're so inclined), but just like you wouldn't reimplement GPU-level operations, you wouldn't reimplement logistic regression, right?
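To make that concrete, here's a minimal sketch of the "use the library" approach: fitting a logistic regression with scikit-learn in a few lines, no gradient math required. The toy data is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D data: two well-separated classes (made-up values).
X = np.array([[0.0], [0.2], [0.9], [1.1]])
y = np.array([0, 0, 1, 1])

# The library handles the optimization and regularization details.
clf = LogisticRegression().fit(X, y)

# Predict for two new points, one near each cluster.
preds = clf.predict(np.array([[0.1], [1.0]]))
print(preds)
```

The core concepts (what a decision boundary is, why you should hold out test data) still matter; what you can skip is rederiving the solver by hand.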
One of my favorite analogies is guitar pedals. You don't need a degree in electrical engineering to make music. One thing guitar pedals have figured out, but ML probably hasn't, is how to package and productize the technology so that you can use it, and use it well, without getting bogged down in the details.
What do you think?
Thanks for reading, you can reply to this email if you want to get in touch with me.