Build vs Buy: If You Are Buying Machine Learning, You Are Doing It Wrong

Dr M Maruf Hossain, PhD, GAICD
Feb 22

Updated: Feb 26

Artificial Intelligence (AI), or more precisely Machine Learning (ML), has become an industry trend in the past 10 years. From a buzzword to a new way of automation and decision-making, ML has become mainstream. I have had conversations with multiple organisations keen to use ML, even without a real-world use case. Graduate schools are offering lightweight courses to the masses, consulting companies are providing services on adopting ML, and technology companies are building and selling products using ML.

Originally published at LinkedIn Pulse on 8 November 2022.

I’m not going to recap the definition of ML; there are already hundreds of sites that do. Nor am I going to talk about what machines learn or how they learn. Instead, I’d encourage you to think about your own childhood. How did you learn?

Most humans learn through reading, asking questions, observing, and so on. Whether it is human or machine, a common part of the learning process is the need for samples or examples. A kid is shown a ball, a car, or the colour red, and they learn to identify it. A kid with a learning difficulty (unfortunately, 40 years ago, we might not have hesitated to call it a ‘dumb kid’) might need to be told more than once. In that respect, our computers are the dumbest. Despite their speed in execution, they need thousands of samples to learn a single concept. Not only do they require a large amount of data to learn a concept, but they also need substantial time and computing power. Practitioners call this process model training.

Often, these learnings don’t translate very well. For example, when I first came to Australia, I realised that what we call ‘orange’ back home is called ‘mandarin’ here, and that ‘orange’ is something different altogether. I learned that fact and adjusted my vocabulary. But for a model to adapt to this simple piece of information, the entire time- and processing-intensive model training process must be revisited. Machine learning does have techniques for learning from feedback, such as reinforcement learning, but that too is another time-consuming round of model training.

Ergo, models trained on one dataset often don’t perform well on another. IBM Watson Health is a prime example of such a million-dollar failure. Its biggest setback was the revelation that its cancer diagnostics tool was trained not on real patient data but on hypothetical cases provided by a small group of doctors at a single hospital. Synthetic data is often bad for ML: it is very hard to match the distribution of the real population, so a model trained on it develops blind spots and does not necessarily generalise to all cases. Even with real-world data, there is no guarantee that the model will not encounter completely new patterns in production, which is why a mechanism for model retraining is required.
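A retraining mechanism needs a trigger. Here is a minimal sketch of one, assuming a single numeric feature and a deliberately simple rule: flag the model for retraining when the mean of the live data sits too many standard errors away from the mean seen in training. The function name `needs_retraining`, the sample values, and the threshold of 3.0 are all hypothetical and chosen for illustration; this is not how IBM or any particular MLOps product does it.

```python
from statistics import mean, stdev


def needs_retraining(train_values, live_values, threshold=3.0):
    """Flag drift when the live mean sits far from the training mean.

    Distance is measured in standard errors of the live sample mean,
    estimated from the training spread. The threshold of 3.0 is an
    illustrative assumption, not a recommended setting.
    """
    train_mean = mean(train_values)
    train_sd = stdev(train_values)
    if train_sd == 0:
        # Constant training feature: any different live value is new.
        return any(v != train_mean for v in live_values)
    standard_error = train_sd / len(live_values) ** 0.5
    z = abs(mean(live_values) - train_mean) / standard_error
    return z > threshold


# Hypothetical feature values seen during training and in production.
train = [4.9, 5.1, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9]
same_pattern = [5.0, 5.1, 4.9, 5.0]
new_pattern = [7.9, 8.1, 8.0, 8.2]

print(needs_retraining(train, same_pattern))  # False
print(needs_retraining(train, new_pattern))   # True
```

A real check would look at many features and at the labels too, but even this toy version makes the point: someone has to own the monitoring, and that someone has to understand the data.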

IBM’s failure was exacerbated by other factors, such as treating ML as just another software tool and trying to apply a prebuilt solution to any problem they could get their hands on. This approach doesn’t scale well with ML, which requires business acumen for a solution to work. Moreover, marketing hype should never outpace accountability. While IBM poured money into marketing, they had neither the results out of the box nor a proper mechanism to operationalise machine learning (MLOps) to live up to the hype they were creating. Disruptive innovations are always a gamble, and without proper and thorough testing, it is hard to quantify their effects.

AutoML to the rescue of the followers, but…

Other companies, such as Google, that were already developing and selling machine learning models either added a mechanism to operationalise machine learning to their product suite or completely repurposed their product into an MLOps solution. Furthermore, to attract more people to their solution, they added another technically flawed mechanism to the mix: AutoML. It is nothing more than a bunch of algorithms applied to your data; whichever algorithm produces the best metric on the “test data” that you’ve provided is selected as the model. This selection can be tricky, and several questions cannot be answered by the metric alone. Did the test data have good coverage? Could a different model perform better if (a) a thousand more samples were added to the test data, or (b) a different set of data were used as test data? The only upside of AutoML is that it provides a baseline more quickly than a data scientist would. But those baseline models are hardly production-worthy.
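To make question (b) concrete, here is a toy sketch of metric-based selection. Everything in it is hypothetical: the two candidate “models” are simple threshold rules, and the two tiny test sets are hand-made. It is not how any vendor’s AutoML is implemented; it only shows that “pick the best score on the test data” can crown a different winner when the test data changes.

```python
def accuracy(model, rows):
    """Fraction of (feature, label) rows the model labels correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def select_best(candidates, test_rows):
    """Return the name of the candidate with the highest test accuracy."""
    return max(candidates, key=lambda name: accuracy(candidates[name], test_rows))


# Two hypothetical candidate models: simple threshold rules.
candidates = {
    "threshold_4": lambda x: int(x >= 4),
    "threshold_6": lambda x: int(x >= 6),
}

# Two hypothetical test sets drawn from the same problem.
test_a = [(3, 0), (5, 1), (7, 1), (2, 0)]
test_b = [(3, 0), (5, 0), (7, 1), (4, 0)]

print(select_best(candidates, test_a))  # threshold_4
print(select_best(candidates, test_b))  # threshold_6
```

Each rule scores 100% on one test set and loses on the other. The metric alone cannot tell you which test set looks more like your business.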

There are several pitfalls when using AutoML. First, it gives a false sense of security. Initially, it works with the given data and makes organisations comfortable using it. Just like any automation, the more you rely on it, the more catastrophic it is when it fails. That comfort makes it easy to introduce data bugs, and because AutoML can be opaque, these bugs are very hard to spot. In a Neptune.ai blog, Alexandru Burlacu reported that Google Vision AutoML (beta) used training data in the validation set and thus reported 98.8% accuracy on a binary classification problem, whereas their custom-built solution couldn’t achieve more than 69%. Once they fixed the problem with Google Vision AutoML, it yielded only 67%.
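The kind of data bug described above, training rows leaking into the validation set, is cheap to check for when you control the pipeline yourself. A minimal sketch, assuming each example can be represented as a hashable row (here a hypothetical `(image_id, label)` tuple); the function names and sample rows are made up for illustration:

```python
def leaked_rows(train_rows, validation_rows):
    """Return validation rows that also appear in the training data."""
    seen = set(train_rows)
    return [row for row in validation_rows if row in seen]


def leak_fraction(train_rows, validation_rows):
    """Share of the validation set that overlaps with training data."""
    if not validation_rows:
        return 0.0
    return len(leaked_rows(train_rows, validation_rows)) / len(validation_rows)


# Hypothetical (image_id, label) rows; hashable tuples are assumed.
train = [("img_001", 1), ("img_002", 0), ("img_003", 1)]
validation = [("img_002", 0), ("img_004", 1), ("img_003", 1), ("img_005", 0)]

print(leaked_rows(train, validation))    # [('img_002', 0), ('img_003', 1)]
print(leak_fraction(train, validation))  # 0.5
```

Any non-zero fraction means the validation score is flattering the model. With an opaque tool, you may never get the chance to run a check like this.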

Second, because AutoML focuses so heavily on the data used for training and validation, it is prone to overfitting: it spends too much time and computing power on optimisation and ultimately over-optimises for the given data. This is the primary reason that AutoML solutions show excellent results at the initial stage but almost always fail in production over the long term.
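Over-optimising for the given data can be illustrated with a toy experiment. In this sketch the labels are pure noise, so nothing can truly be learned, and the 200 hypothetical “models” are just random guessers. Picking the one with the best validation score still produces a winner that looks good on paper. The names, sizes, and seed are all assumptions for illustration; no real AutoML product searches this way.

```python
import random


def make_random_model(rng, n_inputs):
    """A 'model' that maps each input id to a fixed random label."""
    table = [rng.randint(0, 1) for _ in range(n_inputs)]
    return lambda x: table[x]


def accuracy(model, rows):
    """Fraction of (input id, label) rows the model labels correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_inputs = 40
# Labels are pure noise, so the expected accuracy on new data is 50%.
rows = [(x, rng.randint(0, 1)) for x in range(n_inputs)]
validation, holdout = rows[:20], rows[20:]

# Search over many candidates and keep the best validation score.
models = [make_random_model(rng, n_inputs) for _ in range(200)]
val_scores = [accuracy(m, validation) for m in models]
best_index = max(range(len(models)), key=lambda i: val_scores[i])

best_val = val_scores[best_index]
best_holdout = accuracy(models[best_index], holdout)
print(f"best validation accuracy: {best_val:.2f}")
print(f"same model on untouched holdout: {best_holdout:.2f}")
```

The best validation score is the maximum over 200 tries, so it is optimistic by construction; only the untouched holdout gives an honest reading. The more candidates a search tries against the same validation data, the larger that optimism tends to be.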

Finally, AutoML often combines multiple algorithms, making the resulting model harder to interpret, which matters most when interpretability and explainability are paramount. If the model underperforms and needs debugging, this drawback can render AutoML-generated models useless.

AutoML usually fully covers only very trivial scenarios, and those are the ones vendors demonstrate when selling their products. Their sales techniques remind me of charlatans selling snake oil. They advertise ML success stories as an achievement of their tools. It is like saying the car is responsible for an accident rather than the driver. Search for “Formula One winners” in Google. Do you see Ferrari, McLaren, or any other makers on the list? No, you see the drivers behind the wheel. The same is true for everything, including ML.

What should organisations really do?

While buying prebuilt models or AutoML is outright wrong, buying an MLOps solution or platform for your MLOps workloads is a better choice. But buying the platform is not what brings success. The business still has to convert its investment into success; the value must be extracted from those solutions. And that value comes not only from internally owned data but also from how well that data is processed and interpreted into business outcomes.

Before embarking on an ML journey, do a self-assessment. Here is a quick questionnaire:

Do you have a question that is best answered by ML?
Do you have all the right data to answer that question?
Do you have a data foundation in place to process and store data before it is consumed by your data scientists?

If you answer ‘yes’ to the first two questions, then go for a solution. But you are not ready for ML yet unless you have answered ‘yes’ to the third question.

Remember, our businesses are neither data-driven nor process-driven; they are value-driven.