The Master Algorithm by Pedro Domingos – Book Summary & Review Deploy Yourself School of Leadership

Walking Into The Future With Algorithms

The human brain is perhaps the most fascinating and wondrous machine there is. Its capacity to learn from its surroundings is unparalleled. That said, the machines conceived by the human brain are constantly evolving and becoming more sophisticated. These machines run on algorithms, which influence almost every aspect of human life and could one day surpass the human brain’s ability to comprehend and compute.

The Master Algorithm (2016) by Pedro Domingos looks at the algorithms currently in use, how they help, the problems they face, and the solutions to those problems. It also explores the implications these ever-evolving algorithms could have for the future.

Machine Learning And Algorithms

To begin with, algorithms are defined as “sequences of precise instructions that produce the same result every time.” They are present everywhere – in the scheduling of flights, the delivery of online purchases, and so on.

Simple, standard algorithms are designed to take inputs of information, perform a task and then produce an output. For example, for an algorithm designed to give directions, the input is two points, and the output is the shortest route the algorithm computes between them.
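The directions example can be sketched in a few lines of Python. This is only an illustration: it uses Dijkstra's algorithm (one common way to compute shortest routes, not something the book prescribes), and the road map `ROADS` and its place names are invented.

```python
import heapq

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_distance, path), or (inf, []) if unreachable."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        distance, node, path = heapq.heappop(queue)
        if node == goal:
            return distance, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, length in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (distance + length, neighbour, path + [neighbour]))
    return float("inf"), []

# Hypothetical road map: distances in km between made-up places.
ROADS = {
    "home": {"market": 2, "station": 5},
    "market": {"home": 2, "station": 1, "office": 7},
    "station": {"home": 5, "market": 1, "office": 3},
    "office": {"market": 7, "station": 3},
}

# Input: two points. Output: the shortest route between them.
print(shortest_route(ROADS, "home", "office"))
```

Given the same two points, the function returns the same route every time, which is exactly what the definition of an algorithm above requires.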


However, machine learning algorithms are a little more abstract. They output other algorithms: when given a number of input-output pairs, they find an algorithm that turns the inputs into the outputs. For example, deciphering handwriting is something that can’t be precisely described as a sequence of instructions. However, if a machine learning algorithm is given a number of handwritten texts as inputs, and the meaning of each text as outputs, the result will be a new algorithm that can decipher handwriting.

This is how the post office is able to read pin codes written in different handwriting.

Such machine learning, or ML, algorithms can be used for a number of tasks. What differs is the data collected and the problem that it is used to solve. For example, filtering spam, finding the best chess move, and running a medical diagnosis would each need a different conventional algorithm, but all three could be handled by a single ML algorithm given the right type of data.
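A minimal sketch of "an algorithm that outputs algorithms" is shown here. The learner uses 1-nearest-neighbour purely because it is short; the book does not single out this technique, and the function name `learn` and both data sets are invented for illustration.

```python
def learn(examples):
    """Take (input, output) pairs and return a NEW function mapping inputs to outputs.

    Assumption: 1-nearest-neighbour is used here only for brevity.
    """
    def predict(x):
        def squared_distance(example):
            features, _ = example
            return sum((a - b) ** 2 for a, b in zip(features, x))
        # Answer with the output of the most similar training input.
        _, label = min(examples, key=squared_distance)
        return label
    return predict

# Made-up data: (number of links, number of exclamation marks) -> label
spam_examples = [((8, 5), "spam"), ((7, 9), "spam"), ((0, 1), "not spam"), ((1, 0), "not spam")]
# Made-up data: (temperature in Celsius, resting heart rate) -> label
health_examples = [((39.5, 110), "fever"), ((36.8, 70), "healthy"), ((40.1, 120), "fever")]

# The same learner produces two different algorithms from two different data sets.
spam_filter = learn(spam_examples)
diagnose = learn(health_examples)

print(spam_filter((6, 7)))
print(diagnose((37.0, 72)))
```

The learner itself never changes; only the data does, which is the point the paragraph above makes.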

Hallucinating Patterns And Algorithm Validity

Surprisingly, hallucination is a problem one faces in the world of algorithms. A 1998 bestseller, The Bible Code, claimed that there were hidden predictions in the Bible that could be deciphered by selectively skipping letters and lines. This claim was discredited when critics showed that such patterns can be found in Moby Dick too.

In the ML context, hallucinating patterns like this is called overfitting. Overfitting happens when an algorithm becomes powerful enough to learn anything. Given a data set like the Bible as input, and the power of a computer to create complex models, one can always find patterns. However, the resulting model won’t work on any other data. Hence the power of an algorithm should be bounded and kept under control so that its scope isn’t too big. This way results can be kept verifiable and consistent.

However, what does one do if the algorithm discovers several patterns that all explain the input data, but that disagree on new data? In such cases, which result is accurate, and how does one determine that the results aren’t a fluke?

In such cases, one can use holdout data. A holdout set is data that is set aside to test the algorithm. One creates two sets from the original data set: a training set, which the algorithm learns from, and a holdout set, which is used only for testing. This makes it possible to double-check whether the patterns found in the data are valid.

Thus one of the main roles of ML experts is to restrict the power of an algorithm by ensuring that the rules are not too flexible and that the learned model performs well on both the training and the holdout data sets.
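The sketch below illustrates the holdout idea under some loud assumptions: the data set is invented (the hidden rule is "x is at least 50"), the two learners are toy stand-ins, and the split simply takes every fifth example so the result is reproducible. In practice the data would normally be shuffled before splitting.

```python
def split(data, every=5):
    """Put every `every`-th example in the holdout set and the rest in the training set."""
    holdout = data[::every]
    training = [example for i, example in enumerate(data) if i % every != 0]
    return training, holdout

def learn_memoriser(training):
    """Overly flexible learner: memorise every training example, guess False otherwise."""
    table = dict(training)
    return lambda x: table.get(x, False)

def learn_threshold(training):
    """Restricted learner: a single cut-off halfway between the two classes."""
    highest_negative = max(x for x, label in training if not label)
    lowest_positive = min(x for x, label in training if label)
    cutoff = (highest_negative + lowest_positive) / 2
    return lambda x: x >= cutoff

def accuracy(model, examples):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == label for x, label in examples) / len(examples)

# Made-up data set: the hidden rule is "x is at least 50".
data = [(x, x >= 50) for x in range(100)]
training, holdout = split(data)

for name, learner in [("memoriser", learn_memoriser), ("threshold", learn_threshold)]:
    model = learner(training)
    print(name, accuracy(model, training), accuracy(model, holdout))
```

Both learners look perfect on the training set. Only the holdout set reveals that the memoriser has learned nothing general, while the restricted threshold rule carries over to unseen data.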

Logical Thinking Using Deductive Reasoning And Decision Trees

Pedro Domingos writes that machine learning experts belong to specialized branches, each with its own perspective and its own preferred style of algorithm. For example, Symbolists create Artificial Intelligence by manipulating symbols and learning rules. Belonging to the oldest branch of AI, Symbolists are rationalists who regard the senses as unreliable and trust logic as the way to learn intelligence.

Symbolists, hence, prefer inverse deduction, which works out the general rule that links separate statements. Thus, given the two statements “Napoleon is human” and “Therefore Napoleon is mortal”, the inverse deduction algorithm arrives at a broader statement such as “humans are mortal.”

Such an algorithm is good for sorting and data mining. However, it is inefficient and costly for truly large databases. The problem arises because all possible relationships between all variables in the data are considered, so the complexity grows exponentially.
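A toy version of inverse deduction might look like the following. It is a sketch under stated assumptions, not the procedure from the book: facts are stored as (entity, property) pairs, the function name `induce_rules` and the `min_support` parameter are hypothetical, and the facts about Socrates and Bucephalus are added only so there is something to generalise from.

```python
from itertools import permutations

def induce_rules(facts, min_support=2):
    """Propose general rules 'every P is Q' from which the observed facts could be deduced."""
    properties = {prop for _, prop in facts}
    rules = []
    # Every ordered pair of properties is tried, which is why this scales badly.
    for premise, conclusion in permutations(sorted(properties), 2):
        having_premise = {entity for entity, prop in facts if prop == premise}
        if len(having_premise) >= min_support and all(
            (entity, conclusion) in facts for entity in having_premise
        ):
            rules.append((premise, conclusion))
    return rules

facts = {
    ("Napoleon", "human"), ("Napoleon", "mortal"),
    ("Socrates", "human"), ("Socrates", "mortal"),
    ("Bucephalus", "horse"), ("Bucephalus", "mortal"),
}

# Each rule reads "every <first> is <second>".
print(induce_rules(facts))
```

Because the loop examines every pair of properties, the work grows quickly as properties are added, and considering longer chains of properties would grow faster still.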

Hence, decision trees can be used to reduce complexity. Decision trees branch off data into smaller sets by using questions or rules to narrow down the sets further.

For example, given a set of medical records labelled ‘healthy’, ‘leukaemia’, ‘cancer’, etc., the ML algorithm would find the rules that produce this division.

Limiting the number of questions a decision tree may ask prevents overfitting and ensures that only the most applicable, general rules are applied.

Decision trees are used in medical software that narrows down a diagnosis on the basis of the symptoms entered.
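The effect of capping the number of questions can be sketched with a tiny hand-rolled decision tree. Everything here is illustrative: the records are invented, the labels are simplified stand-ins, and the parameter `max_questions` is a hypothetical name for the restriction described above.

```python
from collections import Counter

def majority(records):
    """Most common label among the records."""
    return Counter(label for _, label in records).most_common(1)[0][0]

def build_tree(records, questions, max_questions):
    """Return a nested (question, yes_branch, no_branch) tuple, or a label at a leaf."""
    labels = {label for _, label in records}
    if len(labels) == 1 or max_questions == 0 or not questions:
        return majority(records)

    def mistakes(question):
        # Errors made if we ask this question, then guess the majority label on each side.
        total = 0
        for answer in (True, False):
            side = [r for r in records if r[0][question] == answer]
            if side:
                total += sum(1 for _, label in side if label != majority(side))
        return total

    best = min(questions, key=mistakes)
    yes = [r for r in records if r[0][best]]
    no = [r for r in records if not r[0][best]]
    if not yes or not no:
        return majority(records)
    rest = [q for q in questions if q != best]
    return (best,
            build_tree(yes, rest, max_questions - 1),
            build_tree(no, rest, max_questions - 1))

def classify(tree, symptoms):
    """Follow the questions down the tree until a label is reached."""
    while isinstance(tree, tuple):
        question, yes_branch, no_branch = tree
        tree = yes_branch if symptoms[question] else no_branch
    return tree

# Invented records: symptoms -> diagnosis.
records = [
    ({"fever": True, "cough": True}, "flu"),
    ({"fever": True, "cough": False}, "flu"),
    ({"fever": False, "cough": True}, "cold"),
    ({"fever": False, "cough": False}, "healthy"),
]

deep = build_tree(records, ["fever", "cough"], max_questions=2)
shallow = build_tree(records, ["fever", "cough"], max_questions=1)
print(deep)
print(shallow)
```

With one question allowed, the tree keeps only the most general rule (fever or no fever). With two, it also separates the remaining cases. The cap trades detail for generality.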

Preventing Overfitting

Another popular branch of ML is Bayesianism. Bayesians are empiricists who hold that true intelligence comes from observation and experimentation, and that logical reasoning is flawed. They use Bayesian inference, which keeps a number of models and hypotheses open simultaneously. How much one believes any one of the hypotheses or models depends on the evidence found in the data.

The Bayesian approach helps in medical diagnosis. While staying open to many hypothetical diseases and symptoms, the algorithm sifts through the patient’s record to find the best match. The more data is provided, the more diseases are ruled out, leaving one match as the statistical winner.

This algorithm prevents overfitting by limiting its assumptions about causes and effects. To model a person with flu who also has a cough or a fever, the algorithm classifies the flu as the cause and the cough and fever as the effects. The restriction is the assumption that the two effects do not influence each other: having a cough does not affect one’s chances of getting a fever. The algorithm focuses only on the cause-and-effect relationships, which prevents it from overfitting.

This can be seen in voice recognition software such as Siri. When a person says, “call the police”, Bayesian inference keeps open the possibility that the person said “call the please”. It then checks its database for how frequently certain words follow one another. It becomes clear that the word ‘police’ follows ‘the’ more often than ‘please’ does.
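The independence restriction described here corresponds to what is usually called a naive Bayes model. The sketch below shows the arithmetic only; every probability in it is invented for illustration and has no medical meaning, and the function name `posterior` is hypothetical.

```python
def posterior(priors, likelihoods, observed):
    """Naive Bayes: P(disease | symptoms) is proportional to
    P(disease) times the product of P(symptom | disease) over the observed symptoms."""
    scores = {}
    for disease, prior in priors.items():
        score = prior
        for symptom, present in observed.items():
            p = likelihoods[disease][symptom]
            # Symptoms are treated as independent given the disease, so we just multiply.
            score *= p if present else (1 - p)
        scores[disease] = score
    total = sum(scores.values())
    return {disease: score / total for disease, score in scores.items()}

# All numbers below are made up for illustration only.
priors = {"flu": 0.1, "cold": 0.3, "healthy": 0.6}
likelihoods = {
    "flu": {"cough": 0.8, "fever": 0.9},
    "cold": {"cough": 0.7, "fever": 0.2},
    "healthy": {"cough": 0.05, "fever": 0.01},
}

after_cough = posterior(priors, likelihoods, {"cough": True})
after_both = posterior(priors, likelihoods, {"cough": True, "fever": True})
print(max(after_cough, key=after_cough.get))
print(max(after_both, key=after_both.get))
```

All hypotheses stay open throughout, and each new piece of evidence shifts belief between them. With these invented numbers, a cough alone favours a cold, while a cough plus a fever favours flu.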
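The word-frequency check can be sketched by counting which words follow which (bigram counts). This is a simplified illustration, not how any particular voice assistant is implemented: the corpus is a tiny invented string, and a real system would combine such counts with acoustic evidence.

```python
from collections import Counter

def bigram_counts(text):
    """Count how often each word is immediately followed by each other word."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))

def pick_word(previous_word, candidates, counts):
    """Choose the candidate that most often follows previous_word in the corpus."""
    return max(candidates, key=lambda word: counts[(previous_word, word)])

# Tiny made-up corpus; a real system would use a far larger one.
corpus = "call the police now . the police arrived . please call the doctor . yes please"
counts = bigram_counts(corpus)

print(pick_word("the", ["police", "please"], counts))
```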

Unsupervised Learning Algorithms

The human brain has the ability to filter out and focus on relevant information it sees and hears. It is for this reason that one can immediately hear their own name in a noisy crowd, even if it is uttered softly.

An unsupervised learning algorithm works in a similar way. While the previous examples of algorithms use labelled data, such as spam or non-spam, unsupervised learning algorithms are designed to work with raw, noisy data.

Clustering algorithms, a type of unsupervised algorithm that works through large amounts of raw data, are often used in voice isolation or image recognition software. They essentially identify meaningful structure by reducing the dimensionality of the data to its primary essentials.
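As a rough illustration of clustering, here is a minimal k-means sketch on one-dimensional data. K-means is one common clustering method, chosen here as an assumption since the text does not name a specific one; the `readings` values are invented, and the starting centres are picked deterministically rather than at random so that the example is reproducible.

```python
def k_means(points, k, iterations=10):
    """Group 1-D points into k >= 2 clusters and return the sorted cluster centres."""
    ordered = sorted(points)
    # Deterministic start: spread the initial centres evenly across the sorted data.
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            # Assign each point to its nearest centre.
            nearest = min(range(k), key=lambda i: abs(point - centres[i]))
            clusters[nearest].append(point)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [
            sum(cluster) / len(cluster) if cluster else centres[i]
            for i, cluster in enumerate(clusters)
        ]
    return sorted(centres)

# Unlabelled, made-up measurements that happen to fall into three groups.
readings = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7, 9.1, 8.9, 9.0]
print(k_means(readings, k=3))
```

No labels are supplied at any point. The nine raw readings are summarised by three centres, which is the sense in which the data is reduced to its essentials.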

For example, sketch artists work with about ten different variations of each facial feature – eyes, nose, ears, etc. This narrows down the options enough to produce a passable drawing of a face from a description. Similarly, after pre-processing, facial recognition algorithms compare only a few hundred variables instead of a million pixels.

Another type of algorithm, the neural network, effectively crunches massive amounts of data, processing multiple inputs at the same time, like the human brain. For instance, one neural network, the biggest ever created at the time, was used to sift through randomly selected YouTube videos and took only three days to go through ten million of them. The program was even able to learn to recognize human faces and cats without being told what to look for.

All the above-mentioned algorithms work in different ways and are useful for different things. However, what would happen if they were all combined to get one master algorithm?

The Unifying Master Algorithm

The question that arises with all these algorithms is, ‘Which algorithm works best?’

The fact of the matter is that there is no ‘one’ algorithm that is perfect, as all algorithms rest on different fundamental assumptions. To put this problem into perspective, if an algorithm comes up with something useful for one set of data, a devil’s advocate could run the same algorithm on another data set where it produces nonsense. Hence, it is vital to make the right fundamental assumptions about the data the algorithm is applied to.

Thankfully, most of the difficult problems in computer science can be solved with one good algorithm, if they are fundamentally related.

Consider problems such as finding the shortest route in a new city, playing Tetris, controlling urban traffic flow, compressing data, laying out components on a microchip, etc. These problems turned out to be fundamentally related: an algorithm that solves one of them can be used to solve all of them. It was a wonder in computer science when one algorithm was able to address all these.

However, when it comes to the most pressing issues that face humanity, one needs a more efficient and capable algorithm, which is unfortunately still unavailable. For example, finding a cure for cancer needs an algorithm that can factor in all previously acquired data and keep pace with new scientific discoveries, all while weighing the relevance of all the data and discerning an overarching structure that no one has seen yet.

There has been some progress in this field, despite the absence of a comprehensive algorithm. Adam, a research robot at the Manchester Institute of Biotechnology, learns about genetics, designs and carries out experiments, analyses the results and suggests hypotheses.

The Key To Success

In the modern business world, ‘data is the new oil’. This means that the business with the best algorithm is the one that will succeed.

In the pre-internet era, problems with reaching a target audience could be solved with better, comprehensive advertising campaigns. However, with the virtually unlimited choices that the internet brings to homes, decision making becomes difficult.

Amazon has, in this respect, been the leader in offering intelligent consumer-centric products and solutions in practically every market. However, it is an ongoing race, and the company with the best data can come up with the best algorithm. Hence, data today is a massive strategic asset. For example, a user’s data trail is worth an average of about $1,200 per year to the online ad industry. Google’s data on a user is valued at about $20, and Facebook’s at about $5.

It is a gigantic business, and it is paving the way for data unions and data banks that will allow companies, as well as private citizens, to fairly negotiate the usage of their data. While a data bank could allow one to set terms and conditions around the usage and security of the data, data unions would operate like worker unions, where a regulating body of individuals could ensure that the data is being used fairly and responsibly – benefitting everyone.

The Digital Model Of The Individual

Imagine a master algorithm. It would have a vast database comprising all human knowledge, personalised with all the data every individual collects through their life – emails, web searches, phone records, GPS directions, health records, likes, photographs, etc.

Now imagine that, from all this data, one could download a learned digital model of oneself onto a flash drive. This digital version could travel in one’s pocket, like a personal butler helping one run one’s life, saving time and reducing hassles.

This butler could file tax returns, send emails, pay credit card bills, plan vacations, etc. in addition to the simple things such as automating web searches or recommending movies. One could even have a conversation with it – exactly like talking to a digital version of oneself.

In an interactive society, it could interact with others on one’s behalf, or apply for jobs on LinkedIn based on what it has learned from its inputs. Imagine a digital personal model interacting with the digital models of companies and applying for personal interviews. The final stage is simply accepting the personal interview that one’s digital self has arranged!

The possibilities are simply endless – as described by the author Pedro Domingos.

Conclusion

Combining machine learning algorithms into one single master algorithm would help advance humanity towards limitless possibilities. Even the advanced algorithms currently available, which are already broad problem solvers, have advanced the world drastically. With a master algorithm, the sky would be the limit!
