“This word for a process that leads to a solution to a problem comes from the Arabic name of a 9th century mathematician.” A few weeks ago, three contestants were presented with this question during the final round of America’s favorite quiz show, Jeopardy! Upon completion of the famous Final Jeopardy! jingle, each contestant revealed their answer with hopes of increasing their earnings and winning the game. The outcome? Every contestant had answered incorrectly, and every contestant lost thousands of dollars.
The correct answer, in proper Jeopardy! lingo of course, was “what is an algorithm?”
I could tell you that an algorithm is simply an answer to a Jeopardy! question, or a concept I had to study in order to pass my last Abstract Algebra exam. But really, “what is an algorithm?”
The Merriam-Webster dictionary defines an algorithm as “a step-by-step procedure for solving a problem or accomplishing some end.” In a general sense, an algorithm could be something as basic as a cooking recipe. A recipe is a series of instructions that, when applied to a list of ingredients, creates a final product that you can eat for dinner. Similarly, an algorithm in data science is a series of calculations that, when applied to a set of data, helps us draw conclusions and ultimately create a mining model.
When you modify the instructions of a recipe, you alter the final product. The same is true for data mining algorithms: different algorithms, applied to the same data, generate different results. Classification algorithms predict the outcome of a future case by analyzing the other variables in the initial set of data. Clustering algorithms group cases with similar characteristics together. Association algorithms create rules describing correlations between variables. The algorithms within each of these subgroups differ as well, each performing different calculations on the data, which can change the conclusions drawn.
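To make the difference between two of these families concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy exam data is invented, and the helper names (`distance`, `classify_nearest_neighbor`, `cluster_two_groups`) are hypothetical. The classifier is a simple nearest-neighbor rule and the clustering routine is a bare-bones two-group k-means, chosen only because they are short, not because they are the methods any particular tool uses.

```python
# Toy data, invented for illustration: (hours_studied, hours_slept, passed_exam)
cases = [
    (1.0, 4.0, False),
    (2.0, 5.0, False),
    (1.5, 6.0, False),
    (6.0, 7.0, True),
    (7.0, 8.0, True),
    (8.0, 6.5, True),
]


def distance(a, b):
    """Euclidean distance between two (x, y) points."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def classify_nearest_neighbor(new_point, labeled_cases):
    """Classification: predict the label of the closest known case."""
    nearest = min(labeled_cases, key=lambda c: distance(new_point, c[:2]))
    return nearest[2]


def cluster_two_groups(points, rounds=10):
    """Clustering: split points into two groups (bare-bones k-means, k=2)."""
    centers = [points[0], points[-1]]  # naive choice of starting centers
    groups = [[], []]
    for _ in range(rounds):
        groups = [[], []]
        for p in points:
            # Assign each point to whichever center is closer.
            idx = 0 if distance(p, centers[0]) <= distance(p, centers[1]) else 1
            groups[idx].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return groups


# Classification uses the known labels to predict a new case.
prediction = classify_nearest_neighbor((6.5, 7.0), cases)

# Clustering ignores the labels and only looks at which cases are similar.
unlabeled = [c[:2] for c in cases]
groups = cluster_two_groups(unlabeled)
```

Same ingredients, two different recipes: the classifier answers "will this student pass?", while the clustering routine answers "which students resemble each other?" without ever seeing who passed.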
We as data scientists use these algorithms to analyze data in a variety of ways, establishing patterns, making predictions, creating clusters, and so on, in an attempt to find the methods that will produce the most accurate and useful results. To return to our recipe example, we provide the ingredients (the data) and then test a number of recipes (algorithms) to find the one that produces the most delicious meal.
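The "test a number of recipes" idea can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is invented, the two recipes (`fit_majority` and `fit_threshold`) are deliberately simplistic stand-ins for real algorithms, and accuracy on a small held-out set is just one of many ways to judge which meal is most delicious.

```python
# Invented data: (hours_studied, passed_exam)
train = [(1, False), (2, False), (3, False), (4, True), (6, True), (7, True), (8, True)]
test = [(1.5, False), (2.5, False), (5, True), (9, True)]


def fit_majority(rows):
    """Recipe 1: always predict the most common label in the training data."""
    trues = sum(1 for _, label in rows if label)
    majority = trues >= len(rows) - trues
    return lambda x: majority


def fit_threshold(rows):
    """Recipe 2: predict True above the midpoint between the two class means."""
    pos = [x for x, label in rows if label]
    neg = [x for x, label in rows if not label]
    cutoff = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: x > cutoff


def accuracy(model, rows):
    """Fraction of rows where the model's prediction matches the true label."""
    return sum(1 for x, label in rows if model(x) == label) / len(rows)


# Same ingredients (train/test data), different recipes (algorithms).
recipes = {"majority": fit_majority, "threshold": fit_threshold}
scores = {name: accuracy(fit(train), test) for name, fit in recipes.items()}
best = max(scores, key=scores.get)
```

On this toy data the threshold recipe scores higher than the majority recipe, so it is the one we would keep. With different ingredients, the ranking could easily flip, which is exactly why we test more than one recipe.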