I was recently interested in learning more about decision tree-based learning methods, so I discussed some of their key elements with practitioner Samson Donick.
Decision Tree-based machine learning uses a tree-like structure to perform classification and regression tasks, which aim to predict the class or value of an out-of-sample datapoint according to its features. Entrepreneur and algorithmic trader Samson Donick is an expert in decision tree-based machine learning. Here are some of the basic elements of tree-based learning methods that he mentioned during our conversation.
What Is Machine Learning?
Samson Donick first explained machine learning to me, so that I could better understand why tree-based methods are helpful.
Put simply, machine learning involves developing and using computer systems that can adapt without needing explicit instructions. Instead, they use statistical models and algorithms to analyze patterns in data.
What Is Tree-Based Machine Learning?
Decision Trees are the basis for some of the most popular algorithms in Machine Learning, and they are commonly used to model spatial and tabular datasets. Four levels of tree-based learning escalate from the simplest form, a single decision tree, to the most complicated, Extreme Gradient Boosting.
Whichever form of tree-based learning is used, the underlying tree-like model is similar. The construction of these trees is normally described as “greedy” because each split is chosen without looking ahead. However, Samson Donick and other researchers have proposed alternative construction methods, like the one in his paper “Uncovering feature interdependencies in high-noise environments with stepwise look ahead decision forests”. In every case, a decision tree starts with all the training data in a single top node, which then divides into two branches at each subsequent level. The branches end at nodes that do not split any further; these terminal nodes are known as the leaves.
At every depth, a tree-based model tests a condition on a given feature’s value. The outcome determines which branch to follow, until a node that does not split any further is reached. The tree’s prediction is then taken from that final leaf node.
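The routing described above can be sketched in a few lines of Python. The dict-based tree structure and the feature names below are hypothetical illustrations, not any particular library's representation:

```python
# Minimal sketch: routing a datapoint down a fitted decision tree.
# The tree layout and feature names here are illustrative only.

def predict(node, x):
    """Follow split conditions until a leaf (a node with no children) is reached."""
    while "leaf" not in node:
        # At each depth, test the condition on one feature's value...
        if x[node["feature"]] <= node["threshold"]:
            node = node["left"]   # ...and follow the matching branch
        else:
            node = node["right"]
    return node["leaf"]           # the leaf holds the prediction

# A toy tree: "is petal_length <= 2.5? if so, class 0; else test petal_width."
tree = {
    "feature": "petal_length", "threshold": 2.5,
    "left": {"leaf": 0},
    "right": {
        "feature": "petal_width", "threshold": 1.7,
        "left": {"leaf": 1},
        "right": {"leaf": 2},
    },
}

print(predict(tree, {"petal_length": 1.4, "petal_width": 0.2}))  # 0
print(predict(tree, {"petal_length": 5.1, "petal_width": 2.3}))  # 2
```

Note that prediction is just a walk from the root to one leaf, which is why a fitted tree is cheap to evaluate even when training it was expensive.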
Decision Tree Classifier and Regressor
Decision Trees can be used for both classification and regression tasks. A single decision tree is the simplest form of tree-based machine learning: one tree is fit to in-sample (training) data and then used to make predictions on out-of-sample data.
Samson Donick then explained to me that the next category of tree-based learning is Ensemble Methods. Ensemble Methods such as Random Forests use multiple trees to learn from a single training dataset. For instance, fifty tree models can each be fit to a different subsample of the training data, and the ensemble’s output is decided by what most of the models conclude.
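That subsample-and-vote idea (often called bagging) can be sketched with a deliberately simple base learner. The threshold “stump” and the toy dataset below are hypothetical stand-ins for real decision trees:

```python
import random
from collections import Counter

random.seed(0)

# Toy training set: the label is 1 when x >= 5 (illustrative data only).
train = [(x, int(x >= 5)) for x in range(10)]

def fit_stump(sample):
    """Toy base learner: a threshold rule at the mean x of its sample."""
    threshold = sum(x for x, _ in sample) / len(sample)
    return lambda x: int(x > threshold)

def bagging_fit(data, n_models=50):
    """Fit each model on a bootstrap resample (sampling with replacement)."""
    return [fit_stump(random.choices(data, k=len(data))) for _ in range(n_models)]

def bagging_predict(models, x):
    """Majority vote across all models, as in a Random Forest."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

models = bagging_fit(train)
print(bagging_predict(models, 9))  # most stumps vote 1
print(bagging_predict(models, 0))  # most stumps vote 0
```

A real Random Forest adds one more source of variety: each split also considers only a random subset of the features, which further decorrelates the trees.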
Model accuracy can also be improved by a method known as Boosting, which builds models sequentially rather than in parallel. The first model is fit to the training data; the second model learns from the same training data together with the mistakes of the first model; the third learns from the training data and the mistakes made by the first two, and so on.
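In the regression setting, “learning from the mistakes” usually means fitting each new model to the residuals left by the models before it, with the final prediction being the sum of all models’ outputs. A minimal sketch on an invented dataset, using small regression stumps as base learners:

```python
# Sketch of boosting for regression: each round fits a stump to the
# residual errors left by earlier rounds; predictions are summed.

def fit_stump(xs, residuals):
    """Regression stump: pick the split x <= t that minimizes squared error."""
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, n_rounds=10):
    models = []
    residuals = list(ys)
    for _ in range(n_rounds):
        m = fit_stump(xs, residuals)
        models.append(m)
        # the next round will learn from what is still unexplained
        residuals = [r - m(x) for x, r in zip(xs, residuals)]
    return lambda x: sum(m(x) for m in models)

xs = [1, 2, 3, 4]
ys = [1.0, 1.0, 3.0, 5.0]
model = boost(xs, ys)
print([round(model(x), 2) for x in xs])  # predictions approach ys = [1.0, 1.0, 3.0, 5.0]
```

Each round shrinks the remaining residuals, which is exactly the “learn from previous mistakes” behavior described above.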
Extreme Gradient Boosting
Extreme Gradient Boosting is one of the most complicated methods of tree-based learning and also one of the most accurate. Like traditional boosting, it is constructed from many Decision Tree models, but it adds refinements, such as regularization and shrinkage, aimed at reducing the overfitting that traditional boosting can suffer from.
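One of those anti-overfitting devices, shrinkage, is easy to show in isolation: each round’s contribution is scaled by a learning rate so that no single model dominates the ensemble. The sketch below uses a trivial “memorizer” base learner purely to make the damping visible; real gradient boosting would use regularized trees here:

```python
# Sketch of shrinkage (the learning rate), one of the regularization
# ideas gradient boosting layers on top of plain boosting: only a
# fraction of each round's residual fit is added to the ensemble.

def boost_with_shrinkage(xs, ys, learning_rate=0.3, n_rounds=20):
    predictions = {x: 0.0 for x in xs}
    for _ in range(n_rounds):
        for x, y in zip(xs, ys):
            residual = y - predictions[x]
            # damped update: add only a fraction of the correction
            predictions[x] += learning_rate * residual
    return predictions

xs, ys = [1, 2, 3], [2.0, 4.0, 8.0]
preds = boost_with_shrinkage(xs, ys)
print({x: round(p, 3) for x, p in preds.items()})  # converges toward ys
```

With a smaller learning rate, more rounds are needed to fit the training data, but the fitting proceeds gradually, which in practice makes the ensemble less prone to chasing noise.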
A Summary of Tree-Based Machine Learning with Samson Donick
I enjoyed speaking with Samson Donick, as he is an esteemed, published researcher on tree-based learning and more complicated topics, such as feature interdependencies in high-noise environments. Donick was very helpful in answering my questions and helping me learn more about these algorithms. For those interested in finding out more about decision-tree machine learning, there are many great online resources available.