All you need to know about Decision Trees
Supervised Machine Learning Method
Why Decision Trees?
The decision tree is one of the most popular and powerful tools for classification and prediction. A decision tree is a flowchart-like tree structure in which each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.
Decision trees are a data mining approach that builds models for categorising data. Because these models are learned from labelled examples, they fall under the category of supervised learning; the resulting model takes the shape of a tree structure. In addition to classification models that predict class labels, decision trees are also employed to construct regression models that predict continuous values to assist in decision-making. A decision tree can use both numerical and categorical data, such as gender, age, etc.
Structure of a decision tree
A decision tree’s structure is made up of a root node, branches, internal nodes, and leaf nodes. The internal nodes represent tests on an attribute, the branches represent the outcomes of those tests, and the leaf nodes hold the class labels.
Working of a decision tree
- In the context of supervised learning, a decision tree works with both discrete and continuous variables. The dataset is divided into subsets based on its most significant attribute, which an attribute selection algorithm identifies along with the split point.
- The root node, the most significant predictor, sits at the top of the decision tree’s structure. Splitting continues at the decision nodes, the tree’s sub-nodes, while nodes that do not divide any further are called leaf or terminal nodes.
- A top-down technique partitions the dataset into homogeneous, non-overlapping sections. All observations start together at the root, from where they are divided into branches. Because the process concentrates only on the current node rather than on future nodes, it is known as the “Greedy Approach”.
- The decision tree continues to split until a stopping condition is met.
- The training data usually contain noise and outliers, and a fully grown tree tends to fit them. A technique known as “tree pruning” removes the branches that reflect this noisy data, and the model’s accuracy rises as a result.
- A model’s accuracy is evaluated using test tuples with known class labels: an accurate model classifies a high percentage of test-set tuples correctly (a minimal sketch follows Figure 1).
Figure 1 : Unpruned and Pruned Tree.
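To make the pruning and accuracy-evaluation steps above concrete, here is a minimal sketch using scikit-learn and its built-in Iris dataset (the dataset choice and the `ccp_alpha` value are illustrative assumptions, not part of the method described above). Cost-complexity pruning trims branches that fit noise, which often improves accuracy on the test set:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load a toy dataset and hold out a test set for accuracy evaluation
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fully grown (unpruned) tree: may fit noise in the training data
unpruned = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pruned tree: cost-complexity pruning trims branches that add little value
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=42).fit(
    X_train, y_train)

print("Unpruned test accuracy:", accuracy_score(y_test, unpruned.predict(X_test)))
print("Pruned test accuracy:  ", accuracy_score(y_test, pruned.predict(X_test)))
```

In practice the pruning strength `ccp_alpha` would be tuned, e.g. by cross-validating over the candidate values returned by `cost_complexity_pruning_path`.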
Types of Decision Trees
Decision trees develop classification and regression models based on a tree-like structure by repeatedly separating the data into smaller and smaller subsets. The outcome is a tree with decision nodes and leaf nodes. The two varieties of decision trees are described below:
1. Classification
The classification process involves creating models that assign meaningful class labels to data. Such models are used in pattern recognition and machine learning, where decision trees enable applications such as fraud detection and medical diagnosis. A classification model follows a two-step approach:
- Learning : A classification model is built on the basis of the training data.
- Classification : The model’s accuracy is evaluated before it is used to categorise new data. The class labels are discrete values such as “yes” or “no” (a minimal sketch follows Figure 2).
Figure 2 : Classification Model.
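As a minimal sketch of the two-step process, the toy example below (the feature values and labels are made up for illustration) first learns a model from labelled training data and then classifies a fresh tuple with a discrete “yes”/“no” label:

```python
from sklearn.tree import DecisionTreeClassifier

# Step 1 -- Learning: build the model from labelled training data.
# Each row is a tuple of (age, income); labels are discrete class values.
X_train = [[25, 30000], [40, 60000], [35, 45000], [50, 80000], [23, 20000]]
y_train = ["no", "yes", "no", "yes", "no"]

model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2 -- Classification: assign a class label to previously unseen data.
print(model.predict([[45, 70000]]))  # e.g. ['yes']
```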
2. Regression
Regression models are used for regression analysis, i.e. the prediction of numerical attributes, also called continuous values. The regression model therefore predicts continuous values rather than class labels.
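A matching regression sketch, again with made-up data, shows a tree predicting a continuous value rather than a class label:

```python
from sklearn.tree import DecisionTreeRegressor

# Training tuples: (years of experience,) -> salary (a continuous target)
X_train = [[1], [3], [5], [7], [9]]
y_train = [30000.0, 45000.0, 60000.0, 75000.0, 90000.0]

model = DecisionTreeRegressor().fit(X_train, y_train)

# The regression tree predicts a continuous value for unseen input
print(model.predict([[6]]))  # e.g. [60000.] -- the value of the matching leaf
```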
Algorithms Used
J. Ross Quinlan, a machine learning researcher, created the “ID3” decision tree algorithm in the early 1980s. He later invented other algorithms, including C4.5, which succeeded ID3. Both algorithms use a greedy strategy: C4.5 constructs the tree in a top-down, recursive, divide-and-conquer manner that avoids backtracking. The training dataset, which includes class labels, is separated into smaller and smaller subsets as the tree is built.
- The algorithm takes three parameters: the data partition, the attribute list, and the attribute selection method.
- The attribute list describes the attributes of the training set.
- The attribute selection method specifies how to choose the attribute that best discriminates among the tuples.
- The attribute selection method determines the tree structure.
- The tree starts as a single node representing all the training tuples.
- If the tuples in a node belong to more than one class, the node is split and the tree begins to grow branches.
- The splitting criterion indicates which attribute to select and how to partition the data at that node.
- Branches are grown from a node according to the outcomes of the test on the chosen attribute.
- The partitioning and splitting process is carried out recursively on each resulting subset (a simplified sketch of this recursive construction appears after Figure 3).
The computational complexity of growing the tree is O(n × |D| × log |D|), where n is the number of attributes in training dataset D and |D| is the number of tuples. For example, with n = 10 attributes and |D| = 1,000 tuples, this comes to roughly 10 × 1,000 × 10 ≈ 100,000 operations.
Figure 3 : A discrete value splitting.
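The top-down, recursive, divide-and-conquer construction described above can be sketched in plain Python. This is a simplified illustration rather than C4.5 itself: the attribute selection method is passed in as a function, and the stopping conditions are reduced to “all tuples share one class” or “no attributes remain”:

```python
from collections import Counter

def majority_class(rows):
    """Most common class label among the tuples in this partition."""
    return Counter(label for *_, label in rows).most_common(1)[0][0]

def build_tree(rows, attributes, select_attribute):
    """Top-down, recursive, divide-and-conquer tree construction.

    rows: list of tuples whose last element is the class label.
    attributes: column indices still available for splitting.
    select_attribute: the attribute selection method (e.g. information gain).
    """
    labels = {label for *_, label in rows}
    if len(labels) == 1:            # all tuples in one class -> leaf node
        return labels.pop()
    if not attributes:              # no attributes left -> majority-vote leaf
        return majority_class(rows)

    attr = select_attribute(rows, attributes)   # greedy: best attribute now
    remaining = [a for a in attributes if a != attr]

    # Grow one branch per outcome of the test on the chosen attribute
    tree = {attr: {}}
    for value in {row[attr] for row in rows}:
        subset = [row for row in rows if row[attr] == value]
        tree[attr][value] = build_tree(subset, remaining, select_attribute)
    return tree

# Toy usage: tuples are (outlook, windy, play); naively pick the first
# unused attribute in place of a real attribute selection method.
data = [("sunny", "no", "yes"), ("rain", "yes", "no"), ("sunny", "yes", "yes")]
print(build_tree(data, attributes=[0, 1],
                 select_attribute=lambda rows, attrs: attrs[0]))
```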
List of Algorithms used in Decision Trees
ID3
When constructing the decision tree, the entire collection of data S is regarded as the root node. The data is then separated into subsets by iterating over each attribute; at every step, the algorithm considers only attributes that have not already been used. ID3 can take a long time to split the data and is often not the best option because it tends to overfit the data.
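ID3 chooses the attribute with the highest information gain at each split. A minimal sketch of that criterion, using made-up “yes”/“no” labels:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def information_gain(parent_labels, child_label_lists):
    """Entropy reduction achieved by splitting the parent into children."""
    total = len(parent_labels)
    weighted = sum(len(c) / total * entropy(c) for c in child_label_lists)
    return entropy(parent_labels) - weighted

# Example: splitting 10 tuples into two subsets on some attribute
parent = ["yes"] * 5 + ["no"] * 5
children = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
print(information_gain(parent, children))  # ~0.278 bits
```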
C4.5
C4.5 is a more sophisticated algorithm in which the data are treated as classified samples. In contrast to ID3, it handles both continuous and discrete values well, and it includes a pruning technique that removes undesirable branches.
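C4.5 refines information gain into the gain ratio, which penalises attributes that shatter the data into many small partitions. A minimal sketch, with the same made-up labels as above:

```python
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(parent_labels, child_label_lists):
    """C4.5's criterion: information gain divided by split information."""
    total = len(parent_labels)
    gain = entropy(parent_labels) - sum(
        len(c) / total * entropy(c) for c in child_label_lists)
    # Split information: entropy of the partition sizes themselves
    split_info = -sum((len(c) / total) * log2(len(c) / total)
                      for c in child_label_lists)
    return gain / split_info if split_info else 0.0

parent = ["yes"] * 5 + ["no"] * 5
children = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
print(gain_ratio(parent, children))  # ~0.278 here (split info is 1 bit)
```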
CART
The algorithm handles both classification and regression tasks. Unlike ID3 and C4.5, decision points are produced by taking the Gini index into consideration. The splitting approach uses a greedy algorithm that tries to lower a cost function: in classification tasks, the Gini index serves as the cost function and measures leaf-node purity, while in regression problems the sum of squared errors is used to determine the best prediction.
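The Gini index CART uses for classification is straightforward to sketch; lower values mean purer nodes, with 0 indicating a perfectly pure leaf:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

print(gini(["yes"] * 5 + ["no"] * 5))  # 0.5 (maximally impure, two classes)
print(gini(["yes"] * 10))              # 0.0 (perfectly pure)
```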
CHAID
It stands for Chi-square Automatic Interaction Detector, and as the name implies, the method works with any kind of variable: continuous, nominal, or ordinal. Classification trees use the chi-square test, whereas regression trees employ the F-test.
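CHAID itself is rarely found in mainstream Python libraries, but the chi-square test it relies on is easy to demonstrate with SciPy (the contingency table below is made up). A small p-value suggests the attribute and the class label are associated, which makes the attribute a good splitting candidate:

```python
from scipy.stats import chi2_contingency

# Rows: attribute values (e.g. "smoker" / "non-smoker")
# Columns: class labels (e.g. "disease" / "no disease")
observed = [[30, 10],
            [15, 45]]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p-value = {p_value:.4f}")
# A small p-value (e.g. < 0.05) indicates a strong attribute/class association
```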
MARS
It stands for multivariate adaptive regression splines. The algorithm is applied specifically to regression jobs in which the data is primarily non-linear.
Greedy Recursive Binary Splitting
The binary splitting procedure produces two branches at each step. To split the tuples apart, a cost function is calculated for every candidate split; the split with the lowest cost is chosen, and the procedure is then repeated recursively on the resulting subsets.
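A minimal sketch of this greedy search for a single numeric attribute: every candidate threshold is tried, the split cost (here, weighted Gini impurity) is computed, and the cheapest split wins. A real implementation would repeat this over all attributes and recurse on each branch:

```python
from collections import Counter

def gini(labels):
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_binary_split(values, labels):
    """Greedily pick the threshold with the lowest weighted Gini cost."""
    best = (float("inf"), None)
    for threshold in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        if not left or not right:
            continue
        # Cost function: impurity of each branch, weighted by its size
        cost = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if cost < best[0]:
            best = (cost, threshold)
    return best

values = [2.1, 3.5, 1.8, 4.2, 3.9, 2.7]
labels = ["no", "yes", "no", "yes", "yes", "no"]
print(best_binary_split(values, labels))  # (0.0, 2.7): a perfect split
```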
List of Applications
Information specialists frequently employ decision trees in analytical research, and businesses use them extensively to examine or anticipate challenges. The decision tree’s adaptability makes it applicable in a variety of situations:
1. Healthcare
Decision trees enable the prediction of a patient’s disease status based on factors like age, weight, sex, etc. They can also predict, for example, a medicine’s effect by taking into account its composition, manufacturing history, and so on.
2. Banking sectors
Decision trees assist in determining a person’s loan eligibility by taking into account factors such as family situation, income, and overall financial standing. They can also spot loan defaults, credit card fraud, etc.
3. Educational Sectors
Using decision trees, students can be shortlisted for further consideration based on factors such as their merit score, attendance, etc.
List of Advantages
- The outcomes of a decision tree model are easy to understand and can be presented to senior management and other stakeholders.
- Preprocessing of the data, such as normalisation, scaling, etc., is not necessary when creating a decision tree model.
- A decision tree can handle both numerical and categorical data, which makes it applicable to a wider range of problems than many other methods.
- A decision tree is a flexible algorithm: several implementations (C4.5, for example) can handle missing values in the data with little impact on the decision-making process.
Connect with Me
LinkedIn: http://linkedin.com/in/abhishek-mehta2k