Gini Index for Decision Trees (How to Calculate the Gini Index of Features in a Decision Tree)

Before turning to the Gini Index, let us first understand what splitting is and which measures are used to perform it.


What are Splitting Measures?

When more than one attribute takes part in the decision-making process, we need to decide how relevant and important each attribute is. The most relevant attribute is placed at the root node, and the tree is then built downward by splitting the nodes further.

As we move further down the tree, the level of impurity or uncertainty decreases, which leads to a better classification at each node. Splitting measures such as Information Gain and the Gini Index are used to choose the best split at every node.


What is Information Gain?

Information Gain is used to determine which feature/attribute gives us the maximum information about a class.

  • Information Gain is based on the concept of entropy, which is the degree of uncertainty, impurity or disorder.
  • Information Gain measures how much a split reduces entropy, so the tree is grown to reduce entropy from the root node down to the leaf nodes.

Formula for Entropy

$$E(S) = \sum_{i=1}^{c} -p_i \log_2 p_i$$

where p_i is the proportion of elements in S that belong to class i, c is the number of classes, and E(S) denotes the entropy of S.
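The entropy formula can be checked numerically with a minimal Python sketch. The function name `entropy` is our own, and the class probabilities are passed in directly as a list.

```python
import math


# Sketch: `entropy` is an illustrative helper name, not from any library.
def entropy(probabilities):
    """Return E(S) = sum over classes of -p_i * log2(p_i).

    Classes with p_i = 0 are skipped, following the convention 0 * log2(0) = 0.
    """
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


# A node with 4 "Up" and 6 "Down" samples
print(round(entropy([4 / 10, 6 / 10]), 3))  # 0.971
print(entropy([0.5, 0.5]))                  # 1.0, the maximum for two classes
```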

Entropy is sometimes avoided because evaluating the ‘log’ function adds computational cost.


What is the Gini Index?

The Gini index, or Gini impurity, measures the probability that a randomly chosen element would be classified incorrectly if it were labeled at random according to the class distribution in the node.

But what is actually meant by ‘impurity’?

If all the elements belong to a single class, the node is called pure. The Gini index ranges from 0 up to (but never reaching) 1,
where,
0 denotes that all elements belong to a single class (the node is pure), and
values closer to 1 denote that the elements are spread across many classes.

For c classes, the maximum possible value is 1 - 1/c, reached when the elements are distributed equally across all classes. For two classes, a Gini Index of 0.5 therefore means that the elements are split equally between them.
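The upper end of this range depends on the number of classes. For c equally likely classes, the Gini formula (1 minus the sum of squared class probabilities) gives 1 - 1/c. That value is 0.5 for two classes and approaches 1 only as c grows. The quick numerical check below uses a helper, `gini_from_probs`, whose name is our own.

```python
# Sketch: `gini_from_probs` is an illustrative helper name, not from any library.
def gini_from_probs(probs):
    """Gini impurity 1 - sum(p_i^2) for a list of class probabilities."""
    return 1 - sum(p * p for p in probs)


# Elements spread equally over c classes give the largest impurity for c classes
for c in (1, 2, 3, 4, 10):
    uniform = [1 / c] * c
    print(c, round(gini_from_probs(uniform), 3), round(1 - 1 / c, 3))

# A skewed two-class node is far less impure than the 50/50 case
print(round(gini_from_probs([0.9, 0.1]), 2))
```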


Formula for Gini Index

$$Gini = 1 - \sum_{i=1}^{c} (p_i)^2$$

where p_i is the probability of an object being classified to class i (the fraction of elements in the node that belong to class i), and c is the number of classes.
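In code, the formula turns a node's list of class labels into an impurity value. The minimal sketch below assumes labels are plain strings such as "Up" and "Down", and uses a helper, `gini`, whose name is our own.

```python
from collections import Counter


# Sketch: `gini` is an illustrative helper name, not from any library.
def gini(labels):
    """Gini impurity of a node, computed from its list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0  # treat an empty node as pure
    counts = Counter(labels)
    # p_i = count_i / n for each class i present in the node
    return 1 - sum((count / n) ** 2 for count in counts.values())


print(gini(["Up", "Up", "Up"]))            # 0.0: a pure node
print(gini(["Up", "Down", "Up", "Down"]))  # 0.5: two classes in equal proportion
```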

While building the decision tree, we choose the attribute/feature whose split gives the lowest (weighted) Gini index as the root node.

Let’s look at a simple example of how the Gini Index works.


Example of Gini Index

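| Past Trend | Open Interest | Trading Volume | Return |
|---|---|---|---|
| Positive | Low | High | Up |
| Negative | High | Low | Down |
| Positive | Low | High | Up |
| Positive | High | High | Up |
| Negative | Low | High | Down |
| Positive | Low | Low | Down |
| Negative | High | High | Down |
| Negative | Low | High | Down |
| Positive | Low | Low | Down |
| Positive | High | High | Up |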

Table: Gini Index example
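To reproduce the calculations that follow, the table can be written down as plain Python data. Storing each row as a dict keyed by column name is just one convenient choice. `COLUMNS`, `ROWS` and `DATA` are our own names.

```python
from collections import Counter

# Sketch: COLUMNS, ROWS and DATA are illustrative names.
COLUMNS = ("Past Trend", "Open Interest", "Trading Volume", "Return")

# One tuple per row of the example table, in the same order
ROWS = [
    ("Positive", "Low", "High", "Up"),
    ("Negative", "High", "Low", "Down"),
    ("Positive", "Low", "High", "Up"),
    ("Positive", "High", "High", "Up"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Negative", "High", "High", "Down"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Positive", "High", "High", "Up"),
]

DATA = [dict(zip(COLUMNS, row)) for row in ROWS]

# Class balance of the target column
print(Counter(row["Return"] for row in DATA))  # Counter({'Down': 6, 'Up': 4})
```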


Calculating the Gini Index

Calculating the Gini Index for Past Trend

P(Past Trend=Positive): 6/10

P(Past Trend=Negative): 4/10

  • If (Past Trend = Positive & Return = Up), probability = 4/6
  • If (Past Trend = Positive & Return = Down), probability = 2/6

Gini index = 1 - ((4/6)^2 + (2/6)^2) = 4/9 ≈ 0.44

  • If (Past Trend = Negative & Return = Up), probability = 0
  • If (Past Trend = Negative & Return = Down), probability = 4/4

Gini index = 1 - ((0)^2 + (4/4)^2) = 0

  • Weighted sum of the Gini Indices can be calculated as follows:

Gini Index for Past Trend = (6/10) × (4/9) + (4/10) × 0 = 4/15 ≈ 0.27
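Rounding intermediate results can shift the last digit, so it helps to redo the arithmetic exactly. The sketch below uses Python's `fractions.Fraction`. The helpers `gini` and `weighted_gini`, which take per-branch class counts, are our own.

```python
from fractions import Fraction


# Sketch: `gini` and `weighted_gini` are illustrative helper names.
def gini(counts):
    """Gini impurity of a node given its class counts, as an exact fraction."""
    total = sum(counts)
    return 1 - sum(Fraction(c, total) ** 2 for c in counts)


def weighted_gini(branches):
    """Weighted Gini of a split: each branch's Gini weighted by its share of samples."""
    n = sum(sum(branch) for branch in branches)
    return sum(Fraction(sum(branch), n) * gini(branch) for branch in branches)


# Past Trend: Positive -> [4 Up, 2 Down], Negative -> [0 Up, 4 Down]
print(gini([4, 2]))                                      # 4/9
print(weighted_gini([[4, 2], [0, 4]]))                   # 4/15
print(round(float(weighted_gini([[4, 2], [0, 4]])), 2))  # 0.27
```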


Calculation of Gini Index for Open Interest

P(Open Interest=High): 4/10

P(Open Interest=Low): 6/10

  • If (Open Interest = High & Return = Up), probability = 2/4
  • If (Open Interest = High & Return = Down), probability = 2/4

Gini index = 1 - ((2/4)^2 + (2/4)^2) = 0.5

  • If (Open Interest = Low & Return = Up), probability = 2/6
  • If (Open Interest = Low & Return = Down), probability = 4/6

Gini index = 1 - ((2/6)^2 + (4/6)^2) = 4/9 ≈ 0.44

  • Weighted sum of the Gini Indices can be calculated as follows:

Gini Index for Open Interest = (4/10) × 0.5 + (6/10) × (4/9) = 7/15 ≈ 0.47
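The branch counts can also be read off the data instead of being counted by hand. The sketch below groups the Return labels by the value of one feature and weights each group's Gini by its size. The helper name `split_gini` is our own. The two lists are the Open Interest and Return columns of the example table.

```python
from collections import Counter, defaultdict


# Sketch: `gini` and `split_gini` are illustrative helper names.
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gini(feature_values, labels):
    """Weighted Gini index after splitting `labels` on each distinct feature value."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    n = len(labels)
    return sum(len(group) / n * gini(group) for group in groups.values())


open_interest = ["Low", "High", "Low", "High", "Low", "Low", "High", "Low", "Low", "High"]
returns = ["Up", "Down", "Up", "Up", "Down", "Down", "Down", "Down", "Down", "Up"]

print(round(split_gini(open_interest, returns), 2))  # 0.47
```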


Calculation of Gini Index for Trading Volume

P(Trading Volume=High): 7/10

P(Trading Volume=Low): 3/10

  • If (Trading Volume = High & Return = Up), probability = 4/7
  • If (Trading Volume = High & Return = Down), probability = 3/7

Gini index = 1 - ((4/7)^2 + (3/7)^2) = 0.49

  • If (Trading Volume = Low & Return = Up), probability = 0
  • If (Trading Volume = Low & Return = Down), probability = 3/3

Gini index = 1 - ((0)^2 + (1)^2) = 0

  • Weighted sum of the Gini Indices can be calculated as follows:

Gini Index for Trading Volume = (7/10) × 0.49 + (3/10) × 0 = 0.34
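The per-branch steps above (branch probability, class counts, branch Gini, weighted sum) can be printed in the same order as the text. In the sketch below, the function name `explain_split` and its output format are our own. The two lists are the Trading Volume and Return columns of the example table.

```python
from collections import Counter


# Sketch: `explain_split` is an illustrative helper name.
def explain_split(feature_name, feature_values, labels):
    """Print each branch's weight and Gini, then return the weighted Gini of the split."""
    n = len(labels)
    total = 0.0
    for value in sorted(set(feature_values)):
        branch = [lab for v, lab in zip(feature_values, labels) if v == value]
        counts = Counter(branch)
        branch_gini = 1 - sum((c / len(branch)) ** 2 for c in counts.values())
        weight = len(branch) / n
        total += weight * branch_gini
        print(f"P({feature_name}={value}) = {len(branch)}/{n}, "
              f"class counts = {dict(counts)}, Gini = {branch_gini:.2f}")
    print(f"Gini Index for {feature_name} = {total:.2f}")
    return total


trading_volume = ["High", "Low", "High", "High", "High", "Low", "High", "High", "Low", "High"]
returns = ["Up", "Down", "Up", "Up", "Down", "Down", "Down", "Down", "Down", "Up"]
explain_split("Trading Volume", trading_volume, returns)
```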


Gini Index of each attribute or feature

| Attributes/Features | Gini Index |
|---|---|
| Past Trend | 0.27 |
| Open Interest | 0.47 |
| Trading Volume | 0.34 |

Table 1: Gini Index attributes or features

From the above table, we observe that ‘Past Trend’ has the lowest Gini Index, so it is chosen as the root node of the decision tree.
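Choosing the root node is then an arg-min over the candidate features. The sketch below recomputes Table 1 from the raw rows and picks the feature with the lowest weighted Gini. The names `ROWS`, `FEATURES` and `split_gini` are illustrative.

```python
from collections import Counter, defaultdict

# Sketch: FEATURES, ROWS, gini and split_gini are illustrative names.
FEATURES = ("Past Trend", "Open Interest", "Trading Volume")
# (Past Trend, Open Interest, Trading Volume, Return) for the ten example rows
ROWS = [
    ("Positive", "Low", "High", "Up"),
    ("Negative", "High", "Low", "Down"),
    ("Positive", "Low", "High", "Up"),
    ("Positive", "High", "High", "Up"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Negative", "High", "High", "Down"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Positive", "High", "High", "Up"),
]


def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gini(rows, feature_index):
    """Weighted Gini of the Return labels (last column) after splitting on one feature."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature_index]].append(row[-1])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())


scores = {name: split_gini(ROWS, i) for i, name in enumerate(FEATURES)}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("Root node:", min(scores, key=scores.get))  # Past Trend
```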

We will repeat the same procedure to determine the sub-nodes or branches of the decision tree.

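The ‘Negative’ branch of Past Trend is already pure: all four of its rows have Return = Down, so its Gini index is 0 and it becomes a leaf node. We therefore only need to calculate the Gini Index for the ‘Positive’ branch of Past Trend, as follows: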

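| Past Trend | Open Interest | Trading Volume | Return |
|---|---|---|---|
| Positive | Low | High | Up |
| Positive | Low | High | Up |
| Positive | High | High | Up |
| Positive | Low | Low | Down |
| Positive | Low | Low | Down |
| Positive | High | High | Up |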
PositiveHighHighUp

Table: Gini Index calculation for the Positive branch of Past Trend
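Selecting the rows for a branch is a simple filter on the root feature. In the sketch below, the ten example rows are stored as (Past Trend, Open Interest, Trading Volume, Return) tuples; the names are our own. The filter keeps the six Positive rows shown above. The Negative subset contains only `Down`, which is why that branch needs no further split.

```python
from collections import Counter

# Sketch: ROWS, positive and negative are illustrative names.
# (Past Trend, Open Interest, Trading Volume, Return)
ROWS = [
    ("Positive", "Low", "High", "Up"),
    ("Negative", "High", "Low", "Down"),
    ("Positive", "Low", "High", "Up"),
    ("Positive", "High", "High", "Up"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Negative", "High", "High", "Down"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Positive", "High", "High", "Up"),
]

positive = [row for row in ROWS if row[0] == "Positive"]
negative = [row for row in ROWS if row[0] == "Negative"]

print(len(positive), Counter(row[3] for row in positive))  # 6 Counter({'Up': 4, 'Down': 2})
print(len(negative), Counter(row[3] for row in negative))  # 4 Counter({'Down': 4})
```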


Calculation of Gini Index of Open Interest for Positive Past Trend

P(Open Interest=High): 2/6

P(Open Interest=Low): 4/6

  • If (Open Interest = High & Return = Up), probability = 2/2
  • If (Open Interest = High & Return = Down), probability = 0

Gini index = 1 - ((2/2)^2 + (0)^2) = 0

  • If (Open Interest = Low & Return = Up), probability = 2/4
  • If (Open Interest = Low & Return = Down), probability = 2/4

Gini index = 1 - ((2/4)^2 + (2/4)^2) = 0.5

  • Weighted sum of the Gini Indices can be calculated as follows:

Gini Index for Open Interest = (2/6) × 0 + (4/6) × 0.5 = 0.33
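The Low branch here holds two Up rows and two Down rows, so both squared terms are (2/4)^2. An exact check with `fractions.Fraction` is sketched below; the helper names are our own.

```python
from fractions import Fraction


# Sketch: `gini` and `weighted_gini` are illustrative helper names.
def gini(counts):
    """Exact Gini impurity from class counts such as [Up, Down]."""
    total = sum(counts)
    return 1 - sum(Fraction(c, total) ** 2 for c in counts)


def weighted_gini(branches):
    """Weighted Gini of a split; `branches` holds per-branch class counts."""
    n = sum(map(sum, branches))
    return sum(Fraction(sum(b), n) * gini(b) for b in branches)


# Open Interest within Past Trend = Positive: High -> [2 Up, 0 Down], Low -> [2 Up, 2 Down]
print(gini([2, 0]), gini([2, 2]))       # 0 1/2
print(weighted_gini([[2, 0], [2, 2]]))  # 1/3
```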


Calculation of Gini Index of Trading Volume for Positive Past Trend

P(Trading Volume=High): 4/6

P(Trading Volume=Low): 2/6

  • If (Trading Volume = High & Return = Up), probability = 4/4
  • If (Trading Volume = High & Return = Down), probability = 0

Gini index = 1 - ((4/4)^2 + (0)^2) = 0

  • If (Trading Volume = Low & Return = Up), probability = 0
  • If (Trading Volume = Low & Return = Down), probability = 2/2

Gini index = 1 - ((0)^2 + (2/2)^2) = 0

  • Weighted sum of the Gini Indices can be calculated as follows:

Gini Index for Trading Volume = (4/6) × 0 + (2/6) × 0 = 0
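Both candidate splits of the Positive branch can be scored together from its six rows. In the sketch below, the helper `split_gini` is our own, and the rows are stored as (Open Interest, Trading Volume, Return) tuples.

```python
from collections import Counter, defaultdict

# Sketch: POSITIVE_ROWS, gini and split_gini are illustrative names.
# The six Past Trend = Positive rows: (Open Interest, Trading Volume, Return)
POSITIVE_ROWS = [
    ("Low", "High", "Up"),
    ("Low", "High", "Up"),
    ("High", "High", "Up"),
    ("Low", "Low", "Down"),
    ("Low", "Low", "Down"),
    ("High", "High", "Up"),
]


def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gini(rows, feature_index):
    """Weighted Gini of the Return labels (last column) after splitting on one feature."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature_index]].append(row[-1])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())


scores = {"Open Interest": split_gini(POSITIVE_ROWS, 0),
          "Trading Volume": split_gini(POSITIVE_ROWS, 1)}
print({name: round(s, 2) for name, s in scores.items()})  # {'Open Interest': 0.33, 'Trading Volume': 0.0}
print("Split on:", min(scores, key=scores.get))          # Trading Volume
```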


Gini Index of each attribute or feature

| Attributes/Features | Gini Index |
|---|---|
| Open Interest | 0.33 |
| Trading Volume | 0 |

Table 2: Gini Index attributes or features

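We will split the node further using the ‘Trading Volume’ feature, as it has the minimum Gini index. Both resulting branches are pure (High trading volume gives only Up, Low gives only Down), so they become leaf nodes and the tree is complete.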
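Repeating this procedure until every node is pure gives a small recursive builder in the spirit of CART. This is a minimal sketch that rests on several simplifying assumptions. It uses multiway splits on categorical values (one branch per value, as in the text). A node stops splitting when it is pure or no features remain, and the majority class labels each leaf. `build_tree` and the nested-dict output format are our own.

```python
from collections import Counter, defaultdict

# Sketch: FEATURES, ROWS and all helper names are illustrative.
FEATURES = ("Past Trend", "Open Interest", "Trading Volume")
# (Past Trend, Open Interest, Trading Volume, Return) for the ten example rows
ROWS = [
    ("Positive", "Low", "High", "Up"),
    ("Negative", "High", "Low", "Down"),
    ("Positive", "Low", "High", "Up"),
    ("Positive", "High", "High", "Up"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Negative", "High", "High", "Down"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Positive", "High", "High", "Up"),
]


def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(rows, index):
    """Group rows by their value in column `index`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[index]].append(row)
    return groups


def split_gini(rows, index):
    return sum(len(g) / len(rows) * gini([r[-1] for r in g])
               for g in partition(rows, index).values())


def build_tree(rows, features):
    labels = [row[-1] for row in rows]
    if gini(labels) == 0 or not features:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    best = min(features, key=lambda i: split_gini(rows, i))
    remaining = [i for i in features if i != best]
    return {FEATURES[best]: {value: build_tree(group, remaining)
                             for value, group in partition(rows, best).items()}}


tree = build_tree(ROWS, list(range(len(FEATURES))))
print(tree)
```

On the example data, this sketch reproduces the tree derived above. Past Trend is at the root, the Negative branch is a Down leaf, and the Positive branch splits on Trading Volume.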

Learn how to make a decision tree to predict the markets and find trading opportunities using AI techniques with our Quantra course.


Conclusion

Unlike information gain, the Gini Index does not involve the logarithm used to calculate entropy, so it is slightly cheaper to compute. This is one reason the Gini Index is often preferred over information gain; for example, it is the default splitting criterion (criterion="gini") of scikit-learn's DecisionTreeClassifier.
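As a rough cross-check of ranking (not a benchmark of speed), the sketch below ranks the three root candidates by information gain on the same ten rows. Information gain is computed as the parent entropy minus the size-weighted entropy of the branches. On this small example, both criteria rank ‘Past Trend’ first. The helper names are our own.

```python
import math
from collections import Counter, defaultdict

# Sketch: FEATURES, ROWS, entropy and information_gain are illustrative names.
FEATURES = ("Past Trend", "Open Interest", "Trading Volume")
# (Past Trend, Open Interest, Trading Volume, Return) for the ten example rows
ROWS = [
    ("Positive", "Low", "High", "Up"),
    ("Negative", "High", "Low", "Down"),
    ("Positive", "Low", "High", "Up"),
    ("Positive", "High", "High", "Up"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Negative", "High", "High", "Down"),
    ("Negative", "Low", "High", "Down"),
    ("Positive", "Low", "Low", "Down"),
    ("Positive", "High", "High", "Up"),
]


def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, index):
    """Parent entropy minus the size-weighted entropy of the branches."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[index]].append(row[-1])
    parent = entropy([row[-1] for row in rows])
    children = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - children


gains = {name: information_gain(ROWS, i) for i, name in enumerate(FEATURES)}
for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("Best split by information gain:", max(gains, key=gains.get))  # Past Trend
```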
