# An Introduction to Shannon Entropy

When you think of entropy, you may think of closed systems, especially if you come from a physics background. This article, however, is not about the thermodynamic definition of entropy but about its information-theoretic one. (The two are related, and some people do group them together.)

The simplified formal definition of entropy is a measurement of disorder in a system. Some say you could equally call it a measurement of "orderliness." I would argue that it is a measurement of complexity. Regardless, what all of these definitions have in common is that entropy is a way of "describing" something. Shannon's entropy is a formula that, when given a "something", returns this description. (That is very, very simplified.)

Shannon's entropy works with collections of data, or more precisely with the probabilities of the values in them. Take a look at the following set (duplicates count):

$x=\{\text{john},\text{bill},\text{bill},\text{john},\text{bill}\}$

OK, so we have a set of discrete values. What is the probability of each value?

$p(x=\text{bill})=3/5=0.6$

$p(x=\text{john})=2/5=0.4$
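If you'd rather let a computer do the counting, here is a minimal Python sketch of the same step. The helper name `probabilities` is my own invention for illustration; it simply divides each value's count by the size of the set.

```python
from collections import Counter

def probabilities(values):
    """Map each distinct value to its relative frequency in `values`."""
    counts = Counter(values)   # how many times each value appears
    total = len(values)        # size of the whole set, duplicates included
    return {value: count / total for value, count in counts.items()}

x = ["john", "bill", "bill", "john", "bill"]
probs = probabilities(x)
print(probs)  # {'john': 0.4, 'bill': 0.6}
```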

Fairly easy so far. At this point let's take a look at the full formula of Shannon's entropy:

### Shannon's Entropy Formula

$\displaystyle H(X) = - \sum_{i=1}^np(x_i)\log_2 p(x_i)$

Let's read through this. The entropy is the negative of a sum: for each of the $n$ distinct values, multiply that value's probability by the log-base-2 of that probability, then add the products up. So, for our set:

$\displaystyle H(X) = -\left( 0.6\log_2(0.6) + 0.4\log_2(0.4) \right) \approx 0.971$
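The same calculation can be sketched in a few lines of Python. This is just one way to write it, assuming the data arrives as a plain list; the function name `shannon_entropy` is mine, not something from a library.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    total = len(values)
    counts = Counter(values)
    # Negative sum of p * log2(p) over each distinct value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

x = ["john", "bill", "bill", "john", "bill"]
h = shannon_entropy(x)
print(round(h, 4))  # 0.971
```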

Hmm. That almost equals one. What if there were 3 bills and 3 johns?

$\displaystyle H(X) = -\left( 0.5\log_2(0.5) + 0.5\log_2(0.5) \right) = 1$

Hmm. That equals exactly one. What if there were 4 bills and 1 john?

$\displaystyle H(X) = -\left( 0.8\log_2(0.8) + 0.2\log_2(0.2) \right) \approx 0.722$

Hmm. So this time it's obvious that the probability distribution has a real effect on the result: the more lopsided the distribution, the lower the entropy. What if I had 2 bills, 1 john, and 1 micah?

$\displaystyle H(X) = -\left( 0.5\log_2(0.5) + 0.25\log_2(0.25) + 0.25\log_2(0.25) \right) = 1.5$
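To check all four cases side by side, here is a small sketch that works directly from the probabilities rather than from the raw names. The function name `entropy_from_probabilities` and the labels are my own; the numbers are the ones worked out above.

```python
import math

def entropy_from_probabilities(probs):
    """Shannon entropy, in bits, of a list of probabilities that sum to 1."""
    # Zero probabilities are skipped, since p * log2(p) tends to 0 as p tends to 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

examples = {
    "3 bills, 2 johns": [0.6, 0.4],
    "3 bills, 3 johns": [0.5, 0.5],
    "4 bills, 1 john": [0.8, 0.2],
    "2 bills, 1 john, 1 micah": [0.5, 0.25, 0.25],
}

results = {label: entropy_from_probabilities(p) for label, p in examples.items()}
for label, h in results.items():
    print(f"{label}: {h:.4f}")
```

The even split comes out highest among the two-name cases, and adding a third name pushes the entropy above one.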

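### Shannon's Entropy Approximates the Bits Needed per Element

An interesting takeaway: entropy is measured in bits, and it is a lower bound on the average number of bits per element that any lossless encoding of the set needs. For the examples above, if we take the ceiling of the entropy (like Math.ceil()) we get the number of bits a simple fixed-width code needs for each element: 1 bit to tell bill from john, and 2 bits once micah joins in, which is fascinating. The shortcut is not a general rule, though. A heavily lopsided set with many distinct values can have an entropy below 1 and still need more than 1 bit per element in a fixed-width code. Try it yourself!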
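If you do want to try it, here is a rough sketch that compares the ceiling of the entropy with the width of a plain fixed-width code, which I am assuming to be `ceil(log2(number of distinct values))`. The helper names and the lopsided third dataset (including the name "sam") are hypothetical, added only to show a case where the two numbers disagree.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())

def fixed_width_bits(values):
    """Bits per element for a plain fixed-width code over the distinct values."""
    distinct = len(set(values))
    return math.ceil(math.log2(distinct)) if distinct > 1 else 0

datasets = {
    "john/bill": ["john", "bill", "bill", "john", "bill"],
    "with micah": ["bill", "bill", "john", "micah"],
    # Hypothetical lopsided set: 97 bills plus three one-off names
    "lopsided": ["bill"] * 97 + ["john", "micah", "sam"],
}

for name, values in datasets.items():
    h = shannon_entropy(values)
    print(f"{name}: entropy={h:.3f}, ceil={math.ceil(h)}, "
          f"fixed-width bits={fixed_width_bits(values)}")
```

For the first two sets the ceiling and the fixed-width size agree (1 and 2 bits). For the lopsided set the ceiling is 1 while four distinct names still need 2 bits in a fixed-width code, so treat the ceiling as a handy check for small, fairly balanced sets rather than a guarantee.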