When study Machine Learning, there are some confusing concept which is not easily understand. Want to share those terms in this blog. 1st thing I want to share is Entropy mainly learned from Bishop and Shannon.
Entropy is not recently made term. Its origin is from physics, which is an extensive propertyof a thermodynamic system. And usually it means disorder from statistical mechnics. In Machine Learning, according to “Noiseless coding theorem, Shannon”, it means the minimum number of bit for transfer the probability state.
Let’s say the amount of information for event x as h(x). Then if x and y are independent, the amount of information from event x and y can be written as h(x)+h(y). Here, we can have the formula like this. \(h(x) = -log_2(x)\)
Set minus not to make h(x) less than 0. Let’s have the expectation of h(x) , then we can have with below formula \(H[x]=-\sum_xp(x)log_2p(x)\)
Here, H[x] is entropy of random variable x.