Information Theory
Arun Prakash A
a.arun283@gmail.com
Department of Electronics and Communication
Kongu Engineering College, India
Winter 2018
A Special Hour
By the time I finish this lecture, we will have travelled about 100,000 kilometers through space.
Importance of the Subject
How do we encode information and reproduce it exactly at the receiver without error?
How much can we compress data/information? Are there any limits?
How can we measure the performance of one system against another?
Information
What is information? How do we quantify and measure it?
Let us consider an example: What is the elevation of Mount Everest?
Intuitively:
If you have no idea about it - you are totally uncertain - then the answer gives you a lot of information.
If you have some idea about it - you are only slightly uncertain - then the answer adds some information.
If you already know the answer is 8848 m - the information gained is zero (you learned nothing new).
So information is somehow related to the uncertainty about something.
Whenever we deal with uncertainty, we make use of probability theory!
Claude Shannon
Figure: Claude Shannon
A paper published by Shannon in 1948 laid the foundation for today's digital era.
He quantified the abstract concept of information.
The measure of information is the amount of uncertainty associated with the source that generates it.
Today, the majority of information takes one of the following forms:
1 Text
2 Audio
3 Image
4 Video
The representation of information for processing, storage and transmission is called data.
Information, Entropy
Let us consider a discrete memoryless source that generates information from a fixed set of alphabets.
For example, all the electronic text in the world is generated from a fixed set of 128 alphabets (the ASCII code).
So we can think of the source X as a set containing K unique elements (alphabets):
X = (x_0, x_1, ..., x_{K-1})
If we consider ASCII, K = 128; x_65 corresponds to the letter 'A' and x_127 corresponds to 'DEL'.
Each alphabet has a probability of occurrence P(x_k) associated with it, such that ∑_{k=0}^{K-1} P(x_k) = 1.
∴ The information I of alphabet x_k can be measured as
I(x_k) = -log_2(P(x_k))
in bits (the unit of information, not a binary digit!).
Why is there a negative sign?
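To make the formula concrete, here is a minimal Python sketch (not part of the original slides; the helper name self_information is just an illustrative choice):

```python
import math

def self_information(p):
    """Self-information I = -log2(p), in bits, of a symbol with probability p."""
    if not 0 < p <= 1:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log2(p)

# A uniformly distributed ASCII-style source with K = 128 symbols:
K = 128
print(self_information(1 / K))  # 7.0 bits per symbol
```

Rarer symbols (smaller p) give larger values, matching the Mount Everest intuition above.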
Properties of Information
Information I(x_k) = 0 for P(x_k) = 1 (i.e. the symbol is certain to occur). ∴ There is no information gain.
Information is never negative: I(x_k) ≥ 0 for 0 ≤ P(x_k) ≤ 1.
I(x_k) > I(x_i) for P(x_k) < P(x_i): the less probable symbol carries more information.
If a symbol x_1 carries information I_1 and a symbol x_2 carries information I_2, then the combined information carried by x_1 and x_2 is simply I_1 + I_2:
I(x_k, x_i) = I(x_k) + I(x_i)
(Note: this holds if and only if the symbols occur statistically independently, i.e. the source is memoryless. A quick numerical check follows below.)
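A small numerical check of the additivity property, reusing the self_information helper sketched above (illustrative only):

```python
# For independent symbols the joint probability multiplies,
# so the information adds: I(x_k, x_i) = I(x_k) + I(x_i).
p_k, p_i = 0.5, 0.25
print(self_information(p_k * p_i))                     # 3.0 bits
print(self_information(p_k) + self_information(p_i))   # 1.0 + 2.0 = 3.0 bits
```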
Few Problems
1. A memoryless source has 4 alphabets (symbols/elements) which are equally likely to occur. Calculate the information of a single alphabet.
Solution:
X = (x_0, x_1, x_2, x_3), k = 0, 1, 2, 3.
Since they are equally likely, P(x_k) = 1/4.
∴ I(x_k) = log_2(1/(1/4)) = log_2(4)
I(x_k) = 2 bits
2. In a PCM system, '1's occur with a probability of 0.75 and '0's occur with a probability of 0.25. What is the amount of information generated by such a source?
Source: X = (0, 1), P(0) = 0.25, P(1) = 0.75, so ∑_k P(x_k) = 1.
I(x_0 = 0) = log_2(1/0.25) = 2 bits
I(x_1 = 1) = log_2(1/0.75) ≈ 0.41 bits
I(x_0 & x_1) = I(0) + I(1) ≈ 2.41 bits
→ Solve the same problem when the two symbols are equally likely.
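These figures are easy to verify numerically with the self_information helper sketched earlier (illustrative only):

```python
# Problem 2: P(0) = 0.25, P(1) = 0.75
I0 = self_information(0.25)       # 2.0 bits
I1 = self_information(0.75)       # ≈ 0.415 bits
print(I0, I1, I0 + I1)            # combined ≈ 2.415 bits
# Follow-up exercise (equally likely symbols): each carries exactly 1 bit.
print(self_information(0.5) + self_information(0.5))   # 2.0 bits
```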
3. A source emits the sequence "Do Geese See God?", which contains all of its alphabets. Calculate the information content of the source.
Source:
X = (d, D, e, s, S, G, o, ?), k = 0, 1, ..., 7; P(x_0 = d) = ?, ..., P(x_7 = ?) = ? (the probabilities can be estimated from the sequence, as sketched below).
Can you frame a new sentence or a few words from the source alphabets?
1 Seed
2 DoG
3 Dose
4 Do DoGs See God?
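A minimal Python sketch for estimating the symbol probabilities empirically from the given sequence and computing each symbol's information (my own illustrative approach; whether spaces count as alphabets is an assumption to settle first):

```python
from collections import Counter
import math

sequence = "Do Geese See God?"
counts = Counter(sequence.replace(" ", ""))   # drop spaces; keep them if they count as alphabets
total = sum(counts.values())

for symbol, count in sorted(counts.items()):
    p = count / total
    print(f"{symbol!r}: P = {p:.3f}, I = {-math.log2(p):.2f} bits")
```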
4. If there are M equally likely and independent messages, then prove that the amount of information carried by each message is I = N, where M = 2^N and N is an integer.
Source alphabets: ? (here, the M messages themselves)
Since the messages are equally likely, P_k = 1/M.
∴ I = log_2(1/(1/M)) = log_2(M) = log_2(2^N) = N bits
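A quick numerical check of this result (illustrative sketch):

```python
import math

# For M = 2**N equally likely messages, each carries I = log2(M) = N bits.
for N in range(1, 9):
    M = 2 ** N
    I = math.log2(M)
    print(f"M = {M:3d}  ->  I = {I:.0f} bits  (= N = {N})")
```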
Entropy
Entropy is the measure of the average information produced by the source.
The source emits symbols m_0, m_1, ... in successive signaling intervals with probabilities P(m_0), P(m_1), .... The information of each symbol is I(m_0), I(m_1), ..., so the average information of the source is
H = E[I(m_k)] = ∑_{k=0}^{K-1} P_k I_k
H(X) = ∑_{k=0}^{K-1} P_k log_2(1/P_k)
So entropy gives the average number of bits required to represent the source.
It is the foundation of the Source Coding Theorem.
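A minimal Python sketch of the entropy formula (the helper name entropy is my own; zero-probability terms are taken as zero by convention):

```python
import math

def entropy(probs):
    """H(X) = sum_k P_k * log2(1 / P_k), in bits per symbol."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Example: the PCM source from problem 2 above
print(entropy([0.25, 0.75]))      # ≈ 0.811 bits/symbol
```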
Properties of Entropy
Motivation:
It is always helpful to study the properties of everything we learn. Take salt as an example: knowing its properties helps us understand the outcome when it undergoes processes such as dissolving in water or reacting with an acid.
Properties:
Bound: 0 ≤ H(X) ≤ log_2(K)
The minimum value of the entropy is zero. It is zero if and only if one symbol has P_k = 1 and all the others have P_k = 0 (i.e. the outcome is either impossible or sure).
The maximum value of the entropy is log_2(K), attained when all the alphabets are equally likely (a highly random source, ∴ maximum uncertainty).
*: Please do think this through logically; don't just accept or memorize it. Of course, we prove it later!
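Both extremes of the bound are easy to see numerically with the entropy helper sketched above (K = 4 symbols, illustrative only):

```python
print(entropy([1.0, 0.0, 0.0, 0.0]))       # 0.0: a sure outcome carries no information
print(entropy([0.7, 0.1, 0.1, 0.1]))       # ≈ 1.36 bits: somewhere in between
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 = log2(4): the maximum, equally likely
```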
Problem
Consider a source which emits two symbols with probabilities p and 1 - p, respectively. Prove that the entropy is maximum only when both symbols are equally likely. Plot the variation of entropy as a function of the probability p.
Solution:
What is the probability distribution? Bernoulli (a binomial with a single trial).
Number of symbols (K) in the source: 2. They can be labelled (0, 1), (a, b), (*, $) - whatever you like.
Symbol probabilities: (p, 1 - p). Can you derive the entropy?
Problem (contd.)
H = ∑_{k=0}^{1} P_k log_2(1/P_k)
  = -p_0 log_2(p_0) - p_1 log_2(p_1)
  = -p log_2(p) - (1 - p) log_2(1 - p)
According to the properties of entropy above, the maximum entropy is log_2(K) = log_2(2) = 1 bit.
Setting dH/dp = log_2((1 - p)/p) = 0 gives p = 0.5; this is the only value of p that achieves the maximum, i.e. both symbols equally likely.
[Figure: plot of entropy H versus probability p for the binary source, 0 ≤ p ≤ 1 and 0 ≤ H ≤ 1; the curve peaks at H ≈ 1 bit near p ≈ 0.5 (data cursor: X = 0.506, Y = 0.9999).]
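A minimal Python/matplotlib sketch that reproduces a plot like the one above (an illustrative sketch, not the original figure):

```python
import numpy as np
import matplotlib.pyplot as plt

# Binary entropy H(p) = -p*log2(p) - (1 - p)*log2(1 - p)
p = np.linspace(0.0, 1.0, 501)
with np.errstate(divide="ignore", invalid="ignore"):
    H = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
H = np.nan_to_num(H)  # take H(0) = H(1) = 0 by convention

plt.plot(p, H)
plt.xlabel("Probability p")
plt.ylabel("Entropy H (bits)")
plt.title("Binary entropy peaks at 1 bit when p = 0.5")
plt.grid(True)
plt.show()
```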