Information Theory
Arun Prakash A
a.arun283@gmail.com
Department of Electronics and Communication
Kongu Engineering College, India
Winter 2018
A Special Hour
By the time I finish this lecture, we will have travelled about 100,000 kilometers through space.
Importance of the Subject
How do we encode information and reproduce it exactly at the receiver without error?
How much can we compress data/information? Are there any limits?
How can we measure the performance of one system against another?
Information
What is information? How do we quantify and measure it?
Let us consider an example: What is the elevation of Mount Everest?
Intuitively:
If you have no idea about it - you are totally uncertain - then the answer gives you a lot of information.
If you have some idea about it - you are only slightly uncertain - then the answer adds some information.
If you already know the answer is 8848 m - the information gained is zero (you learned nothing new).
So information is somehow related to the uncertainty about something.
Whenever we deal with uncertainty, we make use of probability theory!
Claude Shannon
Figure: Claude Shannon
A paper published by Shannon in 1948 laid the foundation for today's digital era.
He quantified the abstract concept of information.
The measure of information is the amount of uncertainty associated with the source that generates it.
Today, the majority of information takes one of the following forms:
1 Text
2 Audio
3 Image
4 Video
The representation of information for processing, storage and transmission is called data.
Information, Entropy
Let us consider a discrete memoryless source that generates information from a fixed set of alphabets.
For example, all the electronic text in the world is generated from a fixed set of 128 alphabets (the ASCII code).
So we can think of the source X as a set containing K unique elements (alphabets):
X = (x_0, x_1, ..., x_{K-1})
If we consider ASCII, K = 128; x_65 corresponds to the letter 'A' and x_127 corresponds to 'DEL'.
Each alphabet has a probability of occurrence P(x_k) associated with it, such that ∑_{k=0}^{K-1} P(x_k) = 1.
∴ The information I of alphabet x_k can be measured as
I(x_k) = -log_2(P(x_k))
in bits (the unit of information, not a binary digit!).
Why is there a negative sign?
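To make the formula concrete, here is a minimal Python sketch (not part of the original slides; the helper name self_information is just an illustrative choice):

```python
import math

def self_information(p):
    """Self-information I = -log2(p), in bits, of a symbol with probability p."""
    if not 0 < p <= 1:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log2(p)

# A uniformly distributed ASCII-style source with K = 128 symbols:
K = 128
print(self_information(1 / K))  # 7.0 bits per symbol
```

Rarer symbols (smaller p) give larger values, matching the Mount Everest intuition above.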
Properties of Information
Information I(x_k) = 0 for P(x_k) = 1 (i.e. the symbol is certain to occur). ∴ There is no information gain.
Information is never negative: I(x_k) ≥ 0 for 0 ≤ P(x_k) ≤ 1.
I(x_k) > I(x_i) for P(x_k) < P(x_i): the less probable symbol carries more information.
If a symbol x_1 carries information I_1 and a symbol x_2 carries information I_2, then the combined information carried by x_1 and x_2 is simply I_1 + I_2:
I(x_k, x_i) = I(x_k) + I(x_i)
(Note: this holds if and only if the symbols occur statistically independently, i.e. the source is memoryless. A quick numerical check follows below.)
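A small numerical check of the additivity property, reusing the self_information helper sketched above (illustrative only):

```python
# For independent symbols the joint probability multiplies,
# so the information adds: I(x_k, x_i) = I(x_k) + I(x_i).
p_k, p_i = 0.5, 0.25
print(self_information(p_k * p_i))                     # 3.0 bits
print(self_information(p_k) + self_information(p_i))   # 1.0 + 2.0 = 3.0 bits
```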
Few Problems
1. A memoryless source has 4 alphabets (symbols/elements) which are equally likely to occur. Calculate the information of a single alphabet.
Solution:
X = (x_0, x_1, x_2, x_3), k = 0, 1, 2, 3.
Since they are equally likely, P(x_k) = 1/4.
∴ I(x_k) = log_2(1/(1/4)) = log_2(4)
I(x_k) = 2 bits
2. In a PCM system, '1's occur with a probability of 0.75 and '0's occur with a probability of 0.25. What is the amount of information generated by such a source?
Source: X = (0, 1), P(0) = 0.25, P(1) = 0.75, so ∑_k P(x_k) = 1.
I(x_0 = 0) = log_2(1/0.25) = 2 bits
I(x_1 = 1) = log_2(1/0.75) ≈ 0.41 bits
I(x_0 & x_1) = I(0) + I(1) ≈ 2.41 bits
→ Solve the same problem when the two symbols are equally likely.
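These figures are easy to verify numerically with the self_information helper sketched earlier (illustrative only):

```python
# Problem 2: P(0) = 0.25, P(1) = 0.75
I0 = self_information(0.25)       # 2.0 bits
I1 = self_information(0.75)       # ≈ 0.415 bits
print(I0, I1, I0 + I1)            # combined ≈ 2.415 bits
# Follow-up exercise (equally likely symbols): each carries exactly 1 bit.
print(self_information(0.5) + self_information(0.5))   # 2.0 bits
```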
3. A source emits the sequence "Do Geese See God?", which contains all of its alphabets. Calculate the information content of the source.
Source:
X = (d, D, e, s, S, G, o, ?), k = 0, 1, ..., 7; P(x_0 = d) = ?, ..., P(x_7 = ?) = ? (the probabilities can be estimated from the sequence, as sketched below).
Can you frame a new sentence or a few words from the source alphabets?
1 Seed
2 DoG
3 Dose
4 Do DoGs See God?
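A minimal Python sketch for estimating the symbol probabilities empirically from the given sequence and computing each symbol's information (my own illustrative approach; whether spaces count as alphabets is an assumption to settle first):

```python
from collections import Counter
import math

sequence = "Do Geese See God?"
counts = Counter(sequence.replace(" ", ""))   # drop spaces; keep them if they count as alphabets
total = sum(counts.values())

for symbol, count in sorted(counts.items()):
    p = count / total
    print(f"{symbol!r}: P = {p:.3f}, I = {-math.log2(p):.2f} bits")
```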
4. If there are M equally likely and independent messages, then prove that the amount of information carried by each message is I = N, where M = 2^N and N is an integer.
Source alphabets: ? (here, the M messages themselves)
Since the messages are equally likely, P_k = 1/M.
∴ I = log_2(1/(1/M)) = log_2(M) = log_2(2^N) = N bits
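A quick numerical check of this result (illustrative sketch):

```python
import math

# For M = 2**N equally likely messages, each carries I = log2(M) = N bits.
for N in range(1, 9):
    M = 2 ** N
    I = math.log2(M)
    print(f"M = {M:3d}  ->  I = {I:.0f} bits  (= N = {N})")
```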
Entropy
Entropy is the measure of the average information produced by the source.
The source emits symbols m_0, m_1, ... in successive signaling intervals with probabilities P(m_0), P(m_1), .... The information of each symbol is I(m_0), I(m_1), ..., so the average information of the source is
H = E[I(m_k)] = ∑_{k=0}^{K-1} P_k I_k
H(X) = ∑_{k=0}^{K-1} P_k log_2(1/P_k)
So entropy gives the average number of bits required to represent the source.
It is the foundation of the Source Coding Theorem.
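A minimal Python sketch of the entropy formula (the helper name entropy is my own; zero-probability terms are taken as zero by convention):

```python
import math

def entropy(probs):
    """H(X) = sum_k P_k * log2(1 / P_k), in bits per symbol."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# Example: the PCM source from problem 2 above
print(entropy([0.25, 0.75]))      # ≈ 0.811 bits/symbol
```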
Properties of Entropy
Motivation:
It is always helpful to study the properties of everything we learn. Take salt as an example: knowing its properties helps us understand the outcome when it undergoes processes such as dissolving in water or reacting with an acid.
Properties:
Bound: 0 ≤ H(X) ≤ log_2(K)
The minimum value of the entropy is zero. It is zero if and only if one symbol has P_k = 1 and all the others have P_k = 0 (i.e. the outcome is either impossible or sure).
The maximum value of the entropy is log_2(K), attained when all the alphabets are equally likely (a highly random source, ∴ maximum uncertainty).
*: Please do think this through logically; don't just accept or memorize it. Of course, we prove it later!
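Both extremes of the bound are easy to see numerically with the entropy helper sketched above (K = 4 symbols, illustrative only):

```python
print(entropy([1.0, 0.0, 0.0, 0.0]))       # 0.0: a sure outcome carries no information
print(entropy([0.7, 0.1, 0.1, 0.1]))       # ≈ 1.36 bits: somewhere in between
print(entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 = log2(4): the maximum, equally likely
```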
Problem
Consider a source which emits two symbols with probabilities p and 1 - p, respectively. Prove that the entropy is maximum only when both symbols are equally likely. Plot the variation of entropy as a function of the probability p.
Solution:
What is the probability distribution? Bernoulli (a binomial with a single trial).
Number of symbols (K) in the source: 2. They can be labelled (0, 1), (a, b), (*, $) - whatever you like.
Symbol probabilities: (p, 1 - p). Can you derive the entropy?
Problem (contd.)
H = ∑_{k=0}^{1} P_k log_2(1/P_k)
  = -p_0 log_2(p_0) - p_1 log_2(p_1)
  = -p log_2(p) - (1 - p) log_2(1 - p)
According to the properties of entropy above, the maximum entropy is log_2(K) = log_2(2) = 1 bit.
Setting dH/dp = log_2((1 - p)/p) = 0 gives p = 0.5; this is the only value of p that achieves the maximum, i.e. both symbols equally likely.
[Figure: plot of entropy H versus probability p for the binary source, 0 ≤ p ≤ 1 and 0 ≤ H ≤ 1; the curve peaks at H ≈ 1 bit near p ≈ 0.5 (data cursor: X = 0.506, Y = 0.9999).]
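A minimal Python/matplotlib sketch that reproduces a plot like the one above (an illustrative sketch, not the original figure):

```python
import numpy as np
import matplotlib.pyplot as plt

# Binary entropy H(p) = -p*log2(p) - (1 - p)*log2(1 - p)
p = np.linspace(0.0, 1.0, 501)
with np.errstate(divide="ignore", invalid="ignore"):
    H = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
H = np.nan_to_num(H)  # take H(0) = H(1) = 0 by convention

plt.plot(p, H)
plt.xlabel("Probability p")
plt.ylabel("Entropy H (bits)")
plt.title("Binary entropy peaks at 1 bit when p = 0.5")
plt.grid(True)
plt.show()
```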