# Some comments on information theory

#### Cleopatre VII

##### Guest
Some time ago my attention was drawn to the post: Language, Sounds and Intelligent Design

I thought that I could share my knowledge of information theory from my studies, starting with probability calculus, which I currently teach in high school and in graduate studies. However, to start the discussion, I would like to write a few introductory posts. I have in mind that not all forum participants deal with mathematics.

This series of posts will focus primarily on information. Today's post is an introduction to the theory of probability. It turns out that the theory of probability and information are related, because the current information theory is based on probability models.

But why does information seem so important? Well, everything we perceive or describe, we get to know precisely because we receive information from the environment. However, we will move on to information itself in further posts.

The basic concept of the theory of probability is, of course, the concept of probability itself. Probability may seem to be an intuitive concept, but its mathematical definition requires the establishment of specific models, which, as usual in mathematics, are idealized.

The scientific study of probability is a modern development of mathematics. Gambling shows that there has been an interest in quantifying the ideas of probability for millennia, but exact mathematical descriptions arose much later. There are reasons for the slow development of the mathematics of probability. Whereas games of chance provided the impetus for the mathematical study of probability, fundamental issues are still obscured by the superstitions of gamblers. However, we will talk about them after a short mathematical introduction.

Now let's think about the model of a six-sided die. Suppose the sides of the die are indistinguishable apart from the number of dots, and we can roll a natural number from 1 to 6. In our model the die is also perfectly symmetrical, and no physical conditions that might influence the result are taken into account. Therefore, the discrete probability distribution for a roll of such a die is as follows:

| x_i | 1 | 2 | 3 | 4 | 5 | 6 |
|-----|-----|-----|-----|-----|-----|-----|
| p_i | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 |

x_i - the values that the random variable can take,
p_i - the probability that the random variable takes the value x_i (in this case 1, 2, 3, 4, 5 or 6).
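
For readers who like to experiment, the distribution above can be written down in a few lines of Python. This is only a minimal sketch of the idealized model; the names `faces` and `distribution` are chosen here for illustration and do not come from the post.

```python
from fractions import Fraction

# Values x_i of the random variable: the faces of an idealized fair die.
faces = range(1, 7)

# Uniform distribution: every face gets the same probability p_i = 1/6.
distribution = {x: Fraction(1, 6) for x in faces}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")

# The probabilities of all possible values must add up to 1.
total = sum(distribution.values())
print("sum of p_i =", total)
```

Exact fractions are used instead of floating-point numbers so that the sum comes out as exactly 1.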

However, to take a closer look at the concept of a random variable, it is worth introducing some mathematical definitions. They will appear in further parts of our discussion.

I kindly ask forum participants to comment on the idea of this discussion and such posts. Do you prefer them to be longer or rather shorter than the current one? How often should they appear?

Let me add that in the next post I would like to discuss Laplace's definition of probability.

This could be good for me. I may like the use of entropy and random walks in my favorite model but I'm really an electrical engineer who picks up things in an incomplete kind of way a lot. Length and how often is kind of up to you but the length looks fine.

So I am very happy. I decided to start with the probability. Maybe in the following parts I will also include some simple examples so that everyone can try to solve them on their own. And then we'll go ahead and discuss. Of course, I will make every effort to ensure that everything is clear to everyone.

I may like the use of entropy and random walks in my favorite model
I am a theoretical physicist, but I must admit: I never understood the concept of entropy (or negentropy aka information). Therefore I am looking forward to learning something new from the discussions in this series....

Observational astronomer here. I tend to grope my way forward with a vague understanding of frequentist and Bayesian probability, and while I have some experience with e.g. Markov chain Monte Carlo, it's not something I can claim deep professional knowledge about. This sounds like a very interesting and useful series and I encourage you to continue, @Cleopatre VII !

Thanks to everyone present for their comments. Therefore, as I announced, I am going to tell you a little bit about probability in Laplace's terms.

Definition (Laplace)
Let a finite set Ω of all possible elementary events be given. Then any subset A of the set Ω is called an event.

Remark
In the case of a single die, the set of elementary events is formed by the numbers of dots that can be obtained in a single roll, i.e. Ω1={1,2,3,4,5,6}.
For a change, in a roll of two dice, the set of elementary events is formed by pairs of numbers of dots that can be obtained in a single roll on each of the dice, i.e. Ω2={(1,1),(1,2),…,(1,6),(2,1),(2,2),…,(6,6)}.
Hence the set of elementary events in the second case is a Cartesian product given as
Ω2=Ω1×Ω1.
For given sets A and B, the Cartesian product is the set of all ordered pairs (a, b) such that a belongs to set A and b belongs to set B.

Example 1.1.
The Cartesian product of the three-element set of letters {a, b, c} and the two-element set of numbers {1,2} is the six-element set {a1, a2, b1, b2, c1, c2}, where a1 stands for the pair (a,1), and so on.
For finite sets, the Cartesian product has m·n elements, where m is the number of elements of the first set and n is the number of elements of the second set.
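
The Cartesian product can also be checked by brute force. The sketch below uses Python's standard `itertools.product`; the variable names are illustrative assumptions and not part of the original post.

```python
from itertools import product

# Elementary events for one die.
omega_1 = [1, 2, 3, 4, 5, 6]

# Elementary events for two dice: all ordered pairs, i.e. omega_1 x omega_1.
omega_2 = list(product(omega_1, repeat=2))
print(len(omega_2))   # 6 * 6 = 36 pairs

# Example 1.1: letters times numbers.
letters = ["a", "b", "c"]
numbers = [1, 2]
pairs = list(product(letters, numbers))

# Write each pair (letter, number) in the short form used in the example.
labels = [f"{letter}{number}" for letter, number in pairs]
print(labels)

# The size of the product is m * n.
print(len(pairs) == len(letters) * len(numbers))
```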

A random variable is therefore a function that assigns numbers to elementary events.

The probability of an event A is given by the number P(A)=n/N ,
where N is the number of all elementary events, and n is the number of elementary events favorable to A (i.e. satisfying the condition that defines the event A).
It can therefore be written as:
P(A)=|A|/|Ω| ,
where |⋅| denotes the number of all elements of a given set.
Probability has the following properties:
1. 0≤P(A)≤1,
2. P(Ω)=1,
3. P(∅)=0, where ∅ is an empty set,
4. P(A∪B)=P(A)+P(B)-P(A∩B).
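
The formula and the four properties can be tried out on a small example. What follows is a sketch under stated assumptions: the helper `laplace_probability` and the two events A and B are made up here for illustration, and the model is the idealized single die with equally likely outcomes.

```python
from fractions import Fraction

def laplace_probability(event, omega):
    """Return P(A) = |A| / |Omega| for a finite set of equally likely outcomes."""
    event, omega = set(event), set(omega)
    if not omega:
        raise ValueError("omega must not be empty")
    if not event <= omega:
        raise ValueError("the event must be a subset of omega")
    return Fraction(len(event), len(omega))

omega = {1, 2, 3, 4, 5, 6}   # one roll of a die
a = {1, 2, 3}                # illustrative event: at most three dots
b = {3, 4}                   # illustrative event: three or four dots

p_a = laplace_probability(a, omega)
p_b = laplace_probability(b, omega)
p_union = laplace_probability(a | b, omega)
p_common = laplace_probability(a & b, omega)

print(0 <= p_a <= 1)                             # property 1
print(laplace_probability(omega, omega) == 1)    # property 2
print(laplace_probability(set(), omega) == 0)    # property 3
print(p_union == p_a + p_b - p_common)           # property 4
```

In property 4 the term P(A∩B) is subtracted because the outcome 3 belongs to both events and would otherwise be counted twice.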

Example 1.2.
We will calculate the probability of drawing the letter A in the English alphabet, assuming that the probability of drawing each letter is identical.
Solution
The English alphabet consists of 26 letters, therefore |Ω|=26. The letter A occurs only once in the alphabet, so |A|=1, where A denotes the event of drawing the letter A. There is only one elementary event favorable to it, as there are no more letters A in the alphabet.
Substituting into the formula given earlier, we get:
P(A)=|A|/|Ω| =1/26.
So the probability of drawing the letter A is equal to 1/26.

Example 1.3.
We will calculate the probability of drawing one of the first three letters of the Hebrew alphabet.
Solution
There are 22 letters in the Hebrew alphabet, so |Ω|=22. The first three are א (Alef), ב (Bet), ג (Gimel). Hence, the set A has 3 elements, i.e. |A|=3. Therefore, the probability that one of the first three letters will be drawn is
P(A)=|A|/|Ω| =3/22.
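
Examples 1.2 and 1.3 can be reproduced by counting. This is a rough sketch: the Hebrew alphabet is built from the Unicode block U+05D0 to U+05EA with the five final letter forms left out, which is an assumption about how to arrive at the 22 letters mentioned above.

```python
from fractions import Fraction
import string

# Example 1.2: the 26 letters of the English alphabet.
english = list(string.ascii_uppercase)
event_a = [letter for letter in english if letter == "A"]
p_letter_a = Fraction(len(event_a), len(english))
print(p_letter_a)        # 1/26

# Example 1.3: the 22 letters of the Hebrew alphabet.
# Code points of the final forms (kaf, mem, nun, pe, tsadi) are skipped.
FINAL_FORMS = {0x05DA, 0x05DD, 0x05DF, 0x05E3, 0x05E5}
hebrew = [chr(cp) for cp in range(0x05D0, 0x05EB) if cp not in FINAL_FORMS]

first_three = hebrew[:3]             # Alef, Bet, Gimel
p_first_three = Fraction(len(first_three), len(hebrew))
print(len(hebrew), p_first_three)    # 22 3/22
```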

It should be noted that the calculus of probability is surrounded by some very common false beliefs, which some gamblers may hold without being aware of it.
However, we will discuss this in the next post. Now I would like to know if you have any questions or comments about the current post.

Finally, I give two simple exercises for the readers.
Exercise 1. What is the probability of getting an even number in a single roll of a die?
Exercise 2. What is the probability of getting a number that is divisible by 3 in a single roll of a die?

No questions. Pretty elementary so far.

Ex. 1: 50%
Ex. 2: 33%

Very good. So we can move on soon...

I have BS in Mechanical Engineering from over 20 years ago and haven't looked at too much math since then. I hope to follow along.
Cartesian product
Fyi, I had to look up what this meant.
4. P(A∪B)=P(A)+P(B)-P(A∩B).
I also don't know or remember what the 'U' and upside down 'U' mean. Can you please explain them?
Exercise 1. What is the probability of getting an even number in a single roll of a dice?
Exercise 2. What is the probability of getting a number that is divisible by 3 in a single roll of a dice?
I cheated and only solved #2, and got the same answer as psychegram (I scrolled down too much and saw his answer to #1, so maybe people who answer should hide their answers when they post, and please give a little more time before moving along, since there may be more people who need a little more time, fwiw). But I did it by counting the pairs from here - Probability: Rolling Two Dice

Is there an easier way to calculate the answer? Also, is there significance to the pattern in terms of the diagonal rows I counted in the above web link in relation to probability?

3. P(∅)=0, where ∅ is an empty set,
I'm interested in zero and empty set as you explain things and in general. Would this '3.' mean that there is a potential of a probability (of an event) and also possibly mean that there is no potential of probability (of an event)?

Edit Added: I also have some math background I forgot about in relation to receiving a MA in Counseling Psychology.

If the game or the reality is manipulated from outside the system, how useful are formulas? The test tube/Petri dish is not operating in a vacuum or unbiased space or neutral environment. The cats are in a mysterious box. Furthermore, given that there must be laws governing overt external manipulation which we do not fully understand, how can that even be factored in? Is there an equation for hubris? Or to account for the wishful thinking of the black box manipulators? What about the will or desire of a soul?

The die may be cast, but the effect of the earthquake on the dice when they are rolled is incalculable.

I also don't know or remember what the 'U' and upside down 'U' mean. Can you please explain them?
Of course. ∪ and ∩ are notations from set theory. ∪ is the union (sum) of sets, and ∩ is the intersection (common part) of sets. Let me give you a few examples.

Example 1.
Let A = {1,2,3} and B = {5,6,7,8}. Then the union of the sets includes all the elements that belong to any of the sets, i.e. A∪B={1,2,3,5,6,7,8}.

Example 2.
Let A = {-2,1,5} and B = {1,5,7,12}. Note that the numbers 1 and 5 belong to both set A and set B. However, they are not duplicated when we take the union of the sets, i.e. A∪B={-2,1,5,7,12}.

The lack of duplication results from the fact that the union of sets corresponds to the logical "or" – disjunction. The most commonly used symbol for disjunction is ∨. When I write “it is raining or it is not raining”, I can write it as “it is raining ∨ it is not raining”. Classical disjunction is a truth-functional operation which returns the truth value "true" unless both of its arguments are "false". What does this mean for sets? Well, an element may belong to one of the sets or to both of them; in either case it belongs to the union of the sets:
x∈(A∪B)⇔x∈A∨x∈B,
where ∈ means belonging to the set and ⇔ means equivalence.
In everyday speech "or" often means either one option or the other, but not both. Not in this case: our "or" means one option, the other option, or both.
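
Python's built-in sets behave the same way, so the two union examples can be checked directly. This is just an illustrative sketch; the variable names and the range of test values are arbitrary choices made here.

```python
# Example 1: two sets without common elements.
a1 = {1, 2, 3}
b1 = {5, 6, 7, 8}
union_1 = a1 | b1          # | is Python's operator for the union of sets
print(sorted(union_1))     # [1, 2, 3, 5, 6, 7, 8]

# Example 2: 1 and 5 belong to both sets but appear only once in the union.
a2 = {-2, 1, 5}
b2 = {1, 5, 7, 12}
union_2 = a2 | b2
print(sorted(union_2))     # [-2, 1, 5, 7, 12]

# Membership in the union is the inclusive "or" of the two memberships.
equivalence_holds = all(
    (x in union_2) == (x in a2 or x in b2)
    for x in range(-5, 15)
)
print(equivalence_holds)   # True
```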

In turn, ∩ means the common part of the sets, the so-called intersection. This operation is associated with the logical "and" – conjunction. The most commonly used symbol for conjunction is ∧. When I say "I like purple and white" I mean I like both purple and white. If I don't like one of these colours then the conjunction is false. Thus, the intersection of sets contains only those elements that belong to both sets, i.e.
x∈(A∩B)⇔x∈A∧x∈B.
Now a task for you. Try to find the intersection of the following sets yourself:
A = {1,2,3} and B = {5,6,7,8},
A = {-2,1,5} and B = {1,5,7,12}.
My hint is that the empty set you mentioned will appear here.
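
So as not to give away the task, here is the intersection for two other sets, invented purely for illustration (they do not appear anywhere in the post). The sketch again relies on Python's built-in sets.

```python
# Two made-up sets of colours, used only as an illustration.
colours_p = {"purple", "white", "green"}
colours_q = {"white", "black", "purple"}

# & is Python's operator for the intersection of sets.
common = colours_p & colours_q
print(sorted(common))      # ['purple', 'white']

# Membership in the intersection is the logical "and" of the two memberships.
candidates = colours_p | colours_q
equivalence_holds = all(
    (c in common) == (c in colours_p and c in colours_q)
    for c in candidates
)
print(equivalence_holds)   # True
```

The same `&` operator can be applied to the two pairs of sets from the task once you have worked them out by hand.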

You asked a very interesting question:
I'm interested in zero and empty set as you explain things and in general. Would this '3.' mean that there is a potential of a probability (of an event) and also possibly mean that there is no potential of probability (of an event)?
How is it with this empty set and the probability? We say that the probability of hitting a point is zero. But on the other hand, we think we're hitting the point. So is there a paradox here? Well, in the case of infinite sets, the probability of a given event is not an inherent feature of the set and the event, but depends on the method of selecting the event. I will write more about it in the next post, because it is a very interesting topic and in a way it is connected with what I would like to write about, i.e. with some false beliefs about probability.
Is there an easier way to calculate the answer? Also, is there significance to the pattern in terms of the diagonal rows I counted in the above web link in relation to probability?
In your link, I see a roll of two dice. I asked about a simpler case: a single roll. In this case, the situation is very clear. According to the model, the probability that some number of dots comes up is 1, which is 100%. Each number of dots comes up with probability 1/6. The even numbers of dots are 2, 4 and 6, so we write 1/6+1/6+1/6=3/6=1/2, which is 50% (6/6=1, i.e. 100%).
When we have two rolls, there are 36 possible pairs. So the probability that any given pair comes up is no longer 1/6, but 1/36.
I hope you understand better now. If necessary, ask and investigate the matter to the deepest depth. I personally like to be asked a lot.
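
The counting above can be repeated in Python. The sketch below also tabulates the sum of two dice, on the assumption that the diagonals in the linked table are the cells with equal sums; the link's content is not reproduced here, so that reading is only a guess.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Single roll: add 1/6 for every face that satisfies the condition.
p_even = sum(Fraction(1, 6) for x in faces if x % 2 == 0)
p_div_by_3 = sum(Fraction(1, 6) for x in faces if x % 3 == 0)
print(p_even, p_div_by_3)      # 1/2 1/3

# Two rolls: 36 equally likely ordered pairs, each with probability 1/36.
pairs = list(product(faces, repeat=2))
p_single_pair = Fraction(1, len(pairs))
print(len(pairs), p_single_pair)   # 36 1/36

# Number of pairs giving each sum (the cells on one anti-diagonal of the table).
ways = Counter(first + second for first, second in pairs)
p_sum = {total: Fraction(count, len(pairs)) for total, count in sorted(ways.items())}
for total, p in p_sum.items():
    print(f"sum {total}: {ways[total]} pairs, probability {p}")
```

Under that assumption, the longest diagonal corresponds to the sum 7, which can be obtained in 6 ways and is therefore the most likely sum.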

If the game or the reality is manipulated from outside the system, how useful are formulas? The test tube/Petri dish is not operating in a vacuum or unbiased space or neutral environment. The cats are in a mysterious box. Furthermore, given that there must be laws governing overt external manipulation which we do not fully understand, how can that even be factored in? Is there an equation for hubris? Or to account for the wishful thinking of the black box manipulators? What about the will or desire of a soul?

The die may be cast, but the effect of the earthquake on the dice when they are rolled is incalculable.
Well, I agree with you. However, I am talking about mathematical models for now. As I pointed out in my first post, mathematical models are idealized. We are talking about a perfectly symmetrical die and perfect conditions. It is obvious that such ideal objects do not correspond perfectly with reality, but in some way they approximate it.

This is how science works. Science is looking for numerous models and approximations that are easy to understand and that relate to some degree to observable reality. At the same time, any abnormal conditions are sometimes ruled out. What is most typical is considered. Mathematical statements are true within a given system, they are not reality in themselves.

Hence, I am absolutely not saying that any probability is 1/6. I am describing only the mathematical model. When I write about entropy, I will refer to statistical physics. By the way, I will mention Maxwell's demon, I think it is related to your statement.

How useful are the formulas? As long as they can be useful, it is worth using them. After all, the scope of their applicability is not infinite.

I get it. Under "normal" circumstances, models of probability and orderly information work fine, I guess. My point is that it seems we have entered a twilight zone where the laws of "normal reality" that allow for useful models to be calculated and used are breaking down. Like extreme cold or heat where strange, unpredictable things start happening to matter. I do see the models as comforting for sure. They may still even be somewhat predictive. But I am not sure "science" is looking for something easy that kind of more or less relates to what is observable. "Close enough" for horseshoes, hand grenades and science?

If half of this discussion is based on information, then all the radical new information coming down the pike, whether it be from the C's or from the manifestation and implementation of the Great Reset, changes every calculation of the probability of events/outcomes. And then there is the false and unreliable information to contend with as well as the unknown information (Rumsfeld forgive me). If information is a big part of any model or system, the model is only as good as the information. All you have to do is look at climate change models to find that devil in those details. Or the "to Vax or not to Vax" debate where you have people believing in 2 distinct realities. I don't think abnormal conditions can be ruled out in any formula these days because we have entered a state where abnormal is becoming normal; IOW, all we have is abnormal now!

As for probabilities, the stock market will keep going up until it doesn't. The probability of a crash is 100% but the probability it will happen tomorrow is very low. Not very scientific and not very useful, I know. But that is the reality.

It's like surfing. The ocean is chaotic and unpredictable. A surfer is best served by keeping the senses open and by careful positioning in relation to a constantly changing observed and felt reality. (In small waves, it is possible to be careless and get away with it, but, in the really big thumpers, you have to 100% 'go for it' or 100% get the hell out of the way, so to speak. Fractional decisions usually have negative outcomes.)

You know what? I think I am in the wrong classroom here. I'll take an "F" and I won't disrupt the class any further. My apologies.

Last note: it's funny you mention Maxwell's Demon. I have been working on a song titled "Hounded by Demons". It will be done soon and I will post it. But the Demon that relates to systems of human containment, manipulation, information and probability is an unseen but very real one, and not a hypothetical mental construct. I don't mean to be rude. It is an interesting discussion. I love system dynamics. I think that is what we are relying on at this point, if there is to be any hope. Hmm. The opposite of Murphy's Laws.

Something like: In a system where anything can go wrong, everything will go wrong, BUT the system will be miraculously rebalanced by its own disintegration. IDK, it is late and I am tired.

You know what? I think I am in the wrong classroom here. I'll take an "F" and I won't disrupt the class any further. My apologies.
But you are absolutely not disrupting anything, and everything you write is very important and, what's more, in line with what I myself think. For now, I present simple models, but I am only starting with them. I myself experience paranormal phenomena regularly. They cannot be explained in terms of modern probability theory.

Hence, I fully agree with your objections. I am writing about models here; I am not saying that they are consistent with reality. I believe that science should strive to be more precise and take into account everything that we experience that the theory does not explain.

I am definitely glad you are taking part in this discussion. Do not apologize. Thank you for being here.

Then the job for you. Try to find the intersection of the following sets yourself:
A = {1,2,3} and B = {5,6,7,8},
A = {-2,1,5} and B = {1,5,7,12}.
Ok, thank you for the descriptions and explanations in your entire post.

#1 – empty set
#2 – 1, 5
I will write more about it in the next post, because it is a very interesting topic and in a way it is connected with what I would like to write about, i.e. with some false beliefs about probability.
Ok, thank you.