… are wanted here.

Everyone learns differently, and everyone has a unique understanding of mathematics. Therefore, each person needs an explanation of a mathematical concept that is tailored to them. For everyone, there is one explanation that stands out as the best—the best explanation in the world.

Explanations as a Starting Point

On this page you will find a number of explanations—especially ones that you won’t normally encounter in schoolbooks and that go much deeper than what is common among YouTubers or TikTokers. These explanations invite a critical engagement with the material currently taught in schools. To some, this may seem unusual, since mathematics is often regarded as a rigid set of rules. But it is not. There is plenty of “evidence” for that right here.

The explanations presented here are neither the only “correct” ones nor are they complete. Rather, they are meant to serve as a starting point for delving more deeply into the topics. Engaging with mathematics in depth is an exciting journey into your own inner world of structures, logic, and abstraction. May the following ideas help guide you along that path.

Unique Explanations

Almost everyone has the desire to be unique and to be valued for that uniqueness. In mathematics, individuality begins with the fact that each person develops their own understanding of mathematical objects.

Take fractions as an example: they can be seen as parts of a whole, as two-dimensional numbers, as results of division, as outcomes of distribution, as mathematical operators, as points on the number line, as ratios, or as shares. Fractions can be represented with fraction strips, pie charts, or subdivisions of a time span. They can be experienced by sensing different weights, by perceiving different levels of brightness, or by hearing different volumes. All pitches, for instance, arise from different subdivisions of time, and a sound can be described as a tone with differently weighted overtones—again expressible as fractions.

Every student who encounters the topic of fractions will, out of these many possibilities, form their own personal picture of what fractions are. It cannot be otherwise. Realizing this individuality can be a strong motivation to want to learn more about mathematics.

New Mathematics

For some of the explanations on this page, you will also find further questions—questions without standard solutions. Engaging with these questions can quickly lead to new mathematics. Many people assume that students cannot discover anything new in mathematics, but the opposite is true: as soon as you step beyond the usual school exercises and phrase the questions even slightly differently from the way they appear in textbooks, you may arrive at mathematical ideas that have never been published before.

The likelihood of inventing—or discovering—new mathematics in this way is actually quite high. And the joy of making your own contribution to the “history of mathematics” is priceless.

Get Involved

If you know of an explanation that you find helpful but that isn’t yet on this page, send it to me and I will publish it here (as long as it is mathematically correct).
If you are looking for an explanation that isn’t included, feel free to write to me at martinwabnik@gmail.com. We will find someone who can explain it.

Any constructive feedback is, of course, always welcome!

The Standard Model of School Mathematics

We can think of numbers as line segments on the number line. But we can also arrange them vertically, which allows us to see many relationships that are not visible with “horizontal” numbers. For example, the multiplication of fractions or exponentiation with rational exponents can be very clearly understood using this model.

Martin Wabnik’s YouTube Channel

This YouTube channel offers over 400 high-quality mathematics videos covering various topics in school mathematics. The focus is on developing a deep understanding of mathematics, not on providing empty, mechanical solution recipes. Every video is built on a clear didactic concept, carefully executed both visually and verbally. Anyone who truly wants to understand mathematics will find here exactly what they have been looking for.

Arithmetic

The Beginning of Mathematics

When people are asked where mathematics begins, the most common answer is:

“One and one is two.”

This is often accompanied by the opinion that this statement is proven, indisputably true, will always remain so, and therefore nothing new can arise in mathematics.

However, all of these opinions are factually incorrect. Since there are approximately 100,000 publications worldwide each year containing new mathematics, it cannot be said that there is nothing new in this field.

The reason this statement is often considered indisputable and proven may be that adults have possibly forgotten how much cognitive effort they had to invest as children to learn counting and arithmetic. Here, a few aspects of addition are presented to show that even the statement “one and one is two” can and must be shaped individually. To practice mathematics seriously, we should understand what our starting point can be. There are several meaningful possibilities for this, but they are not simply given; they must be worked out.

Is 1 Plus 1 Equal to 2?

The adjacent image could illustrate the statement: One elephant and another elephant are two elephants. But how do we know that? Most people—at least those who read this sentence—have never counted real elephants in their lives. In school, counting and arithmetic are taught using many different materials, typically with dice, counting tiles, number rods, and perhaps also with apples and pears.

Thus, to recognize an addition in this image, we must be able to come to the somewhat amusing conclusion that elephants are like apples and pears—at least regarding their countability. To seriously maintain this statement, we must be able to distinguish it with our common sense from all sorts of nonsense, which will be demonstrated in the following.

There are no elephants corresponding to the first two illustrations. Can we still add them? A serious question: To what extent must things exist if we want to add them? Can we also add imaginary concepts? We sometimes say, “I just had two thoughts at once.” So, can thoughts be counted? Do they need to occur simultaneously, or is it sufficient if they occur consecutively? We will not be able to definitively resolve these and similar questions. However, they demonstrate that the statement “One plus one is two” cannot be unconditionally true if we cannot even determine to what it is meant to be applied.

What can we apply numbers and calculations to with complete certainty? For example, when baking a cake. If the recipe states that \(2\) eggs are to be added, we add eggs to the batter until we have counted to \(2\). Viewed this way, counting and calculating is a practical mental concept that works perfectly in everyday life. However, this has nothing to do with absolute truth.

  1. If “adding two eggs” means to take eggs in our hands and incorporate them into a cake batter, the question arises whether one can add elephants, since it is not so simple to add them to anything—at least, this is more difficult than with eggs.
  2. If we add \(2\) eggs to a batter, i.e., calculate \(1+2\), we obtain only one batter. In this case, is \(1+2=1\)?

Can we add an elephant and an apple even if the elephant eats the apple?

If one adds an apple to an elephant, one does not end up with \(2\) objects, but only one, because the elephant will likely eat the apple. And even if it does not, what could the \(2\) represent? Two completely unspecified objects? As we can see from these questions, the things we want to add should be sufficiently similar and temporally constant. However, what this similarity should be cannot be defined in general, just as the minimum time span during which things should remain unchanged in order to be addable cannot be generally specified.

We can also understand addition in a completely abstract way as a method that assigns a third number to two given numbers. For example, the numbers \(2\) and \(6\) are assigned the number \(8\), and the numbers \(7\) and \(3\) are assigned the number \(10\). The first assignments can be seen in the table on the right. To obtain further ones, a set of rules — such as the standard algorithm for addition — could be defined, allowing all other numbers to be added as well.
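This abstract view can be sketched in a few lines of code (an illustration added here, not part of the original table): a small lookup table supplies the first assignments, and a rule extends the assignment to all other pairs.

```python
# Addition viewed purely as an assignment: a pair of numbers is
# assigned a third number. The first assignments come from a table;
# a rule (here standing in for the standard algorithm) supplies the rest.
table = {(2, 6): 8, (7, 3): 10, (1, 1): 2}

def add(a, b):
    """Return the number assigned to the pair (a, b)."""
    if (a, b) in table:
        return table[(a, b)]
    # Rule set for all pairs not listed in the table:
    return a + b

print(add(2, 6))    # read off from the table: 8
print(add(13, 29))  # produced by the rule: 42
```

Nothing here says what the symbols "mean"; the assignment is correct by definition alone, which is exactly the point of the abstract view.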

When people claim that the statement “One plus one is two” is absolutely true, they often mean: “\(1+1=2\) is written in that list. So the equation must be correct!”

That the sequence of symbols \(1+1=2\) appears in the list is, by human judgment, hardly disputable. But what have we actually achieved with that? If we limit ourselves to judging the correctness of \(1+1=2\) only by whether this sequence of symbols appears in the list or not, then we indeed have a (quite) obviously correct sequence of symbols before us—but one that is meaningless so far. If we deliberately ignore the way in which the addition of numbers manifests itself in the real world, we may be right, but we lack relevance. With such an argument, we cannot justify that mathematics is correct (or perhaps even true), but only that a particular sequence of symbols appears in a list.


We can only add things that we can also count.

Here is, of course, an incomplete list of things that we cannot count:
rain, fun, cattle, leisure time, courage, wood, snow, thunder, lightning, cheese, equipment, tea, traffic, underbrush, homework, trash, music, staff, luggage, baggage, clothing, research, livestock, sand, milk, oil, honey, weather, wool, broccoli, furniture, work, butter, news, dust, iron, gold, meat, money, love, happiness, heat, thirst, pasta, electricity, knowledge, and so on.

For these cases, \(1+1=2\) does not hold!

Normally, ‘time’ has no plural. At times we sometimes need more time, but at no time can we need more ‘times’. Even though we can have good times, we still can’t add them – it just doesn’t add up. Nevertheless, we can add newspapers, and when we want to multiply 2 newspapers by 3, we write ‘2 times 3’, but ‘The Times’ is at all times treated as singular, even though it’s sold many times every day to people who don’t have time anymore since they read ‘The Times’.

Bread – in the sense of a foodstuff – does not have a plural form. You need loaves of bread to be able to count and add up bread. But a bakery can have ‘breads’ in the sense of types of bread such as rye bread, sourdough bread, or baguettes. In German today, it is exactly the opposite: 2 breads (2 Brote) always means two loaves of bread, and if you want to distinguish between rye bread, sourdough bread, and baguettes, you need to use the term ‘types of bread’ (Brotsorten).

In English, ‘mathematics’ is grammatically a plural form but is treated – like ‘Mathematik’ in German – as uncountable and singular. The science that deals with counting, among other things, is itself uncountable, yet it includes subfields like geometry, which can be divided into multiple countable geometries (e.g., non-Euclidean geometries), or algebra, which as a branch of mathematics is uncountable but deals with algebras (e.g., Boolean algebras or Lie algebras).

As we can see, the validity of \(1+1=2\) depends on temporal, technical, linguistic, cultural, etc. contexts and does not apply in itself.

Fun fact: In Japanese, most nouns do not have a plural form. So does \(1+1=2\) not apply in Japan?

Let’s return to the first image: most people see two elephants on the left being added together, resulting in the same two on the right. In fact, however, the identical graphic appears four times in this image. Since we take into account that we cannot add the same elephant twice, we interpret the two identical graphics on the left as two different elephants, and on the right we do not see two more elephants or two representations of one elephant (after all, we are presented with two identical graphics), but rather another representation of the same two different elephants on the left.

This is a good example of how we humans tailor reality to fit our conceptual framework. So if \(1+1=2\) is true, it is because we humans want it to be true and, if necessary, we even bend actuality to make it fit.

Fractions

The fraction strips are in the Google Drive folder “Die besten Erklärungen der Welt” and can be downloaded free of charge.

License Notice: I am deliberately releasing these materials under the most open license available: Creative Commons CC0 1.0 (Public Domain Dedication). This means you are free to print the fraction strips, modify them, include them in teaching materials, use them on YouTube or in books—even commercially. Attribution is not required.

Fractions

There are many ways to define what fractions are. To be able to work with fractions, we simply need to choose one definition and then derive all the properties of fractions from it. Here, we choose to view fractions as parts of a unit on the number line. The following PDF also introduces the first ways of expressing them.

Fractions can appear in many different contexts. Some of them are illustrated in the following PDF.

Expanding Fractions

When we divide the parts of a fraction into smaller parts, we create a fraction of the same size. This process is called expanding. How we can visualize this is shown in the following PDF.

Reducing Fractions

If the numerator and denominator of a fraction have a common factor, we can divide both the numerator and denominator by this factor without remainder. This produces a fraction of the same size. The fraction then consists of fewer, but larger parts. In the following PDF, we explore this visually using fraction strips.

Least Common Denominator

To add, subtract, or compare two fractions, we expand them so that they have the same denominator. One way to do this is to expand each fraction using the denominator of the other fraction. However, this can lead to unnecessarily large denominators. Therefore, fractions are usually expanded only to the least common denominator. The least common denominator is the smallest common multiple of both denominators. The following PDF shows how this is done and provides examples.
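The computation described above can be sketched in code (a supplement; the function names are chosen for this illustration and do not appear in the PDF):

```python
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

def common_form(n1, d1, n2, d2):
    """Expand n1/d1 and n2/d2 to their least common denominator."""
    lcd = lcm(d1, d2)
    return (n1 * (lcd // d1), lcd), (n2 * (lcd // d2), lcd)

# 5/6 and 3/4: the least common denominator is 12, not 6 * 4 = 24.
print(common_form(5, 6, 3, 4))  # ((10, 12), (9, 12))
```

Expanding each fraction by the other's denominator would have produced 20/24 and 18/24; the least common denominator keeps the numbers smaller.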

Comparing Fractions

We humans can immediately recognize which of two given natural numbers is larger. For fractions with different denominators, this is not necessarily the case. However, if we expand fractions to have the same denominator, this becomes straightforward.

Adding Fractions

At first glance, adding fractions may seem simple: make the fractions have the same denominator and then add the numerators. In fact, there are a few more steps involved: check the fractions for reducibility and simplify if necessary, determine the least common denominator, expand both fractions to the least common denominator, and then check again for reducibility and simplify if needed. The following PDF allows these steps to be followed using fraction strips—not only with the simplest fractions, but also with those that a typical student may encounter naturally. The fraction strips serve here as a standard model. The reasoning is: if adding fractions works with the fraction strips, this method can be considered valid and applicable to all fractions.
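The steps listed above can be followed in code as well (an illustration with made-up helper names, not taken from the PDF):

```python
from math import gcd

def reduce(n, d):
    """Divide numerator and denominator by their greatest common factor."""
    g = gcd(n, d)
    return n // g, d // g

def add_fractions(n1, d1, n2, d2):
    # Step 1: check each fraction for reducibility and simplify.
    n1, d1 = reduce(n1, d1)
    n2, d2 = reduce(n2, d2)
    # Step 2: determine the least common denominator.
    lcd = d1 * d2 // gcd(d1, d2)
    # Step 3: expand both fractions to the LCD and add the numerators.
    s = n1 * (lcd // d1) + n2 * (lcd // d2)
    # Step 4: check the result for reducibility again.
    return reduce(s, lcd)

print(add_fractions(2, 6, 3, 4))  # 2/6 = 1/3, then 1/3 + 3/4 = 13/12
```

Each step of the function corresponds to one of the steps carried out with the fraction strips.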

Subtracting Fractions

In a similar way to how we add fractions, we can also subtract fractions. However, if we want to visualize this calculation using fraction strips, we have to change the direction of our thinking, and we cannot simply place the strips next to each other as we do when adding. The following PDF contains several worked-out examples in detail.

Multiplying Fractions

When we multiply fractions, we multiply numerator by numerator and denominator by denominator. But why, actually? In the following PDF, it is shown how we can understand this. In addition, the fraction strips illustrate why we can simplify “crosswise” and why we can swap the numerators as well as the denominators when multiplying fractions.

Dividing Fractions

Dividing Fractions – Explaining the Reciprocal Rule – Measuring (Quotitive Division)

In the PDF, the reciprocal rule is justified using the example of dividing \( \frac{2}{3} \) by \( \frac{4}{5} \).

We divide by a fraction by multiplying with its reciprocal. This is called the reciprocal rule.
But why does the reciprocal rule work? To answer that, we need to ask ourselves what division really means. There are several ways to understand it:

Division is often understood as measurement. When we divide \(12\) by \(3\) , we can ask: How many times does \(3\) fit into \(12\)? The answer is \(4\), because \(3\) fits into \(12\) four times. When we measure the length of a distance, we proceed in a similar way: We take a measuring stick that is, for example, \(1\) meter long, and ask how many times this stick fits along a given distance. If the stick fits exactly four times, then the distance is \(4\) meters long.



If we want to understand the division of fractions as measurement, we can represent the fractions using fraction bars.
The PDF shows how fractions can be divided visually and how the reciprocal rule can be justified.

Didactical note: In the PDF, the visual explanation using fraction strips demonstrates how the validity of the reciprocal rule can be directly seen.
Moreover, the most difficult case is shown: both fractions have different numerators and denominators, none of the numerators and none of the denominators is equal to \(1\) , and the second fraction is greater than the first. While similar visual explanations exist, this is the only one that can handle the most complex case without relying on analogies.
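Numerically, the example of dividing \( \frac{2}{3} \) by \( \frac{4}{5} \) can be checked with Python's built-in `fractions` module (a supplement to the visual explanation, not part of it):

```python
from fractions import Fraction

a = Fraction(2, 3)
b = Fraction(4, 5)

# "How many times does 4/5 fit into 2/3?" Multiplying by the
# reciprocal gives the same result as dividing directly.
direct = a / b
via_reciprocal = a * Fraction(5, 4)
print(direct, via_reciprocal)  # 5/6 5/6

# Sanity check in the measurement sense: 5/6 of 4/5 is exactly 2/3.
print(Fraction(5, 6) * b)  # 2/3
```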

Visual Explanation of the Reciprocal Rule – Video (German with English subtitles)

Kehrwertregel – anschauliche Begründung – Messen

The reciprocal rule states: To divide by a fraction, multiply by its reciprocal. In mathematics, such a rule is not just stated; it is also justified. A purely visual justification is presented in this video.

Dividing Fractions – Justifying the Reciprocal Rule – Sharing (Partitive Division)

We can also understand division of numbers as “sharing.” If we distribute \(15\) apples into \(3\) baskets, there are \(5\) apples in each basket. That is why \(15 \div 3 = 5\).
But how do we, for example, distribute \(\frac{4}{5}\) across \( \frac{2}{3} \)? What could that even look like? The PDF shows how, by cleverly dividing areas, the justification of the reciprocal rule can be read off directly and visually. This brief presentation shows the “most difficult” case: neither the numerators nor the denominators match, all are different from \(1\), and the fraction being divided into is smaller than \(1\).

Didactical note: The PDF presents the only existing visual justification of the reciprocal rule that is based on the idea of sharing (partitive division).
In this explanation as well, the validity of the reciprocal rule can be seen directly. There are indeed more general approaches — for example, distributing a certain amount of water among containers with different base areas, where the water level changes accordingly when poured — but only this explanation allows the numbers involved in the multiplication to be read off directly from the visual representation.

Associative and Commutative Laws

Assoziativ- und Kommutativgesetz

The first two formulas that are usually introduced in mathematics class are these:
1) \(a+b=b+a\) and
2) \(a+(b+c)=(a+b)+c\)

The first formula is called the Commutative Law of Addition, and the second formula is called the Associative Law of Addition.
These formulas express something we already know from everyday life: No matter in which order we add things, the result is always the same.
In the video, we will look at how we can apply these formulas.

Why Is a Negative Times a Negative Positive?

The rule that a negative times a negative equals a positive — for example, that \( (-2) \times (-3) = +6 \) — is the starting point of many more or less serious discussions about the correctness of mathematics. This is understandable, since this “negative times negative” seems to contradict the understanding of multiplication that we have known since elementary school. Back then, we learned that multiplication is a shorthand form of addition. For example, \(3 \times 4\) is either \(3 + 3 + 3 + 3\) or \(4 + 4 + 4\). In this context, an expression like \( (-3) \times (-4) \) simply does not make any sense.

The problem is indeed very deep: we cannot prove that “negative times negative” must inherently equal “positive.” However, we can show in certain mathematical models how such calculations make sense and also lead to correct results. In the following PDF, one topic is the number line as the standard mathematical model for arithmetic with numbers, and the other two models are more connected to everyday life: walking back and forth, and eating chocolate cookies. Yes, even for chocolate cookies, “negative times negative” equals “positive”!

In this video, a different explanation is presented: It is about the fact that every number should have an opposite number. The opposite number of a negative number must then be a positive number.
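One standard algebraic argument, sketched here as a supplement (it is not claimed to be the argument of the PDF or the video), combines the idea of opposite numbers with the distributive law:

\[
0 = (-2) \cdot 0 = (-2) \cdot \bigl(3 + (-3)\bigr) = (-2) \cdot 3 + (-2) \cdot (-3) = -6 + (-2) \cdot (-3).
\]

Since adding \((-2) \cdot (-3)\) to \(-6\) gives \(0\), the product \((-2) \cdot (-3)\) must be the opposite number of \(-6\), namely \(+6\).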


Expression Manipulation

Transformations of expressions provide a clear illustration of how the needs of highly able mathematics learners differ from those of other students. While most students try their best to imitate what was written on the board when transforming expressions, highly able mathematics learners ask questions that may seem bizarre to most of their classmates. Some of these questions may even be difficult for math teachers to answer. For example: What is an expression? What is a manipulation of an expression? In a manipulation of an expression, is an existing expression changed into another one, or is a new expression found in addition to an existing one? What makes a manipulation correct, and why? How can the reason for its correctness be understood intuitively? How can we prove that two expressions yield the same result for all numbers, even though there are infinitely many numbers? And why do we perform manipulations of expressions at all?

The following PDF does not provide complete answers to all of these questions. However, it shows several possible approaches one might take to gain a deep understanding of expressions.

Understanding Equivalent Transformations

Just like expression manipulations, equivalent transformations can be illustrated on the number line. Interestingly, understanding them is much more complicated than actually performing equivalent transformations. We can observe this phenomenon in many areas of mathematics, and it represents part of the strength of mathematics: anyone can apply mathematics by substituting numbers into a formula without having to understand the reasoning behind the formula. However, the part of mathematics that involves striving for understanding is by far the more interesting one.

The pq-Formula for Quadratic Equations

pq-Formel für quadratische Gleichungen

The pq-formula is an important formula used to solve quadratic equations. In the video, we look at the formula itself, but not at how it is derived. Several examples are also worked through in which the pq-formula is applied. We observe that some quadratic equations have two solutions, while others have only one solution or no solution at all.

In this video, not only are the calculations shown, but it is also explained how to determine whether the pq-formula can actually be applied to a given equation. The following holds: the pq-formula is applicable when, by substituting values for p and q into the standard form of a quadratic equation, the given equation results. However, since this statement sounds rather cumbersome, the video does not go into it further; instead, the substitution process is demonstrated in a clear and visual way.

This also makes it easy to see which parentheses must be “carried along” when negative numbers occur in the given equation. Of course, in this video as well, all wording and notation follow the precise conventions used in mathematics. People who are used to the casual style common among YouTubers may find this demanding. Those who wish to learn how the calculations are “officially” stated and written will find what they are looking for here.
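The case distinction described above (two, one, or no solutions) can be sketched in code; the formula used is \( x_{1,2} = -\frac{p}{2} \pm \sqrt{\left(\frac{p}{2}\right)^2 - q} \) for the standard form \( x^2 + px + q = 0 \) (the function name is chosen for this illustration):

```python
import math

def pq_solve(p, q):
    """Solve x^2 + p*x + q = 0 with the pq-formula."""
    d = (p / 2) ** 2 - q           # the term under the square root
    if d < 0:
        return []                  # no (real) solution
    if d == 0:
        return [-p / 2]            # exactly one solution
    r = math.sqrt(d)
    return [-p / 2 - r, -p / 2 + r]

print(pq_solve(-3, 2))  # x^2 - 3x + 2 = 0 -> [1.0, 2.0]
print(pq_solve(2, 1))   # x^2 + 2x + 1 = 0 -> [-1.0]
print(pq_solve(0, 1))   # x^2 + 1 = 0     -> []
```

The sign of the term under the square root decides which of the three cases occurs.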

Exponential Function with Salt Dough

Exponentialfunktion mit Salzteig

Exponential functions appear frequently in everyday life. We are practically surrounded by these functions. To demonstrate this fact in a very tangible way, this video models an exponential function using salt dough. This gives us a physical way to visualize what exponential functions are like.

And in the end, there’s even a striking insight: we can almost see for ourselves why any number raised to the power of \(0\) is always equal to \(1\).

Why is \(2^0=1\)?

Just as multiplication is introduced as a shorthand for addition — for example, \(2+2+2=3 \times 2\) — exponentiation is first explained as a shorthand for multiplication, where, for instance, \(2 \times 2 \times 2 = 2^3\). As long as both the bases and the exponents are positive natural numbers, this causes no problem.

However, with \(2^1\), we already start to wonder. It is defined that \(2^1=2\). This still fits the definition of multiplication, since \(2=1 \times 2\). But when the exponent is \(0\), we must ask what \(2^0\) is supposed to mean.

Well, it is defined that \(2^0=1\), just as \(22^0=1\) and \( \left( \frac{1}{222} \right)^0 =1\). This may seem quite strange to many people. The following PDF explains why this definition was made and why it actually makes sense.
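One common way to make the definition plausible (a sketch added here; the PDF may argue differently) is to read the powers of \(2\) downwards: with each step, the exponent decreases by \(1\) and the value is divided by the base.

\[
2^3 = 8, \qquad 2^2 = 8 \div 2 = 4, \qquad 2^1 = 4 \div 2 = 2, \qquad 2^0 = 2 \div 2 = 1.
\]

Defining \(2^0\) as anything other than \(1\) would break this pattern.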

Differential and Integral Calculus

Derivative Without a Limit

Ableitung ohne Grenzwert

The differential quotient is the central concept of differential calculus. In this video, we explore how we can gain an intuitive understanding of what the differential quotient means. From a geometric point of view, it is about determining the slope of a tangent that touches the graph of a function at a single point. The difficulty is that, in order to determine the slope of a line — and a tangent is a line — we normally need two points, while the tangent shares only one point with the graph of the function.

Usually, this problem is solved by defining the slope of the tangent as the limit of the slopes of the secants. In this video, however, we take a completely different approach: we examine the slopes of the secants that are located in the neighborhood of the point of tangency. We then find that there is exactly one slope that is not a secant slope — the tangent slope. Thus, we determine the tangent slope by excluding all the other slopes. In doing so, we do not even need the concept of a limit.
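The idea can be made concrete with a small numerical experiment (an illustration of the neighborhood of secant slopes, not the video's exclusion argument itself). For \(f(x) = x^2\) at \(x_0 = 1\), every secant slope equals \(2 + h\) for some \(h \neq 0\), so the one value that never occurs as a secant slope, namely \(2\), is the tangent slope:

```python
def f(x):
    return x * x

x0 = 1.0  # point of tangency on the graph of f(x) = x^2

# Secant slopes through (x0, f(x0)) and nearby points on both sides.
for h in [0.5, 0.1, 0.01, -0.01, -0.1, -0.5]:
    slope = (f(x0 + h) - f(x0)) / h
    print(h, slope)

# Each printed slope is exactly 2 + h; the value 2 itself is
# excluded from the list of secant slopes for every h != 0.
```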

Power Rule – Derivative

The power rule is used for differentiating power functions, and therefore also for all polynomial functions. As is customary in mathematics, the rule is first proven before it is applied.

The proof for natural exponents can be carried out by expanding a binomial. However, the power rule actually holds for all real exponents — a fact that is usually omitted in school mathematics.

For the general proof, one needs the chain rule and the derivative of the logarithmic function, but the amount of work required in writing it down is actually much less.
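For \(x > 0\), the general proof mentioned above can be sketched in one line: write the power function with base \(e\), then apply the chain rule together with the derivative of the logarithm.

\[
f(x) = x^r = e^{r \ln x}
\quad\Longrightarrow\quad
f'(x) = e^{r \ln x} \cdot \frac{r}{x} = x^r \cdot \frac{r}{x} = r \, x^{r-1}.
\]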

Chain Rule – Derivation and Visual Explanation

Kettenregel – Herleitung und anschauliche Erklärung

The chain rule is used to differentiate composite functions. In the video, the chain rule is introduced, a complete example is worked out, the formal justification is shown, and we also look at how we can understand the chain rule intuitively.

It must be explained why the product of the derivatives of the inner and outer function happens to equal the slope of the composite function. In addition, we consider why the derivatives are multiplied and not, for instance, added.
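The multiplication of the slopes can be sketched informally with small changes (a supplementary argument, not the formal justification given in the video): a small change \(\Delta x\) is first scaled by the inner function and its result is then scaled by the outer function.

\[
\Delta u \approx g'(x) \, \Delta x, \qquad
\Delta y \approx f'(u) \, \Delta u
\quad\Longrightarrow\quad
\frac{\Delta y}{\Delta x} \approx f'\bigl(g(x)\bigr) \cdot g'(x).
\]

Two scalings applied one after the other multiply; this is why the derivatives are multiplied and not added.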

Fundamental Theorem of Calculus

In a sense, the Fundamental Theorem of Calculus states that (under certain conditions) the derivative of an area function is exactly the function whose area between its graph and the x-axis the area function measures.

Put more simply: Finding the area is the opposite of finding the slope, and vice versa. That might indeed seem surprising!

The formal proof of the Fundamental Theorem is short, but it does not offer any intuitive understanding of this relationship. Therefore, the accompanying PDF provides a visual explanation of why (under certain conditions) integration is the inverse operation of differentiation.

Hauptsatz der Differential- und Integralrechnung

Under certain conditions, the Fundamental Theorem of Calculus can (in a somewhat simplified form) be understood as follows: An antiderivative can be used to calculate an area.*

However, an antiderivative, as the opposite of a derivative, has at first nothing to do with area. And yet, it works. How we can understand this relationship visually is shown in the video.

*More precisely: The area between the graph of a function f and the x-axis on the interval [a; b] can be determined by the difference of the function values F(b) and F(a) of an antiderivative F.
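Both formulations can be written compactly. With the area function \(A\) and an antiderivative \(F\) of \(f\), the statements read (under the usual conditions, e.g., \(f\) continuous on \([a; b]\)):

\[
A(x) = \int_a^x f(t) \, dt \quad\Longrightarrow\quad A'(x) = f(x),
\qquad
\int_a^b f(t) \, dt = F(b) - F(a).
\]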

Improper Integrals

With improper integrals, one achieves the remarkable feat of having an infinitely wide region before us that nevertheless has a finite area. This usually defies common sense. All the more surprising is that there is an extremely simple explanation by which we can understand this seemingly paradoxical phenomenon.
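The classic example of this phenomenon (a standard computation, not necessarily the explanation referred to above) is the area under \( \frac{1}{x^2} \) from \(1\) onwards:

\[
\int_1^\infty \frac{1}{x^2} \, dx
= \lim_{b \to \infty} \int_1^b \frac{1}{x^2} \, dx
= \lim_{b \to \infty} \left( 1 - \frac{1}{b} \right)
= 1.
\]

The region is infinitely wide, yet its area is exactly \(1\).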

Probability and Statistics

What is probability?

Was ist Wahrscheinlichkeit?

In school, mainly two different concepts of probability are taught: the Laplace probability concept and the frequentist probability concept. Both of these concepts present considerable difficulties in understanding—apart from the fact that, according to the journal mathematik lehren, they are also circular. However, there is a very simple concept of probability that arises when the axiomatic probability is reduced to a school-level approach: probabilities are proportions. Furthermore, one can skip the discussion of whether this is the “correct” probability or not: we are calculating with proportions anyway, whether we determine probability by drawing lots or by integrating a density function.

Probability of Rain

Regenwahrscheinlichkeit

In weather forecasts, one can hear statements such as: “The probability of rain today is 80%.” The problem with this is that weather is not a random experiment, and therefore there cannot be a true probability of rain. In the video, we examine what such a probability of rain could mean. It does not mean that 80% of the area of the forecast region will receive rain, and it also does not mean that it will rain 80% of the time. Rather, it means that in the past, on 80% of the days with comparable weather conditions, it rained. Therefore, the “probability” is actually a relative frequency. As can be read online, even this relative frequency is not always communicated to the audience. In the video, the discussion is not about whether this is actually the case, but about a possible explanation: this explanation is loss aversion, which can also be captured mathematically.

Bayes’ Rule – Illustration

Regel von Bayes – Veranschaulichung

Bayes’ rule is somewhat “strange” because on the left side of the formula there is information that — at first glance — does not appear on the right side. In this video, it is shown using simple, intuitive methods how this can be understood.
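A small numerical example may help (the numbers are hypothetical and chosen for this illustration; they do not come from the video). Bayes' rule \( P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)} \) lets us reverse a conditional probability:

```python
# Hypothetical numbers: a condition with base rate 1%, a test with
# 90% sensitivity and a 5% false-positive rate.
p_disease = 0.01
p_pos_given_disease = 0.90
p_pos_given_healthy = 0.05

# Total probability of a positive test (law of total probability):
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' rule: probability of the condition given a positive test.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # 0.154
```

The left side, \(P(\text{disease} \mid \text{positive})\), is computed entirely from quantities that point in the opposite direction, which is exactly the "strangeness" mentioned above.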

Empirical Law of Large Numbers

Suppose we have a box containing one blue ball and one red ball. We randomly draw a ball, record its color, and then put the ball back. We repeat this process, drawing a ball each time, and continue in this way. If we carry out this procedure \(100\) times, it is quite likely that we will draw approximately \(50\) blue balls and approximately \(50\) red balls. This “fact” is the core statement of the empirical law of large numbers. But why is this the case? Certainly not because the “laws of chance” demand it—as is sometimes grandiosely claimed. (And even if that were true, the question of “why?” would still remain unanswered.) The shortest possible answer is: Because there are far more possible sequences with approximately \(50\) blue balls than there are sequences with other distributions.

In the following PDF, the situation is explained in detail. It precisely defines what is meant by „far more possible sequences“ and also shows how the empirical law of large numbers can be understood using the Galton board. Furthermore, it demonstrates how, even without knowledge of combinatorics, the numbers of possibilities can be calculated for the first few trials using (extended) Pascal’s triangles.

The unique method of explaining the empirical law of large numbers presented in the PDF has the enormous advantage that the regularities can be recognized intuitively after only the first few trials. Therefore, it is possible to omit the usual (and often confusing) note that the empirical law of large numbers applies only to very, very many trials.

Relative Frequency and a Consequential Misconception


A widespread misconception is that the relative frequency of an event approaches the probability of that event as the number of trials increases. While it may well happen that after 100 coin tosses we obtain "heads" approximately 50 times (i.e., the relative frequency of "heads" is then close to the probability of "heads"), this does not have to occur.

In this video, it is demonstrated how much the understanding of probability theory suffers from this misconception, how this can be avoided, and what the reality actually is. It also shows how simple the underlying mathematics really is.

Why the Relative Frequency Does Not Have to Approach Probability

The claim that the relative frequency of an event must approach the probability of that event when a random experiment is repeated "many" times is frequently (and incorrectly) cited as the central statement of the empirical law of large numbers. There are many formulations of this claim, some more incorrect than others. A completely false formulation can be found (or at least could be found), for example, on the pages of the MUED e. V. association (for members):

"This famous law of large numbers states that, in many independent repetitions of a random experiment, whether coin tosses, dice rolls, lottery draws, card games, or anything else, the relative frequency and the probability of an event must always come closer together: the more often we toss a fair coin, the closer the proportion of 'heads' approaches its probability of one-half; the more often we roll dice, the closer the proportion of sixes approaches the probability of rolling a six; and the more often we play the lottery, the closer the relative frequency of drawing the number 13 approaches the probability of 13. There is no disputing this law; in a sense, this law is the crowning achievement of all probability theory."

Why This Is So Important:

To put it very briefly: If this law were correct in this formulation, there would not exist a single (repeatable) random experiment.

An example: Suppose we toss a coin randomly, so that the outcomes H (heads) and T (tails) are possible. Each outcome is supposed to have a probability of 0.5. Further, suppose we toss the coin 50 times and obtain T every single time. Then the relative frequency of T after the first trial was 1, after the second trial it was still 1, and the same after the third, fourth, and so on. In these first 50 trials, the relative frequency of T did not approach the probability of T at all.

It is often argued that the relative frequency and probability „approach each other in the long run“ or „after very many trials.“ But what is that supposed to mean? Does the coin, if it showed T too often in the first 50 trials, have to show H more frequently until the 100th trial to restore the balance? Or must the relative frequency only approach the probability by the 1000th trial?

No matter how we look at it: If relative frequency and probability must approach each other, the outcome of a coin toss could no longer depend on chance at some point, but would have to follow what the coin showed in the previous trials. In that case, the coin toss would no longer be a random experiment. One may debate what exactly randomness is, but it is certainly part of a random experiment like a coin toss that there is no law dictating what the coin must show.

The situation becomes even worse when, in an introductory course on probability, it is claimed that the probability of an event is the number toward which the relative frequency of the event “tends” if the random experiment is repeated sufficiently often. Apart from the fact that students have no clear understanding of what “tends” or “sufficiently often” means, they also cannot reconcile such a “definition” with their conception of randomness. This approach leads to the absurd idea that someone must tell the coin what to do, or that the coin has a memory and actively seeks to balance relative frequency and probability. As a result, the fundamental concepts of probability theory—namely, randomness and probability—become so contradictory that a student’s understanding of this area of mathematics is effectively impossible.

Unnecessary Statistics

One can read in any book on the introduction to probability theory that the relative frequency of an event does not have to approach the probability of that event, even after an arbitrarily large number of trials. All statistical methods that infer probability from relative frequency would be unnecessary if relative frequency were required to approach probability. The concepts of “convergence in probability” and the weak law of large numbers exist precisely because the relative frequency of an event does not analytically converge to the probability of the event. Why, nevertheless, incorrect mathematics is still taught in German schools is, for me personally, incomprehensible.

What Actually Holds

The weak law of large numbers holds. Applied to the case above, this law, stated in simplified form, says: the relative frequency does not have to approach the probability; rather, the probability that the relative frequency lies close to the probability increases with the number of trials.

More precisely: We can ask how large the probability is that the relative frequency of T lies within a certain interval around the probability of T. Since the probability of T is 0.5, we can, for example, set the interval to (0.4, 0.6). The probability that the relative frequency of T falls within this interval becomes increasingly larger as the number of trials increases.
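This probability can be computed exactly from the binomial distribution. The following sketch (plain Python; the interval \((0.4, 0.6)\) is the one from the text) sums the probabilities of all counts \(k\) whose relative frequency \(k/n\) lies strictly inside the interval:

```python
from math import comb

def prob_freq_in_interval(n, p=0.5, low=0.4, high=0.6):
    """Probability that the relative frequency after n trials
    lies strictly between low and high."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1)
               if low < k / n < high)

for n in (10, 100, 1000):
    print(n, round(prob_freq_in_interval(n), 4))
```

The probability grows with the number of trials (roughly 0.25 for 10 tosses, about 0.94 for 100, and very close to 1 for 1000), exactly as the weak law of large numbers asserts.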

Let us consider the following random experiment: A box holds one blue and one red ball. One ball is drawn at random, its color is recorded, and the ball is then placed back into the box. After that, another ball is drawn at random, and so on.

If a red ball was drawn on the first trial, the probability of drawing a red ball on the second trial is just as large as on the first, namely \(0.5\). The same applies for any number of trials: for example, if after 99 trials 99 red balls have been drawn, the probability of drawing a red ball on the 100th trial is still \(0.5\). And this also holds if 99 blue balls were drawn before, or if any other combination of blue and red balls was obtained.

That means: the probability of drawing 100 red balls is just as large as the probability of drawing any other particular combination of blue and red balls. Therefore, after 100 trials, the relative frequency of red balls can be equal to 1. In that case, it is as far away as possible from the probability of drawing a red ball and not close to \(0.5\). The same reasoning applies for 1,000, for 10,000, and for any other number of trials. Thus, there is no number of trials, no matter how large, for which it must be true that the relative frequency of red balls lies close to \(0.5\). Therefore, as the number of trials increases, the relative frequency of red balls does not have to approach the probability of drawing a red ball.

The Empirical Law of Large Numbers Is Not Due to Bernoulli

It is often claimed that Jakob Bernoulli (1654–1705) was the first to formulate the empirical law of large numbers. However, that is not correct, at least not if one considers the common formulations of this law. Bernoulli did not write that the relative frequencies of an event A settle, for a sufficiently large number \(n\) of repetitions, at the probability of A. Nor did he write that the relative frequencies of an event A stabilize, as the number of trials increases, at the probability of A. And he also did not write that the relative frequencies of A must approach the probability of A as the number of trials increases.

This is what Bernoulli actually wrote:

"Main Theorem: Finally, the theorem follows upon which all that has been said is based, but whose proof now is given solely by the application of the lemmas stated above. In order that I may avoid being tedious, I will call those cases in which a certain event can happen successful or fertile cases; and those cases sterile in which the same event cannot happen. Also, I will call those trials successful or fertile in which any of the fertile cases is perceived; and those trials unsuccessful or sterile in which any of the sterile cases is observed. Therefore, let the number of fertile cases to the number of sterile cases be exactly or approximately in the ratio \(r\) to \(s\) and hence the ratio of fertile cases to all the cases will be \(\frac{r}{r+s}\) or \(\frac{r}{t}\), which is within the limits \(\frac{r+1}{t}\) and \(\frac{r-1}{t}\). It must be shown that so many trials can be run such that it will be more probable than any given times (e.g., \(c\) times) that the number of fertile observations will fall within these limits rather than outside these limits — i.e., it will be \(c\) times more likely than not that the number of fertile observations to the number of all the observations will be in a ratio neither greater than \(\frac{r+1}{t}\) nor less than \(\frac{r-1}{t}\)." (Bernoulli, James 1713, Ars conjectandi, translated by Bing Sung, 1966)

Haussner's German translation (1899) differs slightly; rendered in English, it reads:

"Theorem: Let the number of favorable cases to the number of unfavorable cases be, exactly or approximately, in the ratio \(r\) to \(s\), and hence to the number of all cases in the ratio \( \frac{r}{r+s}=\frac{r}{t} \) (setting \( r+s=t \)), which latter ratio lies between the limits \( \frac{r+l}{t} \) and \( \frac{r-l}{t} \). Now, as is to be proved, so many observations can be made that it becomes arbitrarily many times (e.g., \(c\) times) more probable that the ratio of favorable observations to all observations made lies within these limits rather than outside them, i.e., is neither greater than \( \frac{r+l}{t} \) nor less than \( \frac{r-l}{t} \)." (Bernoulli 1713, p. 104, in the edition: Wahrscheinlichkeitsrechnung (Ars conjectandi), third and fourth parts, translated by R. Haussner, Leipzig, Verlag von Wilhelm Engelmann, 1899)

What Bernoulli actually wrote is, in fact, correct and comes very close to the weak law of large numbers.

Computationally impossible

There are infinitely many possible situations in which the relative frequency of an event does not approach the probability of that event but instead diverges from it.

An example: Suppose we toss a coin randomly, so that we can obtain the outcomes T (Tails) and H (Heads). Both outcomes are assumed to have a probability of \(0.5\). Now, suppose we have tossed the coin \(100\) times and obtained \(50\) T and \(50\) H. Then the relative frequency of T is \(0.5\). If we toss the coin once more, we will either get T, in which case the relative frequency of T becomes \(\frac{51}{101} = 0.\overline{5049}\), or we will get H, in which case the relative frequency of T becomes \(\frac{50}{101} = 0.\overline{4950}\). In both cases, the relative frequency of T moves away from the probability of T. If, after obtaining T on the 101st trial, we continue to obtain T on subsequent trials, the relative frequencies of T will be as follows:

\(\frac{52}{102}\approx 0.5098\), \(\frac{53}{103}\approx 0.5146\), \(\frac{54}{104}\approx 0.5192\), \(\frac{55}{105}\approx 0.5238\), etc.

Thus, the relative frequency of T moves farther and farther away from the probability of T, which contradicts the common (incorrect) formulation of the law of large numbers.
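The arithmetic above is easy to verify. Starting from 50 T in 100 tosses and appending one T per toss, a few lines of Python print the drifting relative frequencies:

```python
tails, tosses = 50, 100  # 50 T (and 50 H) after 100 tosses

for _ in range(5):       # five more tosses, all of them T
    tails += 1
    tosses += 1
    print(tosses, round(tails / tosses, 4))
```

Each additional T pushes the relative frequency of T further above \(0.5\).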

Unusual Sequences

The empirical law of large numbers is often justified by saying that unusual outcomes may occur, but they are so unlikely that they practically never happen. For example, in 30 coin tosses, the outcome HHHHHHHHHHHHHHHHHHHHHHHHHHHHHH is said to be very unusual and also unlikely, while the outcome HTTHTHHHTHTTHTTTHTTHTHHTTTTHT is considered much more normal and therefore more likely.

This is not only wrong because both outcomes have exactly the same probability, but also because we humans imagine the unusualness of certain outcomes into the outcomes themselves. In other words, the coin “knows” nothing about unusual results. Let’s look at an example:

Let’s assume we have ten balls labeled with the digits from 0 to 9. The ball labeled 0 is green, and all the others are yellow. We draw ten times at random, with replacement and with order.

If we only pay attention to the colors, we would probably not consider the resulting sample unusual. But if, upon examining the digits, we discover a conspicuous sequence, the sample would probably be regarded as unusual.

However, we can set entirely different standards if we wish: we might decide that a sample is to be considered unusual if its sequence of digits appears in the first decimal places of π. One sample is then ordinary, because its digit sequence does not occur even within the first \(200\) million decimal places of π. Another is unusual, because its sequence occurs at position 3,794,572. Yet another even appears at position 851 and is therefore extremely unusual.

Whether unusual or not, the probability of each sample is exactly the same, namely \[\frac{1}{10\,000\,000\,000}\]

What actually holds true

Let’s look at an example: We randomly draw one ball from a box containing two balls. One ball is blue, and the other is red. We draw several times, with replacement and with order.

We will now focus on the numbers of red balls. In the following tables, these numbers are listed depending on the number of trials performed. For example, when drawing eight times, there are 56 outcomes with exactly 3 red balls.

These numbers are represented to scale in the following bar charts. As we can see, with an increasing number of trials, the numbers of outcomes in the middle grow much faster than those at the edges. The more often we perform the experiment, the greater the differences become between the middle and the edges.

That means: there are simply far more outcomes with approximately 50% red balls than there are outcomes with far fewer or far more red balls. And the proportion of outcomes near the center becomes larger and larger as the number of trials increases.
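This growth of the middle can be quantified without drawing anything. The sketch below computes, for several numbers of trials \(n\), the share of all \(2^n\) equally likely outcomes whose proportion of red balls lies between 45% and 55% (these bounds are chosen here purely for illustration):

```python
from math import comb

def central_share(n, low=0.45, high=0.55):
    """Share of the 2**n outcomes whose proportion of red balls
    lies between low and high (inclusive)."""
    center = sum(comb(n, k) for k in range(n + 1) if low <= k / n <= high)
    return center / 2 ** n

for n in (8, 20, 100, 1000):
    print(n, round(central_share(n), 4))
```

The central share climbs from about 27% at 8 trials to well over 99% at 1000 trials.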

We can observe this phenomenon also with other proportions of red balls in the population: If two-thirds of the balls in the population are red, we see a clustering of outcomes with about two-thirds red balls.

What we see here can be summarized—somewhat simplified, but not incorrectly—by the following statement: The relative frequencies of red balls are, in most outcomes, similar to the probability of drawing a red ball.

In statistical terms, this reads: most samples are similar to the population.

So if we flip a coin 100 times and get about 50 H, this is not because the relative frequency stabilizes, or because the coin strives for a balance between H and T, or because some dark force influences the fall of the coin, but simply because there are far, far more outcomes containing about 50 H than outcomes containing far fewer or far more H.

Weak Law of Large Numbers

The empirical law of large numbers does not exist in real mathematics, because while it may express a certain kind of experience, it does not contain any provable statement. The law from actual mathematics that perhaps comes closest to the empirical law of large numbers is the weak law of large numbers. It deals with the relationship between the relative frequency of an event and the probability of that event when a large number of trials are performed.

This relationship is often incorrectly described as follows: the relative frequency of an event comes closer and closer to the probability of that event the more often the random experiment is carried out. But that is not true, because it would mean, for example, for repeated coin tossing: if after \(50\) tosses we have obtained exactly \(25\) H and exactly \(25\) T, we could not then get \(5\) H in a row, since that would cause the relative frequency of the event H to move away from the value \(50\,\%\). According to that reasoning, some dark force would have to guide our hand to prevent too many H outcomes.

Stated in everyday language (and correctly), the weak law of large numbers says: The probability that the relative frequency of an event lies close to the probability of that event becomes larger and larger as the random experiment is repeated more often (and it even converges to \(1\) if the random experiment is carried out infinitely many times). In technical terms, this phenomenon is called convergence in probability.

Applied to coin tossing, this means: The more frequently we toss the coin, the greater the probability becomes that the relative frequency of H lies close to the probability of H — that is, close to \(50\) % — (and it even converges to \(1\) if the coin is tossed infinitely many times).
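Written out formally for the coin (a sketch in standard notation: \(H_n\) denotes the number of H in \(n\) tosses and \(p\) the probability of H), the weak law reads:

```latex
% Weak law of large numbers, specialized to coin tossing:
% for every fixed tolerance eps > 0, the probability that the
% relative frequency H_n/n lies within eps of p tends to 1.
\lim_{n \to \infty} P\!\left( \left| \frac{H_n}{n} - p \right| < \varepsilon \right) = 1
\qquad \text{for every } \varepsilon > 0
```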

In the following PDF, the weak law of large numbers is explained as simply as possible — but not simpler! It is presented in a mathematically correct way, but applied only to the simplest possible case, the coin toss. In addition, it shows how one can visualize this law and what kinds of misinterpretations exist.

Strong Law of Large Numbers

If we toss a coin repeatedly and record the results, we obtain a sequence of outcomes such as

(H, H, T, H, T, T, T, H, …).

The weak law of large numbers makes a statement about such a sequence, namely, how the probability that the relative frequency of, for example, H lies near \(0.5\) develops as the coin toss is repeated more and more often. The strong law of large numbers, on the other hand, makes a statement about all possible (infinite) sequences of outcomes. It states that for almost all of these sequences (in a certain mathematical sense), the relative frequency of, for example, H actually converges to \(0.5\) when the coin toss is repeated infinitely many times. The term "almost all" is a technical term here; it is defined in the context of measure theory.
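In the same notation (\(H_n\) heads after \(n\) tosses, \(p\) the probability of H), the strong law can be sketched as follows; note that the limit now sits inside the probability, which is exactly the statement about whole outcome sequences:

```latex
% Strong law of large numbers: with probability 1 (i.e., for
% "almost all" infinite outcome sequences), the relative
% frequencies themselves converge to p.
P\!\left( \lim_{n \to \infty} \frac{H_n}{n} = p \right) = 1
```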

The following PDF explains the strong law of large numbers. It does not present a simplified or informal version of the law, but the actual mathematical law — though it is explained only for the simplest possible case, the coin toss. All formal necessities, such as the means of centered random variables, are explained in the text. Everything that is not absolutely essential is omitted. In addition, the usual misinterpretations are discussed and corrected.

Stochastic Independence – Visual Explanation


The stochastic independence of two events is defined by a formula that, from a visual or graphical standpoint, does not reveal much. However, to understand it on an intuitive level, we can turn to a visual interpretation. There, we will see that stochastic independence concerns the relationship between two sets. When this relationship is, in a certain sense, harmonious, the events are stochastically independent.

More precisely, the events A and B are stochastically independent exactly when the proportion of A within the sample space is equal to the proportion of the intersection of A and B within B.
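This proportion statement is simply the defining formula rearranged (assuming \(P(B) > 0\); the sample space \(\Omega\) has \(P(\Omega) = 1\)):

```latex
% Independence of A and B: the defining product formula (left)
% is equivalent to saying that the share of A in the sample
% space equals the share of the intersection of A and B in B.
P(A \cap B) = P(A)\,P(B)
\quad\Longleftrightarrow\quad
\frac{P(A)}{P(\Omega)} = \frac{P(A \cap B)}{P(B)}
```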

This perspective is especially important because schools often teach that stochastic independence requires two random experiments. According to the formula, however, that is not correct, since the events A and B are subsets of the sample space of one and the same random experiment.

There is also a common misconception that events A and B have “nothing to do with each other” or “do not influence each other” when they are stochastically independent. But in principle, events of a random experiment are simply subsets of the sample space; they just exist — they do not, and cannot, actively influence one another.

Conditional Probability – Simple Explanation


In this video, we first look at the formula that defines conditional probability. However, the formula itself is rather bare, so we will also look at a diagram that allows us to visualize conditional probability in a much clearer way. This will lead us to a formulation of conditional probability in “ordinary” words.

For example: Let A and B be events of a random experiment. The probability of A given B — written as P(A|B) — is the portion of A within B.

Alternatively, we can say: the probability of A given B is the probability of the intersection of A and B, taken relative to the probability of B.

In the situation shown on the right, we are dealing with the random experiment: a single random draw of one figure.

Set A contains all yellow figures, and set B contains all half-circles.

The probability of A given B — that is, P(A|B) — is (in this case) the proportion of yellow half-circles among all half-circles, that is, the proportion of A within B. We have: \[ P(A|B)= \frac{1}{2} \]

There are many persistent misconceptions about the concept of conditional probability. For example, it is often claimed that two random experiments (or at least two actions within one random experiment) are required in order to define conditional probabilities. According to this idea, “the probability of A given B” is said to be the probability that A occurs when B has already occurred — or, in other words, the probability that A occurs on the second trial if it is already known that B occurred on the first trial.

This incorrect notion makes it impossible to solve certain problems involving conditional probability. For example: A container holds two black and two white balls. Two draws are made without replacement. The question is: What is the probability that a black ball is drawn first, given that a white ball is drawn second?

If one holds the belief that only the first draw can serve as a condition for the second, one will likely conclude that the problem is formulated incorrectly. However, if the sample space is written out with all possible pairs of draws and the problem is treated as shown in the video, this apparent contradiction resolves itself.
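The ball problem above can also be settled by brute-force enumeration, in the spirit of writing out the sample space. The sketch below lists all ordered pairs of draws without replacement from two black (B) and two white (W) balls; since all pairs are equally likely, the conditional probability is just a ratio of counts:

```python
from itertools import permutations

balls = ["B1", "B2", "W1", "W2"]            # two black, two white balls

# Sample space: all ordered pairs of draws without replacement.
pairs = list(permutations(balls, 2))        # 12 equally likely pairs

second_white = [p for p in pairs if p[1].startswith("W")]
first_black_and_second_white = [p for p in second_white
                                if p[0].startswith("B")]

# P(first black | second white) = favorable pairs / conditioning pairs
print(len(first_black_and_second_white), "/", len(second_white))  # prints: 4 / 6
```

So the probability that a black ball was drawn first, given that a white ball is drawn second, is \( \frac{4}{6} = \frac{2}{3} \).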

Miscellaneous

The Standard Model of School Mathematics

We can think of numbers as line segments on the number line. But we can also arrange them vertically, which allows us to see many relationships that are not visible with “horizontal” numbers. For example, the multiplication of fractions or exponentiation with rational exponents can be very clearly understood using this model.

Position Vector – Definition

A position vector is not a vector that is located at a certain place; rather, a position vector is a vector that is defined by a location. Locations are points in the coordinate system. Once a location is fixed, there is exactly one arrow that leads from the origin of the coordinate system to that point. This arrow represents exactly one vector — namely, the position vector defined by that point. In the video, we will also look at this idea graphically.

This video is especially suitable for people who value a mathematically precise yet visually intuitive definition of the position vector. For that reason, it focuses specifically on how one can understand that, although the position vector is defined by a location and there is only one arrow that leads from the origin to that point, this vector itself is nevertheless not bound to that location.

Anyone who has completed the marathon of searching the internet for an exact and contradiction-free definition of the position vector will be greatly relieved by the end of the video.

Surface Area of a Sphere – Cosmetics and Nanoparticles


Let a sphere be given. If we divide the volume of this sphere into several smaller spheres, we find that the sum of the surface areas of the smaller spheres is greater than the surface area of the original large sphere. The smaller the spheres are, the larger the total surface area becomes.

In everyday life, extremely small “spheres” sometimes occur. For example, C₆₀ fullerenes are incorporated into certain cosmetic products. C₆₀ fullerenes are molecules that resemble small soccer balls. Among other reasons, because these molecules have a very large surface area per unit volume, they are highly chemically reactive. This also means that—if they have undesirable side effects such as toxicity—these side effects can be very strong.

In this video, we will examine mathematically how the total surface area increases as the size of the spheres decreases. By the way, this same issue can also be found in many areas of everyday life—for instance, in fine particulate matter (fine dust) that we inhale.
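The underlying computation is short: dividing a sphere of radius \(R\) into \(n\) smaller spheres of equal volume gives each of them the radius \(R/\sqrt[3]{n}\), so the total surface area grows by the factor \(\sqrt[3]{n}\). A sketch:

```python
from math import pi

def total_surface(R, n):
    """Total surface area after dividing a sphere of radius R
    into n smaller spheres of equal volume."""
    r = R / n ** (1 / 3)          # radius of each small sphere
    return n * 4 * pi * r ** 2    # n times the surface of one small sphere

R = 1.0
for n in (1, 8, 1000, 10 ** 9):
    factor = total_surface(R, n) / (4 * pi * R ** 2)
    print(n, round(factor, 1))    # the growth factor is the cube root of n
```

For a billion nanoscale spheres, the combined surface is a thousand times that of the original sphere.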

Mathematical Induction


Mathematical induction is a proof method that is (usually) used to show that a statement holds true for every natural number. It consists of two steps:

  1. You show that the statement is true for a first number—usually for 1.
  2. You show that if the statement is true for a certain number, then it is also true for the following number. From this, you conclude that the statement holds true for all natural numbers.

In the video, you will see how to understand this method in an intuitive, visual way, because—in a sense—we apply this method, for example, whenever we walk somewhere. After that, you can see several examples in which statements are proven using the method of mathematical induction.
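As a classic written-out example, here is the sum formula \(1 + 2 + \dots + n = \frac{n(n+1)}{2}\), proved by the two steps described above:

```latex
% Base case (n = 1): the formula holds, since
1 = \frac{1 \cdot 2}{2}.
% Induction step: assume the formula holds for some n. Then
1 + 2 + \dots + n + (n + 1)
  = \frac{n(n+1)}{2} + (n + 1)
  = \frac{(n+1)(n+2)}{2},
% which is the formula with n replaced by n + 1. Hence the
% formula holds for all natural numbers.
```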

Help! My Child Can’t Do Math!


Even when it feels as if nothing is working anymore, there are still ways to deal with the situation—either well or poorly.
Unfortunately, I often see cases where, for example, a grade of F in math is turned into a much bigger problem, to the point where the child is even declared ill and must then be treated by a dyscalculia therapist.
A respectful attitude and an optimistic mindset, however, can allow the situation to remain what it actually is: a single number written on a test paper.