
To enhance pedagogical effectiveness, the treatment of the dot product is presented in several distinct sections:

Geometrical interpretation

Duality

Orthogonality

Projection

Solvability

We denote by 𝔽 one of the following four number systems: ℤ, the set of integers; ℚ, the set of rational numbers; ℝ, the set of real numbers; and ℂ, the set of complex numbers. However, in this section we mostly use only one of them---the set of real numbers---because the definition of length (norm) involves only ℝ (its extension to ℂ is discussed later in the inner product section).

This section is devoted to one of the most important operations in all of linear algebra---the dot product. Many operations and algorithms involve the dot product, including convolution, correlation, matrix multiplication, duality, the Fourier transform, signal filtering, and many others.

Dot Product

In previous sections we met many times a special linear combination of numerical vectors. For instance, a linear equation in n unknowns
\begin{equation} \label{EqDot.1} a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b , \end{equation}
which we prefer to write in the succinct form a • x = b, where a = (𝑎₁, 𝑎₂, … , 𝑎ₙ) and x = (x₁, x₂, … , xₙ) are numerical vectors from 𝔽n and b ∈ 𝔽 is a scalar. Another widely used application of this peculiar linear combination is observed in matrix multiplication.
The dot product of two lists (or arrays) of the same size \( {\bf x} = \left[ x_1 , x_2 , \ldots , x_n \right] \) and \( {\bf y} = \left[ y_1 , y_2 , \ldots , y_n \right] \) is the following expression, denoted by x • y, \begin{equation} \label{EqDot.2} {\bf x} \bullet {\bf y} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n , \end{equation} provided that all multiplications in (2) make sense and their sum is justified. Expression \eqref{EqDot.2} is naturally called the scalar product when it refers to numerical vectors with entries from the field of scalars 𝔽.
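For numerical vectors, the componentwise formula (2) is easy to check in Mathematica (the vectors x and y below are chosen only for illustration); Total[x*y] sums the componentwise products exactly as in Eq.(2):
x = {1, 2, 3}; y = {4, -5, 7};
x . y          (* built-in dot product *)
15
Total[x*y]     (* componentwise products summed, as in Eq.(2) *)
15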

Remark 1:    Although textbooks on linear algebra define the dot product for vectors from the same vector space (mostly because this leads to a fruitful theory and geometric applications), our definition extends the dot product to vectors from different vector spaces, provided they have the same dimension and are over the same field 𝔽 of scalars. The importance of this definition stems from practical applications; for instance, in calculus you learn that the line integral involves the dot product of a vector field F with the infinitesimal displacement dr:

\[ \int_C \mathbf{F} \bullet {\text d}\mathbf{r} \]
for some path C. Another famous example is the definition of the Laplacian operator via the dot product of two gradient operators:
\[ \Delta = \nabla \bullet \nabla = \left( \frac{\partial}{\partial x_1} , \frac{\partial}{\partial x_2} , \ldots , \frac{\partial}{\partial x_n} \right) \bullet \left( \frac{\partial}{\partial x_1} , \frac{\partial}{\partial x_2} , \ldots , \frac{\partial}{\partial x_n} \right) = \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2} . \]
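This identity can be checked in Mathematica on a sample function (the function f below is chosen only for illustration), since Grad, Div, and Laplacian are built-in commands:
f = x^2*y + Sin[z];
Laplacian[f, {x, y, z}]
2 y - Sin[z]
Div[Grad[f, {x, y, z}], {x, y, z}]    (* ∇ • ∇ f gives the same result *)
2 y - Sin[z]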
   
Example 1: Suppose we have a vector a = (3,2,1) and a list of three matrix entries b = (A₁, A₂, A₃), where A₁, A₂, and A₃ are some matrices of the same dimensions. Then their dot product is the matrix \[ \mathbf{a} \bullet \mathbf{b} = 3\,\mathbf{A}_1 + 2\,\mathbf{A}_2 + \mathbf{A}_3 . \] As seen from the relation above, this expression leads to a dot product with values in the set of matrices.

Let us make a numerical experiment by choosing the following matrices: \[ \mathbf{A}_1 = \begin{bmatrix} 1& 2.1 \\ -3& 2.2 \\ -3& -1.5 \end{bmatrix}, \quad \mathbf{A}_2 = \begin{bmatrix} -4& 1.3 \\ 1& 2.6 \\ 5& -3.1 \end{bmatrix}, \quad \mathbf{A}_3 = \begin{bmatrix} 2& 1.7 \\ 2& 6.2 \\ 8& 3.9 \end{bmatrix} . \] Then their dot product will be \[ \mathbf{a} \bullet \mathbf{b} = \begin{bmatrix} -3.& 10.6 \\ -5.& 18. \\ 9.& -6.8 \end{bmatrix} . \]

with(LinearAlgebra): A1 := Matrix([[1, 2.1], [-3, 2.2], [-3, -1.5]]); A2 := Matrix([[-4, 1.3], [1, 2.6], [5, -3.1]]); A3 := Matrix([[2, 1.7], [2, 6.2], [8, 3.9]]); a := Vector([3, 2, 1]); b := [A1, A2, A3]; result := a[1]*b[1] + a[2]*b[2] + a[3]*b[3];
$$\dd{Matrix(3, 2, [[-3.0, 10.6], [-5.0, 18.0], [9.0, -6.799999999999999]])} $$
A1 = {{1, 2.1}, {-3, 2.2}, {-3, -1.5}}; A2 = {{-4, 1.3}, {1, 2.6}, {5, -3.1}}; A3 = {{2, 1.7}, {2, 6.2}, {8, 3.9}}; a = {3, 2, 1}; b = {A1, A2, A3}; a . b
{{-3., 10.6}, {-5., 18.}, {9., -6.8}}
   ■
End of Example 1

Remark 2:    In applications, numerical vectors are usually associated with measurements and so inherit units. For instance, the integer 5 may mean 5 million dollars in a Wall Street office, a 5 dollar bill to a bank clerk, 5 centimeters to a mechanical engineer, and 5 GB to a computer scientist. Only mathematicians see in 5 a number without any unit. Therefore, vectors and scalars in linear algebra are not tied to any specific units of measurement. Now we can all appreciate the beauty of mathematical language when we enter our particular information into a computer---this device recognizes only electric pulses, on or off---there is no room for any unit. When Joseph Fourier (1768--1830) introduced the Fourier transform in 1822

\[ \hat{f}(\xi ) = \int_{\mathbb{R}^n} f(x)\, e^{-{\bf j} 2\pi\,{\bf x}\bullet \xi} {\text d}^n \mathbf{x} , \qquad {\bf j}^2 = -1, \]
he used the dot product, x • ξ = (x₁, x₂, … , xₙ) • (ξ₁, ξ₂, … , ξₙ), with two n-dimensional vectors of different units (say, if x measures time, then ξ corresponds to frequency, because their dot product should be dimensionless---otherwise the exponential term makes no sense).    
Example 2: Numerical vectors (i.e., vectors from 𝔽n) that are used in the definition of the dot product, Eq.(2), may have distinct units depending on the application. We give two examples from mechanics.

If v represents a displacement (e.g., with SI units of meters) and f represents a force (e.g., with units of Newtons), then f • v represents work (in Newton-meters, or Joules); see the short sketch after this example. Therefore, the force has units of "Joules per meter".

If v represents a velocity (e.g., in meters per second) and φ represents momentum (e.g., in kg·m/s), then φ(v) = m v • v represents twice the kinetic energy (in kg·m²/s², or Joules). Therefore, the dual vector (momentum) has units of "kg·m/s".    ■

End of Example 2
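As a small sketch of the work computation from Example 2 (the numerical force and displacement below are chosen only for illustration), Mathematica's Quantity objects can carry the units through the dot product:
force = {Quantity[3, "Newtons"], Quantity[0, "Newtons"], Quantity[4, "Newtons"]};
disp = {Quantity[2, "Meters"], Quantity[1, "Meters"], Quantity[-1, "Meters"]};
UnitConvert[force . disp, "Joules"]    (* work f • v = 3*2 + 0*1 + 4*(-1) = 2 joules *)
2 J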

Remark 3:    Recall that two vector spaces V and U are isomorphic (denoted VU) if there is a bijective linear map between them. This bijection (which is one-to-one and onto mapping) can be achieved by considering ordered bases α = [ a₁, a₂, … , an ] and β = [ b₁, b₂, … , bn ] in these vector spaces V and U, respectively. Then components of every vector with respect to a chosen ordered basis can be identified uniquely with an n-tuple. Therefore, the algebraic formula \eqref{EqDot.2} is essentially applied to two isomorphic copies of the Cartesian product 𝔽n. Geometric interpretation of the dot product, which is coordinate independent and therefore conveys invariant properties of these products, is given in the Euclidean space section.

Note:    The definition of the dot product does not prevent applying it to two distinct isomorphic versions of the direct product 𝔽n ≅ 𝔽n×1 ≅ 𝔽1×n. It is the basic computational building block from which many operations and algorithms are built. So you can find the dot product of a row vector with a column vector. However, we try to avoid writing it as matrix multiplication,

\[ \left[ x_1 , x_2 , \ldots , x_n \right] \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \left[ {\bf x} \bullet {\bf y} \right] \in \mathbb{F}^{1 \times 1} \cong \mathbb{F} , \]
because the right-hand side is a 1×1 matrix, which computer algebra systems typically treat differently from a scalar. However, the dot product can be applied to row vectors and column vectors:
\[ \left[ x_1 , x_2 , \ldots , x_n \right] \bullet \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} \bullet \left[ x_1 , x_2 , \ldots , x_n \right] \in \mathbb{F} . \quad    ▣ \]
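In Mathematica the distinction is easy to see (the vectors below are chosen only for illustration): the dot product of two flat lists is a scalar, while the matrix product of a 1×n row with an n×1 column is a 1×1 matrix:
{1, 2, 3} . {4, 5, 6}
32
{{1, 2, 3}} . {{4}, {5}, {6}}
{{32}}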

When evaluating the dot product, Maple does not distinguish rows from columns, to some extent. The dot product can be computed in Maple in two ways, with the DotProduct command or with the period operator: with(LinearAlgebra): a := <1, 2, 3>; b := Vector[row]([3, 2, 1]); c := Vector([1, -1, 1]); result := DotProduct(a, b); \[ result := 10 \] DotProduct(b, a) \[ 10 \] a . b \[ \begin{bmatrix} 3 & 2 & 1 \\ 6&4& 2 \\ 9&6&3 \end{bmatrix} \] However, b . a \[ 10 \] DotProduct(b, c) \[ 2 \] DotProduct(c, b) \[ 2 \] b . c \[ 2 \] c . c \[ 3 \] But c . b \[ \begin{bmatrix} 3&2&1 \\ -3&-2&-1 \\ 3&2&1 \end{bmatrix} \] As you see, when you specify one vector as a column and another as a row, the order of the factors matters for the period operator (but not for the Maple command DotProduct).

a := Matrix([[1], [2], [3]]); b := Matrix([[3, 2, 1]]); result := DotProduct(a, b);
$$\dd{DotProduct\left( \begin{bmatrix}1\\ 2\\ 3 \end{bmatrix} , \ \begin{bmatrix} 3& 2& 1 \end{bmatrix} \right)}$$
Maple gets stuck. However, when we use the Multiply command with the factors in the proper order, the output is correct:
with(LinearAlgebra): b := Matrix([[3, 2, 1]]); # 1×3 row vector a := Matrix([[1], [2], [3]]); # 3×1 column vector result := Multiply(b, a); # scalar (1×1 matrix)
$$\dd{result := [10]}$$

Josiah Gibbs
The term "dot product" was first introduced by the American physicist and mathematician Josiah Willard Gibbs (1839--1903) in the 1880s. Initially, the scalar product appeared in a pamphlet distributed to his students at Yale University. Gibbs's pamphlet was eventually incorporated into a book entitled Vector Analysis that was published in 1901 and coauthored with one of his students.

One of the main and fruitful applications of the dot product is observed when the scalar product involves numerical vectors from 𝔽n or their isomorphic copies. Upon introducing an ordered basis α = [e₁, e₂, … , eₙ] in a finite dimensional vector space V, every vector v = c₁e₁ + c₂e₂ + ⋯ + cₙeₙ is uniquely identified with the corresponding coordinate vector ⟦v⟧α = (c₁, c₂, … , cₙ) ∈ 𝔽n.    

Example 3: Two vectors of length two dotted together look like: \[ \begin{pmatrix} 2 \\ -3 \end{pmatrix} \bullet \begin{pmatrix} 5 \\ 4 \end{pmatrix} = 2 \cdot 5 + (-3) \cdot 4 = -2 . \] with(LinearAlgebra)
a := Vector([2, -3]); b := Vector[row]([5, 4]); result := DotProduct(a, b); # Returns -2
\[ result := -2 \] with(LinearAlgebra)
result := DotProduct(b, a);
\[ result := -2 \] You can also evaluate the dot product as multiplication: b . a \[ -2 \]
Dot[{2, -3}, {5, 4}]
-2

Calculate the dot product of two three-dimensional vectors a = (3, 2, 1) and b = (4, −5, 2).

Solution: Using the component formula (2) for the dot product of three-dimensional vectors \[ \mathbf{a} \bullet \mathbf{b} = a_1 b_1 + a_2 b_2 + a_3 b_3 , \] we calculate the dot product to be \[ \mathbf{a} \bullet \mathbf{b} = 3 \cdot 4 + 2 \cdot (-5) + 1 \cdot 2 = 4. \]

with(LinearAlgebra): a := Vector([3, 2, 1]); b := Vector([4, -5, 2]); result := DotProduct(a, b); # Returns 4
$$\dd{3 \cdot 4 + 2 \cdot (-5) + 1 \cdot 2 = 4}$$
a = {3, 2, 1}; b = {4, -5, 2}; a . b
Dot[a, b]
4

Maple uses the command DotProduct for evaluating the dot product; in Mathematica, you can use either the period operator or the Dot command:

a.b
Dot[a , b]
However, vectors for the dot product must be entered into a Mathematica notebook as n-tuples from 𝔽n, not in matrix form as row vectors or column vectors. So the following commands will not be executed.
a = {{3, 2, 1}}; b = {{4, -5, 2}};
a.b
Dot products have incompatible shapes
a = {{3}, {2}, {1}}; b = {{4}, {-5}, {2}};
a.b
Dot products have incompatible shapes
   ■
End of Example 3

Not every curvilinear system of coordinates supports the dot product, as the following example shows.    

Example 4: Let us consider the plane ℝ² equipped with polar coordinates. Then every point P is uniquely identified with a polar pair (r, θ), where r is the distance of P from a reference point O (known as the pole, which is usually the origin) and θ is the angle formed by the line OP and the polar axis, which is usually the abscissa. For two points P₁(r₁, θ₁) and P₂(r₂, θ₂), you cannot form the dot product \[ P_1 \bullet P_2 = r_1 r_2 + \theta_1 \theta_2 \qquad \mbox{is wrong} \] because its components have different units: distances are measured in meters (in the SI system), while angles are dimensionless.

Definition of dot product in polar coordinates is presented in section "Dot product in coordinate systems" and Example 23.    ■

End of Example 4

 

Properties of the Dot Product


The dot product is not defined for vectors of different dimensions. It does not matter whether the vectors are columns, rows, or n-tuples, so you can evaluate the dot product of a row vector with a column vector---they only need to come from vector spaces over the same field. Therefore, this definition is valid not only for n-tuples (elements of 𝔽n), but also for column vectors and row vectors.

The basic properties (1--4) of the dot product are valid for vectors from the same vector space, while the last one involves compatible vector dimensions. In the properties below, u, v, and w are finite dimensional vectors, and λ is a number (scalar):

Theorem 1: Let u, v, w be vectors of the same finite size and λ be a scalar. Then the following properties hold:
  1. u • u ≥ 0, and u • u = 0 if and only if u = 0.
  2. u • v = v • u                   (commutative law);
  3. (u + v) • w = u • w + v • w         (distributive law);
  4. (λ u) • v = λ (u • v) = u • (λ v)     (associative law);
  5. for any two column vectors u ∈ ℝn×1, v ∈ ℝm×1, and matrix A ∈ ℝm×n, the following equation holds:
    v • A u = ATv • u,     where AT (A′) is the transpose of matrix A.
    A similar relation holds for row vectors: u • v A = u AT • v.

  1. This property is trivial because \[ \mathbf{u} \bullet \mathbf{u} = u_1^2 + u_2^2 + \cdots + u_n^2 \geqslant 0 , \] and this sum of squares is zero only when all components of vector u are zero.
  2. Applying the definition of the dot product to u • v and v • u, we obtain \begin{align*} \mathbf{u} \bullet \mathbf{v} &= u_1 v_1 + u_2 v_2 + \cdots + u_n v_n \\ \mathbf{v} \bullet \mathbf{u} &= v_1 u_1 + v_2 u_2 + \cdots + v_n u_n \end{align*} Since the product of two numbers from the field 𝔽 is commutative, we conclude that u • v = v • u.
  3. Since every finite dimensional vector space is isomorphic to 𝔽n, we can assume that these vectors u, v, and w belong to the direct product 𝔽n. Then \[ \mathbf{u} + \mathbf{v} = \left( u_1 , \ldots , u_n \right) + \left( v_1 , \ldots , v_n \right) = \left( u_1 + v_1 , \ldots , u_n + v_n \right) . \] Taking the dot product with w, we get \begin{align*} \left( \mathbf{u} + \mathbf{v} \right) \bullet \mathbf{w} &= \left( u_1 + v_1 , u_2 + v_2 , \ldots , u_n + v_n \right) \bullet \left( w_1 , \ldots , w_n \right) \\ &= u_1 w_1 + v_1 w_1 + \cdots + u_n w_n + v_n w_n \\ &= \mathbf{u} \bullet \mathbf{w} + \mathbf{v} \bullet \mathbf{w} . \end{align*}
  4. The left-hand side is \[ \left( \lambda\,\mathbf{u} \right) \bullet \mathbf{v} = \lambda\,u_1 v_1 + \cdots + \lambda u_n v_n = \lambda \left( u_1 v_1 + \cdots + u_n v_n \right) , \] which equals the right-hand side λ (u • v); the identity λ (u • v) = u • (λ v) is verified in the same way.
  5. For a matrix A = [𝑎i,j] ∈ ℝm×n, we have \[ \mathbf{v} \bullet \mathbf{A}\,\mathbf{u} = \sum_{i=1}^m v_i \left( \mathbf{A}\,\mathbf{u} \right)_i , \] where the i-th component of A u is \[ \left( \mathbf{A}\,\mathbf{u} \right)_i = \sum_{j=1}^n a_{i,j} u_j . \] Changing the order of summation, we get \begin{align*} \mathbf{v} \bullet \mathbf{A}\,\mathbf{u} &= \sum_{i=1}^m v_i \sum_{j=1}^n a_{i,j} u_j \\ &= \sum_{j=1}^n \sum_{i=1}^m v_i a_{i,j} u_j = \sum_{j=1}^n u_j \left( \mathbf{A}^{\mathrm T} \mathbf{v} \right)_j , \end{align*} which is u • (ATv) = ATv • u.
    The distributive property claims that a dot product can be broken into the sum of two dot products, by representing one of the vectors as the sum of two vectors.

    Note that an associative law for the scalar product, (v • u) • w = v • (u • w), is not valid in general; see the following example.    

Example 5: Let us see some quick examples illustrating each of these properties.
  1. Any nonzero vector will work; for instance, v = (3, −2, 1) ∈ ℝ³. Then \[ \mathbf{v} \bullet \mathbf{v} = 3^2 + (-2)^2 + 1^2 = 9+4+1 = 14 > 0. \]
          {3, -2, 1} . {3, -2, 1}
          14
  2. Commutativity holds because the dot product is computed element-wise, and each element-wise multiplication is simply the product of two scalars. Scalar multiplication is commutative, and therefore the dot product is commutative.

    Let \[ \mathbf{v} = \left( 1, 2, 3 \right) , \quad \mathbf{u} = \left( 4, -6, 5 \right) \in \mathbb{R}^3 . \] Then their scalar product is 7, independently of the order of multiplication, as Mathematica confirms:

          v = {1, 2, 3}; u = {4, -6, 5}; v.u
          7
          Dot[u, v]
          7
  3. Suppose we need to find the dot product of two numerical vectors v • u, one of which has large entries. For instance, \[ \mathbf{v} = \begin{pmatrix} 3791 \\ -5688 \\ 2894 \end{pmatrix} , \quad \mathbf{u} = \begin{pmatrix} 3 \\ 2 \\ 4 \end{pmatrix} . \] The scalar product of these numerical vectors involves unpleasant multiplications of large numbers. Using the distributive property, we break vector v into a sum of four vectors: \[ \mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_3 + \mathbf{v}_4 , \] where \[ \mathbf{v}_1 = \begin{pmatrix} 1 \\ -8 \\ 4 \end{pmatrix} , \ \mathbf{v}_2 = \begin{pmatrix} 90 \\ -80 \\ 90 \end{pmatrix} , \ \mathbf{v}_3 = \begin{pmatrix} 700 \\ -600 \\ 800 \end{pmatrix} , \ \mathbf{v}_4 = \begin{pmatrix} 3000 \\ -5000 \\ 2000 \end{pmatrix} . \] The corresponding four scalar products are not tedious to find:
          u = {3,2,4}; v1 = {1,-8,4}; v2 = {90, -80, 90}; v3 = {700,-600,800}; v4 = {3000,-5000,2000}; d1 = u.v1
          3
          d2 = u.v2
          470
          d3 = u.v3
           4100
          d4 = u.v4
           7000
    Adding these four numbers, we get the required dot product: \begin{align*} \mathbf{v} \bullet \mathbf{u} &= \left( \mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_3 + \mathbf{v}_4 \right) \bullet \mathbf{u} = \mathbf{v}_1 \bullet \mathbf{u} + \mathbf{v}_2 \bullet \mathbf{u} + \mathbf{v}_3 \bullet \mathbf{u} + \mathbf{v}_4 \bullet \mathbf{u} \\ &= 3+470+4100+7000 = 11573 . \end{align*}
          d1+d2+d3+d4
          11573
  4. We set λ = 3.1415926, v = (236, -718), u = (892, 435). Without computer assistance, determination of the corresponding dot products would be time consuming. So we ask Mathematica for help and find that \[ \left( \lambda\mathbf{u} \right) \bullet \mathbf{v} = \lambda \left( \mathbf{u} \bullet \mathbf{v} \right) = \mathbf{u} \bullet \left( \lambda \mathbf{v} \right) \approx -319871. \]
          la =3.1415926; v = {236,-718}; u = {892, 435}; (la*u) . v
          -319871.
          u . (la*v)
          -319871.
          la*(v . u)
          -319871.
  5. Let us take a singular matrix and two 3-column vectors: \[ \mathbf{A} = \begin{bmatrix} 1&2&3 \\ 4&5&6 \\ 7&8&9 \end{bmatrix} , \quad \mathbf{u} = \begin{pmatrix} 35 \\ -11 \\ 17 \end{pmatrix} , \quad \mathbf{v} = \begin{pmatrix} 23 \\ 97 \\ 41 \end{pmatrix} . \]
          A = {{1, 2, 3}, {4, 5, 6}, {7,8 ,9}};
    u = {35,-11,17}; v = {23,97,41};
    We keep vectors v and u as 3-tuples (elements of ℝ³) rather than as column vectors (elements of ℝ3×1) because Mathematica is smart enough to interpret a flat list as a row vector or a column vector, whichever the context requires.
          Dot[u, A . v]
           25049
    and
          Dot[Transpose[A] . u, v]
           25049
    Now we check the same property for row vectors:
           Dot[u . A, v]
           25049
           Dot[v . Transpose[A], u]
           25049

    Rectangular matrix:    we consider a 3-by-2 matrix \[ \mathbf{A} = \begin{bmatrix} 1&2 \\ 3&4 \\ 5&6 \end{bmatrix} \] and two column vectors \[ \mathbf{u} = \begin{pmatrix} a \\ b \end{pmatrix}, \qquad \mathbf{v} = \begin{pmatrix} c \\ d \\ e \end{pmatrix} . \] Then \[ \mathbf{A}\,\mathbf{u} = \begin{bmatrix} 1&2 \\ 3&4 \\ 5&6 \end{bmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 1\cdot a + 2 \cdot b \\ 3 \cdot a + 4 \cdot b \\ 5 \cdot a + 6 \cdot b \end{pmatrix} . \] Its dot product with v becomes \[ \mathbf{v} \bullet \mathbf{A}\,\mathbf{u} = c \left( a + 2\,b \right) + d \left( 3\,a + 4\, b \right) + e \left( 5\,a + 6\, b \right) . \tag{A} \] On the other hand, \[ \mathbf{A}^{\mathrm T} \mathbf{v} = \begin{bmatrix} 1 & 3 & 5 \\ 2&4&6 \end{bmatrix} \begin{pmatrix} c \\ d \\ e \end{pmatrix} = \begin{pmatrix} 1\cdot c + 3\cdot d + 5\cdot e \\ 2 \cdot c + 4 \cdot d + 6 \cdot e \end{pmatrix} . \] Then its dot product with u becomes \[ \mathbf{A}^{\mathrm T} \mathbf{v} \bullet \mathbf{u} = a \left( c + 3\, d + 5\, e \right) + b \left( 2\,c + 4\, d + 6\, e \right) . \tag{B} \] Do you need computer assistance to check that expression (A) is the same as (B)? I don't need it.

  6. Extra warning:    We demonstrate that the identity   v • (u • w) = (v • u) • w is not valid for arbitrary vectors v, u, and w. It holds only when v is a scalar multiple of w. Indeed, let k = u • w and c = v • u. Then \[ \mathbf{v} \bullet \left( \mathbf{u} \bullet \mathbf{w} \right) = \left( \mathbf{v} \bullet \mathbf{u} \right) \bullet \mathbf{w} \quad \iff \quad \mathbf{v} k = c \mathbf{w} . \] From the latter, it follows that vectors v and w are collinear.

    We choose three vectors (and write them in column form): \[ \mathbf{v} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} , \quad \mathbf{u} = \begin{pmatrix} 3 \\ 4 \end{pmatrix} , \quad \mathbf{w} = \begin{pmatrix} 5 \\ 6 \end{pmatrix} . \] Then \[ \mathbf{v} \bullet \left( \mathbf{u} \bullet \mathbf{w} \right) = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \cdot 39 = \begin{pmatrix} 39 \\ 78 \end{pmatrix} \]

          v = {1, 2}; u = {3, 4}; w = {5, 6};
          u . w
          39
          v*39
          {39, 78}
    and \[ \left( \mathbf{v} \bullet \mathbf{u} \right) \bullet \mathbf{w} = 11 \cdot \begin{pmatrix} 5 \\ 6 \end{pmatrix} = \begin{pmatrix} 55 \\ 66 \end{pmatrix} . \]
          Dot[v, u]
           11
          11*w
          {55, 66}
    However, for some vectors, we observe v • (u • w) = (v • u) • w. Let us pick three vectors \[ \mathbf{v} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} , \quad \mathbf{u} = \begin{pmatrix} 3 \\ 5 \end{pmatrix} , \quad \mathbf{w} = \begin{pmatrix} 2 \\ 4 \end{pmatrix} . \] Then \[ \mathbf{v} \bullet \left( \mathbf{u} \bullet \mathbf{w} \right) = \mathbf{v} \,26 = 26 \begin{pmatrix} 1 \\ 2 \end{pmatrix} , \]
           v = {1, 2}; u = {3,5}; w = {2,4}; u.w
           26
    and \[ \left( \mathbf{v} \bullet \mathbf{u} \right) \bullet \mathbf{w} = 13 \mathbf{w} = 13 \begin{pmatrix} 2 \\ 4 \end{pmatrix} . \]
           v.u
           13
    Since \[ 26 \begin{pmatrix} 1 \\ 2 \end{pmatrix} = 13 \begin{pmatrix} 2 \\ 4 \end{pmatrix} , \] we conclude that this identity is valid for collinear vectors v and w, and arbitrary u.
   ■
End of Example 5
Theorem 2 (Cauchy inequality): For any two real numerical vectors v and u of the same finite dimension, the following inequality holds:

\begin{equation} \label{EqDot.3} \left( \mathbf{u} \bullet \mathbf{v} \right)^2 \leqslant \left( \mathbf{u} \bullet \mathbf{u} \right) \left( \mathbf{v} \bullet \mathbf{v} \right) . \end{equation} Equality holds in Eq.\eqref{EqDot.3} if and only if u and v are linearly dependent, i.e., u = λv for some scalar λ.

Many proofs of Cauchy's inequality are known (see, for instance, Marcus & Minc). Here is one of them.

It is convenient to introduce the following notation:     ∥v∥² = v • v. The positive square root of this quantity is called the norm in mathematics. Then the Cauchy inequality can be rewritten as \[ \left\vert {\bf u} \bullet {\bf v} \right\vert \le \| {\bf u} \| \cdot \| {\bf v} \| . \] Suppose first that either u or v is zero. Then their dot product is zero and the Cauchy inequality holds.

Now suppose that neither u nor v is zero. It follows that ∥u∥ > 0 and ∥v∥ > 0 because the dot product x • x > 0 for any nonzero vector x. We have \begin{align*} 0 &\le \left( \frac{{\bf u}}{\| {\bf u} \|} + \frac{{\bf v}}{\| {\bf v} \|} \right) \bullet \left( \frac{{\bf u}}{\| {\bf u} \|} + \frac{{\bf v}}{\| {\bf v} \|} \right) \\ &= \left( \frac{{\bf u}}{\| {\bf u} \|} \bullet \frac{{\bf u}}{\| {\bf u} \|} \right) + 2 \left( \frac{{\bf u}}{\| {\bf u} \|} \bullet \frac{{\bf v}}{\| {\bf v} \|} \right) + \left( \frac{{\bf v}}{\| {\bf v} \|} \bullet \frac{{\bf v}}{\| {\bf v} \|} \right) \\ &= \frac{1}{\| {\bf u} \|^2} \left( {\bf u} \bullet {\bf u} \right) + \frac{2}{\| {\bf u} \| \cdot \| {\bf v} \|} \left( {\bf u} \bullet {\bf v} \right) + \frac{1}{\| {\bf v} \|^2} \left( {\bf v} \bullet {\bf v} \right) \\ &= \frac{1}{\| {\bf u} \|^2} \, \| {\bf u} \|^2 + 2 \left( \frac{{\bf u}}{\| {\bf u} \|} \bullet \frac{{\bf v}}{\| {\bf v} \|} \right) + \frac{1}{\| {\bf v} \|^2} \, \| {\bf v} \|^2 \\ &= 1 + 2 \left( \frac{{\bf u}}{\| {\bf u} \|} \bullet \frac{{\bf v}}{\| {\bf v} \|} \right) + 1 . \end{align*} Hence,     −∥u∥ · ∥v∥ ≤ u • v. Similarly, \begin{align*} 0 &\le \left( \frac{{\bf u}}{\| {\bf u} \|} - \frac{{\bf v}}{\| {\bf v} \|} \right) \bullet \left( \frac{{\bf u}}{\| {\bf u} \|} - \frac{{\bf v}}{\| {\bf v} \|} \right) \\ &= \left( \frac{{\bf u}}{\| {\bf u} \|} \bullet \frac{{\bf u}}{\| {\bf u} \|} \right) - 2 \left( \frac{{\bf u}}{\| {\bf u} \|} \bullet \frac{{\bf v}}{\| {\bf v} \|} \right) + \left( \frac{{\bf v}}{\| {\bf v} \|} \bullet \frac{{\bf v}}{\| {\bf v} \|} \right) \\ &= \frac{1}{\| {\bf u} \|^2} \left( {\bf u} \bullet {\bf u} \right) - \frac{2}{\| {\bf u} \| \cdot \| {\bf v} \|} \left( {\bf u} \bullet {\bf v} \right) + \frac{1}{\| {\bf v} \|^2} \left( {\bf v} \bullet {\bf v} \right) \\ &= \frac{1}{\| {\bf u} \|^2} \, \| {\bf u} \|^2 - 2 \left( \frac{{\bf u}}{\| {\bf u} \|} \bullet \frac{{\bf v}}{\| {\bf v} \|} \right) + \frac{1}{\| {\bf v} \|^2} \, \| {\bf v} \|^2 \\ &= 1 - 2 \left( \frac{{\bf u}}{\| {\bf u} \|} \bullet \frac{{\bf v}}{\| {\bf v} \|} \right) + 1 . \end{align*} Therefore,     u • v ≤ ∥u∥ · ∥v∥. By combining the two inequalities, we obtain the Cauchy inequality.

   
Since Cauchy's inequality depends on an integer n (size of numerical vectors), we are going to prove it by mathematical induction.

The case n = 1 is trivially true. When n = 2, Cauchy’s inequality just says \[ \left( a_1 b_1 + a_2 b_2 \right)^2 \leqslant \left( a_1^2 + a_2^2 \right) \left( b_1^2 + b_2^2 \right) . \] Expanding both sides, we find the equivalent inequality \[ 0 \leqslant \left( a_1 b_2 \right)^2 - 2 \left( a_1 b_2 a_2 b_1 \right) + \left(a_2 b_1 \right)^2 . \] From the well-known factorization x² − 2xy + y² = (x − y)² one finds \[ 0 \leqslant \left( a_1 b_2 - a_2 b_1 \right)^2 , \] and the nonnegativity of this term confirms the truth of Cauchy's inequality for n = 2.

Now that we have proved a nontrivial case of Cauchy’s inequality, we are ready to look at the induction step. If we let H(n) stand for the hypothesis that Cauchy’s inequality is valid for n, we need to show that H(2) and H(n) imply H(n + 1). With this plan in mind, we do not need long to think of first applying the hypothesis H(n) and then using H(2) to stitch together the two remaining pieces. Specifically, we have \begin{align*} & \quad a_1 b_1 + a_2 b_2 + \cdots + a_n b_n + a_{n+1} b_{n+1} \\ &= \left( a_1 b_1 + a_2 b_2 + \cdots + a_n b_n \right) + a_{n+1} b_{n+1} \\ & \leqslant \left( a_1^2 + a_2^2 + \cdots + a_n^2 \right)^{1/2} \left( b_1^2 + b_2^2 + \cdots + b_n^2 \right)^{1/2} + a_{n+1} b_{n+1} , \\ & \leqslant \left( a_1^2 + a_2^2 + \cdots + a_n^2 + a_{n+1}^2 \right)^{1/2} \left( b_1^2 + b_2^2 + \cdots + b_n^2 + b_{n+1}^2 \right)^{1/2} \end{align*} where in the first inequality, we used the inductive hypothesis H(n) and in the second inequality we used H(2) in the form \[ \alpha \beta + a_{n+1} b_{n+1} \leqslant \left( \alpha^2 + a_{n+1}^2 \right)^{1/2} \left( \beta^2 + b_{n+1}^2 \right)^{1/2} , \] with the new variables \[ \alpha = \left( a_1^2 + a_2^2 + \cdots + a_n^2 \right)^{1/2} , \quad \beta = \left( b_1^2 + b_2^2 + \cdots + b_n^2 \right)^{1/2} . \]
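As a complement to the induction step, the two-dimensional case H(2) can be confirmed symbolically in Mathematica (the symbols a1, a2, b1, b2 are illustrative):
Simplify[(a1^2 + a2^2)*(b1^2 + b2^2) - (a1*b1 + a2*b2)^2 == (a1*b2 - a2*b1)^2]
True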

   
Example 6: First, we generate randomly two vectors of size 4:
   u = RandomInteger[{-9, 9}, 4]
   {9, 8, -2, 5}
   v = RandomInteger[{-9, 9}, 4]
   {-1, 2, -1, 6}
\[ \mathbf{u} = \left( 9, 8, -2, 5 \right) , \qquad \mathbf{v} = \left( -1, 2, -1, 6 \right) , \] Their dot product is
   Dot[u, v]
   39
Square norms of these two vectors are
   Dot[u,u]
   174
   Dot[v,v]
   42
The left-hand side of Eq.(3) is
   Dot[u,v]^2
   1521
while the right-hand side is
   174*42
   7308
These results confirm the Cauchy inequality for these two randomly chosen vectors.

Now we set vector u to be u = 2.71828 v. Using Mathematica, we repeat all previous calculations for these two vectors.

   u = 2.71828*v; Dot[u, v]^2
   13034.3
The square norm of u is
   Dot[u, u]
   310.34
Its product with ∥v∥² is
   Dot[u, u] * 42
   13034.3
We check it with the previous dot product squared and get
   Dot[u, u] * 42 == Dot[u,v]^2
   True

Equality:    We consider two vectors in ℝ²: \[ \mathbf{v} = \left( 1 , 2 \right) , \qquad \mathbf{u} = \left( 2 , 4 \right) . \] They are linearly dependent because u = 2v. Their scalar product is \[ \left( 1 , 2 \right) \bullet \left( 2 , 4 \right) = 1 \cdot 2 + 2 \cdot 4 = 10. \] I hope that you can square this number without computer assistance. Norms squared of these vectors are \[ \| \mathbf{v} \|^2 = 1^2 + 2^2 = 5 , \qquad \| \mathbf{u} \|^2 = 2^2 + 4^2 = 4 + 16 = 20 . \] Their product becomes \[ \| \mathbf{v} \|^2 \cdot \| \mathbf{u} \|^2 = 5 \cdot 20 = 100 = 10^2 = \left( \mathbf{v} \bullet \mathbf{u} \right)^2 . \]    ■

End of Example 6

Important Note:    Cauchy's inequality is not valid for complex vector spaces, as the following example shows for the two 2-vectors u = (1, j) and v = (1, −j), where j is the imaginary unit of the complex plane ℂ, so j² = −1.

\[ \mathbf{u} \bullet \mathbf{u} = 1^2 + \mathbf{j}^2 = 0 ,\quad \mathbf{v} \bullet \mathbf{v} = 1^2 + \left( -\mathbf{j} \right)^2 = 0, \quad \mathbf{u} \bullet \mathbf{v} = 1^2 - \mathbf{j}^2 = 2 . \]
u = {1, I}; v = {1, -I}; u . u
0
v . v
0
Dot[u, v]
2

The inequality \eqref{EqDot.3} is also referred to as the Cauchy--Schwarz or Cauchy--Bunyakovsky--Schwarz or usually as CBS inequality.

The inequality \eqref{EqDot.3} was first proved by the French mathematician, engineer, and physicist baron Augustin-Louis Cauchy in 1821. In 1859, Victor Bunyakovsky extended this inequality to the case of infinite summation; that is, he established the integral version of the Cauchy inequality. The contribution of Hermann Schwarz (1843--1921) to the Cauchy inequality is known neither to me nor to AI, except that he married a daughter of the famous mathematician Ernst Eduard Kummer. In 1888, about 30 years after Bunyakovsky's publication, Schwarz presented a proof similar to Bunyakovsky's.

         
 Augustin-Louis Cauchy    Victor Yakovlevich Bunyakovsky    Hermann Amandus Schwarz

The first step toward the Bunyakovsky result is to establish inequality

\begin{equation} \label{EqDot.4} \left( \sum_{k=1}^{\infty} a_k b_k \right)^2 \leqslant \left( \sum_{i=1}^{\infty} a_i^2 \right) \left( \sum_{i=1}^{\infty} b_i^2 \right) \end{equation}
provided that
\begin{equation} \label{EqDot.5} \sum_{k=1}^{\infty} a_k^2 < \infty , \quad \sum_{i=1}^{\infty} b_i^2 < \infty \qquad \Longrightarrow \qquad \sum_{k=1}^{\infty} a_k b_k < \infty . \end{equation}
In order to verify the latter, we consider a familiar factorization
\[ 0 \leqslant \left( x - y \right)^2 = x^2 - 2xy + y^2 , \]
from which we observe the bound
\[ xy \leqslant \frac{1}{2}\, x^2 + \frac{1}{2}\, y^2 \qquad\mbox{for all real } x, y . \]
Now, when we apply this inequality to x = 𝑎ₖ and y = bₖ and then sum over all k, we find the interesting additive inequality
\begin{equation} \label{EqDot.6} \sum_{k=1}^{\infty} a_k b_k \leqslant \frac{1}{2} \sum_{i=1}^{\infty} a_i^2 + \frac{1}{2} \sum_{i=1}^{\infty} b_i^2 . \end{equation}
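As a quick numerical illustration of the additive inequality (6), take the convergent sequences 𝑎ₖ = 1/k and bₖ = 1/k² (chosen here only for this sketch); Mathematica evaluates both sides in closed form:
N[Sum[(1/k)*(1/k^2), {k, 1, Infinity}]]                                  (* Σ a_k b_k = ζ(3) *)
1.20206
N[(1/2)*Sum[1/k^2, {k, 1, Infinity}] + (1/2)*Sum[1/k^4, {k, 1, Infinity}]]   (* ½ Σ a_k² + ½ Σ b_k² *)
1.36363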
A positive definite matrix with real entries is a symmetric matrix all of whose eigenvalues are strictly positive; equivalently, x • A x > 0 for every non-zero vector x. Informally, such a matrix never collapses a non-zero vector to zero and never maps it to a vector making a right or obtuse angle with the original vector.
Theorem 3: If x and y are n-column vectors (so they belong to ℝn×1) and A is an n × n positive definite matrix, then \[ \left( \mathbf{x} \bullet \mathbf{y} \right)^2 \leqslant \left( \mathbf{x} \bullet {\bf A}\,\mathbf{x} \right) \left( \mathbf{y} \bullet {\bf A}^{-1}\mathbf{y} \right) . \]
Since A is positive definite, there exists a nonsingular matrix T such that A = T ′ T, where T ′ = TT is the transpose matrix. Then its inverse is A−1 = T−1(T ′)−1. Defining u = Tx and v = (T ′)−1y, we find that \begin{align*} \left( \mathbf{x} \bullet \mathbf{y} \right)^2 &= \left( \mathbf{x} \bullet {\bf T}' \,{\bf T}'^{-1}\mathbf{y} \right)^2 = \left( {\bf T}\,\mathbf{x} \bullet {\bf T}'^{-1}\mathbf{y} \right)^2 \\ &= \left( \mathbf{u} \bullet \mathbf{v} \right)^2 \leqslant \left( \mathbf{u} \bullet \mathbf{u} \right) \left( \mathbf{v} \bullet \mathbf{v} \right) \\ &= \left( {\bf T}\,\mathbf{x} \bullet {\bf T}\,\mathbf{x} \right) \left( {\bf T}'^{-1} \mathbf{y} \bullet {\bf T}'^{-1} \mathbf{y} \right) = \left( \mathbf{x} \bullet {\bf T}'\,{\bf T}\,\mathbf{x} \right) \left( \mathbf{y} \bullet {\bf T}^{-1} {\bf T}'^{-1} \mathbf{y} \right) \\ &= \left( \mathbf{x} \bullet {\bf A}\,\mathbf{x} \right) \left( \mathbf{y} \bullet {\bf A}^{-1} \mathbf{y} \right) . \end{align*} We have equality if and only if one of the vectors u = Tx and v = (T ′)−1y is a scalar multiple of the other.
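Here is a minimal numerical sketch of Theorem 3 with a genuinely positive definite matrix; the matrix T and the vectors x and y below are chosen only for illustration:
T = {{2, 1}, {0, 3}}; A = Transpose[T] . T    (* A = T'T is positive definite *)
{{4, 2}, {2, 10}}
x = {1, 2}; y = {3, -1};
{(x . y)^2, (x . (A . x))*(y . (Inverse[A] . y))}
{1, 1378/9}
so that (x • y)² = 1 ≤ 1378/9 ≈ 153.1, in agreement with Theorem 3.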
   
Example 7: First, we generate randomly two vectors of size 4:
   x = RandomInteger[{-9, 9}, 4]
   {-4, 3, 9, 5}
   y = RandomInteger[{-9, 9}, 4]
   {-3, 6, 0, -5}
\[ \mathbf{x} = \left( -4, 3, 9, 5 \right) , \qquad \mathbf{y} = \left( -3, 6, 0, -5 \right) , \] Their squared dot product is
   Dot[x, y]^2
   25
Let A be the following defective matrix, which is not positive definite: \[ \mathbf{A} = \begin{bmatrix} -257& -308& -260& 372 \\ 140& 169& 142& -206 \\ 191& 230& 193& -278 \\ 71& 86& 72& -105 \end{bmatrix} . \] Its inverse is \[ \mathbf{A}^{-1} = \begin{bmatrix} 111& 364& 116& -628 \\ -60& -199& -62& 342 \\ -83& -270& -87& 466 \\ -31& -102& -32& 175 \end{bmatrix} . \]
   A = {{-257, -308, -260, 372}, {140, 169, 142, -206}, {191, 230, 193, -278}, {71, 86, 72, -105}}; Inverse[A]
   {{111, 364, 116, -628}, {-60, -199, -62, 342}, {-83, -270, -87, 466}, {-31, -102, -32, 175}}
Its characteristic polynomial is det(λIA) = (λ² − 1)². Dot products with respect to matrix A of these two vectors are
   Dot[x, A.x]
   5031
   Dot[y, Inverse[A].y]
   24347
These numbers satisfy the inequality of Theorem 3: \[ 25 = \left( \mathbf{x} \bullet \mathbf{y} \right)^2 < \left( \mathbf{x} \bullet {\bf A}\,\mathbf{x} \right) \left( \mathbf{y} \bullet {\bf A}^{-1}\mathbf{y} \right) = 5031 \cdot 24347. \] For a positive definite matrix A, the inequality becomes an equality when y is chosen proportional to A x.

Another matrix:    We repeat all previous calculations with another matrix that is not positive definite: \[ \mathbf{A} = \begin{bmatrix} -249& -292& -276& 412 \\ 136& 161& 150& -226 \\ 185& 218& 205& -308 \\ 69& 82& 76& -115 \end{bmatrix} . \] Its characteristic polynomial is det(λI − A) = (λ − 1)³(λ + 1). The corresponding dot products are

   Dot[x, A . x]
   4059
   Dot[y, Inverse[A].y]
   -25297
So the inequality fails because the right-hand side is negative; this does not contradict Theorem 3, since A is not positive definite: \[ 25 = \left( \mathbf{x} \bullet \mathbf{y} \right)^2 > \left( \mathbf{x} \bullet {\bf A}\,\mathbf{x} \right) \left( \mathbf{y} \bullet {\bf A}^{-1}\mathbf{y} \right) = 4059 \cdot \left( -25297 \right) < 0 . \]    ■
End of Example 7

The following statement provides a matrix version of Cauchy's inequality. Recall that |·| = det(·) denotes the determinant of a square matrix.

Theorem 4: Suppose that both A and B are m-by-n matrices with real entries, so A, B ∈ ℝm×n. Then \[ \left\vert \mathbf{A}^{\mathrm T}\mathbf{B} \right\vert^2 \leqslant \left\vert \mathbf{A}^{\mathrm T} \mathbf{A} \right\vert \cdot \left\vert \mathbf{B}^{\mathrm T} \mathbf{B} \right\vert , \] with equality if and only if rank(A) < n or rank(B) < n, or B = A C for some nonsingular matrix C.
Clearly the inequality holds when |A ′B| = 0 and, in this case, equality holds if and only if rank(A) < n or rank(B) < n.

For the remainder of the proof we assume |A ′B| ≠ 0. Using the singular value decomposition, we can write A and B as A = P₁D₁Q₁ and B = P₂D₂Q₂, where the m × n matrices Pᵢ and the n × n matrices Qᵢ satisfy Pᵢ′Pᵢ = Qᵢ′Qᵢ = Iₙ, and Dᵢ is an n × n diagonal matrix with positive diagonal elements. It then follows that \[ \left\vert \mathbf{A}^{\mathrm T}\mathbf{B} \right\vert^2 = \left\vert \mathbf{Q}'_1 \mathbf{D}_1 \mathbf{P}'_1 \mathbf{P}_2 \mathbf{D}_2 \mathbf{Q}_2 \right\vert^2 = \left\vert \mathbf{D}_1 \right\vert^2 \left\vert \mathbf{D}_2 \right\vert^2 \left\vert \mathbf{P}'_1 \mathbf{P}_2 \right\vert^2 , \] while |A ′A| = |D₁|² and |B ′B| = |D₂|². Hence the conclusion of Theorem 4 follows directly from |P₁′P₂| ≤ 1. Also, we have equality if and only if P₁P₁′ = P₂P₂′, and since this is equivalent to A and B having the same column space, the proof is complete.
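For rectangular matrices the inequality of Theorem 4 is typically strict; here is a small Mathematica sketch with 3 × 2 matrices chosen only for illustration:
A = {{1, 0}, {0, 1}, {1, 1}}; B = {{1, 2}, {3, 1}, {0, 1}};
Det[Transpose[A] . B]^2
49
Det[Transpose[A] . A]*Det[Transpose[B] . B]
105
so that 49 ≤ 105, and the inequality is strict because B is not of the form A C.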

   
Example 8: We use Mathematica to generate two pseudorandom matrices.
   matA = RandomInteger[{-6, 6}, {6, 6}]
   {{4, 5, -2, 1, 5, -5}, {3, 3, 5, 6, -2, -6}, {-6, -2, -6, -6, 3, 0}, {0, -3, -3, -1, 6, 2}, {-1, 2, -5, -6, 0, 3}, {-5, 4, -3, 1, 6, 5}}
\[ \mathbf{A} = \begin{bmatrix} 4& 5& -2& 1& 5& -5 \\ 3& 3& 5& 6& -2& -6 \\ -6& -2& -6& -6& 3& 0 \\ 0& -3& -3& -1& 6& 2 \\ -1& 2& -5& -6& 0& 3 \\ -5& 4& -3& 1& 6& 5 \end{bmatrix} , \]
   matB = RandomInteger[{-7, 7}, {6, 6}]
   {{-5, 3, -1, 1, 4, 0}, {-6, 3, -1, -1, -3, -3}, {-6, -2, 6, 2, -6, 5}, {-7, -6, -5, -5, -5, 4}, {7, 4, -1, -5, -5, 1}, {0, 5, 6, 0, -7, -7}}
\[ \mathbf{B} = \begin{bmatrix} -5& 3& -1& 1& 4& 0 \\ -6& 3& -1& -1& -3& -3 \\ -6& -2& 6& 2& -6& 5 \\ -7& -6& -5& -5& -5& 4 \\ 7& 4& -1& -5& -5& 1 \\ 0& 5& 6& 0& -7& -7 \end{bmatrix} , \] Now we calculate determinants:
   Det[Transpose[matA] . matB]^2
   3547645567098071
   Det[Transpose[matA] . matA] * Det[Transpose[matB] . matB]
   354764556709807104
Subtracting these two numbers, we get
   % - 3547645567098071
   351216911142709033
Since this number is positive, we conclude that Theorem 4 is valid for our 6×6 matrices.    ■
End of Example 8
Otto Hölder
The famous German mathematician Ludwig Otto Hölder (1859--1937) is credited with establishing the following inequality in 1889, which now bears his name. However, this inequality was first found by Leonard James Rogers (1888).

Its proof is based on Young’s inequality

\[ ab \leqslant \frac{a^p}{p} + \frac{b^q}{q} , \qquad a, b \geqslant 0 , \quad p, q > 1 , \quad \frac{1}{p} + \frac{1}{q} = 1 . \]
Theorem 5 (Hölder's inequality): For two vectors u and v in ℝn, and for any p, q > 1 such that \( \displaystyle \quad \frac{1}{p} + \frac{1}{q} = 1 , \quad \) the inequality \[ \sum_{i=1}^n \left\vert u_i \, v_i \right\vert \leqslant \left( \sum_{i=1}^n \left\vert u_i \right\vert^p \right)^{1/p} \left( \sum_{i=1}^n \left\vert v_i \right\vert^q \right)^{1/q} \] holds, with equality if and only if one of the vectors is a scalar multiple of the other.
Let us introduce the following notation: \[ x_i = a_i^p , \quad y_i = b_i^q , \quad \alpha = \frac{1}{p} , \] where \( a_i = \left\vert u_i \right\vert \) and \( b_i = \left\vert v_i \right\vert \). Since \( a_i = x_i^{\alpha} \) and \( b_i = y_i^{1-\alpha} \), we have \begin{align*} \sum_{i=1}^n a_i b_i &= \sum_{i=1}^n x_i^{\alpha} y_i^{1 - \alpha} \leqslant \left( \sum_{i=1}^n x_i \right)^{\alpha} \left( \sum_{i=1}^n y_i \right)^{1-\alpha} \\ &= \left( \sum_{i=1}^n a_i^p \right)^{1/p} \left( \sum_{i=1}^n b_i^q \right)^{1/q} , \end{align*} where the middle inequality follows from Young's inequality applied to the normalized terms \( x_i / \sum_k x_k \) and \( y_i / \sum_k y_k \).
   
Example 9: To verify Hölder's inequality, we generate randomly two vectors of length 9:
u = RandomInteger[{-8, 8}, {1, 9}]
{{0, -5, 2, 8, -6, 7, -2, -6, -2}}
v = RandomInteger[{-8, 8}, {1, 9}]
{{-8, -6, 2, 4, -7, 2, -1, 1, 5}}
\begin{align*} \mathbf{u} &= \begin{pmatrix} 0& -5& 2& 8& -6& 7& -2& -6& -2 \end{pmatrix} , \\ \mathbf{v} &= \begin{pmatrix} -8& -6& 2& 4& -7& 2& -1& 1& 5 \end{pmatrix} . \end{align*} Observe that Mathematica represents these vectors as row matrices, so u and v ∈ ℝ1×9. Their dot product is \[ \mathbf{u} \bullet \mathbf{v} = 108 . \]
v . Transpose[u]
{{108}}
For p = 2, we get the Euclidean norms: \[ \| \mathbf{u} \|_2 = \sqrt{222} , \qquad \| \mathbf{v} \|_2 = 10\sqrt{2} \]
Norm[u]
Sqrt[222]
Norm[v]
10 Sqrt[2]
and their product becomes \[ \| \mathbf{u} \|_2 \cdot \| \mathbf{v} \|_2 \approx 210.713 , \] which confirms Cauchy's inequality (= Hölder's inequality with p = 2).
N[Norm[u]*Norm[v]]
210.713
For arbitrary p, say p = 4, we have \[ \| \mathbf{u} \|_{4} \cdot \| \mathbf{v} \|_{4/3} \approx 220.24 . \] We check whether row vectors u and v are vectors for Mathematica:
VectorQ[u]
False
VectorQ[v]
False
So Mathematica treats these objects as matrices rather than vectors. We are forced to redefine these vectors as 9-tuples:
u = {0, -5, 2, 8, -6, 7, -2, -6, -2}; v = {-8, -6, 2, 4, -7, 2, -1, 1, 5};
Now we ask Mathematica to calculate the product of norms:
N[Norm[u, 4]*Norm[v, 4/3]]
220.24
If we exchange vectors u and v, Hölder's inequality remains valid but with different numbers: \[ 108 = \mathbf{u} \bullet \mathbf{v} \leqslant \| \mathbf{v} \|_{4} \cdot \| \mathbf{u} \|_{4/3} \approx 226.995 . \]
N[Norm[v, 4]*Norm[u, 4/3]]
226.995
   ■
End of Example 9
Corollary 1: Suppose that A and B are square nonnegative definite matrices and α is a scalar satisfying 0 < α < 1. Then \[ \left\vert \mathbf{A} \right\vert^{\alpha} \left\vert \mathbf{B} \right\vert^{1-\alpha} \leqslant \left\vert \alpha\,\mathbf{A} + \left( 1 - \alpha \right) \mathbf{B} \right\vert , \] with equality if and only if A = B or αA + (1 − α)B is singular.
Since αA + (1 − α)B is also nonnegative definite, Hölder's inequality for matrices formulated in Corollary 1 clearly holds when A or B is singular, with equality if and only if αA + (1 − α)B is also singular. For the remainder of the proof, we assume that both A and B are positive definite.
Using the following decomposition theorem:
Let A and B be m × m symmetric matrices with B being positive definite. Let Λ = diag(λ₁, λ₂, … , λm), where λ₁, λ₂, … , λm are the eigenvalues of B−1A. Then a nonsingular matrix C exists, such that \[ \mathbf{C}\,\mathbf{A}\,\mathbf{C}^{\mathrm T} = \Lambda , \quad \mathbf{C}\,\mathbf{B}\,\mathbf{C}^{\mathrm T} = \mathbf{I}_m . \]

we can write A = TΛTt and B = TTt, where T is a nonsingular matrix, Λ = diag(λ₁, λ₂, … , λn), λ₁, λ₂, … , λn are the eigenvalues of B−1A, and Tt is the transpose matrix. Thus, the proof will be complete if we can show that \[ \left\vert \Lambda \right\vert^{\alpha} = \prod_{i=1}^n \lambda_i^{\alpha} \leqslant \left\vert \alpha \Lambda + \left( 1 - \alpha \right) \mathbf{I}_n \right\vert = \prod_{i=1}^n \left( \alpha \lambda_i + 1 - \alpha \right) , \] with equality if and only if Λ = In. This result is easily confirmed by showing that the function \( g(\lambda ) = \alpha\lambda + 1 - \alpha - \lambda^{\alpha} \) is minimized at λ = 1 when 0 ≤ α ≤ 1.

   
Example 10: Using Mathematica, we randomly generate two matrices.
T = RandomInteger[{-7, 7}, {5, 5}]
{{5, 5, 6, 0, -7}, {-2, 5, 1, -7, 6}, {-3, -2, -3, 2, 3}, {-1, 6, 7, 3, 5}, {4, 4, 1, 3, 3}}
A = Transpose[T] . T
{{55, 31, 34, 17, -49}, {31, 106, 87, -9, 31}, {34, 87, 96, 11, -7}, {17, -9, 11, 71, -12}, {-49, 31, -7, -12, 128}}
\[ \mathbf{A} = \begin{bmatrix} 55& 31& 34& 17& -49 \\ 31& 106& 87& -9& 31 \\ 34& 87& 96& 11& -7 \\ 17& -9& 11& 71& -12 \\ -49& 31& -7& -12& 128 \end{bmatrix} ; \]
T = RandomInteger[{-7, 8}, {5, 5}]
{{-1, 1, 2, -6, 5}, {4, 7, 6, 7, 7}, {7, -2, -5, -7, -7}, {1, 7, 6, 5, -1}, {7, -6, -2, 1, 2}}
B = Transpose[T] . T
{{116, -22, -21, -3, -13}, {-22, 139, 108, 86, 49}, {-21, 108, 105, 93, 77}, {-3, 86, 93, 160, 65}, {-13, 49, 77, 65, 128}}
\[ \mathbf{B} = \begin{bmatrix} 116& -22& -21& -3& -13 \\ -22& 139& 108& 86& 49 \\ -21& 108& 105& 93& 77 \\ -3& 86& 93& 160& 65 \\ -13& 49& 77& 65& 128 \end{bmatrix} . \] Both symmetric matrices are positive definite because their eigenvalues are all positive:
N[Eigenvalues[A]]
{202.75, 161.15, 69.4332, 21.6113, 1.05561}
N[Eigenvalues[B]]
{378.699, 117.667, 86.0813, 61.9575, 3.59505}
Next, we calculate their determinants:
Det[A]
51753636
Det[B]
854392900
\[ \det\left( \mathbf{A} \right) = 51753636 , \qquad \det\left( \mathbf{B} \right) = 854392900 . \] Choosing α = ¼, we calculate the matrix C = ¼A + ¾B:
mat = (1/4)*A + (3/4)*B
{{403/4, -(35/4), -(29/4), 2, -22}, {-(35/4), 523/4, 411/4, 249/4, 89/ 2}, {-(29/4), 411/4, 411/4, 145/2, 56}, {2, 249/4, 145/2, 551/4, 183/4}, {-22, 89/2, 56, 183/4, 128}}
\[ \mathbf{C} = \frac{1}{4}\,\mathbf{A} + \frac{3}{4}\,\mathbf{B} = \begin{bmatrix} 403/4& -35/4& -29/4& 2& -22 \\ -35/4& 523/4& 411/4& 249/4& 89/2 \\ -29/4& 411/4& 411/4& 145/2& 56 \\ 2& 249/4& 145/2& 551/4& 183/4 \\ -22& 89/2& 56& 183/4& 128 \end{bmatrix} . \] Its determinant is \[ \det\left( \mathbf{C} \right) \approx 2.11706 \times 10^9 . \]
Det[mat]
541968030603/256
N[%]
2.11706*10^9
The left-hand side of the formula is \[ \left\vert \mathbf{A} \right\vert^{1/4} \cdot \left\vert \mathbf{B} \right\vert^{3/4} \approx 4.23866 \times 10^8 . \]
N[Det[A]^(1/4) * Det[B]^(3/4)]
4.23866*10^8
Since \( 4.23866 \times 10^{8} < 2.11706 \times 10^{9} \), the inequality of Corollary 1 is confirmed.    ■
End of Example 10

 

Dot Product and Linear Transformations


The multiplication of scalars is an operation formally described by a mapping
\[ \mathbb{R}^2 \ni (x, y) \mapsto x\,y \in \mathbb{R} , \]
which is not linear. However, if we fix one variable, say y = u, then the map x ↦ u x : ℝ → ℝ is linear. Hence, this multiplication gives rise to two families of linear maps, x ↦ u x and y ↦ v y : ℝ → ℝ. For this reason, we say that the multiplication map ℝ × ℝ → ℝ is bilinear. The graph of this function is a surface containing two families of lines.
Plot3D[x*y, {x, -5, 5}, {y, -5, 5}]
Figure 1: The Dot Product in Plane Geometry

The fundamental significance of the dot product is that it is linear in each of its arguments. This means that the function f(v) = u • v is a linear functional for any fixed vector u. Then the scalar product can be defined as a bilinear form:

\[ \left. \begin{array} {ccc} U \times V & \rightarrow & \mathbb{F} \\ (\mathbf{u}, \mathbf{v}) & \mapsto & \mathbf{u} \bullet \mathbf{v} \in \mathbb{F} \end{array} \right\} \qquad \mathbf{u} \in U, \ \mathbf{v} \in V . \]
Hence, the dot product is bilinear and this can be applied to arbitrary linear combinations, so that
\[ \left( \sum_i \alpha_i \mathbf{x}_i \right) \bullet \left( \sum_j \beta_j \mathbf{y}_j \right) = \sum_{i,j} \alpha_i \beta_j \left( \mathbf{x}_i \bullet \mathbf{y}_j \right) , \qquad \forall \alpha_i , \beta_j \in \mathbb{R} . \]
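Bilinearity can be verified directly in Mathematica for specific vectors and symbolic scalars (all names below are chosen only for illustration):
x1 = {1, 2}; x2 = {0, 3}; y1 = {4, -1}; y2 = {2, 5};
Simplify[(s1*x1 + s2*x2) . (t1*y1 + t2*y2) == s1*t1*(x1 . y1) + s1*t2*(x1 . y2) + s2*t1*(x2 . y1) + s2*t2*(x2 . y2)]
True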
   
Example 11: We consider two (pseudo)randomly generated vectors from ℝ³: \[ \mathbf{a} = \begin{pmatrix} 0.116624 \\ 0.364083 \\ 0.571789 \end{pmatrix} , \quad \mathbf{b} = \begin{pmatrix} 0.864995 \\ 0.163004 \\ 0.717815 \end{pmatrix} . \]
a = RandomReal[{0, 1}, 3]
{0.116624, 0.364083, 0.571789}
b = RandomReal[{0, 1}, 3]
{0.864995, 0.163004, 0.717815}
Each of these vectors is a linear combination of standard unit vectors, \[ \mathbf{e}_1 = \mathbf{i} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} , \quad \mathbf{e}_2 = \mathbf{j} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} , \quad \mathbf{e}_3 = \mathbf{k} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} . \] So \[ \mathbf{a} = 0.116624 \mathbf{i} + 0.364083 \mathbf{j} + 0.571789\mathbf{k} \] and \[ \mathbf{b} = 0.864995 \mathbf{i} + 0.163004 \mathbf{j} + 0.717815 \mathbf{k} . \] Since dot products of standard unit vectors are well known because they form orthonormal system, we can find the dot product of these vectors as linear combination: \begin{align*} \mathbf{a} \bullet \mathbf{b} &= \left( 0.116624 \mathbf{i} + 0.364083 \mathbf{j} + 0.571789\mathbf{k} \right) \\ &\bullet \left( 0.864995 \mathbf{i} + 0.163004 \mathbf{j} + 0.717815 \mathbf{k} \right) \\ &= \left( 0.116624 \right) \cdot \left( 0.864995 \right) \mathbf{i} \bullet \mathbf{i} + \left( 0.116624 \right) \cdot \left( 0.163004 \right) \mathbf{i} \bullet \mathbf{j} \\ & \quad + \left( 0.116624 \right) \cdot \left( 0.717815 \right) \mathbf{i} \bullet \mathbf{k} + \left( 0.364083 \right) \cdot \left( 0.864995 \right) \mathbf{j} \bullet \mathbf{i} \\ & \quad + \left( 0.364083 \right) \cdot \left( 0.163004 \right) \mathbf{j} \bullet \mathbf{j} + \left( 0.364083 \right) \cdot \left( 0.717815 \right) \mathbf{j} \bullet \mathbf{k} \\ & \quad + \left( 0.571789 \right) \cdot \left( 0.864995 \right) \mathbf{k} \bullet \mathbf{i} + \left( 0.571789 \right) \cdot \left( 0.163004 \right) \mathbf{k} \bullet \mathbf{j} \\ & \quad + \left( 0.571789 \right) \cdot \left( 0.717815 \right) \mathbf{k} \bullet \mathbf{k} \\ &= \left( 0.116624 \right) \cdot \left( 0.864995 \right) \mathbf{i} \bullet \mathbf{i} \\ & \quad + \left( 0.364083 \right) \cdot \left( 0.163004 \right) \mathbf{j} \bullet \mathbf{j} \\ & \quad + \left( 0.571789 \right) \cdot \left( 0.717815 \right) \mathbf{k} \bullet \mathbf{k} \\ &= 0.570664 . \end{align*} because \[ \mathbf{i} \bullet \mathbf{j} = \mathbf{i} \bullet \mathbf{k} = \mathbf{k} \bullet \mathbf{j} = 0 \] and \[ \mathbf{i} \bullet \mathbf{i} = \mathbf{j} \bullet \mathbf{j} = \mathbf{k} \bullet \mathbf{k} = 1 . \]
a . b
0.570664
   ■
End of Example 11

 

Metric via Dot Product


In mathematics, a metric space is a set where a notion of distance between any two elements (usually called points) is defined, satisfying specific axioms: non-negativity, identity of indiscernibles, symmetry, and the triangle inequality. Essentially, it's a set with a defined distance function that allows us to measure how "far apart" any two elements are.

A vector space, by definition, has no metric inside it, which is a very desirable property. It turns out that the scalar product can be used to define the length of a vector and the distance between vectors, turning ℝn into a metric space, known as the Euclidean space. In order to distinguish the metric-space notion of distance from the vector-space notion of length, mathematicians call the magnitude or length of a vector v its norm and denote it by ∥v∥.    

Example 13: Let us start with a triangle ΔABC in the Euclidean plane, with vertices A, B, and C. Suppose we know the lengths of sides \( \displaystyle \quad a = \left\vert BC \right\vert , \ b = \left\vert AC \right\vert , \ c = \left\vert AB \right\vert . \quad \) Suppose that we have to determine the angles (or their cosines) of this triangle. Call α the angle at vertex A (respectively, β, γ for the other angles). We know that the length of the orthogonal projection of a segment is shrunk by a factor equal to the cosine of the angle between the two directions.
line = Graphics[{Purple, Thickness[0.01], Line[{{0, 0}, {2.7, 0}, {1, 1.2}, {0, 0}}]}]; perp = Graphics[{Blue, Dashed, Thick, Line[{{1, 1.2}, {1, 0}}]}]; txt = Graphics[{Black, Text[Style["\[Alpha]", FontSize -> 18, Bold], {0.25, 0.15}], Text[Style["\[Beta]", FontSize -> 18, Bold], {2.23, 0.15}], Text[Style["A", FontSize -> 18, Bold], {-0.1, -0.2}], Text[Style["a", FontSize -> 18, Bold], {2.1, 0.6}], Text[Style["b", FontSize -> 18, Bold], {0.3, 0.6}], Text[Style["a cos\[Beta]", FontSize -> 18, Bold], {1.8, -0.2}], Text[Style["b cos\[Alpha]", FontSize -> 18, Bold], {0.5, -0.2}], Text[Style["B", FontSize -> 18, Bold], {2.71, -0.2}], Text[Style["C", FontSize -> 18, Bold], {1.0, 1.33}]}]; Show[txt, perp, line]
Triangle

A first relation between the angles α and β is found if we project the sides AC and BC onto AB \[ c = b\,\cos\alpha + a\,\cos\beta . \] Projecting similarly onto the other sides, we are led to two further equations (which can be obtained from the first one by the use of circular permutations). Here is the system \[ \begin{cases} b\,\cos\alpha + a\,\cos\beta = c , \\ c\,\cos\beta + b\,\cos\gamma = a , \\ c\,\cos\alpha + a\,\cos\gamma = b . \end{cases} \] It is linear in the three variables x₁ = cosα, x₂ = cosβ, x₃ = cosγ. Let us solve this system by Gaussian elimination: \begin{align*} \left( \begin{array}{ccc|l} b&a&0&c \\ 0&c&b&a \\ c&0&a&b \end{array} \right) &\sim \left( \begin{array}{ccc|l} b&a&0&c \\ 0&c&b&a \\ 0& -\frac{c}{b}\,a & a& b - \frac{c}{b}\,c \end{array} \right) \\ &\sim \left( \begin{array}{ccc|l} b&a&0&c \\ 0&c&b&a \\ 0&0& 2a & b - \frac{c^2}{b} + \frac{a^2}{b} \end{array} \right) \end{align*} This yields

Solve[{ b*Cos[al] + a*Cos[be] == c, c*Cos[be] + b*Cos[ga] == a, c*Cos[al] + a*Cos[ga] == b}, {Cos[al], Cos[be], Cos[ga]}]
{{Cos[al] -> -((a^2 - b^2 - c^2)/(2 b c)), Cos[be] -> -((-a^2 + b^2 - c^2)/(2 a c)), Cos[ga] -> -((-a^2 - b^2 + c^2)/(2 a b))}}
\[ 2a\,\cos\gamma = \frac{a^2 + b^2 - c^2}{b} , \qquad \cos\gamma = \frac{a^2 + b^2 - c^2}{2ab} ,
\] as well as two similar expressions for the other angles. We have obtained the law of cosines \[ c^2 = a^2 + b^2 - 2ab\,\cos\gamma . \] In particular, we get the Pythagorean theorem for right triangle: \[ c^2 = a^2 + b^2 \qquad \iff \qquad \cos\gamma = 0 . \]    ■
End of Example 13

With standard basis in ℝn

\[ \mathbf{e}_1 = \left( 1, 0, 0, \ldots , 0 \right) , \quad \mathbf{e}_2 = \left( 0, 1, 0, \ldots , 0 \right) , \quad \ldots , \quad \mathbf{e}_n = \left( 0, \ldots , 0, 1 \right) , \]
every vector is uniquely represented as linear combination of these basis vectors
\[ \mathbf{v} = v_1 \mathbf{e}_1 + v_2 \mathbf{e}_2 + \cdots + v_n \mathbf{e}_n . \]
The Euclidean metric on the vector space ℝn is defined through norm (or length)
\begin{equation} \label{EqDot.7} \| \mathbf{u} \| = + \sqrt{\mathbf{u} \bullet \mathbf{u}} = + \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2} , \end{equation}
where the "plus" in front of the square root indicates that only the positive root is chosen out of the two branches. Here u = u₁e₁ + u₂e₂ + ⋯ + uₙeₙ is the expansion of vector u with respect to the standard (ordered) basis.
In mathematics, the norm ∥·∥ is used to define the distance between two vectors:
\[ \| \mathbf{u} - \mathbf{v} \| = + \sqrt{\left( \mathbf{u} - \mathbf{v} \right) \bullet \left( \mathbf{u} - \mathbf{v} \right)} = \sqrt{\left( u_1 - v_1 \right)^2 + \cdots + \left( u_n - v_n \right)^2} . \]
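In Mathematica this distance can be computed either from the formula above or with the built-in EuclideanDistance command (the vectors below are chosen only for illustration):
u = {1, 2, 3}; v = {4, 6, 3};
Norm[u - v]
5
EuclideanDistance[u, v]
5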

In order to convince you that the above is a sensible notion of length, we consider a possible change in length under scalar (λ ∈ ℝ) multiplication of a vector:
\[ \| \lambda \mathbf{v} \| = \sqrt{\left( \lambda \mathbf{v} \right) \bullet \left( \lambda \mathbf{v} \right)} = \sqrt{\lambda^2 \left( \mathbf{v} \bullet \mathbf{v} \right)} = \left\vert \lambda \right\vert \| \mathbf{v} \| . \]
Evidently, if a vector is multiplied with a scalar λ its length scales with the modulus |λ|. This certainly makes intuitive sense and explains why the square root has been included in Eq. (7). (Otherwise, the length would scale with the square of the scalar.)

We summarize the properties of the Euclidean norm in the following statement.

Theorem 6: The Euclidean length (norm) ∥·∥ : ℝn ↦ ℝ≥0 defined in Eq. (7) has the following properties, for all v, u ∈ ℝn and all λ ∈ ℝ.
  1. v∥ > 0 for v ≠ 0          (positivity).
  2. ∥λv∥ = |λ| ∥v∥        (scaling).
  3. v + u∥ ≤ ∥v∥ + ∥u∥     (triangle inequality).
  4. | ∥v∥ − ∥u∥ | ≤ ∥vu∥ ≤ ∥v∥ + ∥u∥.
  1. This property follows immediately from the definition \[ \mathbf{v} \bullet \mathbf{v} = v_1^2 + v_2^2 + \cdots + v_n^2 . \] A sum of squares cannot be negative, and it is zero only when every term is zero.
  2. It has been shown previously.
  3. By definition, \begin{align*} \| \mathbf{v} + \mathbf{u} \|^2 &= \left( \mathbf{v} + \mathbf{u} \right) \bullet \left( \mathbf{v} + \mathbf{u} \right) \\ &= \mathbf{v} \bullet \mathbf{v} + \mathbf{v} \bullet \mathbf{u} + \mathbf{u} \bullet \mathbf{v} + \mathbf{u} \bullet \mathbf{u} = \| \mathbf{v} \|^2 + \| \mathbf{u} \|^2 + 2\,\mathbf{v} \bullet \mathbf{u} . \end{align*} Using Cauchy's inequality, we get \[ \left\vert \mathbf{v} \bullet \mathbf{u} \right\vert \le \| \mathbf{v} \| \cdot \| \mathbf{u} \| . \] This yields \[ \| \mathbf{v} + \mathbf{u} \|^2 \le \| \mathbf{v} \|^2 + \| \mathbf{u} \|^2 + 2\,\| \mathbf{v} \| \cdot \| \mathbf{u} \| = \left( \| \mathbf{v} \| + \| \mathbf{u} \| \right)^2 , \] and taking square roots gives the triangle inequality.
  4. Similarly to the previous proof, \[ \| \mathbf{v} - \mathbf{u} \|^2 = \left( \mathbf{v} - \mathbf{u} \right) \bullet \left( \mathbf{v} - \mathbf{u} \right) = \| \mathbf{v} \|^2 + \| \mathbf{u} \|^2 - 2\left( \mathbf{v} \bullet \mathbf{u} \right) . \] Again from Cauchy's inequality, \[ - 2\left( \mathbf{v} \bullet \mathbf{u} \right) \ge -2\,\| \mathbf{v} \| \cdot \| \mathbf{u} \| , \] which leads to \[ \| \mathbf{v} - \mathbf{u} \|^2 \ge \| \mathbf{v} \|^2 + \| \mathbf{u} \|^2 - 2\, \| \mathbf{v} \| \cdot \| \mathbf{u} \| = \left( \| \mathbf{v} \| - \| \mathbf{u} \| \right)^2 , \] hence | ∥v∥ − ∥u∥ | ≤ ∥v − u∥. The next inequality \( \displaystyle \quad \| \mathbf{v} - \mathbf{u} \| \le \| \mathbf{v} \| + \| \mathbf{u} \| \quad \) follows from the triangle inequality (property 3).
   
Example 14: Mathematica is a very smart CAS, and it can evaluate norms of vectors independently of how you write them: as an n-tuple or in matrix form (row or column).
    u = {{-1, 3, -2, 2, 3}}; Norm[u]
    3 Sqrt[3]
    uu = {-1, 3, -2, 2, 3}; Norm[uu]
    3 Sqrt[3]
    uuu = {{-1}, {3}, {-2}, {2}, {3}}; Norm[uuu]
    3 Sqrt[3]
We will use Mathematica for randomly generating vectors. However, its output is then given in matrix form. In order to convert the output to vector form (as an element of 𝔽n), use the Flatten command:
     u = Flatten[RandomInteger[{-7, 9}, {1, 6}]]; VectorQ[u]
     True

Using Mathematica, we verify properties included in Theorem 6 with the following examples.

  1. This formula follows from the identity \[ \mathbf{v} \bullet \mathbf{v} = v_1^2 + v_2^2 + \cdots + v_n^2 , \] which is zero only when all components of vector v are zero.
  2. We generate a vector of size five:
         v = Flatten[RandomReal[{-1, 1}, {1, 5}]]
         {0.371645, -0.811594, 0.591386, -0.037053, 0.758338}
    Upon choosing a real scalar λ = 2.71, we multiply
         2.71*v
         {1.00716, -2.19942, 1.60266, -0.100414, 2.0551}
    Its norm is
         Norm[2.71*v]
         3.55722
    Now we calculate the norm of v and multiply the result by λ = 2.71,
         2.71*Norm[v]
         3.55722
  3. We generate two vectors of length five:
         v = RandomInteger[{-8, 8}, {1, 5}]
         {{-1, -7, 0, 6, 1}}
         u = RandomInteger[{-7, 8}, {1, 5}]
         {{-1, 3, -2, 2, 3}}
    In order to verify the triangle inequality, we evaluate both sides:
         N[Norm[u] + Norm[v]]
         14.5235
         N[Norm[u + v]]
         10.198
    \[ 10.198 \approx \| \mathbf{v} + \mathbf{u} \| < \| \mathbf{v} \| + \| \mathbf{u} \| \approx 14.5235 . \]
  4. We choose randomly two vectors
         v = Flatten[RandomInteger[{-7, 9}, {1, 6}]]
        u = Flatten[RandomInteger[{-7, 9}, {1, 6}]]
         {5, 9, -5, 1, 2, -2}
        {-1, -6, 3, -5, -3, -1}
    Their norms are
         Norm[v]
         2 Sqrt[35]
         Norm[u]
         9
    The difference of norms is
         N[Norm[v] - Norm[u]]
        2.83216
    The norm of their difference is
         N[Norm[v - u]]
         19.6723
    The sum of norms is
         N[Norm[v] + Norm[u]]
         20.8322
    These numbers confirm inequalities in part 4.
   ■
End of Example 14

The scaling property allows us to define, for any non-zero vector v ∈ ℝn, an associated normalized vector n with unit length given by

\[ \mathbf{n} = \frac{\mathbf{v}}{\| \mathbf{v} \|} \quad \Longrightarrow \quad \| \mathbf{n} \| = \| \mathbf{v} \|^{-1} \| \mathbf{v} \| = 1 . \]
Each component of n is divided by the scalar value ∥v∥ > 0. Therefore, the unit vector n = n(v) represents the direction of vector v. If v = 0, then its length is zero and the corresponding direction vector does not exist. Remember that you must check the value ∥v∥ before dividing to be sure it is greater than your zero-divide tolerance. The zero-divide tolerance is the absolute value of the smallest number by which you can divide confidently.    
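A minimal Mathematica sketch of this safeguard is shown below; the helper name normalize and the tolerance value 10^-12 are illustrative assumptions, not fixed conventions (the built-in Normalize simply returns the zero vector when given the zero vector):
     normalize[v_?VectorQ, tol_ : 10^-12] := If[Norm[v] > tol, v/Norm[v], Indeterminate]
     normalize[{4, 3}]     (* expected: {4/5, 3/5} *)
     normalize[{0, 0}]     (* expected: Indeterminate, because the norm is below the tolerance *)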
Example 15: Let us consider a vector v = (4, 3) ∈ ℝ². Its Euclidean norm is \( \displaystyle \quad \| \mathbf{v} \| = \sqrt{4^2 + 3^2} = \sqrt{16 + 9} = \sqrt{25} = 5 . \quad \) Then the corresponding direction vector becomes \[ \mathbf{n} = \frac{\mathbf{v}}{\| \mathbf{v} \|} = \frac{(4, 3)}{5} = \left( \frac{4}{5}, \frac{3}{5} \right) = \left( 0.8, 0.6 \right) . \]
n = Graphics[{Blue, Thickness[0.01], Arrowheads[0.1], Arrow[{{0, 0}, {0.8, 0.6}}]}]; v = Graphics[{Purple, Thickness[0.01], Arrowheads[0.1], Arrow[{{0.8, 0.6}, {4, 3}}]}]; ax = Graphics[{Black, Thick, Arrow[{{-0.3, 0}, {4.3, 0}}]}]; ay = Graphics[{Black, Thick, Arrow[{{0, -0.3}, {0, 3.3}}]}]; txt = Graphics[{Text[ Style["n", Blue, FontSize -> 18, Bold], {1, 0.5}], Text[Style["v", Purple, FontSize -> 18, Bold], {4, 2.6}], Text[Style["O", Black, FontSize -> 18, Bold], {-0.2, -0.2}], Text[Style["x-axis", Black, FontSize -> 18, Bold], {4, 0.3}], Text[Style["y-axis", FontSize -> 18, Bold], {0, 3.6}]}]; Show[n, v, ax, ay, txt]
Figure 15.1: Normalizing a vector

Since we have only scaled v by a positive amount ∥v∥ = 5, the direction of n is the same as v. There are infinitely many unit vectors. Imagine drawing them all, emanating from the origin. The figure that you will get is a circle of radius one!
with(plots):
R := 200:    # arrow length
N := 36:     # number of arrows (change as needed)
arrows := [ seq( plots:-arrow( [0,0], [ R*cos(2*Pi*k/N), R*sin(2*Pi*k/N) ], color = green ), k = 0..N-1 ) ]:
plots:-display(arrows, view = [-R-50 .. R+50, -R-50 .. R+50], axes = boxed);
Figure 15.2: Unit vectors

   ■
End of Example 15

Upon introducing the norm (meaning length or magnitude) of a vector, \( \displaystyle \quad \| {\bf v} \| = +\sqrt{{\bf v} \bullet {\bf v}} , \quad \) Cauchy's inequality can be written as

\begin{equation} \label{EqDot.8} -1 \leqslant \frac{\mathbf{u} \bullet \mathbf{v}}{\| \mathbf{u} \| \cdot \| \mathbf{v} \|} \leqslant 1 . \end{equation}
   
Example 16: Let us consider a rectangular parallelepiped (also known as a rectangular cuboid) with side lengths 𝑎, b, and c. We want to compute the angle between the diagonals on two adjacent faces. First, we plot the prism with two vectors along adjacent diagonals.
edge = Graphics[{Purple, Thick, Line[{{0, 0}, {-0.8485, -0.8485}, {1.2515, -0.8485}, {1.2515, 0.1515}, {2.1, 1}, {0, 1}, {0, 0}}]}]; edge2 = Graphics[{Purple, Thick, Line[{{0, 1}, {-0.8485, -0.1515}, {-0.8485, -0.8485}}]}]; edge3 = Graphics[{Purple, Thick, Line[{{2.1, 1}, {2.1, 0.0}, {1.2515, -0.8485}}]}]; line = Graphics[{Purple, Thick, Line[{{0, 0}, {2.1, 0.0}}]}]; line2 = Graphics[{Purple, Thick, Line[{{-0.8485, 0.1515}, {1.2515, 0.1515}}]}]; ar1 = Graphics[{Blue, Thickness[0.01], Arrow[{{-0.8485, -0.8485}, {0, 1}}]}]; ar2 = Graphics[{Blue, Thickness[0.01], Arrow[{{-0.8485, -0.8485}, {2.1, 0}}]}]; txt = Graphics[{Black, Text[Style["A", FontSize -> 18, Bold], {0.0, 1.12}], Text[Style["B", FontSize -> 18, Bold], {-0.9, -0.98}], Text[Style["C", FontSize -> 18, Bold], {2.2, 0.1}], Text[Style["\[Theta]", FontSize -> 18, Bold], {-0.5, -0.6}], Text[Style["a", FontSize -> 18, Bold], {-0.55, 0.6}], Text[Style["b", FontSize -> 18, Bold], {1.1, 1.12}], Text[Style["c", FontSize -> 18, Bold], {2.2, 0.5}]}]; circ = Graphics[{Red, Thick, Circle[{-0.8485, -0.8485}, 0.57, {0.3, 1.12}]}]; Show[edge, edge2, edge3, line, line2, ar1, ar2, txt, circ]
Figure 16.1: Cuboid

Using points A(0, 0, c), B(𝑎, 0, 0), and C(0, b, 0), we find vectors \[ \mathbf{u} = \vec{BA} = (-a, 0, c) \qquad\mbox{and} \qquad \mathbf{v} = \vec{BC} = (-a, b, 0) . \] Their norms \[ \| \mathbf{u} \| = \sqrt{a^2 + c^2} , \qquad \|\mathbf{v} \| = \sqrt{a^2 + b^2} \] and dot product \[ \mathbf{u} \bullet \mathbf{v} = a^2 \] show that formula (8) can be written as \[ \frac{\mathbf{u} \bullet \mathbf{v}}{\| \mathbf{u} \| \cdot \| \mathbf{v} \|} = \frac{a^2}{\sqrt{a^2 + c^2}\cdot\sqrt{a^2 + b^2}} = \cos\theta \] because the ratio is definitely less than 1. For instance, if 𝑎 =1, b = 2, and c = 3, we get \[ \cos\theta = \frac{1}{\sqrt{5}\cdot \sqrt{10}} = \frac{1}{5\sqrt{2}} \qquad \Longrightarrow \qquad \theta = \mbox{arccos}\left( \frac{1}{5\sqrt{2}} \right) \approx 1.4289 . \]

N[ArcCos[1/(5 Sqrt[2])]]
1.4289
Converting radians into degrees, we multiply the latter by 180/π:
N[ArcCos[1/(5 Sqrt[2])]]*180/Pi
81.8699

Another angle problem:    We consider the same cuboid, but now we are going to determine the angle between a diagonal on a face and a diagonal of the cuboid. Upon plotting Figure 16.2, we are after the angle ∠ABD.

edge = Graphics[{Purple, Thick, Line[{{0, 0}, {-0.8485, -0.8485}, {1.2515, -0.8485}, {1.2515, 0.1515}, {2.1, 1}, {0, 1}, {0, 0}}]}]; edge2 = Graphics[{Purple, Thick, Line[{{0, 1}, {-0.8485, -0.1515}, {-0.8485, -0.8485}}]}]; edge3 = Graphics[{Purple, Thick, Line[{{2.1, 1}, {2.1, 0.0}, {1.2515, -0.8485}}]}]; line = Graphics[{Purple, Thick, Line[{{0, 0}, {2.1, 0.0}}]}]; line2 = Graphics[{Purple, Thick, Line[{{-0.8485, 0.1515}, {1.2515, 0.1515}}]}]; ar1 = Graphics[{Blue, Thickness[0.01], Arrow[{{-0.8485, -0.8485}, {0, 1}}]}]; ar2 = Graphics[{Blue, Thickness[0.01], Arrow[{{-0.8485, -0.8485}, {2.1, 1}}]}]; txt = Graphics[{Black, Text[Style["A", FontSize -> 18, Bold], {0.0, 1.12}], Text[Style["B", FontSize -> 18, Bold], {-0.9, -0.98}], Text[Style["D", FontSize -> 18, Bold], {2.2, 1.1}], Text[Style["\[Theta]", FontSize -> 18, Bold], {-0.53, -0.46}], Text[Style["D", FontSize -> 18, Bold], {2.2, 1.1}], Text[Style["\[Theta]", FontSize -> 18, Bold], {-0.53, -0.46}], Text[Style["a", FontSize -> 18, Bold], {-0.55, 0.6}], Text[Style["b", FontSize -> 18, Bold], {1.1, 1.12}], Text[Style["c", FontSize -> 18, Bold], {2.2, 0.5}]}]; circ = Graphics[{Red, Thick, Circle[{-0.8485, -0.8485}, 0.65, {0.59, 1.12}]}]; Show[edge, edge2, edge3, line, line2, ar1, ar2, txt, circ]
Figure 16.2: Cuboid

Since point D has coordinates (0, b, c), we find vector \( \displaystyle \quad \mathbf{v} = \vec{BD} = \left( -a, b, c \right) . \quad \) Then the dot product of vectors u and v becomes \[ \mathbf{u} \bullet \mathbf{v} = \left( -a, 0, c \right) \bullet \left( -a, b, c \right) = a^2 + c^2 . \] Then formula (8) yields \[ \cos\theta = \frac{\mathbf{u} \bullet \mathbf{v}}{\| \mathbf{u} \| \, \| \mathbf{v} \|} = \frac{a^2 + c^2}{\sqrt{a^2 + c^2}\,\sqrt{a^2 + b^2 + c^2}} . \] Taking inverse cosine function, we find the angle to be \[ \theta = \mbox{arccos} \left( \frac{\sqrt{a^2 + c^2}}{\sqrt{a^2 + b^2 + c^2}} \right) . \] For 𝑎 =1, b = 2, c = 3, we get \[ \theta = \mbox{arccos} \left( \frac{\sqrt{10}}{\sqrt{14}} \right) \approx 0.563943 = 32.3115^{\circ} . \]

N[ArcCos[Sqrt[5/7]]]
0.563943
%*180/Pi
32.3115
   ■
End of Example 16

Dot product in non-standard bases

When an ordered basis [e₁, e₂, … , en] in ℝn is not the standard one, the dot product formula must be modified in order to maintain the regular (Euclidean) distance between points or vectors:

\[ \mathbf{u} \bullet \mathbf{v} = \sum_{\alpha , \beta} g_{\alpha , \beta} u^{\alpha} v^{\beta} = g_{\alpha , \beta} u^{\alpha} v^{\beta} , \]
where u = u¹e₁ + u²e₂ + ⋯ + unen, v = v¹e₁ + v²e₂ + ⋯ + vnen, and the Einstein summation convention is employed. The metric tensor provides the scalar product of a pair of contravariant vectors; its components are the dot products of the basis vectors:
\[ g_{\alpha , \beta} = \mathbf{e}_{\alpha} \bullet \mathbf{e}_{\beta} . \]
For example, in ℝ², the metric tensor is the 2-by-2 matrix
\[ g_{\alpha , \beta} = \mathbf{e}_{\alpha} \bullet \mathbf{e}_{\beta} = \begin{bmatrix} \mathbf{e}_1 \bullet \mathbf{e}_1 & \mathbf{e}_1 \bullet \mathbf{e}_2 \\ \mathbf{e}_2 \bullet \mathbf{e}_1 & \mathbf{e}_2 \bullet \mathbf{e}_2 \end{bmatrix} , \]
and corresponding square norm becomes
\[ \| \mathbf{u} \|^2 = g_{1,1} \left( u^1 \right)^2 + g_{2,2} \left( u^2 \right)^2 + 2\,g_{1,2} u^1 u^2 . \]

If [e¹, e², … , en] is the dual basis, then the inverse of gi,j is the raised-indices metric tensor for the covector space:

\[ g^{i,j} = \mathbf{e}^i \bullet \mathbf{e}^j . \]
In the two-dimensional case, it is convenient to write the contravariant metric tensor in matrix form
\[ g^{i,j} = \begin{bmatrix} \mathbf{e}^1 \bullet \mathbf{e}^1 & \mathbf{e}^1 \bullet \mathbf{e}^2 \\ \mathbf{e}^2 \bullet \mathbf{e}^1 & \mathbf{e}^2 \bullet \mathbf{e}^2 \end{bmatrix} . \]
Then using the Einstein summation convention, the dot product of covectors is defined as
\[ \phi \bullet \psi = g^{i,j} \phi_i \psi_j = \sum_{i,j} g^{i,j} \phi_i \psi_j . \]
Since \( \mathbf{e}^i \bullet \mathbf{e}_j = \delta^i_j \) (Kronecker's delta), the dot product of a covector with a vector is just the standard scalar product:
\[ \phi \bullet \mathbf{v} = \left( \phi_1 , \phi_2 , \ldots , \phi_n \right) \bullet \left( v^1 , v^2 , \ldots , v^n \right) = \sum_{i=1}^n \phi_{i} v^i . \]
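Before the worked example below, here is a short Mathematica sketch of the metric-tensor formula: the rows of basis are the basis vectors, g is their Gram matrix, and dotG evaluates the dot product of two coordinate vectors (the names basis, g, and dotG are chosen only for illustration):
     basis = {{1, 0}, {1, 1}};       (* rows are the basis vectors e1 = i and e2 = i + j *)
     g = basis . Transpose[basis]     (* metric tensor with entries g[[i, j]] = ei . ej *)
     dotG[u_, v_] := u . g . v        (* dot product of coordinate vectors in this basis *)
     dotG[{-7, 4}, {-7, 4}]           (* expected: 25, the squared Euclidean length of (-3, 4) *)
Example 17 below carries out the same computation by hand.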
   
Example 17: We consider a couple of examples of dot products in the plane ℝ² using non-standard bases. We start with a simple basis of two vectors: \[ \mathbf{e}_1 = \mathbf{i} = (1, 0) , \qquad \mathbf{e}_2 = 2\,{\bf j} = (0, 2) , \] where i and j are the standard unit vectors along the abscissa and ordinate, respectively. Writing the metric tensor in matrix form, we get \[ {\bf g} = \left[ g_{i,j} \right] = \begin{bmatrix} \mathbf{e}_1 \bullet \mathbf{e}_1 & \mathbf{e}_1 \bullet \mathbf{e}_2 \\ \mathbf{e}_2 \bullet \mathbf{e}_1 & \mathbf{e}_2 \bullet \mathbf{e}_2 \end{bmatrix} = \begin{bmatrix} 1&0 \\ 0&4 \end{bmatrix} \] because \[ \mathbf{e}_2 \bullet \mathbf{e}_2 = (2\,{\bf j}) \bullet (2\,{\bf j}) = 4\left({\bf j} \bullet {\bf j} \right) = 4, \quad \mathbf{e}_1 \bullet \mathbf{e}_2 = 0 . \] Choosing a vector (point) u = (3, 4) = 3 i + 4 j ∈ ℝ², we find its magnitude \[ \| \mathbf{u} \|^2 = \mathbf{u} \bullet \mathbf{u} = 3^2 + 4^2 = 25 \quad \Longrightarrow \quad \| \mathbf{u} \| = 5 . \] Now we expand this vector in basis α = [e₁, e₂] = [i, 2j]: \[ \mathbf{u} = 3\,\mathbf{e}_1 + 2\,\mathbf{e}_2 . \] Calculating its dot product with itself in this basis, we find \begin{align*} \mathbf{u} \bullet \mathbf{u} &= \sum_{i,j=1}^2 g_{i,j} \, u^i \, u^j \\ &= g_{1,1} \, 3 \times 3 + g_{2,2} \, 2\times 2 = 1 \cdot 3 \times 3 + 4\cdot 2 \times 2 = 25 . \end{align*}

Another basis:    \[ \beta = \left[ \mathbf{b}_1 , \mathbf{b}_2 \right] = \left[ \mathbf{i} , \mathbf{i} + \mathbf{j} \right] . \] The vector v = (−3, 4) has the following coordinates in basis β: \[ \mathbf{v} = \left( -3, 4 \right) \qquad \Longrightarrow \qquad \left[\!\left[ \mathbf{v} \right]\!\right]_{\beta} = \left( v^1 , v^2 \right) = \left( -7 , 4 \right) \] because \[ \mathbf{v} = -3\mathbf{i} + 4\mathbf{j} = -3\mathbf{i} + 4 \left( \mathbf{i} + \mathbf{j} - \mathbf{i} \right) = -3\mathbf{i} + 4 \left( \mathbf{i} + \mathbf{j} \right) - 4\mathbf{i} . \] Here we used a standard mathematical trick: add and subtract the same value. The second component j is replaced by j = (i + j) − i = b₂ − i. The dot product in basis β becomes \begin{align*} \mathbf{v} \bullet \mathbf{v} &= g_{1,1} v^1 v^1 + g_{1,2} v^1 v^2 + g_{2,1} v^2 v^1 + g_{2,2} v^2 v^2 \\ &= g_{1,1} \left( -7 \right)^2 + g_{1,2} \left( -7 \right) \left( 4 \right) + g_{2,1} \left( 4 \right) \left( -7 \right) + g_{2,2} \left( 4 \right)^2 \\ &= 49\, g_{1,1} - 28\, g_{1,2} - 28\, g_{2,1} + 16\, g_{2,2} . \end{align*} The components of the metric tensor are \begin{align*} g_{1,1} &= \mathbf{b}_1 \bullet \mathbf{b}_1 = \mathbf{i} \bullet \mathbf{i} = 1 , \\ g_{1,2} &= \mathbf{b}_1 \bullet \mathbf{b}_2 = \mathbf{i} \bullet \left( \mathbf{i} + \mathbf{j} \right) = \mathbf{i} \bullet \mathbf{i} + \mathbf{i} \bullet \mathbf{j} = 1 + 0 = 1 , \\ g_{2,1} &= \mathbf{b}_2 \bullet \mathbf{b}_1 = \left( \mathbf{i} + \mathbf{j} \right) \bullet \mathbf{i} = \mathbf{i} \bullet \mathbf{i} + \mathbf{j} \bullet \mathbf{i} = 1 + 0 = 1 , \\ g_{2,2} &= \mathbf{b}_2 \bullet \mathbf{b}_2 = \left( \mathbf{i} + \mathbf{j} \right) \bullet \left( \mathbf{i} + \mathbf{j} \right) = \mathbf{i} \bullet \mathbf{i} + 2\,\mathbf{i} \bullet \mathbf{j} + \mathbf{j} \bullet \mathbf{j} = 2 . \end{align*} Using these values, we calculate the dot product \begin{align*} \mathbf{v} \bullet \mathbf{v} &= 49 - 28 - 28 + 2\cdot 16 = 25 . \end{align*}

49 - 28 - 28 + 32
25
   ■
End of Example 17

Applications

Scalar products are intimately associated with a variety of physical concepts. For example, if a vector is mean-centered---the average of all vector elements is subtracted from each element---then the dot product of this vector with itself is, up to division by the number of elements, the variance used in statistics. So it provides a measurement of dispersion across a data set.
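For instance, the following Mathematica sketch mean-centers a small, made-up data vector and compares the resulting dot product with the built-in Variance, which divides by n − 1 instead of n:
     data = {2., 4., 4., 4., 5., 5., 7., 9.};
     centered = data - Mean[data];            (* subtract the average from every entry *)
     centered . centered / Length[data]        (* expected: 4., the population variance *)
     Variance[data]                            (* expected: 4.57143, the sample variance (divides by n - 1) *)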

The work done by a force applied at a point serves as a primary example of dot product because the work is defined as the product of the displacement and the component of the force in the direction of displacement (i.e., the projection of the force onto the direction of the displacement). Thus, the component of the force perpendicular to the displacement "does no work." If F is the force (in Newtons) and s is the displacement (in meters), then the work W is by definition equal to

\[ W = F_{\parallel} s = F\,s\,\cos\left( {\bf F}, {\bf s} \right) = {\bf F} \bullet {\bf s} \quad \mbox{(in Joules)}. \]
Suppose the force makes an obtuse angle with the displacement, so that the force is "resisting." Then the work is regarded as negative, in keeping with formula above.    
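A small numeric illustration in Mathematica; the force and displacement vectors below are made up:
     force = {3., 0., -1.};            (* force, in newtons *)
     displacement = {2., 1., 4.};      (* displacement, in meters *)
     force . displacement              (* expected: 2. (joules); positive, so the force assists the motion *)
     force . {-2., 1., 4.}             (* expected: -10.; the force "resists" this displacement, so the work is negative *)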
Example 27: There are many physical examples of line integrals, but perhaps the most common is the expression for the total work done by a force F when it moves its point of application from a point A to a point B along a given curve C. We allow the magnitude and direction of F to vary along the curve. Let the force act at a point r and consider a small displacement dr along the curve; then the small amount of work done is dW = F • dr (note that dW can be either positive or negative). Therefore, the total work done in traversing the path C is \[ W_C = \int_C {\bf F} \bullet {\text d}{\bf r} . \]

Naturally, other physical quantities can be expressed in such a way. For example, the electrostatic potential energy gained by moving a charge q along a path C in an electric field E is \( \displaystyle -q\int_C {\bf E} \bullet {\text d}{\bf r} . \) We may also note that Ampère's law concerning the magnetic field B associated with a current-carrying wire can be written as \[ \oint_C {\bf B} \bullet {\text d}{\bf r} = \mu_0 I , \] where I is the current enclosed by a closed path C traversed in a right-handed sense with respect to the current direction.    ■

End of Example 27
   The work W must of course be independent of the coordinate system in which the vectors F and s are expressed. The dot product as we know it from Eq.\eqref{EqDot.2} does not have this property. In general, under a matrix transformation A of the coordinates, we have
\[ s = \left( {\bf A}\,\mathbf{x} \right) \bullet \left( {\bf A}\,\mathbf{y} \right) = \left( {\bf A}^{\mathrm T} {\bf A}\,\mathbf{x} \right) \bullet \mathbf{y} . \]
Only if A⁻¹ equals Aᵀ (i.e., if we are dealing with orthonormal transformations) will s not change. It appears as if the dot product only describes the physics correctly in a special kind of coordinate system: a system which according to our human perception is ‘rectangular’, and has physical units, i.e., a distance of 1 in coordinate x means indeed 1 meter in the x-direction. An orthonormal transformation produces again such a rectangular ‘physical’ coordinate system. If one has so far always employed such special coordinates anyway, this dot product has always worked properly.
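The following Mathematica sketch illustrates this with a rotation (an orthonormal transformation) versus an arbitrary matrix; the particular vectors, angle, and matrix are arbitrary choices:
     x = {1., 2.}; y = {3., -1.};
     R = N[RotationMatrix[Pi/5]];        (* orthonormal: Transpose[R].R is the identity matrix *)
     {x . y, (R . x) . (R . y)}          (* expected: {1., 1.} -- the dot product is unchanged *)
     A = {{2., 1.}, {0., 3.}};           (* not orthonormal *)
     (A . x) . (A . y)                   (* expected: 2., which differs from x . y = 1. *)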

   
Example 28: In geometry, a barycentric coordinate system is a coordinate system in which the location of a point is specified by reference to a simplex (a triangle for points in a plane, a tetrahedron for points in three-dimensional space, etc.). Barycentric coordinates were invented by the German mathematician and theoretical astronomer August Ferdinand Möbius (1790--1868). He introduced them in his work "Der barycentrische Calcül" published in 1827.

2D case:    We start with the plane ℝ².

The area of a 2D triangle whose vertices are 𝑎 = (x𝑎, y𝑎), b = (xb, yb), c = (xc, yc) (as shown in figure 1) is given by \[ \mbox{area} = \frac{1}{2} \begin{vmatrix} x_b - x_a & x_c - x_a \\ y_b - y_a & y_c - y_a \end{vmatrix} . \]

pointA = Graphics[{Purple, Disk[{0, 0}, 0.02]}]; pointB = Graphics[{Purple, Disk[{1.1, 0.2}, 0.02]}]; pointC = Graphics[{Purple, Disk[{0.4, 0.9}, 0.02]}]; pointD = Graphics[{Purple, Disk[{0.45, 0.4}, 0.02]}]; line1 = Graphics[{Black, Thick, Line[{{0, 0.0}, {0.4, 0.9}, {1.1, 0.2}, {0, 0}}]}]; line2 = Graphics[{Brown, Thick, Line[{{0, 0.0}, {0.45, 0.4}, {0.4, 0.9}}]}]; line3 = Graphics[{Brown, Thick, Line[{{1.1, 0.2}, {0.45, 0.4}}]}]; txt = Graphics[{Black, Text[Style[Subscript[A, c], FontSize -> 18, Bold], {0.46, 0.2}], Text[Style[Subscript[A, b], FontSize -> 18, Bold], {0.3, 0.4}], Text[Style[Subscript[A, c], FontSize -> 18, Bold], {0.65, 0.45}], Text[Style["a", FontSize -> 18, Bold], {0.0, -0.1}], Text[Style["b", FontSize -> 18, Bold], {1.1, 0.0}], Text[Style["b", FontSize -> 18, Bold], {1.1, 0.1}], Text[Style["c", FontSize -> 18, Bold], {0.4, 1.0}]}]; Show[line1, pointA, pointB, pointC, pointD, line2, line3, txt]
Figure 1: Barycentric Coordinates

Note that this area is a "signed" area. In other words, it has a sign. To obtain the area the way we are used to define it, we take the absolute value. If the vertices live in 3D space, the area of the corresponding triangle is \[ \mbox{Area} = \frac{1}{2}\,\| (b-a) \times (c-b) \| . \] This always gives a positive answer.

Barycentric coordinates allow us to express the coordinates of p = (x, y) in terms of 𝑎, b, c. More specifically, the barycentric coordinates of p are the numbers β and γ such that \[ p = a + \beta\left( b-a \right) + \gamma \left( c -a \right) . \] If we regroup 𝑎, b, c, we obtain \begin{align*} p &= a + \beta\,b - \beta\,a + \gamma\,c - \gamma\,a \\ &= \left( 1 - \beta - \gamma \right) a + \beta\,b + \gamma\,c . \end{align*} It is customary to define a third variable α by \[ \alpha = 1 - \beta - \gamma . \] Then we have \[ p = \alpha\,a + \beta\, b + \gamma\,c , \qquad \alpha + \beta + \gamma = 1 . \] The barycentric coordinates of the point p in terms of the points 𝑎, b, c are the numbers α, β, γ such that p = α𝑎 + βb + γc, with α + β + γ = 1.

Barycentric coordinates are defined for all points in the plane. They have several nice features:

  1. A point p is inside the triangle defined by 𝑎, b, c if and only if \[ 0 < \alpha < 1 , \quad 0 < \beta < 1 , \quad 0 < \gamma < 1 . \] This property provides an easy way to test if a point is inside a triangle.
  2. If one of the barycentric coordinates is 0 and the other two are between 0 and 1, the corresponding point p is on one of the edges of the triangle.
  3. If any barycentric coordinate is less than zero, then p must lie outside of the triangle.
  4. If two of the barycentric coordinates are zero and the third is 1, the point p is at one of the vertices of the triangle.
  5. By changing the values of α, β, γ between 0 and 1, the point p will move smoothly inside the triangle. This can (and will) be applied to other properties of the vertices, such as color.
  6. The center of the triangle is obtained when α = β = γ = ⅓. If the triangle is made of a certain substance which is evenly distributed throughout the triangle, then these values of α, β, γ would give us the center of gravity.
Note that it is sufficient to find two of the parameters α, β, γ because their sum is 1. One way of determining β and γ is to write the equation \[ p = a + \beta\left( b-a \right) + \gamma \left( c -a \right) \] in terms of the coordinates of the various points involved. This gives us the following system \[ \begin{cases} x &= x_a + \beta \left( x_b - x_a \right) + \gamma \left( x_c - x_a \right) , \\y &= y_a + \beta \left( y_b - y_a \right) + \gamma \left( y_c - y_a \right) , \end{cases} \] which can be solved using your favorite method.

Let A𝑎, Ab and Ac be as in figure 1 and let A denote the area of the triangle. Also note that the point inside the triangle on figure 1 is the point we called p. Consider the triangles in the figure. These are different triangles drawn for a fixed value of β. They have the same area since they have the same base and height. This area was denoted Ab on figure 1. Thus, we see that Ab only depends on β. Therefore, we have \[ A_b = C\beta . \] for some constant C. When p is on b that is when β = 1, we have Ab = A. Hence, A = C. Therefore we see that \[ \beta = \frac{A_b}{A} . \] Similarly, we have \[ \alpha = \frac{A_a}{A} , \qquad \gamma = \frac{A_c}{A} . \] In coordinates, these parameters become \[ \beta = \frac{\begin{vmatrix} x_a - x_c & x - x_c \\ y_a - y_c & y - y_c \end{vmatrix}}{\begin{vmatrix} x_b - x_a & x_c - x_a \\ y_b - y_a & y_c - y_a \end{vmatrix}} , \] \[ \gamma = \frac{\begin{vmatrix} x_b - x_a & x - x_a \\ y_b - y_a & y - y_a \end{vmatrix}}{\begin{vmatrix} x_b - x_a & x_c - x_a \\ y_b - y_a & y_c - y_a \end{vmatrix}} , \]
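A short Mathematica sketch of these determinant formulas; the triangle vertices and the test point below are arbitrary, and det2 is just an illustrative helper name:
     {a, b, c} = {{0., 0.}, {4., 1.}, {1., 3.}};     (* triangle vertices *)
     p = {2., 1.5};                                  (* test point *)
     det2[u_, v_] := u[[1]] v[[2]] - u[[2]] v[[1]];  (* 2 x 2 determinant with columns u and v *)
     area2 = det2[b - a, c - a];                     (* twice the signed area of the triangle *)
     beta = det2[a - c, p - c]/area2;
     gamma = det2[b - a, p - a]/area2;
     alpha = 1 - beta - gamma;
     {alpha, beta, gamma}          (* expected: {0.227273, 0.409091, 0.363636}, all between 0 and 1 *)
     alpha a + beta b + gamma c    (* expected: {2., 1.5}, reproducing p *)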

Let us assume that we are using the RGB model. That is, all colors can be obtained by mixing R (red), G (green) and B (blue). Usually, with such a model, the level of each color channel is a number between 0 and 255. To specify the color at any point, we must specify a triplet (R, G, B) where R, G, and B are integers between 0 and 255. They indicate how much of red, green and blue is used in the color. When using Java, there is a built-in class to handle colors. It is called Color. This class has some built-in predefined colors. Here are some examples: Color.black, Color.blue, Color.cyan, Color.gray, Color.green, Color.magenta, Color.orange, Color.pink, Color.red, Color.white, Color.yellow. To get any other color, one uses a statement such as new Color(R,G,B) where R,G,B are integers between 0 and 255.

It is also possible to use a single integer to represent colors. Keeping in mind that an integer has 32 bits, bits 0 − 7 contain the R level, bits 8 − 15 contain the G level and bits 16−23 contain the B level. The remaining bits are unused and set to 0. Let us see now how barycentric coordinates can be used to smoothly color a triangle, given the color of its vertices. Using the notation above, let us assume that C𝑎 is the color of 𝑎, Cb is the color of b and Cc is the color of c. Each color is in fact a triplet. We will use the notation C𝑎 = (R𝑎, G𝑎, B𝑎) and similar notation for the remaining points. We would like to color the triangle so that there is a smooth coloring throughout the triangle. We use the fact that by changing the values of α, β, γ between 0 and 1, the point p = α𝑎 + βb + γc will move smoothly inside the triangle. In other words, small changes in α, β, γ will result in small changes in the location of p. We apply this to colors. We let \[ C = \alpha C_a + \beta C_b + \gamma C_c \] (we really do this for every color channel). Small changes in α, β, γ will result in small changes in the color. Therefore, the color will change smoothly as we move within the triangle. To color smoothly a triangle given the color of its vertices, we can use the following algorithm:

  1. For each point P = (x, y) inside the triangle, find α, β, γ.
  2. Use α, β, γ to interpolate the color of the point from the color of the vertices using relation \[ C = \alpha C_a + \beta C_b + \gamma C_c \]
  3. Plot the point with coordinates (x, y) and the color computed above; a short sketch of this interpolation follows the list.
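A hedged sketch of this interpolation in Mathematica; the vertex colors and test weights below are arbitrary, and blend is just an illustrative helper name:
     {colA, colB, colC} = {{255, 0, 0}, {0, 255, 0}, {0, 0, 255}};   (* RGB colors at the three vertices *)
     blend[{al_, be_, ga_}] := Round[al colA + be colB + ga colC]    (* barycentric color interpolation *)
     blend[{1, 0, 0}]           (* expected: {255, 0, 0}, the color of the first vertex *)
     blend[{1/3, 1/3, 1/3}]     (* expected: {85, 85, 85}, the color at the centroid *)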

3D case:   

We use the same notation as in the 2D case. The only difference is that now points have three coordinates. So, we have 𝑎 = (x𝑎, y𝑎, z𝑎), b = (xb, yb, zb) and c = (xc, yc, zc). Barycentric coordinates are extended naturally to 3D triangles and they have the same properties. In other words, we have the same equation for point p = α𝑎 + βb + γc.

The only difference between 3D and 2D is that the area of the triangle computed this way is always positive, independently of orientation. We define the following quantities:

  • n is the normal to the triangle T with vertices (𝑎, b, c) in counterclockwise order. In other words, n = (b − 𝑎) × (c − 𝑎).
  • n𝑎 is the normal to T𝑎, the triangle with area A𝑎 as shown in figure 1. T𝑎 = (b, c, p) in counterclockwise order. Thus, n𝑎 = (cb) × (pb).
  • nb is the normal to Tb, the triangle with area Ab as shown in figure 1. Tb = (c, 𝑎, p) in counterclockwise order. Thus, nb = (𝑎 − c) × (p − c).
  • nc is the normal to Tc, the triangle with area Ac as shown in figure 1. Tc = (𝑎, b, p) in counterclockwise order. Thus, nc = (b − 𝑎) × (p − 𝑎).
The quantity \( \displaystyle \quad \frac{\mathbf{n} \bullet \mathbf{n}_a}{\| \mathbf{n} \| \,\| \mathbf{n}_a \|} = 1 \quad \) if p is inside T, −1 otherwise. The same is true for \( \displaystyle \quad \frac{\mathbf{n} \bullet \mathbf{n}_b}{\| \mathbf{n} \| \,\| \mathbf{n}_b \|} \quad \) and \( \displaystyle \quad \frac{\mathbf{n} \bullet \mathbf{n}_c}{\| \mathbf{n} \| \,\| \mathbf{n}_c \|} .\quad \) Multiplying A𝑎 / A by \( \displaystyle \quad \frac{\mathbf{n} \bullet \mathbf{n}_a}{\| \mathbf{n} \| \,\| \mathbf{n}_a \|} \quad \) will then give us a signed area, depending on whether p is inside T or outside. Since A𝑎 = ½∥n𝑎∥ and A = ½∥n∥, we have \begin{align*} \frac{A_a}{A} \,\frac{\mathbf{n} \bullet \mathbf{n}_a}{\| \mathbf{n} \| \, \| \mathbf{n}_a \|} &= \frac{\| \mathbf{n}_a \|}{\| \mathbf{n} \|}\,\frac{\mathbf{n} \bullet \mathbf{n}_a}{\| \mathbf{n} \| \, \| \mathbf{n}_a \|} \\ &= \frac{\mathbf{n} \bullet \mathbf{n}_a}{\| \mathbf{n} \|^2} . \end{align*} We obtain similar formulas for the other ratios. Thus, in the case of a 3D triangle, we can define the barycentric coordinates by: \[ \alpha = \frac{\mathbf{n} \bullet \mathbf{n}_a}{\| \mathbf{n} \|^2} , \quad\beta = \frac{\mathbf{n} \bullet \mathbf{n}_b}{\| \mathbf{n} \|^2} , \quad \gamma = \frac{\mathbf{n} \bullet \mathbf{n}_c}{\| \mathbf{n} \|^2} . \]    ■
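These normal-based formulas can be checked numerically; in the hedged Mathematica sketch below the vertices are arbitrary and the test point is built with known barycentric coordinates:
     {a3, b3, c3} = {{0., 0., 0.}, {3., 0., 0.}, {0., 2., 1.}};   (* triangle vertices in 3D *)
     p3 = 0.2 a3 + 0.5 b3 + 0.3 c3;                               (* point with barycentric coordinates (0.2, 0.5, 0.3) *)
     n  = Cross[b3 - a3, c3 - a3];
     na = Cross[c3 - b3, p3 - b3];
     nb = Cross[a3 - c3, p3 - c3];
     nc = Cross[b3 - a3, p3 - a3];
     {n . na, n . nb, n . nc}/(n . n)      (* expected: {0.2, 0.5, 0.3} *)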
End of Example 28

 

Dot product in floating point format


There are only finitely many numbers in a computer. Besides integers, these are the so-called floating-point numbers:
\[ a = \pm\left( \frac{d_1}{b} + \frac{d_2}{b^2} + \cdots + \frac{d_t}{b^t} \right) \cdot b^{\alpha} . \]
Here α, t, b, d₁, d₂, … , dt are integers. The number b > 0 is said to be the base of the computer arithmetic (usually b = 2, as in the IEEE-754 binary format, or a power of 2 in computations, while humans prefer b = 10). Other bases, such as 4 or 8, were used in the early days of computer arithmetic, but studies performed in the 1970s showed that they are of little interest, except for 16. The rational number in parentheses is the mantissa---it essentially dictates the precision of the floating-point number, and α is the exponent of the floating-point number 𝑎. The numbers di ∈ {0, 1, … , b − 1} are termed digits, with d₁ ≠ 0, and t is the length of the mantissa. Finally, there are integers L and U that bound the exponent: L ≤ α ≤ U. A special floating-point number is 𝑎 = 0.

All arithmetic operations performed by computers over floating-point numbers are subject to a round-off procedure, which is a mapping of real numbers into floating-point numbers. Let fl(x) denote the rounding-off result for x. For instance, with b = 10 and t = 8, fl(π) = 0.31415927 × 10¹. Then

\[ fl(x) = x \left( 1 + \varepsilon \right) , \]
where |ε| ≤ η as long as fl(x) ≠ 0. We define η as the least upper bound for |ε|:
\[ \eta = \frac{1}{2}\,b^{1-t} . \]
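For instance, in IEEE-754 double precision we have b = 2 and t = 53, so η = 2⁻⁵³ ≈ 1.1 × 10⁻¹⁶; a quick Mathematica check (the built-in $MachineEpsilon equals b^(1−t), which is 2η):
     eta = (1/2) 2^(1 - 53);      (* unit roundoff for IEEE-754 double precision *)
     N[eta]                       (* expected: 1.11022*10^-16 *)
     $MachineEpsilon              (* expected: 2.22045*10^-16, which is 2 eta *)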

The dot product of two vectors, when calculated with floating-point numbers, can be affected by rounding errors. This is because floating-point arithmetic involves approximations, and repeated multiplication and addition can lead to accumulated errors, especially when dealing with very large or very small numbers (Avogadro's number ≈ 6.023 × 10²³ and Planck's constant ≈ 6.626068 × 10⁻³⁴).

Since xy does not necessarily have a representation in the floating-point format with bounded mantissa, there is no algorithm which always computes dot-products exactly. Thus, one seeks to construct an algorithm which computes a floating-point number with minimal deviation from the exact result where the deviation should not depend on the number of addends.

Although floating-point error analysis is usually done in numerical linear algebra, we consider here a simple algorithm for evaluation of the dot product x • y of two vectors:


s = 0;
for i = 1:n
    s = s + x(i)*y(i);
end

At the first step, we have
\[ fl(x_1 y_1 ) = x_1 y_1 \left( 1 + \delta_1 \right) , \qquad\mbox{with} \quad |\delta_1 | \le \eta . \]
The next step is evaluated with two rounding errors: one from the multiplication, (1 + δ₂), and then one from the addition, (1 + ϵ₂):
\begin{align*} s_2 &= \left( s_1 + x_2 y_2 \,(1 + \delta_2 ) \right) \left( 1 + \epsilon_2 \right) \\ &= x_1 y_1 \left( 1 + \delta_1 \right)\left( 1 + \epsilon_2 \right) + x_2 y_2 \left( 1 + \delta_2 \right)\left( 1 + \epsilon_2 \right) . \end{align*}
The structure becomes clearer with s₃ = (s₂ + x₃y₃(1 + δ₃))(1 + ϵ₃) or
\begin{align*} s_3 &= x_1 y_1 \left( 1 + \delta_1 \right) \left( 1 + \epsilon_2 \right) \left( 1 + \epsilon_3 \right) \\ & \quad + x_2 y_2 \left( 1 + \delta_2 \right) \left( 1 + \epsilon_2 \right) \left( 1+ \epsilon_3 \right) \\ & \quad +x_3 y_3 \left( 1 + \delta_3 \right) \left( 1 + \epsilon_3 \right) . \end{align*}
Since |δi| ≤ η, |ϵi| ≤ η, we can “simplify” a bit:
\begin{align*} s_3 &= x_1 y_1 \left( 1+ \delta_1 + \epsilon_2 + \epsilon_3 \right) + x_2 y_2 \left( 1 + \delta_2 + \epsilon_2 + \epsilon_3 \right) \\ & \quad + x_3 y_3 \left( 1 + \delta_3 + \epsilon_3 \right) + O(\eta^2 ) . \end{align*}
Each term above has one δ and some ϵ’s. The emerging pattern is
\[ s_k = \sum_{i=1}^k x_i y_i \left( 1 + \mbox{up to $k$ rounding terms} \right) + O(\eta^2 ) . \]
The number of ϵ terms goes down as i increases (the last term will only have 1).

Now we apply the triangle inequality (and |δi|, |ϵi| ≤ η) to the difference between the computed and exact values:

\begin{align*} \left\vert s_n - \mathbf{x} \bullet \mathbf{y} \right\vert &= \left\vert \sum_{i=1}^n x_i y_i \left( 1 + \mbox{up to $n$ rounding terms} \right) - \sum_{i=1}^n x_i y_i + O(\eta^2 ) \right\vert \\ &= \left\vert \sum_{i=1}^n x_i y_i \left( \mbox{up to $n$ rounding terms} \right) + O(\eta^2 ) \right\vert \\ & \le \sum_{i=1}^n \left\vert x_i \right\vert \left\vert y_i \right\vert n\,\eta + O(\eta^2 ) \\ &= n\eta\,|\mathbf{x}| \bullet |\mathbf{y}| + O(\eta^2 ) , \end{align*} since each rounding term is bounded in absolute value by η and there are at most n of them in every summand.
As long as xy ≠ 0, we can write this result as
\[ \frac{\left\vert \mathbf{x} \bullet \mathbf{y} - fl (\mathbf{x} \bullet \mathbf{y}) \right\vert}{\left\vert \mathbf{x} \bullet \mathbf{y} \right\vert} \le \eta n\,\frac{|\mathbf{x}| \bullet |\mathbf{y}|}{\left\vert \mathbf{x} \bullet \mathbf{y} \right\vert} + O( \eta^2 ) . \]
   
Example 29: We consider two vectors from ℚ4: \[ \mathbf{a} = \left( \frac{3}{7} , \ \frac{1}{3},\ \frac{12}{11} , \ \frac{7}{13} \right) , \quad \mathbf{b} = \left( \frac{7}{9} , \ \frac{3}{11} , \ \frac{11}{3} , \ \frac{13}{23} \right) . \] Their dot product in exact arithmetic is \[ \mathbf{a} \bullet \mathbf{b} = \frac{3589}{759} \approx 4.728590250329381 . \]
(3/7)*(7/9) + (1/3)*(3/11) + (12/11)*(11/3) + (7/13)*(13/23)
3589/759
If we round these vectors to two decimal places, we get \[ \mathbf{a}_2 = \left( 0.43 , \ 0.33,\ 1.09 , \ 0.54 \right) , \quad \mathbf{b}_2 = \left( 0.78 , \ 0.27 , \ 3.67 , \ 0.57 \right) . \] Their scalar product is \[ \mathbf{a}_2 \bullet \mathbf{b}_2 = 4.7326 . \]
0.43*0.78 + 0.33*0.27 + 1.09*3.67 + 0.54*0.57
4.7326
As you see, the second decimal place is not correct and error is −0.00400975. Now we consider the same vectors rounded to four decimal places: \[ \mathbf{a}_4 = \left( 0.4286 , \ 0.3333,\ 1.0909 , \ 0.5385 \right) , \quad \mathbf{b}_4 = \left( 0.7778 , \ 0.2727 , \ 3.6667 , \ 0.5652 \right) . \] Their dot product becomes \[ \mathbf{a}_4 \bullet \mathbf{b}_4 = 4.72862 . \] Its error is −0.0000289697.
0.4286*0.7778 + 0.3333*0.2727 + 1.0909*3.6667 + 0.5385*0.5652
4.72862
In order to convince you that a dot product with at least four terms typically leads (on average) to a change in the last digit (of course, when all calculations are performed in floating-point arithmetic), we consider another example.

We consider two numerical vectors with entries from ℚ4: \[ \mathbf{a} = \left( \frac{3}{7} , \ \frac{2}{5},\ \frac{6}{13} , \ \frac{11}{23} \right) , \quad \mathbf{b} = \left( \frac{7}{15} , \ \frac{5}{12} , \ \frac{13}{27} , \ \frac{23}{47} \right) . \] Their dot product in exact arithmetic is \[ \mathbf{a} \bullet \mathbf{b} = \frac{3481}{4230} \approx 0.8229314420803783 . \]

(3/7)*(7/15) + (2/5)*(5/12) + (6/13)*(13/27) + (11/23)*(23/47)
3481/4230
If we round these vectors to two decimal places, we get \[ \mathbf{a}_2 = \left( 0.43 , \ 0.4,\ 0.46 , \ 0.48 \right) , \quad \mathbf{b}_2 = \left( 0.47 , \ 0.42 , \ 0.48 , \ 0.49 \right) . \] Their scalar product is \[ \mathbf{a}_2 \bullet \mathbf{b}_2 = 0.8261 \approx 0.83 . \]
0.43*0.47 + 0.4*0.42 + 0.46*0.48 + 0.48*0.49
0.8261
As you see, the second decimal place is not correct: rounding 0.8261 to two decimal places gives 0.83 rather than 0.82, and the corresponding error is −0.00706856. Now we consider the same vectors rounded to four decimal places: \[ \mathbf{a}_4 = \left( 0.4286 , \ 0.4,\ 0.4615 , \ 0.4783 \right) , \quad \mathbf{b}_4 = \left( 0.4667 , \ 0.4167 , \ 0.4815 , \ 0.4894 \right) . \] Their dot product becomes \[ \mathbf{a}_4 \bullet \mathbf{b}_4 = 0.823 . \]
0.4286*0.4667 + 0.4*0.4167 + 0.4615*0.4815 + 0.4783*0.4894
0.823
   ■
End of Example 29

 

  1. Find the dot product of the following pairs of vectors. \[ {\bf (a)\ \ } \begin{pmatrix} 1 \\ -3 \\ 4 \end{pmatrix} , \quad \begin{pmatrix} 8 \\ 6 \\ 1 \end{pmatrix} \qquad {\bf (b)\ \ } \begin{pmatrix} 3 \\ 5 \\ 6 \end{pmatrix} , \quad \begin{pmatrix} -9 \\ 2 \\ 3 \end{pmatrix} \]
  2. Show that for any vectors x, y ∈ ℝn, we have \[ \| \mathbf{x} + \mathbf{y} \|^2 = \| \mathbf{x} \|^2 + 2\,\mathbf{x} \bullet \mathbf{y} + \| \mathbf{y} \|^2 . \]
  3. For vectors u, v ∈ ℝn, show that \( \displaystyle \quad (\mathbf{u} \bullet \mathbf{v}) = \frac{1}{4} \left( \| \mathbf{u} + \mathbf{v} \|^2 - \| \mathbf{u} - \mathbf{v} \|^2 \right) . \)
  4. Prove the parallelogram identity: \[ \| \mathbf{u} + \mathbf{v} \|^2 + \| \mathbf{u} - \mathbf{v} \|^2 = 2\, \| \mathbf{u} \|^2 + 2\,\| \mathbf{v} \|^2 . \]
  5. What is the angle between the vectors i + j and i + 3j?
  6. What is the area of the quadrilateral with vertices at (1, 1), (4, 2), (3, 7) and (2, 3)?
  7. Find cos(θ) where θ is the angle between the vectors \[ \left( 3, -2, 7 \right) \quad \mbox{and} \quad \left( 5,3,4 \right) . \]
  8. Find cos(θ) where θ is the angle between the vectors \[ \left( \right) \quad \mbox{and} \quad \left( \right) . \]
  9. Verify the Cauchy inequality for vectors \[ \left( \right) \quad \mbox{and} \quad \left( \right) . \]
  10. Find proju(v) where \[ \left( \right) \quad \mbox{and} \quad \left( \right) . \]
  11. Find proju(v) where \[ \left( \right) \quad \mbox{and} \quad \left( \right) . \]
  12. Decompose the vector v into v = v∥ + v⊥, the components parallel and orthogonal to u, where \[ \left( \right) \quad \mbox{and} \quad \left( \right) . \]
  13. Show that \[ \mathbf{u} \bullet \left( \mathbf{v} - \mbox{proj}_u (\mathbf{v}) \right) = 0 \] and conclude that every vector in ℝn can be written as the sum of two vectors, one of which is orthogonal and the other parallel to the given vector.
  14. Are the vectors u = (2, 3, −1, 4) and v = (?, ?, ?, ?) T orthogonal?
   
  1. Aldaz, J. M.; Barza, S.; Fujii, M.; Moslehian, M. S. (2015), "Advances in Operator Cauchy—Schwarz inequalities and their reverses", Annals of Functional Analysis, 6 (3): 275–295, doi:10.15352/afa/06-3-20
  2. Bunyakovsky, Viktor (1859), "Sur quelques inégalités concernant les intégrales ordinaires et les intégrales aux différences finies", Mem. Acad. Sci. St. Petersbourg, 7 (1): 6
  3. Cauchy, A.-L. (1821), "Sur les formules qui résultent de l'emploi du signe > ou <, et sur les moyennes entre plusieurs quantités", Cours d'Analyse, 1re Partie: Analyse Algébrique, 1821; Oeuvres, Ser. 2, III, 373--377
  4. Dray, T. and Manogue, C.A., The Geometry of the Dot and Cross Products, Journal of Online Mathematics and Its Applications, 6.
  5. Gibbs, J.W. and Wilson, E.B., Vector Analysis: A Text-Book for the Use of Students of Mathematics & Physics: Founded Upon the Lectures of J. W. Gibbs, Nabu Press, 2010.
  6. Magnus, J. R. (1988). Linear Structures. Charles Griffin, London.
  7. Marcus, M. & Minc, H. (1992). A Survey of Matrix Theory and Matrix Inequalities. Dover Publications. Corrected reprint of the 1969 edition.
  8. Schwarz, H. A. (1888), "Über ein Flächen kleinsten Flächeninhalts betreffendes Problem der Variationsrechnung" (PDF), Acta Societatis Scientiarum Fennicae, XV: 318, archived (PDF) from the original on 2022-10-09
  9. Solomentsev, E. D. (2001) [1994], "Cauchy inequality", Encyclopedia of Mathematics, EMS Press
  10. Steele, J. M. (2004). The Cauchy–Schwarz Master Class: An Introduction to the Art of Mathematical Inequalities. Cambridge University Press.