Recall that 𝔽 denotes one of the following number systems: ℤ (integers), ℚ (rational numbers), or ℝ (real numbers); strictly speaking, ℤ is a ring rather than a field. We exclude the complex numbers ℂ from consideration in this section because affine transformations are applied in geometry and computer graphics, which use only real numbers. We denote by \( \mathbb{F}^{m\times n} \) or \( \mathbb{F}^{m,n} \) the vector space of m-by-n matrices with entries from 𝔽.

According to Wikipedia, the term linear function can refer to two distinct concepts, based on the context:

  • In Calculus, a linear function is a polynomial function of degree zero or one; in other words, a function of the form f(x) = m x + b for some constants m and b ∈ ℝ.
  • In Linear Algebra, a linear function is a linear mapping, or linear transformation: f(λx + y) = λf(x) + f(y) for any scalar λ and any two vectors x and y.
A matrix A of size m-by-n (written as m × n) defines a linear map by multiplication from the left:
\begin{align*} \mathbb{F}^{n\times 1} &\longrightarrow \, \mathbb{F}^{m\times 1} , \\ \mathbb{F}^{n\times 1} \ni {\bf x} &\longrightarrow \, \mathbf{A}\,\mathbf{x} \in \mathbb{F}^{m\times 1} , \end{align*}
also denoted by \( \mathbf{A} : \mathbb{F}^{n\times 1} \to \mathbb{F}^{m\times 1} \). The same matrix defines a linear transformation between row vectors by multiplication from the right:
\begin{align*} \mathbb{F}^{1\times n} &\longleftarrow \, \mathbb{F}^{1\times m} , \\ \mathbb{F}^{1\times n} \ni {\bf v}\,{\bf A} &\longleftarrow \, \mathbf{v} \in \mathbb{F}^{1\times m} . \end{align*}
Such a map has the basic property A 0 = 0 for column vectors and 0 A = 0 for row vectors.
$Post := If[MatrixQ[#1], MatrixForm[#1], #1] & (* outputs matrices in MatrixForm*)
Remove[ "Global`*"] // Quiet (* remove all variables *)

Affine Transformations

An affine transformation or affinity (in 1748, Leonhard Euler introduced the term affine, which stems from the Latin affinis, "connected with") is a geometric transformation that preserves the parallelism of lines and the ratios of lengths of parallel segments. Affine transformations are closely related to projective transformations; this technique is widely used in computer graphics, image processing, machine learning, and neural networks to perform geometric transformations in a simple way using transformation matrices.

Although open graphics and computer vision libraries such as OpenGL and OpenCV support affine transformations, we prefer to use Mathematica and its built-in commands AffineTransform and TransformationMatrix.

The following definition may give the impression that an affine transformation is a less general object than in the mathematical literature; a formal, abstract definition will be given later. However, it shows where affine transformations come from. Moreover, it can be shown (see, for instance, Kostrikin & Manin's book) that, once coordinates are chosen, every affine transformation takes the following algebraic form.

(Algebraic Representation) Any map \( f : \mathbb{F}^{n\times 1} \to \mathbb{F}^{m\times 1} \) of the form \begin{equation} \label{EqAffine.1} \mathbb{F}^{n\times 1} \ni \mathbf{x} \longrightarrow f({\bf x}) = {\bf A}\,{\bf x} + \mathbf{b} \end{equation} for some fixed column vector \( \mathbf{b} \in \mathbb{F}^{m\times 1} \) and m-by-n matrix A is called an affine map or transformation. Similarly, a mapping between row vectors \begin{equation} \label{EqAffine.2} \mathbb{F}^{1\times m} \ni \mathbf{v} \longrightarrow f({\bf v}) = {\bf v}\,{\bf A} + \mathbf{w} \end{equation} for some given row vector \( \mathbf{w} \in \mathbb{F}^{1\times n} \) is also called an affine transformation.
Both formulae \eqref{EqAffine.1} and \eqref{EqAffine.2} are shorthand for the general transformation given by the system of equations
\[ \begin{cases} y_1 &= a_{1,1} x_1 + a_{1,2} x_2 + \cdots + a_{1,n} x_n + b_1 , \\ y_2 &= a_{2,1} x_1 + a_{2,2} x_2 + \cdots + a_{2,n} x_n + b_2 , \\ \ \vdots & \qquad \vdots \qquad \vdots \\ y_m &= a_{m,1} x_1 + a_{m,2} x_2 + \cdots + a_{m,n} x_n + b_m . \end{cases} \]

An affine map is a geometric transformation that preserves collinearity (i.e., all points lying on a line initially still lie on a line after the transformation) and ratios of distances along a line (e.g., the midpoint of a line segment remains the midpoint after the transformation), but not necessarily Euclidean distances and angles. Since f(0) = b in Eq.\eqref{EqAffine.1} (respectively, f(0) = w in Eq.\eqref{EqAffine.2}), such a map can be linear only when b = 0 or w = 0, respectively. Formulae \eqref{EqAffine.1} and \eqref{EqAffine.2} show that an affine transformation is the composition of a linear transformation (including scaling, homothety, similarity, reflection, rotation, shearing) and a translation.
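The two claims of the previous paragraph, that midpoints go to midpoints and that f(0) = b, can be checked symbolically. The following is a minimal sketch that borrows the matrix and shift of Example 2 below; the names fEx, r, s, r1, r2, s1, s2 are hypothetical.

```mathematica
(* Hypothetical names; the matrix and shift are those of Example 2 *)
Clear[r1, r2, s1, s2];
fEx = AffineTransform[{{{1, 1}, {0, 2}}, {1, -1}}];   (* x -> A x + b *)
r = {r1, r2}; s = {s1, s2};                           (* two arbitrary points *)
Simplify[fEx[(r + s)/2] - (fEx[r] + fEx[s])/2]        (* {0, 0}: midpoints go to midpoints *)
fEx[{0, 0}]                                           (* {1, -1}: f(0) = b, so f is not linear *)
```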

   
Example 1: A few countries (mainly the U.S. and its current and former territories, plus a few other countries in the Americas and the Pacific) still use the Fahrenheit (°F) temperature scale, while the majority of countries use the International System of Units (SI), which includes the Celsius (°C) scale. First proposed in 1724 by the physicist Daniel Gabriel Fahrenheit (who had also invented the mercury thermometer in 1714), the Fahrenheit scale was in wide use before the metric system became dominant. The Celsius scale is named after the Swedish astronomer Anders Celsius (1701–1744), who developed a variant of it in 1742.

Today, however, Fahrenheit has been replaced by the Celsius scale (and, in scientific applications, the Kelvin scale) in all but a handful of the world's countries. On the Kelvin scale, 0 K equals −273.15 °C, and the boiling point of water is 373.15 K, which is 100 °C. To convert Celsius (°C) into Fahrenheit (°F), use the formula \[ \mbox{F}^{\circ} = \frac{9}{5}\,\mbox{C}^{\circ} + 32 . \] For converting Fahrenheit (°F) into Celsius (°C), use the formula \[ \mbox{C}^{\circ} = \frac{5}{9} \left( \mbox{F}^{\circ} - 32 \right) . \] Neither the transformation from the Celsius scale into the Fahrenheit scale nor its inverse is a linear map, because neither preserves zero: 0 °C corresponds to 32 °F. Both are affine maps.    ■

End of Example 1
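The conversion formulas of Example 1 can be written as one-line Mathematica functions. The sketch below, with hypothetical names cToF and fToC, checks that the two maps are inverse to each other and that neither preserves zero or scaling, which is why they are affine rather than linear.

```mathematica
(* Hypothetical helper names; the formulas are those of Example 1 *)
Clear[c];
cToF[deg_] := 9/5 deg + 32;     (* Celsius -> Fahrenheit *)
fToC[deg_] := 5/9 (deg - 32);   (* Fahrenheit -> Celsius *)
cToF[100]                       (* 212 *)
fToC[212]                       (* 100 *)
Simplify[fToC[cToF[c]]]         (* c: the maps are inverse to each other *)
cToF[0]                         (* 32: zero is not mapped to zero *)
{cToF[2*10], 2*cToF[10]}        (* {68, 100}: f(2x) differs from 2 f(x) *)
```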

Basically, every affine transformation in 2D and 3D is one of the following five types or a composition of them:

  • Translate moves a set of points a fixed distance in each coordinate.
  • Scale scales a set of points up or down in each coordinate.
  • (Proper) Rotation (with determinant +1) rotates a set of points about the origin in the counterclockwise direction.
  • Improper rotation (with determinant −1) includes reflection (mirroring) and inversion.
  • Shear offsets a set of points by a distance proportional to their x, y, and/or z coordinates.
Note that only shear and non-uniform scaling change the shape determined by a set of points. The affine transformations that preserve angles, but not necessarily lengths, form the subclass of conformal maps.
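Mathematica has constructors for the basic types listed above: TranslationTransform, ScalingTransform, RotationTransform, ReflectionTransform, and ShearingTransform, each returning a TransformationFunction. The minimal sketch below uses arbitrarily chosen parameters and a hypothetical helper linDet that reads off the determinant of the linear part from the upper-left 2 × 2 block of TransformationMatrix.

```mathematica
(* The five basic types; all parameters are arbitrary illustrative choices *)
basic = <|
   "translation" -> TranslationTransform[{1, -1}],
   "scaling" -> ScalingTransform[{2, 3}],
   "rotation" -> RotationTransform[Pi/6],
   "reflection" -> ReflectionTransform[{1, 0}],
   "shear" -> ShearingTransform[Pi/4, {1, 0}, {0, 1}]|>;
(* hypothetical helper: determinant of the linear part (upper-left 2x2 block) *)
linDet[tf_] := Simplify[Det[TransformationMatrix[tf][[1 ;; 2, 1 ;; 2]]]];
linDet /@ basic
(* <|"translation" -> 1, "scaling" -> 6, "rotation" -> 1, "reflection" -> -1, "shear" -> 1|> *)
```

Only the reflection has determinant −1; the scaling multiplies areas by 6, while the other three preserve area.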

Example 2: Let us consider an affine transformation \[ \begin{split} x &\mapsto x+y +1 , \\ y &\mapsto 2\,y - 1 . \end{split} \] It is convenient to rewrite this transformation in matrix/vector form using either column vectors \[ \begin{bmatrix} x \\ y \end{bmatrix} \,\mapsto \, \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \phantom{-}1 \\ -1 \end{bmatrix} \] or row vectors \[ \begin{bmatrix} x & y \end{bmatrix} \,\mapsto \, \begin{bmatrix} x & y \end{bmatrix} \begin{pmatrix} 1&0 \\ 1&2 \end{pmatrix} + \begin{bmatrix} 1 & -1 \end{bmatrix} . \] Its linear part is a “shear” combined with a vertical stretch by the factor 2, and it is followed by a translation. The effect of this map on the square (a, b, c, d) is shown in the following figure. The image of this square is a parallelogram.

The vertical line x = −1 is mapped onto the line \[ X = y , \qquad Y = 2\,y - 1 , \qquad y \in \mathbb{R} , \] that is, onto the line Y = 2X − 1. The images of the vertices are \[ \left( -1,-1 \right) \mapsto \left( -1, -3 \right) , \quad \left( -1, 1 \right) \mapsto \left( 1, 1 \right) , \quad \left( 1, 1 \right) \mapsto \left( 3, 1 \right) , \quad \left( 1, -1 \right) \mapsto \left( 1, -3 \right) . \]
Clear[gr1, gr2]; gr1=(*Labeled[*) Graphics[{Opacity[0],Thick,EdgeForm[{Thick,Black}],(*Red,*)Rectangle[{-1,-1},{1,1}]}, Axes->True, GridLines->Automatic, GridLinesStyle->Directive[Red,Thick, Dashed], AspectRatio->Automatic ]
Graphics[{Thickness[0.01], Line[{{-1, -3}, {1, 1}, {3, 1}, {1, -3}, {-1, -3}}]}]
The following line of Wolfram code exports the graphic gr1 to PNG format:
Export[FileNameJoin[{NotebookDirectory[], "affineExample2.png"}], gr1, "PNG"];

Square before affine map.
       
Parallelogram after affine map.

Now we plot the same figures with the grid visible.

Clear[gr1, gr2]; gr1 =(*Labeled[*) Graphics[{Opacity[0], Thick, EdgeForm[{Thick, Black}],(*Red,*) Rectangle[{-1, -1}, {1, 1}]} , Axes -> True , GridLines -> Automatic , GridLinesStyle -> Directive[Red, Thick, Dashed] , AspectRatio -> Automatic ](*,"Square before\naffine map"]*);
We now use the built-in command AffineTransform, documented by Wolfram, in the form AffineTransform[{m, b}].

Note that we use the second form of AffineTransform, whose argument is a list {m, v}, where m is the matrix of the linear transformation and v is the shift (translation) vector. The TransformationFunction it produces, shown below, displays the augmented matrix in the usual form, with the vector b as the rightmost column, to the right of the dividing line.

t = AffineTransform[{{{1, 1}, {0, 2}}, {1, -1}}]
\( \displaystyle \quad \left( \begin{array}{cc|c} 1&1&1 \\ 0&2&-1 \\ \hline 0&0&1 \end{array} \right) \)
The lower-left corner of our square is {-1, -1}. To see the two steps performed by t, we first apply only the linear part to this corner and then translate the result, once with TranslationTransform and once by adding b directly; both give the image {-1, -3}.
AffineTransform[{{1, 1}, {0, 2}}][{-1, -1}]
TranslationTransform[{1, -1}][%]
%% + {1, -1}
{-2, -2}

{-1, -3}

{-1, -3}

The transformation function thus performs two operations: first the linear transformation given by the matrix m, and then the translation by the vector b.

Applying t to each corner of the square, we get the same vertex images as computed above:

origCorners = {{-1, -1}, {-1, 1}, {1, 1}, {1, -1}};
newCorners = t[#] & /@ origCorners
Thread[origCorners -> newCorners]
\( \displaystyle \quad \begin{pmatrix} -1&-3 \\ 1&1 \\ 3&1 \\ 1&-3 \end{pmatrix} \)
gr2 =(*Labeled[*) Graphics[{Opacity[0], EdgeForm[{Thick, Black}], Polygon[newCorners]} , GridLines -> Automatic , Axes -> True , GridLinesStyle -> Directive[Red, Thick, Dashed] , AspectRatio -> Automatic ](* ,"Parallelogram\nafter affine map"]*);
We show both figures side by side in a grid (note the different scales):
gr3 = Grid[{{gr1, gr2}}]

Square before affine map with grid.
       
Parallelogram after affine map with grid.
Clear[A, x, y, xy]
A = {{1, 1}, {0, 2}};
b = {1, -1};
xy = {x, y}
A . xy
{x + y, 2 y}
Solve[{(A . xy)[[1]] == 1, (A . xy)[[2]] == -1}, {x, y}]
\( \displaystyle \quad \left\{ \left\{ x\,\to\,\frac{3}{2} , \ y\,\to\, -\frac{1}{2} \right\} \right\} \)
t = AffineTransform[{A, b}]
TransformationFunction[ \( \displaystyle \left( \begin{array}{cc|c} 1&1&1 \\ 0&2&-1 \\ \hline 0&0&1 \end{array} \right) \) ]
t[{x, y}]
{1 + x + y, -1 + 2 y}
TransformationMatrix@AffineTransform[{A, b}] // MatrixForm
\( \displaystyle \quad \begin{pmatrix} 1&1&1 \\ 0&2&-1 \\ 0&0&1 \end{pmatrix} \)

Mathematica's FindGeometricTransform essentially reverses the process we just completed. Below we give it the new corners and the original corners and ask for an affine transformation that maps the latter onto the former. It returns an error estimate (here 0) together with our transformation above.

FindGeometricTransform[newCorners, origCorners, TransformationClass -> "Affine"] // Chop
{0, TransformationFunction[ \( \displaystyle \quad \left( \begin{array}{cc|c} 1&1&1 \\ 0&2&-1 \\ \hline 0&0&1 \end{array} \right) \) ]}

One might reasonably ask: "What does the last line mean? What does the bottom row, 0 0 | 1, of the TransformationFunction do?" Such questions lead to a broader one: "What makes a mathematician?" Getting close to these answers takes a little thought about the order in which things are learned, and a mathematician might say the first questions are premature. We will answer them later in this lesson. But people today are impatient. Wolfram Mathematica version 14 offers a "Chat-Enabled Notebook" that gives direct access to Artificial Intelligence: you can ask one of many Large Language Models about the last row of the TransformationFunction. Here is the answer you might get:

The last row "0 0 1" of a 3x3 transformation matrix is a part of the homogeneous coordinate representation used in affine transformations.
In homogeneous coordinates, a 2D point (x, y) is represented as (x, y, 1) and a 2D vector is represented as (x, y, 0). The reason for this is to allow for translations to be represented as matrix multiplications.

The "0 0 1" row in the transformation matrix ensures that when this matrix is multiplied with a point represented in homogeneous coordinates, the 1 in the third component of the point stays a 1. This is necessary to keep the point a point, and not a vector.

In contrast, if the transformation matrix is multiplied with a vector, the 0 in the third component of the vector stays a 0, ensuring that the vector remains a vector and is not translated. So, the "0 0 1" row is essentially a part of the mathematical machinery that allows for points and vectors to be treated differently by affine transformations, particularly translations. It doesn't directly affect the scaling, rotation, shear or translation applied to the points or vectors.
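The behavior described in this answer is easy to check with the augmented matrix of Example 2. In the sketch below (the name M is hypothetical), a point is encoded with third coordinate 1 and a vector with third coordinate 0.

```mathematica
(* Augmented matrix of the map of Example 2; M is a hypothetical name *)
M = TransformationMatrix[AffineTransform[{{{1, 1}, {0, 2}}, {1, -1}}]];
M                   (* {{1, 1, 1}, {0, 2, -1}, {0, 0, 1}} *)
M . {-1, -1, 1}     (* point (-1, -1): {-1, -3, 1}, the translation is applied *)
M . {-1, -1, 0}     (* vector (-1, -1): {-2, -2, 0}, only the linear part acts *)
```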

For now, patience is advised: we will revisit this later, in the subsection devoted to augmented (affine) matrices.    ■

End of Example 2
Theorem 1: If an affine transformation has an inverse, then it is also an affine transformation.
Let q = A x + b be an affine transformation written for column vectors, where A is a square matrix. It has an inverse exactly when A is a nonsingular matrix, that is, det(A) ≠ 0. Then A x = q − b. Multiplying both sides from the left by the inverse matrix A⁻¹ (which exists for nonsingular matrices), we obtain x = A⁻¹q − A⁻¹b. Hence, x = B q + v, where B = A⁻¹ and v = −A⁻¹b, which is again an affine transformation.
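The formulas B = A⁻¹ and v = −A⁻¹b from this proof can be tested on the map of Example 2. In this sketch (the names A1, b1, f1, g1 are hypothetical), g1 is built from those formulas and composed with f1.

```mathematica
(* Hypothetical names; A1 and b1 are the data of Example 2 *)
Clear[x, y];
A1 = {{1, 1}, {0, 2}}; b1 = {1, -1};
f1 = AffineTransform[{A1, b1}];                          (* q = A x + b *)
g1 = AffineTransform[{Inverse[A1], -Inverse[A1] . b1}];  (* x = B q + v *)
Inverse[A1]                 (* {{1, -1/2}, {0, 1/2}} *)
-Inverse[A1] . b1           (* {-3/2, 1/2} *)
Simplify[g1[f1[{x, y}]]]    (* {x, y}: g1 undoes f1 *)
```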
   
Example 3: Matrix \( \displaystyle \quad \mathbf{A} = \begin{bmatrix} 1 & 0 \\ 0&0 \end{bmatrix} \) maps all points to the x-axis, so it is the projection onto this axis. The area of any closed region becomes zero. We have det(A) = 0, which confirms that the area of any closed region is scaled by zero.

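In general, for any closed region, the area of its image under an affine transformation x ↦ A x + b equals the original area multiplied by |det(A)|, the absolute value of the determinant. The translation by b does not affect area, so the same factor applies to every linear mapping y = A x.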

Illustrating this in Mathematica requires some care, because a transformation matrix with zero determinant behaves differently from one with a nonzero determinant. First, we define a transformation matrix A with zero determinant:

A = {{1, 0}, {0, 0}}
Det[A]
\( \displaystyle \quad \begin{pmatrix} 1&0 \\ 0&0 \end{pmatrix} \)
0
Define a unit square as a list of points, not as a pair of vectors:
unitSquare = {{0, 0}, {1, 0}, {1, 1}, {0, 1}, {0, 0}}
\( \displaystyle \quad \begin{pmatrix} 0&0 \\ 1&0 \\ 1&1 \\ 0&1 \\ 0 &0 \end{pmatrix} \)
Wrapped in Polygon, this list of points becomes a two-dimensional region in Mathematica, and it has an area:
poly1 = Polygon[unitSquare]; Labeled[Show[Graphics[{LightRed, poly1} , Axes -> True]], "Polygon with area = " <> ToString[Area[poly1]], Top]
Polygon with area = 1

Apply the transformation matrix to the unit square
transformedSquare1 = A . # & /@ unitSquare
\( \displaystyle \quad \begin{pmatrix} 0&0 \\ 1&0 \\ 1&0 \\ 0&0 \\ 0&0 \end{pmatrix} \)
Compute the area of the transformed square
Area[Polygon[transformedSquare1]]
Undefined
This transformation matrix collapses the square onto a line segment. Geometrically, the area of the image is zero, but Mathematica's Area function returns Undefined here, since the polygon has degenerated into a one-dimensional set.

Now we use a transformation matrix, B, with a non-zero determinant.

B = {{2, 0}, {0, 1}}
\( \displaystyle \quad \begin{pmatrix} 2&0 \\ 0&1 \end{pmatrix} \)
B has a positive determinant:
Det[B]
2
Apply transformation matrix B to the unit square
transformedSquareB = B . # & /@ unitSquare
\( \displaystyle \quad \begin{pmatrix} 0&0 \\ 2&0 \\ 2&1 \\ 0&1 \\ 0&0 \end{pmatrix} \)
The unit square has area 1; this time its image has area 2, which equals det(B):
poly2 = Polygon[transformedSquareB]; areaB = Area[poly2]
2
In the plot below, the whole shaded rectangle is the image of the square; the darker part is the original unit square.
Show[{Graphics[{Red, Opacity[.3], poly2}, Axes -> True], Graphics[{Opacity[.1], poly1} ]}]
Transformation of a square.
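Example 3 uses determinants 0 and 2. To see why the scaling factor is the absolute value of the determinant, here is a sketch with a matrix of negative determinant and a nonzero shift; the names M6, w6, sq, image6 and the particular numbers are arbitrary choices, not taken from the text.

```mathematica
(* Arbitrary illustrative data; the names are hypothetical *)
M6 = {{1, 2}, {3, 4}};                   (* Det[M6] = -2 *)
w6 = {5, -1};                            (* a shift; it does not affect area *)
sq = {{0, 0}, {1, 0}, {1, 1}, {0, 1}};   (* unit square, area 1 *)
image6 = (M6 . # + w6) & /@ sq;          (* images of the corners *)
{Det[M6], Area[Polygon[image6]]}         (* {-2, 2}: area scales by Abs[Det[M6]] *)
```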
   ■
End of Example 3
Corollary 1: A composition of affine transformations is an affine transformation.
Let f(x) = A x + a and g(x) = B x + b be affine transformations. Then (g∘f)(x) = g(f(x)) = B(A x + a) + b = (B A)x + (B a + b), which is an affine transformation with matrix B A and shift vector B a + b.
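The identity in this proof can be verified for generic 2 × 2 data. In the sketch below, Array creates matrices and vectors with symbolic entries; all names (ca, cb, ua, ub, mA, mB, va, vb, fC, gC) are hypothetical.

```mathematica
(* Generic data with symbolic entries; all names are hypothetical *)
Clear[ca, cb, ua, ub, x1, x2];
mA = Array[ca, {2, 2}];   (* matrix A *)
mB = Array[cb, {2, 2}];   (* matrix B *)
va = Array[ua, 2];        (* shift a *)
vb = Array[ub, 2];        (* shift b *)
fC[x_] := mA . x + va;
gC[x_] := mB . x + vb;
Expand[gC[fC[{x1, x2}]] - ((mB . mA) . {x1, x2} + (mB . va + vb))]   (* {0, 0} *)
```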
   
Example 4: We consider two affine maps: \[ f \,:\ \mathbb{A}^3 \to \mathbb{A}^2 \qquad \mbox{and} \qquad g\,:\ \mathbb{A}^2 \to \mathbb{A}^2 , \] defined explicitly as \[ f({\bf x}) = {\bf A}\,{\bf x} + {\bf b} = {\bf y} \quad \mbox{and} \qquad g({\bf y}) = {\bf B}\,{\bf y} + {\bf w} , \] where \[ {\bf A} = \begin{bmatrix} 1&2&3 \\ 3&2&1 \end{bmatrix}, \qquad {\bf B} = \begin{bmatrix} 1& -3 \\ 2& -2 \end{bmatrix}, \] and \[ {\bf b} = \begin{bmatrix} -3 \\ \phantom{-}2 \end{bmatrix}, \qquad {\bf w} = \begin{bmatrix} \phantom{-}1 \\ -5 \end{bmatrix} . \] If \( \mathbf{x} = (x_1, x_2, x_3)^{\mathrm T} \in \mathbb{R}^{3\times 1} \) is an arbitrary column vector, then \[ {\bf y} = {\bf A}\,{\bf x} + {\bf b} = \begin{bmatrix} x_1 + 2\, x_2 + 3\, x_3 \\ 3\, x_1 + 2\, x_2 + x_3 \end{bmatrix} + \begin{bmatrix} -3 \\ \phantom{-}2 \end{bmatrix} . \]
Clear[A, B, y, b, w, x, x1, x2, x3];
A = {{1, 2, 3}, {3, 2, 1}}; b = {-3, 2};
y = A . {x1, x2, x3} + b
{-3 + x1 + 2 x2 + 3 x3, 2 + 3 x1 + 2 x2 + x3}
Then we apply transformation g and obtain \[ {\bf B} \left( {\bf A}\,{\bf x} + {\bf b} \right) + {\bf w} = {\bf B} \,{\bf A}\,{\bf x} + {\bf B} \, {\bf b} + {\bf w} = \begin{bmatrix} -8\, x_1 - 4\, x_2 \\ -4\, x_1 + 4\, x_3\end{bmatrix} + \begin{bmatrix} -8 \\ -15 \end{bmatrix} . \]
B = {{1, -3}, {2, -2}}; w = {1, -5};
Simplify[B . y + w]
{-4 (2 + 2 x1 + x2), -15 - 4 x1 + 4 x3}
We check the answer with Mathematica:
B.A
\( \displaystyle \quad \begin{pmatrix} -8 & -4 & 0 \\ -4&0&4 \end{pmatrix} \)
B . b + w
{-8, -15}
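Composition also corresponds to multiplication of augmented matrices, the structure displayed by TransformationMatrix in Example 2. As a sketch, a hypothetical helper augment builds the (m + 1) × (n + 1) augmented matrix of x ↦ M x + v by hand, since the map f of this example goes from three to two dimensions.

```mathematica
(* Data of Example 4 *)
A = {{1, 2, 3}, {3, 2, 1}}; b = {-3, 2};
B = {{1, -3}, {2, -2}}; w = {1, -5};
(* hypothetical helper: augmented matrix of x -> mm . x + vv *)
augment[mm_, vv_] := Append[MapThread[Append, {mm, vv}],
   Append[ConstantArray[0, Length[First[mm]]], 1]];
augment[A, b]                  (* {{1, 2, 3, -3}, {3, 2, 1, 2}, {0, 0, 0, 1}} *)
augment[B, w] . augment[A, b]  (* {{-8, -4, 0, -8}, {-4, 0, 4, -15}, {0, 0, 0, 1}} *)
```

The upper-left block of the product is B A and its last column holds B b + w, matching the computation above.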
   ■
End of Example 4

The basic properties of affine transformations are summarized in the following statement.

Theorem 2: Let f(x) = A x + b be an affine transformation with a nonsingular square matrix A. Then f
  1. maps a line to a line,
  2. maps a line segment to a line segment,
  3. preserves the property of parallelism among lines and line segments
  4. maps an n-gon to an n-gon,
  5. maps a parallelogram to a parallelogram,
  6. preserves the ratio of lengths of two parallel segments, and
  7. preserves the ratio of areas of two figures.
  1. Let L be a line and let L: p + tm, t ∈ ℝ, be an equation of L in vector form. Then for every t ∈ ℝ, \[ f \left( \mathbf{p} + t\,{\bf m} \right) = \mathbf{A}\left( \mathbf{p} + t\,{\bf m} \right) + \mathbf{b} = \mathbf{p}_1 + t\,\mathbf{m}_1 , \] where p₁ = A p + b and m₁ = A m. Hence, f(L) = L₁, where L₁ : p₁ + tm₁, t ∈ ℝ, is again a line.
  2. The proof is the same as that for (1), with t restricted to [0, 1].
  3. Suppose that L: p + tm and L₁ : p₁ + tm₁, t ∈ ℝ, are parallel lines. Then m₁ = km for some k ∈ ℝ. Therefore, \begin{align*} f \left( \mathbf{p} + t\,\mathbf{m} \right) &= \mathbf{A} \left( \mathbf{p} + t\,\mathbf{m} \right) + {\bf b} = \left( \mathbf{A}\,\mathbf{p} + {\bf b} \right) + t \left( {\bf A}\,{\bf m} \right) = \mathbf{q} + t\,\mathbf{n} , \\ f \left( \mathbf{p}_1 + t\,\mathbf{m}_1 \right) &= f \left( \mathbf{p}_1 + t\,k\,\mathbf{m} \right) = \mathbf{A} \left( \mathbf{p}_1 + t\,k\,\mathbf{m} \right) + {\bf b} \\ &= \left( \mathbf{A} \, \mathbf{p}_1 + {\bf b} \right) + t\,k \left( \mathbf{A} \,{\bf m} \right) = \mathbf{p}_2 + t\,k\,\mathbf{n} , \end{align*} where q = A p + b, n = A m, and p₂ = A p₁ + b. That is, L and L₁ are mapped to lines with direction vectors n and k n, which are parallel.

    The proof for two line segments, or for a line and a line segment, is analogous.

  4. We prove this by strong induction on n. For the base case, n = 3, consider a triangle T. Then T and its interior can be represented in vector form as T : u + sv + tw, where s, t ∈ [0, 1], s + t ≤ 1, and the vectors v and w are not collinear. Then \begin{align*} f(T) &= f \left( {\bf u} + s {\bf v} + t {\bf w} \right) = \mathbf{A} \left( {\bf u} + s {\bf v} + t {\bf w} \right) + {\bf b} \\ &= \left( {\bf A}\,{\bf u} + {\bf b} \right) + s \left( \mathbf{A}\,{\bf v} \right) + t \left( \mathbf{A}\,{\bf w} \right) \\ &= {\bf u}_1 + s{\bf v}_1 + t{\bf w}_1 , \end{align*} where s, t ∈ [0, 1], s + t ≤ 1. Assuming A is nonsingular, v₁ = A v and w₁ = A w are not parallel: if A v = c A w for some scalar c, then A(v − c w) = 0, so v = c w, contradicting the choice of v and w. Thus, T is mapped to a triangle T₁, which completes the proof of the base case.

    Now suppose that f maps each n-gon to an n-gon for all n with 3 ≤ n ≤ k, and let P be a polygon with k + 1 sides. We know that every polygon with at least 4 sides has a diagonal contained completely in its interior. Let \( \displaystyle \quad \overline{AB} \) be such a diagonal in P. This diagonal divides P into two polygons, P₁ and P₂, containing t and k + 3 − t sides, respectively, for some t with 3 ≤ t ≤ k (the diagonal is a side of both). By the inductive hypothesis, f(P₁) and f(P₂) are t-sided and (k + 3 − t)-sided polygons, respectively. Since these two polygons share the segment from f(A) to f(B) as a side, their union f(P) is a polygon with t + (k + 3 − t) − 2 = k + 1 sides, which concludes the proof.

  5. The proof that a parallelogram is mapped to a parallelogram is analogous to the proof that triangles get mapped to triangles in part (4), by simply dropping the condition that s + t ≤ 1.
  6. Consider parallel line segments S₁ and S₂, given in vector form as Sᵢ : pᵢ + t uᵢ, t ∈ [0, 1], i = 1, 2. Because they are parallel, u₂ = k u₁ for some k ∈ ℝ. As |uᵢ| is the length of Sᵢ, the ratio of the lengths of S₂ and S₁ is |k|. From parts (1) and (2), Sᵢ is mapped onto a segment of length |A uᵢ|. Since A u₂ = A(k u₁) = k(A u₁), we have |A u₂| = |k| |A u₁|, which shows that the ratio of the lengths of f(S₂) and f(S₁) is also |k|.
  7. You are encouraged to prove this property!
   
Example 5:

Part 1:

In order to illustrate the first property, we plot a line and compare it with the transformed line. We consider Eq.(1) with \[ {\bf A} = \begin{bmatrix} 1&-2 \\ 3&-1 \end{bmatrix} , \qquad {\bf b} = \begin{bmatrix} 5 \\ 6 \end{bmatrix} . \] Since A is not the identity matrix, this map is not merely a translation by b: the transformed line is, in general, neither parallel to the original line nor a shifted copy of it, but it is still a line.
Clear[p, m, A, b, p1, m1];
p = {1, 2};
m = {3, 4};
A = {{1, -2}, {3, -1}};
b = {5, 6};
p1 = A . p + b;
m1 = A . m;
Labeled[ ParametricPlot[{p + t*m, p1 + t*m1}, {t, -2, 2}, PlotStyle -> Thickness[0.016], PlotLegends -> {"Original Line", "Transformed Line"}], "Transformation of a line"]
Transformation of a line.

Part 2:

We consider the affine transformation \[ \begin{bmatrix} x \\ y \end{bmatrix} \,\to\, \begin{bmatrix} 2&-1 \\ 1&1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} 5 \\ 6 \end{bmatrix} . \] The generated plot shows that this transformation maps lines into lines. Using Mathematica, we define the affine transformation and then plot the original straight line and its image.
Clear[p, m, A, b, p1, m1]; p = {1, 2}; m = {3, 4}; A = {{2, -1}, {1, 1}}; b = {5, 6}; p1 = A . p + b; m1 = A . m; Labeled[ParametricPlot[{p + t*m, p1 + t*m1}, {t, -2, 2}, PlotStyle -> Thickness[0.03], PlotLegends -> {"Original Line", "Transformed Line"}], "Transformation of a line segment"]
Transformation of a line segment.

Part 3: We generate a plot showing the original lines and their transformed versions. The transformed lines are parallel to each other, just like the original lines. First, we define the affine transformation

Clear[f, x, L1, L2]; f[x_] := {{1,2}, {3,4}} . x + {1, -1}
Now, define some arbitrary parallel lines with slope 2:
L1[t_] := {t, 2*t + 1}
L2[t_] := {t, 2*t - 1}
Applying the affine transformation to the lines, we get
L1Transformed[t_] := f[L1[t]]
L2Transformed[t_] := f[L2[t]]
Plot the original lines and their transformed versions
ParametricPlot[{L1[t], L2[t], L1Transformed[t], L2Transformed[t]}, {t, -2, 2}, PlotStyle -> Thickness[0.007], PlotLegends -> {"L1", "L2", "L1 Transformed", "L2 Transformed"}]
Transformation of two parallel lines.
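Parallelism can also be confirmed without a picture by comparing the direction vectors of the images. This sketch redefines the map and the two lines under hypothetical names (fPar, line1, line2, tp) so that it runs on its own.

```mathematica
(* Hypothetical names; same map and lines as in Part 3 *)
Clear[tp];
fPar[x_] := {{1, 2}, {3, 4}} . x + {1, -1};
line1[tp_] := {tp, 2 tp + 1};
line2[tp_] := {tp, 2 tp - 1};
{D[fPar[line1[tp]], tp], D[fPar[line2[tp]], tp]}   (* {{5, 11}, {5, 11}}: equal directions *)
```

Both images have direction A (1, 2) = (5, 11), the image of the common direction of L1 and L2.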

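Part 4: We have two parts. The first part plots a triangle T and its image T1 side by side. Here T1 is the image of T under the affine map f(x) = 2x + (1, 1): the vertex u goes to u1 = f(u), and the edge vectors v and w are doubled, giving v1 = 2v and w1 = 2w.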

Clear[u, v, w, T, u1, v1, w1, T1]; u = {0, 0}; v = {1, 0}; w = {0, 1}; T = ConvexHullMesh[{u, u + v, u + w}]; u1 = {1, 1}; v1 = {2, 0}; w1 = {0, 2}; T1 = ConvexHullMesh[{u1, u1 + v1, u1 + w1}]; GraphicsRow[{Labeled[ Show[T, Axes -> True, PlotRange -> {{-1, 2}, {-1, 2}}], "Original\ntriangle", Top], Labeled[Show[T1, Axes -> True, PlotRange -> {{-1, 3}, {-1, 3}}], "Transformed\ntriangle", Top]}]
Original image and its transformation.

The second part of item #4 generalizes this: it plots a quadrilateral (the unit square, drawn as two triangles) and its image under the same map (also drawn as two triangles), which is again a quadrilateral, namely a square of side 2.

Clear[z, T1, T2, z1]; z = {1, 1}; T1 = ConvexHullMesh[{u, v, z}]; T2 = ConvexHullMesh[{u, w, z}]; z1 = u1 + v1 + w1; (* z1 = f(z) = {3, 3} *) T1Prime = ConvexHullMesh[{u1, u1 + v1, z1}]; T2Prime = ConvexHullMesh[{u1, u1 + w1, z1}]; GraphicsRow[{ Labeled[ Show[{T1, T2}, Axes -> True, PlotRange -> {{-1, 2}, {-1, 2}}], "Original\nquadrilateral", Top], Labeled[ Show[{T1Prime, T2Prime}, Axes -> True, PlotRange -> {{-1, 4}, {-1, 4}}], "Transformed\nquadrilateral", Top] }]
Square under affine map.

Part 5: Here we need to illustrate that an affine transformation maps a parallelogram into a parallelogram.

Define a parallelogram:

originalParallelogram = Polygon[{{1, 2}, {3, 5}, {6, 4}, {4, 1}}]
Define the affine transformation
transformation = AffineTransform[{{{2, 0.3}, {0.5, 1.5}}, {1, -1}}]
Apply the transformation to the parallelogram
transformedParallelogram = GeometricTransformation[originalParallelogram, transformation]
Visualize the original and transformed parallelograms
Graphics[{EdgeForm[{Thick, Black}], Opacity[.3], Black, originalParallelogram, LightRed, transformedParallelogram}, Axes -> True]
Affine transformation of the parallelogram
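To check that opposite sides stay parallel (and of equal length), compare the edge vectors of the transformed vertices. This sketch replaces 0.3 with the exact value 3/10 so that the comparison is exact; the names t5 and img5 are hypothetical.

```mathematica
(* Same map as above with exact entries; names are hypothetical *)
t5 = AffineTransform[{{{2, 3/10}, {1/2, 3/2}}, {1, -1}}];
img5 = t5 /@ {{1, 2}, {3, 5}, {6, 4}, {4, 1}};   (* images of the four vertices *)
{img5[[2]] - img5[[1]], img5[[3]] - img5[[4]]}    (* {{49/10, 11/2}, {49/10, 11/2}} *)
{img5[[4]] - img5[[1]], img5[[3]] - img5[[2]]}    (* {{57/10, 0}, {57/10, 0}} *)
```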

Part 6:

Here we illustrate that an affine transformation preserves the ratio of lengths of parallel segments. We define several line segments, transform them, and compare the ratios of their lengths before and after the transformation.

Let us consider a simple affine transformation f(x) = A x + b with \[ {\bf A} = \begin{bmatrix} 3&4 \\ -1&2 \end{bmatrix} , \qquad {\bf b} = \begin{bmatrix} 2 \\ -1 \end{bmatrix} , \] and four line segments, each joining the origin to the indicated point: S₁ = [1, 2] of length √5, and S₂ = [2, 4], S₃ = [4, 2], S₄ = [−2, 4] of length √20 = 2√5, which is twice the length of S₁. Among them, only S₂ is parallel to S₁.

Under f, the endpoints of these segments are mapped to f([1, 2]) = [13, 2], f([2, 4]) = [24, 5], f([4, 2]) = [22, −1], and f([−2, 4]) = [12, 9].

A = {{3, 4}, {-1, 2}}; b = {{2}, {-1}}; A . {1, 2} + b
{{13}, {2}}
A . {2, 4} + b
{{24}, {5}}
A . {4, 2} + b
{{22}, {-1}}
A . {-2, 4} + b
{{12}, {9}}
A naive computation compares the norms of these image points (with the aid of Mathematica):
N[Norm[A . {2, 4} + b]/Norm[A . {1, 2} + b]]
1.86386
and
N[Norm[A . {4, 2} + b]/Norm[A . {1, 2} + b]]
1.67436
N[Norm[A . {-2, 4} + b]/Norm[A . {1, 2} + b]]
1.14043
None of these ratios equals 2, which may seem surprising. The reason is that ∥A v + b∥ is the distance from the origin to the image point f(v), not the length of the image segment. The segment from 0 to v is mapped onto the segment from f(0) = b to f(v) = A v + b, whose length is ∥f(v) − f(0)∥ = ∥A v∥, independent of the shift vector b (a translation does not change lengths). Therefore, the lengths of the image segments are \begin{align*} \| f(S_1 ) \| &= \| {\bf A}\,[1, 2] \| = \| [11, 3] \| = \sqrt{130} , \\ \| f(S_2 ) \| &= \| {\bf A}\,[2, 4] \| = \| [22, 6] \| = 2\,\sqrt{130} , \\ \| f(S_3 ) \| &= \| {\bf A}\,[4, 2] \| = \| [20, 0] \| = 20 , \\ \| f(S_4 ) \| &= \| {\bf A}\,[-2, 4] \| = \| [10, 10] \| = 10\,\sqrt{2} . \end{align*}

Norm[A . {1, 2}]
Sqrt[130]
Norm[A . {2, 4}]
2 Sqrt[130]
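Thus f(S₂) is exactly twice as long as f(S₁), as property 6 of Theorem 2 predicts. On the other hand, S₃ and S₄ are not parallel to S₁, so property 6 does not apply to them, and indeed the ratios 20/√130 and 10√2/√130 differ from 2: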
Norm[A . {4, 2}]
20
Norm[A . {-2, 4}]
10 Sqrt[2]
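A reliable way to measure an image segment is by the distance between the images of its endpoints. The following sketch (hypothetical names A6, b6, f6, imageLength, seg1, seg2) applies this to S₁ and to a segment parallel to it, with k = −3 and a different starting point.

```mathematica
(* Hypothetical names; the map is the one of Part 6, with b as a plain vector *)
A6 = {{3, 4}, {-1, 2}}; b6 = {2, -1};
f6[x_] := A6 . x + b6;
imageLength[{p_, q_}] := Norm[f6[q] - f6[p]];   (* length of the image of segment [p, q] *)
seg1 = {{0, 0}, {1, 2}};                         (* S1 *)
seg2 = {{5, -3}, {5, -3} - 3 {1, 2}};            (* parallel to S1 (k = -3), 3 times as long *)
imageLength[seg1]                                (* Sqrt[130] *)
imageLength[seg2]/imageLength[seg1]              (* 3 *)
```

The ratio is |k| = 3 even though k is negative and the second segment does not start at the origin.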

Finally, we plot two parallel segments, S₁ of length 1 and S₂ of length 2, together with their images under the affine map f(x) = 2x + (1, 1). The ratio of the lengths of the transformed segments is 2, the same as the ratio of the lengths of the original segments.

Clear[f, u1, S1, p1, p2, u2, S2, S1Prime, S2Prime, gr3, gr4]; f[x_] := 2*x + {1, 1}; p1 = {0, 0}; u1 = {1, 0}; S1 = Line[{p1, p1 + u1}]; p2 = {0, 1}; u2 = {2, 0}; S2 = Line[{p2, p2 + u2}]; S1Prime = Line[{f[p1], f[p1 + u1]}]; S2Prime = Line[{f[p2], f[p2 + u2]}]; gr3 = Labeled[Graphics[{Thickness[0.02], {S1, S2}}, Axes -> True, PlotRange -> {{-1, 3}, {-1, 3}}], "Original\nlines", Top]; gr4 = Labeled[ Graphics[{Thickness[0.02], {S1Prime, S2Prime}}, Axes -> True, PlotRange -> {{-1, 5}, {-1, 5}}], "Transformed\nlines", Top]; Grid[{{gr3, gr4}}]
Two parallel lines.
     
Lines upon affine map.

Part 7: This example illustrates the fact that an Affine Transformation preserves the ratios of areas of polygons.

Define the polygons (that we choose as parallelogram and a triangle):

parallelogram = Polygon[{{0, 0}, {1, 0}, {2, 1}, {1, 1}}]; triangle = Polygon[{{2, 1}, {3, 1}, {3, 2}}];
Define an affine transformation x ↦ A x + b, where \[ {\bf A} = \begin{bmatrix} -3&2 \\ \phantom{-}1&2 \end{bmatrix} , \qquad {\bf b} = \begin{bmatrix} \phantom{-}2 \\ -4 \end{bmatrix} . \tag{5.7.1} \]
m = {{-3, 2}, {1, 2}}; b = {2, -4}; t = AffineTransform[{m, b}]
TransformationFunction[\( \displaystyle \quad \left( \begin{array}{cc|c} -3&2&2 \\ 1&2&-4 \\ \hline 0&0&1 \end{array} \right) \) ]
Compute the areas before the transformation
area1Before = Area[parallelogram]; area2Before = Area[triangle]; ratioBefore = area1Before/area2Before;
Before applying the transformation in Mathematica, we find by hand the equations of the lines that bound the transformed parallelogram. For example, under the affine transformation \[ \begin{split} X &= -3\,x + 2\,y +2 , \\ Y &= x + 2\,y -4 , \end{split} \] the line y = 0 leads to the pair \[ \begin{split} X &= -3\,x +2 , \\ Y &= x -4 . \end{split} \] From the latter, we get x = Y + 4. Substituting x into the former equation, we obtain \[ X = -3 \left( Y+4 \right) +2 \qquad \Longrightarrow \qquad X = -3\,Y - 10 . \] Similarly, we express in the new coordinates the equations of all lines that bound the given parallelogram: \begin{align*} y = 0 \ & \mapsto \ Y = -\frac{1}{3}\,X - \frac{10}{3} , \\ y = x \ & \mapsto \ Y = -3\,X + 2 , \\ y = 1 \ & \mapsto \ Y = - \frac{1}{3}\, X - \frac{2}{3} , \\ y = x -1 \ & \mapsto \ Y = -3\,X - 6 . \end{align*} So the corresponding vertices are mapped as follows:
images = t /@ {{0, 0}, {1, 0}, {2, 1}, {1, 1}};
Thread[{{0, 0}, {1, 0}, {2, 1}, {1, 1}} -> images] // TableForm
\begin{align*} (0, 0) \ & \mapsto \ (2, -4) , \\ (1, 0) \ & \mapsto \ (-1, -3) , \\ (2, 1) \ & \mapsto \ (-2, 0) , \\ (1, 1) \ & \mapsto \ (1, -1) . \end{align*}
Apply the transformation to the polygons:
transformedParallelogram = GeometricTransformation[parallelogram, t]; transformedTriangle = GeometricTransformation[triangle, t];
Compute the areas after the transformation
area1After = Area[transformedParallelogram]; area2After = Area[transformedTriangle]; ratioAfter = area1After/area2After
2
Compare the ratios
ratiosEqual = ratioBefore == ratioAfter
True
Plot the original and transformed polygons
legEG5g = Labeled[GraphicsRow[ { Graphics[{Red, parallelogram, Blue, triangle}, Frame -> True, Axes -> True], Graphics[{Red, transformedParallelogram, Blue, transformedTriangle}, Frame -> True, Axes -> True] }, ImageSize -> Large ], "Transformed graphics with equal ratio of area"]
(* Print the result *)
Print["Ratio before transformation: ", ratioBefore];
Print["Ratio after transformation: ", ratioAfter];
Print["Are the ratios equal? ", ratiosEqual];
Parallelogram and triangle.
     
Areas after affine map.
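As a cross-check that does not depend on GeometricTransformation, one may compute the image areas from the transformed vertex lists directly. A minimal sketch with hypothetical names m7, v7, img7, para, tri:

```mathematica
(* Hypothetical names; the map is (5.7.1) *)
m7 = {{-3, 2}, {1, 2}}; v7 = {2, -4};
img7[pts_] := Polygon[(m7 . # + v7) & /@ pts];   (* polygon through the image vertices *)
para = {{0, 0}, {1, 0}, {2, 1}, {1, 1}};         (* area 1 *)
tri = {{2, 1}, {3, 1}, {3, 2}};                  (* area 1/2 *)
{Area[img7[para]], Area[img7[tri]]}              (* {8, 4} *)
Area[img7[para]]/Area[img7[tri]]                 (* 2 *)
```

Each area is multiplied by |det A| = 8, so the ratio 2 is unchanged.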

Part 7A:

Now we extend this part by considering the upper half of the unit disk instead of the triangle: \[ C = \left\{ (x, y) \in \mathbb{R}^2 \ : \ x^2 + y^2 \le 1, \quad y\ge 0 \right\} . \tag{5.7.2}\]

semiCir = ParametricPlot[{Cos[theta], Sin[theta]}, {theta, 0, Pi}, AxesLabel -> {"x", "y"}, PlotRange -> {-1, 1}, PlotStyle -> {Red, Thickness[0.01]}]
semiReg1 = RegionPlot[y^2 <= 1 - x^2 && y >= 0, {x, -1, 1}, {y, -1, 1}, Frame -> False]; semiReg2 = RegionPlot[ImplicitRegion[y^2 <= 1 - x^2 && y >= 0, {x, y}], AspectRatio -> Automatic]
Boundary of a unit semi-circle.
     
Area bounded by semi-circle.
The area of the upper half of the unit disk (the semicircle of radius 1) is
Area[ImplicitRegion[y^2 <= 1 - x^2 && y >= 0, {x, y}]]
\[Pi]/2
Since the area of the parallelogram through points (0,0), (1, 0), (2, 1), (1,1)
parallelogram = Polygon[{{0, 0}, {1, 0}, {2, 1}, {1, 1}}]; area1Before = Area[parallelogram]
1
is 1, the ratio of the area of the semicircle to the area of the parallelogram becomes \[ \frac{\mbox{area of semicircle}}{\mbox{area of parallelogram}} = \frac{\pi /2}{1} = \frac{\pi}{2} \approx 1.5708 . \]
N[Pi/2]
1.5708
The upper boundary of the given semicircle is the arc \[ y = \sqrt{1 - x^2} , \qquad -1 \le x \le 1 . \] In order to determine the equations of the boundary of the transformed region, we invoke the affine transformation \[ \begin{split} X &= -3\,x + 2\,y +2 , \\ Y &= x + 2\,y -4 , \end{split} \] which in matrix/vector form reads \[ \begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} -3&2 \\ \phantom{-}1 & 2\end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} \phantom{-}2 \\ -4 \end{bmatrix} . \] We can express the old variables (x, y) through the new ones (X, Y): \begin{align*} x &= \frac{1}{4} \left( 6 - X + Y \right) , \\ y &= \frac{1}{8} \left( 10 + X + 3\, Y \right) . \end{align*}
Solve[{X == -3*x + 2*y + 2, Y == x + 2*y - 4}, {x, y}]
{{x -> 1/4 (6 - X + Y), y -> 1/8 (10 + X + 3 Y)}}
We can express this inverse transformation in succinct form: \[ \begin{bmatrix} x \\ y \end{bmatrix} = \frac{1}{8} \begin{bmatrix} -2&2 \\ \phantom{-}1 &3 \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix} + \begin{bmatrix} 3/2 \\ 5/4 \end{bmatrix} \tag{5.7.3} \] because \[ {\bf A}^{-1} = \frac{1}{8} \begin{bmatrix} -2&2 \\ \phantom{-}1 & 3 \end{bmatrix} , \qquad \det{\bf A} = -8 . \]
Inverse[{{-3, 2}, {1, 2}}]
{{-(1/4), 1/4}, {1/8, 3/8}}
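As a cross-check of Eq. (5.7.3), here is a short Python sketch (our own, using the standard `fractions` module; `forward` and `inverse` are illustrative names). It inverts the 2 × 2 matrix exactly via A⁻¹ = (1/det A)·adj A, computes the translation part −A⁻¹b of the inverse map, and confirms that the inverse undoes the forward map on a grid of integer points.

```python
from fractions import Fraction as F

# Affine map of Example 5: (X, Y) = A (x, y) + b
A = [[-3, 2], [1, 2]]
b = [2, -4]


def forward(x, y):
    return A[0][0] * x + A[0][1] * y + b[0], A[1][0] * x + A[1][1] * y + b[1]


# Inverse of a 2x2 matrix [[p, q], [r, s]] is (1/det) [[s, -q], [-r, p]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]          # -8
A_inv = [[F(A[1][1], det), F(-A[0][1], det)],
         [F(-A[1][0], det), F(A[0][0], det)]]

# Translation part of the inverse map: c = -A^{-1} b
c = [-(A_inv[0][0] * b[0] + A_inv[0][1] * b[1]),
     -(A_inv[1][0] * b[0] + A_inv[1][1] * b[1])]


def inverse(X, Y):
    """(x, y) = A^{-1} (X, Y) + c, i.e. Eq. (5.7.3)."""
    return (A_inv[0][0] * X + A_inv[0][1] * Y + c[0],
            A_inv[1][0] * X + A_inv[1][1] * Y + c[1])


print(A_inv)   # same entries as Inverse[{{-3, 2}, {1, 2}}]
print(c)       # the vector (3/2, 5/4) of Eq. (5.7.3)
```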
Since the boundary of the semicircle consists of two pieces \begin{align*} y &= \sqrt{1 - x^2} , \quad -1 \le x \le 1, \\ y &= 0 , \quad -1 \le x \le 1, \end{align*} we need to determine the equations of these two pieces in the new variables. The equation of the straight line y = 0 was determined previously (as a part of the parallelogram's boundary): \[ Y = - \frac{1}{3} \left( X-2 \right) -4 , \qquad 5 \ge X \ge -1 . \tag{5.7.4} \] Note that X runs in the opposite direction to x along this segment, because X = −3x + 2 there. Equation (5.7.4) follows from the affine transformations of the endpoints: \begin{align*} (1, 0) &\mapsto\ (-1, -3) , \\ (-1,0) &\mapsto\ (5, -5) . \end{align*} Now we determine the equation of the transformed semicircle. We substitute the expressions (5.7.3) for the old variables (x, y) into the equation of the semicircle \( \displaystyle \quad x^2 + y^2 = 1 , \quad y \ge 0 , \) and multiply both sides by 8² = 64 to obtain \[ 4 \left( 6 -X + Y \right)^2 + \left( 10 + X + 3\,Y \right)^2 = 8^2 , \qquad 10 + X + 3\,Y \ge 0 . \tag{5.7.5} \] Here the inequality 10 + X + 3Y ≥ 0 is the condition y ≥ 0 written in the new variables. Along the transformed arc, X ranges over the interval [−1, 2 + √13] and Y ranges over [−5, √5 − 4].

The alert student might compare these ranges of X and Y with the plot ranges used in the code immediately below. They are chosen slightly wider on purpose; using different ranges in the code only changes the white space around the graphic.

Labeled[ContourPlot[4*(6 - X + Y)^2 + (10 + X + 3*Y)^2 == 8^2, {X, -2, 6}, {Y, -5.5, -1.5}, RegionFunction -> Function[{X, Y}, 10 + X + 3*Y >= 0], ContourStyle -> {Red, Thickness[0.01]}, AspectRatio -> Automatic], "Transformed upper boundary\nof a unit semicircle"]
Transformed upper boundary of a unit semicircle.

Without the RegionFunction option, ContourPlot would draw the whole ellipse 4(6 − X + Y)² + (10 + X + 3Y)² = 64, which is the image of the entire unit circle; the option keeps only the arc where y = (10 + X + 3Y)/8 ≥ 0. The plot shows this arc in its true position, so no flipping is required. Because det A = −8 < 0, the map reverses orientation: the boundary of the half-disk, traversed counterclockwise, is mapped onto the boundary of its image traversed clockwise.
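Substituting x = (6 − X + Y)/4 and y = (10 + X + 3Y)/8 into x² + y² = 1 and multiplying by 64 gives 4(6 − X + Y)² + (10 + X + 3Y)² = 64. The Python sketch below (floating point; the helper names are ours) samples the upper half of the unit circle, maps each sample, checks that the image satisfies this equation, and reports the extreme values of X and Y along the image arc. The sampled extremes only approximate the exact values −1 and 2 + √13 for X, and −5 and √5 − 4 for Y.

```python
import math


def forward(x, y):
    """Affine map of Example 5: X = -3x + 2y + 2, Y = x + 2y - 4."""
    return -3 * x + 2 * y + 2, x + 2 * y - 4


def ellipse_lhs(X, Y):
    """4(6 - X + Y)^2 + (10 + X + 3Y)^2; equals 64 on the image of the unit circle."""
    return 4 * (6 - X + Y) ** 2 + (10 + X + 3 * Y) ** 2


N = 2000  # number of subintervals of [0, pi]
arc = [forward(math.cos(k * math.pi / N), math.sin(k * math.pi / N)) for k in range(N + 1)]

max_err = max(abs(ellipse_lhs(X, Y) - 64) for X, Y in arc)
X_vals = [X for X, _ in arc]
Y_vals = [Y for _, Y in arc]

print(max_err < 1e-9)              # True
print(min(X_vals), max(X_vals))    # approximately -1 and 2 + sqrt(13)
print(min(Y_vals), max(Y_vals))    # approximately -5 and sqrt(5) - 4
```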

Now we plot the region after the given affine transformation.

We improve this plot by adding arrows to show how points are transformed.

halfUnitCircle = Disk[{0, 0}, 1, {0, Pi}]; original = Graphics[{LightRed, halfUnitCircle}, Axes -> True]; transformation = AffineTransform[{{{-3, 2}, {1, 2}}, {2, -4}}]; transformed = Graphics[{Purple, Opacity[.4], GeometricTransformation[halfUnitCircle, transformation]}, Axes -> True, GridLines -> Automatic]; lowBound1 = Plot[0, {x, -1, 1}, PlotStyle -> {Thick, Red}]; lowBound2 = Plot[-(1/3) (x - 5) - 5, {x, -1, 5}, PlotStyle -> {Thick, Black}]; Legended[ Show[original, transformed, lowBound1, lowBound2, GridLines -> Automatic, Epilog -> {Arrow[{{1, 0}, {-1, -3}}], Arrow[{{0, 0}, {2, -4}}], Arrow[{{-1, 0}, {5, -5}}]}], Placed[SwatchLegend[{LightRed, Purple}, {"Original", "Transformed"}], Right]]
Transformed semi-circle.
     
Improved plot of transformed semi-circle.
Finally, we calculate the ratio of the transformed areas, semicircle to parallelogram.
(Area[GeometricTransformation[halfUnitCircle, transformation]] // N)/Area[transformedParallelogram]
1.5708
This confirms that the ratio of areas remains the same under affine transformations.    ■
End of Example 5
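The final ratio of Example 5 can also be approximated without Mathematica's Area. In this Python sketch (our own illustration), the half-disk is replaced by an inscribed polygon with many vertices, and areas are computed with the shoelace formula. Because an affine map multiplies every area by the same factor |det A|, the two ratios agree up to rounding, and both approach π/2 as the number of vertices grows.

```python
import math


def forward(p):
    """Affine map of Example 5: X = -3x + 2y + 2, Y = x + 2y - 4."""
    x, y = p
    return (-3 * x + 2 * y + 2, x + 2 * y - 4)


def polygon_area(vertices):
    """Shoelace formula for a simple polygon given by its vertices in order."""
    n = len(vertices)
    s = sum(vertices[i][0] * vertices[(i + 1) % n][1]
            - vertices[(i + 1) % n][0] * vertices[i][1] for i in range(n))
    return abs(s) / 2


N = 4000
# Inscribed polygon of the upper half-disk: arc from angle 0 to pi, closed by the diameter
half_disk = [(math.cos(k * math.pi / N), math.sin(k * math.pi / N)) for k in range(N + 1)]
parallelogram = [(0, 0), (1, 0), (2, 1), (1, 1)]

ratio_before = polygon_area(half_disk) / polygon_area(parallelogram)
ratio_after = (polygon_area([forward(p) for p in half_disk])
               / polygon_area([forward(p) for p in parallelogram]))
print(ratio_before, ratio_after, math.pi / 2)
```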

Affine Space (do you mean a fine space?)

Affine spaces provide a better framework for dealing with geometric objects. In particular, it is possible to work with points, curves, surfaces, etc., in an intrinsic manner, that is, independently of any specific choice of a coordinate system. In affine spaces, points and their properties are frame invariant.

Recall that the Cartesian product of two sets A and B, denoted A × B, is the set of all ordered pairs (𝑎, b), where 𝑎 is in A and b is in B. Since our main object of interest is ℝ, the set of real numbers, its direct product ℝ² = ℝ × ℝ inherits a linear (vector space) structure from the field ℝ: pairs are added and multiplied by scalars componentwise. This space provides the main historical example of the Cartesian plane in analytic geometry.

The set of all such pairs (i.e., the Cartesian product ℝ × ℝ, denoted by ℝ²) is used to describe both the set of all points in the plane and the set of all free vectors. These three sets (the set of points in the plane, the set of 2-tuples, and the set of free vectors in ℝ²) are in one-to-one correspondence with each other. Therefore, all of them are traditionally denoted by ℝ², and the context specifies which of these sets is in use. One can similarly define the Cartesian product of n sets, also known as an n-fold Cartesian product, which can be represented by an n-dimensional array, where each element is an n-tuple. In Euclidean space, points and vectors are usually identified with n-tuples. In particular, a point P(x, y) and a vector v(x, y) of the Euclidean plane are described by the same pair of coordinates.

In computer graphics, the main problem is to render or display three-dimensional objects (or models) by projecting or mapping them into two-dimensional images. The two-dimensional data must then be converted into a form that the computer can display (rasterization), and finally displayed. This requires a viewpoint or direction of projection and a viewing or projection plane. Fortunately, a monitor is just a two-dimensional array of a finite number of pixels, short for picture elements.

The practical situation with rasterizing data in computer graphics shows that we need to distinguish points from vectors, because points and vectors have some mutually exclusive properties. A point has a location but no extent, while a vector in ℝ³ has both direction and magnitude (norm) but no fixed location.

   
Example 6:

In order to visualize an affine plane, we consider a 2D plane ℝ² inside ℝ³. In order to separate points from vectors, we choose two planes, one for points and another one for vectors, as in the picture below. We'll call the green one the vector space V ≌ ℝ² and the blue one the point plane A. The plane V passes through the origin since it is a vector space, but the blue plane A does not. However, the inhabited set A looks almost exactly the same as V, having exactly the same flat geometry; in fact, A and V are simply translates of one another. This plane A is a classical example of an affine space. Later you will see that any affine space is isomorphic to the affine space generated by V. You will learn in Part 3 that A is a coset of V.

Figure 3: Wrong model of affine plane from Wikipedia.
     
Figure 4: Affine plane.

The Wolfram code below produces Figure 4 above.

Clear[x, y, p, a, b, c, plt1];
Define the vectors
x = {{0, 0, 0}, {1, 0, 0}}; y = {{0, 0, 0}, {0, 1, 0}};
Define the starting point p of the black arrows on the green plane (z = 0)
p = {0.0, 0.0, 0.0};
Define two vectors a and b in the green plane z = 0, making an angle of 110 degrees with each other (a points along the x-axis).
a = p + {Cos[0], Sin[0], 0}; b = p + {Cos[110 Degree], Sin[110 Degree], 0};
Calculate the sum of the two vectors
c = p + (a - p) + (b - p);
Define the transformation: a translation moving the vectors to the blue plane in the z-direction
transformZ = TranslationTransform[{0, 0, 1.5}];
Define the lateral displacement: a translation moving the vectors in the x-y plane.
transformXY = TranslationTransform[{0.5, 0.5, 0}];
Apply the transformations to the vectors (despite their names, pGreen, aGreen, and bGreen lie in the blue plane z = 1.5)
pGreen = transformXY[transformZ[p]]; aGreen = transformXY[transformZ[a]]; bGreen = transformXY[transformZ[b]];
Make cylindrical disks as point markers
pts = {{p, aGreen}, {p, bGreen}, {p, pGreen}}; shoDisks = (Map[ Graphics3D[Cylinder[{#, # + {0, 0, .04}}, .04], Axes -> True] &, pts[[#]]] & /@ {1, 2, 3})[[All, 2]];
Show graphics together
plt1 = Graphics3D[{ {Green, Opacity[.6], InfinitePlane[{{-1, 0, 0}, {1, 0, 0}, {0, 1, 0}}]}, {Blue, Opacity[.3], InfinitePlane[{{-1, 0, 1.5}, {1, 0, 1.5}, {0, 1, 1.5}}]}, {Black, Thickness[.003], Arrow[{p, a}], Arrow[{p, b}], Red, Arrow[{pGreen, aGreen}], Arrow[{pGreen, bGreen}]} , Point[{p, aGreen}], Point[{p, bGreen}], Point[{p, pGreen}] }, PlotRange -> {{-2, 2}, {-2, 2}, All}, Axes -> True, Boxed -> False, AxesLabel -> {"x", "y", "z"}, ViewPoint -> {2.529316501501312, -1.994314667132436, 1.036950839574181}, ViewVertical -> {0.16811447593810377, -0.1219840171786107, 0.9781908926855868} ]; shoAll = Show[plt1, shoDisks, ImageSize -> 500]

The left picture shows an attempt to introduce a vector structure on the inhabited set A. Let T : A → V be a translation of the point set to the vector space. You may try to define addition of two points as

\[ P \left( + \right) Q = T^{-1} \left( T(P) + T(Q) \right) . \]
However, the resulting point P (+) Q depends on the choice of T, that is, on which point of A is sent to the origin of V, so this "addition" is not intrinsic to A. Simply adding the position vectors of P and Q in ℝ³ is no better, since the sum leaves the plane A. Thus, there is no natural way to turn the inhabited set of points A into a vector space. The basic idea of affine space is inherited from physics, where forces (vectors) act on point objects to move them into another position (a point again).
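The dependence on the choice of T can be seen in coordinates. In this Python sketch (an illustration of ours, not from the original Wolfram code), A is the plane z = 1.5 and V is the plane z = 0; for a chosen point O ∈ A, the translation T sends O to the zero vector. Changing O changes the would-be sum P (+) Q, while the difference Q − P stays the same, and the naive sum of position vectors leaves the plane.

```python
# Points of the affine plane A = {z = 1.5}; vectors of V = {z = 0}.
def T(point, origin):
    """Translation A -> V that sends the chosen origin (a point of A) to the zero vector."""
    return tuple(p - o for p, o in zip(point, origin))


def T_inv(vector, origin):
    """Inverse translation V -> A."""
    return tuple(v + o for v, o in zip(vector, origin))


def point_sum(P, Q, origin):
    """The would-be sum P (+) Q = T^{-1}(T(P) + T(Q)) for the translation T fixed by `origin`."""
    tp, tq = T(P, origin), T(Q, origin)
    return T_inv(tuple(a + c for a, c in zip(tp, tq)), origin)


P = (1.0, 2.0, 1.5)
Q = (3.0, -1.0, 1.5)
O1 = (0.0, 0.0, 1.5)   # two different choices of "origin" in A
O2 = (0.5, 0.5, 1.5)

print(point_sum(P, Q, O1))   # (4.0, 1.0, 1.5)
print(point_sum(P, Q, O2))   # (3.5, 0.5, 1.5): the "sum" depends on the choice of T

# The difference Q - P, computed through T, does not depend on the origin
diff1 = tuple(a - c for a, c in zip(T(Q, O1), T(P, O1)))
diff2 = tuple(a - c for a, c in zip(T(Q, O2), T(P, O2)))
print(diff1 == diff2)        # True

# Adding position vectors in R^3 directly leaves the plane A (z becomes 3.0)
naive = tuple(p + q for p, q in zip(P, Q))
print(naive)                 # (4.0, 1.0, 3.0)
```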

   ■
End of Example 6

Now we are ready to make a general definition of an affine space. We start with a succinct version according to Wikipedia for pure mathematicians.

An affine space is a geometric structure that generalizes some of the properties of Euclidean spaces in such a way that these are independent of the concepts of distance and angles, keeping only the properties related to parallelism and the ratio of lengths for parallel line segments. Affine space is the setting for affine geometry.

Practitioners prefer to use a more elaborate definition. There are two versions of this definition using the action of vectors on points due to either "addition" of points and vectors or "subtraction" of points. We present both versions to please everyone.

In the context of linear algebra, an affine space is a set of points A equipped with a set of transformations (that is, bijective mappings), the translations, which form a vector space (over a given field, commonly the set of real numbers), such that for any given ordered pair of points there is a unique translation sending the first point to the second one; such a translation is also called the action of a vector on a point. The composition of two translations is their sum in the vector space of the translations.

An affine space over a field 𝔽 is a triple (A, V, +), consisting of a vector space V over a field 𝔽, a set A whose elements are called points, and an external binary operation A × V → A : (𝑎, v) ↦ 𝑎 + v, satisfying the following axioms:
  1. (𝑎 + v) + u = 𝑎 + (v + u) for all 𝑎 ∈ A and all v, u ∈ V;
  2. 𝑎 + 0 = 𝑎 for all 𝑎 ∈ A;
  3. for any two points 𝑎, b ∈ A, there exists a unique vector x ∈ V such that b = 𝑎 + x.

It is customary to denote points as n-tuples and vectors as column or row vectors. Then the action of a vector x on a point 𝑎 can be written as

\[ \left( a_1 , a_2 , a_3 \right) + \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \left( a_1 + x_1 , a_2 + x_2 , a_3 + x_3 \right) . \]
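A minimal Python sketch of this action, assuming points are stored as tuples and vectors as lists (a convention of ours that mirrors the notation above; `act`, `vec_add`, and `diff` are hypothetical helper names), together with checks of the three axioms on sample data:

```python
from fractions import Fraction


def act(point, vector):
    """Action of a vector on a point: (a1, a2, a3) + [x1, x2, x3] = (a1 + x1, a2 + x2, a3 + x3)."""
    return tuple(a + x for a, x in zip(point, vector))


def vec_add(u, v):
    """Addition in the vector space V = R^3 (vectors stored as lists)."""
    return [p + q for p, q in zip(u, v)]


def diff(b, a):
    """The vector x with b = a + x (axiom 3); it is unique because x = b - a componentwise."""
    return [bi - ai for ai, bi in zip(a, b)]


a = (1, 2, 3)                        # a point
v = [Fraction(1, 2), -1, 4]          # two vectors
u = [2, 0, Fraction(-3, 2)]
b = (5, -7, 0)                       # another point

assert act(act(a, v), u) == act(a, vec_add(v, u))   # axiom 1
assert act(a, [0, 0, 0]) == a                        # axiom 2
assert act(a, diff(b, a)) == b                       # axiom 3 (existence)
print(act(a, v))
```

Keeping points and vectors in different container types is only a reminder that the two kinds of objects play different roles; Python does not enforce it.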

Every finite dimensional linear space V has an affine structure (V, V, +) inherited from vector addition in V; see Example 7 for details. We will refer to the affine structure (V, V, +) on a vector space V as the canonical (or natural) affine structure on V. In particular, the vector space ℝⁿ can be viewed as the affine space (ℝⁿ, ℝⁿ, +). We often call the triple (A, V, +) or the pair (A, V) an affine space, omitting the operation, and denote it by 𝔸. The mapping A → A, P ↦ P + x, is called a translation by vector x or the action of vector x on point P.

Any vector space V has an affine space structure specified by choosing the inhabited set to be V and letting "+" be addition in the vector space V.
   
Example 7: Any finite dimensional vector space V has an affine space structure specified by choosing the inhabited set A = V and letting d be subtraction in the vector space V. We will refer to the affine structure (V, V, d) on a vector space V as the canonical affine structure on V. In particular, the vector space ℝⁿ can be viewed as the affine space (ℝⁿ, ℝⁿ, d), denoted by 𝔸ⁿ. The affine space 𝔸ⁿ is called the real affine space of dimension n.

Recall that a frame in ℝ³ (we restrict ourselves to n = 3 for simplicity) is a pair consisting of the point O (called the origin) and an ordered basis ε = [e₁, e₂, e₃]. For example, the standard frame in ℝ³ has origin O = (0, 0, 0) and the basis of three vectors e₁ = (1, 0, 0) = i, e₂ = (0, 1, 0) = j, and e₃ = (0, 0, 1) = k. The position of a point P is then defined by the “unique vector” from O to P. This approach identifies point P(p₁, p₂, p₃) with the corresponding vector \( \displaystyle \quad {\bf x} = \overline{OP} = \left( p_1 , p_2 , p_3 \right) . \)

Hence, in a standard frame of ℝ³, points and vectors are identically represented by triples of real numbers. However, if we choose another frame with a different origin Ω = (ω₁, ω₂, ω₃), but with the same basis vectors ε, points and position vectors are no longer identified. This time, the point P = (p₁, p₂, p₃) is defined by two position vectors: \[ OP = \left( p_1 , p_2 , p_3 \right) \] in frame (O, ε) and \[ \Omega P = \left( p_1 - \omega_1 , p_2 - \omega_2 , p_3 - \omega_3 \right) \] in frame (Ω, ε). This is because \[ OP = O\Omega + \Omega P \qquad \mbox{and} \qquad O\Omega = \left( \omega_1 , \omega_2 , \omega_3 \right) . \] In the second frame (Ω, ε), points and position vectors are no longer identical.    ■
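The computation in this example can be mirrored in a few lines of Python. In this sketch (the helper name `coords_in_frame` is ours), both frames use the standard basis, so the coordinates of P relative to an origin are just componentwise differences; the specific numbers are arbitrary sample values.

```python
def coords_in_frame(P, origin):
    """Components of the position vector from `origin` to P (standard basis assumed)."""
    return tuple(p - w for p, w in zip(P, origin))


P = (3, 1, 4)        # sample point
O = (0, 0, 0)        # standard origin
Omega = (1, 5, 9)    # sample second origin, same basis vectors

OP = coords_in_frame(P, O)            # (3, 1, 4): same triple as the point itself
OmegaP = coords_in_frame(P, Omega)    # (2, -4, -5): no longer the triple of P
O_Omega = coords_in_frame(Omega, O)   # (1, 5, 9)

# OP = O Omega + Omega P
assert OP == tuple(a + c for a, c in zip(O_Omega, OmegaP))
print(OP, OmegaP)
```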

End of Example 7

It is convenient to denote the vector v ∈ V for which Q = P + v by \( \displaystyle \quad \overline{PQ} \quad \mbox{or} \quad \vec{PQ} \quad \mbox{or} \quad Q - P , \quad\) or just PQ. So the difference Q − P of two points is just an abbreviation for the vector v such that Q = P + v. Based on this difference operation, we give an equivalent definition of an affine space.

An affine space with vector space V is a nonempty set A of points and a vector-valued map d : A × A → V, called a difference function, such that for all P, Q, R ∈ A
  1. d(P, Q) + d(Q, R) = d(P, R),     Chasles's identity;
  2. the restricted map \( d_P = d\big|_{\{P\}\times A} : \{P\} \times A \to V \), defined by (P, Q) ↦ d(P, Q), is a bijection. The vector d(P, Q) is referred to as a translation vector.

The first condition is just the usual “parallelogram property” of the addition of vectors. From the second condition, it follows that for every pair of points P and Q from A, there exists a unique vector v ∈ V such that P + v = Q.
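For the canonical structure on ℝ², the difference function is d(P, Q) = Q − P. The Python sketch below (our own illustration; `d` and `d_inv` are hypothetical names) checks Chasles's identity on a small grid of integer points and checks that, for fixed P, the map Q ↦ d(P, Q) is inverted by v ↦ P + v, which is the bijectivity required by the second condition.

```python
from itertools import product


def d(P, Q):
    """Canonical difference function on R^2: the vector from P to Q, i.e. Q - P."""
    return tuple(q - p for p, q in zip(P, Q))


def d_inv(P, v):
    """Inverse of Q -> d(P, Q) for a fixed P: the point P + v."""
    return tuple(p + vi for p, vi in zip(P, v))


grid = list(product(range(-2, 3), repeat=2))   # 25 sample points with integer coordinates

# Condition 1 (Chasles's identity)
for P, Q, R in product(grid, repeat=3):
    assert tuple(a + c for a, c in zip(d(P, Q), d(Q, R))) == d(P, R)

# Condition 2: for fixed P, Q -> d(P, Q) is a bijection with inverse v -> P + v
for P, Q in product(grid, repeat=2):
    assert d_inv(P, d(P, Q)) == Q
    v = Q                                   # reuse grid entries as sample vectors
    assert d(P, d_inv(P, v)) == v
print("both conditions hold on the sample grid")
```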

Lemma 1: In an affine space (A, V, d) with difference function d we have
  1. d(P, P) = 0 for all points P ∈ A,
  2. d(P, Q) = −d(Q, P) for all points P, Q ∈ A.
   
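Proof:
  1. Setting P = Q = R in Chasles's identity gives d(P, P) + d(P, P) = d(P, P); subtracting d(P, P) from both sides yields d(P, P) = 0.
  2. Setting R = P in Chasles's identity and using part 1, we get d(P, Q) + d(Q, P) = d(P, P) = 0, that is, d(P, Q) = −d(Q, P).    ■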
   
Example 8: Let us consider the subset A of 𝔸³ consisting of all points (x, y, z) satisfying the equation \[ x^2 + y^2 - z = 0 . \] The set of points A is a paraboloid of revolution with axis Oz. The surface A can be made into an affine space with associated vector space ℝ² by defining an action s : A × ℝ² → A (equivalently, a difference operation d) as follows: for every point (x, y, x² + y²) on A and any vector (u, v) ∈ ℝ², \[ \left( x, y , x^2 + y^2 \right) + \begin{bmatrix} u \\ v \end{bmatrix} = \left( x + u , y + v , (x+u)^2 + (y+v)^2 \right) . \]    ■
End of Example 8
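The axioms for the action of Example 8 on the paraboloid can be spot-checked with exact arithmetic. This Python sketch is our own (the helpers `lift`, `act`, and `diff` are illustrative names): the action only shifts the first two coordinates and recomputes the height, and the unique vector joining two points of A is the difference of their first two coordinates.

```python
from fractions import Fraction


def lift(x, y):
    """The point of A = {z = x^2 + y^2} lying above (x, y)."""
    return (x, y, x * x + y * y)


def on_paraboloid(p):
    x, y, z = p
    return z == x * x + y * y


def act(point, vec):
    """Action of (u, v) in R^2 on a point of A: shift x and y, then recompute z."""
    x, y, _ = point
    u, v = vec
    return lift(x + u, y + v)


def diff(p, q):
    """The unique (u, v) with q = p + (u, v): difference of the first two coordinates."""
    return (q[0] - p[0], q[1] - p[1])


a = lift(Fraction(1, 2), -1)
v1, v2 = (2, Fraction(1, 3)), (-1, 5)
b = lift(3, -2)

assert on_paraboloid(act(a, v1))                                        # the action stays in A
assert act(act(a, v1), v2) == act(a, (v1[0] + v2[0], v1[1] + v2[1]))   # axiom 1
assert act(a, (0, 0)) == a                                              # axiom 2
assert act(a, diff(a, b)) == b                                          # axiom 3
```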

Michel Chasles
Michel Floréal Chasles (1793--1880) was a French geometer who published in 1837 the book Aperçu historique sur l'origine et le développement des méthodes en géométrie ("Historical view of the origin and development of methods in geometry"), a study of the method of reciprocal polars in projective geometry. Chasles was elected a Foreign Honorary Member of the American Academy of Arts and Sciences in 1864. In 1865 he was awarded the Copley Medal.

The following identity bears Chasles's name:

\begin{equation} \label{EqAffine.3} \vec{ab} + \vec{bc} = \vec{ac} \end{equation}
for any three points 𝑎, b, c in the inhabited set A of an affine space (A, V, +).
In an affine space (A, V, d), fix a point P ∈ A and let \( d_P^{-1} : V \to \{P\} \times A \) be the inverse of the bijection d_P from the definition. For any points Q, R ∈ A and any real number λ, addition and scalar multiplication of pairs with first entry P are defined via \[ \begin{split} \left( P, Q \right) + \left( P, R \right) &= d_P^{-1} \left( d\left( P, Q \right) + d\left( P, R \right) \right) , \\ \lambda \left( P, Q \right) &= d_P^{-1} \left( \lambda\, d \left( P, Q \right) \right) . \end{split} \] With these operations, {P} × A becomes a vector space isomorphic to V; it is called the tangent space to A at the point P and is denoted \( T_P (A) \). For v ∈ V ≌ \( T_P (A) \), we write P + v for the point Q such that \( d_P^{-1} (v) = (P, Q) \).
   
Example 9: Let P₀ = (x₀, y₀, z₀) be a point on a surface S ⊂ ℝ³, and let C be any curve passing through P₀ and lying entirely in S. If the tangent lines to all such curves C at P₀ lie in the same plane, then this plane is called the tangent plane to S at P₀.

For a tangent plane to a surface S to exist at a point on that surface, it is sufficient for the function that defines the surface to be continuously differentiable at that point. If a surface is defined by a differentiable function z = f(x, y), and P₀ = (x₀, y₀, z₀) is a point on S, then the equation of the tangent plane at P₀ is given by \[ z = f(x_0 , y_0 ) + f_x (x_0 , y_0 ) \left( x- x_0 \right) + f_y (x_0 , y_0 ) \left( y- y_0 \right) . \tag{9.1} \] To see why this formula is correct, let’s first find two lines tangent to the surface S. The tangent line to the curve obtained by intersecting S with the vertical plane x = x₀ is \( z = f(x_0 , y_0 ) + f_y (x_0 , y_0 ) \left( y - y_0 \right) \) (with x = x₀). Similarly, the tangent line to the curve obtained by intersecting S with the vertical plane y = y₀ is \( z = f(x_0 , y_0 ) + f_x (x_0 , y_0 ) \left( x - x_0 \right) \) (with y = y₀). A vector parallel to the first tangent line is \( {\bf a} = {\bf j} + f_y (x_0 , y_0 )\,{\bf k} \), and a vector parallel to the second tangent line is \( {\bf b} = {\bf i} + f_x (x_0 , y_0 )\,{\bf k} \). We can take the cross product of these two vectors \begin{align*} {\bf a} \times {\bf b} &= \left( {\bf j} + f_y (x_0 , y_0 )\,{\bf k} \right) \times \left( {\bf i} + f_x (x_0 , y_0 )\,{\bf k} \right) \\ &= \begin{vmatrix} {\bf i} & {\bf j} & {\bf k} \\ 0&1& f_y (x_0 , y_0 ) \\ 1&0& f_x (x_0 , y_0 ) \end{vmatrix} \\ &= f_x (x_0 , y_0 )\,{\bf i} + f_y (x_0 , y_0 )\,{\bf j} - {\bf k} . \end{align*} This vector is perpendicular to both lines and is therefore perpendicular to the tangent plane. 
We can use this vector as a normal vector to the tangent plane, which we denote as n = a × b/(∥a × b∥) (of unit length), along with the point P₀ = (x₀, y₀, z₀) in the equation for a plane: \begin{align*} {\bf n} \bullet \left( (x- x_0 )\,{\bf i} + (y - y_0 )\,{\bf j} + (z - f(x_0 , y_0 ))\,{\bf k} \right) &= 0 , \\ \left( f_x (x_0 , y_0 )\, {\bf i} + f_y (x_0 , y_0 )\,{\bf j} - {\bf k} \right) \bullet \left( (x- x_0 )\,{\bf i} + (y - y_0 )\,{\bf j} + (z - f(x_0 , y_0 ))\,{\bf k} \right) &= 0 , \\ f_x (x_0 , y_0 )\left( x - x_0 \right) + f_y (x_0 , y_0 ) \left( y - y_0 \right) - \left( z - f(x_0 , y_0 ) \right) &= 0. \end{align*} Solving this equation for z gives Equation (9.1).

As an example, let us choose a quadratic function f(x, y) = 3 x² − 5 xy − 7 y² + 4 x − 5 y + 1 at point (−2, 1). First, we calculate derivatives \begin{align*} f_x (x,y) &=6\,x - 5\,y + 4, \\ f_x (-2,1) &= -13 , \\ f_y (x,y) &= -5 - 5 x - 14 y , \\ f_y (-2,1) &= -9 . \end{align*} Mathematica confirms calculations above.

Clear[f, x, y]; f[x_, y_] = 3*x^2 - 5*x*y - 7*y^2 + 4*x - 5*y + 1; D[f[x, y], x];
% /. {x -> -2, y -> 1}
-13
D[f[x, y], y];
% /. {x -> -2, y -> 1}
-9
Then Eq.(9.1) becomes \[ z = 3 -13 \left( x+2 \right) -9 \left( y -1 \right) . \] Here is how Mathematica sees this. First, we define S, the surface function
Clear[f, x, y, z, fx, fy, P0, fx0, fy0]; f[x_, y_] := 3 x^2 - 5 x y - 7 y^2 + 4 x - 5 y + 1;
Then we compute partial derivatives
fx = Derivative[1, 0][f]; fy = Derivative[0, 1][f];
Defining P₀ requires our x and y values and the value of the function at those values (z):
P0 = {-2, 1, f[-2, 1]}
{-2, 1, 3}
Next we evaluate the derivatives at P₀.
fx0 = fx[-2, 1]
-13
fy0 = fy[-2, 1]
-9
The equation of the tangent plane is
tangentPlane[x_, y_] := P0[[3]] + fx0 (x - P0[[1]]) + fy0 (y - P0[[2]])
We are now in a position to plot. Note that, while the tangent plane is indeed tangent to the surface at P₀, it intersects the surface at other points as well. So, we plot the surface and the tangent plane using a smaller range for the tangent plane.
surfacePlot = Plot3D[f[x, y], {x, -4, 1}, {y, -1, 3}, AxesLabel -> {"x", "y", "z"}, PlotStyle -> {Directive[Opacity[0.7], Blue]}, PlotLegends -> {"Surface"}, MeshFunctions -> {#3 &}, Mesh -> {{3}}, PlotRange -> {{-4, 1}, {-1, 3}, {-100, 100}}, Boxed -> False, Lighting -> "Neutral", AxesOrigin -> {0, 0, 0} ];
tangentPlanePlot = Plot3D[tangentPlane[x, y], {x, P0[[1]] - 1, P0[[1]] + 1}, {y, P0[[2]] - 1, P0[[2]] + 1}, PlotStyle -> {Directive[Opacity[0.5], Red]}, Mesh -> None, PlotLegends -> {"Tangent Plane"}, PlotRange -> {{-4, 1}, {-1, 3}, {-100, 100}}, Boxed -> True, Lighting -> "Neutral", AxesOrigin -> {0, 0, 0} ];
This adds a green point to show the tangency
pointPlot = Graphics3D[{Green, PointSize[Medium], Point[P0]}]; Show[surfacePlot, tangentPlanePlot, pointPlot]
Tangent plane and surface.

The two parabolas you see in the plot are the vertical traces of the surface, which are obtained by intersecting the surface z = f(x,y) with the planes x = x₀ and y = y₀. The equations for these parabolas can be derived by substituting these values into the surface equation.

Given the surface equation
\[ z = f(x,y) = 3x^2 - 5xy - 7y^2 + 4x - 5y + 1 , \]
the two parabolas are obtained as follows.
1. Vertical trace at x = x₀ = −2: \begin{align*} z = f(-2, y) &= 3(-2)^2 - 5(-2)y - 7y^2 + 4(-2) - 5y + 1 \\ &= 12 + 10y - 7y^2 - 8 - 5y + 1 \\ &= -7y^2 + 5y + 5 . \end{align*}
2. Vertical trace at y = y₀ = 1: \begin{align*} z = f(x, 1) &= 3x^2 - 5x(1) - 7(1)^2 + 4x - 5(1) + 1 \\ &= 3x^2 - 5x - 7 + 4x - 5 + 1 \\ &= 3x^2 - x - 11 . \end{align*}
Thus, the equations of the two parabolas are z = −7y² + 5y + 5 and z = 3x² − x − 11. These are the vertical traces of the surface at x = −2 and y = 1, respectively; both pass through P₀ = (−2, 1, 3).

   ■
End of Example 9
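The derivatives, the tangent plane (9.1), and the two vertical traces of Example 9 can be verified with exact arithmetic. In this Python sketch (our own; helper names are illustrative), partial derivatives are computed by central differences, which are exact for polynomials of degree at most two. The check on the difference between f and the tangent plane shows that it equals the quadratic part 3u² − 5uv − 7v² with u = x + 2 and v = y − 1, so the plane agrees with the surface to first order at P₀.

```python
from fractions import Fraction


def f(x, y):
    """Surface of Example 9."""
    return 3 * x**2 - 5 * x * y - 7 * y**2 + 4 * x - 5 * y + 1


def partials(g, x0, y0, h=Fraction(1)):
    """Central differences; they are exact for polynomials of degree <= 2, for any step h."""
    gx = (g(x0 + h, y0) - g(x0 - h, y0)) / (2 * h)
    gy = (g(x0, y0 + h) - g(x0, y0 - h)) / (2 * h)
    return gx, gy


x0, y0 = -2, 1
z0 = f(x0, y0)
fx0, fy0 = partials(f, x0, y0)


def tangent_plane(x, y):
    """Eq. (9.1) at P0 = (x0, y0, z0)."""
    return z0 + fx0 * (x - x0) + fy0 * (y - y0)


print(z0, fx0, fy0)   # 3 -13 -9

# The surface minus the plane is the quadratic part 3u^2 - 5uv - 7v^2 (u = x - x0, v = y - y0)
for x in range(-4, 2):
    for y in range(-1, 4):
        u, v = x - x0, y - y0
        assert f(x, y) - tangent_plane(x, y) == 3 * u**2 - 5 * u * v - 7 * v**2

# Vertical traces through P0
assert all(f(x0, y) == -7 * y**2 + 5 * y + 5 for y in range(-5, 6))
assert all(f(x, y0) == 3 * x**2 - x - 11 for x in range(-5, 6))
```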

As the notion of parallel lines is one of the main properties that is independent of any metric, affine geometry is often considered as the study of parallel lines.

Affine Sets and Combinations

In this subsection, we consider only the field ℝ of real numbers and finite dimensional vector spaces over ℝ. To motivate understanding of affine combinations, we recall the vector equations of lines and planes in ℝ³ that do not necessarily pass through the origin (see section in this chapter). You can skip the following example if you know this material.    

Example 10: Let us consider the line L in ℝ³ through the distinct points x, y ∈ ℝ³. Geometrically, we can reach each point on this line by starting at the origin, traveling to the tip of the vector x (viewed as an arrow starting at 0), then following some scalar multiple t(y − x) of the direction vector y − x for the line (where t ∈ ℝ). Algebraically, the line consists of all vectors x + t(y − x) as t ranges over ℝ. Restating this,
\[ L = \left\{ t\,{\bf y} + \left( 1- t \right) {\bf x} \ : \ t \in \mathbb{R}\right\} . \]
In other words, L consists of all affine combinations of x and y. Recall that the linear combinations of two vectors include all points \( \displaystyle \quad t\,{\bf y} + s \,{\bf x} , \quad \) where t, s ∈ ℝ, without the constraint s = 1 − t.

Similarly, consider the plane P in ℝ³ through three non-collinear points x, y, z ∈ ℝ³. As in the case of the line, we reach arbitrary points in P from the origin by going to the tip of x, then traveling within the plane some amount in the direction y − x and some other amount in the direction z − x. Algebraically, we have

\[ P = \left\{ {\bf x} + s \left( {\bf y} - {\bf x} \right) + t \left( {\bf z} - {\bf x} \right) \ : \ s,t \in \mathbb{R} \right\} . \]
Hence, P is the set of affine combinations of x, y, z.
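The identity x + s(y − x) + t(z − x) = (1 − s − t)x + s y + t z, which turns the parametric form of P into an affine combination with weights summing to 1, can be checked in Python. This sketch is our own; it uses exact fractions, and the sample points x = (1/2, 1, 0), y = (3/2, 3/2, 0), z = (3/2, 1/2, 0) are the ones used in the Mathematica code that draws the plane.

```python
from fractions import Fraction


def add(u, v):
    return tuple(a + b for a, b in zip(u, v))


def scale(c, u):
    return tuple(c * a for a in u)


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


x = (Fraction(1, 2), 1, 0)
y = (Fraction(3, 2), Fraction(3, 2), 0)
z = (Fraction(3, 2), Fraction(1, 2), 0)


def plane_point(s, t):
    """x + s(y - x) + t(z - x): the parametric form of the plane P."""
    return add(x, add(scale(s, sub(y, x)), scale(t, sub(z, x))))


def affine_combination(weights, points):
    """Sum of w_i * p_i; the caller is responsible for the weights summing to 1."""
    result = scale(weights[0], points[0])
    for w, p in zip(weights[1:], points[1:]):
        result = add(result, scale(w, p))
    return result


for s in (Fraction(-1), Fraction(1, 3), Fraction(2)):
    for t in (Fraction(-1, 2), Fraction(0), Fraction(5, 4)):
        weights = (1 - s - t, s, t)
        assert sum(weights) == 1
        assert plane_point(s, t) == affine_combination(weights, (x, y, z))
```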
Clear[orig, x, y, z]; arrPos2 = {{{0.5, 1}, {1.5, 1.5}}, {{0.5, 1}, {1.5, 0.5}}}; shoArr2 = Map[Graphics[{Blue, Thickness[0.01], Arrowheads[.08], Arrow[#]}] &, arrPos2]; disks2 = {{0.5, 1}, {1.5, 1.5}, {2, 7/4}, {1.5, 0.5}}; shoDisks2 = Map[Graphics[Disk[#, 0.04]] &, disks2]; txtPos2 = {{0.4, 1.1}, {1.5, 1.7}, {1.65, 0.5}, {0.85, 1.4}, {1.3, 0.77}, {2, 1.95}}; texts2 = {"x", "y", "z", "y - x", "z - x", "x + t(y-x)+s(z-x)"}; txt2 = MapThread[ Graphics[ Text[#1, #2, BaseStyle -> {Bold, Black, FontSize -> 24}]] &, {texts2, txtPos2}]; (* Define points in 3D space *) orig = {0, 0, 0}; x = {0.5, 1, 0}; y = {1.5, 1.5, 0}; z = {1.5, 0.5, 0}; (* Shift all points up except the origin *) shift = {0, 0, 0.5}; xShifted = x + shift; yShifted = y + shift; zShifted = z + shift; (* Red arrows from the origin; defined only after yShifted and zShifted have values *) arrPos3 = {{{0, 0, 0}, yShifted}, {{0, 0, 0}, zShifted}}; shoArr3 = Map[Graphics3D[{Red, Thickness[0.005], Arrowheads[.04], Arrow[#]}] &, arrPos3]; (* Create Graphics3D objects for the points and plane *) points = Graphics3D[{Red, PointSize[Medium], Point[{orig, xShifted, yShifted, zShifted}]}]; plane = Graphics3D[{Opacity[0.5], LightRed, InfinitePlane[{xShifted, yShifted, zShifted}]}]; (* Convert arrows and text to 3D *) arrPos3D = Map[Append[#, 0] + shift &, arrPos2, {2}]; shoArr3D = Map[Graphics3D[{Blue, Thickness[0.01], Arrowheads[.08], Arrow[#]}] &, arrPos3D]; arr3D = Map[ Graphics3D[{Blue, Thickness[0.01], Arrowheads[.08], Arrow[#]}] &, {{{0, 0, 0}, {0.5, 1, 0.5}}, {{0, 0, 0}, {2, 7/4, 0.5}}}]; disks3D = Map[Append[#, 0] + shift &, disks2]; shoDisks3D = Map[Graphics3D[Cylinder[{#, # + {0, 0, 0.04}}, 0.04]] &, disks3D]; txtPos3D = Map[Append[#, 0] + shift + .05 &, txtPos2]; txt3D = MapThread[ Graphics3D[ Text[#1, #2, BaseStyle -> {Bold, Black, FontSize -> 10}]] &, {texts2, txtPos3D}]; (* Show the graphics *) fig2 = Labeled[ Show[shoArr3, shoArr3D, arr3D, shoDisks3D, txt3D, points, plane, Axes -> True, PlotRange -> {{0, 2.5}, {0, 2.5}, {0, 1}}, AxesLabel -> {"X", "Y", "Z"}, ViewAngle -> 0.437822841617392, ViewPoint -> {1.6603694159710891`, -2.342913183640552, 1.7899528531308087`}, ViewVertical -> {0.03834356259750612, -0.04332162695936721, 0.9983251012796006}], "Figure 2:\nPlane through x, y and z"]

Clear[orig, x, y, z]; arrPos2 = {{{0.5, 1}, {1.5, 1.5}}, {{0.5, 1}, {1.5, 0.5}}}; shoArr2 = Map[Graphics[{Blue, Thickness[0.01], Arrowheads[.08], Arrow[#]}] &, arrPos2]; disks2 = {{0.5, 1}, {1.5, 1.5}, {2, 7/4}, {1.5, 0.5}}; shoDisks2 = Map[Graphics[Disk[#, 0.04]] &, disks2]; txtPos2 = {{0.4, 1.1}, {1.4, 1.6}, {1.55, 0.4}, {0.85, 1.3}, {1.3, 0.77}, {2, 1.95}}; texts2 = {"x", "y", "z", "y - x", "z - x", "x + t(y-x)+s(z-x)"}; txt2 = MapThread[ Graphics[ Text[#1, #2, BaseStyle -> {Bold, Black, FontSize -> 24}]] &, {texts2, txtPos2}]; (* Define points in 3D space *) orig = {0, 0, 0}; x = {0.5, 1, 0}; y = {1.5, 1.5, 0}; z = {1.5, 0.5, 0}; (* Shift all points up except the origin *) shift = {0, 0, 0.5}; xShifted = x + shift; yShifted = y + shift; zShifted = z + shift; (* Red arrows from the origin; defined only after yShifted and zShifted have values *) arrPos3 = {{{0, 0, 0}, yShifted}, {{0, 0, 0}, zShifted}}; shoArr3 = Map[Graphics3D[{Red, Thickness[0.005], Arrowheads[.04], Arrow[#]}] &, arrPos3]; (* Create Graphics3D objects for the points and plane *) points = Graphics3D[{Red, PointSize[Medium], Point[{orig, xShifted, yShifted, zShifted}]}]; plane = Graphics3D[{Opacity[0.5], LightRed, InfinitePlane[{xShifted, yShifted, zShifted}]}]; (* Convert arrows and text to 3D *) arrPos3D = Map[Append[#, 0] + shift &, arrPos2, {2}]; shoArr3D = Map[Graphics3D[{Blue, Thickness[0.01], Arrowheads[.08], Arrow[#]}] &, arrPos3D]; arr3D = Map[ Graphics3D[{Blue, Thickness[0.01], Arrowheads[.08], Arrow[#]}] &, {{{0, 0, 0}, {0.5, 1, 0.5}}, {{0, 0, 0}, {2, 7/4, 0.5}}}]; disks3D = Map[Append[#, 0] + shift &, disks2]; shoDisks3D = Map[Graphics3D[Cylinder[{#, # + {0, 0, 0.04}}, 0.04]] &, disks3D]; txtPos3D = Map[Append[#, 0] + shift + .05 &, txtPos2]; txt3D = MapThread[ Graphics3D[ Text[#1, #2, BaseStyle -> {Bold, Black, FontSize -> 10}]] &, {texts2, txtPos3D}]; (* Show the graphics *) fig2 = Labeled[ Show[shoArr3, shoArr3D, arr3D, shoDisks3D, txt3D, points, plane, Axes -> True, PlotRange -> {{0, 2.5}, {0, 2.5}, {0, 1}}, AxesLabel -> {"X", "Y", "Z"}, ViewAngle -> 0.437822841617392, ViewPoint -> {-0.5776086979775906, -1.01827530434378, 3.17482024634218}, ViewVertical -> {-0.26351351953885604`, 0.22525261423023582`, 0.937988211441215}], "Figure 2:\nPlane through x, y and z"]

Figure 1: Line through x and y.
     

As a further example, we consider two points P₁ and P₂ in a two-dimensional affine space. The expression \[ P = P_1 + t \left( P_2 - P_1 \right) \] makes sense because the difference P₂ − P₁ is a vector, and thus so is t(P₂ − P₁). Therefore, P is the sum of a point and a vector, which is again a point in the inhabited set. This point P lies on the line through the points P₁ and P₂. Note that if 0 ≤ t ≤ 1, then P is somewhere on the line segment joining P₁ and P₂.

ar = Graphics[{Blue, Thick, Arrow[{{0, 0}, {1, 0.5}}]}]; ar2 = Graphics[{Black, Dashed, Thick, Arrow[{{1, 0.5}, {2, 1}}]}]; txt = Graphics[{Text[Style[Subscript[P, 1], Black, 20], {0, -0.15}], Text[Style[Subscript[P, 2], Black, 20], {2.1, 0.9}]}]; dot = Graphics[{{Purple, Disk[{1, 0.5}, 0.02]}, {Purple, Disk[{0, 0}, 0.02]}, {Purple, Disk[{2, 1}, 0.02]}}]; txt2 = Graphics[{Text[Style["+ t (", Black, 20], {1.3, 0.45}], Text[Style["-", Black, 20], {1.6, 0.45}], Text[Style[")", Black, 20], {1.81, 0.45}]}]; txt3 = Graphics[{Text[ Style[Subscript[P, 1], Black, 20], {1.1, 0.45}], Text[Style[Subscript[P, 2], Black, 20], {1.46, 0.45}], Text[Style[Subscript[P, 1], Black, 20], {1.74, 0.45}]}]; Show[ar, ar2, txt, dot, txt2, txt3]
Linear combination of two points.

Now we consider three points; the following figure shows an affine combination of these points: \[ P = \alpha_1 P_1 + \alpha_2 P_2 + \alpha_3 P_3 , \qquad \alpha_1 + \alpha_2 + \alpha_3 = 1 , \] which can be written as P = P₁ + t(P₂ − P₁) + s(P₃ − P₁) with t = α₂ and s = α₃.

ar = Graphics[{Blue, Thick, Arrow[{{0, 0}, {1, 0.5}}]}]; ar2 = Graphics[{Black, Dashed, Thick, Arrow[{{1, 0.5}, {2, 1}}]}]; ar3 = Graphics[{Blue, Thick, Arrow[{{1, 0.5}, {1.7, 0}}]}]; line = Graphics[{Black, Dashed, Thick, Line[{{2, 1}, {3, -0.7}, {0, 0}}]}]; dot = Graphics[{{Purple, Disk[{1, 0.5}, 0.02]}, {Purple, Disk[{0, 0}, 0.02]}, {Purple, Disk[{2, 1}, 0.02]}}]; dot2 = Graphics[{{Purple, Disk[{3, -0.7}, 0.02]}, {Red, Disk[{1.7, 0}, 0.03]}}]; txt = Graphics[{Text[Style[Subscript[P, 1], Black, 20], {0, -0.15}], Text[Style[Subscript[P, 2], Black, 20], {2.2, 1.0}], Text[Style[Subscript[P, 3], Black, 20], {3.2, -0.7}]}]; txt2 = Graphics[{Text[Style["+ t (", Black, 20], {2.0, 0.2}], Text[Style["+ s (", Black, 20], {3.0, 0.2}], Text[Style["-", Black, 20], {2.45, 0.2}], Text[Style["-", Black, 20], {3.5, 0.2}], Text[Style[")", Black, 20], {2.76, 0.2}], Text[Style[")", Black, 20], {3.86, 0.2}]}]; txt3 = Graphics[{Text[Style[Subscript[P, 1], Black, 20], {1.7, 0.2}], Text[Style[Subscript[P, 2], Black, 20], {2.27, 0.2}], Text[Style[Subscript[P, 3], Black, 20], {3.3, 0.2}], Text[Style[Subscript[P, 1], Black, 20], {2.64, 0.2}], Text[Style[Subscript[P, 1], Black, 20], {3.71, 0.2}]}]; Show[ar, ar2, ar3, line, dot, dot2, txt, txt2, txt3]
Linear combination of three points.
   ■
End of Example 10

To understand the fundamental concept of a linear combination in affine geometry, we recommend studying the following example.    

Example 11: Let us consider the affine transformation applied to column vectors: \[ \mathbb{F}^{n\times 1} \ni \mathbf{x} \longrightarrow f({\bf x}) = {\bf A}\,{\bf x} + \mathbf{b} , \tag{11.1} \] where A is an m-by-n real matrix and b is a given m-column vector. We check whether this (arbitrary) affine transformation maps a linear combination of vectors to the linear combination of their images with the same weights.

For arbitrary real scalars α, β ∈ ℝ and arbitrary column vectors x and y, we have \[ f \left( \alpha {\bf x} + \beta {\bf y} \right) = \alpha\,{\bf A}\,{\bf x} + \beta \,{\bf A}\,{\bf y} + \mathbf{b} . \] If we want this affine transformation to preserve linear combinations, the following identity must hold: \[ f \left( \alpha {\bf x} + \beta {\bf y} \right) = \alpha\,f \left( {\bf x} \right) + \beta\,f \left( {\bf y} \right) \] or \[ \alpha\,{\bf A}\,{\bf x} + \beta \,{\bf A}\,{\bf y} + \mathbf{b} = \alpha \left( {\bf A}\,{\bf x} \right) + \beta \left( {\bf A}\,{\bf y} \right) + \left( \alpha + \beta\right) \mathbf{b} . \] When b ≠ 0, the latter identity holds only when \[ \alpha + \beta = 1. \tag{11.2} \] Therefore, the affine transformation (11.1) with b ≠ 0 preserves a linear combination of two points only when the weights α and β satisfy condition (11.2).

It is straightforward to verify the general case. For b ≠ 0, a linear combination of n column vectors x₁, x₂, … , xn with weights c₁, c₂, … , cn is preserved by the affine transformation (11.1) only when these weights satisfy the condition \[ \sum_{i=1}^n c_i = 1 . \]    ■

End of Example 11
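The conclusion of Example 11 can be checked numerically. Below is a minimal sketch in Python; it is not part of the Mathematica notebook. It uses exact fractions. The matrix A, the vector b ≠ 0, and the three test vectors are arbitrary choices made for illustration, and the helper names are hypothetical.

```python
from fractions import Fraction as Fr


def mat_vec(A, x):
    """Multiply a matrix A (list of rows) by a column vector x (list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def affine(A, b, x):
    """The affine map f(x) = A x + b of (11.1)."""
    return [ax + bi for ax, bi in zip(mat_vec(A, x), b)]


def combo(weights, vectors):
    """The linear combination sum_i weights[i] * vectors[i], entrywise."""
    return [sum(w * v[k] for w, v in zip(weights, vectors))
            for k in range(len(vectors[0]))]


# Arbitrary sample data (assumed for illustration): a 2-by-3 matrix and b != 0.
A = [[1, 2, 0], [0, -1, 3]]
b = [5, -2]
xs = [[1, 0, 2], [-1, 4, 0], [3, 1, 1]]


def preserves(weights):
    """True if f(sum c_i x_i) == sum c_i f(x_i) for the given weights c_i."""
    lhs = affine(A, b, combo(weights, xs))
    rhs = combo(weights, [affine(A, b, x) for x in xs])
    return lhs == rhs


print(preserves([Fr(1, 2), Fr(1, 3), Fr(1, 6)]))  # weights sum to 1: True
print(preserves([Fr(1, 2), Fr(1, 2), Fr(1, 2)]))  # weights sum to 3/2: False
```

Exact rational arithmetic lets the comparison use `==` without floating-point tolerances.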

Consider a system of n+1 particles, located at x₀, x₁, … , xn and with masses w₀, w₁, … , wn. It is then well-known from physics that the center of mass or barycentre of this particle system is the unique point x which satisfies

\[ \sum_{i=0}^n w_i \left( x - x_i \right) = 0 , \]
that is,
\[ x = \frac{\sum_{i=0}^n w_i x_i}{\sum_{i=0}^n w_i} . \]
Clear[x, n, w, i];
\( \displaystyle \quad \mbox{Solve}\left[ \sum_{i=0}^n w_i * \left( x - x_i \right) == 0, \ x \right] \)

Sum[w[i]*x[i], {i, 0, n}]/Sum[w[i], {i, 0, n}]
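The closed-form barycenter can also be checked against its defining identity. Below is a small Python sketch with exact fractions; it is an independent illustration, not the notebook's code. The three particles and their masses are arbitrary sample data.

```python
from fractions import Fraction as Fr


def barycenter(points, masses):
    """Center of mass x = sum(w_i x_i) / sum(w_i) of weighted points in R^m."""
    total = sum(masses)
    if total == 0:
        raise ValueError("total mass must be nonzero")
    dim = len(points[0])
    return [sum(Fr(w) * p[k] for w, p in zip(masses, points)) / total
            for k in range(dim)]


# Sample particle system (arbitrary illustrative data).
pts = [[0, 0], [4, 0], [0, 2]]
ws = [1, 2, 3]
x = barycenter(pts, ws)
print(x)  # [Fraction(4, 3), Fraction(1, 1)]

# Check the defining identity sum_i w_i (x - x_i) = 0, coordinate by coordinate.
residual = [sum(w * (x[k] - p[k]) for w, p in zip(ws, pts)) for k in range(2)]
print(residual)  # [Fraction(0, 1), Fraction(0, 1)]
```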

This physical situation can be reformulated mathematically as follows. For a given fixed set of distinct locations or nodes x₀, x₁, … , xn and an arbitrary point x, does there exist some masses or weights w₀, w₁, … , wn, such that x is the barycentre \( \displaystyle \quad x = \sum_j w_j x_j , \qquad \sum_j w_j = 1 , \quad \) of the corresponding particle system?

Here is Wolfram code that computes the barycenter of two bodies with the same masses and animates the two bodies rotating about it.

First, we define the initial positions of the two bodies:

x0 = {0, 0}; x1 = {2, 0};
Since the masses are equal, we can simply set both of them to 1:
w0 = 1; w1 = 1;
Calculate the barycenter:
barycenter = (w0*x0 + w1*x1)/(w0 + w1);
Define the time-dependent positions of the two bodies and animate
body1[t_] := barycenter + {Cos[t], Sin[t]}; body2[t_] := barycenter + {-Cos[t], -Sin[t]}; animation = Animate[ Graphics[{ Red, Disk[body1[t], 0.1], Blue, Disk[body2[t], 0.1], Black, PointSize[Large], Point[barycenter] }, PlotRange -> {{-3, 3}, {-3, 3}}, Axes -> True, AxesLabel -> {"x", "y"}], {t, 0, 2 Pi}, AnimationRate -> 0.1 ]
Similar code produces the motion of two bodies with unequal masses; the heavier body then moves on a smaller circle about the barycenter.

Two bodies with the same masses.
Two bodies with similar masses.
     
Two bodies with distinct masses
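The unequal-mass case can be sketched in Python. This is a hypothetical counterpart of the Wolfram code above, not the code behind the figures. We assume the bodies rotate rigidly about the barycenter at unit angular speed, mirroring the Cos[t], Sin[t] of the equal-mass animation. Each body then stays at a distance from the barycenter inversely proportional to its mass, so the weighted average of the two positions never moves. The function name two_body_positions is our own.

```python
import math


def two_body_positions(x0, x1, w0, w1, t):
    """Barycenter and positions, at angle t, of two bodies rotating about it.

    At t = 0 the bodies sit at x0 and x1. Each body keeps its initial distance
    to the barycenter; these distances are inversely proportional to the masses.
    """
    total = w0 + w1
    bary = [(w0 * p + w1 * q) / total for p, q in zip(x0, x1)]
    d = math.dist(x0, x1)                    # separation of the two bodies
    r0, r1 = w1 / total * d, w0 / total * d  # distances to the barycenter
    ux, uy = (x1[0] - x0[0]) / d, (x1[1] - x0[1]) / d  # unit vector x0 -> x1
    c, s = math.cos(t), math.sin(t)
    rx, ry = c * ux - s * uy, s * ux + c * uy  # that unit vector rotated by t
    body0 = [bary[0] - r0 * rx, bary[1] - r0 * ry]
    body1 = [bary[0] + r1 * rx, bary[1] + r1 * ry]
    return bary, body0, body1


# Example: a body of mass 3 at (0, 0) and a body of mass 1 at (2, 0).
for t in (0.0, 1.0, 2.5):
    bary, p0, p1 = two_body_positions([0.0, 0.0], [2.0, 0.0], 3.0, 1.0, t)
    center = [(3.0 * p + 1.0 * q) / 4.0 for p, q in zip(p0, p1)]
    print(t, bary, center)   # the weighted average stays at the barycenter
```

Here the heavier body (mass 3) stays at distance 0.5 from the barycenter, and the lighter one stays at distance 1.5.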
     

A. Möbius was probably the first to answer this question in full generality. He showed that for particle systems in ℝm such weights always exist for any x ∈ ℝm, as long as the number of particles is greater than the dimension (that is, n ≥ m) and the nodes do not all lie in a common hyperplane. Möbius called the weights w₀(x), w₁(x), … , wn(x) the barycentric coordinates of x with respect to the nodes x₀, x₁, … , xn. It is clear that barycentric coordinates are homogeneous in the sense that they can be multiplied by a common nonzero scalar and still satisfy

\begin{equation} \label{EqAffine.4} x = \frac{\sum_{i=0}^n w_i (x)\,x_i}{\sum_{i=0}^n w_i (x)} . \end{equation}

We can generalize this to define an affine combination of an arbitrary number of points. If P₁, P₂, … ,Pn are points and w₁, w₂, … ,wn are scalars such that w₁ + w₂+ ⋯ + wn = 1, then

\[ w_1 P_1 + w_2 P_2 + \cdots + w_n P_n \]
is defined to be the point P of the inhabited set given by
\[ P = P_1 + w_2 \left( P_2 - P_1 \right) + \cdots + w_n \left( P_n - P_1 \right) , \quad w_1 = 1 - w_2 - \cdots - w_n . \]
This equation is meaningful: P₂ − P₁ is a vector, and thus so is its multiple w₂(P₂ − P₁), and adding a sum of such vectors to the point P₁ yields a point.
Lemma 2: Given an affine space A, let {𝑎i}i∈I be a family of points in A, and let {λi}i∈I be a family of scalars. For any two points 𝑎, b ∈ A, the following properties hold:
  1. If \( \displaystyle \quad \sum_{i\in I} \lambda_i = 1 , \quad \) then \[ a + \sum_{i\in I} \lambda_i \,\vec{a\,a_i} = b + \sum_{i\in I} \lambda_i \,\vec{b\,a_i} . \]
  2. If \( \displaystyle \quad \sum_{i\in I} \lambda_i = 0 , \quad \) then \[ \sum_{i\in I} \lambda_i \,\vec{a\,a_i} = \sum_{i\in I} \lambda_i \,\vec{b\,a_i} . \]
   
(1) By Chasles’s identity, we have \begin{align*} a + \sum_{i\in I} \lambda_i \,\vec{a\,a_i} &= a + \sum_{i\in I} \lambda_i \left( \vec{ab} + \vec{b\,a_i} \right) \\ &= a + \left( \sum_{i\in I} \lambda_i \right) \vec{ab} + \sum_{i\in I} \lambda_i \, \vec{b\, a_i} \\ &= a + \vec{a\,b} + \sum_{i\in I} \lambda_i \, \vec{b\, a_i} \\ &= b + \sum_{i\in I} \lambda_i \, \vec{b\, a_i} \end{align*} because \( \displaystyle \quad \sum_{i\in I} \lambda_i = 1 \quad \) and \( \displaystyle \quad a + \vec{a\,b} = b . \quad \)

(2) We also have \begin{align*} \sum_{i\in I} \lambda_i \,\vec{a\,a_i} &= \sum_{i\in I} \lambda_i \left( \vec{ab} + \vec{b\,a_i} \right) \\ &=\left( \sum_{i\in I} \lambda_i \right) \vec{ab} + \sum_{i\in I} \lambda_i \, \vec{b\, a_i} \\ &= \sum_{i\in I} \lambda_i \, \vec{b\, a_i} \end{align*} because \( \displaystyle \quad \sum_{i\in I} \lambda_i = 0 . \quad \)
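Both parts of Lemma 2 can be checked numerically in the plane. Below is a Python sketch with exact fractions; it is an illustration, not part of the notebook. Points are modeled as coordinate tuples, the vector from p to q is q − p, and the helper names are hypothetical. The last check shows that base-point independence fails when the weights do not sum to 1.

```python
from fractions import Fraction as Fr


def vec(p, q):
    """The vector from point p to point q (head minus tail)."""
    return tuple(qi - pi for pi, qi in zip(p, q))


def translate(p, v):
    """The point p + v."""
    return tuple(pi + vi for pi, vi in zip(p, v))


def weighted_sum(lams, vectors):
    """The vector sum_i lam_i * v_i."""
    return tuple(sum(l * v[k] for l, v in zip(lams, vectors))
                 for k in range(len(vectors[0])))


def combo_from(origin, points, lams):
    """origin + sum_i lam_i * vec(origin, a_i), as in part (1) of Lemma 2."""
    return translate(origin, weighted_sum(lams, [vec(origin, p) for p in points]))


pts = [(0, 0), (2, 2), (3, 1)]
a, b = (5, -1), (-2, 7)                  # two arbitrary reference points

lams1 = [Fr(1, 4), Fr(1, 4), Fr(1, 2)]   # weights summing to 1: part (1)
print(combo_from(a, pts, lams1) == combo_from(b, pts, lams1))  # True

lams0 = [Fr(1), Fr(-3), Fr(2)]           # weights summing to 0: part (2)
va = weighted_sum(lams0, [vec(a, p) for p in pts])
vb = weighted_sum(lams0, [vec(b, p) for p in pts])
print(va == vb)  # True

lams_bad = [Fr(1), Fr(1), Fr(1)]         # weights summing to 3
print(combo_from(a, pts, lams_bad) == combo_from(b, pts, lams_bad))  # False
```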

   
Example 12: Let us consider three points in the two-dimensional space ℝ² with respect to some frame: \[ a (0, 0), \qquad b(2, 2), \qquad c (3, 1) . \] Of course, these expressions for points are not written accurately (from a mathematician's perspective), but lazy people like me use this informal notation. These points should be written as \[ P_1 = O + a , \qquad P_2 = O + b , \qquad P_3 = O + c . \] Using these three points, we determine three other points \[ p_1 = \frac{1}{4}\, a + \frac{1}{4}\, b + \frac{1}{2}\, c , \quad p_2 = \frac{1}{3}\, a + \frac{1}{3}\, b + \frac{1}{3}\, c , \quad p_3 = a - b + c . \] Each of these is an affine combination because its weights sum to 1. Now we turn to Mathematica. First, we define the points.
Clear[a, b, c, p1, p2, p3]; a = {0, 0}; b = {2, 2}; c = {3, 1};
Create the triangle with these points as vertices; the vertices are marked in blue:
ptsTri = Graphics[{Blue, PointSize -> Medium, Point[#]}] & /@ {a, b, c}; gphBez = Graphics[{Opacity[.2], Triangle[{a, b, c}]}, Axes -> True];
Compute the new points, which are plotted below as large red points:
p1 = (a/4) + (b/4) + (c/2)
p2 = (a/3) + (b/3) + (c/3)
p3 = a - b + c
{2, 1}
{5/3, 1}
{1, -1}

Note that the new point p1 lies inside the triangle formed by the original three points, since all of its weights 1/4, 1/4, 1/2 are positive.

pt1Bez = Graphics[{Red, PointSize -> Large, Point[(a/4) + (b/4) + (c/2)]}]; Show[gphBez, pt1Bez, ptsTri, GridLines -> {{2}, {1}}, Ticks -> {{2}, {1}}]
Point p1 is inside the triangle.
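Note that for p1 the coefficient of b is 1/4. We now modify p1 by replacing this coefficient with 0.5 − t, where the slider value t runs from 0 to 0.5, so that the point moves along a line through p1 as t changes. Observe that the weights then sum to 5/4 − t. The moving point is therefore a barycenter (affine combination) of a, b, c only at t = 1/4, the initial slider value; for other values of t it is merely a linear combination of the position vectors a, b, c. Because variable names are reused later in the notebook, the following animation will change when later cells are evaluated.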
animP1 = Module[{pt1Anim, a, b, c}, a = {0, 0}; b = {2, 2}; c = {3, 1}; Animate[ pt1Anim = Graphics[{Red, PointSize -> Large, Point[(a/4) + ((.5 - t)*b) + (c/2)]}]; Show[gphBez, pt1Anim, ptsTri, GridLines -> {{2}, {1}}, Ticks -> {{2}, {1}}], {{t, .25, "t-range"}, 0, .5, .05, Appearance -> "Labeled"}, AnimationDirection -> ForwardBackward, AnimationRate -> .1 ]]
Animation of point p1 inside the triangle.

Point p2, which is the centroid of the triangle (all three weights equal 1/3), also lies inside the triangular outline of the original points.

pt2Bez = Graphics[{Red, PointSize -> Large, Point[p2]}]; edges = {{p2, a}, {p2, b}, {p2, c}}; graph = Graph[UndirectedEdge @@@ edges, EdgeStyle -> Dashed, VertexCoordinates -> {p2 -> p2, a -> a, b -> b, c -> c}, VertexStyle -> Directive[Red, PointSize[Large], PointShape -> "Circle"] ];
Show[gphBez, pt2Bez, ptsTri, graph]
Point p2 is inside the triangle.

Note that point p3 does not lie inside the original triangle: its weight for b is −1, which is negative.

pt3Bez = Graphics[{Red, PointSize -> Large, Point[p3]}]; Show[gphBez, pt3Bez, ptsTri]
As with point p1, we animate point p3 by varying its position continuously.
animP3 = With[{a = {0, 0}, b = {2, 2}, c = {3, 1}}, Animate[ pt3Anim = Graphics[{Red, PointSize -> Large, Point[a - (t*b)/4 + t*c]}]; Show[gphBez, pt3Anim, ptsTri, PlotRange -> {{0, 4}, {0, 2}}], {{t, 0, "t-range"}, 0, 1.5, .05, Appearance -> "Labeled"}, AnimationDirection -> ForwardBackward, AnimationRate -> .1 ] ]
Point p3 is outside the triangle.
   ■
End of Example 12
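The inside/outside observations of Example 12 can be made precise by computing barycentric coordinates. Below is a Python sketch with exact fractions; it is an independent check, not the notebook's method. It solves p − a = λ_b(b − a) + λ_c(c − a) by Cramer's rule and treats a point as inside the closed triangle when all three coordinates are nonnegative. The function name barycentric is hypothetical.

```python
from fractions import Fraction as Fr


def barycentric(p, a, b, c):
    """Barycentric coordinates (lam_a, lam_b, lam_c) of p w.r.t. triangle abc.

    Solves p - a = lam_b (b - a) + lam_c (c - a) by Cramer's rule and sets
    lam_a = 1 - lam_b - lam_c, so that p = lam_a a + lam_b b + lam_c c.
    """
    e1 = (b[0] - a[0], b[1] - a[1])
    e2 = (c[0] - a[0], c[1] - a[1])
    r = (p[0] - a[0], p[1] - a[1])
    det = e1[0] * e2[1] - e1[1] * e2[0]
    if det == 0:
        raise ValueError("a, b, c are collinear")
    lam_b = Fr(r[0] * e2[1] - r[1] * e2[0]) / det
    lam_c = Fr(e1[0] * r[1] - e1[1] * r[0]) / det
    return (1 - lam_b - lam_c, lam_b, lam_c)


a, b, c = (0, 0), (2, 2), (3, 1)
for name, p in [("p1", (2, 1)), ("p2", (Fr(5, 3), 1)), ("p3", (1, -1))]:
    lam = barycentric(p, a, b, c)
    where = "inside" if all(l >= 0 for l in lam) else "outside"
    print(name, [str(l) for l in lam], where)
```

This recovers the weights (1/4, 1/4, 1/2), (1/3, 1/3, 1/3), and (1, −1, 1) used to define p1, p2, and p3. The negative weight of p3 places it outside the triangle.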

Generally speaking, a sum P + Q of two points in the inhabited set A is meaningless (except when A = V is a vector space itself). By Lemma 2, for any family of points (𝑎i)i∈I in the inhabited set A, for any family (λi)i∈I of scalars such that \( \displaystyle \quad \sum_{i \in I} \lambda_i = 1 , \quad \) the point

\[ x = a + \sum_{i\in I} \lambda_i \,\vec{a\, a_i} \]
is independent of the choice of the origin 𝑎 ∈ A. When I = {0, 1, … , n}, the values (λ₀, λ₁, … , λn) in this representation are called the barycentric coordinates of x relative to the points 𝑎₀, 𝑎₁, … , 𝑎n. The restriction \( \quad \sum_{i \in I} \lambda_i = 1 \quad \) on the weights makes the definition of an affine combination frame-free, just as the notion of a linear combination of vectors in a vector space is basis-independent. This property motivates the following definition.

For any family of points (𝑎i)i∈I in the inhabited set A, any family (λi)i∈I of scalars such that \( \displaystyle \quad \sum_{i \in I} \lambda_i = 1 , \quad \) and any point 𝑎 ∈ A, the point \[ a + \sum_{i\in I} \lambda_i \,\vec{a\, a_i} \] (which is independent of 𝑎 ∈ A, by Lemma 2) is called the barycenter (or barycentric combination, or affine combination) of the points (𝑎i)i∈I with weights λi, and it is denoted by \[ \sum_{i\in I} \lambda_i \, a_i \qquad \left( \sum_{i \in I} \lambda_i = 1 \right) . \]

This allows us to make the following observation.

A sequence 𝑎₀, 𝑎₁, … , 𝑎n of n+1 points in an n-dimensional affine space is affinely independent if and only if each point x ∈ A (= ℝn) can be written uniquely as an affine combination of them, i.e., \[ x = \sum_j \lambda_j (x)\, a_j , \qquad \sum_{0 \le j \le n} \lambda_j (x) = 1 . \] The functions λj so defined are called the barycentric coordinates of the point x.

We can use linear subspaces of V to build more examples of affine sets in V. Let W be a fixed linear subspace of V. Since W is closed under all linear combinations of its elements, which include affine combinations as a special case, W is an affine subset of V. For any vector u ∈ V, the translate u + W = { u + w : w ∈ W } of W by u is an affine set.

Theorem 3: For every nonempty affine set X in a finite-dimensional vector space V, there exists a unique linear subspace W of V, called the direction subspace of X, such that X = u + W = { u + w : w ∈ W } for some (possibly not unique) u ∈ V.
You will learn in Part 3 (Quotient spaces) that the set u + W is called a coset of W.    
Fix a nonempty affine set X in V, and fix u ∈ X. Define W = −u + X. On the one hand, W is an affine set, being a translate of the affine set X. On the other hand, u ∈ X implies 0 = −u + u ∈ W. By the result preceding the theorem (an affine set containing 0 is a linear subspace), W is a linear subspace, and evidently u + W = u + (−u + X) = X. We now prove the uniqueness of the linear subspace W. Suppose X = u + W = v + W₁ for some u, v ∈ V and linear subspaces W and W₁. Since 0 ∈ W and 0 ∈ W₁, both u and v belong to X. Then v = u + w for some w ∈ W; hence W₁ = −v + X = (−v + u) + W = −w + W = W. (The last step uses the fact that x + W = W for all x in a subspace W.)
   
Example 13: This example demonstrates that an arbitrary polynomial curve can be defined as a set of barycenters of a fixed number of points. For example, let (𝑎, b, c, d) be a sequence of points in 𝔸². Observe that \[ \left( 1- t \right)^3 + 3t \left( 1 - t \right)^2 + 3t^2 \left( 1 - t \right) + t^3 = 1 \] because the sum on the left-hand side is obtained by expanding \[ \left( t + (1-t) \right)^3 = \sum_{i=0}^3 \binom{3}{i} t^i \left( 1 - t \right)^{3-i} = 1 \] using the binomial formula. Thus, \[ \left( 1- t \right)^3 a + 3t \left( 1 - t \right)^2 b + 3t^2 \left( 1 - t \right) c + t^3 d \] is a well-defined affine combination. Then we can define the curve F : 𝔸 → 𝔸² by \[ F(t) = \left( 1- t \right)^3 a + 3t \left( 1 - t \right)^2 b + 3t^2 \left( 1 - t \right) c + t^3 d . \] Such a curve is called a cubic Bézier curve, and (𝑎, b, c, d) are called its control points. The mathematical basis for Bézier curves, the Bernstein polynomials, was established in 1912. The polynomials were not applied to graphics until some 50 years later. In 1959 the mathematician Paul de Casteljau (1930--2022) developed de Casteljau's algorithm, a numerically stable method for evaluating the curves. He became the first to apply them to computer-aided design, at the French automaker Citroën.

Note that the curve passes through 𝑎 and d, but generally not through b and c. In applications, a Bézier curve can be used, for instance, to specify the velocity over time of an object such as a cursor moving from A to B, rather than simply moving it a fixed number of pixels per step. A Bézier curve can be evaluated with de Casteljau's algorithm. Although the algorithm is slower on most architectures than direct evaluation of the polynomial, it is more numerically stable. Mathematica provides the built-in graphics primitive BezierCurve. Here is an illustration from the Mathematica documentation; use your cursor to drag the points around to see how it operates.

Manipulate[ Graphics[{BezierCurve[pts, SplineDegree -> d], Dashed, Green, Line[pts]}, PlotRange -> 5, Frame -> True], {{pts, {{-3, 0}, {-1, 3}, {1, -3}, {3, 0}}}, Locator, LocatorAutoCreate -> True}, {{d, 3, "degree"}, 2, 6, 1, Appearance -> "Labeled"}]
Bézier approximation of sine function.
   ■
End of Example 13
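De Casteljau's algorithm evaluates the curve by repeated two-point affine combinations. The direct formula instead uses the Bernstein weights, which sum to 1 as shown above. Below is a Python sketch of both; it is an illustration, not Mathematica's BezierCurve implementation. The function names are hypothetical, and the control points are the initial ones from the Manipulate above.

```python
from math import comb


def de_casteljau(control, t):
    """Point at parameter t of the Bezier curve with the given control points.

    Repeatedly replaces consecutive points P_i, P_{i+1} by the affine
    combination (1 - t) P_i + t P_{i+1} until a single point remains.
    """
    pts = [tuple(p) for p in control]
    while len(pts) > 1:
        pts = [tuple((1 - t) * u + t * v for u, v in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]


def bernstein(control, t):
    """The same point from the Bernstein form sum_i C(n,i) t^i (1-t)^(n-i) P_i."""
    n = len(control) - 1
    weights = [comb(n, i) * t**i * (1 - t)**(n - i) for i in range(n + 1)]
    return tuple(sum(w * p[k] for w, p in zip(weights, control))
                 for k in range(len(control[0])))


ctrl = [(-3, 0), (-1, 3), (1, -3), (3, 0)]  # initial control points used above
for t in (0.0, 0.25, 0.5, 1.0):
    print(t, de_casteljau(ctrl, t), bernstein(ctrl, t))
```

Both functions give the same points up to rounding. At t = 0 and t = 1 the curve returns the first and last control points.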

Since the direction subspace of an affine set X is uniquely determined by X, we can define the affine dimension of X (written dim(X)) to be the vector-space dimension of its direction subspace. For instance, in an n-dimensional space, points, affine lines, affine planes, and affine hyperplanes have affine dimensions 0, 1, 2, and n − 1, respectively. The affine dimension of ∅ is undefined.

Theorem 4: A subset of 𝔽n is an affine set if and only if it is the solution set to a system of linear equations over 𝔽.
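If S ⊆ 𝔽n is a nonempty affine set, then by Theorem 3 it has the form S = b + W, where W is a subspace of 𝔽n and b is a vector in 𝔽n. Given such a set, let W₁ be a direct complement to W in 𝔽n. Note that we can assume b ∈ W₁: writing b = b₁ + w with b₁ ∈ W₁ and w ∈ W, we have b + W = b₁ + W. (If S = ∅, it is the solution set of any inconsistent system, for instance 0·x₁ = 1.)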

Let A be the standard matrix representation of the projection onto W₁ along W. For w₁ in W₁ and w in W, we have \[ {\bf A} \left( {\bf w}_1 + {\bf w} \right) = {\bf w}_1 . \] In particular, NullSpace(A) = W and A b = b, because b ∈ W₁. Lemma 2 in another section (the general solution of A x = b is the sum of a particular solution and the general solution of the homogeneous equation A x = 0) thus ensures that S = b + NullSpace(A) is the solution set of A x = b. This completes the proof in one direction.

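Conversely, the same lemma shows that the solution set of a consistent system A x = b has the form xp + NullSpace(A), where xp is any particular solution. This set is a translate of a subspace and hence an affine set. An inconsistent system has the empty solution set, which is also affine.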

   
Example 14: Let us now consider data f₀, f₁, … , fn corresponding to the nodes x₀, x₁, … , xn and possibly sampled from some function f : ℝm → ℝ, that is, fi = f(xi) for i = 0, 1, … , n. The barycentric interpolant of these data is then given by \[ F(x) = \sum_{0 \le i \le n} b_i (x)\, f_i . \tag{14.1} \] The function F : ℝm → ℝ interpolates the data, F(xi) = fi for i = 0, 1, … , n. We also require that this interpolation be exact for linear functions: \[ \sum_{0 \le i \le n} b_i (x) = 1 \qquad\mbox{and} \qquad \sum_{0 \le i \le n} b_i (x)\, x_i = x . \] This means that when the data fi are sampled from a linear polynomial f, the interpolant (14.1) reproduces f exactly: F(x) = f(x) for all x.

In particular, if we have two nodes x₀, x₁, the two functions b₀, b₁ : ℝ → ℝ are \[ b_0 (x) = \frac{x_1 - x}{x_1 - x_0} , \qquad b_1 (x) = \frac{x - x_0}{x_1 - x_0} . \] These functions form a barycentric basis with respect to x₀ and x₁ that leads to the approximation \[ F_1 (x) = \frac{x_1 - x}{x_1 - x_0} \, f\left( x_0 \right) + \frac{x - x_0}{x_1 - x_0} \, f \left( x_1 \right) . \tag{14.2} \] For n + 1 nodes on the real line, a popular generalization (Berrut's rational interpolant) is \[ F_n (x) = \frac{\sum_{i=0}^n \frac{(-1)^i}{x - x_i} \, f\left( x_i \right)}{\sum_{i=0}^n \frac{(-1)^i}{x - x_i}} . \tag{14.3} \] It reduces to (14.2) when n = 1 and interpolates the data at every node. For n ≥ 2, however, it is in general not exact for linear functions.

As a numerical example, we consider the sine function f(x) = sin(x) and the two nodes x₀ = π/6 and x₁ = 3π/4, so that x₁ − x₀ = 7π/12. The corresponding two-point barycentric interpolant reads \[ F_1 (x) = \frac{3\pi /4 -x}{7\pi /12} \, \frac{1}{2} + \frac{x - \pi /6}{7\pi /12} \, \frac{1}{\sqrt{2}} . \]

Now we turn our attention to three-point interpolation. We again consider the sine function and choose three nodes: x₀ = π/6, x₁ = π/4, and x₂ = 3π/4. We use formula (14.3) and ask Mathematica for help.

Clear[x0, x1, x2, den, F3]; x0 = Pi/6; x1 = Pi/4; x2 = 3*Pi/4; den = Simplify[1/(x - x0) - 1/(x - x1) + 1/(x - x2)]; F3[x_] = Simplify[(Sin[x0]/(x - x0) - Sin[x1]/(x - x1) + Sin[x2]/(x - x2))/ den ]
\[ F_2 (x) = \frac{\left( 9- 4\sqrt{2} \right) \pi^2 + 24 \left( -2 + \sqrt{2} \right) \pi x + 48 x^2}{10 \pi^2 -48 \pi x + 96 x^2} . \] Alternatively, we can implement formulas (14.2) and (14.3) as general Mathematica functions. We begin by defining the nodes.
Clear[nodes2, nodes3, values2, values3, x, x0, x1, F1, f0, f1]; nodes2 = {\[Pi]/6, 3*\[Pi]/4}; nodes3 = {\[Pi]/6, \[Pi]/4, 3*\[Pi]/4};
Next, we define the function values at the nodes:
values2 = Sin /@ nodes2; values3 = Sin /@ nodes3;
Create functions that compute the barycentric weights for two nodes:
b0[x_, x0_, x1_] := (x1 - x)/(x1 - x0); b1[x_, x0_, x1_] := (x - x0)/(x1 - x0);
Define the interpolant for the two nodes.
F1[x_, x0_, x1_, f0_, f1_] := b0[x, x0, x1] f0 + b1[x, x0, x1] f1;
Create the interpolant function for two nodes
interpolant2[x_] := F1[x, nodes2[[1]], nodes2[[2]], values2[[1]], values2[[2]]];

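Define a function (using Module) that evaluates the barycentric formula (14.3) for an arbitrary list of nodes; below we apply it to three nodes. The code numbers the nodes from 1, which flips the sign of every weight, but this does not change the quotient.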

barycentricWeights[x_, nodes_] := Table[(-1)^i/(x - nodes[[i]]), {i, Length[nodes]}]; F3[x_, nodes_, values_] := Module[{weights, numerator, denominator}, If[MemberQ[nodes, x], (* If x is one of the nodes, return the corresponding function value *) values[[First[FirstPosition[nodes, x]]]], (* Otherwise, compute the barycentric interpolation *) weights = barycentricWeights[x, nodes]; numerator = Total[weights values]; denominator = Total[weights]; numerator/denominator ] ];
Define the three-node interpolant and display both interpolants (Mathematica applies some automatic simplification to the output):
interpolant3[x_] := F3[x, nodes3, values3]; TableForm[{interpolant2[x], interpolant3[x]}, TableHeadings -> {{"2 nodes", "3 nodes"}, None}]
Out[ ] // TableForm =
2 nodes         \( \displaystyle \quad \frac{6 \left( \frac{3\pi}{4} - x \right)}{7\pi} + \frac{6 \sqrt{2} \left( -\frac{\pi}{6} +x \right)}{7\pi} \)
                       
3 nodes         \( \displaystyle \quad \frac{-\frac{1}{\sqrt{2} \left( -\frac{3\pi}{4} + x \right)} + \frac{1}{\sqrt{2} \left( -\frac{\pi}{4} +x \right)} - \frac{1}{2 \left( -\frac{\pi}{6} +x \right)}}{- \frac{1}{-\frac{3\pi}{4} +x} + \frac{1}{-\frac{\pi}{4} +x} - \frac{1}{-\frac{\pi}{6} +x}} \)

We check with Mathematica that both three-node computations agree:

TrueQ[F3[x] == FullSimplify[interpolant3[x]]]
True
   ■
End of Example 14
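The computations of Example 14 can be cross-checked in floating point. The following Python sketch is an independent check, not part of the Mathematica notebook, and the function names are hypothetical. It compares formula (14.3) for three nodes with the simplified expression F₂ above, and compares (14.3) for two nodes with (14.2). It then applies (14.3) to the linear data f(x) = x at the nodes 0, 1, 3.

```python
import math


def two_point(x, x0, x1, f0, f1):
    """Two-node barycentric interpolant (14.2)."""
    return (x1 - x) / (x1 - x0) * f0 + (x - x0) / (x1 - x0) * f1


def berrut(x, nodes, values):
    """Formula (14.3): weights (-1)^i / (x - x_i), normalized by their sum."""
    for xi, fi in zip(nodes, values):
        if x == xi:                      # the formula is 0/0 at a node
            return fi
    w = [(-1) ** i / (x - xi) for i, xi in enumerate(nodes)]
    return sum(wi * fi for wi, fi in zip(w, values)) / sum(w)


def closed_form(x):
    """The simplified three-node expression F_2(x) displayed above."""
    p, r2 = math.pi, math.sqrt(2)
    num = (9 - 4 * r2) * p**2 + 24 * (-2 + r2) * p * x + 48 * x**2
    den = 10 * p**2 - 48 * p * x + 96 * x**2
    return num / den


nodes3 = [math.pi / 6, math.pi / 4, 3 * math.pi / 4]
vals3 = [math.sin(t) for t in nodes3]

print(berrut(1.0, nodes3, vals3), closed_form(1.0), math.sin(1.0))
# With two nodes, (14.3) agrees with (14.2):
print(berrut(0.3, [0.0, 1.0], [2.0, -1.0]), two_point(0.3, 0.0, 1.0, 2.0, -1.0))
# Linear data f(x) = x on the nodes 0, 1, 3, evaluated at x = 2:
print(berrut(2.0, [0.0, 1.0, 3.0], [0.0, 1.0, 3.0]))   # about 2.667, not 2
```

The last value differs from 2. So for three or more nodes, formula (14.3) interpolates the data but does not reproduce linear functions exactly.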

Affine Mapping

    Previously (see the opening subsection of this web page) we defined an affine transformation as \( \displaystyle \quad \mathbb{F}^{n\times 1} \ni \mathbf{x} \longrightarrow f({\bf x}) = {\bf A}\,{\bf x} + \mathbf{b} \quad \) for some column vector b ∈ 𝔽m×1, or as \( \displaystyle \quad \mathbb{F}^{1\times m} \ni \mathbf{v} \longrightarrow f({\bf v}) = {\bf v}\,{\bf A} + \mathbf{w} \quad \) for row vectors. Its generalization to arbitrary affine spaces is due to A. Grothendieck (1928--2014). He realized that an affine map should consist of two parts: one transformation of the inhabited sets and another of the corresponding vector spaces.
Let 𝔸 = (A, V, +) and 𝔹 = (B, U, +) be two affine spaces over the same field 𝔽. A pair (f, Df), where f : A → B and Df : V → U, satisfying the following conditions:
  1. Df is a linear mapping from V into U;
  2. for any two points P, Q from the inhabited set A, \[ f(Q) - f(P) = D\,f(Q - P) \in U . \]
is called an affine mapping of the first space into the second space.

Df or D(f) is called the linear part of the affine mapping f. If an affine transformation is given by \( \displaystyle \quad \mathbf{v} \longrightarrow f({\bf v}) = {\bf v}\,{\bf A} + \mathbf{w} ,\quad \) then Df is just (right multiplication by) the matrix A. Since Q − P runs through all vectors in V as Q and P run through A, the linear part Df is uniquely determined by f. This makes it possible to denote affine mappings simply as f : 𝔸 ↣ 𝔹.

Formally, an affine map is a function (consisting of two parts) from one affine space to another (which may be, and in fact usually is, the same space) that preserves affine combinations.

   
Example 15:
  1. Any linear transformation T : V → U induces an affine mapping of the spaces (V, V, +) ↣ (U, U, +); its linear part is DT = T.
  2. The transformation f : 𝔽n ↣ 𝔽n given by f(x) = A x + b is an affine map with Df = A, because f(x) − f(y) = A(x − y).
  3. Any translation tx : A → A, where tx(P) = P + x, is affine, and D(tx) = idV because \[ t_{\bf x} (P) - t_{\bf x} (Q) = \left( P + {\bf x} \right) - \left( Q + {\bf x} \right) = P - Q . \]
  4. If f : 𝔸 ⇾ 𝔹 is an affine mapping and x ∈ U, then the mapping tx ∘ f : 𝔸 ⇾ 𝔹 is affine and D(tx ∘ f) = D(f). Indeed, \begin{align*} t_{\bf x} \circ f (P) - t_{\bf x} \circ f (Q) &= \left( f(P) + {\bf x} \right) - \left( f(Q) + {\bf x} \right) \\ &= f(P) - f(Q) = Df(P-Q) . \end{align*}
  5. An affine function f : 𝔸 ↣ 𝔽 is defined as an affine mapping of 𝔸 into (𝔽, 𝔽, +). Thus, f assumes values in 𝔽, while Df is a linear functional on V. Any constant function f is affine: Df = 0.
  6. The identity mapping id : 𝔸 ↣ 𝔸 is an affine mapping. Indeed, id(P) − id(Q) = P − Q = idV(P − Q). In particular, D(idA) = idV.
   ■
End of Example 15

It turns out that there is an alternative (and, of course, equivalent) definition of an affine mapping.

Let 𝔸 = (A, V, +) and 𝔹 = (B, U, +) be two affine spaces over the same field of scalars. A function F : 𝔸 ↣ 𝔹 is an affine map if and only if for every family of points (𝑎i)i ∈ I and weights (λi)i ∈ I such that ∑i∈I λi = 1, we have \[ F \left( \sum_{i\in I} \lambda_i a_i \right) = \sum_{i\in I} \lambda_i \, F \left( a_i \right) . \] In other words, F preserves barycenters.
Theorem 5: Let f : 𝔸 ↣ 𝔹 be an affine map between two affine spaces 𝔸 = (A, V, +) and 𝔹 = (B, U, +) over the same field. Then there is a unique linear transformation Df : V → U such that \[ f(a + {\bf x}) = f(a) + Df(\mathbf{x}) \] for every 𝑎 ∈ A and every x ∈ V.
The proof of this theorem is taken from the website Basics of Affine Geometry (Lemma 2.7.2, page 28), which is part of the book Geometric Methods and Applications by J. Gallier.

We try to verify every step with Mathematica.

Let 𝑎 ∈ A be any point of the inhabited set. We claim that the map defined by the identity \[ Df(\mathbf{x}) = \overrightarrow{f(a)\,f(a + \mathbf{x})} \quad \iff \quad f(a + \mathbf{x}) = f(a) + Df(\mathbf{x}) , \qquad \mathbf{x} \in V, \] is a linear transformation Df : V → U that does not depend on the choice of the point 𝑎 ∈ A. Indeed, we can write \[ a + \lambda \mathbf{x} = \lambda \left( a + \mathbf{x} \right) + \left( 1 - \lambda \right) a \] because \( \displaystyle \quad a + \lambda\mathbf{x} = a + \lambda\,\overrightarrow{a\,(a + \mathbf{x})} + (1 - \lambda)\,\overrightarrow{a\,a} , \quad \) and also \[ a + \mathbf{x} + \mathbf{y} = \left( a + \mathbf{x} \right) + \left( a + \mathbf{y} \right) - a \]
Clear[f, a, v, \[Lambda]]; f[pt_] := pt + {1, 1} (* Example affine transformation *)
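since \( \displaystyle \quad a + \mathbf{x} + \mathbf{y} = a + \overrightarrow{a\,(a + \mathbf{x})} + \overrightarrow{a\,(a + \mathbf{y})} - \overrightarrow{a\,a} . \quad \) We also know that f preserves barycenters, so \[ f\left( a + \lambda\mathbf{y} \right) = \lambda\, f(a + {\bf y}) + \left( 1 -\lambda \right) f(a) . \] Define the linear map Df: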
Df[v_] := f[a+v] - f[a]
Recall that \( \displaystyle \quad x = \sum_{i \in I} \lambda_i a_i \quad \) is the barycenter of a family of points (𝑎i)i ∈ I with weights (λi)i ∈ I, \( \displaystyle \quad \sum_{i \in I} \lambda_i = 1 , \quad \) if and only if \[ \overrightarrow{b\,x} = \sum_{i\in I} \lambda_i \,\overrightarrow{b\,a_i} \qquad \forall b \in A . \] Using this, we get \[ \overrightarrow{f(a)\,f(a+ \lambda{\bf v})} = \lambda\,\overrightarrow{f(a)\,f(a + {\bf v})} + \left( 1 - \lambda \right) \overrightarrow{f(a)\,f(a)} = \lambda\, \overrightarrow{f(a)\,f(a+ {\bf v})} , \] showing that \[ Df(\lambda\mathbf{v}) = \lambda\,Df(\mathbf{v}) . \] We also have \[ f(a+ \mathbf{u} + \mathbf{v}) = f(a+ \mathbf{u} ) + f(a+ \mathbf{v} ) -f(a) , \] from which we get \[ \overrightarrow{f(a)\,f(a + \mathbf{u} + \mathbf{v})} = \overrightarrow{f(a)\,f(a + \mathbf{u} )} + \overrightarrow{f(a)\,f(a + \mathbf{v} )} , \] showing that \[ Df(\mathbf{u} + \mathbf{v}) = Df(\mathbf{u}) + Df(\mathbf{v}) . \] Consequently, Df is a linear map. For any other point b ∈ A, we have \[ b + \mathbf{v} = a + \overrightarrow{a\,b} + \mathbf{v} = a + \overrightarrow{a\,(a + \mathbf{v})} - \overrightarrow{a\,a} + \overrightarrow{a\,b} , \] that is, b + v = (a + v) − a + b. Since f preserves barycenters, we get \[ f(b + \mathbf{v}) = f(a + \mathbf{v}) - f(a) + f(b) , \] which implies that \begin{align*} \overrightarrow{f(b)\,f(b+ {\bf v})} &= \overrightarrow{f(b)\,f(a + {\bf v})} - \overrightarrow{f(b)\, f(a)} + \overrightarrow{f(b)\,f(b)} \\ &= \overrightarrow{f(a)\,f(b)} + \overrightarrow{f(b)\, f(a + {\bf v})} \\ &= \overrightarrow{f(a)\,f(a+ {\bf v})} . \end{align*} Thus, \( \displaystyle \quad \overrightarrow{f(b)\,f(b + \mathbf{v})} = \overrightarrow{f(a)\,f(a + \mathbf{v})} , \quad \) which shows that the definition of Df does not depend on the choice of 𝑎 ∈ A. The uniqueness of Df is obvious: we must have \( \displaystyle \quad Df(\mathbf{v}) = \overrightarrow{f(a)\,f(a + \mathbf{v})} . \quad \)
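The two key facts of the proof can be illustrated numerically: Df(v) = f(a + v) − f(a) is linear, and it does not depend on the base point a. Below is a Python sketch with exact fractions; it mirrors the Mathematica definition Df[v_] := f[a+v] - f[a]. The affine map f(p) = M p + t0 is a hypothetical example chosen so that Df is not the identity (the notebook's f is a translation). M, t0, the base points, and the test vectors are arbitrary.

```python
from fractions import Fraction as Fr

# A hypothetical affine map f(p) = M p + t0 on R^2 (M and t0 chosen arbitrarily).
M = [[2, -1], [Fr(1, 2), 3]]
t0 = [4, -5]


def f(p):
    return [M[i][0] * p[0] + M[i][1] * p[1] + t0[i] for i in range(2)]


def Df_at(a, v):
    """The vector from f(a) to f(a + v), i.e. the candidate Df(v) based at a."""
    fa, fav = f(a), f([a[0] + v[0], a[1] + v[1]])
    return [fav[i] - fa[i] for i in range(2)]


a, b = [0, 0], [10, -6]              # two base points
v, w, lam = [1, 2], [-3, Fr(1, 3)], Fr(-7, 4)

print(Df_at(a, v) == Df_at(b, v))    # independent of the base point
print(Df_at(a, [lam * v[0], lam * v[1]]) == [lam * x for x in Df_at(a, v)])
print(Df_at(a, [v[0] + w[0], v[1] + w[1]])
      == [p + q for p, q in zip(Df_at(a, v), Df_at(a, w))])
print(Df_at(a, v))                   # equals M v, with entries 0 and 13/2
```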
   

Example 16: Let us consider the affine transformation S : ℝ³ ↦ ℝ⁴ given by S(x) = A x + b, where \[ \mathbf{A} = \begin{bmatrix} -2& 3& 5 \\ 3& 7& -1 \\ 5& 27& 7 \\ -9& 2& 16 \end{bmatrix}, \qquad {\bf b} = \begin{bmatrix} 3 \\ -6 \\ -12 \\ 15 \end{bmatrix} . \] First, with the aid of Mathematica, we solve the linear system A u = b. Replacing b by −b negates the particular solution found below and gives the solutions of A u = −b, that is, the vectors u with S(u) = 0.

A = {{-2, 3, 5}, { 3, 7, -1}, {5, 27, 7}, {-9, 2, 16}};
b = {{3}, {-6}, {-12}, {15}};
RowReduce[Join[A, b, 2]] // MatrixForm
\( \displaystyle \quad \begin{pmatrix} 1& 0& -\frac{38}{23}& -\frac{39}{23} \\ 0& 1& \frac{13}{23}& -\frac{3}{23} \\ 0& 0& 0& 0 \\ 0& 0& 0& 0 \end{pmatrix} \)
From the reduced matrix above, we can read off the solution set of A u = b as a line given parametrically \[ L = \left\{ \left( x, y, z \right) \ : \ x = \frac{38}{23}\, t -\frac{39}{23} , \quad y = -\frac{13}{23}\, t - \frac{3}{23} , \quad z = t , \quad \forall t \in \mathbb{R} \right\} . \] We can decompose L into a sum of two components: the first is the line L₀, which passes through the origin; the second is a translation by a particular vector vp. To find the particular vector vp, all we have to do is set t = 0 in the parametric definition of L given above, which yields \( \displaystyle \quad {\bf v}_p = \left[ -\frac{39}{23} , \ -\frac{3}{23} , \ 0 \right] . \quad \) Once we know vp, the line L₀ is simply the remaining portion of the solution \[ L_0 = \left\{ \left( x, y, z \right) \ : \ x = \frac{38}{23}\, t , \quad y = -\frac{13}{23}\, t , \quad z = t , \quad \forall t \in \mathbb{R} \right\} . \] Clearly, L₀ is a line through the origin (the null space of A), and is thus a subspace of ℝ³. The line L is the translate of the line L₀ by the particular solution vp. Now let us plot these two lines along with the particular solution vp.
lineL = ParametricPlot3D[{38/23*t - 39/23, -13/23*t - 3/23, t}, {t, -3, 3}, PlotStyle -> {Thickness[0.007], Blue}];
lineL0 = ParametricPlot3D[{38/23*t, -13/23*t, t}, {t, -3, 3}, PlotStyle -> {Thickness[0.007], Red}];
pts = Graphics3D[{Black, Sphere[{0, 0, 0}, 0.13], Black, Sphere[{-39/23, -3/23, 0}, 0.13]}];
arrow = Graphics3D[{Arrowheads[0.05], Thickness[0.007], Purple, Arrow[{{0, 0, 0}, {-39/23, -3/23, 0}}]}];
txt = Graphics3D[{Text[Style["O", Black, 20], {0, 0, 0.6}], Text[Style["P", Black, 20], {-39/23, -3/23, 0.6}]}];
Show[lineL, lineL0, txt, arrow, pts]
Coset of 1D space.
   ■
End of Example 16
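The row-reduced output can be confirmed with exact arithmetic. The following Python sketch is an independent check, not part of the notebook. It verifies three facts. The vector vp = (−39/23, −3/23, 0), obtained at t = 0 from the reduced matrix, satisfies A vp = b, so −vp satisfies A u = −b. The vector (38/23, −13/23, 1) spans the null space direction of A. Any other point of the line also solves A u = b.

```python
from fractions import Fraction as Fr

A = [[-2, 3, 5], [3, 7, -1], [5, 27, 7], [-9, 2, 16]]
b = [3, -6, -12, 15]


def mat_vec(M, x):
    """Exact product M x for a matrix M (list of rows) and a vector x."""
    return [sum(Fr(m) * xi for m, xi in zip(row, x)) for row in M]


vp = [Fr(-39, 23), Fr(-3, 23), 0]   # point of the line at t = 0
d = [Fr(38, 23), Fr(-13, 23), 1]    # direction vector (coefficients of t)

print(mat_vec(A, vp) == b)                               # A vp = b
print(mat_vec(A, d) == [0, 0, 0, 0])                     # A d = 0
print(mat_vec(A, [-c for c in vp]) == [-c for c in b])   # A(-vp) = -b
pt = [p + 5 * q for p, q in zip(vp, d)]                  # the point with t = 5
print(mat_vec(A, pt) == b)
```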

The properties of affine transformations on points and vectors are summarized in the following theorem.

Theorem 6: Let P and Q be points and u and v be vectors in an affine space 𝔸 = (A, V, +). Let F : 𝔸 ↣ 𝔹 be an affine transformation from 𝔸 to another affine space 𝔹 = (B, U, +). Then for any scalar α:
  1. DF(v) = DF(P − Q) = F(P) − F(Q)    for    v = P − Q,
  2. F(P + αu) = F(P) + αDF(u)    for any    u ∈ V,
  3. DF(u + v) = DF(u) + DF(v),
  4. DF(αv) = αDF(v).
  1. The first property is the definition of the linear part of an affine map.
  2. The second property is a reformulation of Theorem 5.
  3. To show part (c), choose points P, Q, and R in 𝔸 such that u = P − Q and v = Q − R, and apply the head-to-tail axiom several times: \begin{align*} DF(\mathbf{u} + \mathbf{v}) &= DF \left( (P - Q) + (Q - R) \right) \\ &= DF(P - R) = F(P) - F(R) \\ &= F(P) - F(Q) + F(Q) - F(R) \\ &= DF(P - Q) + DF(Q-R) \\ &= DF(\mathbf{u}) + DF(\mathbf{v}) . \end{align*}
  4. This is just a well-known property of a linear transformation.

    When F is applied to vectors, it is actually the linear part of the affine map. So F(αv) = DF(αv) = αDF(v) = αF(v) because DF is a linear transformation.

   
Example 17: We consider a typical affine map x ↦ y = A x + b, where A ∈ ℝm×n, x ∈ ℝn×1, and b ∈ ℝm×1. For simplicity, we choose m = n = 2 and set

P = (1, 2) ,
Q = (3, 1) ,
R = (4, 4) ,

Then the vectors determined by these points are

u = P − Q = (−2, 1),
v = Q − R = (−1, −3).

Clear[P,Q,R,F,DF,u,v,lhs,rhs]; P={1,2}; Q={3,1}; R={4,4}; u=P-Q; v=Q-R;

Using Mathematica's built-in command AffineTransform, we define an affine transformation with matrix \( \displaystyle \quad {\bf A} = \begin{bmatrix} 1.5 & -1 \\ 2 & 2.5 \end{bmatrix} \quad \) and translation vector b = (2, −1).

affineFunc = AffineTransform[{{{1.5,-1},{2,2.5}}, {2, -1}}];
Using this affine transformation with linear part DF(·) = A(·), we check properties of Theorem 6.
  1. Part (a):

    Apply the affine transformation to the points P and Q:
    transformedP=affineFunc[P]; transformedQ=affineFunc[Q];
    Apply the linear part A (without the translation vector) to the vector P − Q:
    A = {{1.5, -1}, {2, 2.5}}; v = P-Q; (* v is reassigned here to P - Q, which equals u; parts (b)-(d) below use this v *) A.v
    transformedP - transformedQ
    {-4., -1.5}
    {-4., -1.5}
  2. Part (b):

    We set α = 7/3 and calculate
    affineFunc[P + (7/3)*v]
    {-7.83333, 2.5}
    A.(P + 7/3*v) + {2, -1}
    {-7.83333, 2.5}
    Now we calculate the right-hand side:
    affineFunc[P] + (7/3)*A.v
    {-7.83333, 2.5}
  3. Part (c):

    Define the linear part DF, which acts on vectors without the translation:
    DF[vector_]:= A.vector;
    Verify the theorem statement
    lhs=DF[u+v]; rhs=DF[u]+DF[v]; TrueQ[lhs==rhs]
    True
    Graphical illustration of part (c).

    Points and their transformations

    transformedP = affineFunc[P]; transformedQ = affineFunc[Q]; transformedR = affineFunc[R]; (* affineFunc already includes the translation, so no extra vector is added *)
    Graphics for original points and vectors
    originalGraphics=Graphics[{ Red,PointSize[Large],Point[P],Point[Q],Point[R], Blue,Arrow[{P,Q}],Arrow[{Q,R}],Arrow[{P,R}], Text[P,P,{1,-2}],Text[Q,Q,{-2,-1}],Text[R,R,{1,-1}] }];
    transformedGraphics=Graphics[{ Green,PointSize[Large],Point[transformedP],Point[transformedQ],Point[transformedR], Blue,Arrow[{transformedP,transformedQ}],Arrow[{transformedQ,transformedR}],Arrow[{transformedP,transformedR}], Text["F(P)",transformedP,{1,-1}],Text["F(Q)",transformedQ,{1,.9}],Text["F(R)",transformedR,{1,-1}] }];
    Show both graphics
    Show[originalGraphics,transformedGraphics,PlotRange->All,Axes->True,AxesOrigin->{0,0},GridLines->Automatic]
    Illustration of property (c).
  4. Part (d):

    We check the statement with α = 7/3:
    A.(7/3*v)
    {-9.33333, -3.5}
    and the right-hand side is
    (7/3)*A.v
    {-9.33333, -3.5}
   ■
End of Example 17
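The four properties of Theorem 6 can be checked with exact rationals instead of machine floats. Below is a Python sketch; it is an independent illustration, not the notebook's code, and the helper names are hypothetical. It uses A = [[3/2, −1], [2, 5/2]] and b = (2, −1), as in Example 17. Here v = Q − R, as defined at the start of the example; the Mathematica cells later reassign v to P − Q.

```python
from fractions import Fraction as Fr

A = [[Fr(3, 2), -1], [2, Fr(5, 2)]]
bvec = [2, -1]


def DF(v):
    """Linear part of the affine map: v -> A v."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def F(p):
    """The affine map F(p) = A p + b."""
    return [c + bi for c, bi in zip(DF(p), bvec)]


def add(p, q):
    return [x + y for x, y in zip(p, q)]


def sub(p, q):
    return [x - y for x, y in zip(p, q)]


def scale(s, v):
    return [s * x for x in v]


P, Q, R = [1, 2], [3, 1], [4, 4]
u, v = sub(P, Q), sub(Q, R)      # u = P - Q, v = Q - R
alpha = Fr(7, 3)

print(DF(u) == sub(F(P), F(Q)))                                      # (a)
print(F(add(P, scale(alpha, u))) == add(F(P), scale(alpha, DF(u))))  # (b)
print(DF(add(u, v)) == add(DF(u), DF(v)))                            # (c)
print(DF(scale(alpha, v)) == scale(alpha, DF(v)))                    # (d)
print(DF(u))   # [Fraction(-4, 1), Fraction(-3, 2)], i.e. {-4., -1.5}
```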

Recall that a linear transformation T : ℝ² ⇾ ℝ² is uniquely determined by the images of two linearly independent vectors, for instance, the endpoints of a line segment that does not lie on a line through the origin. This is no longer the case for an affine map f : 𝔸² ↦ 𝔸². It turns out that an affine transformation f : 𝔸² ↦ 𝔸² is uniquely determined by taking a triangle (or its three non-collinear vertices) in the domain and mapping it to another triangle (or three points) in the codomain. To see how this works, let the triangle in the domain be given by its three vertices

\[ T_1 = \left\{ \left( x_1 , y_1 \right) , \ \left( x_2 , y_2 \right) , \ \left( x_3 , y_3 \right) \right\} . \]
Similarly, suppose these points are mapped into
\[ T_2 = \left\{ \left( z_1 , w_1 \right) , \ \left( z_2 , w_2 \right) , \ \left( z_3 , w_3 \right) \right\} . \]
Then
\[ f \left( T_1 \right) = T_2 \qquad \iff \qquad f\left( x_j , y_j \right) = \left( z_j , w_j \right) , \quad j=1,2,3. \]
Given that f is determined by formula f(x) = A x + b, where A and b are expressed respectively by
\[ \mathbf{A} = \begin{bmatrix} a&b \\ c & d \end{bmatrix} , \qquad \mathbf{b} = \begin{bmatrix} \alpha \\ \beta \end{bmatrix} , \]
we get the following system of equations:
\[ \begin{bmatrix} a&b \\ c & d \end{bmatrix} \cdot \begin{bmatrix} x_j \\ y_j \end{bmatrix} + \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} z_j \\ w_j \end{bmatrix} , \qquad j=1,2,3. \]
Writing out these six scalar equations gives the single matrix/vector equation
\begin{equation} \label{EqAffine.5} \begin{bmatrix} x_1 & y_1 & 0&0& 1 & 0 \\ 0&0& x_1 & y_1 & 0&1 \\ x_2 & y_2 & 0&0&1&0 \\ 0&0& x_2 & y_2 &0&1 \\ x_3&y_3 & 0&0&1&0 \\ 0&0& x_3 & y_3 & 0&1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} z_1 \\ w_1 \\ z_2 \\ w_2 \\ z_3 \\ w_3 \end{bmatrix} . \end{equation}
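This matrix/vector equation can be solved for the six unknowns {𝑎, b, c, d, α, β} whenever the three points (xⱼ, yⱼ) are non-collinear: after reordering rows and unknowns, the 6 × 6 coefficient matrix becomes block diagonal with two copies of the 3 × 3 matrix whose rows are (xⱼ, yⱼ, 1), and the determinant of this 3 × 3 matrix is nonzero exactly when the points are non-collinear. The solution determines the affine map uniquely. Therefore, an affine transformation of a two-dimensional affine space has six degrees of freedom.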
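Because the unknowns {a, b, α} and {c, d, β} never appear in the same scalar equation, system (5) splits into two 3 × 3 systems that share the coefficient matrix with rows (xⱼ, yⱼ, 1). The following Wolfram Language sketch exploits this. The function name affineFromTriangles and the result names mat, vec are our own (hypothetical), not built-ins, and the sketch assumes the three source points are non-collinear (otherwise LinearSolve reports a singular matrix).

```mathematica
(* Hypothetical helper: returns {A, b} with A . src[[j]] + b == dst[[j]], j = 1, 2, 3 *)
(* Assumes the three points in src are non-collinear.                               *)
affineFromTriangles[src_, dst_] := Module[{h, row1, row2},
  h = Append[#, 1] & /@ src;              (* rows {x_j, y_j, 1} *)
  row1 = LinearSolve[h, dst[[All, 1]]];   (* {a, b, alpha} from the z_j *)
  row2 = LinearSolve[h, dst[[All, 2]]];   (* {c, d, beta} from the w_j *)
  {{row1[[1 ;; 2]], row2[[1 ;; 2]]}, {row1[[3]], row2[[3]]}}]

(* The data of Example 18 *)
{mat, vec} = affineFromTriangles[{{-5, -3}, {2, 10}, {3, -5}}, {{-4, 1}, {-3, 11}, {1, 9}}]
mat . # + vec & /@ {{-5, -3}, {2, 10}, {3, -5}}   (* reproduces the vertices of T2 *)
```

Solving two small systems instead of the full 6 × 6 one is only a convenience; both give the same exact rational answer.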
Theorem 7: Given two ordered sets of three non-collinear points each, there exists a unique affine transformation f : 𝔸² ↣ 𝔸² mapping one set onto the other.
We first show that the special (ordered) triple of vectors \[ \left\{ {\bf 0} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} , \quad {\bf i} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} , \quad {\bf j} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\} \] can be mapped by an appropriate affine transformation to an arbitrary (ordered) triple of vectors \[ \left\{ \mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix} , \quad \mathbf{q} = \begin{bmatrix} q_1 \\ q_2 \end{bmatrix} , \quad \mathbf{r} = \begin{bmatrix} r_1 \\ r_2 \end{bmatrix} \right\} \] that corresponds to three non-collinear points. Let \[ \mathbf{A} = \begin{bmatrix} q_1 - p_1 & r_1 - p_1 \\ q_2 - p_2 & r_2 - p_2 \end{bmatrix} \quad\mbox{and} \quad \mathbf{b} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix} . \] One can immediately verify that \[ \mathbf{A}\,{\bf 0} + {\bf b} = \mathbf{p} , \quad \mathbf{A}\,{\bf i} + {\bf b} = \mathbf{q} , \quad \mathbf{A}\,{\bf j} + {\bf b} = \mathbf{r} . \] Note that the columns of A are the vectors q − p and r − p. Since the points (p₁, p₂), (q₁, q₂), and (r₁, r₂) are non-collinear, the vectors q − p and r − p are not parallel. Hence, the determinant of A is nonzero. Thus, A is invertible, and f(x) = A x + b is an affine transformation by definition.

Let (p, q, r) and (p₁, q₁, r₁) be two ordered triples of position vectors representing two arbitrary triples of non-collinear points. By the result we have just proven, there exist affine transformations f and g mapping the special triple {0, i, j} to {p, q, r} and to {p₁, q₁, r₁}, respectively. Then g ∘ f⁻¹ is an affine transformation that maps {p, q, r} onto {p₁, q₁, r₁}. We leave the proof of uniqueness to the reader.
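The construction in this proof fits in a few lines of Wolfram Language. In this sketch the helper names fromStandard and mapTriple are hypothetical; each affine map is stored as a pair {A, b}, and mapTriple returns g ∘ f⁻¹ using f⁻¹(y) = A⁻¹(y − b). By uniqueness, the result should agree with the solution of system (5) for the same data.

```mathematica
(* Affine map sending 0, i, j to p, q, r: the columns of A are q - p and r - p *)
fromStandard[{p_, q_, r_}] := {Transpose[{q - p, r - p}], p}

(* g o f^-1 maps src = {p, q, r} onto dst = {p1, q1, r1}:                 *)
(* y -> m1 . Inverse[m0] . (y - t0) + t1 = m . y + (t1 - m . t0)           *)
mapTriple[src_, dst_] := Module[{m0, t0, m1, t1, m},
  {m0, t0} = fromStandard[src];
  {m1, t1} = fromStandard[dst];
  m = m1 . Inverse[m0];
  {m, t1 - m . t0}]

mapTriple[{{-5, -3}, {2, 10}, {3, -5}}, {{-4, 1}, {-3, 11}, {1, 9}}]
```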

   
Example 18: Let us consider two sets of points on the plane ℝ²: \[ \begin{split} T_1 &= \left\{ \left( -5, -3 \right) , \ \left( 2, 10 \right) , \ \left( 3, -5 \right) \right\} , \\ T_2 &= \left\{ \left( -4, 1 \right) , \ \left( -3, 11 \right) , \ \left( 1, 9 \right) \right\} . \end{split} \] We want to find an affine transformation f(x) = A x + w that maps the points of T₁ to the corresponding points of T₂. Writing the affine transformation explicitly, we get \[ {\bf A} = \begin{bmatrix} a&b \\ c&d \end{bmatrix} \in \mathbb{R}^{2\times 2} , \qquad {\bf w} = \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \in \mathbb{R}^{2\times 1} . \] In order to have f(T₁) = T₂, we need \begin{align*} f \left( \begin{bmatrix} -5 \\ -3 \end{bmatrix} \right) &= \begin{bmatrix} -4 \\ \phantom{-}1 \end{bmatrix} \quad \Longrightarrow \quad \begin{split} -5 a - 3b &= -4 - \alpha , \\ -5c - 3d &= 1 - \beta ; \end{split} \\ f \left( \begin{bmatrix} 2 \\ 10 \end{bmatrix} \right) &= \begin{bmatrix} -3 \\ 11 \end{bmatrix} \quad \Longrightarrow \quad \begin{split} 2 a + 10b &= -3 -\alpha , \\ 2c + 10 d &= 11 -\beta ; \end{split} \\ f \left( \begin{bmatrix} 3 \\ -5 \end{bmatrix} \right) &= \begin{bmatrix} 1 \\ 9 \end{bmatrix} \quad \Longrightarrow \quad \begin{split} 3 a - 5b &= 1-\alpha , \\ 3c - 5d &= 9 -\beta . 
\end{split} \end{align*} Introducing the 6-dimensional vector of unknowns X and the right-hand side vector Y, \[ {\bf X} = \left[ a\ b \ c\ d\ \alpha \ \beta \right]^{\mathrm T} , \qquad {\bf Y} = \left[-4\ 1 \ -3\ 11\ 1 \ 9 \right]^{\mathrm T} , \] we can write our problem as the matrix/vector equation \[ {\bf M}\, {\bf X} = {\bf Y} , \] where \[ {\bf M} = \begin{bmatrix} -5 & -3 & \phantom{-}0&\phantom{-}0&1&0 \\ \phantom{-}0 &\phantom{-}0&-5&-3&0&1 \\ \phantom{-}2 & 10 & \phantom{-}0 & \phantom{-}0 & 1&0 \\ \phantom{-}0 & \phantom{-}0 & \phantom{-}2&10 & 0&1 \\ \phantom{-}3 & -5 & \phantom{-}0 & \phantom{-}0 & 1&0 \\ \phantom{-}0 & \phantom{-}0& \phantom{-}3 & -5& 0&1 \end{bmatrix}, \quad {\bf Y} = \begin{bmatrix} -4 \\ 1 \\ -3 \\ 11 \\ 1 \\ 9 \end{bmatrix} . \] So we use Mathematica to build the matrix M and the column vector Y (called R in the code) from Eq.(5):
Clear[X1,X2,Y1,Y2,A,B,AB,CC,R,b,d]; X1={-5,2,3}; Y1={-3,10,-5}; X2={-4,-3,1}; Y2={1,11,9};
R={{X2[[1]]},{Y2[[1]]},{X2[[2]]},{Y2[[2]]},{X2[[3]]},{Y2[[3]]}}; M={{X1[[1]],Y1[[1]],0,0,1,0},{0,0,X1[[1]],Y1[[1]],0,1},{X1[[2]],Y1[[2]],0,0,1,0},{0,0,X1[[2]],Y1[[2]],0,1},{X1[[3]],Y1[[3]],0,0,1,0},{0,0,X1[[3]],Y1[[3]],0,1}}
b={-4,1,-3,11,1,9}
\( \displaystyle \quad \begin{pmatrix} -5 &-3& 0&0&1&0 \\ 0&0&-5&-3&0&1 \\ 2&10&0&0&1&0 \\ 0&0&2&10&0&1 \\ 3&-5&0&0&1&0 \\ 0&0&3&-5&0&1 \end{pmatrix} \)
{-4, 1, -3, 11, 1, 9}
AB = Inverse[M].R
\( \displaystyle \quad \begin{pmatrix} 67/118 \\ -(27/118) \\ 62/59 \\ 12/59 \\ -(109/59) \\ 405/59 \end{pmatrix} \)
A = {{AB[[1, 1]], AB[[2, 1]]}, { AB[[3, 1]], AB[[4, 1]]}};
% // MatrixForm
\( \displaystyle \quad \begin{pmatrix} \frac{67}{118} & -\frac{27}{118} \\ \frac{62}{59} & \frac{12}{59} \end{pmatrix} \)
B = {{AB[[5, 1]]}, {AB[[6, 1]]}};
% // MatrixForm
\( \displaystyle \quad \begin{pmatrix} - \frac{109}{59} \\ \frac{405}{59} \end{pmatrix} \)
A . {X1[[1]], Y1[[1]]} + B;
% // MatrixForm
\( \displaystyle \quad \begin{pmatrix} -4 \\ 1 \end{pmatrix} \)
A . {X1[[2]], Y1[[2]]} + B;
% // MatrixForm
\( \displaystyle \quad \begin{pmatrix} -3 \\ 11 \end{pmatrix} \)
A . {X1[[3]], Y1[[3]]} + B;
% // MatrixForm
\( \displaystyle \quad \begin{pmatrix} 1 \\ 9 \end{pmatrix} \)
The last three pairs of Mathematica commands verify that the vertices (xₖ, yₖ) of triangle T₁ are sent to the corresponding vertices (zₖ, wₖ) of T₂.

   ■
End of Example 18

The following theorem shows how to construct an affine map from any given linear transformation between the underlying vector spaces, Df : V → U, once the image of a single point is prescribed.

Theorem 8: Let 𝔸 = (A, V, +) and 𝔹 = (B, U, +) be two affine spaces. For any pair of points 𝑎 ∈ A, b ∈ B and any linear mapping g : V → U, there exists a unique affine mapping f : A → B such that f(𝑎) = b and Df = g.
We set \[ f(a + {\bf x}) = b + g({\bf x}) \qquad \mbox{for all} \quad {\bf x} \in V. \] Since any point in A can be uniquely represented in the form 𝑎 + x, this formula defines a set-theoretic mapping f : A → B. It is affine because \begin{align*} f(a + {\bf x}) - f(a + {\bf y}) &= g({\bf x}) - g({\bf y}) = g({\bf x} - {\bf y}) \\ &= g\left[ (a + {\bf x}) - (a + {\bf y}) \right] , \end{align*} so its linear part is Df = g. Moreover, f(𝑎) = f(𝑎 + 0) = b + g(0) = b. This proves the existence of f.

Conversely, if f is any mapping with the required properties, then \[ f(a + {\bf x}) - f(a) = g({\bf x}), \] whence f(𝑎 + x) = b + g(x) for all x ∈ V, which proves uniqueness.

   
Example 19: Let g : ℝ³ ⇾ ℝ³ be a linear transformation that is defined by a singular matrix \[ \mathbf{A} = \begin{bmatrix} \phantom{-}1& \phantom{-}2& 3 \\ \phantom{-}2& -3& 1 \\ -1& \phantom{-}5& 2 \end{bmatrix} \qquad \Longrightarrow \qquad g({\bf x}) = {\bf A}\,{\bf x} . \] Let us consider an affine transformation \[ f(\mathbf{x}) = {\bf A}\,{\bf x} + {\bf w} , \tag{19.1} \] with some vector w to be determined. If we want transformation (19.1) to satisfy the condition f(𝑎) = b for a pair of points 𝑎 ∈ ℝ3×1, b ∈ ℝ3×1, we should get the relation \[ f(a) = {\bf A}\,a + {\bf w} = b . \] Hence, vector w must be equal to \[ \mathbf{w} = b - {\bf A}\,a , \tag{19.2} \] the difference of two points. Here we used canonical affine spaces 𝔸 = (ℝ³, +) = 𝔹. Points 𝑎 and b in Eq.(19.2) are identified with some vectors by choosing (fixing) some frames in ℝ³, which is isomorphic to ℝ3×1.

Condition (19.2) for vector w shows that singularity of matrix A plays no role in defining affine mappings.    ■

End of Example 19
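A short Wolfram Language check of Example 19. The points pa and pb below are hypothetical, chosen only for illustration; the sketch confirms that A is singular and that the vector w from (19.2) still gives f(𝑎) = b.

```mathematica
a19 = {{1, 2, 3}, {2, -3, 1}, {-1, 5, 2}};
Det[a19]                          (* 0: the linear part g is singular *)
pa = {1, 0, 2}; pb = {3, -1, 4};  (* hypothetical points a and b *)
w19 = pb - a19 . pa               (* Eq. (19.2): {-4, -5, 1} *)
f19[x_] := a19 . x + w19;
f19[pa] == pb                     (* True *)
```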

An important particular case of Theorem 8 is obtained by applying it to the pair (A, V, +), (V, V, +), the points 𝑎 ∈ A and 0 ∈ V, and the identity map g = id_V of the vector space V. We find that for any point 𝑎 ∈ A, there exists a unique affine isomorphism f : A → V that sends this point to the origin of coordinates and whose linear part is the identity. This is the precise meaning of the statement that an affine space is a "linear space whose origin of coordinates is forgotten". In particular, affine spaces are isomorphic if and only if the associated linear spaces are isomorphic. The latter are classified by their dimension, so we define the dimension of an affine space to be the dimension of the corresponding linear space.

Corollary 2: Let f, g : (A, V, +) ↣ (B, U, +) be two affine mappings. Then their linear parts are equal if and only if g is the composition of f with a translation by some unique vector from U.
The sufficiency of the condition was checked in Example 11(c). To prove necessity, we select any point 𝑎 ∈ A and let h be the composition of f followed by a translation: \[ h = t_{g(a) - f(a)} \circ f , \] where t_x(P) = P + x. Straightforward evaluation shows that h(𝑎) = g(𝑎) and D(h) = D(g). By the uniqueness part of Theorem 8, h = g.

Conversely, if g = t_x ∘ f, then x = g(𝑎) − f(𝑎); this vector is independent of 𝑎 ∈ A because f and g have the same linear parts.
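A quick Wolfram Language illustration of Corollary 2: for two affine maps with the same linear part, the difference g(P) − f(P) does not depend on P, so g is f followed by the translation by that constant vector. The matrix lin and the maps fC, gC are hypothetical choices made only for this sketch.

```mathematica
Clear[s1, s2];
lin = {{2, 1}, {-1, 3}};          (* common linear part, chosen arbitrarily *)
fC[x_] := lin . x + {1, 4};
gC[x_] := lin . x + {-2, 5};
Simplify[gC[{s1, s2}] - fC[{s1, s2}]]               (* {-3, 1} for a symbolic point *)
Table[gC[p] - fC[p], {p, {{0, 0}, {1, 2}, {-3, 5}}}] (* the same vector at every point *)
```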

Homogeneous Coordinates

An n-dimensional affine space (A, V, d) is specified by the vector space V and the inhabited set of points A. The n-dimensional vector space V is completely described by providing an ordered basis for it. From the definition of an affine space, we know that for every pair of points in A there exists a vector in V that “connects” them. Once a particular point O is selected from A, every other point in A can be obtained by adding a vector from V to O. Therefore, supplying an ordered basis for V and a single point in A is sufficient to describe every point of A.

A frame for the n-dimensional affine space 𝔸 = (A, V, d) consists of the set of basis vectors e₁, e₂, … , en for V and a point O from A. The point O locates the origin of the frame within A. We use the notation ϕ = (e₁, e₂, … , en, O) to denote a frame. Every vector u in V can be expressed as \[ \mathbf{u} = c_1 \mathbf{e}_1 + c_2 \mathbf{e}_2 + \cdots + c_n \mathbf{e}_n , \] and every point P in A can be written as \[ P = k_1 \mathbf{e}_1 + k_2 \mathbf{e}_2 + \cdots + k_n \mathbf{e}_n + O. \]
Specifying a frame for an affine space is equivalent to providing a coordinate system for it; once a frame has been fixed, any point or vector in the affine space can be described by a set of scalar values. To express this in matrix notation, however, we need one more convention, which is often stated as a third axiom in the definition of an affine space:
\[ 0 \cdot P = O \qquad \mbox{and} \qquad 1 \cdot P = P \in \mathbb{A} . \]

We illustrate this in the planar case. Choose a frame ϕ = (e₁, e₂, O) for an affine space 𝔸 = (A, ℝ², d). Any vector u in ℝ² can be written in either column form or row form:

\[ \mathbf{u} = \begin{bmatrix} \mathbf{e}_1 & \mathbf{e}_2 & O \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ 0 \end{bmatrix} = \left( \alpha_1 \,:\,\alpha_2 \, : \,0 \right) \begin{pmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \\ O \end{pmatrix} . \]
Hence, the column vector \( \displaystyle \quad \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ 0 \end{bmatrix} \quad \) and the corresponding row vector \( \displaystyle \quad \left( \alpha_1 \,:\, \alpha_2 \, : \, 0 \right) \quad \) are the coordinate vector of u written in column form and in row form, respectively. In projective geometry, coordinate vectors are traditionally written in row form with components separated by colons. Similarly, a point P in the inhabited set A can be expressed as
\[ P = \begin{bmatrix} \mathbf{e}_1 & \mathbf{e}_2 & O \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ 1 \end{bmatrix} = \left( \beta_1 \, : \, \beta_2 \,:\, 1 \right) \begin{pmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \\ O \end{pmatrix} . \]
Similar expressions are valid for three-dimensional affine spaces and, more generally, for the n-dimensional case. Since there is no standard notation for affine coordinates, some authors prefer column notation while others use row form. Therefore, we give both notations and let the reader decide which one is preferable.
Example 20: Given the frame \[ \phi = \left( \begin{bmatrix} \phantom{-}2 \\ -3 \end{bmatrix} , \ \begin{bmatrix} 1 \\ 4 \end{bmatrix} , \ \begin{pmatrix} 6 & 2 \end{pmatrix} \right) , \] determine the point Q that has the coordinates (-3, 2, 1).

Solution: We use the coordinates to form a linear combination of the vectors in the frame, which we then add to the frame’s origin. Because we are adding a vector to a point, the result is indeed a point. \[ Q = -3 \begin{bmatrix} \phantom{-}2 \\ -3 \end{bmatrix} + 2 \begin{bmatrix} 1 \\ 4 \end{bmatrix} + \begin{pmatrix} 6 & 2 \end{pmatrix} = \begin{bmatrix} 2 \\ 19 \end{bmatrix} . \]

-3*{2, -3} + 2 *{1, 4} + {6, 2}
{2, 19}
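Conversely, the frame coordinates of a point can be recovered by solving a 2 × 2 linear system for the vector Q − O. A minimal Wolfram Language sketch with the data of Example 20; lowercase o is used for the origin because the symbol O is reserved in Mathematica.

```mathematica
e1 = {2, -3}; e2 = {1, 4}; o = {6, 2};
q = -3 e1 + 2 e2 + o                       (* {2, 19} *)
LinearSolve[Transpose[{e1, e2}], q - o]    (* {-3, 2}: the coordinates of Q in the frame *)
```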
   ■
End of Example 20

Often it is desirable to find the coordinates of a point relative to one frame given its coordinates relative to another frame. This operation, called a change of frames, is analogous to the change of basis operation in vector spaces. Let β = (v₁, v₂, v₃, O) and ϕ = (e₁, e₂, e₃, Q) be two frames for a 3-dimensional affine space 𝔸. To find the coordinate vector ⟦P⟧ϕ of an arbitrary point P from its coordinates ⟦P⟧β = [α₁, α₂, α₃, 1], we first write the basis vectors and the origin of β in terms of the basis vectors and the origin of ϕ:

\begin{align*} \mathbf{v}_1 &= a_1 \mathbf{e}_1 + b_1 \mathbf{e}_2 + c_1 \mathbf{e}_3 , \\ \mathbf{v}_2 &= a_2 \mathbf{e}_1 + b_2 \mathbf{e}_2 + c_2 \mathbf{e}_3 , \\ \mathbf{v}_3 &= a_3 \mathbf{e}_1 + b_3 \mathbf{e}_2 + c_3 \mathbf{e}_3 , \\ O &= a_4 \mathbf{e}_1 + b_4 \mathbf{e}_2 + c_4 \mathbf{e}_3 + Q . \end{align*}
Then
\begin{align*} [\![ P ]\!]_{\phi} &= [\![ \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \alpha_3 \mathbf{v}_3 ]\!]_{\phi} + [\![ O ]\!]_{\phi} \\ &= \alpha_1 [\![ \mathbf{v}_1 ]\!]_{\phi} + \alpha_2 [\![ \mathbf{v}_2 ]\!]_{\phi} + \alpha_3 [\![ \mathbf{v}_3 ]\!]_{\phi} + [\![ O ]\!]_{\phi} \\ &= \begin{bmatrix} [\![ \mathbf{v}_1 ]\!]_{\phi} & [\![ \mathbf{v}_2 ]\!]_{\phi} & [\![ \mathbf{v}_3 ]\!]_{\phi} & [\![ O ]\!]_{\phi} \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ 1 \end{bmatrix} \\ &= \begin{bmatrix} a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \\ c_1 & c_2 & c_3 & c_4 \\ 0&0&0&1 \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ 1 \end{bmatrix} . \end{align*}
The 4 × 4 matrix in the last line, whose columns are the ϕ-coordinates of v₁, v₂, v₃, and O, is the change of frame matrix from β to ϕ.
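For planar frames the same construction produces a 3 × 3 matrix. This Wolfram Language sketch assumes each frame is given as {first basis vector, second basis vector, origin}; the helper names changeOfFrame and toPoint are hypothetical. The columns of the result are the ϕ-coordinates of the basis vectors of β (last entry 0) and of the origin of β (last entry 1). It is applied here to the frames of the next example, and the final line checks that both coordinate vectors describe the same Cartesian point.

```mathematica
(* Hypothetical helper: change of frame matrix from beta to phi in the plane *)
changeOfFrame[{v1_, v2_, ob_}, {e1_, e2_, oq_}] :=
 Module[{inv = Inverse[Transpose[{e1, e2}]]},   (* phi-coordinates of a vector *)
  Transpose[{Append[inv . v1, 0], Append[inv . v2, 0], Append[inv . (ob - oq), 1]}]]

(* Hypothetical helper: Cartesian point with given frame coordinates *)
toPoint[{c1_, c2_, _}, {u1_, u2_, org_}] := c1 u1 + c2 u2 + org

mFrame = changeOfFrame[{{3, 2}, {1, 4}, {5, 2}}, {{7, 3}, {-3, -2}, {3, -2}}]
mFrame . {-3, 1, 1}
toPoint[{-3, 1, 1}, {{3, 2}, {1, 4}, {5, 2}}] ==
  toPoint[mFrame . {-3, 1, 1}, {{7, 3}, {-3, -2}, {3, -2}}]   (* True *)
```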
Example 21: Let β and ϕ be two frames for the same affine space such that \[ \beta = \left( \begin{bmatrix} 3 \\ 2 \end{bmatrix} , \ \begin{bmatrix} 1 \\ 4 \end{bmatrix} , \ \left( 5, \ 2 \right) \right) , \qquad \phi = \left( \begin{bmatrix} 7 \\ 3 \end{bmatrix} ,\ \begin{bmatrix} -3 \\ -2 \end{bmatrix} ,\ \left( 3,\ -2 \right) \right) . \] If ⟦Qβ = (-3, 1, 1), then find ⟦Qϕ.

Solution: The basis vectors and the origin of β can be written in terms of the frame ϕ as \begin{align*} \begin{bmatrix} 3 \\ 2 \end{bmatrix} &= 0 \cdot \begin{bmatrix} 7 \\ 3 \end{bmatrix} - \begin{bmatrix} -3 \\ -2 \end{bmatrix} + 0 \cdot \left( 3,\ -2 \right) , \\ \begin{bmatrix} 1 \\ 4 \end{bmatrix} &= -2 \cdot \begin{bmatrix} 7 \\ 3 \end{bmatrix} -5\cdot \begin{bmatrix} -3 \\ -2 \end{bmatrix} + 0 \cdot \left( 3,\ -2 \right) , \\ \left( 5, \ 2 \right) &= - \frac{8}{5}\cdot \begin{bmatrix} 7 \\ 3 \end{bmatrix} -\frac{22}{5} \cdot \begin{bmatrix} -3 \\ -2 \end{bmatrix} + 1 \cdot \left( 3,\ -2 \right) . \end{align*} The coefficients come from solving 2 × 2 linear systems whose coefficient matrix has columns (7, 3) and (−3, −2); for the origin, we decompose the vector (5, 2) − (3, −2) = (2, 4):

Inverse[{{7, -3}, {3, -2}}] . {3, 2}
{0, -1}
Inverse[{{7, -3}, {3, -2}}] . {1, 4}
{-2, -5}
Inverse[{{7, -3}, {3, -2}}] . ({5, 2} - {3, -2})
{-(8/5), -(22/5)}
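so the change of frame matrix M, whose columns are the ϕ-coordinates of the two basis vectors and of the origin of β, is \[ \mathbf{M} = \begin{bmatrix} 0&-2& -\frac{8}{5} \\ -1&-5& -\frac{22}{5} \\ 0&0&1 \end{bmatrix} . \] Knowing M, we can compute ⟦Q⟧ϕ: \[ [\![ Q ]\!]_{\phi} = \begin{bmatrix} 0&-2& -\frac{8}{5} \\ -1&-5& -\frac{22}{5} \\ 0&0&1 \end{bmatrix} \begin{bmatrix} -3 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -\frac{18}{5} \\ - \frac{32}{5} \\ 1 \end{bmatrix} . \]
{{0,-2,-8/5}, {-1,-5,-22/5}, {0,0,1}} . {-3, 1, 1}
{-(18/5), -(32/5), 1}
As a check, both coordinate vectors describe the same Cartesian point: −3 (3, 2) + (1, 4) + (5, 2) = (−3, 0) and −18/5 (7, 3) − 32/5 (−3, −2) + (3, −2) = (−3, 0).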
   ■
End of Example 21
August Möbius

Compared to Euclidean geometry, projective geometry has a different setting and contains extra points (points at infinity) in each dimension. This allows a translation to be described as a linear transformation, so that all the transformations we would like to perform can be represented by matrix multiplication. Recall that a translation is not a linear transformation of a vector space, because it does not fix the zero vector. The way out of this dilemma is to turn the n-dimensional problem into an (n+1)-dimensional one by using homogeneous coordinates, introduced by the German mathematician August Ferdinand Möbius (1790--1868) in his 1827 work Der barycentrische Calcul.

The real projective plane ℙ² can be defined in terms of equivalence classes. For nonzero elements of ℝ³, define (x₁, y₁, z₁) ~ (x₂, y₂, z₂) to mean that there is a nonzero λ such that (x₁, y₁, z₁) = (λx₂, λy₂, λz₂). Then ~ is an equivalence relation, and the projective plane can be defined as the set of equivalence classes of ℝ³ ∖ {0}. If (x, y, z) is one of the elements of the equivalence class p, then these numbers are called homogeneous coordinates of p. The homogeneous coordinates, or projective coordinates, of a point are written with colons, either (x:y:z) or [x:y:z].

Homogeneous coordinates for an 𝑛-dimensional space consist of (𝑛+1)-tuples; for ordinary (finite) points, the extra coordinate is usually normalized to the value 1.

When z ≠ 0, the point [x:y:z] represents the point (x/z, y/z) of the Euclidean plane ℝ². Homogeneous coordinates of the form (x, y, 0) do not correspond to any point of the Cartesian plane. Instead, they correspond to the unique point at infinity in the direction (x, y). Hence, the projective plane ℙ² can be seen as the plane ℝ² together with all the points at infinity, one for each direction. The plane ℙ² also makes sense of the notion that two parallel lines intersect at infinity.
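In homogeneous coordinates, a line a x + b y + c z = 0 of ℙ² can be stored as its coefficient vector {a, b, c}; both the line through two points and the common point of two lines are then given by cross products. This minimal Wolfram Language sketch uses hypothetical helper names join, meet, and toCartesian, and shows that two parallel lines meet at a point whose last coordinate is 0, that is, at a point at infinity.

```mathematica
join[p_, q_] := Cross[p, q];   (* line through two points *)
meet[l_, k_] := Cross[l, k];   (* common point of two lines *)
toCartesian[{x_, y_, z_}] /; z != 0 := {x/z, y/z};

l1 = join[{0, 0, 1}, {1, 1, 1}]   (* {-1, 1, 0}: the line y = x *)
l2 = {1, 1, -2};                  (* the line x + y = 2 *)
l3 = {-1, 1, -1};                 (* the line y = x + 1, parallel to l1 *)
toCartesian[meet[l1, l2]]         (* {1, 1} *)
meet[l1, l3]                      (* {-1, -1, 0}: point at infinity in direction (1, 1) *)
```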

A projective transformation does not, in general, preserve parallelism, length, or angle, but it still preserves collinearity and incidence. A projective transformation of the plane is uniquely determined by mapping an arbitrary quadrangle (a system of four points, no three of which are collinear) onto another one.
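One standard way to construct the projective transformation determined by two quadrangles is to map the reference points e₁, e₂, e₃, (1, 1, 1) onto each quadrangle and then compose. This Wolfram Language sketch uses hypothetical helper names hom, dehom, basisMatrix, and homography, and assumes that no three of the four points in either quadrangle are collinear. The image of the unit square below is not a parallelogram, which illustrates that parallelism is lost.

```mathematica
hom[pts_] := Append[#, 1] & /@ pts;   (* homogeneous coordinates (x : y : 1) *)
dehom[v_] := Most[v]/Last[v];         (* back to Cartesian coordinates *)
(* 3x3 matrix sending e1, e2, e3, (1,1,1) to the four points, up to scale *)
basisMatrix[{p1_, p2_, p3_, p4_}] := With[{m = Transpose[{p1, p2, p3}]},
  m . DiagonalMatrix[LinearSolve[m, p4]]];
homography[src_, dst_] := basisMatrix[hom[dst]] . Inverse[basisMatrix[hom[src]]];

src = {{0, 0}, {1, 0}, {1, 1}, {0, 1}};   (* unit square *)
dst = {{0, 0}, {2, 0}, {3, 2}, {0, 1}};   (* a quadrangle that is not a parallelogram *)
h = homography[src, dst];
dehom[h . #] & /@ hom[src]                (* reproduces dst *)
```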

Example 22: The Plücker coordinates were introduced by the German mathematician and physicist Julius Plücker (1801--1868). Plücker 3D line coordinates are concise and efficient for numerous tasks: they provide faster and simpler code for line representations than describing lines by a direction vector and a position vector, by two plane equations, or by two points.

Suppose that a line L in 3-dimensional Euclidean space is determined by two distinct points x = (x₁, x₂, x₃) and y = (y₁, y₂, y₃). The displacement vector from x to y is nonzero because the points are distinct, and it represents the direction of the line. That is, every displacement between points on L is a scalar multiple of d = y − x. If a physical particle of unit mass were to move from x to y, it would have a moment about the origin. The geometric equivalent of this moment is a vector whose direction is perpendicular to the plane containing L and the origin, and whose length equals twice the area of the triangle formed by the displacement and the origin. Treating the points as displacements from the origin, the moment is the cross product of these vectors, m = x × y. Since m is perpendicular to both x and y, it is perpendicular to d, so d · m = 0.

For illustration of Plücker coordinates as displacement-moment (d, m) on the plane, we define first points x and y:
x = {1, 2, 0}; y = {4, 3, 0};
Calculate the displacement vector d and the moment vector m; here m is halved only to keep the arrow short in the picture:
d = y - x; m = Cross[x, y]/2;
For display, flip m so that it points in the positive z direction (the actual moment x × y = (0, 0, −5) points downward):
m = If[m[[3]] < 0, -m, m];
Base point from which the moment vector is drawn (chosen only for display):
base = {2.5, 4.5, 0};
Now we obtain a graphical illustration:
Graphics3D[{{Red, PointSize[Large], Point[x], Point[y]}, {Darker[Yellow], Arrowheads[0.07], Arrow[Tube[{x, y}, 0.035]]}, {Green, Thickness[0.01], Arrowheads[0.07], Arrow[{base, base + m}]}, {Orange, Dashed, Line[{x, base}], Line[{y, base}]}, {Opacity[.75], Polygon[{x, y, base}]}, Text[Style["x", 15, Bold], x + {0.15, -0.5, 0}], Text[Style["y", 15, Bold], y + {0.15, 0.5, 0}], Text[Style["d", 15, Bold], x + {1.5, 0.25, 0}], Text[Style["m", 15, Bold], base + m + {0, 0, 0.25}]}, Boxed -> False, Axes -> True, AxesLabel -> {"x", "y", "z"}, ViewPoint -> {1.3, -2.4, 2}]
Displacement and moment as Plücker coordinates.

A pair (d, m) determines the line uniquely, up to a common (nonzero) scalar multiple. That is, the coordinates \[ \left( {\bf d} : {\bf m} \right) = \left( d_1 : d_2 : d_3 : m_1 : m_2 : m_3 \right) \] may be considered homogeneous coordinates for L, in the sense that all pairs (λd : λm) with λ ≠ 0 can be produced by points on L and only L, and any such pair determines a unique line as long as d is not zero and d · m = 0. Furthermore, this approach extends to include points, lines, and a plane "at infinity", in the sense of projective geometry. In addition, a point x lies on the line L if and only if x × d = m.
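These definitions translate directly into code. A minimal Wolfram Language sketch with hypothetical helper names pluecker and onLineQ, using the two points of the figure; the last line shows that two other points of the same line give the same coordinates up to a nonzero factor.

```mathematica
pluecker[x_, y_] := {y - x, Cross[x, y]};   (* {d, m} *)
onLineQ[p_, {d_, m_}] := Cross[p, d] == m;  (* incidence test p x d = m *)

{d1, m1} = pluecker[{1, 2, 0}, {4, 3, 0}]   (* {{3, 1, 0}, {0, 0, -5}} *)
d1 . m1                                      (* 0 *)
onLineQ[{1, 2, 0} + 2 d1, {d1, m1}]          (* True: a point of L *)
onLineQ[{0, 0, 0}, {d1, m1}]                 (* False: the origin is not on L *)
pluecker[{4, 3, 0}, {10, 5, 0}] == 2 pluecker[{1, 2, 0}, {4, 3, 0}]   (* True *)
```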

Shoemake's tutorial contains many properties of the Plücker coordinates, of which we mention the following: \[ \mbox{squared distance from the origin to $L$} = \frac{m_1^2 + m_2^2 + m_3^2}{d_1^2 + d_2^2 + d_3^2} . \]

For example, take x = (1, −3, 2) and y = (5, 2, −3). Their cross product is

Cross[{1, -3, 2}, {5, 2, -3}]
{5, 13, 17}
Then d = y − x = (4, 5, −5), so \[ \left( {\bf d} : {\bf m} \right) = \left( 4 : 5 : -5 : 5 : 13 : 17 \right) . \] The squared distance of this line from the origin is \[ \frac{5^2 + 13^2 + 17^2}{4^2 + 5^2 + 5^2} = \frac{483}{66} = \frac{161}{22}\approx 7.31818 \approx \left( 2.70521 \right)^2 . \]
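The numbers of this example can be checked in Wolfram Language. The sketch also computes the point of L closest to the origin as p₀ = (d × m)/(d · d); this formula is our addition rather than one quoted from the tutorial, and it follows from d × (x × d) = (d · d) x − (d · x) d, which removes from x its component along d.

```mathematica
dvec = {5, 2, -3} - {1, -3, 2}         (* {4, 5, -5} *)
mvec = Cross[{1, -3, 2}, {5, 2, -3}]   (* {5, 13, 17} *)
dist2 = (mvec . mvec)/(dvec . dvec)    (* 161/22 *)
N[Sqrt[dist2]]                         (* about 2.70521 *)
p0 = Cross[dvec, mvec]/(dvec . dvec)   (* {25/11, -31/22, 9/22}: closest point to the origin *)
{p0 . p0 == dist2, Cross[p0, dvec] == mvec}   (* {True, True} *)
```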

If the line L passes through the origin, then x and y are parallel as position vectors, so m = x × y = 0 and the Plücker coordinates of the line are \[ \left( \mathbf{d} : \mathbf{0} \right) = \left( d_1 : d_2 : d_3 : 0 : 0 : 0 \right) . \] The squared distance of this line from the origin is then 0.    ■

End of Example 22

Affine (augmented) Matrices

In order to define an affine plane (which is a two-dimensional geometric object), we need to separate points from vectors. It is customary to introduce a binary marker, or "gender", that distinguishes points from vectors by appending an extra digit to their coordinates. Namely, we mark points with an extra "1" and vectors with an extra "0." Hence, we write points as P(x, y, 1) and vectors as v(x, y, 0). A vector can nevertheless be attached to a point; this allows us to move a point P to another position Q along the vector v. In coordinates, this reads

\[ P(x, y, 1) + \mathbf{v}(a, b, 0) = Q(x+a, y+b, 1) \quad \Longrightarrow \quad \mathbf{v} = Q - P = \overline{PQ} . \]

From this perspective, we are not allowed to add points, but we can add a vector to a point, as well as add vectors to vectors. For any two points P and Q from the inhabited set A, there exists a unique vector v ∈ V such that Q = P + v; we denote this vector by \( \overline{PQ} \) = Q − P. In general, an affine space consists of an inhabited set of points A together with a vector space V and a subtraction operation that assigns to any two points a unique vector.
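The marking convention is easy to try out with plain lists. In this minimal Wolfram Language sketch the names pt, dv, and qt are illustrative only; the last coordinate tells whether a combination is a point (1), a vector (0), or neither.

```mathematica
pt = {1, 2, 1};      (* the point P(1, 2) *)
dv = {3, -1, 0};     (* the vector v(3, -1) *)
qt = pt + dv         (* {4, 1, 1}: point + vector is a point *)
qt - pt              (* {3, -1, 0}: point - point is a vector *)
pt + qt              (* {5, 3, 2}: last entry 2, so the sum of two points is neither *)
```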

Suppose that 𝔸 and 𝔹 are n-dimensional and m-dimensional affine spaces, respectively. Let α = (a₁, a₂, … , an, Oα) and β = (b₁, b₂, … , bm, Oβ) be frames for 𝔸 and 𝔹. Suppose further that F : 𝔸 ↣ 𝔹 is an affine transformation, so that if P is a point in the inhabited set A, then Q = F(P) is a point in the inhabited set B. Finally, let ⟦P⟧α = [α₁, α₂, … , αn, 1]. Then

\begin{align*} \mathbf{Q} &= F(\mathbf{P}) \\ &= F \left( \alpha_1 \mathbf{a}_1 + \alpha_2 \mathbf{a}_2 + \cdots + \alpha_n \mathbf{a}_n + O_{\alpha} \right) \\ &= \alpha_1 F \left( \mathbf{a}_1 \right) + \alpha_2 F \left( \mathbf{a}_2 \right) + \cdots + \alpha_n F \left( \mathbf{a}_n \right) +F \left( O_{\alpha} \right) \end{align*}
where the last step uses properties (c), (d), and (e) of Theorem 5. Hence,
\begin{align*} [\![\mathbf{Q} ]\!]_{\beta} &= \left[ \alpha_1 F \left( \mathbf{a}_1 \right) + \alpha_2 F \left( \mathbf{a}_2 \right) + \cdots + \alpha_n F \left( \mathbf{a}_n \right) + F \left( O_{\alpha} \right) \right]_{\beta} \\ &= \begin{bmatrix} \left[ F \left( \mathbf{a}_1 \right) \right]_{\beta} & \cdots & \left[ F \left( \mathbf{a}_n \right) \right]_{\beta} & \left[ F \left( O_{\alpha} \right) \right]_{\beta} \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \\ 1 \end{bmatrix} \\ &= \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} & a_{1, n+1} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} & a_{2, n+1} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} & a_{m, n+1} \\ 0&0& \cdots & 0 & 1 \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \\ 1 \end{bmatrix} \end{align*}
since
\[ \left[ F \left( \mathbf{a}_1 \right) \right]_{\beta} = \begin{bmatrix} a_{1,1} \\ a_{2,1} \\ \vdots \\ a_{m,1} \\ 0 \end{bmatrix} , \quad \left[ F \left( \mathbf{a}_2 \right) \right]_{\beta} = \begin{bmatrix} a_{1,2} \\ a_{2,2} \\ \vdots \\ a_{m,2} \\ 0 \end{bmatrix} , \quad \ldots , \quad \left[ F \left( O_{\alpha} \right) \right]_{\beta} = \begin{bmatrix} a_{1, n+1} \\ a_{2, n+1} \\ \vdots \\ a_{m, n+1} \\ 1 \end{bmatrix} . \]
The matrix
\[ \mathbf{M} = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} & a_{1, n+1} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} & a_{2, n+1} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} & a_{m, n+1} \\ 0&0& \cdots & 0 & 1 \end{bmatrix} \]
is the standard matrix of the affine transformation. In the common cases of two- and three-dimensional affine spaces, M has the form
\[ \mathbf{M} = \begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ 0&0&1 \end{bmatrix} , \quad \mathbf{M} = \begin{bmatrix} a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \\ c_1 & c_2 & c_3 & c_4 \\ 0&0&0&1 \end{bmatrix} \]
When points are written as row vectors and multiplied by M from the right, these matrices are replaced by their transposes:
\[ \mathbf{M} = \begin{pmatrix} a_1 & b_1 & 0 \\ a_2 & b_2 & 0 \\ a_3 & b_3 &1 \end{pmatrix} , \quad \mathbf{M} = \begin{pmatrix} a_1 & b_1 & c_1 & 0 \\ a_2 & b_2 & c_2 & 0 \\ a_3 & b_3 & c_3 & 0 \\ a_4 & b_4 &c_4 &1 \end{pmatrix} . \]
   

Example 23: Let 𝔸 = (A, ℝ², d) be an affine space with frame α = [e₁, e₂, O], where O = (0, 0). Let T : 𝔸 → 𝔸 be defined as T(P) = P + t, where t = [Δx, Δy]. Find the 3 × 3 matrix T that implements this transformation.

Solution: Since only frame α is used and a translation leaves vectors unchanged, we have \[ \left[ T(\mathbf{e}_1 ) \right]_{\alpha} = [\![ \mathbf{e}_1 ]\!]_{\alpha} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \qquad \mbox{and} \qquad \left[ T(\mathbf{e}_2 ) \right]_{\alpha} = [\![ \mathbf{e}_2 ]\!]_{\alpha} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} , \] while \[ \left[ T(O) \right]_{\alpha} = \left[ O + \mathbf{t} \right]_{\alpha} = \left[ O \right]_{\alpha} + \left[ {\bf t} \right]_{\alpha} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} + \begin{bmatrix} \Delta x \\ \Delta y \\ 0 \end{bmatrix} = \begin{bmatrix} \Delta x \\ \Delta y \\ 1 \end{bmatrix} . \] Thus the matrix T is given by \[ \mathbf{T} = \begin{bmatrix} 1 & 0 & \Delta x \\ 0&1& \Delta y \\ 0&0&1 \end{bmatrix} . \]    ■

End of Example 23
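Mathematica's built-in TranslationTransform produces the same matrix, and TransformationMatrix extracts it. A short check with symbolic shifts; the symbols dx, dy, x0, y0 are our own names.

```mathematica
Clear[dx, dy, x0, y0];
tr = TranslationTransform[{dx, dy}];
TransformationMatrix[tr]        (* {{1, 0, dx}, {0, 1, dy}, {0, 0, 1}} *)
(* acting on (x0, y0, 1) agrees with applying the translation directly *)
TransformationMatrix[tr] . {x0, y0, 1} == Append[tr[{x0, y0}], 1]
```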

By embedding the n-dimensional case into an (n+1)-dimensional one, we can express an affine transformation as an ordinary linear transformation, that is, as a matrix/vector multiplication. Since the matrix form is so handy for building complicated transformations from simpler ones, it is very useful to represent all affine transformations by matrices.

We also extend our augmented m-by-(n+1) matrix [A | b] from Eq.\eqref{EqAffine.1} to the (m+1) × (n+1) matrix

\begin{equation} \label{EqAffine.6} \left[ \mathbf{A} \mid \mathbf{b} \right] \quad \Longrightarrow \quad {\bf A}_b = \begin{bmatrix} \mathbf{A} & \mathbf{b} \\ 0 \cdots 0 & 1 \end{bmatrix} \quad\mbox{or} \quad {\bf A}_b = \begin{pmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{b} & 1 \end{pmatrix} \end{equation}
define maps 𝔽m×(n+1) → 𝔽(m+1)×(n+1) . On the subset V ⊂ 𝔽(n+1)×1 consisting of vectors with last component 1, we recover the affine maps
\begin{equation} \label{EqAffine.7} \begin{bmatrix} \mathbf{A} & \mathbf{b} \\ 0 \cdots 0 & 1 \end{bmatrix} \begin{pmatrix} \mathbf{x} \\ 1 \end{pmatrix} = \begin{bmatrix} {\bf A}\,{\bf x} + \mathbf{b} \\ 1 \end{bmatrix} . \end{equation}
Since V does not contain the zero vector, it is not a vector subspace. But if V₀ denotes the subspace consisting of vectors having last component 0, then
\begin{equation} \label{EqAffine.8} V = \left\{ \mathbf{t} + \mathbf{v} \, | \ \mathbf{v} \in V_0 \right\} = \mathbf{t} + V_0 , \end{equation}
where t denotes any vector having last component 1. We view V as a translate of a vector subspace. Any subset of a vector space that is obtained by translating a vector subspace is called an affine subspace. For example, the set of solutions of a consistent linear system is an affine subspace: it is a translate of the subspace of solutions of the associated homogeneous system.

Theorem 9: Let f and g be two affine mappings. Their composition g ∘ f (whenever it is defined) is an affine mapping with D(g ∘ f) = Dg ∘ Df, and the augmented matrix of g ∘ f is the product of the corresponding augmented matrices.
Indeed, let P, Q ∈ A. Then f(P) − f(Q) = D f(P − Q). So \begin{align*} g \circ f (P) - g \circ f (Q) &= g\,f(P) - g\,f(Q) = D\,g \left[ f(P) - f(Q) \right] \\ &= D\,g \circ D\,f \left( P - Q \right) , \end{align*} which shows that g ∘ f is affine with linear part D g ∘ D f.

Composition of affine maps is expressed by the following formula:

\begin{equation} \label{EqAffine.9} {\bf A}_b \,{\bf B}_c = \begin{bmatrix} \mathbf{A} & \mathbf{b} \\ 0 \cdots 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} \mathbf{B} & \mathbf{c} \\ 0 \cdots 0 & 1 \end{bmatrix} = \begin{bmatrix} \mathbf{A}\,\mathbf{B} & \mathbf{A}\,\mathbf{c} + \mathbf{b} \\ 0 \cdots 0 & 1 \end{bmatrix} . \end{equation}
If matrix A has m rows and n columns, then the last row in augmented matrix Ab is a row containing n zeroes followed by single "1." Hence the augmented matrix Ab has dimensions (m + 1) × (n + 1).
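Equations (6), (7), and (9) can be checked numerically. In this Wolfram Language sketch the helper augment is a hypothetical name: it appends b as a last column and then the row (0, …, 0, 1). The matrices are arbitrary illustrative choices.

```mathematica
(* Hypothetical helper: augmented (affine) matrix A_b of Eq. (6) *)
augment[a_, b_] := Append[MapThread[Append, {a, b}],
   Append[ConstantArray[0, Length[First[a]]], 1]];

a1 = {{1, 2}, {3, 4}}; b1 = {5, 6};     (* f(x) = a1.x + b1 *)
a2 = {{0, 1}, {-1, 2}}; c2 = {1, -2};   (* g(x) = a2.x + c2 *)
xv = {1, -1};
augment[a1, b1] . Append[xv, 1] == Append[a1 . xv + b1, 1]            (* Eq. (7): True *)
augment[a1, b1] . augment[a2, c2] == augment[a1 . a2, a1 . c2 + b1]   (* Eq. (9): True *)
```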

   

Example 24: This example illustrates the usefulness of affine (augmented) matrices for evaluating compositions of two or more affine transformations. Once each affine transformation is written in matrix form (by extending the original dimensions), evaluating their compositions reduces to standard matrix multiplication.

We consider two affine transformations f : 𝔸² ↣ 𝔸³ and g : 𝔸³ ↣ 𝔸⁴, defined by the formulas \[ f(\mathbf{x}) = {\bf A}\,\mathbf{x} + \mathbf{b} \quad\mbox{and} \quad g(\mathbf{y}) = {\bf B}\,\mathbf{y} + \mathbf{c} , \] where \[ {\bf A} = \begin{bmatrix} 1&2 \\ -2&1 \\ 3&-2 \end{bmatrix} , \quad {\bf b} = \begin{bmatrix} -2 \\ -3 \\ 1 \end{bmatrix} , \qquad {\bf B} = \begin{bmatrix} 4 & 1 & -2 \\ 2&5&-2 \\ 2&-1&3 \\ 5&-3& 2 \end{bmatrix} , \quad {\bf c} = \begin{bmatrix} 4\\ 3\\ 2 \\ 1 \end{bmatrix} . \] The corresponding augmented (affine) matrices for these transformations are \[ {\bf A}_b = \begin{bmatrix} 1&2 & -2 \\ -2&1 & -3 \\ 3&-2& 1 \\ 0&0&1 \end{bmatrix} , \qquad {\bf B}_c = \begin{bmatrix} 4 & 1 & -2& 4 \\ 2&5&-2&3 \\ 2&-1&3 & 2 \\ 5&-3& 2& 1 \\ 0&0&0&1 \end{bmatrix} . \] To determine the augmented matrix corresponding to their composition g ∘ f, we simply multiply the corresponding augmented matrices: \begin{align*} {\bf B}_c {\bf A}_{b} &= \begin{bmatrix} 4 & 1 & -2& 4 \\ 2&5&-2&3 \\ 2&-1&3 & 2 \\ 5&-3& 2& 1 \\ 0&0&0&1 \end{bmatrix} \cdot \begin{bmatrix} 1&2 & -2 \\ -2&1 & -3 \\ 3&-2& 1 \\ 0&0&1 \end{bmatrix} \\ &= \begin{bmatrix} -4& 13&-9 \\ -14& 13&-18 \\ 13& -3&4 \\ 17& 3&2 \\ 0&0&1\end{bmatrix} , \end{align*} because \[ g \left( f({\bf x}) \right) = {\bf B}\,{\bf A}\,{\bf x} + {\bf B}\,{\bf b} + {\bf c} . \] This identity tells us that the composition g ∘ f is the affine transformation x ↦ C x + d, where \[ {\bf C} = {\bf B}\,{\bf A} , \qquad {\bf d} = {\bf B}\,{\bf b} + {\bf c} . \] So \[ {\bf C} = \begin{bmatrix} -4& 13 \\ -14& 13 \\ 13& -3 \\ 17& 3 \end{bmatrix} , \qquad {\bf d} = \begin{bmatrix} -9\\ -18\\ 4\\ 2 \end{bmatrix} . \]

B = {{4, 1, -2}, {2, 5, -2}, {2, -1, 3}, {5, -3, 2}}; A = {{1, 2}, {-2, 1}, {3, -2}}; CC = B.A
{{-4, 13}, {-14, 13}, {13, -3}, {17, 3}}
B = {{4, 1, -2}, {2, 5, -2}, {2, -1, 3}, {5, -3, 2}}; b = {-2, -3, 1}; c = {4, 3, 2, 1}; B.b + c
{-9, -18, 4, 2}
The built-in Mathematica command AffineTransform generates the augmented matrix:
A = {{1, 2}, {-2, 1}, {3, -2}}; b = {-2, -3, 1}; f = AffineTransform[{A, b}]
TransformationFunction \( \displaystyle \quad \left( \begin{array}{cc|c} 1&2&-2 \\ -2&1&-3 \\ 3&-2&1 \\ \hline 0&0&1 \end{array}\right) \)
B = {{4, 1, -2}, {2, 5, -2}, {2, -1, 3}, {5, -3, 2}}; c = {4, 3, 2, 1}; g = AffineTransform[{B, c}]
TransformationFunction \( \displaystyle \quad \left( \begin{array}{ccc|c} 4&1&-2&4 \\ 2&5&-2&3 \\ 2&-1&3 & 2 \\ 5&-3&2&1 \\ \hline 0&0&0&1 \end{array}\right) \)
Since the product \( {\bf B}_c {\bf A}_b \) is the augmented matrix of the affine transformation (g∘f)(x) = C x + d, we can read off the corresponding matrices \[ {\bf C} = \begin{bmatrix} -4& 13 \\ -14& 13 \\ 13& -3 \\ 17& 3 \end{bmatrix} , \qquad {\bf d} = \begin{bmatrix} -9 \\ -18 \\ 4 \\ 2 \end{bmatrix} . \]    ■
End of Example 24
   
Example 25: This example consists of two parts, both dealing with compositions of affine mappings.

Let us start with two affine mappings that operate on affine spaces of different dimensions. We define them by the formulas \[ f(\mathbf{x}) = {\bf A}\, \mathbf{x} + \mathbf{b} \qquad\mbox{and} \qquad g(\mathbf{y}) = {\bf B}\, \mathbf{y} + \mathbf{c} , \] where \[ {\bf A} = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \in \mathbb{R}^{2\times 2} , \qquad {\bf b} = \begin{pmatrix} -2 \\ 3.5 \end{pmatrix} \in \mathbb{R}^{2\times 1} , \] and \[ {\bf B} = \begin{bmatrix} -1 & -2 \\ 2&3 \\ -3& 4 \end{bmatrix} \in \mathbb{R}^{3\times 2}, \qquad {\bf c} = \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix} \in \mathbb{R}^{3\times 1} . \] Observe that f : 𝔸² ↣ 𝔸² and g : 𝔸² ↣ 𝔸³. So we can define the composition g∘f of these affine mappings, but not f∘g, which is undefined.

The composition g∘f : 𝔸² ↣ 𝔸³ of these two affine mappings, acting on a two-dimensional vector v, can be evaluated as follows: \begin{align*} (g \circ f) (\mathbf{v} ) &= g \left( f(\mathbf{v}) \right) = {\bf B} \left( f(\mathbf{v}) \right) + \mathbf{c} \\ &= {\bf B} \left( {\bf A}\, {\bf v} + {\bf b} \right) + \mathbf{c} \\ &= {\bf B} \,{\bf A} \,{\bf v} + {\bf B} \,{\bf b} + \mathbf{c} , \end{align*} which is the affine transformation \[ \mathbb{R}^{2 \times 1} \ni {\bf v} \mapsto {\bf C}\,{\bf v} + {\bf d} \in \mathbb{R}^{3 \times 1} , \tag{E25.1} \] with \[ {\bf C} = {\bf B} \,{\bf A} = \begin{bmatrix} -5&-11 \\ 8&18 \\ 5&7 \end{bmatrix} \in \mathbb{R}^{3\times 2} \tag{E25.2} \] and \[ \mathbf{d} = {\bf B} \,{\bf b} +\mathbf{c} = \begin{bmatrix} -4 \\ 5.5 \\ 22 \end{bmatrix} \in \mathbb{R}^{3\times 1} . \tag{E25.3} \]
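To see where the entries of (E25.2) and (E25.3) come from, here is a hand computation of the third rows, with the third row of B taken as (−3, 4); this is the row used in the Mathematica code below and the one that reproduces both results:
\[ (-3)\begin{bmatrix} 1 & 3 \end{bmatrix} + 4 \begin{bmatrix} 2 & 4 \end{bmatrix} = \begin{bmatrix} 5 & 7 \end{bmatrix} , \qquad (-3)(-2) + 4\,(3.5) + 2 = 22 . \]
The first two entries of d follow in the same way: (−1)(−2) + (−2)(3.5) + 1 = −4 and 2(−2) + 3(3.5) − 1 = 5.5.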

Clear[A, b, B, c, d, f, g]; A = {{1, 3}, {2, 4}}; B = {{-1, -2}, {2, 3}, {-3, 4}}; b = {-2, 3.5}; c = {1, -1, 2};
BA = B.A
d = B . b + c
{{-5, -11}, {8, 18}, {5, 7}}
{-4., 5.5, 22.}
We verify the correctness of Theorem 9 with Mathematica in three different ways. First, we apply the composition (E25.1) to the particular point v = [3.4, −2.1], directly and then through the built-in Mathematica command AffineTransform. Finally, we obtain the same result by applying the product of the corresponding augmented matrices.

Clear[A, b, B, c, d, f, g, v]; A = {{1, 3}, {2, 4}}; B = {{-1, -2}, {2, 3}, {-3, 4}}; b = {-2, 3.5}; c = {1, -1, 2}; v = {3.4, -2.1}; B . (A . v + b) + c
{2.1, -5.1, 24.3}
\[ ( g \circ f)({\bf v}) = \begin{bmatrix} 2.1 \\ -5.1 \\ 24.3 \end{bmatrix} \in \mathbb{R}^{3 \times 1} \quad \mbox{for} \quad {\bf v} = \begin{pmatrix} 3.4 \\ -2.1 \end{pmatrix} . \]
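The intermediate point f(v) makes this arithmetic easy to follow by hand (using, as in the code above, the third row (−3, 4) of B):
\[ f({\bf v}) = \begin{pmatrix} 3.4 - 6.3 - 2 \\ 6.8 - 8.4 + 3.5 \end{pmatrix} = \begin{pmatrix} -4.9 \\ 1.9 \end{pmatrix} , \qquad g\left( f({\bf v}) \right) = \begin{pmatrix} 4.9 - 3.8 + 1 \\ -9.8 + 5.7 - 1 \\ 14.7 + 7.6 + 2 \end{pmatrix} = \begin{pmatrix} 2.1 \\ -5.1 \\ 24.3 \end{pmatrix} . \]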
BA = B.A
\( \displaystyle \quad \begin{pmatrix} -5 & -11 \\ 8&18 \\ 5&7 \end{pmatrix} \)
Hence the linear part D(g∘f) of the composition g∘f is just the product of the corresponding matrices: \[ D \left( g \circ f \right) = {\bf C} = {\bf B}\,{\bf A} = (Dg) \circ (Df) . \]
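One way to see this without choosing a particular point: subtracting the images of two points x and y cancels every translation term, so only the linear parts survive,
\[ (g\circ f)({\bf x}) - (g\circ f)({\bf y}) = {\bf B}\left( f({\bf x}) - f({\bf y}) \right) = {\bf B}\,{\bf A} \left( {\bf x} - {\bf y} \right) . \]
The same identity, for a different pair of maps f and g, is checked numerically in the second part of this example.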
d = B . b + c; BA . v + d
{2.1, -5.1, 24.3}
The same job can be done more efficiently with built-in Wolfram functions. We define the affine transformations f and g using the built-in Mathematica command AffineTransform:
f=AffineTransform[{A, b}]; g=AffineTransform[{B, c}];
Apply transformations
fv = f[v]; gfv = g[fv]
{2.1, -5.1, 24.3}
However, if we try the reverse composition f∘g, whose dimensions are incompatible, Mathematica complains:
gv = g[v]; fgv = f[gv]
TransformationFunction::fdim: {1.8,-0.5,-16.6} is not a vector of length 2 or a list of length 2 vectors.

To finish this part, we build the augmented matrices corresponding to the transformations f and g: \[ {\bf A}_b = \begin{bmatrix} 1 & 3 & -2 \\ 2 & 4& 3.5 \\ 0&0&1 \end{bmatrix} \in \mathbb{R}^{3\times 3} , \qquad {\bf B}_c = \begin{bmatrix} -1&-2& 1 \\ 2&3&-1 \\ -3&4&2 \\ 0&0&1 \end{bmatrix} \in \mathbb{R}^{4\times 3} . \] Multiplying these matrices, we obtain the augmented matrix of their composition g∘f: \[ {\bf B}_c {\bf A}_b = \begin{bmatrix} {\bf B}\,{\bf A} & {\bf B}\,{\bf b} + {\bf c} \\ 0 \cdots 0 & 1 \end{bmatrix} = \begin{bmatrix} -5& -11& -4 \\ 8&18& 5.5 \\ 5&7& 22 \\ 0&0&1 \end{bmatrix} \in \mathbb{R}^{4 \times 3} . \]

Clear[Ab, Bc, BcAb, Cdv]; Ab = {{1, 3, -2}, {2, 4, 3.5}, {0, 0, 1}}
\( \displaystyle \quad \left( \begin{array}{ccc} 1 & 3 & -2 \\ 2 & 4 & 3.5 \\ 0&0&1 \end{array} \right) \)
Bc = {{-1, -2, 1}, {2, 3, -1}, {-3, 4, 2}, {0, 0, 1}}
\( \displaystyle \quad \begin{pmatrix} -1 & -2 & 1 \\ 2&3& -1 \\ -3&4&2 \\ 0&0&1 \end{pmatrix} \)
Denoting the product of these augmented matrices by Cd, we apply it to the point v. However, the two-dimensional vector v = [3.4 −2.1] must be treated differently depending on whether it is considered as an element of the inhabited set or of the vector space in the canonical affine space 𝔸² = (ℝ², ℝ², +). In other words, v = [3.4 −2.1] can belong either to the inhabited set ℝ² of points or to the vector space ℝ². In the Euclidean space ℝ² these two sets coincide, but they are treated differently in the canonical affine space.

Therefore, when v = [3.4 −2.1] is considered as an element of the inhabited set, an extra "1" is appended to make it three-dimensional: v = [3.4 −2.1 1]. The extra component "1" indicates that v is considered as a point of the inhabited set. On the other hand, if the same v = [3.4 −2.1] is considered as a vector, we append the extra component "0", which shows that it belongs to the vector space: v = [3.4 −2.1 0].
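This bookkeeping agrees with the operations of an affine space, as a short computation with the last components shows (p and q denote arbitrary points and w an arbitrary vector): the difference of two points is a vector, and a point plus a vector is a point,
\[ \begin{pmatrix} {\bf p} \\ 1 \end{pmatrix} - \begin{pmatrix} {\bf q} \\ 1 \end{pmatrix} = \begin{pmatrix} {\bf p} - {\bf q} \\ 0 \end{pmatrix} , \qquad \begin{pmatrix} {\bf p} \\ 1 \end{pmatrix} + \begin{pmatrix} {\bf w} \\ 0 \end{pmatrix} = \begin{pmatrix} {\bf p} + {\bf w} \\ 1 \end{pmatrix} . \]
The sum of two points would have last component 2, signalling that it is not an operation of the affine space.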

Then the augmented matrix Cd can be applied from the left to either points or vectors. (In computer graphics, points and vectors are often written as rows; in that convention the transposed augmented matrix acts on them from the right.) The last component, "1" or "0", decides whether the translation part is applied: it is added to points but not to vectors. This makes augmented matrices a universal instrument that operates on both points and vectors. For the point v = [3.4 −2.1 1] we get \[ {\bf C}_d {\bf v} = \begin{bmatrix} -5& -11& -4 \\ 8&18& 5.5 \\ 5&7& 22 \\ 0&0&1 \end{bmatrix} \begin{pmatrix} 3.4 \\ -2.1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2.1 \\ -5.1 \\ 24.3 \\ 1 \end{pmatrix} . \]
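In block form, with Cd split into its linear part C and translation part d as in the product above, the role of the last component is a two-line computation:
\[ \begin{bmatrix} {\bf C} & {\bf d} \\ {\bf 0}^{\mathrm T} & 1 \end{bmatrix} \begin{pmatrix} {\bf v} \\ 1 \end{pmatrix} = \begin{pmatrix} {\bf C}\,{\bf v} + {\bf d} \\ 1 \end{pmatrix} , \qquad \begin{bmatrix} {\bf C} & {\bf d} \\ {\bf 0}^{\mathrm T} & 1 \end{bmatrix} \begin{pmatrix} {\bf v} \\ 0 \end{pmatrix} = \begin{pmatrix} {\bf C}\,{\bf v} \\ 0 \end{pmatrix} . \]
So a point is mapped by the full affine transformation, a vector only by its linear part C, and the last component is preserved in both cases.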

BcAb = Append[ Transpose[Append[Transpose[B . A], B . b + c]], {0, 0, 1}]
\( \displaystyle \quad \begin{pmatrix} -5 & -11 & -4 \\ 8&18 & 5.5 \\ 5&7&22 \\ 0&0&1 \end{pmatrix} \)
Alternatively, the same matrix can be assembled with Join; the second line checks that both constructions agree:
Join[Join[B . A, Transpose[{B . b + c}], 2], {{0, 0, 1}}]
TrueQ[% == BcAb]
If you consider v = [3.4 −2.1] ∈ ℝ² as the vector [3.4 −2.1 0] of the vector space, then \[ {\bf C}_d {\bf v} = \begin{bmatrix} -5& -11& -4 \\ 8&18& 5.5 \\ 5&7& 22 \\ 0&0&1 \end{bmatrix} \begin{pmatrix} 3.4 \\ -2.1 \\ 0 \end{pmatrix} = \begin{pmatrix} 6.1 \\ -10.6 \\ 2.3 \\ 0 \end{pmatrix} , \] because the translation column [−4, 5.5, 22]ᵀ is multiplied by the last component 0 and does not contribute.
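As a consistency check, the image of the vector [3.4 −2.1 0] must equal the image of the point [3.4 −2.1 1] minus the translation part d = [−4, 5.5, 22]ᵀ:
\[ \begin{pmatrix} 2.1 \\ -5.1 \\ 24.3 \end{pmatrix} - \begin{pmatrix} -4 \\ 5.5 \\ 22 \end{pmatrix} = \begin{pmatrix} 6.1 \\ -10.6 \\ 2.3 \end{pmatrix} = \begin{bmatrix} -5 & -11 \\ 8 & 18 \\ 5 & 7 \end{bmatrix} \begin{pmatrix} 3.4 \\ -2.1 \end{pmatrix} , \]
and the last component 0 is carried along unchanged.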

Note that the product \( {\bf A}_b {\bf B}_c \) is undefined because the dimensions are incompatible: a 3×3 matrix cannot be multiplied from the right by a 4×3 matrix.

Observe that Mathematica automatically generates augmented matrices when the command AffineTransform is invoked.

Now we turn to the second part of this example.

Define points P and Q on the plane: \[ P = \left[ 1, 2 \right]^{\mathrm T} , \qquad Q = \left[ 3, 4 \right]^{\mathrm T} . \]

Clear[P,Q,f,g,fP,fQ,gfP,gfQ,lhs,rhs]; P={1,2}; Q={3,4};
Define affine transformations f and g, both mapping 𝔸² ↣ 𝔸²: \[ f(\mathbf{v}) = \begin{bmatrix} 1&1 \\ 0&1 \end{bmatrix} \mathbf{v} + \begin{pmatrix} 1 \\ 1 \end{pmatrix} , \qquad g(\mathbf{u}) = \begin{bmatrix} 0&-1 \\ 1&0 \end{bmatrix} \mathbf{u} + \begin{pmatrix} 3 \\ 2 \end{pmatrix} . \]
f=AffineTransform[{{{1,1},{0,1}},{1,1}}];(* Shear transformation with translation *) g=AffineTransform[{{{0,-1},{1,0}},{3,2}}];(* Rotation by 90 deg counterclockwise with translation *)
Apply the transformations needed for the composition g∘f:
fP=f[P]; fQ=f[Q]; gfP=g[fP]; gfQ=g[fQ];
Manually combine the transformations:
fMatrix={{1,1},{0,1}}; fVector={1,1}; gMatrix={{0,-1},{1,0}}; gVector={3,2}; combinedMatrix=gMatrix.fMatrix; combinedVector=gMatrix.fVector+gVector;
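By hand, the combined matrix and vector computed in the previous line are
\[ \begin{bmatrix} 0&-1 \\ 1&0 \end{bmatrix} \begin{bmatrix} 1&1 \\ 0&1 \end{bmatrix} = \begin{bmatrix} 0&-1 \\ 1&1 \end{bmatrix} , \qquad \begin{bmatrix} 0&-1 \\ 1&0 \end{bmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \end{pmatrix} , \]
so the composition is \( (g\circ f)(\mathbf{v}) = \begin{bmatrix} 0&-1 \\ 1&1 \end{bmatrix} \mathbf{v} + \begin{pmatrix} 2 \\ 3 \end{pmatrix} \).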
Define the composed transformation
composition=AffineTransform[{combinedMatrix,combinedVector}];(* the composed map g∘f; the name composition is used in the next step *)
Apply composed transformation
compositionP=composition[P]; compositionQ=composition[Q]; lhs=compositionDiff=compositionP-compositionQ; rhs=gMatrix.(fMatrix.(P-Q)); proofResult=lhs==rhs; {lhs,rhs,proofResult}
{{2, -4}, {2, -4}, True}
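The numbers in this output are easy to confirm by hand, applying f first and then g:
\[ f(P) = \begin{pmatrix} 1+2+1 \\ 2+1 \end{pmatrix} = \begin{pmatrix} 4 \\ 3 \end{pmatrix} , \quad g\left( f(P) \right) = \begin{pmatrix} -3+3 \\ 4+2 \end{pmatrix} = \begin{pmatrix} 0 \\ 6 \end{pmatrix} , \quad f(Q) = \begin{pmatrix} 8 \\ 5 \end{pmatrix} , \quad g\left( f(Q) \right) = \begin{pmatrix} -5+3 \\ 8+2 \end{pmatrix} = \begin{pmatrix} -2 \\ 10 \end{pmatrix} . \]
Hence (g∘f)(P) − (g∘f)(Q) = [2, −4]ᵀ, in agreement with the linear part applied to P − Q = [−2, −2]ᵀ.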
Create graphical illustration:
gr17e1 = Graphics[{Red, PointSize[Large], Point[P], Point[Q], Blue, Thick, Arrow[{P, fP}], Arrow[{Q, fQ}], Text["P", P, {-1, -1}], Text["Q", Q, {1, 1}], Text["f(P)", fP, {-1, -1}], Text["f(Q)", fQ, {.5, -.75}]}];
gr17e2 = Graphics[{Red, PointSize[Large], Point[fP], Point[fQ], Blue, Thick, Arrow[{fP, gfP}], Arrow[{fQ, gfQ}], Text["f(P)", fP, {-1, -1}], Text["f(Q)", fQ, {1, 1}], Text["g\[SmallCircle]f(P)", gfP, {-1, -1}], Text["g\[SmallCircle]f(Q)", gfQ, {-1, -.75}]}];
gr17e3 = Graphics[{Red, PointSize[Large], Point[P], Point[Q], Blue, Thick, Arrow[{P, compositionP}], Arrow[{Q, compositionQ}], Text["P", P, {-1, -1}], Text["Q", Q, {1, 1}], Text["g\[SmallCircle]f(P)", compositionP, {-1, -1}], Text["g\[SmallCircle]f(Q)", compositionQ, {-1, -1.5}]}]; GraphicsGrid[{{gr17e1, gr17e2, gr17e3}}, Frame -> All, Spacings -> {1, 1}]
Illustration of Theorem 9.
We also calculate the reverse composition f∘g:
gP=g[P]; gQ=g[Q]; fgP=f[gP]; fgQ=f[gQ];
gr17e1a = Graphics[{Red, PointSize[Large], Point[P], Point[Q], Blue, Thick, Arrow[{P, gP}], Arrow[{Q, gQ}], Text["P", P, {-1, -1}], Text["Q", Q, {1, 1}], Text["g(P)", gP, {-1, -1}], Text["g(Q)", gQ, {.5, -.75}]}];
gr17e2a = Graphics[{Red, PointSize[Large], Point[gP], Point[gQ], Blue, Thick, Arrow[{gP, fgP}], Arrow[{gQ, fgQ}], Text["g(P)", gP, {-1, -1}], Text["g(Q)", gQ, {1, 1}], Text["f\[SmallCircle]g(P)", fgP, {-1, -1}], Text["f\[SmallCircle]g(Q)", fgQ, {-1, -.75}]}];
gr17e3a = Graphics[{Red, PointSize[Large], Point[P], Point[Q], Blue, Thick, Arrow[{P, fgP}], Arrow[{Q, fgQ}], Text["P", P, {-1, -1}], Text["Q", Q, {1, 1}], Text["f\[SmallCircle]g(P)", fgP, {-1, -1}], Text["f\[SmallCircle]g(Q)", fgQ, {-1, -1.5}]}]; GraphicsGrid[{{gr17e1a, gr17e2a, gr17e3a}}, Frame -> All, Spacings -> {1, 1}]
Illustration of Theorem 9.
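For comparison, the reverse composition can also be written out by hand. Its matrix and translation vector are
\[ \begin{bmatrix} 1&1 \\ 0&1 \end{bmatrix} \begin{bmatrix} 0&-1 \\ 1&0 \end{bmatrix} = \begin{bmatrix} 1&-1 \\ 1&0 \end{bmatrix} , \qquad \begin{bmatrix} 1&1 \\ 0&1 \end{bmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 6 \\ 3 \end{pmatrix} , \]
so (f∘g)(P) = [5, 4]ᵀ and (f∘g)(Q) = [5, 6]ᵀ, whereas (g∘f)(P) = [0, 6]ᵀ and (g∘f)(Q) = [−2, 10]ᵀ. Composition of affine maps is not commutative.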
   ■
End of Example 25

Conclusions

This section is very large. The natural question is: do you really need to know this material (which is advanced)? There is no simple answer. So we present our exposition from a historical perspective, in the hope that the reader appreciates our approach, while breaking the topic into smaller parts. This long story about affine spaces is needed to understand the differences and similarities between the homogeneous equation A x = 0 and the nonhomogeneous one, A x = b.

The objective of this tutorial is to demonstrate that theoretical mathematics (and Linear Algebra in particular) is easier to comprehend by working through numerous examples with software packages. About 50 years ago, affine spaces and transformations were included only in theoretical (advanced) courses for math majors. Now, with computers, the situation is very different: what was strictly theoretical 100 years ago has now become practical. For particular applications of affine transformations, see the section on computer graphics in Part 7.

On the other hand, this tutorial also includes the theoretical development of an advanced course in Linear Algebra, because it is not wise to lose knowledge that great people developed over centuries. Theory and practice should coexist peaceably! Therefore, we present theoretical material in the hope that practical people such as engineers will satisfy their curiosity and learn first principles. This tutorial is also designed for mathematicians who want to know where and how their theories are applied. It is the reader's job to evaluate how successful this tutorial is in uniting these two categories of readers.

 

Artificial Intelligence in use


Earlier, near the top of this page, we began a passage with "One might reasonably ask: 'What does the last line mean? What does the row containing {0.,0.|1.} of the TransformationFunction do?'" We postponed a more complete answer to that question until now. There, what followed was a particular answer from the ChatBot representing Wolfram's Artificial Intelligence module in a Chat-Enabled notebook. Below we ask the same question of the same AI bot, and this time we get:

The last line {0,0,1} in the matrix of a TransformationFunction is part of the homogeneous coordinates used in projective geometry.
When transforming points in 2D space, it's often convenient to work with 3x3 matrices instead of 2x2. This allows us to handle translations as well as rotations, scalings, and shears in a unified manner. A 2D point {x,y} is represented as a 3D point {x,y,1} in homogeneous coordinates. The 3x3 transformation matrix operates on these 3D homogeneous coordinates. The last row of this matrix is usually {0,0,1}. This means that the z-coordinate (which is always 1 for 2D points) remains unchanged after the transformation. In other words, the last line {0,0,1} is there to ensure that the translation part of the transformation works correctly, and that we stay in the "z=1" plane of the 3D space. For more information, you can read about AffineTransform and TransformationFunction in the Wolfram Language documentation, or about homogeneous coordinates in resources on computer graphics or projective geometry.
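To make the remark about translations concrete, here is the standard homogeneous-coordinate computation for a 2D affine map with linear part [a_{ij}] and translation (t₁, t₂):
\[ \begin{bmatrix} a_{11} & a_{12} & t_1 \\ a_{21} & a_{22} & t_2 \\ 0 & 0 & 1 \end{bmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} a_{11} x + a_{12} y + t_1 \\ a_{21} x + a_{22} y + t_2 \\ 1 \end{pmatrix} . \]
The bottom row {0, 0, 1} keeps the third coordinate equal to 1, so the output is again a point in the plane z = 1. Without the extra coordinate, a nonzero translation (t₁, t₂) could not be written as a 2×2 matrix product, because it does not fix the origin.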
The determined student is urged to compare the two answers carefully, noting how they differ. To the pure mathematician, it would seem anathema, even heresy, to ask the same question twice and get two different answers. And yet that is what the computer provides. To the pure mathematician, one and only one answer may be correct; having two calls the entire process into question, leaving both answers suspect. To the applied mathematician, elements of accuracy are present in each, and one answer may be more complete or more useful than the other.

In the earlier passage we proposed broadening the question to consider "What makes a mathematician?" One answer might reasonably be "One who demands precisely one correct answer to every question asked." Indeed, the domain of the pure mathematician is the formal proof, a series of irrefutable statements leading to the one correct conclusion.

On the other hand, computer science (and computer packages such as Wolfram's Mathematica) is used everywhere in engineering, physics, and other applied fields. What matters here is the distinction between the "pure" or "theoretical" mathematician and the applied mathematician. Innovations in computer science are embraced by the latter as they become part of nature, influencing the fields that mathematics serves. Does this mean that classical mathematics will be forced to change and adopt new notation and terminology? The answer is "yes" in the practical, applied world where ever more accurate estimates are accepted as useful.

To employ Mathematica to describe this situation, consider the following matrix:

Text@Grid[ Prepend[data, {"", Style["PURE MATHEMATICS", Red, Bold], Style["APPLIED MATHEMATICS", Red, Bold]}], Background -> {None, {Lighter[Yellow, .9], {White, Lighter[Blend[{Blue, Green}], .8]}}}, Dividers -> {{Darker[Gray, .6], {Lighter[Gray, .5]}, Darker[Gray, .6]}, {Darker[Gray, .6], Darker[Gray, .6], {False}, Darker[Gray, .6]}}, Alignment -> {{Left, Center, {Center}}}, Frame -> Darker[Gray, .6], ItemStyle -> 14, Spacings -> {Automatic, Automatic}]

  1. Suppose that an affine space 𝔸 has two frames \[ \beta = \left( \begin{bmatrix} 2 \\ -2 \end{bmatrix} , \ \begin{bmatrix} 3 \\ 1 \end{bmatrix} , \ (2, -4)\right) \] and \[ \phi = \left( \begin{bmatrix} 3 \\ 1 \end{bmatrix} , \ \begin{bmatrix} 1 \\ 1 \end{bmatrix} , \ (-2, 5)\right) . \] Find the change of frame matrix M and, given \( [\![ Q ]\!]_{\beta} = (5, -3, 1) \), use it to find \( [\![ Q ]\!]_{\phi} \).
  2. Suppose M is the change of frame matrix that transforms coordinates relative to frame β into coordinates relative to frame ϕ. Prove that M⁻¹ exists.
  3. Determine the matrix representation of the affine transformation S : 𝔸 → 𝔸 if 𝔸 = (A, ℝ²) and S(P) = Q where Q = (x+2y, y) if P = (x, y). What type of transformation is this?
  4. We call the ith median of the system of points P₁, P₂, … , Pₙ ∈ A (the inhabited set) the segment connecting the point Pᵢ with the center of gravity of the remaining points {Pⱼ : j ≠ i}. Prove that all medians intersect at one point, the center of gravity of the points P₁, P₂, … , Pₙ.

  1. Anton, Howard (2005), Elementary Linear Algebra (Applications Version) (9th ed.), Wiley International
  2. DeRose, T., Three-dimensional Computer Graphics, A coordinate-free approach,
  3. Dunn, F. and Parberry, I. (2002). 3D math primer for graphics and game development. Plano, Tex.: Wordware Pub.
  4. Foley, James D.; van Dam, Andries; Feiner, Steven K.; Hughes, John F. (1991), Computer Graphics: Principles and Practice (2nd ed.), Reading: Addison-Wesley, ISBN 0-201-12110-7
  5. Gallier, J., Geometric Methods and Applications for Computer Science and Engineering, Springer, second edition, 2011.
  6. Kostrikin, A.I. and Manin, Yu.I., Linear Algebra and Geometry, Gordon and Breach Science Publishers, PAmsterdam, The Netherlands, 1997.
  7. Mann, S., Litke, N., DeRose, T., A coordinate free geometry ADT, Technical Report 89-09-16, Department of Computer Science and Engineering, University of Washigton, Seatle, Washington, 1988.
  8. Matrices and Linear Transformations
  9. Rogers, D.F., Adams, J. A., Mathematical Elements for Computer Graphics, McGraw-Hill Science/Engineering/Math, 1989.
  10. Samuel, Pierre (1988), Projective Geometry, Springer-Verlag, ISBN 0-387-96752-4
  11. Shoemake, K. Plücker coordinate tutorial. Ray Tracing News, volume 11, number 1, July 1998.
  12. Szeliski, R., Computer Vision: Algorithms and Applications, 2nd edition, Springer,
  13. Watt, A., 3D Computer Graphics, Addison-Wesley; 3rd edition, 1999.