
The Wolfram Mathematica notebook containing the code that produces all the Mathematica outputs in this web page may be downloaded at this link. Caution: this notebook evaluates cell-by-cell, sequentially, from top to bottom. However, because variable names are re-used in later evaluations, computer memory must be cleared with the command Clear[ ] prior to further code execution.

$Post := If[MatrixQ[#1], MatrixForm[#1], #1] & (* outputs matrices in MatrixForm*)
Remove[ "Global`*"] // Quiet (* remove all variables *)

Linear algebra is primarily concerned with two types of mathematical objects: vectors and their transformations, represented by matrices. These objects are ubiquitous in physics, engineering, economics, and, through computer technologies, in almost all branches of science. Since matrices are built from vectors, this section focuses on the latter by presenting basic vector terminology and the corresponding concepts. Fortunately, we have proper symbols for their computer manipulation.

There are two approaches to defining vectors in the scientific literature. One is abstract, favored especially by linear algebra texts (you will meet it in Part 3 of this tutorial); the other, common in engineering and physics, is based on a geometrical interpretation or a coordinate definition, treating vectors as sequences of numbers. Computer science straddles both approaches because it uses both interpretations of vectors. Moreover, it introduced abstract analogues of vectors---lists and arrays---that are now widely used in mathematical books. We are all witnessing a shift in science driven by computing, including its impact on classical mathematics.

Vectors

What is a vector? It turns out that the answer depends on who is asking. In classical mechanics and calculus, you learn that a vector is a mathematical object that has both magnitude (or length) and direction. In quantum mechanics, you discover that instead of the geometrical definition you need more abstract objects, called bra- and ket-vectors. Studying functional analysis, integral equations, and partial differential equations, you meet a special kind of vectors, called covectors or functionals. Later, when you want to understand what Albert Einstein contributed to the world, you discover two types of vectors: covariant vectors and contravariant vectors. In graduate study, a Machine Learning course presents three more kinds of vectors: feature vectors, thought vectors, and word vectors.

In the third chapter of this tutorial, you will learn how all these different definitions of vectors can be united into one mathematically well-motivated generalization. This section is devoted to one particular version of vectors that is closely related to solving systems of linear equations. Moreover, it establishes a connection between our algebraic definition of vectors and the similar notions---arrays and lists---found in object-oriented computer languages such as C++.

Mathematicians distinguish between vector and scalar (pronounced “SKAY-lur”) quantities. You’re already familiar with scalars---scalar is the technical term for an ordinary number. In this course, we use four specific sets of scalars: integers ℤ, rational numbers ℚ, real numbers ℝ, and complex numbers ℂ. We abbreviate any of these four sets by the symbol 𝔽 (which stands for either ℤ, ℚ, ℝ, or ℂ). Be aware that we omit the seemingly exotic finite fields only for technical reasons; they are very important in some applications (see, for example, coding theory). By a vector we mean a list or finite sequence of numbers. We use the word scalar when we wish to emphasize that a particular quantity is not a vector quantity. For example, as we will discuss shortly, “velocity” and “displacement” are vector quantities, whereas “speed” and “distance” are scalar quantities.

In this section, we focus on the algebraic definition of vectors as arrays of numbers and put the geometrical interpretation on the back burner (Part 3).

   
Example 1: About 2,000 years ago, the ancient Greek engineer Philo of Byzantium came up with what may be the earliest design for a thermometer: a hollow sphere filled with air and water, connected by tube to an open-air pitcher. The idea was that air inside the sphere would expand or contract as it was heated or cooled, pushing or pulling water into the tube. In the second century A.D., the Greek-born Roman physician Galen created and may have used a thermometer-like device with a crude 9-degree scale, comprising four degrees of hot, four degrees of cold, and a “neutral” temperature in the middle.

It wasn’t until the early 1600s that the thermometer began to come into its own. The famous Italian astronomer and physicist Galileo Galilei (1564--1642), or possibly his friend the physician Santorio, likely came up with an improved thermoscope around 1593: an inverted glass tube placed in a bowl full of water or wine. Santorio apparently used a device like this to test whether his patients had fevers. Shortly after the turn of the 17th century, English physician Robert Fludd also experimented with open-air wine thermometers.

The first recorded instance of anyone thinking to create a universal scale for thermoscopes was in the early 1700s. In fact, two people had this idea at about the same time. One was a Danish astronomer named Ole Christensen Rømer (1644--1710), who had the idea to select two reference points—the boiling point of water and the freezing point of a saltwater mixture, both of which were relatively easy to recreate in different labs—and then divide the space between those two points into 60 evenly spaced degrees. The other was England’s revolutionary physicist and mathematician Isaac Newton (1643--1727), who announced his own temperature scale, in which 0 was the freezing point of water and 12 was the temperature of a healthy human body, the same year that Rømer did. (Newton likely developed this admittedly limited scale to help himself determine the boiling points of metals, whose temperatures would be far higher than 12 degrees.)

After a visit to Rømer in Copenhagen, the Dutch-Polish physicist Daniel Fahrenheit (1686--1736) was apparently inspired to create his own scale, which he unveiled in 1724. His scale was more fine-grained than Rømer’s, with about four times the number of degrees between water’s boiling and freezing points. Fahrenheit is also credited as the first to use mercury inside his thermometers instead of wine or water. Though we are now fully aware of its toxic properties, mercury is an excellent liquid for indicating changes in temperature.

Originally, Fahrenheit set 0 degrees as the freezing point of a solution of salt water and 96 as the temperature of the human body. But the fixed points were changed so that they would be easier to recreate in different laboratories, with the freezing point of water set at 32 degrees and its boiling point becoming 212 degrees at sea level and standard atmospheric pressure.

But this was far from the end of the development of important temperature scales. In the 1730s, two French scientists, René Antoine Ferchault de Réaumur (1683--1757) and Joseph-Nicolas Delisle (1688--1768), each invented their own scales. Réaumur’s set the freezing point of water at 0 degrees and the boiling point of water at 80 degrees, convenient for meteorological use, while Delisle chose to set his scale “backwards,” with water’s boiling point at 0 degrees and 150 degrees (added later by a colleague) as water’s freezing point.

A decade later, Swedish astronomer Anders Celsius (1701--1744) created in 1742 his eponymous scale, with water’s freezing and boiling points separated by 100 degrees—though, like Delisle, he also originally set them “backwards,” with the boiling point at 0 degrees and the ice point at 100. (The points were swapped after his death.) In 1745, Carolus Linnaeus (1707--1778) of Uppsala, Sweden, suggested that things would be simpler if we made the scale range from 0 (at the freezing point of water) to 100 (water’s boiling point), and called this scale the centigrade scale. (This scale was later abandoned in favor of the Celsius scale, which is technically different from centigrade in subtle ways that are not important here.) Notice that all of these scales are relative—they are based on the freezing point of water, which is an arbitrary (but highly practical) reference point. A temperature reading of x°C basically means “x degrees hotter than the temperature at which water freezes.”

Then, in the middle of the 19th century, the British physicist William Thomson (1824--1907), later Lord Kelvin, became interested in the idea of “infinite cold” and made attempts to calculate it. In 1848, he published a paper, On an Absolute Thermometric Scale, stating that this absolute zero was, in fact, -273 degrees Celsius. (It is now set at -273.15 degrees Celsius.)

Loudness:

Loudness is usually measured in decibels (abbreviated dB). To be more precise, decibels are used to measure the ratio of two power levels. If we have two power levels P₁ and P₂, then the difference in decibels between the two power levels is \[ 10\,\log_{10} \left( \frac{P_2}{P_1} \right) \ \mbox{dB} . \] So, if P₂ is about twice the level of P₁, then the difference is about 3 dB. Notice that this is a relative system, providing a precise way to measure the relative strength of two power levels, but not a way to assign a number to one power level.
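To make the arithmetic concrete, here is a minimal Python sketch of the decibel formula above (the helper name `db_difference` is our own, purely illustrative):

```python
import math

def db_difference(p1, p2):
    """Difference in decibels between power levels p1 and p2: 10*log10(p2/p1)."""
    return 10 * math.log10(p2 / p1)

# Doubling the power adds about 3 dB:
print(round(db_difference(1.0, 2.0), 2))   # 3.01
# A tenfold increase in power adds 10 dB:
print(round(db_difference(1.0, 10.0), 2))  # 10.0
```

Note that the formula takes only the ratio of the two powers, confirming that decibels measure relative, not absolute, levels.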

Note that humans perceive loudness based on the intensity (amplitude) of sound waves, but also taking into account frequency and duration. While the ear converts sound waves into electrical signals that the brain interprets, loudness is a subjective experience influenced by factors beyond simple sound pressure level. While decibels (dB) measure sound intensity objectively, perceived loudness is subjective and doesn't increase linearly with decibel increases.

The generally accepted just noticeable difference (JND) in sound intensity for the average human ear is approximately 1 dB. This means that, on average, a change of about 1 dB is the smallest difference in loudness that a typical listener can reliably detect. However, under ideal, quiet conditions with pure tones and focused attention, some people can detect differences as small as 0.5 dB. As rules of thumb: +3 dB is the smallest change in sound level that is clearly noticeable in typical settings like music or speech; +5 dB is a clearly audible change; +10 dB is perceived as roughly twice as loud; and +20 dB is perceived as roughly four times as loud.

These two examples of scalars (temperature and loudness) show that Mother Nature puts bounds (upper and lower) on their values in real life. Even if, in the future, people adopt different scales for measuring these quantities, the values will still be bounded. From a mathematical perspective, it is inconvenient to be restricted in calculations with these scalars or numbers. Therefore, mathematicians use their beloved trick (a crucial concept) discovered in the seventeenth century with the invention of calculus---infinity (or, equivalently, the limit). So the set of all real numbers, denoted by ℝ, is called a scalar set, and our two examples occupy only a part of ℝ, not the whole set of scalars (which must be unbounded by definition).    ■

End of Example 1

Vectors and Points

Points and vectors, while often related, represent different concepts. There is a special mathematical structure that treats them strictly differently---affine space. A point indicates a specific location in space, while a vector represents a displacement or direction with magnitude (length). Since vectors have no fixed position in space, they are sometimes called free vectors. A vector starting at the origin and having its tip at a particular point is called a coordinate vector. Geometrically, we draw points as dots and vectors as line segments with arrows.

Besides adding or subtracting vectors, we can also add a vector to a point to get another point. This gives us a way to describe points. We need a starting point (the origin). We can then represent any point in space by the vector that connects the origin to that point. In this sense, each point corresponds uniquely to a position vector anchored at the origin. This identification establishes a bijection (one-to-one and onto correspondence) between points and vectors. Once an origin and a frame are established, both vectors and points can be described as arrays of numbers.
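The point-plus-vector operation just described is easy to express in code. A minimal Python sketch (the name `translate` is our own, purely illustrative):

```python
def translate(point, vector):
    """Attach a displacement vector to a point, producing a new point."""
    return tuple(p + v for p, v in zip(point, vector))

origin = (0, 0)
p = translate(origin, (3, 4))    # the point whose position vector is [3, 4]
print(p)                         # (3, 4)
print(translate(p, (1, -2)))     # (4, 2)
```

Starting from the origin, each point is reached by exactly one displacement vector, which is the bijection mentioned above.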

The Cartesian product of two sets, A and B, is a new set formed by pairing every element of A with every element of B, resulting in ordered pairs. It's denoted as A × B = {(a, b) : a ∈ A, and b ∈ B}.

Note that set-builder notation separates the elements of a set from their defining property by either a colon or a vertical line. Remember that the order of the sets A and B in the Cartesian product A × B matters. For instance, if ℤ = {0, ±1, ±2, …} is paired with itself, the ordered pairs (1, 3) and (3, 1) represent distinct elements of ℤ × ℤ, but the sets {3, 1} and {1, 3} are equal. The arrays (2, 2) and (2, 2, 2) are not equal (they do not have the same length), although the sets {2, 2} and {2, 2, 2} both equal the set {2}.
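These distinctions (ordered pairs versus sets, tuples of different lengths) can be checked directly; a small Python illustration, using `itertools.product` for the Cartesian product:

```python
from itertools import product

A = {1, 3}
print(sorted(product(A, A)))   # [(1, 1), (1, 3), (3, 1), (3, 3)]

print((1, 3) == (3, 1))        # False: order matters for ordered pairs
print({1, 3} == {3, 1})        # True: sets are unordered
print((2, 2) == (2, 2, 2))     # False: tuples of different lengths differ
print({2, 2} == {2, 2, 2})     # True: both sets collapse to {2}
```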

This definition can be extended to an arbitrary finite number of sets, forming an n-fold Cartesian product. Although mathematicians define Cartesian products for an arbitrary family of sets, we do not use that generality in this tutorial.

Given any family (Aj)j ∈ J of sets, the Cartesian product ΠjAj of the family is the set of all functions f on the index set J with f(j) ∈ Aj for each j ∈ J.

An important special case is a power of an object 𝑋, where we take all 𝑋j to be 𝑋 and form \( \displaystyle \quad X^J = \prod_{j\in J} X . \quad \) When J is the finite set of integers, [1..n] = {1, 2, … , n}, the n-fold Cartesian product of set X is \( \displaystyle \quad X^{[1..n]} . \)

Given sets A₁, A₂, etc., the Cartesian product of the countably infinite family (A₁, A₂, … ) is written as \( \displaystyle \quad \prod_{j=1}^{\infty} A_j ; \quad \) its elements (𝑎₁, 𝑎₂, 𝑎₃, …) are called infinite sequences.

The Cartesian product of the empty family ( ) is a one-point set, whose only element is the empty list ( ).

In categories of algebras, products are constructed in the “obvious” manner; see, for example, the direct product of groups.

Given n sets A₁, A₂, …, An, the Cartesian product of the 𝑛-ary family (A₁, A₂, … , An) is written \( \displaystyle \prod_{j=1}^{n} A_j \); its elements (𝑎₁, 𝑎₂, … , 𝑎n) are called ordered n-tuples. When all the sets A₁, A₂, …, An equal the same set A, their Cartesian product is usually denoted A[1..n].

An n-tuple is an ordered list of n elements, typically enclosed in parentheses, representing a point in n-dimensional space. Strictly speaking, n-tuples are not vectors but points, because they are elements of the Cartesian product 𝔽[1..n], which as such carries no vector structure. Points cannot be added (think of pixels on your screen), but you can add (or attach) a vector to a point and obtain a new point. It will be shown shortly that the corresponding space of points can be equipped with addition and scalar multiplication that make it a vector space. In many textbooks, n-tuples are identified with vectors, keeping in mind the corresponding isomorphism between points and vectors.

   
Example 2: Let us consider the set of the first eight letters of the Latin alphabet, (a--h), and the eight-element set of integers starting with 1, [1..8] = {1, 2, … , 8}. The Cartesian product (a--h) × [1..8] consists of all ordered pairs of a letter and a digit, so there are 64 = 8 × 8 of them. Every element of this Cartesian product is uniquely identified by a letter and a digit; say, a1 denotes the pair (a, 1).

If you have ever played chess, you have some exposure to two dimensional (2D) Cartesian coordinate spaces. A chessboard is usually enumerated with a system called algebraic notation. Each square is identified by a letter (a-h) for the file (column) and a number (1-8) for the rank (row), from the perspective of the white player. So the white king initially stands on square e1. This creates a grid of 64 unique squares, like a coordinate system on the board.

It is also possible to use the reversed Cartesian product [1..8] × (a--h) to enumerate the squares of the chessboard. In that case the white king would initially stand on square 1e; however, such an enumeration is not used in practice.    ■

End of Example 2
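The 64 chessboard squares of Example 2 can be generated directly as a Cartesian product; a brief Python sketch (an illustration in a different language than the tutorial's Mathematica):

```python
from itertools import product

files = "abcdefgh"      # letters a-h label the files (columns)
ranks = range(1, 9)     # numbers 1-8 label the ranks (rows)

squares = [f"{f}{r}" for f, r in product(files, ranks)]
print(len(squares))     # 64
print("e1" in squares)  # True: the white king's starting square
```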

The following example demonstrates a practical application of a two-dimensional Cartesian product to planning the streets of a city.    

Example 3: City planners want to build a new district that will be very efficient for transportation. They choose to use two kinds of roads, one kind called "Avenue" and the other "Street." The numbers of these roads and their names will be determined later, subject to construction and budget constraints. Suppose they decide to build m streets and n avenues. First, they form two one-dimensional arrays, Avenue[1..n] and Street[1..m], of sizes n and m, respectively. For example, \[ \mbox{Street}^{[1..m]} = \left\{ \mbox{st}_1, \mbox{st}_2 , \ldots , \mbox{st}_m \right\} . \] The planners decided to run avenues horizontally (east-west) and streets vertically (north-south) with respect to the main city frame. Finally, they defined the Cartesian product of these one-dimensional arrays: Avenue[1..n] × Street[1..m]. The result of this two-dimensional Cartesian product, plotted with the aid of Mathematica, is shown below.
hor0 = Graphics[{Red, Thickness[0.016], Line[{{-3, 0}, {3, 0}}]}];
ver0 = Graphics[{Red, Thickness[0.016], Line[{{0, -2.5}, {0, 2.5}}]}];
verm1 = Graphics[{Blue, Thickness[0.01],
   Line[{{-3, -2.5}, {-3, 2.5}, {-2.5, 2.5}, {-2.5, -2.5}, {-2, -2.5}, {-2, 2.5},
     {-1.5, 2.5}, {-1.5, -2.5}, {-1, -2.5}, {-1, 2.5}, {-0.5, 2.5}, {-0.5, -2.5}}]}];
verp1 = Graphics[{Blue, Thickness[0.01],
   Line[{{3, -2.5}, {3, 2.5}, {2.5, 2.5}, {2.5, -2.5}, {2, -2.5}, {2, 2.5},
     {1.5, 2.5}, {1.5, -2.5}, {1, -2.5}, {1, 2.5}, {0.5, 2.5}, {0.5, -2.5}}]}];
horp1 = Graphics[{Blue, Thickness[0.01],
   Line[{{-3, 2.5}, {3, 2.5}, {3, 2}, {-3, 2}, {-3, 1.5}, {3, 1.5}, {3, 1},
     {-3, 1}, {-3, 0.5}, {3, 0.5}}]}];
horm1 = Graphics[{Blue, Thickness[0.01],
   Line[{{-3, -2.5}, {3, -2.5}, {3, -2}, {-3, -2}, {-3, -1.5}, {3, -1.5}, {3, -1},
     {-3, -1}, {-3, -0.5}, {3, -0.5}}]}];
Show[hor0, ver0, horp1, verm1, verp1, horm1]
Figure 3.1: Map of the hypothetical city of Cartesia

Let’s imagine a fictional city named Cartesia. Inspired by Manhattan, a borough of New York City, the Cartesia city planners were laying out the streets and avenues in a very particular way, as illustrated in the map of Cartesia in Figure 3.1. As you can see from the map, Center Avenue runs east-west through the middle of town. All other east-west avenues (parallel to Center Avenue) are named based on whether they are north or south of Center Avenue, and how far they are from Center Avenue. Examples of avenues that run east-west are North 3rd and South 15th Avenue.

The other roads in Cartesia, the streets, run north-south. Division Street runs north-south through the middle of town. All other north-south streets (parallel to Division Street) are named based on whether they are east or west of Division Street, and how far they are from Division Street.

Of course, the map of Cartesia is an idealization of the rectangular plane---the Cartesian plane is unbounded, and its coordinate lines have no width and pass through every point. Nevertheless, this map shows coordinate lines parallel to the abscissa (avenues) and to the ordinate (streets).

   ■
End of Example 3

As you know, a point has a location but no real size or thickness. In order to identify the position of a point, we need to establish a global frame relative to which we specify that location. However, an “absolute” position does not exist. For instance, any position on the earth can be specified by its latitude, longitude, and height above sea level.

Every attempt to describe a position requires that we describe it relative to something else. Any description of a position is meaningful only in the context of some (typically “larger”) reference frame. Theoretically, we could establish a reference frame encompassing everything in existence and select a point to be the “origin” of this space, thus defining the “absolute” coordinate space. Luckily for us, absolute positions in the universe aren’t important. Do you know your precise position in the universe right now?

In many cases, displacements are from the origin, and so there will be no distinction between points and vectors because they both share the same coordinates. However, we often deal with quantities that are not relative to the origin, or any other point for that matter. In these cases, it is important to visualize these quantities as an arrow rather than a point.

arx = Graphics[{Black, Thickness[0.01], Arrowheads[0.1], Arrow[{{-0.2, 0}, {1, 0}}]}];
ary = Graphics[{Black, Thickness[0.01], Arrowheads[0.1], Arrow[{{0, -0.1}, {0, 1}}]}];
point = Graphics[{Purple, Disk[{0.8, 0.7}, 0.02]}];
txt = Graphics[{Black, Text[Style["P(x, y)", FontSize -> 18, Bold], {0.86, 0.78}],
   Text[Style["x-axis", FontSize -> 18, Bold], {1.0, 0.1}],
   Text[Style["x", FontSize -> 18, Bold], {0.8, -0.1}],
   Text[Style["y", FontSize -> 18, Bold], {-0.1, 0.7}],
   Text[Style["y-axis", FontSize -> 18, Bold], {0.0, 1.04}]}];
line = Graphics[{Black, Dashed, Thick, Line[{{0, 0.7}, {0.8, 0.7}, {0.8, 0}}]}];
Show[line, point, txt, arx, ary]
Figure 1: Location of a point

We consider points as elements of the Cartesian product 𝔽[1..n] of n scalar fields. Then every point P from ℝ[1..n] has n coordinates that specify its position:

\begin{equation} \label{EqVector.1} P = \left( x_1 , x_2 , \ldots , x_n \right) \in \mathbb{R}^{[1..n]} . \end{equation}

Since positions are relative to some larger frame, points are relative as well— they are relative to the origin of the coordinate system used to specify their coordinates. This leads us to the relationship between points and vectors. The following figure illustrates how the point (x, y) is related to the vector [x, y], given arbitrary values for x and y.

arx = Graphics[{Black, Thickness[0.01], Arrowheads[0.1], Arrow[{{-0.2, 0}, {1, 0}}]}];
ary = Graphics[{Black, Thickness[0.01], Arrowheads[0.1], Arrow[{{0, -0.1}, {0, 1}}]}];
ar = Graphics[{Blue, Thickness[0.01], Arrowheads[0.1], Arrow[{{0, 0}, {0.8, 0.7}}]}];
point = Graphics[{LightGray, Disk[{0.81, 0.71}, 0.02]}];
txt = Graphics[{Black, Text[Style["Point (x, y)", FontSize -> 18, Bold], {0.86, 0.78}],
   Text[Style["x-axis", FontSize -> 18, Bold], {1.0, 0.1}],
   Text[Style["vector [x, y]", FontSize -> 18, Bold], {0.7, 0.4}],
   Text[Style["y-axis", FontSize -> 18, Bold], {0.0, 1.04}]}];
Show[point, ar, arx, ary, txt]
Figure 2: Point vs vector

Be aware that Figure 2 is not 100% accurate, because a vector space by itself has no way to identify which vector is perpendicular to another, even though we plotted the coordinate axes at a right angle. To define orthogonality of vectors, one needs to impose additional structure on the vector space (the resulting space is called a Euclidean space; see Chapter 5). The following figure shows the relationship between points and vectors in an oblique coordinate system.

arx = Graphics[{Black, Thickness[0.01], Arrowheads[0.1], Arrow[{{-0.2, 0}, {1.5, 0}}]}];
ary = Graphics[{Black, Thickness[0.01], Arrowheads[0.1], Arrow[{{0, -0.1}, {0.5, 1}}]}];
ar = Graphics[{Blue, Thickness[0.01], Arrowheads[0.1], Arrow[{{0, 0}, {1.4, 0.7}}]}];
point = Graphics[{Purple, Disk[{1.4, 0.71}, 0.02]}];
line1 = Graphics[{Dashed, Thick, Line[{{0.4, 0.7}, {1.4, 0.7}}]}];
line2 = Graphics[{Dashed, Thick, Line[{{1.0, 0.0}, {1.4, 0.7}}]}];
txt = Graphics[{Black, Text[Style["Point (x, y)", FontSize -> 18, Bold], {1.46, 0.85}],
   Text[Style["x-axis", FontSize -> 18, Bold], {1.5, 0.1}],
   Text[Style["vector [x, y]", FontSize -> 18, Bold], {0.7, 0.4}],
   Text[Style["y-axis", FontSize -> 18, Bold], {0.5, 1.04}]}];
Show[point, ar, arx, ary, txt, line1, line2]
Figure 3: Point vs vector in oblique coordinate system

As you can see, we use lower-case bold font to denote vectors. In contrast to vectors, we denote points by upper-case letters in italic font. Since every factor ℝ in ℝ[1..n] is a real line containing all real numbers in their natural order, every element of the Cartesian product is uniquely identified by a list of n numbers, which we call a point. Vectors are used to describe displacements, and therefore they can describe relative positions. Points are used to specify positions.

Every point on the plane has two coordinates P(x, y) relative to the origin of the coordinate system. This point can also be identified by a vector pointing to it and starting from the origin. This means that the vector can be uniquely identified by the same pair, v = [x, y]. This establishes a one-to-one correspondence between points and vectors.

Vectors as elements of 𝔽n

The Cartesian product deals with sets. However, we would like to extend it to fields rather than mere sets. Since fields possess an algebraic structure---defined by two operations, addition and multiplication---we would like to extend these operations to elements of Cartesian products. The following construction demonstrates how the Cartesian product 𝔽[1..n] of n copies of a scalar field can be turned into a vector space by introducing arithmetic operations inherited from the scalar field.

A direct product, denoted 𝔽n = {(x₁, x₂, … , xn) : xj ∈ 𝔽 for j = 1, 2, … , n}, of n copies of the field 𝔽 is the Cartesian product of these fields, equipped with two operations: addition of arbitrary vectors x = (x₁, x₂, … , xn) and y = (y₁, y₂, … , yn), \[ \left( x_1 , \ldots , x_n \right) + \left( y_1 , \ldots , y_n \right) = \left( x_1 + y_1 , \ldots , x_n + y_n \right) \] and scalar multiplication \[ \lambda \left( x_1 , \ldots , x_n \right) = \left( \lambda\,x_1 , \ldots , \lambda\, x_n \right) , \quad \lambda \in \mathbb{F} . \] For (x₁, x₂, … , xn) ∈ 𝔽n and j ∈ [1..n], we say that xj is the j-th coordinate of (x₁, x₂, … , xn).
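Both operations in the definition above work componentwise; a minimal Python sketch mirroring the two formulas (the helper names are our own):

```python
def vec_add(x, y):
    """Componentwise addition of two vectors of the same size."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same size")
    return tuple(a + b for a, b in zip(x, y))

def scalar_mul(lam, x):
    """Multiply every component of x by the scalar lam."""
    return tuple(lam * a for a in x)

print(vec_add((1, 2, 3), (4, 5, 6)))   # (5, 7, 9)
print(scalar_mul(2, (1, 2, 3)))        # (2, 4, 6)
```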

It was G.W. Leibniz (1646--1716) who first called them "coordinates." Recall that [1..n] = {1, 2, … , n} is the set of the first n natural numbers.

An element of 𝔽n is called a vector; it is an ordered array of numbers x = (x₁, x₂, … , xn), known as an n-tuple. The number of components (n in 𝔽n) is called its size. Two vectors of the same size are equal if all their corresponding components are equal. The vector with all components zero, (0, 0, … , 0), is called the zero vector and is denoted by 0.

It was the Welsh physician and mathematician Robert Recorde (1510--1558) who invented the equal sign (=) circa 1557 and who also introduced the pre-existing plus (+) and minus (−) signs to English speakers.

When we wish to refer to the individual components of a vector, we use subscript notation. In the mathematical literature, integer indices are used to access the elements of an n-tuple. As follows from the definition above, vectors can be added, subtracted, and multiplied by a scalar.

   
Example 4: Computational linguistics has dramatically changed the way researchers study, understand, and translate languages. The ability to number-crunch huge amounts of words for the first time has led to entirely new ways of thinking about words and their relationship to one another. This number-crunching shows exactly how often a word appears close to other words, an important factor in how words are used. (Statistical redundancy of this kind is also what Phillip Walter Katz (1962--2000) exploited in his ZIP compression program.) So the word "tennis" might appear close to words like running, jumping, and throwing, but less often next to words like computation or integral. This set of relationships can be thought of as a multidimensional vector that describes how the word tennis is used within a language, which itself can be thought of as a vector space.

Computational technologies allow languages to be treated like vector spaces with precise mathematical properties. Let us consider four words as vectors of the same size: \[ {\bf King}, \quad {\bf Queen}, \quad {\bf man}, \quad {\bf woman}. \] We can write a linear equation that makes sense: \[ {\bf King} - {\bf man} + {\bf woman} = {\bf Queen} . \] You can interchange the order of words in the left-hand side as, for instance, \[ {\bf King} + {\bf woman} - {\bf man} = {\bf Queen} . \]
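Real word embeddings have hundreds of components, but the arithmetic can be illustrated with invented two-component vectors. Everything in this Python sketch is toy data made up for illustration: the two "dimensions" are imagined to encode royalty and femininity.

```python
# Toy 2-component "embeddings": (royalty, femininity). Values are invented.
king  = (1.0, 0.0)
man   = (0.0, 0.0)
woman = (0.0, 1.0)
queen = (1.0, 1.0)

result = tuple(k - m + w for k, m, w in zip(king, man, woman))
print(result)           # (1.0, 1.0)
print(result == queen)  # True for this hand-picked toy data
```

With real embeddings, King − man + woman lands merely near Queen, not exactly on it.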

People generally know and use thousands of words; on average, working males use 2,000--3,000 words, and females 10,000--20,000.

   ■
End of Example 4

 

Matrix forms of vectors


Although we will study matrices in the next section, and Chapter 2 is entirely dedicated to this topic, it is convenient to recognize the interconnection between matrices and vectors right now. In mathematics, a matrix is a rectangular array of numbers (or other mathematical objects), arranged in rows and columns. We denote by 𝔽m×n or 𝔽m,n the set of all m-by-n matrices with entries from the field 𝔽.

Computers store matrices as a single, one-dimensional array of numbers, along with metadata that records the matrix's dimensions (number of rows and columns). This structure is computationally efficient for storing matrices and for accessing their elements by row and column indices. Different languages and libraries may use row-major or column-major ordering to map the 2D matrix onto the 1D array. Therefore, computers treat matrices as one-dimensional arrays (actually vectors) accompanied by auxiliary information containing two numbers: m, the number of rows, and n, the number of columns.
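For row-major ordering, the element in row i and column j of an m-by-n matrix lives at flat index i·n + j (0-based). A short Python sketch of the idea (the helper `get` is our own; this illustrates the general mapping, not any particular solver's internals):

```python
# A 2x3 matrix stored as a flat, row-major array plus its dimensions.
rows, cols = 2, 3
flat = [1, 2, 3,
        4, 5, 6]

def get(i, j):
    """Element in row i, column j (0-based) of the row-major flat array."""
    return flat[i * cols + j]

print(get(0, 2))   # 3
print(get(1, 0))   # 4
```

Column-major ordering would instead use index j·m + i, which is why transposing the interpretation of the same flat array is essentially free.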

The dimension of a vector tells us how many numbers the vector contains; it is n when the vector is taken from 𝔽n. Vectors may be of any positive dimension, including one; in fact, a scalar can be considered a one-dimensional (1D for short) vector. When writing a vector from 𝔽n, mathematicians list the numbers surrounded by parentheses:

\[ \mathbb{F}^n = \left\{ {\bf x} = \left( x_1 , x_2 , \ldots , x_n \right) \ : \ x_i \in \mathbb{F}, \quad i=1, 2, \ldots , n \right\} . \]
It is also very convenient to represent an arbitrary n-tuple (or vector) as a matrix, because matrices have a much richer library of operations than vectors. There are three kinds of matrices that adequately represent vectors. The set of all row vectors of size (or dimension) n is denoted
\[ \mathbb{F}^{1\times n} = \left\{ {\bf x} = \left[ x_1 , x_2 , \ldots , x_n \right] \ : \ x_i \in \mathbb{F}, \quad i=1, 2, \ldots , n \right\} . \]
Although n-tuples and row vectors look very similar to the human eye, computer languages have their own opinion on this matter (see the next example for Mathematica applications). Another very important class of matrices for vector representation consists of column vectors
\[ \mathbb{F}^{n\times 1} = \left\{ {\bf x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} : \ x_i \in \mathbb{F}, \quad i=1, 2, \ldots , n \right\} . \]
There is no universal notation for column vectors in the mathematical literature---some authors use brackets, others prefer parentheses, and some use both notations. This tutorial supports the latter approach---we will explain in Chapter 3 when column vectors in parentheses are preferable to brackets.

Finally, we consider diagonal matrices

\[ \begin{bmatrix} x_1 & 0 & 0 & \cdots & 0 \\ 0& x_2 & 0 & \cdots & 0 \\ 0&0&x_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0&0&0& \cdots & x_n \end{bmatrix} = \begin{pmatrix} x_1 & 0 & 0 & \cdots & 0 \\ 0& x_2 & 0 & \cdots & 0 \\ 0&0&x_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0&0&0& \cdots & x_n \end{pmatrix} . \]
Again, matrices can be enclosed in parentheses or brackets.     All computer solvers distinguish rows from columns. Some of them treat n-tuples or points as n-dimensional row vectors. For example, a 3-tuple (1, 2, 3) can be represented as a column vector
\[ {\bf a} = \begin{bmatrix} 1\\ 2 \\ 3 \end{bmatrix} = \begin{pmatrix} 1\\ 2\\ 3 \end{pmatrix} , \qquad \begin{split} a_1 &= a_x = 1, \\ a_2 &= a_y = 2, \\ a_3 &= a_z = 3 \end{split} \]
or as a diagonal matrix
\[ \begin{bmatrix} 1&0&0 \\ 0&2&0 \\ 0&0&3 \end{bmatrix} . \]
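The same representations can be reproduced outside Mathematica. Below is a minimal sketch in Python with NumPy (an illustration only, not part of the tutorial's notebook), where the array shapes make the distinctions between tuple, row, column, and diagonal forms explicit:

```python
import numpy as np

t = np.array([1, 2, 3])    # the 3-tuple (1, 2, 3): shape (3,)
row = t.reshape(1, 3)      # row-vector representation: a 1x3 matrix
col = t.reshape(3, 1)      # column-vector representation: a 3x1 matrix
diag = np.diag(t)          # diagonal-matrix representation: a 3x3 matrix

print(t.shape, row.shape, col.shape, diag.shape)
# (3,) (1, 3) (3, 1) (3, 3)
```

All four objects hold the same three numbers; only the shape (the matrix structure) differs.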
   
Example 5: Mathematica distinguishes n-tuples (which it identifies with vectors) from row and column vectors. It has a dedicated command, VectorQ, for checking whether an object is such a vector (a flat list). For instance, let us consider the 3-tuple (1, 2, 3).
a = {1, 2, 3}; b = {{1, 2, 3}}; c = {{1}, {2}, {3}};
a == b
False
VectorQ[a]
True
VectorQ[b]
False
VectorQ[c]
False
SameQ[a, c]
False
Now we visualize these vectors
a // MatrixForm
\( \displaystyle \quad \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \)
b // MatrixForm
\( \displaystyle \quad \begin{pmatrix} 1 & 2 & 3 \end{pmatrix} \)
c // MatrixForm
\( \displaystyle \quad \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \)
Although Mathematica displays the vectors a (3-tuple) and c (column vector) on the screen similarly, internally it treats them as different objects. Finally, we define a diagonal matrix
DiagonalMatrix[{1, 2, 3}] // MatrixForm
\( \displaystyle \quad \begin{pmatrix} 1&0&0 \\ 0&2&0 \\ 0&0&3 \end{pmatrix} \)
   ■
End of Example 5

Lists vs arrays vs vectors

Information technologies affect many traditional branches of science, including mathematics. Programming languages and their applications constantly lead to updates of familiar terms, adding some properties as well as introducing new ones. We now observe that computer-science terminology penetrates mathematical language, especially discrete mathematics and numerical analysis. Moreover, existing computational solvers dictate standardization of terminology in a new way, making some notations and terms obsolete. For example, more and more mathematicians follow the computer pioneer and coiner of terms Donald Knuth and replace the Pochhammer symbol with the rising factorial.

In many textbooks on Linear Algebra, you may find terms such as "list" or "array" as synonyms for the familiar word "vector" because there are no universally accepted definitions of these terms. Be aware that these terms are used in different computer languages, where they may have different meanings. An array, in the typical definition, has a fixed size and a fixed element datatype.

The term vector appears in a variety of mathematical and engineering contexts, which we will discuss in Part 3 (Vector Spaces). There is no universal notation for vectors because of the diversity of their applications. Until then, vector will mean an ordered set (list) of numbers. Generally speaking, the concept of a vector may include infinite sequences of numbers or other objects. However, in this part of the tutorial, we consider only finite lists.

In the context of algorithm analysis, a list is a fundamental data structure that represents a collection of elements in a specific order. It is a linear collection where each element is stored at a position, and these positions are arranged sequentially. Lists can be implemented using various techniques, such as arrays or linked structures.

Arrays and lists are data structures used to store data in a specific order, such as a list of student names or a sequence of numbers. Lists are mutable, meaning that they can be changed after they are created. Arrays are more memory-efficient than lists for storing large collections of data of the same type. Arrays are also mutable, so you can modify their content after creation; however, the type of the elements they store remains consistent.

While arrays and vectors are mutable, their size is not as dynamically adjustable as that of lists. You can still append, extend, or remove elements from arrays and vectors, but doing so produces a new array or vector of a different size. Hence, a list is appropriate for storing students' names because lists can grow or shrink in size dynamically. You can concatenate it with any number of students who want to join your course. Similarly, when students drop the course, you just delete their names without rebuilding the structure.

An array is always of a fixed size; it does not grow as more elements are required. The programmer must ensure that only valid values in the array are accessed, and must remember the location in the array of each value. Arrays are basic types in most programming languages and have a special syntax for their use.

Vectors are much like arrays, but the data in their entries can be of distinct types. Unlike static arrays, which are always of a fixed size, vectors can be grown. This can be done either explicitly or by adding more data, which produces a new vector.

In most functional programming languages, the word list always refers to a linked list built out of pairs, where each pair contains one list element and a pointer to the next pair. Whereas a list is built out of interlinked but still separate objects, a vector is a single object.    
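The contrast drawn above can be made concrete. Here is a minimal Python sketch (illustrative only; the student names are made up), with the built-in list and the standard-library array module standing in for the generic "list" and "array" of the discussion:

```python
from array import array

# A list grows and shrinks dynamically and may hold mixed types.
students = ["Alice", "Bob"]
students.append("Carol")          # a new student joins the course
students.remove("Bob")            # a student drops the course

# A typed array is mutable in content but fixed in element type.
a = array('d', [1.0, 2.0, 3.0])   # 'd' = C double
a[0] = 7.5                        # allowed: same type
try:
    a[1] = "text"                 # rejected: the element type is fixed
except TypeError:
    print("arrays keep a consistent element type")
```

This also illustrates why arrays are more memory-efficient: every element of `a` occupies the same fixed-size slot, while a list stores references to separate objects.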

Example 6: Mathematica has List command to store data:
List[a, b, c, d]

Recall that lists can be made of any type of element. A list element can also be a list. For example, {2, {3, 5}, 17} is a valid list in which the list {3, 5} is the element at position 2. When a list is an element inside a larger list, it is called a nested list.

Mathematica also has a special command Array, which builds a list by applying a function to successive integer indices:

Array[a, 4]
{a[1], a[2], a[3], a[4]}
Mathematica uses the Vectors command to represent the domain of vectors of a particular dimension. Vectors themselves are usually defined by enclosing data in curly brackets (see the previous example).
Vectors[d]
   ■
End of Example 6

Vector Properties

For dimensions greater than three, we can no longer rely on visual representations of vectors. Therefore, it's essential to develop the ability to manipulate and calculate vectors algebraically, much like we do with real numbers. In many cases, vector operations resemble those of scalar arithmetic, and familiar algebraic rules apply. However, as we progress, we will encounter situations where vector algebra behaves quite differently from our experience with real numbers. For this reason, it's crucial to verify any algebraic properties before applying them.

The following theorem summarizes the main properties of vectors from the direct product 𝔽n for any positive integer n. Later in part 3 of this tutorial, you will see that these properties are valid for arbitrary vector spaces. The word "theorem" is derived from the Greek word "theorema," which in turn comes from a word meaning "to look at."

Theorem 1: Let u, v, and w be vectors in 𝔽n and let α and β be scalars from field 𝔽. Then
  1. u + v = v + u,
  2. (u + v) + w = u + (v + w),
  3. u + 0 = u,
  4. u + (−u) = 0,
  5. α (u + v) = α u + α v,
  6. (α + β) u = α u + β u,
  7. α (βu) = (αβ) u,
  8. 1u = u.
Let u = (u₁, u₂, … , un), v = (v₁, v₂, … , vn), and w = (w₁, w₂, … , wn).
  1. \begin{align*} \mathbf{u} + \mathbf{v} &= \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix} + \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix} \\ &= \begin{pmatrix} u_1 + v_1 & u_2 + v_2 & \cdots & u_n + v_n \end{pmatrix} \\ &= \begin{pmatrix} v_1 + u_1 & v_2 + u_2 & \cdots & v_n + u_n \end{pmatrix} \\ &= \mathbf{v} + \mathbf{u} . \end{align*}
  2. \begin{align*} \left( \mathbf{u} + \mathbf{v} \right) + \mathbf{w} &= \left( \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix} + \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix} \right) \\ & \quad + \begin{pmatrix} w_1 & \cdots & w_n \end{pmatrix} \\ &= \begin{pmatrix} u_1 + v_1 & u_2 + v_2 & \cdots & u_n + v_n \end{pmatrix} \\ & \quad + \begin{pmatrix} w_1 & \cdots & w_n \end{pmatrix} \\ &= \begin{pmatrix} u_1 + v_1 + w_1 & u_2 + v_2 + w_2 & \cdots & u_n + v_n + w_n \end{pmatrix} \\ &= \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix} \\ & \quad + \begin{pmatrix} v_1 + w_1 & v_2 + w_2 & \cdots & v_n + w_n \end{pmatrix} \\ &= \mathbf{u} + \left( \mathbf{v} + \mathbf{w} \right) . \end{align*}
  3. \begin{align*} \mathbf{u} + \mathbf{0} &= \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix} + \begin{pmatrix} 0&0& \cdots & 0 \end{pmatrix} \\ &= \begin{pmatrix} u_1 + 0 & u_2 + 0 & \cdots & u_n + 0 \end{pmatrix} \\ &= \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix} \\ &= \mathbf{u} . \end{align*}
  4. \begin{align*} \mathbf{u} + \left( - \mathbf{u} \right) &= \begin{pmatrix}u_1 & u_2 & \cdots & u_n \end{pmatrix} \\ &\quad - \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix} \\ &= \begin{pmatrix}u_1 & u_2 & \cdots & u_n \end{pmatrix} \\ &\quad + \begin{pmatrix} -u_1 & -u_2 & \cdots & -u_n \end{pmatrix} \\ &= \begin{pmatrix}u_1 - u_1 & u_2 - u_2 & \cdots & u_n - u_n \end{pmatrix} \\ &= \begin{pmatrix} 0&0& \cdots & 0 \end{pmatrix} = \mathbf{0} \end{align*}
  5. \begin{align*} \alpha \left( \mathbf{u} + \mathbf{v} \right) &= \alpha \left( \begin{pmatrix} u_1 & \cdots & u_n \end{pmatrix} + \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix} \right) \\ &= \begin{pmatrix} \alpha\, u_1 & \cdots & \alpha u_n \end{pmatrix} + \begin{pmatrix} \alpha v_1 & \cdots & \alpha v_n \end{pmatrix} \\ &= \alpha\,\mathbf{u} + \alpha\,\mathbf{v} . \end{align*}
  6. \begin{align*} \left( \alpha + \beta \right) \mathbf{u} &= \left( \alpha + \beta \right) \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix} \\ &= \begin{pmatrix} \left( \alpha + \beta \right) u_1 & \left( \alpha + \beta \right) u_2 & \cdots & \left( \alpha + \beta \right) u_n \end{pmatrix} \\ &= \begin{pmatrix} \alpha u_1 + \beta u_1 & \alpha u_2 + \beta u_2 & \cdots & \alpha u_n + \beta u_n \end{pmatrix} \\ &= \begin{pmatrix} \alpha u_1 & \alpha u_2 & \cdots & \alpha u_n \end{pmatrix} \\ &\quad + \begin{pmatrix} \beta u_1 & \beta u_2 & \cdots & \beta u_n \end{pmatrix} \\ &= \alpha\,\mathbf{u} + \beta\,\mathbf{u} . \end{align*}
  7. \begin{align*} \alpha \left( \beta \mathbf{u} \right) &= \alpha \left( \beta \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix} \right) \\ &= \alpha \left( \begin{pmatrix} \beta\, u_1 & \beta\, u_2 & \cdots & \beta\, u_n \end{pmatrix} \right) \\ &= \begin{pmatrix} \alpha\beta\, u_1 & \alpha\beta\, u_2 & \cdots & \alpha\beta\, u_n \end{pmatrix} \\ &= \left( \alpha\beta \right) \mathbf{u} . \end{align*}
  8. \begin{align*} 1 \cdot \mathbf{u} &= 1 \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix} \\ &= \begin{pmatrix} 1\,u_1 & 1\,u_2 & \cdots & 1\,u_n \end{pmatrix} = \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix} \\ &= \mathbf{u} . \end{align*}
    Remarks:   
  • Properties (c) and (d) together with the commutativity property (a) imply that 0 + u = u and −u + u = 0 as well.
  • By property (b), we may unambiguously write u + v + w without parentheses, since we may group the summands in whichever way you please.
  • If we read the distributivity properties (e) and (f) from right to left, they say that we can factor a common scalar or a common vector from a sum.
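All eight identities of Theorem 1 can also be spot-checked numerically. A minimal sketch in Python/NumPy (illustrative only; random vectors stand in for u, v, w, and floating-point comparison requires a tolerance, hence allclose):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
u, v, w = (rng.standard_normal(n) for _ in range(3))
alpha, beta = 2.5, -1.25
zero = np.zeros(n)

assert np.allclose(u + v, v + u)                          # property 1 (a)
assert np.allclose((u + v) + w, u + (v + w))              # property 2 (b)
assert np.allclose(u + zero, u)                           # property 3 (c)
assert np.allclose(u + (-u), zero)                        # property 4 (d)
assert np.allclose(alpha * (u + v), alpha*u + alpha*v)    # property 5 (e)
assert np.allclose((alpha + beta) * u, alpha*u + beta*u)  # property 6 (f)
assert np.allclose(alpha * (beta * u), (alpha*beta) * u)  # property 7 (g)
assert np.allclose(1 * u, u)                              # property 8 (h)
print("all eight properties hold")
```

Of course, such a check is not a proof; the componentwise computations above are.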
   
Example 7:
  1. We generate two random vectors of size four:
         u = RandomReal[1, 4]
         {0.71096, 0.806601, 0.115963, 0.801105}
         v = RandomReal[1, 4]
         {0.753365, 0.792732, 0.635351, 0.887251}
    \begin{align*} \mathbf{u} &= \left( 0.71096, 0.806601, 0.115963, 0.801105 \right) , \\ \mathbf{v} &= \left( 0.753365, 0.792732, 0.635351, 0.887251 \right) . \end{align*} Commutative property of addition is verified with Mathematica
         u+v == v+u
         True
    \[ \mathbf{u} + \mathbf{v} = \left( 1.46432, 1.59933, 0.751314, 1.68836 \right) = \mathbf{v} + \mathbf{u} . \]
  2. We consider three column vectors \[ \mathbf{u} = \begin{pmatrix} 2 \\ -3 \\ 4 \end{pmatrix} , \quad \mathbf{v} = \begin{pmatrix} -4 \\ 5 \\ 6 \end{pmatrix} , \quad \mathbf{w} = \begin{pmatrix} 7 \\ 8 \\ -9 \end{pmatrix} . \] The sums of two of them are \[ \mathbf{u} + \mathbf{v} = \begin{pmatrix} -2 \\ 2 \\ 10 \end{pmatrix} , \quad \mathbf{v} + \mathbf{w} = \begin{pmatrix} 3 \\ 13 \\ -3 \end{pmatrix} . \]
         u = {2, -3, 4}; v = {-4, 5, 6}; u + v
         {-2, 2, 10}
         w = {7, 8, -9}; v + w
         {3, 13, -3}
    The sum of three vectors is
         (u+v) +w
         {5, 10, 1}
    We check property (b) with Mathematica
         (u+v) +w == u + (v+w)
         True
  3. Let u = (2.718281828, 3.141592654, -1.618033989) ∈ ℝ³, and 0 = (0, 0, 0) be the zero vector. Then their sum is just u, as Mathematica confirms
         u = {2.718281828, 3.141592654, -1.618033989}; null = {0,0,0}; u + null == u
         True
  4. Let u = (3.1, 3.14, 3.141, 3.1415) be a vector of size 4, which we write in matrix form \[ \mathbf{A} = \begin{bmatrix} 3.1 &0&0&0 \\ 0&3.14 &0&0 \\ 0&0& 3.141 &0 \\ 0&0&0&3.1415 \end{bmatrix} . \] Its negation is \[ - \mathbf{A} = \begin{bmatrix} -3.1 &0&0&0 \\ 0&-3.14 &0&0 \\ 0&0& -3.141 &0 \\ 0&0&0&-3.1415 \end{bmatrix} . \] No doubt, adding these two matrices produces the zero matrix.
         A = {{3.1 , 0,0,0}, {0, 3.14, 0,0}, {0,0,3.141 , 0}, {0,0,0,3.1415}}; A + (-A)
        \( \displaystyle \quad \begin{pmatrix} 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{pmatrix} \)
  5. Let u = (3, 7, −2), v = (−1, 5, 4) be vectors from ℝ³ and α = 3.5. Then \[ \mathbf{u} + \mathbf{v} = \begin{pmatrix} 2& 12& 2 \end{pmatrix} \]
         u = {3, 7, -2}; v = {-1, 5,4}; alpha = 3.5; u + v
        {2, 12, 2}
    Multiplying by α, we get \[ \alpha \left( \mathbf{u} + \mathbf{v} \right) = \begin{pmatrix} 7.& 42.& 7. \end{pmatrix} \]
         alpha*(u+v)
         {7., 42., 7.}
     Using Mathematica, we verify identity (e):
         alpha*(u + v) == alpha*u + alpha*v
        True
  6. We consider a row vector u = [3, −8, 7, −2] from ℝ1×4 and two scalars α = 2.4 and β = −7.7. Then \[ \alpha + \beta = 2.4 - 7.7 = -5.3 \]
         alpha = 2.4; beta = -7.7; alpha + beta
         -5.3
    and \[ \left( \alpha + \beta \right) \mathbf{u} = \begin{bmatrix} -15.9& 42.4& -37.1& 10.6 \end{bmatrix} . \]
         u = {{3, -8, 7, -2}}; (alpha + beta)*u
         {{-15.9, 42.4, -37.1, 10.6}}
    Now we check identity (f):
         (alpha + beta)*u == alpha*u + beta*u
         True
  7. Let α = −1.87, β = 4.52 be real numbers, and u = (3.1, -1.17, 5.46)T be a column vector from ℝ3×1. Then the product of this vector and the constant β is \[ \beta \,\mathbf{u} = 4.52 \begin{pmatrix} 3.1 \\ -1.17 \\ 5.46 \end{pmatrix} = \begin{pmatrix} 14.012 \\ -5.2884 \\ 24.6792 \end{pmatrix} . \]
         beta = 4.52; u = {{3.1}, {-1.17}, {5.46}}; beta*u
         {{14.012}, {-5.2884}, {24.6792}}
    Next multiplication by α yields \[ \alpha \left( \beta\,\mathbf{u} \right) = \begin{pmatrix} -26.2024 \\ 9.88931 \\ -46.1501 \end{pmatrix} . \]
         alpha = -1.87; alpha *(beta*u)
         {{-26.2024}, {9.88931}, {-46.1501}}
    Using Mathematica, we check the identity (g):
         alpha*(beta*u) == (alpha*beta)*u
         True
  8. Let us write vector u = (4, −7, 3) in matrix form \[ \mathbf{A} = \begin{pmatrix} 4& 0& 0 \\ 0& -7& 0 \\ 0& 0& 3 \end{pmatrix} . \]
         A = {{4, 0, 0}, {0, -7, 0}, {0, 0, 3}}
    Multiplying by 1, we get matrix 1·A, which is A again as Mathematica confirms
         1*A == A
         True
   ■
End of Example 7

For any given vector dimension, there is a special vector, known as the zero vector, that has zeroes in every position,

\[ {\bf 0} = \left[ 0, 0, \ldots , 0 \right] \quad\mbox{or} \quad {\bf 0} = \begin{bmatrix}0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} . \]
Note that the zero vector is denoted by 0 in either case, independently of whether it is a row vector or a column vector. Another important family of vectors is the unit vectors:
\[ {\bf e}_1 = \left[ 1, 0, \cdots , 0 \right] , \ {\bf e}_2 = \left[ 0, 1, \cdots , 0 \right] , \ldots , \ {\bf e}_n = \left[ 0, 0, \cdots , 1 \right] . \]

These unit vectors can be written in column form

\[ {\bf e}_1 = \begin{pmatrix} 1\\ 0\\ \vdots \\ 0 \end{pmatrix} , \quad {\bf e}_2 = \begin{pmatrix} 0\\ 1\\ \vdots \\ 0 \end{pmatrix} , \quad \cdots , \quad {\bf e}_n = \begin{pmatrix} 0\\ 0\\ \vdots \\ 1 \end{pmatrix} . \]
In the three-dimensional case (3D), these vectors are usually denoted by
\[ {\bf i} = \begin{bmatrix} 1\\ 0 \\ 0 \end{bmatrix}^{\mathrm T} = \begin{bmatrix} 1& 0& 0 \end{bmatrix} , \quad {\bf j} = \begin{bmatrix} 0\\ 1 \\ 0 \end{bmatrix}^{\mathrm T} = \begin{bmatrix} 0& 1& 0 \end{bmatrix}, \quad {\bf k} = \begin{bmatrix} 0\\ 0 \\ 1 \end{bmatrix}^{\mathrm T} = \begin{bmatrix} 0& 0& 1 \end{bmatrix} , \]
where "T" stands for transposition. Note that in some areas of mathematics, this operation is denoted by a prime (') rather than by "T." With these unit vectors, every vector can be written as a linear combination of the unit vectors (independently of their form, rows or columns):
\[ {\bf v} = x_1 {\bf e}_1 + x_2 {\bf e}_2 + \cdots + x_n {\bf e}_n , \]
for some scalars x₁, x₂, … , xn.
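The expansion of a vector in unit vectors is easy to verify numerically. A sketch in Python/NumPy (illustrative only; the entries of v are made up), where the rows of the identity matrix serve as the unit vectors:

```python
import numpy as np

n = 4
e = np.eye(n)                          # row e[i] is the unit vector e_{i+1}
v = np.array([3.0, -1.0, 0.5, 2.0])    # an arbitrary vector

# v = x_1 e_1 + x_2 e_2 + ... + x_n e_n, where the scalars x_i
# are exactly the components v_i of the vector itself
expansion = sum(v[i] * e[i] for i in range(n))
assert np.allclose(expansion, v)
print("expansion in unit vectors recovers v")
```

This shows why the coefficients in the expansion are called the coordinates of v: they are just its components.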

Another important operation is negation of a vector of any dimension: we simply negate each component of the vector:
\[ - {\bf v} = (-1){\bf v} = - \left( \begin{array}{c} v_1 \\ v_2 \\ \vdots \\ v_n \end{array} \right) = \left( \begin{array}{c} - v_1 \\ -v_2 \\ \vdots \\ -v_n \end{array} \right) . \]
This is a particular case of multiplication of a vector by a scalar. Adding a vector to its negation always results in the zero vector, which is why the negation is called the additive inverse.    
Example 8: Important properties of linear systems can be described with concept and notation of vectors. As a motivating example, let us consider a system of three equations \begin{align*} 2\,x_1 -3\,x_2 + x_3 &= 3, \\ -x_1 + 2\,x_2 - 4\,x_3 &= 5 , \tag{8.1} \\ 3\, x_1 + 4\,x_2 - 2\, x_3 &= 3. \end{align*} We rewrite this system in columns as \[ \begin{bmatrix} 2\,x_1 \\ - x_1 \\ 3\,x_1 \end{bmatrix} + \begin{bmatrix} -3\, x_2 \\ \phantom{-}2\, x_2 \\ \phantom{-}4\,x_2 \end{bmatrix} + \begin{bmatrix} \phantom{-1}x_3 \\ -4\, x_3 \\ -2\, x_3 \end{bmatrix} = \begin{bmatrix} 3 \\ 5 \\ 3 \end{bmatrix} \] because we know how to operate with numbers. So we assume that we can add these columns by adding corresponding components. Taking out common multiples in each column, we get \[ x_1 \begin{bmatrix} \phantom{-}2 \\ -1 \\ \phantom{-}3 \end{bmatrix} + x_2 \begin{bmatrix} -3 \\ \phantom{-}2 \\ \phantom{-}4 \end{bmatrix} + x_3 \begin{bmatrix} \phantom{-}1 \\ -4 \\ -2 \end{bmatrix} = \begin{bmatrix} 3 \\ 5 \\ 3 \end{bmatrix} . \tag{8.2} \] The expression in the left-hand side is known as a linear combination---it is obtained by adding two or more vectors that are multiplied by scalar values. Calling each column a vector, we denote them with lower case letter written in bold font:
\[ {\bf u}_1 = \begin{bmatrix} \phantom{-}2 \\ -1 \\ \phantom{-}3 \end{bmatrix} , \quad {\bf u}_2 = \begin{bmatrix} -3 \\ \phantom{-}2 \\ \phantom{-}4 \end{bmatrix} , \quad {\bf u}_3 = \begin{bmatrix} \phantom{-}1 \\ -4 \\ -2 \end{bmatrix} , \qquad {\bf b} = \begin{bmatrix} 3 \\ 5 \\ 3 \end{bmatrix} . \]
Then we can rewrite the linear equation (8.2) in the succinct form
\[ x_1 {\bf u}_1 + x_2 \mathbf{u}_2 + x_3 {\bf u}_3 = {\bf b} . \]
   ■
End of Example 8
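As a sanity check on system (8.1), one can solve it numerically and confirm the vector form of equation (8.2). A sketch in Python/NumPy (illustrative only, not part of the Mathematica notebook):

```python
import numpy as np

# the column vectors u1, u2, u3 and right-hand side b from Example 8
u1 = np.array([2.0, -1.0, 3.0])
u2 = np.array([-3.0, 2.0, 4.0])
u3 = np.array([1.0, -4.0, -2.0])
b = np.array([3.0, 5.0, 3.0])

A = np.column_stack([u1, u2, u3])   # coefficient matrix of system (8.1)
x = np.linalg.solve(A, b)           # solve for x_1, x_2, x_3

# the solution satisfies the vector equation x_1 u1 + x_2 u2 + x_3 u3 = b
assert np.allclose(x[0]*u1 + x[1]*u2 + x[2]*u3, b)
print("linear combination reproduces b")
```

The check confirms that solving the system of equations and solving the single vector equation are the same problem.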
   
Example 9: We demonstrate some operations with vectors from ℝ3×1, so our field of scalars is the set of real numbers ℝ. Adding two vectors, we have \[ \begin{bmatrix} \phantom{-}5 \\ -1 \\ \phantom{-}2 \end{bmatrix} + \begin{bmatrix} \phantom{-}2 \\ \phantom{-}5 \\ -7 \end{bmatrix} = \begin{bmatrix} \phantom{-}7 \\ \phantom{-}4 \\ -5 \end{bmatrix} = \begin{bmatrix} \phantom{-}2 \\ \phantom{-}5 \\ -7 \end{bmatrix} + \begin{bmatrix} \phantom{-}5 \\ -1 \\ \phantom{-}2 \end{bmatrix} . \]
{5, -1, 2} + {2, 5, -7}
{7, 4, -5}
So addition is commutative. If we add a vector with its negative (additive inverse), we get \[ \begin{bmatrix} \phantom{-}5 \\ -1 \\ \phantom{-}2 \end{bmatrix} + \begin{bmatrix} -5 \\ \phantom{-}1 \\ -2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} . \]
{5, -1, 2} + {-5, 1, -2}
{0, 0, 0}
Multiplying by a constant 3.1415926, we obtain \[ 3.1415926 \begin{bmatrix} \phantom{-}5 \\ -1 \\ \phantom{-}2 \end{bmatrix} = \begin{bmatrix} 15.7087963 \\ -3.1415926 \\ 6.2831852 \end{bmatrix} . \]
3.1415926 * {5, -1, 2}
{15.708, -3.14159, 6.28319}
   ■
End of Example 9

The entries of vectors in the previous examples are integers, but they are suitable only for class presentations by lazy instructors like me. In real life, the set of integers ℤ appears mostly in kindergarten. Usually, vector entries may be any numbers, for instance, real numbers, denoted by ℝ, or complex numbers ℂ. However, humans and computers operate only with rational numbers ℚ as approximations of the fields ℝ or ℂ. Although the majority of our presentations involve integers for simplicity, the reader should understand that they can be replaced by arbitrary numbers from either ℝ or ℂ or ℚ. When it does not matter what set of numbers is utilized, which is usually the case, we denote the set by 𝔽, and the reader can replace it with any field (either ℝ or ℂ or ℚ).

For our purposes, it is convenient to represent vectors as columns. This allows us to rewrite the given system of algebraic equations in compact form:
\[ x_1 {\bf u}_1 + x_2 {\bf u}_2 + x_3 {\bf u}_3 = {\bf b} . \]
In general, a system of m linear equations
\begin{align} a_{1,1} x_1 + a_{1,2} x_2 + \cdots + a_{1,n} x_n &= b_1 , \notag \\ a_{2,1} x_1 + a_{2,2} x_2 + \cdots + a_{2,n} x_n &= b_2 , \label{EqVector.3} \\ \ddots \qquad\qquad & \qquad \vdots \notag \\ a_{m,1} x_1 + a_{m,2} x_2 + \cdots + a_{m,n} x_n &= b_m , \notag \end{align}
with n unknowns, x₁, x₂, … , xn, can be similarly rewritten as a linear combination
\begin{equation} \label{EqVector.4} x_1 {\bf u}_1 + x_2 {\bf u}_2 + \cdots + x_n {\bf u}_n = {\bf b} . \end{equation}
of column vectors
\[ {\bf u}_1 = \begin{bmatrix} a_{1,1} \\ a_{2,1} \\ \vdots \\ a_{m,1} \end{bmatrix} , \quad {\bf u}_2 = \begin{bmatrix} a_{1,2} \\ a_{2,2} \\ \vdots \\ a_{m,2} \end{bmatrix} , \quad \cdots \quad {\bf u}_n = \begin{bmatrix} a_{1,n} \\ a_{2,n} \\ \vdots \\ a_{m,n} \end{bmatrix} , \qquad {\bf b} = \begin{bmatrix} b_{1} \\ b_{2} \\ \vdots \\ b_{m} \end{bmatrix} . \]
The succinct form \eqref{EqVector.4} of the linear system of equations \eqref{EqVector.3} tells us that we can add vectors by adding their components
\[ {\bf u} + {\bf v} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{bmatrix} \]
and multiply by a number, say k as
\[ k\,{\bf u} = k \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} = \begin{bmatrix} k\,u_1 \\ k\,u_2 \\ \vdots \\ k\,u_n \end{bmatrix} . \]
The number k in ku is called a scalar: it is written in lightface type to distinguish it from the boldface vector u. Note that the components of vector u are also written in lightface type because they are numbers that we call scalars. Everybody knows from school how to operate with numbers (scalars): they can be added/subtracted and multiplied/divided (dividing only by a nonzero number).
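The equivalence between the linear system and the linear combination of column vectors holds for any coefficients, not just the examples above. A quick randomized check in Python/NumPy (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
A = rng.standard_normal((m, n))   # arbitrary m x n coefficient matrix
x = rng.standard_normal(n)        # arbitrary values of the unknowns

# column j of A plays the role of the vector u_{j+1}; the
# matrix-vector product A @ x equals x_1 u_1 + ... + x_n u_n
combo = sum(x[j] * A[:, j] for j in range(n))
assert np.allclose(combo, A @ x)
print("A @ x equals the linear combination of the columns of A")
```

This column-by-column view of the matrix-vector product is exactly the succinct form of the linear system discussed above.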

Remember that the choice of vector representation as columns, rows, or n-tuples (parentheses-and-comma notation) is up to you. However, you must be consistent and use the same notation when adding vectors or multiplying by scalars. You cannot add a column vector and a row vector:

\[ \begin{bmatrix} 1 \\ 2 \end{bmatrix} + \left[ 1, \ 2 \right] \qquad{\bf wrong!} \]
because they have different structures. In physics, row vectors are usually called bra vectors and column vectors are called ket vectors. Also, you cannot mix row vectors and n-tuples:
\[ \left[ 1, \ 2,\ 3 \right] + \left( 1, \ 2,\ 3 \right) \qquad{\bf wrong!} \]
because (1, 2, 3) ∈ ℝ³ = ℝ×ℝ×ℝ, but [1 2 3] ∈ ℝ1×3 is a 1×3 matrix. Of course, all three sets ℝ×ℝ×ℝ, ℝ³, and ℝ1×3 are equivalent, since all are just human descriptions of the same underlying vectors.
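Computer systems enforce (or fail to enforce) these distinctions in their own ways. A cautionary sketch in Python/NumPy (illustrative only): the three forms have different shapes, but instead of rejecting the sum of a column and a row, NumPy silently broadcasts them into a matrix, which is almost never what a linear-algebra text means by vector addition.

```python
import numpy as np

tup = np.array([1, 2, 3])        # n-tuple: shape (3,)
row = np.array([[1, 2, 3]])      # row vector: shape (1, 3)
col = np.array([[1], [2], [3]])  # column vector: shape (3, 1)

# the three shapes are all different ...
assert tup.shape != row.shape and row.shape != col.shape

# ... and col + row is NOT an error here: broadcasting
# silently produces a 3x3 matrix of pairwise sums
print((col + row).shape)   # (3, 3)
```

So while a mathematician calls such a sum "wrong," NumPy computes an outer sum; the programmer must keep track of the intended shapes.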

Our definition of vectors as lists of numbers includes one very important ingredient---scalars. We primarily use either the set of real numbers, denoted by ℝ, or the set of complex numbers, denoted by ℂ. However, computers operate only with rational numbers, denoted by ℚ. Since elements from these sets of scalars can be added/subtracted and multiplied/divided (by nonzero elements), these sets are called fields. Any of these fields is denoted by 𝔽 (meaning either ℝ or ℂ or ℚ).

   Giusto Bellavitis  Michail Ostrogradsky      William Hamilton

The concept of vector, as we know it today, evolved gradually over a period of more than 200 years. The Italian mathematician, senator, and municipal councilor Giusto Bellavitis (1803--1880) abstracted the basic idea in 1835. The idea of an n-dimensional Euclidean space for n > 3 appeared in a work on the divergence theorem by the Russian mathematician Michail Ostrogradsky (1801--1862) in 1836, in the geometrical tracts of Hermann Grassmann (1809--1877) in the early 1840s, and in a brief paper of Arthur Cayley (1821--1895) in 1846. Unfortunately, the first two authors were virtually ignored in their lifetimes. In particular, the work of Grassmann was quite philosophical and extremely difficult to read. The term vector was introduced by the Irish mathematician, astronomer, and mathematical physicist William Rowan Hamilton (1805--1865) as part of a quaternion.

Vectors can also be described algebraically. Historically, the first vectors were Euclidean vectors, which can be expanded through standard basis vectors that serve as coordinates. Then any vector can be uniquely represented by a sequence of scalars called coordinates or components. The set of such ordered n-tuples is denoted by \( \mathbb{R}^n . \) When the scalars are complex numbers, the set of ordered n-tuples of complex numbers is denoted by \( \mathbb{C}^n . \) Motivated by these two approaches, we present the general definition of vectors.

  1. Compute u + v and 3u − 2v for \[ {\bf u} = \begin{bmatrix} -2 \\ \phantom{-}1 \end{bmatrix} , \quad {\bf v} = \begin{bmatrix} 3 \\ 2 \end{bmatrix} \qquad\mbox{and} \qquad {\bf u} = \begin{bmatrix} \phantom{-}1 \\ -1 \end{bmatrix} , \quad {\bf v} = \begin{bmatrix} \phantom{-}4 \\ -3 \end{bmatrix} . \]
  2. Write the system of equations that is equivalent to the given vector equation.
    1. \[ x_1 \begin{bmatrix} \phantom{-}3 \\ -4 \end{bmatrix} + x_2 \begin{bmatrix} -1 \\ -2 \end{bmatrix} + x_3 \begin{bmatrix} \phantom{-}5 \\ -2 \end{bmatrix} = \begin{bmatrix} \phantom{-}1 \\ -1 \end{bmatrix}; \]
    2. \[ x_1 \begin{bmatrix} 3 \\ 0 \\ 2 \end{bmatrix} + x_2 \begin{bmatrix} -1 \\ \phantom{-}3 \\ \phantom{-}5 \end{bmatrix} + x_3 \begin{bmatrix} -2 \\ \phantom{-}7 \\ \phantom{-}2 \end{bmatrix} = \begin{bmatrix} 5 \\ 1 \\ 3 \end{bmatrix}; \]
    3. \[ x_1 \begin{bmatrix} 2 \\ 1 \\ 7 \end{bmatrix} + x_2 \begin{bmatrix} \phantom{-}3 \\ -2 \\ -5 \end{bmatrix} + x_3 \begin{bmatrix} \phantom{-}4 \\ -6 \\ \phantom{-}1 \end{bmatrix} = \begin{bmatrix} -5 \\ -4 \\ \phantom{-}2 \end{bmatrix} . \]
  3. Given \( \displaystyle {\bf u} = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} , \ {\bf v} = \begin{bmatrix} \phantom{-}3 \\ -1 \\ -5 \end{bmatrix} , \quad\mbox{and} \quad {\bf b} = \begin{bmatrix} -5 \\ 5 \\ h \end{bmatrix} . \) For what value of h is b a linear combination of vectors u and v?
  4. Given \( \displaystyle {\bf u} = \begin{bmatrix} 4 \\ 3 \\ 1 \end{bmatrix} , \ {\bf v} = \begin{bmatrix} \phantom{-}2 \\ -2 \\ -3 \end{bmatrix} , \quad\mbox{and} \quad {\bf b} = \begin{bmatrix} 8 \\ h \\ 9 \end{bmatrix} . \) For what value of h is b a linear combination of vectors u and v?
  5. Rewrite the system of equations in a vector form
    \[ \begin{split} 2x_1 - 3 x_2 + 7 x_3 &= -1 , \\ -5 x_1 -2 x_2 - 3 x_3 &= 2 , \\ 3x_1 + 2 x_2 + 4 x_3 &= 3 . \end{split} \]
  6. Let \( \displaystyle {\bf u} = \begin{bmatrix} 3 \\ 1 \end{bmatrix} , \ {\bf v} = \begin{bmatrix} \phantom{-}2 \\ -2 \end{bmatrix} , \quad\mbox{and} \quad {\bf b} = \begin{bmatrix} h \\ k \end{bmatrix} . \) Show that the linear equation x₁u + x₂v = b has a solution for any values of h and k.
  7. Mark each statement True or False.
    1. Another notation for the vector (1, 2) is \( \displaystyle \begin{bmatrix} 1 \\ 2 \end{bmatrix} . \)
    2. An example of a linear combination of vectors u and v is 2v.
    3. Any list of six complex numbers is a vector in ℂ6.
    4. The vector 2v results when a vector v + u is added to the vector vu.
    5. The solution set of the linear system whose augmented matrix is [ a1 a2 a3 b ] is the same as the solution set of the vector equation x₁a₁ + x₂a₂ + x₃a₃ = b.