Principal component analysis and factor analysis are sets of statistical procedures aimed at selecting, from a given set of variables, subsets of variables that are closely related (correlated) with each other. Variables that fall into one subset and correlate with each other, while being largely independent of variables from other subsets, form factors. The goal of factor analysis is to identify apparently non-observable factors using a set of observable variables. An additional way to check the number of identified factors is to compute the reproduced correlation matrix: if the factors are identified correctly, it is close to the original one. To see how far this matrix deviates from the original correlation matrix (with which the analysis began), one can calculate the difference between them. The resulting residual matrix may indicate a "disagreement", i.e. that the correlation coefficients in question cannot be reproduced with sufficient accuracy from the available factors.
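To make this check concrete, here is a minimal numpy sketch of computing a reproduced correlation matrix and the residual matrix from a loading matrix; both the loadings and the observed correlations below are hypothetical placeholder values, not taken from any particular dataset.

```python
import numpy as np

# Hypothetical loading matrix: 4 observed variables, 2 common factors
A = np.array([
    [0.80, 0.10],
    [0.75, 0.05],
    [0.15, 0.85],
    [0.10, 0.80],
])

# Observed correlation matrix (placeholder values)
R = np.array([
    [1.00, 0.62, 0.20, 0.15],
    [0.62, 1.00, 0.18, 0.12],
    [0.20, 0.18, 1.00, 0.70],
    [0.15, 0.12, 0.70, 1.00],
])

# Reproduced correlation matrix: A A^T off the diagonal,
# with ones restored on the diagonal (standardized variables)
R_hat = A @ A.T
np.fill_diagonal(R_hat, 1.0)

# Residual matrix: how far the factor solution is from the observed correlations
residuals = R - R_hat
print(np.round(R_hat, 3))
print(np.round(residuals, 3))
```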


It should be noted that there are no clear statistical criteria for the completeness of factorization. However, low values of this completeness, for example less than 0.7, indicate that it is desirable to reduce the number of features or to increase the number of factors.

The coefficient of the relationship between a certain feature and a common factor, expressing the degree of influence of the factor on the feature, is called the factor loading of this feature on this common factor.

A matrix consisting of factor loadings and having a number of columns equal to the number of common factors and a number of rows equal to the number of original features is called a factor matrix.

The basis for calculating the factor matrix is ​​the matrix of paired correlation coefficients of the original features.

The correlation matrix captures the degree of relationship between each pair of features. Similarly, the factor matrix captures the degree of linear relationship of each feature with each common factor.

The magnitude of a factor loading does not exceed one in absolute value, and its sign indicates a positive or negative relationship between the feature and the factor.

The greater the absolute value of the factor loading of a feature on a certain factor, the more strongly this factor determines this feature.

A factor loading close to zero for a certain factor indicates that this factor has practically no effect on the given feature.

The factor model makes it possible to calculate the contributions of the factors to the total variance of all features. Summing the squares of the factor loadings of each factor over all features, we obtain its contribution to the total variance of the feature system: the higher the proportion of this contribution, the more significant this factor is.

At the same time, it is possible to identify the optimal number of common factors that describe the system of initial features quite well.

The value (measure of manifestation) of a factor for a separate object is called the factor weight of the object with respect to this factor. Factor weights allow one to rank and order objects along each factor.

The greater the factor weight of an object, the more it manifests that side of the phenomenon or that pattern that is reflected by this factor.

Factor weights can be either positive or negative.

Since the factors are standardized values with a mean equal to zero, factor weights close to zero indicate an average degree of manifestation of the factor, positive weights indicate that this degree is above average, and negative weights that it is below average.

In practice, if the number of principal components (or factors) already found is no more than m/2, the variance explained by them is at least 70%, and the next component contributes no more than 5% to the total variance, the factor model is considered to be quite good.

If you want to compute factor scores and store them as additional variables, use the Scores... (Values) option. A factor score usually lies between -3 and +3.

Factor analysis is a more powerful and complex apparatus than the principal component method, so it is applied when the results of component analysis are not entirely satisfactory. But since these two methods solve the same problems, it is necessary to compare the results of the component and factor analyses, i.e. the loading matrices, as well as the regression equations for the principal components and the common factors, and to comment on the similarities and differences of the results.

The maximum possible number of factors m for a given number of features p is determined by the inequality

(p + m) < (p - m)².

At the end of the whole procedure of factor analysis, using mathematical transformations, the factors fj are expressed through the initial features, that is, the parameters of the linear diagnostic model are obtained explicitly.

Methods of principal components and factor analysis are a set of statistical procedures aimed at selecting from a given set of variables subsets of variables that are closely related (correlated) with each other. Variables that are in one subset and correlate with each other, but are largely independent of variables from other subsets, form factors. The goal of factor analysis is to identify apparently non-observable factors using a set of observable variables.

The general expression for the i-th observed variable in terms of the factors can be written as follows:

X_i = A_i1 F_1 + A_i2 F_2 + … + A_ik F_k + U_i,

where F_j (j ranges from 1 to k) are the common factors, U_i is the characteristic (unique) factor, and A_ij are the constants used in the linear combination of the k factors. The characteristic factors do not correlate with each other or with the common factors.

The factor-analytic processing procedures applied to the data vary, but the structure (algorithm) of the analysis consists of the same main steps:
1. Preparation of the initial data matrix.
2. Calculation of the matrix of relationships between features.
3. Factorization (here one must specify the number of factors to extract and the method of computation). At this stage (as well as at the next one), one can also evaluate how well the resulting factor solution approximates the original data.
4. Rotation - transformation of the factors that facilitates their interpretation.
5. Computation of factor scores for each factor for each observation.
6. Interpretation of the data.
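The whole pipeline can be sketched, for instance, with scikit-learn's FactorAnalysis estimator; this is only an illustrative outline on synthetic data, not a prescription tied to any particular package mentioned in the text.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# 1-2. Prepare a data matrix (and, implicitly, its correlation structure):
#      two latent factors generate six observed variables plus noise.
latent = rng.normal(size=(200, 2))
loadings_true = rng.normal(size=(2, 6))
X = latent @ loadings_true + 0.5 * rng.normal(size=(200, 6))
X = StandardScaler().fit_transform(X)          # work with standardized features

# 3-4. Factorization with a chosen number of factors and a varimax rotation.
fa = FactorAnalysis(n_components=2, rotation="varimax")
# 5. Factor scores for every observation.
scores = fa.fit_transform(X)

# 6. Interpretation starts from the loading matrix (variables x factors).
loadings = fa.components_.T
print(np.round(loadings, 2))
print(scores.shape)        # (200, 2)
```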

The invention of factor analysis was associated precisely with the need to simultaneously analyze a large number of correlation coefficients of various scales with each other. One of the problems associated with the methods of principal components and factor analysis is that there are no criteria that would allow checking the correctness of the solution found. For example, in regression analysis, one can compare empirically obtained indicators for the dependent variables with indicators calculated theoretically on the basis of the proposed model, and use the correlation between them as a criterion for the correctness of the solution, following the correlation-analysis scheme for two sets of variables. In discriminant analysis, the correctness of the decision is based on how accurately the subjects' membership in one or another class is predicted (compared with their actual membership). Unfortunately, in the methods of principal components and factor analysis, there is no such external criterion that would allow one to judge the correctness of the solution. The second problem is that after extracting the factors, an infinite number of rotation options arise, based on the same initial variables but giving different solutions (the factor structures are defined in slightly different ways). The final choice among the possible alternatives within an infinite set of mathematically equivalent solutions depends on the researchers' meaningful comprehension of the interpretation results. And since there is no objective criterion for evaluating different solutions, the proposed justifications for choosing a solution may seem unfounded and unconvincing.

The third problem is that factor analysis is often used to salvage poorly designed research when it becomes clear that no single statistical procedure is producing the desired result. The power of the methods of principal components and factor analysis allows you to build an ordered concept from chaotic information (which gives them a dubious reputation).

The second group of terms refers to the matrices that are built and interpreted as part of the solution. Rotation of factors is the process of finding the most easily interpretable solution for a given number of factors. There are two main classes of rotations: orthogonal and oblique. In the first case, all factors are a priori chosen to be orthogonal (not correlating with each other), and a factor loading matrix is computed, which is a matrix of relationships between the observed variables and the factors. The magnitude of the loadings reflects the degree of relationship between each observed variable and each factor and is interpreted as a correlation coefficient between the observed variable and the factor (latent variable), and therefore varies from -1 to 1. The solution obtained after an orthogonal rotation is interpreted by analyzing the matrix of factor loadings and identifying which of the factors is most strongly associated with each observed variable. Thus, each factor turns out to be defined by the group of primary variables that have the largest factor loadings on it.

If an oblique rotation is performed (i.e., the factors are a priori allowed to correlate with each other), then several additional matrices are constructed. The factor correlation matrix contains the correlations between the factors. The factor loading matrix mentioned above splits into two: the structural matrix of relationships between factors and variables, and the factor pattern (mapping) matrix, which expresses the linear relationships between each observed variable and each factor without taking into account the overlap of factors expressed by their correlations with each other. After an oblique rotation, the factors are interpreted on the basis of the grouping of primary variables (similarly to what was described above), but using first of all the factor pattern matrix.

Finally, for both kinds of rotation, a matrix of factor score coefficients is computed; it is used in special regression-type equations to calculate the factor values (factor scores) for each observation from the values of the primary variables.

Comparing the methods of principal components and factor analysis, we note the following. Principal component analysis builds a model that best explains (maximally reproduces) the total variance of the experimental data obtained for all variables. As a result, "components" are extracted. In factor analysis, it is assumed that each variable is explained (determined) by a number of hypothetical common factors (affecting all variables) and characteristic factors (one for each variable). The computational procedures are carried out in such a way as to get rid of both the variance resulting from measurement error and the variance explained by specific factors, and to analyze only the variance explained by the hypothetically existing common factors. The result is objects called factors. However, as already mentioned, from a content-psychological point of view this difference in mathematical models is not essential; therefore, in what follows, unless special explanations are given about which particular case is meant, we will use the term "factor" both in relation to components and in relation to factors.

Sample sizes and missing data. The larger the sample, the greater the reliability of the relationship indicators. Therefore, it is very important to have a large enough sample. The required sample size also depends on the degree of correlation of indicators in the population as a whole and the number of factors: with a strong and significant relationship and a small number of well-defined factors, a small sample will be sufficient.

Thus, a sample of 50 subjects is rated as very poor, 100 as poor, 200 as average, 300 as good, 500 as very good, and 1000 as excellent (Comrey, Lee, 1992). Based on these considerations, it is recommended, as a general principle, to study samples of at least 300 subjects. For a solution based on a sufficient number of marker variables with high factor loadings (>0.80), a sample of about 150 subjects is sufficient (Guadagnoli, Velicer, 1988). Normality for each variable separately is checked via skewness (how far the distribution curve under study is shifted to the right or left compared to the theoretically normal curve) and kurtosis (the degree to which the "bell" of the observed distribution, visually represented in the frequency diagram, is stretched upwards or flattened compared to the "bell" of the density curve of the normal distribution). If a variable has significant skewness and kurtosis, it can be transformed by introducing a new variable (as a single-valued function of the one under consideration) in such a way that this new variable is normally distributed (for more on this, see Tabachnick, Fidell, 1996, Ch. 4).

Table: eigenvectors and corresponding eigenvalues for the considered case study (eigenvector 1, eigenvector 2; eigenvalue 1, eigenvalue 2).
Since the correlation matrix is diagonalizable, the matrix algebra of eigenvectors and eigenvalues can be applied to it to obtain the results of factor analysis (see Appendix 1). If a matrix is diagonalizable, then all the essential information about the factor structure is contained in its diagonal form. In factor analysis, the eigenvalues correspond to the variance explained by the factors. The factor with the largest eigenvalue explains the largest variance, and so on, down to factors with small or negative eigenvalues, which are usually left out of the analysis. The factor loading matrix is a matrix of relationships (interpreted as correlation coefficients) between factors and variables. The first column contains the correlations between the first factor and each variable in turn: ticket price (-.400), comfort of the complex (.251), air temperature (.932), water temperature (.956). The second column contains the correlations between the second factor and each variable: ticket price (.900), comfort of the complex (-.947), air temperature (.348), water temperature (.286). A factor is interpreted on the basis of the variables strongly associated with it (i.e., having high loadings on it). Thus, the first factor is mainly "climatic" (air and water temperature), while the second is "economic" (ticket price and comfort of the complex).

When interpreting these factors, one should note that variables with high loadings on the first factor (air temperature and water temperature) are interrelated positively, while variables with high loadings on the second factor (ticket price and comfort of the complex) are interrelated negatively (one cannot expect great comfort from a cheap resort). The first factor is called unipolar (all variables are grouped at one pole), and the second bipolar (the variables split into two groups opposite in meaning, i.e. two poles). Variables with factor loadings with a plus sign form the positive pole, and those with a minus sign the negative pole. At the same time, the names of the poles, "positive" and "negative", do not carry the evaluative meaning of "good" and "bad" when interpreting the factor; the sign is chosen arbitrarily during the calculations.

Orthogonal rotation

Rotation is usually applied after factor extraction to maximize high correlations and minimize low ones. There are numerous methods of rotation, but the most commonly used is varimax, a procedure that maximizes variances. This rotation maximizes the variances of the factor loadings by making high loadings higher and low loadings lower for each of the factors. The goal is achieved by means of a transformation matrix Λ: the matrix of loadings after rotation is obtained by multiplying the matrix of loadings before rotation by Λ.

The transformation matrix is the matrix of sines and cosines of the angle Ψ through which the rotation is performed (hence the name of the transformation: a rotation, since from a geometric point of view the axes are rotated around the origin of the factor space). Having performed the rotation and obtained the matrix of factor loadings after rotation, a series of other indicators can be analyzed (see Table 4). The communality of a variable is the variance computed from its factor loadings; it is the squared multiple correlation of the variable as predicted by the factor model. The communality is calculated as the sum of the squared factor loadings (SKN) of the variable over all factors. In Table 4 the communality for ticket price equals (-.086)² + (.981)² = .970, i.e. 97% of the variance of ticket price is due to factors 1 and 2.

The proportion of variance explained by a factor over all variables is the SKN for that factor divided by the number of variables (in the case of an orthogonal rotation). For the first factor, the proportion of variance is:

[(-.086)² + (-.071)² + (.994)² + (.997)²] / 4 = 1.994 / 4 = .50,

i.e., the first factor explains 50% of the variance of the variables. The second factor explains 48% of the variance of the variables and (because of rotation orthogonality) the two factors together explain 98% of the variance of the variables.

Table 4. Relationship between factor loadings, communalities, SKN, variance and covariance of the orthogonal factors after rotation

Variable               Factor 1    Factor 2    Communality (h²)
Ticket price             -.086        .981       .970
Comfort level            -.072       -.978       .960
Air temperature           .994        .027       .989
Water temperature         .997       -.040       .996
SKN (∑a²)                1.994       1.919      3.915
Share of variance          .50         .48
Share of covariance        .51         .49

The fraction of the solution variance explained by a factor (the share of covariance) is the SKN for that factor divided by the sum of the communalities (the sum of the SKN over the variables). The first factor explains 51% of the solution variance (1.994/3.915), the second 49% (1.919/3.915), and the two factors together explain the entire covariance.
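The arithmetic behind these communalities, SKN values and variance/covariance shares can be checked directly from the rotated loadings quoted in the example; a small numpy verification (the loadings are those listed in the text):

```python
import numpy as np

# Rotated loadings from the example: rows = ticket price, comfort level,
# air temperature, water temperature; columns = factor 1, factor 2.
A = np.array([
    [-0.086,  0.981],
    [-0.072, -0.978],
    [ 0.994,  0.027],
    [ 0.997, -0.040],
])

communalities = (A ** 2).sum(axis=1)       # h^2 per variable: ~.970, .960, .989, .996
ssl = (A ** 2).sum(axis=0)                 # SKN per factor: ~1.994, 1.919
share_of_variance = ssl / A.shape[0]       # divided by the number of variables: ~.50, .48
share_of_covariance = ssl / communalities.sum()   # divided by the sum of communalities: ~.51, .49

print(np.round(communalities, 3), np.round(ssl, 3))
print(np.round(share_of_variance, 2), np.round(share_of_covariance, 2))
```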

Eigenvalues reflect the amount of variance accounted for by the corresponding factors. As an exercise, we recommend writing out all these formulas to obtain the calculated values for the variables. For example, for the first respondent:

-1.23 = -.086(1.12) + .981(-1.16)

1.05 = -.072(1.12) - .978(-1.16)

1.08 = .994(1.12) + .027(-1.16)

1.16 = .997(1.12) - .040(-1.16)

Or in algebraic form:

Z_ticket price = a11·F1 + a12·F2
Z_comfort of the complex = a21·F1 + a22·F2
Z_air temperature = a31·F1 + a32·F2
Z_water temperature = a41·F1 + a42·F2

The greater the loading, the more confidently one can say that the variable determines the factor. Comrey and Lee (Comrey, Lee, 1992) suggest that loadings greater than .71 (explaining 50% of the variance) are excellent, .63 (40% of the variance) very good, .55 (30%) good, .45 (20%) fair, and .32 (explaining 10% of the variance) weak.

Suppose you are doing a (somewhat "stupid") study in which you measure the height of a hundred people in inches and centimeters. Thus, you have two variables. If you want to further investigate, for example, the effect of various nutritional supplements on growth, will you continue to use both variables? Probably not, because height is one characteristic of a person, no matter what units it is measured in.

The relationship between variables can be detected using scatterplots. The regression line obtained by fitting gives a graphical representation of the dependence. If a new variable is defined on the basis of the regression line depicted in this diagram, then such a variable will capture the most significant features of both variables. So, in effect, you have reduced the number of variables and replaced two with one. Note that the new factor (variable) is actually a linear combination of the two original variables.

In the general case, to explain the correlation matrix, not one but several factors are required. Each factor is characterized by a column of the factor matrix, and each variable by a row. A factor is called general if all of its loadings are significantly different from zero, i.e. it has loadings from all variables; such a factor is shown schematically in the first column of Fig. 1. A factor is called common if at least two of its loadings are significantly different from zero; the other columns in Fig. 1 represent such common factors, with loadings from two or more variables. If a factor has only one loading significantly different from zero, it is called a characteristic factor (see Fig. 1); each such factor represents only one variable. Common factors are of primary interest in factor analysis: once the common factors are established, the characteristic factors are obtained automatically. The number of high loadings of a variable on the common factors is called its complexity. For example, in Fig. 1 one variable has complexity 2 and another has complexity 3.

Fig. 1. Schematic representation of the factor mapping. A cross indicates a high factor loading.

So let us build the model

X_i = a_i1·f_1 + a_i2·f_2 + … + a_im·f_m + e_i,   (4)

where f_1, …, f_m are the unobservable factors, m < k; X_1, …, X_k are the observed variables (initial features); a_ij are the factor loadings; e_i is the random error associated only with X_i, with zero mean and variance σ_i²; the errors e_i are uncorrelated with each other and with the factors; the f_j are uncorrelated random variables with zero mean and unit variance.

Var(X_i) = h_i² + σ_i²,  where  h_i² = a_i1² + … + a_im².   (5)

Here h_i² is the i-th communality, the part of the variance of X_i due to the factors, and σ_i² is the part of the variance due to the error. In matrix notation the factor model takes the form:

X = A·f + e,   (6)

where A is the loading matrix, f is the factor vector and e is the error vector.

The correlations between the variables, expressed through the factors, can be derived as follows:

R = A·Aᵀ + Ψ,   (7)

where Ψ is a diagonal matrix containing the error variances σ_i². The basic conditions are that Ψ is diagonal and A·Aᵀ is a non-negative definite matrix. An additional condition for the uniqueness of the solution is the diagonality of the matrix Aᵀ·Ψ⁻¹·A.

There are many methods for solving a factorial equation. The earliest method of factor analysis is principal factor method, in which the technique of principal component analysis is applied to a reduced correlation matrix with commonalities on the main diagonal. To assess the commonality, the coefficient of multiple correlation between the corresponding variable and the set of other variables is usually used.

Factor analysis is carried out on the basis of the characteristic equation, as in principal component analysis:

|R − λI| = 0.   (8)

Solving it, one obtains the eigenvalues λ_i and the matrix of normalized (characteristic) vectors V, and then finds the factor mapping matrix A = V·Λ^{1/2}, where Λ^{1/2} is the diagonal matrix of the square roots of the eigenvalues.

To obtain estimates of the communalities and factor loadings, an empirical iterative algorithm is used that converges to the true parameter estimates. The essence of the algorithm is as follows: the initial estimates of the factor loadings are determined by the principal factor method. Based on the correlation matrix R, estimates of the principal components and common factors are formally determined:

F_j = Y_j / √λ_j,   (9)

where λ_j is the corresponding eigenvalue of the matrix R; X_1, …, X_k are the initial data (column vectors); v_ij are the coefficients for the common factors; Y_j = v_1j·X_1 + … + v_kj·X_k are the principal components (column vectors).

The estimates of the factor loadings are the values a_ij = √λ_j · v_ij. The estimates of the communalities are obtained as h_i² = a_i1² + … + a_im².

At the next iteration, the matrix R is modified - instead of the elements of the main diagonal, the estimates of the generalities obtained at the previous iteration are substituted; based on the modified matrix R, using the computational scheme of component analysis, the calculation of the main components (which are not such from the point of view of component analysis) is repeated, estimates of the main factors, factor loadings, generalities, and specificities are sought. Factor analysis can be considered complete when the estimates of the commonality change little at two adjacent iterations.
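For illustration, a compact numpy sketch of this iterative principal-factor scheme is given below; the choice of squared multiple correlations as initial communality estimates and the clipping of negative eigenvalues are common conventions assumed here, not details spelled out in the text.

```python
import numpy as np

def principal_factors(R, n_factors, n_iter=100, tol=1e-4):
    """Iterated principal-factor extraction from a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    # Initial communality estimates: squared multiple correlations,
    # h_i^2 = 1 - 1 / (R^{-1})_{ii}
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        R_h = R.copy()
        np.fill_diagonal(R_h, h2)                 # reduced correlation matrix
        eigval, eigvec = np.linalg.eigh(R_h)
        order = np.argsort(eigval)[::-1][:n_factors]
        lam = np.clip(eigval[order], 0.0, None)   # guard against negative eigenvalues
        A = eigvec[:, order] * np.sqrt(lam)       # loadings a_ij = sqrt(lambda_j) * v_ij
        h2_new = (A ** 2).sum(axis=1)             # updated communalities
        if np.max(np.abs(h2_new - h2)) < tol:     # stop when communalities stabilize
            return A, h2_new
        h2 = h2_new
    return A, h2

R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.2],
              [0.3, 0.2, 1.0]])
loadings, communalities = principal_factors(R, n_factors=1)
print(np.round(loadings, 3), np.round(communalities, 3))
```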

Note. Transformations of the matrix R may violate its positive definiteness and, as a consequence, some of the eigenvalues of the modified matrix may turn out to be negative.

National Research Nuclear University MEPhI
Faculty of Business Informatics and Management of Complex Systems
Department of Economics and Management in Industry (No. 71)
Mathematical and Instrumental Methods of Processing Statistical Information
Kireev V.S., Ph.D., Associate Professor
Moscow, 2017

Normalization

Decimal scaling
Minimax normalization
Normalization with Standard Transform
Normalization with Element-wise Transforms

Decimal scaling

V'_i = V_i / 10^k, where k is the smallest non-negative integer such that max|V'_i| < 1.

Minimax normalization

V'_i = (V_i − min_i V_i) / (max_i V_i − min_i V_i).

Normalization with Standard Deviation

V'_i = (V_i − V̄) / s_V, where V̄ is the sample mean and s_V is the sample standard deviation.

Normalization with Element-wise Transforms

V'_i = f(V_i), for example V'_i = 1/V_i, V'_i = log(V_i), V'_i = exp(V_i), V'_i = V_i^y or V'_i = V_i^{1/y}.
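The normalization schemes listed on these slides can be written compactly in numpy; the sketch below follows common conventions (for example, the choice of k in decimal scaling), since the formulas on the slides are only partially legible.

```python
import numpy as np

def decimal_scaling(v):
    """v' = v / 10^k, with the smallest k >= 0 such that max|v'| < 1."""
    v = np.asarray(v, dtype=float)
    m = np.max(np.abs(v))
    k = int(np.floor(np.log10(m))) + 1 if m > 0 else 0
    return v / 10 ** max(k, 0)

def minimax(v):
    """v' = (v - min v) / (max v - min v)."""
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min())

def standardize(v):
    """v' = (v - mean) / sample standard deviation."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)

def elementwise(v, f=np.log):
    """v' = f(v) for a chosen element-wise transform (log, exp, power, ...)."""
    return f(np.asarray(v, dtype=float))

x = np.array([120.0, 55.0, 300.0, 80.0])
print(decimal_scaling(x))
print(minimax(x))
print(standardize(x))
print(elementwise(x))
```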

Factor analysis

Factor analysis (FA) is a set of methods that, on the basis of the real-life connections between the analyzed features (or between the observed objects themselves), make it possible to identify hidden (implicit, latent) generalizing characteristics of the organizational structure and of the development mechanism of the phenomena and processes under study.

Methods of factor analysis are used in research practice mainly in order to compress information, i.e. to obtain a small number of generalizing features that explain the variability (dispersion) of the elementary features (the R-technique of factor analysis) or the variability of the observed objects (the Q-technique of factor analysis).

Factor analysis algorithms are based on the use of reduced matrices of pairwise correlations (covariances). A reduced matrix is a matrix on whose main diagonal there are not ones (estimates of the total correlation) or estimates of the total variance, but reduced, somewhat smaller values. It is thereby postulated that the analysis will explain not all of the variance of the studied features (objects) but only some part of it, usually the larger part. The remaining unexplained part of the variance is the specificity arising from the particular nature of the observed objects, or from errors made when registering the phenomena and processes, i.e. from the unreliability of the input data.

Classification of FA methods


Principal Component Method

The principal component method (PCA; in the Russian literature, MGK) is used to reduce the dimension of the space of observed vectors without a significant loss of informativeness. A premise of PCA is the multivariate normal distribution of the vectors. In PCA, linear combinations of random variables are defined by the characteristic (eigen)vectors of the covariance matrix. The principal components form an orthogonal coordinate system in which the variances of the components characterize their statistical properties. PCA is not classified as FA, although it has a similar algorithm and solves similar analytical problems. Its main difference is that it processes not the reduced but the ordinary matrix of pairwise correlations (covariances), with ones on the main diagonal.

Let an initial set of vectors X of the linear space Lk be given. Applying the principal component method allows one to pass to a basis of the space Lm (m ≤ k) such that the first component (the first basis vector) corresponds to the direction along which the variance of the vectors of the original set is maximal. The direction of the second component (the second basis vector) is chosen so that the variance of the original vectors along it is maximal under the condition of orthogonality to the first basis vector. The other basis vectors are defined similarly. As a result, the directions of the basis vectors are chosen so as to maximize the variance of the original set along the first components, called the principal components (or principal axes). It turns out that the main variability of the vectors of the original set is represented by the first few components, and it becomes possible, by discarding the less essential components, to pass to a space of lower dimension.
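A compact numpy illustration of this construction, which also produces the score matrix T and loading matrix P discussed on the following slides; the data here is synthetic and serves only to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))                 # 100 samples, 5 original variables
Xc = X - X.mean(axis=0)                       # center the data

# Eigendecomposition of the covariance matrix
C = np.cov(Xc, rowvar=False)
eigval, eigvec = np.linalg.eigh(C)
order = np.argsort(eigval)[::-1]              # sort components by explained variance
eigval, eigvec = eigval[order], eigvec[:, order]

A = 2                                         # keep the first A principal components
P = eigvec[:, :A]                             # loading matrix (J x A)
T = Xc @ P                                    # score matrix (I x A): coordinates in the new basis

explained = eigval[:A] / eigval.sum()
print(T.shape, P.shape, np.round(explained, 2))
```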

10. Method of principal components. Scheme


11. Method of principal components. Score matrix

The score matrix T gives the projections of the original samples (J-dimensional vectors x1,…,xI) onto the subspace of the principal components (A-dimensional). The rows t1,…,tI of the matrix T are the coordinates of the samples in the new coordinate system. The columns t1,…,tA of the matrix T are orthogonal and represent the projections of all samples onto one new coordinate axis.

When examining data using the PCA method, special attention is paid to score plots. They carry information useful for understanding the structure of the data. On a score plot each sample is depicted in the coordinates (ti, tj), most often (t1, t2), denoted PC1 and PC2. The proximity of two points means their similarity, i.e. a positive correlation; points located at right angles (with respect to the origin) are uncorrelated, and points located diametrically opposite have a negative correlation.

12. Method of principal components. Load Matrix

The loading matrix P is the transition matrix from the original space of the variables x1, …, xJ (J-dimensional) into the space of principal components (A-dimensional). Each row of the matrix P consists of coefficients relating the variables t and x. For example, the a-th row is the projection of all variables x1, …, xJ onto the a-th principal component axis, and each column of P is the projection of the corresponding variable xj onto the new coordinate system.

The loading plot is used to study the role of the variables. On this plot each variable xj is represented by a point in the coordinates (pi, pj), for example (p1, p2). Analyzing it in the same way as a score plot, one can understand which variables are related and which are independent. Joint study of paired score and loading plots can also give a lot of useful information about the data.

13. Features of the principal component method

The principal component method is based on the following assumptions:
- the assumption that the dimensionality of the data can be efficiently reduced by a linear transformation;
- the assumption that most of the information is carried by those directions in which the variance of the input data is maximal.

It is easy to see that these conditions are by no means always satisfied. For example, if the points of the input set lie on the surface of a hypersphere, then no linear transformation can reduce the dimension (but this can easily be done by a non-linear transformation based on the distance from a point to the center of the sphere). This disadvantage is common to all linear algorithms and can be overcome by using additional dummy variables that are non-linear functions of the elements of the input data set (the so-called kernel trick).

The second drawback of the principal component method is that the directions maximizing variance do not always maximize information content. For example, a variable with the highest variance may carry almost no information, while the variable with the minimal variance may allow the classes to be separated completely. In this case the principal component method will give preference to the first (less informative) variable. All additional information associated with a vector (for example, whether an image belongs to one of the classes) is ignored.

14. Example of data for the PCA

K. Esbensen. Multivariate Data Analysis (abridged Russian translation, ed. O. Rodionova). IPCP RAS, 2005.

15. Example of data for the PCA. Notation

Height: in centimeters
Weight: in kilograms
Hair: short -1, long +1
Shoes: EU standard size
Age: in years
Income: in thousands of euros per year
Beer: consumption in liters per year
Wine: consumption in liters per year
Sex: male -1, female +1
Strength: index based on tests of physical abilities
Region: north -1, south +1
IQ: measured by a standard test

16. Score matrix

17. Load matrix


18. Sample objects in the space of new components

Women (F) are indicated by circles ● and men (M) by squares ■; the north (N) is shown in cyan and the south (S) in red. The size and shade of a symbol reflect income: the larger and lighter the symbol, the higher the income. The numbers indicate age.

19. Initial variables in the space of new components


20. Scree plot


21. Principal factor method

In the paradigm of the principal factor method, the problem of reducing the dimension of the feature space is posed as follows: n features can be explained using a smaller number m of latent features, the common factors, where m << n. The discrepancies between the initial features and the introduced common factors (their linear combinations) are taken into account using the so-called characteristic factors.

The ultimate goal of a statistical study carried out with the factor analysis apparatus is, as a rule, to identify and interpret the latent common factors while at the same time striving to minimize both their number and the degree of dependence on their specific residual random component.

Every feature is the result of the influence of m hypothetical common factors and one characteristic factor:

X_1 = a_11 f_1 + a_12 f_2 + … + a_1m f_m + d_1 V_1
X_2 = a_21 f_1 + a_22 f_2 + … + a_2m f_m + d_2 V_2
…
X_n = a_n1 f_1 + a_n2 f_2 + … + a_nm f_m + d_n V_n

22. Factor Rotation

Rotation is a way of transforming the factors obtained at the previous step into more meaningful ones. Rotations are divided into:
- graphical (drawing the axes by hand; not applicable for more than two-dimensional analysis),
- analytical (a certain rotation criterion is chosen; orthogonal and oblique rotations are distinguished), and
- matrix-approximation (the rotation consists in approaching a certain given target matrix).

The result of the rotation is the secondary structure of the factors. The primary factor structure (consisting of the primary loadings obtained at the previous stage) is, in fact, a set of projections of points onto orthogonal coordinate axes. It is obvious that if the projections are zero, the structure is simpler, and the projections are zero if a point lies on an axis. Thus, rotation can be considered as a transition from one coordinate system to another, with known coordinates in the first system (the primary factors) and iteratively selected coordinates in the other system (the secondary factors). When obtaining the secondary structure, one tries to move to a coordinate system in which as many axes as possible pass through the points (objects), so that as many projections (and therefore loadings) as possible are zero. In doing so, the restrictions of orthogonality and of decreasing importance from the first factor to the last, which are characteristic of the primary structure, may be removed.

23. Orthogonal rotation

Orthogonal rotation implies that we rotate the factors without violating their orthogonality to each other. Orthogonal rotation consists in multiplying the original matrix of primary loadings B by an orthogonal matrix R (a matrix such that R·Rᵀ = I):

V = B·R.

The orthogonal rotation algorithm in the general case is as follows:
0. B is the matrix of primary factors.
1. Find an orthogonal 2×2 matrix R for two columns (factors) b_i and b_j of the matrix B such that the rotation criterion for these two columns is maximal.
2. Replace the columns b_i and b_j with the rotated columns.
3. Check whether all pairs of columns have been processed. If not, go to step 1.
4. Check whether the criterion for the entire matrix has increased. If yes, go to step 1; if not, the algorithm terminates.

24. Varimax rotation

This criterion formalizes the complexity of a factor through the variance of the squared loadings of the variables on it:

s_j² = (1/n) Σ_i (b_ij²)² − [ (1/n) Σ_i b_ij² ]².

The criterion in general form can then be written as the maximization of the sum of these variances over all factors:

V = Σ_j s_j² → max.

At the same time, the factor loadings can be normalized to get rid of the influence of individual variables.
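For reference, a bare-bones varimax implementation is sketched below; it follows the widely used SVD-based iteration for the raw varimax criterion rather than the exact formulas of this slide, which did not survive conversion.

```python
import numpy as np

def varimax(B, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a loading matrix B (variables x factors) by the varimax criterion."""
    p, k = B.shape
    R = np.eye(k)                      # accumulated orthogonal rotation matrix
    d = 0.0
    for _ in range(max_iter):
        L = B @ R                      # current rotated loadings
        # Orthogonal Procrustes step toward a higher variance of squared loadings
        u, s, vt = np.linalg.svd(
            B.T @ (L ** 3 - (gamma / p) * L @ np.diag((L ** 2).sum(axis=0)))
        )
        R = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):      # criterion no longer improving
            break
        d = d_new
    return B @ R, R

# Illustrative unrotated loadings for four variables and two factors
B = np.array([[0.7, 0.3],
              [0.6, 0.4],
              [0.2, 0.8],
              [0.1, 0.7]])
rotated, R = varimax(B)
print(np.round(rotated, 3))
```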

25. Quartimax rotation

We formalize the notion of the factorial complexity q of the i-th variable through the variance of its squared factor loadings across the factors:

q_i = (1/r) Σ_j (b_ij²)² − [ (1/r) Σ_j b_ij² ]²,

where r is the number of columns of the factor matrix, b_ij is the factor loading of the j-th factor on the i-th variable, and the second term is the squared mean of the squared loadings. The quartimax criterion tries to maximize the complexity of the entire set of variables in order to make the factors easier to interpret (it tries to simplify the description of the columns):

Σ_i q_i → max.

Taking into account that Σ_i Σ_j b_ij² is a constant (the sum of the eigenvalues of the covariance matrix), expanding the mean (and considering that a power function grows in proportion to its argument), we obtain the final form of the criterion to be maximized:

Q = Σ_i Σ_j b_ij⁴ → max.

26. Criteria for determining the number of factors

The main problem of factor analysis is the selection and interpretation of the main factors. When selecting components, the researcher usually faces significant difficulties, since there is no unambiguous criterion for selecting factors, and therefore some subjectivity in the interpretation of the results is inevitable here. There are several frequently used criteria for determining the number of factors. Some of them are alternatives to others, and some of these criteria can be used together so that one complements another:

Kaiser criterion, or eigenvalue criterion. This criterion was proposed by Kaiser and is probably the most widely used. Only factors with eigenvalues equal to or greater than 1 are retained. This means that if a factor does not account for variance equivalent to at least the variance of one variable, it is omitted.

Scree criterion, or screening criterion. It is a graphical method first proposed by the psychologist Cattell. The eigenvalues can be displayed as a simple plot. Cattell suggested finding the place on the plot where the decrease of the eigenvalues from left to right slows down the most. It is assumed that to the right of this point there is only "factorial scree" ("scree" is a geological term for the rock fragments that accumulate at the lower part of a rocky slope).

27. Criteria for determining the number of factors. Continuation

Significance criterion. It is especially effective when the model of the general population is known and there are no secondary factors. But the criterion is unsuitable for searching for changes in the model and is implemented only in factor analysis by the least squares or maximum likelihood method.

Criterion of the share of reproduced variance. The factors are ranked by the share of explained variance; when the percentage of variance added by the next factor is insignificant, extraction should be stopped. It is desirable that the selected factors explain more than 80% of the spread. Disadvantages of the criterion: first, the selection is subjective; second, the specifics of the data may be such that all the main factors together cannot explain the desired percentage of the spread. Therefore, the main factors together should explain at least 50.1% of the variance.

Criterion of interpretability and invariance. This criterion combines statistical accuracy with subjective interests. According to it, the main factors can be extracted as long as their clear interpretation is possible. The interpretability, in turn, depends on the magnitude of the factor loadings: if a factor contains at least one strong loading, it can be interpreted. The opposite is also possible: if there are strong loadings but the interpretation is difficult, it is preferable to discard the component.
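The Kaiser (eigenvalue) rule described above is straightforward to apply in code; a short sketch on an illustrative correlation matrix (the scree criterion would be examined from a plot of the same eigenvalues):

```python
import numpy as np

R = np.array([                 # illustrative correlation matrix, not from the text
    [1.0, 0.7, 0.2, 0.1],
    [0.7, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.6],
    [0.1, 0.2, 0.6, 1.0],
])
eigval = np.sort(np.linalg.eigvalsh(R))[::-1]   # eigenvalues in descending order

n_kaiser = int((eigval >= 1.0).sum())           # Kaiser: keep factors with eigenvalue >= 1
print(np.round(eigval, 2), "-> factors to keep:", n_kaiser)

# For the scree criterion, plot eigval against its index and look for the
# point where the decline levels off (e.g. with matplotlib's plt.plot(eigval, "o-")).
```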

28. An example of using the MGK

Suppose the following indicators of the economic activity of enterprises are given: labor intensity (x1), the share of purchased items in production (x2), the equipment shift coefficient (x3), the share of workers in the enterprise (x4), bonuses and remuneration per employee (x5), and profitability (y). The linear regression model looks like:

y = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4 + b5*x5

x1     x2     x3     x4     x5     y
0.51   0.20   1.47   0.72   0.67    9.8
0.36   0.64   1.27   0.70   0.98   13.2
0.23   0.42   1.51   0.66   1.16   17.3
0.26   0.27   1.46   0.69   0.54    7.1
0.27   0.37   1.27   0.71   1.23   11.5
0.29   0.38   1.43   0.73   0.78   12.1
0.01   0.35   1.50   0.65   1.16   15.2
0.02   0.42   1.35   0.82   2.44   31.3
0.18   0.32   1.41   0.80   1.06   11.6
0.25   0.33   1.47   0.83   2.13   30.1

29. An example of using the MGK

Building a regression model in a statistical package shows that the coefficient of X4 is not significant (p-value > α = 5%), so X4 can be excluded from the model. After eliminating X4, the model building process is started again.

30. An example of using the MGK

The Kaiser criterion for the PCA shows that it is possible to keep 2 components explaining about 80% of the original variance.

For the selected components, equations can be written in the original coordinate system:
U1 = 0.41*x1 - 0.57*x2 + 0.49*x3 - 0.52*x5
U2 = 0.61*x1 + 0.38*x2 - 0.53*x3 - 0.44*x5

31. An example of using the MGK

Now you can build a new regression model in the new components:
y = 15.92 - 3.74*U1 - 3.87*U2
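The worked example can be replayed in Python: standardize the four remaining indicators, extract two principal components and regress profitability on them. This is an illustrative reconstruction; the signs and exact coefficients of U1 and U2 depend on scaling and sign conventions, so they need not match the slide digit for digit.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Data from the table above (x4 dropped as insignificant); columns: x1, x2, x3, x5
X = np.array([
    [0.51, 0.20, 1.47, 0.67],
    [0.36, 0.64, 1.27, 0.98],
    [0.23, 0.42, 1.51, 1.16],
    [0.26, 0.27, 1.46, 0.54],
    [0.27, 0.37, 1.27, 1.23],
    [0.29, 0.38, 1.43, 0.78],
    [0.01, 0.35, 1.50, 1.16],
    [0.02, 0.42, 1.35, 2.44],
    [0.18, 0.32, 1.41, 1.06],
    [0.25, 0.33, 1.47, 2.13],
])
y = np.array([9.8, 13.2, 17.3, 7.1, 11.5, 12.1, 15.2, 31.3, 11.6, 30.1])

Z = StandardScaler().fit_transform(X)     # PCA on standardized indicators
pca = PCA(n_components=2)
U = pca.fit_transform(Z)                  # component scores U1, U2
print(pca.explained_variance_ratio_.round(2))

reg = LinearRegression().fit(U, y)        # regression of y on the components
print(round(float(reg.intercept_), 2), np.round(reg.coef_, 2))
```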

32. Singular Decomposition Method (SVD)

Beltrami and Jordan are considered the founders of the theory of the singular value decomposition: Beltrami for being the first to publish a work on the singular value decomposition, and Jordan for the elegance and completeness of his work. Beltrami's work appeared in the Journal of Mathematics for the Use of the Students of the Italian Universities in 1873; its main purpose was to familiarize students with bilinear forms. The essence of the method is the decomposition of a matrix A of size n x m with rank d = rank(A) <= min(n, m) into a product of matrices of lower rank:

A = U D V^T,

where the matrices U of size n x d and V of size m x d consist of orthonormal columns that are eigenvectors for the nonzero eigenvalues of the matrices A·Aᵀ and Aᵀ·A respectively, with UᵀU = VᵀV = I, and D of size d x d is a diagonal matrix with positive diagonal elements sorted in descending order. The columns of the matrix U form an orthonormal basis of the column space of the matrix A, and the columns of the matrix V an orthonormal basis of the row space of the matrix A.

33. Singular Decomposition Method (SVD)

An important property of the SVD decomposition is the fact that if we keep in D only the k largest diagonal elements and also leave only the first k columns in the matrices U and V, then the matrix

Ak = Uk Dk VkT

will be the best approximation of the matrix A with respect to the Frobenius norm among all matrices of rank k.

This truncation, first, reduces the dimension of the vector space and lowers the storage and computational requirements of the model. Second, by discarding the small singular values, the small distortions resulting from noise in the data are removed, leaving only the strongest effects and trends in the model.
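The truncation property is easy to see with numpy's SVD; a small sketch on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(8, 5))

U, s, Vt = np.linalg.svd(A, full_matrices=False)    # A = U * diag(s) * Vt

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]         # best rank-k approximation in the Frobenius norm

err = np.linalg.norm(A - A_k)                       # equals the square root of the sum of squared discarded singular values
print(np.round(s, 3))
print(round(float(err), 3), round(float(np.sqrt((s[k:] ** 2).sum())), 3))
```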

Having become acquainted with the concepts of factor loading and the area of joint variation, we can go further, again using the apparatus of matrices, whose elements this time will be correlation coefficients.

The matrix of correlation coefficients, obtained as a rule experimentally, is called the correlation matrix, or the matrix of correlations.

The elements of this matrix are the correlation coefficients between all variables of the given population.

If we have, for example, a set consisting of n tests, then the number of correlation coefficients obtained experimentally will be n(n - 1)/2.

These coefficients fill the half of the matrix located on one side of its main diagonal. On the other side are, obviously, the same coefficients, since the correlation of variable j with variable k is equal to the correlation of k with j. Therefore, the correlation matrix is symmetric.

Scheme 3.2. Full correlation matrix

There are ones on the diagonal of this matrix because each variable has a +1 correlation with itself.

A correlation matrix whose main diagonal elements are equal to 1 is called the “full matrix” of correlation (Scheme 3.2) and is denoted

It should be noted that by placing units, or correlations of each variable with itself, on the main diagonal, we take into account the total variance of each variable represented in the matrix. Thus, the influence of not only general, but also specific factors is taken into account.

On the contrary, if on the main diagonal of the correlation matrix there are elements corresponding to the generalities and related only to the general variance of variables, then only the influence of general factors is taken into account, the influence of specific factors and errors is eliminated, i.e., the specificity and variance of errors are discarded.

The correlation matrix, in which the elements of the main diagonal correspond to the generalities, is called reduced and is denoted by R (Scheme 3.3).

Scheme 3.3. Reduced correlation matrix

We have already talked about factor loading, or the filling of a given variable with a specific factor. At the same time, it was emphasized that the factor load has the form of a correlation coefficient between a given variable and a given factor.

A matrix whose columns consist of the loadings of a given factor with respect to all variables of the given population, and whose rows consist of the factor loadings of a given variable, is called a matrix of factors, or a factor matrix. Here, too, one can speak of a full and a reduced factor matrix. The elements of the full factor matrix correspond to the total unit variance of each variable in the given population. If the loadings of the common factors and the loadings of the characteristic factors are denoted separately, the full factor matrix can be represented as follows:

Scheme 3.4. Full factor matrix for four variables

The factor matrix shown here consists of two parts. The first part contains elements related to the four variables and three common factors, all of which are assumed to apply to all variables. This is not a necessary condition, since some elements of the first part of the matrix may be equal to zero, which means that some factors do not apply to all variables. The elements of the first part of the matrix are the loadings of the common factors (for example, one element shows the loading of the second common factor on the first variable).

In the second part of the matrix, we see 4 loadings of characteristic factors, one in each row, which corresponds to their specificity. Each of these factors refers to only one variable. All other elements of this part of the matrix are equal to zero. Characteristic factors can obviously be broken down into specific and error-related.

The column of the factor matrix characterizes the factor and its influence on all variables. The line characterizes the variable and its content with various factors, in other words, the factorial structure of the variable.

When analyzing only the first part of the matrix, we are dealing with a factor matrix showing the common variance of each variable. This part of the matrix is called the reduced part and is denoted F. This matrix does not include the loadings of the characteristic factors and does not take the specific variance into account. Recall that, in accordance with what was said above about the common variances and the factor loadings, which are the square roots of the common variances, the sum of the squares of the elements of each row of the reduced factor matrix F is equal to the communality of the given variable.

Accordingly, the sum of the squares of all elements of a row of the full factor matrix is equal to unity, i.e. to the total variance of this variable.

Since factor analysis focuses on common factors, we will mainly use the reduced correlation and reduced factor matrix in what follows.


If factor analysis is done properly, rather than being satisfied with the default settings ("little jiffy", as the standard gentleman's set of methodological choices has been derisively called), the preferred method of extracting factors is either maximum likelihood or generalized least squares. This is where trouble can await us: the procedure issues an error message: the correlation matrix is not positive definite. What does this mean, why does it happen, and how can the problem be dealt with?
The fact is that in the process of factorization the procedure searches for the so-called inverse of the correlation matrix. There is an analogy here with ordinary real numbers: multiplying a number by its reciprocal, we should get one (for example, 4 and 0.25). However, some numbers have no reciprocal: zero cannot be multiplied by anything to eventually give one. The same story holds for matrices. A matrix multiplied by its inverse gives the identity matrix (ones on the diagonal and zeros everywhere else). However, some matrices have no inverse, which means that factor analysis becomes impossible in such cases. This fact can be detected using a special number called the determinant. If the determinant of the matrix tends to zero or is negative, then we are facing a problem.
What are the reasons for this situation? Most often, it arises due to the existence of a linear relationship between variables. It sounds strange, since it is precisely such dependencies that we are looking for using multidimensional methods. However, in the case when such dependencies cease to be probabilistic and become rigidly determined, the multivariate analysis algorithms fail. Consider the following example. Let's say we have the following dataset:
data list free / V1 to V3.
begin data
1 2 3
2 1 2
3 5 4
4 4 5
5 3 1
end data.
compute V4 = V1 + V2 + V3.
The last variable is the exact sum of the first three. When does this situation occur in a real study? When we include in the set of variables raw scores for subtests and the test as a whole; when the number of variables is much greater than the number of subjects (especially if the variables are highly correlated or have a limited set of values). In this case, exact linear relationships may occur by chance. Dependencies are often an artifact of the measurement procedure—for example, if percentages within observations are calculated (say, the percentage of statements of a certain type), a ranking method or a constant sum distribution is used, some restrictions are introduced on the choice of alternatives, and so on. As you can see, quite common situations.
If you order the output of the determinant and the inverse correlation matrix during factor analysis in SPSS of the above array, then the package will report a problem.
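The same diagnosis can be reproduced outside SPSS; a short numpy sketch of the toy example above, where V4 is an exact sum of V1-V3 and the correlation matrix therefore has a determinant of (numerically) zero:

```python
import numpy as np

data = np.array([
    [1, 2, 3],
    [2, 1, 2],
    [3, 5, 4],
    [4, 4, 5],
    [5, 3, 1],
], dtype=float)
v4 = data.sum(axis=1, keepdims=True)           # V4 = V1 + V2 + V3
X = np.hstack([data, v4])

R = np.corrcoef(X, rowvar=False)
print(round(float(np.linalg.det(R)), 6))       # ~0: the matrix is not positive definite
# np.linalg.inv(R) would fail here or return numerically meaningless values
```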
How can one identify the group of variables that creates the multicollinearity? It turns out that the good old method of principal components, despite the linear dependence, keeps working and still produces something. If you see that the communality of some of the variables approaches 0.90-0.99, and the eigenvalues of some factors become very small (or even negative), this is not a good sign. In addition, request a varimax rotation and see which group of variables ended up together with the variable suspected of the "criminal connection". Usually its loading on that factor is unusually large (0.99, for example). If this set of variables is small, heterogeneous in content, the possibility of an artifactual linear dependence is excluded, and the sample is large enough, then the discovery of such a relationship can be considered a result no less valuable. You can also run such a group through regression analysis: make the variable that showed the largest loading the dependent one, and try all the rest as predictors. R, i.e. the multiple correlation coefficient, should in this case be equal to 1. If the linear dependence is really severe, the regression will also silently drop some of the predictors; look carefully at what is missing. By requesting an additional output of the multicollinearity diagnostics, you can eventually find the ill-fated set that forms an exact linear relationship.
And, finally, there are a few more minor reasons why the correlation matrix may not be positive definite. The first is the presence of a large number of non-responses. Sometimes, in order to use the maximum of available information, the researcher orders pairwise handling of missing values. The result may be such an "illogical" correlation matrix that it is too much for the factor analysis model to handle. Secondly, if you decide to factorize a correlation matrix taken from the literature, you may encounter the negative effect of rounding of the numbers.