Saturday, March 14, 2020

Factor Analysis Essay Example

Factor Analysis

Introduction

Basic Concept of Factor Analysis

Factor analysis is a statistical approach for reducing a large set of mostly inter-correlated variables to a smaller set of variables, or factors. It is also used to explain the variables in terms of their common underlying factors (Hair et al., 1998). Malhotra (2006) notes that factor analysis is an interdependence technique: all variables are examined together, without distinguishing between dependent and independent variables.

Conducting Factor Analysis

1. Formulate the Problem

In this research, the objective is to determine the factors that influence customers' satisfaction with their internet service provider in Malaysia, such as Streamyx, Digi Broadband, Maxis Broadband, P1 and others (Malaysia Central, 2011). The mall-intercept method was used to interview a total of 30 respondents at Mid Valley Megamall. Questionnaires were distributed, and respondents were asked to indicate their degree of agreement with each statement on a scale ranging from "very strongly disagree" to "very strongly agree".

[Figure 1.1: Input in SPSS]

2. Is the Data Appropriate?

a) The Correlation Matrix

Based on the data above, the correlation matrix was run to examine whether factor analysis is appropriate. Variables ought to be inter-related to be suitable for factor analysis; in other words, if the variables have nothing in common, they cannot be reduced to common factors. Hair et al. (1998) give a rule of thumb of a considerable correlation of at least 0.3, while Field (2009) emphasizes that if any correlation is greater than 0.9, the variables concerned may need to be omitted. According to the results, V3 (quality support), V5 (sincere interest in problem solving), V6 (prompt service), V7 (willingness to help), V8 (politeness) and V9 (knowledgeable) have high correlations of more than 50% with one another, so all six variables may be inter-related under the same factor.

[Figure 1.2: Correlation Matrix]

Figure 1.3: KMO and Bartlett's Test

| Kaiser-Meyer-Olkin Measure of Sampling Adequacy | | .55 |
| Bartlett's Test of Sphericity | Approx. Chi-Square | 166.649 |
| | df | 36 |
| | Sig. | .000 |

3. Method of Factor Analysis

After confirming the suitability of the data for factor analysis, the right method of factor analysis is selected. The two approaches are principal components analysis and common factor analysis. Malhotra (2006) indicates that principal components analysis takes the total variation in the data into account when generating the factors. In this research, principal components analysis is employed, as the objective is to identify the smallest number of factors that explains the maximum variance.

a) Communalities

Figure 1.4 below shows the communalities before and after extraction. The initial assumption of principal components analysis is that all variance is common; hence, the communalities equal 1 before extraction (Field, 2009). The communality after extraction is the proportion of each variable's variance that is shared with the other variables: for example, 72.3% of the variance associated with V1 is common or shared. In this research, all nine variables have high extracted communalities, so they fit well with the factor solution and none of them is dropped from the analysis. Field (2009) suggests that communalities after extraction should be above 0.5.

Figure 1.4: Communalities

| | Initial | Extraction |
| V1 | 1.000 | .723 |
| V2 | 1.000 | .894 |
| V3 | 1.000 | .760 |
| V4 | 1.000 | .837 |
| V5 | 1.000 | .775 |
| V6 | 1.000 | .788 |
| V7 | 1.000 | .807 |
| V8 | 1.000 | .771 |
| V9 | 1.000 | .818 |
Extraction Method: Principal Component Analysis.
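For readers who want to reproduce these checks outside SPSS, the sketch below shows the core computations in Python: the correlation matrix, Bartlett's test of sphericity, principal component extraction, and the communalities. The data matrix here is a random placeholder, since the actual survey responses are not reproduced in this essay.

```python
import numpy as np
from scipy.stats import chi2

# Placeholder for the 30x9 response matrix (rows = respondents,
# columns = V1..V9); the real survey data would be loaded here.
X = np.random.default_rng(0).normal(size=(30, 9))
n, p = X.shape

R = np.corrcoef(X, rowvar=False)                  # 9x9 correlation matrix

# Bartlett's test of sphericity: H0 says R is an identity matrix,
# i.e. the variables share nothing and factor analysis is inappropriate.
stat = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
df = p * (p - 1) / 2
p_value = chi2.sf(stat, df)

# Principal component extraction: eigendecomposition of R,
# sorted so the largest eigenvalue comes first.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = int(np.sum(eigvals > 1.0))                    # Kaiser criterion
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])  # component loadings

# Communality = sum of squared loadings on the retained components
# (1.0 before extraction, since PCA assumes all variance is common).
communalities = np.sum(loadings**2, axis=1)
```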
b) Eigenvalues

In addition to extracting the principal components, the researcher takes the eigenvalues into account, since an eigenvalue signifies how much of the total variance each factor explains (Malhotra, 2009). At this stage, the initial eigenvalues are examined. At the preliminary stage there are 9 linear components, i.e. as many factors as there are variables. The results also display the percentage of variance explained; for example, factor 1 explains 48.796% of the total variance. SPSS then removes the components with eigenvalues less than 1, leaving only 3 factors, in the column labeled 'Extraction Sums of Squared Loadings'. Subsequently, SPSS generates a third column with the rotated eigenvalues; before rotation, factor 1 explained a greater share of the variance. In the next steps, the researcher considers several criteria based on the eigenvalues in order to decide how many factors should be used in this analysis.

[Figure 1.5: Total Variance Explained — columns: Component, Initial Eigenvalues, Extraction Sums of Squared Loadings, Rotation Sums of Squared Loadings]

4. Determine the Number of Factors

At this point, the researcher determines how many factors are needed to explain the maximum variance. Malhotra (2009) specifies six methods, based on the scree plot, eigenvalues, percentage of variance, significance tests, split-half reliability, and a priori determination. In this analysis, the researcher has chosen four methods to determine the number of factors.

a) A Priori Determination

This method is used when the researcher has prior knowledge of the number of factors that can be extracted. Here, the outcome may be observed from the correlation matrix (Figure 1.2): V3, V5, V6, V7, V8 and V9 are likely to form one factor, and the remaining variables may be grouped into further factors. The researcher may also specify the number of factors he or she wants; in this case, three factors are identified.

b) Determination Based on Eigenvalues

Under this criterion, Aaker et al. (2009) state that factors with eigenvalues less than 1.0 are ignored and excluded from the model. Hence, based on Figure 1.5, only 3 factors are retained. Malhotra (2009) adds that a factor with an eigenvalue less than 1.0 explains no more variance than a single variable.

c) Determination Based on Scree Plot

A scree plot plots the eigenvalues on the y-axis against the number of factors on the x-axis. In general, the plot shows a distinct break, with a steep slope over the large eigenvalues and the curve tailing off at an inflexion point into the small eigenvalues. The inflexion point represents the number of factors to be included, according to experimental evidence (Malhotra, 2009). However, Malhotra (2009) notes that the scree plot may suggest more factors than the eigenvalue criterion. Here, Figure 1.6 illustrates that three factors should be included in the model.

[Figure 1.6: Scree Plot]

d) Determination Based on Percentage of Variance

Under this method, the number of factors extracted should be such that the cumulative percentage of variance reaches a satisfactory level, namely 60 percent (Aaker et al., 2009). Referring to Figure 1.5, the satisfactory level (60%) is reached at component 2, with a cumulative 67.476%, so this method indicates two factors.
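Continuing the Python sketch above, the scree plot and the cumulative percentage of variance can be reproduced with matplotlib; the eigvals array is the one computed earlier.

```python
import numpy as np
import matplotlib.pyplot as plt

# eigvals: sorted eigenvalues from the previous sketch
components = np.arange(1, len(eigvals) + 1)

plt.plot(components, eigvals, "o-")
plt.axhline(1.0, linestyle="--")     # Kaiser cutoff: eigenvalue = 1
plt.xlabel("Component number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()

# Eigenvalues of a correlation matrix sum to the number of variables,
# so eigenvalue / p is each component's share of the total variance.
pct_var = eigvals / len(eigvals) * 100
cum_var = np.cumsum(pct_var)
n_factors_60 = int(np.argmax(cum_var >= 60) + 1)  # first component past 60%
```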
Figure 1.7 summarizes the results of the methods above. Three factors is a reasonable conclusion for the number of factors to include in the model: the eigenvalue criterion is given greater weight than the other methods, even though the cumulative-percentage-of-variance criterion indicates two factors.

Figure 1.7: Summary of Factor Number Determination

| Method | Observation | Decision |
| A Priori | Correlation matrix | 3 factors |
| Eigenvalue | Eigenvalue > 1.0 | 3 factors |
| Scree Plot | Distinct break of scree | 3 factors |
| Cumulative percentage variance | Marginal % gain in variance | 2 factors |

5. Factor Rotation

Figure 1.8 shows the component matrix, also known as the unrotated factor matrix, which denotes the relationships between the factors and the individual variables. For example, the large loading of 0.836 for V1 (connection speed) indicates that connection speed is closely related to that factor. However, because the factors are correlated with many variables, the unrotated factor matrix is rarely easy to interpret even though it shows the correlations between the factors and the variables; for instance, V2 and V4 each load on two factors, which makes the factors difficult to interpret. Interpretation becomes easier once the factors are rotated.

Figure 1.8: Component Matrix(a)

| | Component 1 | Component 2 | Component 3 |
| V1 | | .836 | |
| V2 | | .799 | -.493 |
| V3 | .800 | | |
| V4 | | .508 | .757 |
| V5 | .768 | | .416 |
| V6 | .869 | | |
| V7 | .897 | | |
| V8 | .874 | | |
| V9 | .901 | | |
Extraction Method: Principal Component Analysis.
a. 3 components extracted.

So what does it mean to rotate the factors? The rotation is known as orthogonal rotation when the axes are kept at right angles (Malhotra, 2009). In this case, the researcher has chosen the most common method, the varimax procedure, which improves interpretability by minimizing the number of variables with high loadings on each factor. In both the unrotated and the rotated matrices, loadings below 0.3 are suppressed as insignificant for the model.

Figure 1.9 shows the rotated component matrix, from which three factors are extracted. V3, V5, V6, V7, V8 and V9 clearly load highly on Factor 1; although V3 and V5 load on more than one factor, their loadings on Factor 1 are noticeably higher. The loading for V1 is less clear-cut, but since V1 (connection speed) appears conceptually related to V2 (easy to connect), V1 is assigned with its higher loading, together with V2, to Factor 2. That leaves V4 (billing) as the only variable loading on Factor 3.

Figure 1.9: Rotated Component Matrix(a)

| | Component 1 | Component 2 | Component 3 |
| V1 | | .648 | .40 |
| V2 | | .942 | |
| V3 | .813 | .308 | |
| V4 | | | .910 |
| V5 | .753 | -.339 | .307 |
| V6 | .872 | | |
| V7 | .898 | | |
| V8 | .873 | | |
| V9 | .898 | | |
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 5 iterations.
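Varimax rotation itself is straightforward to implement. Below is a minimal sketch of the classic iterative algorithm, applied to the loading matrix computed earlier; note that SPSS additionally applies Kaiser normalization (rescaling each row of loadings by its communality before rotating), which this sketch omits for brevity.

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonally rotate a p x k loading matrix L by the varimax criterion."""
    p, k = L.shape
    R = np.eye(k)                      # accumulated rotation matrix
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R                     # current rotated loadings
        # SVD of the gradient of the varimax criterion
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - (gamma / p) * Lr @ np.diag(np.sum(Lr**2, axis=0)))
        )
        R = u @ vt
        d_old, d = d, np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:   # criterion stopped improving
            break
    return L @ R

rotated = varimax(loadings)            # loadings from the extraction sketch
```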
6. Interpret Factors

Interpretation is done by identifying the variables that have high loadings on the same factor (Malhotra, 2009). From the rotated component matrix, Factor 1 has high coefficients for V3 (quality support), V5 (sincere interest in solving problems), V6 (prompt service), V7 (willingness to help), V8 (politeness) and V9 (knowledgeable). All of these variables relate to customer service, which is the supporting service element; therefore, Factor 1 is labeled Supporting Service. Factor 2 has high coefficients for V1 (connection speed) and V2 (easy to connect); both variables are the core service of an internet service provider (ISP), so Factor 2 is labeled Core Service. Lastly, Factor 3 only has a high coefficient for V4 (billing), and is thus labeled Facilitating Service (see Figure 1.10).

Figure 1.10: Factor Labeling

| Supporting Service | Core Service | Facilitating Service |
| V3 (quality support) | V1 (connection speed) | V4 (billing) |
| V5 (sincere interest in solving problems) | V2 (easy to connect) | |
| V6 (prompt service) | | |
| V7 (willingness to help) | | |
| V8 (politeness) | | |
| V9 (knowledgeable) | | |

7. Determine the Model Fit

Determining the model fit is the last step of factor analysis. This is done by examining the differences between the observed correlations and the correlations reproduced from the factor solution; the differences are known as residuals (Malhotra, 2009). Malhotra (2009) indicates that if more than 50% of the residuals are large, the factor model is a poor-fitting model. Examining the upper triangle of residuals in Figure 1.11, there are 21 (58%) nonredundant residuals with absolute values greater than 0.05. As a result, the factor model fits the data poorly and needs to be reconsidered. This issue may be due to the small sample size of 30 or to respondent bias; hence, the model can be improved by increasing the sample size.

[Figure 1.11: Reproduced Correlations — a. Reproduced communalities. b. Residuals are computed between observed and reproduced correlations. There are 21 (58.0%) nonredundant residuals with absolute values greater than 0.05.]
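This residual check can also be reproduced from the loadings: the reproduced correlation matrix is the loading matrix multiplied by its own transpose, and the residuals are its differences from the observed correlations. A sketch, continuing from the arrays computed earlier:

```python
import numpy as np

# loadings: the p x k loading matrix; R: the observed correlation matrix
R_reproduced = loadings @ loadings.T      # reproduced correlations
residuals = R - R_reproduced              # observed minus reproduced

# Count nonredundant residuals (upper triangle, diagonal excluded)
# with absolute value above 0.05, as SPSS reports in Figure 1.11.
upper = residuals[np.triu_indices_from(residuals, k=1)]
n_large = int(np.sum(np.abs(upper) > 0.05))
print(f"{n_large} ({100 * n_large / upper.size:.1f}%) residuals exceed 0.05")
```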
Conclusion

In conclusion, factor analysis has reduced the data and summarized the variables into three factors: supporting service, core service and facilitating service. The method used, principal components analysis, determined the minimum number of factors (three) that still explains as much of the variance as possible.

Regression Analysis

Introduction

Regression analysis is a statistical technique for analyzing the relationship between a metric dependent variable (criterion variable) and one or more independent variables (predictor variables) (Malhotra, 2009). The aim of the procedure is to build a regression model, or equation, relating the dependent variable to the independent variable(s) (Lind et al., 2010). Multiple regression analysis is used when there is more than one independent variable. With the factors developed in the factor analysis, the relationship between the dependent variable, V10 (customer satisfaction), and the three independent variables (core services, supporting services and facilitating services) can now be examined. In general, a multiple regression equation for predicting Y can be expressed as follows (Berger, 2003):

Y = β0 + β1X1 + β2X2 + … + βkXk + e

Conducting Multiple Regression Analysis

1. State the Objectives

The objective of this analysis is to predict customer satisfaction towards the ISP (Y) with respect to the following independent variables (Xs):

i. Supporting Services
ii. Core Services
iii. Facilitating Services

2. Plot the Scatter Diagram (Linearity)

The relationship between variables can be examined through a scatter diagram, which plots the dependent variable on the y-axis against the independent variable on the x-axis. Plotting a scatter diagram may draw the researcher's attention to a pattern in the data and identify possible problems (Malhotra, 2009).

[Figure 2.1: Scatter Plot (Customer Satisfaction against Supporting Services)]

Figure 2.1 shows little relationship between customer satisfaction and supporting services, as it is difficult to draw a proper line through the plot.

[Figure 2.2: Scatter Plot (Customer Satisfaction against Core Services)]

Figure 2.2 shows a similar result: there is little relationship between customer satisfaction and core services, and the outliers appear more extreme in this case.

[Figure 2.3: Scatter Plot (Customer Satisfaction against Facilitating Service)]

Figure 2.3 shows a rather better result than the previous two, in that a straight-line relationship can be drawn between customer satisfaction and facilitating services.

Based on the three scatter plots, there is no clear relationship with customer satisfaction. However, the plots may fail to show a clear linear relationship simply because the sample size is only 30; different results might be obtained from another 30 respondents. In addition, this issue may be due to the use of convenience sampling and to respondent bias.

Figure 2.4 below examines the relationships among the three independent variables. If any variable were highly related to another, one of them could be removed to produce a better regression result. According to the Pearson correlations below, the relationship between supporting services and core services is negligible at only 0.005. Likewise, supporting services has a small negative relationship with facilitating service. Lastly, the correlation between core services and facilitating service is about 0.273 (27%), which is still only slightly related.

Figure 2.4: Correlations

| | | Supporting_Services | Core_Services | Facilitating_Service |
| Supporting_Services | Pearson Correlation | 1 | .005 | -.056 |
| | Sig. (2-tailed) | | .978 | .769 |
| | N | 30 | 30 | 30 |
| Core_Services | Pearson Correlation | .005 | 1 | .273 |
| | Sig. (2-tailed) | .978 | | .144 |
| | N | 30 | 30 | 30 |
| Facilitating_Service | Pearson Correlation | -.056 | .273 | 1 |
| | Sig. (2-tailed) | .769 | .144 | |
| | N | 30 | 30 | 30 |

Figure 2.5 is used to cross-check the variables entered into SPSS. The listed variables are the independent variables expected to influence the dependent variable, customer satisfaction. Incorrect input of the independent and dependent variables would cause errors in the output and mistaken interpretation.

Figure 2.5: Variables Entered/Removed

| Model | Variables Entered | Variables Removed | Method |
| 1 | Facilitating_Service, Supporting_Services, Core_Services(a) | . | Enter |
a. All requested variables entered.
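The correlation table and scatter plots can be reproduced in Python as below. The data frame here is a placeholder with illustrative column names; the real values would be the factor scores saved from the factor analysis together with the V10 satisfaction ratings.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from itertools import combinations
from scipy.stats import pearsonr

# Placeholder scores for 30 respondents (column names are illustrative)
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(30, 4)),
                  columns=["Supporting_Services", "Core_Services",
                           "Facilitating_Service", "Customer_Satisfaction"])

# Pairwise Pearson correlations with two-tailed p-values (cf. Figure 2.4)
for a, b in combinations(df.columns[:3], 2):
    r, p = pearsonr(df[a], df[b])
    print(f"{a} vs {b}: r = {r:.3f}, p = {p:.3f}")

# Scatter plot of the dependent variable against one predictor (cf. Figure 2.1)
df.plot.scatter(x="Supporting_Services", y="Customer_Satisfaction")
plt.show()
```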
3. Estimate the Regression Model

At this point a stepwise procedure, in which independent variables are entered into or removed from the regression equation one at a time, is used. The available methods include forward inclusion, backward elimination and the stepwise solution. Backward elimination is employed here: all independent variables are first included in the equation and then removed one at a time (Malhotra, 2009).

a) Overall Fit of the Model

Figure 2.6 is the model summary, in which R denotes the correlation between the independent variables and the dependent variable (Malhotra, 2009). The result shows a correlation of 66.5% between the independent variables (facilitating service, supporting services and core services) and customer satisfaction. The R-square indicates how much of the variation in the dependent variable the independent variables explain: the value of 0.443 means that 44.3% of the total variation in customer satisfaction is accounted for by variation in the predictors. The adjusted R-square shows the predictors explaining 37.9%, which means 62.1% of the variation in customer satisfaction is unexplained; hence, there are likely to be other variables that influence satisfaction. This may be due to the small sample size, and increasing the sample size might yield a higher and more stable fit.

Figure 2.6: Model Summary(b)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | .665(a) | .443 | .379 | .83194 |
a. Predictors: (Constant), Facilitating_Service, Supporting_Services, Core_Services
b. Dependent Variable: Customer_Satisfaction

b) ANOVA Table

With the model in hand, the researcher uses the analysis of variance (ANOVA) to check whether the model is a significantly better "best guess" of the outcome than the mean alone (Field, 2008). The ANOVA gives F = 6.889 at a significance level of 0.001, which shows that the model predicts customer satisfaction significantly well.

[Figure 2.7: ANOVA(b) — F = 6.889, Sig. = .001; b. Dependent Variable: Customer_Satisfaction]

c) Regression Coefficients

Figure 2.8 shows the details of the model parameters, with the b values, beta values and the significance of each variable (Field, 2009).

[Figure 2.8: Regression Coefficients(a)]

Before building an equation from the regression coefficients, the significance of each variable is considered. The t-statistic tests the null hypothesis that b is zero; if the researcher rejects this hypothesis, the independent variable contributes significantly to customer satisfaction. The variable facilitating service has a significance level of 0.30, above the 0.05 cutoff for the 95% confidence level, which indicates that facilitating service is not a significant predictor of customer satisfaction. Hence, this variable is removed and the regression analysis rerun with the remaining two variables, as shown in Figure 2.9; a sketch of this fit-and-eliminate step in Python follows the table.

Figure 2.9: Variables Entered/Removed (with Facilitating Service Removed)

| Model | Variables Entered | Variables Removed | Method |
| 1 | Core_Services, Supporting_Services(a) | . | Enter |
a. All requested variables entered.
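The fit and the elimination step can be reproduced with statsmodels, continuing with the placeholder data frame from the earlier sketch:

```python
import statsmodels.api as sm

# Full model: satisfaction regressed on all three factors
y = df["Customer_Satisfaction"]
X = sm.add_constant(df[["Supporting_Services", "Core_Services",
                        "Facilitating_Service"]])
model = sm.OLS(y, X).fit()
print(model.summary())    # reports R-square, the ANOVA F-test and coefficients

# One step of backward elimination: if the least significant predictor
# is not significant at the 0.05 level, drop it and refit.
pvals = model.pvalues.drop("const")
if pvals.max() > 0.05:
    X = X.drop(columns=[pvals.idxmax()])
    model = sm.OLS(y, X).fit()
    print(model.summary())
```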
After removing facilitating service from the model, R shows that core and supporting services have a 62.5% correlation with customer satisfaction. Figure 2.10 shows a slight decrease in R and R-square compared with the model before removal, which suggests that facilitating service explained little of customer satisfaction.

a. Overall Fit of the Model with Facilitating Service Removed

Figure 2.10: Model Summary with Facilitating Service Removed

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | .625(a) | .390 | .345 | .85401 |
a. Predictors: (Constant), Core_Services, Supporting_Services
b. Dependent Variable: Customer_Satisfaction

b. ANOVA Table with Facilitating Service Removed

Figure 2.11 shows a similar result to that obtained before omitting facilitating service: the model predicts the outcome of customer satisfaction well.

[Figure 2.11: ANOVA with Facilitating Service Removed — b. Dependent Variable: Customer_Satisfaction]

c. Regression Coefficients with Facilitating Service Removed

After deleting facilitating service (Figure 2.12), the relationships between customer satisfaction and the independent variables (supporting services and core services) can be read from the b values, which have changed compared with the earlier result. Core services has a beta value of 0.492, larger than the 0.366 of supporting services, indicating that the core service (the connection-related variables) has a greater impact on customer satisfaction than the supporting service (customer service). In addition, both independent variables have a positive relationship with customer satisfaction, since their b values are positive: as supporting services or core services increase, customer satisfaction increases as well.

Moreover, the Collinearity Statistics column helps to identify multicollinearity. As a rule of thumb, if the tolerance is below 0.20, multicollinearity is a problem; the higher the correlation between independent variables, the smaller the tolerance becomes (Field, 2009). The variance inflation factor (VIF) is always greater than or equal to 1, and a high VIF indicates high multicollinearity and instability of the b and beta coefficients (Field, 2009). After deleting the unwanted variable, the tolerance and VIF values improved; hence, multicollinearity is not a problem in this case.

[Figure 2.12: Regression Coefficients(a) with Facilitating Service Removed]

d. Collinearity Diagnostics

Figure 2.13 also helps indicate whether there is a multicollinearity problem. There is no very low eigenvalue paired with a high condition index; hence, there is no indication of multicollinearity.

[Figure 2.13: Collinearity Diagnostics(a) — columns: Model, Dimension, Eigenvalue, Condition Index, Variance Proportions]

4. Interpret the Regression Variate

With the current results, both variables contribute significantly to the model, so a linear equation can be written with two independent variables:

Y = β0 + β1X1 + β2X2

Substituting the β values from Figure 2.12 gives the model:

Customer Satisfaction = 0.652 + 0.366 (Supporting Services) + 0.492 (Core Services)

Furthermore, the model implies that even with no supporting services and no core services there would be a minor satisfaction level of 0.652, the constant β0.

5. Validate the Results

To validate the results of the regression analysis, the plot below is generated: the examination of residuals provides useful insight into the appropriateness of the underlying assumptions and the fit of the regression model. Figure 2.14 shows fairly clear linearity, so the results of the regression may be valid. A sketch of these diagnostics follows below.

[Figure 2.14]
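The collinearity statistics and the residual check can likewise be reproduced from the refitted statsmodels result above, as a rough sketch:

```python
import matplotlib.pyplot as plt
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Tolerance and VIF for the remaining predictors (VIF = 1 / tolerance;
# values close to 1 indicate little multicollinearity).
exog = model.model.exog                 # design matrix, constant included
for i, name in enumerate(model.model.exog_names):
    if name != "const":
        vif = variance_inflation_factor(exog, i)
        print(f"{name}: VIF = {vif:.3f}, tolerance = {1 / vif:.3f}")

# Residuals against fitted values: a patternless horizontal band supports
# the linearity and constant-variance assumptions (cf. Figure 2.14).
plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0.0, linestyle="--")
plt.xlabel("Fitted customer satisfaction")
plt.ylabel("Residual")
plt.show()
```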
Conclusions

Through the regression analysis, the following model is generated:

Customer Satisfaction = 0.652 + 0.366 (Supporting Services) + 0.492 (Core Services)

In conclusion, based on the sample of 30 respondents, facilitating services (billing) is not a significant influence on customer satisfaction, while supporting services and core services are important in explaining the variation in customer satisfaction with an internet service provider. The model may not generalize to the population, however, as the sample size is small and the data were collected by convenience sampling.