Table 2: Item Correlation to the Loyalty Category
This category most closely corresponds to last year's "Helpfulness." Before conducting our pilot survey, we considered dropping this category because the related questions sent up red flags for the registrars. Registrars can provide clients with only very limited guidance, for obvious reasons: Auditors are there to evaluate the systems clients have in place, and if they were also to counsel clients on how to fix problems, a definite conflict of interest would arise. Registrars are sticklers about this rule, as they should be, and get a little uncomfortable whenever discussion turns to assisting clients. However, after our presurvey conversations with clients this year, two things became immediately obvious. First, registrars have been proactive in educating their clients about the limits of the assistance they can offer. We received comments such as "There's been a dramatic change in registrars' approach to audits, that is, offering more assistance and being more helpful without consulting." Last year, no one made that type of statement. Second, clients want auditors to provide some sort of value beyond the audit itself, as evidenced by the high correlation of this category to overall customer loyalty. Additionally, we don't believe that the two survey items imply that auditors are telling clients how to fix nonconformances or providing any assistance that would help them pass the audit.
Consistent Interpretation
The most common complaints about the auditing experience concern inconsistent standards interpretation, whether between the auditor and the client, between auditors, or between the auditor and the registrar. The mean for this category is the lowest in the survey, indicating that standards interpretation may be an issue registrars need to address and that clients need to ensure their registrars fully understand their business. Although the mean is low, so is the correlation to customer loyalty, implying that registrars might be better off addressing issues related to the more highly correlated categories.
Bias and other contributing factors
One of the concerns registrars raised prior to this year's survey was that results might differ between companies for which ISO 9000 or QS-9000 registration was mandatory (e.g., tier-one automotive suppliers) and those that chose registration voluntarily. Registrars also wondered whether the length of a company's relationship with its registrar had any effect. We included questions addressing both issues in the questionnaire and found that the degree of customer loyalty was unrelated both to the length of the client-registrar relationship and to whether registration was mandatory. Although we didn't check for it this year, last year's survey indicated that there was also no correlation between client size or registrar size and customer satisfaction. We also had some concern about self-selection bias; that is, those who chose to turn in a survey might have different attitudes than those who didn't (i.e., perhaps we heard only from those who were very happy with their registrars). To determine whether this had occurred, we contacted 60 randomly chosen registered companies that hadn't responded and persuaded them to complete a survey. The results show little difference between these nonrespondents and the main body of respondents, which leads us to believe that the survey sample is a good representation of the whole.
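The article doesn't name the statistical test used to compare nonrespondents to respondents; as a hedged illustration only, the sketch below runs a two-sample Welch t-test on placeholder data (all values here are hypothetical, not actual survey responses).

# Minimal sketch, assuming a two-sample Welch t-test on category scores.
# The data below are placeholders on the survey's five-point scale.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
respondents = rng.integers(3, 6, size=1853)    # main body of responses
nonrespondents = rng.integers(3, 6, size=60)   # follow-up sample

t_stat, p_value = stats.ttest_ind(respondents, nonrespondents,
                                  equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
# A large p-value would be consistent with "little difference" between
# the groups, supporting the claim of a representative sample.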
You can't watch a television news program or pick up a newspaper without being exposed to the results of some sort of survey. Public opinion polls abound, particularly in election years. Surveys also play an important role in consumer decisions; Consumer Reports magazine is a popular example of how survey data is used to show consumer perceptions of various products. Many people, however, don't understand the underlying principles of surveys. Worse yet, others think they understand survey results and then draw unwarranted conclusions. What follows is a simple description of how the statistical analysis was performed on this survey and what the numbers mean.

First, we need to describe the importance of sampling. Sampling is a key element of almost all surveys. Advertisers spend billions of dollars each year to place ads on specific television and radio stations at specific times based only on the viewing or listening habits of a small number of people. These few thousand samples gathered by Nielsen or Arbitron represent millions of viewers and listeners. In our survey, the 1,853 responses represent the opinions of approximately 30,000 registered companies in the United States and Canada.

In this research, we had to use convenience samples to make statistical inferences about the underlying population. In convenience sampling, respondents select themselves for inclusion in the survey. We used convenience sampling because it's not feasible to conduct pure random sampling, stratified random sampling, cluster sampling or systematic sampling due to the prohibitive amounts of time and money required. Convenience samples have the advantage of relatively easy sample selection (as when we faxed all 30,000 registered companies in our database) and data collection (as when the companies submitted their responses). However, it's impossible to evaluate the "goodness" of the sample in terms of how well it represents the population; there is no statistically sound procedure for making an inference about the quality of the sample results.

Therefore, it's important to realize that the sample results provide only estimates of the values of the population characteristics. We don't expect a sample mean (average) for a registrar to exactly equal the actual mean for that registrar because only an unpredictable portion of each registrar's client population responds. We did, however, attempt to enhance the reliability of the survey results with phone surveys of nonrespondents to compare their choices to those of the respondents, and we found no significant difference. With this type of care, the sample results should provide "good" estimates of the population characteristics for a registrar.

In making our statistical inferences, we relied on an important and frequently used tool, the central limit theorem. It states that, in selecting random samples from a population, the sampling distribution of the sample mean can be approximated by a normal probability distribution as the sample size becomes large (i.e., approaches or exceeds 30). Whenever the population distribution is mound-shaped and symmetrical, sample sizes as small as five to 10 can be enough for the central limit theorem to apply (the basis for small samples in SPC). However, if the population distribution is highly skewed and clearly non-normal, larger sample sizes are needed. For this survey, we allowed the sample size to be as small as 25 because the confidence interval didn't change significantly from 25 samples to 30 samples.
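The sketch below, which is not part of the original analysis, shows the central limit theorem at work: even when the population is highly skewed, the means of repeated samples cluster symmetrically around the population mean as the sample size grows.

# Minimal sketch: sampling distribution of the mean from a skewed population.
import numpy as np

rng = np.random.default_rng(42)
population = rng.exponential(scale=2.0, size=100_000)  # highly skewed

def sampling_distribution(n, trials=10_000):
    """Means of `trials` random samples of size n from the population."""
    samples = rng.choice(population, size=(trials, n))
    return samples.mean(axis=1)

for n in (5, 25, 30):
    means = sampling_distribution(n)
    # As n grows, the sample means spread less and become more
    # symmetrical around the true mean (2.0), despite the skewed source.
    print(f"n={n:2d}: mean of means={means.mean():.3f}, "
          f"std of means={means.std():.3f}")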
Any registrar for which we received fewer than 25 client responses was dropped from the survey. Summary statistics were obtained for the remaining responses and used to calculate confidence intervals for each question, each category and each registrar included in the sample.

The mean response and the median response measure the average response to each question or each category (groups of similar questions). The mean response was obtained by adding up all the observations and dividing by the number of observations. A regression algorithm was then used to impute values for questions that a respondent failed to answer. The mean for each registrar is represented in our charts as the red vertical line.

A confidence interval on the mean response is an interval within which we expect (with some degree of confidence) the true mean response for the entire population of responses to a question to fall. The confidence interval consists of the mean response plus and minus a margin of error that indicates how precisely we have estimated that mean response. Adding and subtracting the margin of error from the average response yields upper and lower limits for each average response. The standard error of the mean is used in calculating the margin of error. We used a 95-percent confidence interval for our results, meaning we are 95-percent confident that the interval between the upper and lower ends of the error bar contains the true mean that would result if every response in the population were analyzed. This is an approximate explanation of the confidence interval. The more exact interpretation holds that if 95-percent confidence intervals are calculated for many samples of responses, then 95 percent of those confidence intervals will include the true population mean response. The confidence interval is represented as horizontal red bars extending from the mean (vertical red bar).

The correlation coefficient used in Tables 1 and 2 measures the strength of the linear association between two variables and takes on values between -1 and +1. Values close to +1 indicate a strong positive linear relationship, values close to -1 indicate a strong negative linear relationship, and values near zero indicate the absence of a linear relationship. For example, the correlation coefficient between the loyalty category and category 1 is 0.731, indicating a strong positive linear association: registrars whose clients rate them highly on category 1 tend to be rated highly on loyalty as well. (A correlation coefficient measures the strength of an association, not a unit-for-unit rate of change; the latter would be a regression slope.) Therefore, category 1 may help explain the behavior of the loyalty category. However, we cannot assume that category 1 causes the behavior of the loyalty category; the existence of a linear relationship between the two categories doesn't imply that one causes the other to occur.

Focus groups of Quality Digest subscribers were asked to group survey questions that they believed asked about similar things. Based on the results obtained from the focus groups, the questions were grouped into categories as outlined in the analysis, and their reliabilities were calculated. The reliability for a category is the degree to which that category consistently produces the same measurement on repeated trials. It's a measure of how well the category consistently measures what's being asked by the questions in that category.
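As a hedged illustration of the two calculations just described, the sketch below computes a 95-percent confidence interval (mean plus and minus a t-based margin of error built on the standard error of the mean) and a Pearson correlation coefficient. The data are placeholders, not actual survey responses.

# Minimal sketch: confidence interval and correlation on hypothetical data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
responses = rng.integers(1, 6, size=40)  # one registrar, one question, 1-5 scale

# 95% CI: mean +/- (t critical value * standard error of the mean).
n = len(responses)
mean = responses.mean()
sem = stats.sem(responses)                   # standard error of the mean
margin = stats.t.ppf(0.975, df=n - 1) * sem  # margin of error
print(f"mean={mean:.2f}, 95% CI=({mean - margin:.2f}, {mean + margin:.2f})")

# Pearson correlation between a category score and the loyalty category.
category1 = rng.integers(1, 6, size=40).astype(float)
loyalty = category1 + rng.normal(0, 1, size=40)  # loosely related by construction
r, p = stats.pearsonr(category1, loyalty)
print(f"r = {r:.3f}  (values near +1 imply a strong positive linear relationship)")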
For example, the reliability of 0.9144 for the customer loyalty category implies that 91.44 percent of the time, the loyalty category consistently measures the information being asked for in the two questions that make up the category. The larger the reliability, the more confident we can be that a category is measuring what it's intended to measure.
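The article doesn't name the reliability statistic used; Cronbach's alpha is the usual measure of internal consistency for grouped questions, so the sketch below computes it for a hypothetical two-item category (all data invented for illustration).

# Minimal sketch, assuming Cronbach's alpha as the reliability statistic.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x questions matrix of scores."""
    k = items.shape[1]                         # number of questions
    item_vars = items.var(axis=0, ddof=1)      # variance of each question
    total_var = items.sum(axis=1).var(ddof=1)  # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(2)
base = rng.integers(1, 6, size=200).astype(float)   # shared underlying attitude
q1 = np.clip(base + rng.normal(0, 0.5, 200), 1, 5)  # two items tracking it
q2 = np.clip(base + rng.normal(0, 0.5, 200), 1, 5)
print(f"alpha = {cronbach_alpha(np.column_stack([q1, q2])):.4f}")
# An alpha near 1 means the two items move together, i.e., the category
# consistently measures a single underlying attribute.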
This year's survey technique was designed and overseen by Thomas Pyzdek, statistician and president of the International Quality Federation, and conducted by Quality Digest. A pilot survey of 26 items was culled from about 125 statements generated by presurvey interviews with ISO 9000-registered clients. Survey questions were chosen and categorized based on two manual categorization techniques and verified with a statistical reliability technique that told us how well each group of questions measures what its category claims to measure. The category names, Auditor Interpersonal Relations, Auditing Value Added, Consistent Interpretation, Administration and Communication, came from the manual categorization techniques.

Our Web-based pilot survey was built using eListen software from Scantron Technologies and hosted by Scantron using eListen's autohosting feature. An invitation to participate in the pilot Web survey was sent to 2,000 randomly selected registered companies. The results were then analyzed by W. B. Fredenberger, Ph.D., and Claude R. Superville, Ph.D., of the Management Department at Valdosta State University in Valdosta, Georgia. Parallel analysis was performed by Scantron Technologies. The results were used to test and reduce the number of survey items to 17.

The final survey consists of the 17 questions plus two demographic questions requested by registrars. This Web-based survey was also hosted using eListen. An invitation was faxed to about 30,000 registered companies in the United States and Canada. Each respondent evaluated the statements on a five-point scale from "strongly agree" to "strongly disagree," with a sixth response for "doesn't apply." Clients that didn't have access to the Web were able to request a copy of the survey by fax. The final data analysis was performed at Valdosta State University by Fredenberger and Superville using the SPSS and Minitab statistics packages.

Although we received data on the level of customer service for about 50 different registrars, there is a statistically valid sample for only 25 registrars. To include a registrar in the analysis, we required a minimum of 25 customer responses. As described in "How we conducted the survey" on page 32, 30 responses is the conservative cutoff point; however, analysis of the confidence intervals for each registrar led us to feel comfortable with 25 responses. The complete list of registrars for which we received data is shown on page 37. As a service to the registrars, each will receive the data collected from its clients, even if the number of respondents was too low for inclusion in the survey.
Copyright 2000 QCI International. All rights reserved.