
In last month’s article, “ANOVA and the Process Behavior Chart,” we saw how both techniques use the same basic comparison to answer completely different questions. Here, we’ll look at a case history where both techniques were used.
A physical property of a mass-produced item was important to its functionality. The production of these items involved three steps: batches of compound were mixed, molded into parts, and coated. In an effort to maximize a product characteristic, an experiment involving three production variables was carried out. Factor A was studied at two levels; Factor B was studied at three levels; and Factor C was studied at five levels. So the fully crossed study required 30 experimental runs. For each run, a sample of 40 parts was selected from the output and each part was measured. Thus the manufacturer had 30 treatments with 40 observations per treatment for a total of 1,200 observations.
For each treatment, they plotted a histogram and computed the average and standard deviation. This information was combined with the experimental conditions, processing dates, and original data in a one-page summary for each treatment.
Figure 1: Summary sheet for one treatment
The one-way ANOVA
They began their analysis of the 30 groups of 40 observations each with a one-way ANOVA. The first step is to find the variance of the averages. Here, we compute the standard deviation of the 30 treatment averages and square it. This value is then multiplied by the number of data per average, n = 40, to obtain the mean square between (MSB).
The second step is to find the average of the variances. Here, we compute the standard deviation for each treatment separately, square each one, and average the results to obtain the mean square within (MSW).
With k = 30 treatments of size n = 40, the MSB term has (k-1) = 29 degrees of freedom while the MSW term has k(n-1) = 30(39) = 1,170 degrees of freedom. The Fisher ratio is the mean square between divided by the mean square within: F = MSB / MSW.
The p-value for this result is 0.028. This means that if the treatments were not different, you would get a Fisher ratio this large or larger less than 3% of the time. So there may be some differences between the treatments. To look deeper, we use a multifactor ANOVA.
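The two steps of the one-way ANOVA can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `one_way_anova` is hypothetical, and the data are synthetic random numbers, not the client's measurements, so the printed values have no connection to the case history.

```python
import random
import statistics


def one_way_anova(groups):
    """Return (MSB, MSW, F) for equal-sized groups, using the two steps in the text."""
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes every group has the same size")
    averages = [statistics.mean(g) for g in groups]
    # Step 1: variance of the averages, multiplied by the number of data per average.
    msb = n * statistics.variance(averages)
    # Step 2: average of the within-group variances.
    msw = statistics.mean([statistics.variance(g) for g in groups])
    return msb, msw, msb / msw


# Synthetic stand-in data (an assumption): 30 treatments of 40 values each,
# all drawn from the same distribution, so no real treatment effect exists.
random.seed(1)
groups = [[random.gauss(100.0, 5.0) for _ in range(40)] for _ in range(30)]

k, n = len(groups), len(groups[0])
msb, msw, f_ratio = one_way_anova(groups)
print(f"MSB = {msb:.2f} with {k - 1} degrees of freedom")
print(f"MSW = {msw:.2f} with {k * (n - 1)} degrees of freedom")
print(f"F   = {f_ratio:.3f}")
```

Turning the Fisher ratio into a p-value requires the F distribution with 29 and 1,170 degrees of freedom, which is left to a statistics package here.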
The multifactor ANOVA
The multifactor ANOVA for this fully crossed experiment is summarized in Figure 2. Here, the factors are listed in order of increasing p-values.
Figure 2: Multifactor ANOVA
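As a small worked check on the structure of such a table, the 29 degrees of freedom between treatments are divided among the main effects and interactions of a fully crossed design. The sketch below computes that split from the factor levels given earlier; the dictionary and variable names are illustrative only.

```python
from itertools import combinations
from math import prod

# Number of levels for each factor, as described in the study.
levels = {"A": 2, "B": 3, "C": 5}

# Each effect's degrees of freedom is the product of (levels - 1)
# over the factors involved in that effect.
df = {}
for size in range(1, len(levels) + 1):
    for combo in combinations(levels, size):
        df["".join(combo)] = prod(levels[f] - 1 for f in combo)

for effect, d in df.items():
    print(f"{effect:<4}{d:>3}")

total = sum(df.values())
print(f"total {total}")  # equals (number of treatments) - 1
```

The three-factor interaction and the BC interaction each carry 8 of the 29 degrees of freedom, while Factor A carries only 1.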
Among the experimental factors considered, the only weak possibilities for explaining the differences between the treatments are the three-factor interaction effect and the main effect for Factor B. Since a three-factor interaction did not make sense in the context of these factors and this experiment, the experimenters called for help.
Expecting that I would ask if they had placed their data on a chart, they prepared the chart in Figure 3 and stapled it to the top of the stack of 30 one-page summaries of the treatments.
Figure 3: Chart created by client
The chart in Figure 3 agrees with the ANOVA results. No single treatment stands out at the 1% level. As I studied the chart in Figure 3 along with the stack of 30 pages, it became apparent that the chart was an afterthought: The order of the points on the chart was the same as the order of pages in the stack, and the stack had been thoroughly shuffled before the chart was made.
Moreover, based on the dates listed on each page, it was clear that the steps in each treatment had been carried out as time and equipment became available. By using the context provided by the mix dates, the mold dates, and the coating dates, I could create charts with rational orderings. The chart organized by mix date was the one with the clearest signal. It’s shown in Figure 4.
Figure 4: Chart organized by mix date
I flew up to Connecticut and used Figure 4 to show them what they had missed by not plotting their data in time order. The most important signal in these data was an unknown assignable cause in their mixing process.
While this change had nothing to do with Factors A, B, or C, it showed up partially as the three-factor interaction and partially as Factor B simply because it had to show up somewhere in the factorial analysis. This is why it’s always important to interpret experimental results in context. (Hint: Your software cannot do this for you.) In this case, both the ANOVA and the average chart used exactly the same subgroups. But the chart placed them in (one of) their time-order sequences.
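The effect of ordering can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the data and dates are made up, the limits are conventional three-sigma limits based on the average standard deviation (the limits on the client's chart may have been computed differently), and the run count is one common detection rule rather than the specific signal found in Figure 4.

```python
import math
import statistics
from datetime import date


def c4(n):
    """Bias-correction factor for the standard deviation of n observations."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(
        math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    )


def average_chart_limits(groups):
    """Three-sigma limits for subgroup averages; the order of groups is irrelevant."""
    n = len(groups[0])
    grand = statistics.mean([statistics.mean(g) for g in groups])
    s_bar = statistics.mean([statistics.stdev(g) for g in groups])
    half_width = 3.0 * s_bar / (c4(n) * math.sqrt(n))
    return grand - half_width, grand, grand + half_width


def longest_run_one_side(values, center):
    """Length of the longest run of successive values on one side of center."""
    longest = current = last_side = 0
    for v in values:
        side = (v > center) - (v < center)
        current = current + 1 if side != 0 and side == last_side else abs(side)
        last_side = side
        longest = max(longest, current)
    return longest


# Made-up records of (hypothetical mix date, subgroup data) with a shift midway.
records = []
for i in range(12):
    base = 10.0 if i < 6 else 10.6
    records.append((date(2020, 1, 1 + i), [base - 1.0, base, base + 1.0, base]))

# A shuffled stack of pages versus the same pages sorted by mix date.
shuffled = [records[j] for j in (0, 6, 1, 7, 2, 8, 3, 9, 4, 10, 5, 11)]
by_date = sorted(shuffled, key=lambda rec: rec[0])

lower, center, upper = average_chart_limits([data for _, data in shuffled])
for label, ordering in (("shuffled", shuffled), ("by mix date", by_date)):
    averages = [statistics.mean(data) for _, data in ordering]
    outside = sum(1 for a in averages if not lower <= a <= upper)
    run = longest_run_one_side(averages, center)
    print(f"{label}: {outside} points outside limits, longest run {run}")
```

Both orderings use the same subgroups and therefore the same limits, and no single average falls outside them. Only the time-ordered sequence shows the sustained run on one side of the central line.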
Summary
Good experimenters are always alert for factors outside the experiment as well as those in the study. This is why the first step in data analysis should always be to plot the data, and these plots should always take the context into account. Histograms show the group picture, but running records using time order, or other logical orders, reveal how the process producing the data is behaving.
Data have no meaning apart from their context. Plotting the data in a way that respects this context should be an essential part of every analysis.