In the last blog entry we discussed how everything that comes from data is uncertain.

The good news is that you can measure this uncertainty. The even better news is that it is easy to do. The only downside is that it costs a little money -- but it is well worth the investment!

You can measure the uncertainty in your data by repeating a few of your measurements. If you make a measurement and then repeat it as precisely and accurately as you possibly can, the difference between the measurements is a measure of the uncertainty in your data.
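A quick sketch of the idea (hypothetical numbers, Python with NumPy -- not anything from the post itself): the spread of a handful of repeated measurements summarizes your measurement uncertainty.

```python
import numpy as np

# Five repeated measurements of the same item, same method (hypothetical values)
repeats = np.array([10.02, 9.98, 10.05, 9.97, 10.01])

mean = repeats.mean()
uncertainty = repeats.std(ddof=1)  # sample standard deviation of the repeats

print(f"mean = {mean:.3f}, measurement uncertainty ~ {uncertainty:.3f}")
```

The sample standard deviation (`ddof=1`) is the usual summary of the repeat-to-repeat spread.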

Yes, this costs a little extra -- and it is extremely useful. It becomes even more useful if you can communicate this uncertainty in terms that your management can easily understand. Statistics helps us use probability to do just that.

Possibly the single most useful way to communicate the uncertainty in results from your data is to report Statistical Tolerance Limits. These limits give you 95% confidence that at least 99% (or another proportion of your choice) of all the product you will ever make will lie between them. These limits usually assume that your data fall in a bell-shaped pile, but they can also be calculated without knowing the shape of the pile.
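The post doesn't show the calculation, but as an illustrative sketch (SciPy assumed, simulated data, roughly bell-shaped), a two-sided normal tolerance limit can be approximated with Howe's k-factor:

```python
import numpy as np
from scipy import stats

def tolerance_k(n, coverage=0.99, confidence=0.95):
    """Howe's approximation to the two-sided normal tolerance factor k."""
    z = stats.norm.ppf((1 + coverage) / 2)
    chi2 = stats.chi2.ppf(1 - confidence, n - 1)
    return z * np.sqrt((n - 1) * (1 + 1 / n) / chi2)

# Hypothetical sample of n = 30 measurements
rng = np.random.default_rng(1)
x = rng.normal(loc=100.0, scale=2.0, size=30)

k = tolerance_k(len(x))
lower, upper = x.mean() - k * x.std(ddof=1), x.mean() + k * x.std(ddof=1)
print(f"95%/99% tolerance limits: [{lower:.2f}, {upper:.2f}]  (k = {k:.3f})")
```

For n = 30 at 95% confidence / 99% coverage, k comes out near 3.35, matching published tolerance-factor tables; distribution-free limits (from the sample minimum and maximum) are an alternative when the bell-shape assumption is doubtful.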

If you are interested in the history of these ideas, you may enjoy the book, "The Lady Tasting Tea," by David Salsburg.

If you'd like to learn more about calculating Tolerance Limits and using Statistics in your work, you can consider taking the eCourse, "Basic Statistics for Industry."

Next time let's talk about how you can perform a "sanity check" on the quality of your data before analyzing it.

## Wednesday, November 21, 2012

## Tuesday, November 6, 2012

### Everything that Comes from Data is Uncertain

What makes "data" different from "numbers"? Data always contain noise. Every time you try to measure exactly the same thing in the same way, you get a different answer. Sometimes these differences are very small. Sometimes they are very large. If you aren't seeing these differences, you need a more sensitive measurement instrument.

This noise is random. It creates uncertainty in the data collected and in everything we calculate from our data.

Where this noise comes from is not known for sure. Modern physics believes that Nature has a random component at its most fundamental level. This is expressed in part by the Heisenberg Uncertainty Principle. However, to be fair, even if Newtonian physics is right and Nature is deterministic, we would have the same problem.

"If the accuracy is taken to be one part in billions and billions and billions -- no matter how many billions we wish, provided we do stop somewhere -- then we can find a time less than the time it took to state the accuracy -- after which we can no longer predict what is going to happen!" Richard Feynman, Lectures on Physics, Vol. III, p. 2-10

Whether we like this or not, everything that comes from data is uncertain. Since we learn about our world through data, we can never be certain about our conclusions. Yes, we are very confident of certain conclusions based on lots of pretty repeatable data -- I am willing to bet the sun will rise tomorrow! I think Feynman said it best:

"I have approximate answers and possible beliefs and different degrees of certainty about different things, but I'm not absolutely sure of anything, and there are many things I don't know anything about, such as whether it means anything to ask, 'Why are we here?' and what that question might mean. I might think about it a bit, and then if I can't figure it out, I go on to something else." Richard Feynman

So how do you make good decisions in spite of uncertainty?

Next time let's look at how you can measure the uncertainty in your conclusions and use probability to make the best possible decisions.


## Monday, October 8, 2012

### Data Come from Observations

"It doesn't matter how beautiful your theory is, it doesn't matter how smart you are. If it doesn't agree with experiment, it's wrong." Richard Feynman

We gather information about the world through observation -- seeing, hearing, tasting, and smelling. We often augment our senses using instruments. For example, an oscilloscope produces visible patterns from invisible electrical phenomena.

This information is "data." These data include a summation of all the causes and effects that went into producing the observed situations.

"I believe in evidence. I believe in observation, measurement, and reasoning, confirmed by independent observers. I'll believe anything, no matter how wild and ridiculous, if there is evidence for it. The wilder and more ridiculous something is, however, the firmer and more solid the evidence will have to be." Isaac Asimov

## Monday, June 18, 2012

### Data Driven Decisions

Life is a series of decisions. Every day you must decide what to do, when to do it, how to do it, and so on. Each decision causes one or more effects, each of which can cause further effects. It is in your best interest to make decisions that have positive, constructive, beneficial effects.

You are free to make every decision. You are free to do what most people are doing, or to do what you believe is right, or to examine data to decide what is right. Since we live in an uncertain world, trusting others or our beliefs often leads us to make poor decisions. Everyone can benefit from using data to make the best possible decisions.

For example, suppose a homeless man chooses to do what most homeless men are doing: He will remain homeless. Suppose instead that he acts on his belief that he was born unlucky and must accept his plight: He will remain homeless. Finally, suppose he decides to collect some data. He sees that a few homeless men have escaped homelessness and now have better lives. He decides to learn what they have done and do the same thing, on the theory that "if it worked for them, it will work for me." This man has a real, fighting chance of finding a better life.

There are three basic principles of data driven decision making:


- Data come from observations.
- Everything that comes from data is uncertain.
- You can measure the uncertainty in your conclusions and use probability to make the best possible decisions.

Bill.

## Friday, June 8, 2012

### Determining Product Reliability with Confidence

Product reliability is a key ingredient of product quality -- an unreliable product is not a high quality product. The first step in gaining control over your product's reliability is to learn how to measure it.

Measuring product reliability depends on understanding the variability in your product's life. Some units last longer than others, even though you made every effort to make every unit identical. As with all variability, product life variability will pile up to form a pattern. Unfortunately, this pattern is rarely the well-known bell-shaped pile. Product life can fall in piles of a wide variety of shapes.

The shape of a pile of data is called its "distribution." Two very common theoretical distributions that can explain a wide variety of shapes are the lognormal and the Weibull. If you can fit your data to either of these you will be able to make predictions about future performance.
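The post does this fitting in JMP; as an illustrative alternative sketch in Python (SciPy assumed, simulated lifetimes standing in for real life-test data), fitting a Weibull and predicting survival might look like:

```python
import numpy as np
from scipy import stats

# Simulated lifetimes (years) for 200 units -- a stand-in for real life-test data
rng = np.random.default_rng(7)
lifetimes = stats.weibull_min.rvs(c=2.0, scale=10.0, size=200, random_state=rng)

# Fit a two-parameter Weibull (location fixed at zero)
shape, loc, scale = stats.weibull_min.fit(lifetimes, floc=0)

# Reliability at t = 3 years is the survival function evaluated at t
r3 = stats.weibull_min.sf(3, shape, loc=loc, scale=scale)
print(f"shape ~ {shape:.2f}, scale ~ {scale:.2f}, R(3) ~ {r3:.3f}")
```

The same pattern works with `stats.lognorm` when the lognormal fits the pile's shape better.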

Reliability at a specific point in time is the percentage of units surviving to that time. For instance, if 95% of ACME's MP3 players are still functioning after 3 years, the reliability at three years is 95%.

We know that everything that comes from data is uncertain, so the reliability at three years is uncertain. We can assign confidence limits to the reliability so that we can say, for instance, "We have at least 95% reliability after three years with 90% confidence." This tells the world that the lowest reliability we expect at three years is 95%. It also tells them there is a 10% chance we are wrong about this.
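A statement like "at least 95% reliability with 90% confidence" can be illustrated with a distribution-free (binomial) lower bound on the surviving fraction -- a sketch with hypothetical counts, not the post's JMP method:

```python
from scipy import stats

def reliability_lower_bound(survivors, n, confidence=0.90):
    """One-sided Clopper-Pearson lower confidence bound on reliability."""
    if survivors == 0:
        return 0.0
    return stats.beta.ppf(1 - confidence, survivors, n - survivors + 1)

# Hypothetical life test: 196 of 200 units still working at 3 years
lb = reliability_lower_bound(196, 200)
print(f"With 90% confidence, reliability at 3 years is at least {lb:.3f}")
```

The bound is deliberately below the observed fraction (196/200 = 98%); the gap is the price of the 90% confidence claim.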

The math behind reliability is very complex -- even more complex than the math behind Design of Experiments! Fortunately, JMP 10 has a fantastic reliability platform that does everything for you. Instead of evaluating integrals of exponential functions, you drag sliders to find your answers.

We have a new class available to help you apply Reliability Statistics in your work using JMP. Just like all of our classes, you learn how to *use* Reliability Statistics -- your time isn't wasted with a lot of unnecessary mathematical complexity.

Next time, let's begin looking at the fundamentals of Data-Driven Decision Making.


## Tuesday, May 1, 2012

### Design Quality Measures, Part 2

Industrial experiments have 2 fundamental goals:

1. Fundamental understanding

2. Predictive ability

Several design quality measures can help you judge how well a design can help you achieve these goals -- before you collect any data!

Last post we looked at design quality measures that help you judge an experiment's ability to help you gain fundamental understanding. This post looks at design quality measures that help you judge an experiment's predictive ability.

Condition Number: The condition number is a measure of the predictive quality of a design. The log of the condition number tells you how many significant figures you will lose from the number of digits your computer carries. Clearly a log(Condition Number) = 0 is best. Keeping this below 6 is acceptable for most computers.
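As a quick sketch (NumPy assumed, not the post's software), the condition number can be checked directly from a design's model matrix; an orthogonal two-level factorial gives the ideal log(Condition Number) of 0:

```python
import numpy as np

# Model matrix for a 2^2 full factorial with interaction: I, A, B, AB (+/-1 coding)
A = np.array([-1, 1, -1, 1])
B = np.array([-1, -1, 1, 1])
X = np.column_stack([np.ones(4), A, B, A * B])

cond = np.linalg.cond(X)
print(f"condition number = {cond:.3f}, log10 = {np.log10(cond):.3f}")
```

A log10 of 0 means no significant figures are lost to the design itself.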

Average Variance of Prediction: The smaller the better. This indicates the effect the design has on the uncertainty in your predictions. It does not include pure error. Aim for less than 5 if you can -- less than 1 is even better!

G Efficiency: This is the ability of a design to minimize the maximum variance of prediction. 100% efficiency is best. Keep this as high as you can.

Next time let's take a look at how you can determine a product's reliability with confidence.


## Friday, March 9, 2012

### Design Quality Measures

Industrial experiments have 2 fundamental goals:

1. Fundamental understanding

2. Predictive ability

Several design quality measures can help you judge how well a design can help you achieve these goals -- *before you collect any data*!

First let's look at quality measures that help us judge the ability of an experiment design to extract information about effects -- *fundamental understanding*.

VIFs: VIFs are Variance Inflation Factors. They tell you how "clean" the estimate of an effect is. A VIF of 1 indicates that the effect is free from any contamination from other effects. A design with all VIFs equal to 1 is ideal. VIFs larger than 1 indicate some level of contamination from other effects. VIFs larger than 5 generally indicate such a high level of contamination that the design will not separate effects well enough to learn anything about them. The model may still predict well -- you just won't be able to rank the effects.

Correlation Matrix: The correlation matrix also measures how "clean" effect estimates will be. The correlation matrix is more detailed in identifying the source of the contamination. A perfect correlation matrix has all ones on the main diagonal and zeroes everywhere else. This means each effect is correlated perfectly with itself and not at all correlated with anything else. Any off-diagonal elements that are not zero indicate some level of correlation between 2 effects -- *contamination*. Most experts advise keeping off-diagonal elements between -0.95 and 0.95. Once again, a design with a poor correlation matrix may allow you to fit a model that predicts well, but you won't be able to rank the effects.
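As an illustrative sketch (NumPy assumed, not the post's software), both of these measures can be computed directly from a design's factor columns; the orthogonal 2^2 factorial below gives an identity correlation matrix and VIFs of 1:

```python
import numpy as np

# Factor columns of a 2^2 full factorial with interaction (+/-1 coding)
A = np.array([-1, 1, -1, 1])
B = np.array([-1, -1, 1, 1])
X = np.column_stack([A, B, A * B])

corr = np.corrcoef(X, rowvar=False)   # correlation matrix of the columns
vifs = np.diag(np.linalg.inv(corr))   # VIF_j is the j-th diagonal of corr^-1
print("VIFs:", np.round(vifs, 3))
```

All VIFs equal to 1 and all off-diagonal correlations equal to 0 mean every effect estimate is free of contamination from the others.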

Relative Variance of Coefficients: The power for identifying the various coefficients should be as high as possible so you can rank the effects accurately. Try to keep the power over 0.8. As before, a model built from a design with low power for the relative variance of the coefficients could still predict well.

Next time let's look at design quality measures that tell us about predictive ability.

