Operational Excellence

Lean Six Sigma Measure Section 1

Define – Measure – Analyze – Improve – Control

Introduction To Measure

The overall objective of the Measure Phase is to collect enough pertinent information and customer process metrics to establish the current-state baseline of performance, and to understand how the customer’s current process works and how well it works.

Compared to DMAIC’s Define Phase, the Measure Phase is all about defining the process baseline and all the activities involved in obtaining current-state baseline information.

The following objectives need to be met:

  1. First, and most importantly, we need to establish the current baseline performance of the process we are attempting to improve.
  2. Once we establish the current baseline performance, we now have a reference source to measure any changes we make to the process.
  3. We need to have a thorough understanding of the condition of the existing process, along with the sources of what’s causing the problems, and the impact caused by the problems.
  4. Identify the source of the problems and establish process performance indicators (metrics), and determine what to measure, where to measure, and when to take measurements.
  5. We need to collect enough detailed data, so we have a clear understanding of the problem we are trying to fix. This will allow for a comparison in performance before and after improvements are implemented.
  6. Before collecting data, a Data Collection Plan needs to be developed.

I can’t emphasize enough the importance of setting the baseline for understanding the process and for monitoring the key performance measures associated with the process.

Tools and Templates are included with the Training Course Subscription.
Tools below with a * indicator are automated and downloadable.

Before we start collecting data and begin the measurement process, we need to understand the following items:

What is a Measure?

A measure is the action of finding out the size, weight, force, length, or the amount of something.

  • Length of time (speed, age)
  • Size (length, height, weight)
  • Dollars (costs, sales revenue, profits)

Why Do We Need to Measure?

To assess the performance of a process we wish to improve.

  • Performance Measure indicators

Without measurements, we have no idea where we are, where we’re going, or even if we ever got there!

The next important question to answer before we start collecting data:

What Is “Data”?

  • Data is information.
  • Data can be numbers, words, measurements, and observations.
  • Data can be information used in statistics for making conclusions.

What Are the Various Types of Data?

  • Continuous: Data that can be measured but not counted
    • Time (Hours, Minutes and Seconds)
    • Height (Feet, Inches, Fractions of an inch, and so on)
  • Discrete/Attribute: Data that have a quality characteristic (or attribute) that meets or does not meet product specification. These characteristics can be categorized and counted, such as, sorting and counting the number of blemishes in a particular product (defects).

Discrete Data can be three different types:

  • Ordinal Data: Discrete Ordinal Data is a set of data that can be counted and set in order but cannot be measured. For example: data put in an order from 1st to 5th.
  • Nominal Data: Discrete Nominal Data is descriptive, and not numeric. For example: names, phone numbers, colors, and type of car.
  • Binary Data: Discrete Binary Data is qualitative or categorical, with only two possible values. For example: Yes/No, Pass/Fail, On/Off, Male/Female, Good/Bad, etc.


The term Y = f(X) means “Y is a function of X”: the outputs of a process are the Ys, and the drivers of the process are the inputs (Xs) within the process. Our goal is to identify which process inputs have a major influence on the process output measures.

Identify Key Input, Process and Output Metrics

We need to identify what metric information must be gathered to determine the ‘root causes’ of current process performance.

  • Input Measures are referred to as the Xs. They are behaviors that we have control over and that depend on us.
  • Process Measures quantify whether an activity has been accomplished. These measures are specific steps in a process that lead to a particular outcome metric.
  • Output Measures are the results of the input (X) measures and quantify the overall performance of the process.

What is a SIPOC Diagram? [Excel File Template Included]

A SIPOC is a one-page high-level process map that summarizes the Suppliers, Inputs, Processes, Outputs, and Customers for a complex process.

Once your project has been selected, the following tools will be used in the DMAIC DEFINE Phase Section #1 and #2:


What Is a Data Collection Plan? [Excel File Template Included]

  • A data collection plan is a document that describes the exact steps to be followed in gathering the data for the given project ensuring everyone gathering the data is on the same page.

Accurate and reliable data ensure corrective actions are based on facts rather than assumptions and opinions.

Developing a Data Collection Plan

In developing a data collection plan, we need to consider the following:

  • Decide what data is needed to “baseline” our problem.
  • Where to collect the data?
  • How are we going to collect the data?
  • How much data do we need to collect?
  • Who will gather the data?

Have a clear understanding of the definitions and relationships between the Output “Ys”, and the Input “X” Variables.
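The questions above map naturally onto the columns of a data collection plan. The following is a minimal sketch only: the field names and the two example rows are hypothetical and are not taken from the course's Excel template.

```python
from dataclasses import dataclass

@dataclass
class PlannedMeasure:
    """One row of a data collection plan (field names are illustrative)."""
    what: str        # data needed to baseline the problem
    where: str       # where the data will be collected
    how: str         # collection method
    how_much: int    # number of observations to collect
    who: str         # person responsible for gathering the data
    variable: str    # "Y" for an output measure, "X" for an input measure

# Hypothetical plan with one output (Y) and one input (X) measure.
plan = [
    PlannedMeasure("Order cycle time (minutes)", "Shipping desk",
                   "Check sheet", 50, "Shift lead", "Y"),
    PlannedMeasure("Orders received per hour", "Order system report",
                   "Existing data export", 50, "Analyst", "X"),
]

for row in plan:
    print(f"[{row.variable}] {row.what}: {row.how_much} observations, "
          f"{row.where}, {row.who}")
```

Writing the plan down in one structure like this makes it easy to check that every measure has an owner, a source, and a target number of observations before collection begins.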

Identify Data Sources: Existing Data vs. New Data

  • Existing Data: Data that currently exists and is readily available to be collected.
  • New Data: Data that must be captured by recording new observations.

Collecting Data Using Check Sheets

The check sheet is a form (document) used to collect data in real time, at the location where the data is generated.

The check sheet makes data collection easier for the following reasons:

  • Faster capture
  • Consistent data from different people
  • Quicker to compile data
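At its core, a check sheet is a tally of observations by category. As a rough sketch, assuming hypothetical defect categories and observations (none of them come from the course template), the tallying step could look like this in Python:

```python
from collections import Counter

# Hypothetical observations recorded where the data is generated.
# Each entry is the defect category an operator ticked on the check sheet.
observations = [
    "scratch", "dent", "scratch", "missing label",
    "scratch", "dent", "scratch",
]

def tally_check_sheet(entries):
    """Count how many times each category was recorded."""
    return Counter(entries)

tally = tally_check_sheet(observations)

# Print the check sheet as tally marks, most frequent category first.
for category, count in tally.most_common():
    print(f"{category:<15}{'|' * count}  ({count})")
```

Because every collector ticks the same predefined categories, the counts from different people can be added together directly.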

 Check Sheet [Excel File Template Included]


DATA STRATIFICATION [Excel File Template Included]

Stratification is defined as the act of sorting data, people, and objects into different categories and keying in on the (Ys) of the Process Outputs.

Since Six Sigma is a fact-based, data-driven approach to problem solving, we need to identify and define the performance gap, collect data on the performance of our process, and then analyze the data to find the potential root causes of the problem.

Data stratification is a tool that we use to organize the collected data for analysis.  The time to think about data stratification is before the data is collected, not after.  We also need to make sure, that the collected data will be useful for analysis.
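To illustrate why the stratification factor has to be recorded at collection time, here is a small sketch that groups an output measure (Y) by one factor. The records, the "shift" factor, and the function name are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cycle-time records (minutes), each tagged with a
# stratification factor that was captured when the data was collected.
records = [
    {"shift": "day", "cycle_time": 41.0},
    {"shift": "day", "cycle_time": 43.0},
    {"shift": "night", "cycle_time": 50.0},
    {"shift": "night", "cycle_time": 54.0},
]

def stratify(rows, factor, measure):
    """Group the values of `measure` by the categories of `factor`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[factor]].append(row[measure])
    return dict(groups)

by_shift = stratify(records, "shift", "cycle_time")
averages = {shift: mean(values) for shift, values in by_shift.items()}
print(averages)
```

If the "shift" field had not been recorded with each observation, this comparison could not be made afterwards, which is the reason to plan stratification before collecting data.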

OPERATIONAL DEFINITIONS [Excel File Template Included]

An operational definition is a clear, concise, detailed definition of a measure.

There are times when operational definitions are vague and may lead to a misunderstanding.

For example, when someone says a loan is “closed” they may mean papers have been sent, but not signed; another person may mean signed but not funded; a third person might mean funded but not recorded.

DATA MEASUREMENT PLAN [Excel File Template Included]

Measurement is a critical part of testing and implementing changes.  Measures tell you whether the changes you are making are leading to improvement. With the right set of balanced measures and a good data collection, there will be no guesswork in your improvement effort.

An effective measurement plan is more than a list of measures. A number of important aspects need to be included and defined for each measure; together they form your plan.

SAMPLING AND SAMPLE SIZE [Automated Excel tool included]

In statistics, we come across two types of data:

  • Population data is the complete set of data covering the entire area of study, which is why it’s termed the population.
  • Sample data comes from sampling: using a smaller group to represent the whole, which saves time and money and simplifies measurements over time.


When data needs to be collected to measure the baseline performance, measuring every single item in the population can be costly and time consuming, and it is rarely realistic to ask the entire population for its opinion on a certain matter. Instead, we can sample from the larger population and draw conclusions about how that population behaves, or is likely to behave.

Sampling Methods and Strategies:

What Is Sampling Bias?

Selecting a sample that does not represent the whole.

  • Judgment Sampling: A sample based on someone’s knowledge, assuming it will be “representative.” Judgment guarantees a bias and should be avoided.
  • Convenience Sampling: A sample chosen because the data is easy to gather, for example from people you know.

Sampling Strategies

  • Random Sampling: The most straightforward of all the probability sampling methods, since it only involves a single random selection and requires little advance knowledge about the population. An example is picking names out of a hat.
  • Stratified: Stratified Sampling is a sampling method in which the total population is divided into smaller groups or strata. After dividing the population into subgroups of interest (gender, age range, race, or nationality), you can sample either sequentially or randomly within each subgroup.
  • Sequential: A non-probability sampling technique in which the researcher picks a single subject or a group of subjects at a given time interval.
  • Cluster: A method of obtaining a representative sample from a population that has been divided into groups. An individual cluster is a subgroup that mirrors the diversity of the whole population, because the clusters are similar to each other.
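The four strategies above can be sketched with Python's standard `random` module. This is an illustration under stated assumptions, not the course's method: the function names and the population of part IDs are hypothetical, and sequential sampling is approximated here as taking every k-th item.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Every item has the same chance of selection (names out of a hat)."""
    return random.Random(seed).sample(population, n)

def sequential_sample(population, step, start=0):
    """Take every `step`-th item, e.g. every 20th part off the line."""
    return population[start::step]

def stratified_sample(strata, n_per_stratum, seed=0):
    """Draw a random sample of the same size from each subgroup."""
    rng = random.Random(seed)
    return {name: rng.sample(items, n_per_stratum)
            for name, items in strata.items()}

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly pick whole clusters and keep every item in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

parts = list(range(1, 101))  # hypothetical population of 100 part IDs
strata = {"line_a": parts[:50], "line_b": parts[50:]}

print(simple_random_sample(parts, 5))
print(sequential_sample(parts, 20))
print(stratified_sample(strata, 3))
print(cluster_sample(strata, 1))
```

The fixed `seed` only makes the illustration repeatable; in a real study the selection should not be predictable in advance.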

Minimum sample size from a population or a stable process can be estimated using complex formulas for:

  • Continuous Data Sample Size
  • Discrete Data Sample Size
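As a rough illustration, two commonly used textbook formulas are n = (z·s/E)² for continuous data and n = z²·p(1−p)/E² for discrete (proportion) data, where z is the value for the chosen confidence level, s the estimated standard deviation, p the estimated proportion, and E the margin of error. The sketch below assumes these formulas and a large population; the course's Excel calculators may use different ones, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-sided z value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_continuous(std_dev, margin, confidence=0.95):
    """n = (z * s / E)^2, rounded up to the next whole observation."""
    z = z_for_confidence(confidence)
    return math.ceil((z * std_dev / margin) ** 2)

def sample_size_discrete(proportion, margin, confidence=0.95):
    """n = z^2 * p * (1 - p) / E^2, rounded up."""
    z = z_for_confidence(confidence)
    return math.ceil(z ** 2 * proportion * (1 - proportion) / margin ** 2)

# Illustrative inputs: standard deviation 5.9 minutes, estimate within 1 minute.
print(sample_size_continuous(std_dev=5.9, margin=1.0))
# Illustrative inputs: worst-case proportion 0.5, estimate within 4 points.
print(sample_size_discrete(proportion=0.5, margin=0.04))
```

Using p = 0.5 gives the largest (most conservative) sample size when the true proportion is unknown.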


Bias occurs when the sampled data does not accurately represent the population you intended to study; it is an unbalanced representation of that population. This happens when the data is collected in such a way that some members of the intended population have a lower or higher probability of being sampled than others.

  • Self-Selection Bias: Self-Selection Bias occurs when individuals with certain characteristics select themselves into the research sample.
  • Under Coverage Bias: Under Coverage Bias is a common type of sampling bias that occurs when some members of the population are poorly represented, or not represented at all, in the study sample.
  • Non-Response Bias: Non-Response Bias occurs when there is a significant difference between those who respond to your survey and those who do not.


Sample size is a common term used in statistics and market research; it almost always comes up whenever you’re surveying a large population of respondents.

4 Automated Excel Sample Size Calculators are Included in Our Training Course.

Why does sample size matter?

The size of the sample is very important for getting accurate and statistically significant results for a successful study.

  • If your sample is too small, your data may include a disproportionate number of unusual individuals (outliers), which would skew the results and present an unfair picture of the whole population.
  • If the sample is too big, the research becomes too complex, expensive and time-consuming. Although the results would be more accurate, the benefits do not outweigh the costs.

Sample Size Calculation is choosing the right number of observations to include in a statistical sample. Getting the right number of observations is an extremely important feature of your study.

In choosing the correct sample size for your study, consider the few different factors that can affect your research. With these factors in mind, you’ll be able to use a sample size calculator to bring everything together and sample confidently, knowing that there is a high probability that your survey is statistically accurate.

Sample Size Variables to Consider

In order to get an accurate sample size, you’ll need to consider the following:

1. Survey Margin of Error:

Margin of error is the percentage of potential error in how your survey results reflect the views of the total population. Some error is inevitable; the question is how much error you’ll allow.

The margin of error is expressed in terms of mean numbers (the average). You can set how much difference you’ll allow between the mean number of your sample and the mean number of your population.

For example, a 95% confidence interval with a 4 percent margin of error means that your statistic will be within 4 percentage points of the real population value 95% of the time.
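To see where a figure like "4 percent" can come from, the sketch below computes the margin of error for a survey proportion with the usual normal approximation, E = z·√(p(1−p)/n). The sample size of 600 and the worst-case proportion of 0.5 are illustrative assumptions, as is the function name.

```python
import math
from statistics import NormalDist

def margin_of_error(sample_size, proportion=0.5, confidence=0.95):
    """Half-width of the confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

moe = margin_of_error(600)
print(f"{moe:.1%}")
```

With these assumed inputs the margin of error comes out at roughly 4 percentage points; a larger sample would shrink it, and a higher confidence level would widen it.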

2. Confidence level:

Confidence level is a percentage informing you of how confident you can be in your results. Confidence level refers to the percentage of probability, or certainty, that the confidence interval would contain the true population parameter when you draw a random sample many times.

Put another way, the confidence level is the percentage of the time you would expect repeated samples from the same population to produce an interval that contains the true population value. If your confidence level is 95%, you can expect about 95 out of 100 repeated surveys to produce a confidence interval that captures the true value.

The most common confidence levels for a survey are 90%, 95%, and 99%.

When you make an estimate in statistics there is always uncertainty around that estimate because the number is based on a sample of the population you are studying.

The confidence interval is the range of values that you expect your estimate to fall between a certain percentage of the time if you run your experiment again or re-sample the population in the same way.

The confidence level is the percentage of times you expect to reproduce an estimate between the upper and lower bounds of the confidence interval.
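As a small illustration of how the confidence level changes the interval, the sketch below computes a normal-approximation confidence interval for a mean (mean ± z·s/√n). The wait-time data is hypothetical, and for samples this small a t-based interval would normally be preferred; the normal version is used here only to keep the example within the standard library.

```python
import math
from statistics import NormalDist, mean, stdev

def confidence_interval(values, confidence=0.95):
    """Normal-approximation interval for the mean: mean +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    centre = mean(values)
    half_width = z * stdev(values) / math.sqrt(len(values))
    return centre - half_width, centre + half_width

wait_times = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5, 4.4, 5.0]  # hypothetical minutes

low95, high95 = confidence_interval(wait_times, 0.95)
low99, high99 = confidence_interval(wait_times, 0.99)
print(f"95%: {low95:.2f} to {high95:.2f}")
print(f"99%: {low99:.2f} to {high99:.2f}")
```

The 99% interval is wider than the 95% interval: asking for more confidence from the same data means accepting a wider range.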

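The two figures below are normal distributions:

  • Figure 1: 99.7% confidence, spanning -3 to +3 standard deviations (a total width of 6 sigma).
  • Figure 2: approximately 95% confidence, spanning -2 to +2 standard deviations (a total width of 4 sigma).

[Figure 1 and Figure 2: normal distribution curves]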

3. Population Size

Population is the entire set of items from which you draw data for your survey. It can be a group of individuals or a set of items. To overcome the constraints of studying a whole population, you can collect data from a subset of it. Collecting that subset carefully from the groups taking part in the study is what makes the data reliable.

Sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample.

4. Standard Deviation (SD) [Automated Excel tool included with Subscription]

Everything you wanted to know about Standard deviation, and probably more!

Sometimes SD could be somewhat confusing.  Hopefully, this section will make it clear.

Standard Deviation is a measure of the variation between your sample values. A low standard deviation means your survey results are closer together in value; a larger standard deviation indicates that the results of your survey are more spread out from the reported mean.

Another way of summarizing your data, is by measuring the average “spread” or variation between each data point and the mean. Standard deviations are important here because the shape of a normal bell curve is determined by its mean (average) and standard deviation.

As illustrated by the figure below (Normal Distribution), the dark blue section of the bell curve is one standard deviation on either side of the mean, which accounts for 68.27 percent of the data. The medium and dark blue sections together account for 95.45 percent, which is within two standard deviations of the mean. All three sections under the bell curve account for 99.73 percent, which is within three standard deviations of the mean.
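These percentages can be checked with the `NormalDist` class from Python's standard library. The helper name `coverage` is just an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def coverage(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {coverage(k):.2%}")
```

The same percentages hold for any normal distribution, whatever its mean and standard deviation, because the distances are expressed in standard deviations.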

While the center of your process is important, knowing the spread is particularly important because each customer deserves to be provided with acceptable service.  It’s important to the customer whether their average wait time is 30 seconds or 30 minutes.  Standard deviation is a commonly used term in statistics for measuring this type of variation.
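A short sketch shows why the center alone is not enough. The two sets of wait times below are hypothetical and were chosen so that both have the same mean.

```python
from statistics import mean, stdev

# Hypothetical customer wait times in minutes for two service desks.
consistent = [28, 30, 31, 29, 32]
erratic = [5, 60, 10, 55, 20]

for name, times in (("consistent", consistent), ("erratic", erratic)):
    print(f"{name}: mean = {mean(times)}, "
          f"standard deviation = {stdev(times):.1f}")
```

Both desks average 30 minutes, but the second desk's much larger standard deviation means individual customers there have very different experiences.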

Z-Score [Automated Excel tool included with Subscription]

A Z-score measures how many standard deviations a particular survey response is from the mean. A Z-score is a numerical measurement that describes a value’s relationship to the mean of a group of values.

A Z-score is measured in terms of standard deviations from the mean. We calculate the Z-score using the following equation:

Z = (x – μ) / σ

where x is the observed value, μ is the mean, and σ is the standard deviation.

As an example, say that at a manufacturing firm the average time it takes to make a part is 44.3 minutes. A new person was hired, and at the end of his first week the time it took him to make the part was 51.7 minutes. Assume the firm had already calculated the standard deviation as 5.9 minutes.

Z = (51.7 – 44.3) / 5.9 = 1.25

This tells us that the new hire’s production time is 1.25 standard deviations greater than the sample mean.  Application: If the firm determined that this result was consistent for new staff, they could better predict its production time and costs when new staff are brought in.
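The same calculation can be written as a small Python function. The function and parameter names are illustrative; the numbers are the ones from the example above.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that `value` lies from `mean`."""
    return (value - mean) / std_dev

z = z_score(value=51.7, mean=44.3, std_dev=5.9)
print(round(z, 2))  # 1.25
```

A positive result means the value is above the mean, and a negative result means it is below.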

Z Score and Standard Deviation will be discussed further in “Measure Phase Article 2 and 3”.

Once again, let me remind you that the material we just covered was a very brief overview.

The material in our Lean Six Sigma Training Course is self-paced, thorough, detailed, user-friendly, and easy to understand and apply.

The statistical tools and formulas can be somewhat time consuming and complex but are made easy with our automated Plug and Play Excel Tools for Quick Analysis. They are downloadable and yours to keep (50 plus).

Along with the automated tools you will also receive several editable Templates.

Check Out Our Invaluable Offering!

This concludes Part 1 for the DMAIC Measure Phase.

Any questions? Give me a call!
Edward Florancic
(602) 617-9282

Edward “Rick” Florancic is a highly trained statistician and Certified Lean Six Sigma Black Belt Professional. Ed has been a Senior Manager for two Fortune 500 corporations for over 35 years, and a Lean Six Sigma Consultant over the past 15 years.