|
Banner
Tables
|
- Banner
tables are the most common type of quantitative output in
the market research industry. These outputs are typically
called Tabs or Banners and they detail
the numeric distributions and relationships found in survey
data. They look very similar to a spreadsheet, where the columns
represent the key groups in the study (e.g., Users, Non-Users,
Men, Women, etc.) and the rows typically summarize every response
to every significant variable in the survey. In addition,
summary measures like the mean of a variable or a % Top Two
Box measure are usually reported along with statistical tests
to show where the columns (i.e., key groups) have a statistically
significant difference.
- An
abbreviated example of a banner table follows:
- How
Likely Are You To Purchase Something On The Internet This
Year?
|                       |        | AGE GROUP |        | GENDER |        |
|                       | TOTAL  | 18-34     | 35+    | MALE   | FEMALE |
|                       | (A)    | (B)       | (C)    | (D)    | (E)    |
| Base: Total Sample    | 157    | 50        | 107    | 75     | 82     |
|                       | 100.0% | 100.0%    | 100.0% | 100.0% | 100.0% |
| {1} Very Likely       | 37     | 22        | 15     | 18     | 19     |
|                       | 23.6%  | 44.0%     | 14.0%  | 24.0%  | 23.2%  |
|                       |        | C         |        |        |        |
| {2} Somewhat Likely   | 78     | 20        | 58     | 38     | 40     |
|                       | 49.7%  | 40.0%     | 54.2%  | 50.7%  | 48.8%  |
|                       |        |           | b      |        |        |
| {3} Not At All Likely | 38     | 7         | 31     | 18     | 20     |
|                       | 24.2%  | 14.0%     | 29.0%  | 24.0%  | 24.4%  |
|                       |        |           | B      |        |        |
| Not Sure              | 4      | 1         | 3      | 1      | 3      |
|                       | 2.5%   | 2.0%      | 2.8%   | 1.3%   | 3.7%   |
|                       | 100.0% | 25.0%     | 75.0%  | 25.0%  | 75.0%  |
| Mean                  | 2.01   | 1.69      | 2.15   | 2.00   | 1.96   |
|                       |        | C         |        |        |        |
- Comparison
Groups: BC/DE
- Independent
T-Test for Means, Independent Z-Test for Percentages
- Upper
case letters indicate significance at the 95% level.
- Lower
case letters indicate significance at the 90% level.
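The significance letters in the example can be checked with a short calculation. The sketch below is a minimal illustration only: it assumes a pooled two-sample z-test for percentages and two-sided critical values of 1.645 (90%) and 1.960 (95%). Tabulation packages may use an unpooled variant or other adjustments, and the function names are invented for this example.

```python
import math


def z_test_proportions(count_1, base_1, count_2, base_2):
    """Pooled two-sample z statistic for the difference between two percentages."""
    p1 = count_1 / base_1
    p2 = count_2 / base_2
    pooled = (count_1 + count_2) / (base_1 + base_2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / base_1 + 1 / base_2))
    return (p1 - p2) / std_error


def significance_level(z):
    """Confidence level reached by a two-sided test (assumed cutoffs)."""
    if abs(z) >= 1.960:
        return "95%"
    if abs(z) >= 1.645:
        return "90%"
    return "not significant"


# Counts from the example table: 18-34 (base 50) versus 35+ (base 107).
rows = {
    "Very Likely": (22, 15),
    "Somewhat Likely": (20, 58),
    "Not At All Likely": (7, 31),
}
for label, (younger, older) in rows.items():
    z = z_test_proportions(younger, 50, older, 107)
    print(f"{label}: z = {z:.2f}, {significance_level(z)}")
```

Under these assumptions the three rows reach the 95%, 90%, and 95% levels respectively, which agrees with the C, b, and B letters shown in the table.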
|
|
Causal
Modeling
|
- Causal
modeling is a data modeling technique that is known by several
names, including structural modeling, path modeling, and analysis
of covariance structures. This sophisticated extension of
linear regression analysis offers two primary advantages.
First, it can solve multi-equation models that simulate complex
systems or processes. Second, it gets around some of the assumptions
and limitations of standard regression modeling.
|
- As
an example, suppose you wanted to make a better soft drink.
You might start by measuring the impact of product performance
attributes (e.g., sweetness, amount of carbonation, number
of calories, etc.) on the overall rating of leading soft drinks.
One typical way to do this is to regress the overall rating
on the attribute ratings. This is very easy to do in a variety
of statistical programs or even spreadsheets, but the results
they produce are based on several assumptions. These are usually
referenced as BLUE (Best Linear Unbiased Estimator) or "all
things being equal" if they are mentioned at all. Regression
actually makes quite a few assumptions about the data and
the model being solved, including that the model is correctly
specified and that the independent variables are not
correlated.
|
- Virtually
every set of attributes ever put on a questionnaire has had
some degree of correlation between the individual attributes.
Usually there are several that are at least moderately correlated.
There are statistical procedures (e.g., factor analysis) for
dealing with correlated independent variables, though often
the correlated attributes are used as inputs to the
regression model. Suppose the soft drink model created using
standard regression showed that both sweetness and the number of calories
were related to the overall rating of a soft drink. Then the
regression coefficients would indicate the impact, all
things being equal, that changing the perceived sweetness
level would have on the overall acceptance. But since the
sweetness level and the number of calories are correlated,
all things are definitely not equal, and there is a bias in
the model.
|
- The
potential model specification error is harder
to deal with. Regression assumes that the model (i.e., the
equation it was asked to solve) is an accurate representation
of the problem or system being studied with nothing
added and nothing left out. Getting back to the soft drinks,
if the brands are identified to the respondents, then the
image of the brands will have a significant impact on their
ratings. (Anyone who doubts this has never seen ratings of
the same products rated blind, identified, and misidentified.)
|
- Using
typical regression modeling you could add some image attributes
to the model, but the model would probably still be misspecified
because it is nearly impossible to capture every nuance of
a product's image and performance. Some parts of these
are almost always left out or otherwise impossible
to quantify. A more accurate way to specify the model would
be to conclude that there is a series of performance attributes
that drive overall Product Performance and a series
of image attributes that drive overall Product Image,
and these in turn drive the overall product rating.
|
- Measuring
the overall performance and image of a product is similar
to measuring a person's IQ. They can't be measured
directly, but can be derived from a series of indicators.
Causal modeling will derive the measures (called unobserved
exogenous variables), and parcel out the impact of each
on the overall rating. And since image has an impact on taste,
the direct effect of image, and the indirect effect of image
(through its impact on product performance) on the overall
rating can be computed. Further, if taste in turn has an impact
on image, that effect can be quantified as well. Graphically,
this would appear as follows:
|
|
- The
arrows or paths in the diagram represent the flow of 'causality'
(i.e., effect) in the model. These indicate that there is
a statistically significant relationship between the variables.
Sometimes the path coefficients (i.e., regression coefficients)
are included on the arrows to indicate the impact one variable
has on the next. They have been omitted in this example.
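The arithmetic behind direct and indirect effects can be sketched in a few lines. The path coefficients below are hypothetical standardized values chosen for illustration, not results from any study, and the sketch assumes a simple one-way model (image affects performance, and both affect the overall rating) with no feedback from performance back to image.

```python
# Hypothetical standardized path coefficients (illustrative values only).
paths = {
    ("image", "performance"): 0.4,
    ("performance", "overall"): 0.5,
    ("image", "overall"): 0.3,
}


def path_effects(paths, source, mediator, target):
    """Return (direct, indirect, total) effects of source on target."""
    direct = paths[(source, target)]
    # The indirect effect is the product of the coefficients along the path.
    indirect = paths[(source, mediator)] * paths[(mediator, target)]
    return direct, indirect, direct + indirect


direct, indirect, total = path_effects(paths, "image", "performance", "overall")
print(f"direct = {direct:.2f}, indirect = {indirect:.2f}, total = {total:.2f}")
```

With these assumed values, image has a direct effect of 0.30 on the overall rating plus an indirect effect of 0.20 through product performance.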
|
|
Cluster
Analysis
|
- Cluster
analysis is a statistical procedure that attempts to classify
a group into like or homogeneous sub-groups. It
is usually used as a segmentation tool where people are grouped
together into segments based on their attitudes, behaviors,
demographics, or some combination of these. However, cluster
analysis can also be used to cluster variables (instead of
cases) into like groups. The task is analogous
to a coder developing a code list in that individual responses
are read and classified into groups that capture the common
meaning.
|
- Cluster
analysis is often considered to be more of an art than a science.
Of all the common statistical procedures, cluster analysis
gives the least statistical guidance as to whether the solution
it generates is meaningful or not. The cluster analysis algorithm
does not tell the researcher the correct number
of clusters in a data set. Instead, the researcher has to
produce and examine a number of different cluster solutions
and decide which solution is the best. So the analyst may
generate cluster solutions for two clusters, three, four and
so on up to 10 or more clusters. Between different clustering
algorithms, number of clusters produced, and options for how
the data is processed, a considerable number of cluster solutions
can be generated.
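The process of producing several cluster solutions can be sketched as follows. This is a minimal K-Means illustration in plain Python, assuming Euclidean distance, a deterministic farthest-point starting rule, and invented two-dimensional data; production work would normally use a statistical package, and all names here are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def k_means(points, k, max_iter=100):
    """Return (labels, within-cluster sum of squares) for a k-cluster solution."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wss


# Invented respondent scores on two attitude measures.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (1, 9), (2, 9)]
solutions = {k: k_means(points, k) for k in (2, 3, 4)}
for k, (labels, wss) in solutions.items():
    print(f"{k} clusters: within-cluster sum of squares = {wss:.2f}, labels = {labels}")
```

The within-cluster sum of squares is only one diagnostic. As the text notes, the analyst still has to judge which solution is most meaningful.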
|
- To
evaluate the solutions, the researcher generally compares
the individual groups (i.e., start by comparing the groups
in the two cluster solution, then compare the groups in the
three cluster solution and so on) for each solution on a series
of demographic, attitudinal or other measures. Other statistical
procedures can be used in the evaluation process, but often
the analyst tries to interpret each solution by how
it fits with the other variables, and chooses the solution
that seems to fit the best.
|
- Kohonen Self-Organizing Maps (SOMs) are a form of Neural Network (an
Artificial Intelligence technology) that also clusters
cases into like groups using a different mathematical approach.
To the researcher, SOMs do just what a K-Means cluster program
does, but in a different way. However, if a SOM and a K-Means
cluster program are programmed to produce the same number
of cluster groups, the cases will be assigned somewhat differently.
Often the SOM solution will be superior.
|
|
Conjoint
Analysis
|
- Conjoint
analysis is a useful tool in predicting choice behavior. It
is a versatile marketing research technique that can provide
valuable information for new product development and forecasting,
market segmentation and pricing decisions. This technique
can be used to address numerous questions including:
|
- Which
new products will be successful?
- Which
features or attributes of a product or service drive the purchase
decision?
- Do
specific market segments exist for a product?
- What
advertising appeals will be most successful with these segments?
- Will
changes in product design increase consumer preference and
sales?
- What
is the optimal price to charge consumers for a product or
service?
- Can
price be increased without a significant loss in sales?
|
- Conjoint
analysis provides insight and understanding as to how individuals
value features (or "attributes") of products or
services by determining their tradeoffs between different
"levels" of those features. Conjoint analysis examines
these tradeoffs to determine the combination of attributes
that will be most satisfying to the consumer. In other words,
by using conjoint analysis a company can determine the optimal
features for their product or service.
|
- In
addition to providing information on the importance of product
features, conjoint analysis provides the opportunity to conduct
computer choice simulations. Since conjoint quantifies the
value of each product feature it is possible to perform various
"what if" scenarios and estimate preference levels
of hypothetical products. Simulations such as these are very
useful in determining potential market share of products or
services before they are introduced to the market!
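A "what if" simulation of this kind can be sketched as below. The part-worth utilities are hypothetical numbers standing in for values that would be estimated from respondent data, and the logit share-of-preference rule is just one common choice rule assumed here for illustration.

```python
import math

# Hypothetical part-worth utilities for each level of each attribute.
part_worths = {
    "flavor": {"cola": 0.6, "lemon": 0.2},
    "calories": {"regular": 0.0, "diet": 0.3},
    "price": {"$1.49": 0.5, "$1.99": 0.0},
}


def total_utility(profile):
    """Sum the part-worths of the levels that make up a product profile."""
    return sum(part_worths[attribute][level] for attribute, level in profile.items())


def shares_of_preference(products):
    """Convert total utilities into shares using a logit rule (an assumed choice rule)."""
    exp_utils = {name: math.exp(total_utility(p)) for name, p in products.items()}
    denominator = sum(exp_utils.values())
    return {name: value / denominator for name, value in exp_utils.items()}


# Two hypothetical products competing in the simulated market.
products = {
    "Current": {"flavor": "cola", "calories": "regular", "price": "$1.99"},
    "New": {"flavor": "cola", "calories": "diet", "price": "$1.49"},
}
shares = shares_of_preference(products)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Changing a level in one of the profiles and rerunning the simulation shows how the estimated preference shifts.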
|
- In
sum, the value of conjoint analysis is that it predicts what
products or services people will choose and assesses the weight
people give to various factors that underlie their decisions.
As such, it is one of the more powerful, versatile and strategically
important research techniques available.
|
|
Correlation
and Regression
|
- Correlation is the statistical measure that quantifies the strength and
direction of the linear relationship between two variables. If you look
at a scatter plot of two variables, the sign of their correlation matches
the slope of the best fitting straight line that can be drawn through the
points. If the line rises (traveling left to right) the correlation
is positive, which means that as one variable increases, the
other also tends to increase. If the line falls, the opposite is true:
the correlation is negative and as one variable increases, the other
tends to decrease. Further, the size of the correlation, which ranges
from -1 to +1, measures how closely the points follow that line. When
both variables are expressed in standard deviation units, the correlation
equals the slope of the line. So a correlation of .5 means that for each
standard deviation one variable increases, the other is expected to
increase by half a standard deviation. A correlation of -.75 means that
for each standard deviation one variable increases, the other is expected
to decrease by ¾ of a standard deviation.
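The difference between a correlation and a raw slope can be seen in a small sketch. The data are invented, and the helper names are hypothetical. Rescaling one variable changes the slope of the fitted line but leaves the correlation unchanged.

```python
import math


def sums_of_squares(xs, ys):
    """Return (sxy, sxx, syy): sums of cross-products and squares about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    sxy, sxx, syy = sums_of_squares(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def raw_slope(xs, ys):
    """Slope of the least-squares line in the original units."""
    sxy, sxx, _ = sums_of_squares(xs, ys)
    return sxy / sxx


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
ys_rescaled = [10 * y for y in ys]  # same data, different units

print(f"r = {pearson_r(xs, ys):.3f}, slope = {raw_slope(xs, ys):.2f}")
print(f"r = {pearson_r(xs, ys_rescaled):.3f}, slope = {raw_slope(xs, ys_rescaled):.2f}")
```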
|
- Regression is an extension of correlation analysis that will predict
the value of one variable (the dependent variable) based on
the values of one or more predictor or independent
variables. In a bi-variate regression (i.e., the dependent
variable and one independent variable), the main difference
between regression and correlation is that regression expresses
the slope in the original units of the variables and adds
an intercept term. Thinking of the line, the intercept
is the point where the line crosses the Y-axis. A bi-variate
regression produces the general formula for a line:
|
- y = a + bx, where:
- y is the predicted value of the dependent variable
- a is the intercept
- b is the slope of the line
- x is the value of the independent variable
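As a minimal sketch of how a and b might be estimated and then used for prediction, the example below fits the line by ordinary least squares. The data points are invented and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of the intercept a and slope b in y = a + bx."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, x):
    """Predicted value of the dependent variable for a given x."""
    return a + b * x


# Invented data: x could be an attribute rating, y an overall score.
xs = [10, 20, 30, 40]
ys = [25, 45, 65, 85]
a, b = fit_line(xs, ys)
print(f"y = {a:.1f} + {b:.1f}x; prediction at x = 50: {predict(a, b, 50):.1f}")
```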
|
- A multiple regression analysis extends the equation above to include
additional independent variables, each with its own slope.
|
- Regression
is typically used whenever a prediction is required. Typical
uses of regression in market research include predicting market
share, coupon redemption rates, product acceptance scores,
customer satisfaction or awareness and so on.
|
|
Data
Mining
|
- Data
Mining is the name given to a class of analytical techniques
used to discover patterns, trends, and relationships in customer
databases and other business information. This process can
be an invaluable aid by providing a better understanding of
customers and markets, and can ultimately lead to increased
revenues and customer satisfaction.
|
- The techniques that constitute Data Mining, and their applications,
are quite broad. While the individual applications of Data
Mining technology tend to differ from one problem to the next,
there are several steps common to most analyses. The process
starts with compiling available information that usually exists
in one or more corporate databases. This data is frequently
augmented, or overlaid with additional information
such as attitudinal, demographic, or lifestyle data. Once
the data has been cleaned and combined, it is ready for analysis.
The analysis is generally done in two steps: knowledge discovery
and validation. Ultimately, the results of the analysis
are used to aid in the development of marketing programs,
pricing plans, new products, and so on.
|
|
Compiling
Information
|
- Most
organizations have a wealth of information about their customers,
products or services. But this information tends to be distributed
across numerous departments, databases, and computer systems.
While any one database can be the starting point for Data
Mining, there is often considerable synergy in combining information
from multiple sources. For example, by mining sales data combined
with attitudinal measures and overall category usage, detailed
buyer profiles and models of the purchase dynamics can be
generated.
|
|
Overlay
Files
|
- It
is often desirable to supplement the in-house information
with data from an outside source. This information is usually
either existing information sold by a service bureau, or new
information collected specifically for the project. The existing
information, often called secondary data, is generally
demographic data (income, assets), lifestyle information (cluster
codes), or market statistics (size, sales). These overlay
files are often used to provide information to use in predictive
models or customer segmentation.
|
- It
can also be very useful to collect new information about customers
or markets to add data that would otherwise be unavailable
for the analysis. This is frequently done to quantify the
link between attitudes and behaviors, to find leverage points
for marketing programs, or to better understand market dynamics.
Telephone surveys are frequently used to collect attitudinal
data, product category usage, loyalty measures, advertising
recall, and a host of other measures.
|
|
Cleaning/Combining
the Data
|
- Before
the data can be analyzed it needs to be cleaned and/or combined.
Cleaning the data generally involves the removal of impossible
or out-of-range values, and implementing a strategy to handle
missing information. Combining the data often requires additional
steps and foresight to convert the data into a common analytical
framework. For example, transaction-level data might need
to be summarized into time periods before being combined with
household demographics.
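The cleaning and combining steps can be sketched as below. The records, field names, and valid range are all hypothetical, and dropping records with missing amounts is only one of several possible strategies for handling missing information.

```python
from collections import defaultdict

# Hypothetical transaction-level records and household demographics.
transactions = [
    {"household": "H1", "date": "2024-01-05", "amount": 20.0},
    {"household": "H1", "date": "2024-01-20", "amount": 35.0},
    {"household": "H1", "date": "2024-02-03", "amount": -999.0},  # impossible value
    {"household": "H2", "date": "2024-01-11", "amount": 12.5},
    {"household": "H2", "date": "2024-02-14", "amount": None},  # missing value
    {"household": "H2", "date": "2024-02-20", "amount": 40.0},
]
demographics = {"H1": {"income": "50-75K"}, "H2": {"income": "75-100K"}}


def clean(records, low=0.0, high=10_000.0):
    """Drop records whose amount is missing or outside the assumed valid range."""
    return [r for r in records if r["amount"] is not None and low <= r["amount"] <= high]


def monthly_totals(records):
    """Summarize transactions into one total per household per month."""
    totals = defaultdict(float)
    for r in records:
        month = r["date"][:7]  # "YYYY-MM"
        totals[(r["household"], month)] += r["amount"]
    return dict(totals)


def combine(totals, demographics):
    """Attach household demographics to each monthly summary row."""
    return [
        {"household": hh, "month": month, "spend": spend, **demographics[hh]}
        for (hh, month), spend in sorted(totals.items())
    ]


combined = combine(monthly_totals(clean(transactions)), demographics)
for row in combined:
    print(row)
```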
|
|
Data
Analysis
|
- Once
the data are ready, the analysis phase generally consists
of two different steps with a reporting period in between.
The first step in the analysis is called knowledge discovery.
In this phase, smart algorithms search through
the data looking for patterns or relationships. These algorithms
are typically CHAID (Chi-square Automatic Interaction Detection)
or CART (Classification And Regression Trees) procedures,
though Neural Nets, Genetic Algorithms, and other hybrid systems
are also used. They generally take one user-specified variable
called the dependent variable, and try to relate
every other variable in the file to that variable. Some algorithms
can look for linear and non-linear relationships, as well
as transform the variables in a variety of ways to maximize
their relationships. Relationships are generally reported
as decision trees, which are an easily understood way of presenting
information.
|
- Data
Mining analyses typically relate hundreds and even thousands
of variables to several dependent variables of key interest.
Since many algorithms are free to manipulate the variables
to maximize their relationships, it is not uncommon for an
analysis to yield hundreds of significant relationships.
These relationships are simply measures of statistical association,
and are often spurious or otherwise of little importance.
They are therefore considered to be hypotheses about relationships
in the data, which need to be studied further.
|
- This
information is generally discussed with the hands-on
users or other researchers, and the number of hypotheses is
filtered down to focus on the most promising avenues for further
analysis. This second step in the analysis is generally called
validation, and usually relies on common statistical techniques
like regression, discriminant analysis, and cluster analysis.
This step usually includes some form of quantification of
trends or market opportunities, prediction, segmentation,
or response modeling. The ultimate goal of the analysis is
generally to either increase revenues through a better understanding
of the customer, or else to develop better predictive models
to use as forecasting tools.
|
|
Discriminant
Analysis
|
- Discriminant
Analysis is used to relate a categorical dependent variable
to a series of independent variables. It is similar to Regression
Analysis, except instead of predicting the value of the dependent
variable, Discriminant predicts the category of the dependent
variable. It does this by constructing linear combinations
of predictor variables that best distinguish between the groups
of the dependent variable.
|
|
- This
technique has a wide range of applications. For example, it can identify
the factors that distinguish satisfied customers from dissatisfied ones,
concept acceptors from rejecters, or your customers from your
competitors' customers.
|
|
Factor
Analysis
|
- Factor
analysis is a data reduction technique that tries to reduce
a list of attributes or other measures to their essence; that
is, a smaller set of factors that capture the
patterns seen in the data. Marketers and researchers who study
a product, service, or industry professionally sometimes perceive
many more distinctions within their category than do their
consumers. This can lead to questionnaires containing attribute
lists that consumers see as somewhat or largely synonymous.
Factor analysis tells you how many different core factors
the consumers perceived out of the list of attributes they
rated.
|
- The main benefits of factor analysis are that the analyst can
focus attention on the unique core elements instead
of the redundant attributes, and that the resulting factors can
serve as pre-processed inputs for regression models.
|
|
TURF
Analysis
|
|
- TURF
is an acronym that stands for Total Unduplicated Reach and
Frequency. This research technique originated in advertising
and media research as a tool to maximize the number of people
(i.e., Reach) who would be exposed to an advertisement per
unit of cost. By analyzing the overlap between mailing or
subscription lists, those lists with the lowest percent of
overlap are identified. By comparing the number of non-duplicated
people (i.e., Total Unduplicated) to the list costs, the most
economical method of reaching the largest number of people
can be calculated.
|
- TURF
Analysis is very useful for market research as well, especially
when used to optimize potential product or promotional offerings.
Instead of examining duplication across lists or other media
sources, purchase intent scores are analyzed for a series
of promotional offers or product elements (flavors, sizes,
etc.). By optimizing the unduplicated purchase intent of potential
products or line extensions, the largest number of consumers
can be appealed to with the fewest products or offers.
TURF Analysis can also take into account different cost structures
to produce the products, and help to optimize the profitability
of a line extension or brand family.
|
|
- To
contrast TURF Analysis with typical methods, consider an example
with three possible flavors (A, B and C) of a product, where
the "best" two flavors will be brought to market.
A typical analysis might look at the % Top Two Box purchase
intent for each flavor and conclude that the best two flavors
to market are the ones with the two highest scores. If flavors
A, B, and C receive % Top Two Box scores of 80%, 75%, and
40% respectively, you could conclude that A and B are the
best two flavors. But if the vast majority of people who would
buy flavor B would also buy A, the incremental gain by offering
B is small. If the overlap between A and C is fairly small,
even though C appeals to the fewest people in total, the combination
of marketing flavors A and C will appeal to more people than
the combination of A and B.
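The flavor example can be worked through in a short sketch. The respondent-level data below are invented so that 20 hypothetical respondents reproduce the 80%, 75%, and 40% scores, with heavy overlap between A and B and little overlap between A and C. The names are illustrative only.

```python
from itertools import combinations

# Invented respondent IDs (0-19) who gave a Top Two Box rating to each flavor.
buyers = {
    "A": set(range(0, 16)),   # 16 of 20 = 80%
    "B": set(range(1, 16)),   # 15 of 20 = 75%, almost all of whom also like A
    "C": set(range(12, 20)),  # 8 of 20 = 40%, mostly people who do not like A
}
sample_size = 20


def unduplicated_reach(flavors):
    """Number of respondents who would buy at least one of the given flavors."""
    return len(set().union(*(buyers[f] for f in flavors)))


reach_by_pair = {
    pair: unduplicated_reach(pair) for pair in combinations(sorted(buyers), 2)
}
best_pair = max(reach_by_pair, key=reach_by_pair.get)

for pair, reach in reach_by_pair.items():
    print(f"{' + '.join(pair)}: {reach / sample_size:.0%} unduplicated reach")
print("Best pair:", " + ".join(best_pair))
```

With these assumed data, A and B together reach 80% of respondents, while A and C together reach 100%, even though C has the lowest individual score.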
|
|