Sample Size Calculator

Sample Size Stuff

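The confidence interval (often reported as the "margin of error") is the plus-or-minus figure around a reported result. For example, if your confidence interval is 2 and 50% of your sample answers in your favor, you can be sure, at your chosen confidence level, that between 48% and 52% of the entire population would have answered the same way.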
The confidence level tells you how sure you can be that the true population value falls within the confidence interval. Expressed as a percentage, it is how often that interval would contain the true value if you repeated the survey many times with new random samples. For any scientific study where the results really matter, most researchers use the 95% confidence level, or at least start at this level and then try to go higher. Used together, the two let you make statements such as, "I am 95% sure that 48-52% of the population feels this way."

Unfortunately, this also means that the following statement could be completely valid for some survey or poll: "I am 25% confident that 75-85% of the population feels this way." While valid, it is of little value because the confidence level is so low. For market research, a result like this might still be useful when targeting an audience for a product, since that percentage of the population could turn into sales. For more scientific work, it just isn't accurate enough.

Standard Deviation
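Roughly speaking, the standard deviation is the typical distance between each data point and the group's average (strictly, it is the square root of the average squared distance from the average). For data that follow a bell-shaped (normal) curve, about 68% of the data lies within 1 standard deviation of the average, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations, counting both the plus and minus directions.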
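To make the 68/95/99.7 figures concrete, here is a minimal sketch using only Python's standard library (`statistics.NormalDist`, Python 3.8+). The helper name `share_within` is hypothetical, and the two small data sets are made-up numbers for illustration only.

```python
from statistics import NormalDist, pstdev

def share_within(k: float) -> float:
    """Fraction of a normal (bell-shaped) distribution lying within
    k standard deviations of the mean, counting both directions."""
    nd = NormalDist()  # standard normal: mean 0, standard deviation 1
    return nd.cdf(k) - nd.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")

# Made-up data: tightly clustered values have a smaller standard
# deviation (a "skinnier" curve) than spread-out values.
tight = [49, 50, 50, 50, 51]
wide = [40, 45, 50, 55, 60]
print(f"tight SD = {pstdev(tight):.2f}, wide SD = {pstdev(wide):.2f}")
```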

The more tightly the data clusters around the average, the smaller the standard deviation and the "skinnier" the bell-shaped curve. Whatever the width of the curve, covering a larger share of the data means reaching further out from the average, so as your confidence level goes up, you must go out more standard deviations.

These numbers are used so often in statistics that they have been condensed into a Z value: the number of standard deviations you must include to reach a specific confidence level. Common values are about 1.645 for 90%, 1.96 for 95%, and 2.576 for 99%. These values are tabulated in any statistics textbook, and the calculator above uses them directly. They are standard two-tailed values, which "chop off" both ends of the bell curve because the interval extends in both the plus and minus directions.

In summary, the calculator above helps determine how many samples you need to evaluate to achieve your desired confidence level and confidence interval. Obviously, if you take a very small number of samples, your confidence level will also be low for any useful confidence interval. How accurate your results need to be depends on how important the data is to you and your constituents.
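The Z values and the sample-size calculation can be sketched in a few lines of Python. This sketch assumes the standard formula for estimating a proportion, n = Z² × p(1 − p) / c², where c is the confidence interval as a decimal and p is the expected percentage (0.5 is the most conservative choice). We assume, but have not confirmed, that this is what the calculator above uses; `z_value` and `sample_size` are hypothetical names.

```python
import math
from statistics import NormalDist

def z_value(confidence_level: float) -> float:
    """Two-tailed Z value: the number of standard deviations that leaves
    (1 - confidence_level) / 2 of the bell curve in each tail."""
    tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail)

def sample_size(confidence_level: float, interval: float, p: float = 0.5) -> int:
    """Sample size needed to estimate a proportion, ignoring population size.

    interval: confidence interval as a decimal (0.05 means +/- 5 points).
    p: expected percentage as a decimal; 0.5 gives the largest (safest) n.
    """
    z = z_value(confidence_level)
    n = z ** 2 * p * (1 - p) / interval ** 2
    return math.ceil(n)  # round up: you cannot survey a fraction of a person

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {z_value(level):.3f}, n = {sample_size(level, 0.05)}")
```

Rounding up rather than to the nearest integer keeps the achieved interval at or below the one requested.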
Factors that Affect Confidence Intervals

There are three factors that determine the size of the confidence interval for a given confidence level: sample size, percentage (the share of your sample that picks a particular answer), and population size.
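The percentage factor is the share of respondents who pick a given answer. Here is a minimal sketch, assuming the usual normal-approximation formula for a proportion, showing that for the same sample the interval is widest when the percentage is 50% and narrows toward 0% or 100%. `margin_of_error` is a hypothetical helper, and the sample of 400 is an arbitrary example.

```python
import math
from statistics import NormalDist

def margin_of_error(n: int, p: float, confidence_level: float = 0.95) -> float:
    """Half-width of the confidence interval for a proportion p from a
    sample of n people, ignoring population size (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return z * math.sqrt(p * (1 - p) / n)

# Same sample of 400 people, different observed percentages.
for p in (0.10, 0.25, 0.50, 0.75, 0.90):
    print(f"{p:.0%} answered yes: +/- {margin_of_error(400, p):.1%}")
```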
Sample Size
The larger your sample, the more sure you can be that its answers truly reflect the population. This indicates that for a given confidence level, the larger your sample size, the smaller your confidence interval. However, the relationship is not linear: the interval shrinks with the square root of the sample size, so doubling the sample size does not halve the confidence interval (it takes roughly four times the sample to do that).
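The square-root relationship can be checked numerically. This is a minimal sketch, assuming a 95% confidence level, a 50% percentage, and a large population; `margin_of_error` and `Z_95` are hypothetical names.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # two-tailed Z value for 95% confidence

def margin_of_error(n: int, p: float = 0.5) -> float:
    """95% confidence interval half-width for a proportion, large population."""
    return Z_95 * math.sqrt(p * (1 - p) / n)

for n in (100, 200, 400, 800):
    print(f"n = {n:4d}: +/- {margin_of_error(n):.2%}")

# Doubling n shrinks the interval by a factor of sqrt(2), not 2;
# it takes four times the sample to halve it.
```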
Population Size
How many people are there in the group your sample represents? This may be
the number of people in a city you are studying, the number of people who
buy new cars, etc. Often you may not know the exact population size. This
is not a problem. The mathematics of probability proves that the size of the
population is irrelevant unless the size of the sample exceeds a few percent
of the total population you are examining. This means that a sample of 500
people is equally useful in examining the opinions of a state of 15,000,000
as it would be for a city of 100,000. For this reason, The Survey System ignores
the population size when it is "large" or unknown. Population size is only
likely to be a factor when you work with a relatively small and known group
of people (e.g., the members of an association).
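Here is a minimal sketch of why population size rarely matters, assuming the standard finite population correction factor sqrt((N − n)/(N − 1)) applied to the confidence interval. We assume, but have not confirmed, that The Survey System makes a similar adjustment for small, known populations; the 1,000-member association is a made-up example and `margin_of_error` is a hypothetical helper.

```python
import math
from statistics import NormalDist
from typing import Optional

def margin_of_error(n: int, population: Optional[int] = None,
                    p: float = 0.5, confidence_level: float = 0.95) -> float:
    """Confidence interval half-width for a proportion.

    If population is given, apply the finite population correction, which
    only matters when the sample is a sizable share of the population.
    Leave population as None when it is large or unknown.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        moe *= math.sqrt((population - n) / (population - 1))
    return moe

for label, size in [("state", 15_000_000), ("city", 100_000), ("association", 1_000)]:
    print(f"500 of {size:,} ({label}): +/- {margin_of_error(500, size):.2%}")
```

The state and the city give practically the same interval; only the small association, where 500 people are half the group, shows a noticeably narrower one.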
The confidence interval calculations assume you have a genuine random
sample of the relevant population. If your sample is not
truly random, you cannot rely on the intervals. Non-random samples usually
result from some flaw in the sampling procedure. An example of such a flaw
is to only call people during the day, and miss almost everyone who works.
For most purposes, the non-working population cannot be assumed to accurately
represent the entire (working and non-working) population. Many other
factors can also bias the collected data. You can, and probably should,
spend considerable time studying the data collection process to reduce or
eliminate them. A few common sources of flaws in data collection are:
- Response Bias: This happens when forms or surveys are mailed, or people are directed to a website (for example) to fill out a survey. Those who feel most strongly about a topic are more likely to complete the survey, while those with more typical views just skip it. This sways the results.
- Only calling people: This discriminates against people without phones, or people without immediate access to a phone, such as workers in an office or factory with shared extensions.
- Only using a web-based survey: This discriminates against people without easy internet access.
- Non-random due to regional issues: Taking data from only certain regions of a country, county, or city can skew the results, because opinions often differ from region to region.
- Failure to recognize where stratification is required: For example, suppose some respondents ("pollees") have a particular motivation to come out strongly against you. Respondents with this same motivation need to be identified as a separate group (stratum), and their results adjusted before those results can be applied as part of the sample. Another common example is a survey about a support service: if the support was good but the answer the customer received was unwelcome, their rating will be biased and will not truly reflect the quality of the support service.
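The daytime-calling flaw described above can be illustrated with a small simulation. Everything here is hypothetical: the population, the share who work, and their opinions are made-up numbers chosen only to show the effect, and `share_yes` is an illustrative helper. The point is that a larger sample does not fix a non-random one.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population of 100,000: 60% work during the day.
# Made-up opinions: 30% of workers and 70% of non-workers answer "yes".
population = []
for _ in range(100_000):
    works = rng.random() < 0.60
    yes = rng.random() < (0.30 if works else 0.70)
    population.append((works, yes))

def share_yes(people):
    """Fraction of (works, yes) records whose answer is yes."""
    return sum(yes for _, yes in people) / len(people)

true_rate = share_yes(population)

# Genuine simple random sample from the whole population.
random_estimate = share_yes(rng.sample(population, 2_000))

# Flawed procedure: daytime calls only reach people who are not at work.
reachable = [person for person in population if not person[0]]
daytime_estimate = share_yes(rng.sample(reachable, 2_000))

print(f"true: {true_rate:.1%}  random: {random_estimate:.1%}  "
      f"daytime-only: {daytime_estimate:.1%}")
```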