6.7: Normal distribution and the normal deviate (2024)

    Introduction

    In Chapter 3.3 we introduced the normal distribution and the Z score, also known as the normal deviate, as part of a discussion of how knowledge about the dispersion of data sampled from a population can be used to calculate how many samples we need (the empirical rule). We introduced Chebyshev’s inequality as a general approach to this problem, for cases where little is known about the distribution of the population, and contrasted it with the Z score, for cases where the distribution is known to be normal (Gaussian). The normal distribution is one of the most important distributions in classical statistics. All normal distributions are bell-shaped and symmetric about the mean. To describe a normal distribution only two parameters are needed: the population mean, \(\mu\), and the population standard deviation, \(\sigma\). The normal distribution with mean equal to zero and standard deviation equal to one is called the standard normal, or Z, distribution. With the Z score, any normal distribution can be quickly converted to the standard normal distribution.

    Proportions of a Normal Distribution

    Finding the proportion of a normal distribution that lies above or below a given value will become increasingly important for the many statistical tests we will learn over the next few weeks. What proportion of the population is greater than some specific value? Below, again, I have generated a large data set, now with population mean \(\mu = 5\) and standard deviation \(\sigma = 2\). The red line corresponds to the equation of the normal curve using our values of \(\mu = 5\) and \(\sigma = 2\).

    Figure \(\PageIndex{1}\): Large data set with \(\mu = 5\) and \(\sigma = 2\); the red line is the normal curve.
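
    For reference, the “equation of the normal curve” is the normal probability density function. It is written here in standard textbook notation; the label \(f\) is our own choice rather than the author’s, and \(\mu\) and \(\sigma\) are the population mean and standard deviation defined above.

    \[f(X) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{\left(X - \mu\right)^{2}}{2\sigma^{2}}} \nonumber\]

    With \(\mu = 5\) and \(\sigma = 2\), the curve peaks at \(X = 5\), and the total area under it equals 1, which is why areas under the curve can be read as proportions of the population.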

    Note that this is a crucial step! We assume that our sample distribution is really a sample from a population density (= “area under the curve”) function (= “an equation”) for a normal random (= “population”) variable.

    Once I (you) make this assumption, then we have powerful and easy to use tools at our command to answer questions like:

    Question: What proportion of the population is greater than 7? (colored in blue).

    This gets to the heart of the often-asked question, How many samples should I measure? If we know something about the mean and the variability, then we can predict how many samples will be of a particular kind. Let’s solve the problem.

    The Z score

    We could use the formula for the normal curve (and a lot of repetitions), but fortunately, some folks have provided tables that short-cut this procedure. R and other programs also can find these numbers because the formulas are “built in” to the base packages. First, let’s introduce a simple formula that lets us standardize our population numbers so that we can use established tables of probabilities for the normal distribution.

    Below, we will see how to use Rcmdr for these kinds of problems.

    However, it’s one of the basic tasks in statistics that you should be able to do by hand. We’ll use the Z score as a way to take advantage of known properties of the standard normal curve.

    \[Z = \frac{\left(X_{i} - \mu\right)}{\sigma} \nonumber\]

    Here \(X_{i}\) is an observation, \(\mu\) is the population mean, and \(\sigma\) is the population standard deviation; the resulting \(Z\) values have mean = 0 and SD = 1. \(Z\) is called the normal deviate (aka the “standard normal score” or “Z-score”); it gives us a shortcut for finding the proportion of data greater than 7 in this case.
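
    As a quick check on the formula, here is a minimal Python sketch of the Z score calculation. The function name `z_score` is our own choice for illustration, and the inputs are the example population values from the text (\(\mu = 5\), \(\sigma = 2\)).

```python
def z_score(x, mu, sigma):
    """Return the normal deviate Z = (x - mu) / sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Example population from the text: mu = 5, sigma = 2.
z_for_7 = z_score(7, mu=5, sigma=2)
print(z_for_7)  # 1.0, i.e. 7 is one standard deviation above the mean
```

    A value equal to the mean gives \(Z = 0\), and values below the mean give negative Z scores.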

    We use the normal deviate to do a couple of things. One use is to standardize a sample of observations so that they have a mean of zero and a standard deviation of one (the Z distribution). The data are then said to have been normalized.
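
    The first use can be sketched in a few lines of Python. This is only an illustration: the observations are made-up numbers, the helper name `standardize` is ours, and we assume the population form of the standard deviation (`statistics.pstdev`); swap in `statistics.stdev` if you want the sample form.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert each observation to a Z score using the data's own mean and SD."""
    m = mean(values)
    s = pstdev(values)  # population SD; an assumption made for this sketch
    return [(x - m) / s for x in values]


observations = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # made-up data: mean 5, SD 2
z_values = standardize(observations)

print(z_values)
print(mean(z_values), pstdev(z_values))  # mean is 0 and SD is 1 after standardizing
```

    Whatever the original units were, the standardized values are expressed in units of standard deviation from the mean.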

    The second use is to make predictions about how often a particular observation is likely to be encountered. As you can imagine, this second use is very helpful for designing an experiment: if we need to see a specified difference, we can conduct a pilot study (or refer to the literature) to determine a mean and level of variability for our observation of choice, plug these into the normal equation, and predict how likely we are to see a particular difference. In other words, this is one way to answer the question: how many observations do I need to make for my experiment to be valid?
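
    A rough sketch of this second use, under stated assumptions: suppose a pilot study gave a mean of 5 and a standard deviation of 2 (the example population values), and we plan 200 observations. The planned sample size and the threshold of 7 are hypothetical numbers chosen only for illustration. Python’s standard library `statistics.NormalDist` stands in for the normal table.

```python
from statistics import NormalDist

# Hypothetical pilot-study estimates (same values as the example population).
pilot_mean = 5.0
pilot_sd = 2.0
threshold = 7.0
n_planned = 200  # assumed number of observations, for illustration only

population = NormalDist(mu=pilot_mean, sigma=pilot_sd)
p_greater = 1.0 - population.cdf(threshold)  # proportion expected above the threshold
expected_count = n_planned * p_greater       # how many such observations to expect

print(f"P(X > {threshold}) = {p_greater:.4f}")
print(f"Expected observations above {threshold} out of {n_planned}: {expected_count:.1f}")
```

    Under these assumptions roughly 16% of observations, or about 32 of the 200, would be expected to exceed 7.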

    Table of normal distribution

    A portion of the table of the normal curve is provided at our web site and in your workbook. For our discussions, here’s another copy to look at (Fig. \(\PageIndex{2}\)).

    Figure \(\PageIndex{2}\): Portion of the table of the normal curve.

    See Table 1 in the Appendix for a full version of the normal table.

    We read values of \(Z\) from the first column and the first row. For \(Z = 0.23\) we would find 0.2 in the first column and 0.03 in the top row, then trace to where that row and column intersect (Fig. \(\PageIndex{3}\)); the proportion of values greater than \(Z = 0.23\) is 0.409046, or 40.9%.

    Figure \(\PageIndex{3}\): Reading the normal table for \(Z = 0.23\).
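
    The table lookup can be reproduced in software. The sketch below uses Python’s standard library `statistics.NormalDist` (not the Rcmdr workflow the text refers to); the helper name `upper_tail` is our own. It assumes, as the value 0.409046 indicates, that the table reports the proportion of the distribution greater than \(Z\).

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def upper_tail(z):
    """Proportion of the standard normal distribution greater than z."""
    return 1.0 - standard_normal.cdf(z)


print(round(upper_tail(0.23), 6))  # approximately 0.409046, matching the table
print(round(upper_tail(0.0), 3))   # 0.5: half of the distribution lies above the mean
```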

    Z on the standard normal table is going to range between \(-4\) and \(+4\), with \(Z = 0\) corresponding to \(0.500\). The Normal table values are symmetrical about the mean of zero.

    What to make of the values of Z, from \(-4, -3, \ldots +2, +3,\) up to \(+4\) and beyond? These are the standard deviations! Recall that using the Z score you corrected to a mean of zero (got it!), and a standard deviation of one! \(Z = 2\) is twice the standard deviation; a \(Z = 3\) is therefore three times the standard deviation, and so forth. The distribution is symmetrical: you get the same frequency for negative as for positive values. So on the “X” axis on a standard normal distribution, we have units of standard deviation plus (greater) or minus (less) than the mean. In Figure \(\PageIndex{4}\), the area under the curve representing less than \(-1\) standard deviations is highlighted.

    Figure \(\PageIndex{4}\): Standard normal curve with the area less than \(-1\) standard deviations highlighted.
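
    The symmetry argument is easy to verify numerically. This short Python sketch, again using the standard library `statistics.NormalDist` as a stand-in for the table, compares the area below \(Z = -1\) with the area above \(Z = +1\); the variable names are ours.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # defaults to mean 0, SD 1

below_minus_one = standard_normal.cdf(-1.0)      # area to the left of Z = -1
above_plus_one = 1.0 - standard_normal.cdf(1.0)  # area to the right of Z = +1

print(f"{below_minus_one:.4f} {above_plus_one:.4f}")  # 0.1587 0.1587
```

    Because the two tails match, a table that lists only positive values of \(Z\) is enough.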

    Question. How many multiples of standard deviations would you have for a Z score of \(Z = 1.75\)?

    Answer = 1.75 times

    Examples

    See Table 1 in the Appendix for a full version of the normal table as you read this section.

    What proportion of the data set will have values greater than \(7\)? After applying our Z score equation, I get \(Z = 1.0\), which corresponds to a proportion of 0.1587; that is, 15.87% of the observations are greater than \(7\).

    What proportion of the data set will have values less than \(3\)? After applying our Z score equation, I get \(Z = -1.0\). Taking advantage of the symmetry argument, I just take my \(Z = -1.0\) and make it positive: instead of values smaller than \(-1.0\), we now have values greater than \(+1.0\). And for \(Z = 1.0\), 0.1587 or 15.87% of the observations are greater than \(7\), which means that 15.87% will be \(3\) or smaller.

    What proportion of the data set will have values greater than \(8\)? Again, apply the Z score equation. I get \(Z = 1.5\), for which 0.0668 or 6.68% of the observations are greater than \(8\).
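
    These table-based examples can be checked with a short script. It is a sketch under the example’s assumptions (\(\mu = 5\), \(\sigma = 2\)); the helper names are ours, and Python’s `statistics.NormalDist` replaces the printed table. The lower-tail check uses \(X = 3\), the value one standard deviation below the mean, for which \(Z = -1.0\).

```python
from statistics import NormalDist

MU, SIGMA = 5.0, 2.0            # example population from the text
standard_normal = NormalDist()  # mean 0, SD 1


def proportion_greater(x, mu=MU, sigma=SIGMA):
    """Proportion of the population greater than x, via the Z score."""
    z = (x - mu) / sigma
    return 1.0 - standard_normal.cdf(z)


def proportion_less(x, mu=MU, sigma=SIGMA):
    """Proportion of the population less than x, via the Z score."""
    z = (x - mu) / sigma
    return standard_normal.cdf(z)


greater_than_7 = proportion_greater(7)  # Z = 1.0
greater_than_8 = proportion_greater(8)  # Z = 1.5
less_than_3 = proportion_less(3)        # Z = -1.0

print(f"{greater_than_7:.4f} {greater_than_8:.4f} {less_than_3:.4f}")
```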

    What proportion of the population is between \(5\) and \(7\)? Draw the problem, as shown in Figure \(\PageIndex{5}\), where the subset of the population between 5 and 7 is colored red.

    Figure \(\PageIndex{5}\): Normal curve with the proportion of the population between 5 and 7 colored red.

    Worked problem

    \(1 - (\text{proportion beyond } 7) - (\text{proportion less than } 5)\)

    \(1 - (0.1587) - (\text{proportion less than } 5)\)

    And the proportion less than 5?

    Use the Z-score equation again. Now we find that \(Z = 0\) and look up this Z-value in the table, which shows a 0.5 proportion or 50.0%.

    Therefore, the proportion between \(5\) and \(7\) equals

    \(1 - 0.1587 - 0.50 = 0.3413\)

    Answer = 34.13% of the observations are between 5 and 7 when \(\mu = 5\) and \(\sigma = 2\).
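
    The worked problem can be verified the same way. The sketch below computes the proportion between two values directly and then repeats the subtraction used in the worked problem; `proportion_between` is a hypothetical helper name, and `statistics.NormalDist` again stands in for the table.

```python
from statistics import NormalDist

population = NormalDist(mu=5.0, sigma=2.0)  # example population from the text


def proportion_between(lower, upper, dist=population):
    """Proportion of a normal population with lower < X < upper."""
    return dist.cdf(upper) - dist.cdf(lower)


between_5_and_7 = proportion_between(5, 7)

# Same quantity, computed the way the worked problem does it:
# 1 - (proportion beyond 7) - (proportion less than 5)
by_subtraction = 1.0 - (1.0 - population.cdf(7)) - population.cdf(5)

print(f"{between_5_and_7:.4f}")  # 0.3413
```

    The same function can be used to check your answers to the questions that follow, after you have worked them by hand.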

    Questions

    1. Repeat the worked problem, but this time, find the proportion

    • between 2 and 6.
    • between 3 and 5.
    • less than 5.
    • greater than 7.