PUH 5302, Applied Biostatistics
Course Learning Outcomes for Unit II
Upon completion of this unit, students should be able to:
2. Analyze relevant scientific evidence.
2.1 Compute the appropriate data to compare the extent of disease between groups.
2.2 Summarize data collection in a sample.
7. Evaluate the role of biostatistical analysis in public health research.
7.1 Prepare an outline of a selected topic related to biostatistical analysis.
Course/Unit Learning Outcomes and Learning Activities
2.1: Unit Lesson; Chapter 3; Unit II Problem Solving
2.2: Unit Lesson; Chapter 4; Unit II Problem Solving
7.1: Unit II Research Paper Outline
Reading Assignment
Chapter 3: Quantifying the Extent of Disease
Chapter 4: Summarizing Data Collected in the Sample
Unit Lesson
Welcome to Unit II. In the previous unit, we had an overview of biostatistics as a branch of applied statistics that helps us find meaning in, and interpret, some of the public health issues we face. We defined some terms and familiarized ourselves with some common study designs as well. We also briefly discussed some differences between descriptive and inferential statistics and their application in biostatistics.
In this unit, we will discuss how scientific data are organized, identified, and retrieved. Specifically, we will select, compute, and interpret appropriate measures to compare the extent of disease between groups, and we will discuss how to summarize collected data in a sample. Be prepared. We will be solving some statistical problems as part of our learning in this unit.
Quantifying the Extent of a Disease
In addressing prevalence and incidence rates, we have to use both descriptive and inferential statistics in order to make generalizations about a given population. The first step is normally descriptive statistics, which summarizes the sample and gives the researcher a basis for generating inferences. These concepts help us address prevalence and incidence. First, let’s attempt to answer a question: What is disease prevalence?
Prevalence refers to something commonly occurring. With regard to diseases, prevalence is the number of cases of a disease present in a particular population at a given time (Koch, 2015). For comparison purposes, many experts use the prevalence rate, which is the proportion of persons in a population who have a particular disease at a specific time or over a period of time. Many people confuse prevalence with
UNIT II STUDY GUIDE
Organizing, Identifying, and Retrieving Relevant Scientific Evidence
incidence, but they are different. The difference lies in the fact that prevalence covers all cases, both old and new, within a population at a specific time. Incidence deals only with new cases. Public health workers use the prevalence rate to gauge how likely it is that a member of the population has a disease. The people who have a disease may be referred to as cases or prevalent cases.

Epidemiologists or public health practitioners may want to know the prevalence rate of a disease in a given population. They count the total number of cases (persons with the disease) that exist in the population and divide by the total population. For example, if 1,500 people out of a total population of 10,000 are infected with HIV, the prevalence rate is 1,500 / 10,000 = 0.15, or 15% (15,000 per 100,000 people).

Point prevalence (PP) refers to the prevalence of a disease measured at a specific time. It is the proportion of persons with a particular disease or attribute on a specific date or at a point in time (Sullivan, 2018).

Calculating prevalence: Prevalence of a disease is determined by using the following formula:
Prevalence = (all new and old cases in a specific time / population during the specific period) x 10^n

Attribute prevalence (AP) is determined by using this formula:
AP = (persons with an attribute during a specific time / population during that specific period) x 10^n
Note: 10^n = 1; 100; 1,000; or 100,000

Point prevalence (PP) is determined by using the following formula:
PP = number of persons with disease / number of persons examined at baseline
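As a rough check on the arithmetic, the prevalence formula above can be sketched in Python. The function name `prevalence` and its `multiplier` parameter (standing in for 10^n) are illustrative choices rather than anything from the textbook; the figures are the HIV example from this lesson.

```python
def prevalence(cases, population, multiplier=1):
    """Return existing cases / population, scaled by multiplier (10^n)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * multiplier


# HIV example from the lesson: 1,500 cases in a population of 10,000.
proportion = prevalence(1500, 10_000)           # multiplier 10^0 = 1
percent = prevalence(1500, 10_000, 100)         # multiplier 10^2
per_100k = prevalence(1500, 10_000, 100_000)    # multiplier 10^5

print(f"Proportion:  {proportion:.2f}")
print(f"Percent:     {percent:.0f}%")
print(f"Per 100,000: {per_100k:,.0f}")
```

The same proportion is reported three ways; only the multiplier changes, which is why the choice of 10^n should always be stated alongside the result.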
Let’s solve a problem involving the prevalence of cardiovascular disease (CVD), using data from Table 3-1 on page 24 in your textbook as an example.

Details:
Total men examined = 1,792; total affected = 244
Total women examined = 2,007; total affected = 135

Let’s find the prevalence. We will express our answers as percentages.
Prevalence of CVD = (244 + 135) / (1,792 + 2,007) = 379 / 3,799 = 0.0998 = 9.98%
Prevalence of CVD in men = 244 / 1,792 = 0.1362 = 13.62%
Prevalence of CVD in women = 135 / 2,007 = 0.0673 = 6.73%

Incidence
Incidence refers to the frequency or extent of occurrence. For example, there is a high incidence of HIV among drug users. Public health researchers may decide to follow a certain population for a period of time. In so doing, they may be able to calculate the incidence of a disease. Incidence also reflects how likely someone is to develop a disease over time (Sullivan, 2018). In other words, disease incidence is the number of new infections or cases in a given period of time. Incidence rate (IR) is the number of new cases of a disease divided by the total disease-free follow-up time of the persons at risk for the disease (Sullivan, 2018). For example, if over a one-year period 10 men are diagnosed with HIV out of a total male study population of 400 with no infection at the beginning of the period, then the incidence of HIV in this population was 0.025 (or 2,500 per 100,000 person-years of study). The incidence rate is usually multiplied by some power of 10 (for example, 10; 100; 1,000; 10,000). The IR is reported as a rate in relation to a specific time interval. Taking our example, the IR of HIV is reported as 2.5 per 100 person-years. Let’s see how we got this answer. The incidence rate is determined in the following way:
$$\text{Incidence rate} = \frac{\text{Number of persons who develop disease during a specified period}}{\text{Sum of the lengths of time during which persons are disease-free}}$$
From the example above, treating each of the 400 men as contributing one year of disease-free follow-up, the incidence rate = 10 / 400 = 0.025, and 0.025 x 100 = 2.5 per 100 person-years.

Incidence proportion (IP) is the proportion of an initially disease-free population who develop disease, become injured, or die during a specified period of time. Some experts use incidence proportion synonymously with risk, probability of getting disease, and cumulative incidence. They have used proportions to measure incidence for comparison purposes. IP is given using the following formula:
IP = number of new cases of the disease during a specific time / size of the population at the beginning of that period

When incidence is measured over the whole study, from its start to its end, the result is referred to as cumulative incidence (Sullivan, 2018). Cumulative incidence is calculated in the following way:
$$\text{Cumulative incidence} = \frac{\text{Number of persons who develop disease during a specified period}}{\text{Number of persons at risk at baseline}}$$
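The two incidence measures above differ only in their denominators, which a short Python sketch can make concrete. The function names are illustrative, and the person-time figure rests on a simplifying assumption that is not stated in the lesson: every one of the 400 men is counted as contributing one full disease-free year.

```python
def incidence_rate(new_cases, person_time, multiplier=1):
    """New cases divided by total disease-free person-time, scaled by 10^n."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return new_cases / person_time * multiplier


def cumulative_incidence(new_cases, at_risk_at_baseline):
    """New cases divided by the number of persons at risk at baseline."""
    if at_risk_at_baseline <= 0:
        raise ValueError("at_risk_at_baseline must be positive")
    return new_cases / at_risk_at_baseline


# HIV example from the lesson: 10 new cases among 400 initially uninfected men.
# Assumption: each man is followed for one full year (400 person-years).
person_years = 400 * 1
rate_per_100 = incidence_rate(10, person_years, multiplier=100)
risk = cumulative_incidence(10, 400)

print(f"Incidence rate: {rate_per_100:.1f} per 100 person-years")
print(f"Cumulative incidence over the year: {risk:.3f}")
```

With one year of follow-up per person the two measures coincide numerically; they diverge once people are followed for different lengths of time.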
Measuring Differences Between Groups
Public health experts often report differences between groups in order to compare disease trends among different groups or demographics. They do so by using simple differences and ratios. A ratio is a quantitative relation between two amounts that shows the number of times one value is contained within the other (for example, 1 to 2, 1:2, or ½).

Differences are measured using the following:
Risk difference (excess risk) is the difference in point prevalence, cumulative incidence, or incidence rates between the groups (Sullivan, 2018). It shows the absolute effect of the exposure on the disease condition.
Population attributable risk (PAR) is the proportion of disease in the population that can be attributed to a risk factor (Sullivan, 2018). The PAR is calculated by using the formula below.
$$\text{PAR} = \frac{\text{Prevalence}_{\text{overall}} - \text{Prevalence}_{\text{unexposed}}}{\text{Prevalence}_{\text{overall}}}$$
Let’s solve some problems with these formulas, using information from Table 3-2 in your textbook. We will calculate the risk difference (RD) of CVD between smokers and non-smokers and the population attributable risk.
1. RD = Exposed (81 / 744 = 0.1089) – Unexposed (298 / 3055 = 0.0975) = 0.0114 (1.14%). This shows that the prevalence of CVD is 0.0114 (1.14 percentage points) higher in smokers than in non-smokers.
2. PAR:
$$\text{PAR} = \frac{\text{Prevalence}_{\text{overall}} - \text{Prevalence}_{\text{unexposed}}}{\text{Prevalence}_{\text{overall}}} = \frac{0.0998 - 0.0975}{0.0998} = 0.023 = 2.3\%$$
This means about 2.3% of the cases of CVD were due to smoking or smoke-related exposure.
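The two hand calculations above can be reproduced with a brief Python sketch. The variable names are illustrative; the counts are the Table 3-2 figures quoted in this lesson (81 of 744 smokers and 298 of 3,055 non-smokers with CVD).

```python
# Counts quoted in this lesson from Table 3-2 (CVD by smoking status).
smokers_cvd, smokers_total = 81, 744
nonsmokers_cvd, nonsmokers_total = 298, 3055

prev_exposed = smokers_cvd / smokers_total
prev_unexposed = nonsmokers_cvd / nonsmokers_total
prev_overall = (smokers_cvd + nonsmokers_cvd) / (smokers_total + nonsmokers_total)

# Risk difference: absolute gap between exposed and unexposed prevalence.
risk_difference = prev_exposed - prev_unexposed

# Population attributable risk: share of overall prevalence linked to exposure.
par = (prev_overall - prev_unexposed) / prev_overall

print(f"Prevalence, smokers:     {prev_exposed:.4f}")
print(f"Prevalence, non-smokers: {prev_unexposed:.4f}")
print(f"Prevalence, overall:     {prev_overall:.4f}")
print(f"Risk difference:         {risk_difference:.4f}")
print(f"PAR:                     {par:.1%}")
```

Because the code carries the unrounded prevalences through, it gives a risk difference of about 0.0113 and a PAR of about 2.2%, slightly below the hand-calculated 0.0114 and 2.3%, which use prevalences rounded to four decimal places.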
Relative risk (RR) is a ratio used to compare prevalence between groups. RR is calculated using the following formula:
RR = PP exposed / PP unexposed
The odds ratio is another ratio. It approximates the RR for disease conditions that are rare, that is, when prevalence is less than 10 percent. The odds ratio is also computed under certain study designs, such as case-control studies, where the relative risk cannot be computed directly.
$$\text{Odds ratio} = \frac{\text{Prevalence}_{\text{exposed}} \,/\, (1 - \text{Prevalence}_{\text{exposed}})}{\text{Prevalence}_{\text{unexposed}} \,/\, (1 - \text{Prevalence}_{\text{unexposed}})}$$
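To see how the relative risk and the odds ratio compare, the sketch below applies both formulas to the smoking and CVD counts quoted earlier in this lesson from Table 3-2. The helper name `odds` is an illustrative choice, and the lesson itself does not work this example.

```python
def odds(probability):
    """Convert a prevalence (probability) into odds: p / (1 - p)."""
    if not 0 <= probability < 1:
        raise ValueError("probability must be in [0, 1)")
    return probability / (1 - probability)


# Counts quoted in this lesson from Table 3-2 (CVD by smoking status).
prev_exposed = 81 / 744        # smokers with CVD / all smokers
prev_unexposed = 298 / 3055    # non-smokers with CVD / all non-smokers

relative_risk = prev_exposed / prev_unexposed
odds_ratio = odds(prev_exposed) / odds(prev_unexposed)

print(f"Relative risk: {relative_risk:.3f}")
print(f"Odds ratio:    {odds_ratio:.3f}")
```

Both prevalences here are close to 10 percent, and the two ratios come out close to each other (roughly 1.12 and 1.13), which illustrates why the odds ratio is used as a stand-in for the relative risk when the disease is uncommon.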
Summarizing Data Collected in a Sample
Statisticians collect information or data from a sample on a phenomenon under study. Data are facts, observations, or numerical information organized for analysis or used as the basis for a decision. Data are collected in different ways using various scales, which are discussed below.

Nominal scales are used when you want to place data into categories without giving the data structure or order (e.g., yes or no; male and female; color of hair: black, brown, white, gray, other). Nominal scales do not imply any ordering among responses (M&E Studies, n.d.).

Ordinal scales are used when you want to rank variables. For example, patients are asked to rate their pain level from 0–10, with 10 being the worst. These pain levels may vary from patient to patient. The order is defined, but the distance between positions is not; it is a coarse ordering that depends on the patient’s subjective feeling.

A researcher may also choose to measure patient satisfaction with services delivered during inpatient admission. The researcher may use an ordinal scale on which patients specify their feelings as very dissatisfied, somewhat dissatisfied, somewhat satisfied, or very satisfied.

Interval scales are used widely in statistics. A commonly cited example is the Likert scale, whose ordered responses are often treated as interval data. On a Likert scale, you may be asked to rate your level of satisfaction on a 5-point scale: strongly satisfied, satisfied, neutral, dissatisfied, or strongly dissatisfied.
Example of an ordinal scale (Weis, 2015)
Summarizing data collected can be done in various ways. The data/variables must be organized in order to be meaningful for descriptive or inferential analysis. There are several types of variables.
Dichotomous variables are usually nominal variables with two levels (e.g., male and female, yes and no).
Ordinal variables, like nominal variables, place observations into categories, but the categories can be ranked or ordered (for example, very satisfied, satisfied, dissatisfied, very dissatisfied).
Continuous variables are numeric variables whose observations can take any value within a certain range of real numbers. Examples include height, time, age, and temperature (Sullivan, 2018).

Data are collected from a sample within a population. A population simply refers to an entire group having common observable characteristics: for example, a population living in a specific geographical location. Often, when the population is too large, it is impossible to collect data from everyone. Therefore, we select a sample from that population. The results obtained from the sample are then generalized to the entire population.

Data Interpretation
Data are summarized or interpreted in various ways using charts and statistics. Some of the most familiar methods are displayed below.
Example of a Likert scale (Smith, 2011)
Statistics
Frequency: How often an event occurs
Cumulative frequency: The running total of the frequencies
Relative frequency: The proportion of all observations that take a given value
Mean: The average
Mode: The most frequently occurring value
Median: The middle value, the mid-point of the ordered data
Variance: The average squared difference of the values from the mean
Standard deviation: The spread of values around the mean, equal to the square root of the variance (Boeree, n.d.)
Percentile: A value below which a given percentage of the observations fall

Charts
Histograms for ordinal variables
Pie charts
Bar charts for categorical variables
Box-whisker plots for continuous variables
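Several of the summary statistics listed above can be computed with Python's standard library, as in the sketch below. The eight values are hypothetical and chosen only to keep the arithmetic easy to follow; they do not come from the textbook. The variance and standard deviation use the sample (n - 1) form.

```python
from collections import Counter
from itertools import accumulate
import statistics

# Hypothetical sample of eight observations (illustrative only).
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Frequency, cumulative frequency, and relative frequency per distinct value.
counts = Counter(values)
levels = sorted(counts)
frequencies = [counts[level] for level in levels]
cumulative = list(accumulate(frequencies))
relative = [f / len(values) for f in frequencies]

print("value  freq  cum_freq  rel_freq")
for level, f, c, r in zip(levels, frequencies, cumulative, relative):
    print(f"{level:>5}  {f:>4}  {c:>8}  {r:>8.3f}")

# Measures of center and spread.
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)
variance = statistics.variance(values)   # sample variance, n - 1 denominator
std_dev = statistics.stdev(values)       # square root of the sample variance

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.3f}, standard deviation={std_dev:.3f}")
```

The last cumulative frequency always equals the sample size, and the relative frequencies sum to 1, which makes both columns useful as quick checks on a frequency table.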
In summary, public health practitioners and other researchers have used statistical theories and principles to analyze, interpret, and report research data for several decades. Public health professionals are required to report findings relating to diseases and other population health issues. The use of statistical methods has played a major role in achieving this milestone.
References
Boeree, C. G. (n.d.). Descriptive statistics. Retrieved from http://webspace.ship.edu/cgboer/descstats.html
Koch, G. (2015). Basic allied health statistics and analysis (4th ed.). Stamford, CT: Cengage Learning.
M&E Studies. (n.d.). Types of measurement scales. Retrieved from http://www.mnestudies.com/research/types-measurement-scales
Smith, N. (2011). Example Likert scale [Image]. Retrieved from https://commons.wikimedia.org/wiki/File:Example_Likert_Scale.jpg
Sullivan, L. M. (2018). Essentials of biostatistics in public health (3rd ed.). Burlington, MA: Jones & Bartlett Learning.
Weis, R. (2015). Children’s pain scale [Image]. Retrieved from https://commons.wikimedia.org/wiki/File:Children%27s_pain_scale.JPG
Learning Activities (Nongraded)
Nongraded Learning Activities are provided to aid students in their course of study. You do not have to submit them. If you have questions, contact your instructor for further guidance and information.
Complete the following Chapter 3 practice problems: 2, 4, 7, and 10 on pages 31–33 of your textbook. Also, complete the following Chapter 4 practice problems: 12–19 on page 65 in your textbook. Be sure to show all of your work.