Research Report
Student Name
University Name
Course Name
Instructor Name
Date
Table of Contents
Results
4.3.2 Descriptive statistics for the response variables
Chi-square test for independence
Conclusion and Recommendation
Executive Summary
In recent years, cancer has become one of the leading causes of disease worldwide. A contributing factor is that cancer awareness in many communities remains low. The main problem with low cancer awareness is late diagnosis, which delays the start of treatment. It is therefore important for the whole community to be aware of cancer and to adopt the necessary measures in their lifestyles, both to support early diagnosis and, most importantly, to prevent cancer. This study investigates cancer awareness among the U.S. population and identifies factors that affect cancer awareness in a community.
The study adopted a descriptive design, and data were collected between February and March 2019. U.S. residents were given the survey questionnaire and allowed sufficient time to respond. Respondents came from all sections of the community, including students, employed residents, and non-employed residents; sampling this broadly aims to capture views from across the community. Frequency tables, bar charts, chi-square tests, and logistic regression were used in the data analysis to summarize the data and derive meaningful information from it.
The analysis used a sample of 3,504 respondents. Variables were coded and recoded as needed, with inapplicable cases coded appropriately. Not all variables in a dataset are useful, so choosing the relevant ones requires a thorough understanding of both the data and the study's objectives. The study found a statistically significant association (p < 0.05) between cancer awareness and occupational status, level of education, age, and information source, suggesting that these factors can help predict cancer awareness among the participants.
Introduction
Despite immense advances in health and medicine, cancer and related diseases remain a major problem worldwide in terms of both mortality and morbidity. Studies have shown that cancer cases are growing by about 2.4% (Kav et al., 2013). Cancer is a deadly disease affecting many people in many countries. It also has a very high incidence and is the second leading cause of death worldwide after heart disease (Peter and Bernard, 2008). Reported cancer cases in the USA vary by age and gender (Yilmaz et al., 2011), which suggests that age and gender are important factors in understanding cancer-related problems.
Statement of the problem
Cancer has recently become one of the leading causes of disease worldwide, partly because cancer awareness in the community remains low. The main consequence of low awareness is late diagnosis, which delays the start of treatment. A comprehensive analysis of how cancer awareness varies among community members with respect to different life factors can shed light on the concept of cancer awareness and help raise awareness in communities, which would in turn reduce cancer cases (Kumar, 2019).
Research Objectives
- Fit a logistic regression model to assess factors associated with cancer awareness and identify the most important ones
- Perform a comprehensive review of cancer awareness
- Investigate whether age, gender, level of education, and cancer beliefs affect cancer awareness among U.S. residents
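The report does not include its logistic-regression output. As a minimal sketch of what fitting such a model involves, the Python below fits a one-predictor logistic regression by Newton-Raphson. The data are invented for illustration: a hypothetical binary indicator x (for example, 1 = bachelor's degree or higher) and the HPV-awareness outcome y (1 = has heard of HPV). The function name fit_logistic is also hypothetical; a real analysis would use a statistics package and include all predictors.

```python
import math


def fit_logistic(xs, ys, iters=25):
    """Fit P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} g, with the 2 x 2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical counts: x = 0 -> 30 aware, 70 not aware; x = 1 -> 60 aware, 40 not aware.
xs = [0] * 100 + [1] * 100
ys = [1] * 30 + [0] * 70 + [1] * 60 + [0] * 40

b0, b1 = fit_logistic(xs, ys)
odds_ratio = math.exp(b1)  # multiplicative change in the odds of awareness when x goes 0 -> 1
```

With a single binary predictor, the maximum-likelihood slope equals the log odds ratio, ln((60/40) / (30/70)) = ln 3.5 ≈ 1.253, so odds_ratio comes out at about 3.5; this closed form gives a convenient check on the iterative fit.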
Hypotheses
H0: There is no association between age and cancer awareness
H1: There is an association between age and cancer awareness
H0: There is no association between gender and cancer awareness
H1: There is an association between gender and cancer awareness
Methodology
Introduction
Data analysis is the core of an evidence-based research study. This section describes the statistical and data-analysis techniques used to wrangle the data and extract information that can shed light on the topic under study. The choice of analysis depends on several elements of the research: the study's objectives, the available data, the research questions, how the results will be presented, and the general purpose of the study. Together, these factors determine which data-analysis and statistical techniques are appropriate. Data analysis is also not a linear process that starts at one point and ends at another; it involves moving back and forth between stages, altering variables, and assessing the effects in order to arrive at workable models and tests (Pallant, 2020).
Data Description
Data description involves an in-depth understanding of the data and its features, that is, the variables that make up the dataset. A dataset can contain continuous, categorical (including nominal), and string variables (Ott & Longnecker, 2015); this dataset contains all of these types and is therefore diverse in composition. The size of the dataset, commonly referred to as the sample size, is the total number of rows in the data, which here equals the number of respondents who took part in the survey. The analysis used a sample of 3,504 respondents. Data analysis also involves exploring the variables to identify those essential to the study and to set aside less useful ones; because not all variables in the data are useful, choosing the relevant ones requires a thorough understanding of both the data and the study's objectives.
Descriptive Statistics
Descriptive statistics involves computing summary statistics that describe the data more completely than visual inspection alone. Means, medians, variances, correlations, measures of skewness, graphs, charts, and tables are among the techniques used in descriptive statistics. Descriptive analysis also provides the grounding for formulating and generating hypotheses.
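As a minimal sketch of the summary statistics listed above, the Python below computes them for a small set of invented ages (not the study data) using the standard-library statistics module. The describe helper is hypothetical, and its skewness is the simple moment coefficient g1; statistical packages such as SPSS often report a small-sample-adjusted version, so values can differ slightly for small n.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    # Population central moments, used for the moment skewness g1 = m3 / m2**1.5.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return {
        "n": n,
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "skewness": m3 / m2 ** 1.5,
    }


# Invented ages for illustration only.
ages = [18, 25, 34, 47, 52, 58, 61, 66, 70, 97]
summary = describe(ages)
```

For these made-up ages the mean is 52.8 and the median 55.0; the positive skewness reflects the long right tail created by the single 97-year-old.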
Results
Table 1.0 presents a frequency table of the dependent variable, and Fig 1.0 shows its distribution. The distribution across categories is reasonably balanced, although Yes responses outnumber No responses by about 18 percentage points: 58.5% of respondents said they had heard of HPV (that is, they are aware of HPV), while 40.1% said they had never heard of it. The remaining 1.4% of responses were missing.
Table 1.0: Descriptive statistics for the dependent variable
L1. Have you ever heard of HPV? HPV stands for Human Papillomavirus. It is not HIV, HSV, or herpes.
| Response | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Missing data (Not Ascertained) | 50 | 1.4 | 1.4 | 1.4 |
| Yes | 2050 | 58.5 | 58.5 | 59.9 |
| No | 1404 | 40.1 | 40.1 | 100.0 |
| Total | 3504 | 100.0 | 100.0 | |
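The percentages in Table 1.0 can be reproduced from the frequencies. The "Missing data (Not Ascertained)" row is listed among the valid categories, so its cases sit in the valid-percent denominator and Valid Percent equals Percent. The sketch below, using a hypothetical helper frequency_table, rebuilds the table from the reported counts and can optionally treat named categories as missing, as SPSS does for user-missing values (excluded from Valid Percent and Cumulative Percent).

```python
def frequency_table(counts, missing=()):
    """Return (label, n, percent, valid percent, cumulative percent) rows.

    Categories named in `missing` are excluded from the valid-percent
    denominator and get None for valid and cumulative percent.
    """
    total = sum(counts.values())
    valid_total = sum(n for label, n in counts.items() if label not in missing)
    rows = []
    cumulative = 0.0
    for label, n in counts.items():
        percent = round(100.0 * n / total, 1)
        if label in missing:
            rows.append((label, n, percent, None, None))
            continue
        valid_percent = 100.0 * n / valid_total
        cumulative += valid_percent
        rows.append((label, n, percent, round(valid_percent, 1), round(cumulative, 1)))
    return rows


# Counts copied from Table 1.0.
hpv = {"Missing data (Not Ascertained)": 50, "Yes": 2050, "No": 1404}

as_reported = frequency_table(hpv)  # matches Table 1.0
missing_excluded = frequency_table(hpv, missing=("Missing data (Not Ascertained)",))
```

as_reported reproduces the table row for row. Declaring the "Not Ascertained" code as missing changes the valid percentages to 59.4% Yes and 40.6% No over the 3,454 ascertained responses.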
Fig 1.0: A bar graph of the dependent variable
Fig 1.1: A bar graph of the dependent variable
4.3.2 Descriptive statistics for the response variables.
Fig 1.2: A bar graph of INCOMERANGES
Table 1.1: Descriptive statistics for Occupational status
O2. What is your current occupational status?
| Response | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Missing data (Not Ascertained) | 43 | 1.2 | 1.2 | 1.2 |
| Multiple responses selected in error | 66 | 1.9 | 1.9 | 3.1 |
| Employed | 1696 | 48.4 | 48.4 | 51.5 |
| Unemployed | 115 | 3.3 | 3.3 | 54.8 |
| Homemaker | 161 | 4.6 | 4.6 | 59.4 |
| Student | 55 | 1.6 | 1.6 | 61.0 |
| Retired | 1113 | 31.8 | 31.8 | 92.7 |
| Disabled | 233 | 6.6 | 6.6 | 99.4 |
| Other – Specify | 22 | .6 | .6 | 100.0 |
| Total | 3504 | 100.0 | 100.0 | |
Table 1.2: Descriptive statistics for Education Level
EDUCB. What is the highest level of school you completed? 5 Levels (Derived from Education; see History Document for mo
| Response | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Missing Data (Not Ascertained) | 51 | 1.5 | 1.5 | 1.5 |
| Less than High School | 275 | 7.8 | 7.8 | 9.3 |
| High School Graduate | 631 | 18.0 | 18.0 | 27.3 |
| Some College | 1039 | 29.7 | 29.7 | 57.0 |
| Bachelor’s Degree | 910 | 26.0 | 26.0 | 82.9 |
| Post-Baccalaureate Degree | 598 | 17.1 | 17.1 | 100.0 |
| Total | 3504 | 100.0 | 100.0 | |
Table 1.3: Descriptive statistics for the source of information
A2. The most recent time you looked for health or medical topics, where did you go first?
| Response | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Missing data (Not Ascertained) | 13 | .4 | .4 | .4 |
| Missing data (Filter Missing) | 10 | .3 | .3 | .7 |
| Multiple responses selected in error | 385 | 11.0 | 11.0 | 11.6 |
| Question answered in error (Commission Error) | 64 | 1.8 | 1.8 | 13.5 |
| Inapplicable, coded 2 in SeekHealthInfo | 644 | 18.4 | 18.4 | 31.8 |
| Books | 88 | 2.5 | 2.5 | 34.4 |
| Brochures and pamphlets | 87 | 2.5 | 2.5 | 36.8 |
| Cancer organization | 11 | .3 | .3 | 37.2 |
| Family | 64 | 1.8 | 1.8 | 39.0 |
| Friend/Co-worker | 25 | .7 | .7 | 39.7 |
| Doctor or health care provider | 390 | 11.1 | 11.1 | 50.8 |
| Internet | 1664 | 47.5 | 47.5 | 98.3 |
| Library | 9 | .3 | .3 | 98.6 |
| Magazines | 18 | .5 | .5 | 99.1 |
| Newspapers | 6 | .2 | .2 | 99.3 |
| Telephone information number | 20 | .6 | .6 | 99.8 |
| Complementary, alternative, or unconventional practitioner | 6 | .2 | .2 | 100.0 |
| Total | 3504 | 100.0 | 100.0 | |
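Table 1.3 reports percentages over all 3,504 cases, including 1,116 non-substantive codes (missing, multiple responses, commission errors, and inapplicable cases). A sketch of recoding those codes as missing before computing shares follows; treating every such code as missing is an assumption made for illustration and may differ from how the study recoded them. The helper substantive_shares and the prefix list NON_SUBSTANTIVE are hypothetical.

```python
# Counts copied from Table 1.3.
sources = {
    "Missing data (Not Ascertained)": 13,
    "Missing data (Filter Missing)": 10,
    "Multiple responses selected in error": 385,
    "Question answered in error (Commission Error)": 64,
    "Inapplicable, coded 2 in SeekHealthInfo": 644,
    "Books": 88,
    "Brochures and pamphlets": 87,
    "Cancer organization": 11,
    "Family": 64,
    "Friend/Co-worker": 25,
    "Doctor or health care provider": 390,
    "Internet": 1664,
    "Library": 9,
    "Magazines": 18,
    "Newspapers": 6,
    "Telephone information number": 20,
    "Complementary, alternative, or unconventional practitioner": 6,
}

# Hypothetical recoding rule: labels starting with these prefixes are treated as missing.
NON_SUBSTANTIVE = (
    "Missing data",
    "Multiple responses",
    "Question answered in error",
    "Inapplicable",
)


def substantive_shares(counts):
    """Drop non-substantive codes, then return (n, {label: percent of n})."""
    kept = {k: v for k, v in counts.items() if not k.startswith(NON_SUBSTANTIVE)}
    n = sum(kept.values())
    return n, {k: round(100.0 * v / n, 1) for k, v in kept.items()}


n_valid, shares = substantive_shares(sources)
```

Among the 2,388 substantive answers, the Internet accounts for about 69.7% and doctors or health care providers for about 16.3%, compared with 47.5% and 11.1% of all cases in Table 1.3. Which denominator is used is a reporting choice best stated alongside the percentages.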
Table 1.5: Descriptive statistics for Age
| Descriptive Statistics | N | Minimum | Maximum | Mean | Std. Deviation |
|---|---|---|---|---|---|
| O1. What is your Age? | 3417 | 18 | 97 | 57.02 | 16.729 |
| Valid N (listwise) | 3417 | | | | |
Fig 1.3: Histogram of variable Age
Chi-square test for independence
The Pearson chi-square test, commonly known as the chi-square test for independence, is used to test whether two categorical variables are associated. It tests the hypotheses:
H0: The two categorical variables are independent (no association exists).
H1: The two categorical variables are not independent (an association exists).
The test is valid only if several assumptions hold: both variables must be categorical (nominal or ordinal), each with two or more independent groups; the observations must be independent; and the expected cell counts must be large enough, a common rule of thumb being that no more than 20% of cells have an expected count below 5 and none falls below 1. To investigate the association between knowledge of HPV and education level, a chi-square test of independence was performed. The likelihood ratio statistic was 370.572 (df = 10), with a p-value reported by SPSS as .000, that is, p < .001. We therefore reject the null hypothesis of no association and conclude that there is a statistically significant association between knowledge of HPV and education level.
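The statistics reported in Table 1.6 follow the standard definitions below, where O_ij and E_ij are the observed and expected counts in row i and column j, R_i and C_j are the row and column totals, N is the total count, and r and c are the numbers of rows and columns; SPSS's "Likelihood Ratio" is the G² statistic. As a consistency check, assuming (as the reported df implies) that the "Missing data" codes were kept as categories, HPV awareness has r = 3 levels and education has c = 6 levels:

```latex
\begin{aligned}
E_{ij} &= \frac{R_i\,C_j}{N} \\
\chi^2 &= \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}} \\
G^2 &= 2\sum_{i=1}^{r}\sum_{j=1}^{c} O_{ij}\ln\frac{O_{ij}}{E_{ij}} \\
\mathrm{df} &= (r-1)(c-1) = (3-1)(6-1) = 10 \\
\text{share of cells with } E_{ij} < 5 &= \frac{2}{3 \times 6} = \frac{2}{18} \approx 11.1\%
\end{aligned}
```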
Table 1.6: Chi-square test for independence for the dependent variable and education level
| Chi-Square Tests | Value | df | Asymptotic Significance (2-sided) |
|---|---|---|---|
| Pearson Chi-Square | 495.376a | 10 | .000 |
| Likelihood Ratio | 370.572 | 10 | .000 |
| Linear-by-Linear Association | 41.334 | 1 | .000 |
| N of Valid Cases | 3504 | | |
a. 2 cells (11.1%) have an expected count less than 5. The minimum expected count is .73.
Table 1.7: Chi-square test for independence for the variable HEARHDPV and INCOMERANGES
| Chi-Square Tests | Value | df | Asymptotic Significance (2-sided) |
|---|---|---|---|
| Pearson Chi-Square | 249.089a | 18 | .000 |
| Likelihood Ratio | 237.943 | 18 | .000 |
| Linear-by-Linear Association | 20.597 | 1 | .000 |
| N of Valid Cases | 3504 | | |
a. 4 cells (13.3%) have an expected count less than 5. The minimum expected count is 2.53.
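As a minimal sketch of how the Pearson and likelihood-ratio statistics in Tables 1.6 and 1.7 are computed, the pure-Python code below applies both tests to a small, invented 2 × 3 table (HPV awareness by three hypothetical education groups). The function names chi2_sf and chi_square_independence are illustrative, not from the study; chi2_sf uses the closed-form chi-square tail series, which holds only for integer degrees of freedom. In practice a package such as SPSS or SciPy would be used.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_j x^((2j-1)/2) / (1*3*...*(2j-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 / math.pi) * math.exp(-half) * math.sqrt(x)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def chi_square_independence(table):
    """Pearson chi-square and likelihood-ratio (G) tests for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = g = 0.0
    small = 0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected < 5:
                small += 1
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes 0 to G
                g += 2.0 * observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return {
        "pearson": pearson,
        "pearson_p": chi2_sf(pearson, df),
        "lr": g,
        "lr_p": chi2_sf(g, df),
        "df": df,
        "pct_cells_expected_lt5": 100.0 * small / (len(table) * len(table[0])),
    }


# Invented counts: rows = heard of HPV (yes, no); columns = three education groups.
table = [
    [40, 60, 80],
    [60, 40, 20],
]
result = chi_square_independence(table)
```

For the invented counts every expected count is at least 40, so the footnote-style diagnostic reports 0% of cells below 5. By the same df formula, Table 1.7's df of 18 with a three-level HPV variable corresponds to ten income categories, since (3 − 1)(10 − 1) = 18.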
Limitations of the study
Missing data and inapplicable data. Some of the data collected for the study were inapplicable because of incorrectly completed responses, poor handwriting, and misplaced data points. Such data cannot be used in the analysis because they provide no useful information for the study. They also introduce potential sources of error and can substantially reduce the effective sample size.
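The effect of missing values on sample size can be seen in a minimal sketch of listwise (complete-case) deletion, the approach behind the "Valid N (listwise)" figure in Table 1.5. The records and the helper complete_cases are hypothetical.

```python
def complete_cases(records, variables):
    """Listwise deletion: keep records with a non-missing value for every listed variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]


# Hypothetical survey records; None marks a missing or inapplicable answer.
records = [
    {"heard_hpv": "Yes", "age": 45, "education": "Bachelor's Degree"},
    {"heard_hpv": "No", "age": None, "education": "Some College"},
    {"heard_hpv": None, "age": 63, "education": "High School Graduate"},
    {"heard_hpv": "Yes", "age": 29, "education": None},
    {"heard_hpv": "No", "age": 71, "education": "Less than High School"},
]

n_outcome_only = len(complete_cases(records, ["heard_hpv"]))
n_all_three = len(complete_cases(records, ["heard_hpv", "age", "education"]))
```

Here four of the five records are usable for the outcome alone, but only two once age and education are also required, which is why adding variables to a model can shrink the analytic sample.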
Limited variables. Few variables were considered when assessing cancer awareness; including more variables would have been helpful and would make the models more comprehensive. Cancer awareness can be measured in several ways, so having heard of HPV is not the only question that can determine it.
Limited resources and time. Data collection is a complex, resource-intensive, and time-consuming process, and considerable time was spent designing the questionnaire and deciding which questions to include.
Conclusion and Recommendation
The Internet was the dominant first source of health and medical information, accounting for about 47.5% of respondents. This may reflect the rapid growth of technology worldwide. Respondents were offered several types of information sources, including the Internet, doctors or health care providers, libraries, and magazines; doctors or health care providers were the second most frequent source (11.1%). Since most respondents turn first to the Internet or to health care providers for health information, health authorities should focus on these two channels when disseminating health information in order to reach as many people as possible.
Social beliefs about cancer are viewed as a potential confounding factor in the association between cancer awareness and the explanatory variables in this research. About 23.7% of respondents are somewhat worried about getting cancer, 17.9% are not worried, and 5.3% are extremely worried, indicating that a substantial share of the U.S. population is worried about developing cancer. This may reflect personal experience with the disease or information about cancer and its effects obtained from various sources. Additionally, the largest group of respondents, about 38.0%, gave a moderate response about getting cancer in the future.
A statistically significant association was found between cancer awareness, measured as having heard of HPV, and the level of education completed (p < 0.05) (Bickel & Lehmann, 2012). Respondents with higher levels of completed education were more likely to report having heard of HPV, implying greater awareness, whereas most respondents who had completed 11 years of schooling or less reported that they had not heard of it. The chi-square test thus associated HPV awareness with a higher level of education. A chi-square test of HPV awareness by occupational status also indicated that employed respondents were more often aware of HPV; by contrast, awareness was lower among students and unemployed respondents.
References
Agresti, A. (2018). An introduction to categorical data analysis. John Wiley & Sons.
Allison, P. D. (2012). Logistic regression using SAS: Theory and application. SAS Institute.
Bickel, P. J., & Lehmann, E. L. (2012). Descriptive statistics for nonparametric models I. Introduction. In Selected works of E. L. Lehmann (pp. 465-471). Springer.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. CRC Press.
Kumar, R. (2019). Research methodology: A step-by-step guide for beginners. Sage Publications Limited.
Mackey, A., & Gass, S. M. (2015). Second language research: Methodology and design. Routledge.
Pallant, J. (2020). SPSS survival manual: A step by step guide to data analysis using IBM SPSS. Routledge.
Sharpe, D. (2015). Chi-square test is statistically significant: Now what? Practical Assessment, Research, and Evaluation, 20(1), 8.
Sperandei, S. (2014). Understanding logistic regression analysis. Biochemia Medica, 24(1), 12-18.
Vaske, J. J. (2019). Survey research and analysis. Sagamore-Venture.
Appendix
Q1. Have you ever heard of HPV?
Yes / No
Q2. Are you male or female?
Male / Female
Q3. What is your age? (Age)
____ years old
Q4. The most recent time you looked for information about health or medical topics, where did you go first? (WhereSeekHealthInfo)
Mark only one.
1 Books
2 Brochures, pamphlets, etc.
3 Cancer organization
4 Family
5 Friend/Co-worker
6 Doctor or health care provider
7 Internet
8 Library
9 Magazines
10 Newspapers
11 Telephone information number
12 Complementary, alternative, or unconventional practitioner
Q5. The most recent time you looked for information about health or medical topics, who was it for? (WhoLookingFor)
Mark only one.
Myself / Someone else / Both myself and someone else
Q6. How worried are you about getting cancer again? (FreqWorryCancerAgain)
Not at all / Slightly / Somewhat / Moderately / Extremely