Executive Summary

In recent years, cancer has become one of the leading causes of death worldwide, while cancer awareness in the community remains very low. The major problem with low cancer awareness is late diagnosis, which delays the start of treatment. Therefore, the entire community needs cancer awareness and should adopt the necessary lifestyle measures to ensure early diagnosis and, most importantly, to prevent cancer. This study investigates cancer awareness among the US population and identifies the factors that affect cancer awareness in a community.

The research adopted a descriptive design, and the data were collected between February and March 2019. Survey questionnaires were administered to US residents, and participants were given enough time to respond. Respondents were drawn from all sections of the community, including students, employed residents, and non-employed residents; distributing the study population this broadly aims to capture the views of every group in the community. Frequency tables, bar charts, chi-square tests, and logistic regression were used in the analysis to summarize the data and derive meaningful information from them.

The analysis used a sample of 3,504 respondents. Variables were coded and recoded as needed, with inapplicable cases assigned appropriate codes. Not every variable in the data is useful, so choosing the relevant variables requires a thorough understanding of both the data and the study’s objective. The study found a significant association between cancer awareness and occupational status, level of education, age, and information source. Each of these factors was significant at p < 0.05 as a predictor of cancer awareness among the participants.

Introduction

Despite immense growth and advancement in health and medicine, cancer and related diseases remain a crucial problem worldwide in terms of both mortality and morbidity. Studies have shown that cancer cases grow by about 2.4% (Kav et al., 2013). Cancer is a deadly disease that affects many people in different countries. It also has a very high incidence rate and ranks second in global mortality after heart disease (Peter and Bernard, 2008). Reported cancer cases in the USA vary by age and gender (Yilmaz et al., 2011), which suggests that age and gender are important factors in understanding cancer-related problems.

Statement of the problem

Cancer has recently become one of the leading causes of death worldwide, while cancer awareness in the community remains very low. The major problem with low cancer awareness is late diagnosis, which delays the start of treatment. A comprehensive analysis of how cancer awareness varies among community members with respect to different life factors can shed light on the concept of cancer awareness and help raise awareness levels in communities, which would, in turn, reduce cancer cases (Kumar, 2019).

Research Objectives

Fit a logistic regression model to assess the factors associated with cancer awareness and identify the most important ones
Perform a comprehensive review of cancer awareness
Investigate whether age, gender, level of education, and cancer beliefs affect cancer awareness among US residents
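For reference, the logistic regression named in the first objective models the log-odds of being aware as a linear function of the predictors. A minimal sketch, assuming a binary outcome (heard of HPV: 1 = yes, 0 = no) and a single binary predictor, fits the model by Newton-Raphson in plain Python. The function name `fit_logistic_binary` and the counts are hypothetical and only illustrate the technique; they are not the study's data or its actual fitting procedure.

```python
import math


def fit_logistic_binary(x, y, iterations=25):
    """Fit logit(P(y = 1)) = b0 + b1 * x by Newton-Raphson.

    x and y are equal-length lists of 0/1 values. Returns (b0, b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]]
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} g, using the explicit 2x2 inverse
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: 40 respondents with x = 1 (30 aware), 40 with x = 0 (20 aware)
x = [1] * 40 + [0] * 40
y = [1] * 30 + [0] * 10 + [1] * 20 + [0] * 20

b0, b1 = fit_logistic_binary(x, y)
print(round(math.exp(b1), 4))  # odds ratio for x = 1 versus x = 0; prints 3.0
```

With a single binary predictor, exp(b1) reproduces the cross-product odds ratio of the 2 × 2 table, (30 × 20) / (10 × 20) = 3, which is a convenient check. In practice a statistics package such as SPSS fits the same model and also reports standard errors and Wald tests.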

Hypothesis

H0: There is no association between age and cancer awareness

H1: There exists a statistically significant association between age and cancer awareness

H0: There is no association between gender and cancer awareness

H1: There exists a statistically significant association between gender and cancer awareness
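The age hypotheses involve a variable recorded in years (Table 1.5), whereas a chi-square test of independence needs categories. One option, sketched below, is to bin age before cross-tabulating it against HPV awareness. The cut points at 35, 50, and 65 years and the function names are hypothetical assumptions, since the report does not define age groups; records with a missing age are skipped.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cut points and labels; the report does not define age groups
CUTS = [35, 50, 65]
LABELS = ["18-34", "35-49", "50-64", "65+"]


def age_group(age):
    """Map an age in years to its group label (each cut point starts a new group)."""
    return LABELS[bisect_right(CUTS, age)]


def crosstab_age(ages, heard_hpv):
    """Count (age group, response) pairs, skipping records whose age is missing (None)."""
    return Counter(
        (age_group(age), answer)
        for age, answer in zip(ages, heard_hpv)
        if age is not None
    )


# Toy records for illustration only
ages = [22, 35, 49, 50, 70, None]
heard = ["Yes", "No", "Yes", "Yes", "No", "Yes"]
print(crosstab_age(ages, heard))
```

The resulting counts form the cells of an age-by-awareness contingency table. Alternatively, age can enter a logistic regression as a continuous predictor, which avoids choosing cut points.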

Methodology

Introduction

Data analysis is the core of an evidence-based research study. This section describes the statistical and data analysis techniques used to wrangle the data and extract information that sheds light on the topic under study. The choice of analysis depends on several elements of the research study, such as its objectives, the data available, the research questions, the way results will be presented, and the study’s general purpose. Together, these factors determine the appropriate data analysis and statistical techniques. Data analysis is also not a linear process that starts at one point and ends at another; instead, it involves moving back and forth between stages, altering variables, and assessing the effects in order to identify feasible models and tests (Pallant, 2020).

Data Description

Data description involves an in-depth understanding of the data and its features. Data features are the variables that make up the data. Variables can be continuous, categorical (including nominal), or string variables (Ott & Longnecker, 2015); this dataset contains all of these types, so it is diverse in composition. The size of the data, commonly referred to as the sample size, is the total number of rows in the data, which equals the total number of respondents who took part in the survey. The analysis used a sample of 3,504 respondents. Data analysis also involves exploring the variables to identify those essential to the study and set aside the less useful ones. Not every variable in the data is useful, so choosing the relevant variables requires a thorough understanding of both the data and the study’s objective.

Descriptive Statistics

Descriptive statistics summarize the data with numerical measures, graphs, and tables, giving a broader view than visual inspection of the raw data alone. Means, medians, variances, correlations, measures of skewness, graphs, charts, and tables are among the techniques used. Descriptive analysis also provides the grounding for formulating hypotheses.
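The numerical summaries listed above can be computed with Python's standard `statistics` module. This is a minimal sketch on toy values, not the survey data; the function name `describe` is hypothetical, and skewness uses the bias-adjusted coefficient G1, assumed here to match the form reported by packages such as SPSS.

```python
import statistics


def describe(values):
    """Descriptive statistics for a list of at least three numbers (not all equal)."""
    n = len(values)
    mean = statistics.mean(values)
    # Central moments with divisor n, used for the skewness coefficient
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    g1 = m3 / m2 ** 1.5
    # Bias-adjusted skewness G1 = g1 * sqrt(n(n - 1)) / (n - 2)
    skewness = g1 * (n * (n - 1)) ** 0.5 / (n - 2)
    return {
        "n": n,
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance (divisor n - 1)
        "std_dev": statistics.stdev(values),
        "skewness": skewness,
    }


# Toy ages for illustration only
print(describe([18, 25, 40, 57, 63, 71, 97]))
```

Applied to the age variable, the same summaries give the kind of N, minimum, maximum, mean, and standard deviation reported in Table 1.5.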

Results

Table 1.0 presents the frequency distribution of the dependent variable, which Figures 1.0 and 1.1 also show graphically. The distribution is not strongly imbalanced: 58.5% of the respondents indicated that they had heard of HPV, that is, they are aware of HPV, while 40.1% indicated that they had never heard of HPV, a difference of about 18 percentage points. The remaining 1.4% of responses were missing.
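The columns of Table 1.0 follow the SPSS-style layout: Percent uses all records as the base, Valid Percent excludes categories declared missing, and Cumulative Percent accumulates the valid percentages. Because the missing category is listed under Valid in Table 1.0, Valid Percent equals Percent there. The sketch below uses the counts from Table 1.0 to show how the columns change when that category is declared missing; the function name `frequency_table` is hypothetical.

```python
def frequency_table(counts, missing=()):
    """Return rows of (label, frequency, percent, valid percent, cumulative percent).

    counts  -- dict of category label -> count, in display order
    missing -- labels treated as missing (excluded from the valid base)
    """
    total = sum(counts.values())
    valid_total = sum(c for label, c in counts.items() if label not in missing)
    rows, cumulative = [], 0.0
    for label, count in counts.items():
        percent = round(100 * count / total, 1)
        if label in missing:
            # Missing categories get no valid or cumulative percent
            rows.append((label, count, percent, None, None))
            continue
        valid = 100 * count / valid_total
        cumulative += valid
        rows.append((label, count, percent, round(valid, 1), round(cumulative, 1)))
    return rows


# Counts copied from Table 1.0
counts = {"Missing data (Not Ascertained)": 50, "Yes": 2050, "No": 1404}

for row in frequency_table(counts, missing=("Missing data (Not Ascertained)",)):
    print(row)
# ('Missing data (Not Ascertained)', 50, 1.4, None, None)
# ('Yes', 2050, 58.5, 59.4, 59.4)
# ('No', 1404, 40.1, 40.6, 100.0)
```

With the missing category excluded, 59.4% of valid respondents had heard of HPV and 40.6% had not; calling `frequency_table(counts)` with no missing labels reproduces the 58.5% and 40.1% shown in the table.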

Table 1.0 Descriptive statistics for Dependent variable

L1. Have you ever heard of HPV? HPV stands for Human Papillomavirus. It is not HIV, HSV, or herpes.
		Frequency	Percent	Valid Percent	Cumulative Percent
Valid	Missing data (Not Ascertained)	50	1.4	1.4	1.4
	Yes	2050	58.5	58.5	59.9
	No	1404	40.1	40.1	100.0
	Total	3504	100.0	100.0

Fig 1.0: A bar graph of the dependent variable

Fig 1.1: A bar graph of the dependent variable

4.3.2 Descriptive statistics for the explanatory variables.

Fig 1.2: A bar graph of INCOMERANGES

Table 1.1: Descriptive statistics for Occupational status

O2. What is your current occupational status?
		Frequency	Percent	Valid Percent	Cumulative Percent
Valid	Missing data (Not Ascertained)	43	1.2	1.2	1.2
	Multiple responses selected in error	66	1.9	1.9	3.1
	Employed	1696	48.4	48.4	51.5
	Unemployed	115	3.3	3.3	54.8
	Homemaker	161	4.6	4.6	59.4
	Student	55	1.6	1.6	61.0
	Retired	1113	31.8	31.8	92.7
	Disabled	233	6.6	6.6	99.4
	Other – Specify	22	.6	.6	100.0
	Total	3504	100.0	100.0

Table 1.2: Descriptive statistics for Education Level

EDUCB. What is the highest level of school you completed? 5 Levels (Derived from Education; see History Document for mo
		Frequency	Percent	Valid Percent	Cumulative Percent
Valid	Missing Data (Not Ascertained)	51	1.5	1.5	1.5
	Less than High School	275	7.8	7.8	9.3
	High School Graduate	631	18.0	18.0	27.3
	Some College	1039	29.7	29.7	57.0
	Bachelor’s Degree	910	26.0	26.0	82.9
	Post-Baccalaureate Degree	598	17.1	17.1	100.0
	Total	3504	100.0	100.0

Table 1.3 Descriptive statistics for the source of information

A2. The most recent time you looked for health or medical topics, where did you go first?
		Frequency	Percent	Valid Percent	Cumulative Percent
Valid	Missing data (Not Ascertained)	13	.4	.4	.4
	Missing data (Filter Missing)	10	.3	.3	.7
	Multiple responses selected in error	385	11.0	11.0	11.6
	Question answered in error (Commission Error)	64	1.8	1.8	13.5
	Inapplicable, coded 2 in SeekHealthInfo	644	18.4	18.4	31.8
	Books	88	2.5	2.5	34.4
	Brochures and pamphlets	87	2.5	2.5	36.8
	Cancer organization	11	.3	.3	37.2
	Family	64	1.8	1.8	39.0
	Friend/Co-worker	25	.7	.7	39.7
	Doctor or health care provider	390	11.1	11.1	50.8
	Internet	1664	47.5	47.5	98.3
	Library	9	.3	.3	98.6
	Magazines	18	.5	.5	99.1
	Newspapers	6	.2	.2	99.3
	Telephone information number	20	.6	.6	99.8
	Complementary, alternative, or unconventional practitioner	6	.2	.2	100.0
	Total	3504	100.0	100.0
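Table 1.3 reports percentages over all 3,504 records, including 1,116 records with non-substantive codes (missing data, multiple responses, commission errors, and respondents coded inapplicable in SeekHealthInfo). A short sketch using the table's counts shows how the shares change when the base is restricted to the 2,388 respondents who named a source. Grouping the codes this way is an illustrative assumption, not the report's procedure.

```python
# Counts copied from Table 1.3
SOURCES = {
    "Books": 88,
    "Brochures and pamphlets": 87,
    "Cancer organization": 11,
    "Family": 64,
    "Friend/Co-worker": 25,
    "Doctor or health care provider": 390,
    "Internet": 1664,
    "Library": 9,
    "Magazines": 18,
    "Newspapers": 6,
    "Telephone information number": 20,
    "Complementary, alternative, or unconventional practitioner": 6,
}
NON_SUBSTANTIVE = {
    "Missing data (Not Ascertained)": 13,
    "Missing data (Filter Missing)": 10,
    "Multiple responses selected in error": 385,
    "Question answered in error (Commission Error)": 64,
    "Inapplicable, coded 2 in SeekHealthInfo": 644,
}


def shares(counts, base):
    """Percentage of `base` accounted for by each category in `counts`."""
    return {label: 100 * n / base for label, n in counts.items()}


all_records = sum(SOURCES.values()) + sum(NON_SUBSTANTIVE.values())  # 3504
named_source = sum(SOURCES.values())                                 # 2388

for base in (all_records, named_source):
    pct = shares(SOURCES, base)
    print(base, round(pct["Internet"], 1), round(pct["Doctor or health care provider"], 1))
# 3504 47.5 11.1
# 2388 69.7 16.3
```

Among respondents who named a source, the Internet accounts for about 69.7% and doctors or health care providers for about 16.3%, so the ranking of the two leading channels is unchanged.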

Table 1.5: Descriptive statistics for Age

Descriptive Statistics
	N	Minimum	Maximum	Mean	Std. Deviation
O1. What is your Age?	3417	18	97	57.02	16.729
Valid N (listwise)	3417

Fig 1.3: Histogram of variable Age

Chi-square test for independence

The Pearson chi-square test, commonly known as the chi-square test for independence, is used to investigate whether two categorical variables are associated. The chi-square test evaluates the hypotheses:

H0: The categorical variables are independent (no association)

H1: The categorical variables are not independent (an association exists)

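Several assumptions must hold for the chi-square test of independence to be valid. Both variables must be categorical (nominal or ordinal), each with two or more independent categories, and each observation must fall into exactly one cell. As a common rule of thumb, no more than 20% of the cells should have an expected count below 5, and no cell should have an expected count below 1. In Table 1.6, 2 cells (11.1%) have expected counts below 5, which meets the first condition, but the minimum expected count of .73 falls below 1, so the Pearson statistic should be read with some caution. To investigate the association between knowledge of HPV and education level, a chi-square test of independence was performed. The resulting likelihood ratio statistic was 370.572 (df = 10), with a p-value displayed in the output as .000, that is, p < .001. This implies that we reject the null hypothesis of no association: there is a statistically significant association between knowledge of HPV and education level.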
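Both statistics in Table 1.6 compare the observed counts O with the counts E expected under independence, E = (row total × column total) / N: the Pearson statistic sums (O - E)² / E over all cells, and the likelihood ratio statistic is G² = 2 Σ O ln(O / E). A minimal sketch of these calculations follows. The function name and the small 2 × 3 table in the usage line are hypothetical, since the underlying cross-tabulation is not shown in the report.

```python
import math


def chi_square_tests(table):
    """Pearson chi-square, likelihood-ratio G^2, degrees of freedom, and the share
    of cells with expected count < 5 for an r x c table of observed counts.
    Every row and column total must be positive."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = g2 = 0.0
    small_cells = 0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected < 5:
                small_cells += 1
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes 0 to G^2
                g2 += 2 * observed * math.log(observed / expected)
    rows, cols = len(table), len(table[0])
    return {
        "pearson": pearson,
        "likelihood_ratio": g2,
        "df": (rows - 1) * (cols - 1),
        "pct_cells_expected_below_5": 100 * small_cells / (rows * cols),
    }


# Hypothetical 2 x 3 table: rows = heard of HPV (yes, no), columns = three education groups
print(chi_square_tests([[120, 200, 260], [140, 110, 90]]))
```

The p-value is then read from the chi-square distribution with the computed df; in large samples the Pearson and likelihood ratio statistics are usually close.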

Table 1.6: Chi-square test for independence for the dependent variable and education level

Chi-Square Tests
	Value	df	Asymptotic Significance (2-sided)
Pearson Chi-Square	495.376^a	10	.000
Likelihood Ratio	370.572	10	.000
Linear-by-Linear Association	41.334	1	.000
N of Valid Cases	3504
a. 2 cells (11.1%) have an expected count less than 5. The minimum expected count is .73.

Table 1.7: Chi-square test for independence for the variables HEARHDPV and INCOMERANGES

Chi-Square Tests
	Value	df	Asymptotic Significance (2-sided)
Pearson Chi-Square	249.089^a	18	.000
Likelihood Ratio	237.943	18	.000
Linear-by-Linear Association	20.597	1	.000
N of Valid Cases	3504
a. 4 cells (13.3%) have an expected count less than 5. The minimum expected count is 2.53.
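The p-values displayed as .000 in Tables 1.6 and 1.7 can be checked from the statistics and degrees of freedom alone. Both tables have even df, for which the chi-square upper tail has the closed form P(X > x) = e^(-x/2) Σ_{i=0}^{df/2-1} (x/2)^i / i!. The sketch also checks the table dimensions implied by the footnotes via df = (r - 1)(c - 1): 2 cells at 11.1% imply 18 cells, consistent with a 3 × 6 table, which suggests the missing-data categories were kept as rows and columns (N of valid cases = 3,504). Likewise, the minimum expected count of .73 equals 50 × 51 / 3,504, the cell where both HPV awareness and education are missing. These are inferences from the reported numbers, and the function name is hypothetical.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with a positive even df (closed form)."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch handles only positive even df")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # (x/2)^i / i!, built up incrementally
        total += term
    return math.exp(-half) * total


# Statistics and df copied from Tables 1.6 and 1.7
REPORTED = {
    "Table 1.6 Pearson": (495.376, 10),
    "Table 1.6 likelihood ratio": (370.572, 10),
    "Table 1.7 Pearson": (249.089, 18),
    "Table 1.7 likelihood ratio": (237.943, 18),
}
for name, (stat, df) in REPORTED.items():
    print(f"{name}: p = {chi2_upper_tail_even_df(stat, df):.1e}")

# Dimensions implied by the footnotes: (rows, columns, cells with expected count < 5)
for rows, cols, small in [(3, 6, 2), (3, 10, 4)]:
    df = (rows - 1) * (cols - 1)
    print(rows, cols, df, round(100 * small / (rows * cols), 1))  # 3 6 10 11.1, then 3 10 18 13.3

# Smallest expected count in Table 1.6: missing HPV (50) x missing education (51) / N
print(round(50 * 51 / 3504, 2))  # 0.73
```

All four p-values are far below .001, consistent with the displayed value of .000, which is better reported as p < .001.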

Limitation of the study

Missing and inapplicable data. Some of the data collected for the study were inapplicable because of wrongly filled cases, poor handwriting, and misplaced data points. Such data cannot be used in the analysis because they provide no useful information to the study. They also introduce potential errors and significantly reduce the sample size.
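The loss of sample size can be made concrete with listwise deletion, which keeps only records that have a substantive value for every variable in an analysis, so each added variable can only shrink the sample. For example, age has 3,417 valid values (Table 1.5), so any analysis that includes age loses at least 87 of the 3,504 records. In the sketch below, the non-substantive labels are taken from Tables 1.0 to 1.3 and treating "Inapplicable" as missing is an assumption; the variable names (`HeardHPV`, `Education`) and the records are hypothetical.

```python
# Non-substantive labels as they appear in Tables 1.0-1.3
NON_SUBSTANTIVE = {
    "Missing data (Not Ascertained)",
    "Missing Data (Not Ascertained)",  # Table 1.2 capitalizes "Data"
    "Missing data (Filter Missing)",
    "Multiple responses selected in error",
    "Question answered in error (Commission Error)",
    "Inapplicable, coded 2 in SeekHealthInfo",
}


def recode(value):
    """Recode non-substantive labels (and absent values) to None."""
    return None if value is None or value in NON_SUBSTANTIVE else value


def complete_cases(records, variables):
    """Listwise deletion: keep records with a substantive value for every variable."""
    kept = []
    for record in records:
        cleaned = {name: recode(record.get(name)) for name in variables}
        if all(value is not None for value in cleaned.values()):
            kept.append(cleaned)
    return kept


# Hypothetical records for illustration
records = [
    {"HeardHPV": "Yes", "Education": "Some College"},
    {"HeardHPV": "No", "Education": "Missing Data (Not Ascertained)"},
    {"HeardHPV": "Missing data (Not Ascertained)", "Education": "Post-Baccalaureate Degree"},
    {"HeardHPV": "Yes", "Education": "High School Graduate"},
]
print(len(complete_cases(records, ["HeardHPV"])))               # 3
print(len(complete_cases(records, ["HeardHPV", "Education"])))  # 2
```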

Limited number of variables. Only a few variables were considered in assessing cancer awareness; including more variables would have been helpful and would make the models more comprehensive. Cancer awareness can be measured by several variables, so having heard of HPV is not the only question that can determine it.

Limited resources and time. Data collection is a complex, resource-intensive, and time-consuming process, and time was spent designing the questionnaire and deciding which types of questions to include.

Conclusion and Recommendation

The Internet dominated as the first source of information on health and medical topics, accounting for about 47.5% of respondents. This may reflect rapid technological growth worldwide. Health information is available in several forms and media; the options presented to respondents included the Internet, doctors or health care providers, libraries, and magazines, among others. Doctors or health care providers were the second most frequent source among the US respondents (11.1%). Since most of the population turns to the Internet and health care providers for health information, health authorities should focus mainly on these two channels when communicating health information, so as to reach as many people as possible.

Social beliefs about cancer are viewed as a confounding factor affecting the association between cancer awareness and the explanatory variables in the research. Of the respondents, 23.7% are somewhat worried about getting cancer, 17.9% are not worried about getting cancer, and 5.3% are extremely worried. This indicates that a large share of the US population is worried about contracting cancer, possibly because of personal experience with the disease or information about cancer and its effects from several sources. The largest group of respondents, about 38.0%, are moderately worried about getting cancer in the future.

A statistically significant association was established between cancer awareness and the level of education completed (p < 0.05) (Bickel & Lehmann, 2012). Respondents with higher levels of completed education more often indicated that they had heard of HPV, which the study takes as a sign of cancer awareness. Most respondents who had completed 11 years of schooling or fewer indicated that they had not heard of HPV. The chi-square test thus associated HPV awareness with a higher level of education completed. Interestingly, a chi-square test of HPV awareness against occupational status showed higher HPV awareness among employed respondents, whereas awareness is lower among students and the unemployed.

References

Agresti, A. (2018). An introduction to categorical data analysis. John Wiley & Sons.

Allison, P. D. (2012). Logistic regression using SAS: Theory and application. SAS Institute.

Bickel, P. J., & Lehmann, E. L. (2012). Descriptive statistics for nonparametric models I. Introduction. In Selected works of E. L. Lehmann (pp. 465-471). Springer, Boston, MA.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. CRC Press.

Kumar, R. (2019). Research methodology: A step-by-step guide for beginners. Sage Publications Limited.

Mackey, A., & Gass, S. M. (2015). Second language research: Methodology and design. Routledge.

Pallant, J. (2020). SPSS survival manual: A step by step guide to data analysis using IBM SPSS. Routledge.

Sharpe, D. (2015). Chi-square test is statistically significant: Now what? Practical Assessment, Research, and Evaluation, 20(1), 8.

Sperandei, S. (2014). Understanding logistic regression analysis. Biochemia Medica, 24(1), 12-18.

Vaske, J. J. (2019). Survey research and analysis. Sagamore-Venture.

Appendix

Q1. Have you ever heard of HPV?

Yes No

Q2. Are you male or female?

Male Female

Q3. What is your Age? Age

Years old

Q4. The most recent time you looked for information about health or medical topics, where did you go first? WhereSeekHealthInfo

Mark only one.

1 Books 2 Brochures, pamphlets, etc. 3 Cancer organization

4 Family 5 Friend/Co-worker

6 Doctor or health care provider 7 Internet 8 Library

9 Magazines 10 Newspapers

11 Telephone information number 12 Complementary, alternative, or unconventional practitioner

Q5. The most recent time you looked for information about health or medical topics, who was it for? WhoLookingFor

Mark only one.

Myself Someone else Both myself and someone else

Q6. How worried are you about getting cancer again? FreqWorryCancerAgain

Not at all Slightly Somewhat
Moderately Extremely

Research Report

Student Name

University Name

Course Name

Instructor Name

Date

Table of Contents

Executive Summary

Introduction

Statement of the problem

Research Objectives

Methodology

Introduction

Data Description

Descriptive Statistics

Results

4.3.2 Descriptive statistics for the explanatory variables

Chi-square test for independence

Limitation of the study

Conclusion and Recommendation

References

Appendix
