
4.0 CHAPTER FOUR: DATA ANALYSIS


4.1 Introduction

Data analysis is the core of an evidence-based research study. This chapter applies statistical techniques to the data in order to extract information that sheds light on the topic under study. The choice of techniques depends on several factors: the study’s objectives, the data available, the research questions, the way results are to be presented, and the general purpose of the study. Together these factors determine which data analysis and statistical techniques are appropriate. Data analysis is not a linear process that starts at one point and ends at another; instead, it involves moving back and forth between steps, altering variables, and assessing the effects in order to identify workable models and tests (Pallant, 2020).

Research data can exist in two forms: primary data and secondary data. Primary data is first-hand information that has not been published in any form. Such data can be obtained from surveys, interviews, experiments, and observations. It is considered more valuable because it has not been altered. However, obtaining first-hand information is expensive and time-consuming, since conducting experiments, interviews, and the like consumes considerable resources. Secondary data, on the other hand, is data available in books, journals, social media, and other records. Most of the time this data has already been synthesized and the useful information published in books and other sources. Obtaining it is simpler than obtaining primary data, since no experiment or survey is needed.

In this research study, primary data from the National Institutes of Health, U.S. Department of Health and Human Services, will be used for data analysis. The survey aimed to collect health information from U.S. residents concerning their economic, social, and health lives. These three perspectives form the basis of the data analysis (Ezzy, 2013). The dependent variable, the explanatory variables, and the confounding variables will be drawn from these three aspects of life. The three aspects interact naturally, creating associations and cause-and-effect relationships. To investigate these associations and effects, a comprehensive analysis is required to measure and quantify the associations as well as the outcome.

4.2 Data Description

Data description involves an in-depth understanding of the data and its features. The features are the variables that make up the data. Variables in a data set can be continuous, categorical, nominal, or string variables (Ott & Longnecker, 2015). The dataset contains all of these types and is therefore diverse in composition. The size of the data, commonly referred to as the sample size, is the total number of rows in the data. In a survey such as this one, the sample size is the total number of respondents who took part. A sample of 3504 respondents was available for the analysis. Further, data analysis involves exploring the variables to identify those essential to the study and to disregard the less useful ones. Not all variables in the data are useful, so a thorough understanding of the data and of the study’s objective is required to choose the relevant variables. The data contain 449 variables and 3504 cases, which translates to 3504 rows and 449 columns; this is the dimension of the data.

4.2 Data Transformation

Data cleaning is the most time-consuming and tedious process in data analysis. It involves creating new variables, transforming variables, coding and recoding the data, labeling the data, handling missing data, and so on. All these steps should be completed before the analysis begins. In many cases it is desirable to generate a new variable by calculating total scores in SPSS, particularly when several variables measure the same attribute (Pallant, 2020). A population attribute can be measured by one or more items, and aggregating these items provides a more accurate estimate of the attribute. The dependent variable in this study, knowledge of HPV, is measured by several variables in section L of the survey, labeled HPV awareness. This implies that several items determine and measure the respondent’s awareness.

For this purpose, we need to compute the total scores of the awareness variables and then compute a categorical variable based on the total scores. The categorical variable will have two categories: HPV aware and HPV unaware. Let us call the variable

The coding of these variables is negative: the higher the score, the less knowledgeable the respondent. Since all the variables are coded in the same direction, reverse coding is not necessary; where both negatively and positively coded items are present, reverse coding is needed so that all items point in the same direction. The explanatory variables (age, education level, and socio-economic level) are each measured by a single variable, so total scores will not be calculated for them. Confounding effects in a model cannot be ruled out and should always be accounted for. Two variables acting as confounders will be investigated for their effect on the model. In other words, how do these variables affect the association between HPV knowledge and the economic and social factors?

The social beliefs variable will be computed from the social beliefs items in section N of the survey by computing total scores.
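
The scoring steps described above can be sketched in Python. This is only an illustration under stated assumptions, not the SPSS procedure used in the study: the item names (`item1` to `item3`), the 1 to 4 response scale, and the cut-off of 6 that splits the total score into the two categories are all hypothetical.

```python
# Hypothetical item names; the actual survey items are not listed in this chapter.
AWARENESS_ITEMS = ["item1", "item2", "item3"]


def reverse_code(value, low=1, high=4):
    """Reverse-code an item so that all items point in the same direction."""
    return low + high - value


def total_score(respondent, items=AWARENESS_ITEMS):
    """Sum the item scores; return None if any item is missing."""
    values = [respondent.get(item) for item in items]
    if any(v is None for v in values):
        return None
    return sum(values)


def awareness_category(score, cutoff):
    """Negative coding: a lower total means more knowledge (assumed cut-off)."""
    if score is None:
        return "missing"
    return "HPV aware" if score <= cutoff else "HPV unaware"


# Three made-up respondents, the last one with a missing item.
respondents = [
    {"item1": 1, "item2": 2, "item3": 1},
    {"item1": 4, "item2": 3, "item3": 4},
    {"item1": 2, "item2": None, "item3": 1},
]
categories = [awareness_category(total_score(r), cutoff=6) for r in respondents]
print(categories)
```

Returning a separate "missing" category rather than a score of zero keeps incomplete responses from being counted as highly knowledgeable under the negative coding.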

4.3 Descriptive Statistics

Descriptive statistics involves computing summary measures that describe the data from a broader perspective than mere visual inspection. Means, medians, variances, correlations, measures of skewness, graphs, charts, and tables are some of the techniques used. Additionally, hypothesis formulation is grounded in the descriptive analysis.

4.3.1 Descriptive statistics for the dependent variable

Descriptive statistics help the researcher understand the data through exploration. This generally involves computing means, medians, variances, frequency tables, charts, and graphs. Descriptive statistics can be divided into categories depending on the type of data and the nature of the statistics being computed. For categorical data, they consist of frequency tables, cross-tabulations, charts, and graphs. For continuous data, they consist of means, variances, histograms, box plots, and similar summaries. All these statistics and charts aid in understanding the data at hand (Agresti, 2018).

Table 1.0 below presents a frequency table of the dependent variable; its distribution is also shown graphically. The two response categories are not heavily imbalanced. 58.5% of the respondents indicated that they had heard of HPV, that is, they are aware of HPV, while 40.1% indicated that they had never heard of it. 1.4% of the responses were missing.

 

 

Table 1.0 Descriptive statistics for Dependent variable

 

L1. Have you ever heard of HPV? HPV stands for Human Papillomavirus. It is not HIV, HSV, or herpes.
Response                        Frequency  Percent  Valid Percent  Cumulative Percent
Missing data (Not Ascertained)         50      1.4            1.4                 1.4
Yes                                  2050     58.5           58.5                59.9
No                                   1404     40.1           40.1               100.0
Total                                3504    100.0          100.0
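
The Percent and Cumulative Percent columns of a frequency table such as Table 1.0 follow directly from the frequencies. The short Python sketch below recomputes them from the counts reported in the table; the function name `frequency_table` is an illustrative choice and not part of the SPSS output. Valid Percent is omitted because it equals Percent here, since the missing category is counted among the valid cases.

```python
def frequency_table(counts):
    """Return (label, frequency, percent, cumulative percent) rows and the total."""
    total = sum(counts.values())
    rows = []
    running = 0
    for label, frequency in counts.items():
        running += frequency
        percent = round(100 * frequency / total, 1)
        cumulative = round(100 * running / total, 1)
        rows.append((label, frequency, percent, cumulative))
    return rows, total


# Frequencies taken from Table 1.0.
counts = {"Missing data (Not Ascertained)": 50, "Yes": 2050, "No": 1404}
rows, total = frequency_table(counts)
for label, frequency, percent, cumulative in rows:
    print(f"{label:<32}{frequency:>6}{percent:>8}{cumulative:>8}")
print(f"{'Total':<32}{total:>6}")
```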

 

 

 

 

Fig 1.0: A bar graph of the dependent variable

 

Fig 1.1: A bar graph of the dependent variable

 

 

4.3.2 Descriptive statistics for the explanatory variables

 

Fig 1.2: A bar graph of INCOMERANGES

 

 

Table 1.1: Descriptive statistics for Occupational status

 

O2. What is your current occupational status?
Response                              Frequency  Percent  Valid Percent  Cumulative Percent
Missing data (Not Ascertained)               43      1.2            1.2                 1.2
Multiple responses selected in error         66      1.9            1.9                 3.1
Employed                                   1696     48.4           48.4                51.5
Unemployed                                  115      3.3            3.3                54.8
Homemaker                                   161      4.6            4.6                59.4
Student                                      55      1.6            1.6                61.0
Retired                                    1113     31.8           31.8                92.7
Disabled                                    233      6.6            6.6                99.4
Other – Specify                              22      0.6            0.6               100.0
Total                                      3504    100.0          100.0

 

 

Table 1.2: Descriptive statistics for Education Level

 

EDUCB.  What is the highest level of school you completed? 5 Levels (Derived from Education; see History Document for mo
Response                        Frequency  Percent  Valid Percent  Cumulative Percent
Missing Data (Not Ascertained)         51      1.5            1.5                 1.5
Less than High School                 275      7.8            7.8                 9.3
High School Graduate                  631     18.0           18.0                27.3
Some College                         1039     29.7           29.7                57.0
Bachelor’s Degree                     910     26.0           26.0                82.9
Post-Baccalaureate Degree             598     17.1           17.1               100.0
Total                                3504    100.0          100.0

 

 

 

 

Table 1.3 Descriptive statistics for the source of information

 

A2. The most recent time you looked for information about health or medical topics, where did you go first?
Response                                                    Frequency  Percent  Valid Percent  Cumulative Percent
Missing data (Not Ascertained)                                     13      0.4            0.4                 0.4
Missing data (Filter Missing)                                      10      0.3            0.3                 0.7
Multiple responses selected in error                              385     11.0           11.0                11.6
Question answered in error (Commission Error)                      64      1.8            1.8                13.5
Inapplicable, coded 2 in SeekHealthInfo                           644     18.4           18.4                31.8
Books                                                              88      2.5            2.5                34.4
Brochures and pamphlets                                            87      2.5            2.5                36.8
Cancer organization                                                11      0.3            0.3                37.2
Family                                                             64      1.8            1.8                39.0
Friend/Co-worker                                                   25      0.7            0.7                39.7
Doctor or health care provider                                    390     11.1           11.1                50.8
Internet                                                         1664     47.5           47.5                98.3
Library                                                             9      0.3            0.3                98.6
Magazines                                                          18      0.5            0.5                99.1
Newspapers                                                          6      0.2            0.2                99.3
Telephone information number                                       20      0.6            0.6                99.8
Complementary, alternative, or unconventional practitioner          6      0.2            0.2               100.0
Total                                                            3504    100.0          100.0

 

 

 

 

Table 1.4: Descriptive statistics for beliefs about cancer

 

N2. How easy is it for you to imagine yourself developing cancer in the future?
Response                                       Frequency  Percent  Valid Percent  Cumulative Percent
Missing data (Not Ascertained)                        84      2.4            2.4                 2.4
Missing data (Filter Missing)                         13      0.4            0.4                 2.8
Multiple responses selected in error                   1      0.0            0.0                 2.8
Question answered in error (Commission Error)        249      7.1            7.1                 9.9
Inapplicable, coded 1 in EverHadCancer               344      9.8            9.8                19.7
Extremely difficult                                  509     14.5           14.5                34.2
Somewhat difficult                                   630     18.0           18.0                52.2
Neither difficult nor easy                          1157     33.0           33.0                85.2
Somewhat easy                                        398     11.4           11.4                96.6
Extremely easy                                       119      3.4            3.4               100.0
Total                                               3504    100.0          100.0

 

Fig 1.3 A bar graph of ImagineCancer

 

 

 

 

 

Table 1.4.1: Descriptive statistics for beliefs about cancer

 

N1. How worried are you about getting cancer?
Response                                       Frequency  Percent  Valid Percent  Cumulative Percent
Missing data (Not Ascertained)                        52      1.5            1.5                 1.5
Missing data (Filter Missing)                         13      0.4            0.4                 1.9
Multiple responses selected in error                   2      0.1            0.1                 1.9
Question answered in error (Commission Error)        254      7.2            7.2                 9.2
Inapplicable, coded 1 in EverHadCancer               339      9.7            9.7                18.8
Not at all                                           627     17.9           17.9                36.7
Slightly                                             783     22.3           22.3                59.1
Somewhat                                             830     23.7           23.7                82.8
Moderately                                           418     11.9           11.9                94.7
Extremely                                            186      5.3            5.3               100.0
Total                                               3504    100.0          100.0

 

Fig 1.4 A bar graph of FreqWorryCancer

 

 

 

 

 

 

Table 1.5: Descriptive statistics for Age

 

Descriptive Statistics
Variable                  N     Minimum  Maximum  Mean   Std. Deviation
O1. What is your Age?     3417  18       97       57.02  16.729
Valid N (listwise)        3417

 

Fig 1.5: Histogram of variable Age

 

 

4.4 Chi-square test for independence

The Pearson chi-square test, commonly known as the chi-square test for independence, is used to investigate whether two categorical variables are associated. It tests the following hypotheses:

H0: The categorical variables are independent (no association).

H1: The categorical variables are not independent (an association exists).

Several assumptions must hold for the chi-square test of independence to be valid: the variables must be categorical (nominal or ordinal), and each must consist of two or more independent groups. To investigate the association between knowledge of HPV and education level, a chi-square test of independence was conducted. The likelihood ratio statistic was 370.572 (df = 10), with a p-value below 0.001. We therefore reject the null hypothesis of no association: there is a statistically significant association between knowledge of HPV and education level.
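
To make the two statistics reported by SPSS concrete, the sketch below computes the Pearson chi-square and the likelihood ratio statistic for a contingency table in plain Python. The 2x2 counts are hypothetical, because the cross-tabulation of HPV awareness by education level is not reproduced in this chapter; the function name is likewise an illustrative choice.

```python
import math


def chi_square_independence(table):
    """Return (Pearson chi-square, likelihood ratio statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = 0.0
    likelihood_ratio = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:
                likelihood_ratio += 2 * observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return pearson, likelihood_ratio, df


# Hypothetical counts: rows = two education groups, columns = aware / not aware.
table = [[30, 10], [20, 40]]
pearson, likelihood_ratio, df = chi_square_independence(table)
print(round(pearson, 3), round(likelihood_ratio, 3), df)
```

Both statistics are compared against a chi-square distribution with the returned degrees of freedom; a large value relative to that distribution leads to rejecting H0.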

 

 

 

 

 

Table 1.6: Chi-square test for independence for the dependent variable and education level

 

Chi-Square Tests
Statistic                      Value         df  Asymptotic Significance (2-sided)
Pearson Chi-Square             495.376 (a)   10  .000
Likelihood Ratio               370.572       10  .000
Linear-by-Linear Association   41.334        1   .000
N of Valid Cases               3504
a. 2 cells (11.1%) have an expected count less than 5. The minimum expected count is .73.

 

 

 

 

 

Table 1.7: Chi-square test for independence for the variable HEARHDPV and INCOMERANGES

 

Chi-Square Tests
Statistic                      Value         df  Asymptotic Significance (2-sided)
Pearson Chi-Square             249.089 (a)   18  .000
Likelihood Ratio               237.943       18  .000
Linear-by-Linear Association   20.597        1   .000
N of Valid Cases               3504
a. 4 cells (13.3%) have an expected count less than 5. The minimum expected count is 2.53.

 

 

 

4.5 Model Building

 

4.5.1 Logistic Regression

In research projects, model building involves fitting statistical models that express the relationships between the variables in the data. The main goal is prediction, where the dependent variable is predicted from the explanatory variables. In this case, a binary logistic regression model was fitted to model knowledge of HPV as a function of age, education level, socio-economic factors, social beliefs about cancer, and related concepts. The dependent variable in this research project is a categorical variable with two categories, and thus a logistic regression model was the most appropriate (Allison, 2012).

To investigate the effects of the socio-economic variables and social beliefs about cancer, three models were fitted, excluding the confounding factors one at a time, and the effects were examined. The confounding factors in this case are the socio-economic factors and social beliefs about HPV.
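
As an illustration of what fitting a binary logistic regression means, the sketch below estimates a one-predictor model by gradient ascent on the log-likelihood. It is a simplified stand-in, not the SPSS estimation routine used in the study, and the ten data points (an education level from 0 to 4 and an awareness indicator) are invented for the example.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, iterations=10000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = y - sigmoid(b0 + b1 * x)
            grad0 += error
            grad1 += error * x
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1


# Hypothetical data: x = education level (0 to 4), y = 1 if aware of HPV.
xs = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
ys = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print("intercept:", round(b0, 3), "slope:", round(b1, 3))
print("P(aware | x = 4):", round(sigmoid(b0 + b1 * 4), 3))
```

A positive slope means that the predicted probability of awareness rises with the predictor. Dropping or adding a predictor and refitting, as done for the three models above, shows how much the remaining coefficients shift.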

Table 1.8: Descriptive statistics for the logistic regression model

 

Model Summary
Step  -2 Log likelihood  Cox & Snell R Square  Nagelkerke R Square
1     3865.298 (a)       .535                  .580
a. Estimation terminated at iteration number 20 because maximum iterations have been reached. The final solution cannot be found.

 

Table 1.8 above presents the model summary of the logistic regression fitted with all the explanatory variables. The model was statistically significant, with a Nagelkerke R square of 0.580. This indicates that the model accounts for about 58% of the variation in HPV awareness through the explanatory variables. All the variables were statistically significant, with p-values below 0.001. Significant variables are those that contribute substantially to the model and cannot be omitted from it. Note a of Table 1.8 reports that estimation stopped at the maximum of 20 iterations without reaching a final solution, so these estimates should be read with caution.
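
For reference, the two R square measures in the model summary are usually defined from the -2 log likelihood values as follows. The notation is introduced here for illustration: D_0 is the -2 log likelihood of the intercept-only model, D_M that of the fitted model, and n the number of cases. Only D_M (3865.298) is reported in this chapter, so the values in Table 1.8 cannot be recomputed from the text alone.

```latex
R^2_{\text{Cox-Snell}} = 1 - \exp\!\left(-\frac{D_0 - D_M}{n}\right),
\qquad
R^2_{\text{Nagelkerke}} = \frac{R^2_{\text{Cox-Snell}}}{1 - \exp\!\left(-\dfrac{D_0}{n}\right)}
```

The Nagelkerke measure rescales the Cox and Snell measure so that its maximum is 1. Both are pseudo R square measures, so they describe the improvement over the intercept-only model rather than a literal share of variance.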

Table 1.9: Model significance

 

Hosmer and Lemeshow Test
Step  Chi-square  df  Sig.
1     17.865      8   .022

 

 

 

5.0 CHAPTER FIVE: RESULTS DISCUSSION AND INTERPRETATION

5.1 Introduction

In this chapter, we present the data analysis results, discussion, and findings. The hypotheses stated when formulating the objectives and research questions will also be examined. Mainly, we will focus on the results from the logistic regression model, the associations between variables, and the descriptive statistics. The analysis showed a statistically significant association between HPV awareness and age, level of education, socio-economic status, source of information, and personal beliefs. The fitted logistic regression produced an R square value of 0.580, indicating that the model accounts for about 58% of the variation in HPV awareness through the mediating and explanatory variables.

The logistic regression output gives the overall significance of the model. Additionally, the significance of an individual variable can be assessed through its coefficient and the associated significance score. By inspecting these coefficients, we can identify the variables that most significantly predict HPV awareness. Less significant variables can also be identified and omitted from the final model, since they contribute little useful information. To investigate the association between two categorical variables, on the other hand, the chi-square test of independence is applied. Results from these tests will also be presented.

 

5.2 Results interpretation

Model evaluation involves investigating model accuracy, consistency, and validity. Statistical models used for prediction are prone to errors arising from variation in the research data, so the accuracy of the model needs to be assessed. In logistic regression, accuracy and validity are assessed through the R square scores and the significance of the predictor variables. The R square score was above 50%, and all the explanatory variables were statistically significant. This suggests that the model was statistically valid and consistent.

The Hosmer and Lemeshow test (Table 1.9 in the data analysis chapter) assesses the goodness of fit of the model. A chi-square significance level of 0.022 was observed. Because the null hypothesis of this test is that the model fits the data adequately, a value below 0.05 points to some lack of fit rather than confirming the model. The validity of the model can also be assessed using the coefficients of the explanatory variables. Each coefficient gives the effect of its variable on the model: it represents the expected change in the log-odds of the dependent variable when the explanatory variable changes by one unit. For a categorical explanatory variable, the coefficient represents the change in the dependent variable with respect to the reference category.
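
The interpretation of the coefficients can be written out explicitly. In the generic notation below, which is introduced here for illustration, p is the probability that a respondent is aware of HPV, x_1 to x_k are the explanatory variables, and the betas are the fitted coefficients; the chapter does not report the coefficient values themselves.

```latex
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,
\qquad
\mathrm{OR}_j = e^{\beta_j}
```

A one-unit increase in x_j, with the other variables held fixed, multiplies the odds of awareness by the odds ratio OR_j. A value above 1 indicates higher odds and a value below 1 indicates lower odds.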

5.3 Results discussion

The survey data were obtained from a sample of the U.S. population, and thus the results reflect the state of HPV awareness in the USA. The results indicate that a majority of the sample, about 58.5%, is aware of HPV, while 40.1% is not. These figures suggest that a substantial part of the community is not yet aware of HPV, which calls for awareness efforts from health ministries and the government. Controlling and combating HPV begins with public awareness of the virus, how it manifests, the available control measures, and treatment where it exists. These goals cannot be achieved if the community is not aware of HPV.

The Internet dominated as the source of information on health and medical topics, accounting for about 47.5% of respondents. This can be associated with the rapid growth of technology worldwide. Health information is available through several media; those presented to respondents included the Internet, health care providers, libraries, and magazines. Doctors and health care providers were the second most frequent source among the U.S. respondents (11.1%). Since most of the population relies on the Internet and health care providers for health information, the ministry of health should focus mainly on these two channels in order to reach as many people as possible.

Social beliefs about cancer are treated as a confounding factor affecting the association between HPV awareness and the explanatory variables in the research. 23.7% of the respondents are somewhat worried about getting cancer, 17.9% are not worried at all, and 5.3% are extremely worried. This indicates that a large share of the U.S. population is worried about developing cancer. This may be due to personal experience with the disease or to information about cancer and its effects received from various sources. Additionally, the largest group of respondents, about 33.0%, finds it neither difficult nor easy to imagine developing cancer in the future (Table 1.4).

A statistically significant association was established between HPV awareness and the level of education completed (p < 0.05). Respondents with more education were more likely to indicate that they had heard of HPV. A majority of the respondents who had completed 11 years of schooling or fewer indicated that they had not heard of it. The chi-square test thus associates HPV awareness with a higher level of education completed. Interestingly, a chi-square test on HPV awareness and occupational status showed that a higher number of employed respondents are aware of HPV, whereas awareness is lower among students and the unemployed.

 

 

 

 

 

 

 

 

 

References

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. CRC press.

Agresti, A. (2018). An introduction to categorical data analysis. John Wiley & Sons.

Pallant, J. (2020). SPSS survival manual: A step by step guide to data analysis using IBM SPSS. Routledge.

Kumar, R. (2019). Research methodology: A step-by-step guide for beginners. Sage Publications Limited.

Mackey, A., & Gass, S. M. (2015). Second language research: Methodology and design. Routledge.

Sperandei, S. (2014). Understanding logistic regression analysis. Biochemia Medica, 24(1), 12-18.

Allison, P. D. (2012). Logistic regression using SAS: Theory and application. SAS Institute.

Sharpe, D. (2015). Chi-Square Test is Statistically Significant: Now What? Practical Assessment, Research, and Evaluation, 20(1), 8.

Vaske, J. J. (2019). Survey research and analysis. Sagamore-Venture. 1807 North Federal Drive, Urbana, IL 61801.

Bickel, P. J., & Lehmann, E. L.

4.0 CHAPTER FOUR: DATA ANALYSIS

4.1 Introduction

Data analysis is the epitome of an evidence-based research study. This section employs all the statistical and data analysis possible in wrangling the data to extract useful information from the data that can shed more light on the topic under study. Regardless of this importance, data analysis depends on several factors in the entire research study. Elements such as the study’s objectives, the data available, research questions, results presentation, and the general purpose of the research study. These factors combine to determine appropriate data analysis and statistical techniques to be applied in the data analysis process. Statistically, the process of data analysis is not a liner such that the process does not start at a point and end at another; instead, it involves jumping from section to section, altering variables, assessing the effects to identify the feasible solution of the models, and the tests (Pallant, 2020).

Research study data can exist in two forms; primary data and secondary data. Primary data is characterized by first-hand information that has not been published in any media form. Such data can be obtained from surveys, interviews, experiments, observations, etc. this data is considered more resourceful since no alteration has been conducted on the data. Likewise, obtaining firsts hand information is expensive and time-consuming since a lot of resources are spent while conducting an experiment, interviews, and the like. On the other hand, secondary data involves the data available in books, journals, social media, and other information records. Most of the time, this has been synthesized, and the useful information pushes in books and other sources. Obtaining this data is simpler as compared to primary data since no experiment or a survey is needed.

In this research study, primary data from the National Institutes of Health U.S. Department of Health and Human Services will be used for data analysis. The survey aimed at collecting the health information of the U.S residence concerning economic life, social life, and health life. All these three perspectives will form the basis of the data analysis (Ezzy, 2013). The independent variable, explanatory variables, as well as confounding variables, will be extracted from the three life aspect. Generally, all these three aspects of life, social life, economic life, and health life, interact naturally in the environment, creating association and cause effects. In order to investigate these associating and effects, a comprehensive analysis is therefore required to obtain, measure, and quantify and association as well as the outcome.

4.2 Data Description

Data description involves an in-depth understanding of the data and the data features. Data features are the variables constituting the data. Variables in a data set can be either continuous variable, categorical variable, nominal variable, or string variables (Ott, & Longnecker, 2015). The dataset contains all these variables, and thus it is diverse in composition. The size of the data commonly referred to as the sample size, describes the total number of rows present in the data. In the case of a survey, like in this case, the sample size represents the total number of respondents that took part in the survey. A sample size of 3500 was obtained to be used in the data analysis. Further, data analysis involves variable exploration to identify the essential variables in the study and disregard the less useful variables. Generally, not all variables in the data that are useful, and thus a comprehensive understanding of the data and the objective of the study are required in choosing the relevant variables. A total of 449 variables are present in the data and 3504 cases. This translates to 3504 rows and 449 columns; this represents the data dimension.

4.2 Data Transformation

Data cleaning constitutes the most time consuming and tedious process in data analysis. Cleaning the data involves the creating of new data variables, density transformation, coding and recoding the data, labeling the data, catering to the missing data, etc. all these processes should be conducted before the process of data analysis commences. In most cases, a new variable generation that involves calculating total scores in SPSS is desirable, primarily when there exist several variables measuring the same attribute (Pallant, 2020). Population attributes can be measured by one or more parameters, and thus aggregating these parameters would provide a more accurate estimate of the population attribute. The dependent variable in this research study, knowing the HPV virus, has been measure by several variables on the L section labeled HPV awareness. This implies that HPV awareness has several parameters deciding and measuring the awareness of the respondent.

In this, we need to compute the total scores of the awareness variables and computes a categorical variable based on the total scores. The categorical variable will have two categories; HPV aware and HPV unaware. Let us call the variable

The coding on these variables implies a negative coding, and thus, the higher the score, the less knowledgeable the respondent. Since all the variables are coded uniformly, reversal coding is not necessary, but for instance, in the presence of both negative and positive coding, reversal coding is appropriate to maintain a unique variable code uniformly. The explanatory variables, Age, education level, and social-economic level are measure by only one variable each, and thus total scores will not be calculated. The presence of confounding effects in a model cannot be rule out and should always be accounted for. Two variables acting as the confounding attributes will be an investigation in the model’s effects. In other words, how do the effects of these variables on the association of having HPV knowledge and the economic as well as the social factors?

The social beliefs variable will be computes from the social beliefs factors on the survey section N by computing the total scores.

4.3 Descriptive Statistics

Descriptive statistics involves the computation of population parameters that describe the data at a more broad perspective than mare visualization using the human eyes. Means, medians, variances, correlations, a measure of skewness, graphs, charts, and tables are some of the techniques used in descriptive statistics. Additionally, hypothesis formulation and generation are grounded on the descriptive analysis.

4.3.1 Descriptive statistics for the dependent variable

Descriptive statistics assist the researcher in comprehensively understanding the data through the process of data exploration. This generally involves the computation of means, medians, variance, frequency tables, charts, and graphs. Descriptive statistics can be divided into various categories depending on the data type and the nature of the statistics being computed. Categorical data descriptive statistics involve the computation of frequency tables, cross-tabulation tables, charts, and graphs. On the other hand, the continuous data descriptive statistics involve the computation of means, variances, histograms, box plots, etc. All these statistics and charts aid in understanding the data at hand (Agresti, 2018).

Table 1.0 below represents a frequency table of the dependent variable. The dependent variable distribution can also be shown. The category’s distribution is relatively equal, with a small difference in the categories size. 58.5% of the respondents indicated that they had heard HPV. In other words, 58.5% of the respondents are aware of HPV; on the other hand, 40.1% indicated that they had never heard of HPV. 1.4% of the responses were missing.

 

 

Table 1.0 Descriptive statistics for Dependent variable

 

L1. Have you ever heard of HPV? HPV stands for Human Papillomavirus. It is not HIV, HSV, or herpes.
  Frequency Percent Valid Percent Cumulative Percent
Valid Missing data (Not Ascertained) 50 1.4 1.4 1.4
Yes 2050 58.5 58.5 59.9
No 1404 40.1 40.1 100.0
Total 3504 100.0 100.0  

 

 

 

 

Fig 1.0: A bar graph of the dependent variable

 

Fig 1.1: A bar graph of the dependent variable

 

 

       4.3.2 Descriptive statistics for the response variables.

 

Fig 1.2: A bar graph of INCOMERANGES

 

 

Table 1.1: Descriptive statistics for Occupational status

 

O2. What is your current occupational status?
  Frequency Percent Valid Percent Cumulative Percent
Valid Missing data (Not Ascertained) 43 1.2 1.2 1.2
Multiple responses selected in error 66 1.9 1.9 3.1
Employed 1696 48.4 48.4 51.5
Unemployed 115 3.3 3.3 54.8
Homemaker 161 4.6 4.6 59.4
Student 55 1.6 1.6 61.0
Retired 1113 31.8 31.8 92.7
Disabled 233 6.6 6.6 99.4
Other – Specify 22 .6 .6 100.0
Total 3504 100.0 100.0  

 

 

Table 1.2: Descriptive statistics for Education Level

 

EDUCB. What is the highest level of school you completed? 5 Levels (Derived from Education; see History Document for mo
Response                        Frequency  Percent  Valid Percent  Cumulative Percent
Missing Data (Not Ascertained)         51      1.5            1.5                 1.5
Less than High School                 275      7.8            7.8                 9.3
High School Graduate                  631     18.0           18.0                27.3
Some College                         1039     29.7           29.7                57.0
Bachelor’s Degree                     910     26.0           26.0                82.9
Post-Baccalaureate Degree             598     17.1           17.1               100.0
Total                                3504    100.0          100.0

 

 

 

 

Table 1.3 Descriptive statistics for the source of information

 

A2. The most recent time you looked for information about health or medical topics, where did you go first?
Response                                                    Frequency  Percent  Valid Percent  Cumulative Percent
Missing data (Not Ascertained)                                     13      0.4            0.4                 0.4
Missing data (Filter Missing)                                      10      0.3            0.3                 0.7
Multiple responses selected in error                              385     11.0           11.0                11.6
Question answered in error (Commission Error)                      64      1.8            1.8                13.5
Inapplicable, coded 2 in SeekHealthInfo                           644     18.4           18.4                31.8
Books                                                              88      2.5            2.5                34.4
Brochures and pamphlets                                            87      2.5            2.5                36.8
Cancer organization                                                11      0.3            0.3                37.2
Family                                                             64      1.8            1.8                39.0
Friend/Co-worker                                                   25      0.7            0.7                39.7
Doctor or health care provider                                    390     11.1           11.1                50.8
Internet                                                         1664     47.5           47.5                98.3
Library                                                             9      0.3            0.3                98.6
Magazines                                                          18      0.5            0.5                99.1
Newspapers                                                          6      0.2            0.2                99.3
Telephone information number                                       20      0.6            0.6                99.8
Complementary, alternative, or unconventional practitioner          6      0.2            0.2               100.0
Total                                                            3504    100.0          100.0

 

 

 

 

Table 1.4: Descriptive statistics for beliefs about cancer

 

N2. How easy is it for you to imagine yourself developing cancer in the future?
Response                                       Frequency  Percent  Valid Percent  Cumulative Percent
Missing data (Not Ascertained)                        84      2.4            2.4                 2.4
Missing data (Filter Missing)                         13      0.4            0.4                 2.8
Multiple responses selected in error                   1      0.0            0.0                 2.8
Question answered in error (Commission Error)        249      7.1            7.1                 9.9
Inapplicable, coded 1 in EverHadCancer               344      9.8            9.8                19.7
Extremely difficult                                  509     14.5           14.5                34.2
Somewhat difficult                                   630     18.0           18.0                52.2
Neither difficult nor easy                          1157     33.0           33.0                85.2
Somewhat easy                                        398     11.4           11.4                96.6
Extremely easy                                       119      3.4            3.4               100.0
Total                                               3504    100.0          100.0

 

Fig 1.3 A bar graph of ImagineCancer

 

 

 

 

 

Table 1.4.1: Descriptive statistics for beliefs about cancer

 

N1. How worried are you about getting cancer?
Response                                       Frequency  Percent  Valid Percent  Cumulative Percent
Missing data (Not Ascertained)                        52      1.5            1.5                 1.5
Missing data (Filter Missing)                         13      0.4            0.4                 1.9
Multiple responses selected in error                   2      0.1            0.1                 1.9
Question answered in error (Commission Error)        254      7.2            7.2                 9.2
Inapplicable, coded 1 in EverHadCancer               339      9.7            9.7                18.8
Not at all                                           627     17.9           17.9                36.7
Slightly                                             783     22.3           22.3                59.1
Somewhat                                             830     23.7           23.7                82.8
Moderately                                           418     11.9           11.9                94.7
Extremely                                            186      5.3            5.3               100.0
Total                                               3504    100.0          100.0

 

Fig 1.4 A bar graph of FreqWorryCancer

 

 

 

 

 

 

Table 1.5: Descriptive statistics for Age

 

Descriptive Statistics
Variable               N     Minimum  Maximum  Mean   Std. Deviation
O1. What is your Age?  3417  18       97       57.02  16.729
Valid N (listwise)     3417

 

Fig 1.5: Histogram of variable Age

 

 

4.4 Chi-square test for independence

The Pearson chi-square test, commonly known as the chi-square test for independence, is used to investigate whether an association exists between two categorical variables. The chi-square test evaluates the following hypotheses:

H0: The categorical variables are independent (no association).

H1: The categorical variables are not independent (an association exists).

Several assumptions must be met to ensure the validity of the chi-square test of independence. Both variables must be categorical (nominal or ordinal), and each variable should consist of two or more independent groups. To investigate the association between knowledge of HPV and education level, a chi-square test of independence was conducted. The resulting likelihood ratio statistic was 370.572 (df = 10), with a corresponding p-value reported as .000, that is, p < 0.001 (Table 1.6). We therefore reject the null hypothesis of no association. This translates to a statistically significant association between knowledge of HPV and education level.
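
To make the two statistics reported for this test concrete, the following is a minimal sketch of how the Pearson chi-square and the likelihood ratio statistic are computed from a contingency table. The 2 x 2 counts are hypothetical and chosen only for illustration; they are not the survey data, and the function name `chi_square_statistics` is an assumption.

```python
import math


def chi_square_statistics(table):
    """Pearson chi-square and likelihood ratio (G^2) for a contingency table.

    `table` is a list of rows of observed counts. Expected counts are
    row_total * column_total / grand_total, as implied by independence.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    pearson = 0.0
    likelihood_ratio = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes nothing to G^2
                likelihood_ratio += 2.0 * observed * math.log(observed / expected)

    df = (len(table) - 1) * (len(table[0]) - 1)
    return pearson, likelihood_ratio, df


# Hypothetical counts: rows = heard of HPV (yes, no), columns = two education groups.
example = [[30, 70],
           [60, 40]]
pearson, g2, df = chi_square_statistics(example)
print(round(pearson, 3), round(g2, 3), df)
```

Both statistics compare observed with expected counts and are referred to the same chi-square distribution, which is why SPSS reports them side by side with identical degrees of freedom.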

 

 

 

 

 

Table 1.6: Chi-square test for independence for the dependent variable and education level

 

Chi-Square Tests
Statistic                     Value        df  Asymptotic Significance (2-sided)
Pearson Chi-Square            495.376 (a)  10  .000
Likelihood Ratio              370.572      10  .000
Linear-by-Linear Association  41.334       1   .000
N of Valid Cases              3504
a. 2 cells (11.1%) have an expected count less than 5. The minimum expected count is .73.

 

 

 

 

 

Table 1.7: Chi-square test for independence for the variable HEARHDPV and INCOMERANGES

 

Chi-Square Tests
Statistic                     Value        df  Asymptotic Significance (2-sided)
Pearson Chi-Square            249.089 (a)  18  .000
Likelihood Ratio              237.943      18  .000
Linear-by-Linear Association  20.597       1   .000
N of Valid Cases              3504
a. 4 cells (13.3%) have an expected count less than 5. The minimum expected count is 2.53.
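
The degrees of freedom and the footnote percentages in Tables 1.6 and 1.7 can be cross-checked against the dimensions of the underlying cross-tabulations. The sketch below uses illustrative function names. The 3 x 6 layout for Table 1.6 follows from the categories listed in Tables 1.0 and 1.2; the 3 x 10 layout for Table 1.7 is an assumption inferred from the reported df and cell percentage, since the income categories are not listed in the text.

```python
def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a chi-square test of independence."""
    return (n_rows - 1) * (n_cols - 1)


def percent_sparse_cells(n_sparse, n_rows, n_cols):
    """Percentage of cells whose expected count is below 5."""
    return round(100.0 * n_sparse / (n_rows * n_cols), 1)


# Table 1.6: 3 HPV response categories (Table 1.0) by 6 education categories (Table 1.2).
print(degrees_of_freedom(3, 6), percent_sparse_cells(2, 3, 6))

# Table 1.7: a 3 x 10 layout is an assumption inferred from df = 18 and 4 cells = 13.3%.
print(degrees_of_freedom(3, 10), percent_sparse_cells(4, 3, 10))
```

A common rule of thumb is that no more than 20% of the cells should have an expected count below 5 and none should fall below 1. Both tables stay under the 20% mark, although the minimum expected count of .73 in Table 1.6 falls below 1, most likely because of the small missing-data categories.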

 

 

 

4.5 Model Building

 

4.5.1 Logistic Regression

In research projects, model building involves fitting statistical models to express the relationship between the variables in the data. The main goal of model building is prediction, where the dependent variable is predicted from the explanatory variables. In this case, a binary logistic regression model was fitted to model knowledge of HPV as a function of age, education level, socio-economic factors, social beliefs about cancer, and related concepts. The dependent variable in this research project is a categorical variable with two categories, and thus a logistic regression model was the most appropriate (Allison, 2012).
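
A binary logistic model links a linear combination of the explanatory variables to a probability through the logistic function. The sketch below illustrates that link only; the intercept, coefficients, and predictor values are hypothetical and are not estimates from the model fitted in this study.

```python
import math


def predicted_probability(intercept, coefficients, values):
    """Probability of the outcome (e.g. 'has heard of HPV') under a logistic model.

    The linear predictor is intercept + sum(b_k * x_k); the logistic
    function 1 / (1 + exp(-z)) maps it to a probability between 0 and 1.
    """
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for age (in years) and a college-degree indicator.
intercept = 0.5
coefficients = [-0.02, 0.9]

print(predicted_probability(intercept, coefficients, [40, 1]))  # degree holder
print(predicted_probability(intercept, coefficients, [40, 0]))  # no degree
```

With these made-up values, the respondent with a degree receives the higher predicted probability because the corresponding coefficient is positive.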

In order to investigate the effects of socio-economic variables and social beliefs about cancer, three models were fitted, excluding the confounding factors one at a time and investigating the effect of each exclusion. The confounding factors, in this case, are the socio-economic factors and social beliefs about HPV.

Table 1.8: Model summary for the logistic regression model

 

Model Summary
Step  -2 Log likelihood  Cox & Snell R Square  Nagelkerke R Square
1     3865.298 (a)       .535                  .580
a. Estimation terminated at iteration number 20 because maximum iterations have been reached. The final solution cannot be found.

 

Table 1.8 above presents the model summary of the logistic regression fitted with all the explanatory variables. The model was statistically significant, with a Nagelkerke R square of 0.580. As a pseudo R square, this can be read, approximately, as the model accounting for about 58% of the variation in HPV awareness through the explanatory variables. All the variables were statistically significant, with corresponding p-values < 0.001. Significant variables are those that contribute substantially to the overall model and cannot be omitted from it. The footnote to the table, however, shows that estimation stopped at the maximum of 20 iterations without reaching a final solution, so these estimates should be read with caution.

Table 1.9: Hosmer and Lemeshow goodness-of-fit test

 

Hosmer and Lemeshow Test
Step Chi-square df Sig.
1 17.865 8 .022
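
The significance value in Table 1.9 can be reproduced from the chi-square statistic and its degrees of freedom. The sketch below relies on the closed-form upper-tail probability that exists when the degrees of freedom are even; the function name `chi_square_sf_even_df` is an illustrative assumption, and in practice a statistics package would be used instead.

```python
import math


def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For even df = 2m the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    total = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * total


# Values from Table 1.9: chi-square = 17.865 with 8 degrees of freedom.
p_value = chi_square_sf_even_df(17.865, 8)
print(round(p_value, 3))
```

In the Hosmer and Lemeshow test the null hypothesis is that the model fits the data, so a value below 0.05 is conventionally read as evidence of lack of fit.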

 

 

 

5.0 CHAPTER FIVE: RESULTS DISCUSSION AND INTERPRETATION

5.1 Introduction

In this chapter, we present the data analysis results, discussion, and findings. All the hypotheses stated in the formulation of the objectives and the development of the research questions will also be investigated and answered. Mainly, we will focus on the results from the logistic regression model, the associations between variables, and the descriptive statistics of the variables. Data analysis showed that there exists a statistically significant association between HPV awareness and age, level of education, socio-economic status, source of information, and personal beliefs. The fitted logistic regression produced an R square value of 0.580, indicating that the model could account for about 58% of the variation in HPV awareness, as indicated by both the mediating and explanatory variables.

Logistic regression presents the overall significance of the model. Additionally, an individual variable’s significance can be assessed through its coefficient and the corresponding significance score. By inspecting these coefficients, we can identify the most significant variables that predict HPV awareness. Less significant variables can also be identified and omitted from the final model since they contribute little useful information. On the other hand, to investigate the association between two categorical variables, the chi-square test of independence is applied. Results from these tests will also be presented.

 

5.2 Results interpretation

Model evaluation involves the investigation of model accuracy, consistency, and validity. Statistical models involving prediction are prone to errors arising from differences in the research data, and for this reason assessing the model accuracy is required. In logistic regression, the model accuracy and validity are assessed through the R square scores and the significance of the predictor variables. The R square score was above 50%, and all the explanatory variables were statistically significant. This suggests that the model was statistically valid and consistent.

The Hosmer and Lemeshow test, Table 1.9 in the data analysis section, assesses how well the model fits the data. A chi-square value of 17.865 (df = 8) with a significance level of 0.022 was observed. Because the null hypothesis of this test is that the model fits the data, a significance level below 0.05 points to some lack of fit rather than confirming the model, so the fit should be interpreted with caution. The validity of the model can also be assessed using the coefficients of the explanatory variables. Each coefficient gives the effect its variable has on the model. In other words, it represents the expected change in the log-odds of the dependent variable when the explanatory variable changes by one unit. In the case of a categorical explanatory variable, the coefficient represents the change in the dependent variable with respect to the reference category.
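
Logistic regression coefficients are usually easier to interpret once exponentiated, because exp(b) is the odds ratio associated with a one-unit change in the explanatory variable. The sketch below shows the conversion; the coefficients are hypothetical and are not taken from the fitted model.

```python
import math


def odds_ratio(coefficient, unit_change=1.0):
    """Multiplicative change in the odds for a given change in a predictor."""
    return math.exp(coefficient * unit_change)


# Hypothetical coefficient for a categorical level compared with its reference category.
b_college = 0.9
print(round(odds_ratio(b_college), 2))

# Hypothetical coefficient for age: effect of a 10-year increase.
b_age = -0.02
print(round(odds_ratio(b_age, unit_change=10), 2))
```

An odds ratio above 1 means the odds of the outcome rise with the predictor, a value below 1 means they fall, and a value of exactly 1 corresponds to a coefficient of zero.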

5.3 Results discussion

The data from the survey was obtained from a sample of the USA population, and thus the analysis results reflect the state of HPV awareness in the USA. Results from the data analysis indicate that the majority of the sample, about 58.5%, is aware of HPV, while 40.1% is not. These numbers suggest that a substantial part of the community is not yet aware of HPV, and this calls for HPV awareness campaigns from health ministries and the government. Controlling and combating HPV begins with public awareness of the virus, how it manifests, the available control measures, and treatment, if there is any. These measures cannot be achieved if the community is not aware of HPV.

The Internet dominated as the first source of information on health and medical topics among the respondents, accounting for about 47.5%. This can be associated with the rapid growth of technology worldwide. Sources of health information exist in several forms and media; among those presented to respondents were the Internet, doctors or health care providers, libraries, magazines, etc. Doctors or health care providers were the second most frequent source of information among the USA respondents, at 11.1%. Since most of the population relies on the Internet and health care providers for health information, the ministry of health should focus mainly on these two channels when passing on health information, in order to reach as many people as possible.

Social beliefs about cancer are viewed as a confounding factor affecting the association between HPV awareness and the explanatory variables in the research. 23.7% of the respondents are somewhat worried about getting cancer, 17.9% are not worried at all, while 5.3% are extremely worried. This indicates that a large share of the USA population is worried to some degree about getting cancer. This may be due to having experience with the disease or to getting information about cancer and its effects from several sources. Additionally, the largest group of respondents, about 33.0%, finds it neither difficult nor easy to imagine developing cancer in the future (Table 1.4).

A statistically significant association was established between HPV awareness and the level of education completed, with a p-value < 0.05. The respondents with more education, as represented by the level of education completed, indicated that they had heard of HPV, implying they were aware of it. A majority of the respondents who had completed 11 years of schooling or less indicated that they had not heard of HPV. The chi-square test thus associated HPV awareness with a higher level of education completed. Interestingly, upon conducting a chi-square test on HPV awareness and occupational status, a higher number of the employed USA population was found to be aware of HPV. Nevertheless, HPV awareness is lower among students and the unemployed population.

 

 

 

 

 

 

 

 

 

References

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. CRC press.

Agresti, A. (2018). An introduction to categorical data analysis. John Wiley & Sons.

Pallant, J. (2020). SPSS survival manual: A step by step guide to data analysis using IBM SPSS. Routledge.

Kumar, R. (2019). Research methodology: A step-by-step guide for beginners. Sage Publications Limited.

Mackey, A., & Gass, S. M. (2015). Second language research: Methodology and design. Routledge.

Sperandei, S. (2014). Understanding logistic regression analysis. Biochemia Medica, 24(1), 12-18.

Allison, P. D. (2012). Logistic regression using SAS: Theory and application. SAS Institute.

Sharpe, D. (2015). Chi-Square Test is Statistically Significant: Now What? Practical Assessment, Research, and Evaluation, 20(1), 8.

Vaske, J. J. (2019). Survey research and analysis. Sagamore-Venture.

Bickel, P. J., & Lehmann, E. L. (2012). Descriptive statistics for nonparametric models I. Introduction. In Selected Works of EL Lehmann (pp. 465-471). Springer, Boston, MA.

Green, S. B., & Salkind, N. J. (2016). Using SPSS for Windows and Macintosh, books a la carte. Pearson.

Morgan, G. A., Barrett, K. C., Leech, N. L., & Gloeckner, G. W. (2019). IBM SPSS for introductory statistics: Use and interpretation. Routledge.

Ezzy, D. (2013). Qualitative analysis. Routledge.

Ott, R. L., & Longnecker, M. T. (2015). An introduction to statistical methods and data analysis. Nelson Education.

 

(2012). Descriptive statistics for nonparametric models I. Introduction. In Selected Works of EL Lehmann (pp. 465-471). Springer, Boston, MA.

Green, S. B., & Salkind, N. J. (2016). Using SPSS for Windows and Macintosh, books a la carte. Pearson.

Morgan, G. A., Barrett, K. C., Leech, N. L., & Gloeckner, G. W. (2019). IBM SPSS for introductory statistics: Use and interpretation. Routledge.

Pallant, J. (2020). SPSS survival manual: A step by step guide to data analysis using IBM SPSS. Routledge.

Ezzy, D. (2013). Qualitative analysis. Routledge.

Ott, R. L., & Longnecker, M. T. (2015). An introduction to statistical methods and data analysis. Nelson Education.

 
