Research Report

Student Name

University Name ()

Course Name

Instructor Name

Date ()

Introduction

Stack Overflow is a community of developers, programmers, and data scientists that supports sharing ideas and solving technology-related problems. Employers in the community advertise jobs on the Stack Overflow job boards to target qualified applicants. Besides job postings, members also post problems and receive solutions and responses from other members. This research report focuses on the job posting problems that arise in the community from redundant postings and from the large number of platforms on which jobs are posted.

The dependent variable in this research report is the respondents’ use of job boards. The explanatory variables include coding as a hobby, age, contribution to open source, years of coding, employment status, undergraduate major, and manager. The analysis restricts respondents to three countries: Australia, the Netherlands, and the Russian Federation. The data come from the 2019 survey, subset to contain only the three listed countries. The original data had around 89,000 observations and 85 variables. After subsetting, the data set has around 5,500 observations and 85 variables.
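The subsetting step can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the code used for this report (the outputs shown later appear to come from R). The `Country` and `Respondent` field names and the toy rows are assumptions made for the example.

```python
# Countries kept in the analysis, as listed in the text.
COUNTRIES = {"Australia", "Netherlands", "Russian Federation"}


def subset_by_country(rows, countries=COUNTRIES, field="Country"):
    """Keep only the survey rows whose country is in `countries`.

    `rows` is a list of dicts, one per respondent; `field` is an
    assumed column name, not confirmed by the report.
    """
    return [row for row in rows if row.get(field) in countries]


# Toy rows (hypothetical, not taken from the survey data).
rows = [
    {"Respondent": 1, "Country": "Australia"},
    {"Respondent": 2, "Country": "Germany"},
    {"Respondent": 3, "Country": "Netherlands"},
]
subset = subset_by_country(rows)
print(len(subset), "of", len(rows), "rows kept")
```

The number of columns is unchanged by this step, which matches the report keeping all 85 variables after subsetting.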

The majority of the variables under consideration are categorical, and thus categorical data analysis techniques and models will be prioritized. The chi-square test of independence will be employed to check for differences among respondents in the use of job boards; the correlation coefficient will also be computed to identify the most correlated variables; and charts, graphs, and frequency tables will be presented in the descriptive statistics section. Finally, a random forest classification model will be fitted to model the use of job boards with respect to respondents’ characteristics.

Problem Statement

Normally, a job on Stack Overflow needs to be posted on only one board to target suitable candidates. Nevertheless, this is not always the case: jobs are posted in numerous places, and this attracts candidates both suitable and unsuitable for the posted jobs. Assessing the factors that influence the use of job boards, taking the candidates’ characteristics into account, can lead to better targeting of candidates, reduce job posting redundancy, and probably decrease the number of unfit applicants significantly.

Research Method

Research methods involve the entire process of structuring the research. This comprises the data source, the data types, how the data was collected, the analysis methods, and the presentation and communication of the results. The research objectives and the structure of the variables normally indicate which research methods to employ in a study. Quantitative research methods, which deal with quantitative measurements, are used to answer quantitative research questions (Bryman, 2016). The research problem here also entails prediction through a random forest model. Thus, quantitative techniques will be used.

Research methods, therefore, depend on the objectives of the study, the research questions, and the structure of the data.

Research Questions

The research questions address the problems laid down in the research project. A research question should be well structured and constructed to comprehensively address the research aims. Normally, each is stated as a question that requires a specific approach.

  1. Is there a significant relationship between respondents’ employment status, their contribution to open source, coding as a hobby, and the use of job boards?
  2. Do respondents’ responses on employment, open source, and coding as a hobby provide enough evidence for predicting their response on the use of job boards?
  3. Does the data provide enough evidence for predicting the use of job boards?

Sample

A sample is a representation of the entire population in a research study. Generally, analysis done on a sample is used to generalize the results to the entire population. A sample of around 5,500 respondents from three countries was used for the analysis. All the respondents participated in the Stack Overflow survey of 2019. The sample size is an important aspect of the analysis because the analysis results and the model accuracy depend on it. Statistically, model accuracy tends to improve as the sample size increases.

The sample size is also affected by missing values in the dataset. Thus, missing values should be handled in a way that does not significantly reduce the sample size. Imputing the missing values is the generally recommended way of dealing with them. Casewise deletion, which removes every row in a dataset that has a missing value, is also practical when missing values are few.
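Casewise deletion as described above can be sketched in a few lines. This is an illustrative Python example, not the code used in the report; the field names, the toy rows, and the choice of which values count as missing (`None`, an empty string, or the string `"NA"`) are assumptions.

```python
def is_missing(value):
    """Treat None, empty strings, and the string 'NA' as missing (an assumption)."""
    return value is None or value == "" or value == "NA"


def drop_incomplete(rows, fields):
    """Casewise deletion: keep only rows that have a value for every field."""
    return [
        row for row in rows
        if not any(is_missing(row.get(field)) for field in fields)
    ]


# Toy rows (hypothetical). The second and third rows each lack one value.
rows = [
    {"Hobbyist": "Yes", "Employment": "Employed full-time"},
    {"Hobbyist": "NA", "Employment": "Employed full-time"},
    {"Hobbyist": "No", "Employment": None},
]
complete = drop_incomplete(rows, ["Hobbyist", "Employment"])
print(len(rows) - len(complete), "rows dropped")
```

Comparing the row counts before and after deletion shows directly how much of the sample the missing values cost.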

Analysis Method and Limitations

Frequency tables and bar graphs are among the most used techniques for analyzing categorical data. A frequency table presents the frequency of each class in a categorical variable. A contingency table can be used to analyze two categorical variables through their joint frequency distribution. The chi-square test of independence investigates whether there is a statistically significant association between two categorical variables, for instance, whether respondents’ responses on employment are associated with their responses on student status.
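The contingency table and the chi-square statistic described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the category labels and counts are made up, and the statistic is the plain Pearson statistic with no continuity correction, so it is not a reproduction of the report's own computation.

```python
from collections import Counter


def contingency_table(pairs):
    """Cross-tabulate (row_value, column_value) pairs into a table of counts."""
    counts = Counter(pairs)
    row_levels = sorted({a for a, _ in pairs})
    col_levels = sorted({b for _, b in pairs})
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Toy data (hypothetical): job board use against employment.
pairs = (
    [("Yes", "Employed")] * 10 + [("Yes", "Student")] * 20
    + [("No", "Employed")] * 20 + [("No", "Student")] * 10
)
rows, cols, table = contingency_table(pairs)
stat, df = chi_square_statistic(table)
print(rows, cols, table, round(stat, 4), df)
```

The statistic is then compared with the chi-square distribution on `df` degrees of freedom to obtain a p-value.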

The categories are not evenly distributed, and this imbalance affects the accuracy of the analysis. The presence of missing values also reduces the sample size of the data, which might affect the accuracy of the overall results (Hennink et al., 2020).

Descriptive Statistics

Fig 1.0: A bar graph of the variable Hobbyist

Fig 1.1: A bar graph of the variable Employment

Fig 1.2: A bar graph of the variable OpenSourcer

Fig 1.3: A bar graph of the variable job boards

Results

Data analysis results show that there is a significant association between respondents’ employment status and their use of job boards (Table 1.0). The variable OpenSourcer is also statistically associated with the use of job boards (Table 1.3). The variable Hobbyist is not associated with the use of job boards at the 0.05 level, as the p-value from the chi-square test of independence (0.09552, Table 1.1) is greater than 0.05.

Table 1.0: Chi-square test of independence on employment and use of job boards

Pearson’s Chi-squared test
data:  table(Mydata$SOJobs, Mydata$Employment)
X-squared = 193.05, df = 10, p-value < 2.2e-16

Table 1.1: Chi-square test of independence on coding as a hobby and use of job boards

Pearson’s Chi-squared test
data:  table(Mydata$SOJobs, Mydata$Hobbyist)
X-squared = 4.6969, df = 2, p-value = 0.09552
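
The p-value in Table 1.1 can be checked by hand, because for 2 degrees of freedom the upper-tail probability of the chi-square distribution has the closed form exp(-x/2). The short Python sketch below applies that formula to the reported statistic; the function name is only illustrative, and the formula holds for df = 2 only, so it does not apply to the other tables.

```python
import math


def chi2_upper_tail_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


# Statistic reported in Table 1.1 (df = 2).
p_value = chi2_upper_tail_df2(4.6969)
print(p_value)
```

The result agrees with the reported p-value of 0.09552 to the precision shown, which is above the 0.05 threshold.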

 

Table 1.3: Chi-square test of independence on OpenSourcer and use of job boards

Pearson’s Chi-squared test
data:  table(Mydata$SOJobs, Mydata$OpenSourcer)
X-squared = 110.01, df = 6, p-value < 2.2e-16


Discussion

Table 1.4: Random forest model

mtry   Accuracy    Kappa        AccuracySD     KappaSD
   1   0.5468891   0.00000000   0.0005534559   0.000000000
   2   0.5543406   0.02660861   0.0037644631   0.009004333
   3   0.5604795   0.07642199   0.0096809944   0.022069815
   4   0.5679279   0.11320100   0.0106714702   0.026607434
   5   0.5556618   0.09858260   0.0142558528   0.023421394
   6   0.5569730   0.10273021   0.0086065024   0.011295122
   7   0.5508357   0.09678513   0.0061475960   0.022134218
   8   0.5552224   0.10733549   0.0122102559   0.016416118
   9   0.5552186   0.10920201   0.0102294750   0.013260776
  10   0.5530256   0.10697816   0.0083220774   0.022922332

Fig 1.4: Model accuracy

The chi-square tests of independence performed on the explanatory variables showed statistically significant associations with the use of job boards for employment and OpenSourcer. A random forest classification model was fitted to identify whether the association is strong enough to predict the use of job boards. The data was divided into training and validation sets in a ratio of 70% to 30%.
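
A 70/30 split of this kind can be sketched as follows. This is an illustrative Python example that assumes a simple random split with a fixed seed; the report does not state how the split was drawn (for example, whether it was stratified by the outcome), so the method, the seed, and the toy data are all assumptions.

```python
import random


def train_validation_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into training and validation parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Toy data (hypothetical): 100 respondent identifiers.
rows = list(range(100))
train, validation = train_validation_split(rows)
print(len(train), len(validation))
```

Every row ends up in exactly one of the two parts, so the validation set is never seen while the model is being fitted.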

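When the model was applied for prediction, an accuracy of about 54% (0.5449) was achieved. Tuning the mtry parameter gave a best resampled accuracy of about 57% at mtry = 4 (Table 1.4). Although the prediction accuracy is above the 50% level, it is only slightly above the reported no-information rate of 0.5265, and the reported P-Value [Acc > NIR] of 0.221 means that this improvement is not statistically significant at the 0.05 level. This indicates that the respondent’s response to employment, OpenSource, OpenSourcer, and Hobbyist provides only weak evidence for predicting the response to job boards’ use.
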
Prediction results

               Accuracy : 0.5449
                 95% CI : (0.4996, 0.5896)
    No Information Rate : 0.5265
    P-Value [Acc > NIR] : 0.221
                  Kappa : 0.0773
 Mcnemar’s Test P-Value : <2e-16
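
The accuracy, no-information rate, and Kappa figures above are all derived from a confusion matrix of predicted against actual classes. The sketch below shows the calculation in Python on a made-up two-class matrix; the counts are hypothetical and are not the report's results, since the report does not give its confusion matrix.

```python
def classification_summary(matrix):
    """Accuracy, no-information rate, and Cohen's kappa from a confusion matrix.

    matrix[i][j] is the number of cases predicted as class i whose actual
    class is j.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    accuracy = correct / total
    predicted_totals = [sum(row) for row in matrix]
    actual_totals = [sum(col) for col in zip(*matrix)]
    # No-information rate: accuracy of always predicting the largest actual class.
    nir = max(actual_totals) / total
    # Agreement expected by chance, from the row and column totals.
    expected = sum(p * a for p, a in zip(predicted_totals, actual_totals)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return accuracy, nir, kappa


# Hypothetical counts for illustration only.
matrix = [[40, 10],
          [20, 30]]
accuracy, nir, kappa = classification_summary(matrix)
print(accuracy, nir, kappa)
```

A model is only informative when its accuracy is clearly above the no-information rate, which is why the two figures are reported side by side.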

 

 

Recommendations for Future Research

For future researchers and authors, fitting a random forest on a larger set of variables, for instance more than four, would be recommended to identify the variables that are significant in predicting the use of job boards. Additionally, other variables such as education, career field, and satisfaction would give further insight into respondents’ responses.

Secondly, future work could analyze additional countries to identify whether the results vary by country of residence. Besides the random forest model, other researchers and scholars would also be recommended to fit other statistical models, such as a logistic regression model or a neural network.

Conclusion

There is a significant relationship between respondents’ use of job boards and their responses on employment and OpenSourcer, while the association with Hobbyist was not significant at the 0.05 level. A random forest classification model was used to model the association between the use of job boards and its explanatory variables, although its prediction accuracy of about 54% was only modestly above the no-information rate.

Most of the respondents were employed full time, representing a total of 1394 respondents. Interestingly, respondents who code as a hobby considerably outnumbered those who do not. Those who code as a hobby represented a total of 1422 respondents.

References

Bickel, P. J., & Lehmann, E. L. (2012). Descriptive statistics for nonparametric models I. Introduction. In Selected Works of E. L. Lehmann (pp. 465-471). Springer, Boston, MA.

Bryman, A. (2016). Social research methods. Oxford University Press.

Hennink, M., Hutter, I., & Bailey, A. (2020). Qualitative research methods. SAGE Publications Limited.

Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18-22.

Rodriguez-Galiano, V. F., Ghimire, B., Rogan, J., Chica-Olmo, M., & Rigol-Sanchez, J. P. (2012). An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS Journal of Photogrammetry and Remote Sensing, 67, 93-104.

Stack Overflow Annual Developer Survey (2019). https://insights.stackoverflow.com/survey/
