
Diving Into Data: Statistical Analysis and Reproducibility of Research

"Not everything that can be counted counts, and not everything that counts can be counted."

This quote captures why data analysis matters in our lives. In layman's terms, data analysis is the process of collecting and organising data to derive the required information; it tells us what the data is trying to convey. Now, we move on to the basic types of statistical analysis.

(I) Descriptive Statistical Analysis– Descriptive statistics describes, presents and summarizes data, whether through numerical calculations, graphs or tables. It is mainly used to describe the basic features of the data in a study and provides simple summaries about the sample and the measures. Together with simple graphical analysis, these summaries form the basis of virtually every quantitative analysis of data.

(II) Inferential Statistical Analysis– Inferential statistics tries to reach conclusions that extend beyond the immediate data. We use it to infer, from sample data, what is likely to be true of the wider population, and to judge whether an observed difference between groups is dependable or one that might have happened by chance in this particular study.

(III) Predictive Statistical Analysis– This is the branch of advanced analytics used to make predictions about future events. It draws on techniques from data mining, statistics, modelling and machine learning to analyse current data and forecast future outcomes. At its core, predictive analytics captures relationships between explanatory variables and target variables in past occurrences, and exploits them to predict an uncertain outcome. A short code sketch after this list illustrates all three kinds of analysis.
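To make the distinction concrete, here is a minimal Python sketch of all three kinds of analysis on a simulated dataset. The data, variable names and effect sizes are illustrative assumptions, not taken from any real study.

```python
# A minimal sketch of descriptive, inferential and predictive analysis.
# The data here are simulated; the "hours studied vs exam score" setup
# is an illustrative assumption.
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)            # fixed seed for reproducibility

# Toy data: hours studied and exam scores for two groups of students
hours_a = rng.normal(6.0, 1.5, size=100)
hours_b = rng.normal(5.2, 1.5, size=100)
scores_a = 50 + 5 * hours_a + rng.normal(0, 5, size=100)
scores_b = 50 + 5 * hours_b + rng.normal(0, 5, size=100)

# (I) Descriptive: summarize each sample with simple numbers
print("Group A mean/std:", scores_a.mean(), scores_a.std(ddof=1))
print("Group B mean/std:", scores_b.mean(), scores_b.std(ddof=1))

# (II) Inferential: is the observed difference between groups dependable,
# or could it have happened by chance?
result = stats.ttest_ind(scores_a, scores_b)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")

# (III) Predictive: fit a model on past data and predict a new outcome
X = np.concatenate([hours_a, hours_b]).reshape(-1, 1)
y = np.concatenate([scores_a, scores_b])
model = LinearRegression().fit(X, y)
print("Predicted score for 7 hours of study:", model.predict([[7.0]])[0])
```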

Now, let us shift our focus to what makes research that relies on data analysis relevant and trustworthy.

In data science, replicability and reproducibility are two of the keys to data integrity. The terms sound similar, but they are quite different. Here, we will focus on how to make our research reproducible.

Reproducible research means that someone else can repeat your data analysis and obtain the same results. While replication involves generating new data and checking whether the same conclusions hold, reproduction is solely about repeating the analysis on the original data.

This raises the question: why does the reproducibility of research matter?

(i) It is evidence of correctness and accuracy– An obvious reason for reproducing research and repeating analyses is to confirm that the original results are indeed correct.

(ii) New observations– There can be different ways to analyse the same data, which means different analyses can lead to different conclusions. These alternative conclusions are interesting in their own right: existing findings and claims can be built upon, and entirely new observations can be made.

(iii) Ever-growing complexity of data analysis– The complexity of data analysis has increased remarkably in recent years. Data sets are larger and computations are more sophisticated, so an analysis that cannot be rerun and checked is increasingly difficult to trust.

How can one ensure that one's research is reproducible?

Version control

A step one can take towards reproducible research while the work is still in progress is version control. This means continuously recording versions of your data and files as you work on them. Doing so enables you and, more importantly, others to refer back to specific points in your research.
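In practice, version control usually means a dedicated tool such as Git. Purely as an illustration of the underlying idea, here is a minimal Python sketch that appends a timestamped hash of each working file to a log; the file names are hypothetical and this is not a substitute for a real version control system.

```python
# A minimal sketch of the idea behind version control: record what each
# file looked like (via a hash) at a given point in time.
import hashlib
import json
import time
from pathlib import Path

def snapshot(files, log_path="version_log.json"):
    """Append a timestamped record of each file's SHA-256 hash to a log."""
    record = {"timestamp": time.strftime("%Y-%m-%d %H:%M:%S"), "files": {}}
    for name in files:
        record["files"][name] = hashlib.sha256(Path(name).read_bytes()).hexdigest()

    log = Path(log_path)
    history = json.loads(log.read_text()) if log.exists() else []
    history.append(record)
    log.write_text(json.dumps(history, indent=2))

# Hypothetical file names; run after each meaningful change
snapshot(["raw_data.csv", "analysis.py"])
```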

Report research and data analysis methods

Carrying out version control and making the history available can be useful, but only when it is accompanied by thorough reports covering the methods used, the process as a whole and the data analysis.
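One lightweight way to make such a report checkable is to record, alongside the results, the exact environment and settings that produced them. The sketch below is an illustrative assumption about what such a record might contain (package versions, random seed, test used), not a prescribed format.

```python
# A minimal sketch of recording the analysis environment alongside results,
# so readers can see exactly which versions and settings produced them.
import json
import platform
import sys

import numpy as np
import scipy

methods_report = {
    "python": sys.version,
    "platform": platform.platform(),
    "numpy": np.__version__,
    "scipy": scipy.__version__,
    "random_seed": 42,                     # seed used in the analysis
    "test_used": "two-sample t-test",      # statistical method applied
}

with open("methods.json", "w") as f:
    json.dump(methods_report, f, indent=2)
```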

Clearly linking claims to the underlying data

One should make it clear to readers how one reached certain conclusions. Just because something makes sense to you does not mean it will make sense to others.

Research Data Management

RDM is a term that describes the organization, storage, preservation and sharing of data collected and used in a research project. It is an overarching process that guides researchers through the many stages of the data life cycle. In doing so, it enables scientists and stakeholders alike to make the most out of generated research data.
