Reliability
Primer:
These notes briefly go over the concepts of validity and reliability in psychometrics, along with the common measures in the single- and repeated-measure paradigms. Validity refers to whether a scale measures the unobservable construct it is intended to measure (i.e., accuracy). Reliability refers to whether the scale measures the intended construct consistently and precisely (i.e., precision).
Reliability Measures
Inter-rater Reliability
A measure of consistency between two or more independent raters of the same construct. Here are some common options:
Joint Probability of Agreement
The simplest approach for nominal ratings: the percentage of items on which the two raters agree.
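A minimal sketch of this in Python (the function name and sample labels are illustrative, not from any particular library):

```python
import numpy as np

def joint_probability_of_agreement(rater_a, rater_b):
    """Fraction of items on which two raters assign the same nominal label."""
    rater_a, rater_b = np.asarray(rater_a), np.asarray(rater_b)
    return float(np.mean(rater_a == rater_b))

# Hypothetical labels from two raters on 6 items.
rater_a = ["yes", "no", "yes", "yes", "no", "yes"]
rater_b = ["yes", "no", "no", "yes", "no", "yes"]
print(joint_probability_of_agreement(rater_a, rater_b))  # 5/6 ≈ 0.83
```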
Correlation Coefficients
The simplest approach for ordinal / interval ratings, using Pearson's r, Kendall's tau, or Spearman's rho.
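For example, with SciPy (the ratings below are made up for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical 1-5 ratings from two raters on 8 items.
rater_a = np.array([4, 2, 5, 3, 1, 4, 5, 2])
rater_b = np.array([5, 2, 4, 3, 2, 4, 5, 1])

pearson_r, _ = stats.pearsonr(rater_a, rater_b)
kendall_tau, _ = stats.kendalltau(rater_a, rater_b)
spearman_rho, _ = stats.spearmanr(rater_a, rater_b)
print(pearson_r, kendall_tau, spearman_rho)
```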
Kappa Statistics
Cohen's kappa (for two raters) & Fleiss's kappa (for multiple raters) are more robust measures for nominal ratings.
$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

where $p_o$ is the relative observed agreement among raters, and $p_e$ is the hypothetical probability of chance agreement. For $k$ categories, $N$ observations to categorize, and $n_{ki}$ the number of times rater $i$ predicted category $k$, we then have

$$p_e = \frac{1}{N^2} \sum_{k} n_{k1} n_{k2}.$$

It is also possible to get the estimate from the confusion matrix of a binary classification task.
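A minimal NumPy sketch of Cohen's kappa for two raters, mirroring the definitions above (the function name and sample labels are illustrative); in practice a library routine such as sklearn.metrics.cohen_kappa_score performs the same computation:

```python
import numpy as np

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items with nominal categories."""
    rater_a, rater_b = np.asarray(rater_a), np.asarray(rater_b)
    categories = np.union1d(rater_a, rater_b)

    # p_o: relative observed agreement between the two raters.
    p_o = np.mean(rater_a == rater_b)

    # p_e: chance agreement from each rater's marginal category frequencies,
    # i.e. p_e = (1 / N^2) * sum_k n_k1 * n_k2.
    p_e = sum(np.mean(rater_a == k) * np.mean(rater_b == k) for k in categories)

    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels from two raters on 8 items.
rater_a = ["yes", "no", "yes", "yes", "no", "yes", "no", "no"]
rater_b = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]
print(cohens_kappa(rater_a, rater_b))  # 0.5 for this toy data
```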
Split-half reliability
Split-half reliability is a measure of consistency between two halves of a construct measure. It often amounts to randomly splitting the inventory into two halves and measuring the correlation between the half scores. It is not suitable for inventories such as the Big Five, where the intended construct is not unidimensional.
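A minimal sketch with simulated data (the inventory size, factor structure, and split below are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated inventory: 200 respondents x 10 items driven by one common factor.
trait = rng.normal(size=(200, 1))
scores = trait + 0.5 * rng.normal(size=(200, 10))

# Randomly split the items into two halves and score each half per respondent.
item_order = rng.permutation(scores.shape[1])
half_a = scores[:, item_order[:5]].sum(axis=1)
half_b = scores[:, item_order[5:]].sum(axis=1)

# Split-half reliability: correlation between the two half scores.
split_half_r = np.corrcoef(half_a, half_b)[0, 1]
print(split_half_r)
```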
Internal consistency reliability
A measure of consistency between different items of the same construct. One common measure is Cronbach's alpha.
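For $k$ items with item variances $\sigma^2_i$ and total-score variance $\sigma^2_{\text{total}}$, Cronbach's alpha is $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma^2_i}{\sigma^2_{\text{total}}}\right)$. A minimal NumPy sketch (the helper name and sample responses are hypothetical):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)       # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical Likert responses: 6 respondents x 4 items of one scale.
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
]
print(cronbach_alpha(responses))
```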