Topic: Reliability and ability estimates


« on: March 07, 2016, 04:02:10 AM »
I am running a two-dimensional between-item model and have a number of questions about reliability and ability estimates, listed below. I would appreciate any help!

Q1. Which reliability should I report? I obtained MLE, WLE, and EAP/PV reliabilities. While it is expected that the second dimension has lower reliability than the first (e.g., the number of items on dimension 2 is much smaller), the difference in reliability between the two dimensions is much larger for WLE than for the other two indices. I would appreciate any insight into this observation.

Q2. Related to the above question, which type of ability estimate would you suggest if I'd like to investigate the relationship of these measures with other independent variables? As I'd like to obtain just one ability estimate for each subject, plausible values are not under consideration.

Q3. It looks like the ability variances for each dimension printed in the covariance/correlation matrix are somewhat smaller than the spread of the ability distributions on the Wright map. May I double-check whether this is indeed the case or just my inaccurate impression? If they are indeed different, may I know how the ability distributions on the Wright map are plotted?

Q4. The reliability values (regardless of the type of index) are quite low for both dimensions. I am planning to plot the ability estimates (I used WLEs tentatively) against their standard errors to show the magnitude of measurement error by ability level. I understand that the variances printed in the .shw file are smaller than the variances of the ability estimates obtained post hoc, as the former exclude measurement error. May I check how reliability corresponds to the standard errors of the ability estimates obtained through the "show cases!" command?

Thank you very much for your time and kind help!     
« Last Edit: March 07, 2016, 04:09:08 AM by Shirley »

Eveline Gebhardt

Re: Reliability and ability estimates
« Reply #1 on: March 18, 2016, 09:37:07 PM »

Hi Shirley
The choice of your reliability estimate depends on the choice of ability estimates.
Plausible values are the only estimates that give unbiased population estimates, provided they are drawn from the correct model. You would need to run a conditional model that includes, as regressors, all the “other independent variables” that you will use in secondary analysis. Using only one plausible value for each student will also lead to unbiased population estimates, but if you’d like to estimate the measurement error you will need 5 plausible values. However, plausible values are biased at the individual level, so you can only use them to make inferences about the population.
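To make the point about needing several plausible values concrete, here is a minimal sketch of how results from multiple PVs are usually pooled (Rubin's combining rules for multiple imputation). It assumes the same secondary analysis has already been run once per PV, giving one estimate and one sampling variance per run. The function name and the numbers are made up for illustration and are not ConQuest output.

```python
from statistics import mean, variance


def combine_pv_estimates(estimates, sampling_variances):
    """Pool one statistic estimated separately on each plausible value.

    estimates          -- the statistic (e.g. a regression slope) from each PV run
    sampling_variances -- the squared standard error from each PV run
    Returns (pooled_estimate, total_error_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(sampling_variances):
        raise ValueError("need at least two PV runs, with one variance per estimate")

    pooled = mean(estimates)              # final point estimate
    within = mean(sampling_variances)     # average sampling variance
    between = variance(estimates)         # variance across PVs (n - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical slopes and squared standard errors from five PV runs.
slopes = [0.50, 0.60, 0.40, 0.55, 0.45]
squared_ses = [0.01, 0.01, 0.01, 0.01, 0.01]
pooled_slope, total_var = combine_pv_estimates(slopes, squared_ses)
print(pooled_slope, total_var ** 0.5)     # pooled estimate and its standard error
```

The between-PV term is the part that reflects measurement error. With a single PV it cannot be computed, which is why one PV is enough for the point estimate but not for the error.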
If you do not have a defined population, you may prefer to use WLEs. Each WLE value corresponds to one raw score. They are easiest to estimate and they are unbiased at the individual level. However, they can lead to biased population estimates (variance can be overestimated and unlike PVs, they will be affected by floor and ceiling effects).
EAPs are almost the same as the average of the five PVs. You will need to include all your other variables of interest as regressors in the conditioning model. They are biased at the individual level and may give biased estimates for the population. If you have a large sample, enough items and good reliabilities, it is probably better to use one PV instead of the EAP.
Regarding the Wright map, I can’t tell without looking at the show file. If you email it to me, I can have a look. Please make sure you have the latest version of ConQuest. The variance printed in the matrix is the latent variance (estimated before any individual abilities were estimated). The plot is based on the estimates for the individual students (WLEs, I assume). It is possible that a mismatch is an indication of bias in the WLEs. Try to make a map that uses PVs (show !estimates=latent) and compare this map with the one you have now.
Regarding your last question, please have a look at “Reliability as a measurement design effect” by Raymond J. Adams (2005), Studies in Educational Evaluation, 31, 162-172. (I can send you a copy if you email me.)
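As a rough illustration of how reliability and case-level standard errors relate, the sketch below uses two commonly cited forms: WLE (person separation) reliability as 1 - mean(SE²) / var(WLE), and EAP reliability as var(EAP) / (var(EAP) + mean posterior variance). Whether these match exactly what ConQuest prints is an assumption here, so please check against the paper and the manual. The function names and data are hypothetical.

```python
from statistics import mean, variance


def wle_reliability(wles, standard_errors):
    """Share of observed WLE variance that is not measurement error."""
    observed_var = variance(wles)                         # variance of the point estimates
    error_var = mean(se ** 2 for se in standard_errors)   # average squared standard error
    return 1 - error_var / observed_var


def eap_reliability(eaps, posterior_sds):
    """Variance of the EAPs relative to the implied total latent variance."""
    eap_var = variance(eaps)
    posterior_var = mean(sd ** 2 for sd in posterior_sds)  # average posterior variance
    return eap_var / (eap_var + posterior_var)


# Made-up case-level output for three students on one dimension.
abilities = [-1.0, 0.0, 1.0]
errors = [0.5, 0.5, 0.5]
print(wle_reliability(abilities, errors))
print(eap_reliability(abilities, errors))
```

Under these forms, larger standard errors relative to the spread of the estimates pull reliability down. A dimension with few items would therefore be expected to show lower values, and the WLE index can even go negative when the average error variance exceeds the observed variance.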
Best wishes