Summary: The UMS are the result of a joint analysis of 14 existing and 7 new motive scales for achievement, power, affiliation, and intimacy. Based on item response theory, they provide higher measurement precision across a wider range of the latent traits with fewer items than existing questionnaires.

Several different self-report measures of motivation exist. Although these measures are frequently used, only a few studies have compared them with regard to their psychometric properties (e.g., Engeser & Langens, 2010). This gap calls for a systematic analysis of these inventories based on a modern statistical approach. Therefore, an item response theory analysis of the central motives (Achievement, Affiliation/Intimacy, and Power) was carried out: in three studies, 21 different scales for these motives were administered to more than 1500 participants. The item pool comprised the Achievement, Dominance (Power), and Affiliation subscales of the Personality Research Form (PRF; Jackson, 1984), the Personality Values Questionnaire (PVQ; McClelland, 1991), the Achievement Motive Scale (Lang & Fries, 2006), the Mehrabian Affiliation Tendency Questionnaire (MAFF; Mehrabian, 1970), the Mehrabian Sensitivity to Rejection Questionnaire (MSR; Mehrabian, 1994), the Goals Questionnaire (Poehlmann & Brunstein, 1997), and additional self-constructed items (Schönbrodt & Dislich, 2010). The Samejima (1969) graded response model was used to test whether the existing self-report measures of motivation suffer from scaling problems and to construct new, optimized scales from the complete item pool. Results show that commonly used motivation scales can be improved in a number of important ways; thus, new unifying motive scales are presented that map onto the underlying theoretical dimensions, are unbiased with respect to gender, and provide higher precision with fewer items.
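As a minimal illustration of the Samejima graded response model used in these analyses, the sketch below computes the category response probabilities for a single polytomous item. The discrimination and threshold parameters are hypothetical, not estimates from the studies:

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Samejima (1969) graded response model: P(X = k | theta) for an item
    with discrimination a and ordered thresholds b_1 < ... < b_{K-1}."""
    # Cumulative probabilities P(X >= k | theta), padded with 1 and 0.
    cum = ([1.0]
           + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
           + [0.0])
    # Category probability = difference of adjacent cumulative probabilities.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# A 4-category item with hypothetical parameters, evaluated at theta = 0.5.
probs = grm_category_probs(theta=0.5, a=1.8, thresholds=[-1.0, 0.0, 1.2])
```

Higher discrimination parameters concentrate the item information around the thresholds; selecting such items is the mechanism by which an IRT-based scale can reach higher precision with fewer items.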

A fourth study showed that the improved psychometric properties are also reflected in the incremental validity of the UMS in comparison to existing questionnaires.

- The publication: Schönbrodt, F. D., & Gerstenberg, F. X. R. (2012). An IRT analysis of motive questionnaires: The Unified Motive Scales. Journal of Research in Personality, 46, 725–742. doi:10.1016/j.jrp.2012.08.010
- Correlation matrix from Study 1.
- Correlation matrix from Study 2.
- Slides of a presentation held at ECP 2012 in Trieste

Reasons to use the UMS as a measure for explicit motives:

- Optimal amalgam of typically employed motive scales
- Based on IRT: Higher measurement precision in all regions of the traits
- Initial evidence for incremental criterion validity
- Three scale lengths: Full scales with 10 items, or short scales with 6 or 3 items. All versions measure the same latent dimension.
- Differentiation of affiliation and intimacy
- Fear scales for all motives (fear of failure, fear of losing control, fear of losing prestige, fear of rejection, fear of losing emotional contact)

Reasons not to use the UMS:

- The UMS measure explicit motives at the highest level of abstraction, aggregating across goals, self-concepts, and preferences. If you want a more fine-grained analysis of these concepts (e.g., differentiating goals and values, as in Hofer, Busch, Bond, Li, & Shaw, 2010), more specialized scales should be used.

With the data set of Study 1, we examined items for differential item functioning (DIF) concerning gender using a likelihood ratio test (LR-DIF; Woods, 2011). This method compares two nested item response models for two groups (in the current study, men and women) and evaluates whether the response function of a particular item differs between the groups. Computationally, such analyses can be carried out with the R package lordif (Choi, Gibbons, & Crane, 2010; see also Crane, Gibbons, Jolley, & van Belle, 2006).
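The logic of such a nested-model comparison can be sketched as a likelihood-ratio (G²) test. The log-likelihood values below are hypothetical, and the closed-form chi-square survival function used here holds only for even degrees of freedom:

```python
import math

def chi2_sf_even_df(x, df):
    """Survival function P(X > x) of a chi-square variable; this closed
    form holds only for even degrees of freedom df = 2k."""
    assert df % 2 == 0 and df > 0
    k = df // 2
    term, total = 1.0, 0.0
    for i in range(k):          # sum_{i=0}^{k-1} (x/2)^i / i!
        if i > 0:
            term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total

def lr_dif_test(loglik_constrained, loglik_free, df):
    """Likelihood-ratio DIF test: the constrained model forces equal item
    parameters across groups; the free model lets them differ."""
    g2 = 2.0 * (loglik_free - loglik_constrained)
    return g2, chi2_sf_even_df(g2, df)

# Hypothetical log-likelihoods; df = number of freed item parameters.
g2, p = lr_dif_test(loglik_constrained=-4821.7, loglik_free=-4815.2, df=4)
```

A significant G² indicates that letting the item parameters differ between men and women improves model fit, i.e., the item shows DIF.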

References:

Choi, S. W., Gibbons, L. E., & Crane, P. K. (2010). lordif: Logistic regression differential item functioning using IRT. Retrieved from http://CRAN.R-project.org/package=lordif

Crane, P. K., Gibbons, L. E., Jolley, L., & van Belle, G. (2006). Differential item functioning analysis with ordinal logistic regression techniques: DIFdetect and difwithpar. Medical Care, 44(Suppl. 3), S115–S123. doi:10.1097/01.mlr.0000245183.28384.ed

Woods, C. M. (2011). DIF testing for ordinal items with poly-SIBTEST, the Mantel and GMH tests, and IRT-LR-DIF when the latent distribution is nonnormal for both groups. Applied Psychological Measurement, 35, 145–164. doi:10.1177/0146621610377450

Small to medium gender differences were found for power, affiliation, intimacy, and the fear factor; no significant difference emerged for achievement (mean values are based on a scale ranging from 0 to 5).

This figure shows another decomposition of the predictive validity of the three achievement motive scales (the table of Study 4 in the publication only reports a summary of unique vs. shared variances).

Commonality analysis is a multiple regression technique that answers the question: "How much of the explained variance can be uniquely attributed to a single predictor or to the shared variance of any combination of predictors?" Around 25% of the explained variance in the achievement-related criteria can be attributed to the shared variance of all three scales (dark blue bars). A comparable amount can be attributed to the shared variance of the UMS and the AMS (which is not present in the PRF; light blue). The next three colors represent the unique variance of each inventory: nothing unique for the PRF, some for the AMS, and a large share for the UMS.

Comparable results are obtained for the affiliation and power domain (see the main publication, Study 4).

Taking item count into consideration, the AMS (5 items) performs very well, and the PRF (16 items) performs even worse.
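For the two-predictor case, the commonality decomposition described above can be sketched directly from the R² values of all subset regressions. The data below are synthetic and the variable names illustrative:

```python
import numpy as np

def r_squared(X, y):
    """OLS R^2 with an intercept term."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def commonality_two(x1, x2, y):
    """Decompose R^2(x1, x2) into unique and common parts (2 predictors)."""
    r2_1 = r_squared(x1[:, None], y)
    r2_2 = r_squared(x2[:, None], y)
    r2_12 = r_squared(np.column_stack([x1, x2]), y)
    unique1 = r2_12 - r2_2        # variance only x1 explains
    unique2 = r2_12 - r2_1        # variance only x2 explains
    common = r2_1 + r2_2 - r2_12  # variance the predictors share
    return unique1, unique2, common

# Synthetic data: two correlated "scales" predicting a criterion.
rng = np.random.default_rng(1)
shared = rng.normal(size=500)
x1 = shared + rng.normal(size=500)
x2 = shared + rng.normal(size=500)
y = x1 + x2 + rng.normal(size=500)
u1, u2, c = commonality_two(x1, x2, y)
```

With three predictors, as in Study 4, the decomposition extends to 2³ − 1 = 7 components, computed analogously from the R² values of all seven subset regressions.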

Running an MDS (multidimensional scaling) on the existing motive questionnaires and the new UMS scales (data set of Study 1) showed that a three-dimensional solution was optimal. The second dimension (separated by panels in the figure and indicated by dot color) distinguishes hope from fear scales.

UMS scales are printed in bold face. One can see the differentiation between affiliation and intimacy, and the intermediate position of the PVQ affiliation scale.
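One common way to compute such a configuration is classical (Torgerson) MDS: double-center the squared dissimilarities and take an eigendecomposition. The published figure may rest on a different MDS variant; the four "scales" and dissimilarity values below are hypothetical, not the study data:

```python
import numpy as np

def classical_mds(dist, n_dims=2):
    """Classical (Torgerson) MDS: embed objects so that Euclidean distances
    between coordinates approximate the given dissimilarity matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    b = -0.5 * j @ d2 @ j                    # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:n_dims]  # keep the largest eigenvalues
    vals, vecs = vals[order], vecs[:, order]
    return vecs * np.sqrt(np.clip(vals, 0.0, None))

# Hypothetical dissimilarities (e.g., 1 - correlation) among four scales:
# two "affiliation-like" and two "power-like" scales.
d = np.array([[0.0, 0.2, 0.9, 0.8],
              [0.2, 0.0, 0.8, 0.9],
              [0.9, 0.8, 0.0, 0.3],
              [0.8, 0.9, 0.3, 0.0]])
coords = classical_mds(d, n_dims=2)
```

In the resulting configuration, scales with small dissimilarities land close together, which is exactly the pattern that separates affiliation from intimacy scales in the figure.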

In Study 1, we used the silhouette width as an indicator of the quality of a cluster solution (higher = better). We chose a 45-cluster solution (exploratorily using the second peak at 56 clusters did not change the results, and a 3- or 5-cluster solution would have been too coarse for our analyses). Here is the plot for the 2- to 100-cluster solutions.
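The silhouette width itself can be sketched in a few lines; the points and cluster labels below are hypothetical:

```python
import math

def mean_silhouette_width(points, labels):
    """Average silhouette width s(i) = (b_i - a_i) / max(a_i, b_i), where
    a_i is the mean distance to the own cluster and b_i the mean distance
    to the nearest other cluster. Ranges from -1 to 1; higher = better."""
    widths = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.get(labels[i])
        if not own:             # singleton cluster: s(i) is defined as 0
            widths.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(ds) / len(ds)
                for c, ds in by_cluster.items() if c != labels[i])
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)

# Hypothetical 2-D coordinates with two well-separated groups of items.
pts = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
good = mean_silhouette_width(pts, [0, 0, 0, 1, 1, 1])
bad = mean_silhouette_width(pts, [0, 1, 0, 1, 0, 1])
```

Plotting this average against the number of clusters, as in the figure, makes peaks like the one at 45 clusters directly visible.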