
Frequently Asked Questions: How to Use, Evaluate and Adapt Measures and Scales

1. What is upward mobility/mobility from poverty?

This toolkit uses the definition of mobility developed by the US Partnership on Mobility from Poverty. Although measures of economic success such as income and assets are foundational to upward mobility, they do not fully capture people’s experiences. If an organization focuses narrowly on income, for example, helping people move one dollar above the poverty level would appear to be a success. Yet the individuals and families who achieved this milestone would likely continue to struggle. Just as important as material wealth are power and autonomy, meaning people’s sense of control over the trajectory of their lives, and being valued in community, meaning their sense of belonging. Thus, the definition of mobility comprises three core principles: economic success, power and autonomy, and being valued in community.

2. What is a reliable and valid measure?

Researchers use three criteria to judge whether a measure is “good”: reliability, validity, and cultural equivalence.

A reliable measure consistently assesses the same construct. To establish that a measure is reliable, researchers compute a statistic called Cronbach’s alpha, which calculates how correlated a measure’s items are with each other. Cronbach’s alpha ranges from 0 to 1, and higher numbers indicate higher reliability. An acceptable Cronbach’s alpha is usually .7 or above. For measures that claim to measure a quality that does not change over time, researchers will also give the measure to respondents at two different time points and compute how closely the scores are related.
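
If you want to check reliability in your own data, Cronbach’s alpha is straightforward to compute from a table of item responses. Below is a minimal Python sketch using the standard formula and made-up ratings; the function and data are illustrative, not part of any measure in this toolkit.

```python
import numpy as np

def cronbachs_alpha(item_scores):
    """Cronbach's alpha for a matrix with one row per respondent, one column per item."""
    k = item_scores.shape[1]                              # number of items
    item_variances = item_scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = item_scores.sum(axis=1).var(ddof=1)  # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical answers from five respondents to a three-item measure (1-7 ratings)
responses = np.array([
    [5, 6, 5],
    [2, 1, 2],
    [7, 7, 6],
    [4, 4, 5],
    [3, 2, 3],
])
print(round(cronbachs_alpha(responses), 2))  # 0.97, well above the .7 threshold
```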

A valid measure completely and accurately assesses the construct it claims to measure. To validate a measure, researchers test whether it correlates with or predicts things with which it should be related. For instance, if we believe that happiness and physical health are related, then scores on a measure of happiness should correlate with scores on a measure of physical health. Researchers must also ensure that a measure is not associated with things with which it should not be related. For example, scores on a measure of happiness should not correlate with scores on a test of musical ability.

A culturally equivalent measure reliably and validly assesses the same construct in different cultural groups. To establish cultural equivalency, researchers first closely consider whether the measure’s items will mean roughly the same thing in all the cultural groups they are studying. They will also consider whether the measure’s results will mean roughly the same thing.

For instance, research suggests that European-Americans value happiness more than do Asian-Americans. And so European-Americans may say a broader range of events make them happier, relative to Asian-Americans. Likewise, a high score on a happiness measure may have a more positive meaning to European-Americans than to Asian-Americans (see, for example, Tsai, Knutson, & Fung, 2006).

Next, researchers give the measure to people in different cultures and assess whether indicators of reliability and validity are similar in those cultures. For example, if a measure of happiness predicts physical health among European-Americans but not among Asian-Americans, researchers may conclude that the measure of happiness lacks cultural equivalence for Asian-Americans.

See Also

Tsai, J. L., Knutson, B., & Fung, H. H. (2006). Cultural variation in affect valuation. Journal of Personality and Social Psychology, 90(2), 288-307.

3. What is a construct?

Many measures in this toolkit estimate how much of a certain psychological quality people have, such as their sense of control or belongingness. But no one can actually see, hear, touch, or otherwise directly sense the things these measures are estimating. Researchers call these invisible yet measurable qualities psychological constructs.

For example, researchers cannot directly observe how lonely people feel. But they can ask people questions about their feelings (“How often do you feel starved for company?”), and from their answers they can infer how much of the underlying psychological construct loneliness each person has. This is how the UCLA Loneliness Scale works: Respondents answer questions about their feelings related to loneliness, and then researchers use each respondent’s answers to estimate his or her levels of loneliness.

Researchers can measure the psychological construct loneliness in other ways. For instance, a researcher could observe how much time people spend crying while alone. Or a researcher could ask people about their thoughts and opinions regarding loneliness. Researchers often combine measures of behaviors, thoughts, and feelings to assess a single underlying psychological construct.

To evaluate whether a scale or other instrument measures the psychological construct it claims to measure, researchers examine its correlations with real-world outcomes. For instance, researchers believe that loneliness is associated with bad relationships and poor physical health. Indeed, research confirms that people’s responses to the UCLA Loneliness Scale correlate with measures of their relationships and health (Russell, 1996). When a scale or other measure actually assesses the psychological quality it claims to assess, it is said to have construct validity.

See Also

Russell, D. W. (1996). UCLA Loneliness Scale (Version 3): Reliability, validity, and factor structure. Journal of Personality Assessment, 66(1), 20-40.

4. How do you select measures for this toolkit?

Our first preference is measures that have been validated with low-income people in the U.S. and that predict outcomes relevant to economic development. These outcomes include income, employment, education, wealth, and health, as studies show that health is both a consequence and a predictor of economic standing.

Our second preference is measures that predict important outcomes among low-income people in the United States, even if the measure was not designed for this group.

Our final preference is measures that have reliably predicted differences between groups, even if those groups were not necessarily economic ones.

5. What is the best format for respondents’ answers?

Have you ever been asked to use a 1 to 7 scale to show how much you agree with a statement? Then you’re already familiar with the Likert scale, a common format for answering questions on psychological measures. Using a Likert scale, a person chooses the response option that best reflects how much they agree or disagree with a particular statement (e.g., 1 = strongly disagree, 2 = moderately disagree, 3 = slightly disagree, 4 = neutral, 5 = slightly agree, 6 = moderately agree, and 7 = strongly agree). Likert scales can have different ranges (e.g., 1-4, 1-7), different labels, and may or may not have a midpoint.

Most of the measures in this toolkit have Likert scale answer formats because respondents can reliably use them. For each measure, we supply the scale numbers and labels that its creators originally used. But if you put together several measures with different answer formats, you might confuse respondents.

To help respondents easily and accurately answer your scales, we describe two possible problems and then suggest their solutions:

Problem #1: Measures with different numbers of scale points but similar labels. For example, one measure uses a 5-point scale ranging from not at all to very much, but the other measure uses a 6-point scale ranging from none to a lot.

Solution #1: Choose one set of scale points and labels and apply them to both measures.

Problem #2: Measures with the same number of scale points but different labels. For example, one measure uses a 6-point scale where 1 = strongly disagree and 6 = strongly agree, but the other measure uses a 6-point scale where 1 = strongly agree and 6 = strongly disagree.

Solution #2: Once again, choose one set of scale points and labels and apply them to both measures. In general, we recommend making higher numbers have more positive meanings (i.e., 6 = strongly agree).

These slight modifications will make your survey more consistent and help respondents answer honestly and accurately. Otherwise, we do not recommend changing the measures in this toolkit.

6. How can I modify the measures?

Because researchers validated the measures in this toolkit in their current form, we recommend that you not modify them except as described here.

Editing or removing items or changing response options is a risky business. Even if your changes seem simple, they may alter the meaning of the items and render the measure unreliable or invalid. When researchers modify their own measures, they must rigorously re-test them.

Below are ways to handle situations that may make you consider modifying the measures in this toolkit.

Issue #1: My respondents don’t speak English.

Solution #1: Conduct an internet search to make sure no one has already created a validated version of the measure in the language your respondents speak. If not, then follow these steps to create a translated version:

  1. Translate the text from English to your respondents’ language.
  2. Then back-translate the translated text to English.
  3. Compare the back-translated English version with the original English version and note discrepancies between them.
  4. Change words and phrases in the target-language version until the back-translated English version matches the original English version.
  5. If possible, test both the English and translated versions with a bilingual sample to verify that their scores on the two versions are similar.

Issue #2: The language is too adult for children or too childish for adults.

Solution #2: Do not modify the measure. Instead, search for an age-appropriate measure of the construct you are interested in.

Issue #3: The reading level of the measure’s vocabulary or syntax is too high.

Solution #3: Do not modify the measure. Instead, search for a measure of the construct that is written at the target reading level.

Issue #4: Respondents would understand the items if they heard them rather than read them.

Solution #4: Follow these guidelines to deliver the measure orally:

  1. Ensure that respondents understand the instructions and response options.
  2. If there are more than a few response options, use a visual aid with the response options written on it.
  3. Build in time to repeat items, if needed.
  4. Use the same data entry process to capture each respondent’s answers.

Issue #5: There are too many response options. Can’t I just use yes and no, or cut down a 7-point scale to a 3-point scale?

Solution #5: The measures were validated with the response options provided, so do not modify them except in the situations described in this FAQ.

7. What information about respondents can I collect and how can I use it?

Researchers have an ethical responsibility to protect respondents’ privacy and safety. Institutional Review Boards (IRBs) are independent panels that oversee the ethical conduct of research. If you are collecting information only to evaluate the effectiveness of your program, you likely do not need the approval of an IRB. But if you are using the information you collect for research purposes, you should consult with an IRB. All colleges and universities, and some nonprofits and corporations, have their own IRBs.

If neither you nor your collaborators have an institutional review board, you can enlist the help of a commercial IRB.

Regardless of whether you enlist an IRB, you should protect respondents in the following ways:

  1. Be aware that you may be legally required to report to authorities certain kinds of sensitive information such as immigration status, abuse, crime confessions, and intent to harm oneself or others. Carefully consider the pros and cons of collecting this information. If you choose to collect sensitive information, tell respondents upfront the consequences of their sharing this information with you.
  2. Because some questions could put respondents at risk, avoid collecting names, addresses, phone numbers, social security numbers, or other identifying information. Keeping respondents’ answers anonymous can also help people who are not used to participating in research feel more comfortable.
  3. If you must collect personal information — for instance, so that you may follow up with respondents in the future — take steps to protect respondents’ identities. A common practice is to replace identifying information with a unique number for each respondent, and then store a document linking the unique number to the identifying information in a separate, secure place (the code sketch after this list shows one way to do this).
  4. At the beginning of your study, tell respondents with whom and in what forms you will share the information you collect. Also explain to respondents how to withdraw from your study or remove their information from your dataset in the future.
  5. Protect your data. Store paper files in a locked file cabinet in a locked room, and store digital files on a password-protected and encrypted computer. Limit access to cabinets and computers with data stored in them.
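
To make tip 3 concrete, here is a minimal Python sketch of the ID-replacement step. The file names and the “name” column are hypothetical placeholders; adapt them to your own data layout, and store the linking file separately from the de-identified survey file.

```python
import csv
import uuid

# survey_raw.csv is a hypothetical file with one row per respondent and a "name" column.
with open("survey_raw.csv", newline="") as raw, \
     open("survey_deidentified.csv", "w", newline="") as deid, \
     open("id_link.csv", "w", newline="") as link:  # keep this file in a separate, secure place
    reader = csv.DictReader(raw)
    fields = [f for f in reader.fieldnames if f != "name"]
    deid_writer = csv.DictWriter(deid, fieldnames=["respondent_id"] + fields)
    link_writer = csv.DictWriter(link, fieldnames=["respondent_id", "name"])
    deid_writer.writeheader()
    link_writer.writeheader()
    for row in reader:
        rid = uuid.uuid4().hex[:8]  # unique code that replaces the respondent's name
        link_writer.writerow({"respondent_id": rid, "name": row.pop("name")})
        deid_writer.writerow({"respondent_id": rid, **row})
```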

8. What do I do with respondents’ answers?

So you’ve designed a study using measures from this toolkit and collected answers, or data, from your respondents. Now what?

This isn’t a complete guide to analyzing your data, but we can start you off with a few tips.

  1. Follow the instructions on the measure cover sheet for how to score your measure. For most measures, you will calculate the average of the responses to all items. In other words, you will add up respondents’ ratings and then divide that sum by the number of items.
  2. But watch out for reverse-scored items. These are marked with an (R) on the measure cover sheets. Reverse-scored items are worded in the opposite direction of what the scale is measuring. For instance, on the UCLA Loneliness Scale, a higher average score indicates greater loneliness. Respondents use a 4-point scale (1 = Never, 4 = Often) to express their agreement with each item.
  3. On most items, like “How often do you feel that you lack companionship?” higher agreement means more loneliness. But on some items, higher agreement means less loneliness. One such reverse-scored item is “How often do you feel close to people?” On items like this, you will need to reverse-score the response so that higher numbers still mean more loneliness.

    The formula for reverse-scoring an item is:

    ((Number of scale points) + 1) – (Respondent’s answer)

    Say, for instance, a respondent gave a rating of 4 to a reverse-scored item on the UCLA Loneliness Scale. Here is how we would reverse-score their response:

    ((Number of scale points) + 1) – (Respondent’s answer)

    (4 + 1) – 4 = 1

    The new score for the item is 1, which indicates less loneliness. That makes more sense, right? Make sure to transform all reverse-scored items before calculating the average for the entire measure. (The first code sketch after this list walks through these steps.)

  4. Decide which analyses you want to perform. Although you can conduct some simple analyses in Excel, others may require advanced statistical training and specialized software.
  5. Below are a few common statistical analyses (the second code sketch after this list shows how to run them):

    A correlation (r) shows how much two variables are related to each other. Correlations range from -1 to +1. The closer the correlation is to -1 or +1, the stronger the relationship. A positive correlation means that as the score of one measure increases, so too does the score of the second measure. A negative correlation means that as the score of one measure increases, the score of the other measure decreases.

    When interpreting a correlation, double-check your measure. Do higher scores indicate something good (like relationship closeness) or something bad (like loneliness)?

    Also remember that correlation does not equal causation. Let’s say you find a strong negative correlation between loneliness and income. You might be tempted to conclude that loneliness causes people to have lower incomes. But consider these other possibilities: Maybe low income causes people to feel lonely because they have to work many hours and don’t have time to spend with family and friends. Or, maybe a third variable, like poor health, causes both lower income (because of days absent from work) and loneliness (because of not feeling well enough to spend time with others).

    A regression is another analysis that allows you to predict some outcome based on the scores of one or more measures. For example, you could run a regression analysis that tests whether subjective social status and health-related quality of life predict income.

    A t-test can help you look for differences between groups. For example, you might use a t-test to examine whether men report better psychological wellbeing than do women. If you want to compare more than two groups, use an analysis of variance (ANOVA).

    Remember: your study design, not your statistical analysis, determines whether you can say certain factors cause certain outcomes.
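
To make tips 1 through 3 concrete, here is a minimal Python sketch that reverse-scores the flagged items and then averages across all items. It assumes the 4-point format described above; the ratings, and which items are reverse-scored, are invented for illustration rather than taken from the actual UCLA Loneliness Scale.

```python
SCALE_POINTS = 4  # 1 = Never ... 4 = Often

# One respondent's hypothetical ratings; True marks a reverse-scored (R) item
ratings = [3, 4, 2, 4]
reverse_scored = [False, False, True, True]

def reverse(rating, scale_points=SCALE_POINTS):
    # ((number of scale points) + 1) - (respondent's answer)
    return (scale_points + 1) - rating

scored = [reverse(r) if is_reversed else r
          for r, is_reversed in zip(ratings, reverse_scored)]
average = sum(scored) / len(scored)

print(scored)   # [3, 4, 3, 1] -- the 2 became a 3, and the 4 became a 1
print(average)  # 2.75, where higher means more loneliness
```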
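And for tip 5, the sketch below shows how those analyses look in Python with the widely used scipy library. The data are simulated purely for illustration, so do not read anything into the numbers it prints.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated data for 100 respondents: loneliness scores, incomes, and a group label
loneliness = rng.normal(2.5, 0.6, size=100)
income = 50_000 - 4_000 * loneliness + rng.normal(0, 5_000, size=100)
group = rng.choice(["men", "women"], size=100)

# Correlation: strength and direction of the loneliness-income relationship
r, p = stats.pearsonr(loneliness, income)
print(f"r = {r:.2f}, p = {p:.3f}")  # negative r, because income was built to fall as loneliness rises

# Regression: predict income from loneliness scores
slope, intercept, r_value, p_value, stderr = stats.linregress(loneliness, income)

# t-test: compare mean loneliness between two groups
t, p_t = stats.ttest_ind(loneliness[group == "men"], loneliness[group == "women"])

# For three or more groups, use an ANOVA: stats.f_oneway(group1, group2, group3)
```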

9. What’s the best kind of study?

In this toolkit, we feature measures that researchers have used or could use to study mobility from poverty. These researchers have conducted many different kinds, or designs, of studies, and you are likely also contemplating which study design you should use.

But which study design is best?

The correct answer is: It depends on the question the researcher is asking. If the researcher wants to know a lot about one person or family, then a case study is the most appropriate research design. But if the researcher wants to know which program works for most families most of the time, an entirely different set of study designs is appropriate.

The chart below presents the most common research designs in economics, psychology, and other social and behavioral sciences. The research designs are ranked from designs that generate rich personal data but whose results are least likely to generalize to other times or people, to designs whose results are most likely to generalize to other times or people, but at the expense of personal details. We also highlight the pros and cons of each design.

In practice, researchers often begin with a less rigorous design and then progress to more rigorous designs in subsequent studies. Researchers may also combine several designs in one study. For example, a researcher may compare women and men (i.e., a cross-sectional comparison) who are randomly assigned to either a treatment or a control condition (i.e., a randomized controlled trial). Evaluations of mobility from poverty programs often use a longitudinal design.


Case Study


Definition: Observations of a single person, family, community, or event

Pros: Collects rich personal details; aids understanding of unusual instances; helps develop research questions

Cons: May be a rare event with little bearing on other instances; prone to experimenter biases

Rigor: 1/6 (throughout this chart, 1 is least rigorous and 6 is most rigorous)


Cross-sectional Study


Definition: Observations of many people at a single point in time (e.g., the Census), or observations of two groups (e.g., men and women) at a single point in time

Pros: Is often a cost-effective way to test relationships between many variables

Cons: Can’t tell which variables are causes and which are effects

Rigor: 2/6


Case-control Study


Definition: Observations of two groups with different outcomes but otherwise similar qualities (e.g., people in poverty, or “the cases,” vs. a comparison group of otherwise similar people who are not in poverty, or “the controls”); researchers then work backward to figure out what factors (e.g., dropping out of high school) led more of the cases to be in poverty, relative to the controls

Pros: Allows stronger tests of association between variables than a cross-sectional study (see above); is less expensive than a longitudinal study (see below)

Cons: Can’t tell which variables are causes and which are outcomes

Rigor: 3/6


Longitudinal Study


Definition: Observations of the same group or groups of people at multiple points over time (e.g., National Longitudinal Studies)

Pros: Allows stronger guesses about certain variables causing certain outcomes; is also good for observing changes over time, such as children developing or adults aging

Cons: Still can’t verify which variables are causing which outcomes

Rigor: 4/6


Randomized Controlled Trial


Definition: Observations of people whom researchers randomly assign either to a treatment group (or groups) or to a control group

Pros: Allows researchers to definitively establish which causes drive which outcomes

Cons: Is inappropriate for many important research questions for which random assignment to groups would be unethical (e.g., poverty, disease); often requires an artificial or contrived context

Rigor: 5/6


Systematic Review or Meta-Analysis


Definition: A synthesis of all studies relevant to a research question; a meta-analysis combines their results statistically; see the Cochrane and Campbell libraries of systematic reviews

Pros: Includes many more people than single studies, which helps identify reliable effects

Cons: Is only as good as the studies included in the review

Rigor: 6/6

10. Who created this toolkit?

The Measuring Mobility Toolkit is the result of a collaboration between Stanford SPARQ, the Urban Institute, and the U.S. Partnership on Mobility from Poverty.

Stanford SPARQ: Social Psychological Answers to Real-world Questions is a Stanford Psychology Department “do tank” that partners with organizations to reduce disparities in economic mobility, health, education, and criminal justice. Stanford SPARQ’s toolkits translate social science into step-by-step instructions and materials for practitioners to use in their own work.

The nonprofit Urban Institute is a leading research organization dedicated to developing evidence-based insights that improve people’s lives and strengthen communities. For 50 years, Urban has been the trusted source for rigorous analysis of complex social and economic issues; strategic advice to policymakers, philanthropists, and practitioners; and new, promising ideas that expand opportunities for all. Urban’s work inspires effective decisions that advance fairness and enhance the wellbeing of people and places.

The U.S. Partnership on Mobility from Poverty consisted of 24 leading voices representing academia, practice, the faith community, philanthropy, and the private sector. The Partnership’s collective ambition was that all people achieve a reasonable standard of living with the dignity that comes from having power over their lives and being engaged in and valued by their community.
