Lake Superior College Sociology Department Microcase Tutorial
01/05
Lake Superior College is a subscriber to the Microcase Curriculum Plan. This program provides social science data (the Microcase files) as well as the tools to analyze that data (the Microcase Analysis Program). Your CD includes all of these data files and tools for 2004-05. This introduction assumes that you have carefully read your textbook chapter on sociological research and that you understand terms like “variable” and “hypothesis.” When we use the Microcase data, we are engaging in what is generally called “secondary analysis” or research using “secondary data” as opposed to “primary data.” Instead of conducting a study and collecting data ourselves through a questionnaire or interview or participant observation (primary data), we are taking data that other researchers have collected (secondary data) and using it to test hypotheses and illustrate concepts in sociology.
Where can I use this program?
On campus, these files and the analysis program are available through the LSC network. You can access them in your sociology classroom and the Library Information Commons. At home, you can install this CD on your home computer. Your instructor will provide printed directions to help you do this. Our subscription is good for one year and the program on your CD will no longer work when that year has expired.
The Microcase Data Files
There are more than 500 survey data sets available on this CD. This is an amazing amount of research results to have available at our fingertips. All of the social science data files are contained in the Archive, so each time you want to use a file, you will need to open the archive. Within the archive, the files are organized into four categories: Ecological, Survey, Trend, and Other. Sociology classes at LSC use six of these files more than the others. So we’ll describe the six files we use the most and show you how to access them.
Begin at the Menu page and choose “File Management,” then “Open File.” You’ll follow these same steps each time you want to open a file or change files. If you are at LSC using the network version, all you need to do is open the archive. If you are on another computer, you will need to go to the CD drive first, then find the archive by clicking “Data” and “Archive.” This lands you in the same place. If you are using this program on your home computer and you do not follow this sequence to get to the archive, you won’t be using the correct files. Remember, there are 500 files of data and it’s important that you get to the correct version of the correct file in order to correctly complete an assignment. If you ALWAYS begin with the archive, you’ll be in the right place.
GSS02
The archive is where our most useful six files “live.” The file we probably use most of all is the General Social Survey file from 2002 (GSS02). To reach it, click “Survey,” “US,” “GSS,” and GSS02. Again, it’s important that you access the file with the exact title that’s in your assignment.
The General Social Surveys are conducted by the National Opinion Research Center (NORC) in Chicago and funded by the National Science Foundation. They have been conducted since 1972, and Microcase provides 24 years’ worth of these surveys on our CD. We most often use the 2002 (most recent) version. The data in the GSS are collected by questionnaire from a random sample of 2,765 U.S. adults. Our file contains 788 pieces of information or variables for each respondent. That means that each one of 2,765 respondents answered 788 questions. This information should be visible in the gray box on your screen. When you click OK, the upper left corner of your computer screen will indicate that this file is currently open.
To use any of the files, choose “Basic Statistics” from the Menu page, the third from the top at the left of the screen. When you click “Basic Statistics,” you arrive at the analysis program for Microcase. Each function that the program will perform is listed here. We’ll use just a few of the functions. To explore the data in the GSS02 file, we’ll start with Univariate (that means one variable at a time). When you click “Univariate,” the variable list for the 788 items in this file appears on your screen. Data in this file is about individuals.
Highlight/read Pie chart Bar graph Search
Subset use 219)Watch TV and 35)Education
We’ll do more analysis with this file later in this tutorial, but for now, go back to the Menu.
We begin again at the Menu. To look at other files, we follow the same steps that we just used. Click “File Management,” “Open File,” and use the backup icon to get back to the screen that says Ecological/Other/Survey/Trend. The second file we’ll explore is also under “Survey,” but it is international. So click “Survey,” “International,” WVS97, and WVS95-97.
WVS95-97
The World Values Surveys are compiled by the Institute for Social Research at the University of Michigan, Ann Arbor. The respondents represent 43 nations and the majority of the world’s population; they come from societies with per capita incomes as low as $300 per year to societies with per capita incomes as high as $30,000 and from democracies with market economies to various types of authoritarian states. This is the third wave of World Values Surveys, and it includes more than 60 surveys. Files are available for each individual nation and a combined file is also available for the World Values Survey95-97 file. This is the file we’ll use. As you can see from the gray screen, this file includes 594 variables for each of 78,574 respondents. Data in this file is about individuals.
When you click OK, check the upper left corner of your screen to make sure that the WVS95-97 is open. To explore the file, we’ll choose “Basic Statistics” from the Menu screen. This is the same tool we used to explore the GSS02 file earlier. We’ll choose “Univariate” again to look at one variable at a time–you should see the variable list on the screen.
Highlight/read Pie chart Bar graph Search
Use 13)Independence and 218)Hell
Go back to the Menu.
XCSTNDRD
To look at the third most-used file, we follow the same steps. Go back to the Menu, click “File Management,” “Open File,” and use the backup icon to get back to the screen that says Ecological/Other/Survey/Trend. The third file we’ll explore is under “Ecological.” So click “Ecological,” “Cross-Cultural (Pre-Industrial)” and “XCSTNDRD.” This is the Standard Cross-Cultural Sample, a set of cases selected from the Ethnographic Atlas by Murdock and White. This is a sample of 186 pre-industrial societies, many of which no longer exist. The data was collected on each society when it was fully functional. So the societies represented in this file are all hunting/gathering, horticultural, pastoral, or agricultural. There are no industrial or post-industrial societies included. As you can see from the gray screen, there are 203 pieces of information available about each of the 186 societies. Data in this file is about societies.
When you click OK, check the upper left corner of your screen to make sure that the XCSTNDRD file is open. To explore the file, choose “Basic Statistics” as we have done before. But this time, we’re going to use mapping to learn about this file. Not all data in the Microcase files can be mapped, but some can, and mapping can be a very useful tool. So click “Mapping” and you should see the variable list on your screen.
Highlight/read Map Legend List Rank List Alpha Spot
Use 8)subsmode and 81)hit kids
Go back to the Menu.
States04
To look at the fourth file that we use the most, we follow the same steps. Go back to the Menu, click “File Management,” “Open File,” and use the backup icon to get back to the screen that says Ecological/Other/Survey/Trend. The fourth file we’ll explore is also under “Ecological.” So click “Ecological,” “States, Cities, and Counties,” and “States04.”
The States04 file is based on the fifty states of the U.S. and includes dozens of the latest variables from the U.S. Census and other sources, including the Federal Election Commission and the Statistical Abstract of the United States. This file also contains many of the most recently available figures released in the Uniform Crime Report, including data on corrections, crime rates, and state and local law enforcement agencies. As you can see on the gray screen, there are 1,530 pieces of information about each of the 50 states. Data in this file is about states.
When you click OK, check the upper left corner of your screen to make sure that the States04 file is open. To explore the file, choose “Basic Statistics” as we have done before. We’re going to use mapping again, since it is such a visual way to understand the data in this file. So click “Mapping” and you should see the variable list on your screen.
Highlight/read Map Legend List Rank List Alpha Spot
Use 6)WarmWinter and 1058)Kid Abse (2000) as examples
Go back to the Menu.
Global04
To look at the fifth file that we use the most, we follow the same steps. Go back to the Menu, click “File Management,” “Open File,” and use the backup icon to get back to the screen that says Ecological/Other/Survey/Trend. The fifth file we’ll explore is also under “Ecological.” So click “Ecological,” “International” and “Global04.” This global data set, updated annually, consists of 172 nations having populations of at least 200,000--from Afghanistan to Zimbabwe. While there are missing data for certain nations on some variables, it is remarkable how much data are available for each nation. In addition to standard census and economic development data, many other variables measure politics, health, the status of women, natural resources, military capacity, religion, and broadcast media. All variables can be mapped as well as analyzed. We have access to 273 variables about each nation. So the data in this file is about countries.
When you click OK, check the upper left corner of your screen to make sure that the Global04 file is open. To explore the file, choose “Basic Statistics” as we have done before. We’re going to use mapping again, so click “Mapping” and you should see the variable list on your screen.
Highlight/read Map Legend List Rank List Alpha Spot
Use 13)URBAN GRWT and 36)THREEWORLD and 93)ROADS/AREA
Go back to the Menu.
TREND
To look at the last file that we regularly use, follow the same steps. Go back to the Menu, click “File Management,” “Open File,” and use the backup icon to get back to the screen that says Ecological/Other/Survey/Trend. This last file is under Trend. So click “Trend,” “US,” and “USTrend.” The U.S. Trends file contains data from a variety of sources such as the U.S. Government, the General Social Survey, and the American National Election Study. Some variables contain data from as early as 1789. Using the Historical Trends task in MicroCase, we’ll be able to graph data over time to look at trends. There is also an attached event file, and we can mark key historical events on the graph to place the data in historical context. Last updated in 2002, this data file currently contains 258 variables as you can see in the gray box on your screen. The data in this file is historical data tracing each variable over a period of time.
When you click OK, check the upper left corner of your screen to make sure that the USTREND file is open. To explore the file, choose “Basic Statistics” as we have done before. But in order to view the information in this file, we have to use the “Historical Trends” function instead of Univariate or Mapping. So click “Historical Trends” and you should see the variable list on your screen.
Graph 1, then 2, then 3 Illustrate time frame differences Use event file
Use 11)%urban and 257)grasslegal as examples
Those are the six files we use most often. But all of the 500-odd files are available to you.
MICROCASE ANALYSIS PROGRAM
Now we’re going to switch to exploring the program that analyzes this data in more detail. But before we use functions like cross-tabulation or scatterplot to test some hypotheses, we have to review how to write a hypothesis. I’m going to walk you through writing a hypothesis in all of the files we’ve just explored except the USTrend file.
HYPOTHESES IN GENERAL
As you know from reading your text, a hypothesis is a statement that predicts a relationship between two variables. It’s not a question, but a statement. So I might predict that those of you who use this tutorial will score better on Microcase assignments than those who don’t use the tutorial. If I were to collect data on all of you that included your tutorial use time and your assignment scores, I could test that hypothesis to see if it’s accurate. Here’s how I would state a hypothesis like that.
Students who spend more time practicing Microcase will score higher on Microcase assignments than students who spend less time practicing.
The variables are time spent practicing Microcase and scores on Microcase assignments. I’m using time spent practicing as the independent variable and scores on Microcase assignments as the dependent variable. The dependent variable always depends on the independent variable. So, in this example, scores depend on time spent practicing.
Now let’s write several hypotheses that we can test with the Microcase Analysis Program.
GSS HYPOTHESIS
Open the GSS02 file again (Menu: File Management: Open File: Archive: Survey: US: GSS: gss02) and this time, when you choose a function under “Basic Statistics,” choose cross-tabulation. Cross-tabulation is a tool that will allow us to look at two variables together and see whether they’re related to each other. When you click “Cross-tabulation,” you’ll see the variable list on the screen. We’re going to choose two variables that we could write a hypothesis about. I’ve picked variables 34)AGE KD BORN and 35)EDUCATION. Highlight 34 and read the gray box at the bottom of the screen. It contains the questions that were used to measure this variable and the possible responses that people could choose to answer that question.
Now let’s highlight variable 35.
Here’s a possible hypothesis using these two variables. When data is about individuals (as it is in GSS02), we begin the hypothesis with “Respondents who..”
Respondents who were older when their first child was born will have completed higher levels of education than respondents who were younger when their first child was born.
We could state this in the opposite way as well:
Respondents who were younger when their first child was born will have completed lower levels of education than respondents who were older when their first child was born.
Both ways of stating the hypothesis predict the same relationship. As age at first birth goes up, education completed goes up; we are predicting that the variables will change in the same direction, whether they both go up or both go down.
Now we’re ready to see if our prediction is correct. To analyze this data using cross-tabulation, we put the independent variable in the column and the dependent variable in the row. (Independent ALWAYS goes in the column.) When you click OK, a table appears on your screen. In order to read a table, we always use column% (click this in the left margin). And we read the table by column (vertical). (Read the table)
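If you’re curious what “column%” is doing behind the scenes, here is a small sketch in Python (this is not part of Microcase, and you don’t need it for any assignment). The respondents below are made up for illustration, and the function name column_percents is my own invention. The idea is simply that each column (each category of the independent variable) is percentaged so that it adds up to 100.

```python
from collections import Counter

def column_percents(pairs):
    """Build {column value: {row value: percent}} from (independent, dependent) pairs.

    Each column is percentaged on its own total, so every column sums to 100.
    """
    cell_counts = Counter(pairs)
    column_totals = Counter(independent for independent, _ in pairs)
    table = {}
    for (independent, dependent), count in cell_counts.items():
        percent = round(100.0 * count / column_totals[independent], 1)
        table.setdefault(independent, {})[dependent] = percent
    return table

# Hypothetical respondents (NOT the real GSS02 data):
# (age group when first child was born, completed college?)
respondents = (
    [("under 20", "yes")] * 3 + [("under 20", "no")] * 7
    + [("30 and up", "yes")] * 6 + [("30 and up", "no")] * 4
)

table = column_percents(respondents)
for column in ("under 20", "30 and up"):
    print(column, table[column])
```

Reading down each column, you compare the “yes” percent for one group with the “yes” percent for the other group, which is exactly how we read the Microcase table.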
From the numbers we’re seeing, it looks like our hypothesis is supported. But the most important step is yet to come: interpreting the statistics that Microcase calculates for us. Microcase calculates a number that tells us how strongly related the variables are. It will either calculate Cramer’s V for a cross-tabulation or Pearson’s r for a scatterplot or comparison maps. We have a cross-tabulation, so we’ll be looking at the value for v.
V CHART
Cramer’s v ranges from 0.0 to 1.0, with higher numbers meaning a stronger relationship. What does it mean to say that two variables have a strong relationship? Let’s take the two variables we’re using right now as an example. If there is a strong relationship between age at birth of first child and education level completed, that means that education level completed is very sensitive to age at birth of first child. In other words, a big change in the independent variable leads to a big change in the dependent variable. If there is a weak relationship between these two variables, then a big change in the independent variable only leads to a moderate or small change in the dependent variable.
Here is a guide to interpreting Cramer’s v.
v = .40 or higher, very strong relationship
v = .30-.39, strong relationship
v = .20-.29, moderate relationship
v = .01-.19, weak relationship
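For those who want to see where a number like v comes from, here is a rough sketch in Python. It uses the standard textbook formula for Cramer’s V (built on chi-square); I’m assuming Microcase does something equivalent, but its exact handling of missing data and rounding may differ. The counts are invented, and the function names are my own. The labels come straight from the guide above.

```python
import math

def cramers_v(table):
    """Cramer's V for a table of counts given as a list of rows.

    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    smaller_dimension = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square / (n * smaller_dimension))

def strength_label(v):
    """Translate v into the words used in the guide above."""
    if v >= 0.40:
        return "very strong"
    if v >= 0.30:
        return "strong"
    if v >= 0.20:
        return "moderate"
    if v >= 0.01:
        return "weak"
    return "none"

# Hypothetical counts: rows are the dependent variable, columns the independent.
counts = [[30, 60],
          [70, 40]]
v = cramers_v(counts)
print(f"v = {v:.3f}, {strength_label(v)} relationship")
```

Notice that v only tells you how strong the relationship is. It says nothing yet about whether the relationship is statistically significant.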
Once you’ve figured out what the v means, you need to determine whether your results indicate a real relationship between the variables or not. The computer calculates “prob” to help you do this. Prob is the probability that your results are just a “fluke,” that they are not valid. You want prob to be low, meaning that the chances are low that your results are bogus. Prob is expressed as a decimal, which most folks read as a percent. Prob = .05 means that the odds of your results being bogus are five in 100. Prob is often expressed as “p is less than .05,” meaning the odds are less than 5 in 100 that your results are bogus, not real. If prob is more than .05, your results, however great the Pearson’s r or the Cramer’s v may seem, are not usable: we cannot conclude that there is a relationship between the variables. This is probably one of the most difficult concepts to grasp in social research. The test of statistical significance is THE most important test. And if your results don’t meet the standard, you have to treat them as showing no relationship between the variables.
When calculating a cross-tabulation, Microcase does not show the statistics on the same screen as the table. You need to click on “Summary” in order to find the v and the prob values. Make sure that prob is less than .05. Otherwise the data is not usable. NOT USABLE. When prob is more than .05, the apparent relationship could easily be a fluke. It is not statistically significant. If you predicted in your hypothesis that there would be no relationship between the variables, this might be a good thing. But that’s not usually what we predict. We want statistically significant results. That’s our goal.
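Here is a sketch, in Python, of how a prob value can be worked out for the simplest kind of table: two columns by two rows. I’m assuming that prob comes from a chi-square test, which is the usual test for a cross-tabulation; Microcase’s own calculation may differ in its details. The shortcut below only works for a two-by-two table (one “degree of freedom”) and uses no continuity correction. The counts and function names are made up for illustration.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def prob_1df(chi_square):
    """Chance of a chi-square at least this large, with 1 degree of freedom,
    if the two variables were really unrelated."""
    return math.erfc(math.sqrt(chi_square / 2.0))

def is_significant(prob, cutoff=0.05):
    """Apply the rule from this tutorial: prob must be less than .05."""
    return prob < cutoff

# Hypothetical counts, not taken from any Microcase file.
chi_square = chi_square_2x2(30, 60, 70, 40)
prob = prob_1df(chi_square)
print(f"chi-square = {chi_square:.2f}, prob = {prob:.5f}")
print("statistically significant" if is_significant(prob) else "NOT usable")
```

The smaller the prob, the harder it is to blame the pattern in the table on the luck of the draw in picking the sample.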
Now, if we put all of this information together to interpret our cross-tabulation, it looks like we have a weak but statistically significant relationship between age when first child was born and education completed. Here’s how we would describe that relationship: 68.1% of respondents who were age 30 and up when their first child was born have completed college compared to 30.7% of respondents who were less than 20 when their first child was born. Our hypothesis is supported. Always use the percents in the table when describing your results.
Okay, a few reminders in summary: in a cross-tabulation, the data is printed in table form. The column (vertical) variable is the independent variable and the row (horizontal) variable is the dependent variable. When we generate a table, we will ALWAYS ask for column percents. To read a table, you start with the left column of results, read that first from top to bottom, then move to the next column to its right until you’ve read all of the columns in the table. Even though we can often tell from the table whether our hypothesis looks like it’s correct, we have to know if the relationship is statistically significant before we can draw any conclusions. So we click on “Summary.” You only need to check two items on the summary screen: the v and the prob. Don’t confuse yourself by trying to interpret the rest of the numbers on this screen. Always look for v and prob in the location I’ve shown you, at the top of the screen.
WVS95-97 HYPOTHESIS
This time open the WVS95-97 file and, just like we did for GSS02, click “Basic Statistics,” and choose cross-tabulation. Remember, cross-tabulation is a tool that will allow us to look at two variables together and see whether they’re related to each other. When you click “Cross-tabulation,” you’ll see the variable list for the World Values Survey on the screen. We’re going to choose two variables that we could write a hypothesis about. I’ve picked variables 217)BLV:DEVIL and 25)TRUST PEOP. Highlight 217 and read the gray box at the bottom of the screen. It contains the questions that were used to measure this variable and the possible responses that people could choose to answer that question.
Now let’s highlight variable 25.
Here’s a possible hypothesis using these two variables. When data is about individuals (as it is in WVS95-97), we begin the hypothesis with “Respondents who..”
Respondents who say yes, the Devil exists will be less likely to say that you can trust people, compared to respondents who say no the Devil does not exist.
We could state this in the opposite way as well:
Respondents who say no, the Devil does not exist will be more likely to say that you can trust people, compared to respondents who say yes the Devil does exist.
Both ways of stating the hypothesis predict the same relationship. As belief in the Devil varies from yes to no (1 to 2), trust in people varies from be careful to can trust (2 to 1). We’re predicting that the variables will change in opposite directions, one going up and one going down. It’s pretty clear here that going “up” or “down” depends entirely on the way the respondents’ answers were coded. And when we’re using cross-tabulation, the up/down part doesn’t matter much. But it matters in a scatterplot, so we’ll make a note of it.
Now we’re ready to see if our prediction is correct. To analyze this data using cross-tabulation, we put the independent variable in the column and the dependent variable in the row. (Independent ALWAYS goes in the column.) When you click OK, a table appears on your screen. In order to read a table, we always use column% (click this in the left margin). Remember to read the table by column. (Read table)
From the numbers we’re reading, it looks like our hypothesis is supported. But remember, we need to check the v and the prob statistics that Microcase calculates for us. Microcase calculates a Cramer’s V for cross-tabulation.
Remember this chart from our last hypothesis?
Cramer’s v ranges from 0.0 to 1.0, with higher numbers meaning a stronger relationship.
v = .40 or higher, very strong relationship
v = .30-.39, strong relationship
v = .20-.29, moderate relationship
v = .01-.19, weak relationship
Once we’ve figured out that this v of .068 indicates a weak relationship, we need to determine whether our results indicate a real relationship between the variables or not. The computer has calculated “prob” to help us. Remember, prob is the probability that our results are just a “fluke,” that they are not valid. We want prob to be low, meaning that the chances are low that our results are bogus. Prob is expressed as a decimal, which most folks read as a percent. Prob = .05 means that the odds of our results being bogus are five in 100. Prob is often expressed as “p is less than .05,” meaning the odds are less than 5 in 100 that our results are bogus, not real. If prob is more than .05, our results, however great the Cramer’s v may seem, are not usable.
When calculating a cross-tabulation, Microcase does not show the statistics on the same screen as the table. You need to click on “Summary” in order to find the v and the prob values. Make sure that prob is less than .05. Otherwise the data is not usable: it is not statistically significant. When prob is more than .05, the apparent relationship could easily be a fluke, and we have to treat the variables as unrelated.
Now, if we put all of this information together to interpret our cross-tabulation, it looks like we have a weak but statistically significant relationship between belief in the Devil and beliefs that people can generally be trusted. And that relationship is: 21.9% of respondents who say yes, the Devil exists believe that you can generally trust people, while 27.8% of respondents who say no, the Devil does not exist believe that you can trust people. So those who believe in the Devil are less likely (21.9% vs. 27.8%) than the “no-Devil” folks to say that people can be trusted. Our hypothesis is supported.
One more time on these reminders: in a cross-tabulation, the data is printed in table form. The column (vertical) variable is the independent variable and the row (horizontal) variable is the dependent variable. When we generate a table, we will ALWAYS ask for column percents. To read a table, you start with the left column of results, read that first from top to bottom, then move to the next column to its right until you’ve read all of the columns in the table. Even though we can often tell from the table whether our hypothesis looks like it’s correct, we have to know if the relationship is statistically significant before we can draw any conclusions. So we click on “Summary.” You only need to check two items on the summary screen: the v and the prob. Don’t confuse yourself by trying to interpret the rest of the numbers on this screen. Always look for v and prob in the location I’ve shown you, at the top of the screen.
XCSTNDRD HYPOTHESIS
Now we’re going to use the Standard Cross-Cultural file, write a hypothesis, and test it using both cross-tabulation AND mapping. So go back to Menu, click “File Management” and “Open File,” and use the backup icon to get back to the screen that says Ecological/Other/Survey/Trend. The Cross-Cultural files are under “Ecological,” so click Ecological, then Cross-Cultural, then XCSTNDRD, just like we did when we were exploring this file earlier. When you have the file open, click “Basic Statistics,” and choose cross-tabulation. When you click “Cross-tabulation,” you’ll see the variable list for the Standard Cross-Cultural file on the screen. We’re going to choose two variables that we could write a hypothesis about. I’ve picked variables 202)WARLIKE and 195)LOC.VIOLEN. Highlight 202 and read the gray box at the bottom of the screen. It contains the questions that were used to measure this variable and the possible responses that people could choose to answer that question.
Now let’s highlight variable 195.
Here’s a possible hypothesis using these two variables. In this file, data is about societies (remember these are all pre-industrial societies). So we begin the hypothesis with “Societies that...”
Societies that do place a great value on violence against members of other societies will be more likely to tolerate violence against others in the local community, compared to societies that do not place value on violence against members of other societies.
We could state this in the opposite way as well:
Societies that do not place a great value on violence against members of other societies will be less likely to tolerate violence against others in the local community, compared to societies that do place value on violence against members of other societies.
Both ways of stating the hypothesis predict the same relationship. As placing value on violence against outsiders goes up from no to yes (from 0 to 1), tolerance for violence against others in the local community also goes up from no to yes (from 0 to 1). Both variables, we predict, will change in the same direction. When we’re using cross tabulation, the up/down part doesn’t matter much. But it matters in a scatterplot, and we’re going to use that next, so we’ll make a note of it.
Now we’re ready to see if our prediction is correct. To analyze this data using cross-tabulation, we put the independent variable in the column and the dependent variable in the row. (Independent ALWAYS goes in the column.) When you click OK, a table appears on your screen. In order to read a table, we always use column% (click this in the left margin). Read the table by column. (Read table)
From the numbers we’re reading, it looks like our hypothesis is supported. But remember, we need to check the v and the prob statistics that Microcase calculates for us. Microcase calculates a Cramer’s V for cross-tabulation. Click “Summary” under “Statistics” to get to this screen.
Here’s the chart again.
Cramer’s v ranges from 0.0 to 1.0, with higher numbers meaning a stronger relationship.
v = .40 or higher, very strong relationship
v = .30-.39, strong relationship
v = .20-.29, moderate relationship
v = .01-.19, weak relationship
Once we’ve figured out that this v of .332 indicates a strong relationship, we need to determine whether our results indicate a real relationship between the variables or not. The computer has calculated “prob” to help us. Remember, prob is the probability that our results are just a “fluke,” that they are not valid. We want prob to be low, meaning that the chances are low that our results are bogus. Prob = .05 means that the odds of our results being bogus are five in 100 (that’s a 95% confidence level). So we’re hoping for a prob that is below .05.
And our prob is .009, that’s less than .05, so our results are statistically significant–there is a REAL and strong relationship between valuing violence against outside societies and tolerating violence in local communities for these 186 pre-industrial societies in our sample. To explain our results, we would say that we have a strong and statistically significant relationship between the variables Warlike and Local Violence. That relationship is: 43.2% of societies that do place a great value on violence against other societies also tolerate violence in the local community, compared to 12.0% of societies that do not place value on violence against other societies. Our hypothesis is supported.
In the Standard Cross-Cultural file, we have the ability to use mapping as well as cross-tabulation, so we’re going to do that. We can compare maps of two variables to see whether they’re related to each other. It’s a different way to test our hypothesis, but our hypothesis is the same. Go back to Menu, then Basic Statistics, then click “Mapping.” The same variable list will appear on your screen. We’re still using variables 202 Warlike and 195 Local violence. Our hypothesis is still:
Societies that do place a great value on violence against members of other societies will be more likely to tolerate violence against others in the local community, compared to societies that do not place value on violence against members of other societies.
The order in which we place the variables to compare maps doesn’t matter, so we just enter 202 in the first box and 195 in the second box and click OK. A double map should appear on the screen. Maps are cool because you can check specifics. For instance, if you click on any dot on one of the maps, the value for variable 202 and the value for variable 195 for that society will appear on your screen. If we did that enough, we’d see that the yes-yes pairs outnumber the yes-no pairs, and that’s what we predicted. But Microcase calculates Pearson’s r for us and puts it right on the screen. Pearson’s r measures the strength of a relationship, much like v does.
But Pearson’s r ranges from -1.0 to +1.0, with numbers closer to either extreme meaning a stronger relationship. A negative relationship means that as one variable goes up, the other goes down or vice versa. A positive relationship is when the variables change in the same direction. An r value of +1.0 would tell you that the two variables move in perfect step: every change in the independent variable is matched by a perfectly proportional change in the dependent variable. That rarely ever happens, of course, so r is usually somewhere between -1.0 and +1.0. Here is the guide for interpreting r scores (ignore the sign when you judge strength):
r = .70 or higher, very strong relationship
r = .40 - .69, strong relationship
r = .30 - .39, moderate relationship
r = .20 - .29, weak relationship
r = .01 - .19, no or negligible relationship
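To take some of the mystery out of r, here is a sketch in Python that calculates Pearson’s r with the standard formula and then looks up the label from the guide above. The scores are invented (20 imaginary societies coded 1 for yes and 0 for no), and the names warlike, local_violence, pearson_r, and r_label are my own; I’m assuming Microcase uses the same standard formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)

def r_label(r):
    """Translate r into the words used in the guide above (the sign is ignored)."""
    size = abs(r)
    if size >= 0.70:
        return "very strong"
    if size >= 0.40:
        return "strong"
    if size >= 0.30:
        return "moderate"
    if size >= 0.20:
        return "weak"
    return "none or negligible"

# Hypothetical 0/1 scores for 20 imaginary societies (1 = yes, 0 = no).
warlike = [1] * 10 + [0] * 10
local_violence = [1] * 6 + [0] * 4 + [1] * 2 + [0] * 8

r = pearson_r(warlike, local_violence)
direction = "same direction" if r > 0 else "opposite directions"
print(f"r = {r:+.3f}: {r_label(r)} relationship, variables move in the {direction}")
```

One thing worth knowing: when both variables are simple yes/no items coded 0 and 1, the size of r works out to the same number as the Cramer’s v from a cross-tabulation of those two variables. The two guides just attach different words to that number.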
Our r value for this comparison map is .332, and that indicates a moderate relationship, using this statistic. We’re doing well so far. But we still need to know whether r is statistically significant and in which direction the variables move with relation to each other. When r is positive, as it is here, they move in the same direction. And the sign on r is the only way we would know that if we hadn’t already printed a table for this hypothesis. So our positive r means that societies that score yes (a score of 1 vs. 0) on the warlike variable will be likely to also score yes (a score of 1 vs. 0) on the local violence variable. That’s what we predicted, so it looks like we’re in good shape. Just one last, very important question. Is it a statistically significant relationship?
Remember, once we’ve figured out what the r means, we need to determine whether our results indicate a real relationship between the variables or not. The computer calculates “prob.” For a scatterplot or comparison map, Microcase puts one asterisk after the r when prob is .05 or less and two asterisks after the r when prob is .01 or less. (That’s even better! With two asterisks, the probability that your results are just a fluke is one percent or less.) So, when there’s at least one asterisk by your r, then you have “statistically significant results.” That’s our goal. If there were no asterisks, then we would have no statistically significant relationship between the variables. But there are two asterisks, so we have a moderate, statistically significant relationship between Warlike and Local Violence. Our hypothesis is supported: societies that value violence against other societies are more likely to tolerate violence in their local communities, compared to societies that do not value violence against other societies.
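The asterisk rule is simple enough to write down as a short Python sketch. This is only an illustration of the rule described above, not Microcase’s own code. The function names are made up, and the prob value in the example is hypothetical because the tutorial reports the asterisks, not the exact prob.

```python
def significance_stars(prob):
    """Return the asterisks the tutorial describes for a given prob."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be between 0 and 1")
    if prob <= 0.01:
        return "**"  # prob of .01 or less
    if prob <= 0.05:
        return "*"   # prob of .05 or less
    return ""        # not statistically significant


def format_r(r, prob):
    """Show r to three decimals followed by its asterisks, if any."""
    return f"{r:.3f}{significance_stars(prob)}"


# Hypothetical prob of .004, chosen only to show the two-asterisk case.
print(format_r(0.332, 0.004))  # 0.332**
```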
STATES04 HYPOTHESIS
This time open the States04 file. Go back to the Menu, choose Manage Files and Open File, use the backup icon to get back to the screen that says Ecological/Other/Survey/Trend. The States04 file is in Ecological, so click Ecological, States, Cities and Counties, and States04. We’re going to compare maps again to test a hypothesis, so, when your file is open, click Basic Statistics and Mapping. You’ll see the variable list for the States04 file on the screen. We’re going to choose two variables that we could write a hypothesis about. I’ve picked variables 952)SCH_SEC3 and 953)SCH_SEC4. Highlight 952 and read the gray box at the bottom of the screen.
This looks a bit different than it did when we were using data files based on questionnaires. The States04 file contains data about the 50 states and the box tells you what the data is, where it came from, the year, and the range of scores for the 50 states. So for variable 952, percent of schools with video surveillance, we can see what the range is for the 50 states. (If we ask for a list rank on the map, we can see which states fall where.) We’re using a 2000 variable here because our other variable is from 2001, and 2000 is the closest we can get to matching the time frame.
Now let’s highlight variable 953. We can see the range of percents for the 50 states.
Here’s a possible hypothesis using these two variables. When data is about states (as it is here), we begin the hypothesis with “States in which....”
States in which there is a higher percent of schools with video surveillance will have higher percents of students who felt too unsafe to go to school, compared to states where there is a lower percent of schools with video surveillance.
We could state this in the opposite way as well:
States in which a higher percent of students felt too unsafe to go to school will have higher percents of schools with video surveillance, compared to states where a lower percent of students felt too unsafe to go to school.
Both ways of stating the hypothesis predict the same relationship. As surveillance goes up, feeling unsafe goes up. We’re predicting that the variables will change in the same direction. Telling what’s up and what’s down here is pretty easy. The answers aren’t coded–we’re working with raw data. We are predicting that our variables change in the same direction, so our r should be positive, right? If r is negative, then it means as one of our variables goes up, the other goes down. So we want a positive r.
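To see where the sign of r comes from, here is a minimal Python sketch of the standard Pearson’s r formula. Microcase does this calculation for you. The function name pearson_r and the small lists of numbers are made up for illustration and are not taken from the States04 file.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Compute Pearson's r for two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 cases")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two variables vary together
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    if variation_x == 0 or variation_y == 0:
        raise ValueError("r is undefined when a variable never changes")
    return co_variation / sqrt(variation_x * variation_y)


# Made-up scores for five cases (not real state data).
first = [10, 20, 30, 40, 50]
same_direction = [3, 5, 4, 8, 9]      # tends to rise as `first` rises
opposite_direction = [9, 8, 4, 5, 3]  # tends to fall as `first` rises

print(pearson_r(first, same_direction) > 0)      # True: positive r
print(pearson_r(first, opposite_direction) < 0)  # True: negative r
```

When high scores on one variable tend to go with high scores on the other, the co-variation term is positive and so is r. When high scores go with low scores, it is negative.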
Now we’re ready to see if our prediction is correct. To analyze this data using Mapping, we put one variable in the first box and one in the second. The order doesn’t matter. So let’s put 952 in the first box and 953 in the second box and click OK. Two maps will appear on the screen. And they tell us everything we need to know.
From the positive r, it looks like our hypothesis could be supported. But remember, we need to check the chart to determine the strength of r.
Here’s the chart for r again.
Pearson’s r ranges from -1.0 to +1.0, with numbers closer to either extreme meaning a stronger relationship. A negative relationship means that as one variable goes up, the other goes down or vice versa. A positive relationship is when the variables change in the same direction. An r value of +1.0 would tell you that the two variables move together perfectly: if you knew a case’s score on the independent variable, you could predict its score on the dependent variable exactly. That rarely happens, of course, so r is usually somewhere between -1.0 and +1.0. Here is a guide for interpreting r scores (ignore the sign when you judge strength):
r = .70 or higher, very strong relationship
r = .40 - .69, strong relationship
r = .30 - .39, moderate relationship
r = .20 - .29, weak relationship
r = .01 - .19, no or negligible relationship
So our r of .205 indicates a weak relationship, but we still need to determine whether our results indicate a real relationship between the variables. For a scatterplot or comparison map, Microcase puts one asterisk after the r when prob is .05 or less and two asterisks after the r when prob is .01 or less. (That’s even better! With two asterisks, the probability that your results are just a fluke is one percent or less.) So, when there’s at least one asterisk by your r, then you have “statistically significant results.” That’s our goal. If there were no asterisks, then we would have no statistically significant relationship between the variables. Guess what? There are no asterisks, so there is no statistically significant relationship here. What might that mean?
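One thing it can mean is that a weak r based on only 50 cases is hard to tell apart from chance. This tutorial does not say how Microcase computes prob, so the following Python sketch is an assumption: it uses the common textbook method of turning r into a t statistic with n - 2 degrees of freedom. It computes only that statistic. Turning t into a prob takes a t distribution table or a statistics library. The function name t_from_r is our own, and the sketch assumes all 50 states have data on both variables.

```python
from math import sqrt


def t_from_r(r, n):
    """Textbook t statistic for testing whether Pearson's r differs from 0.

    r is the correlation and n is the number of cases with data on both
    variables. The statistic has n - 2 degrees of freedom.
    """
    if n < 3:
        raise ValueError("need at least 3 cases")
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1.0 and +1.0")
    return r * sqrt((n - 2) / (1.0 - r * r))


# The same weak r, with 50 cases and with a hypothetical 500 cases.
print(round(t_from_r(0.205, 50), 2))
print(round(t_from_r(0.205, 500), 2))
```

The larger the t statistic, the smaller the prob. The same r produces a much larger t when it is based on more cases, which is why a weak r can earn an asterisk in a large file and not in a small one.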
GLOBAL04 HYPOTHESIS
Go back to the Menu, choose Manage Files and Open File, use the backup icon to get back to the screen that says Ecological/Other/Survey/Trend. The Global04 file is in Ecological, so click Ecological, International, and Global04. This time, we’re going to use scatterplot to test a hypothesis, so, when your file is open, click Basic Statistics and Scatterplot. You’ll see the variable list for the Global04 file on the screen. We’re going to choose two variables that we could write a hypothesis about. I’ve picked variables 15)POP GROWTH and 77)ELECTRIC. Highlight variable 15 and read the gray box at the bottom of the screen.
This looks similar to the States04 file. Global04 contains data about 172 countries and the box tells you what the data is, where it came from, the year, and the range of scores for the 172 countries. So for variable 15, population growth, we can see that in 2001, the current annual population growth rate ranged from -1.16% in the lowest country (they’re losing population) to 3.97% in the country with the highest growth rate. (If we ask for a list rank on the map, we can see which countries fall where.)
Now let’s highlight variable 77. We can see that the per capita annual electricity consumption in 2001 ranged from 22 kilowatt hours in the lowest consuming country to 24,607 kilowatt hours per capita in the highest consuming country. That’s an even wider range than the weekly salary range we worked with in the States file.
Here’s a possible hypothesis using these two variables. When data is about countries (as it is here), we begin the hypothesis with “Countries where....”
Countries with higher annual population growth rates will be countries in which per capita electricity consumption will be higher, compared to countries with lower annual population growth rates.
We could state this in the opposite way as well:
Countries with lower annual population growth rates will be countries in which per capita electricity consumption will be lower, compared to countries with higher annual population growth rates.
Both ways of stating the hypothesis predict the same relationship. As population growth goes up, electricity consumption goes up. We’re predicting that the variables will change in the same direction. Telling what’s up and what’s down here is pretty easy. The answers aren’t coded–we’re working with raw data. We are predicting that our variables change in the same direction, so our r should be positive, right? If r is negative, then it means as one of our variables goes up, the other goes down. So we want a positive r.
Now we’re ready to see if our prediction is correct. To analyze this data using Scatterplot, we put one variable in the first box and one in the second. The order doesn’t matter. So let’s put 15 in the first box and 77 in the second box and click OK. What appears on the screen is a scatterplot. And it can be pretty confusing at first. But there’s only one number that you need to look at carefully on this screen. And that’s the r. Can you see it at the bottom of the screen?
It looks like a negative r and one asterisk. So now what do we do? We wanted a positive r. First, let’s interpret r and make sure the prob is okay. Using the old familiar chart, we check the strength of r.
Here’s the chart for r again.
Pearson’s r ranges from -1.0 to +1.0, with numbers closer to either extreme meaning a stronger relationship. A negative relationship means that as one variable goes up, the other goes down or vice versa. A positive relationship is when the variables change in the same direction. An r value of +1.0 would tell you that the two variables move together perfectly: if you knew a case’s score on the independent variable, you could predict its score on the dependent variable exactly. That rarely happens, of course, so r is usually somewhere between -1.0 and +1.0. Here is a guide for interpreting r scores (ignore the sign when you judge strength):
r = .70 or higher, very strong relationship
r = .40 - .69, strong relationship
r = .30 - .39, moderate relationship
r = .20 - .29, weak relationship
r = .01 - .19, no or negligible relationship
Our r of -0.187* indicates a negligible (very weak) relationship, and the asterisk indicates that it is a statistically significant relationship. (Remember, Microcase puts one asterisk after the r when prob is .05 or less and two asterisks after the r when prob is .01 or less.) We have one asterisk. If there were no asterisks, then we would have no statistically significant relationship between the variables. But there is an asterisk, so we have a very weak, but NEGATIVE, statistically significant relationship between population growth rates and electricity consumption for these 172 countries. Our hypothesis predicted a positive relationship. So we have to reject our hypothesis. Our prediction was not correct–not supported by the data. If we click on reg.line at the left of the screen, Microcase will draw us a line that describes the relationship.
What do our results mean then? What the scatterplot tells us is that as population growth rates go up across the 172 countries, electricity consumption goes slightly down. The regression line goes from high to low. How can that be? Don’t more bodies consume more electricity? You’d think so. But the population growth on our planet is mostly in the poorest countries, many of which are largely agricultural. People in poor agricultural communities use very little electricity.
If we were doing actual research using this variable, we might search for another independent variable to use. For example, if we used variable 50) GDP per capita, a measure of income/wealth, we could try another hypothesis. As GDP/capita goes up, electricity consumption per capita goes up. And we’d be right. The r for that relationship is a positive, very strong, statistically significant 0.809**. It looks like wealth, but not population growth, drives up electricity consumption around the globe. Look at the regression line for this scatterplot.
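A regression line like the one reg.line draws is usually the least-squares line, the straight line that comes closest to all the dots at once. Here is a minimal Python sketch of that calculation. We are assuming Microcase uses ordinary least squares, the function name is our own, and the four pairs of numbers are made up to show a line that runs from high to low. They are not values from the Global04 file.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 cases")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    if variation_x == 0:
        raise ValueError("the line is undefined when x never changes")
    slope = co_variation / variation_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up values: growth rate (percent) and electricity use (kilowatt hours).
growth = [0.5, 1.0, 2.0, 3.0]
electricity = [9000, 6000, 2500, 500]

slope, intercept = least_squares_line(growth, electricity)
print(slope < 0)  # True: the line runs from high to low
```

The slope always has the same sign as r, so a negative r and a line that runs from high to low are two views of the same result.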
It’s really easy to get confused looking at the dots in scatterplot. So don’t do it–we don’t need to analyze the dots for our purposes. If you’re used to scatterplots because you’ve used them in another class, go ahead and click on the dots–Microcase will show you which country is which, and their values on each variable. But remember, you only need to look at the r.
As you can see, there are hundreds of data files that we could use and many more tools that we could employ to analyze the data (Correlation Matrix, ANOVA, T test, Auto-Analyzer, and Regression). With what you have already learned in this tutorial, you could pick these up with little problem. If your instructor wishes to use one of these additional files or tools for a Microcase exercise, they will provide you with printed instructions to follow.
This tutorial copyright Marlise R. Riffel Lake Superior College 2005