I have run into a difficulty with my doctoral research. I need to measure the risk factors for antisocial behavior among school students aged 13-17, using a self-report survey. The research sites are three cities, and I already know the student population broken down by age and by city (urban, peri-urban and rural areas).
What do you think: how should I randomly sample these populations so that the findings are valid and the differences can be tested for significance? How large should the sample be?
I have attached a file with a table showing the population of each of the three cities by age.
I would be very glad if someone could offer some advice.

It sounds like you are collecting interesting and much-needed information.

The attachment is not there. Something is wrong with the board: it is not currently accepting attachments.

Please say more about the analysis you plan for the survey data, and about the nature of the variables. Which variables will you estimate the magnitudes of, along with their margins of error? Which variables will you compare between groups to determine whether group differences are significant? Etc.

The required sample size grows as the differences you want to detect as significant get smaller, and as the margins of error you want get smaller.
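To make this trade-off concrete, here is a minimal Python sketch using normal-approximation formulas. The SD of 1.0 on a 5-point Likert score and the worst-case proportion p = 0.5 are assumptions for illustration, not values from this survey. Because the margin of error shrinks only with the square root of n, quadrupling the sample merely halves it.

```python
from statistics import NormalDist


def z_value(confidence: float) -> float:
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def moe_mean(sd: float, n: int, confidence: float = 0.95) -> float:
    """Margin of error (confidence-interval half-width) for a mean score."""
    return z_value(confidence) * sd / n ** 0.5


def moe_proportion(p: float, n: int, confidence: float = 0.95) -> float:
    """Margin of error for a proportion (Wald interval)."""
    return z_value(confidence) * (p * (1 - p) / n) ** 0.5


SD_GUESS = 1.0  # hypothetical SD of a 5-point Likert score
for n in (100, 300, 500, 1500):
    print(f"n={n:5d}  mean: +/-{moe_mean(SD_GUESS, n):.3f} points  "
          f"proportion: +/-{100 * moe_proportion(0.5, n):.1f} percentage points")
```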

What is the population: all students in a school? What are the practical limits on the number of surveys you can handle?

Thank you very much for your reply.
I have tried again to attach the Word file containing a table with the population of each of the three cities, broken down by individual age: 13, 14, 15, 16 and 17 years old.
If you want, I can list the population data in my next message.
I will administer a self-report instrument covering sociodemographic variables; drug and smoking use and age at first use; school suspension and truancy; other peer antisocial behaviors; family supervision and family emotional attachment; and community sensitivity towards drugs and juvenile delinquency. Most of the variables are measured on Likert scales, which I plan to treat as interval variables.
I am interested in comparing the onset of these behaviors across age groups, between genders, by type of school, and by the type of community where students live (urban, suburban and rural); in whether there are significant differences between the three cities; and in which risk factors are most often reported, and whether they are individual, school, family, peer or community factors.
The students are not all from the same school. I intend to randomly select students from different schools in all three areas. Since this is doctoral research rather than an agency research study, it would be more practical for me to survey about 1,000 students or a few more, but I don't know how to divide them between cities and ages.
If you can give me some advice, I will be very motivated to start right away.

Thanks,
Juljana
Hello again,
Please find attached the file with a table showing the population for each of the three cities and each of the ages.
I really hope to get some help.

Thanks
Juljana
juljana wrote: The students are not all from the same school. I intend to randomly select students from different schools in all three areas. Since this is doctoral research rather than an agency research study, it would be more practical for me to survey about 1,000 students or a few more, but I don't know how to divide them between cities and ages.

Your plan's data matrix has 15 cells (3 cities x 5 ages). Unless there is a reason to do otherwise, why not take the same sample size in each cell? If you analyze a total of n=1,000 questionnaires, you would have about 66 per cell. For n=1,500 you would have 100 per cell.

For n=1,500 total, that would give you 500 per city and 300 per age group.
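The cell arithmetic can be sketched in Python as below. The population counts are hypothetical placeholders, since the real table was in the attachment that did not come through. The sketch contrasts equal allocation (the same n in every cell) with proportional allocation (n in each cell proportional to its population); with proportional allocation the smallest cells get far fewer questionnaires, and rounding can make the cells sum to slightly more or less than the target total.

```python
# Hypothetical population counts per age (13..17) for each city; the real
# figures were in the attachment that did not come through.
AGES = [13, 14, 15, 16, 17]
POPULATION = {
    "City A": [5200, 5100, 4900, 4700, 4500],
    "City B": [3100, 3000, 2950, 2800, 2700],
    "City C": [1900, 1850, 1800, 1700, 1600],
}


def equal_allocation(n_total, population):
    """Same number of questionnaires in every city x age cell."""
    n_cells = sum(len(counts) for counts in population.values())
    per_cell = n_total // n_cells  # floor, e.g. 1000 // 15 = 66
    return {(city, age): per_cell
            for city, counts in population.items()
            for age, _ in zip(AGES, counts)}


def proportional_allocation(n_total, population):
    """Cell sizes proportional to each cell's share of the total population.
    Rounding means the cells may not sum exactly to n_total."""
    grand_total = sum(sum(counts) for counts in population.values())
    return {(city, age): round(n_total * count / grand_total)
            for city, counts in population.items()
            for age, count in zip(AGES, counts)}


def margins(allocation):
    """Totals per city and per age implied by a cell allocation."""
    by_city, by_age = {}, {}
    for (city, age), n in allocation.items():
        by_city[city] = by_city.get(city, 0) + n
        by_age[age] = by_age.get(age, 0) + n
    return by_city, by_age


for name, alloc in [("equal", equal_allocation(1500, POPULATION)),
                    ("proportional", proportional_allocation(1500, POPULATION))]:
    by_city, by_age = margins(alloc)
    print(name, by_city, by_age)
```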

If you can come up with an estimate of the standard deviation of Likert-scale data at about the same overall mean score you expect, it would be possible to estimate the size of difference that would be significant between cities and between age groups. That would give a further feel for the effect of sample size. It only needs to be a rough estimate of the SD for this purpose, because you will use the actual data in the analysis once you have collected it. Perhaps you have some prior data on that.

For total n=1,500:
To compare cities, n=500 vs 500.
To compare age groups, n=300 vs 300.
To compare two age groups within a city, n=100 vs 100.

The smaller total (n=1,000) would work too, but would just be less sensitive.
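Once you have a rough SD, the size of difference each comparison could detect can be sketched as below. This assumes a two-sided two-sample comparison at alpha = 0.05 with 80% power, equal group sizes and a common SD; the SD of 1.0 is a hypothetical placeholder. The minimum detectable difference is in the same units as the SD and shrinks with the square root of the group size, so with two-thirds of the data (n=1,000 instead of 1,500) the detectable differences are only about 22% larger (the square root of 1.5).

```python
from statistics import NormalDist


def min_detectable_difference(sd, n_per_group, alpha=0.05, power=0.80):
    """Smallest true difference in means that a two-sided, two-sample z-test
    detects with the given power, assuming equal SDs and equal group sizes."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * sd * (2 / n_per_group) ** 0.5


SD_GUESS = 1.0  # hypothetical SD of a 5-point Likert score; use your own estimate

comparisons = {
    1500: {"city vs city": 500, "age vs age": 300, "two ages within a city": 100},
    1000: {"city vs city": 333, "age vs age": 200, "two ages within a city": 66},
}
for total, groups in comparisons.items():
    for label, n in groups.items():
        print(f"total {total}: {label:24s} n={n:3d} per group -> "
              f"{min_detectable_difference(SD_GUESS, n):.2f} points")
```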

Also, what kind of analysis of results are you planning?

Thanks again for the reply, very kind.
So, if I got it right, you suggest taking a sample of 1,500 respondents in total. But I have a doubt: as you can see from the table in the file I sent, the population for each age in each city is different. Shouldn't I reflect this in the stratified sample? How do I account for these differences so as to have a more accurate sample for each age and each city? IT'S A BIT TOO COMPLICATED!!!
Also, in reply to your question: I plan to do regression analysis to see which kinds of factors (personal, family, school or peer) predict certain risky behaviors in the target age group, such as drug, alcohol and tobacco use, truancy, conflicts with peers, etc.
I also plan to run paired t-tests comparing respondents from the three cities and from different ages, to see whether there are any significant differences.
Hoping to get some assistance.

best regards,

Juljana
To be sure I understand how you will do it, I am assuming that each student fills out a survey questionnaire.

juljana wrote: Shouldn't I reflect this [different population size in each cell] in the stratified sample? How do I account for these differences so as to have a more accurate sample for each age and each city? IT'S A BIT TOO COMPLICATED!!!

I think what you need is equal precision for each risk factor and each behavior score. There is an "infinite population" rule of thumb: if the population is at least 10 times the sample size, the population size has little effect on any significance test or margin-of-error calculation. My natural tendency is to take an equal sample size per cell unless there is some reason not to. For example, why would you want to take more data for one age group than another? Just because they are there?
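Two quick checks related to this, sketched in Python. First, the finite population correction shows why the 10x rule works: with a population 10 times the sample, the standard error is only about 5% smaller than the infinite-population value. Second, if you take equal samples per cell, estimates for a whole city (or for all three cities together) should weight each cell by its population size; otherwise the smaller age groups count for more than their share. The cell means and population counts below are hypothetical.

```python
def fpc(population_size, sample_size):
    """Finite population correction: multiply an infinite-population
    standard error by this factor when sampling without replacement."""
    if population_size < 2 or not 0 < sample_size <= population_size:
        raise ValueError("need population_size >= 2 and 0 < sample_size <= population_size")
    return ((population_size - sample_size) / (population_size - 1)) ** 0.5


def population_weighted_mean(cell_means, cell_populations):
    """Combine per-cell sample means into a population-level mean by weighting
    each cell by its population size (the same result as giving every
    respondent in cell h the design weight N_h / n_h)."""
    total = sum(cell_populations)
    return sum(m * big_n for m, big_n in zip(cell_means, cell_populations)) / total


# 10x rule: a cell of 100 questionnaires drawn from 1,000 students.
print(round(fpc(1000, 100), 3))  # standard error about 5% smaller

# Hypothetical mean scores for ages 13..17 in one city (equal n per age)
# and hypothetical population counts for those ages.
means = [2.1, 2.4, 2.8, 3.0, 3.3]
pops = [5200, 5100, 4900, 4700, 4500]
print("unweighted:", round(sum(means) / len(means), 3))
print("population-weighted:", round(population_weighted_mean(means, pops), 3))
```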

juljana wrote: Also, in reply to your question: I plan to do regression analysis to see which kinds of factors (personal, family, school or peer) predict certain risky behaviors in the target age group, such as drug, alcohol and tobacco use, truancy, conflicts with peers, etc.
Good. I have also found that people understand survey results in terms of percentages (of students) better than average scores. If I were doing it, my results would include, for each demographic compared, a table of mean scores, a corresponding table of percentages of students (say, the percentage in categories 1 & 2 versus the percentage in categories 4 & 5), and an indication of whether the difference is significant. With the percentages of students shown, people can see the size of the effect as well as whether it is statistically significant.
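A minimal sketch of that kind of summary for one 5-point item: the mean score, the percentages in categories 1 & 2 and in categories 4 & 5, and a two-sided two-proportion z-test (pooled standard error, normal approximation) for comparing a percentage between two groups. All responses and counts are made up for illustration.

```python
from statistics import NormalDist, mean


def summarize(responses):
    """Mean score plus % in categories 1-2 and % in categories 4-5 of a 5-point item."""
    n = len(responses)
    low = sum(1 for r in responses if r in (1, 2)) / n
    high = sum(1 for r in responses if r in (4, 5)) / n
    return {"n": n, "mean": mean(responses), "pct_1_2": 100 * low, "pct_4_5": 100 * high}


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


urban = [1, 2, 2, 3, 4, 5, 5, 4, 3, 2]  # made-up responses
rural = [1, 1, 2, 2, 3, 3, 4, 2, 1, 2]
for label, data in (("urban", urban), ("rural", rural)):
    print(label, summarize(data))

# Made-up counts: 120 of 300 vs 90 of 300 students answering 4 or 5.
z, p = two_proportion_z_test(120, 300, 90, 300)
print(f"z = {z:.2f}, p = {p:.3f}")
```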

juljana wrote: I also plan to run paired t-tests comparing respondents from the three cities and from different ages, to see whether there are any significant differences.

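I think you need to perform independent (not paired) t-tests of means: paired tests are for two measurements on the same students, whereas students in different cities or age groups are different people. In addition, I would consider a 3 (city) x 5 (age group) analysis of variance for each variable, including tables of means.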
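A stdlib-only sketch of both analyses, on made-up data. `welch_t` returns the independent-samples t statistic with Welch's unequal-variance correction and its degrees of freedom; `two_way_anova_balanced` returns F statistics for city, age group and their interaction, and assumes equal n in every cell, which the equal-allocation design gives you. For p-values you would normally use a statistics package, for example `scipy.stats.ttest_ind(a, b, equal_var=False)` for the t-test and `scipy.stats.f.sf(F, dfn, dfd)` for the F tests.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Independent-samples (Welch) t statistic and Welch-Satterthwaite df."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def two_way_anova_balanced(cells):
    """F statistics for a balanced two-way ANOVA with interaction.
    cells[i][j] holds the scores for city i and age group j; every cell
    must contain the same number of students."""
    n_city, n_age = len(cells), len(cells[0])
    n = len(cells[0][0])
    if any(len(row) != n_age or any(len(cell) != n for cell in row) for row in cells):
        raise ValueError("design must be balanced")
    cell_means = [[mean(cell) for cell in row] for row in cells]
    grand = mean(m for row in cell_means for m in row)  # equals overall mean when balanced
    city_means = [mean(row) for row in cell_means]
    age_means = [mean(cell_means[i][j] for i in range(n_city)) for j in range(n_age)]
    ss_city = n_age * n * sum((m - grand) ** 2 for m in city_means)
    ss_age = n_city * n * sum((m - grand) ** 2 for m in age_means)
    ss_inter = n * sum((cell_means[i][j] - city_means[i] - age_means[j] + grand) ** 2
                       for i in range(n_city) for j in range(n_age))
    ss_within = sum((x - cell_means[i][j]) ** 2
                    for i in range(n_city) for j in range(n_age) for x in cells[i][j])
    df_city, df_age = n_city - 1, n_age - 1
    df_inter, df_within = df_city * df_age, n_city * n_age * (n - 1)
    ms_within = ss_within / df_within
    return {
        "city": (ss_city / df_city / ms_within, df_city, df_within),
        "age": (ss_age / df_age / ms_within, df_age, df_within),
        "city x age": (ss_inter / df_inter / ms_within, df_inter, df_within),
    }


# Made-up scores for two age groups (t-test) ...
group_13 = [1, 2, 3, 4, 5]
group_17 = [2, 3, 4, 5, 6]
print(welch_t(group_13, group_17))
# ... and a tiny made-up 2-city x 2-age balanced layout (ANOVA).
cells = [[[1, 3], [3, 5]],
         [[3, 5], [5, 7]]]
print(two_way_anova_balanced(cells))
```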

My experience is in factories and laboratories, so my thinking and understanding might differ from what is customary in your field. It would be interesting to run some of these ideas by the people over at Talk Stats.

