Finalexam stats scottbesteman 1
Final project
Name: Scott Besteman
Total: / 300 points
Please answer all questions on this sheet:
1. Read the survey attached
2. Use the data file
3. Do a basic data examination
4. What were your results from the examination? Please list the names of the
variables:
10 points
a) Outliers
Variable Name             Outliers
Tex Mex Turkey Burger     None
Supreme Sirloin Burger    None
Steak Frites Wrap         None
Chicken Chopped Salad     None
Mashed Sweet Potatoes     None
Black Beans and Mango     None
Steamed Veggies           None
Baked French Fries        None
Chocolate Chip Cookie     None
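One common way to screen a rating variable for outliers is the 1.5 x IQR rule. The sketch below is only an illustration of that rule; the function name iqr_outliers and the example ratings are hypothetical, and the rule itself is an assumption, since the sheet does not say which outlier check was used.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the k * IQR fences."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical 1-7 taste ratings, not the survey data.
example_ratings = [4, 5, 5, 6, 6, 7, 5, 6]
print(iqr_outliers(example_ratings))
```

On a bounded 1 to 7 scale, an empty result like "None" for every variable is plausible, because no rating can fall far from the rest.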
5. Choose 2 variables and find the confidence interval for them at the 95% level.
Explain what it means.
Tex Mex Turkey Burger: μ = 5.425781 ± 0.21144783
Supreme Sirloin Burger: μ = 5.425781 ± 0.21144783
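An interval of the form mean ± margin at the 95% level means that, if the survey were repeated many times, about 95% of the intervals built this way would contain the true population mean. A minimal sketch of the calculation follows. It assumes a large sample, so it uses the normal critical value rather than the t value; the function name and the example ratings are hypothetical and are not the survey data.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.95):
    """Return (mean, margin) so the interval is mean - margin to mean + margin."""
    n = len(values)
    center = mean(values)
    # Two-sided critical value, about 1.96 for 95% (normal approximation).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(values) / sqrt(n)
    return center, margin


# Hypothetical 1-7 taste ratings for one entree.
ratings = [6, 5, 7, 4, 6, 6, 5, 7, 3, 6]
center, margin = mean_confidence_interval(ratings)
print(f"mu = {center:.4f} +/- {margin:.4f}")
```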
We are trying to predict whether taste is an important predictor of buying a meal. This is
variable Q12 on page 8 of the survey.
We will refer to it as taste hereafter.
To what extent did you choose your meal based on what you thought would be
the tastiest?
1 Did not choose based on taste at all (1)
2 (2)
3 (3)
4 (4)
5 (5)
6 (6)
7 Definitely chose based on taste (7)
6. Which of the main dishes (entrees) were deemed the tastiest? How do you
know this? Show evidence. Which measure of central tendency did you use?
Why?
The Supreme Sirloin Burger was the tastiest because its mean, 5.42578125,
was the highest. I used the mean because the rating is interval data and the
skewness is -1.14286634483443, so the data are not severely skewed.
Entree                    Mean          Skew
Supreme Sirloin Burger    5.42578125    1.14286634483443
Tex Mex Turkey Burger     4.61328125    0.42593680768284
Steak Frites Wrap         4.6953125     0.474273142205226
Chicken Chopped Salad     5.2890625     0.866860557702164
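The mean and skew figures for each item can be computed as in the sketch below. It uses the adjusted sample skewness formula, n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations. Whether the original spreadsheet used exactly this formula is an assumption, and the example ratings are hypothetical.

```python
from statistics import mean, stdev


def sample_skewness(values):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)


# Hypothetical 1-7 ratings with most answers near the top of the scale.
ratings = [1, 5, 6, 6, 7, 7, 7]
print("mean:", mean(ratings))
print("skew:", sample_skewness(ratings))
```

A negative result means the tail points toward the low ratings, which is the usual pattern when most respondents rate an item near the top of the scale.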
33 points
B. Which of the desserts were deemed the tastiest? How do you know this?
Show evidence. Which measure of central tendency did you use? Why?
The Chocolate Chip Cookie was the tastiest because its mean, 5.57421875,
was the highest. I used the mean because the rating is interval data and the
skewness is 1.00035811663439, so the data are not severely skewed.
Dessert                   Mean          Skew
Chocolate Chip Cookie     5.57421875    1.00035811663439
Red Velvet Cupcake        5.54296875    0.996339988664802
Yogurt Granola Parfait    5.15234375    0.677334023749751
Fruit Salad               5.5625        1.03460686437834
C. Which of the sides were deemed the tastiest? How do you know this?
Show evidence. Which measure of central tendency did you use? Why?
Baked French Fries was the tastiest because its mean, 5.52734375, was
the highest. I used the mean because the rating is interval data and the
skewness is 1.0357038683303, so the data are not severely skewed.
Side                      Mean          Skew
Mashed Sweet Potatoes     4.72265625    0.0349604331168282
Black Beans and Mango     3.9765625     0.497085068303417
Steamed Veggies           4.51953125    0.293539365276116
Baked French Fries        5.52734375    1.0357038683303
7. Test a hypothesis that people will choose food based on taste.
10 points
State your null and alternate hypothesis:
H0: μ = 150
Ha: μ ≠ 150
Choose variables: pounds (weight), How tall are you in feet (height), and age
Decide if NOIR for each: height, weight, and age are all ratio variables
Choose test: ANOVA, because all variables are ratio variables.
Run test
Results:
6. Describe the data in 1-2 paragraphs. Use mean, median, mode, skewness, and
standard deviation to do so.
If F is greater than F crit, we reject the null hypothesis. Our data
show F = 4625.525 > F crit = 3.0074, so we reject the null hypothesis. This
means the means of the three variables are not all equal; at least one of them
is different. However, the ANOVA does not tell you where the difference lies.
You would need to run a t-test on each pair to find where the difference lies.
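The decision rule above (reject when F > F crit) can be sketched as follows. This is a minimal one-way ANOVA written out by hand so each step is visible; the function name and the three example groups are hypothetical, and the critical value is passed in rather than computed, since it depends on the degrees of freedom.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical groups, not the survey data.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(groups)
f_crit = 3.0074  # critical value reported on this sheet; it depends on the df
print("reject H0" if f_stat > f_crit else "fail to reject H0")
```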
80 points
6b. Run a test of differences between Males and Females on:
27 points
To what extent did you choose your meal based on what you thought would be
the tastiest? (choose tab male female)
State the hypothesis (both null and alternate):
H0: μ_male - μ_female = 0
Ha: μ_male - μ_female ≠ 0
Explain why?
Which test will you run? Why? I ran a t-test because we have exactly two
groups. We perform this test to determine whether the two population means
are different.
Run the test.
Write the conclusion.
In conclusion, we reject the null hypothesis. We reject it because our
p-value is less than our significance level of .05. My sample
provides strong enough evidence to conclude that the two means are different.
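A two-group comparison like this one can be sketched as below. The sketch assumes unequal variances (a Welch-style statistic) and approximates the p-value with the normal distribution, which is only reasonable for large samples; the sheet does not say which t-test variant was run. The function name and the example ratings are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_test(group_a, group_b):
    """Return (t, p) for the difference in means, two-sided."""
    # Standard error of the difference, without pooling the variances.
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    t = (mean(group_a) - mean(group_b)) / se
    # Large-sample assumption: normal approximation to the t distribution.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Hypothetical taste ratings for two groups, not the survey data.
males = [5, 6, 7, 6, 5, 7, 6, 6]
females = [3, 4, 3, 4, 5, 3, 4, 4]
t, p = two_sample_test(males, females)
print("reject H0" if p < 0.05 else "fail to reject H0")
```

The null hypothesis is rejected only when p is below the significance level; a p-value above .05 would mean the sample does not show a difference.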
7. Run correlations:
40 points
a. From the variables below choose 2 and create a hypothesis and give
justification for it.
To what extent did you choose your meal based on what you
thought would be the tastiest?
Overall, how much did you like the items on the menu?
If the meal was tasty, the chances that you like the meal or the items on the
menu overall will be greater. This makes sense: if I thought a meal was tasty, I
would probably rate how much I like the meal much higher than if I thought the
meal was gross or distasteful.
b. Using the 5 variables below, run correlations between all of them. Which
pair had the highest correlation? Lowest? Write a few sentences explaining what
that means. Were there any surprises? Why or why not?
(i) To what extent did you choose your meal based on what you
thought would be the tastiest?
(ii) To what extent did you choose your meal based on what you
thought would be the healthiest?
(iii) Overall, how much did you like the items on the menu?
(iv) On a scale from 1 to 7, how tasty do you think each
entree is?-Supreme Sirloin Burger
(v) On a scale from 1 to 7, how tasty do you think each dessert is?-Red Velvet Cupcake
The pair that had the highest correlation was Overall, how much did you like the
items on the menu and On a scale from 1 to 7, how tasty do you think each entree
is?-Supreme Sirloin Burger.
The pair that had the lowest correlation was To what extent did you choose your
meal based on what you thought would be the healthiest and Overall, how much
did you like the items on the menu.
Correlation tells us how strongly or weakly pairs of variables are related to
each other. We can judge the strength by how close the value is to 1 or -1. We
also get a direction: if the correlation is negative, the relationship has a
negative slope; if it is positive, it has a positive slope.
I am a bit shocked that health and overall, how much you liked the menu items
had such a weak correlation. You would think that we pay attention to health and
that it would be an important factor in determining the overall liking of a meal.
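Finding the highest and lowest pair among several variables can be sketched as below. The Pearson formula is written out by hand; the column names and values are hypothetical stand-ins for the survey columns, and "lowest" is taken here to mean the smallest r value, which is an assumption about how the pairs were ranked.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def highest_and_lowest(columns):
    """Return (highest_pair, lowest_pair, all_pairs) ranked by r."""
    pairs = {(a, b): pearson(columns[a], columns[b])
             for a, b in combinations(columns, 2)}
    return max(pairs, key=pairs.get), min(pairs, key=pairs.get), pairs


# Hypothetical 1-7 responses, not the survey data.
columns = {
    "tastiest": [6, 7, 5, 6, 4, 7],
    "healthiest": [3, 2, 5, 4, 6, 2],
    "overall": [6, 6, 5, 7, 4, 7],
}
highest, lowest, pairs = highest_and_lowest(columns)
print("highest:", highest, "lowest:", lowest)
```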
8. What is the correlation between height (in feet), weight, age and taste?
20 points
Interpret it.
Feet and pounds: medium positive correlation (upward slope)
Feet and years: small positive correlation (upward slope)
Feet and tasty: small positive correlation (upward slope)
Pounds and years: medium positive correlation (upward slope)
Pounds and tasty: small positive correlation (upward slope)
Years and tasty: small negative correlation (downward slope)
9. Run a regression for the following model.
40 points
We are going to predict Likely to eat (dependent variable) Q14 from page 8 of
the survey
How likely are you to eat at the new fast food restaurant if the menu is like the
one shown above?
1 – Very Unlikely (1)
2 (2)
3 (3)
4 (4)
5 (5)
6 (6)
7 – Very Likely (7)
using the following as independent variables:
How old are you (in years) ?-years
How much do you weigh (in pounds)?-pounds
How tall are you (in feet and inches)?-feet
Overall, how much did you like the items on the menu?
To what extent did you choose your meal based on what you thought would be
the healthiest?
To what extent did you choose your meal based on what you thought would be
the tastiest?
On a scale from 1 to 7, how tasty do you think each entree is?-Supreme Sirloin Burger
On a scale from 1 to 7, how tasty do you think each entree is?-Tex Mex Turkey
Burger
On a scale from 1 to 7, how tasty do you think each entree is?-Steak Frites Wrap
On a scale from 1 to 7, how tasty do you think each entree is?-Chicken Chopped
Salad
On a scale from 1 to 7, how tasty do you think each side is?-Mashed Sweet
Potatoes
a. Is the model significant? How do you know? List the significance number.
The model is significant because its significance value is
2.34044875326155E-40, which is lower than .05.
b. What is the explained variance? Interpret it for this study.
Explained variance measures how closely a model matches the actual data. We can
interpret it for the study by looking at our R^2 value, which is the proportion
of the variance in y that can be explained by x. Our R^2 value is
0.582656256928264, so 58.2656256928264% of the variation in How likely are you
to eat is explained by our independent variables.
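The R^2 value reported by the regression output is one minus the ratio of unexplained to total variation. A minimal sketch of that calculation follows; the function name and the example values are hypothetical.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    # Variation left over after the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total variation of the observed values around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical "likely to eat" scores and model predictions.
observed = [2, 4, 5, 6, 7]
predicted = [2.5, 3.5, 5.0, 6.5, 6.5]
print(f"R^2 = {r_squared(observed, predicted):.4f}")
```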
c. Which variables are significant? What are their coefficients? Which one is the
largest (absolute value)? Interpret it.
The significant variables are: To what extent did you choose your meal based on
what you thought would be the tastiest?; Overall, how much did you like the items
on the menu?; and How much do you weigh (in pounds)?-pounds.
Variable          Significance            Coefficient
To what extent    0.00853429867693827     0.172792067996
Overall           1.26782004159413E-28    0.83726821198454
pounds            0.0384331862066878      0.00610258523416809
Overall has the largest coefficient in absolute value. For every one-unit
increase in Overall, Likely to eat increases by 0.8372, keeping all other
variables constant.
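The selection of significant variables and of the largest coefficient can be checked with a short sketch like the one below. The three (coefficient, significance) pairs are the values listed above; the dictionary layout and variable names are only illustrative.

```python
ALPHA = 0.05

# (coefficient, significance) pairs as listed in the answer above.
results = {
    "To what extent (tastiest)": (0.172792067996, 0.00853429867693827),
    "Overall": (0.83726821198454, 1.26782004159413e-28),
    "pounds": (0.00610258523416809, 0.0384331862066878),
}

# Keep only predictors whose significance value is below the 0.05 level.
significant = {name: coef for name, (coef, p) in results.items() if p < ALPHA}

# The predictor with the largest coefficient in absolute value.
largest = max(significant, key=lambda name: abs(significant[name]))
print(largest, significant[largest])
```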
10. Conclusions for the study 1-2 paragraphs.
40 points
To conclude the study, we learned that for every one-unit increase in Overall,
Likely to eat increases by 0.8372, keeping all other variables constant. This
tells us that for every unit we increase x, y increases by .8372. We can
determine whether these results are significant by checking whether the
significance value is lower than .05. If it is lower, the result is significant
for our data; if it is higher than .05, we should disregard that result and move on.
Attach all output in appendix below.
…