Simple statistics in R

The following are some notes taken while doing a course on statistics. Again, I'm using R Markdown to produce this, both as a way for me to practice using it and because I think it's an awesome tool for documenting as you code.

It’s really meant for my future reference, since there is quite a high probability that I’ll forget all of these neat functions a couple of months from now :P
1. The respiratory disturbance index (RDI), a measure of sleep disturbance, for a specific population has a mean of 15 (sleep events per hour) and a standard deviation of 10. They are not normally distributed. Give your best estimate of the probability that a sample mean RDI of 100 people is between 14 and 16 events per hour?
Answer: Recall the relationship between variance and standard deviation:
Variance = (Standard deviation)^2
Standard deviation = sqrt(Variance)
Recall also that the variance of the sample mean is the population variance divided by the sample size:
Variance of the sample mean = (Population standard deviation)^2 / (Sample size)
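# calculate the standard deviation of the sample mean (the standard error)
# first let's get the variance of the sample mean: 10^2 / 100
## [1] 1
# Thus the standard deviation of the sample mean is
## [1] 1
# Since the standard deviation of the sample mean is 1, 14 and 16 events per hour are 1 standard deviation below and above the mean of 15. By the central limit theorem the sample mean of 100 people is approximately normal even though RDI itself is not, so we can calculate the probability in that area with the normal distribution.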
## [1] 0.6826895
Thus the answer is around 68%.
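The steps above can be sketched in R roughly as follows. This is a minimal sketch using base R only; the variable names (pop_mean, pop_sd, n, se) are illustrative and the normal approximation is assumed to be good enough for a sample of 100.

```r
# Question 1: probability that the mean RDI of 100 people is between 14 and 16
pop_mean <- 15   # population mean (events per hour)
pop_sd   <- 10   # population standard deviation
n        <- 100  # sample size

# variance and standard deviation of the sample mean
var_mean <- pop_sd^2 / n
se       <- sqrt(var_mean)

# normal approximation (central limit theorem) for the sample mean
p_between <- pnorm(16, mean = pop_mean, sd = se) -
  pnorm(14, mean = pop_mean, sd = se)
p_between
```

Passing mean and sd to pnorm directly avoids having to standardize 14 and 16 by hand.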
2. You flip a fair coin 5 times, about what’s the probability of getting 4 or 5 heads?
Answer: To solve this, we need to know the number of ways to get 4 heads and 1 tail when flipping the coin 5 times.
A great tutorial on this (which basically solves this whole problem, actually) is available at Khan Academy.
Anyway, here is an example of how to solve the problem using R.
# Get the number of combinations for 4 heads 1 tail
## [1] 5
# There is only 1 combination for 5 heads and 0 tails.
# Heads and tails each have a probability of 50% since it's a fair coin. This simplifies our calculation a lot.

5*0.5^5 + 1 * 0.5^5
## [1] 0.1875
# If it's not a fair coin, say 70% heads and 30% tails, then the calculation becomes

5*0.7^4*0.3^1 + 1*0.7^5
## [1] 0.52822
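As a cross-check on the hand calculation above, the same numbers can be obtained from R's built-in binomial functions. This is just a sketch; the variable names are illustrative.

```r
# number of ways to get 4 heads (and 1 tail) in 5 flips
n_ways_4 <- choose(5, 4)

# P(4 or 5 heads) as a sum of binomial probabilities
p_fair   <- sum(dbinom(4:5, size = 5, prob = 0.5))  # fair coin
p_biased <- sum(dbinom(4:5, size = 5, prob = 0.7))  # 70% heads

# the same tail probability from the cumulative distribution: P(X > 3)
p_fair_tail <- pbinom(3, size = 5, prob = 0.5, lower.tail = FALSE)

c(n_ways_4, p_fair, p_biased, p_fair_tail)
```

dbinom already includes the choose(n, k) factor, so there is no need to count the combinations separately when using it.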
3. Suppose that diastolic blood pressures (DBPs) for men aged 35-44 are normally distributed with a mean of 80 (mm Hg) and a standard deviation of 10. About what is the probability that a random 35-44 year old has a DBP less than 70?
# We know that 70 is 1 standard deviation below the mean (i.e. a z-score of -1), so use the pnorm function.
## [1] 0.1586553
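A minimal sketch of this calculation, done two ways: once with the raw numbers and once with the z-score. The variable names are illustrative.

```r
# Question 3: P(DBP < 70) when DBP ~ Normal(mean = 80, sd = 10)

# directly, letting pnorm handle the mean and standard deviation
p_direct <- pnorm(70, mean = 80, sd = 10)

# via the z-score: 70 is (70 - 80) / 10 = -1 standard deviations from the mean
z   <- (70 - 80) / 10
p_z <- pnorm(z)

c(p_direct, p_z)
```

Both approaches give the same probability, since pnorm standardizes internally when mean and sd are supplied.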
4. Brain volume for adult women is normally distributed with a mean of about 1,100 cc for women with a standard deviation of 75 cc. What brain volume represents the 95th percentile?
# just plug the numbers into the qnorm function
round(qnorm(.95, mean= 1100, sd=75),3)
## [1] 1223.364
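For reference, the same percentile can be built up by hand from the standard normal quantile. This is a small sketch with illustrative variable names.

```r
# 95th percentile of the standard normal distribution
z95 <- qnorm(0.95)

# shift and scale to brain volume: mean + z * sd
q_manual <- 1100 + z95 * 75

# compare with the direct call
q_direct <- qnorm(0.95, mean = 1100, sd = 75)

round(c(q_manual, q_direct), 3)
```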
5. Brain volume for adult women is about 1,100 cc for women with a standard deviation of 75 cc. Consider the sample mean of 100 random adult women from this population. What is the 95th percentile of the distribution of that sample mean?
# get the variance of the sample mean: population variance / sample size = 75^2 / 100
## [1] 56.25
# plug in the numbers
round(qnorm(.95, mean= 1100, sd=sqrt(56.25)),3)
## [1] 1112.336
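The variance step can be sketched as below. The variable names are illustrative; the only change from question 4 is that the standard deviation of the sample mean replaces the population standard deviation.

```r
# Question 5: 95th percentile of the mean brain volume of 100 women
pop_mean <- 1100
pop_sd   <- 75
n        <- 100

# variance of the sample mean
var_mean <- pop_sd^2 / n

# 95th percentile of the sampling distribution of the mean
q95 <- qnorm(0.95, mean = pop_mean, sd = sqrt(var_mean))
round(q95, 3)
```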
6. Consider a standard uniform density. The mean for this density is .5 and the variance is 1 / 12. You sample 1,000 observations from this distribution and take the sample mean, what value would you expect it to be near?
# get the standard deviation of the sample mean: sqrt((1/12) / 1000)
## [1] 0.009128709
Thus we expect the sample mean to be very close to the population mean (i.e. 0.5), since its standard deviation is only about 0.009.
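A quick simulation can back this up. This is only a sketch: the seed and the number of repetitions (2000) are arbitrary choices for illustration.

```r
# theoretical standard deviation of the mean of 1,000 standard uniform draws
se <- sqrt((1 / 12) / 1000)

# simulate many sample means, each from 1,000 uniform(0, 1) observations
set.seed(42)
sample_means <- replicate(2000, mean(runif(1000)))

# the simulated means should centre near 0.5 with spread close to se
mean(sample_means)
sd(sample_means)
se
```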
7. The number of people showing up at a bus stop is assumed to be Poisson with a mean of 5 people per hour. You watch the bus stop for 3 hours. About what’s the probability of viewing 10 or fewer people?
# use the ppois function. lambda is the expected count over the whole period: 5 people per hour * 3 hours.
ppois(10, lambda = 5 * 3)
## [1] 0.1184644
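To see what ppois is doing, the cumulative probability can also be computed by summing the individual Poisson probabilities. A small sketch with illustrative variable names:

```r
# expected number of people over 3 hours at 5 people per hour
lambda_total <- 5 * 3

# P(X <= 10) from the cumulative distribution function
p_cdf <- ppois(10, lambda = lambda_total)

# the same probability as a sum of P(X = 0), ..., P(X = 10)
p_sum <- sum(dpois(0:10, lambda = lambda_total))

c(p_cdf, p_sum)
```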

