### Simple statistics in R

The following are some notes taken while doing a course on statistics. Again, I'm using R Markdown to produce this, both as a way to practice using it and because I think it's an awesome tool for documenting as you code.

It’s really meant for my future reference, since there is quite a high probability that I’ll forget all of these neat functions in a couple of months from now :P

1. The respiratory disturbance index (RDI), a measure of sleep disturbance, for a specific population has a mean of 15 (sleep events per hour) and a standard deviation of 10. They are not normally distributed. Give your best estimate of the probability that a sample mean RDI of 100 people is between 14 and 16 events per hour?

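*Answer:* Even though the individual RDI values are not normally distributed, the central limit theorem says that the mean of a sample of 100 people is approximately normally distributed, so we can use the normal distribution here.

Recall that the variance is the square of the standard deviation:

Variance = (Standard deviation)^2

Thus,

Standard deviation = sqrt(Variance)

Recall also that the variance of the sample mean is the population variance divided by the sample size:

Variance of the sample mean = (Standard deviation)^2/(Sample size)

The square root of this is the standard deviation of the sample mean, also called the standard error.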

```
# calculate the standard error (the standard deviation of the sample mean)
# first let's get the variance of the sample mean
10^2/100
```

`## [1] 1`

```
# Thus the standard error (standard deviation of the sample mean) is
sqrt(1)
```

`## [1] 1`

```
# Since the standard deviation of the sample mean is 1, 14 and 16 events per hour are 1 standard deviation below and above the mean of 15. We can thus calculate the probability of falling in that range.
pnorm(1)-pnorm(-1)
```

`## [1] 0.6826895`

Thus the answer is around 68%.
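The same number can be obtained without standardising by hand, by passing the mean and the standard error straight to `pnorm`. The sketch below also runs a rough simulation check of the normal approximation. The gamma distribution used for the simulation is purely my assumption (the question only says the RDI is not normal); its shape and rate are chosen so that the mean is 15 and the standard deviation is 10.

```r
# Population parameters from the question
mu <- 15       # population mean (events per hour)
sigma <- 10    # population standard deviation
n <- 100       # sample size

# Standard error of the sample mean
se <- sigma / sqrt(n)

# P(14 < sample mean < 16) under the normal approximation
p_between <- pnorm(16, mean = mu, sd = se) - pnorm(14, mean = mu, sd = se)
p_between

# Simulation check with a skewed population (assumed gamma, not from the question):
# shape / rate = 15 (mean) and shape / rate^2 = 100 (variance)
set.seed(1)
sim_means <- replicate(10000, mean(rgamma(n, shape = 2.25, rate = 0.15)))
p_sim <- mean(sim_means > 14 & sim_means < 16)
p_sim
```

`p_between` should match the `pnorm(1)-pnorm(-1)` result above, and `p_sim` should land close to it even though the simulated population is skewed.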

2. You flip a fair coin 5 times, about what’s the probability of getting 4 or 5 heads?

*Answer:* To solve this, we need to know the number of ways to get 4 heads and 1 tail in 5 flips.

A great tutorial for this (which basically solves this whole problem, actually) is available at Khan Academy.

Anyway, this is an example of how to solve the problem using R.

```
# Get the number of combinations for 4 heads 1 tail
factorial(5)/(factorial(4)*factorial(5-4))
```

`## [1] 5`

```
# There is only 1 combination for 5 heads and 0 tails.
# The probability of heads or tails is both 50% since it's a fair coin. This simplifies our calculation a lot.
5*0.5^5 + 1 * 0.5^5
```

`## [1] 0.1875`

```
# If it's not a fair coin, say 70% heads and 30% tails, then the calculation becomes
5*0.7^4*0.3^1 + 1*0.7^5
```

`## [1] 0.52822`
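For future reference, the hand calculation above can probably be replaced by R's built-in binomial functions. This is a small sketch using `choose`, `dbinom` and `pbinom`; the variable names are my own.

```r
n_flips <- 5

# Number of ways to get 4 heads in 5 flips (same as the factorial formula)
n_ways <- choose(n_flips, 4)

# P(4 or 5 heads) for a fair coin: add up the two binomial probabilities
p_fair <- sum(dbinom(4:5, size = n_flips, prob = 0.5))

# Same thing via the upper tail: P(X > 3)
p_fair_tail <- pbinom(3, size = n_flips, prob = 0.5, lower.tail = FALSE)

# Biased coin with a 70% chance of heads
p_biased <- sum(dbinom(4:5, size = n_flips, prob = 0.7))

c(n_ways, p_fair, p_fair_tail, p_biased)
```

The fair and biased results should agree with the 0.1875 and 0.52822 computed by hand.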

3. Suppose that diastolic blood pressures (DBPs) for men aged 35-44 are normally distributed with a mean of 80 (mm Hg) and a standard deviation of 10. About what is the probability that a random 35-44 year old has a DBP less than 70?

```
# 70 is 1 standard deviation below the mean (i.e. a z-score of -1), so use the pnorm function.
pnorm(-1)
```

`## [1] 0.1586553`
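If the value is not a neat number of standard deviations away from the mean, it may be easier to let `pnorm` do the standardising. A minimal sketch, with variable names of my own choosing:

```r
mu <- 80      # mean DBP (mm Hg)
sigma <- 10   # standard deviation

# Standardise by hand: how many standard deviations is 70 from the mean?
z <- (70 - mu) / sigma

# Either standardise first, or pass mean and sd directly
p_from_z <- pnorm(z)
p_direct <- pnorm(70, mean = mu, sd = sigma)

c(z, p_from_z, p_direct)
```

Both probabilities should be identical to the `pnorm(-1)` result above.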

4. Brain volume for adult women is normally distributed with a mean of about 1,100 cc and a standard deviation of 75 cc. What brain volume represents the 95th percentile?

```
#just plug in the numbers in the qnorm function
round(qnorm(.95, mean= 1100, sd=75),3)
```

`## [1] 1223.364`
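As a sanity check, `qnorm` and `pnorm` are inverses of each other, so feeding the percentile back into `pnorm` should return 0.95. The sketch below also shows the equivalent "mean plus z times sd" form; the variable names are mine.

```r
mu <- 1100    # mean brain volume (cc)
sigma <- 75   # standard deviation (cc)

# 95th percentile, computed directly
q95 <- qnorm(0.95, mean = mu, sd = sigma)

# Same thing from the standard normal quantile
q95_manual <- mu + qnorm(0.95) * sigma

# Round trip: the probability of being below q95 should be 0.95
p_check <- pnorm(q95, mean = mu, sd = sigma)

c(q95, q95_manual, p_check)
```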

5. Brain volume for adult women has a mean of about 1,100 cc and a standard deviation of 75 cc. Consider the sample mean of 100 random adult women from this population. What is the 95th percentile of the distribution of that sample mean?

```
# get the variance of the sample mean (population variance / sample size)
75^2/100
```

`## [1] 56.25`

```
#plug-in the numbers
round(qnorm(.95, mean= 1100, sd=sqrt(56.25)),3)
```

`## [1] 1112.336`
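To see why this percentile sits so much closer to the mean than the one for a single woman, it may help to compare the two side by side. A rough sketch, assuming the normal approximation for the sample mean holds; the variable names are my own.

```r
mu <- 1100    # population mean (cc)
sigma <- 75   # population standard deviation (cc)
n <- 100      # sample size

# Standard deviation of the sample mean (standard error)
se <- sigma / sqrt(n)

# 95th percentile for one individual vs. for the mean of 100 individuals
q95_individual <- qnorm(0.95, mean = mu, sd = sigma)
q95_mean <- qnorm(0.95, mean = mu, sd = se)

# The distance from the mean shrinks by a factor of sqrt(n)
shrink <- (q95_individual - mu) / (q95_mean - mu)

c(se, q95_individual, q95_mean, shrink)
```

The last value should be `sqrt(100)`, i.e. 10: averaging 100 observations makes the distribution ten times narrower.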

6. Consider a standard uniform density. The mean for this density is .5 and the variance is 1 / 12. You sample 1,000 observations from this distribution and take the sample mean, what value would you expect it to be near?

```
# get the standard deviation of the sample mean (the standard error)
sqrt(1/12/1000)
```

`## [1] 0.009128709`

Thus we expect the sample mean to be very close to the population mean (i.e. 0.5).
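This is easy to check by simulation. The sketch below draws 1,000 standard uniform values with `runif`, takes the mean, and then repeats the experiment many times to see how much the sample mean varies. The seed and the number of repetitions are arbitrary choices of mine.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# One sample mean of 1,000 standard uniform draws
x_bar <- mean(runif(1000))
x_bar

# Repeat the experiment to look at the spread of the sample mean
sim_means <- replicate(5000, mean(runif(1000)))

# Should be close to the theoretical value sqrt(1/12/1000)
sd_sim <- sd(sim_means)
sd_theory <- sqrt(1 / 12 / 1000)
c(sd_sim, sd_theory)
```

The single sample mean should be near 0.5, and the simulated spread should be close to the 0.0091 computed above.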

7. The number of people showing up at a bus stop is assumed to be Poisson with a mean of 5 people per hour. You watch the bus stop for 3 hours. About what’s the probability of viewing 10 or fewer people?

```
# use the ppois function. lambda is the expected count over the whole period: 5 people per hour * 3 hours.
ppois(10, lambda = 5 * 3)
```

`## [1] 0.1184644`
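`ppois` is just the running total of the individual Poisson probabilities, so the same answer can be built up from `dpois`. A small sketch to remind myself of the relationship; the variable names are mine.

```r
rate_per_hour <- 5
hours <- 3

# Expected number of people over the whole observation window
lambda <- rate_per_hour * hours

# P(X <= 10) from the cumulative distribution function
p_cdf <- ppois(10, lambda = lambda)

# Same probability as the sum of P(X = 0), P(X = 1), ..., P(X = 10)
p_sum <- sum(dpois(0:10, lambda = lambda))

c(lambda, p_cdf, p_sum)
```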
