
title: “QMM1002 Case Study 1 [20%]”
author: - Sabith Ali - A00295187
date: ‘Due: June 24, 2024’

INTRODUCTION

As part of my case study for QMM-1001 and QMM-1002, I started gathering data on my daily activities. The collection period ran from 12 January 2024 to 17 June 2024. The aim of this data collection was to examine how I spend my days and whether I am spending them the way I want to. For the analysis, I recorded data on each day for the variables below:

Hours spent in Zoom/Class
Hours spent studying
Sleep hours
Cups of milk
The distance I walked (in meters)
Did I go to college
The color of the T-shirt I wore
My productivity level
Semester (1 or 2)

The objective of this case study is to utilize my collected data and to come up with answers for the following questions:

1. What do I do in a day?

2. Do I spend my days how I expect?

3. Is there any difference in how I spend my time this semester compared to last semester?

My business analytics program is going strong and fast. It's been fun and intense at the same time. The learning has been massive, but it is only possible through hours of effort and dedication. I put a good amount of time into my studies on a daily basis to keep up with the never-ending load of assignments and deadlines. Though earning this quality of education is the main reason I made the leap to Canada all the way from the other side of the world, being an international student far away from home has its own downsides. Apart from course lectures and study, I have to work part-time, cook, and manage all the other aspects of life simultaneously. This requires balance and planning. By understanding how I spend my days, I will be able to determine whether I am spending them the way I expect and make the changes needed to use my days efficiently. Since the quick transition made it harder to achieve that balance in the first semester, I gave myself some time to settle in, with the intention of improving things as the days went by. So I would also like to analyze whether my daily routine has changed for the better this semester compared to the first semester.

To achieve this goal, I will work with three of my quantitative variables: hours spent studying, hours spent sleeping, and the distance I walked. For the analysis, I will compute summary statistics for each variable. Before performing the relevant t-tests for each variable, I will also check whether the conditions required for using the t-distribution are met.

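# Read the personal data set (the path is specific to my machine)
setwd("C:/QMM1002")
personal.data<-read.csv(file="Ali, Sabith Personalized Data.csv", header = TRUE)
# Keep only the three quantitative variables used in this analysis
personal.vars<-subset(personal.data, select=c("Study", "Sleep", "Walking.distance"))
summary(personal.vars)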

##      Study           Sleep       Walking.distance
##  Min.   :0.000   Min.   :4.000   Min.   :   50   
##  1st Qu.:0.000   1st Qu.:7.000   1st Qu.:  500   
##  Median :2.000   Median :7.500   Median : 1700   
##  Mean   :1.741   Mean   :7.424   Mean   : 2220   
##  3rd Qu.:3.000   3rd Qu.:8.000   3rd Qu.: 3500   
##  Max.   :8.000   Max.   :8.500   Max.   :10300
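summary() reports the quartiles and the mean but not the sample size or standard deviation, which the t-based methods below rely on. A minimal sketch, using a hypothetical helper `describe_vars()` that is not part of the original analysis, that adds n, the standard deviation and the standard error of the mean for each column:

```r
# Hypothetical helper: per-column sample size, mean, SD and standard error.
describe_vars <- function(df) {
  stats <- sapply(df, function(v) {
    v <- v[!is.na(v)]                  # drop missing values, if any
    c(n    = length(v),
      mean = mean(v),
      sd   = sd(v),                    # sample SD (n - 1 denominator)
      se   = sd(v) / sqrt(length(v)))  # standard error of the mean
  })
  t(stats)                             # one row per variable
}

# Usage with the data loaded above:
# describe_vars(personal.vars)
```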

Conditions and Assumptions

Independence Assumption: The sampled values must be independent of each other.
Randomization Condition: The data must be randomly selected and representative of the population.
10% Condition: The sample size, n, must be no larger than 10% of the population.
Nearly Normal Condition: The data come from a distribution that is unimodal and symmetric, which can be checked by making a histogram or boxplot.
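The histograms and boxplots used for the Nearly Normal Condition are not reproduced here. As a sketch only, the hypothetical helpers below draw both plots side by side and return a simple moment-based skewness value (near 0 for symmetric data, positive for right skew, negative for left skew) as a numeric companion to the visual check:

```r
# Hypothetical helper: moment-based sample skewness.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))   # SD with n denominator, as in the moment estimator
  mean((x - m)^3) / s^3
}

# Hypothetical helper: histogram + boxplot for one variable, returns skewness invisibly.
check_nearly_normal <- function(x, label) {
  old <- par(mfrow = c(1, 2))  # two panels side by side
  on.exit(par(old))            # restore the previous layout afterwards
  hist(x, main = paste("Histogram of", label), xlab = label)
  boxplot(x, horizontal = TRUE, main = paste("Boxplot of", label), xlab = label)
  invisible(sample_skewness(x))
}

# Usage with the data loaded above:
# check_nearly_normal(personal.vars$Study, "Study hours")
```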

Study Hours

Independence Assumption: A lot of my studying involves working on assignments with deadlines, so if I study more on one day, I tend to study less the next day. The data points could therefore depend on each other. Hence the condition is not met.
Randomization Condition: The data points were collected every day since the beginning of the first semester, so they were not randomly selected and are not a representative sample of the population. Hence the condition is not met.
10% Condition: The data include every day of the collection period, so the entire population is included in the analysis rather than 10% or less of it. Therefore the condition is not satisfied.
Nearly Normal Condition: The histogram is unimodal but severely right skewed and not symmetric. Hence the normality condition is not met.

Sleep Hours

Independence Assumption: Even though I try to get adequate sleep daily, on some days deadlines and work schedules mean I sleep less, and I make it up the next day. Therefore some data points depend on each other. Hence the condition is not met.
Randomization Condition: The data points were collected every day since the beginning of the first semester, so they were not randomly selected and are not a representative sample of the population. Hence the condition is not met.
10% Condition: The data include every day of the collection period, so the entire population is included in the analysis rather than 10% or less of it. Therefore the condition is not satisfied.
Nearly Normal Condition: The histogram is unimodal but left skewed, with a potential outlier. Hence the normality condition is not met.

Walking Distance

Independence Assumption: The distance I walk daily depends on various factors. On some days I go for leisure walks, while on other days I have to walk to meet certain needs. However, the distances I walk on different days do not depend on each other in any way. Hence the independence condition is satisfied.
Randomization Condition: The data points were collected every day since the beginning of the first semester, so they were not randomly selected and are not a representative sample of the population. Hence the condition is not met.
10% Condition: The data include every day of the collection period, so the entire population is included in the analysis rather than 10% or less of it. Therefore the condition is not satisfied.
Nearly Normal Condition: The histogram is unimodal but right skewed and asymmetric, with a visible outlier. Hence the normality condition is not met.

Analysis

Part 1 - Confidence Intervals

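As the first step of my analysis, I will find a 95% confidence interval for the mean of each of my three quantitative variables. Each interval gives a range of plausible values for the true daily average of that variable. Because several of the conditions above are not met, these intervals, and the tests that follow, should be interpreted with caution.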
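Each interval reported by t.test() has the form x̄ ± t* · s/√n, where t* is the critical value from a t-distribution with n − 1 degrees of freedom. A minimal sketch of that formula, using a hypothetical helper `manual_ci()`, which should reproduce the `conf.int` component of t.test():

```r
# Hypothetical helper: one-sample t confidence interval computed by hand.
manual_ci <- function(x, conf.level = 0.95) {
  n <- length(x)
  xbar <- mean(x)
  se <- sd(x) / sqrt(n)                           # standard error of the mean
  t_star <- qt((1 + conf.level) / 2, df = n - 1)  # two-sided critical value
  c(lower = xbar - t_star * se, upper = xbar + t_star * se)
}

# Usage with the data loaded above:
# manual_ci(personal.vars$Study)
```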

Study Hours

(CI.study<-t.test(personal.vars$Study, conf.level = 0.95))

## 
##  One Sample t-test
## 
## data:  personal.vars$Study
## t = 13.565, df = 157, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  1.487080 1.993933
## sample estimates:
## mean of x 
##  1.740506

The t-test shows that I can be 95% confident the true average of my daily study hours lies between 1.5 and 2 hours. That seems fair to me, looking back at my daily study schedule. On most days, with lectures and part-time work, I usually spent 1-2 hours studying. On some days I had to study 3-4 hours or more to keep up with deadlines, but there were also many days when I didn't study at all, including the two-week semester break. I would like to maintain this routine of dedicating at least 1.5 hours to studying daily and keep building on my learning throughout the program.

Sleep Hours

(CI.sleep<-t.test(personal.vars$Sleep, conf.level = 0.95))

## 
##  One Sample t-test
## 
## data:  personal.vars$Sleep
## t = 131.9, df = 157, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  7.312875 7.535226
## sample estimates:
## mean of x 
##  7.424051

The t-test shows that I can be 95% confident the true average of my sleep per day lies between 7.3 and 7.5 hours. For me, adequate sleep is crucial to healthy living. Whatever the situation, I always try to get the amount of sleep my body needs, which I consider to be around 7.5 hours. On some days laziness strikes and I sleep a bit more, 8 or 8.5 hours, giving my body some extra rest. On other days, because of various uncontrollable inconveniences, I could only afford to sleep less than 7 hours. These shorter nights pull the average slightly below 7.5 hours while the longer ones offset them, so the interval stays close to my target. I am quite happy with the result and will continue to make sure I get enough sleep for my needs.

Walking Distance

(CI.walkingdist<-t.test(personal.vars$Walking.distance, conf.level = 0.95))

## 
##  One Sample t-test
## 
## data:  personal.vars$Walking.distance
## t = 13.768, df = 157, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  1901.623 2538.630
## sample estimates:
## mean of x 
##  2220.127

The t-test for my walking distance shows, with 95% confidence, that the true average of my daily walking distance lies between 1901.6 and 2538.6 meters. It is hard for me to decide whether I have achieved my expected target here. I like to take care of my physical health through sports and other activities, and I try to go running or walking on most days. But this was not always possible at the beginning of the semester because of the weather and the intensity of the program. On some days, the only walking I did was to college and back or to go shopping, and on other days I stayed home and barely walked at all. Although on many days I walked over 3 km, and on a few days 5 or 6 km, the average was brought down by the days when I hardly walked. Even though I expect it to be hard to fit physical activity into my daily schedule because of my workload and part-time job, I want to walk at least 3 km daily to keep my physical health intact.

Part 2 - One Sample Hypothesis Tests

Study Hours

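The benchmark of 3.13 hours is the average daily study time of McGill University students, against which I compare my own study hours.

H₀: \(\mu = 3.13\)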

H_A: \(\mu > 3.13\)

t.test(personal.vars$Study, mu=3.13, alternative="greater", conf.level=0.95)

## 
##  One Sample t-test
## 
## data:  personal.vars$Study
## t = -10.83, df = 157, p-value = 1
## alternative hypothesis: true mean is greater than 3.13
## 95 percent confidence interval:
##  1.528211      Inf
## sample estimates:
## mean of x 
##  1.740506

qt(0.05, df=157, lower.tail=FALSE)

## [1] 1.654617

\(p\)-value = 1

\(\alpha\) = 0.05

\(\alpha\) < \(p\)-value

t.stat = -10.83

\(t^{*}\) = 1.654617

t.stat < \(t^{*}\)

Hence, we fail to reject the null hypothesis.

We do not consider the absolute value here because we are only interested in whether the t-statistic exceeds the critical value on the positive side, as per the alternative hypothesis.
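For a right-tailed test, the t-statistic is (x̄ − μ₀)/(s/√n), the critical value is the upper 5% point of the t-distribution, and the p-value is the area to the right of the t-statistic. A minimal sketch with a hypothetical helper `one_sided_t()`, showing that comparing t.stat with \(t^{*}\) and comparing the \(p\)-value with \(\alpha\) lead to the same decision:

```r
# Hypothetical helper: right-tailed one-sample t-test (H_A: mu > mu0) by hand.
one_sided_t <- function(x, mu0, alpha = 0.05) {
  n <- length(x)
  t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))
  t_crit <- qt(alpha, df = n - 1, lower.tail = FALSE)   # positive critical value
  p_val  <- pt(t_stat, df = n - 1, lower.tail = FALSE)  # area to the right of t_stat
  list(t = t_stat, t_crit = t_crit, p = p_val,
       reject = t_stat > t_crit)  # equivalent to p_val < alpha
}

# Usage with the data loaded above:
# one_sided_t(personal.vars$Study, mu0 = 3.13)
```

A strongly negative t-statistic, as in the study-hours test, sits far to the left of the positive critical value, so the right-tailed test cannot reject H₀ no matter how large |t| is.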

According to the hypothesis test, there is no evidence that the average number of hours I study per day is greater than 3.13. In fact, my sample mean of 1.74 hours is well below the McGill University students' average of 3.13 hours, and my confidence interval of roughly 1.5 to 2 hours lies entirely below it. I would like to add more study hours daily to get the best results out of this course in terms of learning, but that is hard to achieve given the current demands of my life.

Sleep Hours

H₀: \(\mu = 7.5\)

H_A: \(\mu \neq 7.5\)

t.test(personal.vars$Sleep, mu=7.5, alternative="two.sided", conf.level=0.95)

## 
##  One Sample t-test
## 
## data:  personal.vars$Sleep
## t = -1.3493, df = 157, p-value = 0.1792
## alternative hypothesis: true mean is not equal to 7.5
## 95 percent confidence interval:
##  7.312875 7.535226
## sample estimates:
## mean of x 
##  7.424051

qt(0.05/2, df=157, lower.tail=TRUE)

## [1] -1.975189

\(p\)-value = 0.1792

\(\alpha\) = 0.05

\(\alpha\) < \(p\)-value

t.stat = -1.3493

\(t^{*}\) = -1.975189

|t.stat| < |\(t^{*}\)|

Hence we fail to reject the null hypothesis.

A two-sided one-sample t-test shows no evidence that my average daily sleep differs from 7.5 hours. Since I had already identified 7.5 hours as my optimal amount of sleep, I wanted to test whether my true mean is consistent with 7.5 hours. It is: 7.5 hours also lies inside the 95% confidence interval of 7.31 to 7.54 hours. I got the result I wanted, and I am pleased with my sleep schedule.
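For a two-sided test, the \(p\)-value doubles the tail area beyond |t|, and rejecting H₀ at \(\alpha\) = 0.05 is equivalent to μ₀ falling outside the 95% confidence interval. A minimal sketch with a hypothetical helper `two_sided_t()` that computes both and reports whether they agree:

```r
# Hypothetical helper: two-sided one-sample t-test and matching confidence interval.
two_sided_t <- function(x, mu0, alpha = 0.05) {
  n <- length(x)
  se <- sd(x) / sqrt(n)
  t_stat <- (mean(x) - mu0) / se
  p_val <- 2 * pt(-abs(t_stat), df = n - 1)      # both tails
  t_star <- qt(1 - alpha / 2, df = n - 1)
  ci <- mean(x) + c(-1, 1) * t_star * se
  list(t = t_stat, p = p_val, ci = ci,
       reject = p_val < alpha,
       mu0_in_ci = mu0 >= ci[1] && mu0 <= ci[2])  # reject exactly when this is FALSE
}

# Usage with the data loaded above:
# two_sided_t(personal.vars$Sleep, mu0 = 7.5)
```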

Walking Distance

H₀: \(\mu = 2000\)

H_A: \(\mu > 2000\)

t.test(personal.vars$Walking.distance, mu=2000, alternative="greater", conf.level=0.95)

## 
##  One Sample t-test
## 
## data:  personal.vars$Walking.distance
## t = 1.3651, df = 157, p-value = 0.08709
## alternative hypothesis: true mean is greater than 2000
## 95 percent confidence interval:
##  1953.316      Inf
## sample estimates:
## mean of x 
##  2220.127

qt(0.05, df=157, lower.tail=FALSE)

## [1] 1.654617

\(p\)-value = 0.08709

\(\alpha\) = 0.05

\(\alpha\) < \(p\)-value

t.stat = 1.3651

\(t^{*}\) = 1.654617

t.stat < \(t^{*}\)

Hence we fail to reject the null hypothesis.

The hypothesis test shows no evidence that my average daily walking distance is greater than 2000 meters. Since my confidence interval for the average walking distance was 1901.6 to 2538.6 meters, I wanted to see whether the true average is over 2 km. Although my sample mean of 2220 meters is above 2000 meters, the difference is not large enough to rule out a true average of 2000 meters at the 5% level (\(p\)-value = 0.087). Either way, the result is well short of my target of 3000 meters, highlighting the need to step up my daily walking.
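The one-sided interval in the output above (1953.316 to Inf) gives the same decision as the \(p\)-value: a right-tailed test rejects H₀ exactly when μ₀ lies below the one-sided lower bound. Since 2000 is above 1953.3, we fail to reject. A minimal sketch with a hypothetical helper `lower_bound_decision()` that checks the two rules against each other:

```r
# Hypothetical helper: compare the lower-bound rule with the p-value rule
# for a right-tailed one-sample t-test.
lower_bound_decision <- function(x, mu0, conf.level = 0.95) {
  res <- t.test(x, mu = mu0, alternative = "greater", conf.level = conf.level)
  lower <- res$conf.int[1]                   # one-sided lower confidence bound
  list(lower = lower, p = res$p.value,
       reject_by_bound = mu0 < lower,
       reject_by_p = res$p.value < 1 - conf.level)
}

# Usage with the data loaded above:
# lower_bound_decision(personal.vars$Walking.distance, mu0 = 2000)
```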

Part 3 - Two-Sample Hypothesis Tests

In the final part of my analysis, I want to compare my study hours during this semester with the last semester. To do this, I will conduct a two-sample hypothesis test to see if there is a significant difference in the study hours between the two semesters.

semester1<-subset(personal.data, Semester == 1)

semester2<-subset(personal.data, Semester == 2)

Independent Groups Assumption: To use two-sample t-test methods, the groups being compared must be independent of each other.

In our study, the data collected in semester 1 do not influence the data collected in semester 2 in any way. Therefore the two groups are independent of each other, and the condition is satisfied.

Random Samples

library(dplyr)  # provides sample_n()
set.seed(1)     # make the random samples reproducible
sem1_sample<-sample_n(semester1, 10)

sem2_sample<-sample_n(semester2, 10)

H₀: \(\mu_{SEM1} - \mu_{SEM2} = 0\)

H_A: \(\mu_{SEM1} - \mu_{SEM2} > 0\)

t.test(sem1_sample$Study, sem2_sample$Study, paired=FALSE, var.equal=TRUE, alternative="greater", conf.level=0.95)

## 
##  Two Sample t-test
## 
## data:  sem1_sample$Study and sem2_sample$Study
## t = -3.1711, df = 18, p-value = 0.9974
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -3.093667       Inf
## sample estimates:
## mean of x mean of y 
##       0.9       2.9

qt(0.05, 18, lower.tail=FALSE)

## [1] 1.734064

\(p\)-value = 0.9974

\(\alpha\) = 0.05

\(\alpha\) < \(p\)-value

t.stat = -3.1711

\(t^{*}\) = 1.734064

t.stat < \(t^{*}\)

Hence, we fail to reject the null hypothesis.

We do not consider the absolute value here because we are only interested in whether the t-statistic exceeds the critical value on the positive side, as per the alternative hypothesis.

As per the results of performing a two-sample pooled t-test, there is no evidence that I spent more time studying per day during semester 1 than semester 2.

To determine whether my average daily study hours were greater in semester 1 than in semester 2, I drew a random sample of 10 days from each semester. I chose a pooled t-test because, with such small samples, assuming equal variances can squeeze more power out of the test. A paired test is not appropriate in this case because the two random samples are not naturally paired observations. The result is no surprise to me and is what I expected; in fact, the sample means (0.9 hours in semester 1 versus 2.9 hours in semester 2) point in the opposite direction. Even though I started the first semester strongly, personal reasons required me to go back home, which drastically decreased my daily study hours. This also caused my grades to drop in the second half of the first semester, as I could not even work on a few assignments. I pledged to turn things around in my second semester and make up for the lost marks. I have been putting more effort into my daily study, and the results reflect it.
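The pooled test combines both samples' variances into one estimate, \(s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\), and uses \(n_1 + n_2 - 2\) degrees of freedom (18 here). A minimal sketch with a hypothetical helper `pooled_t()`, which should match t.test(..., var.equal = TRUE):

```r
# Hypothetical helper: pooled two-sample t-test (H_A: mu1 - mu2 > 0) by hand.
pooled_t <- function(x, y) {
  n1 <- length(x); n2 <- length(y)
  sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)  # pooled variance
  se <- sqrt(sp2 * (1 / n1 + 1 / n2))
  t_stat <- (mean(x) - mean(y)) / se
  df <- n1 + n2 - 2
  list(t = t_stat, df = df,
       p_greater = pt(t_stat, df, lower.tail = FALSE))  # right-tailed p-value
}

# Usage with the samples drawn above:
# pooled_t(sem1_sample$Study, sem2_sample$Study)
```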

First Ten Observations

sem1_10<-head(semester1, 10)

sem2_10<-head(semester2, 10)

H₀: \(\mu_{SEM1} - \mu_{SEM2} = 0\)

H_A: \(\mu_{SEM1} - \mu_{SEM2} > 0\)

t.test(sem1_10$Study, sem2_10$Study, paired=TRUE, var.equal=FALSE, alternative="greater", conf.level=0.95)

## 
##  Paired t-test
## 
## data:  sem1_10$Study and sem2_10$Study
## t = 1.7782, df = 9, p-value = 0.05454
## alternative hypothesis: true mean difference is greater than 0
## 95 percent confidence interval:
##  -0.04011018         Inf
## sample estimates:
## mean difference 
##             1.3

qt(0.05, 9, lower.tail=FALSE)

## [1] 1.833113

\(p\)-value = 0.05454

\(\alpha\) = 0.05

\(\alpha\) < \(p\)-value

t.stat = 1.7782

\(t^{*}\) = 1.833113

t.stat < \(t^{*}\)

Hence we fail to reject the null hypothesis.

A paired two-sample t-test shows no evidence that I spent more time studying per day during semester 1 than during semester 2.

In this case, I took the first 10 observations from each semester to determine whether my daily study hours in semester 1 were greater than in semester 2. The paired test is the right approach here because this setup pairs the first 10 days of each semester: each day in semester 1 corresponds to the day in the same position in semester 2. Even though I studied more on average in the first 10 days of semester 1 than in semester 2 (a mean difference of 1.3 hours), the hypothesis test takes into account both the magnitude of the differences and their variability. With a \(p\)-value of 0.05454, just above \(\alpha\) = 0.05, we fail to reject the null hypothesis.
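A paired t-test is a one-sample t-test on the within-pair differences \(d_i = x_i - y_i\), with \(t = \bar{d} / (s_d/\sqrt{n})\) and \(n - 1\) degrees of freedom (9 here). A minimal sketch with hypothetical helpers `paired_as_one_sample()` and `manual_paired_t()` showing that equivalence:

```r
# Hypothetical helper: run a paired test as a one-sample test on the differences.
paired_as_one_sample <- function(x, y, ...) {
  stopifnot(length(x) == length(y))  # pairs must line up one-to-one
  d <- x - y
  t.test(d, ...)                     # extra arguments (e.g. alternative) passed through
}

# Hypothetical helper: the same t-statistic from the formula.
manual_paired_t <- function(x, y) {
  d <- x - y
  mean(d) / (sd(d) / sqrt(length(d)))
}

# Usage with the first ten observations taken above:
# paired_as_one_sample(sem1_10$Study, sem2_10$Study, alternative = "greater")
```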

CONCLUSION

In summary, my analysis of the three quantitative variables—hours spent studying, hours spent sleeping, and walking distance—revealed the following:

Average Study Hours: With 95% confidence, the true average of my study hours per day lies between 1.5 and 2 hours. This aligns with my daily study schedule, in which I usually spend 1-2 hours studying because of other commitments.

Average Sleep Hours: With 95% confidence, the true average of my sleep per day lies between 7.3 and 7.5 hours. This is consistent with my sleep schedule, in which I aim for around 7.5 hours of sleep daily.

Average Walking Distance: With 95% confidence, the true average of my daily walking distance lies between 1901.6 and 2538.6 meters. This indicates that while I do engage in physical activity, there are days with minimal walking that bring the average down.

The hypothesis tests showed no evidence that I study more than 3.13 hours per day, no significant difference from 7.5 hours of sleep per day, and no evidence that I walk more than 2000 meters per day. Additionally, neither the random samples of 10 days nor the first ten days of each semester gave evidence that I studied more per day in semester 1 than in semester 2.

Overall, this analysis helps me understand my daily routine and make the adjustments needed to improve my productivity and well-being. I have maintained a consistent sleep schedule, and my daily study hours have been up to the mark except for the second half of the first semester, but I must work on improving my physical health and meeting the targets I set for myself.