Module #4 Visualizing and Interpreting Hospital Patient Data

GitHub repository: r-programming-assignments/Assignment_04.R at main · AustinTCurtis/r-programming-assignments

Code from RStudio:
> # Define numeric vectors
> Frequency     <- c(0.6, 0.3, 0.4, 0.4, 0.2, 0.6, 0.3, 0.4, 0.9, 0.2)
> BloodPressure <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)
> 
> # Convert categorical strings to numeric codes
> # First assessment: bad=1, good=0
> FirstAssess   <- c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1)
> 
> # Second assessment: low=0, high=1
> SecondAssess  <- c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1)
> 
> # Final decision: low=0, high=1
> FinalDecision <- c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1)
> 
> # Create data frame
> df_hosp <- data.frame(
+   Frequency, BloodPressure, FirstAssess,
+   SecondAssess, FinalDecision,
+   stringsAsFactors = FALSE
+ )
> 
> # Inspect data
> summary(df_hosp)
   Frequency    BloodPressure     FirstAssess      SecondAssess FinalDecision
 Min.   :0.20   Min.   : 32.00   Min.   :0.0000   Min.   :0.0   Min.   :0.0  
 1st Qu.:0.30   1st Qu.: 63.75   1st Qu.:0.0000   1st Qu.:0.0   1st Qu.:0.0  
 Median :0.40   Median : 95.00   Median :1.0000   Median :1.0   Median :1.0  
 Mean   :0.43   Mean   :102.60   Mean   :0.5556   Mean   :0.6   Mean   :0.6  
 3rd Qu.:0.55   3rd Qu.:128.50   3rd Qu.:1.0000   3rd Qu.:1.0   3rd Qu.:1.0  
 Max.   :0.90   Max.   :205.00   Max.   :1.0000   Max.   :1.0   Max.   :1.0  
                                 NA's   :1                                   
> 
> # Handle missing values (drop rows with NA)
> df_hosp <- na.omit(df_hosp)
> 
> # Check cleaned data
> print(df_hosp)
   Frequency BloodPressure FirstAssess SecondAssess FinalDecision
1        0.6           103           1            0             0
2        0.3            87           1            0             1
3        0.4            32           1            1             0
4        0.4            42           1            1             1
5        0.2            59           0            0             0
6        0.6           109           0            0             1
7        0.3            78           0            1             0
8        0.4           205           0            1             1
10       0.2           176           1            1             1
> 
> # Set up plotting window: 3 rows, 1 column
> par(mfrow = c(3,1))
> 
> # Boxplot 1: First MD Assessment
> boxplot(
+   BloodPressure ~ FirstAssess,
+   data = df_hosp,
+   names = c("Good", "Bad"),
+   ylab = "Blood Pressure",
+   main = "BP by First MD Assessment",
+   col = c("lightblue", "salmon")
+ )
> 
> # Boxplot 2: Second MD Assessment
> boxplot(
+   BloodPressure ~ SecondAssess,
+   data = df_hosp,
+   names = c("Low", "High"),
+   ylab = "Blood Pressure",
+   main = "BP by Second MD Assessment",
+   col = c("lightgreen", "orange")
+ )
> 
> # Boxplot 3: Final Decision
> boxplot(
+   BloodPressure ~ FinalDecision,
+   data = df_hosp,
+   names = c("Low", "High"),
+   ylab = "Blood Pressure",
+   main = "BP by Final Decision",
+   col = c("lightgray", "lightpink")
+ )
> 




> # Reset plotting window to default
> par(mfrow = c(1,1))
> 
> # Histogram of Visit Frequency
> hist(
+   df_hosp$Frequency,
+   breaks = seq(0, 1, by = 0.1),   # bins from 0.0 to 1.0 in steps of 0.1
+   col = "skyblue",                # fill color
+   border = "white",               # clean bin edges
+   xlab = "Visit Frequency",
+   ylab = "Count",
+   main = "Histogram of Visit Frequency"
+ )
> 



> # Histogram of Blood Pressure
> hist(
+   df_hosp$BloodPressure,
+   breaks = 8,                     # suggest about 8 bins; R treats a single number as a hint and picks "pretty" breakpoints
+   col = "lightgreen",
+   border = "white",
+   xlab = "Blood Pressure",
+   ylab = "Count",
+   main = "Histogram of Blood Pressure"
+ )

Interpretation and Discussion

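Looking at the boxplots, blood pressure lines up with the final decision more clearly than with either individual assessment. For the first assessment, patients marked as “bad” did not have higher blood pressure than those marked as “good”: the median was 87 for “bad” and 93.5 for “good.” The second assessment is also mixed. The “high” group has the widest spread (32 to 205) but a lower median than the “low” group (78 versus 95). The final decision is where the pattern shows up: patients put in the “high” group had a median blood pressure of 109, compared with 68.5 for the “low” group. With only nine patients left after cleaning, even that difference is a hint rather than a finding.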
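With this few rows, it helps to back up the visual read of the boxplots with the group medians themselves. The sketch below rebuilds the cleaned data from the vectors in the post and computes the median blood pressure for each level of the three rating columns. The names df_check, group_cols, and bp_medians are made up for this illustration and are not part of the assignment code.

```r
# Rebuild the data from the post and drop the row with the missing first assessment
BloodPressure <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)
FirstAssess   <- c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1)
SecondAssess  <- c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1)
FinalDecision <- c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1)
df_check <- na.omit(data.frame(BloodPressure, FirstAssess, SecondAssess, FinalDecision))

# Median blood pressure within each group, one rating column at a time
group_cols <- c("FirstAssess", "SecondAssess", "FinalDecision")
bp_medians <- lapply(group_cols, function(col) {
  tapply(df_check$BloodPressure, df_check[[col]], median)
})
names(bp_medians) <- group_cols

# Each element is a small named vector: "0" and "1" are the group codes
print(bp_medians)
```

Medians are used here because they match the center line that boxplot() draws, so the numbers can be compared directly with the plots.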

The histograms also tell an interesting story. Most patients had visit frequencies in the middle range (0.3–0.6), with two at the low end (0.2). The single high value (0.9) belonged to the row that na.omit() dropped, so it does not appear in the plot. Blood pressure, on the other hand, was much more spread out. There were some really low values in the 30s and 40s and one very high value over 200 (205), which is the only point a standard 1.5 × IQR rule would flag as an outlier. Those extreme values would need a closer look in a real dataset.
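Whether a point counts as an outlier depends on the rule you pick. As a rough check, the sketch below applies the same 1.5 × IQR convention that boxplot() uses by default, via boxplot.stats() from base R. The vector holds the nine blood pressure values that remain after na.omit(); the name bp_stats is just an illustrative choice.

```r
# The nine blood pressure values left after dropping the row with the NA
BloodPressure <- c(103, 87, 32, 42, 59, 109, 78, 205, 176)

# boxplot.stats() returns the five-number summary and any points beyond
# 1.5 * IQR from the hinges (coef = 1.5 is the default)
bp_stats <- boxplot.stats(BloodPressure, coef = 1.5)

bp_stats$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bp_stats$out    # values flagged as outliers under this rule
```

This looks at all patients together, which matches the histogram. The grouped boxplots apply the rule within each group, so they can flag different points.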

One limitation here is that this dataset is small and made up, so we can’t make real clinical claims. Still, it was good practice to see how missing values affect the analysis. When I used na.omit(), it dropped one row (row 9, where FirstAssess was NA), which kept the data clean but cut the dataset from ten patients to nine. In real life, I’d probably want to try imputing values instead of just removing them. Overall, this was a good reminder of how much the data prep step matters before you even get into the analysis.
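For reference, here is a minimal sketch of what imputing could look like instead of dropping the row. It assumes the simplest option for a 0/1 code: fill the missing FirstAssess with the most common observed value (the mode). That choice, and the names df_raw, df_imputed, and mode_value, are assumptions for illustration and not something the assignment prescribes. With only ten rows, any imputation method is shaky.

```r
# Original ten rows from the post, including the NA in FirstAssess
Frequency     <- c(0.6, 0.3, 0.4, 0.4, 0.2, 0.6, 0.3, 0.4, 0.9, 0.2)
BloodPressure <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)
FirstAssess   <- c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1)
SecondAssess  <- c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1)
FinalDecision <- c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1)
df_raw <- data.frame(Frequency, BloodPressure, FirstAssess, SecondAssess, FinalDecision)

# Mode imputation (an assumption): table() ignores NA by default,
# and which.max() picks the most frequent code
mode_value <- as.numeric(names(which.max(table(df_raw$FirstAssess))))

# Fill only the missing entries, leaving observed values untouched
df_imputed <- df_raw
df_imputed$FirstAssess[is.na(df_imputed$FirstAssess)] <- mode_value

nrow(df_imputed)    # all ten rows are kept
anyNA(df_imputed)   # no missing values remain
```

The trade-off is that the imputed row now carries a guessed rating, so any plot grouped by FirstAssess mixes one invented label in with nine real ones.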

