Module #4: Visualizing and Interpreting Hospital Patient Data
> # Define numeric vectors
> Frequency <- c(0.6, 0.3, 0.4, 0.4, 0.2, 0.6, 0.3, 0.4, 0.9, 0.2)
> BloodPressure <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)
>
> # Categorical assessments, entered directly as numeric codes
> # First assessment: bad=1, good=0
> FirstAssess <- c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1)
>
> # Second assessment: low=0, high=1
> SecondAssess <- c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1)
>
> # Final decision: low=0, high=1
> FinalDecision <- c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1)
>
> # Create data frame
> df_hosp <- data.frame(
+ Frequency, BloodPressure, FirstAssess,
+ SecondAssess, FinalDecision,
+ stringsAsFactors = FALSE
+ )
>
> # Inspect data
> summary(df_hosp)
   Frequency    BloodPressure     FirstAssess      SecondAssess FinalDecision
 Min.   :0.20   Min.   : 32.00   Min.   :0.0000   Min.   :0.0   Min.   :0.0
 1st Qu.:0.30   1st Qu.: 63.75   1st Qu.:0.0000   1st Qu.:0.0   1st Qu.:0.0
 Median :0.40   Median : 95.00   Median :1.0000   Median :1.0   Median :1.0
 Mean   :0.43   Mean   :102.60   Mean   :0.5556   Mean   :0.6   Mean   :0.6
 3rd Qu.:0.55   3rd Qu.:128.50   3rd Qu.:1.0000   3rd Qu.:1.0   3rd Qu.:1.0
 Max.   :0.90   Max.   :205.00   Max.   :1.0000   Max.   :1.0   Max.   :1.0
                                 NA's   :1
>
> # Handle missing values (drop rows with NA)
> df_hosp <- na.omit(df_hosp)
>
> # Check cleaned data
> print(df_hosp)
   Frequency BloodPressure FirstAssess SecondAssess FinalDecision
1        0.6           103           1            0             0
2        0.3            87           1            0             1
3        0.4            32           1            1             0
4        0.4            42           1            1             1
5        0.2            59           0            0             0
6        0.6           109           0            0             1
7        0.3            78           0            1             0
8        0.4           205           0            1             1
10       0.2           176           1            1             1
>
> # Set up plotting window: 3 rows, 1 column
> par(mfrow = c(3,1))
>
> # Boxplot 1: First MD Assessment
> boxplot(
+ BloodPressure ~ FirstAssess,
+ data = df_hosp,
+ names = c("Good", "Bad"),
+ ylab = "Blood Pressure",
+ main = "BP by First MD Assessment",
+ col = c("lightblue", "salmon")
+ )
>
> # Boxplot 2: Second MD Assessment
> boxplot(
+ BloodPressure ~ SecondAssess,
+ data = df_hosp,
+ names = c("Low", "High"),
+ ylab = "Blood Pressure",
+ main = "BP by Second MD Assessment",
+ col = c("lightgreen", "orange")
+ )
>
> # Boxplot 3: Final Decision
> boxplot(
+ BloodPressure ~ FinalDecision,
+ data = df_hosp,
+ names = c("Low", "High"),
+ ylab = "Blood Pressure",
+ main = "BP by Final Decision",
+ col = c("lightgray", "lightpink")
+ )
>
> # Reset plotting window to default
> par(mfrow = c(1,1))
>
> # Histogram of Visit Frequency
> hist(
+ df_hosp$Frequency,
+ breaks = seq(0, 1, by = 0.1), # bins from 0.0 to 1.0 in steps of 0.1
+ col = "skyblue", # fill color
+ border = "white", # clean bin edges
+ xlab = "Visit Frequency",
+ ylab = "Count",
+ main = "Histogram of Visit Frequency"
+ )
>
> # Histogram of Blood Pressure
> hist(
+ df_hosp$BloodPressure,
+ breaks = 8, # a single number is only a suggestion; R picks "pretty" breakpoints, so the bin count may differ
+ col = "lightgreen",
+ border = "white",
+ xlab = "Blood Pressure",
+ ylab = "Count",
+ main = "Histogram of Blood Pressure"
+ )
Interpretation and Discussion
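Looking at the boxplots, the link between blood pressure and the doctors’ assessments is less clear-cut than I expected. For the first assessment, patients marked as “bad” actually had a slightly lower median blood pressure (87) than those marked as “good” (93.5), so higher BP did not go with a worse rating there. The second assessment is mixed: the “high” group’s median (78) is below the “low” group’s (95), but its mean is higher (106.6 vs. 89.5) because it contains the two largest readings, 176 and 205. The final decision shows the clearest pattern, with a median of 109 in the “high” group against 68.5 in the “low” group. With only four or five patients per group, a single reading can move a box a lot, so these differences are suggestive at best.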
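To check the boxplot reading with numbers instead of by eye, here’s a small sketch that rebuilds the cleaned data (same values as the printed data frame above, keeping only the columns needed here) and uses base R’s `tapply()` to get the group size, median, and mean blood pressure for each level of each assessment. The helper `group_stats` is a name I made up for this example; it is not part of any package.

```r
# Cleaned data: row 9 already removed because its FirstAssess was NA
df_hosp <- data.frame(
  BloodPressure = c(103, 87, 32, 42, 59, 109, 78, 205, 176),
  FirstAssess   = c(1, 1, 1, 1, 0, 0, 0, 0, 1),   # bad = 1, good = 0
  SecondAssess  = c(0, 0, 1, 1, 0, 0, 1, 1, 1),   # low = 0, high = 1
  FinalDecision = c(0, 1, 0, 1, 0, 1, 0, 1, 1)    # low = 0, high = 1
)

# Hypothetical helper: n, median, and mean of BP for each level of one grouping column
group_stats <- function(data, group_col) {
  bp <- data$BloodPressure
  g  <- data[[group_col]]
  data.frame(
    group  = sort(unique(g)),                        # tapply orders groups the same way
    n      = as.vector(tapply(bp, g, length)),
    median = as.vector(tapply(bp, g, median)),
    mean   = as.vector(tapply(bp, g, mean))
  )
}

# Print the summary for all three assessments
for (col in c("FirstAssess", "SecondAssess", "FinalDecision")) {
  cat("\n", col, "\n")
  print(group_stats(df_hosp, col))
}
```

Printing the medians next to the means is what shows the second assessment’s split: the two groups disagree depending on which one you look at.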
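The histograms also tell an interesting story. Seven of the nine remaining patients had visit frequencies between 0.3 and 0.6, two were at 0.2, and the only high value (0.9) belonged to the row that was dropped for missing data. Blood pressure, on the other hand, was much more spread out, ranging from 32 to 205. The low readings in the 30s and 40s and the single reading over 200 stand out. If you apply the usual 1.5 × IQR rule to all nine readings (quartiles 59 and 109, so the upper fence is 109 + 1.5 × 50 = 184), only 205 is formally flagged as an outlier, but values of 32 and 42 are also suspicious for blood pressure. Those points would need a closer look in a real dataset.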
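The outlier check above can be reproduced in a few lines. This is a minimal sketch, assuming the nine blood pressure readings left after `na.omit()`: it computes the 1.5 × IQR fences by hand with `quantile()`, then compares the result with base R’s `boxplot.stats()`, which is what `boxplot()` uses to decide which points to draw beyond the whiskers. (`boxplot.stats()` uses Tukey’s hinges from `fivenum()`, which can differ slightly from `quantile()` in general; for these nine values they agree.)

```r
# Blood pressure readings left after na.omit() (row 9 removed)
bp <- c(103, 87, 32, 42, 59, 109, 78, 205, 176)

# Fences at 1.5 * IQR beyond the first and third quartiles
q     <- quantile(bp, c(0.25, 0.75))   # 59 and 109 for these nine values
iqr   <- unname(q[2] - q[1])
lower <- unname(q[1] - 1.5 * iqr)
upper <- unname(q[2] + 1.5 * iqr)

flagged <- bp[bp < lower | bp > upper]
cat("Fences:", lower, "to", upper, "\n")
cat("Flagged as outliers:", flagged, "\n")

# Same rule as used by boxplot(): points returned here are drawn past the whiskers
print(boxplot.stats(bp)$out)
```

The lower fence comes out negative, which is why the readings of 32 and 42 are not flagged even though they look odd; a statistical rule only catches what is unusual relative to the rest of the sample, not what is clinically implausible.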
One limitation here is that this dataset is small and made up, so we can’t make real clinical claims. Still, it was good practice to see how missing values affect the analysis. When I used na.omit(), it dropped one row (patient 9, whose first assessment was missing), which kept the data clean but shrank the dataset from 10 patients to 9 and also removed the highest visit frequency (0.9) and a blood pressure reading of 135. In real life, I’d probably want to try imputing values instead of just removing them. Overall, this was a good reminder of how much the data prep step matters before you even get into the analysis.
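If I did try imputation instead of `na.omit()`, the simplest version for a 0/1 column would be to fill the gap with the most common value (mode imputation) and keep a flag recording which rows were filled in. This is only a minimal sketch, not a recommendation: `impute_mode` and the `FirstAssess_imputed` column are names I’m making up here, only two of the original columns are included, and real clinical data would call for something more careful, such as multiple imputation.

```r
# Original vectors, before any rows are dropped
FirstAssess   <- c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1)
BloodPressure <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)

# Hypothetical helper: replace NAs with the most frequent non-missing value
impute_mode <- function(x) {
  counts <- table(x)                                   # table() leaves out NA by default
  mode_value <- as.numeric(names(counts)[which.max(counts)])
  x[is.na(x)] <- mode_value
  x
}

df_imp <- data.frame(
  BloodPressure,
  FirstAssess         = impute_mode(FirstAssess),
  FirstAssess_imputed = is.na(FirstAssess)             # TRUE where a value was filled in
)

nrow(df_imp)                          # all 10 patients are kept
print(df_imp[df_imp$FirstAssess_imputed, ])
```

Keeping the flag column means the filled-in row can still be excluded or checked later, which is the main thing you lose when a row is silently dropped.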


