Module 11: Debugging and Defensive Programming in R
GitHub Repository: AustinTCurtis/r-programming-assignments
We’re working with a function designed to flag rows in a numeric matrix that are outliers in every column according to the Tukey rule. However, the original code contains a deliberate bug, specifically in the line that uses the && operator.
Original (Buggy) Code:
# Helper function for Tukey's rule
tukey.outlier <- function(v) {
  qs <- stats::quantile(v, c(0.25, 0.75), na.rm = TRUE)
  iqr <- diff(qs)
  (v < qs[1] - 1.5 * iqr) | (v > qs[2] + 1.5 * iqr)
}
# Original (buggy) function
tukey_multiple <- function(x) {
  outliers <- array(TRUE, dim = dim(x))
  for (j in 1:ncol(x)) {
    outliers[, j] <- outliers[, j] && tukey.outlier(x[, j]) # <-- BUG: '&&'
  }
  outlier.vec <- vector("logical", length = nrow(x))
  for (i in 1:nrow(x)) {
    outlier.vec[i] <- all(outliers[i, ])
  }
  return(outlier.vec)
}
Reproducing the Error:
set.seed(123)
test_mat <- matrix(rnorm(50), nrow = 10)
tukey_multiple(test_mat)
Error in outliers[, j] && tukey.outlier(x[, j]) :
  'length = 10' in coercion to 'logical(1)'
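Diagnosing the Bug:
- The error occurs because && is a scalar operator: it expects a single TRUE/FALSE on each side. Since R 4.3.0, passing it a vector of length greater than one is an error; earlier versions used only the first element of each vector and returned a single TRUE/FALSE.
- Here both outliers[, j] and tukey.outlier(x[, j]) are logical vectors of length 10, so the && expression itself fails before any assignment takes place.
- Even in an R version where it didn't throw an error, the logic would still be incorrect: the single result would be recycled down the whole column, so each column would collapse into one Boolean value.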
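To make the difference concrete, here is a minimal sketch comparing the two operators on small hand-made vectors. The names a, b, elementwise, and scalar_attempt are illustrative and not part of the assignment code. The tryCatch wrapper is there because what && does with longer vectors depends on the R version.

```r
# Two illustrative logical vectors (not taken from the assignment data)
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

# '&' compares element by element and returns a vector of the same length
elementwise <- a & b
print(elementwise)  # TRUE FALSE FALSE

# '&&' expects length-one operands. In R >= 4.3.0 this call is an error,
# so it is wrapped in tryCatch to capture the message instead of stopping.
scalar_attempt <- tryCatch(
  a && b,
  error = function(e) conditionMessage(e)
)
print(scalar_attempt)
```

Because the outlier flags are needed row by row, the element-wise & is the operator that matches the intent of the loop.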
Corrected Function:
corrected_tukey <- function(x) {
  # Defensive checks
  if (!is.matrix(x)) stop("`x` must be a matrix.")
  if (!is.numeric(x)) stop("`x` must be a numeric matrix (no character/factor columns).")
  if (ncol(x) == 0L || nrow(x) == 0L) stop("`x` must have at least 1 row and 1 column.")
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    outliers[, j] <- outliers[, j] & tukey.outlier(x[, j]) # <-- FIX: '&'
  }
  outlier.vec <- logical(nrow(x))
  for (i in seq_len(nrow(x))) {
    outlier.vec[i] <- all(outliers[i, ])
  }
  outlier.vec
}
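The defensive checks can also be exercised on their own. The sketch below copies the three checks into a hypothetical helper, check_matrix_input, and wraps it in a second hypothetical helper, try_input, that returns the error message instead of stopping. Neither helper is part of the original assignment; they only illustrate which message each kind of bad input would produce.

```r
# Hypothetical helper holding the same three checks as corrected_tukey
check_matrix_input <- function(x) {
  if (!is.matrix(x)) stop("`x` must be a matrix.")
  if (!is.numeric(x)) stop("`x` must be a numeric matrix (no character/factor columns).")
  if (ncol(x) == 0L || nrow(x) == 0L) stop("`x` must have at least 1 row and 1 column.")
  invisible(TRUE)
}

# Hypothetical wrapper: return the error message, or "ok" if all checks pass
try_input <- function(x) {
  tryCatch({
    check_matrix_input(x)
    "ok"
  }, error = function(e) conditionMessage(e))
}

msg_df    <- try_input(data.frame(a = 1:3))                     # data frame, not a matrix
msg_chr   <- try_input(matrix(letters[1:4], nrow = 2))          # character matrix
msg_empty <- try_input(matrix(numeric(0), nrow = 0, ncol = 2))  # zero rows
msg_ok    <- try_input(matrix(1:4, nrow = 2))                   # valid input

print(c(msg_df, msg_chr, msg_empty, msg_ok))
```

Failing early with a specific message is the point of the checks: a data frame or character matrix is rejected before quantile() has a chance to produce a less readable error.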
Validating the Fix:
set.seed(123)
test_mat <- matrix(rnorm(50), nrow = 10)
corrected_tukey(test_mat)
Output:
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
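An all-FALSE result shows that the function runs, but not that it can return TRUE. As a further check, the sketch below builds a small hand-made matrix whose last row is extreme in all three columns. The matrix planted and the compact function flag_all_columns are illustrative assumptions, not part of the assignment; the function restates the corrected logic with apply() so that the block runs on its own.

```r
# Tukey helper, restated so this block is self-contained
tukey.outlier <- function(v) {
  qs <- stats::quantile(v, c(0.25, 0.75), na.rm = TRUE)
  iqr <- diff(qs)
  (v < qs[1] - 1.5 * iqr) | (v > qs[2] + 1.5 * iqr)
}

# Illustrative compact variant: flag each column, then require every column
flag_all_columns <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))
  # matrix() keeps the result two-dimensional even when x has a single row
  flags <- matrix(apply(x, 2, tukey.outlier), nrow = nrow(x))
  apply(flags, 1, all)
}

# Rows 1-9 are evenly spaced; row 10 is far outside the fences in every column
planted <- cbind(c(1:9, 100), c(11:19, 500), c(21:29, -300))
result <- flag_all_columns(planted)
print(result)  # only the 10th element is TRUE
```

In the third column the planted value lies below the lower fence rather than above the upper one, so the check covers both sides of the rule.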