Module 8: Input/Output, String Manipulation, and the plyr Package

GitHub Repository: AustinTCurtis/r-programming-assignments

In this post, I’ll walk through each line of code I used for Assignment 8, explaining what every step does and why it’s important. The goal of this assignment was to import a dataset, analyze grades by gender, filter specific names, and export the results into different file formats using RStudio.

Step 1: Importing the Dataset

student6 <- read.csv(file.choose(), header = TRUE, stringsAsFactors = FALSE)

This line opens an interactive file-chooser window so I can select my dataset (a CSV file).

header = TRUE tells R that the first row contains the column names (Name, Age, Sex, Grade).

stringsAsFactors = FALSE ensures that text columns (like Name or Sex) stay as character strings rather than being automatically converted to factors.
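
For a script that has to run without a dialog, a path can be passed in place of file.choose(). The sketch below is self-contained: it writes a small stand-in CSV (only the six rows that head(student6) prints at the end of this post) to a temporary file, so the file location is an assumption and not the assignment's real data file.

```r
# Stand-in data: only the six rows that head(student6) prints in this post.
csv_lines <- c(
  "Name,Age,Sex,Grade",
  "Raul,25,Male,80",
  "Booker,18,Male,83",
  "Lauri,21,Female,90",
  "Leonie,21,Female,91",
  "Sherlyn,22,Female,85",
  "Mikaela,20,Female,69"
)

# Hypothetical location: a temporary file instead of the real assignment CSV.
csv_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_path)

# Same arguments as in Step 1, but with an explicit path instead of file.choose().
demo <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

# Text columns stay character; whole-number columns are read as integer.
sapply(demo, class)
```

Since R 4.0.0, stringsAsFactors = FALSE is already the default for read.csv(), so spelling it out mainly protects the script on older installations.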

Step 2: Checking the Data

head(student6)

str(student6)

head(student6) displays the first few rows so I can preview the dataset.

str(student6) shows each column’s data type and gives an overview of how many observations and variables are in the data frame.

This step is all about confirming the data imported correctly before doing any analysis.

Step 3: Loading the plyr Package

library(plyr)

The plyr package implements the split-apply-combine pattern: it splits data into groups, applies a function to each group, and combines the results into a single object.

Step 4: Computing the Mean Grade by Sex

gender_mean <- ddply(
  student6,
  "Sex",
  summarise,
  GradeAverage = mean(Grade, na.rm = TRUE)
)

This block calculates the average grade for each gender:

"Sex" tells R to group data by the Sex column.

summarise collapses each group into a single row that holds the summary columns defined after it.

mean(Grade, na.rm = TRUE) computes the average of the Grade column while ignoring missing values (na.rm = TRUE).

The result is a new data frame named gender_mean with two columns: Sex and GradeAverage.
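
As a sanity check on the grouped mean, the same split-apply-combine step can be sketched in base R with aggregate(). This is an alternative for comparison, not the code used in the assignment, and the data frame below holds only the six rows shown by head(student6), so its averages differ from the full 20-row result.

```r
# Stand-in data: the six rows printed by head(student6), not the full dataset.
demo <- data.frame(
  Name  = c("Raul", "Booker", "Lauri", "Leonie", "Sherlyn", "Mikaela"),
  Sex   = c("Male", "Male", "Female", "Female", "Female", "Female"),
  Grade = c(80L, 83L, 90L, 91L, 85L, 69L),
  stringsAsFactors = FALSE
)

# Group Grade by Sex and take the mean of each group, dropping NAs.
demo_mean <- aggregate(Grade ~ Sex, data = demo, FUN = mean, na.rm = TRUE)

# Rename the result column to match the ddply() output.
names(demo_mean)[names(demo_mean) == "Grade"] <- "GradeAverage"

demo_mean
# Female: (90 + 91 + 85 + 69) / 4 = 83.75
# Male:   (80 + 83) / 2           = 81.50
```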

Step 5: Writing the Summary to a Text File

write.table(
  gender_mean,
  file = "gender_mean.txt",
  sep = "\t",
  row.names = FALSE
)

This exports the grouped averages to a tab-delimited text file called gender_mean.txt.

sep = "\t" means values are separated by tabs.

row.names = FALSE prevents R from writing extra row numbers.
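
One way to confirm what landed in the file is to read it back. The sketch below writes a small summary table to a temporary file (a hypothetical path, so nothing in the working directory is touched) and reloads it with read.delim(), which expects tab-separated input with a header row. The two values are the averages printed in the output at the end of this post.

```r
# Summary table with the values printed for gender_mean in this post.
demo_mean <- data.frame(
  Sex          = c("Female", "Male"),
  GradeAverage = c(86.9375, 80.25),
  stringsAsFactors = FALSE
)

# Hypothetical output location: a temporary file rather than gender_mean.txt.
txt_path <- tempfile(fileext = ".txt")
write.table(demo_mean, file = txt_path, sep = "\t", row.names = FALSE)

# Raw lines: a header line, then one tab-separated row per group.
readLines(txt_path)

# read.delim() defaults to sep = "\t" and header = TRUE.
round_trip <- read.delim(txt_path, stringsAsFactors = FALSE)
all.equal(round_trip, demo_mean)
```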

Step 6: Confirming the File Was Created

file.exists("gender_mean.txt")

This simple check returns TRUE if the file exists in my current working directory. It’s a quick way to confirm the file was created, though it says nothing about the file’s contents.

Step 7: Filtering Names Containing “i” or “I”

i_students <- subset(
  student6,
  grepl("i", Name, ignore.case = TRUE)
)

Here, I create a new subset of the data that only includes students whose names contain the letter “i” (in either uppercase or lowercase).

grepl() searches for pattern matches in text.

ignore.case = TRUE makes the search case-insensitive.
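
To see what ignore.case changes, here is a minimal sketch on a plain character vector. The first six names come from head(student6); "Ian" is a hypothetical name added only to show the uppercase case.

```r
# First six names from head(student6), plus one hypothetical name ("Ian")
# added to show the uppercase case.
demo_names <- c("Raul", "Booker", "Lauri", "Leonie", "Sherlyn", "Mikaela", "Ian")

# Case-insensitive match, as in Step 7.
has_i <- grepl("i", demo_names, ignore.case = TRUE)
demo_names[has_i]
# "Lauri" "Leonie" "Mikaela" "Ian"

# A case-sensitive match misses the capital "I" in "Ian".
demo_names[grepl("i", demo_names)]
# "Lauri" "Leonie" "Mikaela"

# An explicit character class is an equivalent way to cover both cases.
identical(has_i, grepl("[iI]", demo_names))
```

grepl() returns one TRUE or FALSE per element, which is why it can be used directly as the row condition inside subset().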

Step 8: Viewing the Filtered Results

head(i_students)

This displays the first six rows of my filtered dataset as a spot check that the names it contains include “i”.

Step 9: Exporting Only the Filtered Names

write.csv(
  i_students$Name,
  file = "i_students.csv",
  row.names = FALSE,
  quote = FALSE
)

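This command exports just the names of the filtered students into a CSV file. Because i_students$Name is a plain character vector rather than a data frame, the file’s single column is headed x, not Name.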

quote = FALSE writes the names without quotation marks.

row.names = FALSE again omits row numbers.
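
The header that write.csv() produces depends on whether it receives a vector or a data frame. The sketch below compares the two on a small stand-in for i_students, writing to temporary files (hypothetical paths). The second call, with single-bracket indexing, is one possible way to keep a Name header; it is not what the assignment code does.

```r
# Stand-in for the filtered data: three of the rows shown in head(i_students).
demo_i <- data.frame(
  Name  = c("Lauri", "Leonie", "Mikaela"),
  Grade = c(90L, 91L, 69L),
  stringsAsFactors = FALSE
)

# Hypothetical output locations: temporary files, not the assignment's CSV.
vec_path <- tempfile(fileext = ".csv")
df_path  <- tempfile(fileext = ".csv")

# A bare vector is wrapped in a data frame whose only column is named "x".
write.csv(demo_i$Name, file = vec_path, row.names = FALSE, quote = FALSE)
readLines(vec_path)
# "x" "Lauri" "Leonie" "Mikaela"

# Single-bracket indexing keeps a one-column data frame, so the header is "Name".
write.csv(demo_i["Name"], file = df_path, row.names = FALSE, quote = FALSE)
readLines(df_path)
# "Name" "Lauri" "Leonie" "Mikaela"
```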

Step 10: Exporting the Full Filtered Dataset

write.csv(
  i_students,
  file = "i_students_full.csv",
  row.names = FALSE
)

This exports the entire subset (all columns for students with “i” in their name) to i_students_full.csv. This is useful if I want to retain age, sex, and grade information for the filtered group.

Step 11: Verifying All Output Files Exist

file.exists("gender_mean.txt")
file.exists("i_students.csv")
file.exists("i_students_full.csv")

Each line checks whether its respective file is present in the working directory. If all return TRUE, every export succeeded.
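
Because file.exists() accepts a vector of paths, the three checks can also be folded into one call. The sketch below is only an illustration: it uses a temporary folder (a hypothetical location) and deliberately creates just two of the three files so that the missing-file branch has something to report.

```r
# Hypothetical setup: create two of the three expected files in a temporary folder.
out_dir  <- tempfile("outputs_")
dir.create(out_dir)
expected <- file.path(
  out_dir,
  c("gender_mean.txt", "i_students.csv", "i_students_full.csv")
)
file.create(expected[1:2])

# file.exists() is vectorised: one TRUE/FALSE per path.
found <- file.exists(expected)
setNames(found, basename(expected))

# Report anything missing instead of scanning the results by eye.
missing_files <- basename(expected[!found])
if (length(missing_files) > 0) {
  message("Missing: ", paste(missing_files, collapse = ", "))
}
```

In the real working directory, wrapping the call in all() gives a single TRUE or FALSE for the whole set of exports.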

In conclusion, I practiced importing, inspecting, grouping, filtering, and exporting data in R. I used read.csv() to load a comma-delimited dataset, plyr::ddply() to compute average grades by gender, subset() and grepl() for conditional filtering, and write.table() and write.csv() to save the results.

FULL OUTPUT FROM R:

> #Import DataSet
> student6 <- read.csv(file.choose(), header = TRUE, stringsAsFactors = FALSE)
> 
> #Check Data
> head(student6)
     Name Age    Sex Grade
1    Raul  25   Male    80
2  Booker  18   Male    83
3   Lauri  21 Female    90
4  Leonie  21 Female    91
5 Sherlyn  22 Female    85
6 Mikaela  20 Female    69
> str(student6)
'data.frame':	20 obs. of  4 variables:
 $ Name : chr  "Raul" "Booker" "Lauri" "Leonie" ...
 $ Age  : int  25 18 21 21 22 20 23 24 21 23 ...
 $ Sex  : chr  "Male" "Male" "Female" "Female" ...
 $ Grade: int  80 83 90 91 85 69 91 97 78 81 ...
> 
> #Run "plyr"
> library(plyr)
> 
> #Compute mean Grade by Sex
> gender_mean <- ddply(
+   student6,
+   "Sex",
+   summarise,
+   GradeAverage = mean(Grade, na.rm = TRUE)
+ )
> 
> gender_mean
     Sex GradeAverage
1 Female      86.9375
2   Male      80.2500
> 
> #Write means to a text file
> write.table(
+   gender_mean,
+   file = "gender_mean.txt",
+   sep = "\t",
+   row.names = FALSE
+ )
> 
> #Check File
> file.exists("gender_mean.txt")
[1] TRUE
> 
> #Filter names containing “i” or “I”
> i_students <- subset(
+   student6,
+   grepl("i", Name, ignore.case = TRUE)
+ )
> 
> head(i_students)
       Name Age    Sex Grade
3     Lauri  21 Female    90
4    Leonie  21 Female    91
6   Mikaela  20 Female    69
8      Aiko  24 Female    97
9  Tiffaney  21 Female    78
10   Corina  23 Female    81
> 
> #Export filtered names only
> write.csv(
+   i_students$Name,
+   file = "i_students.csv",
+   row.names = FALSE,
+   quote = FALSE
+ )
> 
> #Export full filtered DataSet
> write.csv(
+   i_students,
+   file = "i_students_full.csv",
+   row.names = FALSE
+ )
> 
> #Verify all outputs exist in directory
> file.exists("gender_mean.txt")
[1] TRUE
> file.exists("i_students.csv")
[1] TRUE
> file.exists("i_students_full.csv")
[1] TRUE


R Programming Journal – Austin Curtis
