Module 8: Input/Output, String Manipulation, and the plyr Package
GitHub Repository: AustinTCurtis/r-programming-assignments
In this post, I’ll walk through each line of code I used for Assignment 8, explaining what every step does and why it matters. The goal of this assignment was to import a dataset, compute the average grade by gender, filter students by name, and export the results to different file formats using RStudio.
Step 1: Importing the Dataset
student6 <- read.csv(file.choose(), header = TRUE, stringsAsFactors = FALSE)
This line opens an interactive file-chooser window so I can select my dataset (a CSV file).
header = TRUE tells R that the first row contains the column names (Name, Age, Sex, Grade).
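stringsAsFactors = FALSE ensures that text columns (like Name or Sex) stay as character strings rather than being automatically converted to factors. Since R 4.0.0 this is already the default, but stating it explicitly keeps the script behaving the same way on older versions.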
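For a reproducible run without the file-chooser dialog, the same call can read from inline text instead of a file. This is only a sketch: csv_text and sample_students are hypothetical names, and the three rows are copied from the console output at the end of this post rather than from the full dataset.

```r
# Hypothetical inline sample with the same four columns as the real file.
csv_text <- "Name,Age,Sex,Grade
Raul,25,Male,80
Booker,18,Male,83
Lauri,21,Female,90"

# text = feeds the string to read.csv() in place of a file path.
sample_students <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Name and Sex stay character; Age and Grade are parsed as integer.
sapply(sample_students, class)
```

Swapping text = csv_text for a quoted file path gives a non-interactive version of the import, which is handy when the script needs to run unattended.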
Step 2: Checking the Data
head(student6)
str(student6)
head(student6) displays the first few rows so I can preview the dataset.
str(student6) shows each column’s data type and gives an overview of how many observations and variables are in the data frame.
This step confirms that the data was imported correctly before any analysis begins.
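The visual check can be backed up with a few programmatic ones. The sketch below assumes a small stand-in data frame (student_check is a hypothetical name) with the same four columns; the conditions are examples, not part of the assignment.

```r
# Hypothetical stand-in for student6 with the same four columns.
student_check <- data.frame(
  Name  = c("Raul", "Booker", "Lauri"),
  Age   = c(25L, 18L, 21L),
  Sex   = c("Male", "Male", "Female"),
  Grade = c(80L, 83L, 90L),
  stringsAsFactors = FALSE
)

# stopifnot() halts with an error if any condition is not TRUE.
stopifnot(
  all(c("Name", "Age", "Sex", "Grade") %in% names(student_check)),
  is.character(student_check$Name),
  is.numeric(student_check$Grade)
)

# Count missing values per column; all zeros means nothing is missing.
missing_counts <- colSums(is.na(student_check))
missing_counts
```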
Step 3: Loading the plyr Package
library(plyr)
The plyr package provides tools for splitting data into groups, applying a function to each group, and combining the results.
Step 4: Computing the Mean Grade by Sex
gender_mean <- ddply(
student6,
"Sex",
summarise,
GradeAverage = mean(Grade, na.rm = TRUE)
)
This block calculates the average grade for each gender:
"Sex" tells R to group data by the Sex column.
summarise is the function applied to each group; it reduces the group to a single summary row.
mean(Grade, na.rm = TRUE) computes the average of the Grade column while ignoring missing values (na.rm = TRUE).
The result is a new data frame named gender_mean with two columns: Sex and GradeAverage.
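As a cross-check on the split-apply-combine idea, the same kind of summary can be produced without plyr. The sketch below uses base R's aggregate(), not ddply(), on a hypothetical six-row sample (sample_grades) copied from the head() output later in this post, so its averages differ from the full 20-row result.

```r
# Hypothetical sample: Sex and Grade for the first six students shown by head().
sample_grades <- data.frame(
  Sex   = c("Male", "Male", "Female", "Female", "Female", "Female"),
  Grade = c(80, 83, 90, 91, 85, 69)
)

# aggregate() splits Grade by Sex, applies mean() to each group, and
# returns one row per group; na.rm = TRUE is passed through to mean().
base_mean <- aggregate(Grade ~ Sex, data = sample_grades, FUN = mean, na.rm = TRUE)

# Rename the summary column to match the ddply() output.
names(base_mean)[2] <- "GradeAverage"
base_mean
```

If both approaches are run on the same data frame, the group averages should agree, which makes this a cheap sanity check on the ddply() call.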
Step 5: Writing the Means to a Text File
write.table(
gender_mean,
file = "gender_mean.txt",
sep = "\t",
row.names = FALSE
)
This exports the grouped averages to a tab-delimited text file called gender_mean.txt.
sep = "\t" means values are separated by tabs.
row.names = FALSE prevents R from writing extra row numbers.
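One way to confirm the tab-delimited layout is to read the file straight back in. This is a minimal sketch: means_demo is a hypothetical table shaped like gender_mean (values taken from the console output in this post), and it is written to a temporary file so the working directory is left alone.

```r
# Hypothetical summary table shaped like gender_mean.
means_demo <- data.frame(
  Sex          = c("Female", "Male"),
  GradeAverage = c(86.9375, 80.25)
)

# Write to a temporary file instead of the working directory.
tmp_txt <- tempfile(fileext = ".txt")
write.table(means_demo, file = tmp_txt, sep = "\t", row.names = FALSE)

# read.delim() expects tab-separated text with a header row.
means_back <- read.delim(tmp_txt, stringsAsFactors = FALSE)
means_back
```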
Step 6: Confirming the File Was Created
file.exists("gender_mean.txt")
This simple check returns TRUE if the file exists in my current working directory. It’s a quick way to confirm the file was written, although it does not inspect the contents.
Step 7: Filtering Names Containing “i” or “I”
i_students <- subset(
student6,
grepl("i", Name, ignore.case = TRUE)
)
Here, I create a new subset of the data that only includes students whose names contain the letter “i” (in either uppercase or lowercase).
grepl() searches each name for the pattern and returns TRUE or FALSE for every row; subset() keeps only the rows where the result is TRUE.
ignore.case = TRUE makes the search case-insensitive.
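To see what grepl() returns on its own, it helps to run it on a plain character vector. The names below are the first six from the console output in this post; demo_names and has_i are hypothetical variable names used only for this illustration.

```r
# First six names from the head() output in this post.
demo_names <- c("Raul", "Booker", "Lauri", "Leonie", "Sherlyn", "Mikaela")

# One TRUE/FALSE per name: does it contain "i" or "I"?
has_i <- grepl("i", demo_names, ignore.case = TRUE)
has_i

# Indexing with the logical vector keeps only the matching names.
demo_names[has_i]
```

The logical vector is exactly what subset() uses as its row filter in the step above.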
Step 8: Viewing the Filtered Results
head(i_students)
This displays the first six rows of my filtered dataset as a spot check that the names shown all contain an “i”.
Step 9: Exporting Only the Filtered Names
write.csv(
i_students$Name,
file = "i_students.csv",
row.names = FALSE,
quote = FALSE
)
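This command exports just the names of the filtered students into a CSV file. Because i_students$Name is a plain character vector rather than a data frame, write.csv() labels the single column x in the header line instead of Name.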
quote = FALSE writes the names without quotation marks.
row.names = FALSE again omits row numbers.
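If the exported file should carry a Name header, one option is to pass a one-column data frame instead of the bare vector. The sketch below compares the two on a hypothetical two-row subset (demo_subset), writing to temporary files; it is an alternative to the call above, not what the assignment script does.

```r
# Hypothetical filtered subset with two of the original columns.
demo_subset <- data.frame(
  Name  = c("Lauri", "Leonie"),
  Grade = c(90L, 91L),
  stringsAsFactors = FALSE
)

# Passing the bare vector: write.csv() labels the column "x".
vec_file <- tempfile(fileext = ".csv")
write.csv(demo_subset$Name, file = vec_file, row.names = FALSE, quote = FALSE)
vec_lines <- readLines(vec_file)

# Single-bracket indexing returns a one-column data frame, so the header is "Name".
df_file <- tempfile(fileext = ".csv")
write.csv(demo_subset["Name"], file = df_file, row.names = FALSE, quote = FALSE)
df_lines <- readLines(df_file)

vec_lines[1]
df_lines[1]
```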
Step 10: Exporting the Full Filtered Dataset
write.csv(
i_students,
file = "i_students_full.csv",
row.names = FALSE
)
This exports the entire subset (all columns for students with “i” in their name) to i_students_full.csv. This is useful if I want to retain age, sex, and grade information for the filtered group.
Step 11: Verifying All Output Files Exist
file.exists("gender_mean.txt")
file.exists("i_students.csv")
file.exists("i_students_full.csv")
Each line checks whether its respective file is present in the working directory. If all return TRUE, every export succeeded.
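The three checks can also be collapsed into one call, since file.exists() accepts a vector of paths. The sketch below uses temporary files with hypothetical names (present_file, absent_file) so that one check passes and one fails on purpose.

```r
# Create one temporary file and name a second path that is never written.
present_file <- tempfile(fileext = ".csv")
write.csv(data.frame(Name = "Lauri"), file = present_file, row.names = FALSE)
absent_file <- tempfile(fileext = ".csv")

# file.exists() returns one logical value per path.
status <- file.exists(c(present_file, absent_file))
status

# all() collapses the results into a single pass/fail answer.
all(status)
```

With the three real output file names in the vector, all(status) returning TRUE would mean every file is present.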
In conclusion, I practiced importing, inspecting, grouping, filtering, and exporting data in R. I used read.csv() to load a comma-delimited dataset, plyr::ddply() to compute average grades by gender, subset() and grepl() for conditional filtering, and write.table() and write.csv() to save the results.
> #Import DataSet
> student6 <- read.csv(file.choose(), header = TRUE, stringsAsFactors = FALSE)
>
> #Check Data
> head(student6)
Name Age Sex Grade
1 Raul 25 Male 80
2 Booker 18 Male 83
3 Lauri 21 Female 90
4 Leonie 21 Female 91
5 Sherlyn 22 Female 85
6 Mikaela 20 Female 69
> str(student6)
'data.frame': 20 obs. of 4 variables:
$ Name : chr "Raul" "Booker" "Lauri" "Leonie" ...
$ Age : int 25 18 21 21 22 20 23 24 21 23 ...
$ Sex : chr "Male" "Male" "Female" "Female" ...
$ Grade: int 80 83 90 91 85 69 91 97 78 81 ...
>
> #Run "plyr"
> library(plyr)
>
> #Compute mean Grade by Sex
> gender_mean <- ddply(
+ student6,
+ "Sex",
+ summarise,
+ GradeAverage = mean(Grade, na.rm = TRUE)
+ )
>
> gender_mean
Sex GradeAverage
1 Female 86.9375
2 Male 80.2500
>
> #Write means to a text file
> write.table(
+ gender_mean,
+ file = "gender_mean.txt",
+ sep = "\t",
+ row.names = FALSE
+ )
>
> #Check File
> file.exists("gender_mean.txt")
[1] TRUE
>
> #Filter names containing “i” or “I”
> i_students <- subset(
+ student6,
+ grepl("i", Name, ignore.case = TRUE)
+ )
>
> head(i_students)
Name Age Sex Grade
3 Lauri 21 Female 90
4 Leonie 21 Female 91
6 Mikaela 20 Female 69
8 Aiko 24 Female 97
9 Tiffaney 21 Female 78
10 Corina 23 Female 81
>
> #Export filtered names only
> write.csv(
+ i_students$Name,
+ file = "i_students.csv",
+ row.names = FALSE,
+ quote = FALSE
+ )
>
> #Export full filtered DataSet
> write.csv(
+ i_students,
+ file = "i_students_full.csv",
+ row.names = FALSE
+ )
>
> #Verify all outputs exist in directory
> file.exists("gender_mean.txt")
[1] TRUE
> file.exists("i_students.csv")
[1] TRUE
> file.exists("i_students_full.csv")
[1] TRUE