Final Project: Build and Share Your Own R Package
GitHub Repository: AustinTCurtis/AustinCurtis: For LIS4370
For my final project, I created an R package called co2Package, which analyzes annual CO₂ emissions by country. The package uses a real dataset from Our World in Data and includes custom functions, defensive programming, documentation, and a full vignette. This project helped me understand how professional R packages are built, tested, and documented, while also giving me hands-on experience with GitHub version control.
Choosing the Dataset
I selected the Annual CO2 Emissions per Country dataset from Our World in Data because:
- It is real-world, meaningful, and relevant to climate policy.
- It contains multiple variables suitable for analysis.
- It provides an opportunity to visualize long-term global trends.
- It fits naturally into a package with summary and plotting functions.
After downloading the CSV file, I imported it, cleaned it, and standardized the variable names using a script stored in the package's data-raw/ directory. The script wraps the file read in tryCatch() so that a missing or unreadable file stops with a clear error message rather than a generic read error.
The final dataset included:
- country
- code
- year
- co2 (annual emissions)
This cleaned dataset is bundled inside the package as co2_country.
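The cleaning step can be sketched as a small function. This is a hedged illustration, not the script from the repository. Several details are hypothetical: the function name clean_co2_csv(), the CSV file name, and the assumption that the Our World in Data file's first four columns are Entity, Code, Year and the emissions value, in that order. It shows how tryCatch() turns a failed read into a descriptive error before the columns are renamed to country, code, year and co2.

```r
# Sketch of a data-raw cleaning script (function and file names are hypothetical).
# Assumes the CSV's first four columns are, in order:
# Entity, Code, Year, and annual CO2 emissions.
clean_co2_csv <- function(path) {
  # Convert a failed read into an error message that names the file.
  raw <- tryCatch(
    utils::read.csv(path, check.names = FALSE, stringsAsFactors = FALSE),
    error = function(e) {
      stop("Could not read CO2 file '", path, "': ", conditionMessage(e),
           call. = FALSE)
    }
  )
  if (ncol(raw) < 4) {
    stop("Expected at least 4 columns, found ", ncol(raw), ".", call. = FALSE)
  }
  # Keep the four assumed columns and give them short, consistent names.
  cleaned <- raw[, 1:4]
  names(cleaned) <- c("country", "code", "year", "co2")
  cleaned$year <- as.integer(cleaned$year)
  cleaned$co2 <- as.numeric(cleaned$co2)
  cleaned
}

# In the real script, the result would then be saved to data/ for the package:
# co2_country <- clean_co2_csv("data-raw/annual-co2-emissions-per-country.csv")
# usethis::use_data(co2_country, overwrite = TRUE)
```

usethis::use_data() writes the object to data/co2_country.rda, which is how a dataset becomes available to package users under its own name.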
Building the Package Structure
I used the usethis package to create the package skeleton. The resulting structure includes:
- R/ for function scripts
- data/ for the processed dataset
- vignettes/ for R Markdown documentation
- man/ for documentation generated by roxygen2
- DESCRIPTION and NAMESPACE files
This structure mirrors professional R packages, making it easier to stay organized as the project grows.
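The post does not list the exact commands used. The sketch below shows one way to recreate this layout with usethis, assuming a recent usethis release with knitr and rmarkdown installed (use_vignette() adds them as dependencies and expects them to be present). The temporary path is a placeholder. Only create_package() builds the core skeleton. data-raw/ and vignettes/ come from separate helpers, and man/ appears only after roxygen2 documents the functions.

```r
# Sketch: recreating the package skeleton with usethis.
# pkg_path is a placeholder location; point it wherever the package should live.
pkg_path <- file.path(tempdir(), "co2Package")

# Creates DESCRIPTION, NAMESPACE and R/ (open = FALSE keeps the current session).
usethis::create_package(pkg_path, open = FALSE)

# Run the remaining helpers with the new package as the active project.
usethis::with_project(pkg_path, {
  usethis::use_data_raw("co2_country", open = FALSE)  # data-raw/co2_country.R
  usethis::use_vignette("co2Package-intro")           # vignettes/co2Package-intro.Rmd
})
```

In an interactive RStudio session, it is more common to let create_package() open the new project and then run the helpers from inside it. with_project() is used here so that the whole sketch runs as a single script.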
Developing the Functions
The package includes three core functions, each designed with defensive programming:
1. load_co2_data()
Loads the internal dataset and verifies that:
- the required columns exist
- the numeric columns (year, co2) are correctly typed
- the data has a minimum expected size
If the structure isn’t correct, the function stops with a clear error message.
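Here is a minimal base-R sketch of those three checks, with roxygen comments of the kind that generate the man/ pages. It is an illustration, not the package's actual source. The min_rows default of 100 is a hypothetical threshold. The data argument defaults to the bundled co2_country dataset but accepts any data frame, so the checks can be exercised on small test inputs.

```r
#' Load and validate the CO2 dataset
#'
#' @param data A data frame; defaults to the bundled `co2_country` dataset.
#' @param min_rows Minimum number of rows expected (hypothetical threshold).
#' @return `data`, unchanged, if every check passes.
#' @export
load_co2_data <- function(data = co2_country, min_rows = 100) {
  if (!is.data.frame(data)) {
    stop("`data` must be a data frame.", call. = FALSE)
  }
  # 1. Required columns exist.
  required <- c("country", "code", "year", "co2")
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ",
         paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  # 2. Numeric columns are correctly typed.
  for (col in c("year", "co2")) {
    if (!is.numeric(data[[col]])) {
      stop("Column `", col, "` must be numeric, not ",
           class(data[[col]])[1], ".", call. = FALSE)
    }
  }
  # 3. The data has a minimum expected size.
  if (nrow(data) < min_rows) {
    stop("Expected at least ", min_rows, " rows, found ", nrow(data), ".",
         call. = FALSE)
  }
  data
}
```

Using call. = FALSE keeps the error message focused on what the user needs to fix rather than on the internal function call.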
2. co2_summary()
Computes the mean, median, and standard deviation of emissions for selected countries and years.
This function checks:
- that the dataset is a valid data frame
- that user-selected countries exist in the data
- that filters don't remove all rows
If the user provides invalid input, the function explains exactly what went wrong.
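A possible shape for this function is sketched below. The argument names and the years interface (a vector of years to keep, such as 1990:2020) are assumptions, not the package's documented API. The single-observation warning is a hypothetical example of a "warning for an unusual but allowable input": sd() returns NA for one value, so the call still succeeds but the user is told why.

```r
#' Summarise CO2 emissions for selected countries and years
#'
#' @param data A data frame with `country`, `year` and `co2` columns.
#' @param countries Character vector of country names to include.
#' @param years Optional vector of years to keep, e.g. `1990:2020`.
#' @return A data frame with one row per country: mean, median and sd of co2.
#' @export
co2_summary <- function(data, countries, years = NULL) {
  if (!is.data.frame(data)) {
    stop("`data` must be a data frame.", call. = FALSE)
  }
  unknown <- setdiff(countries, unique(data$country))
  if (length(unknown) > 0) {
    stop("Country not found in data: ", paste(unknown, collapse = ", "),
         call. = FALSE)
  }
  keep <- data$country %in% countries & !is.na(data$co2)
  if (!is.null(years)) keep <- keep & data$year %in% years
  filtered <- data[keep, ]
  if (nrow(filtered) == 0) {
    stop("No rows left after filtering by country and year.", call. = FALSE)
  }
  grp <- as.character(filtered$country)
  # Allowed but unusual: sd() is NA for a country with a single observation.
  if (any(table(grp) < 2)) {
    warning("Some countries have fewer than 2 observations; their sd is NA.",
            call. = FALSE)
  }
  per_country <- lapply(split(filtered$co2, grp), function(x) {
    c(mean = mean(x), median = stats::median(x), sd = stats::sd(x))
  })
  data.frame(country = names(per_country),
             do.call(rbind, per_country), row.names = NULL)
}
```

Checking for unknown countries before filtering lets the error name the exact misspelled country. Otherwise the user would only see that the filter returned nothing.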
3. plot_co2_trend()
Uses ggplot2 to visualize CO₂ emissions over time.
Includes defensive programming such as:
- checking that the required columns exist
- verifying that at least one country was selected
- confirming that ggplot2 is installed
This was the most visually useful part of the project, since it reveals how emissions change over decades.
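The sketch below is one hedged way to combine these checks with namespace-safe plotting; it is not necessarily the package's exact code. The requireNamespace() check is the pattern usually paired with listing ggplot2 under Suggests in DESCRIPTION. Every ggplot2 function is called with the ggplot2:: prefix, so the function works without library(ggplot2). The axis labels are illustrative choices.

```r
#' Plot CO2 emission trends over time
#'
#' @param data A data frame with `country`, `year` and `co2` columns.
#' @param countries Character vector of countries to plot.
#' @return A ggplot object.
#' @export
plot_co2_trend <- function(data, countries) {
  if (length(countries) == 0) {
    stop("Select at least one country to plot.", call. = FALSE)
  }
  missing_cols <- setdiff(c("country", "year", "co2"), names(data))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "),
         call. = FALSE)
  }
  # ggplot2 is only needed here, so check for it at call time.
  if (!requireNamespace("ggplot2", quietly = TRUE)) {
    stop("Package 'ggplot2' is required for plot_co2_trend().", call. = FALSE)
  }
  plot_df <- data[data$country %in% countries, ]
  if (nrow(plot_df) == 0) {
    stop("None of the selected countries appear in the data.", call. = FALSE)
  }
  # `.data` is the tidy-eval pronoun; inside a package you would also add
  # #' @importFrom rlang .data so R CMD check does not flag it.
  ggplot2::ggplot(plot_df, ggplot2::aes(x = .data$year, y = .data$co2,
                                        colour = .data$country)) +
    ggplot2::geom_line() +
    ggplot2::labs(x = "Year", y = "Annual CO2 emissions", colour = "Country")
}
```

Returning the plot object instead of printing it lets callers add their own themes or save it with ggplot2::ggsave().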
Defensive Programming and Debugging
This project required implementing defensive programming, which helped make the package more robust. Some of the techniques I used include:
- stop() messages for invalid inputs
- tryCatch() when reading files
- Checking for valid column names
- Ensuring filters don't eliminate all data
- Warnings for unusual but allowable inputs
- Namespace-safe calls to ggplot2
I also used devtools::check() repeatedly to identify missing documentation, imports, and namespace issues.
This process provided me with a deeper understanding of how debugging works in R and why packages must be strict with their input.
Creating the Vignette
The vignette (vignettes/co2Package-intro.Rmd) provides a comprehensive walkthrough of the package.
It includes:
- A description of the dataset
- Example code for loading the data
- Summary statistics
- Emission trend plots
- An explanation of defensive programming decisions
Building the vignette helped me understand how to communicate code and analysis in a reproducible way.
Publishing the Package on GitHub
Using RStudio’s Git tab, I:
- Initialized Git
- Committed all files
- Connected to GitHub using use_github()
- Pushed my package to a live repository
- Added a README and license
This reinforced the importance of version control for real-world programming workflows.
GitHub now serves as a public hosting platform for my package, and it can be installed by anyone using:
What I Learned
This project taught me far more than just writing functions in R. I learned:
- How R packages are structured internally
- The importance of dataset cleaning and standardization
- How defensive programming prevents user errors
- How documentation is generated using roxygen2
- How to build and publish a vignette
- How to debug package errors using devtools::check()
- How to use GitHub for software development
Overall, this project helped me bridge the gap between being an R user and becoming an R developer.
Conclusion
Creating the co2Package was one of the most comprehensive and practical projects I've completed in R. It guided me through the full lifecycle of software creation: data preparation, function design, defensive programming, documentation, testing, and deployment. This experience will be directly useful for future data science projects, reproducible research, and any situation where code needs to be packaged, shared, and maintained.