Version 0.2.1
Leoson Hoay
Last Updated: 8 Sep 2025 (0.2.0 ->
0.2.1)
This module was born out of my genuine frustration while constructing an extremely long CASE WHEN…THEN statement to re-label categorical variables. It is most helpful for folks who intend to work with SQL directly in the R environment, likely with a SQL connector such as RODBC or RSQLite.
The package provides three main functions for generating SQL
statements: - casewhen()
- Creates CASE
WHEN…THEN statements for value mapping -
inlist()
- Creates IN() statements for
filtering - updatetable()
- Creates UPDATE
statements for bulk data updates
All functions now support flexible column selection, robust error handling, different quote types, and various options for handling NULL/NA values.
Go from this:
Hotel/Motel | Living in Shelter/Hotel/Motel |
Homeless Shelter | Living in Shelter/Hotel/Motel |
Homeless Status Not Applicable | Not Homeless |
N/A | Not Homeless |
No | Not Homeless |
Homeless, Doubled-Up | Doubled Up |
To this:
CASE WHEN 'Hotel/Motel' THEN 'Living in Shelter/Hotel/Motel'
WHEN 'Homeless Shelter' THEN 'Living in Shelter/Hotel/Motel'
WHEN 'Homeless Status Not Applicable' THEN 'Not Homeless'
WHEN 'N/A' THEN 'Not Homeless'
WHEN 'No' THEN 'Not Homeless'
WHEN 'Homeless, Doubled-Up' THEN 'Doubled Up'
The package also supports the creation of long SQL IN() lists via the inlist() function, and UPDATE statements via the updatetable() function. The IN() functionality was inspired by reading about Kevin Flerlage’s Excel implementation.
The package assumes that the user has a mapping CSV file or an R dataframe similar to the example below:
samp <- system.file("extdata", "sample.csv", package = "sqlcaser")
mapping <- read.csv(samp)
mapping
#> Homeless..Hotel.or.Motel Living.in.Shelter.Hotel.Motel
#> 1 Homeless, Hotel/Motel Living in Shelter/Hotel/Motel
#> 2 Homeless, Living in Shelter Living in Shelter/Hotel/Motel
#> 3 Homeless Shelters Living in Shelter/Hotel/Motel
#> 4 Hotel/motel Living in Shelter/Hotel/Motel
#> 5 Hotel/Motel Living in Shelter/Hotel/Motel
#> 6 Homeless Shelter Living in Shelter/Hotel/Motel
#> 7 Homeless Status Not Applicable Not Homeless
#> 8 N/A Not Homeless
#> 9 No Not Homeless
#> 10 Homeless, Doubled-Up Doubled Up
#> 11 Homeless, In Doubled-Up Residence Doubled Up
#> 12 Homeless, Unsheltered Unsheltered
The function casewhen() takes an R dataframe or the file path of the mapping file as input,and returns the CASE statement as a string, while printing it to the console as well.
statement <- casewhen(samp)
#>
#> CASE WHEN 'Homeless, Hotel or Motel' THEN 'Living in Shelter/Hotel/Motel'
#> WHEN 'Homeless, Hotel/Motel' THEN 'Living in Shelter/Hotel/Motel'
#> WHEN 'Homeless, Living in Shelter' THEN 'Living in Shelter/Hotel/Motel'
#> WHEN 'Homeless Shelters' THEN 'Living in Shelter/Hotel/Motel'
#> WHEN 'Hotel/motel' THEN 'Living in Shelter/Hotel/Motel'
#> WHEN 'Hotel/Motel' THEN 'Living in Shelter/Hotel/Motel'
#> WHEN 'Homeless Shelter' THEN 'Living in Shelter/Hotel/Motel'
#> WHEN 'Homeless Status Not Applicable' THEN 'Not Homeless'
#> WHEN 'N/A' THEN 'Not Homeless'
#> WHEN 'No' THEN 'Not Homeless'
#> WHEN 'Homeless, Doubled-Up' THEN 'Doubled Up'
#> WHEN 'Homeless, In Doubled-Up Residence' THEN 'Doubled Up'
#> WHEN 'Homeless, Unsheltered' THEN 'Unsheltered'
The user can then easily include it as part of the SQL query:
query <- paste("SELECT id, ", statement, " END AS status "," \nFROM table;")
cat(query)
#> SELECT id,
#> CASE WHEN 'Homeless, Hotel or Motel' THEN 'Living in Shelter/Hotel/Motel'
#> WHEN 'Homeless, Hotel/Motel' THEN 'Living in Shelter/Hotel/Motel'
#> WHEN 'Homeless, Living in Shelter' THEN 'Living in Shelter/Hotel/Motel'
#> WHEN 'Homeless Shelters' THEN 'Living in Shelter/Hotel/Motel'
#> WHEN 'Hotel/motel' THEN 'Living in Shelter/Hotel/Motel'
#> WHEN 'Hotel/Motel' THEN 'Living in Shelter/Hotel/Motel'
#> WHEN 'Homeless Shelter' THEN 'Living in Shelter/Hotel/Motel'
#> WHEN 'Homeless Status Not Applicable' THEN 'Not Homeless'
#> WHEN 'N/A' THEN 'Not Homeless'
#> WHEN 'No' THEN 'Not Homeless'
#> WHEN 'Homeless, Doubled-Up' THEN 'Doubled Up'
#> WHEN 'Homeless, In Doubled-Up Residence' THEN 'Doubled Up'
#> WHEN 'Homeless, Unsheltered' THEN 'Unsheltered'
#> END AS status
#> FROM table;
# Create sample data with named columns
data <- data.frame(
status = c("Active", "Inactive", "Pending"),
display = c("Currently Active", "Not Active", "Under Review")
)
# Use column names instead of positions, add ELSE clause
advanced_case <- casewhen(data,
when_col = "status",
then_col = "display",
else_value = "Unknown Status")
#>
#> CASE WHEN 'Active' THEN 'Currently Active'
#> WHEN 'Inactive' THEN 'Not Active'
#> WHEN 'Pending' THEN 'Under Review'
#> ELSE 'Unknown Status'
# Create data with numeric values
scores <- data.frame(
grade = c("A", "B", "C"),
points = c(90, 80, 70)
)
# Auto-quote mode handles numeric values without quotes
numeric_case <- casewhen(scores,
when_col = "grade",
then_col = "points",
quote_type = "auto")
#>
#> CASE WHEN 'A' THEN 90
#> WHEN 'B' THEN 80
#> WHEN 'C' THEN 70
# Create sample data with duplicates
categories <- data.frame(
category = c("Sales", "Marketing", "Sales", "IT", "Marketing", "HR")
)
# Remove duplicates automatically
unique_list <- inlist(categories,
value_col = "category",
distinct = TRUE)
#> Removed 2 duplicate values.
#>
#> IN('Sales', 'Marketing', 'IT', 'HR')
# Create sample update data
updates <- data.frame(
id = c(1, 2, 3),
name = c("John Doe", "Jane Smith", "Bob Wilson"),
department = c("Engineering", "Sales", "Marketing"),
salary = c(75000, 65000, 60000)
)
# Generate UPDATE statements using ID as key
update_statements <- updatetable(updates,
tablename = "employees",
key_col = "id")
#>
#> UPDATE employees
#> SET name = 'John Doe', department = 'Engineering', salary = 75000
#> WHERE id = 1;
#>
#> UPDATE employees
#> SET name = 'Jane Smith', department = 'Sales', salary = 65000
#> WHERE id = 2;
#>
#> UPDATE employees
#> SET name = 'Bob Wilson', department = 'Marketing', salary = 60000
#> WHERE id = 3;
A sample mapping file is provided in this package. The file path can be accessed as follows:
casewhen()
- CASE Statement GeneratorDescription
Constructs CASE WHEN…THEN statements from mapping data with flexible column selection and robust options.
Usage
casewhen(inputfile = NULL, header = FALSE, when_col = 1, then_col = 2,
else_value = NULL, quote_type = "single", handle_nulls = "skip")
Arguments
inputfile
- R dataframe or path to mapping filewhen_col
- Column name/index for WHEN values (default:
1)then_col
- Column name/index for THEN values (default:
2)else_value
- Optional ELSE value for the CASE
statementquote_type
- “single”, “double”, or “auto” (default:
“single”)handle_nulls
- “skip”, “null”, or “error” (default:
“skip”)inlist()
- IN Statement GeneratorDescription
Creates SQL IN() statements with duplicate removal and flexible column selection.
Usage
inlist(inputfile = NULL, header = FALSE, value_col = 1,
quote_type = "single", handle_nulls = "skip", distinct = TRUE)
Arguments
inputfile
- R dataframe or path to mapping filevalue_col
- Column name/index for IN values (default:
1)quote_type
- “single”, “double”, or “auto” (default:
“single”)distinct
- Remove duplicates if TRUE (default:
TRUE)updatetable()
- UPDATE Statement GeneratorDescription
Generates bulk UPDATE statements with flexible column selection and key specification.
Usage
updatetable(inputfile = NULL, tablename = NULL, key_col = 1, update_cols = NULL,
quote_type = "auto", handle_nulls = "skip", batch_updates = TRUE)
Arguments
inputfile
- R dataframe or path to mapping filetablename
- Name of SQL table to updatekey_col
- Column name/index for WHERE clause (default:
1)update_cols
- Specific columns to update (default: all
except key_col)quote_type
- “single”, “double”, or “auto” (default:
“auto”)Quote Types: - "single"
- Use single
quotes for all values: 'value'
- "double"
-
Use double quotes for all values: "value"
-
"auto"
- Smart quoting: numeric values unquoted, text
values single-quoted
NULL Handling: - "skip"
- Skip
rows/values with NA (default, safest) - "null"
- Convert NA
to SQL NULL in output - "error"
- Stop execution when NA
encountered
Install using:
devtools::install_github("leosonh/sqlcaseR")
Much thanks to a couple of my prior colleagues at Learning Collider - Nitya Raviprakash and Jasmin Dial - who provided healthy discussion around my misery of constructing long SQL queries. Credit is also due to Kevin Flerlage, whose efforts in automating this process in Excel are commendable and partially inspired this package.
If desired, cite the package using:
citation("sqlcaseR")
License: MIT License