Basic data cleaning

OR.0.rm()
Remove zero elements
OR.F.to.NA()
Convert FALSE values to NA
OR.NA.rm()
Remove missing values
OR.NA.to.0()
Convert missing values to zero
OR.NA.to.F()
Convert missing values to FALSE
OR.NA.to.empty()
Convert missing values to empty strings
OR.as.numeric()
Convert strings to numeric after normalizing minus signs and stripping digit-grouping separators
OR.string.to.binary()
Convert string variable to binary variable

Delimited strings

OR.delim.contains()
Detect delimited string elements (case-insensitive and whitespace-trimmed)
OR.delim.intersect()
Check if two delimited strings share any element (case-insensitive and whitespace-trimmed)
OR.delim.merge()
Merge delimited string elements pairwise (case-insensitive and whitespace-trimmed)
OR.delim.replace()
Replace delimited string elements (case-insensitive and whitespace-trimmed)
OR.delim.split()
Split delimited string elements (case-insensitive and whitespace-trimmed)
OR.delim.subset()
Check if one delimited string is a subset of another (case-insensitive and whitespace-trimmed)
OR.delim.table()
Frequency table of delimited string elements (case-insensitive and whitespace-trimmed)
OR.delim.txclass()
Classify common medical oncology treatments (work in progress)
OR.delim.txname()
Standardize common medical oncology treatment names (work in progress)
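
The "case-insensitive and whitespace-trimmed" matching described for the delimited-string helpers can be illustrated with a minimal base R sketch. The helper name, its arguments, and the ";" delimiter are assumptions made for illustration; this is not the signature or implementation of OR.delim.contains().

```r
# Hypothetical helper for illustration only; not the package's OR.delim.contains().
# Assumes ";" as the delimiter and a single element to look for.
delim_contains_sketch <- function(x, element, delim = ";") {
  # Normalise the search term the same way as the elements
  target <- tolower(trimws(element))
  # Split each string on the literal delimiter
  parts <- strsplit(x, delim, fixed = TRUE)
  # TRUE where the normalised target matches any normalised element
  vapply(parts, function(p) target %in% tolower(trimws(p)), logical(1))
}

tx <- c("Cisplatin; etoposide", "carboplatin ;Paclitaxel")
delim_contains_sketch(tx, "ETOPOSIDE")
# [1]  TRUE FALSE
```

Normalising both sides before comparison means that " Etoposide" and "etoposide " count as the same element, which is the behaviour the descriptions above refer to.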

Google Maps Platform

OR.Google.TSP()
Travelling salesman distance via Google Routes API
OR.Google.address()
Resolve address using Google Geocoding API
OR.Google.distance()
Road distance between two addresses using Google Routes API

Microsoft Excel serial dates

OR.dmyY.to.Excel()
Convert mixed-format day-month-year dates to Microsoft Excel serial dates
OR.mdyY.to.Excel()
Convert mixed-format month-day-year dates to Microsoft Excel serial dates
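
As a rough illustration of what an Excel serial date is, the base R sketch below counts days from 1899-12-30, the origin that reproduces Excel's 1900 date system for dates on or after 1 March 1900. The helper name and its arguments are assumptions; it handles a single explicit format and does not reproduce the mixed-format parsing of the package functions.

```r
# Hypothetical helper for illustration only; not OR.dmyY.to.Excel() or OR.mdyY.to.Excel().
# Assumes the Excel 1900 date system and dates on or after 1900-03-01.
to_excel_serial_sketch <- function(x, format) {
  d <- as.Date(x, format = format)
  # Days elapsed since the origin that matches Excel's serial numbering
  as.numeric(d - as.Date("1899-12-30"))
}

# The same string is a different day depending on the assumed field order
dmy <- to_excel_serial_sketch("03/04/2024", "%d/%m/%Y")  # 3 April 2024 -> 45385
mdy <- to_excel_serial_sketch("03/04/2024", "%m/%d/%Y")  # 4 March 2024 -> 45355
```

The two results differ by 30 days, which is why the day-first and month-first conventions need separate converters.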

Multiple lines of treatment

OR.txlines.restart()
Identify treatment restart after a minimum treatment-free interval
OR.txlines.timeline()
Construct treatment timelines (long format)
OR.txlines.truncate()
Truncate stop dates by maximum duration
OR.txlines.wide()
Convert treatment timelines to wide format

Outlier detection

OR.kMAD()
Small-sample MAD cutoff under normality
OR.outliers()
Non-parametric outlier detection using the modified z-score method
OR.outliers.rlm()
Polynomial linear non-parametric outlier detection
OR.outliers.rlm.ggplot()
Polynomial linear non-parametric outlier plot
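
The modified z-score method named above is commonly defined as 0.6745 * (x - median) / MAD, with values flagged when the absolute score exceeds a cutoff. The sketch below uses the conventional cutoff of 3.5 as an assumption; the helper name and arguments are hypothetical, and the cutoff used by OR.outliers() (for example, one derived from OR.kMAD()) may differ.

```r
# Hypothetical helper for illustration only; not the package's OR.outliers().
# Assumes the conventional cutoff of 3.5 and a non-zero MAD.
modified_z_sketch <- function(x, cutoff = 3.5) {
  med <- median(x, na.rm = TRUE)
  # constant = 1 gives the raw (unscaled) median absolute deviation
  raw_mad <- mad(x, constant = 1, na.rm = TRUE)
  z <- 0.6745 * (x - med) / raw_mad
  abs(z) > cutoff
}

x <- c(10, 11, 9, 10, 12, 10, 11, 50)
modified_z_sketch(x)
# Only the last value (50) is flagged
```

Because both the centre and the spread are medians, a single extreme value barely moves them, unlike a mean and standard deviation.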

Summary statistics

OR.max()
Maximum with NA removal or NA if no valid entries
OR.mean()
Arithmetic mean with NA removal or NA if no valid entries
OR.min()
Minimum with NA removal or NA if no valid entries
OR.mode()
Statistical mode
OR.modefreq()
Frequency of the mode
OR.rowleft()
Row-wise leftmost non-missing value
OR.rowmax()
Row-wise maxima
OR.rowmax.threshold()
Row-wise maxima below a threshold
OR.rowmin()
Row-wise minima
OR.rowmin.threshold()
Row-wise minima above a threshold
OR.sum()
Sum with NA removal or NA if no valid entries
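
The "NA if no valid entries" behaviour matters because base R's max(x, na.rm = TRUE) returns -Inf with a warning when nothing remains after NA removal. A minimal sketch of the idea, using a hypothetical helper rather than the package's OR.max():

```r
# Hypothetical helper for illustration only; not the package's OR.max().
max_or_na_sketch <- function(x) {
  x <- x[!is.na(x)]
  # No valid entries: return NA rather than -Inf
  if (length(x) == 0) return(NA)
  max(x)
}

max_or_na_sketch(c(3, NA, 7))   # 7
max_or_na_sketch(c(NA, NA))     # NA
```

The same pattern applies to minimum, mean, and sum, where base R would otherwise return Inf, NaN, and 0 respectively for an all-missing input.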

Treatment dosing

OR.dose.vials()
Minimum-cost vial combination to achieve a target dose
OR.dose.weights()
Stabilize and impute dosing weights within clusters

Other

OR.LOCF()
Last observation carried forward
OR.collapse()
Collapse repeated measurements across IDs
OR.permutations()
Enumerate permutations of k out of n items
OR.read.csv.stub()
Read CSV files matching a path stub and combine by rows
OR.survoutcome()
Survival outcomes
AICR.Huber()
Robust Akaike information criterion for Huber M-estimation
rlm.Huber()
Robust linear regression using Huber M-estimation
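
For the last-observation-carried-forward entry listed under "Other", a minimal base R sketch of the technique follows. The helper name is hypothetical, and the package's OR.LOCF() may take different arguments (for example, grouping by ID), so this shows the idea only.

```r
# Hypothetical helper for illustration only; not the package's OR.LOCF().
locf_sketch <- function(x) {
  # For each position, count how many non-missing values have been seen so far
  idx <- cumsum(!is.na(x))
  # Prepend NA so that positions before the first observation stay missing
  c(NA, x[!is.na(x)])[idx + 1]
}

locf_sketch(c(NA, 1, NA, NA, 4, NA))
# [1] NA  1  1  1  4  4
```

Leading missing values stay missing because there is no earlier observation to carry forward.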