select(): Returns a subset of columns based on their names or indices, discarding the rest.
drop(): Returns the dataframe with only the specifically named columns removed.
filter_by(): Returns only the rows that evaluate to True for the provided logical expressions.
rename(): Updates the names of specific columns using key-value pairs while leaving all other columns untouched.
arrange(): Sorts the rows of the dataframe in ascending or descending order based on the values within one or more columns.
mutate(): Adds new columns or modifies existing ones by calculating values row-by-row, keeping the overall row count the same.
summarize(): Computes summary statistics (like mean or count), returning one consolidated row of results per group.
group_by(): Segments the dataframe into distinct categories so that subsequent operations are applied group-wise rather than to the whole dataset.
ungroup(): Removes the internal grouping structure, returning the dataframe to a standard, unsegmented state.
distinct(): Returns only the unique rows, dropping any duplicates based on all columns or a specified subset.
separate(): Splits a single string-based column into multiple new columns using a specified delimiter or regular expression.
unite(): Combines multiple columns into a single new string column, joining their values together with a specified separator.
sample(): Returns a random subset of rows, sized either as a fixed number of rows or as a fraction of the data.
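To make the behavior of a few of these verbs concrete, here is a minimal sketch in plain Python. It does not use any dataframe library: each row is modeled as a dict and the "dataframe" as a list of rows. The data and variable names are hypothetical and chosen only for illustration; a real library will differ in syntax and in details such as ordering.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: each row is a dict, the "dataframe" is a list of rows.
rows = [
    {"name": "a", "team": "x", "score": 3},
    {"name": "b", "team": "y", "score": 5},
    {"name": "c", "team": "x", "score": 7},
    {"name": "c", "team": "x", "score": 7},  # exact duplicate row
]

# distinct(): keep the first occurrence of each unique row.
seen = set()
unique = []
for r in rows:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        unique.append(r)

# filter_by(): keep only the rows where the condition is True.
filtered = [r for r in unique if r["score"] > 3]

# mutate(): add a column computed row by row; the row count is unchanged.
mutated = [{**r, "double": r["score"] * 2} for r in unique]

# select(): keep only the named columns.
selected = [{k: r[k] for k in ("name", "score")} for r in unique]

# arrange(): sort rows, here in descending order of score.
arranged = sorted(unique, key=lambda r: r["score"], reverse=True)

# group_by() + summarize(): one result row per group.
groups = defaultdict(list)
for r in unique:
    groups[r["team"]].append(r["score"])
summary = [
    {"team": team, "mean_score": mean(scores), "n": len(scores)}
    for team, scores in groups.items()
]
```

Note that mutate() preserves the number of rows, while group_by() followed by summarize() collapses each group to a single row.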
Here is a quick reference for some of the database-style join functions:
left_join(): Returns all rows from the left data frame, appending matched columns from the right and filling missing matches with NA.
right_join(): Returns all rows from the right data frame, appending matched columns from the left and filling missing matches with NA.
inner_join(): Returns only the rows that share matching keys in both the left and right data frames, dropping all unmatched rows.
outer_join(): Returns all rows from both data frames, retaining all data and inserting NA wherever there are missing matches. (R's dplyr calls this full_join().)
anti_join(): Filters the left data frame to return only the rows that do not have a corresponding match in the right data frame.
semi_join(): Filters the left data frame to return only the rows that do have a match in the right data frame, without actually adding any new columns from the right.
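The differences between the joins are easiest to see on a small example. The following is a sketch in plain Python, not the API of any particular library: the tables, the "id" key, and the variable names are hypothetical, None stands in for NA, and keys are assumed to be unique within each table (real joins repeat rows when a key matches more than once).

```python
# Hypothetical toy tables, joined on "id"; None stands in for NA.
left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
right = [{"id": 2, "city": "Oslo"}, {"id": 3, "city": "Lima"}, {"id": 4, "city": "Kyiv"}]

# Lookup structures; this assumes ids are unique within each table.
right_by_id = {r["id"]: r for r in right}
left_ids = {l["id"] for l in left}

# inner_join(): only the keys present in both tables.
inner = [{**l, **right_by_id[l["id"]]} for l in left if l["id"] in right_by_id]

# left_join(): every left row; unmatched rows get None in the right-hand column.
left_joined = [
    {**l, "city": right_by_id.get(l["id"], {}).get("city")} for l in left
]

# outer_join(): the left join plus the right rows that had no match on the left.
outer = left_joined + [
    {"id": r["id"], "name": None, "city": r["city"]}
    for r in right
    if r["id"] not in left_ids
]

# semi_join(): left rows that have a match; no right-hand columns are added.
semi = [l for l in left if l["id"] in right_by_id]

# anti_join(): left rows that have no match.
anti = [l for l in left if l["id"] not in right_by_id]
```

A right join is the mirror image of the left join: swap the roles of the two tables. semi_join() and anti_join() only filter the left table, so together they partition its rows.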