Rename Multiple Columns in PySpark with withColumnsRenamed()
pyspark
performance
Spark
Renaming columns one at a time with .withColumnRenamed() in a loop stacks nested plan nodes and slows Spark’s optimizer. Use .withColumnsRenamed() to apply all renames in a single pass with one clean plan node.
Modified
19/02/2026
Note: Live Execution
All code in this post is run against a Databricks workspace before publication.
Summary
Calling .withColumnRenamed() in a loop has the same problem as .withColumn() loops: each call adds a new Project node to the logical plan
.withColumnsRenamed() (Spark 3.4+ / DBR 13.0+) takes a dictionary of {old_name: new_name} and applies all renames in a single plan step
Perfect for converting to snake_case, expanding abbreviations, removing special characters, and applying generic naming conventions across entire DataFrames
Introduction
This is the companion anti-pattern to calling .withColumn() in a loop. If you’ve ever ingested data from vendor feeds, legacy systems, APIs, or CSVs, you know the pain. Column names come in every flavour: CustID, trnxAmt, ACCT-bal, pmt$Status, addr.Line1. The instinct is to write a loop renaming them one by one.
The problem is identical to .withColumn() loops. Each .withColumnRenamed() call returns a new DataFrame with a new Project node stacked on the logical plan. With dozens of renames, the Catalyst optimizer bogs down traversing and optimizing through all those nested nodes.
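For illustration, a minimal sketch of the loop anti-pattern, using the vendor-style column names above:

```python
# Anti-pattern: every iteration returns a new DataFrame and stacks
# another Project node on the logical plan
renames = {
    "CustID": "customer_id",
    "trnxAmt": "transaction_amount",
    "ACCT-bal": "account_balance",
}
for old_name, new_name in renames.items():
    df = df.withColumnRenamed(old_name, new_name)
```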
.withColumnsRenamed() was introduced in PySpark 3.4 to solve this. One dictionary mapping old names to new names, one plan node.
```python
# Single plan node, single pass, same result
df_result = df.withColumnsRenamed({
    "CustID": "customer_id",
    "trnxAmt": "transaction_amount",
    "ACCT-bal": "account_balance",
    "pmt$Status": "payment_status",
    "addr.Line1": "address_line_1",
})
print(df_result.columns)
```
Real-World: Generic snake_case Conversion for All Columns
When you want every column in your DataFrame to follow a consistent naming convention, build the rename dictionary dynamically:
```python
import re

def to_snake_case(name: str) -> str:
    """Convert any naming convention to clean snake_case."""
    name = re.sub(r'[.$\-@#!&*()+=/\\]', '_', name)      # special chars to _
    name = re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', name)  # camelCase split
    name = re.sub(r'([A-Z]+)([A-Z][a-z])', r'\1_\2', name)  # ACRONYMWord split
    name = re.sub(r'_{2,}', '_', name)                   # collapse doubles
    return name.lower().strip('_')

# Build rename dict from current column names
rename_map = {col: to_snake_case(col) for col in df.columns}
print(rename_map)

# Apply all renames in one pass
df_clean = df.withColumnsRenamed(rename_map)
print(df_clean.columns)
```
Data teams often inherit column names full of cryptic abbreviations from source systems. Combine snake_case conversion with abbreviation expansion for truly readable schemas:
```python
import re

ABBREVIATIONS = {
    "trnx": "transaction", "cust": "customer",
    "acct": "account", "pmt": "payment",
    "addr": "address", "amt": "amount",
    "bal": "balance", "qty": "quantity",
    "desc": "description", "dt": "date",
    "ts": "timestamp", "nbr": "number",
    "num": "number", "cd": "code",
    "nm": "name",
}

def expand_column_name(name: str) -> str:
    """snake_case + expand common abbreviations."""
    # First convert to snake_case
    snake = re.sub(r'[.$\-@#!]', '_', name)
    snake = re.sub(r'([a-z0-9])([A-Z])', r'\1_\2', snake)
    snake = re.sub(r'_{2,}', '_', snake).lower().strip('_')
    # Then expand abbreviations
    parts = snake.split("_")
    return "_".join(ABBREVIATIONS.get(p, p) for p in parts)

rename_map = {col: expand_column_name(col) for col in df.columns}
print(rename_map)

df_clean = df.withColumnsRenamed(rename_map)
print(df_clean.columns)
```
Vendor data often arrives with prefixes like SRC_, RAW_, or STG_. Here we create a prefixed version and strip it back:
```python
# First simulate vendor-prefixed data from our setup df
df_vendor = df.withColumnsRenamed({col: f"SRC_{col}" for col in df.columns})
print("Before:", df_vendor.columns)

# Now strip the prefix in one pass
prefix = "SRC_"
rename_map = {
    col: col[len(prefix):] if col.startswith(prefix) else col
    for col in df_vendor.columns
}
df_clean = df_vendor.withColumnsRenamed(rename_map)
print("After: ", df_clean.columns)
```
The same single-pass rename in Scala, where .withColumnsRenamed() takes a Map:

```scala
// Scala uses a Map of old name to new name
val dfResult = df.withColumnsRenamed(Map(
  "CustID" -> "customer_id",
  "trnxAmt" -> "transaction_amount",
  "ACCT-bal" -> "account_balance",
  "pmt$Status" -> "payment_status",
  "addr.Line1" -> "address_line_1"
))
```
Tip: Check Your Spark Version
.withColumnsRenamed() requires PySpark 3.4+ (Databricks Runtime 13.0+). On older runtimes, you can use df.toDF(*new_names) if renaming all columns at once, or functools.reduce with .withColumnRenamed() as a workaround.
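A rough sketch of both workarounds, assuming a rename_map dict like the ones above:

```python
from functools import reduce

rename_map = {"CustID": "customer_id", "trnxAmt": "transaction_amount"}

# Option 1: fold the dict into chained .withColumnRenamed() calls.
# Works on any Spark version, but still adds one plan node per rename.
df_renamed = reduce(
    lambda acc, item: acc.withColumnRenamed(item[0], item[1]),
    rename_map.items(),
    df,
)

# Option 2: toDF() replaces every column name positionally in one shot
df_renamed = df.toDF(*[rename_map.get(c, c) for c in df.columns])
```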
Detail
The Same Problem, Different Method
.withColumnRenamed() suffers from exactly the same plan-nesting issue as .withColumn(). Each call returns a new DataFrame with a new Project node appended to the logical plan. See the companion tip on .withColumns() for the full Catalyst optimizer breakdown.
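If you want to see the nesting yourself, here is a quick sketch comparing the parsed logical plans, assuming any small DataFrame df:

```python
# Looped renames: the parsed logical plan shows one Project per call
looped = df
for c in df.columns:
    looped = looped.withColumnRenamed(c, c.lower())
looped.explain(extended=True)

# Bulk rename: a single Project covers every column
bulk = df.withColumnsRenamed({c: c.lower() for c in df.columns})
bulk.explain(extended=True)
```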
The key difference is that .withColumnsRenamed() works with strings only, no Column expressions needed, making it even simpler to use.
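A quick contrast, assuming the vendor df from earlier (the 0.92 rate is purely illustrative):

```python
from pyspark.sql import functions as F

# .withColumns() maps names to Column expressions
df_derived = df.withColumns({"amount_eur": F.col("trnxAmt") * 0.92})

# .withColumnsRenamed() maps strings to strings, nothing else needed
df_renamed = df.withColumnsRenamed({"trnxAmt": "transaction_amount"})
```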
Benefits of .withColumnsRenamed()
Single Plan Node: All renames collapse into one Project, no nested plan growth
Dictionary-Driven: Define rename rules as configuration rather than hardcoded loops
Composable: Build the rename dict programmatically from regex rules, lookup tables, or metadata
Safe Handling: If a column in the dict does not exist in the DataFrame, Spark silently skips it rather than raising an error (demonstrated in the sketch after this list)
Production Ready: Define your naming standards once, apply them everywhere
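A small demonstration of the skip behaviour, assuming a live SparkSession named spark:

```python
df = spark.createDataFrame([(1, "a")], ["CustID", "pmtStatus"])

out = df.withColumnsRenamed({
    "CustID": "customer_id",
    "no_such_col": "ignored",  # not in the schema: silently skipped
})
print(out.columns)  # ['customer_id', 'pmtStatus']
```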
Common Rename Patterns
snake_case conversion: Teams following Python naming conventions
Abbreviation expansion: Improved readability and self-documenting schemas
Prefix/suffix stripping: Vendor data normalisation
Special character removal: Compatibility with downstream systems (Parquet, Delta, BI tools that choke on dots, dollar signs, or hyphens in column names)