Unlocking the Power of Databricks Magic Commands
This tip is part of the Decembricks 2024 series, which teaches a new tip every day of December.
Databricks notebooks offer a powerful and flexible environment for data exploration, analysis, and collaboration. One of the key features that make them such a productivity boost is the availability of built-in IPython magic commands. These magic commands allow you to interact with the notebook environment, control code execution, and extend functionality.
Summary
- Magic commands are special commands in Databricks notebooks that extend functionality
- Line magics start with `%` and operate on a single line
- Cell magics start with `%%` and operate on the entire cell
- You can list all available magics with `%lsmagic`
- Magics like `%time` and `%%timeit` are useful for timing code execution
- Magics like `%logstart`, `%logstop`, and `%logstate` help with logging notebook activities
- Magics like `%sh` and `%%perl` allow running shell scripts and code in other languages
Exploring Available Magics
To see a list of all available line and cell magic commands, you can use the `%lsmagic` command:

```
%lsmagic
```
This will output an extensive list of the built-in magic commands, separated into line magics and cell magics. Take some time to scan through the list and make note of any that seem particularly useful for your workflow.
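Because a line magic's result can be assigned to a variable, you can also capture this listing and inspect it from Python. A small, hedged sketch (the variable name is just for illustration):

```
# Capture the listing returned by %lsmagic in a regular Python variable
magic_listing = %lsmagic

# Printing the captured object shows the same listing as running %lsmagic directly
print(magic_listing)
```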
Some of the most commonly used magic commands include:
- `%matplotlib` - enable Matplotlib integration for inline plotting
- `%time` - time execution of a single statement
- `%%timeit` - time execution of the whole cell, averaging over multiple runs
- `%%capture` - capture the stdout/stderr of the cell (sketched below)
- `%%html` - render the cell as HTML
- `%%sql` - execute contents of cell as SQL in a temporary context
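As a quick illustration of one of these, here is a hedged sketch of `%%capture` in action (the variable name and printed text are placeholders):

```
%%capture captured_output
# Everything this cell prints is stored in captured_output instead of being displayed
print("This line is captured, not shown")
```

In a later cell you can replay or inspect whatever was captured:

```
# Replay the captured output, or read it as a plain string
captured_output.show()
print(captured_output.stdout)
```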
::: {.callout-note title="Familiar Magics" appearance="simple"}
You're likely already familiar with magics for common languages like `%python`, `%sql`, `%r`, and `%scala`. In this post, we'll focus on some lesser-known but equally powerful magics.
:::
Timing Code Execution
One of the most practical applications of magic commands is timing code execution to identify performance bottlenecks. The `%time` and `%%timeit` magics are your go-to tools for this.
::: {.callout-note title="Line vs Cell Magics" appearance="simple"}
Remember, line magics like `%time` operate on a single line, while cell magics like `%%timeit` operate on the entire contents of the cell.
:::
Using %time
Use `%time` to time a single line of code:
```
%time df = spark.sql("SELECT * FROM large_table")
```

```
CPU times: user 456 ms, sys: 123 ms, total: 579 ms
Wall time: 5.67 s
```
Using %%timeit
Use `%%timeit` to time execution of the entire cell, averaging over multiple runs for more robust timing:
```
%%timeit
df = spark.sql("SELECT * FROM large_table")
df.count()
```

```
2.31 s ± 135 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
By strategically using these timing magics, you can pinpoint which parts of your code are taking the longest and optimize accordingly. For example, you might discover a particular Spark SQL query that is taking much longer than expected, and then you can focus your efforts on improving that query.
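When a query is too expensive to repeat many times, `%%timeit` also lets you control the sampling with its standard `-n` (loops per run) and `-r` (number of runs) options. A hedged sketch with an illustrative table name:

```
%%timeit -n 1 -r 3
# Keep the benchmark short: 3 runs of 1 loop each
df = spark.sql("SELECT * FROM large_table")
df.count()
```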
Logging Notebook Activities
Databricks magic commands also provide handy tools for logging notebook activities. This can be extremely useful for tracking progress, debugging, and monitoring long-running jobs. The logs are written to a local file in your notebook's folder in your Databricks workspace, named `ipython_log.py` by default.
Key logging magics include:
- `%logstart` - start logging cell activities
- `%logstop` - stop logging cell activities
- `%logstate` - display current logging state
Using %logstart and %logstop
To start logging, simply use `%logstart` at the beginning of a cell:

```
%logstart
df = spark.sql("SELECT * FROM large_table")
df.count()
```
This will record the commands you run in the notebook (the input history) to the log file.
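If the default `ipython_log.py` location does not suit you, `%logstart` also accepts an optional filename and mode, plus flags such as `-o` to include output and `-t` to add timestamps. A hedged sketch with an illustrative path:

```
# Log input and output, with timestamps, to a custom file, appending across sessions
%logstart -o -t /tmp/notebook_activity.py append
```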
To stop logging, use `%logstop`:

```
%logstop
```
Checking Log State
To check the current state of logging, use `%logstate`:

```
%logstate
```

```
Logging is currently ON
Log file at: <log_path>.log
```
The contents of the log file look like this:
```
# IPython log file
get_ipython().run_line_magic('logstart', '')
df = spark.sql("SELECT * FROM default.my_test_delta_table")
df.count()
get_ipython().run_line_magic('logstate', '')
get_ipython().run_cell_magic('logstop', '', '')
```
By using these logging magics, you can keep detailed records of your notebook activities. You can also build on this idea and implement your own custom logging magic, as sketched below.
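Here is a minimal sketch of a custom line magic built with IPython's magic API; the magic name and log path are made up for illustration:

```
from datetime import datetime
from IPython.core.magic import register_line_magic

@register_line_magic
def activity_note(line):
    """Append a timestamped note to a simple activity log."""
    # Hypothetical log path; point this wherever suits your workspace
    with open("/tmp/notebook_activity_notes.log", "a") as f:
        f.write(f"{datetime.now().isoformat()} {line}\n")
```

Once the cell defining the magic has run, `%activity_note starting the nightly load` appends a timestamped line to that file.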
Running Shell Commands and Other Languages
Databricks magic commands also allow you to execute shell commands and code in languages beyond the notebook’s default. This can be extremely powerful for tasks like file system operations, system monitoring, and leveraging existing scripts.
Some useful magics for this include:
- `%sh` - execute shell commands
- `%%perl` - execute Perl code
For example, you can use `%sh` to run a simple bash command:

```
%sh
echo "Hello from bash!"
```
Or use `%%perl` to execute a Perl one-liner:

```
%%perl
print "Hello from Perl!\n";
```
These language-agnostic magics open up a world of possibilities for integrating Databricks notebooks into diverse workflows or migrating old code in a gradual manner.
Exploring Further
We’ve only scratched the surface of what’s possible with Databricks magic commands. To dive deeper, check out these resources: