Unlocking the Power of Databricks Magic Commands
This tip is part of the Decembricks 2024 series, which teaches a new tip every day of December.
Databricks notebooks offer a powerful and flexible environment for data exploration, analysis, and collaboration. One of the key features that make them such a productivity boost is the availability of built-in IPython magic commands. These magic commands allow you to interact with the notebook environment, control code execution, and extend functionality.
Summary
- Magic commands are special commands in Databricks notebooks that extend functionality
- Line magics start with `%` and operate on a single line
- Cell magics start with `%%` and operate on the entire cell
- You can list all available magics with `%lsmagic`
- Magics like `%time` and `%%timeit` are useful for timing code execution
- Magics like `%logstart`, `%logstop`, and `%logstate` help with logging notebook activities
- Magics like `%sh` and `%%perl` allow running shell scripts and code in other languages
Exploring Available Magics
To see a list of all available line and cell magic commands, you can use the `%lsmagic` command:

```
%lsmagic
```
This will output an extensive list of the built-in magic commands, separated into line magics and cell magics. Take some time to scan through the list and make note of any that seem particularly useful for your workflow.
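Because a line magic's result can be assigned to a variable, you can also capture this listing and inspect it from Python. A small, hedged sketch (the variable name is just for illustration):

```
# Capture the listing returned by %lsmagic in a regular Python variable
magic_listing = %lsmagic

# Printing the captured object shows the same listing as running %lsmagic directly
print(magic_listing)
```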
Some of the most commonly used magic commands include:
- `%matplotlib` - enable Matplotlib integration for inline plotting
- `%time` - time execution of a single statement
- `%%timeit` - time execution of the whole cell, averaging over multiple runs
- `%%capture` - capture the stdout/stderr of the cell (sketched below)
- `%%html` - render the cell as HTML
- `%%sql` - execute contents of cell as SQL in a temporary context
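As a quick illustration of one of these, here is a hedged sketch of `%%capture` in action (the variable name and printed text are placeholders):

```
%%capture captured_output
# Everything this cell prints is stored in captured_output instead of being displayed
print("This line is captured, not shown")
```

In a later cell you can replay or inspect whatever was captured:

```
# Replay the captured output, or read it as a plain string
captured_output.show()
print(captured_output.stdout)
```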
::: {.callout-note title="Familiar Magics" appearance="simple"}
You're likely already familiar with magics for common languages like `%python`, `%sql`, `%r`, and `%scala`. In this post, we'll focus on some lesser-known but equally powerful magics.
:::
Timing Code Execution
One of the most practical applications of magic commands is timing code execution to identify performance bottlenecks. The `%time` and `%%timeit` magics are your go-to tools for this.
::: {.callout-note title="Line vs Cell Magics" appearance="simple"}
Remember, line magics like `%time` operate on a single line, while cell magics like `%%timeit` operate on the entire contents of the cell.
:::
Using %time
Use `%time` to time a single line of code:
```
%time df = spark.sql("SELECT * FROM large_table")
```

```
CPU times: user 456 ms, sys: 123 ms, total: 579 ms
Wall time: 5.67 s
```
Using %%timeit
Use `%%timeit` to time execution of the entire cell, averaging over multiple runs for more robust timing:
```
%%timeit
df = spark.sql("SELECT * FROM large_table")
df.count()
```

```
2.31 s ± 135 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
By strategically using these timing magics, you can pinpoint which parts of your code are taking the longest and optimize accordingly. For example, you might discover a particular Spark SQL query that is taking much longer than expected, and then you can focus your efforts on improving that query.
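When a query is too expensive to repeat many times, `%%timeit` also lets you control the sampling with its standard `-n` (loops per run) and `-r` (number of runs) options. A hedged sketch with an illustrative table name:

```
%%timeit -n 1 -r 3
# Keep the benchmark short: 3 runs of 1 loop each
df = spark.sql("SELECT * FROM large_table")
df.count()
```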
Logging Notebook Activities
Databricks magic commands also provide handy tools for logging notebook activities. This can be extremely useful for tracking progress, debugging, and monitoring long-running jobs. The logs are written to a local file in your notebook's folder in your Databricks workspace, named `ipython_log.py` by default.
Key logging magics include:
- `%logstart` - start logging cell activities
- `%logstop` - stop logging cell activities
- `%logstate` - display current logging state
Using %logstart and %logstop
To start logging, simply use `%logstart` at the beginning of a cell:

```
%logstart
df = spark.sql("SELECT * FROM large_table")
df.count()
```
This will record the commands you run in the notebook (the input history) to the log file.
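If the default `ipython_log.py` location does not suit you, `%logstart` also accepts an optional filename and mode, plus flags such as `-o` to include output and `-t` to add timestamps. A hedged sketch with an illustrative path:

```
# Log input and output, with timestamps, to a custom file, appending across sessions
%logstart -o -t /tmp/notebook_activity.py append
```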
To stop logging, use `%logstop`:

```
%logstop
```
Checking Log State
To check the current state of logging, use `%logstate`:

```
%logstate
```

```
Logging is currently ON
Log file at: <log_path>.log
```
The contents of the log file look like this:
```
# IPython log file
get_ipython().run_line_magic('logstart', '')
df = spark.sql("SELECT * FROM default.my_test_delta_table")
df.count()
get_ipython().run_line_magic('logstate', '')
get_ipython().run_cell_magic('logstop', '', '')
```
By using these logging magics, you can keep detailed records of your notebook activities. You can also build on this idea and implement your own custom logging magic, as sketched below.
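Here is a minimal sketch of a custom line magic built with IPython's magic API; the magic name and log path are made up for illustration:

```
from datetime import datetime
from IPython.core.magic import register_line_magic

@register_line_magic
def activity_note(line):
    """Append a timestamped note to a simple activity log."""
    # Hypothetical log path; point this wherever suits your workspace
    with open("/tmp/notebook_activity_notes.log", "a") as f:
        f.write(f"{datetime.now().isoformat()} {line}\n")
```

Once the cell defining the magic has run, `%activity_note starting the nightly load` appends a timestamped line to that file.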
Running Shell Commands and Other Languages
Databricks magic commands also allow you to execute shell commands and code in languages beyond the notebook’s default. This can be extremely powerful for tasks like file system operations, system monitoring, and leveraging existing scripts.
Some useful magics for this include:
- `%sh` - execute shell commands
- `%%perl` - execute Perl code
For example, you can use `%sh` to run a simple bash command:

```
%sh
echo "Hello from bash!"
```
Or use `%%perl` to execute a Perl one-liner:

```
%%perl
print "Hello from Perl!\n";
```
These language-agnostic magics open up a world of possibilities for integrating Databricks notebooks into diverse workflows or migrating old code in a gradual manner.
Exploring Further
We’ve only scratched the surface of what’s possible with Databricks magic commands. To dive deeper, check out these resources: