Style#

This section demonstrates visualization of tabular data using the pandas df.style methods.

This notebook is mainly based on the official pandas styling user guide and demonstrates the interactive rendering capabilities.

Imports#

Importing ipypandas enables it globally.

[1]:

import numpy as np
import pandas as pd

[2]:

# enables ipypandas output
import ipypandas

Styler#

When globally enabled, ipypandas attaches custom formatters (using ipywidgets) to every pandas DataFrame and Styler object.

Formatting#

The styler distinguishes the display value from the actual value, both for data values and for index and column headers. To control the display value, we can use the df.style.* methods.

[3]:

df = pd.DataFrame({
     'strings': ['Adam', 'Mike'],
     'floats': [112.863602, 207.238541],
     'ints': [1, 3]
})

df.style \
  .format_index(str.upper, axis=1) \
  .format(precision=3, thousands='.', decimal=',') \
  .relabel_index(['row 1', 'row 2'], axis=0)

Specific rows or columns can be hidden from rendering by calling the df.style.hide method and passing in a row/column label, a list-like or a slice of row/column labels for the subset argument.

[4]:

df = pd.DataFrame(np.random.randn(5, 5))

df.style \
  .hide(subset=[0, 2, 4], axis=0) \
  .hide(subset=[0, 2, 4], axis=1)

Using the styler to change only the display is useful because the index and data values stay intact and remain available for other purposes.

Here is a more comprehensive example of using the formatting functions whilst still relying on the underlying data for indexing and calculations.

[5]:

df = pd.DataFrame(
     np.random.rand(7, 2) * 5,
     columns=['Tokyo', 'Beijing'],
     index=pd.date_range(start='2024-01-01', periods=7)
)

def format_index(v):
    return v.strftime('%A')

def format_values(v):
    if v < 1.75: return 'Dry'
    elif v < 2.75: return 'Rain'
    return 'Heavy Rain'

def format_conditions(s):
    s.format(format_values)
    s.format_index(format_index)
    s.background_gradient(axis=None, vmin=1, vmax=5, cmap='YlGnBu')
    s.set_caption('Weather Conditions')
    return s

df.style \
  .pipe(format_conditions)

Functions#

We can apply custom styling functions to add custom CSS styles.

For example, we can build a function that colors text if it is negative and chain it with a function that partially fades cells of negligible value. Since these functions look at each element in turn, we use df.style.map.

We can also build a function that highlights the maximum value in a column. Since this function needs to see the whole column at once, we use df.style.apply.

[6]:

df = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])

def style_negative(v, props):
    return props if v < 0 else None

def style_fading(v, props):
    return props if (v < 0.3) and (v > -0.3) else None

def style_maximum(v, props):
    return np.where(v == np.nanmax(v.values), props, '')

df.style \
  .map(style_negative, props='color:red;') \
  .map(style_fading, props='opacity:20%;') \
  .apply(style_maximum, props='color:white; background-color:green;', axis=0)
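
The difference between the two methods is the contract of the function they call. The sketch below calls the functions directly, outside of any styler, to show what each one receives and returns; the Series named column is illustrative. Note that in pandas releases before 2.1 the element-wise method was named applymap.

```python
import numpy as np
import pandas as pd

def style_negative(v, props):
    # element-wise (map): receives one scalar, returns one CSS string or None
    return props if v < 0 else None

def style_maximum(s, props):
    # column-wise (apply): receives a Series, returns one CSS string per element
    return np.where(s == np.nanmax(s.values), props, '')

column = pd.Series([0.2, -1.5, 3.0])

print(style_negative(-1.5, props='color:red;'))
print(style_maximum(column, props='background-color:green;'))
```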

Similarly, we can style headers element-wise by using df.style.map_index or level-wise by using df.style.apply_index.

[7]:

df = pd.DataFrame(
     np.random.randn(4, 4),
     pd.MultiIndex.from_product([['A', 'B'], ['r1', 'r2']]),
     columns=['c1', 'c2', 'c3', 'c4']
)

def style_index(v):
    return 'color:red;' if v in ['A', 'B'] else 'color:blue;'

def style_columns(v):
    return np.where(v.isin(['c1', 'c2']), 'color:green;', 'color:purple;')

df.style \
  .map_index(style_index, axis=0) \
  .apply_index(style_columns, axis=1)

Builtin#

Some styling functions are common enough that they are built into the styler, so you don’t have to write them yourself. The current list of such functions is:

.highlight_null: for use with identifying missing data.
.highlight_min, .highlight_max: for use with identifying extremities in data.
.highlight_between, .highlight_quantile: for use with identifying classes within data.
.text_gradient, .background_gradient: a flexible method for highlighting text and cells based on their own or other values.
.bar: to display mini-charts within cell backgrounds.

[8]:

df = pd.DataFrame(np.random.randn(4, 4), columns=['A', 'B', 'C', 'D'])

Highlight Null#

[9]:

df.iloc[0, 2] = np.nan
df.iloc[2, 3] = np.nan
df.style.highlight_null(color='orange')

Highlight Intervals#

[10]:

df.style.highlight_min(axis=1, props='color:white; font-weight:bold; background-color:red;')

[11]:

df.style.highlight_max(axis=1, props='color:white; font-weight:bold; background-color:green;')

[12]:

df.style.highlight_between(left=0.5, right=1.5, axis=1, props='color:white; background-color:purple;')

[13]:

df.style.highlight_quantile(q_left=0.85, axis=None, color='blue')
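
To get a feeling for which cells a call with q_left=0.85 and axis=None is expected to select, the threshold can be approximated by hand. This is only a sketch: it assumes the default linear interpolation and inclusive bounds, and the names values, threshold and mask are hypothetical.

```python
import numpy as np
import pandas as pd

values = pd.DataFrame({'A': [1.0, 2.0, np.nan], 'B': [4.0, 5.0, 6.0]})

# axis=None pools all cells of the frame; missing values are ignored
threshold = np.nanquantile(values.to_numpy(), 0.85)

# cells at or above the threshold fall into the upper quantile
mask = values >= threshold
print(threshold)
print(mask)
```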

Gradients#

[14]:

df.style.text_gradient(cmap='Oranges')

[15]:

df.style.background_gradient(cmap='Purples')

Charts#

[16]:

df.style.bar(subset=['A', 'B'], color='#D65F5F')

Limitations#

Certain styling functions may not work with non-unique indices.
Tooltips do not work as cell_ids are disabled by ipypandas.