August 13, 2020

Typed APIs in Python with dataclasses and NamedTuples

Why would Python programmers ever care about types? While Python doesn’t check any types statically (before running the program), it does perform extensive run-time type checking. Checking types at run-time without any implicit casts makes the language strongly-typed and dynamically-typed, as opposed to a language like C which is weakly-typed and statically-typed. This is an important distinction, but I won’t go over the differences between strong and weak typing in this post.

Newer versions of Python 3 have support for type annotations which gives the programmer some more information about types. Tools like mypy perform some basic static type checking. However, these static type-checkers are not all-powerful and sometimes it’s useful to provide some extra type-safety dynamically at run-time.

The API

Imagine you’re writing a Python script that uses a stock market API. The API provides a GET method called get_stocks which returns some JSON data containing information about three very specific stocks you’re interested in (this is important because we know exactly what data the API method will return and therefore can model it). This is a bit hand-wavy, but the actual API call doesn’t matter—we only care about the JSON return value.

import json
from pprint import pprint

def get_stocks() -> str:
    """
    API method returning some JSON data
    """

    return json.dumps(
        {
            "TSLA": {"price": "1000.00"},
            "AMZN": {"price": "3000.00"},
            "AAPL": {"price": "400.00"}
        }
    )


stock_data = get_stocks()
pprint(stock_data)
Python 3.8.5 (default, Jul 21 2020, 10:48:26)
[Clang 11.0.3 (clang-1103.0.32.62)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
('{"TSLA": {"price": "1000.00"}, "AMZN": {"price": "3000.00"}, "AAPL": '
 '{"price": "400.00"}}')
>>> python.el: native completion setup loaded

We’d usually consume this API by serializing the JSON string to a Python dict.

def get_tsla_price(stock_json_data: str) -> float:
    return float(json.loads(stock_json_data)["TSLA"]["price"])

print(get_tsla_price(stock_data))
1000.0

This is alright, but remembering that the price field is a string can get tedious. Let’s try and do better by defining the type of this JSON structure.

from typing import Dict

def stocks_to_dict(stock_json_data: str) -> Dict[str, Dict[str, float]]:
    return json.loads(stock_json_data)

pprint(stocks_to_dict(stock_data))
{'AAPL': {'price': '400.00'},
 'AMZN': {'price': '3000.00'},
 'TSLA': {'price': '1000.00'}}

Now a static type-checker like mypy can assume that stock_data["TSLA"]["price"] is a float.

What if the API changes, and the get_stocks method also includes the company name and the percent change (I’m not a stock market expert so this might not be the correct term) in each stock JSON object?

def get_stocks() -> str:
    """
    API method returning some JSON data
    """

    return json.dumps(
        {
            "TSLA": {
                "name": "Tesla, Inc.",
                "price": "1000.00",
                "percent_change": "+2.03%"
            },
            "AMZN": {
                "name": "Amazon.com, Inc.",
                "price": "3000.00",
                "percent_change": "-1.01%"
            },
            "AAPL": {
                "name": "Apple Inc.",
                "price": "400.00",
                "percent_change": "-1.51%"
            }
        }
    )

stock_data = get_stocks()

pprint(stock_data)
('{"TSLA": {"name": "Tesla, Inc.", "price": "1000.00", "percent_change": '
 '"+2.03%"}, "AMZN": {"name": "Amazon.com, Inc.", "price": "3000.00", '
 '"percent_change": "-1.01%"}, "AAPL": {"name": "Apple Inc.", "price": '
 '"400.00", "percent_change": "-1.51%"}}')

What does the type signature for the serialized dict even look like? We wouldn’t want to keep the percent change as a string because that would be painful to work with.

This is my best guess but it’s still not great.

from typing import Dict, Union


def stocks_to_dict(stock_json_data: str) -> Dict[str, Dict[str, Union[float, str]]]:
    return json.loads(stock_json_data)


pprint(stocks_to_dict(stock_data))
{'AAPL': {'name': 'Apple Inc.', 'percent_change': '-1.51%', 'price': '400.00'},
 'AMZN': {'name': 'Amazon.com, Inc.',
          'percent_change': '-1.01%',
          'price': '3000.00'},
 'TSLA': {'name': 'Tesla, Inc.',
          'percent_change': '+2.03%',
          'price': '1000.00'}}

Most static typecheckers for Python will not complain that this dict still doesn’t reflect the type of the function. Let’s add some type conversions:

from typing import Dict, Union


def stocks_to_dict(stock_json_data: str) -> Dict[str, Dict[str, Union[float, str]]]:
    stocks_dict = json.loads(stock_json_data)
    for symbol in stocks_dict.keys():
        stocks_dict[symbol]["price"] = float(stocks_dict[symbol]["price"])
    return stocks_dict


stocks_dict = stocks_to_dict(stock_data)
pprint(stocks_dict)
print(isinstance(stocks_dict["TSLA"]["price"], float))
{'AAPL': {'name': 'Apple Inc.', 'percent_change': '-1.51%', 'price': 400.0},
 'AMZN': {'name': 'Amazon.com, Inc.',
          'percent_change': '-1.01%',
          'price': 3000.0},
 'TSLA': {'name': 'Tesla, Inc.', 'percent_change': '+2.03%', 'price': 1000.0}}
True

Dynamically adding types

This works, but I’m lazy and don’t want to write a specialized x_to_dict function for every single API method. I want something like a dynamically type-safe C struct—a data-structure that automatically serializes a dict with the correct type conversions. Another benefit of this struct is that it provides some basic documentation for what kinds of fields the API returns and their types. Dictionaries are still great and definitely have their place in Python programs, but in my opinion, an object called Stocks is a lot more descriptive and amenable to refactoring than Dict[str, Dict[str, Union[float, str]]].

Here’s an example of some of the functionality that I want:

stocks = Stocks(**json.loads(stock_data))
print(stocks.TSLA)  # -> nice representation of the object
print(stocks.TSLA.price)  # -> 1000.0
print(stocks.TSLA.percent_change)  # -> 0.0203
print(stocks.AMZN.percent_change)  # -> -0.0101
print(stocks.AAPL.name)  # -> "Apple Inc."

Notice how the price and percent_change attributes will automatically get converted to floats.

Let’s take a stab at implementing this with a regular class:

def percent_to_float(percent: str) -> float:
    """
    Converts a percentage string to a float.

    e.g. percent_to_float("+1.01%") -> 0.0101
    e.g. percent_to_float("-22.22%") -> -0.2222
    """

    neg = -1 if percent[0] == "-" else 1
    return neg * float(percent[1:-1]) / 100


class Stocks:
   def __init__(self, *args, **kwargs):
       for symbol, info in kwargs.items():
           # e.g. sets self.TSLA to an empty object
           setattr(self, symbol, type("", (), {})())
           # e.g. sets self.TSLA.name to "Tesla, Inc."
           setattr(getattr(self, symbol), "name", info["name"])
           # e.g. sets self.TSLA.price to 1000.0
           setattr(getattr(self, symbol), "price", float(info["price"]))
           # # e.g. sets self.AMZN.percent_change to -0.0101
           setattr(getattr(self, symbol), "percent_change",
                   percent_to_float(info["percent_change"]))


stocks = Stocks(**json.loads(stock_data))
print(stocks.TSLA)  # -> nice representation of the object
print(stocks.TSLA.price)  # -> 1000.0
print(stocks.TSLA.percent_change)  # -> 0.0203
print(stocks.AMZN.percent_change)  # -> -0.0101
print(stocks.AAPL.name)  # -> "Apple Inc."
<__main__. object at 0x10cebe700>
1000.0
0.0203
-0.0101
Apple Inc.

This works pretty well! We’ve used simple metaprogramming to dynamically create class attributes at run-time, all with the correct types! The only problem is that we’d have to add a __repr__ method to each dynamically-created object to get a nice representation of stocks.TSLA when printed. Remember, I’m lazy so this is clearly too much work.

Type-safety with dataclasses

Remember that this is Python and there’s usually a simple answer to most problems in the standard library. Turns out that NamedTuples and dataclasses both do the trick.

from dataclasses import dataclass


@dataclass
class StockInfo:
    name: str
    price: float
    percent_change: float

    def __post_init__(self):
        self.price = float(self.price)
        self.percent_change = percent_to_float(self.percent_change)


print(StockInfo(**json.loads(stock_data)["TSLA"]))
StockInfo(name='Tesla, Inc.', price=1000.0, percent_change=0.0203)

That was easy! Now we can simplify the Stock class to use these StockInfo objects.

class Stocks:
   def __init__(self, *args, **kwargs):
       for symbol, info in kwargs.items():
           # e.g. sets self.TSLA to StockInfo object
           setattr(self, symbol, StockInfo(**info))


stocks = Stocks(**json.loads(stock_data))
print(stocks.TSLA)  # -> nice representation of the object
print(stocks.TSLA.price)  # -> 1000.0
print(stocks.TSLA.percent_change)  # -> 0.0203
print(stocks.AMZN.percent_change)  # -> -0.0101
print(stocks.AAPL.name)  # -> "Apple Inc."
StockInfo(name='Tesla, Inc.', price=1000.0, percent_change=0.0203)
1000.0
0.0203
-0.0101
Apple Inc.

As an added bonus, printing out stocks.TSLA gives us a nice representation of the StockInfo object, where before it would print out the raw Python object which isn’t that helpful (of course, it’s easy enough to add a __repr__ method but that’s too much work).

What happens if we try and update the stock?

stocks.TSLA.name = "SpaceX, Inc."
print(stocks.TSLA)
StockInfo(name='SpaceX, Inc.', price=1000.0, percent_change=0.0203)

This isn’t good. I want these objects to be immutable which will prevent a whole class of potential errors.

Turns out that dataclasses can be immutable with a quick modification to the decorator. That should do the trick?

@dataclass(frozen=True)
class StockInfo:
    name: str
    price: float
    percent_change: float

    def __post_init__(self):
        self.price = float(self.price)
        self.percent_change = percent_to_float(self.percent_change)


print(StockInfo(**json.loads(stock_data)["TSLA"]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/var/folders/9k/rrglbkg540qc7_jb7g6d9l8r0000gn/T/babel-g6I1LO/python-tQy25p", line 12, in <module>
    print(StockInfo(**json.loads(stock_data)["TSLA"]))
  File "<string>", line 6, in __init__
  File "/var/folders/9k/rrglbkg540qc7_jb7g6d9l8r0000gn/T/babel-g6I1LO/python-tQy25p", line 8, in __post_init__
    self.price = float(self.price)
  File "<string>", line 4, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'price'

Looks like the frozen property gets enforced immediately after the dataclass gets initialized, so there’s no way to change the class instance variables after they’re set.

There’s a workaround where you can use super().__setattr__ to bypass the restrictions on calling setattr directly because of the frozen property. (relevant StackOverflow post)

@dataclass(frozen=True)
class StockInfo:
    name: str
    price: float
    percent_change: float

    def __post_init__(self):
        super().__setattr__("price", float(self.price))
        super().__setattr__("percent_change", percent_to_float(self.percent_change))


stocks = Stocks(**json.loads(stock_data))
print(stocks.TSLA)

stocks.TSLA.name = "SpaceX, Inc."  # raises an error
StockInfo(name='Tesla, Inc.', price=1000.0, percent_change=0.0203)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/var/folders/9k/rrglbkg540qc7_jb7g6d9l8r0000gn/T/babel-g6I1LO/python-QVEOq2", line 15, in <module>
    stocks.TSLA.name = "SpaceX, Inc."  # raises an error
  File "<string>", line 4, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'name'

Looks like this is working properly.

Type-safety with NamedTuples

If you don’t want to use dataclasses, a NamedTuple works just as well. NamedTuples are immutable by default. We want to do the type conversions before the object is actually initialized using __new__ because once the NamedTuple is created, it’s immutable.

from typing import NamedTuple


class StockInfo(NamedTuple):
    name: str
    price: float
    percent_change: float

    def __new__(cls, *args, **kwargs):
        kwargs["price"] = float(kwargs["price"])
        kwargs["percent_change"] = percent_to_float(kwargs["percent_change"])
        return super().__new__(cls, *args, **kwargs)


print(StockInfo(**json.loads(stock_data)["TSLA"]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/var/folders/9k/rrglbkg540qc7_jb7g6d9l8r0000gn/T/babel-g6I1LO/python-lQpgrh", line 4, in <module>
    class StockInfo(NamedTuple):
  File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/typing.py", line 1638, in __new__
    raise AttributeError("Cannot overwrite NamedTuple attribute " + key)
AttributeError: Cannot overwrite NamedTuple attribute __new__

Turns out we can’t modify the __new__ method directly to convert the types, but it’s possible to hack around this via sub-classing.

from typing import NamedTuple


class _BaseStockInfo(NamedTuple):
    name: str
    price: float
    percent_change: float


class StockInfo(_BaseStockInfo):
    def __new__(cls, *args, **kwargs):
        kwargs["price"] = float(kwargs["price"])
        kwargs["percent_change"] = percent_to_float(kwargs["percent_change"])
        return super().__new__(cls, *args, **kwargs)


stocks = Stocks(**json.loads(stock_data))
print(stocks.TSLA)
stocks.TSLA.name = "SpaceX, Inc."  # raises an error
StockInfo(name='Tesla, Inc.', price=1000.0, percent_change=0.0203)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/var/folders/9k/rrglbkg540qc7_jb7g6d9l8r0000gn/T/babel-g6I1LO/python-ayaPng", line 19, in <module>
    stocks.TSLA.name = "SpaceX, Inc."  # raises an error
AttributeError: can't set attribute

Looks like it’s working properly.

Let’s just do a quick check to make sure everything works:

stocks = Stocks(**json.loads(stock_data))
print(stocks.TSLA.price)  # -> 1000.0
print(stocks.TSLA.percent_change)  # -> 0.0203
print(stocks.AMZN.percent_change)  # -> -0.0101
print(stocks.AAPL.name)  # -> "Apple Inc."
1000.0
0.0203
-0.0101
Apple Inc.

Now we have a nice strongly-typed wrapper object for our previously stringly-typed JSON data!

Dataclass vs NamedTuple

Unpacking

What if we want to unpack the StockInfo object for multiple-assignment?

This is easy with NamedTuples since they work just like regular tuples.

tsla = NTStockInfo(**json.loads(stock_data)["TSLA"])
print("TSLA values: ", *tsla, sep=" | ")
name, _, percent_change = tsla
print(f"percent change for {name} stock is {percent_change}")
TSLA values:  | Tesla, Inc. | 1000.0 | 0.0203
percent change for Tesla, Inc. stock is 0.0203

The same can’t be said for a dataclass.

tsla = DCStockInfo(**json.loads(stock_data)["TSLA"])
name, _, percent_change = tsla
print(f"percent change for {name} stock is {percent_change}")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/var/folders/9k/rrglbkg540qc7_jb7g6d9l8r0000gn/T/babel-g6I1LO/python-bbTkBK", line 2, in <module>
    name, _, percent_change = tsla
TypeError: cannot unpack non-iterable StockInfo object

We can work around this by using the dataclasses.astuple function, but it’s not as intuitive.

import dataclasses

tsla = DCStockInfo(**json.loads(stock_data)["TSLA"])
print("TSLA values: ", *dataclasses.astuple(tsla), sep=" | ")
name, _, percent_change = dataclasses.astuple(tsla)
print(f"percent change for {name} stock is {percent_change}")
TSLA values:  | Tesla, Inc. | 1000.0 | 0.0203
percent change for Tesla, Inc. stock is 0.0203

Serializing to JSON

Since we’re dealing with APIs, it’s useful to quickly be able to serialize an object to JSON with the correct types.

tsla = NTStockInfo(**json.loads(stock_data)["TSLA"])

# the _asdict() method converts a NamedTuple to a mapping type
pprint(json.dumps(tsla._asdict()))
'{"name": "Tesla, Inc.", "price": 1000.0, "percent_change": 0.0203}'
import dataclasses

tsla = DCStockInfo(**json.loads(stock_data)["TSLA"])
pprint(json.dumps(dataclasses.asdict(tsla)))
'{"name": "Tesla, Inc.", "price": 1000.0, "percent_change": 0.0203}'

Both approaches work equally well in this case.

Documentation

The dataclass implementation is, in my opinion, simpler to implement and has nicer built-in documentation via help(StockInfo).

Help on class StockInfo in module __main__:

class StockInfo(builtins.object)
 |  StockInfo(name: str, price: float, percent_change: float) -> None

Since our NamedTuple implementation is a sub-class, we have to scroll down a bit to find the attributes of the class in the help output, and the type annotations are hidden away as an OrderedDict in the _fields attribute.

 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from _BaseStockInfo:
 |
 |  name
 |      Alias for field number 0
 |
 |  price
 |      Alias for field number 1
 |
 |  percent_change
 |      Alias for field number 2
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes inherited from _BaseStockInfo:
 |
 |  __annotations__ = OrderedDict([('name', <class 'str'>), ('price', ... ...
 |
 |  _field_defaults = {}
 |
 |  _field_types = OrderedDict([('name', <class 'str'>),

© Samarth Kishor 2020

Powered by Hugo & Kiss.