Using Pydantic with SQLAlchemy JSON columns

The Problem

Recently I have been working on a project that uses Pydantic for data validation and SQLAlchemy (specifically, SQLModel) for database operations. The database model includes a JSON column to which I was writing a Pydantic model via model_dump(mode="json"), e.g.

import uuid

from pydantic import BaseModel
from sqlalchemy import Column
from sqlalchemy.dialects.postgresql import JSONB
from sqlmodel import Field, SQLModel

class Batch(SQLModel, table=True):
    id: uuid.UUID = Field(primary_key=True, default_factory=uuid.uuid4)
    rules: dict = Field(sa_column=Column(JSONB))

class BatchIngestionRules(BaseModel):
    source_id: uuid.UUID 
    ... 

class BatchRules(BaseModel):
    ingestion: BatchIngestionRules
    ... 

batch = Batch(
    rules=BatchRules(
        ingestion=BatchIngestionRules(source_id="...")
    ).model_dump(mode="json")
)

session.add(batch)
session.commit()

and then reading it back out with

batch = session.get(Batch, "...")  # fetch by primary key
rules = BatchRules.model_validate(batch.rules)

While this approach works, it requires boilerplate at every database interaction: each write has to dump the model, and each read has to validate it again. This is suboptimal.

The Solution

I found the following snippet of code in the database utilities of prefecthq/prefect:

from typing import Any, Optional

import pydantic
from sqlalchemy.types import JSON, TypeDecorator

class Pydantic(TypeDecorator):
    """
    A pydantic type that converts inserted parameters to
    json and converts read values to the pydantic type.
    """

    impl = JSON
    cache_ok = True

    def __init__(self, pydantic_type, sa_column_type=None):
        super().__init__()
        self._pydantic_type = pydantic_type
        if sa_column_type is not None:
            self.impl = sa_column_type

    def process_bind_param(self, value, dialect) -> Optional[Any]:
        if value is None:
            return None

        # parse the value to ensure it complies with the schema
        # (this will raise validation errors if not)
        adapter = pydantic.TypeAdapter(self._pydantic_type)
        value = adapter.validate_python(value)

        # sqlalchemy requires the bind parameter's value to be a python-native
        # collection of JSON-compatible objects. dumping the value with
        # mode="json" produces exactly that (UUIDs become strings, etc.)
        # without an intermediate json string.
        return adapter.dump_python(value, mode="json")

    def process_result_value(self, value, dialect):
        if value is not None:
            # load the json object into a fully hydrated typed object
            return pydantic.TypeAdapter(self._pydantic_type).validate_python(value)

This class can be used as the column type for the JSON column in the database model; SQLAlchemy will then convert between the stored JSON representation and the Pydantic model automatically on every read and write.

class Batch(SQLModel, table=True):
    id: uuid.UUID = Field(primary_key=True, default_factory=uuid.uuid4)
    rules: BatchRules = Field(sa_column=Column(Pydantic(BatchRules)))

batch = Batch(rules=BatchRules(ingestion=BatchIngestionRules(source_id="...")))

session.add(batch)
session.commit()

batch_ = session.get(Batch, "...")  # fetch by primary key
rules = batch_.rules
assert isinstance(rules, BatchRules)
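
A nice side effect, called out in the snippet's docstring, is that writes are validated too: process_bind_param runs whatever you assign through validate_python, so a payload that does not match the schema never reaches the database. A rough sketch of what that looks like (SQLAlchemy may surface the pydantic error wrapped in a StatementError, so the sketch catches both):

import pydantic
from sqlalchemy.exc import StatementError

bad_batch = Batch(rules={"ingestion": {"source_id": "not-a-uuid"}})
session.add(bad_batch)

try:
    session.commit()
except (pydantic.ValidationError, StatementError):
    # validation failed in process_bind_param; nothing was written
    session.rollback()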

Conclusion

Using the Pydantic TypeAdapter and SQLAlchemy TypeDecorator, we can directly load/save Pydantic models to SQLAlchemy JSON columns without having to manually validate/dump the model every time.
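
The snippet's optional sa_column_type argument also lets you swap the underlying column type, which is handy if you want to keep the native Postgres JSONB column from the original model rather than the generic JSON impl. A minimal sketch, assuming a Postgres database and that sa_column_type is given a type instance:

from sqlalchemy.dialects.postgresql import JSONB

class Batch(SQLModel, table=True):
    id: uuid.UUID = Field(primary_key=True, default_factory=uuid.uuid4)
    # JSONB() replaces the default JSON impl; reads and writes still go
    # through the Pydantic TypeDecorator above
    rules: BatchRules = Field(sa_column=Column(Pydantic(BatchRules, sa_column_type=JSONB())))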