The Problem
Recently I have been working on a project that uses Pydantic for data validation and SQLAlchemy
(specifically, SQLModel) for database operations. My database model includes a JSON column, to
which I was writing a Pydantic model via `model.model_dump(mode="json")`, e.g.
```python
import uuid

from pydantic import BaseModel
from sqlalchemy import Column
from sqlalchemy.dialects.postgresql import JSONB
from sqlmodel import Field, SQLModel


class Batch(SQLModel, table=True):
    id: uuid.UUID = Field(primary_key=True, default_factory=uuid.uuid4)
    rules: dict = Field(sa_column=Column(JSONB))


class BatchIngestionRules(BaseModel):
    source_id: uuid.UUID
    ...


class BatchRules(BaseModel):
    ingestion: BatchIngestionRules
    ...


batch = Batch(
    rules=BatchRules(
        ingestion=BatchIngestionRules(source_id="...")
    ).model_dump(mode="json")
)
session.add(batch)
session.commit()
```
and then reading it back out with
```python
batch = session.get(Batch, "...")
rules = BatchRules.model_validate(batch.rules)
```
While this approach works, it forces us to dump the model on every write and re-validate it on every read: boilerplate at every database interaction.
The Solution
I found the following snippet of code in the database utilities of prefecthq/prefect:
```python
from typing import Optional

import pydantic
from sqlalchemy import JSON
from sqlalchemy.types import TypeDecorator


class Pydantic(TypeDecorator):
    """
    A pydantic type that converts inserted parameters to
    json and converts read values to the pydantic type.
    """

    impl = JSON
    cache_ok = True

    def __init__(self, pydantic_type, sa_column_type=None):
        super().__init__()
        self._pydantic_type = pydantic_type
        if sa_column_type is not None:
            self.impl = sa_column_type

    def process_bind_param(self, value, dialect) -> Optional[str]:
        if value is None:
            return None

        # parse the value to ensure it complies with the schema
        # (this will raise validation errors if not)
        adapter = pydantic.TypeAdapter(self._pydantic_type)
        value = adapter.validate_python(value)

        # sqlalchemy requires the bind parameter's value to be a python-native
        # collection of JSON-compatible objects. we achieve that by dumping the
        # value to a json string using the pydantic JSON encoder and re-parsing
        # it into a python-native form.
        return adapter.dump_python(value, mode="json")

    def process_result_value(self, value, dialect):
        if value is not None:
            # load the json object into a fully hydrated typed object
            return pydantic.TypeAdapter(self._pydantic_type).validate_python(value)
```
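To see what this does at the boundary, here is a minimal sketch (no database required) that calls the two hooks directly. It reuses the `Batch*` models from earlier and assumes `BatchIngestionRules` only requires the `source_id` field shown above:

```python
decorator = Pydantic(BatchRules)

rules = BatchRules(ingestion=BatchIngestionRules(source_id=uuid.uuid4()))

# on write: the value is validated, then dumped to JSON-compatible python types
# (the hooks above never use the dialect argument, so None is fine here)
bound = decorator.process_bind_param(rules, dialect=None)
assert isinstance(bound, dict)
assert isinstance(bound["ingestion"]["source_id"], str)  # UUID became a string

# on read: the raw JSON dict is re-hydrated into the pydantic model
loaded = decorator.process_result_value(bound, dialect=None)
assert isinstance(loaded, BatchRules)
assert loaded == rules  # lossless round trip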
This type can be used to annotate the JSON column in the database model, and SQLAlchemy will then convert automatically between the JSON representation in the database and the Pydantic model on every read and write.
```python
class Batch(SQLModel, table=True):
    id: uuid.UUID = Field(primary_key=True, default_factory=uuid.uuid4)
    rules: BatchRules = Field(sa_column=Column(Pydantic(BatchRules)))


batch = Batch(rules=BatchRules(ingestion=BatchIngestionRules(source_id="...")))
session.add(batch)
session.commit()

batch_ = session.get(Batch, "...")
rules = batch_.rules
assert isinstance(rules, BatchRules)
```
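Because the decorator goes through `pydantic.TypeAdapter` rather than calling `BaseModel` methods directly, it is not limited to single models, and the `sa_column_type` argument lets you swap the underlying column type. A sketch of both, with a hypothetical `BatchHistory` table that stores a list of rule sets in a Postgres `JSONB` column:

```python
class BatchHistory(SQLModel, table=True):
    id: uuid.UUID = Field(primary_key=True, default_factory=uuid.uuid4)
    # a list of models, backed by JSONB instead of the generic JSON type
    revisions: list[BatchRules] = Field(
        sa_column=Column(Pydantic(list[BatchRules], sa_column_type=JSONB))
    )
```

Anything a `TypeAdapter` can validate and dump should work the same way.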
Conclusion
Using Pydantic's TypeAdapter together with SQLAlchemy's TypeDecorator, we can load and save Pydantic models directly from SQLAlchemy JSON columns, without manually validating/dumping the model at every database interaction.