This blog post contains a summary of the Udemy course on Pydantic created by Dr. Fred Baptiste

Prerequisite - Type Hinting

  • Type hinting is useful for documentation
  • Sphinx can use the type hinting to generate documentation
  • dataclasses are code generators. Use a decorator and type hinting to create classes
  • Pydantic uses type hinting for runtime validation, serialization, and deserialization
  • Pydantic is foundational in FastAPI
  • One can specify the type hints for arguments in a function.
  • One can specify the type hints for return types
  • Union helps one specify multiple types for an argument
  • Some useful classes are (a short sketch using a few of these follows the list)
    • Optional
    • Sequence
    • Any
    • Set
    • Dict
    • Callable
    • Iterator
    • Iterable
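A quick sketch of these hints in action (the function names here are made up for illustration):

from typing import Callable, Iterable, Optional, Union

def scale(values: Iterable[int], factor: Union[int, float]) -> list[float]:
    # accepts any iterable of ints; Union allows an int or float factor
    return [v * factor for v in values]

def apply_or_default(fn: Callable[[int], int], x: int, default: Optional[int] = None) -> int:
    # Optional[int] means default may be an int or None
    return fn(x) if default is None else default

print(scale([1, 2, 3], 2.0))
print(apply_or_default(lambda v: v * 10, 4))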

Basics

Introduction

  • Pydantic Models vs Data Classes
    • actual implementation is very different
    • @dataclass is a code generator. it takes your class definition and creates a brand new class with all the functionality
    • Pydantic uses inheritance to add functionality to standard Python classes
      • Just like custom classes gain a lot of functionality by inheriting from the object class
    • For simple stuff, they both look similar because both use Python type hinting as a way for us to add metadata to attributes

Creating a Pydantic Model

  • Any class that inherits from BaseModel is a Pydantic model
  • Usually one creates the attributes of the class and provides each attribute a type hint
  • As an API provider, you would want to inspect the input and throw an exception if the user has not sent the right input to the API
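A minimal sketch of a model and the validation error raised on bad input (the Person model is hypothetical):

from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    first_name: str
    age: int

try:
    Person(first_name="rk", age="not a number")
except ValidationError as ex:
    print(ex)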

Deserialization

  • Deserialization is the act of taking data to create and populate a new model instance
  • One can load a dictionary using ** unpacking, but it is not recommended; it is better to use model_validate
  • The other way to deserialize data is model_validate_json. This is very beneficial when working with REST APIs, where requests and responses are typically in JSON format
  • There are three ways to deserialize data in to a model instance
    • pass each field as an argument
    • create a dict and use model_validate
    • create a json string and use model_validate_json
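All three approaches, sketched with a hypothetical Person model:

from pydantic import BaseModel

class Person(BaseModel):
    first_name: str
    age: int

p1 = Person(first_name="rk", age=30)                                # fields as arguments
p2 = Person.model_validate({"first_name": "rk", "age": 30})         # from a dict
p3 = Person.model_validate_json('{"first_name": "rk", "age": 30}')  # from a JSON string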

Serialization

  • Takes a Pydantic model and transforms it into something else
  • Pydantic models are regular classes and their instances are regular Python instances, so one can retrieve all the relevant attributes as a dict
  • To serialize the model, use model_dump to create a dict, or model_dump_json to create a JSON string
  • Under the hood, Pydantic v2 performs JSON serialization in its Rust-based pydantic-core engine rather than with dumps() from the json module
  • When serializing a Pydantic model, one has the flexibility to choose which fields to include or exclude
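A sketch of both serializers, including field selection:

from pydantic import BaseModel

class Person(BaseModel):
    first_name: str
    last_name: str
    age: int

p = Person(first_name="r", last_name="k", age=30)
print(p.model_dump())                      # the full dict
print(p.model_dump(exclude={"age"}))       # {'first_name': 'r', 'last_name': 'k'}
print(p.model_dump_json(include={"age"}))  # {"age":30}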

Type Coercion

  • Pydantic will transform the input data into the correct type
  • You can control this transformation through the strict/lax validation settings
  • By default, coercion is termed lax: Pydantic attempts a variety of type coercions
  • If you are an API consumer, it is safer to have Pydantic validation set to strict mode, as you never know when the API changes. It is better to ensure that your code raises a ValidationError whenever such an issue arises
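A sketch of lax coercion versus strict validation, using the strict flag of model_validate:

from pydantic import BaseModel, ValidationError

class Model(BaseModel):
    x: int

print(Model.model_validate({"x": "10"}))            # lax mode coerces "10" to 10

try:
    Model.model_validate({"x": "10"}, strict=True)  # strict mode refuses the coercion
except ValidationError as ex:
    print(ex)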

Required vs Optional Fields

  • Default values can be provided in the class definition
  • By default, Pydantic does not validate the default values provided
  • An interesting thing I learned after playing around with the code is that if you give an incorrect default value, Pydantic accepts it as-is, mismatched type and all
from pydantic import BaseModel, ValidationError, Field

class Circle(BaseModel):
    center: tuple[int, int] = 2   # incorrect default: not a tuple of ints
    radius: int

data = {"radius": 1}
p = Circle.model_validate(data)
print(p.center)

2

As one can see, even though we have specified an incorrect default, it works: p.center is simply 2. Pydantic will not catch this on its own (the validate_default config option, covered later, addresses it)

  • When we define a default for a function argument, the default is evaluated and stored once, when the function definition is executed, not at call time. Every call picks up the same stored default, so if you mutate the default value, the mutation persists for subsequent calls
def crap_function(x=[]):
    x.append(2)
    return x

print(crap_function())
print(crap_function())
print(crap_function())

[2]
[2, 2]
[2, 2, 2]

Unlike Python's default arguments, Pydantic handles mutable defaults in a special way and makes a deep copy of the default for each instance, so no list is shared between two model instances
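A sketch demonstrating that instances do not share the mutable default:

from pydantic import BaseModel

class Model(BaseModel):
    items: list[int] = []

m1 = Model()
m2 = Model()
m1.items.append(1)
print(m1.items)  # [1]
print(m2.items)  # [] - m2 received its own copy of the default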

Nullable Fields

  • A nullable field will accept either the actual field type or None
  • An optional field simply means that the data being deserialized does not need to contain the key for that specific field
  • The nullability of a field has nothing to do with whether it is optional or not; it just indicates whether the field can be set to None
from pydantic import BaseModel
from typing import Union, Optional

class Model(BaseModel):
    field_1: int | None = None
    field_2: Union[int, None] = None
    field_3: Optional[int] = None

print(Model.model_fields)

{'field_1': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'field_2': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'field_3': FieldInfo(annotation=Union[int, NoneType], required=False, default=None)}

Combining Nullable and Optional

  • There are four ways to combine both these features
    • Required and Not Nullable
    • Required and Nullable
    • Optional and Not Nullable
    • Optional and Nullable
from pydantic import BaseModel

class Coordinate_1(BaseModel):
    x: int                  # required, not nullable

class Coordinate_2(BaseModel):
    x: int | None           # required, nullable

class Coordinate_3(BaseModel):
    x: int = None           # optional, not nullable

class Coordinate_4(BaseModel):
    x: int | None = None    # optional, nullable

Each of the above class definitions captures one combination of the required and nullable behaviors

Inspecting Fields

  • model_fields_set gives the set of fields that were explicitly set when the instance was created from input data. Here is an example of where it might be useful: you only want to serialize the attributes that were explicitly set
from pydantic import BaseModel

class Model(BaseModel):
    x1: int
    x2: int | None
    x3: int = 1
    x4: int | None = 3

m1 = Model(x1=2, x2=2)
m2 = Model(x1=2, x2=None)
m3 = Model(x1=2, x2=2, x3=12)
m4 = Model(x1=2, x2=None, x4=12)

print([x.model_dump(include=x.model_fields_set) for x in [m1, m2, m3, m4]])

[{'x1': 2, 'x2': 2}, {'x1': 2, 'x2': None}, {'x1': 2, 'x2': 2, 'x3': 12}, {'x1': 2, 'x2': None, 'x4': 12}]

JSON Schema Generation

  • If you are developing using FastAPI, it takes care of integrating with Pydantic
  • model_json_schema creates a JSON schema for the model
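A minimal sketch (the Person model is hypothetical):

from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int = 0

print(Person.model_json_schema())
# {'properties': {'name': {...}, 'age': {...}}, 'required': ['name'], 'title': 'Person', 'type': 'object'}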

Model Configuration

Introduction

  • Pydantic has certain default model behaviors
    • Extra fields provided in the data being deserialized are ignored
    • type coercion
    • default values do not get validated
  • Pydantic provides us a way to modify, or override, this default behavior
    • use a special object ConfigDict
    • attach this object to any model by setting a special class attribute

Handling Extra fields

  • If there are additional fields in the input data, they are generally ignored. One can use ConfigDict to configure a specific behavior for extra fields

    from pydantic import BaseModel
    from pydantic import ConfigDict
    from pydantic import ValidationError

    class Model(BaseModel):
        model_config = ConfigDict(extra='allow')
        field_1: int

    a = Model(field_1=2, field_2=10)
    print(dict(a))
    print(a.model_extra)

    {'field_1': 2, 'field_2': 10}
    {'field_2': 10}
    
  • one use case is that you want to validate a specific set of fields while still retaining the rest of the fields unvalidated

Strict vs Lax Type coercion

  • A list is coerced to a tuple based on the model fields
  • A tuple is coerced to a list based on the model fields
  • You can set the model itself to strict or lax mode
  • JSON understands the following data types
    • boolean: true or false
    • number: float or integer does not actually matter
    • string
    • a null value, represented by the literal null
    • an array (square brackets, comma-separated values of any JSON type)
    • object: a dictionary, delimited by curly braces, with key-value pairs
  • When you are deserializing JSON, which has this limited set of data types, Pydantic can still deserialize individual fields into richer Python types
  • Setting strict=True in ConfigDict does not affect the conversion from JSON arrays to tuples, since JSON has no tuple type
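A sketch of both behaviors, assuming a model-wide strict setting:

from pydantic import BaseModel, ConfigDict, ValidationError

class StrictModel(BaseModel):
    model_config = ConfigDict(strict=True)
    x: int
    coords: tuple[int, int]

try:
    StrictModel(x="10", coords=(1, 2))  # str -> int is refused in strict mode
except ValidationError as ex:
    print(ex)

# JSON has no tuple type, so a JSON array still populates the tuple field
print(StrictModel.model_validate_json('{"x": 10, "coords": [1, 2]}'))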

Validating Default values

  • At the model level, you can enable validation of default values. It is useful when validators transform the data
  • We want the default value to go through the same transformations, hence it is useful to run validations on the default value
  • model_config = ConfigDict(validate_default = True) can be used to validate default value
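A sketch: with validate_default enabled, the bad default from the earlier Circle example is caught at instantiation:

from pydantic import BaseModel, ConfigDict, ValidationError

class Model(BaseModel):
    model_config = ConfigDict(validate_default=True)
    center: tuple[int, int] = 2  # bad default, now validated

try:
    Model()
except ValidationError as ex:
    print(ex)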

Validating Assignment

  • When we define a model and instantiate it, Pydantic validates the input. However, one can later assign a value of another type to a field, and by default Pydantic does not raise any error
  • The reason this is the default behavior is that the developer controls the values being set
  • model_config = ConfigDict(validate_assignment=True) can be used to ensure that one cannot set fields of a model instance to any type other than the one specified in the model
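A minimal sketch:

from pydantic import BaseModel, ConfigDict, ValidationError

class Model(BaseModel):
    model_config = ConfigDict(validate_assignment=True)
    x: int

m = Model(x=1)
try:
    m.x = "abc"  # assignment is now validated too
except ValidationError as ex:
    print(ex)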

Mutability

  • We can control whether the model is mutable at the model level. We can instruct Pydantic to make the model immutable, meaning assignment to any field of an instance is not allowed
  • A side effect of specifying a Pydantic model as frozen is that instances become hashable, so you can use them as keys in a dictionary
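A sketch of a frozen model used as a dictionary key:

from pydantic import BaseModel, ConfigDict, ValidationError

class Point(BaseModel):
    model_config = ConfigDict(frozen=True)
    x: int
    y: int

p = Point(x=1, y=2)
try:
    p.x = 10  # mutation raises a ValidationError
except ValidationError as ex:
    print(ex)

lookup = {p: "first point"}  # frozen instances are hashable
print(lookup[p])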

Coercing Numbers to Strings

  • There is an option in ConfigDict that can be used to convert numeric input to strings: coerce_numbers_to_str
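A minimal sketch:

from pydantic import BaseModel, ConfigDict

class Model(BaseModel):
    model_config = ConfigDict(coerce_numbers_to_str=True)
    code: str

print(Model(code=123))  # code='123'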

Standardizing Strings

  • There are options to standardize string input, such as converting to lower case, converting to upper case, stripping whitespace, or specifying minimum and maximum lengths for all strings
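A sketch using a couple of these options:

from pydantic import BaseModel, ConfigDict

class Model(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True, str_to_lower=True)
    name: str

print(Model(name="  RaMeSh  "))  # name='ramesh'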

Enum values

  • If, for whatever reason, you want the model dump to contain the relevant enum value rather than the Python enum object, use_enum_values ensures that
  • The reasons, as far as my experience goes, are very limited. It usually has to do with trying to serialize the result of a model_dump() to JSON ourselves
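A minimal sketch:

from enum import Enum
from pydantic import BaseModel, ConfigDict

class Color(Enum):
    RED = "red"

class Model(BaseModel):
    model_config = ConfigDict(use_enum_values=True)
    color: Color

print(Model(color=Color.RED).model_dump())  # {'color': 'red'} rather than the enum object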

All config options

There are 49 options listed as of Pydantic v2.11.7

Config Option Description
title Model title used in JSON schema.
model_title_generator Function to generate a model’s schema title.
field_title_generator Function to generate field titles for JSON schema.
str_to_lower Convert string inputs to lowercase.
str_to_upper Convert string inputs to uppercase.
str_strip_whitespace Strip leading/trailing whitespace from strings.
str_min_length Minimum allowed string length globally.
str_max_length Maximum allowed string length globally.
extra How to treat extra fields: ‘ignore’, ‘allow’, or ‘forbid’.
frozen Make model instances immutable.
populate_by_name Allow field population using field names even when alias is defined.
use_enum_values Serialize enums using their `.value` instead of instance.
validate_assignment Revalidate fields on assignment.
arbitrary_types_allowed Allow arbitrary (non-annotated) types in model fields.
from_attributes Allow construction from object attributes.
loc_by_alias Use field alias in validation errors.
alias_generator Function to generate field aliases.
ignored_types Types to ignore during validation/schema creation.
allow_inf_nan Allow `inf`, `-inf`, and `nan` for float/decimal fields.
json_schema_extra Extra fields or callable for enriching JSON schema.
json_encoders Custom JSON encoders (deprecated in v2).
strict Enable strict validation across the model.
revalidate_instances Revalidate input if already model instance.
ser_json_timedelta Serialize `timedelta` as 'float' or 'iso8601'.
ser_json_bytes Serialize `bytes` as 'utf8' or 'base64'.
val_json_bytes Deserialize bytes from 'utf8' or 'base64'.
ser_json_inf_nan Serialize `inf`, `-inf`, `nan` as float or raise error.
validate_default Validate default field values on instantiation.
validate_return Validate return values from validators.
protected_namespaces Reserved field name prefixes (by default 'model_') that cannot be used.
hide_input_in_errors Whether to include input data in validation errors.
defer_build Postpone validator building until needed.
plugin_settings Dictionary for plugin-specific config.
schema_generator Override schema generator class.
json_schema_serialization_defaults_required Include `default` in schema only if required.
json_schema_mode_override Override schema generation mode (e.g., `serialization`).
coerce_numbers_to_str Convert numeric input to string when required.
regex_engine Choose regex engine (e.g., `python`, `ecmascript`).
validation_error_cause Include underlying cause in error details.
use_attribute_docstrings Use field docstrings as descriptions.
cache_strings Enable caching of string coercions.
validate_by_alias Allow validation using field aliases.
validate_by_name Allow validation using original field names.
serialize_by_alias Use alias for output serialization.
with_config Decorator to apply config to function-based models.

Field Aliasing, Serialization and Deserialization

Introduction

  • We may need multiple names for the same field, since different contexts reference it differently
    • Python follows snake_case
    • JSON follows camelCase
  • We may have a class with snake_case field names but want to serialize them into field names that are camelCase
  • Suppose JSON data contains a field named id; we can create a class with a field id, but linters will complain, since id shadows a Python built-in
  • The usual workaround is to append an underscore, such as id_, type_
  • Pydantic has a concept of field alias. The field has a name in the model but a different name when serializing
  • We could have one name used when deserializing and another name used when serializing the model
  • We can describe deserialization alias and serialization alias
    • alias: for deserialization
    • serialization alias: for serialization
  • Can we use either the field name or the alias in deserialization? Not by default, but there is an option to enable it
  • We can likewise opt to use either the field name or the alias in serialization when calling model_dump
  • Validation aliases
    • allows us to specify aliases used for deserialization specifically
    • can be used in combination with serialization aliases
    • overrides plain alias if it is present
  • We have the option to specify multiple validation aliases for the same field
  • Pydantic’s AliasChoices can be used to specify multiple aliases
  • One can combine validation and serialization alias
    • if we set an alias, it is used for both serialization and deserialization
    • if we set a validation alias, the validation alias is used for deserialization and the alias is used for serialization
    • if we set a serialization alias, the alias is used for deserialization and the serialization_alias is used for serialization
    • if we set both a validation alias and a serialization alias, the validation alias is used for deserialization and the serialization alias for serialization
  • Pydantic auto generates aliases for common use cases
    • We can tell Pydantic to generate a camel case alias for every field in the model
    • can override auto generated alias for individual fields
  • Auto generating alias is specified by a function and that is attached to model config
  • Customized data serialization: Pydantic can handle most serialization needs, but it may not be sufficient in certain cases, so we can define our own serialization functions to override it

Field Aliases and Default Values

  • Specifying aliases that can be used to deserialize
from pydantic import BaseModel, Field, ValidationError

class Model(BaseModel):
    id_: int = Field(alias="id")
    last_name: str = Field(alias="lastName")

json_data = """ { "id" : 100, "lastName": "RK" } """

print(Model.model_validate_json(json_data))

id_=100 last_name='RK'
  • Specifying aliases and defaults
from pydantic import BaseModel, Field, ValidationError

class Model(BaseModel):
    id_: int = Field(alias="id", default=100)
    last_name: str = Field(alias="lastName")

print(Model(lastName="rj"))

id_=100 last_name='rj'
  • Deserializing with aliases and serializing by alias
from pydantic import BaseModel, Field, ValidationError

class Person(BaseModel):
    id_: int = Field(alias="id", default=100)
    first_name: str | None = Field(alias="firstName", default=None)
    second_name: str | None = Field(alias="lastName")
    age: int | None = None

print(Person(firstName="rk", lastName="rj", age=12).model_dump(by_alias=True))

{'id': 100, 'firstName': 'rk', 'lastName': 'rj', 'age': 12}

Alias Generator functions

  • You can manually specify an alias for each field present in each class
  • If there is a systematic way to convert the field name to alias, one can write a function at the Model level
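A sketch, with a hypothetical to_camel helper attached via alias_generator:

from pydantic import BaseModel, ConfigDict

def to_camel(name: str) -> str:
    # hypothetical helper: snake_case -> camelCase
    first, *rest = name.split("_")
    return first + "".join(word.title() for word in rest)

class Model(BaseModel):
    model_config = ConfigDict(alias_generator=to_camel)
    first_name: str
    last_name: str

print(Model(firstName="r", lastName="k").model_dump(by_alias=True))  # {'firstName': 'r', 'lastName': 'k'}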

Deserializing by Field Name or Alias

  • By default, fields for which an alias is defined must be populated using the alias. We can configure the model to accept either the field name or the alias
  • populate_by_name=True in the configuration ensures that the deserialization can use either the field name or alias
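A minimal sketch:

from pydantic import BaseModel, ConfigDict, Field

class Model(BaseModel):
    model_config = ConfigDict(populate_by_name=True)
    id_: int = Field(alias="id")

print(Model(id=1))   # populate by alias
print(Model(id_=1))  # populate by field name, allowed by populate_by_name=True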

Serialization aliases

  • In cases where the deserialization field name and the serialization field name both differ from the field name, one can use a serialization alias
from pydantic import BaseModel, Field, ValidationError, ConfigDict

class Test(BaseModel):
    first_name: str = Field(alias='Firstname', serialization_alias='firstName')

print(Test(Firstname='re').model_dump_json(by_alias=True))

{"firstName":"re"}

Validation aliases

  • There are three types of aliases: the plain alias, the validation alias, and the serialization alias. The validation alias is used in the process of deserialization; the serialization alias is used in the process of serialization
  • You can also give AliasChoices as a validation alias, so that the input data can contain the field under any of the names given in the AliasChoices
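A sketch using AliasChoices:

from pydantic import BaseModel, Field, AliasChoices

class Model(BaseModel):
    first_name: str = Field(validation_alias=AliasChoices("firstName", "givenName"))

print(Model.model_validate({"firstName": "rk"}))
print(Model.model_validate({"givenName": "rk"}))  # either name is accepted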

Custom Serializers
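The decorator-based approach uses @field_serializer; a minimal sketch, assuming we want datetimes dumped in a custom format:

from datetime import datetime
from pydantic import BaseModel, field_serializer

class Model(BaseModel):
    dt: datetime

    @field_serializer("dt")
    def serialize_dt(self, value: datetime) -> str:
        return value.strftime("%Y-%m-%d")

print(Model(dt=datetime(2020, 1, 1)).model_dump())  # {'dt': '2020-01-01'}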

Specialized Pydantic Types

Introduction

  • Pydantic offers a huge variety of specialized types with specialized validation and serialization
    • PositiveInt: positive integers
    • conlist : constrained list
    • PastDate : datetime that must be in the past
    • HttpUrl: validates a URL and provides special parsing methods
    • EmailStr: validates an email address

PositiveInt

from pydantic import BaseModel, PositiveInt, ValidationError

class Circle(BaseModel):
    center: tuple[int, int] = (0, 0)
    radius: PositiveInt = 1

print(Circle.model_fields)

{'center': FieldInfo(annotation=tuple[int, int], required=False, default=(0, 0)), 'radius': FieldInfo(annotation=int, required=False, default=1, metadata=[Gt(gt=0)])}

UUID

  • There are specific Pydantic datatypes that can be used for UUIDs
  • One can also specify default_factory if you want the default value to be different for various instances
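A sketch combining a UUID type with default_factory:

from uuid import uuid4
from pydantic import BaseModel, UUID4, Field

class Model(BaseModel):
    id_: UUID4 = Field(default_factory=uuid4)

print(Model())  # a fresh UUID for each instance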

Additional Field Features

  • One can add numerical constraints to the field
  • One can add string constraints to field
  • One can use default factories to configure default values to the fields
  • One can specify parameters at the model level in the ConfigDict and then override at parameters at a field level
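A sketch of these Field features:

from pydantic import BaseModel, Field

class Model(BaseModel):
    age: int = Field(gt=0, le=150)                   # numerical constraints
    name: str = Field(min_length=1, max_length=50)   # string constraints
    tags: list[str] = Field(default_factory=list)    # a fresh default per instance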

Annotated Types

  • Python has an annotated type hint
    • Simply a way to add metadata to an existing type
    • Python itself does nothing with it - it behaves like a regular type
  • It gives third party libraries the option to use it for their own purposes
  • Pydantic makes extensive use of Annotated types
  • especially useful for creating re-usable types
    • fields
    • validators
    • serializers
  • Use TypeVars with Annotated types to make them generic
  • One can use an Annotated type with StringConstraints for more effective validation of strings
from pydantic import Field, StringConstraints
from typing import Annotated, TypeVar

T = TypeVar("T")
BoundedString = Annotated[str, StringConstraints(min_length=2, max_length=50)]
BoundedList = Annotated[list[T], Field(max_length=5, min_length=1)]
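Continuing the snippet above, a sketch of these reusable annotated types in a model:

from pydantic import BaseModel

class Model(BaseModel):
    name: BoundedString
    scores: BoundedList[int]

print(Model(name="rk", scores=[1, 2, 3]))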

Custom Validators

Introduction

  • There are two kinds of custom validators one can write with Pydantic: before validators and after validators
  • Before validators are evaluated in the reverse order of their appearance in the code, whereas after validators are evaluated in the same order they appear in the class
  • All these custom validators can be thought of as taking an input and returning an output, i.e. as transform functions. They run either before the data enters Pydantic's field-level validation phase or after field-level validation is done
  • Can reference other fields in a model that have been validated
  • same validator can be attached to multiple fields
  • validators can also be attached to a type (via Annotated), making them reusable
  • How should a custom validator indicate validation failure?
    • raise a ValueError and provide a good description
    • Pydantic docs indicate that you can use assertions. Don't do it
      • Python can be executed with a flag that turns off all assertions
    • PydanticCustomError is rarely used; it is just more flexible in terms of error reporting
    • the above exceptions all end up raised as a Pydantic ValidationError
  • All other exceptions will just bubble up as-is
  • plain validators bypass Pydantic's own validation entirely
  • wrap validators are the most flexible, but they are confusing and rarely needed

After Validators

  • a transformation function applied to the data being deserialized; validators can be used for validation only, transformation only, or both
from pydantic import BaseModel, ValidationError, Field, field_validator

class Model(BaseModel):
    x: int

    @field_validator("*")
    @classmethod
    def transform_x(cls, value):
        return f"in to string {value}"

Before validators

  • from dateutil.parser import parse can be used to parse strings to datetime
  • One needs to pass mode='before' to field_validator to turn the default (after) validator into a before validator
from pydantic import BaseModel, ValidationError, Field, field_validator
from datetime import datetime
from dateutil.parser import parse
from typing import Any

class Model(BaseModel):
    dt: datetime

    @field_validator('dt', mode='before')
    @classmethod
    def parse_datetime(cls, value: Any):
        if isinstance(value, str):
            try:
                return parse(value)
            except Exception as ex:
                raise ValueError(str(ex))
        return value

Custom Validators using Annotations

  • One can use Annotated to attach any function defined outside the Pydantic class as a before or after validator
from pydantic import BaseModel, BeforeValidator, AfterValidator
from typing import Annotated

def bef_valid(x):
    print("before validator")
    return x

def after_valid(x):
    print("after validator")
    return x

custom_type = Annotated[int, BeforeValidator(bef_valid), AfterValidator(after_valid)]

class Model(BaseModel):
    x: custom_type

print(Model(x=12))

before validator
after validator
x=12
  • The following snippet shows a generic UniqueList type built with Annotated: the base type is generic, field-level constraints are specified with the Field function, and a custom after validator is attached with AfterValidator
from typing import Annotated, Any, TypeVar
from pydantic import BaseModel, Field, AfterValidator

T = TypeVar('T')

def are_elements_unique(values: list[Any]) -> list[Any]:
    unique_elements = []
    for value in values:
        if value in unique_elements:
            raise ValueError("elements must be unique")
        unique_elements.append(value)
    return values

UniqueList = Annotated[
    list[T],
    Field(min_length=1, max_length=5),
    AfterValidator(are_elements_unique)
]

class Model(BaseModel):
    numbers: UniqueList[int] = []
    strings: UniqueList[str] = []

Dependent Field Validations

  • Pydantic keeps validating the remaining fields even if one of the fields of the class fails
  • It is important to keep in mind that if you want dependent field validation, you have to account for the fact that a previous field may have failed validation, in which case its data will not be present in ValidationInfo
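A sketch using ValidationInfo to reference a previously validated field (the unit/value fields are hypothetical):

from pydantic import BaseModel, ValidationInfo, field_validator

class Measurement(BaseModel):
    unit: str   # defined, and therefore validated, before value
    value: int

    @field_validator("value")
    @classmethod
    def check_value(cls, v: int, info: ValidationInfo):
        # unit is present in info.data only if it passed validation
        if info.data.get("unit") == "percent" and not 0 <= v <= 100:
            raise ValueError("percent values must be between 0 and 100")
        return v

print(Measurement(unit="percent", value=50))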

Properties and Computed Fields

  • One can add methods and properties to a Pydantic Class
  • One can use @property and @cached_property to add properties
  • They do not show up in the model representation string, and they do not get serialized
  • One can turn a property into a computed field
  • A regular field is read-write
  • Writing to a computed field does not make sense
  • One can use the @computed_field decorator
    • the property is now included in serialization
  • Since properties are evaluated after the model is created, a property has access to all the fields in the model
  • Technically you don't need the @property decorator: if you use @computed_field alone, Pydantic will auto-wrap your method in @property
  • @property: each access triggers the getter. Mutating is blocked unless a setter is defined
  • @cached_property: access triggers the getter once, then stores the result in __dict__; later accesses treat it like a plain attribute
  • One can pass additional parameters to the @computed_field decorator, e.g. to get an alias in the output
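A minimal sketch of a computed field:

from pydantic import BaseModel, computed_field

class Circle(BaseModel):
    radius: int

    @computed_field
    @property
    def area(self) -> float:
        return 3.14159 * self.radius ** 2

print(Circle(radius=2).model_dump())  # area is included in the dump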

Custom Serializers using Annotated Types

  • In the decorated function, one can accept a FieldSerializationInfo argument, which gives the ability to distinguish between serializing to a dictionary and serializing to JSON
  • Just as Annotated types are an alternative to @field_validator for custom validators, Annotated types are also an alternative to @field_serializer
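A sketch of the Annotated alternative, assuming PlainSerializer for the custom formatting:

from datetime import datetime
from typing import Annotated
from pydantic import BaseModel, PlainSerializer

DateAsString = Annotated[datetime, PlainSerializer(lambda dt: dt.strftime("%Y-%m-%d"), return_type=str)]

class Model(BaseModel):
    dt: DateAsString

print(Model(dt=datetime(2020, 1, 1)).model_dump())  # {'dt': '2020-01-01'}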

Complex Models

  • model fields can themselves be Pydantic models
  • Custom (non-Pydantic) classes can be used in composition as well, but by default Pydantic will not accept arbitrary custom classes as field types
  • One can compose Pydantic models with other Pydantic models
  • One can create a model that inherits from BaseModel (or from another model)
  • The most common use for inheritance is sharing model configs
  • Inheritance and composition have different use cases
  • Inheritance
    • Provide a common model your models can inherit from
    • Can be tricky and difficult to debug
  • Composition
    • Use when you have complex model definitions with sub models
    • easy to understand
    • first choice for common models
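A sketch of composition with nested models:

from pydantic import BaseModel

class Address(BaseModel):
    city: str

class Person(BaseModel):
    name: str
    address: Address  # a sub-model

print(Person.model_validate({"name": "rk", "address": {"city": "Pune"}}))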

Practical Applications

  • Query a REST API and model response
  • Using a Pydantic model when loading a CSV file
  • Validating Python function arguments using Pydantic (see the sketch after this list)
  • auto-generating Pydantic models using a model code generator; these tools work fine for simple models but fail on complicated ones
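A sketch of argument validation, assuming Pydantic's validate_call decorator:

from pydantic import ValidationError, validate_call

@validate_call
def repeat(s: str, times: int) -> str:
    return s * times

print(repeat("ab", 3))
try:
    repeat("ab", "not a number")  # argument fails validation
except ValidationError as ex:
    print(ex)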

What can one learn from the course ?

  • Model Basics
    • required vs optional fields
    • nullable fields
    • type coercion
    • basic validation
  • Model Configuration
    • extra fields
    • lax vs strict type coercion
    • validating defaults and assignments
  • Field aliases
    • manual specification
    • auto-generated aliases
  • Specialized Pydantic types with validations
    • constrained numbers
    • constrained lists
    • dates and times
    • URL types
  • additional Field configurations
    • constraints
    • mutable defaults
    • default factories
  • Serialization and Deserialization
    • ignored fields
    • aliases
    • validation vs serialization aliases
  • Annotated Types
    • Python
    • Pydantic
    • Leveraging generics
  • Custom Validators
    • Before and After Validators
    • Decorator Approach
    • Annotated Approach
    • Combining Multiple Before and After Validators
    • Dependent Field Validators
    • Sequence Type Validators
    • Modifying data via Validators
  • Complex models
    • inheritance
    • composition
  • Practical examples
    • consuming REST API JSON data
    • loading CSV data
    • validating Python function arguments
    • using model code generators

Takeaway

The course is very well structured, and if one spends some time going over the contents and the associated Jupyter notebooks, one can learn quite a bit about this amazing library. There are many other projects from the Pydantic team, so knowing their first big contribution to the Python world may be useful if you want to use their other libraries