Pydantic library
Contents
This blog post contains a summary of the Udemy course on Pydantic created by Dr. Fred Baptiste.
Prerequisite - Type Hinting
- Type hinting is useful for documentation
- Sphinx can use the type hinting to generate documentation
- `dataclasses` are code generators: they use a decorator and type hinting to create classes
- Pydantic uses type hinting for run-time validation, and for serializing and deserializing
- Pydantic is foundational in FastAPI
- One can specify the type hints for arguments in a function.
- One can specify the type hints for return types
- `Union` helps one specify multiple types for an argument
- Some useful typing constructs are:
  - `Optional`
  - `Sequence`
  - `Any`
  - `Set`
  - `Dict`
  - `Callable`
  - `Iterator`
  - `Iterable`
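A minimal sketch of argument and return type hints (the function and names are illustrative):

```python
from typing import Optional, Sequence, Union

# type hints for arguments and for the return type
def total(values: Sequence[Union[int, float]], label: Optional[str] = None) -> float:
    result = float(sum(values))
    if label is not None:
        print(f"{label}: {result}")
    return result

total([1, 2.5, 3], label="sum")
```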
Basics
Introduction
- Pydantic Models vs Data Classes
- actual implementation is very different
- `@dataclass` is a code generator: it takes your class definition and creates a brand new class with all the functionality
- Pydantic uses inheritance to add functionality to standard Python classes
- Just like custom classes gain a lot of functionality by inheriting from the object class
- For simple stuff, they both look similar because both use Python type hinting as a way for us to add metadata to attributes
Creating a Pydantic Model
- Any class that inherits from `BaseModel` is a Pydantic model
- Usually one creates attributes of the class and then each attribute is provided a type hint
- As an API provider, you would want to inspect the input and throw an exception if the user has not sent the right input to the API
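A minimal sketch of a Pydantic model (class and field names are illustrative):

```python
from pydantic import BaseModel, ValidationError

class Person(BaseModel):
    first_name: str
    age: int

try:
    Person(first_name="Fred", age="not a number")
except ValidationError as e:
    print(e)  # age: input should be a valid integer
```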
Deserialization
- Deserialization is the act of taking data to create and populate a new model instance
- One can load a dictionary using `**` unpacking, but it is not recommended. It is better to use `model_validate`
- The other way to deserialize the data is by using `model_validate_json`. This is very beneficial when working with REST APIs, where requests and responses are typically in JSON format
- There are three ways to deserialize data into a model instance:
  - pass each field as an argument
  - create a dict and use `model_validate`
  - create a JSON string and use `model_validate_json`
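A minimal sketch of the three approaches, reusing the illustrative `Person` model from above:

```python
from pydantic import BaseModel

class Person(BaseModel):
    first_name: str
    age: int

# 1. pass each field as an argument
p1 = Person(first_name="Fred", age=42)

# 2. validate a dict
p2 = Person.model_validate({"first_name": "Fred", "age": 42})

# 3. validate a JSON string
p3 = Person.model_validate_json('{"first_name": "Fred", "age": 42}')

assert p1 == p2 == p3
```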
Serialization
- Takes a Pydantic model and transforms it into something else
- Pydantic models are regular classes, and their instances are regular Python instances of the class. One can retrieve all the relevant attributes using a `dict`
- To serialize the model, one can use `model_dump` to create a dict, or use `model_dump_json` to create a JSON string
- Under the hood, Pydantic uses `dumps()` from the `json` module
- When you serialize a Pydantic model, you have the flexibility to choose the fields to include or exclude
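A minimal sketch of include/exclude during serialization:

```python
from pydantic import BaseModel

class Person(BaseModel):
    first_name: str
    last_name: str
    age: int

p = Person(first_name="Fred", last_name="Baptiste", age=42)
print(p.model_dump(exclude={"age"}))              # {'first_name': 'Fred', 'last_name': 'Baptiste'}
print(p.model_dump_json(include={"first_name"}))  # {"first_name":"Fred"}
```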
Type Coercion
- Pydantic will transform the input data into the correct type
- You can control this transformation via the validation mode
- By default, the coercion mode is termed "lax", and it attempts a variety of type coercions
- If you are an API consumer, it is always better to have Pydantic validation set to strict mode, as you never know when the API changes. It is always better to ensure that your code raises a `ValidationError` whenever such an issue arises
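A minimal sketch of lax coercion, which is the default:

```python
from pydantic import BaseModel

class Model(BaseModel):
    field_1: int

# lax mode coerces the string "42" to the integer 42
print(Model(field_1="42").field_1)  # 42
```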
Required vs Optional Fields
- Default values can be provided in the class definition
- An interesting thing I have learned after playing around with the code is that, by default, Pydantic does not carry out default value validation. If you give an incorrect default value, it accepts it as the correct value
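A minimal sketch of this behavior (the field name is illustrative):

```python
from pydantic import BaseModel

class Model(BaseModel):
    field_1: int = "abc"   # incorrect default: no error is raised

print(Model().field_1)  # 'abc' -- the bad default is accepted as-is
```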
As one can see, even though we have specified an incorrect default, it works. Pydantic does not check this unless you opt in via `validate_default`, covered under Model Configuration below.
- When we define a default for a function argument in plain Python, the default is evaluated and stored once, at function definition time, not at call time. When you call the function, the same default object gets picked up. If you mutate the default value, the mutation persists across subsequent calls
Unlike default arguments in Python, Pydantic handles mutable defaults in a special way and makes a deep copy of the default for each instance. There is no shared list between two model instances.
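A minimal sketch contrasting the two behaviors (names are illustrative):

```python
from pydantic import BaseModel

# plain Python: the same default list object is shared across calls
def append_item(item, items=[]):
    items.append(item)
    return items

print(append_item(1))  # [1]
print(append_item(2))  # [1, 2] -- the mutated default persisted

# Pydantic: each instance gets a deep copy of the default
class Model(BaseModel):
    items: list[int] = [1, 2]

a, b = Model(), Model()
a.items.append(3)
print(a.items)  # [1, 2, 3]
print(b.items)  # [1, 2] -- not shared
```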
Nullable Fields
- A nullable field means it will accept either the actual field type or `None`
- An optional field simply means that the data being deserialized does not need to contain the key for that specific field
- Nullability of a field has nothing to do with whether it is optional or not. It just indicates whether a field can be set to `None`
Combining Nullable and Optional
- There are four ways to combine both these features
- Required and Not Nullable
- Required and Nullable
- Optional and Not Nullable
- Optional and Nullable
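A minimal sketch of the four combinations (class and field names are illustrative):

```python
from pydantic import BaseModel

class RequiredNotNullable(BaseModel):
    field: int                 # key must be present, None rejected

class RequiredNullable(BaseModel):
    field: int | None          # key must be present, None accepted

class OptionalNotNullable(BaseModel):
    field: int = 0             # key may be omitted, None rejected

class OptionalNullable(BaseModel):
    field: int | None = None   # key may be omitted, None accepted
```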
In the above class definitions, each captures one of the combinations of nullability and required fields.
Inspecting Fields
- `model_fields_set` gives the set of fields that were explicitly set by deserialization from input data. Here is an example where it might be useful: you only want to serialize the attributes that you have set yourself
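A minimal sketch (field names are illustrative):

```python
from pydantic import BaseModel

class Model(BaseModel):
    field_1: int = 0
    field_2: str = "default"

m = Model(field_1=10)
print(m.model_fields_set)                        # {'field_1'}
print(m.model_dump(include=m.model_fields_set))  # {'field_1': 10}
```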
JSON Schema generation
- If you are developing using FastAPI, it takes care of integrating with Pydantic
- `model_json_schema` creates a JSON schema for the model
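A minimal sketch:

```python
from pydantic import BaseModel

class Model(BaseModel):
    field_1: int
    field_2: str | None = None

# returns a dict describing properties, types, and required fields
print(Model.model_json_schema())
```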
Model Configuration
Introduction
- Pydantic has certain default model behaviors
  - extra fields provided in the data being deserialized are ignored
  - type coercion is lax
  - default values do not get validated
- Pydantic provides us a way to modify, or override, this default behavior
  - use a special object, `ConfigDict`
  - attach this object to any model by setting a special class attribute, `model_config`
Handling Extra fields
- If there are additional fields in the input data, generally they are ignored. One can use `ConfigDict` to create specific behaviors:

```python
from pydantic import BaseModel
from pydantic import ConfigDict
from pydantic import ValidationError

class Model(BaseModel):
    model_config = ConfigDict(extra='allow')
    field_1: int

a = Model(field_1=2, field_2=10)
print(dict(a))
print(a.model_extra)
```

Output:

```
{'field_1': 2, 'field_2': 10}
{'field_2': 10}
```
- A use case is when you want to validate a set of fields and not all the fields present in the data
Strict vs Lax Type coercion
- A list is coerced to a tuple based on the model fields
- A tuple is coerced to a list based on the model fields
- You can set the model itself as strict or lax mode
- JSON understands the following data types:
  - boolean: `true` or `false`
  - number: float or integer does not actually matter
  - string
  - a null value, represented by the characters `null`
  - an array (square brackets, comma separated values of any JSON type)
  - object: a dictionary, delimited by curly braces, with key-value pairs
- When you are deserializing JSON, JSON has a limited set of data types. Hence you can specify in Pydantic how individual fields should be deserialized
- Setting `strict` to `True` in `ConfigDict` does not have anything to do with conversion between lists and tuples
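A minimal sketch of strict mode at the model level:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class Model(BaseModel):
    model_config = ConfigDict(strict=True)
    field_1: int

try:
    Model(field_1="10")   # lax mode would coerce this; strict mode rejects it
except ValidationError as e:
    print(e)
```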
Validating Default values
- At the model level, you can set up code to validate default values. It is useful when validators transform the data
- We want the default value to go through the same transformations, and hence it is useful to run validations on the default value
- `model_config = ConfigDict(validate_default=True)` can be used to validate default values
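A minimal sketch; the bad default from earlier is now caught:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class Model(BaseModel):
    model_config = ConfigDict(validate_default=True)
    field_1: int = "abc"   # now validated at instantiation

try:
    Model()
except ValidationError as e:
    print(e)
```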
Validating Assignment
- When we define a model and instantiate it, Pydantic validates the input. However, one can always assign a field value of another type, and by default Pydantic does not raise any error
- The reason this is the default behavior is that the developer is controlling the values being set
- `model_config = ConfigDict(validate_assignment=True)` can be used to ensure that one cannot set fields of a model instance to any type other than the one specified in the model
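A minimal sketch:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class Model(BaseModel):
    model_config = ConfigDict(validate_assignment=True)
    field_1: int

m = Model(field_1=1)
try:
    m.field_1 = "not an int"   # re-validated on assignment
except ValidationError as e:
    print(e)
```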
Mutability
- We can control whether the model is mutable at the model level. We can instruct Pydantic to make a model immutable, which means mutating any instance of the model is not allowed
- A side effect of a frozen Pydantic model is that instances become hashable, so you can use them as keys in a dictionary
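A minimal sketch:

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class Model(BaseModel):
    model_config = ConfigDict(frozen=True)
    field_1: int

m = Model(field_1=1)
try:
    m.field_1 = 2              # mutation is rejected
except ValidationError as e:
    print(e)

lookup = {m: "frozen instances are hashable"}   # usable as a dict key
```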
Coercing Numbers to Strings
- The `coerce_numbers_to_str` option in `ConfigDict` can be used to convert numbers to strings
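A minimal sketch:

```python
from pydantic import BaseModel, ConfigDict

class Model(BaseModel):
    model_config = ConfigDict(coerce_numbers_to_str=True)
    code: str

print(Model(code=123).code)  # '123'
```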
Standardizing Strings
- There are options to standardize string input, such as converting to lower case, converting to upper case, specifying a minimum length for all strings, and specifying a maximum length for all strings
Enum values
- If, for whatever reason, you want the model dump to contain not the Python enum object but the relevant enum value, one can use `use_enum_values` to enable that option
- The reasons, as far as my experience goes, are very limited. It usually has to do with trying to serialize the result of a `model_dump()` to JSON ourselves
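A minimal sketch (the enum and field names are illustrative):

```python
from enum import Enum
from pydantic import BaseModel, ConfigDict

class Color(Enum):
    RED = "red"

class Model(BaseModel):
    model_config = ConfigDict(use_enum_values=True)
    color: Color

print(Model(color=Color.RED).model_dump())  # {'color': 'red'}
```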
All config options
There are 49 options listed as of Pydantic v2.11.7.
Config Option | Description |
---|---|
title | Model title used in JSON schema. |
model_title_generator | Function to generate a model’s schema title. |
field_title_generator | Function to generate field titles for JSON schema. |
str_to_lower | Convert string inputs to lowercase. |
str_to_upper | Convert string inputs to uppercase. |
str_strip_whitespace | Strip leading/trailing whitespace from strings. |
str_min_length | Minimum allowed string length globally. |
str_max_length | Maximum allowed string length globally. |
extra | How to treat extra fields: ‘ignore’, ‘allow’, or ‘forbid’. |
frozen | Make model instances immutable. |
populate_by_name | Allow field population using field names even when alias is defined. |
use_enum_values | Serialize enums using their `.value` instead of instance. |
validate_assignment | Revalidate fields on assignment. |
arbitrary_types_allowed | Allow arbitrary (non-annotated) types in model fields. |
from_attributes | Allow construction from object attributes. |
loc_by_alias | Use field alias in validation errors. |
alias_generator | Function to generate field aliases. |
ignored_types | Types to ignore during validation/schema creation. |
allow_inf_nan | Allow `inf`, `-inf`, and `nan` for float/decimal fields. |
json_schema_extra | Extra fields or callable for enriching JSON schema. |
json_encoders | Custom JSON encoders (deprecated in v2). |
strict | Enable strict validation across the model. |
revalidate_instances | Revalidate input if already model instance. |
ser_json_timedelta | Serialize `timedelta` as `‘float’` or `‘iso8601’`. |
ser_json_bytes | Serialize `bytes` as `‘utf8’` or `‘base64’`. |
val_json_bytes | Deserialize bytes from `‘utf8’` or `‘base64’`. |
ser_json_inf_nan | Serialize `inf`, `-inf`, `nan` as float or raise error. |
validate_default | Validate default field values on instantiation. |
validate_return | Validate return values from validators. |
protected_namespaces | Reserved field prefixes (e.g., `model_`) that cannot be used. |
hide_input_in_errors | Whether to hide input data in validation errors. |
defer_build | Postpone validator building until needed. |
plugin_settings | Dictionary for plugin-specific config. |
schema_generator | Override schema generator class. |
json_schema_serialization_defaults_required | Whether fields with defaults are marked required in the serialization schema. |
json_schema_mode_override | Override schema generation mode (e.g., `serialization`). |
coerce_numbers_to_str | Convert numeric input to string when required. |
regex_engine | Choose regex engine (e.g., `python`, `ecmascript`). |
validation_error_cause | Include underlying cause in error details. |
use_attribute_docstrings | Use field docstrings as descriptions. |
cache_strings | Enable caching of string coercions. |
validate_by_alias | Allow validation using field aliases. |
validate_by_name | Allow validation using original field names. |
serialize_by_alias | Use alias for output serialization. |
with_config | Decorator to apply config to function-based models. |
Field Aliasing, Serialization and Deserialization
Introduction
- We may need multiple names for the same field
  - Python follows snake_case
  - JSON typically follows camelCase
  - We may want a class with snake_case fields that serializes to camelCase field names
- Suppose JSON data contains a field named `id`. We can create a class with a field `id`, but linters will complain, since `id` shadows a Python built-in. The usual workaround is to append an underscore, as in `id_` or `type_`
- Pydantic has a concept of a field alias: the field has one name in the model but a different name when serializing or deserializing
- We could have one name for deserialization and another name for serializing the model
- We can describe deserialization alias and serialization alias
- alias: for deserialization
- serialization alias: for serialization
- Can we use either the name or the alias in deserialization? Not by default; there is an option to enable using either
- Using either the name or the alias in serialization: we have an option to use the name or the alias when calling `model_dump`
- Validation aliases
- allows us to specify aliases used for deserialization specifically
- can be used in combination with serialization aliases
- overrides plain alias if it is present
- We have the option to specify multiple validation aliases for the same field
- Pydantic's `AliasChoices` can be used to specify multiple aliases
- One can combine validation and serialization aliases
  - if we set an alias, it is used for both serialization and deserialization
  - if we set a validation alias, the validation alias is used for deserialization and the alias is used for serialization
  - if we set a serialization alias, the alias is used for deserialization and the serialization alias is used for serializing
  - if we set both a validation alias and a serialization alias, the validation alias is used for deserialization and the serialization alias is used for serialization
- Pydantic auto-generates aliases for common use cases
  - We can tell Pydantic to generate a camelCase alias for every field in the model
  - We can override an auto-generated alias for individual fields
- Auto-generating aliases is specified by a function that is attached to the model config
- Customized data serialization: Pydantic can handle most serialization, but it may not be sufficient in certain cases. We might have to override it, and hence we can define our own functions to do that
Field Aliases and Default Values
- Specifying aliases that can be used to deserialize
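A minimal sketch (alias names are illustrative):

```python
from pydantic import BaseModel, Field

class Model(BaseModel):
    field_1: int = Field(alias="fieldOne")

m = Model.model_validate({"fieldOne": 10})
print(m.field_1)  # 10
```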
- Specifying aliases and defaults
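A minimal sketch combining an alias with a default:

```python
from pydantic import BaseModel, Field

class Model(BaseModel):
    field_1: int = Field(alias="fieldOne", default=0)

print(Model())                                 # field_1=0
print(Model.model_validate({"fieldOne": 10}))  # field_1=10
```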
- Deserialization alias
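A minimal sketch using a dedicated deserialization (validation) alias:

```python
from pydantic import BaseModel, Field

class Model(BaseModel):
    field_1: int = Field(validation_alias="fieldOne")

m = Model.model_validate({"fieldOne": 10})
print(m.model_dump())  # {'field_1': 10}
```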
Alias Generator functions
- You can specify an alias by hand for each field present in each class
- If there is a systematic way to convert the field name to alias, one can write a function at the Model level
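A minimal sketch using the `to_camel` generator that ships with Pydantic:

```python
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel

class Model(BaseModel):
    model_config = ConfigDict(alias_generator=to_camel)
    first_name: str

m = Model.model_validate({"firstName": "Fred"})
print(m.model_dump(by_alias=True))  # {'firstName': 'Fred'}
```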
Deserializing by Field Name or Alias
- Fields for which an alias is defined must be populated using the alias by default. We can configure the model to accept either the field name or the alias
- `populate_by_name=True` in the configuration ensures that deserialization can use either the field name or the alias
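A minimal sketch:

```python
from pydantic import BaseModel, ConfigDict, Field

class Model(BaseModel):
    model_config = ConfigDict(populate_by_name=True)
    field_1: int = Field(alias="fieldOne")

print(Model.model_validate({"fieldOne": 10}))  # via the alias
print(Model.model_validate({"field_1": 10}))   # via the field name
```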
Serialization aliases
- In cases where the deserialization field name and the serialization field name differ from the field name, one can use a serialization alias
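A minimal sketch (alias names are illustrative):

```python
from pydantic import BaseModel, Field

class Model(BaseModel):
    field_1: int = Field(alias="fieldIn", serialization_alias="fieldOut")

m = Model.model_validate({"fieldIn": 10})
print(m.model_dump_json(by_alias=True))  # {"fieldOut":10}
```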
Validation aliases
- There are three types of aliases: the plain alias, the validation alias, and the serialization alias. The validation alias is used in the process of deserialization; the serialization alias is used in the process of serialization
- You can also give `AliasChoices`, so that the input data can contain the field under any of the names given in the choices
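A minimal sketch with multiple accepted input names:

```python
from pydantic import AliasChoices, BaseModel, Field

class Model(BaseModel):
    field_1: int = Field(validation_alias=AliasChoices("fieldOne", "field1"))

print(Model.model_validate({"fieldOne": 10}))
print(Model.model_validate({"field1": 10}))
```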
Custom Serializers
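A minimal sketch of a custom serializer using the `@field_serializer` decorator (the field and output format are illustrative):

```python
from pydantic import BaseModel, field_serializer

class Model(BaseModel):
    amount: float

    @field_serializer("amount")
    def serialize_amount(self, value: float) -> str:
        # render the float as a fixed-precision string when dumping
        return f"{value:.2f}"

print(Model(amount=1.5).model_dump())  # {'amount': '1.50'}
```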
Specialized Pydantic Types
Introduction
- Pydantic offers a huge variety of specialized types with specialized validation and serialization
  - `PositiveInt`: positive integers
  - `conlist`: constrained list
  - `PastDate`: a date that must be in the past
  - `HttpUrl`: validates a URL and provides special parsing methods
  - `EmailStr`: validates an email address
PositiveInt
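A minimal sketch:

```python
from pydantic import BaseModel, PositiveInt, ValidationError

class Model(BaseModel):
    count: PositiveInt

try:
    Model(count=-5)
except ValidationError as e:
    print(e)  # input should be greater than 0
```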
UUID
- There are specific Pydantic datatypes that can be used for UUIDs
- One can also specify a `default_factory` if you want the default value to be different for various instances
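A minimal sketch with a per-instance default:

```python
from uuid import UUID, uuid4
from pydantic import BaseModel, Field

class Model(BaseModel):
    id: UUID = Field(default_factory=uuid4)

print(Model().id)  # a fresh UUID for each instance
```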
Additional Field Features
- One can add numerical constraints to the field
- One can add string constraints to field
- One can use default factories to configure default values to the fields
- One can specify parameters at the model level in the `ConfigDict` and then override parameters at the field level
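A minimal sketch of field-level constraints and a default factory (field names are illustrative):

```python
from pydantic import BaseModel, Field

class Model(BaseModel):
    age: int = Field(gt=0, le=150)                   # numerical constraints
    name: str = Field(min_length=1, max_length=50)   # string constraints
    tags: list[str] = Field(default_factory=list)    # fresh default per instance
```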
Annotated Types
- Python has the `Annotated` type hint
- It is simply a way to add metadata to an existing type
- Python itself does nothing with it - it behaves like a regular type
- It gives third party libraries the option to use it for their own purposes
- Pydantic makes extensive use of Annotated types
- especially useful for creating re-usable types
- fields
- validators
- serializers
- Use type variables with Annotated types to make them more generic
- One can use an Annotated type and `StringConstraints` for more effective validation of strings
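A minimal sketch of a reusable annotated string type:

```python
from typing import Annotated
from pydantic import BaseModel, StringConstraints

# a non-empty, stripped, lower-cased string, reusable across models
LowerStr = Annotated[str, StringConstraints(strip_whitespace=True, to_lower=True, min_length=1)]

class Model(BaseModel):
    code: LowerStr

print(Model(code="  ABC  ").code)  # 'abc'
```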
Custom Validators
Introduction
- There are two kinds of custom validators that one can write using Pydantic. One is before validators and the second one is after validators
- Before validators are evaluated in the reverse order of their presence in their code whereas after validators are evaluated in the same order they are present in the class
- All these custom validators can be thought of as taking an input and returning an output, so one can think of them as transform functions. They run either before the data enters Pydantic's field-level validation phase, or after the field-level validation is done
- A validator can reference other fields in a model that have already been validated
- The same validator can be attached to multiple fields
- Validators can also be attached to a type
- How should a custom validator indicate validation failure?
  - raise a `ValueError` and provide a good description
  - Pydantic docs indicate that you can use assertions. Don't do it: Python can be executed with a flag that turns off all assertions
  - `PydanticCustomError` is rarely used - it is just more flexible in terms of reporting errors
  - the above exceptions will end up raising a Pydantic `ValidationError`
- All other exceptions will just bubble up as-is
- Plain validators bypass Pydantic's own validators
- Wrap validators are the most flexible, but they are confusing and rarely needed
After Validators
- An after validator is a transformation function that can be used to transform the data being deserialized. Validators can be used for validation only, for transformation only, or for both
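A minimal sketch of an after validator via the decorator approach:

```python
from pydantic import BaseModel, field_validator

class Model(BaseModel):
    number: int

    @field_validator("number")   # after validator by default
    @classmethod
    def must_be_even(cls, value: int) -> int:
        if value % 2 != 0:
            raise ValueError("number must be even")
        return value

print(Model(number=4))
```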
Before validators
- `parse` from `dateutil.parser` can be used to parse strings into datetimes
- One needs to add the `mode="before"` argument to turn the default validator (an after validator) into a before validator
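A minimal sketch (requires the `python-dateutil` package):

```python
from datetime import datetime
from dateutil.parser import parse
from pydantic import BaseModel, field_validator

class Model(BaseModel):
    dt: datetime

    @field_validator("dt", mode="before")
    @classmethod
    def parse_datetime(cls, value):
        # runs before Pydantic's own validation, so it can handle loose strings
        if isinstance(value, str):
            return parse(value)
        return value

print(Model(dt="Jan 1, 2020 3:00 pm").dt)
```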
Custom Validators using Annotations
- One can use `Annotated` to attach any function defined outside the Pydantic class as a before or after validator
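A minimal sketch (function and type names are illustrative):

```python
from typing import Annotated
from pydantic import AfterValidator, BaseModel, BeforeValidator

def to_int(value):
    # before validator: coerce numeric strings
    return int(value) if isinstance(value, str) else value

def check_even(value: int) -> int:
    # after validator: runs on the already-validated int
    if value % 2 != 0:
        raise ValueError("must be even")
    return value

EvenInt = Annotated[int, BeforeValidator(to_int), AfterValidator(check_even)]

class Model(BaseModel):
    number: EvenInt

print(Model(number="4").number)  # 4
```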
- The following snippet shows a generic `UniqueList` type created using an `Annotated` type: the underlying type is generic, field-level validations are specified using the `Field` function, and a custom after-validation is added using `AfterValidator`
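The original snippet is not preserved here; a minimal sketch of the same idea:

```python
from typing import Annotated, TypeVar
from pydantic import AfterValidator, BaseModel, Field

T = TypeVar("T")

def ensure_unique(values: list) -> list:
    if len(values) != len(set(values)):
        raise ValueError("elements must be unique")
    return values

# generic annotated type: a bounded list whose elements must be unique
UniqueList = Annotated[list[T], Field(min_length=1, max_length=5), AfterValidator(ensure_unique)]

class Model(BaseModel):
    tags: UniqueList[str]

print(Model(tags=["a", "b"]))
```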
Dependent Field Validations
- Pydantic keeps validating the remaining fields even if one of the fields of the class fails validation
- It is important to keep in mind that if you want dependent field validation, you have to account for the fact that a previous field might not have been validated because of errors, and you might not have that data in `ValidationInfo`
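A minimal sketch (field names are illustrative):

```python
from pydantic import BaseModel, ValidationInfo, field_validator

class Model(BaseModel):
    start: int
    end: int

    @field_validator("end")
    @classmethod
    def end_after_start(cls, value: int, info: ValidationInfo) -> int:
        # 'start' only appears in info.data if it validated successfully
        if "start" in info.data and value <= info.data["start"]:
            raise ValueError("end must be greater than start")
        return value

print(Model(start=1, end=2))
```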
Properties and Computed Fields
- One can add methods and properties to a Pydantic Class
- One can use `@property` and `@cached_property` to add properties
- Plain properties do not show up in the model representation string, and they do not get serialized
- One can make a field into a computed field
  - a regular field has read-write access
  - writing to a computed field does not make sense
- One can use the `@computed_field` decorator; the property is then included in serialization
- Since properties are evaluated after the model is created, property has access to all the fields in the model
- Technically you don't need the `@property` decorator. If you use `@computed_field`, Pydantic will auto-wrap your method in a `@property`
- `@property`: each access triggers the getter. Mutating is blocked unless a setter is defined
- `@cached_property`: access triggers the getter once, then stores the result in `__dict__`. Later, it's treated like a plain attribute
- One can pass additional parameters to the `computed_field` decorator, for instance to serialize under an alias
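A minimal sketch combining `@computed_field`, `@cached_property`, and an alias (class and names are illustrative):

```python
from functools import cached_property
from pydantic import BaseModel, computed_field

class Circle(BaseModel):
    radius: float

    @computed_field(alias="circleArea")
    @cached_property
    def area(self) -> float:
        return 3.14159 * self.radius ** 2

c = Circle(radius=1.0)
print(c.model_dump(by_alias=True))  # {'radius': 1.0, 'circleArea': 3.14159}
```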
Custom Serializers using Annotated Types
- In the decorated function, one can use `FieldSerializationInfo`, which gives the ability to distinguish between serializing to a dictionary vs to JSON
- For custom validators, one could use an Annotated type as an alternative to `@field_validator`. In the same way, one can use an Annotated type as an alternative to `@field_serializer`
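A minimal sketch using `PlainSerializer` inside an Annotated type (names are illustrative):

```python
from typing import Annotated
from pydantic import BaseModel, PlainSerializer

# serialize the float as a fixed-precision string when dumping
RoundedFloat = Annotated[float, PlainSerializer(lambda v: f"{v:.2f}", return_type=str)]

class Model(BaseModel):
    amount: RoundedFloat

print(Model(amount=1.2345).model_dump())  # {'amount': '1.23'}
```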
Complex Models
- Model fields can themselves be Pydantic models
- One can use custom classes for composition, but by default Pydantic will not accept arbitrary custom classes as field types
- One can compose Pydantic models with other Pydantic models
- One can create a model that inherits from a base model
- Most common use for inheritance is model configs
- Inherit or Composition usecases are different
- Inheritance
- Provide a common model your models can inherit from
- Can be tricky and difficult to debug
- Composition
- Use when you have complex model definitions with sub models
- easy to understand
- first choice for common models
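A minimal sketch of composition (model names are illustrative):

```python
from pydantic import BaseModel

class Address(BaseModel):
    city: str
    zip_code: str

class Person(BaseModel):
    name: str
    address: Address   # composition: a model as a field

data = {"name": "Fred", "address": {"city": "Paris", "zip_code": "75001"}}
print(Person.model_validate(data).address.city)  # Paris
```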
Practical Applications
- Query a REST API and model response
- Using a Pydantic model when loading a CSV file
- Validating Python function arguments using Pydantic
- Auto-generating Pydantic models using a model generator. These tools work OK for simple models but fail for complicated models
What can one learn from the course?

- Model Basics
- required vs optional fields
- nullable fields
- type coercion
- basic validation
- Model Configuration
- extra fields
- lax vs strict type coercion
- validating defaults and assignments
- Field aliases
- manual specification
- auto-generated aliases
- Specialized Pydantic types with validations
- constrained numbers
- constrained lists
- dates and times
- URL types
- additional Field configurations
- constraints
- mutable defaults
- default factories
- Serialization and Deserialization
- ignored fields
- aliases
- validation vs serialization aliases
- Annotated Types
- Python
- Pydantic
- Leveraging generics
- Custom Validators
- Before and After Validators
- Decorator Approach
- Annotated Approach
- Combining Multiple Before and After Validators
- Dependent Field Validators
- Sequence Type Validators
- Modifying data via Validators
- Complex models
- inheritance
- composition
- Practical examples
- consuming REST API JSON data
- loading CSV data
- validating Python function arguments
- using model code generators
Takeaway
The course is very well structured, and if one spends some time going over the contents and the associated Jupyter notebooks, one can learn quite a bit about this amazing library. There are many other developments from the Pydantic team, and hence knowing their first big contribution to the Python world might be useful if you want to use their other libraries.