Python: SQLAlchemy – user-defined query classes for scoped_session.query_property(query_cls=None). | behai-nguyen software development learnings and documentation

We are discussing SQLAlchemy user-defined query classes. A user-defined query class can be passed to scoped_session.query_property(query_cls=None) in place of parameter query_cls: i.e. scoped_session.query_property(MyQuery).


Python: SQLAlchemy – user-defined query classes for scoped_session.query_property(query_cls=None).

SQLAlchemy's scoped_session.query_property(query_cls=None) accepts a user-defined query class via parameter query_cls. If one is provided, then when we use scoped_session.query property to query the database, our user-defined query class will be used instead of the default one.

This post is a continuation of Python: SQLAlchemy — understanding sessions and associated queries; we have previous touched scoped_session.query_property(query_cls=None) in point ❺⓵ and point ❺⓶, but we did not do user-defined query class: we are looking at it in this post.

The source database used is still the MySQL test data released by Oracle Corporation. Downloadable from https://github.com/datacharmer/test_db. And we continue with the employees table and the Employees class.

Let's get to the codes, and we'll discuss documentation later.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
from threading import get_ident
from sqlalchemy import (
    create_engine,
    Column,
    Integer,
    Date,
    String,
)
from sqlalchemy.orm import (
    sessionmaker, 
    scoped_session, 
    declarative_base, 
    Query,
)

SQLALCHEMY_DATABASE_URL = "mysql+mysqlconnector://root:pcb.2176310315865259@localhost/employees"

engine = create_engine(SQLALCHEMY_DATABASE_URL, echo=False, future=True)

session_factory = sessionmaker(autocommit=False, autoflush=False, bind=engine, future=True)

database_sesssion = scoped_session(session_factory, scopefunc=get_ident)

class BaseQuery(Query):
    def print_all(self):        
        print(f"\nThere are {self.count()} rows.")

        resultset = self.all()
        for row in resultset:
            print(row.__dict__)

        return self

class BaseModel(object):
    query = database_sesssion.query_property(BaseQuery)

    @classmethod
    def set_query(cls, query_cls: Query) -> None:
        cls.query = database_sesssion.query_property(query_cls)

Base = declarative_base(cls=BaseModel)

class Employees(Base):
    __tablename__ = 'employees'

    emp_no = Column(Integer, primary_key=True)
    birth_date = Column(Date, nullable=False)
    first_name = Column(String(14), nullable=False)
    last_name = Column(String(16), nullable=False)
    gender = Column(String(1), nullable=False )
    hire_date = Column(Date, nullable=False )

Lines 1-22 -- this block is a repeat from the last mentioned post. We're just establishing the database connection, leading to defining our scoped_session session.
Lines 41-51 -- this block is also a repeat. We're defining our Employees declarative model.
Lines 24-32 -- this is our user-defined query class:

class BaseQuery(Query):
    def print_all(self):        
        print(f"\nThere are {self.count()} rows.")

        resultset = self.all()
        for row in resultset:
            print(row.__dict__)

        return self

A user-defined query class must descend from sqlalchemy.orm.Query. BaseQuery defines its own method print_all(), which get all available results from a previous method call (✿), and for each row, prints out the value of the __dict__ attribute. And finally print_all() returns self: this is to maintain methods chainability.

(✿) previous method call: e.g. filter(...), which also maintains methods chainability.

Lines 34-39 -- our BaseModel:

class BaseModel(object):
    query = database_sesssion.query_property(BaseQuery)

    @classmethod
    def set_query(cls, query_cls: Query) -> None:
        cls.query = database_sesssion.query_property(query_cls)

We've used a simpler version of this BaseModel in the previously mentioned post, too. This time, we define our own query class, via:

    query = database_sesssion.query_property(BaseQuery)

That means BaseQuery is the default query class for any model which has BaseModel as a base class. So, BaseQuery is the default query class for the Employees model.

Another new addition to the BaseModel is the class method set_query(...). This is an experimentation as we shall see later: I was wondering if we could swap out BaseQuery, and replace it with another new user-defined query class like this:

Employees.set_query(SerialiseQuery)

We'll discuss it more in the next example. Back to this example, now that our classes are in place. Let's test out our custom query:

result = Employees.query.filter(Employees.emp_no==16621)
print(type(result))

The output:

<class '__main__.BaseQuery'>

As mentioned previously, filter(...) maintains methods chainability, the output should make sense -- we can call other BaseQuery's methods using result:

❶ The followings are equivalent and should produce the same output:

result.print_all()
Employees.query.filter(Employees.emp_no==16621).print_all()

Both give:

There are 1 rows.
{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x0000022903034AC0>, 'last_name': 'Strehl', 'emp_no': 16621, 'hire_date': datetime.date(1992, 6, 11), 'first_name': 'Parviz', 'gender': 'M', 'birth_date': datetime.date(1962, 5, 30)}

❷ Similarly:

print(result.first().__dict__)
print(Employees.query.filter(Employees.emp_no==16621).first().__dict__)

{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x000001B91670BE50>, 'first_name': 'Parviz', 'gender': 'M', 'birth_date': datetime.date(1962, 5, 30), 'last_name': 'Strehl', 'emp_no': 16621, 'hire_date': datetime.date(1992, 6, 11)}

❸ Consider the following two statements:

first_rec = Employees.query.filter(Employees.hire_date.between('2000-01-01', '2000-12-31'))\
    .order_by(Employees.emp_no).print_all().first()
	
print(first_rec.__dict__)

We're selecting all employees whom have been hired during the year 2000, order by integer primary key column emp_no, then we call to method print_all() to display all rows, finally we call method first(), which returns the result to variable first_rec. Finally, we print the value of first_rec's __dict__ attribute.

-- We've discussed how print_all() maintains method chainability earlier.

For this next example, we add another user-defined query class SerialiseQuery, which via its own serialise() method, converts each data row into a dictionary by calling model new method as_dict(), and returns all dictionaries as a list, in a new attribute serialised_array. Method serialise() must also maintain chainability. New method as_dict() is added onto BaseModel. The new example, lines 1-22 ( one to twenty two ) are identical to the previous example:

23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
...
class BaseQuery(Query):
    def print_all(self):        
        print(f"\nThere are {self.count()} rows.")

        resultset = self.all()
        for row in resultset:
            print(row.__dict__)

        return self

class SerialiseQuery(BaseQuery):
    def serialise(self):
        self.serialised_array = []
        resultset = self.all()
        for row in resultset:
            self.serialised_array.append(row.as_dict())

        return self

class BaseModel(object):
    query = database_sesssion.query_property(BaseQuery)

    @classmethod
    def set_query(cls, query_cls: Query) -> None:
        cls.query = database_sesssion.query_property(query_cls)

    def as_dict(self):
        """
        Converts an instance to dictionary.

        References:
            https://stackoverflow.com/questions/5022066/how-to-serialize-sqlalchemy-result-to-json
            How to serialize SqlAlchemy result to JSON?
            https://stackoverflow.com/questions/1958219/convert-sqlalchemy-row-object-to-python-dict
            How to convert SQLAlchemy row object to a Python dict?
        """
        return { c.name: getattr(self, c.name) for c in self.__table__.columns }

Base = declarative_base(cls=BaseModel)

class Employees(Base):
    __tablename__ = 'employees'

    emp_no = Column(Integer, primary_key=True)
    birth_date = Column(Date, nullable=False)
    first_name = Column(String(14), nullable=False)
    last_name = Column(String(16), nullable=False)
    gender = Column(String(1), nullable=False )
    hire_date = Column(Date, nullable=False )

❶ Recall that we have postponed the discussion of class method set_query(...). We will discuss it now. It should be apparent that for the new codes, the default query class is still BaseQuery; let's replace it with SerialiseQuery, and also see how the new one works:

Employees.set_query(SerialiseQuery)

result = Employees.query.filter(Employees.hire_date.between('2000-01-01', '2000-12-31'))\
    .order_by(Employees.emp_no).serialise()

from pprint import pprint
pprint(result.serialised_array)

I am printing out just the first and the last rows:

[{'birth_date': datetime.date(1960, 9, 9),
  'emp_no': 47291,
  'first_name': 'Ulf',
  'gender': 'M',
  'hire_date': datetime.date(2000, 1, 12),
  'last_name': 'Flexer'},
  ...
   {'birth_date': datetime.date(1954, 5, 6),
  'emp_no': 499553,
  'first_name': 'Hideyuki',
  'gender': 'F',
  'hire_date': datetime.date(2000, 1, 22),
  'last_name': 'Delgrande'}]

❷ Since serialise() maintains chainability, the following should also work:

result.print_all()

❸ And also:

first_rec = result.first()
print(first_rec.__dict__)

❹ Chaining print_all() and serialise() in either order:

result = Employees.query.filter(Employees.emp_no==16621).print_all().serialise()
print("\n")
print(result.serialised_array)

In the context of this example, the chaining order should not matter:

result = Employees.query.filter(Employees.emp_no==16621).serialise().print_all()
print("\n")
print(result.serialised_array)

-- Since both methods return self: this should make sense.

❺ Let's make BaseQuery the default query class:

Employees.set_query(BaseQuery)

This code should raise an exception:

result = Employees.query.filter(Employees.emp_no==16621).print_all().serialise()

And it does:

There are 1 rows.
{'_sa_instance_state': <sqlalchemy.orm.state.InstanceState object at 0x0000012B96511FF0>, 'first_name': 'Parviz', 'gender': 'M', 'birth_date': datetime.date(1962, 5, 30), 'last_name': 'Strehl', 'emp_no': 16621, 'hire_date': datetime.date(1992, 6, 11)}
Traceback (most recent call last):
  File "F:\my_project\src\my_project\my_python_script.py", line 109, in <module>
    result = Employees.query.filter(Employees.emp_no==16621).print_all().serialise()
AttributeError: 'BaseQuery' object has no attribute 'serialise'

This makes sense, BaseQuery only has print_all() method.

The official document method sqlalchemy.orm.scoping.scoped_session.query_property(query_cls=None), states:

… There is no limit to the number of query properties placed on a class. …

My interpretation is: we can give our models as many query properties as required -- though I'm not certain this is right. However, this interpretation works, that is:

...
class BaseModel(object):
    query = database_sesssion.query_property(BaseQuery)
    serialise_query = database_sesssion.query_property(SerialiseQuery)
    ...
	
...	

The rest of the codes stay the same. And we can now do:

result1 = Employees.query.filter(Employees.emp_no==16621).print_all()
result2 = Employees.serialise_query.filter(Employees.emp_no==16621).print_all().serialise()

print(result2.serialised_array)

This also means the class method set_query(...) in BaseModel is obsolete!

The sqlalchemy.orm.Query class has many interesting and useful methods, e.g. offset(int) and limit(int):

result = Employees.serialise_query.filter(Employees.hire_date.between('2000-01-01', '2000-12-31'))\
    .order_by(Employees.emp_no).offset(2).limit(3).serialise()

from pprint import pprint
pprint(result.serialised_array)

offset(int) is a 0-based index. After querying the database and getting the result, we then serialise rows 3 ( offset 2 ), 4 and 5, and print out the list of all three ( 3 ) dictionary entries:

[{'birth_date': datetime.date(1953, 2, 9),
  'emp_no': 72329,
  'first_name': 'Randi',
  'gender': 'F',
  'hire_date': datetime.date(2000, 1, 2),
  'last_name': 'Luit'},
 {'birth_date': datetime.date(1955, 4, 14),
  'emp_no': 108201,
  'first_name': 'Mariangiola',
  'gender': 'M',
  'hire_date': datetime.date(2000, 1, 1),
  'last_name': 'Boreale'},
 {'birth_date': datetime.date(1960, 9, 12),
  'emp_no': 205048,
  'first_name': 'Ennio',
  'gender': 'F',
  'hire_date': datetime.date(2000, 1, 6),
  'last_name': 'Alblas'}]

sqlalchemy.orm.Query class provides enough capabilities out of the box to implement our own pagination functionality. I'm working on one at the moment, it's progressing quite well:

class BaseQuery(Query):
    def paginate(self, page: int, per_page: int):
        return Paginator(self, page, per_page).execute()

I have not found any official example -- at all -- on how to construct user-defined query classes. After some searching, I found the following three ( 3 ), rather old, Stackoverflow posts:

These posts set me in the right direction, and this post is the result of that learning process... SQLAlchemy is a large library, there are still so much to get through.

I found this all very interesting, since I am very interested in database works. I hope you find this post helpful in some way. Thank you for reading and stay safe as always.