movq

Wherein I Move a Lot of Words Around

SQLite, Python, and Transactions

I'm working an intermediary library in Python that turns a fair number of normal databases and database-like stores into simple key-value stores. There's a long story of why, but it does it have an actual use case outside of "because I can" (though, the symlink backend is pure, unadulterated "because I can").

For the SQLite store I was running into transaction issues here and there. At seemingly-random times data wouldn't save or I'd get an error saying a transaction hadn't been started when I tried to commit. So, I did some research.

John CS - Transactions with Python's sqlite3 Package:

By automatic transactions here I am not referring to SQLite's built-in behavior of wrapping data-changing operations in transactions. Rather, I am speaking of the Python module's special behavior where it actually executes BEGIN and COMMIT statements on your behalf.

A transaction is automatically started (ie: a BEGIN is executed) when all of the following conditions are met:

  1. The isolation_level is not None. [1]
  2. The Connection object thinks a transaction has not already been started.
  3. An UPDATE, DELETE, INSERT, or REPLACE statement is being executed. [2]

The BEGIN statement will be concatenated with whatever value is in isolation_level. For example, if isolation_level was equal to DEFERRED, the statement executed when the above conditions are met is BEGIN DEFERRED.

Introduction to SQLite in Python | Python Central:

Please remember to always call commit to save the changes. If you close the connection using close or the connection to the file is lost (maybe the program finishes unexpectedly), not committed changes will be lost.

11.13. sqlite3 — DB-API 2.0 interface for SQLite databases — Python 2.7.11 documentation:

If you want autocommit mode, then set isolation_level to None. Otherwise leave it at its default, which will result in a plain “BEGIN” statement, or set it to one of SQLite’s supported isolation levels: “DEFERRED”, “IMMEDIATE” or “EXCLUSIVE”.

The lesson learned here comes down to these points:

  1. Python's sqlite3 module conforms to the broken Python PEP for database access which mandates that there's always a transaction in play and your only control over it is to commit or roll back.
  2. DB modules are supposed to keep track of the current transaction state internally and not offer a begin() API.
  3. The value of connection.isolation_level can be set to None to turn off all this or to DEFERRED, IMMEDIATE, or EXCLUSIVE to turn it on. The text in isolation_level is sent with the BEGIN statement to SQLite, so all of those carry SQLite's meaning:
    1. DEFERRED - Get a shared lock on the first statement. Get an exclusive lock only when a DML statement is used. (Default)
    2. IMMEDIATE - Get a write lock immediately. Others can read, but writes will be blocked (timeout applies).
    3. EXCLUSIVE - Get an exclusive lock immediately. Others cannot even read.
  4. You can turn this behavior off, but then you have to fall back to 100% manual transaction management, and in many cases this can be done very wrong and cause random slowdowns or data loss (if you aren't well-versed in transactional databases).
  5. Due to how sqlite3 handles threaded and multi-process access, there's a timeout on getting the lock to start a DML statement, and the default is five seconds. This can lead to some interesting situations.

So, in short: abide by the PEP gods and presume a messed-up transaction environment that you can only control with commit() and rollback() or turn it all off and wing it.

In the end, I chose to manually commit at key points in order to both ensure the data was written but also for the side effect of starting a new transaction.

That, folks, is when you know you've made a broken API. Side effects should never be the goal of a line of code, but due to the API design they must be here.