From the DB-API (https://www.python.org/dev/peps/pep-0249/):
OperationalError
Exception raised for errors that are related to the database's
operation and not necessarily under the control of the programmer,
e.g. an unexpected disconnect occurs, [...]
Additionally, psycopg2 was inconsistent, at least in the async case:
depending on how the "connection closed" error was reported from the
kernel to libpq, it would sometimes raise OperationalError and
sometimes DatabaseError. Now it always raises OperationalError.
There's a race condition that only seems to happen over Unix-domain
sockets. Sometimes, the closed socket is reported by the kernel to
libpq like this (captured with strace):
sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0) = 29
recvfrom(3, "E\0\0\0mSFATAL\0C57P01\0Mterminating "..., 16384, 0, NULL, NULL) = 110
recvfrom(3, 0x12d0330, 16384, 0, 0, 0) = -1 ECONNRESET (Connection reset by peer)
That is, psycopg2/libpq sees no error when sending the first query
after the connection is closed, but gets an error reading the result.
In that case, everything worked fine.
But sometimes, the error manifests like this:
sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0) = -1 EPIPE (Broken pipe)
recvfrom(3, "E\0\0\0mSFATAL\0C57P01\0Mterminating "..., 16384, 0, NULL, NULL) = 110
recvfrom(3, "", 16274, 0, NULL, NULL) = 0
recvfrom(3, "", 16274, 0, NULL, NULL) = 0
i.e. libpq received an error when sending the query. This manifests as
a slightly different exception from a slightly different place. More
importantly, in this case connection.closed is left at 0 rather than
being set to 2, and that is the bug I'm fixing here.
Note that we see almost identical behaviour for sync and async
connections, and the fixes are the same. So I added extremely similar
test cases.
Finally, there is still a bug here: for async connections, we
sometimes raise DatabaseError (incorrect) and sometimes raise
OperationalError (correct). Will fix that next.
(almost... except for micros rounding)
While this is probably an improvement on the previous implementation,
I am largely waving a dead chicken at windows, which keeps failing to
pass the seconds overflow test. If it doesn't pass now either I'll start
blaming Python's timedelta.
So we test the effect, not the implementation. Tests pass on master too
this way, three tests fail in this branch, related to autocommit
(sort-of-obviously).
It's not so much about tests being slow: some just get stuck and timeout
travis.
Skipped all tests taking about more than 0.2s to run on my laptop.
Fast testing takes about 8s instead of 24.