[PyGreSQL] Thread safety (again)

Jerome ALET Jerome.Alet at unice.fr
Sun Apr 9 17:56:26 EDT 2000


It appears I sent a private reply when I meant to send it to the list.

I want to apologize for what I wrote: I completely misunderstood
what Ken wrote.

On Sun, Apr 09, 2000 at 12:49:03AM -0600, Ken Kinder wrote:
> So let me get this straight:
> 
> You spawn two threads: one runs a query. The other does not touch the
> database connection. And the second has problems? That doesn't make sense
> -- how could it be affected by PyGreSQL if it doesn't do anything with it?

This can't affect PyGreSQL, and PyGreSQL can't affect it.

In fact, with the official 2.4 version, other Python threads are blocked
whenever you execute compiled code, e.g. PyGreSQL's pgmodule.c. That's not
a bug in PyGreSQL at all; it's just the way Python handles threads (the
interpreter holds a global lock while compiled code runs, unless that code
releases it). The other threads are released when the compiled code ends
(e.g. when a call to PyGreSQL returns). Python signals suffer from the
same problem.

Python provides a way to keep your compiled code from blocking other
threads: you enclose the code that should run without blocking them
between Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS. The enclosed code
must not modify any Python object, or at least that's what I've understood.

What we (I?) really want is to avoid blocking other threads for noticeable
periods of time. For example, if a thread continuously displays the current
time with one-second resolution, we can't block it for ten seconds, because
the screen wouldn't be updated during that time. So all we have to do is
enclose only the most time-consuming code between Py_BEGIN_ALLOW_THREADS
and Py_END_ALLOW_THREADS and leave the rest of the code as-is, unless we
have special needs (realtime, for example, but you certainly don't write
realtime programs in Python).

So, since PQexec is probably the most time-consuming function in PyGreSQL
(strictly speaking it isn't part of PyGreSQL but of the libpq library,
which PyGreSQL calls), we only have to protect this one function call.
This way, even if we run a query that takes an hour or so to return its
result, our other Python threads can continue to run instead of being
blocked during this query.
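
The effect described above can be sketched from the Python side. This is
only an illustration, not PyGreSQL code: slow_query, run_clock and demo are
hypothetical names, and time.sleep() stands in for a long PQexec call
because it also releases the interpreter lock while it waits.

```python
import threading
import time


def slow_query(seconds):
    """Stand-in for a long PQexec call (hypothetical, not PyGreSQL).

    time.sleep() releases the interpreter lock while it waits, which is
    the behaviour the patch is meant to give to PQexec.
    """
    time.sleep(seconds)
    return "result"


def run_clock(stop_event, ticks, interval=0.05):
    """Record a timestamp every `interval` seconds until asked to stop."""
    while not stop_event.is_set():
        ticks.append(time.monotonic())
        time.sleep(interval)


def demo(query_seconds=0.5):
    """Run the clock thread while the main thread waits on the 'query'."""
    stop_event = threading.Event()
    ticks = []
    clock = threading.Thread(target=run_clock, args=(stop_event, ticks))
    clock.start()
    result = slow_query(query_seconds)
    stop_event.set()
    clock.join()
    return result, len(ticks)


if __name__ == "__main__":
    result, tick_count = demo()
    print(result, "- clock ticks during the query:", tick_count)
```

If the blocking call kept the interpreter lock for its whole duration, as
an unpatched PQexec call does, the clock thread could not tick until the
call returned.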

That's exactly what the 2.4-based patch I posted to this list does. Your
other Python threads will still be blocked during most of your calls to
PyGreSQL's compiled code, but you probably won't notice because those
calls take very little time. However, when you run an SQL query that would
previously have blocked your other threads, they are no longer blocked
during the PQexec call itself.

You can extend the use of Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS
to other PyGreSQL or libpq functions if you really need to; personally,
I don't.

I think generalizing the use of these macros in PyGreSQL would be
worthwhile, particularly if you use PyGreSQL with Zope, which is a
multi-threaded Python program.

> So what version of PyGreSQL does your patch work on? The devel or stable?

My patch was designed for version 2.4, which is the official one, and
it works perfectly for me with that version. Attached to this message
you'll find a patch for the latest 3.0 beta available via the web site:
PygreSQL-3.0.pre000409

I have compiled it, but my program, which works fine with 2.4, regularly
dies unexpectedly with 3.0.pre, with or without the patch, so I haven't
tested the patch yet and you should use it at your own risk.

> On Sun, 9 Apr 2000, Jerome ALET wrote:
> 
> > On Fri, Apr 07, 2000 at 07:11:10PM -0600, Ken Kinder wrote:
> > > ?? Ok, let me get this straight... I can have a Pygresql-using program, 
> > > provided that I don't have two queries running concurrently on the same 
> > > object?
> > 
> > I've personally not tested this, either with or without my patch.

My answer should have been: YES YES YES. I had (mis)understood that he
actually wanted to have two queries running concurrently on the same object.
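
One way to respect that constraint when several threads share a single
connection object is to serialize the calls with a lock. The sketch below
is an assumption about how one might do it, not something PyGreSQL
provides: FakeConnection, SerializedConnection and run_workers are
hypothetical names, and FakeConnection only imitates a connection so that
the example runs without a database.

```python
import threading
import time


class FakeConnection:
    """Hypothetical stand-in for a PyGreSQL connection object."""

    def __init__(self):
        self.busy = False
        self.overlaps = 0
        self.queries = 0

    def query(self, sql):
        # Count any call that starts while another one is still running.
        if self.busy:
            self.overlaps += 1
        self.busy = True
        time.sleep(0.001)  # pretend the server is working
        self.queries += 1
        self.busy = False
        return sql


class SerializedConnection:
    """Wrapper that lets only one thread at a time use the connection."""

    def __init__(self, connection):
        self._connection = connection
        self._lock = threading.Lock()

    def query(self, sql):
        with self._lock:
            return self._connection.query(sql)


def run_workers(shared, worker_count=4, queries_each=5):
    """Start several threads that all query through the same wrapper."""

    def worker():
        for _ in range(queries_each):
            shared.query("select 1")

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


if __name__ == "__main__":
    connection = FakeConnection()
    run_workers(SerializedConnection(connection))
    print("queries:", connection.queries, "overlaps:", connection.overlaps)
```

Giving each thread its own connection object avoids the question
altogether, at the cost of more open connections.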

bye,
Jerome
-------------- next part --------------
diff -urbw PyGreSQL-3.0-pre000409/pgmodule.c PyGreSQL-3.0-pre000409-patched/pgmodule.c
--- PyGreSQL-3.0-pre000409/pgmodule.c	Sun Apr  9 12:31:54 2000
+++ PyGreSQL-3.0-pre000409-patched/pgmodule.c	Sun Apr  9 15:22:37 2000
@@ -156,6 +156,27 @@
 
 #define is_pgsourceobject(v) ((v)->ob_type == &PgSourceType)
 
+static PGresult *nonblocking_PQexec(PGconn *cnx, char *q)
+{
+        /* we should do the same thing at least for each libpq call */
+        /* e.g. PQclear, PQprint, etc..., as well as for each PygreSQL */
+        /* C function which doesn't modify any Python object, */
+        /* but since PQexec is certainly */
+        /* the most time consuming function it should be OK like that */
+        /* because it's better than before, and better than nothing, */
+        /* and nobody should notice: other python threads will still be blocked, */
+        /* except during PQexec, but for very short periods of time since IMHO */
+        /* only PQexec can last for several minutes */
+        PGresult *p;
+        
+        Py_BEGIN_ALLOW_THREADS
+        
+        p = PQexec(cnx, q);
+        
+        Py_END_ALLOW_THREADS
+        
+        return p;
+}
 
 #ifdef LARGE_OBJECTS
 /* pg large object */
@@ -354,7 +375,7 @@
 	self->num_fields = 0;
 
 	/* gets result */
-	self->last_result = PQexec(self->pgcnx->cnx, query);
+        self->last_result = nonblocking_PQexec(self->pgcnx->cnx, query);
 
 	/* checks result validity */
 	if (!self->last_result)
@@ -1925,7 +1946,7 @@
 
 	/* gets notify and builds result */
 	/* notifies only come back as result of a query, so I send an empty query */
-	result = PQexec(self->cnx, " ");
+        result = nonblocking_PQexec(self->cnx, " ");
 
 	if ((notify = PQnotifies(self->cnx)) != NULL)
 	{
@@ -2004,7 +2025,7 @@
 	}
 
 	/* gets result */
-	result = PQexec(self->cnx, query);
+        result = nonblocking_PQexec(self->cnx, query);
 
 	/* checks result validity */
 	if (!result)
@@ -2240,7 +2261,7 @@
 	/* starts query */
 	sprintf(buffer, "copy %s from stdin", table);
 
-	if (!(result = PQexec(self->cnx, buffer)))
+        if (!(result = nonblocking_PQexec(self->cnx, buffer)))
 	{
 		free(buffer);
 		PyErr_SetString(PyExc_ValueError, PQerrorMessage(self->cnx));

