If we add a list type to our customer object, we run into a snag with
marshalling and unmarshalling.
Let's add the object type, first as a standard Archetypes object
stored completely in the ZODB:
# CustomerList.py
from Products.Archetypes.public import *
from Products.Archetypes.TemplateMixin import TemplateMixin
schema = BaseSchema + Schema((
TextField('body',
required=1,
primary=1,
searchable=1,
default_output_type='text/html',
allowable_content_types=('text/restructured',
'text/plain',
'text/html',
'application/msword'),
widget=RichWidget,
),
StringField("phone",
index="FieldIndex",
),
LinesField("clients"),
)) + TemplateMixin.schema
class CustomerList(TemplateMixin, BaseContent):
"""Our example object"""
schema = schema
archetype_name = "Customer List"
actions = TemplateMixin.actions
registerType(CustomerList)
Put this in the schema and restart Archetypes.
As we're storing this in the ZODB (not in the relational database),
everything works fine. The form widget for the clients field is a
textarea in which the user enters newline-separated entries. These are
converted by Zope to a Python list and stored as an attribute of the
object.
If we create a new Archetypes type that contains this same lines
field, but tries to store it in the relational database, we run into
problems with Archetypes's default behaviors.
First, the object type:
# CustomerListSQL.py
from Products.Archetypes.public import *
from Products.Archetypes.TemplateMixin import TemplateMixin
from Products.Archetypes.SQLStorage import PostgreSQLStorage
schema = BaseSchema + Schema((
TextField('body',
required=1,
storage=PostgreSQLStorage(),
primary=1,
searchable=1,
default_output_type='text/html',
allowable_content_types=('text/restructured',
'text/plain',
'text/html',
'application/msword'),
widget=RichWidget,
),
StringField("phone",
index="FieldIndex",
storage=PostgreSQLStorage(),
),
LinesField("clients",
storage=PostgreSQLStorage()),
)) + TemplateMixin.schema
class CustomerListSQL(TemplateMixin, BaseContent):
"""Our example object"""
schema = schema
archetype_name = "Customer List SQL"
actions = TemplateMixin.actions
registerType(CustomerListSQL)
Restart Archetypes, and don't forget to add the new type to
portal_types.
At the time of this writing, Archetypes tries to create the new table
with the field type lines for the clients field. This is not a
valid field type for PostgreSQL (or any other database I know of), and
therefore, the addition of the table fails, and any attempt to add an
object of this type fails since there is no table to store them in.
There are several different ways we could fix this problem.
Create the table before Archetypes does.
If the table already exists, Archetypes won't create it. We can
easily create the table, and give it a text type for the clients
field.
The table structure would be:
CREATE TABLE Customerlistsql (
uid text NOT NULL PRIMARY KEY,
parentuid text,
body text,
phone text,
clients text
);
Change the mapping performed by Archetypes.
We can fix this problem by patching SQLStorage.py to do the right
thing and create a text field by changing the type mapping that
Archetypes does. You can do this either by editing SQLStorage.py
and making changes for your database type, or, if you'd rather not
modify the Archetypes source code, you can subclass your storage
type, make the changes there, and use this new subclassed storage
type. We'll look explicitly at the subclassing strategy later in
this document; for now, we'll make changes directly to
SQLStorage.py.
The change we want is in the dictionary db_type_map, which
translates an Archetypes field type into the relational database
field type. As of this writing, there is no translation for lines,
so Archetypes uses lines as the relational database field type.
We'll add a translation for lines to become text:
db_type_map = {
'object': 'bytea',
'file': 'bytea',
'fixedpoint': 'integer',
'reference': 'text',
'datetime': 'timestamp',
'string': 'text',
'metadata': 'text',
'lines':'text', # THIS IS THE CHANGE
}
If you restart Archetypes and try to add your object now, it will
create the table and let you create objects.
Create a suitable domain in PostgreSQL.
PostgreSQL, like many advanced SQL databases, supports the notion of
domains. A domain is a custom-defined explanation of a database
type, which can be referenced as if it were a real type.
For example, if you commonly want to use a varchar(20) not null
for telephones in a database, you could create a domain called
telephone that is defined as varchar(20) not null, and then you
can simply create your tables with the field type telephone to get
the right definition and behavior.
We'll create a domain called lines:
CREATE DOMAIN lines AS text;
Domains can contain restrictions (such as CHECK constraints and
NOT NULL requirements), but in this case, we don't want or need
any of these. This simple definition will be enough.
Now, when Archetypes tries to create a field with the type lines,
it will succeed.
In some ways, this is the best strategy of our three, as it lets
other applications and users understand that this is a lines
field. It's still stored as text, and behaves as such, but if you
look at the table structure, you'll see lines, which can remind
you of its newline-separated, list-oriented use.
A serious problem still persists, though.
The newline-separated entries from the form (the "lines") are turned
into a Python list by Archetypes, such as:
[ 'cat', 'dog', 'bird' ]
but SQLStorage attempts to store this list directly in the
database. This ends up as the literal string value
"['cat,'dog','bird']" which is Archetypes stores in the database:
database=# SELECT uid, clients FROM Customerlistsql ;
uid | clients
---------------------------------+------------------------
CustomerListSQL.2003-07-23.1619 | ['cat', 'dog', 'bird']
(1 row)
Unfortunately, this string representation of a Python list is a
difficult format to work with in the database, and not handled
correctly coming out by Archetypes. When Archetypes gets the data back
from the relational database, it sees it as a single string. It tries
to turn this isnto a list, with the following results:
[
'
c
a
t
'
,
'
d
o
g
'
,
'
b
i
r
d
'
]
As this is the way Python handles being given a string and being told
to treat it like a list.
The solution is that we want to write a custom marshaller and
unmarshaller. These are the routines that Archetypes will run on a
value before it tries to write them to the database, and after it
retrieves the value from the database.
There are hooks in Archetypes for this: any function called map_XXX
is called when storing field type XXX and a method called
unmap_XXX is called when retrieving field type XXX.
Our mapping will convert this list back to a newline-separated string,
and this is the format it will be given to our relational database
as:
def map_lines(self, field, value):
return '\n'.join(value)
Our unmapping method will convert the newline-separated string back to
a Python list:
def unmap_lines(self, field, value):
return value.split('\n')
Both of these should go into SQLStorage.py, as methods the class
SQLStorage or as methods of the class for your particular relational
database. If don't want to (or can't) modify the source to
Archetypes, you could subclass your storage class, add the methods to
the subclass, and have your object schema fields use your new,
subclass storage type. We'll cover this concept of subclassing a
storage class extensively later, when we subclass an improved version
of the PostgreSQL storage class.
Now we can transparently work with our lists: they appear and are
edited on the form as a newline-separated string (so we can easily
edit them in a textarea), they're handled in Zope as a Python list
object (so we can work naturally with them and don't have to be
concerned with how they're stored), and they're stored in the
relational database as a simple newline separated list so we can
access them simply.
While our solution above lets Archetypes store the data and get it
back in one piece, it isn't very suitable in the relational database:
most relational database querying programs and reporting programs are
ill-equipped to deal with searching for individual values that are
stuffed into text fields.
To find all customers that have two values, "fish" and "cat", in
clients, you could write queries like:
SELECT * FROM Customerlistsql
WHERE clients LIKE 'cat\n%fish'
OR clients LIKE 'cat\n%fish\n%'
OR clients LIKE '%\ncat\n%fish'
OR clients LIKE '%\ncat\n%fish\n%'
OR clients LIKE 'fish\n%cat'
OR clients LIKE 'fish\n%cat\n%'
OR clients LIKE '%\nfish\n%cat'
OR clients LIKE '%\nfish\n%cat\n%'
(and this is still an incomplete example for this!)
However, this is ugly, slow, unindexable , and error-prone,
especially as you add more predicates to the logic.
We'll exploit a feature of PostgreSQL that allows us to store arrays
in a field, so that one field holds an array of values. While this is
similar to storing as newline-separated text, there are many functions
in PostgreSQL that can quickly find records having a value in an
array, or count the number of values in an array, and so on--all the
things that would be slow and unwieldy using text.
First, let's change our table structure to use arrays:
database=# ALTER TABLE Customerlistsql DROP CLIENTS;
ALTER TABLE
database=# ALTER TABLE Customerlistsql ADD CLIENTS text[];
ALTER TABLE
The type text[] is a PostgreSQL type for storing an array of text
values. We can test out the array storage works directly in
PostgreSQL by updating an existing record and examining it:
database=# UPDATE Customerlistsql SET clients='{cat,dog,bird}';
UPDATE 1
database=# SELECT uid, clients FROM Customerlistsql;
uid | clients
---------------------------------+----------------
CustomerListSQL.2003-07-23.1619 | {cat,dog,bird}
(1 row)
database=# SELECT uid, clients[1] FROM customerlistsql;
uid | clients
---------------------------------+---------
CustomerListSQL.2003-07-23.1619 | cat
Now we can change our map_lines and unmap_lines methods from
above, to write out and retrieve values written in this format:
def map_lines(self, field, value):
return "{%s}" % ','.join(value)
def unmap_lines(self, field, value):
return value.strip("{}").split(',')
Note
These are very naive implementations, as they do not deal with the
possibility that a list item might have a comma in it. It would be
quite easy, though, to write versions that quoted the comma and
unquoted it for unmapping.
Restart Archetypes to pick up the changes to the storage type, then
edit an existing or add a new object. Notice how the values you put
into the clients field end up as array in PostgreSQL, and are read
correctly.
While the last section works wonderfully, not everyone is lucky enough
to work with PostgreSQL, "The World's Most Advanced Open Source
Database". Many databases don't support an array type.
In this case, or even when using PostgreSQL, you can store the
individual client items in a related child table. We could, of course,
do this entirely in stock Archetypes: have the Customer object be
folderish (inherited from BaseFolder), and add independent client
objects, as we did earlier for Orders. In some cases, as it did for
Orders, this makes sense: if the child record has more than one
piece of information to it, or if it might need to undergo separate
approval, workflow, etc., you want to have a customizable way for
Archetypes to let users edit this information, etc.
In many cases, though, this would be overkill and annoying for the
user. For keeping track of a simple list of names or such (as our
examples for the clients field have been), having to add separate
objects for each individual client is burdensome from a UI
perspective, and creates additional Zope objects that aren't really
needed.
Rather, we'll keep our simple, clean textarea/lines interface, but
dynamically create and delete child records in a related table as
needed.
We'll store these list items in a related table, called clients:
CREATE TABLE clients ( uid TEXT NOT NULL
, client TEXT NOT NULL
, PRIMARY KEY ( uid, client ) );
Note
(the last line in this statement makes the primary key a
compound
primary key composed of both the
uid and
client fields. We can
have multiple children for each parent UID, and different parents
can have the same client value, but each child can have the same
client only once. If it should be possible in your application that
the same item could appear in the list more than once, remove this
restriction.)
What we now is a incoming function that will take the
newline-separated string and create child records for each line, and
an outgoing function that will turn child records back into a
newline-separated string.
So far, we've written procedural functions in PostgreSQL in both
PL/PgSQL and PL/perlU. We could do this new function in either of
those languages. PL/PgSQL has very poor string handling, though, so
we'll rule that out. Instead of coding more Perl (this is a Zope
HOWTO, after all!), let's take this chance to try out Python coding in
PostgreSQL. The examples below do not require any untrusted language
features, so they'll work with PL/Python (PostgreSQL 7.3 and previous)
or PL/PythonU (PostgreSQL 7.4); however, if you're using PL/PythonU,
you'll need to change the LANGUAGE declaration at the end of the
function definition to be plpythonu.
First, make sure that plpython is enabled for your database:
$ createdb plpython your_db_name
Then, let's add a Python function that, given the uid and
newline-separated clients field, adds the child records:
/* split newline-sep text and insert into child table */
CREATE OR REPLACE FUNCTION lines_to_clients (text, text) RETURNS integer as '
plan = plpy.prepare( "INSERT INTO Clients VALUES ( $1, $2 )",
[ "text", "text" ] )
plpy.execute("DELETE FROM Clients WHERE uid=''%s''" % args[0])
for ln in args[1].split("\\n"):
plpy.execute(plan, [ args[0], ln ])
return 1'
LANGUAGE plpython;
First, we delete any existing clients associated with this uid.
Then, we iterate over the list, and inserts a client for each entry.
You can find out full information about using PL/Python with
PostgreSQL at PL/Python .
We can try out our function to make sure it works:
database=# SELECT lines_to_clients('a','one\ntwo\nthree');
lines_to_clients
------------------
1
(1 row)
database=# SELECT * FROM Clients;
uid | client
-----+--------
a | one
a | two
a | three
(3 rows)
You can ignore the return value of the first SELECT; functions must
return a value, so the one is just a throwaway value. The second
SELECT, however, demonstrates that our function is working.
Now, a function to pull together all of the children and return as a
newline-separated string:
/* get child records and concatenate into newline-sep text */
CREATE OR REPLACE FUNCTION clients_to_lines (text) RETURNS text as '
rv = plpy.execute("SELECT client FROM Clients WHERE uid=''%s''" % args[0])
return "\\n".join([ ln["client"] for ln in rv ])
'
LANGUAGE plpython;
And we can test that:
database=# select clients_to_lines('a');
clients_to_lines
------------------
one
three
two
(1 row)
We can see that our individual records are returned as one row of
newline-separated text. (There isn't actually a space at the start of
the "one" record; this is simply a mirage created by the formatting
psql does of query results).
To use this, though, we'll need for PostgreSQL to return the result of
clients_to_lines when Archetypes selects from the table--so we'll
need to use a view instead of our table. Views are discussed earlier
in this document, in If You Need A Very Different Table Structure.
First, we'll move the table to a new name:
ALTER TABLE customerlistsql RENAME TO customerlistsql_table;
Then create our view:
CREATE VIEW customerlistsql AS
SELECT uid
, parentuid
, body
, phone
, clients_to_lines(uid) AS clients
FROM customerlistsql_table;
Note that we explicitly tell our relational database to call the final
column in the view by the name clients. Otherwise, our relational
database wouldn't know what to call this field and would give it a
generic name, and Archetypes couldn't find it.
Next, we'll create the update rule. This is what will be called when
you try to update the view. It performs a normal update to the
behind-the-scenes customerlistsql_table table, then calls our
inserting function function for the clients:
CREATE RULE customerlistsql_upd AS
ON UPDATE TO customerlistsql DO INSTEAD (
UPDATE customerlistsql_table
SET uid=NEW.uid
, parentuid=NEW.parentuid
, body=NEW.body
, phone=NEW.phone
WHERE uid=OLD.uid;
SELECT lines_to_clients(NEW.uid, NEW.clients);
);
We can test this out:
database=# UPDATE Customerlistsql SET clients='dog\ncat';
lines_to_clients
------------------
1
(1 row)
database=# SELECT uid, clients FROM Customerlistsql;
uid | clients
---------------------------------+---------
CustomerListSQL.2003-07-23.1619 | cat
dog
(1 row)
Again, we can ignore the SELECT results we get from our UPDATE.
This is because our function is returning some information, even
though it's not useful to us.
Our insert rule is similar and straightforward:
/* when inserting into the view, break apart the new mylines
value and insert
*/
CREATE RULE customerlistsql_ins AS
ON INSERT TO customerlistsql DO INSTEAD (
INSERT INTO customerlistsql_table ( uid
, parentuid
, phone
, body
)
VALUES ( NEW.uid
, NEW.parentuid
, NEW.phone
, NEW.body
);
SELECT lines_to_clients( NEW.uid, NEW.clients );
);
We can test this in PostgreSQL to see that it works. First, our
insert:
INSERT INTO customerlistsql ( uid
, parentuid
, phone
, body
, clients )
values ( 'b'
, 'c'
, '555-1212'
, 'body'
, 'one\ntwo\npickle\nshoe' );
Now, let's examine the results:
database=# SELECT * FROM customerlistsql WHERE uid='b';
uid | parentuid | body | phone | clients
-----+-----------+------+----------+---------------------
b | c | body | 555-1212 | one
pickle
shoe
two
(1 row)
database=# SELECT * FROM clients WHERE uid='b';
uid | client
-----+--------
b | one
b | pickle
b | shoe
b | two
(4 rows)
Note
In most relational databases (including PostgreSQL), there's no
guarantee that the order records were inserted is the order they
will be selected back out (that's what the
ORDER BY clause of the
SELECT statement is for.) In this last example, we inserted
one\ntwo\npickle\nshoe but got back the re-ordered
[one,
pickle, shoe, two]. If the order of items is important to your
database application, see
Maintaining order ,below.
And, finally, our DELETE rule:
/* when deleting from the view, delete the child records as well */
/* note: this could be handled automatically if we used referential
integrity and has 'on delete cascade' in our table definition
for the child table.
*/
CREATE RULE customerlistsql_del AS
ON DELETE TO customerlistsql DO INSTEAD (
DELETE FROM clients
WHERE uid=old.uid;
DELETE FROM customerlistsql_table
WHERE uid=old.uid;
);
And we can test this last piece in PostgreSQL:
database=# DELETE FROM customerlistsql WHERE uid='b';
DELETE 1
database=# SELECT * FROM customerlistsql WHERE uid='b';
uid | parentuid | body | phone | clients
-----+-----------+------+-------+---------
(0 rows)
database=# SELECT * FROM clients WHERE uid='b';
uid | client
-----+--------
(0 rows)
As mentioned in the comments, the delete rule would not be neccessary
if you have referential integrity on the customerlistsql and
clients tables. In this case, you could either block deleting a
CustomerListSQL object if it had any related clients, or you could
cascade the deletion. Since this is a small, related table that is
meant to change everytime someone edits the CustomerListSQL object,
cascading seems more likely to be your choice.
We could set this up as:
ALTER TABLE Clients ADD FOREIGN KEY (uid)
REFERENCES customerlistsql_table ON DELETE CASCADE;
And drop our deletion rule and recreate it in a simpler form:
DROP RULE customerlistsql_del ON customerlistsql;
CREATE RULE customerlistsql_del AS
ON DELETE TO customerlist_sql DO INSTEAD
DELETE FROM customerlistsql_table
WHERE uid=OLD.uid;
Note that even though our deletion rule is as straightfoward as
possible, we still need it, since PostgreSQL (and most relational
databases) won't let you delete from a view unless you have a rule
explaining how to do it.
The referential integrity version is slightly nicer as it protects you
in case you deleted directly from customerlistsql_table.
As we saw in the above section, the order of the child client lines as
retrieved may not match the order they were inserted.
We can solve this, if we need to, by adding a SERIAL field to our
client table and using this same field to order the selections.
Let's add the field to our table:
CREATE SEQUENCE clients_sort_seq;
ALTER TABLE clients ADD sort INT;
UPDATE client SET sort = nextval('clients_sort_seq');
ALTER TABLE clients ALTER sort SET NOT NULL;
ALTER TABLE clients ALTER sort SET DEFAULT nextval('clients_sort_seq');
(or, we could do this when we create the table by just adding the
field then, which is much easier):
...
sort SERIAL NOT NULL
...
Then we'll make a simple change to our clients_to_lines function:
/* get child records and concatenate into newline-sep text */
CREATE OR REPLACE FUNCTION clients_to_lines (text) RETURNS text as '
rv = plpy.execute( "SELECT client"
. " FROM Clients WHERE uid=''%s''"
. " ORDER BY sort" % args[0])
return "\\n".join([ ln["client"] for ln in rv ])
'
LANGUAGE plpython;
Now we're guaranteed to records back in the same order we inserted
them.