Re: Philosophy/Codebase fixes



On Mon, 2005-03-14 at 00:33 -0500, Seth Nickell wrote:

> If you're not familiar with XP, this is a pretty reasonable "15 minute summary":
> http://www.eng.mu.edu/corlissg/168Search.03F/n_extremeProgramming.html
> If we were following more of that stuff (refactor constantly, don't
> design too much up front, etc) we'd be doing a lot better. The whole
> goal of the XP worldview is to make development more "agile". We need
> that pretty badly right now.

Yeah, XP sounds cool. However, it seems to me that we sort of block on
step 1:
     1. Pick a user's story for your next task
We really don't have any user stories, because we don't yet know how the
user uses Yarrr.

> Some other observations from skimming through a lot of the hibernate
> code we have:
> 1) We should not be implementing .equals() and .hashcode() in
> Persistent using the DB key according to:
> http://www.hibernate.org/hib_docs/v3/reference/en/html/persistent-classes.html#persistent-classes-pojo#persistent-classes-equalshashcode
> We probably should just drop Persistent altogether and implement these
> methods in each individual class.

Persistent is sort of nice to avoid some code for the Id in each class,
but we should probably remove the equals and hashcode methods.

> 3) Methods og persisted classes that use a Session object should take
> one rather than directly calling HibernateUtil.getSession().

Why? The hibernate book recommends the Thread Local Session pattern
(i.e. HibernateUtil which is mostly directly typed in from the Hibernate
book), as does the best practises link you gave.

Furthermore, a persistent object really does belong to one and only one
single session, so conceptually basically all functions would need the
session objects, and you would need to pass the right session object.

> 4) Deletes should probably not get a JDBC connection and then manually
> DELETE. See:
> http://www.hibernate.org/hib_docs/v3/reference/en/html/querysql.html#querysql-cud
> for an alternative. I also wonder if there's a way to do an HQL-ish
> mass-delete (clearly we don't want to do an item by item delete using
> session.delete()).

We're doing this because the hibernate (2.1) docs recommended against
using hibernate for mass deletion. This might be better with hibernate
3.

> 5) We should rollback transactions if an exception is thrown.

We already do.

> 6) I don't think the PostgreSQLBlobType / PostgreSQLOIDDialect hacks
> are required anymore in Hibernate3 (the web page mentions the fix was
> committed to a Hibernate3 alpha). There's also a number of workaround
> settings in hibernate.properties that may not be relevant anymore.
> Alex is probably best qualified to re-evaluate these.

Its possible this has been fixed, we need to verify it. I remember it
mostly as a postgresql jdbc driver issue though, so I'm not sure it is
fixable in hibernate. The fix might be in the postgresql 8 jdbc driver
though.

> 7) In TopicPerUser (and maybe other places?) we are using composite IDs.
> The hibernate docs push an alternative approach:
> http://www.hibernate.org/hib_docs/v3/reference/en/html/components.html#components-compositeid
> In a number of places in the docs they state that composite-id was
> included for compatibility with legacy tables that already use this
> approach and should not be used in hibernate-only code. Also, perhaps
> we should be using collection mappings here (Set, etc) instead of
> explicitly the mapping tables ourselves? Didn't look closely so could
> be way off base.

I'm not sure i entirely understand that approach/example. What is the
lineId field? We don't want to generate a separate unique id for the
each per-user part of the topic data. However, if we can avoid that then
maybe there is a better way to map it using this. I don't really think
the current mapping is bad though. It maps accurately what the per-user
data is.

About using collection mappings, I've tried to do that, but it turns out
that collection mappings have some issues. Whenever you add an item to a
collection mapped like that Hibernate will query for the whole
collection. This isn't very good for large mappings, which is
unfortunate, because using mappings (and filters) would be nice.

> 8) I am very suspicious of all the manual locking going on in, e.g.,
> Topic and TopicPerUser. Transaction isolation should mitigate a *lot*
> of the need for explicit locking and won't result in weird performance
> stalls. INSERT is often the most expensive operation in a DB.

If we ignore issues like race conditions I guess we could just ignore
this issue.

However, there really are problems here. A simple example would be
counting the number of people reading a message. Without any locking,
two concurrent updates of the count might result in only adding one to
the count. One can avoid this problem (and locks) using opportunistic
locking, but then we force clients to handle these conflicts instead. 

The way we currently handle the "version" field in e.g. a topic makes
avoiding races like this more important. Whenever we update a topic we
bump the version on the topic and the message that changed, so that you
can easily get incremental updates of a topic. (You just get all the
messages that changed since the last version you got returned.)

> 10) I'm seeing periodic "weird stuff" to deal with performance
> problems I'm not convinced will ever exist, or code to deal with race
> conditions I'm not sure will be an issue. Just as its good for a
> mature codebase to deal with lots of these issues, its a good thing
> for a rapidly changing codebase to avoid getting itself wrapped up in
> all this extra code (esp. since if done wrong things like locking can
> bite you in the ass really hard later). KISS. Lets make things
> interesting before we make them perfect. (I'm not saying I think
> anything I've seen is specifically wrong, I just don't know, its just
> a general caution :-)

I'm not exactly sure what you mean here, but we did see some extremely
bad performance problems with a remote server. The latency of xmlrpc
requests multiplied by the number of calls we needed to make caused the
client to be basically unusable. This was made much better when we
introduced HTTP 1.1 pipelining in the client, but even after that it was
a real problem. (Maybe this is more noticeable to us Europeans, since
the server is in the us.) 

I agree that we should try to keep things as simple as possible though.

> 18) To that end: We should consider using Derby for development (or
> HSQLDB if Derby initializes slowly, though Derby is the more "real DB"
> of the two...) and only pushing to postgresql for production (maybe
> not even then? performance isn't exactly a big issue right now.... and
> we can always switch in the future. Since we'll have a great test
> suite it'll be easy to deal w/ any issues that come up, if any ;-).
> Derby is supported in Hibernate3.

How stable are these databases considered to be?

> 19) We need to clean out DefaultMethods and the various persisted
> classes. It looks like they have a lot of unnecessary baggage. We
> should probably also have methods like topic.getMessages(criteria)
> that take pieces of Criteria queries so that we don't end up writing
> the same code a bazillion times for minor criteria differences.
> 
> 20) DefaultMethods should be small. Each method there should do a
> simple call on some other class. DefaultMethods should mostly contain
> type conversions, and many of those should be shunted into general
> "type mappings" registered w/ the XML-RPC bindings (I *think* it can
> do this, though I didn't see how immediately).

Due to the latency issues, we really need to make each xmlrpc call do a
lot of things so we can avoid having to do multiple calls. This
sometimes makes the xmlrpc calls strange combinations of calls, so the
xmlrpc calls that may look very strange in other classes. 

There is some support in xmlrpc for batching a set of requests in one
call that might help a bit here, but it won't help if the requests
depend on each other.

> 21) We should consider using SOAP instead of XML-RPC. XML-RPC is all
> very well, but part of the mess we're getting is because we really
> could use an object API. We don't have to use the complex parts of
> SOAP, and Apache appears to have decent SOAP Java bindings.

A fine-grained object API is very likely to introduce a lot of latencies
though. :(

Also, neither xml-rpc or soap really have a "session", which is
something I think we need. Right now each call is a totally separate
entity. It would be nice if we could e.g. verify the user in the start
of the session and then never have to do that again, or have the ability
to cache stuff in the server during the lifetime of a session.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                            Red Hat, Inc 
                   alexl redhat com    alla lysator liu se 
He's a sword-wielding day-dreaming rock star looking for 'the Big One.' She's 
a bloodthirsty green-skinned wrestler with a knack for trouble. They fight 
crime! 




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]