Codebase Cleanups / Philosophy



Everyone should read:
http://www.hibernate.org/hib_docs/v3/reference/en/html/best-practices.html

There's a number of points there we're not doing so well on...in particular:
  * Write fine-grained classes and map them using <component>
  * Don't manage your own JDBC connections
  * In a three tiered architecture, consider using saveOrUpdate()
     (we're sort of 2.33 tiered)
  * Consider abstracting your business logic from Hibernate. " Hide
(Hibernate) data-access code behind an interface."
    We're not that large yet, but we're getting there and the
hibernate-ness being spread throughout the code is getting to be a
mess.
   * Implement equals() and hashCode() using a unique business key.

If you're not familiar with XP, this is a pretty reasonable "15 minute summary":
http://www.eng.mu.edu/corlissg/168Search.03F/n_extremeProgramming.html
If we were following more of that stuff (refactor constantly, don't
design too much up front, etc) we'd be doing a lot better. The whole
goal of the XP worldview is to make development more "agile". We need
that pretty badly right now.

Some other observations from skimming through a lot of the hibernate
code we have:
1) We should not be implementing .equals() and .hashcode() in
Persistent using the DB key according to:
http://www.hibernate.org/hib_docs/v3/reference/en/html/persistent-classes.html#persistent-classes-pojo#persistent-classes-equalshashcode
We probably should just drop Persistent altogether and implement these
methods in each individual class.

2) I'd like to have us explicitly persist classes, and provide factory
methods with an option for getting a new object that is already
persisted. Explicitly persisting classes using session.persist()
instead of doing it automatically on new just involves less invisible
magic, and will also allow multiple DBs in the future (migration,
synching mirrors, etc).

3) Methods og persisted classes that use a Session object should take
one rather than directly calling HibernateUtil.getSession().

4) Deletes should probably not get a JDBC connection and then manually
DELETE. See:
http://www.hibernate.org/hib_docs/v3/reference/en/html/querysql.html#querysql-cud
for an alternative. I also wonder if there's a way to do an HQL-ish
mass-delete (clearly we don't want to do an item by item delete using
session.delete()).

5) We should rollback transactions if an exception is thrown.

6) I don't think the PostgreSQLBlobType / PostgreSQLOIDDialect hacks
are required anymore in Hibernate3 (the web page mentions the fix was
committed to a Hibernate3 alpha). There's also a number of workaround
settings in hibernate.properties that may not be relevant anymore.
Alex is probably best qualified to re-evaluate these.

7) In TopicPerUser (and maybe other places?) we are using composite IDs. 
The hibernate docs push an alternative approach:
http://www.hibernate.org/hib_docs/v3/reference/en/html/components.html#components-compositeid
In a number of places in the docs they state that composite-id was
included for compatibility with legacy tables that already use this
approach and should not be used in hibernate-only code. Also, perhaps
we should be using collection mappings here (Set, etc) instead of
explicitly the mapping tables ourselves? Didn't look closely so could
be way off base.

8) I am very suspicious of all the manual locking going on in, e.g.,
Topic and TopicPerUser. Transaction isolation should mitigate a *lot*
of the need for explicit locking and won't result in weird performance
stalls. INSERT is often the most expensive operation in a DB.

9) Our connection pool is too small right now. We need to update the
values in hibernate.propreties or we're going to get more of those
"mysterious hangs" on certain operations. 20 connections or so is a
reasonable starting point for "max number" of connections.

10) I'm seeing periodic "weird stuff" to deal with performance
problems I'm not convinced will ever exist, or code to deal with race
conditions I'm not sure will be an issue. Just as its good for a
mature codebase to deal with lots of these issues, its a good thing
for a rapidly changing codebase to avoid getting itself wrapped up in
all this extra code (esp. since if done wrong things like locking can
bite you in the ass really hard later). KISS. Lets make things
interesting before we make them perfect. (I'm not saying I think
anything I've seen is specifically wrong, I just don't know, its just
a general caution :-)

11) TESTS!!! GODAMNIT! Our testing level is still abysmal. Most
classes, if they have a test suite at all, have like 10% test coverage
tops. If things are hard to test, which I think they are right now,
that's often a very good sign that we need major refactoring. We also
need to be supporting the test suite with lots of "test code". You'll
see I wrote a lot of this for my tests, so that each test is quite
small, and there are lots of them.

12) The DefaultMethods tests are overengineered and make it a pain to
write integration tests for these. I don't think we really need to
test the XML-RPC route to this class for now, lets just get full test
coverage for the calls themselves, which means making tests easy to
write.

13) Underengineer, refactor constantly. We're tending to overengineer
at the start, and then let the codebase go out of control as people
copy-and-paste (often without understanding the code fully) all over.
Collective code ownership is *very* important here, e.g. if you find
yourself duplicating code, refactor it so its not duplicated. Good
unit tests mean that you can refactor w/o fear that you've broken code
you don't understand.

14) We should probably move the tests into a per-package test setup
(e.g. org.yarrr.person.tests) etc. Having one test package is ok when
things are small but makes it hard to find things now.

15) Test *as little as possible* in each test. Don't write monster
tests. Think of this as the ideal: when something breaks the name of
the test that breaks should describe exactly what's broken. Of course,
don't take this *too* extreme, but if you write enough test fixture
and functions to make tests writing easier, you can have fine grained
tests w/o a lot of overhead.

16) Make tests easy to write. Write bodies of supporting code to make
each test as small as possible. The tests themselves can and should be
quick and dirty, as long as its easy to write lots of new tests.
Investment in making tests easier to write is almost always a win,
just remember to keep it as shallow as possible so you can tell WTF
happened when something goes wrong (as always, don't overengineer up
front).

17) The test suite runs FAR too slowly. Fast tests are key. (BTW, you
can select any test in Package Explorer in Eclipse and say "Run
As->JUnit Test" to run just that test, you can also select a failed
test in the junit view and run just that test method, etc). Mostly the
problem is not the size of the suite but that postgresql initializing
is way way way too slow.

18) To that end: We should consider using Derby for development (or
HSQLDB if Derby initializes slowly, though Derby is the more "real DB"
of the two...) and only pushing to postgresql for production (maybe
not even then? performance isn't exactly a big issue right now.... and
we can always switch in the future. Since we'll have a great test
suite it'll be easy to deal w/ any issues that come up, if any ;-).
Derby is supported in Hibernate3.

19) We need to clean out DefaultMethods and the various persisted
classes. It looks like they have a lot of unnecessary baggage. We
should probably also have methods like topic.getMessages(criteria)
that take pieces of Criteria queries so that we don't end up writing
the same code a bazillion times for minor criteria differences.

20) DefaultMethods should be small. Each method there should do a
simple call on some other class. DefaultMethods should mostly contain
type conversions, and many of those should be shunted into general
"type mappings" registered w/ the XML-RPC bindings (I *think* it can
do this, though I didn't see how immediately).

21) We should consider using SOAP instead of XML-RPC. XML-RPC is all
very well, but part of the mess we're getting is because we really
could use an object API. We don't have to use the complex parts of
SOAP, and Apache appears to have decent SOAP Java bindings.

-Seth



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]