akeefer edited this page May 8, 2011 · 5 revisions

For these examples, I'll assume the following data model:

  • Issue - an issue in a bug-tracking system
  • Task - a task attached to an issue, which has a foreign key Issue_id
  • User - a user in the system
  • Group - a group in the system
  • join_User_Group - a join table that links Users and Groups in a many-to-many relationship

Summary of Changes/Work Items

  • Make issue.Tasks.add(task) and issue.Tasks.remove(task) immediate in the database
  • Make issue.Tasks.add(task) fail if the issue is not yet committed
  • Make issue.Tasks.add(task) insert task if task is not already committed
  • Make issue.Tasks.add(task) fail if the task is already assigned to another Issue
  • Make issue.Tasks.remove(task) fail if the task is not part of the Tasks list
  • Sort the issue.Tasks array by ID by default, if no other sort is specified
  • Make issue.Tasks lazy-load, with Tasks retrieved the first time get() or iterator().next() is called
  • Make issue.Tasks.size() issue a count(*) in the database, unless the backing Tasks have already been loaded
  • Adjust the pointer semantics around issue.Tasks such that Task objects added to the list will be returned on subsequent calls (if possible)
  • Make issue.Tasks.add(task) and issue.Tasks iteration set the Issue back-pointer property on the Task
  • Make task.Issue read-only
  • Make user.Groups.add(group) and user.Groups.remove(group) affect the pointers on the other side, so that group.Users is updated appropriately
  • Make user.Groups.add(group) fail if user is not yet committed
  • Make user.Groups.add(group) insert group if group is not yet committed
  • Make user.Groups lazy-load
  • Sort user.Groups by group_id by default
  • Make user.Groups.size() issue a count(*) if the backing data has not yet been loaded
  • Potential: Provide a sorting API for issue.Tasks and user.Groups, such that sorts can be applied prior to loading the list and can affect the actual DB query
  • Potential: Provide a way to dump the array and fk caches on an object
  • Potential: Provide a way to page the loading/iteration of issue.Tasks and user.Groups
  • Potential: Provide a preload() or similar method on issue.Tasks and user.Groups to force loading of the backing data.

More Detailed API Description

FK/Array Manipulation

  • issue.Tasks
  • Desired Behavior: issue.Tasks should return a lazily-populated List that can be used to manipulate the set of Tasks associated with the Issue through their Issue_id foreign key. It should not actually load anything until some method like get() or iterator().next() is called; size() alone issues a count(*) rather than loading. Implicitly, this should always sort by ID at load time to provide a stable iteration order in the absence of any other implied sort. The initial implementation will not attempt to page in the results lazily, but we should reserve the right to do so in the future, either explicitly or implicitly.
  • Pointer semantics: If called repeatedly on the same issue, issue.Tasks should return the same Tasks array with the same Task objects in the array. Calls on a different issue object have no such guarantee.
  • Caching: I could be convinced otherwise, but for now I'll argue that issue.Tasks, once loaded, does not re-sync with the DB on subsequent calls, and the list of Tasks is cached locally. We should provide a method like reloadFromDB() or dropCaches() or something on an object to drop all array and fk caches. That facilitates the situation where a user calls issue.Tasks several times without thinking too much about it, at the expense of the situation where they really want to see concurrent updates from elsewhere in the system.
  • Current Behavior: Currently, the Tasks object is a populated List that actually contains all the items that are in the database. The list is cached on the object, and add() or remove() calls have no effect on the database.
  • issue.Tasks.add(task)
  • Desired Behavior: The general idea is that this call will immediately add task to issue in the database. If issue has not yet been committed, this method will throw an exception. If task already belongs to another Issue, this method will throw an exception. If task has not yet been inserted, this method will issue an insert in the database. Otherwise, this method will issue an update in the database only on the Issue_id column on the task.
  • Pointer Semantics: Adding the task to this issue should result in subsequent calls to issue.Tasks returning the same task pointer back. That may prove difficult to implement in practice, however.
  • Caching: The added task should become part of the cached Tasks list on the Issue object.
  • Current Behavior: Currently, this call is effectively a no-op.
  • issue.Tasks.remove(task)
  • Desired Behavior: The general idea is similar to add(), in that this call will immediately remove the task in the database by nulling out the Issue_id column on the task table. If this task is not associated with this issue, an exception will be thrown. Otherwise, an update statement will be issued to the task.
  • Pointer Semantics: After this call, the task will not appear in the list of Tasks returned by issue.Tasks.
  • Caching: The removed task will no longer be part of the cached Tasks list on the Issue object.
  • Current Behavior: Currently this is effectively a no-op.
  • issue.Tasks.size()
  • Desired Behavior: The size() method on Tasks should issue a count(*) in the database (or whatever the most efficient mechanism is) if the array has not yet been loaded. Otherwise, it should use the cached size of the loaded array. Note that this can result in race conditions if done outside of a transaction, such that the size can change between when size() is called and when the list is iterated over. This could also result in some inefficiencies if size() is called repeatedly and then the list is iterated, since it would issue the count(*) each time and then load everything anyway. I believe that's preferable to the alternative, however, which is that size() calls always load the entire array. Note that isEmpty() should rely on size(), and thus will also only issue a count(*) in the database. isEmpty() could potentially be coded to use an EXISTS clause in the DB to be even more efficient. As an optimization, we can also provide an additional load() method on all object collections which will pre-load all elements.
  • Pointer Semantics: Nothing important
  • Caching: As mentioned above, if the array has already been loaded, the size() method will use the contents of that list for the size. Otherwise, the size() method will issue a query in the database.
  • Current Behavior: issue.Tasks currently loads all elements up front, so size() effectively always relies on the cached value. This proposal would optimize the case where the array hasn't yet been loaded.
  • task.Issue
  • Desired Behavior: task.Issue should be a read-only property that returns the associated Issue object.
  • Pointer Semantics: the Issue object pointer should be set on the task object when the task is loaded into the Tasks array and when issue.Tasks.add(task) is called. If the task is loaded entirely separately from the issue, then the pointers will not be the same, but if the task is obtained through the issue or added to the issue, they'll match.
  • Caching: the Issue object will be cached on the task after it's loaded. The Issue object will either be assigned when the task is loaded from the Tasks list or added to the Tasks list, or it will be loaded from the DB the first time the Issue property is requested
  • Current Behavior: the Issue object can be set directly
  • Other potential changes/additions
  • Add in comments to the CREATE TABLE statements to control treatment of FKs: as arrays, one-to-ones, or naked fks
  • Add in sorting methods to issue.Tasks that can be applied to the DB if Tasks hasn't yet been loaded
  • Provide a way to make issue.Tasks page data
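
To make the proposed issue.Tasks semantics concrete, here is a minimal sketch in Python against an in-memory SQLite database. It is not the actual implementation: the LazyTasks class, the dict-based task objects, the task table layout, and the _known identity map are all assumptions made for illustration. It shows the lazy load sorted by ID, size() falling back to count(*), and add()/remove() writing to the database immediately. Transactions and commits are left out.

```python
import sqlite3


class LazyTasks:
    """Hypothetical lazy one-to-many list backed by task.issue_id."""

    def __init__(self, conn, issue_id):
        self.conn = conn
        self.issue_id = issue_id  # None stands for an uncommitted issue
        self._rows = None         # cached list; None until the first load
        self._known = {}          # id -> task object passed to add()

    def _load(self):
        if self._rows is None:
            cur = self.conn.execute(
                "SELECT id, title FROM task WHERE issue_id = ? ORDER BY id",
                (self.issue_id,))
            # Reuse objects handed to add() so callers get the same pointer.
            self._rows = [
                self._known.get(
                    r[0], {"id": r[0], "title": r[1], "issue_id": self.issue_id})
                for r in cur]
        return self._rows

    def __iter__(self):
        return iter(self._load())

    def size(self):
        if self._rows is not None:
            return len(self._rows)  # already loaded: use the cache
        cur = self.conn.execute(
            "SELECT count(*) FROM task WHERE issue_id = ?", (self.issue_id,))
        return cur.fetchone()[0]

    def add(self, task):
        if self.issue_id is None:
            raise ValueError("issue is not committed")
        owner = task.get("issue_id")
        if owner == self.issue_id:
            return  # assumption: re-adding an existing member is a no-op
        if owner is not None:
            raise ValueError("task already belongs to another issue")
        if task["id"] is None:
            cur = self.conn.execute(
                "INSERT INTO task (title, issue_id) VALUES (?, ?)",
                (task["title"], self.issue_id))
            task["id"] = cur.lastrowid
        else:
            # Only the foreign key column is updated.
            self.conn.execute(
                "UPDATE task SET issue_id = ? WHERE id = ?",
                (self.issue_id, task["id"]))
        task["issue_id"] = self.issue_id
        self._known[task["id"]] = task
        if self._rows is not None:
            self._rows.append(task)
            self._rows.sort(key=lambda t: t["id"])

    def remove(self, task):
        if task.get("id") is None or task.get("issue_id") != self.issue_id:
            raise ValueError("task is not part of this issue")
        self.conn.execute(
            "UPDATE task SET issue_id = NULL WHERE id = ?", (task["id"],))
        task["issue_id"] = None
        self._known.pop(task["id"], None)
        if self._rows is not None:
            self._rows = [t for t in self._rows if t["id"] != task["id"]]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task (id INTEGER PRIMARY KEY, title TEXT, issue_id INTEGER)")
conn.execute("INSERT INTO task (title, issue_id) VALUES ('triage', 1)")

tasks = LazyTasks(conn, issue_id=1)
count_before_load = tasks.size()        # count(*) only, nothing is cached
new_task = {"id": None, "title": "fix", "issue_id": None}
tasks.add(new_task)                     # immediate INSERT
still_unloaded = tasks._rows is None    # add() did not force a load
loaded = list(tasks)                    # first iteration triggers the load
```

The identity map is one possible answer to the pointer-semantics concern raised for add(): tasks added before the list is loaded are remembered by ID and substituted in at load time, so the caller gets the same object back.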

Join Table Manipulation

  • user.Groups
  • Desired Behavior: Like issue.Tasks, user.Groups should be a lazily-loaded collection. The semantics should effectively be exactly the same as with issue.Tasks. By default, user.Groups should be sorted by group id.
  • Pointer Semantics: The semantics should be the same as for issue.Tasks
  • Caching: The semantics should be the same as for issue.Tasks
  • Current Behavior: The current behavior is that the join array is loaded completely at the time that it's first referenced
  • user.Groups.add(group)
  • Desired Behavior: Similar to the mental model for issue.Tasks. If user is in the database, issues an immediate INSERT call on the join table. If user is not in the database, throws an exception. If the group is not in the database, the group will be inserted first. However, group can be added even if it's associated with different users already, since the relationship is many-to-many.
  • Pointer Semantics: Same as issue.Tasks: after user.Groups.add(group) is called, user.Groups should return a pointer to group. The call to user.Groups should always return the same pointer. In addition, calls to group.Users should return this same user pointer.
  • Caching: Same as issue.Tasks: after the lazy-load is triggered, the contents are cached.
  • Current Behavior: Adds are immediate in the database currently. I'm not sure about the pointer semantics, however.
  • user.Groups.remove(group)
  • Desired Behavior: The same mental model as add. The remove call issues an immediate delete in the database. If the group is not actually in user.Groups, this will throw an exception.
  • Pointer Semantics: No real implication here.
  • Caching: user.Groups should no longer contain group after this call, and group.Users should also no longer contain user.
  • Current Behavior: The remove is immediate, and the cached result on user.Groups is updated, but I don't believe we attempt to update group.Users.
  • user.Groups.size()
  • Desired Behavior: The same as issue.Tasks.size(). If the relationship has already been loaded, use the cached value. Otherwise, issue a count(*).
  • Pointer Semantics: N/A
  • Caching: As described above, size() will rely on the cached value if available, the DB value if not.
  • Current Behavior: Currently, the value is always effectively cached.
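
The join-table behavior described above can be sketched the same way. Again, this is only an illustration under stated assumptions: the Session and Row classes, the app_user and app_group table names, and the identity map are hypothetical, and only join_User_Group comes from the data model. The sketch shows add inserting an uncommitted group first, both sides of the relationship sharing pointers, and remove failing when the pair is not linked.

```python
import sqlite3


class Row:
    """Hypothetical row object for either a user or a group."""

    def __init__(self, name, id=None):
        self.name, self.id = name, id
        self.cache = None  # other side of the relationship; None until loaded


class Session:
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript("""
            CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
            CREATE TABLE app_group (id INTEGER PRIMARY KEY, name TEXT);
            CREATE TABLE join_User_Group (user_id INTEGER, group_id INTEGER);
        """)
        self.identity = {}  # (table, id) -> Row, so both sides share pointers

    def insert(self, table, row):
        cur = self.conn.execute(
            f"INSERT INTO {table} (name) VALUES (?)", (row.name,))
        row.id = cur.lastrowid
        self.identity[(table, row.id)] = row

    def _load(self, owner, sql, other_table):
        if owner.cache is None:
            rows = self.conn.execute(sql, (owner.id,)).fetchall()
            owner.cache = [
                self.identity.setdefault((other_table, i), Row(n, i))
                for i, n in rows]
        return owner.cache

    def groups(self, user):  # plays the role of user.Groups
        return self._load(
            user,
            "SELECT g.id, g.name FROM app_group g "
            "JOIN join_User_Group j ON j.group_id = g.id "
            "WHERE j.user_id = ? ORDER BY g.id",
            "app_group")

    def users(self, group):  # plays the role of group.Users
        return self._load(
            group,
            "SELECT u.id, u.name FROM app_user u "
            "JOIN join_User_Group j ON j.user_id = u.id "
            "WHERE j.group_id = ? ORDER BY u.id",
            "app_user")

    def add_group(self, user, group):
        if user.id is None:
            raise ValueError("user is not committed")
        if group.id is None:
            self.insert("app_group", group)  # uncommitted group goes in first
        self.conn.execute(
            "INSERT INTO join_User_Group (user_id, group_id) VALUES (?, ?)",
            (user.id, group.id))
        # Update whichever sides are already loaded.
        if user.cache is not None:
            user.cache.append(group)
            user.cache.sort(key=lambda r: r.id)
        if group.cache is not None:
            group.cache.append(user)
            group.cache.sort(key=lambda r: r.id)

    def remove_group(self, user, group):
        cur = self.conn.execute(
            "DELETE FROM join_User_Group WHERE user_id = ? AND group_id = ?",
            (user.id, group.id))
        if cur.rowcount == 0:
            raise ValueError("group is not in user.Groups")
        if user.cache is not None:
            user.cache = [g for g in user.cache if g.id != group.id]
        if group.cache is not None:
            group.cache = [u for u in group.cache if u.id != user.id]


s = Session()
alice = Row("alice")
s.insert("app_user", alice)
admins = Row("admins")                 # not yet inserted
s.add_group(alice, admins)             # inserts admins, then the join row
same_group = s.groups(alice)[0] is admins
same_user = s.users(admins)[0] is alice
```

Unlike the one-to-many case, add_group never checks whether the group is already linked to other users, since the relationship is many-to-many.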

Basic TypeInfo Methods

  • Issue.fromID(id : long) : Issue
  • Desired Behavior:
  • Pointer Semantics:
  • Caching:
  • Current Behavior:
  • issue.toID() : long
  • issue.update()
  • issue.delete()
  • Issue.countWithSql(sql : String) : int
  • Issue.count(template : Issue) : int
  • Issue.findWithSql(sql : String) : List
  • Issue.find(template : Issue) : List
  • Issue.findSorted(template : Issue, sortProperty : PropertyReference, ascending : boolean) : List
  • Issue.findPaged(template : Issue, pageSize : int, offset : int) : List
  • Issue.findSortedPaged(template : Issue, sortProperty : PropertyReference, ascending : boolean, pageSize : int, offset : int) : List
  • Issue._New : boolean

Transaction Usage

Notes: I need to double-check what our current semantics are around transactions as they relate to reads.
