Joining a MySQL table against itself (self-joining MySQL tables)

So I had an interesting MySQL problem this weekend -- I have a table:

mysql> desc ldaplog;
+------------+------------------+------+-----+---------------------+----------------+
| Field      | Type             | Null | Key | Default             | Extra          |
+------------+------------------+------+-----+---------------------+----------------+
| ldaplog_id | int(10) unsigned |      | MUL | NULL                | auto_increment |
| detected   | datetime         |      | MUL | 0000-00-00 00:00:00 |                |
| uid        | varchar(255)     |      | MUL |                     |                |
| event      | varchar(255)     |      | MUL |                     |                |
| attr       | varchar(255)     | YES  | MUL | NULL                |                |
| value      | text             | YES  |     | NULL                |                |
+------------+------------------+------+-----+---------------------+----------------+

It's a log table: every time a change gets detected in LDAP, a row is generated.

'Event' is one of:

mysql> select count(*), event from ldaplog group by event;
+----------+--------------+
| count(*) | event        |
+----------+--------------+
|     3389 | ATTR_ADDED   |
|        1 | ATTR_DELETED |
|       20 | ATTR_UPDATED |
|      357 | UID_ADDED    |
|       25 | UID_DELETED  |
+----------+--------------+

And your typical set of rows associated with a single UID looks lile:

mysql> select * from ldaplog where uid = 'mccreavy';
+------------+---------------------+-----------+------------+-----------------+-----------------------+
| ldaplog_id | detected            | uid       | event      | attr            | value                 |
+------------+---------------------+-----------+------------+-----------------+-----------------------+
|       2181 | 2008-09-28 00:00:00 | mccreavy  | UID_ADDED  | NULL            | NULL                  |
|       2191 | 2008-09-28 00:00:00 | mccreavy  | ATTR_ADDED | uid             | mccreavy              |
|       2183 | 2008-09-28 00:00:00 | mccreavy  | ATTR_ADDED | givenname       | Michael               |
|       2183 | 2008-09-28 00:00:00 | mccreavy  | ATTR_ADDED | lastname        | McCreavy              |
|       2184 | 2008-09-28 00:00:00 | mccreavy  | ATTR_ADDED | l               | Santa Clara, CA       |
|       2185 | 2008-09-28 00:00:00 | mccreavy  | ATTR_ADDED | mail            | mike@mccreavy.com     |
|       2186 | 2008-09-28 00:00:00 | mccreavy  | ATTR_ADDED | mobile          | 650-274-6633          |
+------------+---------------------+-----------+------------+-----------------+-----------------------+

If a UID gets deleted from LDAP, a row looking like this will get logged:

mysql> select * from ldaplog where uid = 'heinhold' and event = 'UID_DELETED';
+------------+---------------------+----------+-------------+------+-------+
| ldaplog_id | detected            | uid      | event       | attr | value |
+------------+---------------------+----------+-------------+------+-------+
|       4025 | 2008-12-02 00:00:00 | heinhold | UID_DELETED | NULL | NULL  |
+------------+---------------------+----------+-------------+------+-------+

And I was wondering if I could find out two things:

  1. Get a result set of deleted UIDs including the date when the UID was originally detected.
  2. Get a list of UIDs that have not been deleted.

I guess the first one would be kind of easy with a subselect to just get a list of deleted UIDs:

select * from ldaplog where uid in (select uid from ldaplog where event = 'UID_DELETED');
but that wouldn't include the date of the UIDs deletion.

So if I really needed the creation and deletion date of the UID in a single query, I'd have to get fancy with some self-joining.

Self-joining is where you join a table against itself. I should explain it with some more simple examples, but I'll just move on instead.

First, the base query: get the "UID Addred" rows:

select A.uid, A.event, A.detected from ldaplog as A where A.event = 'UID_ADDED';

Second, join the table against itself:

select A.uid, A.event, A.detected, B.event, B.detected
  from ldaplog as A
  left join ldaplog as B
  using (uid)
  where A.event = 'UID_ADDED' and B.event = 'UID_DELETED';

This is the result set I wanted. It just shows the added UIDs that have been deleted.

If the table was logically inconsistent and there was a "UID_DELETED" row with no corresponding "UID_ADDED", that row would not be selected... This is because the base query started with a result set of UIDs that have been added.

Now that I know the list of deleted UIDs, I can use that as a subselect to find the active UIDs:

select uid, detected from ldaplog
  where event = 'UID_ADDED'
  and uid not in
    (select A.uid from ldaplog as A
     left join ldaplog as B
     using (uid)
     where A.event = 'UID_ADDED' and B.event = 'UID_DELETED');

Or even just a count of active UIDs:

select count(*) from ldaplog
  where event = 'UID_ADDED'
  and uid not in 
    (select A.uid from ldaplog as A
     left join ldaplog as B
     using (uid)
     where A.event = 'UID_ADDED' and B.event = 'UID_DELETED');

I bet there are other ways to get these result sets, but this is how I did it.


Follow on Twitter


LinkedIn
Facebook
Notchup, Member 74311




Copyright (C) 2008, Mike McCreavy
[ x ] Console
Timestamp Message
hey
[ x ] Socket Console
Timestamp Message
hey