Files
ubicloud/spec/model/strand_spec.rb
Daniel Farina a05fc1db61 Begin marking cleared leases by backdating the lease 1000 years
The next step after this is to set `NOT NULL` and change the
dispatcher the scan predicate.

While doing a bit of optimizing on common queries against `strand`,
the exclusion of leased tuples is somewhat more complicated than
necessary:

    => CREATE INDEX strand_scan_test ON strand (schedule, lease);
    => SET enable_seqscan = false; -- useful in test, with empty strand relation
    => EXPLAIN SELECT *
    FROM "strand"
    WHERE ((lease IS NULL OR lease < now()) AND schedule < now()
           AND exitval IS NULL) ORDER BY "schedule" LIMIT 80;

                                            QUERY PLAN
    ------------------------------------------------------------------------------------------
     Limit  (cost=0.29..60.47 rows=80 width=259)
       ->  Index Scan using strand_scan_test on strand  (cost=0.29..304.95 rows=405 width=259)
             Index Cond: (schedule < now())
             Filter: ((lease IS NULL) OR (lease < now()))
    (4 rows)

This is mostly okay, since the fraction of tuples that are leased is
generally small, but would be better to evaluate the entire predicate
in the index condition. Unfortunately, Postgres doesn't know how to
start two distinct scans into the index (e.g. one to scan for `lease
IS NULL`, a second for the `lease < now()` condition.  For
illustration, a `UNION ALL` can accomplish this:

    => EXPLAIN (SELECT * FROM strand
     WHERE lease IS NULL AND schedule < now() AND exitval IS NULL
     ORDER BY schedule LIMIT 80)
    UNION ALL
    (SELECT * FROM strand
     WHERE lease < now() AND schedule < now() AND exitval IS NULL
     ORDER BY schedule LIMIT 80)
    ORDER BY schedule LIMIT 80;
                                                     QUERY PLAN
    -------------------------------------------------------------------------------------------------------------
     Limit  (cost=0.27..16.35 rows=2 width=212)
       ->  Merge Append  (cost=0.27..16.35 rows=2 width=212)
             Sort Key: strand.schedule
             ->  Limit  (cost=0.13..8.15 rows=1 width=212)
                   ->  Index Scan using strand_scan_test on strand  (cost=0.13..8.15 rows=1 width=212)
                         Index Cond: ((schedule < now()) AND (lease IS NULL))
             ->  Limit  (cost=0.13..8.15 rows=1 width=212)
                   ->  Index Scan using strand_scan_test on strand strand_1  (cost=0.13..8.15 rows=1 width=212)
                         Index Cond: ((schedule < now()) AND (lease < now()))

But another way to do this, pursued by this patch, is to eliminate the
two distinct algebras, one for `IS NULL` and one for `<` altogether,
by eliminating NULLs, so that the following query is complete to
identify all scheduled work with an expired lease (as seen in the
second half of `UNION ALL` above):

     SELECT * FROM strand
     WHERE lease < now() AND schedule < now() AND exitval IS NULL
     ORDER BY schedule LIMIT 80

To do this, when clearing a lease, backdate the current time by 1000
years, thus, they'd look like `1025-04-01 16:07:12.805075-07:52:58`.
This format has a few positive properties:

1. It's early enough to maintain correctness of leasing mutual
   exclusion.
2. It's immediately able to be visually identified by a leading `1`
3. It doesn't have a zero-padded integer for a year (e.g. `0025`) that
   can sometimes cause confusion in parsing routines.
4. It's never a negative year.
5. The lower bits can inform you when the lease was released, i.e. the
   calculation is reversible.
6. It's easy to split the data set into cleared and uncleared leases,
   by using a constant value (e.g. year 2000):

        SELECT * FROM strand WHERE lease < '2000 01 01'
2025-04-01 17:58:40 -07:00

3.4 KiB