Files
ubicloud/spec/db_spec.rb
Daniel Farina 596c56cae3 Add test to check that collations of text columns are "C"
Most Postgres `initdb` will create databases influenced by the system
collation, and both typically and reasonably, that is not the "C"
collation.  Our migrations all set the "C" collation in text
attributes for tables created for clover, and this test keeps it that
way, unless specific exceptions are added.

When the test fails, it has output like this to identify the tables
and attributes:

    expected: ""
         got: "public.bar.a, public.foo.a"

    (compared using ==)

In order to be convenient in development and work with `initdb`s with
a non-C default collation, system tables are excluded from the query.

So, why bother?

In a nutshell, as a percentage of all `text` attributes that will ever
be made in clover, collations will be useful for very few, and
collations do not come for free.

Collation code is complex, risking bugs, version changes, or
implementation-specific behavior from `strxfrm`.  The result of such a
problem is corrupt indices.

Additionally, a *single* defaulted collation is suspect for cases
where in-database collation support is required in a global world: is
`en_US` really good enough for everyone?  It seems like any such case
might require consideration of more than one collation, and explicit
handling.

The one case where `en_US` or some other common collation might be
nice-to-have is as it comes to sorting in mixed capitalization and
punctuation of programming identifiers, but in a setting where it's
too burdensome to specify a collation.  In my experience, this case is
only theoretical.

As a nice but non-essential effect for our use on limited amounts of
data, textual comparisons are all a bit faster when `strcmp` or
`memcmp` can be used to compare strings.
2023-08-23 07:19:34 -07:00

16 lines
635 B
Ruby

# frozen_string_literal: true
RSpec.describe DB do
it "has no unexpectedly collated columns" do
expect(described_class[<<SQL].all.map { _1[:name] }.join(", ")).to eq ""
SELECT quote_ident(nspname) || '.' || quote_ident(relname) || '.' || quote_ident(attname) AS name
FROM pg_class
JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid AND
pg_namespace.nspname NOT IN ('pg_catalog', 'information_schema')
JOIN pg_attribute ON pg_class.oid = pg_attribute.attrelid
JOIN pg_collation ON pg_attribute.attcollation = pg_collation.oid AND
pg_collation.collcollate <> 'C'
SQL
end
end