Most Postgres `initdb` will create databases influenced by the system collation, and both typically and reasonably, that is not the "C" collation. Our migrations all set the "C" collation in text attributes for tables created for clover, and this test keeps it that way, unless specific exceptions are added. When the test fails, it has output like this to identify the tables and attributes: expected: "" got: "public.bar.a, public.foo.a" (compared using ==) In order to be convenient in development and work with `initdb`s with a non-C default collation, system tables are excluded from the query. So, why bother? In a nutshell, as a percentage of all `text` attributes that will ever be made in clover, collations will be useful for very few, and collations do not come for free. Collation code is complex, risking bugs, version changes, or implementation-specific behavior from `strxfrm`. The result of such a problem is corrupt indices. Additionally, a *single* defaulted collation is suspect for cases where in-database collation support is required in a global world: is `en_US` really good enough for everyone? It seems like any such case might require consideration of more than one collation, and explicit handling. The one case where `en_US` or some other common collation might be nice-to-have is as it comes to sorting in mixed capitalization and punctuation of programming identifiers, but in a setting where it's too burdensome to specify a collation. In my experience, this case is only theoretical. As a nice but non-essential effect for our use on limited amounts of data, textual comparisons are all a bit faster when `strcmp` or `memcmp` can be used to compare strings.
16 lines
635 B
Ruby
16 lines
635 B
Ruby
# frozen_string_literal: true
|
|
|
|
RSpec.describe DB do
|
|
it "has no unexpectedly collated columns" do
|
|
expect(described_class[<<SQL].all.map { _1[:name] }.join(", ")).to eq ""
|
|
SELECT quote_ident(nspname) || '.' || quote_ident(relname) || '.' || quote_ident(attname) AS name
|
|
FROM pg_class
|
|
JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid AND
|
|
pg_namespace.nspname NOT IN ('pg_catalog', 'information_schema')
|
|
JOIN pg_attribute ON pg_class.oid = pg_attribute.attrelid
|
|
JOIN pg_collation ON pg_attribute.attcollation = pg_collation.oid AND
|
|
pg_collation.collcollate <> 'C'
|
|
SQL
|
|
end
|
|
end
|