If timing is one dimension of your data and you want to index a timestamptz column. How do you index it? What solution is the best for your case?
Suppose that you have table_a, which has a timestamptz column named column_tz. Possibly, column_tz has numerous values of timestamp. By any reason you need to index column_tz by date because you don’t want to build a huge B-tree with many nodes in your memory unnecessarily. Unfortunately you will recognize that you can not create the index like this
CREATE INDEX index_table_a_column_tz_by_date ON table_a (column_tz::DATE);
Can magic parentheses help in this case?
CREATE INDEX index_table_a_column_tz_by_date ON table_a ((column_tz::DATE));
NO, PostgreSQL will say “ERROR: functions in index expression must be marked IMMUTABLE”
Basically timestamptz is time zone related and it depends on time zone of the server. In other words, DATE of a timestamptz will be different in different time zones. But don’t worry, you still can index a timestamptz column if you lock it with a certain time zone.
CREATE INDEX index_table_a_column_tz_by_date ON table_a ((timezone('UTC', column_tz)::DATE));
Then don’t forget to lock time zone for column_tz in your query too, otherwise the index won’t be used. It is reasonable because PostgreSQL doesn’t know what time zone you want to filter.
SELECT timezone('UTC', column_tz)::DATE FROM table_a WHERE timezone('UTC', column_tz)::DATE > '2021-01-01';
The explain result shows that the planner uses Bitmap Index Scan to process the query. This means PostgreSQL scan the index to get tuple addresses and group the addresses by blocks based on visibility map (bitmap) in order to reduce number fetches from disk. Obviously, this strategy is very helpful in case your query has to work a big number of tuples. In my run, the query takes about 1 second to count many tuples.
Alternatively, you can create an immutable function and use it to index column_tz by date.
CREATE OR REPLACE FUNCTION date_of_tz(timestamp) RETURNS date AS$BODY$ SELECT $1::DATE $BODY$LANGUAGE sql IMMUTABLE;CREATE INDEX index_table_a_column_tz_by_date_with_function ON table_a (date_of_tz(column_tz));
Unexceptionally, you also need to use the function in your query if you want to use the index.
SELECT date_of_tz(column_tz) FROM table_a WHERE date_of_tz(column_tz) > '2021-01-01';
For the same purpose but this time the query takes about 3.7 seconds even though it uses the new index. Index Scan means the planner uses the index but it fetches tuple by tuple from disk to answer the query. Of course, number of fetches in this run is much more than the prior run.
Honestly I don’t know why PostgreSQL uses Bitmap Index Scan in the first solution and Index Scan in the second solution for now. However, according to what I see, the second solution is better.