If you’re thinking about using a key value store for your technology stack, you shouldn’t just consider all the new kids on the block.
Postgres has a some very cool tricks up its sleeve. It has its own integrated key value data type.
And today we’ll go through how to write your own aggregations for postgres hstore.
The use case
At adeven our technology stack is heavily based on postgres. Our latest product adjust.io tracks
downloads, clicks, impressions and many more. To see the distribution of the parameters we track, we need to implement our own aggregations.
Let’s say you have a simple test table which stores creation time along with country:
123456789101112
CREATETABLEtest(countrycharactervarying(12),created_attimestampwithout time zone);INSERTINTOtest(created_at,country)SELECTtime,(Array['de','en','fr','us','it'])[(random()*4+1)::int]FROMgenerate_series('2012-10-20 00:00'::timestamp,'2012-10-20 23:59','1 second')time;CREATEEXTENSIONIFNOTEXISTShstore;
Count countries
If you want to count the number of times different countries are represented in the dataset, you can do it using the postgres hstore function.
You’re returned an hstore document with country as key and the count as value. Note that both key and value in hstores must be of type text.
This basically counts the occurrence of values in an array since array_agg(country) returns an array of countries.
And in order to not confuse your coworker, you can put that in a handy function:
Let’s say you want to store pre-aggregated results in some other table.
123456789
CREATETABLEtest_by_hour(countrieshstore,created_attimestampwithout time zone);INSERTINTOtest_by_hour(countries,created_at)SELECTarray_count(array_agg(country)),date_trunc('hour',created_at)AShourFROMtestGROUPBYdate_trunc('hour',created_at);
To get the distribution of the whole table you now need to aggregate (sum) hstore documents.
Luckily you can define your own aggregations in postgres; to do so, you need a function that knows how to add two hstores.