Indexes

Aidbox index management is in draft stage. API will change.

Database indexes are essential for performance. In particular you will need indexes to speed up search requests.

Aidbox provides mechanisms to

  • manage indexes

  • suggest indexes

  • generate indexes automatically

Background

Aidbox uses PostgreSQL database for storage. Most of resource data is contained in resource column with jsonb type.

Consider simple example: active searh parameter for Patient resource.

Let's try the search query

GET /Patient?active=true

Use _explain to find out SQL query generated by this request

GET /Patient?active=true&_explain=analyze

Possible response is

query:
  - >-
    SELECT "patient".* FROM "patient" WHERE "patient".resource @> ? LIMIT ?
    OFFSET ? 
  - '{"active":true}'
  - 100
  - 0
plan: >-
  Limit  (cost=0.00..1.01 rows=1 width=124) (actual time=0.015..0.015 rows=0
  loops=1)
    ->  Seq Scan on patient  (cost=0.00..1.01 rows=1 width=124) (actual time=0.014..0.014 rows=0 loops=1)
          Filter: (resource @> '{"active": true}'::jsonb)
          Rows Removed by Filter: 1
  Planning Time: 0.729 ms

  Execution Time: 0.050 ms

Corresponding SQL is

SELECT "patient".*
FROM "patient"
WHERE "patient".resource @> '{"active": "true"}'::jsonb
LIMIT 100
OFFSET 0

Here @> is containment operator. It tests whether jsonb value on the right-hand side is contained in the jsonb value on the left-hand side.

Without indexes Postgres has to check this condition for every Patient resource stored in the database.

However, GIN indexes can speed up these kind of queries.

We can create GIN index for the resource column

CREATE INDEX patient_resource_gin_idx
ON Patient
USING GIN (resource)

Now Postgres can use this index to make search much faster.

Functional indexes

Consider more complex example: name search parameter for Patient resource.

Request

GET /Patient?name=abc

Generates SQL like

SELECT *
FROM Patient
WHERE
  aidbox_text_search(
    knife_extract_text(
      resource,
      '[["name","family"],["name","given"],["name","middle"],["name","text"]]'
    )
  ) ILIKE unaccent('% abc%')
LIMIT 100
OFFSET 0

Postgres' pg_trgm module supports index searches for ILIKE queries.

You can create functional index to speed up this query:

CREATE INDEX patient_name_trgm_idx
ON Patient
USING GIN (
  aidbox_text_search(
    knife_extract_text(
      resource,
      '[["name","family"],["name","given"],["name","middle"],["name","text"]]'
    )
  ) gin_trgm_ops
)

Index management

Managing indexes manually is quite tedious and error-prone.

Aidbox provides a mechanism to manage indexes automatically using Aidbox configuration project.

Consider this example

{ns main
 import #{aidbox.index.v1
          aidbox
          aidbox.repository.v1}

 my-index
 {:zen/tags #{aidbox.index.v1/index}
  :table :patient
  :expression "(jsonb_path_query_array(\"patient\".resource, ( '($.\"name\"[*]).** ? (@.type() == \"string\")')::jsonpath)::text) gin_trgm_ops"
  :type :gin}

 patient-repository
 {:zen/tags #{aidbox.repository.v1/repository}
  :resource-type :patient
  :indexes #{my-index3 my-index1}}

 repositories
 {:zen/tags #{aidbox/service}
  :engine aidbox.repository.v1/engine
  :repositories #{patient-repository}}

 box {:zen/tags #{aidbox/system}
      :services
      {:repositories repositories}}}

Here

  • box is the Aidbox project entrypoint

  • repositories is an Aidbox service

  • patient-repository is an Aidbox repository. It configures resource behavior. Currently it can only add indexes and search parameters.

  • my-index is index definition

Use aidbox.index.v1/sync-indexes RPC to update indexes.

POST /rpc
Content-Type: application/json
Accept: application/json

{"method": "aidbox.index.v1/sync-indexes"}

This RPC runs workflow that creates indexes requested by your configuration. And removes indexes not requested.

Aidbox managed indexes start with aidbox_mng_idx prefix. So your custom indexes which do not start with this prefix will not be affected by sync-indexes RPC.

Auto-generated indexes

You can make index by your own with Index Suggestion API. However, Aidbox can be configured to make indexes for desired SearchParameters at start automatically.

Import aidbox.index.v1 in the example above and add :indexes into patient-repository.

 patient-repository
 {:zen/tags #{aidbox.repository.v1/repository}
  :resourceType "Patient"
  :indexes #{my-index1}
  :extra-parameter-sources :all
  :search-parameters #{my-parameter}}

Add new symbol my-index1 with tag aidbox.index.v1/auto-index to make index on start, based on Index Suggestion API for Patient.brthd SearchParameter.

  my-index1
 {:zen/tags #{aidbox.index.v1/auto-index}
  :for my-parameter}

After restart new index will be added.

select * from pg_indexes where tablename = 'patient';

How to make my index explicitly with SQL?

Use aidbox.index.v1/index tag with :expression and PostgreSQL index:type fields:

 my-index2
 {:zen/tags #{aidbox.index.v1/index}
  :table "patient"
  :expression "(jsonb_path_query_array(\"patient\".resource, ( '($.\"name\"[*]).** ? (@.type() == \"string\")')::jsonpath)::text) gin_trgm_ops"
  :type :gin}

After restart new index will be added:

CREATE INDEX aidbox_mng_idx_main_my_index2
ON public.patient 
USING gin (((jsonb_path_query_array(resource, '$.\"name\"[*].**?(@.type() == \"string\")'::jsonpath))::text) gin_trgm_ops)

As in previous section, actuall creation/deletion of indexes is triggered with aidbox.index.v1/sync-indexes RPC.

Formal description of Zen Indexes:

pageZen Indexes

Index suggestion

Aidbox provides two RPCs which can suggest you indexes

Suggest indexes for parameter

Use aidbox.index/suggest-index RPC to get index suggestion for specific search parameter

POST /rpc
Content-Type: text/yaml
Accept: text/yaml

method: aidbox.index/suggest-index
params:
  resource-type: <resourceType>
  search-param: <searchParameter>

Suggest indexes for query

Use aidbox.index/suggest-index-query RPC to get index suggestions based on query

POST /rpc
Content-Type: text/yaml
Accept: text/yaml

method: aidbox.index/suggest-index-query
params:
  resource-type: Observation
  query: date=gt2022-01-01&_id=myid

Last updated