$lookup

Efficient lookup for resources by key attributes

Overview

There are scenarios when you want to quickly lookup patients or practitioners with the prefix search by multiple key elements like a family name, date of birth, and identifier. Prefix search means you want to say in query string jo do 79 and find John Doe with 1979 birthdate. Sometimes there are millions of patients in your database and you want to do it efficiently to show type-ahead dropdown choices in your UI.

$lookup operations is especially designed to be efficient implementation for this case.

Demo. US NPI by Aidbox

https://search-resource.netlify.com/ Here is demo of US NPI ~10M practitioners in Aidbox Cloud using $lookup operation. You can source code of this app on github.

There is no way to implement the efficient multidimensional prefix search with ranking and sorting in the relational database. $lookup based on specific assumptions to find the right trade-off: if the search returns more than count (by default 50) results, we consider that the search failed and result can have some anomalies, for example, not complete sorting.

Here is how it works.

First of all, you have to describe priority groups of attributes with by parameter. Groups are separated by ; and inside group you specify the list of paths separated by ,. Each path expression consists of dot separated elements and indexes and should end with primitive type (examples: name.given or identifier.value).

Result will be sorted with order of priority groups. For example, if you want to rate first matches of name, identifier and birth of data, and second matches in address and telecom you will use following expression:name.family,name.given,identifier.value,birthDate;address.state,address.city,address.line,telecom.value

Let's say you are searching joh 1979 ny Aidbox will initially search in first priority group by expression like this:

expr = extract_space_separated(resource, paths) 
where expr ilike ' % joh' AND  expr ilike '% 1979'
limit 50

If this query returns 50 records, aidbox will respond with this records.

GET /Patient/$lookup?\
  by=name.family,name.given,birthDate,identifier.value;address.city,address.line&\
  sort=name.family,name.given&\
  q=Joh+Do+1980&\
  count=50&\
  limit=200

Parameters

Each path expression should point to primitive element!

  • by: ;-separated list of priority groups. Each group is ,-separated list of path expressions.

  • sort: ,-separated list of path expressions to sort by

  • q: is + or space separated term (prefixes) to search

  • limit: is internal search limit (default 200)

  • count: number results to return (default 50)

  • mode: if mode= index Aidbox returns index DDL for specific search

Create Indexes

To create indexes you have to make request with mode=index and execute DDL returned by Aidbox:

GET /Patient/$lookup?\
  by=name.family,name.given,birthDate,identifier.value;address.city,address.line&\
  sort=name.family,name.given&\
  q=Joh+Do+1980&\
  count=50&\
  limit=200&\
  mode=index
response
DROP INDEX IF EXISTS lookup_patient_g_1;
CREATE INDEX lookup_patient_g_1 ON "patient" USING GIN 
 ( regexp_replace ( /*text*/ aidbox_text_search ( /*texts*/ knife_extract_text ( /*doc*/ resource , /*expr*/ $$[["name","family"],["name","given"],["birthDate"],["identifier","value"]]$$ ) ) , /*regexp*/ '[ -.,";:'']+' , /*repl*/ ' ' , /*flag*/ 'g' ) 
 gin_trgm_ops );

DROP INDEX IF EXISTS lookup_patient_g_2;
CREATE INDEX lookup_patient_g_2 ON "patient" USING GIN 
 ( regexp_replace ( /*text*/ aidbox_text_search ( /*texts*/ knife_extract_text ( /*doc*/ resource , /*expr*/ $$[["name","family"],["name","given"],["birthDate"],["identifier","value"],["address","city"],["address","line"]]$$ ) ) , /*regexp*/ '[ -.,";:'']+' , /*repl*/ ' ' , /*flag*/ 'g' ) 
 gin_trgm_ops );
VACUUM ANALYZE "patient";

Last updated

#2416:

Change request updated