Query Filters with GraphQL in Gatsby
This documentation is up to date as of gatsby@2.23.0.
Summary
Gatsby stores all data loaded during the source-nodes phase in Redux and lets you write GraphQL queries against that data. This data, stored as individual “nodes”, can be searched using a query language inspired by MongoDB queries.
Filtering is used in the GraphQL root fields of node types (for example, for the `File` type these are `file` and `allFile`). The GraphQL `filter` argument is passed to the filtering system, which returns all nodes that match each of the given filters. The rest of the processing, such as pagination, is handled at the GraphQL resolver level.
History and Sift
For a long time Gatsby used the Sift library, which lets you run MongoDB queries in JavaScript.
Unfortunately, Sift did not align with how Gatsby used it, so a custom system was written to gradually replace it. This system was called “fast filters”, and as of gatsby@2.23.0 (June 2020) the Sift library is no longer used.
Query Language
The filter syntax and API are based on the MongoDB query syntax, but keep in mind that Gatsby supports only a subset of its comparators.
In general, a filter has a “filter path” ending in a “comparator”, and a “filter value”. It’s a way of programmatically asking “show me all nodes where `node[i].a.b.c` leads to a value less than 5”.
- path: an exact property path when traversing an object (`{ a: { b: { c: { op: x } } } }` has paths `a`, `a.b`, and `a.b.c`)
- value: the needle to compare against the value at the end of a path
- comparator: a way to tell the system how to compare the given filter value to the resulting value at the end of the path. Should it be equal, lower than, or match in a special way?
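To make the three parts concrete, here is a minimal sketch, assuming a filter is a plain nested object in which any key that is a known comparator ends the path. `flattenFilter` is a hypothetical helper written for illustration, not part of Gatsby's API, and it does not handle `elemMatch`.

```javascript
// Comparator names from the "Supported comparators" list.
const COMPARATORS = new Set([
  "eq", "ne", "in", "nin", "lt", "lte", "gt", "gte", "regex", "glob",
]);

// Hypothetical helper: turn a nested filter object into a flat list of
// { path, comparator, value } entries, one per comparator found.
function flattenFilter(filter, steps = []) {
  const entries = [];
  for (const [key, arg] of Object.entries(filter)) {
    if (COMPARATORS.has(key)) {
      // A comparator ends the path; its argument is the filter value.
      entries.push({ path: steps.join("."), comparator: key, value: arg });
    } else {
      // Any other key is one more property step in the path.
      entries.push(...flattenFilter(arg, [...steps, key]));
    }
  }
  return entries;
}

// "Show me all nodes where node[i].a.b.c leads to a value less than 5":
const ltEntries = flattenFilter({ a: { b: { c: { lt: 5 } } } });
console.log(JSON.stringify(ltEntries)); // [{"path":"a.b.c","comparator":"lt","value":5}]
```

A filter with several paths simply yields several entries, one per comparator.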
Filters are generally applied to the subset of nodes that have a specific `node.internal.type` value, like `PageData` or `MarkdownRemark`.
When a filter is applied, the path is resolved for each node and the resulting value is compared against the filter value using the rules of the comparator.
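That step can be sketched as follows, assuming nodes are plain objects with an `internal.type` field. `getPath` and `applyFilter` are hypothetical names, the `lt` check is simplified, and a missing step anywhere in the path yields `undefined` (the “partial path” case described under Comparator details).

```javascript
// Resolve a dotted path such as "a.b.c"; a missing step yields undefined.
function getPath(node, path) {
  let value = node;
  for (const step of path.split(".")) {
    if (value === null || value === undefined) return undefined;
    value = value[step];
  }
  return value;
}

// Hypothetical sketch: restrict to one node type, then keep the nodes whose
// value at `path` passes `compare(resultValue, filterValue)`.
function applyFilter(nodes, type, path, compare, filterValue) {
  return nodes
    .filter((node) => node.internal.type === type)
    .filter((node) => compare(getPath(node, path), filterValue));
}

// Simplified `lt`: partial paths (undefined) never match.
const lt = (resultValue, filterValue) =>
  resultValue !== undefined && resultValue < filterValue;

const nodes = [
  { id: "1", internal: { type: "Example" }, a: { b: { c: 3 } } },
  { id: "2", internal: { type: "Example" }, a: { b: { c: 7 } } },
  { id: "3", internal: { type: "Example" }, a: { b: {} } }, // partial path
  { id: "4", internal: { type: "Other" }, a: { b: { c: 1 } } }, // other type
];

const ltFive = applyFilter(nodes, "Example", "a.b.c", lt, 5);
console.log(JSON.stringify(ltFive.map((node) => node.id))); // ["1"]
```

Node 4 would pass the comparison but is skipped because of its type, and node 3 is skipped because its path is only partial.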
Example
Say we have a filter and a set of nodes:
The filter path is `post.author.name`, the comparator is `eq`, and the filter value is `'Alex'`.
This should return two nodes, those with id 1 and 4, because they are the only nodes where the result value `node[i].post.author.name` equals `'Alex'`.
A filter can combine multiple paths, for example:
- `{ id: { gt: 2 }, post: { author: { name: { eq: 'Alex' } } } }` would return only the node with id 4
- `{ post: { title: { regex: '/Gatsby/i' }, author: { name: { eq: 'Ika' } } } }` would return the node with id 3
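The node set for this example isn't reproduced on this page, so the sketch below uses hypothetical nodes chosen to agree with the results described above, storing the author under `post.author.name` throughout. `matches` is an illustrative recursive matcher, not Gatsby's implementation, and supports only simplified `eq`, `gt`, and `regex`. It shows that every path in one filter must match, so the paths are combined with AND.

```javascript
// Parse a stringified regex such as "/Gatsby/i" into a RegExp.
function toRegExp(str) {
  const m = /^\/(.*)\/([a-z]*)$/s.exec(str);
  if (!m) throw new Error(`Not a stringified regex: ${str}`);
  return new RegExp(m[1], m[2]);
}

// Simplified comparators; partial paths (undefined) fail gt and regex.
const ops = {
  eq: (resultValue, filterValue) => resultValue === filterValue,
  gt: (resultValue, filterValue) =>
    resultValue !== undefined && resultValue > filterValue,
  regex: (resultValue, filterValue) =>
    resultValue !== undefined && toRegExp(filterValue).test(String(resultValue)),
};

// Every key in the filter must match: comparators test the current value,
// any other key steps one property deeper into the node.
function matches(value, filter) {
  return Object.entries(filter).every(([key, arg]) => {
    if (Object.prototype.hasOwnProperty.call(ops, key)) {
      return ops[key](value, arg);
    }
    const next = value === null || value === undefined ? undefined : value[key];
    return matches(next, arg);
  });
}

// Hypothetical nodes consistent with the results described above.
const nodes = [
  { id: 1, post: { title: "Hello world", author: { name: "Alex" } } },
  { id: 2, post: { title: "Learning React", author: { name: "Thomas" } } },
  { id: 3, post: { title: "Why I like gatsby", author: { name: "Ika" } } },
  { id: 4, post: { title: "Gatsby filters", author: { name: "Alex" } } },
];

const run = (filter) => nodes.filter((n) => matches(n, filter)).map((n) => n.id);

const byAlex = run({ post: { author: { name: { eq: "Alex" } } } }); // [1, 4]
const laterByAlex = run({
  id: { gt: 2 },
  post: { author: { name: { eq: "Alex" } } },
}); // [4]
const gatsbyByIka = run({
  post: { title: { regex: "/Gatsby/i" }, author: { name: { eq: "Ika" } } },
}); // [3]
```

Node 4's title also matches `/Gatsby/i`, but its author does not, so the last filter keeps only node 3.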
Supported comparators
Gatsby supports a subset of MongoDB comparators. Here is the list of supported comparators and how they match nodes:
- `eq`: is exactly the filter value
- `ne`: is anything except for the filter value
- `in`: matches any element in the given array
- `nin`: matches none of the elements in the given array
- `lt`: is lower than the filter value
- `lte`: is lower than or equal to the filter value
- `gt`: is greater than the filter value
- `gte`: is greater than or equal to the filter value
- `regex`: matches the given JavaScript regex
- `glob`: matches the given micromatch pattern
Internally, the `glob` pattern is converted to a regular expression by the micromatch library, after which it uses the same code path as the `regex` comparator.
elemMatch
One additional feature Gatsby supports is the `elemMatch` query. Its path has one or more steps named `elemMatch`, and while traversing a node these steps should resolve to arrays. Gatsby then attempts to apply the remainder of the path to each element in that array.
A contrived `elemMatch` example:
This will return the nodes with id 1 and 3 because `node[0].a[1].b` and `node[2].a[0].b` equal 5.
Note that `elemMatch` will contribute each node at most once, even if multiple elements in the array would match.
You can nest `elemMatch` occurrences, and they work the same way at every step. Every comparator can be used in an `elemMatch` query.
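The data for the contrived example isn't shown on this page either, so the nodes below are hypothetical, picked so that `node[0].a[1].b` and `node[2].a[0].b` equal 5 (node 3 even has two matching elements). The matcher is an illustrative sketch that supports only `eq`; it shows the remainder of the path being applied to each array element, nesting, and each node being returned at most once.

```javascript
// Minimal matcher: `eq` compares, `elemMatch` tests array elements, and any
// other key steps one property deeper into the value.
function matches(value, filter) {
  return Object.entries(filter).every(([key, arg]) => {
    if (key === "eq") return value === arg;
    if (key === "elemMatch") {
      // The step must resolve to an array; one matching element is enough,
      // so a node counts once however many of its elements match.
      return Array.isArray(value) && value.some((el) => matches(el, arg));
    }
    const next = value === null || value === undefined ? undefined : value[key];
    return matches(next, arg);
  });
}

// Hypothetical nodes: node[0].a[1].b and node[2].a[0].b are 5.
const nodes = [
  { id: 1, a: [{ b: 4 }, { b: 5 }] },
  { id: 2, a: [{ b: 3 }, { b: 6 }] },
  { id: 3, a: [{ b: 5 }, { b: 5 }] },
];

const filter = { a: { elemMatch: { b: { eq: 5 } } } };
const ids = nodes.filter((node) => matches(node, filter)).map((node) => node.id);
console.log(JSON.stringify(ids)); // [1,3]

// Nested elemMatch works the same way at every step.
const nested = { x: [{ y: [{ z: 1 }, { z: 2 }] }] };
const nestedFilter = { x: { elemMatch: { y: { elemMatch: { z: { eq: 2 } } } } } };
const nestedMatch = matches(nested, nestedFilter);
console.log(nestedMatch); // true
```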
Comparator details
Certain comparators restrict the kind of filter value they will accept. In general, and due to the GraphQL layer, only “scalar” or “primitive” values can be used; that is to say, numbers, strings, and booleans. It is possible to match against `null`, but every operator has specific edge cases in how it is treated.
In the rules below, a “node with partial path” is any node that has none, or only a part, of the filter path. Internally that resolves to `undefined`, as any non-existing property would. For brevity, there’s no distinction between a partial and a non-existing path.
Specific rules:
eq
- This is like `filterValue === resultValue` in JavaScript
- Can only be used with numbers, strings, booleans, and `null`
- Strict comparison, except for `null`
- When matching `null`, and only then, it also returns all nodes with partial paths
ne
- This is like `filterValue !== resultValue` in JavaScript
- Can only be used with numbers, strings, booleans, and `null`
- Strict comparison, except for `null`
- Will always return any nodes with partial paths
in
- This is like `filterValue.includes(resultValue)` in JavaScript
- Can only be used with an array of numbers, strings, booleans, and `null`
- Strict comparison
- When matching `null`, and only then, it also returns all nodes with partial paths
nin
- This is like `!filterValue.includes(resultValue)` in JavaScript
- Can only be used with an array of numbers, strings, booleans, and `null`
- Strict comparison
- Will always return any nodes with partial paths
lt, lte, gt, gte
- These are like `<`, `<=`, `>`, and `>=`, respectively
- Can only be used with numbers, strings, booleans, and `null`
- If the filter value is `null`, `lt` and `gt` will never match anything, while `lte` and `gte` will only match nodes with a result value of exactly `null`
- Will never return nodes with partial paths
- Weak comparison, due to the nature of the JavaScript operators
regex, glob
- This is like `new RegExp(filterValue).test(resultValue)` (with caveats for the `filterValue` syntax)
- A glob pattern is converted to a JavaScript RegExp with micromatch
- The `regex` filter value must be a stringified regular expression, including the leading and trailing forward slashes and optional flags, like `"/foo/g"`
- Never returns nodes with partial paths
- While testing, result values are explicitly cast to a string through `String(resultValue)` before being passed to `regex.test()`
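The rules above can be condensed into a sketch, assuming `undefined` stands for a node with a partial path and that `regex` receives a stringified regular expression. It is an illustrative model written from these rules, not Gatsby's source code; `glob` is left out because it needs micromatch, and `toRegExp` and `select` are hypothetical helper names.

```javascript
// Parse a stringified regex such as "/foo/g" into a fresh RegExp (a fresh
// instance per call avoids the stateful lastIndex of the "g" flag).
function toRegExp(str) {
  const m = /^\/(.*)\/([a-z]*)$/s.exec(str);
  if (!m) throw new Error(`Not a stringified regex: ${str}`);
  return new RegExp(m[1], m[2]);
}

// fv = filter value, rv = result value (undefined means a partial path).
const comparators = {
  // Strict, except that a null filter value also matches partial paths.
  eq: (fv, rv) => (fv === null ? rv === null || rv === undefined : rv === fv),
  // Strict; partial paths always match.
  ne: (fv, rv) => rv === undefined || rv !== fv,
  // Strict membership; a null in the array also matches partial paths.
  in: (fv, rv) => (rv === undefined ? fv.includes(null) : fv.includes(rv)),
  // Strict non-membership; partial paths always match.
  nin: (fv, rv) => rv === undefined || !fv.includes(rv),
  // Range comparators never match partial paths and use JavaScript's weak
  // comparison; with a null filter value, lt/gt never match and lte/gte
  // only match an exact null.
  lt: (fv, rv) => rv !== undefined && fv !== null && rv < fv,
  lte: (fv, rv) => rv !== undefined && (fv === null ? rv === null : rv <= fv),
  gt: (fv, rv) => rv !== undefined && fv !== null && rv > fv,
  gte: (fv, rv) => rv !== undefined && (fv === null ? rv === null : rv >= fv),
  // Never matches partial paths; the result value is cast with String().
  regex: (fv, rv) => rv !== undefined && toRegExp(fv).test(String(rv)),
};

// Three hypothetical result values: 5, null, and a partial path.
const resultValues = [5, null, undefined];
const select = (op, fv) => resultValues.filter((rv) => comparators[op](fv, rv));

console.log(select("eq", null)); // [ null, undefined ]
console.log(select("lt", 6)); // [ 5, null ]  (null < 6 is true in JavaScript)
```

The second call shows the weak comparison at work: `null` is coerced to `0`, so it passes `lt: 6`.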
Nulls and partial paths
Due to legacy MongoDB compatibility, each comparator has edge cases when it comes to `null` values and to partial or non-existing paths. It’s best to avoid these cases altogether.
Performance
The key metric that affects query performance is the number of nodes of the type being queried. You can see these counts by building with the `--verbose` flag (`gatsby build --verbose`), which prints the node count per type during the bootstrap sequence. The number of page nodes is reported in a separate message, since pages are created later.
While the number of pages is certainly a factor in the number of nodes, other factors can produce many internal nodes even when the number of pages seems low.
Filtering performance depends on a combination of the number of nodes you have to search through, the kind of comparator used, and how many unique filters a query contains.
For a low node count, none of this matters. Roughly speaking, a site with fewer than somewhere between 1,000 and 10,000 nodes should not need to worry much about this.
When scaling up, keep these guidelines in mind when filtering:
- the `eq` comparator is by far the fastest comparator
- the `regex` and `glob` comparators are the slowest and do not scale
- the `gt` and `gte` comparators are slower than their `lt` and `lte` counterparts
- all range comparators (`lt`, `lte`, `gt`, and `gte`) must copy a subset of the array they match (slow at scale)
- the `in` and `nin` comparators create a `Set` of all nodes before applying their exclusions
- a single unique path will always outperform multiple paths
- the `elemMatch` feature does have a fixed one-time cost but should not impact overall performance at scale
- while not an absolute guarantee, the output array of a filter is normally ordered by insertion order
While you can’t always avoid slower comparators or filters with multiple paths (as with ranges), keep in mind that filters with equal logic do not have equal performance: `{ eq: 5 }` will perform much better than `{ in: [5] }`. Gatsby makes no internal effort to detect these cases.
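One reason an `eq` filter can be so cheap is that equality lends itself to an index built once per filter path and then queried with a single lookup. The sketch below models that idea with a `Map`; it is a simplified illustration under that assumption, not Gatsby's internal code, and `buildIndex`, `eqLookup`, and `scan` are hypothetical names. The scan over `[5].includes(value)` is only a point of comparison for the same logic without an index; how Gatsby actually evaluates `in` may differ. Each bucket keeps nodes in insertion order, in line with the ordering note above.

```javascript
// Hypothetical sketch: index nodes by their value at one filter path.
function buildIndex(nodes, getValue) {
  const index = new Map();
  for (const node of nodes) {
    const key = getValue(node);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(node); // buckets keep insertion order
  }
  return index;
}

// eq: one Map lookup, no matter how many nodes there are.
const eqLookup = (index, value) => index.get(value) || [];

// Without an index, a filter has to visit and test every node.
const scan = (nodes, getValue, test) => nodes.filter((node) => test(getValue(node)));

const nodes = [
  { id: 1, n: 5 },
  { id: 2, n: 7 },
  { id: 3, n: 5 },
];
const getN = (node) => node.n;
const index = buildIndex(nodes, getN);

const viaIndex = eqLookup(index, 5).map((node) => node.id);
const viaScan = scan(nodes, getN, (value) => [5].includes(value)).map((node) => node.id);
console.log(JSON.stringify(viaIndex), JSON.stringify(viaScan)); // [1,3] [1,3]
```

Both return the same nodes, but once the index exists the lookup cost no longer grows with the node count.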
Custom resolvers
Before filters are applied, every field of any node that a filter wants to match must be fully resolved. So, before a filter is applied, all nodes go through a final resolve step that populates the `node.__gatsby_resolved` field.
This means your custom resolver may be invoked many times, even if few (or zero) nodes are actually returned by a filter step, but at most once per node per build, unless the node’s state somehow changes afterwards.
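The consequence for custom resolvers can be modelled with a small sketch. It assumes a hypothetical `slug` resolver and a hypothetical `ensureResolved` step that fills `node.__gatsby_resolved` for every node before filtering and skips nodes that are already resolved. It illustrates the call pattern described above, not Gatsby's implementation; the shape of the object stored in `__gatsby_resolved` is also an assumption.

```javascript
let resolverCalls = 0;

// Hypothetical custom resolver for a `slug` field.
function resolveSlug(node) {
  resolverCalls += 1;
  return node.title.toLowerCase().replace(/\s+/g, "-");
}

// Resolve every node before filtering; already-resolved nodes are skipped,
// so each node hits the resolver at most once.
function ensureResolved(nodes) {
  for (const node of nodes) {
    if (!node.__gatsby_resolved) {
      node.__gatsby_resolved = { slug: resolveSlug(node) };
    }
  }
}

// Hypothetical filter step on the resolved field.
function filterBySlug(nodes, slug) {
  ensureResolved(nodes);
  return nodes.filter((node) => node.__gatsby_resolved.slug === slug);
}

const posts = [
  { title: "Hello World" },
  { title: "Second Post" },
  { title: "Third Post" },
];

const first = filterBySlug(posts, "hello-world"); // 1 match, 3 resolver calls
const callsAfterFirst = resolverCalls;
const second = filterBySlug(posts, "no-such-post"); // 0 matches, no new calls
const callsAfterSecond = resolverCalls;
console.log(first.length, callsAfterFirst, second.length, callsAfterSecond); // 1 3 0 3
```

The first query returns one node yet calls the resolver three times; the second query returns nothing and calls it zero more times.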