# 8. Search aggregate API
Date: 2020-05-22
Driver: Wisen Tanasa
# Status
Accepted
# Context
We are required to display aggregate data on our search page when the `num of people` filter is not specified. The data that we currently show in our listing card all comes from the space, so we have not shown any aggregate data on our search page before.
The aggregate that we need to show is the highest capacity in a building given the search criteria. There are two kinds of aggregate data that we should show:
- Static aggregate: Given a building highest capacity is 100, the static aggregate will always be 100
  - Shown when `maxBudget` or `minBudget` is not specified
- Dynamic aggregate: Given a building highest capacity is 100, and the seeker's budget doesn't permit that space, the dynamic aggregate will not show 100 as the highest capacity.
  - Shown when `maxBudget` or `minBudget` is specified
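To make the two definitions above concrete, here is a minimal sketch of how the static and dynamic aggregates could be computed from a building's spaces. The `Space` shape, the `BudgetFilter` type, the function names, and the rule that a space is "permitted" when its price falls within the budget range are all assumptions for illustration, not part of this ADR.

```typescript
// Hypothetical shape of a space; field names are assumptions.
interface Space {
  capacity: number;
  price: number;
}

// Hypothetical budget filter mirroring the maxBudget / minBudget criteria.
interface BudgetFilter {
  minBudget?: number;
  maxBudget?: number;
}

// Static aggregate: the highest capacity in the building, regardless of budget.
function staticAggregate(spaces: Space[]): number | undefined {
  if (spaces.length === 0) return undefined;
  return Math.max(...spaces.map((s) => s.capacity));
}

// Dynamic aggregate: the highest capacity among spaces the budget permits.
// Assumption: a space is permitted when minBudget <= price <= maxBudget.
function dynamicAggregate(spaces: Space[], filter: BudgetFilter): number | undefined {
  const min = filter.minBudget ?? 0;
  const max = filter.maxBudget ?? Number.POSITIVE_INFINITY;
  const permitted = spaces.filter((s) => s.price >= min && s.price <= max);
  return staticAggregate(permitted);
}

// Pick the aggregate to display based on whether a budget was specified.
function highestCapacity(spaces: Space[], filter: BudgetFilter): number | undefined {
  const hasBudget = filter.minBudget !== undefined || filter.maxBudget !== undefined;
  return hasBudget ? dynamicAggregate(spaces, filter) : staticAggregate(spaces);
}
```

With spaces `[{ capacity: 100, price: 5000 }, { capacity: 20, price: 800 }]`, `highestCapacity(spaces, {})` returns 100, while `highestCapacity(spaces, { maxBudget: 1000 })` returns 20 because the 100-person space is outside the budget.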
Showing the static aggregate is fairly straightforward compared to the dynamic aggregate, so this ADR aims to tackle the dynamic aggregate problem. The dynamic aggregate is difficult because Algolia is not able to calculate it. This is a common limitation of search databases, as they are optimised to perform search.
Naturally, if the search database cannot handle the aggregation request, we have to do the aggregation somewhere else. Each of the approaches outlined below attempts to do the aggregation in a different part of our architecture.
# Approach 1: Aggregate in Inventory Storage
Details: Calculate all space aggregated information and store it in our inventory data storage.
Business upside | Business downside | Mitigation |
---|---|---|
Retain global capability and simplicity | Can't support dynamic aggregation | N/A |
Adopting this approach actually means that we're not supporting the business requirement. So it's ruled out.
# Approach 2: Expand Algolia capability / Enrich in Search-API
Summary: The best way to think about this is that we're adding new functionality to Algolia. We abstract Algolia behind an API endpoint. When the results are returned from Algolia, we calculate the aggregate data for each record, then enrich and return the data.
Business upside | Business downside | Mitigation |
---|---|---|
Lower migration cost away from Algolia | Expensive when supporting globally | Global solution |
Better consistency across devices | Slower search experience as compared to Algolia | Profiling |
Mitigation notes:
- Global solution: Leverage Lambda@Edge to optimize global search performance. This solution is still uncertain as we'll need to make sure every component in play is global.
- Profiling: This is generally an expensive mitigation as it requires fine-tuning and is quite error-prone. It'll be expensive to try to match Algolia's capability.
# Approach 3: Enrich in the client side
Details: The client will hit Algolia first. When needed, it'll then fetch the aggregate data from the Search API.
Business upside | Business downside | Mitigation |
---|---|---|
Retain Algolia global capability on common search | Slower display of aggregated data | Precalculate static aggregate |
Mitigation notes:
- Precalculate static aggregate: To improve performance, only enrich for dynamic aggregation. The static aggregation can be stored in Algolia. This is undesirable from a technical perspective as it will introduce further complexity to the web application and add cognitive load.
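The client-side flow for Approach 3 could look roughly like the sketch below: search first, then call the Search API only when a budget is specified. Everything here is hypothetical: `SearchFn` and `AggregateFn` stand in for the Algolia query and the Search API call and are injected rather than tied to any real client library, and `staticHighestCapacity` assumes the static aggregate is precalculated and stored on the search record.

```typescript
interface SearchCriteria {
  query: string;
  minBudget?: number;
  maxBudget?: number;
}

// Hypothetical search hit; assumes the static aggregate is stored on the record.
interface ListingHit {
  buildingId: string;
  staticHighestCapacity: number;
}

interface ListingCard {
  buildingId: string;
  highestCapacity: number | undefined;
}

// Hypothetical dependencies: the search query and the Search API aggregate call.
type SearchFn = (criteria: SearchCriteria) => Promise<ListingHit[]>;
type AggregateFn = (
  buildingIds: string[],
  criteria: SearchCriteria
) => Promise<Record<string, number>>;

async function searchListings(
  criteria: SearchCriteria,
  search: SearchFn,
  fetchAggregates: AggregateFn
): Promise<ListingCard[]> {
  const hits = await search(criteria);
  const needsDynamic =
    criteria.minBudget !== undefined || criteria.maxBudget !== undefined;

  // No budget specified: the static aggregate on the record is enough.
  if (!needsDynamic || hits.length === 0) {
    return hits.map((hit) => ({
      buildingId: hit.buildingId,
      highestCapacity: hit.staticHighestCapacity,
    }));
  }

  // Budget specified: enrich with the dynamic aggregate from the Search API.
  const aggregates = await fetchAggregates(
    hits.map((hit) => hit.buildingId),
    criteria
  );
  return hits.map((hit) => ({
    buildingId: hit.buildingId,
    highestCapacity: aggregates[hit.buildingId],
  }));
}
```

In this sketch the second roundtrip is only paid on budget-filtered searches, which is the case the "slower display" downside applies to.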
# Approach 4: Aggregate in the client side
Details: The client will hit Algolia, which will return an updated record containing all active spaces. The client will then calculate the aggregate dynamically.
Business upside | Business downside | Mitigation |
---|---|---|
Fully leverage Algolia global capability | Limits active spaces, facilities, images | Payload compression |
| | Increased cost on consistency across devices | N/A |
Mitigation notes:
- Payload compression:
  - Sizes:
    - Algolia limit is 10KB. If we upgrade to enterprise, it's 100KB.
    - Without compression, we can fit 30 images, 50 facilities, and 100 active spaces
    - With compression, we may be able to fit 60 images, 100 facilities, and 200 active spaces
These are the compression approaches when compressing `[{ "capacity": 1, "price": 500 }, { "capacity": 2, "price": 400 }]`:
Approach | Output |
---|---|
Bytecode | [{ "c": 1, "p": 500 }, { "c": 2, "p": 400 }] |
Emoji | [{ "👨💼": 1, "💷🪑": 500 }, { "👨💼": 2, "💷🪑": 400 }] |
Sequencing | [1, 500, 2, 400] |
I'm sure there are other techniques. These will be covered in another ADR if we go with this approach.
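As an illustration of the "Sequencing" row in the table above, here is a minimal sketch of how a list of spaces could be flattened into a plain number array and restored on the client. The `Space` type and the `encodeSpaces` / `decodeSpaces` names are hypothetical, and the field order (capacity first, then price) is an assumption that both sides would have to agree on.

```typescript
// Hypothetical shape of a space; field names follow the example above.
interface Space {
  capacity: number;
  price: number;
}

// Flatten spaces into [capacity, price, capacity, price, ...].
function encodeSpaces(spaces: Space[]): number[] {
  const flat: number[] = [];
  for (const space of spaces) {
    flat.push(space.capacity, space.price);
  }
  return flat;
}

// Rebuild the spaces from the flattened sequence.
function decodeSpaces(flat: number[]): Space[] {
  if (flat.length % 2 !== 0) {
    throw new Error("Expected an even number of values (capacity, price pairs)");
  }
  const spaces: Space[] = [];
  for (let i = 0; i < flat.length; i += 2) {
    spaces.push({ capacity: flat[i], price: flat[i + 1] });
  }
  return spaces;
}

const original: Space[] = [
  { capacity: 1, price: 500 },
  { capacity: 2, price: 400 },
];
const encoded = encodeSpaces(original);

// Compare the serialized sizes of the two representations.
console.log(JSON.stringify(original).length, JSON.stringify(encoded).length);
```

The trade-off is that the sequence carries no field names, so adding or reordering fields later would require versioning the format.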
# Decision
We will be going with Approach 3.
# Consequences
Approach 4 is highly desirable from a technical perspective as we'd be able to keep Algolia as the only solution we need to show listings on our search page. This benefit, however, comes at the cost of putting constraints on the business requirements. The data that we have considered in a listing are active spaces, listings, and facilities, and we haven't even considered building tags, for example. The moment we hit this limit, the effort we have spent on optimising this will be wasted.
We'll be storing less non-searchable data in Algolia. We would like to limit what we keep in Algolia to what seekers can search a space on, and Approach 4 would require us to store extra data that seekers will not search on (it'll only be used for display).
The performance impact of Approach 3 will be higher when our webapp is used globally. Our APIs are currently hosted in Ireland, and the roundtrip to get the aggregate data will be affected by network latency. This problem can be tackled separately, and we think that the cost of supporting a global solution is unavoidable as we'll need to make other APIs global anyway.
We will be able to iterate on the architecture in a more evolutionary way, where we can always show the static aggregate data first, and add the dynamic aggregate data later as a further improvement.
By executing the enrichment on the client side, we'll be able to keep the core benefits of using Algolia, which are performance and global capability. We won't need to abstract Algolia away, as that's an unnecessary upfront cost for now.