# 8. Search aggregate API

Date: 2020-05-22
Driver: Wisen Tanasa

# Status

Accepted

# Context

We are required to display aggregate data on our search page when the number of people filter is not specified. All the data currently shown on our listing cards comes from spaces, so we have never shown aggregate data on our search page before.

The aggregate that we need to show is the highest capacity in a building given the search criteria. There are two kinds of aggregate data that we should show:

- Static aggregate: Given a building whose highest capacity is 100, the static aggregate will always be 100.
  - Shown when neither maxBudget nor minBudget is specified.
- Dynamic aggregate: Given a building whose highest capacity is 100, if the seeker's budget doesn't permit that space, the dynamic aggregate will not show 100; it will show the highest capacity among the spaces within the budget.
  - Shown when maxBudget or minBudget is specified.
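The difference between the two aggregates can be sketched in TypeScript as follows. This is a minimal illustration, not our implementation: the `Space` and `Budget` shapes and the function names are hypothetical, and treating the budget as an inclusive price range is an assumption.

```typescript
// Hypothetical shapes: the real inventory model may carry more fields.
interface Space {
  capacity: number;
  price: number;
}

interface Budget {
  minBudget?: number;
  maxBudget?: number;
}

// Static aggregate: highest capacity across all of a building's spaces.
function staticHighestCapacity(spaces: Space[]): number | undefined {
  if (spaces.length === 0) return undefined;
  return Math.max(...spaces.map((s) => s.capacity));
}

// Dynamic aggregate: highest capacity among spaces whose price fits the budget.
// Assumption: the budget is an inclusive [minBudget, maxBudget] price range.
function dynamicHighestCapacity(spaces: Space[], budget: Budget): number | undefined {
  const affordable = spaces.filter(
    (s) =>
      (budget.minBudget === undefined || s.price >= budget.minBudget) &&
      (budget.maxBudget === undefined || s.price <= budget.maxBudget),
  );
  return staticHighestCapacity(affordable);
}

// Show the static aggregate when no budget is set, the dynamic one otherwise.
function highestCapacity(spaces: Space[], budget: Budget): number | undefined {
  const hasBudget = budget.minBudget !== undefined || budget.maxBudget !== undefined;
  return hasBudget ? dynamicHighestCapacity(spaces, budget) : staticHighestCapacity(spaces);
}

const spaces: Space[] = [
  { capacity: 100, price: 5000 },
  { capacity: 40, price: 1500 },
];
console.log(highestCapacity(spaces, {})); // 100 (static)
console.log(highestCapacity(spaces, { maxBudget: 2000 })); // 40 (dynamic)
```

The static value depends only on the building, so it can be computed ahead of time; the dynamic value depends on the seeker's budget and can only be computed per query.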

As showing the static aggregate is fairly straightforward compared to the dynamic aggregate, this ADR aims to tackle the dynamic aggregate problem. The dynamic aggregate is difficult because Algolia is not able to calculate it. This is a common limitation of search databases, as they are optimised to perform search rather than aggregation.

Naturally, if the search database is not able to handle the aggregation request, we will have to do the aggregation somewhere else. Each of the approaches outlined below attempts to do the aggregation in a different part of our architecture.

# Approach 1: Aggregate in Inventory Storage

Details: Precalculate the aggregated information for all spaces and store it in our inventory data storage.

| Business upside | Business downside | Mitigation |
| --- | --- | --- |
| Retain global capability and simplicity | Can't support dynamic aggregation | N/A |

Adopting this approach means that we would not be supporting the business requirement, so it's ruled out.

# Approach 2: Expand Algolia capability / Enrich in Search-API

Summary: The best way to think about this is that we're adding new functionality to Algolia. We abstract Algolia behind an API endpoint. When the results are returned from Algolia, the API calculates the aggregate data for each record, then enriches and returns the data.
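The enrichment step in the Search API might look roughly like the sketch below. It assumes the API has already received the hits from Algolia, that each hit is a building identified by Algolia's `objectID`, and that the API can look up a building's spaces from inventory storage. `BuildingHit`, `GetSpaces`, `enrichHits` and the `highestCapacity` field are hypothetical names; a real endpoint would also need pagination, caching and error handling.

```typescript
// Hypothetical shapes for illustration only.
interface Space {
  capacity: number;
  price: number;
}

interface Budget {
  minBudget?: number;
  maxBudget?: number;
}

// Assumption: each Algolia hit is a building, identified by Algolia's objectID.
interface BuildingHit {
  objectID: string;
  name: string;
}

interface EnrichedHit extends BuildingHit {
  highestCapacity?: number;
}

// Hypothetical inventory lookup; a real Search API would query inventory storage.
type GetSpaces = (buildingId: string) => Promise<Space[]>;

function withinBudget(space: Space, budget: Budget): boolean {
  return (
    (budget.minBudget === undefined || space.price >= budget.minBudget) &&
    (budget.maxBudget === undefined || space.price <= budget.maxBudget)
  );
}

// Enrich each Algolia hit with the highest capacity among spaces that fit the budget.
async function enrichHits(
  hits: BuildingHit[],
  budget: Budget,
  getSpaces: GetSpaces,
): Promise<EnrichedHit[]> {
  return Promise.all(
    hits.map(async (hit) => {
      const spaces = await getSpaces(hit.objectID);
      const capacities = spaces.filter((s) => withinBudget(s, budget)).map((s) => s.capacity);
      const highestCapacity = capacities.length > 0 ? Math.max(...capacities) : undefined;
      return { ...hit, highestCapacity };
    }),
  );
}

// Usage with an in-memory stand-in for inventory storage.
const inventory: Record<string, Space[]> = {
  b1: [
    { capacity: 100, price: 5000 },
    { capacity: 40, price: 1500 },
  ],
};

enrichHits([{ objectID: "b1", name: "Building 1" }], { maxBudget: 2000 }, async (id) => inventory[id] ?? [])
  .then((enriched) => console.log(enriched[0].highestCapacity)); // 40
```

Every search now waits on an extra inventory lookup before the response is returned, which is where the "slower search experience" downside below comes from.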

| Business upside | Business downside | Mitigation |
| --- | --- | --- |
| Lower migration cost away from Algolia | Expensive when supporting globally | Global solution |
| Better consistency across devices | Slower search experience as compared to Algolia | Profiling |

Mitigation notes:

- Global solution: Leverage Lambda@Edge to optimise global search performance. This solution is still uncertain, as we'll need to make sure every component involved is global.
- Profiling: This is generally an expensive mitigation, as it requires fine-tuning and is quite error-prone. Matching Algolia's performance will be expensive.

# Approach 3: Enrich in the client side

Details: The client will hit Algolia first and, when needed, fetch the aggregate data from the Search API.

| Business upside | Business downside | Mitigation |
| --- | --- | --- |
| Retain Algolia global capability on common search | Slower display on aggregated data | Precalculate static aggregate |

Mitigation:

Precalculate static aggregate: To improve performance, only enrich for dynamic aggregation; the static aggregation can be stored in Algolia. This is undesirable from a technical perspective, as it will introduce further complexity to the web application and add cognitive overhead.
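The client-side flow for this approach, including the static-aggregate mitigation, can be sketched as two steps: render from the Algolia hits straight away, then fetch and merge the dynamic aggregate only when a budget is set. The record attribute `staticHighestCapacity` and the `FetchDynamicAggregates` call are hypothetical; the Search API call is injected as a parameter so the sketch does not assume any particular endpoint or Algolia client.

```typescript
interface Budget {
  minBudget?: number;
  maxBudget?: number;
}

// Hypothetical record shape: the static aggregate is precalculated and stored in Algolia.
interface BuildingHit {
  objectID: string;
  staticHighestCapacity: number;
}

interface ListingCard {
  buildingId: string;
  highestCapacity: number;
}

// Hypothetical Search API call returning dynamic aggregates keyed by building ID.
type FetchDynamicAggregates = (
  buildingIds: string[],
  budget: Budget,
) => Promise<Record<string, number>>;

// Step 1: render immediately from the Algolia hits using the static aggregate.
function initialCards(hits: BuildingHit[]): ListingCard[] {
  return hits.map((h) => ({ buildingId: h.objectID, highestCapacity: h.staticHighestCapacity }));
}

// Step 2: only when a budget is set, fetch the dynamic aggregate and patch the cards.
async function enrichCards(
  cards: ListingCard[],
  budget: Budget,
  fetchDynamicAggregates: FetchDynamicAggregates,
): Promise<ListingCard[]> {
  const hasBudget = budget.minBudget !== undefined || budget.maxBudget !== undefined;
  if (!hasBudget) return cards; // The static aggregate is already correct; no extra round trip.
  const aggregates = await fetchDynamicAggregates(cards.map((c) => c.buildingId), budget);
  return cards.map((c) =>
    c.buildingId in aggregates ? { ...c, highestCapacity: aggregates[c.buildingId] } : c,
  );
}

// Usage with a stubbed Search API.
const cards = initialCards([{ objectID: "b1", staticHighestCapacity: 100 }]);
enrichCards(cards, { maxBudget: 2000 }, async () => ({ b1: 40 }))
  .then((enriched) => console.log(enriched[0].highestCapacity)); // 40
```

Only the budget case pays for the extra round trip. A real UI might show a placeholder for the capacity instead of briefly displaying the static value while the dynamic one loads.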

# Approach 4: Aggregate in the client side

Details: The client will hit Algolia, which will return an updated record containing all active spaces. The client will then calculate the aggregate dynamically.

| Business upside | Business downside | Mitigation |
| --- | --- | --- |
| Fully leverage Algolia global capability | Limits the number of active spaces, facilities, and images per record | Payload compression |
| | Increased cost of keeping consistency across devices | N/A |

Mitigation notes:

Payload compression:
- Sizes:
  - Algolia's record size limit is 10KB. If we upgrade to Enterprise, it's 100KB.
  - Without compression, we can fit 30 images, 50 facilities, and 100 active spaces.
  - With compression, we may be able to fit 60 images, 100 facilities, and 200 active spaces.
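Whether a record fits can be checked by measuring its serialised size. This is a minimal sketch assuming the record is measured as UTF-8 encoded JSON and reading "10KB" as 10 × 1024 bytes; both are assumptions, and Algolia's own documentation is the authority on how record size is computed. `recordSizeBytes` and `fitsRecordLimit` are hypothetical helpers.

```typescript
// Serialised size of a record in UTF-8 bytes.
function recordSizeBytes(record: unknown): number {
  return new TextEncoder().encode(JSON.stringify(record)).length;
}

// Assumption: "10KB" is read as 10 * 1024 bytes; Enterprise would be 100 * 1024.
const RECORD_LIMIT_BYTES = 10 * 1024;

function fitsRecordLimit(record: unknown, limitBytes: number = RECORD_LIMIT_BYTES): boolean {
  return recordSizeBytes(record) <= limitBytes;
}

// Usage: a hypothetical building record carrying 100 active spaces.
const record = {
  objectID: "b1",
  activeSpaces: Array.from({ length: 100 }, (_, i) => ({ capacity: i + 1, price: 500 })),
};
console.log(recordSizeBytes(record), fitsRecordLimit(record));
```

Measuring bytes rather than characters matters once non-ASCII content (names, descriptions, emoji) is involved, since a single character can take up to 4 bytes in UTF-8.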

These are some compression approaches, applied to `[{ "capacity": 1, "price": 500 }, { "capacity": 2, "price": 400 }]`:

| Approach | Output |
| --- | --- |
| Bytecode | `[{ "c": 1, "p": 500 }, { "c": 2, "p": 400 }]` |
| Emoji | `[{ "👨‍💼": 1, "💷🪑": 500 }, { "👨‍💼": 2, "💷🪑": 400 }]` |
| Sequencing | `[1, 500, 2, 400]` |

There are certainly other techniques; these will be covered in a separate ADR if we go with this approach.
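As an illustration of the sequencing approach from the table, here is a minimal encode/decode sketch with a byte comparison. `encodeSequence`, `decodeSequence` and `jsonBytes` are hypothetical helpers, and the fixed `[capacity, price]` field order is an assumption that writer and reader would have to share.

```typescript
interface SpacePrice {
  capacity: number;
  price: number;
}

// Sequencing: flatten [{ capacity, price }, ...] into [capacity, price, capacity, price, ...].
function encodeSequence(spaces: SpacePrice[]): number[] {
  return spaces.flatMap((s) => [s.capacity, s.price]);
}

// Decoding relies on the fixed field order agreed between writer and reader.
function decodeSequence(seq: number[]): SpacePrice[] {
  if (seq.length % 2 !== 0) throw new Error("sequence length must be even");
  const spaces: SpacePrice[] = [];
  for (let i = 0; i < seq.length; i += 2) {
    spaces.push({ capacity: seq[i], price: seq[i + 1] });
  }
  return spaces;
}

// Serialised size in UTF-8 bytes, for comparing the encodings.
function jsonBytes(value: unknown): number {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

const spaces: SpacePrice[] = [
  { capacity: 1, price: 500 },
  { capacity: 2, price: 400 },
];
const seq = encodeSequence(spaces);
console.log(seq); // [ 1, 500, 2, 400 ]
console.log(jsonBytes(spaces), jsonBytes(seq)); // 55 13
```

Sequencing gives the largest saving, but the payload is no longer self-describing, so adding a field later needs some form of versioning. Note that the emoji keys are multi-byte in UTF-8 (for example, `💷🪑` alone is 8 bytes versus 5 for `price`), so on their own they do not reduce the byte size.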

# Decision

We will be going with Approach 3.

# Consequences

Approach 4 is highly desirable from a technical perspective, as we'd be able to keep Algolia as the only solution we need to show listings on our search page. This benefit, however, comes at the cost of putting constraints on the business requirements. The data that we have considered in a listing are active spaces, images, and facilities, and we haven't even considered building tags, for example. The moment we hit the record size limit, the effort we have spent on optimising the payload will be wasted.

We won't be storing non-searchable data in Algolia. We would like to limit what we keep in Algolia to the attributes a space can be searched on, whereas Approach 4 would require us to store extra data that seekers will not search on (it would only be used for display).

The performance impact of Approach 3 will grow as our web app is used globally. Our APIs are currently hosted in Ireland, and the round trip to get the aggregate data will be affected by network latency. This problem can be tackled separately, and we think that the cost of supporting a global solution is unavoidable, as we'll need to make other APIs global too.

We will be able to evolve the architecture iteratively: we can always show the static aggregate data first, and add the dynamic aggregate data later as a further improvement.

By executing the enrichment on the client side, we'll be able to keep the core benefits of using Algolia, namely performance and global capability. We won't need to abstract Algolia away, as that's an unnecessary upfront cost for now.
