Updating the Algolia search index: Record is too big #388

Open
MakisH opened this issue Apr 24, 2024 · 6 comments
Labels
technical Technical issues on the website

@MakisH
Member

MakisH commented Apr 24, 2024

Trying to update our Algolia search index, I got the following error:

Rendering to HTML (100%)
[✗ Error] Record is too big
                                                                                 
The jekyll-algolia plugin detected that one of your records exceeds the 10.00 Kb 
record size limit.                                                               
                                                                                 
title:    Overview of adapters                                                   
url:      /adapters-overview.html                                                
size:     9.98 Kb                                                                
                                                                                 
Most probable keys causing the issue:                                            
   html (6.84 Kb), content (2.29 Kb), keywords (0.06 Kb)                         
                                                                                 
Complete log of the record has been extracted to:                                
   /home/gc/repos/precice/website/jekyll-algolia-record-too-big.log              
                                                                                 
This issue can be caused by malformed HTML preventing the parser to correctly    
grab the content of the nodes. Double check that the page actually renders       
correctly with a regular `jekyll build`.                                         
                                                                                 
You can also exclude the page generating this error from the indexing by editing 
the `files_to_exclude` key of your config.                                       
                                                                                 
If you think this is an error and your current Algolia plan should allow you to  
push records bigger than 10.00 Kb, you can change the `max_record_size` config   
option to increase the limit. Paid plans have a limit set to 20Kb, while free    
Community plans have it set to 10Kb.                                             
                                                                                 
The following documentation might help you:                                      
   - https://community.algolia.com/jekyll-algolia/options.html#files-to-exclude  
   - https://community.algolia.com/jekyll-algolia/options.html#nodes-to-index    
   - https://community.algolia.com/jekyll-algolia/options.html#max-record-size   

Log: jekyll-algolia-record-too-big.log (9.98 Kb, on the 10 Kb limit)
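To see which keys dominate a record before pushing, one could approximate the plugin's size check locally. The following is a minimal Python sketch, not the plugin's actual code: it assumes the record size is the UTF-8 length of the record serialized as compact JSON and that 1 Kb = 1024 bytes. The record below is hypothetical and only reuses the key names (`html`, `content`) and page details reported in the log.

```python
import json

# Assumption: the 10 Kb limit of the free Community plan, expressed in bytes.
MAX_RECORD_SIZE = 10 * 1024


def json_size(value) -> int:
    """Size in bytes of a value serialized as compact JSON (UTF-8)."""
    return len(json.dumps(value, ensure_ascii=False, separators=(",", ":")).encode("utf-8"))


def check_record(record: dict, limit: int = MAX_RECORD_SIZE):
    """Return (total size, whether it exceeds the limit, per-key sizes largest first).

    Per-key sizes count only the values, so they do not add up exactly to the total.
    """
    total = json_size(record)
    by_key = sorted(((key, json_size(val)) for key, val in record.items()),
                    key=lambda item: item[1], reverse=True)
    return total, total > limit, by_key


if __name__ == "__main__":
    # Hypothetical record: a highlighted code block rendered to HTML.
    record = {
        "title": "Overview of adapters",
        "url": "/adapters-overview.html",
        "html": "<pre>" + '<span class="n">x</span> ' * 400 + "</pre>",
        "content": "x " * 400,
    }
    total, too_big, by_key = check_record(record)
    print(f"size: {total / 1024:.2f} Kb, too big: {too_big}")
    for key, size in by_key[:3]:
        print(f"  {key}: {size / 1024:.2f} Kb")
```

Note that the escaped quotes inside the `html` value also count toward the JSON size, which is one reason markup-heavy keys grow faster than the visible text.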

Trying to reduce the size of the (borderline) adapters-overview.html, I get the same issue with another file: jekyll-algolia-record-too-big.log (17.59 Kb, over the 10 Kb limit, corresponding to dev-publications.md), which is probably indeed a bit too complex.

I assume the same would apply for several more files.

Both pages seem to have no HTML validation errors.

Since I did not face similar issues the last time I updated the index (around March), I am wondering:

  • Did we introduce any change that overloaded our records?
  • Did anything change in the plan @chlorenz is subscribed to? (yes, there is a FOSS plan, which I guess we did not manage to get into so far)
  • Did anything change in the Algolia policy / plans / way of computing record sizes?

Algolia documentation

@chlorenz
Collaborator

Hi @MakisH, I've started to investigate this issue and so far I can say this:

Let's take the second document, dev-publications.md, as an example. A search record is an HTML block of the rendered document, here dev-docs-publication-strategy.html. In this particular case, the record is the <code></code> block after point '4. Download components' (which HTML elements become records can be configured with the `nodes_to_index` option).

Because we have syntax highlighting enabled, the rendered code consists of lots of <span></span> elements, which blow up the size of the record (as shown by the log files). This is why we hit the 10 kB limit.
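The growth caused by highlighting can be illustrated with a rough experiment. This Python sketch does not use the site's real highlighter; it merely wraps every whitespace-separated token in a `<span>` with a hypothetical CSS class, as a crude approximation of how highlighted HTML compares to the plain escaped code. The shell snippet is also hypothetical.

```python
import html


def plain_html(code: str) -> str:
    """Code block rendered without highlighting: escaped text in <pre><code>."""
    return f"<pre><code>{html.escape(code)}</code></pre>"


def fake_highlighted_html(code: str, css_class: str = "nb") -> str:
    """Crude stand-in for a highlighter: one <span> per whitespace-separated token.

    The class name is hypothetical. Real highlighters typically split lines into
    more tokens (e.g. punctuation and operators), so their output can be larger.
    """
    lines = []
    for line in code.splitlines():
        tokens = [f'<span class="{css_class}">{html.escape(tok)}</span>' for tok in line.split()]
        lines.append(" ".join(tokens))
    return "<pre><code>" + "\n".join(lines) + "</code></pre>"


if __name__ == "__main__":
    # Hypothetical download script, similar in spirit to a "Download components" step.
    snippet = "\n".join(
        f"wget https://example.org/component-{i}.tar.gz && tar -xzf component-{i}.tar.gz"
        for i in range(40)
    )
    plain = len(plain_html(snippet).encode("utf-8"))
    highlighted = len(fake_highlighted_html(snippet).encode("utf-8"))
    print(f"plain: {plain} bytes, highlighted: {highlighted} bytes, "
          f"ratio: {highlighted / plain:.1f}x")
```

The visible text is identical in both versions; only the markup differs, yet the markup is what the `html` key of the record carries.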

This change was introduced in d926e2c last week.

Would the easiest fix be to break up the code block? I don't see any other way to split code blocks into smaller pieces.

(I've also found this page in the Algolia docs suggesting that some (newer) plans allow bigger records >10kB.)

@chlorenz chlorenz reopened this Apr 28, 2024
@MakisH
Member Author

MakisH commented Apr 28, 2024

Thanks for investigating!

If we ignore this file, do we get several more instances? If this is the only instance, then the easiest fix would be to move this code to a file in some repository, or just disable the syntax highlighting.

What I would find more interesting is whether anything bigger changed recently that now affects several files.

@chlorenz
Collaborator

In fact it turns out this is exactly the answer to #300, and has happened before, see 74e377c 😆

In this specific case the code can be split into two blocks like in 74e377c. I agree that, say, a very involved shell script could be hosted as an external file.
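Splitting one long block into several consecutive blocks keeps each resulting record smaller. Below is a minimal Python sketch of choosing the split points greedily, assuming a per-line size estimate (the hypothetical `size_of` callback, plain UTF-8 length by default) and a byte budget chosen below the limit to leave room for the other record keys. It only computes where to split; the Markdown itself would still be edited by hand, as in 74e377c.

```python
from typing import Callable, List


def split_lines(lines: List[str], budget: int,
                size_of: Callable[[str], int] = lambda s: len(s.encode("utf-8"))) -> List[List[str]]:
    """Greedily group consecutive lines into chunks whose total size stays within budget.

    A single line larger than the budget still gets its own chunk, since it
    cannot be split further here.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for line in lines:
        line_size = size_of(line) + 1  # +1 for the newline joining lines
        if current and current_size + line_size > budget:
            chunks.append(current)
            current, current_size = [], 0
        current.append(line)
        current_size += line_size
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Hypothetical budget: about 8 Kb, leaving headroom for title, url, content, etc.
    budget = 8 * 1024
    lines = [f"echo step {i}" for i in range(2000)]
    chunks = split_lines(lines, budget)
    print(len(chunks), [sum(len(line) + 1 for line in chunk) for chunk in chunks][:3])
```

Since the highlighted HTML rather than the plain text dominates the record, a more realistic `size_of` would estimate the rendered size of each line instead of its raw length.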

@uekerman
Member

The open-source plan could be a solution, see also #237

@MakisH
Member Author

MakisH commented May 13, 2024

Today, I requested access to the "Algolia for Open Source" plan via the on-platform "contact sales" feature. Waiting for an answer.

Edit: I got an answer three days later, pointing me to a form that also asked for some usage details. On August 29, I submitted the form, requesting 20k records (we currently have 8.36k).

@fsimonis
Member

fsimonis commented Sep 4, 2024

Update: we got the open-source plan
