Recently I had an interesting task: to display a 1 GB file with 2 million rows of data on the frontend and implement filtering across several tables. In this article, I’d like to share how I approached it.
Once upon a time I created a simple (as it seemed to me then) React application. The app just loads some data from the server and renders it in several tables. After a successful demonstration, the customer gave me access to the production data. And here, as usually happens, the most interesting part began. When I switched the application from the development API to production and reloaded the page, I saw something like “Aw, Snap! Something went wrong while displaying this webpage”. After debugging, I noticed that the JSON file from the prod server was ~500 MB (instead of the expected 2–5 MB from the dev server).
The new requirements were 🙄:
1. JSON files can be up to 1 GB.
2. Pagination cannot be added on the backend; just accept it as a fact :(
First, I tried react-virtualized, a React component that efficiently renders large lists via virtual rendering (see the react-virtualized documentation for more on virtual rendering). But a few days later a new requirement arrived:
3. “Standard browser search (Ctrl/Cmd + F) doesn’t work correctly. Fix it!”
The main concept behind the virtual list is rendering only what is visible. So if a user types something in the search box, the browser performs a search only in the visible part of the virtual list.
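To make the limitation concrete, here is a minimal sketch of the windowing calculation behind a virtual list. It is not react-virtualized's actual code: the function name `getVisibleRange`, its parameters, and the fixed row height are all assumptions made for illustration.

```javascript
// Hypothetical helper: which row indexes fall inside the viewport,
// assuming every row has the same fixed height in pixels.
function getVisibleRange(scrollTop, viewportHeight, rowHeight, totalRows) {
  // First row that is at least partially visible.
  const first = Math.max(0, Math.floor(scrollTop / rowHeight));
  // Last row that is at least partially visible.
  const last = Math.min(
    totalRows - 1,
    Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1
  );
  return { first, last };
}

// Only this slice would be turned into DOM nodes.
function rowsInDom(rows, scrollTop, viewportHeight, rowHeight) {
  const { first, last } = getVisibleRange(
    scrollTop,
    viewportHeight,
    rowHeight,
    rows.length
  );
  return rows.slice(first, last + 1);
}
```

With a 300 px viewport and 30 px rows, only about ten rows exist in the DOM at any moment, and that is all the native Ctrl/Cmd + F search can see.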
Here is a demo of how browser search works with virtual lists. Note that records that appear after scrolling (when the virtual list re-renders) are not highlighted, even though they contain the search value ‘@’.
Demonstration of standard browser search working with a virtual list.
I decided to create a custom search box with functionality similar to the default browser search, but able to search through all 2 million records.
Naively filtering such a large amount of data in one pass leads to “heap out of memory”. As of April 2018, I couldn’t find any virtual list implementation for React with built-in search/filtering.
🤔🤔🤔 After a couple of hours of “googling” and “stackoverflowing” I got the idea to use Web Workers, and I used the Simple Web Worker library. The main idea of this method is to split a big array into smaller parts and process every part asynchronously (a semblance of multithreading) using Web Workers. Below you can see the pseudocode that describes this approach:
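The idea can be sketched roughly as follows. This is a minimal illustration, not the exact code from my app: the names `splitIntoChunks`, `filterChunk`, `filterInChunks`, and `runTask` are hypothetical. `runTask` stands for any function that runs a task and returns a promise; in the browser it would hand the work to a Web Worker, while the default used here simply runs on the current thread so the sketch works anywhere.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function splitIntoChunks(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Pure function that relies only on its arguments,
// so it could be shipped to a worker as-is.
function filterChunk(chunk, query) {
  const needle = query.toLowerCase();
  return chunk.filter((row) => String(row).toLowerCase().includes(needle));
}

// Assumption: a stand-in runner that executes the task asynchronously
// on the current thread instead of in a real Web Worker.
const runInline = (fn, args) => Promise.resolve().then(() => fn(...args));

// Filter a large array chunk by chunk, one chunk in flight at a time.
async function filterInChunks(rows, query, chunkSize = 3000, runTask = runInline) {
  const matches = [];
  for (const chunk of splitIntoChunks(rows, chunkSize)) {
    const found = await runTask(filterChunk, [chunk, query]);
    matches.push(...found);
  }
  return matches;
}
```

Awaiting each chunk before starting the next keeps memory usage bounded by the chunk size, at the cost of some parallelism.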
⚠️ Warning ⚠️: it’s necessary to find an optimal chunk length. The shorter the chunk, the slower the search; the longer the chunk, the greater the chance of seeing “heap out of memory” on low-performance devices. In my case, a chunk length of 3000 was found to be optimal experimentally.
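One way to run such an experiment is to time the same search for several candidate chunk lengths. The helper below is a hypothetical sketch (`timeChunkSizes` is not part of my app), and it measures only in-process filtering. It leaves out the cost of passing data to a worker, which is a large part of why very short chunks are slow, so treat the numbers as a rough starting point and repeat the measurement in the real setup on the slowest device you need to support.

```javascript
// Hypothetical helper: time a chunk-by-chunk filter for each candidate size.
// Rows are assumed to be strings.
function timeChunkSizes(rows, query, candidateSizes) {
  return candidateSizes.map((size) => {
    const start = Date.now();
    let matchCount = 0;
    for (let i = 0; i < rows.length; i += size) {
      const chunk = rows.slice(i, i + size);
      matchCount += chunk.filter((row) => row.includes(query)).length;
    }
    // Elapsed wall-clock time in milliseconds for this chunk size.
    return { size, ms: Date.now() - start, matchCount };
  });
}

// Example: compare a few sizes on generated data.
const sampleRows = Array.from({ length: 9000 }, (_, i) => `user${i}@example.com`);
const timings = timeChunkSizes(sampleRows, '@', [500, 3000, 9000]);
```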
In the demo below, you can see the custom search box at work. Highlights don’t disappear after scrolling, and users can navigate between rows and tables with the arrow buttons. Yes, filtering takes some time, but it makes it possible to process much larger amounts of information than before. For simplicity, the array has only 9k items. Feel free to add as many as you need, but be prepared for longer filter times (filtering 2 million rows may take up to 5–7 minutes).
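The arrow-button navigation can be modelled as a cursor over the list of matches that wraps around at both ends, like native browser search. The sketch below is an illustration under assumptions: `createMatchNavigator` and the `{ table, row }` shape of a match are hypothetical names, not taken from the demo.

```javascript
// Hypothetical navigator over matches sorted in display order.
// Each match is assumed to look like { table: number, row: number }.
function createMatchNavigator(matches) {
  // -1 means "no current match" (empty result set).
  let index = matches.length > 0 ? 0 : -1;

  const step = (delta) => {
    if (matches.length === 0) return null;
    // Wrap around in both directions.
    index = (index + delta + matches.length) % matches.length;
    return matches[index];
  };

  return {
    current: () => (index === -1 ? null : matches[index]),
    next: () => step(1),
    previous: () => step(-1),
    // 1-based position for a "3 of 120" style counter.
    position: () => ({ current: index + 1, total: matches.length }),
  };
}
```

Because each match carries its table index, the component that renders the tables can scroll the right table to the right row when the current match changes.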
This approach helped me to solve several problems:
- Filtering a large amount of data without crashing the browser. In my case the maximum file size was ~1 GB, with up to ~2 million rows.
- Creating a search box with a UX similar to native browser search that is able to search through all 2 million rows.
Short description of my implementation
The React app consists of 3 components:
- App.js: the main component of the application;
- SearchBox.js: a component that implements the search box functionality, with arrow buttons for navigating through the found values;
- TablesViews.js: a component that renders several tables, as required by the customer, and demonstrates how to implement SearchBox navigation between these tables.