Huge Dataset Slowing Form Loading

cmuchuchuti · May 13, 2020, 7:23pm

I am running a survey that is likely to reach about 100,000 records in one table. So far there are 7,000 records, and I have more than 50 data collectors.
In my design I have adopted the Hope Study (the sample application that comes with ODK-X). One of the Hope Study's features is rendering records on an HTML page within ODK Tables. It also lets users search records easily using the odkData.query function in JavaScript.
The challenge I now have is that loading the records takes longer and longer as more records are entered.
(1) Is there a way to have each tablet sync only the data belonging to its own data collector? Each tablet is configured with that collector's user account.
(2) Is there a way to keep certain data from downloading (from server to tablet) during sync, say data captured 2 or more months ago, or data marked as COMPLETE?
(3) I am open to any other suggestions that would make rendering the data on the HTML page faster for the data collectors.

elmps2018 · May 13, 2020, 7:56pm

For idea (1), you can set up permission filters. They don’t reduce the amount of data that syncs, but they do limit which data each user can access:
https://docs.odk-x.org/data-permission-filters/
(2) Deleting data you no longer need from the server is the best way to keep it from downloading. You can write scripts, in whatever software you use, that update the status to delete for rows that are COMPLETE or more than 2 months old, etc. Let us know whether either of these options works.
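A minimal sketch of the selection step such a script might perform, written in Python. It assumes each record has a status field and a capture date; the field names, the record shape, and the 60-day cutoff standing in for "2 months" are all hypothetical. It only decides which rows qualify; actually deleting them on the server depends on your sync endpoint and tooling and is not shown.

```python
from datetime import date, timedelta

# Hypothetical record shape: a dict with "status" and "date_captured" keys.
# Real ODK-X tables will use your own column names.
def rows_to_delete(rows, today, max_age_days=60):
    """Return the rows that are COMPLETE or at least max_age_days old."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        row for row in rows
        if row["status"] == "COMPLETE" or row["date_captured"] <= cutoff
    ]

if __name__ == "__main__":
    records = [
        {"id": "a", "status": "COMPLETE", "date_captured": date(2020, 5, 1)},
        {"id": "b", "status": "IN_PROGRESS", "date_captured": date(2020, 2, 1)},
        {"id": "c", "status": "IN_PROGRESS", "date_captured": date(2020, 5, 10)},
    ]
    stale = rows_to_delete(records, today=date(2020, 5, 13))
    print([row["id"] for row in stale])  # prints ['a', 'b']
```

Passing today in explicitly, rather than calling date.today() inside the function, keeps the rule easy to test and to rerun against a fixed reference date.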

cmuchuchuti · May 14, 2020, 12:54pm

@elmps2018 thank you for your reply. Unfortunately, option (2) will not work well for us, because we need the full dataset available as one file at the end of the project.

Option (1) only deals with permissions, which I have already implemented, so data collectors cannot edit data that does not belong to them.

So the actual issue, the growing delay in rendering the data in Tables, still remains. If there is no other solution, I may just have to live with it.

W_Brunette · May 14, 2020, 1:43pm

You could also change the query in Tables so it does not return the entire dataset. This will make rendering faster, since UI objects are no longer generated for every row.
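A minimal sketch of paging, shown in plain SQLite through Python's sqlite3 module rather than the ODK-X JavaScript API (ODK-X keeps device data in SQLite, but the household table and its columns here are hypothetical). Fetching one page at a time keeps both the result set and the number of UI elements built from it small, however large the table grows. How to pass an equivalent limit and offset from an ODK Tables HTML page depends on the odkData query functions in your ODK-X version, so check its documentation.

```python
import sqlite3

def fetch_page(conn, page, page_size=50):
    """Return one page of rows instead of the whole table."""
    # LIMIT/OFFSET bounds the result size; ORDER BY keeps pages stable.
    cur = conn.execute(
        "SELECT id, name FROM household ORDER BY id LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    )
    return cur.fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE household (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO household (id, name) VALUES (?, ?)",
        [(i, f"household {i}") for i in range(1, 7001)],
    )
    first = fetch_page(conn, page=0)
    print(len(first), first[0])  # prints 50 (1, 'household 1')
```

OFFSET still makes the database step over all the skipped rows, so for very deep pages a keyset condition such as WHERE id > last_seen_id usually scales better.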

You can also change the query so the database returns only certain fields of each row. This also speeds things up, since less data has to move from the native Android side to the WebKit view.
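A minimal sketch of column projection, again in plain SQLite through Python's sqlite3 module rather than the ODK-X API. The wide household table, with an id, a name, and many answer columns, is hypothetical. A list page that shows only the id and name does not need SELECT *, and selecting just those two columns shrinks every row that has to cross from the database to the display layer.

```python
import sqlite3

def make_demo_table(conn, n_rows=100, n_questions=40):
    """Create a hypothetical wide table: id, name, and many answer columns."""
    answer_cols = [f"q{i}" for i in range(1, n_questions + 1)]
    conn.execute(
        "CREATE TABLE household (id INTEGER PRIMARY KEY, name TEXT, "
        + ", ".join(f"{c} TEXT" for c in answer_cols) + ")"
    )
    placeholders = ", ".join("?" for _ in range(n_questions + 2))
    rows = [
        (i, f"household {i}", *(f"answer {i}-{c}" for c in answer_cols))
        for i in range(1, n_rows + 1)
    ]
    conn.executemany(f"INSERT INTO household VALUES ({placeholders})", rows)

def fetch_list_columns(conn):
    """Return only the fields the list page actually displays."""
    return conn.execute("SELECT id, name FROM household ORDER BY id").fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_demo_table(conn)
    full = conn.execute("SELECT * FROM household ORDER BY id").fetchall()
    slim = fetch_list_columns(conn)
    print(len(full[0]), len(slim[0]))  # prints 42 2
```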

Emil · May 19, 2020, 9:39am

Perhaps you could divide your data into logical groups? For instance, we have a large study where we divided the data by geographical region and set up a sync endpoint for each.

I have also found it beneficial to add indices to the SQLite database directly on the device, but that is probably only feasible for smaller datasets…
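A minimal sketch of the indexing idea on an ordinary SQLite database through Python's sqlite3 module. The household table, the status column, and the index name idx_household_status are hypothetical, and this does not touch the database ODK-X itself manages. It uses EXPLAIN QUERY PLAN to check whether a filtered query scans the whole table or searches through the index. Keep in mind that every index takes storage and adds work to each insert and update.

```python
import sqlite3

QUERY = "SELECT id, name FROM household WHERE status = ?"

def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail text is the last column of each plan row.
    return [row[-1] for row in rows]

def add_status_index(conn):
    """Index the hypothetical status column used to filter the list."""
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_household_status ON household (status)"
    )

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE household (id INTEGER PRIMARY KEY, name TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO household (name, status) VALUES (?, ?)",
        [(f"h{i}", "COMPLETE" if i % 2 else "OPEN") for i in range(1000)],
    )
    print(query_plan(conn, QUERY, ("COMPLETE",)))  # before: a table scan
    add_status_index(conn)
    print(query_plan(conn, QUERY, ("COMPLETE",)))  # after: uses the index
```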