We are preparing for another summer of data collection on trees in forest research plots. Each year we visit a different suite of plots, so to prepare for this year we want to delete all of last year’s data from the tablets and server before populating them with new data for this year. Our application has three data tables. Last year, the largest of these tables had data on nearly 9,000 trees. Those data are downloaded and backed up, so we want to delete them from the tablets and our server without altering the structure of the tables. What is the most efficient way of doing this?
It seems like there are two options:
(1) From the tablet, by pushing a new csv file to a tablet, then resetting the server in the synchronization screen, and then synchronizing to push the new data from the tablet to the server. This is what we did last year. It worked except that dummy data records from testing our app kept showing up in our synchronized data. We could never figure out how to delete the test data.
(2) Via ODK-X Suitcase. This approach was described in a post to the forum in 2020, found here: Bulk Deleting of Data - #6 by elmps2018. However, there are some things about that approach that are unclear to me, so if Suitcase is the best approach to deleting old data I would have some follow up questions.
Either approach, (1) or (2), can work. If you are truly resetting the server and data completely, you can do that with either one. To end up without the test data, you would re-push to the tablet and not create the data, or you could use Suitcase to remove it. Have you looked at the ODK-X Suitcase page in the ODK-X Docs for directions on using Suitcase? Did you have some specific follow-up questions?
Thank you, Caroline. I sure appreciate how promptly you and other ODK experts always respond to my questions in the forum. Below is my follow up on both approaches.
Regarding the first option in my original post, you wrote “re-push to the tablet and not create the data.” I’m not clear on what you meant by “not create the data.” Can you elaborate on that?
Regarding the option of using ODK-X Suitcase to delete data, here is my understanding of how this approach would work and some follow-up questions on these steps:
Step 1. Modify each of the downloaded data files (csv) from Suitcase by inserting a new column named “operation” and populating all rows with the word “DELETE”.
Step 2. Upload each of the modified csv files to the server (one at a time) to delete all data records (rows) in each table on the server.
Step 3. Synchronize the tablets with the server to wipe clean last year’s data from the tablets.
Questions on this approach with Suitcase:
First, are those steps the correct way of doing things?
Assuming so, must the modified csv files (from Step 1) have all the other columns of data that the downloaded csv files had?
Synchronizing (Step 3): Prior to synchronizing, the tablets will still have last year’s data. So when we synchronize with the server (which at this point should not have any data rows remaining), wouldn’t the bi-directional nature of synchronization send last year’s data from the tablets back to the server again? That is not what we want. We want both the server and the tablets to be cleaned of last year’s data before pushing/uploading this year’s csv file and starting “fresh.”
I hope that makes sense, but let me know if I can clarify anything. And thank you again!
For the first option, it sounded like you had created some test data while testing on the tablet and then couldn’t get rid of it. If you re-push the program without the test data (wipe the tablet, then re-push), then when you reset the server the test data won’t be there.
Yes, you have the steps right on option 2. I don’t think you need ALL other columns of data, but that’s probably the easiest way to do it. You definitely need some of the columns, like the _id.
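Step 1 can be scripted rather than done by hand in a spreadsheet. Below is a minimal sketch in Python, assuming the CSV downloaded from Suitcase has an `_id` column and that the file to upload needs an `operation` column with `DELETE` in every row, as described above. The function name `mark_rows_for_delete` and the `id_only` flag are made up for this example, and whether Suitcase accepts a file containing only `operation` and `_id` is an assumption worth testing on a small table first.

```python
import csv


def mark_rows_for_delete(src_path, dst_path, id_only=False):
    """Copy a downloaded CSV, adding an 'operation' column set to DELETE.

    Hypothetical helper for illustration. Returns the number of data rows written.
    """
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        columns = reader.fieldnames or []
        if "_id" not in columns:
            raise ValueError(f"{src_path} has no _id column")
        # Keep every original column by default (the "easiest way" above),
        # or only _id when id_only is True (assumption: that is sufficient).
        kept = ["_id"] if id_only else [c for c in columns if c != "operation"]
        rows = [{c: row[c] for c in kept} for row in reader]

    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=["operation"] + kept)
        writer.writeheader()
        for row in rows:
            writer.writerow({"operation": "DELETE", **row})
    return len(rows)
```

Calling `mark_rows_for_delete("trees.csv", "trees_delete.csv")` on each downloaded table (file names here are placeholders) writes a copy that can then be uploaded as in Step 2, leaving the backed-up original untouched.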
For the synchronizing step, I think the delete would then carry over to the tablets and the old data would get deleted (this is how it worked when we did this!). But if you want to start totally fresh you can also force stop apps, delete the opendatakit folder on the tablet, and then you’ll have blank tablets to sync up.
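If there are many tablets, the "start totally fresh" route can be scripted over USB with adb instead of tapping through each device. The sketch below is only an illustration under several assumptions: adb is installed on the PC, the apps use the package names `org.opendatakit.survey`, `org.opendatakit.tables`, and `org.opendatakit.services`, and the data folder is at `/sdcard/opendatakit` (the location can differ by ODK-X and Android version, so check it on one tablet first). The function names are invented for this example. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Assumed package names for the ODK-X apps; verify them on your own tablets.
ODKX_PACKAGES = [
    "org.opendatakit.survey",
    "org.opendatakit.tables",
    "org.opendatakit.services",
]
# Assumed location of the opendatakit folder; it may differ on your devices.
ODKX_FOLDER = "/sdcard/opendatakit"


def wipe_commands(serial=None):
    """Build the adb commands: force-stop each app, then delete the folder."""
    base = ["adb"] + (["-s", serial] if serial else [])
    cmds = [base + ["shell", "am", "force-stop", pkg] for pkg in ODKX_PACKAGES]
    cmds.append(base + ["shell", "rm", "-rf", ODKX_FOLDER])
    return cmds


def wipe_tablet(serial=None, dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for cmd in wipe_commands(serial):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Running `wipe_tablet()` prints the four commands without touching the device; `wipe_tablet(dry_run=False)` executes them. Since this deletes data irreversibly, it is worth confirming the backup and trying it on a single tablet before the rest.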
Here’s a follow-up on trying to remove last year’s data from the server and the tablets.
First, we used Suitcase with the ‘Update’ operation (set to ‘DELETE’) in each of our data tables. This worked perfectly to delete all data rows from the server.
In the next step we tried to synchronize the tablets (which still have last year’s data) with the server (with no data), with the idea that this would delete the old data on the tablets. Note that the name of our data file is "prev_data.csv" which has ~25,000 records. During syncing, the tablet displays this message: “Fetching prev_data data rows, matching row updates from server.” We let this process run for more than 3 hours with no changes to the syncing message (“fetching”). This seems like a very long time for synchronizing, so we aborted the process.
Questions:
Since there are no data rows in the server, there are no matching rows between the server and tablets. How does ODK-X deal with this?
Any ideas on how long the fetching/syncing process should take in a situation like this and with many thousands of records? Should we let this process run all day to see if it works?
If syncing won’t work to clear the old data from the tablets, it seems like we have two options left: (a) upload this year’s (new) data file to the server and sync with the tablets (though again, there will be no matching rows between the server (new data) and the tablets (old data)); and (b) completely remove the ODK-X app and data from the tablets and re-push the app to the tablets with the new data (we are calling this the nuclear option). What would you recommend?
I am not sure how long it would take to sync; it might depend on data size and connection speed. If you can let a tablet try to sync again for longer, that would be good to try. But if that doesn’t work, option (b), the nuclear option, should be fine. Although it is always good to test on one tablet first!
@RJP, if you need to delete the opendatakit folder on the tablet, you should be able to log in and sync with the server in ODK-X Services; you won’t need to push the files from a PC to the tablet.
One thing to keep in mind is “resetting” the server resets the database from scratch. Deleting the data means the old data’s history is still maintained in the database. If the phone had the data already it will maintain some history unless the opendatakit folder is deleted. Again a server reset will delete everything and start from scratch.
Here’s a summary on what worked and what didn’t for getting a new csv file onto our server and tablets.
Server: we were not able to upload a new csv file to the server using the Suitcase GUI. To us the instructions in the online documentation are not clear enough. The instructions read:
“To Upload files to ODK-X Cloud Endpoint, you need to lay out the files and folders in the correct file structure which is described in details in the Application Configuration File Structure. Your upload directory should look similar to the config directory and contain subdirectories assets and/or tables.”
So, we copied the entire config folder with its subdirectories from our app folder to the Upload folder. Then we modified our csv file to insert columns for _id and table_id and populated them with dummy values. The upload function did not work. Any thoughts on what we might have done wrong?
Tablets: We were successful at removing the app and the old data on the tablets by following Caroline’s instructions above: force-stopping the ODK-X apps (Survey, Tables, Services) and then deleting the “opendatakit” folder to remove our app (with the old data). Note that ODK-X Services is stored in a different folder and so remains on the tablet, which allowed us to synchronize with the server to get our app back onto the tablets.
To get the new csv file onto the tablets we had to push data from the PC (we have a custom utility to do this which incorporates the command-line instructions from the ODK-X documentation). Once the csv file was on the tablets we could sync with the server to get the new data onto the server.
In the future we would like to use Suitcase for uploading our new csv files, so any suggestions on how to do that correctly would be welcomed (a simple step-by-step description would be especially appreciated!).
Waylon - I apologize for the delay in responding to your reply, which was quite helpful in understanding the distinction between a server reset and simply deleting data on the server. Also interesting that the history of the old data is maintained after just deleting the data.