Folks, a user very helpfully reported to me that when hashing a large volume of files using the FileS tab and then choosing to save the results as HTML, QuickHash was crashing.
I was confused by this at first, because when I released v3.3.0 I made a lot of updates to memory handling and introduced more filestreams for large volumes of data. What I did not realise was that I had adjusted the database interaction to use .RecordCount, in the mistaken belief that it gave me a count of all the records in a given table. In fact, it generally only gives you the number of records currently fetched for display!
So when QuickHash asked “are there more than 20K rows?” via the CountGridRows function, it was getting the wrong answer. (QuickHash uses RAM and StringLists if there are fewer than 20K rows, whereas it uses FileStreams and stream writes if there are more.)
So, in all cases, it would try to write the HTML file using RAM instead of filestreams.
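To illustrate the pitfall and the fix: a dataset's .RecordCount only reflects the rows fetched into its buffer, so the reliable way to count is to ask the database itself. Here is a minimal Free Pascal sketch of that idea, assuming an SQLdb connection; the table name TBL_FILES is a placeholder, not necessarily QuickHash's actual schema:

uses
  sqldb, db;

// Sketch only: count rows by querying the database directly, rather than
// trusting a dataset's RecordCount, which only counts buffered rows.
function CountTableRows(Conn: TSQLConnection; Tx: TSQLTransaction): Int64;
var
  q: TSQLQuery;
begin
  q := TSQLQuery.Create(nil);
  try
    q.Database    := Conn;
    q.Transaction := Tx;
    q.SQL.Text := 'SELECT COUNT(*) FROM TBL_FILES'; // hypothetical table name
    q.Open;
    Result := q.Fields[0].AsLargeInt; // the real total, straight from SQL
    q.Close;
  finally
    q.Free;
  end;
end;

A result like this can then drive the 20K-row decision correctly: if it returns more than 20000, take the FileStream path; otherwise the StringList path is fine.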
So once I realised this, I updated the CountGridRows function. But then I realised I was also using the DBGrid calls .First and .Last, which are fine for a few tens of thousands of rows, but not good at all for hundreds of thousands. So I had to change that too.
In both cases I have now resorted to a dedicated TSQLQuery instead of relying on the DBGrid element. In tests I am now able to save 407K rows of data as a large 56MB HTML file in less than 10 seconds. So it seems to be working OK.
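Roughly, the approach looks like this: a dedicated TSQLQuery walks the result set while each row is written straight to a TFileStream, so the DBGrid is never touched and rows are streamed out as they are read. This is a hedged sketch of the technique, not QuickHash's actual code; the query text, field names and HTML layout are all placeholders:

uses
  Classes, SysUtils, sqldb, db;

// Sketch only: stream query results to an HTML file row by row.
procedure ExportRowsAsHTML(Conn: TSQLConnection; Tx: TSQLTransaction;
  const OutFile: string);
var
  q: TSQLQuery;
  fs: TFileStream;

  procedure WriteStr(const s: string);
  begin
    if Length(s) > 0 then
      fs.WriteBuffer(s[1], Length(s)); // write straight to disk, not to RAM
  end;

begin
  q  := TSQLQuery.Create(nil);
  fs := TFileStream.Create(OutFile, fmCreate);
  try
    q.Database       := Conn;
    q.Transaction    := Tx;
    q.UniDirectional := True; // where supported, stops fetched rows piling up in the buffer
    q.SQL.Text := 'SELECT FileName, HashValue FROM TBL_FILES'; // hypothetical query
    q.Open;
    WriteStr('<html><body><table>' + LineEnding);
    while not q.EOF do
    begin
      WriteStr('<tr><td>' + q.Fields[0].AsString + '</td><td>' +
               q.Fields[1].AsString + '</td></tr>' + LineEnding);
      q.Next; // advance to the next row; the whole table is never held at once
    end;
    WriteStr('</table></body></html>' + LineEnding);
    q.Close;
  finally
    fs.Free;
    q.Free;
  end;
end;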
Then I noticed that the same performance issue was hitting the save-to-CSV/TSV functions (SaveFILESDBToCSV, SaveCopyDBToCSV and SaveC2FDBToCSV), for the exact same reasons. I am gutted not to have realised how bad DBGrid was for performance with large data sets. I thought that was what it was designed for. But obviously not. So on the 23rd Dec 2021 I made changes to the functions as mentioned in the GitHub issue listed here. Basically, I have converted all of those to use TSQLQuery as well, and in just the same way, massive exports of 400K rows to CSV are achieved in just a second or two now, with no performance hit at all.
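For the CSV/TSV side the loop is the same and only the per-row formatting changes. Again, this is a sketch rather than the real SaveFILESDBToCSV: the query is assumed to be configured already, and field quoting/escaping is omitted for brevity:

uses
  Classes, sqldb, db;

// Sketch only: stream each record as one delimited line. Pass ',' as the
// separator for CSV, or #9 (tab) for TSV.
procedure ExportRowsAsDelimited(q: TSQLQuery; fs: TFileStream; Separator: Char);
var
  i: Integer;
  Line: string;
begin
  q.Open;
  while not q.EOF do
  begin
    Line := '';
    for i := 0 to q.FieldCount - 1 do
    begin
      if i > 0 then
        Line := Line + Separator;
      Line := Line + q.Fields[i].AsString;
    end;
    Line := Line + LineEnding;
    fs.WriteBuffer(Line[1], Length(Line)); // each row goes straight to disk
    q.Next;
  end;
  q.Close;
end;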
The branch code for v3.3.1 has continually been committed to GitHub and I am hoping to get v3.3.1 compiled and out on or before New Year's Day, or thereabouts, as I will hopefully get some free hours over Christmas to finish these adjustments and others.