Feb 15, 2011

Selecting distinct entities across a large table

I have faced this kind of problem some time ago.
I tried some of the solutions suggested below (in memory sort and filtering, encoding things into keys etc. and I have benchmarked those for both latency and cpu cycles using some test data around 100K entities)
An other approach I have taken is encoding the date as an integer (day since start of epoch or day since start of year, same for hour of day or month depending on how much detail you need in your output) and saving this into a property. This way you turn your date query filter into an equality only filter which does not even needs to specify an index) then you can sort or filter on other properties.
Benchmarking the latest solution I have found that when the filtered result set is a small fraction of the unfiltered original set, is 1+ order of magnitude faster and cpu-eficient. Worst case when no reduction of the result set due to filtering the latency and cpu usage was comparable to the previous solutions)

Hope this helps, or did I missed something ?
Happy coding-:)