In Search Of Quality
Well, we’re getting back on track. I’ve spent some time trying to return us to our previous level of quality while making it possible to add longer blacklists again.
I believe I’ve now managed to get us back to the previous state of affairs, but it doesn’t come cheap.
The solution depends on the fact that we have a known sort order for all returned results: interestingness. If we assume that interestingness, as a score, is stable and not random, then our query can be split not only by positive term but also by groups of negative terms. For each positive term, we run one query per group of negative terms and take the intersection of the results. If a photo doesn’t appear in all of the split queries for a positive term, then it must contain one of the negative terms, and can be discarded from the result set.
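To make the splitting concrete, here is a minimal sketch of the idea rather than the actual implementation. It assumes at most 9 negative terms fit in a single query (inferred from the "/ 9" in the request estimate), and `search_with_blacklist`, `chunk_terms`, `fake_search` and the `PHOTOS` data are all hypothetical names invented for illustration; the real search would call the Flickr API instead of an in-memory dictionary.

```python
from typing import Callable, Iterable, List, Set

# Assumption: at most 9 negative terms fit in one query (inferred from the "/ 9" estimate).
MAX_NEGATIVES_PER_QUERY = 9

# Hypothetical search signature: (positive_term, excluded_terms) -> photo ids.
SearchFn = Callable[[str, List[str]], Iterable[int]]


def chunk_terms(terms: List[str], size: int) -> List[List[str]]:
    """Split a long blacklist into groups small enough for one query each."""
    return [terms[i:i + size] for i in range(0, len(terms), size)]


def search_with_blacklist(positive: str, negatives: List[str], search: SearchFn) -> Set[int]:
    """Run one query per negative-term group and keep photos present in all of them."""
    # An empty blacklist still needs one query, with nothing excluded.
    groups = chunk_terms(negatives, MAX_NEGATIVES_PER_QUERY) or [[]]
    result_sets = [set(search(positive, group)) for group in groups]
    # A photo missing from any split query matched a negative term in that group.
    return set.intersection(*result_sets)


# In-memory stand-in for the real search, for illustration only.
PHOTOS = {1: {"cat"}, 2: {"cat", "bad0"}, 3: {"cat", "bad12"}, 4: {"dog"}}


def fake_search(positive: str, excluded: List[str]) -> List[int]:
    """Return ids of photos tagged `positive` that carry none of the excluded tags."""
    return [pid for pid, tags in PHOTOS.items()
            if positive in tags and not tags & set(excluded)]


blacklist = [f"bad{i}" for i in range(20)]  # 20 terms -> 3 split queries
print(search_with_blacklist("cat", blacklist, fake_search))  # prints {1}
```

Photo 2 is dropped by the first split query and photo 3 by the second, so only photo 1 survives the intersection. The sketch leans on the same assumption as the text: every split query has to see the same stable ordering, otherwise the intersection would throw away photos that are perfectly fine.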
This means, again, that we’re doing a lot more queries. Specifically: 6 locations * 2 years * (# of positive search terms) * (# of negative search terms / 9) ~= 1.5k Flickr API requests per search result as a worst case. (This used to be a lot higher - I’m having to bring down the search range in order to allow for more queries per positive term.)
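The arithmetic behind that estimate can be sketched as follows. The 6 locations, 2 years and the divisor of 9 come from the formula above; the term counts passed in at the end are made up purely to show how a figure around 1.5k could arise, since the actual counts aren't stated here. `worst_case_requests` is a hypothetical helper name.

```python
LOCATIONS = 6
YEARS = 2
MAX_NEGATIVES_PER_QUERY = 9  # the "/ 9" in the estimate


def worst_case_requests(positive_terms: int, negative_terms: int) -> int:
    """Worst-case outbound API requests for one search, ignoring any cache hits."""
    # Ceiling division: a partially filled group of negative terms still costs a query.
    # Assumption: with no negative terms, each positive term still needs one query.
    negative_groups = max(1, -(-negative_terms // MAX_NEGATIVES_PER_QUERY))
    return LOCATIONS * YEARS * positive_terms * negative_groups


# Hypothetical counts: 25 positive terms, 45 negative terms (5 groups of 9).
print(worst_case_requests(25, 45))  # 6 * 2 * 25 * 5 = 1500
```

Every extra group of 9 negative terms adds another full set of queries for each positive term, which is why growing the blacklist means shrinking the search range somewhere else.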
Now, we do a lot of caching along the way whenever possible, because, again, this is a free service on freebie quota, but that’s a lot of outbound API calls per inbound user request.
Hopefully things look better again.