In Search Of Quality
Sunday, May 22, 2011 at 5:13PM
Well, we’re getting back on track. I’ve spent some time trying to return us to our previous level of quality, and to let us move forward on adding longer blacklists again.
I believe I’ve now managed to restore the previous state of affairs, but it doesn’t come cheap.
The solution depends on the fact that all returned results have a known sort order: interestingness. If we assume that interestingness, as a score, is stable rather than random, then we can split a query not only by positive term but also by negative terms, and take the intersection of the results for each positive term. If a photo doesn’t appear in every one of the split queries for a positive term, then it must contain one of the negative terms, and it can be discarded from the result set.
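The split-and-intersect step can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the code weathrman actually runs: `search` is a hypothetical stand-in for one Flickr search sorted by interestingness that returns photo IDs, `NEGATIVES_PER_QUERY = 9` is inferred from the "/ 9" in the request arithmetic, and `chunk` and `filtered_results` are illustrative names. Any photo that survives was returned by every split query, so it matched none of the excluded terms.

```python
from typing import Callable, Iterable, List, Sequence

# Assumed cap on negative terms per query, inferred from the "/ 9" in the post.
NEGATIVES_PER_QUERY = 9


def chunk(terms: Sequence[str], size: int) -> List[List[str]]:
    """Split the negative terms into groups of at most `size`."""
    return [list(terms[i:i + size]) for i in range(0, len(terms), size)]


def filtered_results(
    positive: str,
    negatives: Sequence[str],
    search: Callable[[str, List[str]], Iterable[str]],
) -> List[str]:
    """Keep only photo IDs that appear in every split query for `positive`.

    `search(positive, excluded)` is a hypothetical stand-in for a single
    search that excludes the given terms and returns IDs, best first.
    """
    # With no negative terms at all, run a single unfiltered query.
    groups = chunk(negatives, NEGATIVES_PER_QUERY) or [[]]
    runs = [list(search(positive, group)) for group in groups]
    # A photo missing from any run was excluded by that run's group of terms.
    survivors = set(runs[0]).intersection(*runs[1:])
    # Preserve the interestingness order of the first run.
    return [photo for photo in runs[0] if photo in survivors]
```

Because a real query returns only one page of results, a clean photo can occasionally be pushed off one page and dropped as well; the intersection errs toward discarding, never toward keeping a blacklisted photo.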
This means, again, that we’re doing a lot more queries: specifically, 6 locations × 2 years × (number of positive search terms) × (number of negative search terms / 9) ≈ 1.5k Flickr API requests per search result in the worst case. (This used to be a lot higher; I’m having to bring down the search range in order to allow for more queries per positive term.)
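The worst-case arithmetic can be written out as a small function. Rounding the number of negative-term groups up (ten negative terms still need two queries, not 1.1) is an assumption layered on the formula above, as are the function and parameter names.

```python
import math


def worst_case_requests(positive_terms: int, negative_terms: int,
                        locations: int = 6, years: int = 2,
                        negatives_per_query: int = 9) -> int:
    """Worst-case API calls for one search, following the post's formula.

    Assumption: partial groups of negative terms still cost a full query,
    and a search with no negative terms still needs one query per term.
    """
    groups = max(1, math.ceil(negative_terms / negatives_per_query))
    return locations * years * positive_terms * groups
```

With hypothetical counts of 5 positive and 225 negative terms (the post doesn’t give the real ones), `worst_case_requests(5, 225)` returns 6 × 2 × 5 × 25 = 1500, in line with the ~1.5k figure.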
Now, we cache as much as we can along the way, because, again, this is a free service running on a free quota; even so, that’s a lot of outbound API calls per inbound user request.
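The post doesn’t say how the caching works. One plausible sketch, assuming results are held in memory and keyed on the parameters of a single split query (location, year, positive term, group of negative terms), is a small time-to-live cache; `TTLCache`, its `ttl`, and the key layout are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Hypothetical in-memory cache: reuse a result for `ttl` seconds."""

    def __init__(self, ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock  # injectable so tests can control time
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        """Return the cached value for `key`, calling `fetch` only on a miss."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # fresh hit: no outbound API call
        value = fetch()  # miss or expired: make the expensive API call
        self._store[key] = (now, value)
        return value
```

Keying on a whole split query lets repeated searches for the same place and year reuse earlier results, and the TTL bounds how stale they can get; both choices are assumptions rather than a description of the real service.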
Hopefully things look better again.
Gregory Block
I’m watching the request logs go by and checking the results of some searches by hand to verify that quality is back to normal. I’m also starting to pick up additional queries based on problems seen in Buffalo, NY (USA), London (UK), and Brussels (Belgium).
If you see something awful, I’m listening.
Tags: flickr, quality, search terms, weathrman