In Search Of Quality

Well, we’re getting back on track.  I’ve spent some time trying to return us to our previous levels of quality, and enable us to move forward on adding longer blacklists again.

I believe I’ve now managed to return us to the previous state of affairs - but the fix isn’t cheap.

The solution depends on the fact that we do have a known sort order for all returned results: interestingness.  If we assume that interestingness, as a score, is stable and not random, then our query can be split not only by positive term, but also by negative terms.  We then take the intersection of results for a particular positive query term: if a photo doesn’t appear in all of the split queries for that positive term, it must contain one of the negative terms, and can be discarded from the result set.
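The split-and-intersect idea above can be sketched roughly as follows.  This is an illustrative sketch, not the actual Weathrman implementation - the function names and the nine-negatives-per-query limit are assumptions drawn from the description here:

```python
NEGATIVES_PER_QUERY = 9  # assumed per-query cap on negative terms


def chunk(terms, size):
    """Split a list of terms into fixed-size chunks."""
    return [terms[i:i + size] for i in range(0, len(terms), size)]


def search_with_blacklist(flickr_search, positive, negatives):
    """Run one split query per chunk of negative terms, then keep only
    the photo ids that appear in *every* result set.  A photo missing
    from any split query must have matched one of that chunk's negative
    terms, so it is discarded."""
    result_sets = []
    for group in chunk(negatives, NEGATIVES_PER_QUERY):
        # Each split query searches the same positive term while
        # excluding one chunk of negatives; a stable interestingness
        # sort is what makes the result sets comparable.
        photos = flickr_search(positive, group)
        result_sets.append({p["id"] for p in photos})
    return set.intersection(*result_sets) if result_sets else set()
```

The stable sort order matters: if results came back in a random order, a photo could legitimately drop out of one split query's result page without having matched a negative term, and the intersection would discard good photos.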

This means, again, that we’re doing a lot more queries.  Specifically, 6 locations * 2 years * # of positive search terms * (# of negative search terms / 9) ~= 1.5k Flickr API requests per search as a worst case.  (This used to be a lot higher - I’m having to bring down the search range in order to allow for more queries per positive term.)
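As a back-of-the-envelope check of that worst case, with hypothetical term counts (the real numbers aren’t stated here, so these are assumptions chosen to land on the quoted figure):

```python
locations = 6
years = 2
positive_terms = 25        # hypothetical
negative_terms = 45        # hypothetical
negatives_per_query = 9    # negatives packed into each split query

# One query per (location, year, positive term, chunk of negatives).
queries = locations * years * positive_terms * (negative_terms // negatives_per_query)
print(queries)  # 6 * 2 * 25 * 5 = 1500 API requests
```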

Now, we do a lot of caching along the way whenever possible, because, again, this is a free service on freebie quota, but that’s a lot of outbound API calls per inbound user request.

Hopefully things look better again.

Weathrman Search Quality: A Progress Report

I’m in the process of doing some server-side results joining, to reduce the complexity of our queries.  The upside of this is that I should be able to, over time, bring us back up to our previous quality of searches.  In fact, we’ll actually be slightly better off - performing a series of decomposed searches ultimately gives us more search results to process, simply because we’re issuing more queries.

This means the load on the Flickr API - and the load on my server - are going up.

I’m still in the awful position of not being able to charge for this, having made it free once - so I still want this to fall within freebie quota if at all humanly possible.

Last, I’ve got a shiny new Xoom, and I’ll be spending some time on the client shortly getting some basic changes into the UI to make it slightly prettier when used there.  I’ve already made some server-side tweaks to relax the minimum image size so that Xoom resolutions get more Flickr results.  The biggest problem at the moment is that there just aren’t many large image sizes fetchable via the API, and the images that do come back are far below the 1920x1408 background image size that a Xoom wants.
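The size filter described above might look something like this.  The thresholds are assumptions for illustration, not the real server settings:

```python
MIN_WIDTH, MIN_HEIGHT = 1024, 768   # hypothetical relaxed minimum
XOOM_TARGET = (1920, 1408)          # what a Xoom wallpaper wants


def best_size(sizes, min_w=MIN_WIDTH, min_h=MIN_HEIGHT):
    """Given the (width, height) pairs Flickr offers for a photo,
    return the largest one meeting the minimum, or None if the photo
    is too small to use as a background."""
    usable = [s for s in sizes if s[0] >= min_w and s[1] >= min_h]
    return max(usable, key=lambda s: s[0] * s[1], default=None)
```

Relaxing the minimum trades sharpness for coverage: more photos clear the bar, but most of them still have to be upscaled to reach the tablet’s native wallpaper size.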

More soon.

Weathrman 1.5 hits the Android Market... now

This update is fairly experimental.  One small change - ensuring that we don’t try to fetch data when we haven’t got a connection established - should hopefully improve performance under edge-of-network behaviour.

The other is tougher.  I’m making some changes to the queries we perform against Flickr’s API in the hopes of improving the quality of results, especially under clear conditions.  Some of these are as follows:

  • For queries performed at night, AND() in a set of words that are likely to be associated with night photography.
  • Add a list of “banned” terms, including “naked” and “nude”.  I don’t mind looking at them on the web, but I don’t want someone’s naked butt hanging out of my phone.  Your mileage may vary.
  • Separate daytime descriptions of “clear” from nighttime ones; include in each the things you only see during that part of the day/night cycle.  (Nights, for example, might include stars, constellations, or the moon.)
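The tweaks above might be assembled into a query along these lines.  This is a hedged sketch: the word lists are abbreviated stand-ins for the real ones, and the builder assumes a text search where space-separated terms are ANDed and a `-` prefix excludes a term:

```python
BANNED = ["naked", "nude"]                       # abbreviated banned list
NIGHT_WORDS = ["stars", "moon", "constellation"] # abbreviated night vocabulary


def build_query(condition, is_night):
    """Assemble a search query for a weather condition, ANDing in
    night-photography vocabulary after dark and excluding banned
    terms throughout."""
    terms = [condition]
    if is_night:
        # "clear" at night should prefer starfields over blue skies
        terms += NIGHT_WORDS
    terms += ["-" + t for t in BANNED]
    return " ".join(terms)
```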

These and other tweaks will hopefully improve the quality of search results returned, and they should feel more relevant; the downside is that it might become much harder to find a well-tagged local photo.

If you have good ideas, as always, I’m listening.