In Search Of Quality

Well, we’re getting back on track.  I’ve spent some time trying to return us to our previous levels of quality, and enable us to move forward on adding longer blacklists again.

I believe I’ve now managed to return us to the previous state of affairs - but it doesn’t come cheap.

The solution depends on the fact that we have a known sort order for all returned results: interestingness.  If we assume that interestingness, as a score, is stable rather than random, then a query can be split not only by positive term but also by negative terms, and the results for a particular positive term intersected.  If a photo doesn’t appear in every split query for a positive term, it must contain one of the negative terms, and can be discarded from the result set.
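
Concretely, the idea looks something like this minimal Python sketch - not the shipping code; flickr_search() here is a made-up wrapper standing in for the real API call, returning photo IDs sorted by interestingness.

    def chunks(seq, size):
        """Yield successive chunks of at most `size` items."""
        for i in range(0, len(seq), size):
            yield seq[i:i + size]

    def photos_without_negatives(positive, negatives, flickr_search, per_query=9):
        """Intersect split queries; only photos free of every negative term survive."""
        if not negatives:
            return set(flickr_search(positive, exclude=[]))
        surviving = None
        for chunk in chunks(negatives, per_query):
            # One Flickr call: the positive term plus up to 9 negated terms.
            ids = set(flickr_search(positive, exclude=chunk))
            # A photo tagged with any negative term is missing from the split
            # query that negates it, so it can't survive the intersection.
            surviving = ids if surviving is None else surviving & ids
        return surviving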

This means, again, we’re doing a lot more queries. Specifically: 6 locations * 2 years * (# of positive search terms) * (# of negative search terms / 9) ~= 1.5k Flickr API requests per search in the worst case. (This used to be a lot higher - I’m having to bring down the search range in order to allow for more queries per positive term.)
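
To make the arithmetic concrete - the term counts below are invented purely for illustration:

    import math

    locations, years, per_query = 6, 2, 9
    positive_terms = 10     # assumed for illustration
    negative_terms = 100    # assumed for illustration

    worst_case = locations * years * positive_terms * math.ceil(negative_terms / per_query)
    print(worst_case)       # 1440 - i.e. ~1.5k Flickr calls for a single search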

Now, we cache along the way wherever possible - this is, again, a free service running on freebie quota - but that’s still a lot of outbound API calls per inbound user request.

Hopefully things look better again.

Weathrman Search Quality: A Progress Report

I’m in the process of doing some server-side results joining, to reduce the complexity of our queries.  The upside is that I should be able, over time, to bring us back up to our previous search quality.  In fact, we’ll end up slightly better off - a series of decomposed searches ultimately gives us more search results to process, since each decomposed query returns its own result set.

This means the load on the Flickr API - and the load on my server - is going up.

I’m still in the awful position of not being able to charge for this, having made it free once - so I still want this to fall within freebie quota if at all humanly possible.

Last, I’ve got a shiny new Xoom, and I’ll shortly be spending some time on the client, getting some basic changes into the UI to make it slightly prettier there.  I’ve already made some server-side tweaks to relax the minimum image size requirement so that Xoom resolutions get more Flickr results.  The biggest problem at the moment is that there just aren’t many large image sizes fetchable via the API, and the images that do come back are far below the 1920x1408 background size that a Xoom wants.

More soon.

Sick days, Weathrman, and you

So I’m home sick today, and going through the laundry list of things I haven’t gotten around to recently.

Weathrman is on that list.

I’ve pushed up a new version of the server that performs searches on behalf of app users.  It’s designed to do two things:  reduce the cost of running the service, and improve search quality.

In order to reduce the cost, I’m going to do the obvious thing:  perform fewer searches.  Specifically, that means:

  • Ask for a 4-hour window around ‘now’ instead of a 3-hour one.
  • Increase the search window from 3 months to 4 (2 months in either direction).
  • Fetch 4 pages of results instead of 5.
  • Reduce the minimum number of results needed at any search level from 5 to 3.
  • Remove street-level searching (level 16) - it too rarely has results, and just adds latency.

This means our worst case goes from 7 tiers of 5 parallel searches to 6 tiers of 4.  It also means the average case - which previously burned a tier of 5 searches before we were likely to find any results - gets faster, and I won’t be paying for that wasted time.

The downside is that I’m only fetching four pages of results; searches cover the entire four-month window, and it’s possible that none of the first four pages will have photos taken within the required time-of-day range; if that happens, we skip up a level.  Dropping one of the pages of search results makes that more likely, and it’s probably not completely mitigated by the widening of the 3-hour window to 4.  The net effect is that searches may feel… a little less local.
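
Putting the new numbers together, the search loop is roughly the sketch below.  The level numbers and helper functions are placeholders for the real Flickr plumbing, not the shipping code.

    PAGES = 4             # was 5
    MIN_RESULTS = 3       # was 5
    WINDOW_MINUTES = 120  # +/- 2 hours around 'now' = the 4-hour window
    LEVELS = [15, 14, 13, 12, 11, 10]  # placeholder tiers, most local first

    def minutes_of_day(dt):
        return dt.hour * 60 + dt.minute

    def near_time_of_day(photo_dt, now):
        """True if the photo's time of day falls in the window, wrapping midnight."""
        diff = abs(minutes_of_day(photo_dt) - minutes_of_day(now))
        return min(diff, 1440 - diff) <= WINDOW_MINUTES

    def tiered_search(lat, lon, now, search_at_level, taken_time):
        for level in LEVELS:
            hits = []
            for page in range(1, PAGES + 1):
                for photo in search_at_level(lat, lon, level, page):
                    if near_time_of_day(taken_time(photo), now):
                        hits.append(photo)
            if len(hits) >= MIN_RESULTS:
                return hits   # enough results at this tier; stay local
        return []             # fell all the way through; nothing found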

Hopefully, these changes mean I’ll be able to keep supporting this service for longer, and more cheaply.  Going back to the days when the client was responsible for performing all these searches just isn’t going to happen - it’s far too convenient to run the searches from the server, and much more reliable.  It does mean that I’m bearing the costs of a free app - but as long as I can keep those costs down, I don’t mind.

Weathrman 3: The Weather Cloud

One of the big problems with Weathrman’s current implementation is that the whole thing lives on your phone; a worst-case search can trawl through literally tens of thousands of search results, looking for an image relevant to your weather conditions and time of day.

Many of those searches are common to others; city-level searches are the same all over London, for example; local searches are the same for everyone sitting near me.  Much of this can be cached aggressively, massively reducing the amount of time it takes to get good results, and allowing me to do more searches, more often, at less cost and lower latency to end users.
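
To give a sense of how shareable those lookups are, here’s one plausible way a cache entry might be keyed - the shape of this key is my assumption, not a description of the live service.

    import hashlib

    def cache_key(lat, lon, level, conditions, hour):
        """Everyone in the same grid cell, at the same search level, under the
        same weather and hour of day, shares one cached result set."""
        # In practice the rounding would get coarser as the level gets broader.
        cell = f"{round(lat, 1)}:{round(lon, 1)}"
        raw = f"{cell}:{level}:{conditions}:{hour}"
        return hashlib.sha1(raw.encode("utf-8")).hexdigest()

    # Everyone near Leicester Square under cloud at 3pm hits the same entry:
    # cache_key(51.51, -0.13, level=11, conditions="cloudy", hour=15)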

It’s not ready yet - probably another day away - and I’ll test it out for a week or so.  Come Google I/O, though, I’ll be ready to ship.

It’s notably faster, and pushing the image scaling to the server has resulted in massive improvements in image quality, while turning hundreds of RPC calls to Flickr into a single call to the Weathrman service.

Weathrman 2.5: In the mix

So I’ve made a few more small tweaks since the 2.x series started, the biggest of which is who provides our weather data.

Yahoo’s feed is damn good; it has flaws, but it also has huge benefits for us - the most important being that Yahoo’s weather API provides current conditions and the sunrise/sunset times for your location.  This means that as of 2.4, we started preferring photographs of sunrises and sunsets whenever the weather was clear or cloudy.
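
In sketch form, that preference might look like the following; the 45-minute margin and the extra terms are invented for illustration.

    from datetime import timedelta

    MARGIN = timedelta(minutes=45)  # assumed, not the shipping value

    def sun_terms(now, sunrise, sunset, conditions):
        """Extra search terms when we're near sunrise or sunset in clear/cloudy weather."""
        if conditions not in ("clear", "cloudy"):
            return []
        if abs(now - sunrise) <= MARGIN:
            return ["sunrise", "dawn"]
        if abs(now - sunset) <= MARGIN:
            return ["sunset", "dusk"]
        return []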

As of 2.5, we’re again increasing the number of queries we perform per search, which will increase the amount of time we spend updating; the upside is that we have more images to choose from, and as of now, we’re going to stop preferring what Flickr thinks is interesting and start selecting randomly from the set of results at the nearest location to you.
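
The selection change itself is tiny - roughly the difference below, assuming results_by_level holds each tier’s photos with the most local tier first.

    import random

    def pick_photo(results_by_level):
        """Pick from the nearest tier that has any results.

        Previously we took results[0] - Flickr's most interesting photo -
        which is why the same images kept coming back day after day.
        """
        for results in results_by_level:
            if results:
                return random.choice(results)
        return None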

For a while, I was seeing the same photos, day after day - now, I don’t think I’ve seen the same thing twice.  At the moment, I have a particularly beautiful view of Leicester Square, taken by maistora.

I look forward to hearing your opinions on the new version.

Weathrman 2, and the Pile of Shame

Shadow Complex is done; and while I’m happy that I’ve managed to take another game off the pile of shame - which is, for once, shrinking faster than it’s growing - I feel a bit funny about playing an Orson Scott Card title, given his political slant.  I wish Chair wouldn’t do that; it’s not like the ending was so good that they needed OSC’s involvement - it was one of the worst endings I’ve ever seen in a videogame.

You’re forgiven for thinking that there’s been no progress on Weathrman as of late; in fact, there’s quite a bit going on behind the scenes, including:

  • Translated search terms, to pick up tourist photos.  EFIGS first.
  • A status bar notification while the desktop is visible, to provide better UI discoverability
  • UI showing details of the current photo and linking to it on Flickr
  • Some kind of resolution to the updater problem

As for the updater problem, a proper resolution might not happen until 3.  At worst, I’ll build an intent that explains the nature of the application and gives the user a one-button way of updating the wallpaper.

But first, I need a haircut.  And to start Darksiders.

Weathrman 1.5 hits the Android Market... now

This update is fairly experimental.  One small change - ensuring that we don’t try to fetch data when we haven’t got a connection established - should hopefully improve behaviour at the edge of network coverage.

The other is tougher.  I’m making some changes to the queries we perform against Flickr’s API, in the hopes of improving the quality of results, especially under clear conditions.  Some of these are as follows (a rough sketch of how they compose comes after the list):

  • For queries performed at night, AND() in a set of words that are likely to be associated with night photography.
  • Add a list of “banned” terms, including “naked” and “nude”.  I don’t mind looking at them on the web, but I don’t want someone’s naked butt hanging out of my phone.  Your mileage may vary.
  • Separate daytime descriptions of “clear” from nighttime ones, and include in each the things you only see during that part of the day/night cycle.  (Nights, for example, might include stars, constellations, or the moon.)
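
Here’s that rough sketch; the term lists are illustrative stand-ins, not the real ones.

    NIGHT_TERMS = ["stars", "constellation", "moon"]  # illustrative only
    BANNED = ["naked", "nude"]                        # plus others

    def build_query(condition_terms, is_night):
        """condition_terms should already be the day- or night-specific
        description of the conditions (see the third bullet above)."""
        terms = list(condition_terms)
        if is_night:
            terms += NIGHT_TERMS  # AND() in the night-photography words
        # Flickr's text search excludes a term given a leading '-'.
        return " ".join(terms + ["-" + t for t in BANNED])

    # build_query(["clear", "sky"], is_night=True)
    # -> 'clear sky stars constellation moon -naked -nude'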

These and other tweaks should improve the quality of the search results returned, and make them feel more relevant; the downside is that it may become much harder to find a well-tagged local photo.

If you have good ideas, as always, I’m listening.