Weathrman Search Quality: A Progress Report

I’m in the process of doing some server-side results joining, to reduce the complexity of our queries.  The upside of this is that I should be able to, over time, bring us back up to our previous quality of searches.  In fact, we’ll actually be slightly better off - performing a series of decomposed searches means more queries, and therefore more search results to process.
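As a rough sketch of what decomposing and joining could look like (illustrative Python, not the code running on the server): split the long list of OR terms into groups that fit under the 10-term limit, run one sub-query per group, then merge the results and apply the block list locally.  The `run_query` callable, the photo dictionary shape (`id`, `tags`) and the local block filter are all assumptions made for the example.

```python
from typing import Callable, Iterable

MAX_OR_TERMS = 10  # the per-query OR-term limit described in these posts


def chunk_terms(terms: list[str], size: int = MAX_OR_TERMS) -> list[list[str]]:
    """Split a long list of OR terms into groups no larger than `size`."""
    return [terms[i:i + size] for i in range(0, len(terms), size)]


def joined_search(
    allow_terms: list[str],
    block_terms: Iterable[str],
    run_query: Callable[[list[str]], list[dict]],
) -> list[dict]:
    """Run one sub-query per chunk, then merge and filter on the server.

    `run_query` is a hypothetical stand-in for the real remote search call.
    """
    blocked = {t.lower() for t in block_terms}
    seen: set[str] = set()
    merged: list[dict] = []
    for chunk in chunk_terms(allow_terms):
        for photo in run_query(chunk):
            if photo["id"] in seen:
                continue  # the same photo can match several sub-queries
            seen.add(photo["id"])
            tags = {t.lower() for t in photo.get("tags", [])}
            if tags & blocked:
                continue  # block list applied locally, not by the remote API
            merged.append(photo)
    return merged
```

The trade-off is the one described above: every extra chunk is another remote request, so quality comes back at the price of load.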

This means the load on the Flickr API - and the load on my server - is going up.

I’m still in the awful position of not being able to charge for this, having made it free once - so I still want this to fall within freebie quota if at all humanly possible.

Last, I’ve got a shiny new Xoom, and I’ll be spending some time on the client shortly getting some basic changes into the UI to make it slightly prettier when used there.  I’ve already made some server-side tweaks to broaden the minimum image size so that Xoom resolutions get more Flickr results.  The biggest problem at the moment is that there just aren’t many large image sizes fetchable via the API, and the images that do come back are far below the 1920x1408 background image size that a Xoom wants.

More soon.

24 Hours of Weathrman Search Failures

So, around 24 hours ago, searches began failing for the thousands of people running Weathrman.  I was notified by a user a few hours ago that he began seeing strange things happening.

Flickr has made a drastic change to its API: No more than 10 OR terms are allowed in searches.

This is pretty lethal to us; we were doing, in some cases, 15 or more OR queries to select photographs for inclusion, and another 30 or so negative queries to strip away porn, keg parties, musicians, concerts, indoor photographs, you name it; we relied heavily on the ability to provide complex queries to their engine for processing, and managed a comprehensive set of allow and block filters that we used to get the set of matching results from their service.

This leaves us in an odd quandary, and so I have to begin by apologising:  I’ve had to remove most of our keywords from our queries.  You’re going to see a lot more irrelevant crap, now, and there’s not much I can do about it…

Other than doing one of two things:

1) Stop asking Flickr for photos, and move to someone who has a more comprehensive (and less comprehensively restrictive) API for performing queries.

2) Build a parallel database of Flickr photos by exporting and indexing everything myself.

Option 1 might be possible, but is a long way off; and there aren’t databases of geocoded searchable photos that are easy to get to dotted all over the web.  There just aren’t many places to go.

Option 2 isn’t feasible without charging a lot more money than FREE.

We could be reaching the end of Weathrman’s useful life.  If I can’t resolve search quality issues, I may be forced to take it down; we’ll have to wait and see.

Another weekend, another update...

This weekend, I’m tidying up something I’ve been working on for a few weeks now, and preparing it for release.

I’m a recent LOVEFiLM user - fell in love with the service fairly quickly, when Majin and the Forsaken Kingdom showed up at my door.  There are films that I haven’t seen but wanted to, but won’t go through the cost of buying on blu-ray.  There are, more importantly, games I’d like to play, but won’t buy because, quite frankly, they’re just not good enough, or probably not the kind of game I’m likely to finish.

But what I’ve found is that my interest is…  fleeting.  I know that I wanted to have seen a film when someone mentions something that triggers that memory - but unless I can do something about it right now, it’s never going to happen.

What you really want is a mobile client, oriented towards search and discovery, for helping you quickly get to (and add to your queue) films you’d like to see.

Working title was liveFiLM, but that’s a violation of their naming policy.  For now, it’s Phantoscope.

Sick days, Weathrman, and you

So I’m home sick today, and going through the laundry list of things I haven’t gotten around to recently.

Weathrman is on that list.

I’ve pushed up a new version of the server that performs searches on behalf of app users.  This is designed to do two things:  reduce the cost of running the service, and improve the search quality.

In order to reduce the cost, I’m going to do the obvious thing:  I’m going to produce fewer searches.  Specifically, that means:

  • Widen the time-of-day window around ‘now’ from 3 hours to 4.
  • Increase the date window from 3 months to 4 months (2 in either direction).
  • Fetch 4 pages of results instead of 5.
  • Reduce the minimum number of results needed at any search level from 5 to 3.
  • Remove street-level searching (level 16) - it too rarely has results, and just creates latency.

This means our worst case goes from being 7 tiers of 5 parallel searches to being 6 tiers of 4 parallel searches.  It also means the average case, which cost 1 tier of 5 searches before we were likely to find any results, will now get faster, and I won’t be paying for that wasted time.
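To put numbers on that, here is the arithmetic as a few lines of Python.  It assumes each "parallel search" is one request, which is my reading of the figures above rather than a measurement.

```python
def worst_case_requests(tiers: int, searches_per_tier: int) -> int:
    """Total requests if every tier has to be tried (assumes one request per search)."""
    return tiers * searches_per_tier


before = worst_case_requests(7, 5)   # old settings: 7 tiers of 5 searches
after = worst_case_requests(6, 4)    # new settings: 6 tiers of 4 searches
saving = 1 - after / before

print(before, after, f"{saving:.0%}")  # prints: 35 24 31%
```

The average case, one tier only, drops from 5 requests to 4, which is a 20% saving.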

The downside is that I’m only fetching four pages of results; searches cover the entire four-month window, and it’s possible that none of the first four pages will have photos taken within the required time-of-day range; if that happens, we skip up a level.  Dropping one of the pages of search results makes that more likely, and it’s probably not completely mitigated by the widening of the 3-hour window to 4.  The net effect is that searches may feel… a little less local.
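The fall-back behaviour can be sketched roughly like this.  It is a simplified illustration under assumptions, not the server code: `fetch_page` is a hypothetical callable, photos are dictionaries with a `taken` datetime, and the 4-hour window is taken to be centred on the current time of day (2 hours either side), wrapping around midnight.

```python
from datetime import datetime
from typing import Callable, Optional

WINDOW_HOURS = 4   # total width of the time-of-day window, centred on 'now'
MIN_RESULTS = 3    # fewer matches than this and we skip up a level
PAGES = 4          # pages fetched per level


def within_time_of_day(taken: datetime, now: datetime,
                       window_hours: float = WINDOW_HOURS) -> bool:
    """True if `taken` is close to `now` in time of day, whatever the date."""
    taken_minutes = taken.hour * 60 + taken.minute
    now_minutes = now.hour * 60 + now.minute
    diff = abs(taken_minutes - now_minutes)
    diff = min(diff, 24 * 60 - diff)  # wrap around midnight
    return diff <= window_hours * 60 / 2


def tiered_search(
    levels: list[int],
    now: datetime,
    fetch_page: Callable[[int, int], list[dict]],
) -> tuple[Optional[int], list[dict]]:
    """Try the most local level first; widen until enough photos match."""
    for level in levels:
        matches = []
        for page in range(1, PAGES + 1):
            for photo in fetch_page(level, page):
                if within_time_of_day(photo["taken"], now):
                    matches.append(photo)
        if len(matches) >= MIN_RESULTS:
            return level, matches
    return None, []  # nothing suitable at any level
```

With fewer pages per level, the filter has fewer candidates to work with, which is exactly why a level is now more likely to come up short and hand over to a wider one.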

Hopefully, these changes mean I’ll be able to continue supporting this service for longer, and more cheaply.  Going back to the days when the client was responsible for performing all these searches just isn’t going to happen - it’s just too convenient to be able to run the searches from the server, and much more reliable.  It does mean that I’m bearing the cost of a free app - but as long as I can keep the costs down, I don’t mind.

Weathrman on holiday...

Let me start by saying I’m really happy with the app.  I still see amazing images, and I still hear lots of really positive things from users.

The problem is, there are a lot of them these days.  I took the App Store approach of making the app free for a while, to get more users, and more reviews, and figure out how many users I can handle without needing to pay too much for the AppEngine instance that backs the app.

Unfortunately for me, I made it free.  When you do that on the Android Market, you can never charge for your app again.

There’s no point raging at my cow orkers; I’m sure they’ve got reasons for why it’s impossible for me to make the app paid again.  I’m sure it has something to do with some awful decision someone made at some point along the way that means I can never go back to a charging model.

It also means I have to be really careful about what I do - because the intention of being paid in the Android Market isn’t to make money - it’s to throttle new user signup and to offset the cost of running the service.

Effectively, this means I won’t be updating the app until they fix this.  I’ve halted all of the ad campaigns I was running, which were mostly for experimentation anyways, because I really *don’t* want new users now.  Usage has just fallen back into the ‘free’ zone, and so I’m out of pocket… oh, nearly 500 quid so far.  Which I don’t mind, don’t get me wrong - but I don’t want to be in a position to deny anyone service for the app I’ve built.

Now, I could do lots of retooling to make the whole experience faster.  I could remember where you are, and automatically perform searches against all locations that have queried within the last 72 hours so that your search is only over local data.  I could keep a thumbs-up/thumbs-down count, and apply some machine learning so you don’t get things that you don’t like.  I could do all kinds of stuff.
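For the first of those ideas, the bookkeeping might look something like the sketch below.  None of this exists in the app; the class name, the rounding of coordinates to two decimal places, and the in-memory dictionary are all hypothetical choices made for illustration.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(hours=72)  # how long a location stays 'active'


class RecentLocations:
    """Remember which rough locations have queried recently (hypothetical)."""

    def __init__(self) -> None:
        self._last_seen: dict[tuple[float, float], datetime] = {}

    def record(self, lat: float, lon: float, when: datetime) -> None:
        # Round so that nearby users share one entry (an assumed granularity).
        key = (round(lat, 2), round(lon, 2))
        self._last_seen[key] = when

    def active(self, now: datetime) -> list[tuple[float, float]]:
        """Drop anything older than the retention period; return the rest."""
        cutoff = now - RETENTION
        self._last_seen = {
            key: seen for key, seen in self._last_seen.items() if seen >= cutoff
        }
        return sorted(self._last_seen)
```

A background job could then pre-run searches for each entry returned by `active()`, which is where the extra cost would come from.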

But it’s not safe for me, or my pocketbook, to do so.  This is, after all, a 5% hobby, not a job.  I’ve spent… oh, I’d guess about ten days of development time, spread across a few months, and could easily drop another 10 into just the server.  Once the server supported a more instantaneous search response, I could cache multiple images on the client and implement transitions.  Or just change it each time the screen powered on.  And then there’s that all-important set of client-side preferences that I’d want every time I’m on holiday and only want it to fetch images over wifi - because while I’m willing to pay roaming charges for Google Maps, I’m not really interested in paying mobile carriers so that I can change my backdrop.

All this stuff I’d be doing, if only Android Market wasn’t making my life suck.  I apologise if that suckage spills over into your user experience; for those who paid their <50p to £1, depending on where in the throttle you ended up>, I hope you’ve gotten your money’s worth; it could be a while before you see an update.  For those getting a free ride, enjoy.

I’m not willing to follow the Market suggestions of changing the app name and uploading a different APK at a charging rate, because then I’m screwing the people who *did* pay out of an upgrade they deserve.  I think what we’re doing at Google is the wrong thing for the user.  Note I said ‘we’: I work here.  I just don’t work on a team in a position to fix this, and have too many 20% projects already to take on one more.

Someday, this will get fixed.  When it does, I’ve got ideas; I look forward to Weathrman backdrops on my Google TV, for example.  When there’s an SDK.  And a home screen that supports live wallpaper.  And a reimplementation of the client in RenderScript.  And client-controlled size restrictions to let users choose their resolution ‘window’.  And support for the new large image size at Flickr.   And…  And….  And…

My twelve step program is over.  I’ve gone through anger, and remorse, and all of the others, and am left with acceptance.

Weathrman 3.5.1: The Best Intentions

I’ve taken some time to do some internal cleanup; I’ve learned a great deal about Android’s programming model from building this and other projects, and it’s time to go back and re-apply some of that knowledge.

As such, I’ve restructured the bulk of execution into IntentService implementations, which has simplified a lot of the code; notification handling and update processing are now handled by dedicated services, rather than burying the whole processing chain in an AsyncTask.

We’ve lost a few things I’d like to have kept along the way, on the inside, but the code is cleaner and faster for it.

I’ve also done something I thought I’d never see myself do:  I’ve de-guiced the app.

Done well, I’m sure Guice is a good fit; but it’s a very small app, and moving to IntentService instances has basically meant that I’ve now got three independent application-parts that communicate by sending intents to one another.  What’s left isn’t worth Guicing.

What’s funny, and somewhat unfortunate, is that by coming to this with my server hat on, I missed the fact that Android applications don’t need to be built as the monolithic binaries that Guice is designed to help break apart.  Some of the tools at your disposal on Android are designed to ensure that not only do you not need to allocate those objects, you can avoid most of the processing time you would have spent even thinking about those elements of the object graph.  Binding processing just isn’t free; on mobile devices you might not see the cost, but you can feel it, almost imperceptibly.

So 3.5.1 is out; it’s got fixes for the cache growth problems, reaffirms my commitment to 2.1-based devices, and provides a much-needed speed boost in both startup performance and, due to some more tweaking on my part, in the speed of image retrieval in many cases.

I’ve also tweaked the negative keywords associated with daytime searches on the server side; hopefully you’ll see improved search quality.

And for a short time, it’s free - I’m looking to get an idea of how active users correspond with server load, and giving away a free app is a cheap way of load testing.  :)