Archive for December, 2005

ViEmu, adwords and clickfraud

Friday, December 9th, 2005

While I was doing some http logs analysis on the number of downloads of ViEmu, my commercial vi/vim emulator for Visual Studio, some interesting information turned up. Given that I think it may be useful to other entrepreneurs using adwords to promote their business, and that I have also received several requests for my experience with adwords, I’ll be sharing that information in this post. Hopefully it will save a few bucks for other fellow developers.

Some warnings are due before I delve into the details. First, I don’t really have any evidence of clickfraud – there simply are some things in my logs which look, hm, weird, and they may be a signal of something else. But it could all be due to my more-than-limited understanding of adwords. Given that I’m not spending much, I haven’t spent too much time investigating it. It would be wasting that time which is better spent in other areas.

As well, I haven’t taken the time to read all the information available on the net on these issues, so please feel free to point out the possible flaws in my reasoning.

As to the applicability of my case to other people, I guess my case is not the most common one, as I think I’m the only advertiser working on many of my keywords. Given this, there is hardly any bidding at all, and click prices are very cheap (5 euro cents/click). It must work very differently if you are advertising on keywords with a tough competition (I guess I will be able to comment on that once I release the NGEDIT text editor).

Anyway. I set up my google adwords campaing at the end of July, as I released ViEmu 1.0. It took a few hours or days for advertisements to appear for relevant searches, but it’s been working almost unattended since. I changed a minor detail in the ad text, and I added other keyword combinations as google searches reached my site and taught me what terms people actually use to search for in case they’re interested in vi/vim integration with Visual Studio.

Given I had hardly researched at all, I learnt stuff as things happened. When I set up the campaign, I saw I had to pay about 4 euro cents per click. But afterwards, I had to raise the bid to 5 cents/click, as google warned me and turned off advertising for some key phrases because of a price that was too low. This is pretty simple to see at your adwords.google.com account.

I also started getting hits from those clicks. I found them as hits referred from “pagead2.googlesyndication.com/…” where the “…” is a really long and complex reference. It actually took me a while to realize thouse clicks were from google adwords.

There have also been other weird hits, which had a referer address of “searchportal.information.com” followed by some kind of encoded ID (such as “UVsPWVALXVUMVV8LWQgQRggaCFIXE1Y_CFEIDA0BAQ”). These addresses took me to a search page which has the nasty habit of becoming a “frame parasite” to your web surfing, and used to encode URLs to those ID strings and route everything through their site. I had severe doubts that someone educated enough to use vi/vim would surf with such a bugger.

Anyway, back to my http log review, I started doing an analysis on my November data. I usually keep track of how many downloads of my product there are a month, and try to study the correlation with monthly sales (given the 30 day trial period, tracking is a bit difficult, but I think general trends are still there). I decided to classify all hits to www.ngedit.com in November to be able to tell how many of those came from IP addresses that ended up downloading the trial version of product.

I used vim on the log files to do this process. vi/vim is pretty good for this kind of text processing and I had the desired list in a short while, although it did involve some of that vi black magic.

Anyway, it turned out that, out of the ~20,000 hits of the month, over 6,000 belonged to IP addresses that downloaded ViEmu. As an aside, it was higher than I expected. But now that I could focus better on less information, I could start seeing some new information. I removed all lines not containing “pagead2” in this reduced hit log (ad-vi-tisement: “:v/pagead2/d”), and got myself down to just 11 lines – and to my amazement, there were only 3 IP addresses! One IP appeared only once, but the other two appeared 5 times each. In the full log, there were 24 hits from pagead2, and the repetition of IPs was kind of “hidden” (I hadn’t done a :!sort on them to see the unique addresses).

I nslookup’ed both addresses, which actually only differed in the last byte of the IP address, and only ‘localhost’ was returned from the reverse DNS lookup. I went back to the full hit log, removed everything but IPs belonging to the same subnetwork (n.n.n.*), and I also found out that some of the “searchportal.information.com” links belonged to them. Things started to make some sense.

Let me show you one of the googlesydication referers at this point (broken up in lines for nice display):

http://pagead2.googlesyndication.com/pagead/ads?
client=ca-pub-0919305250342516
&dt=1131084326621
&lmt=1131084326
&format=336x280_as
&output=html
&url=http%3A%2F%2F72.14.203.104%2Fsearch%3Fq%3Dcache
  %3Aq1fqcI2sut4J
  %3Awww.lyrics007.com
  %2FBeverly%252520Craven%252520Lyrics
  %2FPromise%252520Me%252520Lyrics.html
  %2Bpromise%2Bme%26hl%3Dvi
&color_bg=FFFFFF
&color_text=000000
&color_link=0000FF
&color_url=008000
&color_border=FFFFFF
&ref=http%3A%2F%2Fwww.google.com.vn%2Fsearch
  %3Fhl%3Dvi
  %26q%3Dpromise%2Bme
  %26btnG%3DT%25C3%25ACm
  %2Bki%25E1%25BA%25BFm
  %2Bv%25E1%25BB%259Bi
  %2BGoogle
  %26meta%3D
&cc=161
&u_h=600
&u_w=800
&u_ah=570
&u_aw=800
&u_cd=32
&u_tz=420
&u_his=12
&u_java=true

That’s a URL!

I started trying to decipher these URLs. Watching other pages that implement adsense, and how they appear on google’s cache, I deduced the referer for this click (for which I was being charged), came from a google cache page (“url=http://72.14.203.104/search…”). It was a cached page from www.lyrics007.com, which is a repository of song lyrics. The google cache had been accessed from a search at google Vietnam (a trip to www.google.com.vn showed that). The search seemed to include part of the lyrics and “vi” with some weird unicode characters in between (I’m as of yet unsure of whether those %25BB are geometric signs or diacritic marks).

Who gets payed for that click? I think the owner of www.lyrics007.com does. A whois look up showed that the hosting provider is located in Houston, Texas, and that it is registered by someone in Hong Kong.

If I visit any of the lyrics pages, sure, the Google ads are relevant to the content of the page. But it seemed that the ad for ViEmu appeared when looking at a google-cached copy of the page. It’s weird but it may happen. I think the “vi” with weird characters in between may have tricked the adsense engine into showing my ad.

The visits coming from ‘lyrics007’ showed different types of activity. Sometimes just a hit to the ‘html’ file, other times regular page viewing involving hits for the graphics on the page, and even downloading the product! I even found other hits from the same IP addresses coming from ‘searchportal.information.com’.

So what may have happened? I have two possible explanations.

One is that a developer in Vietnam was looking for the lyrics to some song, using www.google.com.vn. Developers also listen to music and check lyrics once in a while. He clicked on the google cache, in order to access the page, and Google picked my ad (as, obviously, adsense technology is imperfect and the relevance of the ad is just an heuristic). While humming to the tune of the song, the guy in question saw the ad to my product, and was excited to finally see vi emulation in Visual Studio. He clicked on the ad and came to my site. He even downloaded it.

This would mean that I payed google and lyrics007.com for reaching a potential customer of mine – someone who I wouldn’t have reached easily in another way. Fair enough.

The weird thing is that this same guy has some other friends using the computer (or sharing the IP address) who went through exactly the same process several other times during the month. With different song lyrics, of course. And some of the times, their browser crashed before even hitting the ‘css’ file or the page graphics (or maybe they surf with images deactivated?).

They even downloaded ViEmu several times during the month – they must have a messy download directory.

This also happenend from other domains, not only lyrics007. I haven’t researched them much, but they seem to come from nearby areas. If all of the cases have similar explanations, then the domain holders / adsense publishers are not to blame at all.

And then I have a second possible explanation.

Some guy in Hong Kong has set up several domains with song lyrics and other easily accessible content downloaded from other sites. As those guys are damn smart, they have figured a way to force a google cache access to their page into showing any adsense ad. I’ve been trying to do it myself, and haven’t been able to, but the cache does show weird adsense results. Then, they have some kind of bot which accesses those pages and simulates clicks on the ads. They probably click on many “cheap” advertisers & keywords like mine, but every once in a while they might click on a 50 cent or even a $1 ad. I guess they can make quite some cash that way, apart from the legitimate traffic that their site drives. They even use another method based on ‘searchportal.information.com’ URL hijacking, which hides even more information from advertisers. And they have even improved the bot to fake normal access to web sites.

I can’t know which one is the right explanation. But, I talked to Andy Brice of PerfectTablePlan, and followed his suggestion of turning off advertising on the “content” network (adsense). I’m only advertising on google’s own search results, for which only google gets paid, and which removes the clickfraud incentive for 3rd party publishers.

I’ve also limited ads to specific countries. Mainly, I’ve limited the countries to those on which I already have customers:

  • USA
  • Canada
  • Russia
  • UK
  • Australia
  • Netherlands
  • Germany
  • Finland
  • Norway

I’ve also added other countries which I think are as likely as those to get me customers: Sweden, France, New Zealand, etc… but that’s about it.

Given that I have been spending less than €10 a month, the scam hasn’t been problematic for me. That’s the main reason it took me several months to investigate and optimize the issue – doesn’t make sense to optimize expenses when they are among the lowest ones.

I expect my adwords costs to go down to one tenth of what they have been – which I think amounts to pretty much the legitimate/interesting traffic that I was getting anyway.

I’m pretty happy for the result of google ads – one customer did tell me that they had found about ViEmu through an ad in google search. That single sale makes up for the rest, which I like to understand as the cost of my training with google adwords.

I hope this post doesn’t upset google – I believe it helps other people make a better use of adwords, and thus also helps google have more happy customers!