ViEmu, adwords and clickfraud

While I was doing some HTTP log analysis on the number of downloads of ViEmu, my commercial vi/vim emulator for Visual Studio, some interesting information turned up. Given that I think it may be useful to other entrepreneurs using adwords to promote their business, and that I have also received several requests for my experience with adwords, I’ll be sharing that information in this post. Hopefully it will save a few bucks for fellow developers.

Some warnings are due before I delve into the details. First, I don’t really have any evidence of clickfraud – there simply are some things in my logs which look, hm, weird, and they may be a sign that something is off. But it could all be due to my more-than-limited understanding of adwords. Given that I’m not spending much, I haven’t spent too much time investigating it; that time is better spent in other areas.

Also, I haven’t taken the time to read all the information available on the net on these issues, so please feel free to point out the possible flaws in my reasoning.

As to the applicability of my case to other people, I guess my case is not the most common one, as I think I’m the only advertiser working on many of my keywords. Given this, there is hardly any bidding at all, and click prices are very cheap (5 euro cents/click). It must work very differently if you are advertising on keywords with tough competition (I guess I will be able to comment on that once I release the NGEDIT text editor).

Anyway. I set up my google adwords campaign at the end of July, as I released ViEmu 1.0. It took a few hours or days for advertisements to appear for relevant searches, but it’s been working almost unattended since. I changed a minor detail in the ad text, and I added other keyword combinations as google searches reached my site and taught me what terms people actually search for when they’re interested in vi/vim integration with Visual Studio.

Given that I had hardly done any research, I learnt stuff as things happened. When I set up the campaign, I saw I had to pay about 4 euro cents per click. But afterwards, I had to raise the bid to 5 cents/click, as google warned me and turned off advertising for some key phrases because the bid was too low. This is pretty simple to see in your adwords.google.com account.

I also started getting hits from those clicks. I found them as hits referred from “pagead2.googlesyndication.com/…” where the “…” is a really long and complex reference. It actually took me a while to realize those clicks were from google adwords.

There have also been other weird hits, which had a referer address of “searchportal.information.com” followed by some kind of encoded ID (such as “UVsPWVALXVUMVV8LWQgQRggaCFIXE1Y_CFEIDA0BAQ”). These addresses took me to a search page which has the nasty habit of becoming a “frame parasite” to your web surfing, and used to encode URLs to those ID strings and route everything through their site. I had severe doubts that someone educated enough to use vi/vim would surf with such a bugger.

Anyway, back to my http log review, I started doing an analysis on my November data. I usually keep track of how many downloads of my product there are a month, and try to study the correlation with monthly sales (given the 30 day trial period, tracking is a bit difficult, but I think general trends are still there). I decided to classify all hits to www.ngedit.com in November to be able to tell how many of those came from IP addresses that ended up downloading the trial version of the product.

I used vim on the log files to do this process. vi/vim is pretty good for this kind of text processing and I had the desired list in a short while, although it did involve some of that vi black magic.
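For anyone who would rather script this step than do it in vim, here is a minimal Python sketch of the same classification. It is not the set of vim commands used above. It assumes the common web server log layout in which the client IP is the first whitespace-separated field, and it uses a hypothetical download path fragment (`/downloads/ViEmu`), since the real file name isn’t given in this post. The sample lines are made up.

```python
def client_ip(line):
    """Return the first whitespace-separated field, assumed to be the client IP."""
    parts = line.split(None, 1)
    return parts[0] if parts else ""


def hits_from_downloaders(lines, download_marker="/downloads/ViEmu"):
    """Keep only the hits whose IP requested the download at least once.

    download_marker is a hypothetical path fragment; adjust it to the real file.
    """
    lines = list(lines)
    downloaders = {client_ip(l) for l in lines if download_marker in l}
    return [l for l in lines if client_ip(l) in downloaders]


# Made-up sample lines with placeholder IPs, only to show the shape of the data.
sample_log = [
    '10.0.0.1 - - [04/Nov/2005:07:05:26 +0100] "GET /index.html HTTP/1.1" 200 5120',
    '10.0.0.1 - - [04/Nov/2005:07:06:02 +0100] "GET /downloads/ViEmu-trial.msi HTTP/1.1" 200 901120',
    '10.0.0.2 - - [04/Nov/2005:08:11:40 +0100] "GET /index.html HTTP/1.1" 200 5120',
]

kept = hits_from_downloaders(sample_log)
print(len(kept), "of", len(sample_log), "hits came from downloading IPs")
```

The two-pass approach (first collect the downloading IPs, then filter every hit against that set) is what makes the later steps easier: the reduced log is small enough to read by eye.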

Anyway, it turned out that, out of the ~20,000 hits of the month, over 6,000 belonged to IP addresses that downloaded ViEmu. As an aside, it was higher than I expected. But now that I could focus better on less information, I could start seeing some new information. I removed all lines not containing “pagead2” in this reduced hit log (ad-vi-tisement: “:v/pagead2/d”), and got myself down to just 11 lines – and to my amazement, there were only 3 IP addresses! One IP appeared only once, but the other two appeared 5 times each. In the full log, there were 24 hits from pagead2, and the repetition of IPs was kind of “hidden” (I hadn’t done a :!sort on them to see the unique addresses).
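The “:v/pagead2/d” filter followed by a look at the unique addresses can also be sketched in a few lines of Python. This is only an illustration: it assumes the client IP is the first field of each log line, and the sample below uses invented placeholder IPs arranged to mirror the 11 lines and 3 addresses described above, not the real log.

```python
from collections import Counter


def hits_per_ip(lines, marker="pagead2"):
    """Count hits per client IP among the lines containing `marker`.

    Assumes the client IP is the first whitespace-separated field.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields and marker in line:
            counts[fields[0]] += 1
    return counts


# Made-up hits shaped like the reduced log: 11 lines, placeholder IPs.
referer = "http://pagead2.googlesyndication.com/pagead/ads?..."
sample_ips = ["10.0.0.7"] * 5 + ["10.0.0.9"] * 5 + ["10.0.1.3"]
sample_log = [
    f'{ip} - - [04/Nov/2005:07:05:26 +0100] "GET / HTTP/1.1" 200 5120 "{referer}"'
    for ip in sample_ips
]

counts = hits_per_ip(sample_log)
for ip, count in counts.most_common():
    print(ip, count)
```

Counting per IP right away surfaces the repetition that is easy to miss when the hits are scattered through a month of log lines.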

I nslookup’ed both addresses, which actually only differed in the last byte of the IP address, and only ‘localhost’ was returned from the reverse DNS lookup. I went back to the full hit log, removed everything but IPs belonging to the same subnetwork (n.n.n.*), and I also found out that some of the “searchportal.information.com” links belonged to them. Things started to make some sense.
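The “same subnetwork (n.n.n.*)” filter is easy to get slightly wrong with plain text matching, so here is a hedged Python sketch using the standard `ipaddress` module. It treats n.n.n.* as a /24 network, which is my assumption about what the pattern means, and again takes the first field of each line as the client IP. The addresses and the elided referer ID are placeholders.

```python
import ipaddress


def in_same_subnet(ip, reference_ip, prefix=24):
    """True if `ip` falls in the /prefix network that contains `reference_ip`."""
    network = ipaddress.ip_network(f"{reference_ip}/{prefix}", strict=False)
    try:
        return ipaddress.ip_address(ip) in network
    except ValueError:
        return False  # the first field was not a valid IP address


def hits_from_subnet(lines, reference_ip, prefix=24):
    """Keep the log lines whose client IP shares a subnet with `reference_ip`."""
    kept = []
    for line in lines:
        fields = line.split()
        if fields and in_same_subnet(fields[0], reference_ip, prefix):
            kept.append(line)
    return kept


# Placeholder lines: two addresses differing only in the last byte, one outsider.
sample_log = [
    '10.0.0.7 - - "GET / HTTP/1.1" 200 "http://pagead2.googlesyndication.com/pagead/ads?..."',
    '10.0.0.9 - - "GET / HTTP/1.1" 200 "http://searchportal.information.com/..."',
    '10.0.1.3 - - "GET / HTTP/1.1" 200 "-"',
]

subnet_hits = hits_from_subnet(sample_log, "10.0.0.7")
print(len(subnet_hits), "hits from the same /24")
```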

Let me show you one of the googlesyndication referers at this point (broken up in lines for nice display):

http://pagead2.googlesyndication.com/pagead/ads?
client=ca-pub-0919305250342516
&dt=1131084326621
&lmt=1131084326
&format=336x280_as
&output=html
&url=http%3A%2F%2F72.14.203.104%2Fsearch%3Fq%3Dcache
  %3Aq1fqcI2sut4J
  %3Awww.lyrics007.com
  %2FBeverly%252520Craven%252520Lyrics
  %2FPromise%252520Me%252520Lyrics.html
  %2Bpromise%2Bme%26hl%3Dvi
&color_bg=FFFFFF
&color_text=000000
&color_link=0000FF
&color_url=008000
&color_border=FFFFFF
&ref=http%3A%2F%2Fwww.google.com.vn%2Fsearch
  %3Fhl%3Dvi
  %26q%3Dpromise%2Bme
  %26btnG%3DT%25C3%25ACm
  %2Bki%25E1%25BA%25BFm
  %2Bv%25E1%25BB%259Bi
  %2BGoogle
  %26meta%3D
&cc=161
&u_h=600
&u_w=800
&u_ah=570
&u_aw=800
&u_cd=32
&u_tz=420
&u_his=12
&u_java=true

That’s a URL!

I started trying to decipher these URLs. Watching other pages that implement adsense, and how they appear on google’s cache, I deduced that the referer for this click (for which I was being charged) came from a google cache page (“url=http://72.14.203.104/search…”). It was a cached page from www.lyrics007.com, which is a repository of song lyrics. The google cache had been accessed from a search at google Vietnam (a trip to www.google.com.vn showed that). The search seemed to include part of the lyrics and “vi” with some weird unicode characters in between (I’m as of yet unsure of whether those %25BB are geometric signs or diacritic marks).
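The referer can also be decoded mechanically rather than by eye. The following is a minimal Python sketch using only the standard library’s `urllib.parse`; it keeps just the “client”, “url” and “ref” parameters from the referer shown earlier, joined back into one line. The nested URLs are percent-encoded once more inside the outer one, so each level has to be parsed separately. Decoded this way, “hl=vi” looks like the language code for Vietnamese rather than part of the search, and the escapes in “btnG” decode to “Tìm kiếm với Google”, which appears to be the Vietnamese label of the search button. The query itself is just “promise me”.

```python
from urllib.parse import parse_qs, urlsplit

# The referer from the log, reduced to three of its parameters.
referer = (
    "http://pagead2.googlesyndication.com/pagead/ads?"
    "client=ca-pub-0919305250342516"
    "&url=http%3A%2F%2F72.14.203.104%2Fsearch%3Fq%3Dcache"
    "%3Aq1fqcI2sut4J%3Awww.lyrics007.com"
    "%2FBeverly%252520Craven%252520Lyrics"
    "%2FPromise%252520Me%252520Lyrics.html"
    "%2Bpromise%2Bme%26hl%3Dvi"
    "&ref=http%3A%2F%2Fwww.google.com.vn%2Fsearch"
    "%3Fhl%3Dvi%26q%3Dpromise%2Bme"
    "%26btnG%3DT%25C3%25ACm%2Bki%25E1%25BA%25BFm%2Bv%25E1%25BB%259Bi%2BGoogle"
    "%26meta%3D"
)

# First level: the parameters of the ad request itself.
outer = parse_qs(urlsplit(referer).query)

# Second level: the page that showed the ad, and the page that led to it.
ad_page = parse_qs(urlsplit(outer["url"][0]).query)
search = parse_qs(urlsplit(outer["ref"][0]).query)

print("ad shown on:", outer["url"][0])
print("search host:", urlsplit(outer["ref"][0]).netloc)
print("search query:", search["q"][0])
print("interface language:", search["hl"][0])
print("button label:", search["btnG"][0])
```

Note that `parse_qs` drops parameters with empty values by default, which is why “meta” does not show up in the decoded search.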

Who gets paid for that click? I think the owner of www.lyrics007.com does. A whois lookup showed that the hosting provider is located in Houston, Texas, and that the domain is registered by someone in Hong Kong.

If I visit any of the lyrics pages, sure, the Google ads are relevant to the content of the page. But it seemed that the ad for ViEmu appeared when looking at a google-cached copy of the page. It’s weird but it may happen. I think the “vi” with weird characters in between may have tricked the adsense engine into showing my ad.

The visits coming from ‘lyrics007’ showed different types of activity. Sometimes just a hit to the ‘html’ file, other times regular page viewing involving hits for the graphics on the page, and even downloading the product! I even found other hits from the same IP addresses coming from ‘searchportal.information.com’.

So what may have happened? I have two possible explanations.

One is that a developer in Vietnam was looking for the lyrics to some song, using www.google.com.vn. Developers also listen to music and check lyrics once in a while. He clicked on the google cache, in order to access the page, and Google picked my ad (as, obviously, adsense technology is imperfect and the relevance of the ad is just a heuristic). While humming to the tune of the song, the guy in question saw the ad for my product, and was excited to finally see vi emulation in Visual Studio. He clicked on the ad and came to my site. He even downloaded it.

This would mean that I paid google and lyrics007.com for reaching a potential customer of mine – someone who I wouldn’t have reached easily in another way. Fair enough.

The weird thing is that this same guy has some other friends using the computer (or sharing the IP address) who went through exactly the same process several other times during the month. With different song lyrics, of course. And some of the times, their browser crashed before even hitting the ‘css’ file or the page graphics (or maybe they surf with images deactivated?).

They even downloaded ViEmu several times during the month – they must have a messy download directory.

This also happened from other domains, not only lyrics007. I haven’t researched them much, but they seem to come from nearby areas. If all of the cases have similar explanations, then the domain holders / adsense publishers are not to blame at all.

And then I have a second possible explanation.

Some guy in Hong Kong has set up several domains with song lyrics and other easily accessible content downloaded from other sites. As those guys are damn smart, they have figured out a way to force a google cache access to their page into showing any adsense ad. I’ve been trying to do it myself, and haven’t been able to, but the cache does show weird adsense results. Then, they have some kind of bot which accesses those pages and simulates clicks on the ads. They probably click on many “cheap” advertisers & keywords like mine, but every once in a while they might click on a 50 cent or even a $1 ad. I guess they can make quite some cash that way, apart from the legitimate traffic that their site drives. They even use another method based on ‘searchportal.information.com’ URL hijacking, which hides even more information from advertisers. And they have even improved the bot to fake normal access to web sites.

I can’t know which one is the right explanation. But, I talked to Andy Brice of PerfectTablePlan, and followed his suggestion of turning off advertising on the “content” network (adsense). I’m only advertising on google’s own search results, for which only google gets paid, and which removes the clickfraud incentive for 3rd party publishers.

I’ve also limited ads to specific countries. Mainly, I’ve limited the countries to those in which I already have customers:

  • USA
  • Canada
  • Russia
  • UK
  • Australia
  • Netherlands
  • Germany
  • Finland
  • Norway

I’ve also added other countries which I think are as likely as those to get me customers: Sweden, France, New Zealand, etc… but that’s about it.

Given that I have been spending less than €10 a month, the scam hasn’t been problematic for me. That’s the main reason it took me several months to investigate and optimize the issue – it doesn’t make sense to optimize expenses when they are among the lowest ones.

I expect my adwords costs to go down to one tenth of what they have been – which I think amounts to pretty much the legitimate/interesting traffic that I was getting anyway.

I’m pretty happy with the results of google ads – one customer did tell me that they had found out about ViEmu through an ad in google search. That single sale makes up for the rest, which I like to understand as the cost of my training with google adwords.

I hope this post doesn’t upset google – I believe it helps other people make better use of adwords, and thus also helps google have more happy customers!

10 Responses to “ViEmu, adwords and clickfraud”

  1. Kirby Turner Says:

    Thanks for posting your Adword experience. I just started using Adwords in November for my first product and I was surprised at the increase in traffic. However, my sales have not changed much. I have not reviewed my logs as thoroughly as you but I have been suspecting some type of click fraud. Things look a little odd in the logs and I need to research what is happening more.

    I didn’t know about the option to turn off the content network advertising. I’m going to do that immediately and see what effect it has.

    Thanks!

    -KIRBY

  2. Nathan Lloyd Says:

    I’ve had a very similar experience, but a few more than 10 euros… You can read the details here – I won’t try to repost it, as it’s fairly lengthy.

    http://www.iambanned.com/index.php/topic,56.msg64.html#msg64

  3. J Says:

    Nathan, I read your story, and I’m sorry to hear it. Hopefully they won’t scam you more. I didn’t bother to contact google, given the small amount, but I see they are not “very responsive” to say the least. Good luck recovering.

    I think it is a worthwhile practice for every adwords customer to scan their logs for searchportal.information.com activity.

  4. software.gurock.com » Adwords performance and conversion rates Says:

    […] I often read complaints about Google’s Adwords advertising program. We had our starting problems with Adwords as well, but the situation improved a lot in the past weeks. I’m not sure why, but we got much more Adwords traffic than before. And I’m pretty sure it’s not click-fraud from Google’s Adsense network, because our visitors often read multiple pages and download our trial or subscribe to our newsletter. […]

  5. The growing pains of NGEDIT » Blog Archive » Google *loves* the H1 tag Says:

    […] I have an adwords campaign (read my report on adwords for details on the effectiveness, click fraud, etc), which helps out, but I’d really prefer to be on the main results. What’s more, I couldn’t easily understand why I wasn’t. […]

  6. The growing pains of NGEDIT » Blog Archive » Rough strategy sketch Says:

    […] This is not a list of principles I try to adhere to. It’s more of a recollection of the kind of decisions I’ve found myself taking on intuitive grounds. I’ve seen that I will trade the best design for some functionality, in order to be closer to release, and I’ve found that I’ve traded every sensible business principle by deciding to implement some very complete (and costly) vi/vim emulation. The fact that my sticking to vi/vim emulation has resulted in ViEmu, which is a nice product, (kind of) validates the principles. Actually, I think it validates them because I find myself enjoying the effort, which helps in sustaining the long term effort, and the business is gaining momentum. Apart from this, the ViEmu experience has been an incredible sandbox where to learn, and the lessons learned will play a nice role towards the actual release of NGEDIT. For example, the Google SEO front, and also the adwords & clickfraud front. […]

  7. mike baker Says:

    Click Fraud is an interesting topic – one which both clicktracks and adwatcher will stop, one which will cost you an estimated 20% of your ad budget.
    This is an interesting article with valuable information. I have used both clicktracks and adwatcher to prevent clickfraud. What we and many other webmasters are starting to do is invest our marketing dollars into clicktracks, adwatcher or other ad tracker software.
    If you are looking for more information on adwatcher or clicktracks i recommend you take a look at: http://www.trackingsoftwarereviews.com they have full reviews on both clicktracks and adwatcher!

    Mike Baker

  8. LZZR » Closing searchportal.information.com Subject Says:

    […] Closing searchportal.information.com Subject

    Writing my first post on this issue I couldn’t expect the reaction it would produce, even less did I expect the kind of reaction that followed. If this thread, which I quietly watched without intervening, happened to be openly hostile, then this comment I got in my own blog is simply offensive, so I feel I need to reiterate my points and clarify my position to bring the whole issue to some logical closure (I really hate unresolved issues like this hanging about). I’ve done some research over the past few days and am now ready to provide a proper account, so forgive me when I repeat some points already mentioned, but this is needed to maintain the logic of my reply.

    Here are facts:

      • April 7th, 2007 – Myself and some of my friends report strange redirects from a number of websites to searchportal.information.com IP 66.151.179.147; this page reports the same DNS issues with livejournal.com (NS1.SIXAPART.COM NS2.SIXAPART.COM) happening on a massive scale.
      • April 8th, 2007 – Dreamhost reports a DNS DOS attack which incapacitates Dreamhost DNS servers, and subsequently the affected DNS servers resolve all domains for which Dreamhost is the authority to 66.151.179.147.
      • April 9th, 2007 – The issue is observed sporadically for websites having their DNS records either privately (EVERETT.ORG, ADVERDNS.NET, PROZ.COM etc) or at large hosting providers (1AND1.COM, EASYDNS.COM etc), and these are just the ones I myself have seen affected (also reported here and on April 13 here).

    Here is my interpretation: From what I’ve seen myself and what I gathered on the net I insist that we are dealing with DNS poisoning, also known as DNS hijacking. How do I know this? How do I know it’s not a virus? Well, it isn’t, cause I ain’t stupid and check my machines for viruses regularly, but mainly because changing DNS servers in browser or router settings to unaffected ones resolved the issue entirely.

    Why do I think the Dreamhost DOS issues and DNS poisoning are related? Note: I never said or implied that Dreamhost DNS was hacked or hijacked and hence was the cause of the problem – if some have read my postings this way it’s their problem, not mine. Not only because these two things happened at the same time, but because it all fits too well in a standard DNS poisoning scenario. To inject poisoned DNS data you need to disable the authority DNS servers for the domains you are trying to rewrite, otherwise your poisoned data will not be accepted by the DNS servers you are trying to inject it into. Of course it is not hard proof and there is a chance of coincidence, but do you believe in coincidences? Judging from the fact that most responses to this attack are coming from Russia, and from what I understood from the translation of the page quoted above, the point of injection was a DNS of one of the Russian ISPs, from where it spread to some European DNS servers. Fortunately it was not a large scale epidemic, but it was large enough to cause a serious disruption.

    Concerning the security issue: I am tired of repeating the same thing – if you landed on the searchportal.information.com page instead of a site you have an account at, change the password for this site, be it your own or otherwise – if you are dumb enough not to, you have only yourself to blame, as you’ve been forewarned so many times. The redirect script to searchportal.information.com reads your cookies! Note: the Russian livejournal post I referenced above provides an HTTP headers transcript where you can read for yourself how your browser happily provides cookies to this redirect script. The same should be done with your email and FTP passwords if you accessed affected sites with your email or FTP clients.

    As this thread, this thread and many others suggest, searchportal.information.com affiliates have been involved in browser hijacking using various techniques since at least 2005. Particularly interesting is this article, which I am yet to analyse properly, but this one hints at possible AdSense click fraud.

    If understanding the issue itself was not too difficult for me, the hostile, ill-mannered and frankly pointless response to my posts still puzzles me a lot. However, considering the amount of money involved in these operations, it shouldn’t be surprising that there are some who wouldn’t like too much attention to be drawn to the activities of certain searchportal.information.com affiliates. That’s all I know and all I think about it. DIXI […]

  9. Wolfman Says:

    I had this happen to me and I figured it out for my scenario. I downloaded a host file ad blocker. In the ad blocker it points all lame ad sites to localhost. For some reason I went to a site by fat fingering the URL and that URL popped up…searchportal.information.com. It wanted me to login…LAME! Freaked me out a bit at first and I was like…hmmmmm. NOPE!

    I searched the host file and sure enough here’s the entry.

    127.0.0.1 searchportal.information.com #[Panda.Spyware:Cookie/Searchportal]

    The ad blocking host file works great. If you want to download it check out this site.

    http://www.mvps.org/winhelp2002/hosts.htm

    For me I have IIS running on my localhost…hence, the login prompt!

    It never prompted me to login before with normal web surfing but since I went directly to the URL it must have kicked something off.

    Google has become a machine…I’m thinking about adding them to my host file ha ha.

    Later…

    RW

  10. J Says:

    Wolfman, thanks for sharing. I don’t think that applies to the adwords clickfraud scenario though, but it should be of help to those ‘infected’ by the malware.
