Predictions Guaranteed to Be Correct

Free Money Back Guaranteed Predictions

Feed Your Brain Over The New Year

This is the time of year when predictions are made for the coming year. I am borrowing from Gregg Easterbrook and instead proposing predictions for the future in general, which is surely a safer bet because I can never be wrong!

That is, unless our world ends. In that case look me up in whatever afterlife may potentially exist for your refund.

My predictions are at the end; first, some brain food for the holiday weekend to get you up to speed, just in case. A special thanks to Michele Hinojosa of Red Door Interactive for taking time to chat about the 1st vs. 3rd party cookie issue.

‘The Joy of Stats’

A fantastic ride through statistics that is approachable at any level of understanding. Just one hour long, it starts with nothing, introduces the basics, and then moves on to what may be possible in the future.

If you aren’t sure what is coming soon for web analytics, David Spiegelhalter offers a glimpse of the data deluge that we as practitioners will be expected to handle in short order, if not already:

Just looking at one thing at a time doesn’t tell you very much. You’ve got to look at the relationship between things. How they change over time, how they vary together, and that’s what correlation is about.

That’s how you start trying to understand the processes that are going on in the world and in society.
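To make "how they vary together" concrete, here is a minimal sketch of a Pearson correlation coefficient in plain Python. The weekly figures are invented purely for illustration and the `pearson` helper is my own naming, not something from the documentary.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term: do the two series move away from their means together?
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical weekly figures, made up for this example only.
visits = [120, 150, 170, 200, 260]
signups = [10, 13, 15, 19, 25]

r = pearson(visits, signups)
print(f"correlation between visits and signups: {r:.3f}")
```

A value near 1 means the two series rise and fall together, a value near -1 means they move in opposite directions, and neither says anything about which one causes the other.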

From the documentary we know that the data on the internet doubled from 2009 to 2010, and that if it were printed we could make 90 stacks of books reaching from the Earth to the Sun.

Waiting around for a company to release connectors from the data source you want to the tool you are using carries a gigantic opportunity cost.

‘Inferring social ties from geographic coincidences’

Direct Download Link

The emerging discussion of privacy, governmental regulation of privacy and what we as a community should do about it has been a long time coming. The focus of the discussion has been on how we can better communicate the value of tracking to the public as well as some sort of guidelines for ethical tracking.

What has been missing is an alternative view, that we can build models of preferences based on geographic coincidences, which practitioners could then use to segment populations based on geographic and temporal co-location.

This article discusses the beginning of this exact model: how we can infer social ties from geographic coincidences. But would friends really have the same wants and desires?

When coupled with the content of ‘The Big Sort’ the answer is potentially, yes. The underlying academic research of ‘The Big Sort’ has even more depth and is strongly recommended for practitioners.

While this is perhaps not a replacement for all online tracking, exploring the value of this aggregate data should be considered part of the core plan for any serious web analytics department.
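As a rough sketch of the co-location idea discussed in this section, the snippet below counts how often two users show up in the same grid cell during the same time window. This is a simplified counting exercise, not the article's actual model; the cell size, the window length, the function names and the sightings are all assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

CELL_DEGREES = 0.01     # assumed grid cell size, in degrees of lat/lon
WINDOW_SECONDS = 3600   # assumed time window, in seconds

def bucket(lat, lon, timestamp):
    """Map a sighting to a (grid cell, time window) bucket."""
    return (
        int(lat // CELL_DEGREES),
        int(lon // CELL_DEGREES),
        int(timestamp // WINDOW_SECONDS),
    )

def coincidence_counts(sightings):
    """Count, for each pair of users, how many distinct buckets they share."""
    users_by_bucket = defaultdict(set)
    for user, lat, lon, timestamp in sightings:
        users_by_bucket[bucket(lat, lon, timestamp)].add(user)

    counts = defaultdict(int)
    for users in users_by_bucket.values():
        for pair in combinations(sorted(users), 2):
            counts[pair] += 1
    return dict(counts)

# Hypothetical (user, lat, lon, unix-style timestamp) sightings.
sightings = [
    ("alice", 32.7153, -117.1611, 1000),
    ("bob",   32.7157, -117.1615, 2000),
    ("alice", 32.7353, -117.1411, 5000),
    ("bob",   32.7356, -117.1415, 5500),
    ("carol", 32.7153, -117.1611, 9000),  # same place as alice, different hour
    ("carol", 40.7128, -74.0060, 1000),
]

print(coincidence_counts(sightings))
```

Pairs with many shared buckets are candidates for a social tie. A fixed grid misses neighbors who sit just across a cell boundary, which is one reason to treat this as a starting point only.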

‘Improvement in minority attack detection with skewness in network traffic’

Article Abstract

What you could do with aggregate data is perhaps be more correct or, better stated, more accurate. Consider the brilliantly titled accuracy paradox:

The accuracy paradox for predictive analytics states that predictive models with a given level of accuracy may have greater predictive power than models with higher accuracy. It may be better to avoid the accuracy metric in favor of other metrics such as precision and recall.
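A small worked example may help here. The sketch below compares two hypothetical detectors on 1,000 invented events, of which 10 are attacks; the numbers, the model descriptions and the `metrics` helper are assumptions for illustration, not results from the paper.

```python
def metrics(actual, predicted):
    """Return (accuracy, precision, recall) for boolean labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# 1,000 hypothetical events: the first 10 are attacks, the rest are normal.
actual = [True] * 10 + [False] * 990

# Model A never flags anything.
never_flags = [False] * 1000

# Model B catches 8 of the 10 attacks at the cost of 30 false alarms.
flags_some = [True] * 8 + [False] * 2 + [True] * 30 + [False] * 960

for name, predicted in [("never flags", never_flags), ("flags some", flags_some)]:
    acc, prec, rec = metrics(actual, predicted)
    print(f"{name}: accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

The model that never flags anything scores 99% accuracy while finding no attacks at all. The second model is less accurate overall but is the only one of the two that is of any use, which is the paradox in a nutshell.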

Simply: accuracy and precision are different things, and chasing one can cost you the other. Imagine you shoot three arrows at a target; accuracy is how close to the bullseye your grouping is, precision is how tight the grouping is.

The three arrows might not be grouped very close but would all individually be close to the target.

If you shoot three hundred arrows, they might trend towards a tight circle that sits further from the target than the loose grouping of fewer arrows.
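The archery picture can be put into numbers. In this sketch, accuracy is taken as the distance from the bullseye to the center of the grouping and precision as the average distance of each arrow from that center; both definitions and the arrow coordinates are my own assumptions for illustration.

```python
from math import hypot

def grouping_stats(shots):
    """Return (offset, spread) for (x, y) shots with the bullseye at (0, 0)."""
    n = len(shots)
    center_x = sum(x for x, _ in shots) / n
    center_y = sum(y for _, y in shots) / n
    # Accuracy: how far the center of the grouping is from the bullseye.
    offset = hypot(center_x, center_y)
    # Precision: how far, on average, each arrow lands from the group center.
    spread = sum(hypot(x - center_x, y - center_y) for x, y in shots) / n
    return offset, spread

# Hypothetical arrow positions, invented for this example.
loose_but_centered = [(-3.0, 0.0), (3.0, 1.0), (0.0, -1.0)]
tight_but_off = [(5.0, 5.0), (5.2, 5.1), (4.8, 4.9)]

for name, shots in [("loose", loose_but_centered), ("tight", tight_but_off)]:
    offset, spread = grouping_stats(shots)
    print(f"{name}: offset from bullseye={offset:.2f}, spread={spread:.2f}")
```

The loose grouping has the smaller offset and the larger spread, and the tight grouping has the reverse.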

We are often trying to find incredibly rare events in an almost infinitely complex network; combining aggregate data across systems could one day provide better results. But geo-location participation on Twitter, as an example, is very low, so how would we know where people are?

‘You Are Where You Tweet: A Content-Based Approach to Geo-locating Twitter Users’

Direct Download Link

The study was able to take hundreds of tweets and predict where an individual lives to within 100 miles of their actual location.

Not the level of precision that would be acceptable in a production environment today, but that is only today. That was the first study to try to work out geo-locating tweets, with further iterations surely forthcoming.

This article has been a great help in my own study of the use of dialectal English on Twitter, which should add another layer of location awareness and comprehension to social media.
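To give a feel for what "content-based" can mean, here is a toy sketch that guesses a city from the words in a tweet using per-city word counts with add-one smoothing. It is not the study's method, and the training tweets, city labels and function names are all invented for illustration.

```python
from collections import Counter, defaultdict
from math import log

def train(labelled_tweets):
    """Count how often each word appears in tweets from each city."""
    word_counts = defaultdict(Counter)
    for city, text in labelled_tweets:
        word_counts[city].update(text.lower().split())
    return word_counts

def guess_city(word_counts, text):
    """Pick the city whose word frequencies best explain the text."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    best_city, best_score = None, float("-inf")
    for city, counts in word_counts.items():
        total = sum(counts.values())
        # Sum of log-probabilities, with add-one smoothing for unseen words.
        score = sum(
            log((counts[word] + 1) / (total + len(vocabulary)))
            for word in text.lower().split()
        )
        if score > best_score:
            best_city, best_score = city, score
    return best_city

# Hypothetical training tweets with known home cities.
training = [
    ("Houston", "stuck on the loop again heading to the rodeo"),
    ("Houston", "rodeo tickets and bbq tonight yall"),
    ("Boston", "wicked cold on the t this morning"),
    ("Boston", "sox game at fenway then the t home"),
]

model = train(training)
print(guess_city(model, "bbq before the rodeo"))
print(guess_city(model, "wicked cold at fenway"))
```

Words that are used heavily in one place and rarely elsewhere carry most of the signal, which is why a few hundred tweets per person can be enough to narrow down a region.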

Changing Corporate Landscape:

The Twitter-verse has been all agog that Facebook ranked higher than Google for the first time as the most visited site, but to me the most interesting Facebook-related development was the increasingly close partnership with Microsoft.

Microsoft, if you haven’t heard and despite some really hard working smart employees, hasn’t been doing all that well recently.

When Microsoft was out making mountains of money they were solving problems while crushing their competition like ants invading the Microsoft company picnic.

If only there were a way to ensure that advertising was increasingly routed through Microsoft, they could insert themselves in the conversation. But that wouldn’t necessarily be possible unless they somehow devised a way to stop a large portion of the behavioral targeting being done right now. That is just too hard to do, right?

Oh by the way, Internet Explorer 9 allegedly blocks third party cookies by default. Because, you know, the internet standard for browsers since before 2000 is that they should and it is the right thing to do. Totally unrelated to advertising.

Predictions:

  • Web analysts will increasingly be called on to perform custom data sourcing, cleansing, integration and analysis with existing tools.
  • Predictive models based on aggregate social media data, governmental data, offline behavior data, online behavior data, et cetera, segmented out into geographically distinct and statistically significant areas, will provide an increasingly powerful return.
  • Mobile is nowhere near a settled industry; the lack of consistent location accuracy in Foursquare has left a wide opening for someone else to step up.
  • Web analytics in general will come under increasing pressure to self-regulate or else; the third party cookie is in serious danger from regulators and from the next go-round of browser releases.
  • Companies will continue to experience serious pressure to perform online and a percentage will continue to choose to perform illegally, tracking and keeping what they shouldn’t. How we, as a community, react to those occurrences will define the public’s perception of us.
  • Microsoft will look to increase the relevancy of the user data they collect, via Windows Live and OS, by decreasing the chance that other user data is available while doing whatever it takes to work with Facebook.
  • Google and Twitter will continue to improve in their respective domains, perhaps becoming friends by default with no other dance partners in the room of aggregated data marketers.

What are you working on?

If you are working on these sorts of problems and want to chat about it, or are just looking for someone to actually pay to do some cool stuff, email me.
