DIY Market Research in Consumer Generated Media
Right now, universities around the world are spinning out small start-ups, and VCs are raising eyebrows as angel capital invests in a new type of market research and intelligence firm.
The new enterprise opportunities are built on the sheer volume of CGM and the vogue for the big brands on the web in this area: Twitter, Facebook, Digg, and the latest brave new entry, Google Buzz. Statistics, as I discuss below, are a little hard to use in reality: the cold world of market movements and the quantitative, conclusive, inferential and numerically indicative is somewhat removed from CGM at the moment. The meat of the dinner, in fact, remains qualitative research, with some usable methods for stats which help describe the parameters and prominent qualities within the voice-of-public, or "buzz", in social media.
Now from what I have seen of the arena, there is quite a lot of the "emperor's new clothes" about in social media monitoring. Take for example what is all dressed up, being used and no doubt abused: areas like sentiment, with some pretty flimsy algorithms out there, and little if any statistical significance to confirm relative changes over time or differences between brands.
Also hit-count statistics: because of the prevalence and magnetism of the big sticky threads I discussed earlier, these can in fact account for a large proportion of your hits in a topic, and if a topic has become google-rooted (search engine rankings are high for that forum on the given free search in the topic area) then you get a lot of noise about nothing other than one place to look. You see where I am going? If you want to open a kosher sandwich deli, you will soon realise that most of the current world market is in New York.
It is actually pretty easy to follow the path of "he who hath shall have a cup which overfloweth, and he who hath not shall go without for ever": the sticky sites and the sticky threads suck in a lot of the numbers, and within this lies some of the really good qualitative insight. You don't need large indexing or meta-crawling tools to get the same qualitative result: but you do need sound judgement and the "corner pieces" of your social media space and range of consumer expression.
The opposite is also true: very small postings, or postings which are very similar across a range of web forums and other social media, can point to a lead indicator or early problem alert after NPI (new product introduction). New users posting in the period after a product launch are worth picking up: they are often the tip of the iceberg of customer dissatisfaction!
Until a few years ago, search engines did not want to index "live content", for various reasons best known to themselves! So any php, asp or cfm pages were ignored as perishable and not to be indexed. This had me stuck on a few forums we ran for clients a decade and more ago. It became a bit tedious, because back then we had the big-thread magnet phenomenon and etiquette to deal with (discussed two blogs ago). However, the corporate bosses were hanging on every word written down, in awe and fear of libel suits or some tumultuous disclosure (which did happen, actually).
So your starting point should be to follow the well-trodden path, like a wolf amongst the sheep who go Google, and then read the CGM forums rather than the corporatised blurb or sanitised PR blogs. The doors to the crime scene are all open and there are hundreds of footprints.
So your tools are the search engines. Beware being all Google-centric: some engines are more prominent nationally or within a specialist niche of global or national citizens (academics always used AltaVista and then moved over to FAST's AllTheWeb, for example). Now add Google Buzz, Google Blog Search, Twitter search, YouTube, TweetDeck etc., and you start to have a powerful set of doorways through which to set out and build a report like "attitudes to the bumble bee brand amongst international English-speaking consumers in social media".
A few weeks ago, Google announced they would be indexing public content on Facebook, which will create both some opportunity and a big stick to beat yourself with. As with analysing tweets, it can be a tortuous route of reading conversations or following links to actually make sense of hit results.
It is a little difficult to get meaningful statistics in DIY SMM, but some clever use of search string arithmetic will help. More on this, and on making your Google (etc.) advanced or multiple searches efficient and exhaustive, in a later blog.
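As a flavour of what that search string arithmetic looks like, here is a minimal Python sketch which builds a set of boolean-style queries from a brand term, a group of opinion words and some exclusions. The brand, keywords and excluded site are all hypothetical examples, not a recommended list.

```python
# Hypothetical "search string arithmetic": one brand term, a group of
# opinion words (one query each keeps searches focused but the set
# exhaustive), and exclusions to strip out corporate PR pages.
brand = '"bumble bee"'
opinion_words = ["review", "opinion", "problem", "recommend"]
exclusions = ["-site:bumblebee.example.com", "-press", "-release"]

queries = [f"{brand} {word} {' '.join(exclusions)}" for word in opinion_words]
for q in queries:
    print(q)
```

Each line printed is a query you could paste straight into a search engine; running one per opinion word, rather than one giant OR, also gives you hit counts per facet for free.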
You can meta-track launches, from the rumour mill to unboxing and consumer adoption. You can track political issues, viral news story discussion... anything that affects several hundred thousand people in a western country, and you can bet it will be posted on, blogged, tweeted or have its own fan or hate club on FB.
On some topics you will find a fairly concise set of mega-threads, a smattering of blogs and a pitter-patter of small threads and comments around the various social media nodes. Other topics you choose to research will be huge, sprawling and broad in both their appeal and the spectrum of opinion which is expressed.
Larger topics are usually worth sub-categorising by sub-topic, geography or forum colour. Alternatively, you can test the amenability of searches which find a type of segmentation based on a more qualitative factor: consumer intention to purchase; poles of sentiment; brand or feature comparison posts and pages.
When you find page hits (times 10 for post hits, on average!) which run into the hundreds, then it is worth using a very simple, well-validated sampling methodology. First ensure that the page listings are exhaustive and you know the total number. Then sample systematically against that total: every tenth page for totals up to around 500, every 25th page for over 500, and so on. This will mean opening everything in a new tab. Most pages on forums will list 10 posts, but some may list the entire thread, or hundreds. Then you can apply the same rule of thumb within the page: every nth post (every 5th of 100 would give a better quality result). The point of this discipline is that you sample from the whole distribution (the population of posts as a species, if you like) and you don't follow "interesting routes". In other words, you are forced to take a wide-angled shot, so you understand the landscape before you decide which features are actually representative, prominent or meaningful in light of the whole spectrum.
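The every-nth rule is easy to make mechanical. Here is a small Python sketch of that systematic sampling; the page count and thresholds follow the rule of thumb above, and the numbers are invented for illustration.

```python
# A sketch of the nth-page systematic sampling rule described above.
def sampling_step(total):
    """Every 10th item for totals up to 500, every 25th above that."""
    return 10 if total <= 500 else 25

def systematic_sample(items, step):
    # Take every step-th item so the sample spans the whole listing,
    # instead of drifting down "interesting routes".
    return items[::step]

pages = list(range(1, 301))               # 300 result pages (invented)
sampled = systematic_sample(pages, sampling_step(len(pages)))
print(len(sampled))                       # 30 pages to open in a new tab
```

The same two functions apply unchanged at the post level within a long thread: pass in the list of post numbers instead of page numbers.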
From this approach you can do some surprisingly quick five-bar-gate counts of keywords, brands or even sentiment. Many forums have sentiment ratings, and if you include comments and reviews on places like Amazon as CGM, then you can start to do sample-based sentiment ratings, which can in fact be pretty much as accurate as the latest AI-driven ratings, if you have enough time.
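Those five-bar-gate tallies can be done just as quickly in code. A minimal sketch, assuming you have already pasted your sampled posts into a list; the posts and keyword list here are invented examples.

```python
from collections import Counter

# Invented sample of posts and a hypothetical keyword list.
posts = [
    "Love the bumble bee laptop, battery life is great",
    "bumble bee support was terrible, total waste of money",
    "thinking of buying the bumble bee, any opinions?",
]
keywords = ["love", "great", "terrible", "bumble bee"]

tally = Counter()
for post in posts:
    text = post.lower()
    for kw in keywords:
        tally[kw] += text.count(kw)

print(tally.most_common())  # brand mentions first, then sentiment words
```

Substring counting like this is deliberately crude, in keeping with a five-bar gate on paper; it will happily count "great" inside "greatest", which is usually fine for a first-pass tally.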
All is not equal, as discussed in the sticky threads blog below. Some threads, being large or having topical subject lines, receive many more hits than others. Also, some media are more prominent and perhaps carry more status, like the BBC web forums and comment boxes. Forums with high search engine rankings tend to have the most traffic. Retweet rate (retweets over total tweets) is another meta-metric.
From a knowledge of the prominent forums for a product type, brand, band, author, lifestyle or political viewpoint, you can then consider the subset of consumers who are most interesting to follow up: the innovators, the early adopters, the opinion shapers, the self-appointed authorities, the brand champions (fanboys)... the brand terrorists... and follow their postings to gain a high-level view of the discussion: see if they are indeed influencing people, or if people generally make their own minds up and buy that pink-coloured laptop anyway!
So you start to get a feel for how a report may be structured: simple hit counts as a top-level introduction, then the results of your measurements within the samples. Finally you get into the qualitative observation, with the prominence of the media, the activity of the opinion leaders and the sentiment tallies from the different samples giving some kind of summative opinion poll for the topic. The conclusions you draw should therefore be based upon prominent information, a knowledge of why it is prominent and what else lies in the spectrum, and a handle on the polarity of sentiment expressed and the average point for consumers, be it neutral or not! When you reach a conclusion which points to a useful management insight, go back and check the prominence: check the hit counts relative to other topics or shades of opinion, check your sample is exhaustive, and re-check your search strings (a little more on this below, and then another blog, coming soon to a soggy spot near you!)
There are plenty of kid-on numbers you can put around these things. For computer science graduates, meta-crawling or re-indexing can be a way forward to producing statistics based around the single post as the "unit of selection". Different sampling strategies based on random and temporal dips can be useful when confronted with 50 million tweets per day!
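To illustrate what random and temporal dips might look like against a big stream, here is a Python sketch over simulated data; the timestamps, volumes and quota sizes are entirely invented.

```python
import random

random.seed(42)  # reproducible invented data

# Simulate a day's stream as (hour_of_day, post_id) tuples.
stream = [(random.randrange(24), i) for i in range(10_000)]

# Random dip: a flat random sample of the whole stream.
random_dip = random.sample(stream, 200)

# Temporal dips: a fixed quota per hour bucket, so quiet hours
# are not drowned out by peak-time volume.
by_hour = {}
for hour, post_id in stream:
    by_hour.setdefault(hour, []).append(post_id)
temporal_dips = {h: random.sample(ids, 5) for h, ids in by_hour.items()}

print(len(random_dip), sum(map(len, temporal_dips.values())))
```

The flat random dip reflects the stream's actual volume profile; the temporal dips deliberately over-weight quiet hours, which is the right choice when you want the spectrum of opinion rather than a traffic-weighted one.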
For the very numerate amongst you, as marketers, sociologists or computer scientists: you should be aware that CGM comes in such large numbers that a topic such as a fairly common brand name or product will have a "normal distribution" of opinion, if you like, and this can be captured in a correspondingly 2-SD-centric list of keywords: the first six search strings capture the first two, or even three, standard deviations.
There is a bell curve: x-axis rating versus y-axis volume. The majority of opinion, keywords etc. will fall within the first two standard deviations. When you do nth sampling, you really get to check that the bell curve is covered. If you do manage to plot data, sentiment or keywords, and you find that there are more peaks and troughs than one bell curve, then you have either too small a sample size, a poorly defined opinion/keyword scale, or you are in fact measuring two distinct populations with some degree of polarity to each other on your scale (oops! you sampled Tory and Labour forums (Republican/Democrat) and not general political discussion!).
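As a rough sanity check on that, here is a Python sketch which flags a two-camp sample. The ratings are invented, and the "valley at the mean" heuristic is a crude stand-in for a proper test of the distribution's shape, not a validated method.

```python
import statistics

def looks_two_camps(ratings):
    # Crude heuristic: in a polarised sample the mean sits in a "valley"
    # that few individual ratings actually occupy.
    mean = statistics.mean(ratings)
    near_mean = sum(1 for r in ratings if abs(r - mean) <= 1)
    return near_mean / len(ratings) < 0.25

general  = [4, 5, 5, 6, 5, 6, 4, 5, 7, 5, 6, 5]    # one bell curve
partisan = [1, 2, 1, 2, 9, 9, 10, 8, 2, 1, 9, 10]  # two polarised camps

print(looks_two_camps(general), looks_two_camps(partisan))  # False True
```

If the flag trips, go back and check your search strings and your source mix before trusting any averages: the mean of a two-camp sample describes nobody.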
When you know you have a nice bell curve, then you can be quite safe in using nth sampling or random statistical sampling, and your comparisons can be shown to be statistically significant: FOR THIS DESCRIPTIVE DATA SET. You cannot use this for inferential statistics, primarily because you cannot accurately capture social demographics in CGM and therefore you cannot make any extrapolations to the population as a whole.
If you combine this with an offline survey which identifies people's demographics in relation to their interaction with CGM, it can be possible to make some tentative inferences, based on the knowledge that your large sampling base is composed of the cross-section of society identified in that CGM-interaction survey. Even then you have to tread very carefully, statistically speaking, because your "hits" are by a limited number of authors, some using several handles across forums, some using multiple identities to stimulate discussion on the same forum! In other words, your actual "n" for the study group is too small. Is the post more important than the author? Hmmm, well, people tend to be consistent and only change opinion after some degree of cognitive dissonance, so really your "n" is authors and not posts.
Inference to the general population soon evaporates as you get into small numbers of authors per post, and some of my "sticky, syrupy threads" are very much dominated by a gang of fewer than 10 key proponents. But then again, a knowledge of what cross-section are reading those forums, and the thread rankings on the search engines, means you can start to make a judgement call on the importance of an issue or the opinions around a topic.
Sociologists and psychologists are very taken up with not interfering with their subjects: not introducing experimental-method source errors, researcher interference or interpreter bias. If the research is purely observational, then a simple permission disclaimer is all that interferes; in focus groups, skilled moderators stimulate debate and keep it on topic while being (allegedly) careful not to introduce biases (observers are usually in other rooms and should not confer over their notes!).
But in the area of CGM you can be a little more anarchic. Having identified and qualified your CGM sources as "prominent", you can set out to interact a little by starting threads, or tweets, yourself. This is a purely qualitative approach, but it can help you gain insight in an area where you previously found many tangential conversations, unclear opinion, or forum-leader or fanboy bullying (shutting out opinions, topics, alternative products/solutions etc.) skewing the area you are researching. Tread a little carefully and pick those forums or social networks where you have established that "noobs" (newbies: first-time or low-count posters) receive a positive welcome and a range of replies, and are not shut out when they post sensibly. This means you can pose a question within a subject which is tenable: this could indeed include concept-building around latent demand and unmet needs.
I hope this has stimulated some ideas for just going out and doing some DIY research from your desktop. This approach deals not only in qualitative observations: you may also pick up some ideas on what would work with a crawling-and-indexing system, or a new type of social media platform!
To show how long in the tooth I am, and just how jaded I am by the industry: market research is a be-whored Cinderella within marketing. Of course it should be the lead violin, the first on the dance floor, but instead it is the working girl who turns up in her best frock only to have a hand put up her skirt! They want her knickers off, just to get as quickly as they can to what makes them happy. To drop the analogy: product managers have often made their own minds up about what makes a good campaign and where they are going, and only want market research which will support that, or their plan B. They have sales and national account managers to keep happy, and they need to steal a bit of limelight by doing something unique.
This is true of research in social media too, and there is a danger of observer bias in generating keywords and search strings, and in choosing themes or summarising the spectrum of opinion. Conversely, any road will take you there if you don't know where you are going, so it is easy to follow seemingly prominent themes and paths of argument which take you down blind alleys. Avoid the critical-path approach, and keep it broad and objective.
In a later blog I will discuss how to create an objective set of search strings which are exhaustive enough while being efficient in "containing" a topic, and, as mentioned, how to make sure the majority of your efforts fall within the first couple of standard deviations of a given distribution.