eBusiness: The Hope, the Hype, the Power, the Pain
(© Jack M. Wilson, 1999, 2000)
The internet makes it possible to collect data in volumes never before possible. As Gell-Mann’s hypothesis notes: “What is not forbidden is compulsory!” If people can collect this data, they will collect this data. With few established laws or ethical principles to guide them, enterprises have had a free field to experiment. We will discuss the collection techniques, the analysis techniques, and the applications of data mining in a data-rich environment. In later chapters we will see the implications for marketing and the struggle with privacy and other legal issues.
The key to this part of the process is the database and the software that surrounds it. There is a war going on in the database market between Oracle and IBM. As they battle one another for market share, Microsoft is playing the role of the classic “disruptive technology” in the sense described by Clayton Christensen.[1]

In its earliest incarnation, data mining was a statistical art practiced by mathematicians and statisticians, who often used statistical analysis tools like SAS and SPSS. Today the tools are being built into ERP and CRM systems and are often transparent to the users. The data mining process starts with raw data resources that are processed into large database systems, often called data warehouses. Sophisticated statistical tools then look for patterns in the data to create personal profiles of potential customers. Prior to the internet, the data often came from publicly available information such as census data, telephone books, legal records, or newspapers. This data was often augmented by records of existing customers obtained through direct mail or purchase records, and by information from market research studies.
The Internet opened a whole new world of resources for data mining. It is literally possible to watch every move a person makes on the Internet. Some of the things that may be measured and recorded include:
· The URLs that customers visit.
· The time that they spend on each page of any web site.
· The objects that they click on.
· The linked pattern of moves.
· The last place they visit on the site.
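As a rough illustration of how the measurements listed above might be derived from a raw log, the sketch below groups page requests by visitor and computes each item in turn. The record layout, the visitor IDs, and the function name `summarize` are assumptions made for this example; they are not taken from any particular tracking product.

```python
from collections import defaultdict

# Hypothetical clickstream records:
# (visitor_id, seconds_since_session_start, url, clicked_object_or_None)
CLICKS = [
    ("v1", 0, "/home", "banner"),
    ("v1", 30, "/search", "search_button"),
    ("v1", 45, "/results", None),
    ("v2", 0, "/home", "catalog_link"),
    ("v2", 20, "/catalog", "product_17"),
    ("v2", 80, "/product/17", "add_to_cart"),
    ("v2", 95, "/cart", None),
]


def summarize(clicks):
    """Group page views by visitor and derive the listed measurements."""
    by_visitor = defaultdict(list)
    for visitor, ts, url, obj in sorted(clicks, key=lambda c: (c[0], c[1])):
        by_visitor[visitor].append((ts, url, obj))

    summary = {}
    for visitor, views in by_visitor.items():
        path = [url for _, url, _ in views]
        # Time on a page is the gap until the next request. The final page
        # has no following request, so its time cannot be known from the log.
        dwell = defaultdict(int)
        for (ts, url, _), (next_ts, _, _) in zip(views, views[1:]):
            dwell[url] += next_ts - ts
        summary[visitor] = {
            "urls": path,                                   # URLs visited
            "seconds_per_page": dict(dwell),                # time on each page
            "clicked": [o for _, _, o in views if o is not None],
            "moves": list(zip(path, path[1:])),             # linked pattern of moves
            "exit_page": path[-1],                          # last place visited
        }
    return summary


if __name__ == "__main__":
    for visitor, facts in summarize(CLICKS).items():
        print(visitor, facts)
```

Note that the time spent on the final page is left out on purpose: a server log records when a page was requested, not when the visitor stopped reading it.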
The quantity of data collected is so large and so fragmented that one might assume that it is just too much to make sense of. Internet users also have a (largely false) sense of anonymity in the belief that they are not being identified and tracked. In the chapter on technology we saw how cookies could be used to store data about the user of a particular machine. Even then, the cookie identified the machine and not the user. The data was also fragmented because a different site placed each cookie. That changed without fanfare with the rise of web marketing companies like DoubleClick. DoubleClick would manage the web marketing efforts of many sites and could then aggregate the cookie data from large numbers of sites. Still, it seemed that the data was personalized to the machine and not the user.
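The shift from fragmented per-site cookies to an aggregated view can be sketched in a few lines. The example assumes a hypothetical ad network whose single cookie is presented on every member site that carries its ads; the site names, cookie values, and log layout are invented for illustration and do not describe DoubleClick's actual systems.

```python
from collections import defaultdict

# Hypothetical ad-network log. Because the network serves ads on every member
# site, the same network cookie appears regardless of which site is visited.
AD_NETWORK_LOG = [
    ("net-42", "news.example", "/sports"),
    ("net-42", "shop.example", "/golf-clubs"),
    ("net-77", "news.example", "/finance"),
    ("net-42", "travel.example", "/scotland"),
]


def profiles_across_sites(ad_log):
    """Collect every (site, page) pair seen for each network cookie."""
    profiles = defaultdict(list)
    for cookie, site, page in ad_log:
        profiles[cookie].append((site, page))
    return dict(profiles)


if __name__ == "__main__":
    for cookie, visits in profiles_across_sites(AD_NETWORK_LOG).items():
        print(cookie, visits)
```

Each site on its own sees only one visit from "net-42". The network sees all three, which is what turns isolated page views into a browsing profile, although still a profile of a machine rather than a named person.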
Web marketers soon learned that customers could be induced to give up lots of data about themselves in exchange for free goods and services. There seemed to be no limit to the kinds of information that they would provide. Because of the lack of established privacy policies, this data was freely bought, sold, and transferred on the open market. It was only a matter of time until web data miners began to combine the semi-anonymous web data with the personalized data obtained from other sources. As we shall see in the next chapter, DoubleClick enraged the world when they (brazenly) announced their intention to create personalized profiles from combined sources.
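To make the combination step concrete, here is a minimal sketch under stated assumptions: a table of browsing histories keyed by cookie, and a set of sign-up records that happen to store the cookie present when the form was submitted. All names, fields, and values are hypothetical.

```python
# Hypothetical semi-anonymous browsing histories, keyed by cookie.
WEB_PROFILES = {
    "net-42": ["/sports", "/golf-clubs"],
    "net-77": ["/finance"],
}

# Hypothetical sign-up records for a free service. Each one stores the cookie
# that was present when the visitor filled in the form.
SIGNUPS = [
    {"cookie": "net-42", "name": "Pat Example", "zip": "12180"},
    {"cookie": "net-99", "name": "Lee Sample", "zip": "10001"},
]


def link_profiles(web_profiles, signups):
    """Attach a browsing history to each person whose cookie is known."""
    linked = {}
    for record in signups:
        pages = web_profiles.get(record["cookie"])
        if pages is not None:
            linked[record["name"]] = {"zip": record["zip"], "pages": pages}
    return linked


if __name__ == "__main__":
    print(link_profiles(WEB_PROFILES, SIGNUPS))
```

A single shared key is all it takes: once one record ties a cookie to a name, the whole history stored under that cookie becomes personal data.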
DoubleClick was not the first, probably not the largest, and certainly not the last to do this. We will consider the legal and ethical issues in the next chapter, but without data mining none of this would have been economically feasible.
Data mining and the web combined to create value out of the extensive data, and this economic value has driven, and will continue to drive, companies to exploit that data to the maximum effect.
Who are the leading data mining software suppliers? According to Elder Research these are:[2]
These are the hardcore statistical tools used by academics and statisticians. When data mining practitioners were asked in October 2000 to list the tools that they used, they had a slightly different take.[3]
Again SPSS, SAS, and IBM led the pack. Clementine is an SPSS product, as is AnswerTree. Each has a slightly different focus. Clementine endeavors to combine the data with your own business models to develop predictive models, while AnswerTree looks for segments and factors in the data.
These tools are for the mathematically sophisticated user. Another segment of this market addresses integrated customer analysis. These are CRM-like tools that (more or less) automatically sort through data to analyze customer behavior. Leaders in this area include:[4]
· Accrue: http://www.accrue.com/
· Broadbase: http://www.broadbase.com/
· Epiphany: http://www.epiphany.com/
· NetGenesis: http://www.NetGenesis.com/
· Siebel: http://www.Siebel.com/
· Vignette: http://vignette.com
These tools focus on the customer relationship and try to develop models and patterns that will allow companies to better serve the customer’s needs.
The problem: When children visited the Microsoft Encarta homepage and then clicked on a banner ad to go to SmarterKids, they would not stay long. Why did kids leave the site so quickly? What was it about the experience that turned them off? Why did they not look around?
The analysis: SmarterKids used NetAnalysis from NetGenesis to analyze the behavior of these children who came to their site through the Encarta home page. They discovered that the banner ad was bringing the kids directly to a search tool. The kids would then try to search for their favorite toy groups, like Barbie or Pokemon. Since SmarterKids did not have these toys, the search engine simply returned a “Sorry, no results were found” message. Quickly turned off, the kids headed off for a more interesting site. The problem was a classic case of having to grab their interest immediately.
The solution: They redesigned the web site so that when the kids arrived they were presented with some of the more interesting SmarterKids products. Rather than giving the kids an undirected option to search, they tried to attract them to choose one of the existing products, so that the kids would not search for something the site did not carry.
The results: Now kids coming from Encarta are far more likely to stay and browse and far more likely to buy.
Overall, SmarterKids claims that using the analysis tools and redesigning the site regularly has tripled the number of orders placed per visitor to the site.
Al Noyes, Executive Vice President of SmarterKids, commented that “Before NetAnalysis, our marketing decisions were made in the dark. We didn’t know where qualified customers were coming from and what they were doing. Now, we can track people progress through our site, including where they clicked, what pages they stayed on and exit pages.”
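The kind of analysis described in this case can be approximated with a simple grouping of sessions by referring site. The sketch below is not NetAnalysis and makes no claim about how that product works; the session records, page names, and the function `report` are assumptions chosen to mirror the story, and orders per visit is used as a stand-in for orders per visitor.

```python
from collections import defaultdict

# Hypothetical sessions: where the visitor came from, the pages viewed in
# order, and whether the visit ended with an order.
SESSIONS = [
    {"referrer": "encarta", "pages": ["/search", "/no-results"], "ordered": False},
    {"referrer": "encarta", "pages": ["/search", "/no-results"], "ordered": False},
    {"referrer": "encarta", "pages": ["/search", "/product/5", "/cart"], "ordered": True},
    {"referrer": "direct", "pages": ["/home", "/product/9", "/cart"], "ordered": True},
    {"referrer": "direct", "pages": ["/home"], "ordered": False},
]


def report(sessions):
    """Per referrer: orders per visit and the most common exit page."""
    stats = defaultdict(lambda: {"visits": 0, "orders": 0, "exits": defaultdict(int)})
    for session in sessions:
        row = stats[session["referrer"]]
        row["visits"] += 1
        row["orders"] += int(session["ordered"])
        row["exits"][session["pages"][-1]] += 1  # last page viewed = exit page

    result = {}
    for referrer, row in stats.items():
        result[referrer] = {
            "visits": row["visits"],
            "orders_per_visit": row["orders"] / row["visits"],
            "top_exit_page": max(row["exits"], key=row["exits"].get),
        }
    return result


if __name__ == "__main__":
    for referrer, facts in report(SESSIONS).items():
        print(referrer, facts)
```

In this made-up data the Encarta visitors most often leave from the empty search results page, which is the pattern the case describes. Running the same report before and after a redesign is one way a change in orders per visit could be measured.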
[Yahoo, AltaVista, Double Click, Amazon.com, E-Trade, eBay]
[1] Clayton Christensen, “The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail”; Harvard Business School Press; 1997.
[2] Elder Research Website: www.datamining.com
[3] Kdnuggets website: http://www.kdnuggets.com/polls/dm_tools_oct_2000.htm
[4] “Data Mining - Sharpen Your Edge: E-businesses are using the data mining features packed into the latest customer service tools to outshine the competition” by B. Reimers; InternetWeek; May 10, 2000.
[5] “Data Mining - Sharpen Your Edge: E-businesses are using the data mining features packed into the latest customer service tools to outshine the competition” by B. Reimers; InternetWeek; May 10, 2000.