These days, more than ever, businesses operate in data-rich environments. Data emanating from everyday business operations, sales and customer account activity, service call activity, financial and economic transactions, regulatory reporting and countless other business events are routinely captured and stored in databases. Existing global databases add terabytes of new information daily. Every moment of every day, bank transactions and electronic funds transfers, point-of-sale systems, hospital tests and procedures, factory production lines, airline reservations, service calls and even electric meters and gasoline pumps create digital records that are stored somewhere in a database.
The vast majority of these data, however, will never see the light of day. More often than not, they will be stored for a specified period of time, in some cases as required by law, and then “purged” to make room for more current data of the same kind. This cycle is likely to repeat indefinitely, each time replacing the “old” data with “new” data until the “new” data themselves become “old” and must in turn be replaced. Yet in many cases these data can represent a “rich ore” of valuable information and knowledge about the domain from which they were taken.
What better source is there to learn about customers’ preferences and buying habits than the customers themselves? Not just what they tell you they need or like in a Customer Needs and Requirements/Satisfaction Survey, but what they actually buy. What better source is there to learn about equipment failures and service requirements than the equipment itself? Not just what your field technicians tell you, but what the equipment reveals directly. What better source is there to learn about the risk of lending or extending credit than your business’s own financial successes and failures? Not just what your banks or creditors tell you, but your own financial experience, both good and bad. The list goes on and on.
Organizations are always searching for knowledge that can advance their cause and keep them abreast of the market, anticipated trends and the competition. Marketing managers would love to know what makes their customers “tick”. Manufacturing managers would do anything to find out how to improve the quality of their products, even by a fraction of a percent. And securities traders would “sell their corporate souls” just to stay a half-step ahead of the pack by detecting a change in trends or receiving an “early warning signal”.
Often the knowledge these managers seek is contained in the data that businesses routinely collect, store and discard from their ever-growing databases. Many companies have already recognized the potential of this source of knowledge and have invested substantial effort and resources to uncover the valuable knowledge “hidden” in their data. Among the various emerging technologies in use, some combine traditional and newer, more “exotic” paradigms in a field known as knowledge discovery, or database mining.
Credit card issuers use advanced knowledge discovery methods to identify usage patterns that indicate fraud, with the aim of building more effective fraud avoidance systems and, ultimately, minimizing their exposure to losses. Warranty management organizations use similar methods to detect fraud and reduce their traditional losses in this area.
Digital marketing companies use related methods to build more targeted and effective lists for the products and services they promote. Automotive companies use the same techniques to discover failure patterns and associated information, which they incorporate into the proprietary knowledge bases they distribute to their authorized dealers and licensed mechanics. Many more applications of a similar nature span businesses and industry segments of all types under the banner “Let the Data Work for You”.
The analogy between database mining and ore mining is quite apt. In ore mining, tons and tons of dirt must be processed to extract a single precious gram of gold. Similarly, in database mining, one may need to sift through very large quantities of data just to reach the “one piece of information that makes it all worthwhile”.
However, it is typically at this point that traditional analytical methods and approaches have failed, and the businesses that historically relied on them have more or less “given up”. Turning a large “mine” of raw data into a somewhat smaller pile of statistics or summary tables is of little use and often quite discouraging, and questions such as “What do the data mean?”, “How can we make use of them?” and “How do they relate to our bottom line?” remain hard to answer.
Traditional statistical methods make assumptions about the data and require a model in the form of a hypothesis that one can then either accept or reject. Quite often, however, the data do not conform to these assumptions and no model is available. In addition, traditional statistics leaves outside its scope many forms of data that are common in expressing and representing the phenomena around us. To overcome these drawbacks, the process of extracting knowledge from data has turned to machine learning techniques.
Machine learning techniques, developed under the umbrella of Artificial Intelligence (AI), were originally patterned after a distinctly human trait: the ability to acquire and create new knowledge. From this basis, new and highly sophisticated AI techniques have been developed, drawing on a broad array of disciplines and strategies and achieving varying degrees of success.
At later stages of research, some of these techniques were incorporated into the knowledge acquisition process, a critical step in building and maintaining knowledge-based systems. Before such a process existed, knowledge acquisition was typically the largest “bottleneck” to actually building and using knowledge-based systems in practical business applications. From there, it is a short step to applying the same learning mechanisms to database mining and knowledge discovery.
Today, knowledge discovery tools and methods draw on a broad range of technologies and methodologies. Neural networks are probably the best known and most widely used approach to machine learning. The technology is versatile and relatively mature, and it has been used successfully in a broad array of applications, ranging from screening credit card applications, to placing geographically targeted advertisements in national magazines, to reading handwritten addresses and routing the mail. Other discovery methods are based on technologies such as information theory, fuzzy set theory, rough set theory and nearest-neighbor metrics.
Finally, the answer to the question “Why knowledge discovery?” should now be apparent. Your organization may be sitting on a “goldmine” of data that could be converted into useful knowledge: knowledge that can help you focus your strategic and marketing planning efforts; monitor and improve the quality of your production and service delivery processes; and explain your customers’ sensitivity to your competitive pricing structure, customer service performance, brand name recognition, advertising and promotional campaigns, or anything else you would like to learn about the markets in which you operate.
Many organizations have already recognized the potential benefits of these new technologies and are using these tools to achieve smarter, more efficient and more productive operations. The list of such companies grows every day, and your organization, too, can leverage the knowledge in its own data to join them.