Thursday, April 4, 2019

OLAP Multidimensional Database Concept

CHAPTER 2
LITERATURE REVIEW

2.1 INTRODUCTION

This chapter is designed to provide background information and to review the characteristics of the data warehouse, the OLAP multidimensional database concept, the data mining model and the application of data mining. Within this research, it covers the concept, design and implementation approaches in developing a complete data warehouse technology framework for deploying a successful model with the integration of an OLAP multidimensional database and a data mining model.

Section 2.2 discusses the fundamentals of the data warehouse, data warehouse models and the Extract, Transform and Load (ETL) of raw databases into the data warehouse. It includes research and studies on existing data warehouse models authored by William Inmon, Ralph Kimball and various scholars venturing into data warehouse models.

Section 2.3 introduces background information on OLAP. It includes the studies and research on various OLAP models, OLAP architecture and the concept of processing multidimensional databases, as well as multidimensional database schema design and implementation in this research.

Section 2.4 introduces fundamental information on data mining. It includes studies and research on the available techniques, methods and processes for OLAP data mining.

Section 2.5 discusses the product comparisons for data warehouse, data mining and OLAP by Mitch Kramer.
It includes the reason why Microsoft is used to design and implement the new proposed model.

This literature review introduces the relationships between the data warehouse, the OLAP multidimensional database and the data mining model for deploying four data-based applications for benchmarking. This research also shows that the new proposed model data warehouse technology framework is ready to transform any type of raw data into useful information. It will also help us to review the new proposed model against each existing data warehouse OLAP multidimensional database framework.

2.2 DATA WAREHOUSE

According to William Inmon (1999), known as the Father of Data Warehousing, a data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process. A data warehouse is a database containing data that usually represents the business history of an organization. This historical data is used for analysis that supports business decisions at many levels, from strategic planning to performance evaluation of a discrete organizational unit.

Data warehousing is a type of database system aimed at effective integration of operational databases into an environment that enables strategic use of data (G. Zhou et al., 1995). These technologies include relational and multidimensional database management systems, client/server architecture, meta-data modelling and repositories, graphical user interfaces and much more (J. dick et al., 1995; V. Harinarayan et al., 1996).

Data warehouses are currently very much a subject of research. They are not only commonly used in the business or finance sector but can be applied appropriately in various sectors. Data warehouses are designed for analyzing or processing data into useful information, using data mining tools, for critical decision-making.
A data warehouse provides access to the complex environments of enterprise data.

In these literature studies, two main authors are identified as the main contributors and co-founders in the area of data warehousing: William Inmon (1999, 2005) and Ralph Kimball (1996, 2000). The two authors' perceptions of data warehouse design and architecture differ from one another. According to Inmon (1996), a data warehouse is a dependent data mart structure, whereas Kimball (1999) defines a data warehouse as a bus structure, which is a combination of data marts populated together as a data warehouse. Table 2.1 discusses the differences in data warehouse ideology between William Inmon and Ralph Kimball.

Table 2.1 William Inmon and Ralph Kimball Data Warehouse Differences

Paradigm
William Inmon: Inmon's paradigm. An enterprise has one data warehouse, and data marts source their information from the data warehouse. Information is stored in 3rd normal form.
Ralph Kimball: Kimball's paradigm. The data warehouse is the collection of heterogeneous data marts within the enterprise. Information is always stored in the dimensional model.

Architecture
William Inmon: Architecture using a top-down approach.
Ralph Kimball: Architecture using a bottom-up approach.

Concept
William Inmon: Data integration from various systems into a centralized repository.
Ralph Kimball: Concept of dimensional modelling (bridging between relational and multidimensional databases).

Design
William Inmon: The design pattern depends on 3rd normal form; the purpose is data granularity.
Ralph Kimball: Data marts are connected in a bus structure. The data warehouse is the union of the data marts. This approach is also known as a Virtual Data Warehouse.

ETL Methods
William Inmon: Data are extracted from operational data sources and fed into the staging database area. Data are then transformed, integrated and consolidated, and transferred to the Operational Data Store (ODS) database. Data are then loaded into the data mart.
Ralph Kimball: Data are extracted from legacy systems and then consolidated and verified in the staging database.
Data are fed into the ODS, and more data is added or updated. The Operational Data Store contains fresh copies of data that are integrated and transformed to the data mart structure.

Data mart
William Inmon: Data marts are available as a subset of the data warehouse.
Ralph Kimball: Data marts can be placed on different servers or in different geographical locations.

Based on this data warehouse literature, Inmon (2005) and Kimball (2000) have different philosophies, but they agree that the successful design and implementation of a data warehouse and data marts depend mainly on the effective collection of operational data and the validation of the data marts. Both approaches have the same database staging concepts and the same ETL process of data from a database source. They also share the understanding that independent data marts or data warehouses cannot fulfil the requirements of end users at an enterprise level for precise, timely and relevant data.

2.2.1 DATA WAREHOUSE ARCHITECTURE

Data warehouse architecture is a wide research area. It has many different sub-areas, and different enterprises can treat it with different approaches in terms of analysis, design and implementation. In this research, the aim is to provide a complete view of data warehouse architecture. The work of two important scholars, Thilini (2005) and Eckerson (2003) from TDWI, is discussed in more detail on the topic of data warehouse architecture.

According to Eckerson (2003), before a successful business intelligence system can be implemented, in which users work with front-end programs such as specialized reporting tools, OLAP tools and data mining tools, a data warehouse architecture model that concentrates mainly on the database staging process from different integrated OLTP systems is responsible for the ETL that makes the whole process workable. Thilini (2005) conducted a two-phase study investigating which factors may influence the selection of data warehouse architecture.
In Thilini's literature study, there are five data warehouse architectures in practice today, as shown in Table 2.2.

Table 2.2 Data Warehouse Architectures (adapted from Thilini, 2005)

Independent Data Marts
Independent data marts are also known as localized or small-sized data warehouses. They are mainly used by departments, divisions or regions of a company to provide their own operational databases. The data marts differ because their structures vary from location to location, with inconsistent database design, which makes it difficult to analyze across the data marts. Thilini (2005) cited the work of Winsberg (1996) and Hoss (2002) that it is common for organizational units to develop their own data marts. Data marts are best used as a prototype for an ad hoc data warehouse and for evaluation before building a real data warehouse.

Data Mart Bus Architecture
Kimball (1996) pioneered the design and architecture of a data warehouse formed from unions of data marts, known as the bus architecture. A bus architecture data warehouse is derived from the union of the data marts and is also known as a Virtual Data Warehouse. Bus architecture allows data marts to be located not only on one server but also on different servers. This allows the data warehouse to function in a virtual mode, gathering all data marts and processing them as one data warehouse.

Hub-and-spoke architecture
Inmon (2005) developed the hub-and-spoke architecture. The hub is the central server taking care of information exchange, and the spokes handle data transformation for all regional operational data stores. Hub-and-spoke mainly focuses on building a scalable and maintainable infrastructure for the data warehouse.

Centralized Data Warehouse Architecture
Central data warehouse architecture is almost the same as hub-and-spoke architecture, but without the dependent data marts.
This architecture copies and stores heterogeneous operational and external data in a single, consistent data warehouse. It has only one data model, which is consistent and complete across all data sources. According to Inmon (1999) and Kimball (2000), a central data warehouse should have database staging, also known as an Operational Data Store, as an intermediate stage for operational processing of data integration before the data are transformed into the data warehouse.

Federated Architecture
According to Hackney (2000), a federated data warehouse is an integration of multiple heterogeneous data marts, database staging or operational data stores, and a combination of analytical applications and reporting systems. The concept of federation focuses on an integration framework to make the data warehouse as great as possible. Jindal (2004) concludes that the federated data warehouse approach is a practical approach to data warehouse architecture, as it focuses on higher reliability and provides excellent value if it is well defined and documented, with integrated business rules.

Thilini (2005) concludes that the hub-and-spoke and centralized data warehouse architectures are similar and that their survey scores are almost the same. The centralized architecture is faster and easier to implement because no data marts are required, and it scored higher than hub-and-spoke where there is an urgent need for a relatively fast implementation.

A data warehouse is a read-only data source in which end users are not allowed to change the values or data elements. Inmon's (1999) data warehouse architecture strategy differs from Kimball's (1996). Inmon's data warehouse model splits off data marts as copies, distributed as an interface between the data warehouse and end users. Kimball views the data warehouse as a union of data marts: the data warehouse is the collection of data marts combined into one central repository.
Diagram 2.1 illustrates the differences between Inmon's and Kimball's data warehouse architectures, adapted from Mailvaganam, H. (2007).

Diagram 2.1 Inmon's and Kimball's Data Warehouse Architecture (adapted from Mailvaganam, 2007)

In this work, it is very important to identify which data warehouse architecture is robust and scalable in terms of building and deploying enterprise-wide systems. According to Laney (2000) and Watson, H. (2005), it is important to understand and select the appropriate data warehouse architecture and to consider the success of the various architectures as reported by Watson. The analysis in this research showed that the most popular data warehouse architecture is hub-and-spoke, proposed by Inmon, as it is a centralized data warehouse with dependent data marts; the second is the data mart bus architecture with dimensional data marts, proposed by Kimball. The new proposed model will use a combination of the hub-and-spoke and data mart bus architectures, as it is designed with a centralized data warehouse and with data marts that are used for multidimensional database modelling.

2.2.2 DATA WAREHOUSE EXTRACT, TRANSFORM, LOADING

Data warehouse architecture begins with the extract, transform, load (ETL) process, which ensures that the data pass the quality threshold. According to Evin (2001), having the right data is essential and critical to the success of an enterprise. ETL is an important tool in the data warehouse environment for ensuring that data arriving in the data warehouse from various systems and locations are cleansed. ETL is also responsible for running scheduled tasks that extract data from OLTP systems. Typically, a data warehouse is populated with historical information from within a particular organization (Bunger, C. J et al., 2001).
The complete process descriptions of ETL are given in Table 2.3.

Table 2.3 Extract, Transform, and Load Process in Data Warehouse Architecture

Extract
Extract is the first process. It involves moving data from operational databases into the database staging area or operational data store before populating the data warehouse. In this stage, operational data need to be examined by extracting them into the staging area, so that exceptions are handled and all errors are fixed before the data enter the data warehouse. This saves a lot of time when loading into the data warehouse.

Transform
On completion of data extraction into the database staging area, the data are transformed to ensure data integrity within the data warehouse. Transformation can be done by several methods, such as field mapping and algorithm comparisons.

Load
After extraction and transformation, the data are finally loaded into the data warehouse (in Inmon's model) or into data marts (in Kimball's model). Data loaded into the data warehouse are quality data: invalid data have been removed during extraction, and the data have been transformed to ensure their integrity.

Calvanese, D. et al. (2001) highlight that enterprise data warehouse tables may be populated from a wide variety of data sources in different locations, often including data that provide information about a competitor's business. Collecting all the different data and storing them in one central location is an extremely challenging task, which ETL makes possible. The ETL process, as depicted in Diagram 2.2, begins with data extraction from operational databases, where data cleansing and scrubbing are done to ensure all data are validated. The data are then transformed to meet the data warehouse standards before being loaded into the data warehouse.

Diagram 2.2 Extract, Transform, Load Process

G.
Zhou et al. (1995), emphasising data integration in data warehousing, stress that ETL can assist in the import and export of operational data between heterogeneous data sources using an OLE DB (Object Linking and Embedding, Database) based architecture, where the data are transformed so that only quality data populate the data warehouse. This is important to ensure that there are no restrictions on the size of the data warehouse with this approach.

Kimball's (2000) data warehouse architecture model, depicted in Diagram 2.3, focuses on the back room, the presentation server and the front room. In the back room, the data staging services are in charge of gathering the operational databases of all source systems and extracting data in different file formats from different systems and platforms. The second step is to run the transformation process, which removes all inconsistencies to ensure data integrity. Finally, the data are loaded into data marts. The ETL processes are commonly executed from a job control via scheduled tasks. The presentation server is the data warehouse, where data marts are stored and processed. Data are stored in a star schema consisting of dimension and fact tables. The data are then processed in the front room, where they are accessed by query services such as reporting tools, desktop tools, OLAP and data mining tools.

Diagram 2.3 Data Warehouse Architecture (adapted from Kimball, 2000)

Nicola, M (2000) explains that the process of retrieving data from the warehouse can vary greatly depending on the desired results. There are many possible forms of retrieval from a data warehouse, and it is this flexibility that drives how the retrieval process can be implemented. There are many tools for querying the data warehouse, the simplest being queries and reports built by means of SQL statements. The tools may expand to OLAP and data mining, where the structure includes many more third-party tools.
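The back-room flow and the star schema described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Kimball's reference design: the source rows, the table names (dim_customer, dim_product, fact_sales) and the cleaning rules are all hypothetical, and an in-memory SQLite database stands in for the presentation server.

```python
import sqlite3

# Hypothetical operational (OLTP) rows:
# (order_id, customer, product, quantity, unit_price)
SOURCE_ROWS = [
    (1, " Alice ", "widget", 2, 9.50),
    (2, "bob", "Widget", 1, 9.50),
    (3, "Alice", "gadget", None, 20.00),  # missing quantity, rejected in staging
    (4, "Bob", "Gadget", 3, 20.00),
]


def extract(rows):
    """Extract: copy the source rows into a staging list unchanged."""
    return list(rows)


def transform(staged):
    """Transform: drop incomplete rows, normalise names, derive the amount."""
    clean = []
    for order_id, customer, product, qty, price in staged:
        if qty is None or price is None:
            continue  # exception handling happens before loading
        clean.append(
            (order_id, customer.strip().title(), product.strip().title(),
             qty, qty * price)
        )
    return clean


def load(conn, clean):
    """Load: fill the dimension tables, then the fact table that references them."""
    conn.executescript("""
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE fact_sales (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES dim_customer(customer_id),
            product_id  INTEGER REFERENCES dim_product(product_id),
            quantity    INTEGER,
            amount      REAL
        );
    """)
    for order_id, customer, product, qty, amount in clean:
        conn.execute("INSERT OR IGNORE INTO dim_customer (name) VALUES (?)", (customer,))
        conn.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (product,))
        conn.execute(
            "INSERT INTO fact_sales (order_id, customer_id, product_id, quantity, amount) "
            "VALUES (?, (SELECT customer_id FROM dim_customer WHERE name = ?), "
            "(SELECT product_id FROM dim_product WHERE name = ?), ?, ?)",
            (order_id, customer, product, qty, amount),
        )
    conn.commit()


def sales_by_product(conn):
    """Front-room style report: a plain SQL query joining fact and dimension."""
    return conn.execute("""
        SELECT p.name, SUM(f.quantity), SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.name
        ORDER BY p.name
    """).fetchall()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_ROWS)))
report = sales_by_product(conn)
print(report)
```

The fact table holds only keys and measures, while descriptive text lives in the dimension tables; that split is what lets the same fact rows be summarised by product, by customer or by both with a simple join.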
There are many inherent problems associated with data, including limited portability and the often vast amount of data that must be sifted through for each query.

Essentially, ETL is mandatory for a data warehouse to ensure data integrity. Among the many factors to be considered, complexity and scalability are the two major issues that most enterprises face when integrating information from different sources in order to have a clean and reliable source of data for mission-critical business decisions. One way to achieve a scalable, non-complex solution is to adopt a hub-and-spoke architecture for the ETL process. According to Evin (2001), ETL operates best in a hub-and-spoke architecture because of its flexibility and efficiency. Its centralized data warehouse design helps to maintain full access control over ETL processes. It also empowers the use of analytical and data mining tools by knowledge workers.

In this study on ETL for effective data warehouse architecture, hub-and-spoke is found to be best for data integration, as it reflects what the Inmon and Kimball architectures have in common. The hub is the data warehouse, populated after data are processed from the operational database through the staging database, and the spokes are the data marts that distribute the data. Inmon and Kimball also recommend the same ETL processes to enable a hub-and-spoke architecture. Sherman, R (2005) states that the hub-and-spoke approach uses one-to-many interfaces from the data warehouse to many data marts. One-to-many is simpler to implement, cost effective in the long run and ensures consistent dimensions. By comparison, the many-to-many approach is more complicated and costly.

In this work on the new proposed model, hub-and-spoke architecture is used as the central repository service, as many scholars, including Inmon, Kimball, Evin, Sherman and Nicola, adopt this data warehouse architecture.
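One way to see why one-to-many interfaces are simpler than many-to-many is to count the interfaces that must be built and maintained. The sketch below is an illustration only and is not taken from Sherman (2005): it assumes that, without a hub, every source system would feed every data mart directly, and the function names are hypothetical.

```python
def point_to_point_interfaces(sources: int, marts: int) -> int:
    """Many-to-many: each source system feeds each data mart directly."""
    return sources * marts


def hub_and_spoke_interfaces(sources: int, marts: int) -> int:
    """Each source feeds the hub once; the hub feeds each data mart once."""
    return sources + marts


# Compare the two approaches for a few hypothetical enterprise sizes.
for sources, marts in [(2, 2), (5, 4), (10, 8)]:
    print(
        f"{sources} sources, {marts} marts: "
        f"{point_to_point_interfaces(sources, marts)} direct interfaces vs "
        f"{hub_and_spoke_interfaces(sources, marts)} through the hub"
    )
```

Under this assumption the direct count grows with the product of sources and marts, while the hub count grows with their sum, which matches the observation that the many-to-many approach becomes more complicated and costly as systems are added.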
The hub-and-spoke approach allows the hub (data warehouse) and spokes (data marts) to be positioned centrally or distributed across a local or wide area network, depending on business requirements. In designing the new proposed model, the hub-and-spoke architecture briefly identifies six important components that a data warehouse should have: ETL; a staging database or operational data store; data marts; a multidimensional database; OLAP; and data mining end-user applications such as data query, reporting, analysis and statistical tools. However, this process may differ from organization to organization. Depending on the ETL setup, some data warehouses may overwrite old data with new data, while others may only maintain a history and audit trail of all changes to the data. Diagram 2.4 depicts the concept of the new proposed model data warehouse architecture.

Diagram 2.4 New Proposed Model Data Warehouse Architecture

2.2.3 DATA WAREHOUSE FAILURE AND SUCCESS FACTORS

Building a data warehouse is indeed challenging, as a data warehouse project inherits unique characteristics that may affect the overall setup if the analysis, design and implementation phases are not properly done. This research discusses the studies on failure and success factors in data warehouse projects. The subsection on failure factors investigates why data warehouse projects fail, and the subsection on success factors discusses and investigates how implementing the correct model supports a successful data warehouse project implementation.

DATA WAREHOUSE FAILURE FACTORS

The study by Hayen, R.L. (2007) shows that implementing a data warehouse project is costly and risky, as a data warehouse project can cost over $1 million in the first year. It is estimated that one-half to two-thirds of the efforts to set up data warehouse projects will eventually fail. Hayen R.L.
(2007) cited the work of Briggs (2002) and noted three groups of factors for the failure of data warehouse projects, namely Environment, Project and Technical factors, as shown in Diagram 2.5. Table 2.4 discusses the factors in more detail.

Diagram 2.5 Factors for Data Warehouse Failures (adapted from Briggs, 2002)

Table 2.4 Factors for Data Warehouse Failures (adapted from Briggs, 2002)

Environment
Organizational changes in business, politics, mergers and takeovers, and lack of top management support. Also includes human error, corporate culture, decision making and change management.

Technical
The complexity and workload of the technical side of a data warehouse project are taken too lightly, with high expenses involved in hardware, software and people. Problems occur when a project manager is assigned who lacks knowledge and project experience in data warehouse costing, which may impede quantifying the return on investment (ROI). Failure in managing a data warehouse project also includes the challenge of setting up a competent operational and development team, and not having a data warehouse manager or expert who is politically sound. An extended timeframe for development and delivery of the data warehouse system may be due to lack of experience and knowledge in the selection of data warehouse products and end-user tools. Failure to manage the scope of the data warehouse project is a further factor.

Project
Poor knowledge of the requirements for data definitions and data quality across the organization's different business departments. Also, running a data warehouse project with insufficient knowledge of what technology to use may lead to problems later in data integration, the data warehouse model and data warehouse applications.

The study by Vassiliadis (2004) shows that data warehouse project failures are an enormous threat and are caused by design, technical, procedural and socio-technical factors, as illustrated in Diagram 2.6.
These failure factors are vital in identifying any action that is unwanted for success. Each factor group is described in Table 2.5.

Diagram 2.6 Factors for Data Warehouse Failures (adapted from Vassiliadis, 2007)

Table 2.5 Factors for Data Warehouse Failures (adapted from Vassiliadis, 2007)

Design
Design factors in a data warehouse project can suffer from the absence of standard techniques or design methodologies. During the analysis and design phase, a data warehouse project may accept ideas on metadata techniques or languages and on data engineering techniques. Proprietary solutions and recommendations from vendors or in-house experts may also define the design of the data warehouse blueprint.

Technical
Technical factors relate to the lack of know-how and experience in evaluating and choosing the hardware setup for data warehouse systems.

Procedural
Procedural factors concern imperfections in data warehouse deployment. This factor focuses on training the end users extensively on the new technology and on the design of the data warehouse, which are completely different from conventional IT solutions. User communities play a vital role and are crucial in this factor.

Socio-Technical
Socio-technical factors in a data warehouse project may lead to problems through violation of the organization's modus operandi, where the data warehouse system leads to restructuring or reorganization of the way the organization operates by introducing changes to the user community.

According to Vassiliadis (2007), other potential factors for the failure of data warehouse projects are data ownership and access. This is considered sensitive within the organization, and one must not share or acquire someone else's data, as this is equivalent to losing authority over data ownership and access.
Also, any departmental declaration or request to claim total ownership of pure, clean and error-free data should be restricted, as this might cause potential problems with data ownership rights.

Watson (2004) stresses that the general factors for failure in data warehouse projects comprise weak sponsorship and top management support, inadequate funding and user involvement, and organizational politics.

DATA WAREHOUSE SUCCESS FACTORS

Data warehouse failures can lead to disastrous implementations if the factors discussed in the preceding section, based on the studies by Briggs (2002) and Vassiliadis (2004), are not taken into serious consideration. According to Hwang M.I. (2007), data warehouse implementation is an important area of research and industrial practice, but only a few studies have assessed the critical success factors for data warehouse implementations. No doubt there are procedures for data warehouse design and implementation, but only certain guidelines have been subjected to empirical testing. It is therefore best to decide on and choose the proper data warehouse model for implementation success.

In this study on identifying and filling the gaps in the analysis of data warehouse success factors, a number of success factors are gathered from data warehouse scholars and professionals (Watson & Haley, 1997; Chen et al., 2000; Wixom & Watson, 2001; Watson et al., 2001; Hwang & Cappel, 2002; Shin, 2003) to validate their empirical work and research strength individually on various characteristics of data warehouse success.
This study is beneficial in planning and implementing data warehouse projects and leads directly into the successful design and implementation of the new proposed model in this research.

There are several success factors in designing and implementing data warehouse solutions, and the most important depend on the data warehouse model selected, as different organizations may have different scopes and road maps for the development of a data warehouse. The results of building a successful data warehouse are then used to quantify the factors that are used and to prioritize those factors that are beneficial for continued research to improve and enhance data warehouse model success.

According to Hayen, R.L. (2007), a data warehouse is a complex system which can complicate business procedures. The complexity of the data warehouse prevents companies from changing data or transactions where this is necessary. It is therefore important to analyze which data warehouse model to use for such complex systems that are critical to an organization. Hwang M.I. (2007) conducted a study on data warehousing models and success factors as a critical area of practice and research, and notes that only a few studies have been conducted to measure data warehouse projects and success factors.

Many scholars have conducted thorough research in the area of data warehousing, and projects may have succeeded or failed for reasons that depend on each scholar's research outcomes. It is useful to examine a few case studies of selected companies' data warehouse implementations and to test the failure and success factors through surveys (Winter, 2001; Watson et al., 2004).

Hwang M. I. (2007) conducted a survey study of six data warehouse studies (Watson & Haley, 1997; Chen et al., 2000; Wixom & Watson, 2001; Watson et al., 2001; Hwang & Cappel, 2002; Shin, 2003) on the success factors in a data warehouse project. Each scholar measures different success factors in a project.
Table 2.6 shows the six scholars' survey studies on data warehouses. Watson (1997) measures data warehouse success factors; Chen et al. (2000), Watson et al. (2001) and Shin (2003) measure data warehouse implementation factors; and Hwang (2002) measures through development and management practices. Only Wixom (2001), as shown in Diagram 2.7, measures both data warehouse implementation and success factors, which can be used as a model for a successful data warehouse implementation. The review of all six studies shows that, without both data warehouse implementation and success factors, the consequences of any factor for data warehouse success cannot be validated.

Table 2.6 Factors for Data Warehouse Success (adapted from Hwang M.I., 2007)

Watson & Haley (1997)
Data Warehouse Success Factors: Focus on user involvement and support through clear and understandable business needs; use of methodology and modelling methods in the data warehouse, targeting clean data; and support from upper management to contribute to success.
Data Warehouse Implementation Factors: N/A
Results Reported: Ordered list of success factors

Chen et al. (2000)
Data Warehouse Success Factors: N/A
Data Warehouse Implementation Factors: Focused on accuracy and preciseness of user satisfaction through support and fulfilment of end users' needs.
Results Reported: Support for end users affects user satisfaction

Wixom & Watson (2001)
Implementation factors include management support, resources, user participation, team skills, source systems and development technology, which contribute to the implementation
