Data warehouses are built from snapshots of the databases of operational information systems, and possibly of various external sources, taken at fixed points over a long period of time. Data warehouses rely on database technologies, OLAP, data mining and data visualization.

Main characteristics of data warehouses:

  • they contain historical data;
  • they store detailed information as well as partially and fully summarized data;
  • the data is mostly static;
  • data is processed in an unregulated, unstructured and heuristic way;
  • transaction processing intensity is medium to low;
  • the way data is used is unpredictable;
  • they are designed for analysis;
  • they are subject-oriented;
  • they support strategic decision making;
  • they serve a relatively small number of executives.

The term OLAP (On-Line Analytical Processing) describes both the data presentation model and the technology for processing such data in data warehouses. OLAP uses a multidimensional view of aggregated data to provide fast access to strategically important information for in-depth analysis. OLAP applications should have the following basic properties:

  • multidimensional data presentation;
  • support for complex calculations;
  • correct handling of the time factor.

Advantages of OLAP:

  • higher productivity of production staff and application developers, and timely access to strategic information;
  • users have enough control to make their own changes to the schema;
  • OLAP applications rely on data warehouses and OLTP systems and receive up-to-date data from them, which preserves integrity control over corporate data;
  • reduced load on OLTP systems and data warehouses.

OLAP and OLTP: characteristics and main differences

  • Data sources. OLAP: the data warehouse must include both internal corporate data and external data, because data analysis requires external sources of information (for example, statistical reports). OLTP: the main source of information entering the operational database is the corporation's own activity.
  • Data volume. OLAP: analytical databases are at least an order of magnitude larger than operational ones, because reliable analysis and forecasting require information about the corporation's activities and the state of the market over several years. OLTP: operational processing requires data for the last few months.
  • Data consistency. OLAP: the data warehouse must contain uniformly presented, reconciled information that is as close as possible to the content of the operational databases, so a component is needed to extract and "clean" information from various sources. OLTP: many large corporations run several operational information systems with their own databases at the same time (for historical reasons), and these databases may contain semantically equivalent information in different formats, with different indications of when it was received, sometimes even contradictory.
  • Queries. OLAP: the set of queries against an analytical database cannot be predicted, since data warehouses exist to answer ad hoc analyst requests. One can only count on requests arriving relatively rarely and touching large amounts of information. The size of analytical databases encourages queries with aggregates (sum, min, max, mean, etc.). OLTP: data processing systems are designed to solve specific problems; data is read frequently and in small portions, and the set of queries is usually known at design time.
  • Variability and physical design. OLAP: analytical databases change little (only when data is loaded), so it is reasonable to keep arrays ordered, to use faster indexing methods for bulk retrieval and to store pre-aggregated data. OLTP: data processing systems are highly variable by nature, which the DBMS takes into account (normalized database structure, rows stored in no particular order, B-tree indexes, transactions).
  • Security. OLAP: analytical information is so critical for the corporation that fine-grained protection is required (individual access rights to particular rows and/or columns of a table). OLTP: table-level protection is usually sufficient.
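To make the query contrast concrete, here is a minimal Python sketch using the standard-library sqlite3 module. The `sales` table, its columns and the sample rows are hypothetical, and a single in-memory database plays both roles only for illustration; in practice OLTP and OLAP workloads usually run on separate systems.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, product TEXT,"
    " year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, product, year, amount) VALUES (?, ?, ?, ?)",
    [
        ("North", "Tea", 2022, 100.0),
        ("North", "Coffee", 2022, 150.0),
        ("South", "Tea", 2022, 80.0),
        ("South", "Coffee", 2023, 120.0),
        ("North", "Tea", 2023, 110.0),
    ],
)
conn.commit()

# OLTP-style work: a short, predefined transaction that touches one row by key.
with conn:  # the connection context manager commits, or rolls back on error
    conn.execute("UPDATE sales SET amount = amount + 5 WHERE id = ?", (1,))

# OLAP-style work: an ad hoc query that aggregates many rows.
report = conn.execute(
    "SELECT region, year, SUM(amount), MIN(amount), MAX(amount), AVG(amount)"
    " FROM sales GROUP BY region, year ORDER BY region, year"
).fetchall()
for row in report:
    print(row)
```

The first row printed is `('North', 2022, 255.0, 105.0, 150.0, 127.5)`: the OLTP update changed one value, while the OLAP query summarized every row of each group.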

Codd's rules for OLAP systems

In 1993, Codd published the article "OLAP for Analyst Users: What It Should Be". In it, he outlined the basic concepts of online analytical processing and identified 12 rules that products must satisfy in order to provide online analytical processing.

  1. Conceptual multidimensional representation. An OLAP model must be multidimensional at its core. A multidimensional conceptual schema or user view makes modeling, analysis and calculation easier.
  2. Transparency. The user can get all the necessary data from the OLAP engine without knowing where it comes from. Whether or not the OLAP product is part of the user's tools should be invisible to the user. If OLAP is provided through client-server computing, this should also be invisible to the user as far as possible. OLAP should be delivered within a truly open architecture that lets users, wherever they are, communicate with the server through an analytical tool. In addition, transparency must be maintained when the analytical tool interacts with homogeneous and heterogeneous database environments.
  3. Availability. OLAP must map its own logical schema onto a heterogeneous database environment, access the data and perform the conversions needed to present it to the user. It is also necessary to determine in advance where, how and with what types of physical data organization the data will actually be stored. An OLAP system should access only the data that is really needed, rather than follow the general "kitchen sink" approach, which leads to unnecessary input.
  4. Consistent reporting performance. Reporting performance should not drop significantly as the number of dimensions and the size of the database grow.
  5. Client-server architecture. The product must not only be a client-server product; its server component must also be intelligent enough that different clients can connect with a minimum of effort and programming.
  6. Generic dimensionality. All dimensions must be equal: each dimension must be equivalent both in structure and in operational capabilities. Additional operational capabilities are allowed for individual dimensions (apparently, time is meant), but any such capability must be available to every dimension. The basic data structures and the computational or reporting formats must not be specific to any one dimension.
  7. Dynamic sparse matrix handling. OLAP systems should automatically adjust their physical schema to the model type, data volumes and database sparsity.
  8. Multi-user support. An OLAP tool must provide shared access (querying and adding data), integrity and security.
  9. Unlimited cross operations. All kinds of operations must be allowed across any dimensions.
  10. Intuitive data manipulation. Data should be manipulated through direct actions on cells in view mode, without menus or multiple operations.
  11. Flexible reporting options. Dimensions should be placed in the report in whatever way the user wants.
  12. Unlimited dimensions and aggregation levels.
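Rule 11 can be illustrated with a minimal sketch of a flexible report: any dimension can be placed on rows or columns, the remaining dimensions are summed over, and the caller chooses which members appear and in what order. The cube contents and the `pivot` function are hypothetical, and only one dimension per axis is shown for brevity.

```python
# Hypothetical cube: (year, product, region) -> sales.
DIMS = ("year", "product", "region")
cube = {
    (2022, "Tea", "North"): 100,
    (2022, "Coffee", "North"): 150,
    (2022, "Tea", "South"): 80,
    (2023, "Tea", "North"): 110,
    (2023, "Coffee", "South"): 120,
}

def pivot(cube, rows, cols, row_members, col_members):
    """Build a 2-D report with any dimension on rows and another on columns.

    Dimensions not placed on an axis are summed over. Only the listed members
    are shown, in the order given by the caller; empty cells are shown as 0
    here purely for simplicity.
    """
    r, c = DIMS.index(rows), DIMS.index(cols)
    table = {}
    for address, value in cube.items():
        key = (address[r], address[c])
        table[key] = table.get(key, 0) + value
    header = [f"{rows}/{cols}"] + list(col_members)
    body = [[m] + [table.get((m, n), 0) for n in col_members] for m in row_members]
    return [header] + body

for line in pivot(cube, "region", "year", ["South", "North"], [2023, 2022]):
    print(line)
```

Calling `pivot(cube, "product", "region", ...)` instead moves different dimensions onto the axes without changing the cube itself.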

OLAP (On-Line Analytical Processing) is not the name of a specific product but of an entire technology for online analytical processing that involves data analysis and reporting. The user is given a multidimensional table that automatically summarizes the data in various slices and allows the user to quickly control the calculations and the layout of the report.

Although some publications call analytical processing both online and interactive, the adjective "online" most accurately reflects the meaning of OLAP technology. Developing management decisions is among the areas hardest to automate. However, it is now possible to assist managers in developing decisions and, most importantly, to significantly speed up the development, selection and adoption of decisions.

Decision support systems usually provide the user with aggregate data for various samples of the initial data set, in a form convenient for perception and analysis. As a rule, such aggregates form a multidimensional data set, often called a hypercube or metacube, whose axes contain parameters and whose cells contain the aggregate data that depend on them. Such data can also be stored in relational tables; in that case the hypercube describes the logical organization of the data, not the physical implementation of its storage.
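The hypercube can be sketched as a mapping from a tuple of dimension members (the cell address) to an aggregated value. The fact records, dimension names and the `build_cube` helper below are hypothetical; they only illustrate the logical organization described above, not any particular product's storage format.

```python
from collections import defaultdict

# Hypothetical fact records: dimension attributes plus one numeric measure.
facts = [
    {"time": "2023-Q1", "product": "Tea", "branch": "North", "sales": 100},
    {"time": "2023-Q1", "product": "Tea", "branch": "North", "sales": 40},
    {"time": "2023-Q1", "product": "Coffee", "branch": "South", "sales": 70},
    {"time": "2023-Q2", "product": "Tea", "branch": "South", "sales": 90},
]

def build_cube(facts, dims, measure):
    """Sum `measure` into cells addressed by a tuple of dimension members."""
    cube = defaultdict(int)
    for fact in facts:
        address = tuple(fact[d] for d in dims)
        cube[address] += fact[measure]
    return dict(cube)

DIMS = ("time", "product", "branch")
cube = build_cube(facts, DIMS, "sales")
print(cube[("2023-Q1", "Tea", "North")])  # 140
```

Only non-empty cells are stored; an address with no facts, such as `("2023-Q2", "Coffee", "North")`, is simply absent.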

Along each axis, the data can be organized into a hierarchy representing different levels of detail.
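A hierarchy along one axis can be illustrated with time: days roll up to months, and months to years. This is a sketch under assumed names (`daily_sales`, `LEVELS` and `roll_up` are hypothetical); each level is simply a function that maps a day to the member of that level it belongs to.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales along the time axis.
daily_sales = {
    date(2023, 1, 5): 10,
    date(2023, 1, 20): 15,
    date(2023, 2, 3): 7,
    date(2024, 1, 9): 12,
}

# Hierarchy day -> month -> year: each level maps a day to its member at that level.
LEVELS = {
    "day": lambda d: d.isoformat(),
    "month": lambda d: f"{d.year}-{d.month:02d}",
    "year": lambda d: str(d.year),
}

def roll_up(series, level):
    """Aggregate daily values to the requested level of the hierarchy."""
    totals = defaultdict(int)
    for day, value in series.items():
        totals[LEVELS[level](day)] += value
    return dict(totals)

print(roll_up(daily_sales, "month"))  # {'2023-01': 25, '2023-02': 7, '2024-01': 12}
print(roll_up(daily_sales, "year"))   # {'2023': 32, '2024': 12}
```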

In the multidimensional model, the factors that affect the enterprise's activity (for example, time, products, company branches) are laid out along the dimensions. The resulting OLAP cube is then filled with indicators of the enterprise's activity (prices, sales, plans, profits, cash flow, etc.). Unlike a geometric cube, the faces of an OLAP cube do not have to be the same size. The cube can be filled both with real data from operational systems and with values forecast from historical data. Hypercube dimensions can be complex and hierarchical, and relationships can be established between them. During analysis, the user can change the point of view on the data (the so-called change of logical view), thereby viewing the data in different slices and solving specific problems. Various operations can be performed on cubes, including forecasting and conditional planning (what-if analysis).
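Changing the point of view on the data can be sketched with a few operations on a small cube stored as a dict keyed by (year, product, region). The cube contents and helper names are hypothetical: `slice_cube` fixes one dimension to a single member, `dice_cube` keeps a subset of members on several dimensions, and `rotate` reorders the axes.

```python
# Hypothetical cube: (year, product, region) -> sales.
DIMS = ("year", "product", "region")
cube = {
    (2022, "Tea", "North"): 100,
    (2022, "Coffee", "North"): 150,
    (2022, "Tea", "South"): 80,
    (2023, "Tea", "North"): 110,
    (2023, "Coffee", "South"): 120,
}

def slice_cube(cube, dim, member):
    """Fix one dimension to a single member; that axis drops out of the address."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == member}

def dice_cube(cube, selection):
    """Keep cells whose members fall in the chosen subsets; all axes remain."""
    wanted = {DIMS.index(d): set(members) for d, members in selection.items()}
    return {
        k: v for k, v in cube.items()
        if all(k[i] in members for i, members in wanted.items())
    }

def rotate(cube, order):
    """Reorder the axes of every cell address (a change of logical view)."""
    positions = [DIMS.index(d) for d in order]
    return {tuple(k[p] for p in positions): v for k, v in cube.items()}

print(slice_cube(cube, "year", 2023))
print(dice_cube(cube, {"product": ["Tea"], "region": ["North"]}))
print(rotate(cube, ("region", "year", "product")))
```

None of the operations modifies the cube; each returns a new view of the same cells.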

Thanks to this data model, users can formulate complex queries, generate reports and obtain subsets of data. Online analytical processing can significantly simplify and speed up the preparation and making of decisions by management. It serves to turn data into information and differs fundamentally from the traditional decision support process, which is most often based on reviewing structured reports.


OLAP technology belongs to the field of intelligent analysis and involves 12 principles:

1. Conceptual multidimensional representation. The analyst sees the enterprise as inherently multidimensional, so the OLAP model must be multidimensional at its core.

2. Transparency. The architecture of the OLAP system should be open, allowing users, wherever they are, to communicate with the server through an analytical tool (the client).

3. Availability. An analyst must be able to perform analysis based on a common conceptual schema containing enterprise-wide data from a relational database as well as data from legacy databases, using common access methods and a common analytical model. An OLAP system should access only the data that is actually needed, rather than follow the general "kitchen sink" approach, which leads to unnecessary input.

4. Consistent performance in report development. As the number of dimensions or the size of the database grows, the analyst should not experience a significant decrease in performance.

5. Client-server architecture. Most of the data that needs online analytical processing today resides on mainframes and is accessed from user workstations over a LAN. This means that OLAP products must be able to work in a client-server environment.

6. General multidimensionality. Every dimension should be equivalent in both its structure and its operational capabilities. The underlying data structures, formulas and reporting formats should not be biased towards any one dimension.

7. Dynamic management of sparse matrices. The physical design of an OLAP tool must be fully adaptable to the specific analytical model in order to handle sparse matrices optimally. Sparsity (measured as the percentage of empty cells among all possible cells) is one of the characteristics of data distribution.

8. Multi-user support. An OLAP tool must allow multiple analysts to share a model for querying and augmenting it while maintaining integrity and security.

9. Unlimited cross operations. Because of their hierarchical nature, various operations can represent dependent relationships in the OLAP model, that is, they are cross-dimensional. Performing them should not require the analyst to redefine these calculations and operations.

10. Intuitive data manipulation. The analyst's view of the dimensions defined in the analytical model must contain all the information needed to act on the OLAP model, so that actions do not require a menu system or multiple user interface operations.

11. Flexible reporting options. Reporting tools should present synthesized data or information derived from the data model in any possible orientation. This means that the rows, columns or pages of a report must be able to display several dimensions of the OLAP model at the same time, showing any subset of the elements (values) contained in a dimension, in any order.

12. Unlimited dimensions and number of aggregation levels. A study of the number of dimensions needed in an analytical model showed that an analyst may use up to 19 dimensions simultaneously. This leads to a recommendation on the number of dimensions an OLAP system should support. Moreover, no generic dimension should be limited in the number of aggregation levels the analyst defines.
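The sparsity measure from rule 7 above (the percentage of empty cells among all possible cells) can be computed directly. In this sketch the cube is a dict holding only non-empty cells, and `members` lists every member of each dimension; both, and the `sparsity` function, are illustrative assumptions.

```python
from itertools import product

def sparsity(cube, members):
    """Percentage of empty cells among all possible cells.

    `members` lists every member of each dimension, in address order; a cell
    counts as empty when its address is not a key of `cube`.
    """
    possible = 1
    for axis in members:
        possible *= len(axis)
    filled = sum(1 for address in product(*members) if address in cube)
    return 100.0 * (possible - filled) / possible

# Hypothetical cube with 5 non-empty cells out of 2 * 2 * 2 = 8 possible.
members = [[2022, 2023], ["Tea", "Coffee"], ["North", "South"]]
cube = {
    (2022, "Tea", "North"): 100,
    (2022, "Coffee", "North"): 150,
    (2022, "Tea", "South"): 80,
    (2023, "Tea", "North"): 110,
    (2023, "Coffee", "South"): 120,
}
print(sparsity(cube, members))  # 37.5
```

The number of possible cells is the product of the dimension sizes, so sparsity tends to rise quickly as dimensions are added.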

Examples of specialized OLAP systems currently on the market include CalliGraph and Business Intelligence.

For simple data analysis tasks, a budget solution can be used: the Microsoft Excel and Access office applications, which contain elementary OLAP tools for creating pivot tables and building various reports based on them.

The aim of this term paper is to study OLAP technology, the concept of its implementation and its structure.

In the modern world, computer networks and computing systems make it possible to analyze and process large amounts of data.

A large amount of information greatly complicates the search for solutions but makes much more accurate calculations and analysis possible. To solve this problem, there is a whole class of information systems that perform analysis. Such systems are called decision support systems (DSS).

To perform analysis, a DSS must accumulate information, which requires means for entering and storing it. In total, a DSS solves three main tasks:

· data input;

· data storage;

· data analysis.

Data enters the DSS either automatically, from sensors characterizing the state of the environment or process, or through a human operator.

If data is entered automatically from sensors, it is accumulated either on a ready signal that occurs when new information appears or by cyclic polling. If a human enters the data, users should be given convenient input tools that check the data for correctness and perform the necessary calculations.

When several operators enter data at the same time, the problems of concurrent modification of and parallel access to the same data must be solved.

A DSS provides analysts with data in the form of reports, tables and graphs for study and analysis, which is why such systems provide decision support functions.

Data entry subsystems, called OLTP (On-Line Transaction Processing) systems, implement operational data processing. They are built on conventional database management systems (DBMS).

The analysis subsystem can be built on the basis of:

· information retrieval analysis subsystems based on relational DBMS and static SQL queries;

· online analysis subsystems, which use OLAP (online analytical processing) technology based on the concept of multidimensional data representation;

· intelligent analysis subsystems, which implement Data Mining methods and algorithms.

From the user's point of view, OLAP systems provide flexible viewing of information in various slices, automatic retrieval of aggregated data, and analytical operations such as roll-up, drill-down and comparison over time. As a result, OLAP systems offer major advantages for preparing data for all types of business reporting that present data in various slices and at different levels of hierarchies, such as sales reports and various forms of budgets. This form of representation also gives OLAP systems great advantages in other kinds of data analysis, including forecasting.

1.2 Definition of OLAP systems

The technology of complex multidimensional data analysis is called OLAP. OLAP is a key component of a data warehouse organization.

OLAP functionality can be implemented in a variety of ways, from simple data analysis in office applications to complex distributed analytical systems based on server products.

OLAP (On-Line Analytical Processing) is a technology for online analytical data processing that uses tools and methods for collecting, storing and analyzing multidimensional data in order to support decision-making processes.

The main purpose of OLAP systems is to support analytical activity and arbitrary queries from analysts. The purpose of OLAP analysis is to test hypotheses as they emerge.

Using an OLAP system makes it possible to automate the strategic level of organizational management. OLAP (Online Analytical Processing, analytical data processing in real time) is a powerful technology for processing and exploring data. Systems built on OLAP technology provide almost unlimited possibilities for compiling reports, performing complex analytical calculations, building forecasts and scenarios, and developing a variety of plan options.

Full-fledged OLAP systems appeared in the early 1990s as a result of the development of decision support information systems. They are designed to transform various, often disparate, data into useful information. OLAP systems can organize data according to a set of criteria, and these criteria do not need to have clearly defined characteristics.

OLAP systems are used in many areas of an organization's strategic management: business performance management, strategic planning, budgeting, development forecasting, financial reporting, performance analysis, simulation of the organization's external and internal environment, data storage and reporting.

Structure of an OLAP system

The operation of an OLAP system is based on processing multidimensional data arrays. Multidimensional arrays are arranged so that each element of the array has many relationships with other elements. To form a multidimensional array, an OLAP system must obtain input data from other systems (e.g. ERP or CRM systems) or through external input. The user of the OLAP system receives the necessary data in a structured form according to their request. This procedure suggests the structure of an OLAP system.
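Obtaining input from several source systems usually starts with bringing their records to one format. The sketch below assumes two hypothetical extracts, one from an ERP system and one from a CRM system, that differ in date format, letter case and decimal separator; the field names and converter functions are invented for illustration.

```python
from datetime import date, datetime

# Hypothetical extracts from two operational systems.
erp_rows = [{"date": "2023-03-15", "branch": "north", "amount": "120.50"}]
crm_rows = [{"Date": "16.03.2023", "Branch": "NORTH ", "Sum": "99,50"}]

def from_erp(row):
    """ERP rows use ISO dates and a decimal point."""
    return {
        "date": date.fromisoformat(row["date"]),
        "branch": row["branch"].strip().title(),
        "amount": float(row["amount"]),
    }

def from_crm(row):
    """CRM rows use day.month.year dates and a decimal comma."""
    return {
        "date": datetime.strptime(row["Date"], "%d.%m.%Y").date(),
        "branch": row["Branch"].strip().title(),
        "amount": float(row["Sum"].replace(",", ".")),
    }

# Uniform facts, ready to be loaded into a multidimensional structure.
facts = [from_erp(r) for r in erp_rows] + [from_crm(r) for r in crm_rows]
for fact in facts:
    print(fact)
```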

In general, an OLAP system consists of the following elements:

  • database. The database is the source of information for the OLAP system. The type of database depends on the type of OLAP system and the algorithms of the OLAP server. As a rule, relational databases, multidimensional databases, data warehouses, etc. are used.
  • OLAP server. It manages the multidimensional data structure and the interaction between the database and the users of the OLAP system.
  • user applications. This element manages user requests and presents the results of accessing the database (reports, graphs, tables, etc.).

Depending on how data is organized, processed and stored, OLAP systems can be implemented on users' local computers or on dedicated servers.

There are three main ways to store and process data:

  • locally. The data is hosted on users' computers, and processing, analysis and data management are performed at local workstations. This structure has significant drawbacks related to processing speed, data security and the limited use of multidimensional analysis.
  • relational databases. These are used when an OLAP system works together with a CRM or ERP system. Data is stored on the server of these systems as relational databases or data warehouses. The OLAP server accesses these databases to form the necessary multidimensional structures and perform analysis.
  • multidimensional databases. In this case, data is organized as a special data warehouse on a dedicated server. All data operations are performed on this server, which converts the source data into multidimensional structures called OLAP cubes. The data sources for an OLAP cube are relational databases and/or client files. The data server performs preliminary preparation and processing of the data. The OLAP server works with the OLAP cube without direct access to the data sources (relational databases, client files, etc.).

Types of OLAP systems

Depending on the method of data storage and processing, all OLAP systems can be divided into three main types.


1. ROLAP (Relational OLAP). This type of OLAP system works with relational databases: data is stored in relational tables and accessed directly in the relational database. Users can perform multidimensional analysis as in traditional OLAP systems, which is achieved through SQL tools and special queries.

One advantage of ROLAP is the ability to process large amounts of data more efficiently. Another is the ability to efficiently process both numeric and textual data.

The disadvantages of ROLAP include lower performance than traditional OLAP systems, because the OLAP server processes the data from relational tables at query time. Another disadvantage is limited functionality due to the use of SQL.
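As a rough illustration of the ROLAP approach, the sketch below keeps the data in an ordinary relational table and computes detail rows, per-region subtotals and a grand total with SQL at query time, using Python's built-in sqlite3 module. SQLite does not support `GROUP BY ROLLUP`, so the levels are combined with `UNION ALL`; the table and data are hypothetical.

```python
import sqlite3

# Hypothetical relational table holding the detail data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "Tea", 100), ("North", "Coffee", 150),
     ("South", "Tea", 80), ("South", "Coffee", 120)],
)

# Detail, per-region subtotals and grand total, all computed at query time.
# NULL in a column means "all members" of that dimension.
ROLLUP_SQL = """
SELECT region, product, SUM(amount) FROM sales GROUP BY region, product
UNION ALL
SELECT region, NULL, SUM(amount) FROM sales GROUP BY region
UNION ALL
SELECT NULL, NULL, SUM(amount) FROM sales
"""
result = conn.execute(ROLLUP_SQL).fetchall()
for row in result:
    print(row)
```

Nothing is precomputed here, which keeps storage small but means every such report costs a scan of the detail table.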


2. MOLAP (Multidimensional OLAP). This type belongs to the traditional OLAP systems. What distinguishes a traditional OLAP system from the others is preliminary preparation and optimization of the data. These systems usually use a dedicated server on which data is preprocessed and organized into multidimensional arrays, the OLAP cubes.

MOLAP systems are the most efficient at data processing. They make it easy to reorganize and structure data for different user requests. MOLAP analytical tools allow complex calculations. Another advantage of MOLAP is fast query execution, which is ensured by building OLAP cubes in advance.

The disadvantages of MOLAP include a limit on the volume of data that can be processed and data redundancy: to build multidimensional cubes for various views, data has to be duplicated.
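The trade-off between fast queries and redundancy can be seen in a sketch that materializes every group-by combination of the dimensions in advance (2 to the power of the number of dimensions). The `precompute_cube` function and the `"*"` marker for "all members" are assumptions for illustration; real MOLAP servers use far more compact storage and often precompute only selected aggregates.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # hypothetical marker for "aggregated over the whole dimension"

def precompute_cube(facts, dims, measure):
    """Materialize every group-by over `dims` (2 ** len(dims) cuboids)."""
    cells = defaultdict(int)
    for fact in facts:
        for r in range(len(dims) + 1):
            for kept in combinations(range(len(dims)), r):
                address = tuple(
                    fact[d] if i in kept else ALL for i, d in enumerate(dims)
                )
                cells[address] += fact[measure]
    return dict(cells)

facts = [
    {"region": "North", "product": "Tea", "sales": 100},
    {"region": "North", "product": "Coffee", "sales": 150},
    {"region": "South", "product": "Tea", "sales": 80},
]
cube = precompute_cube(facts, ("region", "product"), "sales")

# Any aggregate is now a dictionary lookup, but 3 facts became 8 stored cells.
print(cube[("North", ALL)], cube[(ALL, "Tea")], cube[(ALL, ALL)])  # 250 180 330
print(len(cube))  # 8
```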


3. HOLAP (Hybrid OLAP). Hybrid OLAP systems combine ROLAP and MOLAP, trying to unite the advantages of both: the use of multidimensional databases and relational database management. HOLAP systems store large amounts of data in relational tables, while processed data is placed in pre-built multidimensional OLAP cubes. The advantages of this kind of system are data scalability, fast data processing and flexible access to data sources.
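A hybrid arrangement can be sketched by keeping detail rows in a relational table and a pre-built summary in memory, then routing each query to one or the other. The schema, the `region_totals` summary and the `query` function are hypothetical and stand in for the much richer routing a real HOLAP server performs.

```python
import sqlite3

# Relational side: hypothetical detail rows stay in an ordinary table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "Tea", 100), ("North", "Coffee", 150), ("South", "Tea", 80)],
)

# Multidimensional side: a summary level built in advance and kept in memory.
region_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)

def query(region, product=None):
    """Serve summary queries from the pre-built summary, detail ones via SQL."""
    if product is None:
        return region_totals.get(region, 0)
    (amount,) = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ? AND product = ?",
        (region, product),
    ).fetchone()
    return amount or 0  # SUM over no rows returns NULL (None in Python)

print(query("North"), query("North", "Coffee"))  # 250 150
```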

There are other types of OLAP systems, but they are more of a marketing move by manufacturers than an independent type of OLAP system.

These types include:

  • WOLAP (Web OLAP). An OLAP system with a web interface, through which users can access the databases.
  • DOLAP (Desktop OLAP). This type of OLAP system lets users download a database to a local workstation and work with it locally.
  • Mobile OLAP. A capability of OLAP systems that allows working with the database remotely from mobile devices.
  • SOLAP (Spatial OLAP). This type of OLAP system is designed to process spatial data. It emerged from the integration of geographic information systems and OLAP systems. These systems can process data not only in alphanumeric form but also as visual objects and vectors.

Benefits of an OLAP system

Using an OLAP system enables an organization to predict and analyze various situations related to its current activities and development prospects. These systems can be seen as complementary to enterprise-level automation systems. All the advantages of OLAP systems directly depend on the accuracy, reliability and volume of the source data.

The main advantages of the OLAP system are:

  • consistency of source information and analysis results. With an OLAP system, it is always possible to trace the source of information and determine the logical relationship between the results obtained and the source data, which reduces the subjectivity of the analysis results.
  • multivariate analysis. An OLAP system makes it possible to obtain many scenarios of how events may develop from one set of initial data. Its analysis tools allow situations to be modeled on the "what will happen if" principle.
  • control over the level of detail. The level of detail in the results can vary with users' needs, without complex system configuration or repeated calculations. A report can contain exactly the information needed for decision making.
  • revealing hidden dependencies. By building multidimensional relationships, it becomes possible to identify hidden dependencies in the various processes or situations that affect production activities.
  • creation of a single platform. An OLAP system makes it possible to create a single platform for all forecasting and analysis processes in an enterprise. In particular, OLAP system data is the basis for budget forecasts, sales forecasts, purchase forecasts, strategic development plans, etc.
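The "what will happen if" principle from the list above can be sketched as building a scenario copy of a plan cube in which the cells of one dimension member are scaled, then comparing totals. The plan figures, the `what_if` and `total` helpers and the 10% growth factor are all hypothetical.

```python
# Hypothetical baseline plan: (product, quarter) -> planned sales.
baseline = {
    ("Tea", "Q1"): 100,
    ("Tea", "Q2"): 120,
    ("Coffee", "Q1"): 200,
    ("Coffee", "Q2"): 180,
}

def what_if(cube, axis, member, factor):
    """Return a scenario copy where cells of one member are scaled by `factor`.

    The baseline itself is left unchanged, so several scenarios can coexist.
    """
    return {
        address: value * factor if address[axis] == member else value
        for address, value in cube.items()
    }

def total(cube):
    return sum(cube.values())

# "What will happen if coffee sales grow by 10%?"
scenario = what_if(baseline, 0, "Coffee", 1.10)
print(total(baseline), round(total(scenario), 2))  # 600 638.0
```

Because the scenario is a separate copy, the source plan stays available for comparison, which is what makes multivariate analysis cheap.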

The concept of multidimensional data analysis is closely related to operational analysis, which is performed by means of OLAP systems.

OLAP (On-Line Analytical Processing) is a technology for online analytical data processing that uses methods and tools for collecting, storing and analyzing multidimensional data in order to support decision-making processes.

The main purpose of OLAP systems is to support analytical activity and arbitrary (often called ad hoc) queries from analysts. The purpose of OLAP analysis is to test emerging hypotheses.

OLAP technology originates with E. Codd, the founder of the relational approach. In 1993, he published an article titled "OLAP for Analyst Users: What It Should Be". This paper outlines the basic concepts of online analytical processing and identifies the following 12 requirements that products providing online analytical processing must meet (Tokmakov G.P. Database. Database concept, relational data model, SQL languages, p. 51).

Listed below are the 12 rules that Codd outlined that define OLAP.

1. Multidimensionality. At the conceptual level, an OLAP system should present data as a multidimensional model, which simplifies the analysis and perception of information.

2. Transparency. An OLAP system should hide from the user the actual implementation of the multidimensional model, the way it is organized, its sources, and its processing and storage tools.

3. Availability. An OLAP system must provide the user with a single, logically consistent data model that allows access to data regardless of how or where it is stored.

4. Consistent reporting performance. The performance of OLAP systems should not decrease significantly as the number of analyzed dimensions grows.

5. Client-server architecture. An OLAP system must be able to work in a client-server environment, because most of the data that needs online analytical processing today is stored in a distributed way. The main idea is that the server component of an OLAP tool should be intelligent enough to build a common conceptual schema by generalizing and consolidating the various logical and physical schemas of corporate databases, so that the effect is transparent to the user.

6. Equality of dimensions. An OLAP system must support a multidimensional model in which all dimensions are equal. If necessary, individual dimensions can be given additional characteristics, but this option must be available to every dimension.

7. Dynamic management of sparse matrices. An OLAP system must handle sparse matrices optimally. Access speed must be maintained regardless of where the data cells are located and must stay constant for models with different numbers of dimensions and different degrees of data sparsity.

8. Multi-user support. An OLAP system should allow several users to work together with one analytical model, or create different models for them from the same data. Both reading and writing data are possible, so the system must ensure data integrity and security.

9. Unlimited cross operations. An OLAP system must ensure that functional relationships between hypercube cells, described in some formal language, are preserved during any slice, rotate, consolidate or drill-down operation. The system must transform the established relationships independently (automatically), without requiring the user to redefine them.

10. Intuitive data manipulation. An OLAP system must allow slice, rotate, consolidate and drill-down operations on a hypercube without requiring much work in the user interface. The dimensions defined in the analytical model must contain all the information needed to perform these operations.

11. Flexible reporting options. An OLAP system must support various ways of visualizing data, i.e. reports can be presented in every possible orientation. Reporting tools should present synthesized data or information derived from the data model in any possible orientation. This means that rows, columns or pages should show from 0 to N dimensions at the same time, where N is the number of dimensions in the entire analytical model. In addition, each dimension shown in a single row, column or page must allow any subset of its elements (values) to be shown in any order.

12. Unlimited dimensions and number of aggregation levels. A study of the number of dimensions needed in an analytical model showed that up to 19 dimensions can be used simultaneously. Hence the strong recommendation that an analytical tool support at least 15, and preferably 20, dimensions simultaneously. Moreover, no generic dimension should be limited in the number of aggregation levels and consolidation paths defined by the analyst.
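Rule 9 requires that formulas defined between cells still hold after consolidation. In the sketch below (hypothetical data and helper names), margin is a derived measure: after revenue and cost are rolled up, margin is re-evaluated from the rolled-up base measures instead of being summed or averaged from the detail cells.

```python
# Hypothetical base measures per (region, product) cell.
cells = {
    ("North", "Tea"): {"revenue": 200, "cost": 150},
    ("North", "Coffee"): {"revenue": 100, "cost": 40},
}

def margin(cell):
    """Derived measure: a formula over the base measures of one cell."""
    return (cell["revenue"] - cell["cost"]) / cell["revenue"]

def consolidate(group):
    """Roll up base measures only; derived measures are recomputed afterwards."""
    total = {"revenue": 0, "cost": 0}
    for cell in group:
        for name in total:
            total[name] += cell[name]
    return total

north = consolidate(cells.values())
print(north)                    # {'revenue': 300, 'cost': 190}
print(round(margin(north), 4))  # 0.3667 (summing detail margins would give 0.85)
```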

Codd's additional rules

The set of these requirements, which served as the de facto definition of OLAP, is often criticized: for example, rules 1, 2, 3 and 6 are requirements, whereas rules 10 and 11 are unformalized wishes (Tokmakov G.P. Database. Database concept, relational data model, SQL languages, p. 68). Thus, Codd's 12 requirements do not make it possible to define OLAP precisely. In 1995, Codd added the following six rules to the list:

13. Batch extraction vs. interpretation. An OLAP system must be equally efficient at providing access to both internal and external data.

14. Support for all OLAP analysis models. An OLAP system must support all four data analysis models defined by Codd: categorical, interpretative, speculative and stereotypical.

15. Handling denormalized data. An OLAP system must be integrated with denormalized data sources. Modifications made to data in the OLAP environment should not change the data stored in the original external systems.

16. Saving OLAP results separately from the source data. An OLAP system operating in read-write mode must save the results of modifying the data separately from the source data. In other words, the safety of the source data is ensured.

17. Exclusion of missing values. When presenting data to the user, an OLAP system must discard all missing values. In other words, missing values must be distinguished from zero values.

18. Handling missing values. An OLAP system must ignore all missing values, regardless of their source. This feature is related to rule 17.
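Rules 17 and 18 can be illustrated by how missing values affect aggregation. In the sketch below (hypothetical data; `None` is assumed as the marker for a missing value), missing cells are dropped before averaging, while a genuine zero is kept because it is a recorded value.

```python
MISSING = None  # hypothetical marker: no value was recorded for the cell

# Monthly sales of one product: March was never reported, April was truly zero.
sales = {"Jan": 120, "Feb": 90, "Mar": MISSING, "Apr": 0}

def present(values):
    """Drop missing cells; a real zero stays because it is a recorded value."""
    return [v for v in values if v is not MISSING]

def average(values):
    """Average over recorded values only; MISSING if nothing was recorded."""
    recorded = present(values)
    return sum(recorded) / len(recorded) if recorded else MISSING

print(average(sales.values()))  # 70.0 (treating March as zero would give 52.5)
```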

In addition, Codd divided all 18 rules into the following four groups, which he called features: B, S, R and D.

The basic features (B) include the following rules:

Multidimensional conceptual representation of data (rule 1);

Intuitive data manipulation (rule 10);

Availability (rule 3);

Batch extraction versus interpretation (rule 13);

Support for all OLAP analysis models (rule 14);

Client-server architecture (rule 5);

Transparency (rule 2);

Multi-user support (rule 8).

Special features (S):

Processing of denormalized data (rule 15);

Saving OLAP results: storing them separately from the source data (rule 16);

Exclusion of missing values (rule 17);

Handling missing values (rule 18).

Reporting features (R):

Flexible reporting (rule 11);

Uniform reporting performance (rule 4);

Automatic configuration of the physical layer (modified original rule 7).

Dimension control (D):

Generic dimensionality (rule 6);

Unlimited number of dimensions and aggregation levels (rule 12);

Unlimited operations between dimensions (rule 9).