Predictive Analytics and Data Mining Tools
Predictive analytics and data mining tools mine historical data for predictive or statistical analysis: they estimate the probability of a future event and uncover patterns and the direction of trends. Other use cases for data mining tools include fraud detection, root cause analysis, anti-money laundering (AML), segmentation, scoring, and market basket analysis (MBA).
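As a rough illustration of the market basket analysis mentioned above, the sketch below computes support, confidence, and lift for pairs of items. The transactions, the function name pair_rules, and the min_support threshold are all invented for this example; production data mining tools use more scalable algorithms and handle itemsets larger than pairs.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, invented purely for illustration.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def pair_rules(transactions, min_support=0.4):
    """Return (antecedent, consequent, support, confidence, lift) tuples
    for every item pair whose support is at least min_support."""
    n = len(transactions)
    # How many baskets contain each single item.
    item_counts = Counter(item for t in transactions for item in t)
    # How many baskets contain each unordered pair of items.
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two directed rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]       # P(y | x)
            lift = confidence / (item_counts[y] / n)  # P(y | x) / P(y)
            rules.append((x, y, support, confidence, lift))
    # Highest-confidence rules first.
    return sorted(rules, key=lambda r: r[3], reverse=True)


if __name__ == "__main__":
    for x, y, support, confidence, lift in pair_rules(TRANSACTIONS):
        print(f"{x} -> {y}: support={support:.2f} "
              f"confidence={confidence:.2f} lift={lift:.2f}")
```

In this invented data, every basket that contains butter also contains bread, so the rule butter -> bread has a confidence of 1.0. A lift above 1 suggests the two items appear together more often than they would if they were independent.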
Traditionally, statisticians extracted sample data as files, while processes, models, and the required analyses were run in a data warehouse. Today, however, organizations often maintain multiple data warehouses and data marts, which leads to a lack of interoperability and integration among systems and departments. In response, database and BI vendors now couple storage tightly with analytical processing. Standard models are available through user interfaces and produce statistical reports and charts, and modern BI tools integrate analytical processing and storage.
Creating predictive and machine learning models follows a defined process. MLOps (machine learning operations) tools are used to build machine learning models and to operationalize them. Once operationalized, the models are run to produce periodic file extracts or to provide access to data for self-service analytics.
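The build-then-operationalize flow described above can be sketched in miniature: fit a model on historical data, persist it, and reload it later for scoring. This is only an illustrative sketch under stated assumptions, not how any particular MLOps tool works. The "model" is a plain least-squares trend line, and the names fit_trend and forecast, as well as the sample figures, are hypothetical.

```python
import json


def fit_trend(values):
    """Fit an ordinary least-squares line through (0, v0), (1, v1), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two historical points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    # The model is a plain dict so that it can be persisted as JSON.
    return {"slope": slope, "intercept": mean_y - slope * mean_x, "n": n}


def forecast(model, steps_ahead=1):
    """Predict the value steps_ahead periods after the last observed one."""
    x = model["n"] - 1 + steps_ahead
    return model["intercept"] + model["slope"] * x


if __name__ == "__main__":
    history = [10, 12, 14, 16]             # hypothetical monthly figures
    model = fit_trend(history)             # build step
    stored = json.dumps(model)             # persist (for example, to a file)
    restored = json.loads(stored)          # later: reload in a scheduled job
    print(forecast(restored, steps_ahead=1))
```

Keeping the fitted parameters in a serializable structure is what lets a scheduled job or a self-service tool reuse the model without retraining it on every run.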
Advanced Visualization and Discovery Tools
Business users typically perform analysis in one of two ways: through visualization tools or through direct access to a database.
Modern visualization tools use an in-memory architecture so that users can interact with large amounts of data visually. Patterns are hard to spot in a row-and-column dataset. They are much easier to pick up by querying the data or by using graphs and charts, which allow quick analysis without manual coding.
Best practice for visualization is to interact with and analyze data directly through visuals rather than through a tabular display. The sophistication of the analysis and visualization on offer is one of the main reasons to adopt advanced visualization and discovery tools.
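To illustrate why a visual form makes patterns easier to spot than rows and columns, the following minimal sketch renders the same figures as a text bar chart. The data and the function name text_bar_chart are invented for this example, values are assumed to be non-negative, and real visualization tools are of course far richer.

```python
# Hypothetical monthly order counts, invented for illustration.
MONTHLY_ORDERS = [
    ("Jan", 120), ("Feb", 135), ("Mar", 150),
    ("Apr", 90), ("May", 160), ("Jun", 175),
]


def text_bar_chart(rows, width=40):
    """Render (label, value) rows as horizontal bars scaled to width."""
    # Guard against an all-zero series to avoid dividing by zero.
    peak = max(value for _, value in rows) or 1
    lines = []
    for label, value in rows:
        bar = "#" * round(value / peak * width)
        lines.append(f"{label:<4}{bar} {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(text_bar_chart(MONTHLY_ORDERS))
```

In the rendered bars, the dip in April and the overall upward trend stand out at a glance, whereas in the raw list they have to be read off number by number.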
Figure 7-5 classifies different kinds of tools, such as published reports, scorecards, interactive reports, OLAP, spreadsheets, dashboards, embedded BI, system reporting, and statistical reports, by factors such as information width, depth, and complexity of use.
Figure 7-5. Business intelligence tools classification
Three classical OLAP implementation approaches are as follows:
• ROLAP: Relational online analytical processing
• MOLAP: Multi-dimensional online analytical processing
• HOLAP: Hybrid online analytical processing
Capabilities of modern BI tools are as follows:
• Analytics catalog: Ability to display and search analytic content so that it is easy to find and consume
• Automated insights: Ability to use machine learning (ML) techniques to generate automatic insights for end users
• Collaboration: Capability to collaborate with a broad spectrum of users, such as developers and end users, internally or externally
Data science integration includes the following:
• Ability to connect to a wide variety of data sources, structured and unstructured, irrespective of where the data resides, whether on-premises or in the cloud
• Ability to create and support highly interactive dashboards and to manipulate data through multiple visualization options, including charts, images, heat maps, tree maps, and more, as well as the ability to combine data visualizations and narrate them interactively
• Ability to run reports or dashboards on an ad hoc basis or on a user-defined schedule
• Ability to ask questions about data through natural language query (NLQ), spoken or written, using a Google-like interface
• Ability to collaborate with developers to co-produce results, or to share final results with business users inside or outside the organization
• Ability to apply machine learning (ML) techniques to generate insights for end users
• Ability to track, audit, control, and manage information (reports, dashboards, data, etc.)
• Ability to prepare and query data through both code and a drag-and-drop user interface, to cater to data science and business applications
• Ability to create searchable content from a catalog and to make recommendations
The following categories of users each have different needs:
• Consumer of BI: High-quality visuals with effective color combinations, metrics, and the ability to do self-service analytics are important, as is the ability to share and collaborate with external users
• Data scientist: Integration with data science capabilities, self-service analytics, the ability to collaborate with other users, and connections to a wide variety of sources
• Business analyst: Ability to blend disparate data together for visual analysis
• Developer: Connectivity to data sources, the flexibility to develop reports through the UI (user interface) or through code, and ready-made graphs, code, or metrics