SQL Server 2012 Business Intelligence Enhancements
Courtesy James Serra, SQL Server Magazine
Microsoft understands the importance of business intelligence (BI) and is continually striving to make powerful yet easy-to-use tools for BI developers and end users. This is evident in SQL Server 2012, which has an abundance of new BI features and enhancements.
I’ll cover the most important of these new features and enhancements, broken out into the logical areas seen in a full end-to-end development of a BI solution: data integration, data management and warehousing, and end-user reporting.
Changes in the Data Integration Area
Data integration involves the extraction, transformation, and loading (ETL) of data as well as the cleaning of source data in a data mart or data warehouse. SQL Server Integration Services (SSIS), Data Quality Services (DQS), Master Data Services (MDS), and SQL Server Data Tools (SSDT) are the tools of choice for data integration. Let’s take a look at the improvements made to each of these tools.
SSIS. SSIS is at the heart of any BI project. Many new features and enhancements have been added to SSIS 2012, making it a much better ETL tool. Perhaps the most noteworthy change is the new package deployment model, which lets you build, deploy, and execute multiple SSIS packages as an SSIS project. Previously, you had to deploy each package individually.
All deployed projects are stored in a catalog—a new database named SSISDB—on the SSIS server. When you want to access a project in SSISDB, you don’t need to start a separate service. As Figure 1 shows, the SSISDB catalog appears as a node in Object Explorer when you connect to a SQL Server database instance.
Figure 1: The SSISDB Catalog
When you’re using the package deployment model, you use parameters and environments instead of package configurations to set a package’s properties. Parameters are similar to variables in SSIS 2008 R2 and SSIS 2008. You can have package parameters and project parameters. Package parameters allow you to modify a package’s execution without having to edit and redeploy the package. Project parameters are defined outside of a package and can be accessed by any package within the project. Thus, there’s no need for parent package configurations or referencing the same configuration file from multiple packages. This is an enhancement over variables, which are scoped only to a package. However, you can’t change the value of a project parameter when executing a package like you can do with a variable.
Environments are a new object type. They’re a wrapper for all environment-specific information that you want to maintain outside of a package. You can create multiple execution environments, such as one for development, another for quality assurance (QA), and a third for production. When you execute a package, you choose which environment to execute it against. You can configure environment variables to hold connection string information, such as the server name for each environment. This removes the need to use package configurations. Multiple environments can be maintained from a single SSIS instance, but most solutions will likely have an SSIS server for each environment.
Besides the package deployment model, other major improvements in SSIS 2012 include:
- A built-in logging infrastructure. With the new built-in logging infrastructure, you might not need to create your own SSIS framework. Information (e.g., what packages were executed, how long they took to run, error messages) is automatically logged to the SSISDB catalog. There are built-in reports that display all the details.
- Undo/Redo command. This highly requested feature is now available in the SSIS designer.
- Connection Managers node. The new Connection Managers node lets you share connection managers across multiple packages. Thus, you don’t need to create duplicate connections in each package.
- Version control. When you deploy a project to the SSISDB catalog, the previous version of the project is kept on the server. Accessing the prior version is as easy as right-clicking the name of a project in Object Explorer to access the Versions menu.
- Package upgrade wizard. If you want to migrate your SSIS 2008 R2 and SSIS 2008 packages to SSIS 2012, there’s a package upgrade wizard that will convert the package configuration files and parent package variables into project parameters. You can also choose to run the packages in the legacy package deployment model as is. However, legacy package deployment doesn’t allow you to use many of the features just discussed, such as SSISDB, parameters, and environments.
DQS. New to SQL Server 2012, DQS checks the quality of your data and cleans it when necessary. You can:
- Build a knowledge base and use it to perform a variety of important data quality tasks, such as data correction, enrichment, standardization, monitoring, and de-duplication.
- Perform data cleansing and other data quality tasks through cloud-based reference data services provided by numerous data providers.
- Use the profiling feature that’s integrated into DQS’s data quality tasks. This feature analyzes the integrity of your data and provides suggestions for updating it, along with the percent of confidence that the changes are correct.
Through the DQS UI, which is shown in Figure 2, you can create a knowledge base. To do so, you create domains, each of which is specific to a data field (e.g., name field, address field). For each domain, you create rules that, for example, check the data type, verify the format, and specify the allowed field length. Once your knowledge base is created, you can apply it to a data source by mapping your source columns to the domains. You then execute the rules on the data and accept or reject corrections, exporting the cleaned data to any data source.
Figure 2: The DQS UI
In any BI solution, the quality of the data is vital to the decision making process, so it’s great to see a tool added to the Microsoft BI stack that lets you capture and share knowledge to prevent wrong data and wrong decisions. Previously, you had to apply all data quality rules in SSIS. Now the rules can be in one central location, where they can be managed and re-used between multiple projects.
MDS. Master data management encompasses the technology, tools, and processes required to create and maintain a clean, consistent, and accurate view of lists of master data, such as products, customers, and employees. Microsoft’s solution for master data management is MDS, which was first introduced in SQL Server 2008 R2. With SQL Server 2012 comes a number of improvements:
- If you’ve used MDS in SQL Server 2008 R2, you know that it’s pretty difficult to navigate. In SQL Server 2012, there’s a much improved web UI built with Microsoft Silverlight. It has a much more intuitive navigation system. Figure 3 shows the improved UI.
- You can use a new Microsoft Excel add-in to administer MDS. All the master data remains centrally managed in MDS, while you leverage the familiar functionality of Excel to read, update, and add data. For example, you can use the add-in to publish data to the MDS database with the click of a button. You can even create new model objects and load data without launching any administrative tools.
- MDS integrates with DQS, so before adding more data to MDS, you can confirm that you aren’t adding duplicate records and perform data cleansing tasks. To do this, you use the MDS Excel add-in. It uses DQS to compare two sources of data: the data from MDS and the data from another system or spreadsheet. DQS then provides suggestions for updating your data, along with the percent of confidence that the changes are correct.
- The staging process, now called entity-based staging, has been completely re-architected. A new schema named stg has been created to hold staging tables that map directly with each MDS entity. These tables integrate well with SSIS, requiring no additional pivoting of the data to load into MDS. This enables data loads through SSIS to be completely automated.
- You can now install MDS while you’re installing SQL Server instead of using a separate installer, as was required in the previous release. You can use the SQL Server Installation Wizard or a command prompt to install MDS.
Figure 3: The MDS UI
SSDT. SSDT is a new SQL Server development tool designed to provide a single development environment for all database-related project types. It brings the Business Intelligence Development Studio (BIDS), the "Data Dude" tools in Microsoft Visual Studio 2010 Premium and Ultimate editions, and SQL Server Management Studio (SSMS) into the same IDE. SSDT provides functionalities for both data-tier and application-tier developers to perform all their development work against any SQL Server platform within Visual Studio.
SSDT isn’t so much a replacement for SSMS (which is focused on SQL Server management) but rather a home for when you need to do database development. The goal is to take the tasks and features that are routinely used by developers in SSMS and bring them over into SSDT. This means you can avoid Alt+Tab scenarios that kept you leaping between BIDS and SSMS in previous versions of SQL Server.
The database development tools already available in Visual Studio 2010 have been included in SSDT. All functionality currently available remains or has been enhanced. You can think of SSDT as a replacement for BIDS that uses the Database Services, Analysis Services, Reporting Services, and Integration Services project types.
Changes in the Data Management and Warehousing Area
Data management and warehousing involves storing and retrieving data. Several new technologies in SQL Server 2012—xVelocity, tabular model, and columnstore index—can provide major boosts in performance.
xVelocity. New to SQL Server 2012 is xVelocity, Microsoft’s family of in-memory and memory-optimized data management technologies. One member of this family is the xVelocity in-memory analytics engine, which is the next generation of the VertiPaq engine. The xVelocity in-memory analytics engine runs inside Microsoft PowerPivot and the new SQL Server Analysis Services (SSAS) tabular model, which is ideal for BI workloads. The other member is the new memory-optimized columnstore index, which is ideal for data warehouse workloads.
With xVelocity, you can achieve impressive performance gains for data warehouse and BI queries. These gains are achieved through xVelocity’s use of state-of-the-art compression, in-memory caching, algorithms enhanced for modern CPU/memory architecture, and highly parallel intra-query execution.
Tabular model. SQL Server 2012 introduces a new model called the Business Intelligence Semantic Model. BISM is a hybrid model that supports the existing multidimensional data model (formally called the Unified Dimensional Model) and the new tabular model. The tabular model uses xVelocity in-memory SSAS databases, which provide very fast reporting through client applications such as Excel and Microsoft Power View. It’s the same xVelocity technology that’s part of PowerPivot.
The tabular model is based on concepts familiar to anyone who has a relational database background (e.g., tables, relationships), making it easier to use than the multidimensional model. Because it uses the existing relational model, there’s usually no need to create a star schema (which usually requires using ETL to create new dimension and fact tables in a data mart or data warehouse). The tabular model uses the Data Analysis Expressions (DAX) language, which is much easier to use than the Multidimensional Expressions (MDX) language, at least for the basics.
The tabular model will likely be widely adopted for two main reasons. First, it’s faster and easier to use than the multidimensional model. Second, most types of projects lend themselves better to tabular models than multidimensional models. To help you choose the best model to use, see the Microsoft white paper "Choosing a Tabular or Multidimensional Modeling Experience in SQL Server 2012 Analysis Services."
Columnstore index. For common data warehouse queries, this new type of index provides impressive speed improvements—in the neighborhood of a 10x to 100x performance boost. The xVelocity technology makes the columnstore index more efficient than a traditional index. In a regular index, all indexed data from each row is kept together on a single page, and the data in each column is spread across all pages in an index. In a columnstore index, the data from each column is kept together so that each data page contains data only from a single column. In addition, the indexed data for each column is compressed—and because many columns often contain repetitive values, the compression ratio can be very high. This architecture reduces the number of pages in the index. It also reduces the number of pages that need scanning if you’re selecting only a few columns, which might mean that SQL Server can keep the pages in memory.
Building a columnstore index is easy. You use the same index creation syntax and specify the keyword COLUMNSTORE. But note that once you add a columnstore index to a table, the table becomes read-only, so inserts, updates, and deletes aren’t allowed. If you need to insert or update rows, you can disable the index, make the data modifications, and rebuild the columnstore index. Because of this limitation, this feature is more suitable for data warehouse tables that contain static data, where it’s acceptable for the data to be refreshed only during scheduled intervals. You can, however, use partitioning to avoid having to rebuild the index. For example, you can create a daily, weekly, or monthly partition, load the data into a new table, build all indexes, and switch the table into the partitioned table. Another option is to create a view that uses a UNION ALL operator to combine a table with a columnstore index and an updatable table without a columnstore index into one logical table. This view can then be referenced by queries. This allows dynamic insertion of new data into a single logical fact table while still retaining much of columnstore’s performance benefit.
Columnstore indexes can be of great value in a data warehouse environment. They can reduce the time spent tuning queries and creating aggregate tables.
Changes to End-User Reporting
Once you’re ready to show off the data, end-user reporting comes into play. A new tool named Power View and improvements to PowerPivot and SQL Server Reporting Services (SSRS) are helping make self-service BI a reality.
Power View. Power View is an interactive data-exploration and visual-presentation reporting tool. It offers a fun yet powerful drag-and-drop ad-hoc reporting experience. This web-based BI tool is built with Silverlight. Its features and functionality far outreach anything currently available. Users can build and format reports based on models that are deployed to the server. Each Power View report is based on either a PowerPivot model created within Excel or a tabular model created within SSDT. The model is deployed to SharePoint or SSAS, where users can create reports through the web front end. The entire report creation process is done in a browser. The report is presentation-ready at all times, meaning there isn’t a design mode and the completed report doesn’t need to be deployed. Figure 4 shows a sample Power View report.
Figure 4: Sample Power View Report
Power View requires SharePoint and SSRS running in SharePoint integrated mode (not native mode). One limitation of Power View is that it doesn’t yet work with multidimensional cubes, although Microsoft has released a Community Technology Preview (CTP) to support this. In addition, reports can be viewed only through SharePoint. However, with Excel 2013, this requirement goes away, because Power View is an add-in. There’s a restriction, though: Power View reports created in Excel 2013 can’t be viewed outside of Excel unless they’re uploaded to SharePoint.
Power View will allow more people to be involved in creating reports and will likely be the tool of choice for ad-hoc reporting. With Power View, business users (e.g., data analysts, business decision makers, information workers) can have fun interacting with the data while learning more about their business.
PowerPivot. PowerPivot 2.0 was made available with SQL Server 2012. It offers a number of new features:
- Instead of looking at defined relationships in a list format, you can view the relationships in a diagram. This makes it much easier to view how all the data fits together and to create relationships and hierarchies.
- You can create Key Performance Indicators (KPIs).
- The PATH function lets you display parent-child relationships and multi-level one-to-many relationships in a delimited format.
- You can define subsets of a model to provide a simplified view for the end user (same as in SSAS).
- You can sort one column by another column within the same table. For example, you can sort the month name column by the month number column.
PowerPivot is currently an Excel add-in that you install separately. However, PowerPivot is included with Excel 2013—you just need to enable it. In addition, many of the features that were part of the add-in are built directly into Excel 2013, such as the xVelocity in-memory analytics engine.
SSRS. Last but not least, SSRS 2012 has its own improvements, including a re-architected SharePoint mode and data alerts.
SharePoint integration has been re-architected to improve the administrator and user experience. The new architecture is implemented as a SharePoint 2010 service application, which allows SSRS to leverage many of the IT features of SharePoint products, such as scale-out functionality, cross-farm support for viewing reports, and claims-based authentication.
With data alerts, business users can get reports delivered to them when changes in a data set raise events on the server. For example, you can configure a data alert to deliver a report to a business user when a price goes down or sales falls below a specified threshold. Figure 5 shows the UI for configuring a data alert.
Figure 5: The UI for Configuring a Data Alert
But Wait, There’s More!
There’s now a Business Intelligence edition of SQL Server. It includes all of the Standard edition’s capabilities, plus Power View, additional multidimensional features, the tabular model, PowerPivot for SharePoint, SSRS data alerts, DQS, and MDS. So you get almost all the BI-related features without having to purchase the Enterprise edition. You can see all the features in the SQL Server Business Intelligence edition in the MSDN web page Features Supported by the Editions of SQL Server 2012.
Should I Upgrade to SQL Server 2012?
If you’ve implemented a BI solution or are interested in implementing one, you should seriously consider upgrading to SQL Server 2012. Its new and enhanced BI features will lead to BI projects with a shorter completion time, faster performance, and more end-user involvement. SQL Server 2012 is a one-stop shop for implementing an end-to-end BI solution.