Hortonworks Data Platform 2.1 New Features

What is Hortonworks Data Platform

Hortonworks recently announced the release of HDP 2.1, the new version of its Hadoop Big Data platform. If you have not tried it yet and are interested in getting started with Big Data, now is the time to learn Hadoop.

This release includes five enhancements to the enterprise capabilities of the Hortonworks platform:

Apache Falcon – a framework for data management and pipeline processing in Hadoop

Apache Knox Gateway – an authentication gateway that provides a single point of access to all Hadoop services within a cluster

Apache Ambari – an operational interface for provisioning, managing, and monitoring Hadoop cluster services

Apache Solr – a high-performance indexing and search platform

Apache Storm – a real-time computation engine for processing streaming data


SQL Server Filtered Index

What is a SQL Server Filtered Index

A SQL Server filtered index is a nonclustered index with a WHERE clause that limits which rows it contains. You can use it to improve query performance when selecting records from a table. This type of index is especially useful when a column has a small number of distinct values and can also contain NULL values, such as a gender column. If you create a filtered index on rows where the value is Male or Female, the index ignores the NULL rows. This has three benefits. First, the index performs better because it has fewer rows to search. Second, it reduces the cost of maintaining the index on the table. Finally, the index requires less storage because it stores only the rows that match the filter defined for the index.
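To see the idea without a SQL Server instance, here is a minimal sketch that uses SQLite partial indexes through Python's built-in sqlite3 module. A partial index is SQLite's analogue of a filtered index, not the SQL Server feature itself. The person table, its rows, and the index names are hypothetical. The index stores only rows whose gender is not NULL. As a result, the planner can use it for a query on gender = 'F', but it has to scan the table for gender IS NULL.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Build a hypothetical in-memory person table where most genders are NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, gender TEXT)")
    # Every third row gets 'M' or 'F'; all other rows leave gender NULL.
    rows = [
        (i, f"person{i}", ("M" if i % 2 else "F") if i % 3 == 0 else None)
        for i in range(1, 1001)
    ]
    conn.executemany("INSERT INTO person VALUES (?, ?, ?)", rows)
    # Partial (filtered) index: NULL rows are never stored in it.
    # Roughly comparable T-SQL (hypothetical names):
    #   CREATE NONCLUSTERED INDEX IX_Person_Gender ON dbo.Person (Gender)
    #   WHERE Gender IS NOT NULL;
    conn.execute(
        "CREATE INDEX ix_person_gender ON person(gender) WHERE gender IS NOT NULL"
    )
    return conn

def plan(conn: sqlite3.Connection, sql: str) -> list:
    """Return the 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = build_demo_db()
# gender = 'F' implies gender IS NOT NULL, so the partial index qualifies.
print(plan(conn, "SELECT COUNT(*) FROM person WHERE gender = 'F'"))
# NULL rows are not in the index, so this query falls back to a table scan.
print(plan(conn, "SELECT COUNT(*) FROM person WHERE gender IS NULL"))
```

In this sample data, 333 of the 1,000 rows have a gender value. Only those 333 rows are kept in the index.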

Design Considerations

When selecting the columns to include in a filtered index, keep these design points in mind:

  • Filtered indexes are defined on one table, and the filter expression supports only simple comparison operators.
  • A column in the filtered index expression should be a key or included column if the column is in the query result set.

As with anything in SQL Server, your mileage may vary. If you have not tried filtered indexes yet, find an appropriate table and a query that could benefit from one. Run the query before making any changes and review its execution plan. Then create the filtered index, clear the buffer cache (for example, with DBCC DROPCLEANBUFFERS on a test server), and run the query and review its execution plan again.
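The before-and-after routine described above can be sketched in the same way. This is a minimal sketch that again uses SQLite partial indexes from Python's sqlite3 module as a stand-in for SQL Server. The orders table, its data, and the index name are hypothetical. SQLite has no INCLUDE clause, so the returned amount column is added to the index key to keep the query covered, in the spirit of the second design point. An in-memory SQLite database has no buffer cache to clear, so that step is omitted. Timings will vary from machine to machine.

```python
import sqlite3
import time

def plan_details(conn: sqlite3.Connection, sql: str) -> list:
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

def average_seconds(conn: sqlite3.Connection, sql: str, repeats: int = 20) -> float:
    """Run the query several times and return the mean wall-clock time."""
    start = time.perf_counter()
    for _ in range(repeats):
        conn.execute(sql).fetchall()
    return (time.perf_counter() - start) / repeats

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, amount REAL)")
# Hypothetical data: 1 in 50 orders is still 'open'.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "open" if i % 50 == 0 else "closed", i * 1.5) for i in range(1, 10001)],
)

query = "SELECT id, amount FROM orders WHERE status = 'open'"

# Step 1: capture the plan and timing before any change.
before = plan_details(conn, query)
before_time = average_seconds(conn, query)

# Step 2: add the filtered (partial) index; amount is in the key so the query is covered.
conn.execute(
    "CREATE INDEX ix_orders_open ON orders(status, amount) WHERE status = 'open'"
)

# Step 3: capture the plan and timing again, then compare.
after = plan_details(conn, query)
after_time = average_seconds(conn, query)

print("before:", before, f"{before_time:.6f}s")
print("after: ", after, f"{after_time:.6f}s")
```

Compare the plans first. The "before" plan is a table scan, and the "after" plan should name the new index. The timings are only a rough, local signal.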

For more information on SQL Server filtered indexes, see TechNet.

For an example of using a SQL Server filtered index to improve performance, see my post.


Houston SQL Saturday #308 – Presenting

Saturday May 10th – Houston SQL Saturday #308

I will be presenting at the upcoming SQL Saturday in Houston on May 10th. For those of you not familiar with SQL Saturdays, they are all-day training events hosted by local SQL Server user groups. If you are interested in learning more about SQL Server, Analysis Services, Tabular models, or anything else in the SQL Server stack, this is a great way to pick up free training.

My session covers how to build a SQL Server 2012 Analysis Services Tabular model and present the information in Excel 2013. I am using the United States oil field production data set published by the government as the example data. I will also share tips and lessons learned from projects where I have built and deployed Tabular models. This session is geared toward people who are interested in building models but have limited experience with them.

The location for the event is:

San Jacinto College – South Campus
13735 Beamer Rd. Building 12
Houston, TX 77089

For more information on SQL Saturdays: SQLSaturday

For Houston’s SQL Saturday #308: SQLSaturday308


CISSP and Enterprise Architecture

One topic that professionals pursuing Enterprise Architecture roles should focus on is security. Enterprise Architects are responsible for designing the structure and interaction of IT systems to fulfill an organization’s business needs. However, not everyone moving into an EA role has broad exposure to, experience with, or understanding of the various types of security that make an IT environment more secure. Many EAs come from an application development background. Application developers understand security as it relates to system design, but they may not understand infrastructure or physical security. Regardless of your background, if you are pursuing work as an Enterprise Architect, studying the CISSP program is one way to learn about many aspects of security. Knowing more about information security and the standards within the framework will help you design new applications, evaluate vendor solutions, and decide how best to leverage newer technologies such as the cloud.

Enterprise solution vendors create solutions that reference the guidance within this program. Therefore, the better you understand the primary concepts within its domains, the better you will be able to architect solutions based on best practices.

The CISSP Program – Certified Information Systems Security Professional

Information security covers many areas of IT and encompasses many layers of security that protect your business from unauthorized access. In today’s business climate, organizations must be wary of threats originating outside the organization as well as internal security breaches. I believe that Enterprise Architects should study the CISSP body of knowledge. Doing so gives them a more holistic understanding of how best to implement security for their organization, and it helps them work with internal security staff to ensure proper measures are taken. The CISSP program covers the following 10 domains and provides many good examples of how to implement layers of security based on your needs:

  • Access Control
  • Telecommunications and Network Security
  • Information Security Governance and Risk Management
  • Software Development Security
  • Cryptography
  • Security Architecture and Design
  • Operations Security
  • Business Continuity and Disaster Recovery Planning
  • Legal, Regulations, Investigations and Compliance
  • Physical (Environmental) Security

Complete information on the CISSP certification and exam can be found here: CISSP

Whether or not you decide to take the CISSP certification exam, studying the material will help you become a better Enterprise Architect.

When Good BI Goes Bad – Data Quality

Regardless of the Technology, BI Is About Data

Many BI projects are sold as a technology solution. Vendors come in and wow executives with new features, visualizations, and dashboards. Along with the software, they promise a simple, fast implementation that can “connect to all of your data and produce fantastic reports”. The technology may well be able to connect to a variety of data sources and produce reports from those connections. However, vendors often leave out the critical question: how clean is the data you are reporting on?

Bad Data Ruins Great Technology

Data is the key reason you are building a business intelligence system. Without data that you can trust and base business decisions on, there is no reason to invest in new technology. Before you begin a BI project, take some time to investigate the quality of the information in the applications that will feed future BI projects. For each application, determine the appropriate level of “clean” data needed to make decisions. For instance, CRM data has many components, such as person, address, and contact history. When building a BI system on CRM data, the goal of the project determines which data elements are critical for success. If knowing how many customers you have in a location is critical, then you need high confidence in the address information in your CRM application. In this instance, how clean is clean? Is it complete information for 90% of all active customer addresses? Or is corrected and updated address information for 80% of all active customers more important? The answer, of course, is that it depends: the expected outcome determines what you need from your data.

Regardless of the current project, whenever you access data for a BI project, you should determine its current data quality. Then make a plan to correct it in the short term as well as address long-term needs. Data quality work that is done only once is not really data quality; it is a one-time cleanup. Work done on BI systems should lead the way in getting your production applications to accept feedback from the BI systems as an input for fixing source system information.
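One way to make “how clean is clean?” measurable is to compute a completeness score and compare it with a target. Below is a minimal Python sketch that treats an address as complete when every required field is populated. The Customer record, the REQUIRED_FIELDS list, the sample rows, and the 0.90 default target are all hypothetical. The sketch measures completeness only. Checking whether an address is corrected and up to date would need validation against a reference source, which it does not attempt.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical CRM address fields that must all be populated.
REQUIRED_FIELDS = ("street", "city", "state", "postal_code")

@dataclass
class Customer:
    customer_id: int
    active: bool
    street: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None
    postal_code: Optional[str] = None

def has_complete_address(customer: Customer) -> bool:
    """True when every required field is present and not just whitespace."""
    return all((getattr(customer, name) or "").strip() for name in REQUIRED_FIELDS)

def address_completeness(customers: Iterable[Customer]) -> float:
    """Share of active customers whose address is complete (0.0 to 1.0)."""
    active = [c for c in customers if c.active]
    if not active:
        return 0.0
    complete = sum(1 for c in active if has_complete_address(c))
    return complete / len(active)

def meets_target(customers: Iterable[Customer], target: float = 0.90) -> bool:
    """Compare measured completeness against a business-defined target."""
    return address_completeness(customers) >= target

# Hypothetical sample rows.
customers = [
    Customer(1, True, "1 Main St", "Houston", "TX", "77089"),
    Customer(2, True, "2 Oak Ave", "Houston", "TX", None),  # missing postal code
    Customer(3, True, "3 Elm Rd", "Dallas", "TX", "75201"),
    Customer(4, False, None, "Austin", "TX", "73301"),      # inactive, ignored
]
print(f"{address_completeness(customers):.1%}")  # 66.7%
print(meets_target(customers))                   # False
```

Running the same measurement on a schedule turns a one-time cleanup into an ongoing data quality check. The target itself is a business decision, not a technical one.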