Ir al contenido principal

SQL Server Analysis Services Neural Network Data Mining Algorithm

Problem
In data mining and machine learning circles, the neural network is one of the most difficult algorithms to explain. Fortunately, SQL Server Analysis Services allows for a simple implementation of the algorithm for data analytics.  Check out this tip to learn more.
Solution
In this tip, we show how to create a simple data mining model using the Neural Network algorithm in SQL Server Analysis Services 2012.
In Visual Studio (also known from the start menu as SQL Server Data Tools), create a new Analysis Services Multidimensional and Data Mining Project.
Introduction to the SQL Server Analysis Services Neural Network Data Mining Algorithm
In this tip, we will name the project NeuralNetworkExample. Click on OK when finished with the New Project window.
name the project NeuralNetworkExample
In the Solution Explorer window, right-click on the Data Sources folder and choose "New Data Source..." to initiate the Data Source Wizard.
right-click on the Data Sources folder
Click on "Next >".
initiate the Data Source Wizard
Choose your data connection, if one exists. If a data connection does not exist, click on "New..." to create a new data connection.
If a data connection does not exist, click on "New...
In this example, we are using a connection to the AdventureWorksDW2012 database on the localhost.
we are using a connection to the AdventureWorksDW2012 database
Click on "Next >".
Click on "Next >".
On the Impersonation Information screen, click on "Use a specific Windows user name and password." Enter your username and password. Click on "Next >".
Use a specific Windows user name and password
On the Completing the Wizard screen, the data source name can be changed if desired. Click on "Finish".
On the Completing the Wizard screen, the data source name can be changed if desired
The new data source will appear in the Solution Explorer.
The new data source will appear in the Solution Explorer.
In the Solution Explorer window, right-click on the Data Source Views folder and choose "New Data Source View..." to launch the Data Source View Wizard.
right-click on the Data Source Views folder and choose "New Data Source View...
Click on "Next >".
launch the Data Source View Wizard.
On the Select a Data Source page in the Relational data sources window, select the data source we created in the above step. Click on "Next >".
select the data source we created in the above step. Click on "Next >".
On the Select Tables and Views page, move the view vTargetMail from the Available Objects box to the Included object box by selecting vTargetMail in the Available objects box and then clicking on the ">" box. Click on "Next >".
the Select Tables and Views page
On the Completing the Wizard page, give the Data Source View a name and click on "Finish".
give the Data Source View a name and click on "Finish".
The data source view now appears in the Solution Explorer window. In the center of the image below, we see the columns listed in the view vTargetMail. Many of these columns such as First Name and Name Style are not useful for data mining. Also, the Spanish and French columns will be redundant because there is an English equivalent. We can select the specific columns we would like for input to the neural network algorithm when we create our data mining model, or we can create a new Named Query to restrict the available columns within the data source view.
The data source view now appears in the Solution Explorer window.
To create a new Named Query, right-click next to the vTargetMail object and choose "New Named Query" as shown below.
create a new Named Query
Enter the query as shown below, give the named query a name and then click "OK".
give the named query a name and then click "OK".
The new Named Query, ColumnsForDataMining, now appears next to the view vTargetMail. Right-click on the Mining Structures folder and select "New Mining Structure..." to launch the Data Mining Wizard.
The new Named Query, ColumnsForDataMining, now appears next to the view vTargetMail.
Click on "Next >".
Right-click on the Mining Structures folder and select "New Mining Structure..."
Press the "From existing relational database or data warehouse" radio button and then click "Next >".
"From existing relational database or data warehouse" radio button and then click "Next >".
Select the Microsoft Neural Network as the data mining technique.
the Microsoft Neural Network
On the Select Data Source View page, we will use our previously defined objects. Click on "Next >".
On the Select Data Source View page
Next, check the Case box on the ColumnsForDataMining line. Click on "Next >".
check the Case box on the ColumnsForDataMining line
On the Specify the Training Data page, check the box in the Key column that corresponds with the CustomerKey column. The remainder of the columns will be used as input. The BikeBuyer column will be our class label, so we check the Predictable box for the BikeBuyer column. Click on "Next >".
On the Specify the Training Data page, check the box in the Key column that corresponds with the CustomerKey column.
The default values are shown below on the Specify Columns' Content and Data Type page. Click on the Detect button.
The default values are shown below on the Specify Columns' Content and Data Type page.
Notice how the content type for Bike Buyer, Number Cars Owned, Number Children At Home, and Total Children changed from Continuous to Discrete. Also, the data type for Bike Buyer can be changed from Long to Boolean because the distinct values for the Bike Buyer column are zero and one. Click on "Next >".
the data type for Bike Buyer can be changed from Long to Boolean because the distinct values for the Bike Buyer column are zero and one. Click on "Next >".
We will use 30% of our data for testing the mining model's accuracy. Click on "Next >".
We will use 30% of our data for testing the mining model's accuracy
On the Completing the Wizard screen, we can rename the mining structure name and the mining model name. Click on "Finish".
On the Completing the Wizard screen, we can rename the mining structure name and the mining model name.
Our mining structure now appears in the Solution Explorer.
Our mining structure now appears in the Solution Explorer.
The Mining Structure tab is selected by default. At this point the Analysis Services objects reside in the Visual Studio project. Click on the Mining Model Viewer tab.
At this point the Analysis Services objects reside in the Visual Studio project
Visual Studio will attempt to deploy the SSAS objects to the server specified in the project properties. When asked if "Would you like to build and deploy the project first?", choose "Yes".
Visual Studio will attempt to deploy the SSAS objects to the server specified in the project properties
When given the warning about the time it could take to process the mining model and asked "Do you wish to continue?", choose "Yes". The number of records in the view is not a large amount, so it should not take more than a minute to process.
"Do you wish to continue?", choose "Yes".
When the Process Mining Model window appears, press the "Run..." button.
When the Process Mining Model window appears, press the "Run..." button.
The Process Progress window will appear. When the process completes successfully select "Close" in the Process Progress window and "Close" again in the Process Mining Model window.
When the process completes successfully select "Close" in the Process Progress window and "Close" again
Depending on your hardware configuration, the Load Mining Model Content window might appear stating to "Please wait...".
he Load Mining Model Content window might appear stating to "Please wait...".
The Deployment Progress window will appear also stating that the SSAS objects were successfully deployed to the Analysis Services server.
The Deployment Progress window will appear also stating that the SSAS objects were successfully deployed to the Analysis Services server.
In the Mining Model Viewer tab, we can see which attributes and their values favor the False (non-buyer) classification and which attributes and their values favor the True (buyer) classification. The wider the blue bar the more that specific key-value pair will favor a particular classification. In the example shown here, when there are 3 children in the customer's home, then the customer tends to be a non-buyer of bikes. When the customer's yearly income is between approximately 79,000 and 154,000, then the customer tends to be a bike buyer.
the Mining Model Viewer tab
When we click on the Mining Accuracy Chart and then click on the Classification Matrix page, we can see the confusion matrix for the Neural Network algorithm. This displays the count of true positives, true negatives, false positives and false negatives.
click on the Mining Accuracy Chart and then click on the Classification Matrix page

Summary

In this tip, we have provided an introduction to the Neural Network data mining algorithm in SQL Server 2012 Analysis Services.

Comentarios

Entradas populares de este blog

¿En qué puerto TCP escucha SQL Server 2005? ¿Cómo cambiar o configurar el puerto TCP de escucha de una Instancia de SQL Server 2005?

Una buena práctica inmediatamente después de instalar SQL Server 2005 es cambiar el Puerto TCP de escucha, por múltiples motivos: Seguridad, Configuración de reglas de acceso de Firewall, Aplicaciones cliente que requieren un puerto TCP estático para SQL Server, etc. En este Artículo se explica cómo averiguar en qué puerto TCP escucha SQL Server 2005, cómo cambiar el puerto TCP de escucha de SQL Server 2005, etc. Resulta de gran interés ser capaz de responder a la pregunta  ¿En qué puerto TCP escucha SQL Server?  Por defecto, una  Instancia por Defecto de SQL Server 2005  queda configurada durante la instalación para escuchar en el  puerto TCP-1433 , sin embargo,  las Instancias con Nombre  quedan configuradas durante el proceso de instalación para escuchar en  puertos TCP dinámicos , por lo tanto, cada vez que se inicie la Instancia puede que escuche en un puerto diferente. Esta situación puede resultar problemática, por un lado desde el punto de vista de la seguridad (el hecho d

¿Qué es el nivel de aislamiento (Isolation Level) de una Transacción? ¿Qué niveles de aislamiento ofrece SQL Server?

Esta capítulo explica qué es el nivel de aislamiento (isolation level) de una transacción, el comportamiento de SQL Server en operaciones de lectura o de escritura, se detallan los diferentes niveles de aislamiento basados en bloqueos (READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE) y los niveles de aislamiento basados en versionado de filas (READ COMMITTED SNAPSHOT, SNAPSHOT), se explican los males de la concurrencia (lecturas sucias, lecturas no repetibles, lecturas fantasma, y conflictos de actualización), como establecer el nivel de aislamiento deseado (SET TRANSACTION ISOLATION LEVEL y las opciones de base de datos READ_COMMITTED_SNAPSHOT y ALLOW_SNAPSHOT_ISOLATION), cómo conecer el tiempo máximo de bloqueo (@@LOCK_TIMEOUT) y como establecer el tiempo máximo de bloqueo (SET LOCK_TIMEOUT), etc. El nivel de aislamiento de una transacción (transaction isolation level)  define el grado en que se aísla una transacción de las modificaciones de recursos o datos rea

Aumentar y Reducir la TempDB en SQL Server?

Una buena práctica después de instalar SQL Server (y que interesa revisar periódicamente) es tener bien dimensionada la base de datos TEMPDB, es decir que el  tamaño inicial de TEMPDB  sea suficiente, y en consecuencia no sea necesario que TEMPDB crezca ni tampoco reducir TEMPDB (SHRINK). Esta artículo explica brevemente  para qué sirve TEMPDB , explica camo cambiar el tamaño inicial de TEMPDB (aumentar o reducir),  cómo reducir TEMPDB , cuántos ficheros son recomendables para TEMPDB, etc. ¿Para qué sirve TEMPDB?  La base de datos TEMPDB es un elemento de gran importancia en una Instancia de SQL Server, ya que  TEMPDB es la encargada de almacenar tanto los objetos temporales  (tablas temporales, procedimientos almacenados temporales, etc.),  como los resultados intermedios  que pueda necesitar crear el motor de base de datos, por ejemplo durante la ejecución de consultas que utilizan las cláusulas GROUP BY, ORDER BY, DISTINCT, etc. (es decir, las tablas temporales o WorkTables que