Monday, April 8, 2013
Friday, March 22, 2013
How to Process Analysis Cube using Command
The ascmd command-line utility enables a database administrator to execute an XMLA script, Multidimensional Expressions (MDX) query, or Data Mining Extensions (DMX) statement against an instance of Microsoft SQL Server Analysis Services. This command-line utility contains functionality for Analysis Services that resembles the sqlcmd utility included with SQL Server. For more information, see the topic sqlcmd Utility in SQL Server. The execution results of the script, query, or statement can be stored in a file together with relevant SQL Server Profiler trace information
XMLA Example:
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
<Parallel>
<Process xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Object>
<DatabaseID>Adventure Works DW</DatabaseID>
<CubeID>Adventure Works DW</CubeID>
<MeasureGroupID>Fact Internet Sales 1</MeasureGroupID>
<PartitionID>Internet_Sales_2001</PartitionID>
</Object>
<Type>ProcessFull</Type>
<WriteBackTableCreation>UseExisting</WriteBackTableCreation>
</Process>
</Parallel>
</Batch>
This example uses an XMLA Statement to fully process the Internet_Sales_2001 partition.
_________________________________________________________________
Processing Multiple Partitions
In this scenario, you process multiple partitions by using the ascmd command-line utility. You use scripting variables in the XMLA processing script (process.xmla) to specify the degree of parallelism, the database and cube names, and the process type. This XMLA script also demonstrates the use of comments in an XMLA script. When you call the process.xmla processing script from the ascmd command-line utility, you specify the server and database name, an output file for XMLA results, a trace file for trace events, the trace level, and the degree of parallelism in a batch bat (process.bat). The trace file will contain the same events and information as SQL Server Profiler would return if an administrator was monitoring the system during the processing.
process.xmla file:
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
<Parallel maxparallel="$(MAXPARALLEL)">
<!-- SEE ABOVE FOR HOW MANY PARITIONS PROCESSED IN PARALLEL -->
<Process xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Object>
<DatabaseID>$(ASCMDDBNAME)</DatabaseID>
<CubeID>$(ASCMDDBNAME)</CubeID>
<!-- Just so happens CubeID=DatabaseID=Database name :-) -->
<MeasureGroupID>Fact Internet Sales 1</MeasureGroupID>
<PartitionID>Internet_Sales_2001</PartitionID>
</Object>
<Type>$(PROCESSTYPE)</Type>
<WriteBackTableCreation>UseExisting</WriteBackTableCreation>
</Process>
<Process xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Object>
<DatabaseID>$(ASCMDDBNAME)</DatabaseID>
<CubeID>$(ASCMDDBNAME)</CubeID>
<MeasureGroupID>Fact Internet Sales 1</MeasureGroupID>
<PartitionID>Internet_Sales_2002</PartitionID>
</Object>
<Type>$(PROCESSTYPE)</Type>
<WriteBackTableCreation>UseExisting</WriteBackTableCreation>
</Process>
<Process xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Object>
<DatabaseID>$(ASCMDDBNAME)</DatabaseID>
<CubeID>$(ASCMDDBNAME)</CubeID>
<MeasureGroupID>Fact Internet Sales 1</MeasureGroupID>
<PartitionID>Internet_Sales_2004</PartitionID>
</Object>
<Type>$(PROCESSTYPE)</Type>
<WriteBackTableCreation>UseExisting</WriteBackTableCreation>
</Process>
<Process xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Object>
<DatabaseID>$(ASCMDDBNAME)</DatabaseID>
<CubeID>$(ASCMDDBNAME)</CubeID>
<MeasureGroupID>Fact Internet Sales 1</MeasureGroupID>
<PartitionID>Internet_Sales_2003</PartitionID>
</Object>
<Type>$(PROCESSTYPE)</Type>
<WriteBackTableCreation>UseExisting</WriteBackTableCreation>
</Process>
</Parallel>
</Batch>
process.bat file:
@echo off
call :generate-timestamp
ascmd -S myserver -d "Adventure Works DW" -i process.xmla
-o process.xml -T process-%timestamp%.csv -Tl medium
-v maxparallel=4 processtype=ProcessFull
if ERRORLEVEL 1 goto errseen
goto :EOF
:errseen
echo ** Error seen in processing
goto :EOF
:generate-timestamp
set now_date=%date%
set now_time=%time%
set now_Year=%now_date:~10,4%
set now_Month=%now_date:~4,2%
set now_Day=%now_date:~7,2%
set now_Hour=%now_time:~0,2%
set now_Min=%now_time:~3,2%
if "%now_Hour:~0,1%"==" " set now_Hour=0%now_Hour:~1,1%
set timestamp=%now_year%%now_month%%now_day%_%now_hour%%now_min%
goto :EOF
Notice that the batch file uses a timestamp in the output file so that multiple runs can be recorded at the same time.
Ways to Execute a SQL Agent Job
Let’s say I have a job called “BACKUPTEST” which backups the test databases. I want to be able to execute the job “On Demand”, so whenever anyone needs to do the backup this can be done. In this article I will show you how you can execute such Jobs easily through various ways.
In this tip we will look at these four methods:
- SQL Server Management Studio
- T-SQL commands
- DMO (Distributed Management Objects)
- OSQL
Also, this tip assumes that the jobs have already been setup.
(1) - SQL Server Management Studio
The first way that most people are probably aware of is to use SQL Server Management Studio.
SQL Server Agent is the job scheduling tool for SQL Server.
To execute a job on demand using the GUI, open the SQL Server Agent tree, expand Jobs, select the job you want to run, right click on that job and click ‘Start Job’ and the job will execute.
(2) -T-SQL commands
The second way is through a T-SQL statement using the 'sp_start_job' stored procedure which instructs SQL Server Agent to execute a job immediately. It is a stored procedure in the 'msdb' database.
The syntax for the sp_start_job stored procedure is:
sp_start_job [@job_name] or [@job_id ] [,@error_flag ] [,@server_name] [,@step_name ] [,@output_flag ] |
Arguments:
[@job_name] | [@job_id ] | Is the name of the job to start. Either job_id or job_name must be specified, but both cannot be specified. job_name is sysname, with a default of NULL. |
[@error_flag =] error_flag | Reserved. |
[@server_name =] 'server_name' | Is the target server on which to start the job. server_name is nvarchar(30), with a default of NULL. server_name must be one of the target servers to which the job is currently targeted. |
[@step_name =] 'step_name' | Is the name of the step at which to begin execution of the job. Applies only to local jobs. step_name is sysname, with a default of NULL |
[@output_flag =] output_flag | Reserved. |
When a job run it will have one of two return codes:
- 0 (success)
- 1 (failure)
To run the job ‘BACKUPTEST’ it can be executed by a single T-SQL statement: such as:
EXEC msdb.dbo.sp_start_job 'BACKUPTEST' |
(3) -DMO (Distributed Management Objects)
Another way of executing the job is through a VBS script using Distributed Management Objects (DMO).
Here is the basic script syntax.
On Error Goto 0: Main() Sub Main() Set objSQL = CreateObject("SQLDMO.SQLServer") ' Leave as trusted connection objSQL.LoginSecure = True ' Change to match the name of your SQL server objSQL.Connect "Enter Server Name Here" Set objJob = objSQL.JobServer For each job in objJob.Jobs if instr(1,job.Name,"Enter Job Name Here") > 0 then msgbox job.Name job.Start msgbox "Job Started" end if Next End Sub |
Here is sample executing the "BACKUPTEST" job on server "SQLTEST1". This uses NT authentication to run this script.
On Error Goto 0: Main() Sub Main() Set objSQL = CreateObject("SQLDMO.SQLServer") ' Leave as trusted connection objSQL.LoginSecure = True ' Change to match the name of your SQL server objSQL.Connect "SQLTEST1" Set objJob = objSQL.JobServer For each job in objJob.Jobs if instr(1,job.Name,"BACKUPTEST") > 0 then msgbox job.Name job.Start msgbox "Job Started" end if Next End Sub |
This code would then be saved in a file and named something like "RunJob.vbs". You can then double click on the file to execute it or run the code from a command line.
(4) - Using osql utility
Lastly, we can start the job using osql commands.
The osql utility allows you to enter T-SQL statements, system procedures, and script files.
Here is the basic script syntax.
osql -S "Enter Server Name Here" -E -Q"exec msdb.dbo.sp_start_job 'Enter Job Name Here'" |
Open a command prompt and execute the below osql command in it:, replacing your server name and job name.
osql -S "SQLTEST1" -E -Q"exec msdb.dbo.sp_start_job 'BACKUPTEST'" |
The next step is to make a batch file which can be run over and over again.
Open notepad and type the commands as follow:
Save the file as “job.bat”.
The batch is now ready for use. Just double click on it and it will do the maintenance work without having any knowledge of SQL Server.
Permissions
You might have noticed in all the four solutions the msdb stored procedure ‘sp_start_job’ is used in one way or another.
By default, members of the sysadmin fixed server role can execute this stored procedure.
Other users must be granted one of the following SQL Server Agent fixed database roles in the msdb database:
- SQLAgentUserRole
- SQLAgentReaderRole
- SQLAgentOperatorRole
Members of SQLAgentUserRole and SQLAgentReaderRole can only start jobs that they own.
Members of SQLAgentOperatorRole can start all local jobs including those that are owned by other users.
Members of sysadmin can start all local and multiserver jobs.
Data Mining with Microsoft Office Excel
Introduction
A key issue in data mining is not only generating the results but also interpreting them. Unfortunately it is often difficult to interpret the results of data mining and many users are turning to Excel and its data mining add-in tool.
Requirements
You will need to have permission to access SQL Server Analysis Services. You will need to have Excel installed (preferably 2007 or later). The Microsoft SQL Server Data Mining Add-ins for Microsoft Office 2007 can be downloaded from:
This package includes two add-ins for Microsoft Office Excel 2007 (Table Analysis Tools and Data Mining Client) and one add-in for Microsoft Office Visio 2007 (Data Mining Templates). In this article, will focus on the Table Analysis Tool for Excel 2007.
Unfortunately, this add-in does not work in a 64 bit environment and Microsoft has even said we won’t get 64bit support version in the near future.
Installation
After the standard dialog box in the installation, you will get the following dialog requesting you to select your installation options.
By default, the Table Analysis Tools for Excel is selected and if you wish, you can select the other two options.
Configuration
In the start menu under Microsoft SQL 2008 Data Mining Add-ins, you will see a short cut named Server Configuration Utility which is the tool you to configure the add-in.
At the first dialog, you need to enter the analysis server name, the authentication method should be Windows Authentication as analysis services does not support SQL Server Authentication.
You need to allow the creation of the temporary mining model in SQL Server Analysis Service(SSAS). You can enable this from SQL Server Analysis Service by selecting the properties of the Analysis Service logging in to the SQL Server Analysis Services from SQL Server Management Studio.
Also, you can configure this from the installation wizard.
In the wizard, the next step is to create a SSAS database or else you can allow temporary mining models in the SSAS database.
With the installation, you will get a sample Excel file named DMAddings_SampleData.xlsx to test with. When you open the excel file, in the Analyze ribbon you will see that available for table analyze:
The below table lists the data mining algorithms used by each analysis option.
Menu Option | Data Mining Algorithm |
Analyze Key Influencers | Naïve Bayes |
Detect Categories | Clustering |
Fill form Example | Logistic Regression |
Forecast | Time Series |
Highlight Exceptions | Clustering |
Scenario Analysis (Goal Seek) | Logistic Regression |
Scenario Analysis (What If) | Logistic Regression |
Prediction Calculator | Logistic Regression |
Shopping Basket Analysis | Association Rule |
Analyze Key Influences
When you have a data set there can be several attributes. For example, in case of a customer dataset, there can be several attributes like Marital Status, Gender, Income and Occupation etc. Analyze Key Influences examines what are the influencing factors for a given attribute. In the Table Analysis Tools Sample sheet, you can find a table with customer attributes.
Let us assume that we wish to analyse the Occupation column for key factors.
1. First select Analyze Key Influences from the Analyze ribbon.
2. Next select the Occupation column for analyze:
3. Next, choose the columns for analysis by clicking Choose columns to be used for analysis
By default, all columns are selected. However, you can ignore columns such as ID. If there are derived columns you should ignore these as well. For example, there can be a bonus column which is based on a percentage of the salary column.
Press OK for this dialog and click Run for the other dialog and you will get the following results in a new sheet in the same Excel file.
The abThe above table indicates that Income range 39,050 – 71,062 is more favored to Skilled Manual Occupation.
The Relative Impact column, which is the last column, indicates the relative impact to the Occupation. This is a type of rank with strongest getting a score100 and the weakest a score of 0.
In case you want to find to out the relationship between selected factors, you can use the table filtering option in Excel. By doing this you can ignore unnecessary values. For example, if you want to find out the influencing factors for Age and Income towards Occupation, you can filter the column field and it will look as below:
Discrimination on Key Influencers
When generating a key influencing report, you can add any number of discrimination reports which compare how key factors differentiate between them.
From the following dialog box you can view a report of the discrimination between occupations of Skilled manual and Clerical occupations.
The below shows the discrimination report between Skilled Manual and Clerical. The final column was added manually and not by the add-in tool. This column is the sum of column 3 and column 4.
The above report will list factors that will influence for the selected occupations in descending order. You can add any number of discrimination reports, but you have to generate them at the instance and so you cannot add them later.
The influencers presented in the discrimination report are often different from those presented in the main report. Some influencers from the main report are not present in the discrimination report. Those are important when comparing Skilled manual and Clerical against all other occupation types.
Detect Categories
When working with large sets of data, it is much easier to deal with a smaller and more manageable sub set of the data which has similar behavior patterns. This practice is called Customer Segmentation.
The Detect Categories table analysis tools finds natural groups in the data. It analyses your data, finds the most common combinations of column values and then defines groups based on these common patterns.
As in the previous case, you can use the Table Analysis Tools Sample Excel sheet. After selecting the table in the sheet, select Detect Categories from the Analyze table which will bring up following dialog box:
As in the previous case, you need to select the attributes that need to be categorized.
Fill Form
Fill form is a prediction technique using sample or known data. If you navigate to the next sheet, Fill Form Example, you will see something like this:
In the above data set you will see High Value Customer for the first 10 rows but the rest of the rows are not filled.
Fill Form will identify the pattern for the existing data and it will fill the blank columns.
You need to click the Fill Form button from the add-in ribbon, following screen will be displayed:
In the above dialog, you need to select the column that need to predict. By clicking the link Choose columns to be used for analysis you can select the columns you want to take account for in making the predictions. For example, you can ignore the ID column.
When you click Run button, the new column High Value Customer_Extended is added to the sheet in which you will get the predictions for the high value customer. Apart from this, the new sheet will be added as follows.
The above sheet will tell you what the most and least influencing factors are when predicting the selected column.
Highlight Exceptions
When collecting data, there can be anomalies due to various reasons. Data mining techniques can be used to detect those anomalies. Highlight Exceptions is a technique available to detect anomalies.
After getting into the sheet and selecting the columns you need to analyze, click the Highlight Exception button.
The exception rows will be highlighted, along with the column which is the reason for the row to become exception.
Apart from highlighting the exception rows, there will be another sheet added to the workbook which is summarization of anomalies:
This should get you started using Excel for SQL Server data mining, in future I hope to write another articles explaining other techniques such as Forecast, Scenario Analysis, Prediction Calculator and Shopping Basket Analysis.
Which of Your Stored Procedures are Using the Most Resources?
When you start to look at tuning up your stored procedures where do you start? If your database has only a few stored procedures, it might be fairly simple to figure this out by doing a quick test to determine which stored procedure is using the most resources. But what if you have hundreds of them? How do you identify those that are using the bulk of the resources? In this article I will show you how to quickly identify the worst performing stored procedures on your server.
Getting Metrics Related to Stored Procedure Executions
How do you set about getting figures for the time taken by stored procedures, or the amount of resources used in their execution? There is more than one way to discover these execution metrics. I am not going to mention all the different methods: Heck, I probably don’t even know all the different ways! Instead, I will focus in on how to use Dynamic Management Views and Function (DMVs and DMFs) to identify resources used by stored procedures. DMVs and DMFs were introduced in SQL Server 2005 and then enhanced with the introduction SQL Server 2008. I am sorry to say that, if you are running on a SQL Server version prior to 2005, then this article will not help you in identifying your worst performing stored procedures.
I’m going to show you two different DMV methods to get stored procedure execution metrics. One method works in both SQL Server 2005 and SQL Server 2008, while the other method is only available in SQL Server 2008. Let me explain both of these methods.
Before I show you these two methods, first let me explain a little about how SQL Server collects the DMV metrics for stored procedure executions. These two different DMVs are able to obtain these figures only for those stored procedures that have execution plans in the procedure cache. The SQL Server engine starts gathering information about the performance of a stored procedure the first time that the plan is placed in the procedure cache. As long as the stored procedure’s execution plan stays in the cache, SQL Server keeps updating the performance figures every time the stored procedure is executed. If the execution plan for a stored procedure is dropped from the procedure cache then the metrics for the stored procedure are also removed. Therefore you will only be able to obtain metrics for those stored procedures that have cached plans, and those metrics will contain a summarization of the amount of resources used since the stored procedure was compiled, and its’ execution plan was placed in the procedure cache.
sys.dm_exec_procedure_stats
The first DMV that I’d like to discuss is sys.dm_exec_procedure_stats. This DMV was introduced with SQL Server 2008. By using this DMV, you can returns metrics for stored procedures, but it does not directly identify the stored procedure by name. Instead, the DMV only identifies the object_id, and a database_id for each stored procedure: Therefore to identify the actual stored procedure name, that the performance figures belong to, you should either join the output of this DMV with one of the system views within the appropriate database, or use a few metadata functions. Here is an example of how you might do that:
SELECT CASE WHEN database_id = 32767 then ‘Resource’ ELSE DB_NAME(database_id)END AS DBName
,OBJECT_SCHEMA_NAME(object_id,database_id) AS [SCHEMA_NAME]
,OBJECT_NAME(object_id,database_id)AS [OBJECT_NAME]
,* FROM sys.dm_exec_procedure_stats
You can see, by looking at this query, that I used the “database_id” and the “object_id” columns from the sys.dm_exec_procedure_stats and then used a couple of metadata functions to identify the name of the database, the schema_name, and the object name for each record returned from this DMV. Run this query on one of your SQL Server instances and see what comes back. Let me explore a couple of key columns values returned by this view, and explain what they mean and how you can use them.
First, let’s look at the “cached_time” column. This column contains the time in which the stored procedure was compiled and placed in the procedure cache. You can use this column to identify how long the database engine has been collecting metrics for the identified stored procedure. This is very useful when comparing the amount of resources that one stored procedure has consumed over another. By knowing how long SQL Server has been collecting metrics will help you determine how skewed the resource usage metrics might be when comparing two different stored procedures.
The next column you might be interested in looking at is the “execution_count” . This column identifies the number of times the stored procedure has been executed since it has been placed in the procedure cache. When you are looking for tuning opportunities you need to consider tuning those store procedures that have been executed many times. Making small performance gains in a stored procedure that has been executed 100,000 times might be much more beneficial than speeding up a stored procedure two fold if it has only been executed a few timea. To understand what each column value represents in this DMV I would recommend you refer to Books Online.
When I look at stored procedure metrics, I like to identify the average amount of resources used for a single execution of a stored procedure. Calculating the average resources usage per execution allows you to more easily compare one stored procedure to another. To calculate these average resource usage numbers, I divide the total used number by the “execution_count” column. Here is some code where I used this technique to calculate average resources usage numbers for stored procedures that have metrics on my system:
SELECT CASE WHEN database_id = 32767 then ‘Resource’ ELSE DB_NAME(database_id)END AS DBName
,OBJECT_SCHEMA_NAME(object_id,database_id) AS [SCHEMA_NAME]
,OBJECT_NAME(object_id,database_id)AS [OBJECT_NAME]
,cached_time
,last_execution_time
,execution_count
,total_worker_time / execution_count AS AVG_CPU
,total_elapsed_time / execution_count AS AVG_ELAPSED
,total_logical_reads / execution_count AS AVG_LOGICAL_READS
,total_logical_writes / execution_count AS AVG_LOGICAL_WRITES
,total_physical_reads / execution_count AS AVG_PHYSICAL_READS
FROM sys.dm_exec_procedure_stats
ORDER BY AVG_LOGICAL_READS DESC
Here I have ordered the results by the average logical reads per execution of a stored procedure. By doing this I can tell which SP has performed the most logical I/O’s per execution. When you run this query on your system, remember to keep an eye on the “execution_count” column. This will help you identify those stored procedures that are executed frequently. By modifying the query above and ordering the results based on a different column you can compare each stored procedure based on different average resource usage. Keep in mind when you are looking out the output from different queries, it might not be as beneficial to tune a procedure that is only executed once, over a procedure that is executed many time.
If you don’t have SQL Server 2008, that doesn’t mean you are out of luck at finding stored procedure metrics. In the next section I will show you how to get the similar results as the above query using DMVs and DMFs that are available in both SQL Server 2008 and SQL Server 2005.
sys.dm_exec_query_stats
The sys.dm_exec_query_stats DMV can be used to obtain statement level metrics. Once again, these metrics are only available if the statement comes from a cached plan. By using this DMV and summarizing the statement metrics up to the plan level, you are able to identify metrics for stored procedures. The code below can be used in SQL Server 2005 and SQL Server 2008 to produce similar results as I did above using the sys.dm_exec_procedure_stats DMV method:
SELECT CASE WHEN dbid = 32767 then ‘Resource’ ELSE DB_NAME(dbid)END AS DBName
,OBJECT_SCHEMA_NAME(objectid,dbid) AS [SCHEMA_NAME]
,OBJECT_NAME(objectid,dbid)AS [OBJECT_NAME]
,MAX(qs.creation_time) AS ‘cache_time’
,MAX(last_execution_time) AS ‘last_execution_time’
,MAX(usecounts) AS [execution_count]
,SUM(total_worker_time) / SUM(usecounts) AS AVG_CPU
,SUM(total_elapsed_time) / SUM(usecounts) AS AVG_ELAPSED
,SUM(total_logical_reads) / SUM(usecounts) AS AVG_LOGICAL_READS
,SUM(total_logical_writes) / SUM(usecounts) AS AVG_LOGICAL_WRITES
,SUM(total_physical_reads) / SUM(usecounts)AS AVG_PHYSICAL_READS
FROM sys.dm_exec_query_stats qs
join sys.dm_exec_cached_plans cp on qs.plan_handle = cp.plan_handle
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)
WHERE objtype = ‘Proc’
AND text
NOT LIKE ‘%CREATE FUNC%’
GROUP BY cp.plan_handle,DBID,objectid
One thing to note about this code, the “last_execution_time”, “AVG_CPU” and “AVG_ELAPSED” will have a slightly different values then using the sys.dm_exec_procedure_stats method. It makes sense that these numbers will not be exactly the same since these values are calculated by rolling up the individual statement metrics to calculate the totals at the stored procedure level. If you look closely at the different values produced by these two methods you will find a lower CPU and elapsed time value is reported using this method, and the last execution date will be a little close to the current time.
Getting Stored Procedure Metrics
When doing performance analysis of your system it is good to be able to find those stored procedures that are using the most resources. By know this information you can focus in and tune those stored procedures that provide you the biggest performance improvement for your effort. Next time you are wondering which stored procedures are using the most resource you will now have a way to identify that regardless of whether you are monitoring SQL Server 2005 or SQL Server 2008.
Thursday, March 21, 2013
Execute SQL statements in Integration Services
Takeaway: Executing SQL statements are one of the easiest and most common tasks that occur in SQL Server 2005 Integration Services (SSIS). This tutorial walks you through the process of adding SQL statements to your SSIS packages and bundling them into logical units of work if necessary.
Executing SQL statements is one of the easiest and most common tasks that occur in SQL Server 2005 Integration Services (SSIS). This tutorial walks you through the process of adding SQL statements to your SSIS packages and bundling them into logical units of work if necessary.
Add and bundle SQL statements to SISS packages
Before I can execute a SQL statement inside of an SSIS package, I need to create a Data Source to specify the SQL Server and database to which I want to connect. To do this, I right-click the Data Sources folder in my Solution Explorer window and select New Data Source (Figure A). If you cannot find your Solution Explorer window, go to the View menu and select Solution Explorer.
Figure A
Selecting the New Data Source option brings up the Data Source Wizard. Since I haven't defined a Data Source, I click the New button (Figure B).
Figure B
I am working on the machine that hosts the SQL Server instance to which I will connect, so I type(local) in the Server Name drop-down list. From there, I can select the database to which I want to connect (Figure C).
Figure C
I name my new Data Source (Figure D). I can add additional data sources if my project will need it. An SSIS package can connect to several different data sources — even data sources of different types — within the same package. For the purposes of this example, I will only need the connection I just created.
Figure D
Now that I have defined a connection, it is time for me to move on to the objects that I will use to execute my SQL statements. From the Control Flow menu, I want to drag three Execute SQL Task task options onto my Control Flow screen (Figure E). I am dragging three of these task items because I plan to create three tables that I will import data into in next week's article. I am adding three of these task items because I plan to create three tables that I will import data into in a future article. These tasks will be used to create the BigScreenProducts table, the ComputerProducts table, and the PoolTableProducts table. To rename these individual task items, right-click the item and select Rename.
Figure E
To specify the details of the task items, I double-click the first item, which brings up the Execute SQL Task Editor window (Figure F).
Figure F
In this window, I specify the data server to which I want to connect. I select the Data Source that I previously created and click OK (Figure G).
Figure G
Once I establish my Data Source, it is time to define the SQL statement that I want to execute. Here are my three options for executing the statement:
- Direct Input, which means I will be tying in the SQL statement into a window.
- Specify a file that contains the SQL statement.
- Specify a variable that contains the SQL statement.
I will explore the last option more in future articles, but for now, I'll concentrate on the Direct Input option. Click the ellipsis symbol (…) in the Execute SQL Task Editor window and enter the SQL Script in Figure H.
Figure H
The code for the statement is below:
IF OBJECT_ID('BigScreenProducts') IS NOT NULL DROP TABLE BigScreenProducts GO CREATE TABLE BigScreenProducts ( [SaleID] INT, [Product] VARCHAR(50), [SalePrice] MONEY )
This script checks to see if the BigScreenProducts table exists and drops it if it does. I then create the BigScreenProducts table.
Repeat the following Execute SQL Task steps for the remaining two task items — simply change the name of the table in the SQL script. You can replace BigScreenProducts with PoolTableProducts in the second task and replace it with ComputerProducts in the third task.
I like to group statements together that I feel execute as one logical unit of work. I feel like creating these tables is one logical unit of work, so I want to include them in a Sequence Container task item. From the Control Flow Items menu, I drag the Sequence Container task onto my Control Flow menu; I then drag my three Execute SQL Tasks into this container object. I like to do this to ensure that these items execute together and to abstract them from other parts of my Control Flow menu if it is going to be complex (Figure I).
Figure I
All that is left is to execute the package. Because I didn't define any precedence control inside of the package, the creation of the three tables happens at almost exactly the same time. Figure Jdepicts a successful package execution.
Figure J