什么时候写“ad hoc sql"比较好?VS 存储过程

问题描述:

在我的应用程序中,我有 100% 的临时 sql.我的一个朋友建议我转换为存储过程以获得额外的性能和安全性.这让我想到了一个问题,除了速度和安全性之外,还有其他理由坚持使用即席 sql 查询吗?

I have 100% ad hoc sql through out my application. A buddy of mine recommended that I convert to stored procedures for the extra performance and security. This brought up a question in my mind, besides speed and security is there any other reason to stick with ad hoc sql queries?

SQL Server 缓存了临时查询的执行计划,因此(扣除第一次调用所花费的时间)这两种方法在速度方面是相同的.

SQL Server caches the execution plans for ad-hoc queries, so (discounting the time taken by the first call) the two approaches will be identical in terms of speed.

一般来说,使用存储过程意味着获取应用程序所需的一部分代码(T-SQL 查询)并将其放在不受源代码控制的地方(它可以 是,但通常不是),并且其他人可以在您不知情的情况下对其进行更改.

In general, the use of stored procedures means taking a portion of the code needed by your application (the T-SQL queries) and putting it in a place that is not under source control (it can be, but usually isn't) and where it can be altered by others without your knowledge.

将查询放在像这样的中心位置可能是一件好事,这取决于有多少不同的应用程序需要访问它们所代表的数据.我通常发现将应用程序使用的查询保留在应用程序代码本身中要容易得多.

Having the queries in a central place like this may be a good thing, depending upon how many different applications need access to the data they represent. I generally find it much easier to keep the queries used by an application resident in the application code itself.

在 1990 年代中期,传统观点认为 SQL Server 中的存储过程是性能关键情况下的最佳选择,而当时确实如此.然而,这个 CW 背后的原因已经很长时间没有成立了.

In the mid-1990's, the conventional wisdom said that stored procedures in SQL Server were the way to go in performance-critical situations, and at the time they definitely were. The reasons behind this CW have not been valid for a long time, however.

更新:此外,在关于存储过程的可行性的争论中,经常会提到需要防止 SQL 注入来保护 proc.当然,没有人认为通过字符串连接组装临时查询是正确的做法(尽管如果您连接用户输入,这只会使您面临 SQL 注入攻击).显然,ad hoc 查询应该被参数化,不仅是为了防止 sql 注入攻击的底层怪物,而且只是为了让你作为程序员的生活更轻松(除非你喜欢弄清楚何时使用 single引用您的价值观).

Update: Also, frequently in debates over the viability of stored procedures, the need to prevent SQL injection is invoked in defense of procs. Surely, no one in their right mind thinks that assembling ad hoc queries through string concatenation is the correct thing to do (although this will only expose you to a SQL injection attack if you're concatenating user input). Obviously ad hoc queries should be parameterized, not only to prevent the monster-under-the-bed of a sql injection attack, but also just to make your life as a programmer generally easier (unless you enjoy having to figure out when to use single quotes around your values).

更新 2:我做了更多的研究.根据 this MSDN white paper,答案似乎取决于什么您的意思是特别"与您的查询,正是.例如,像这样的简单查询:

Update 2: I have done more research. Based on this MSDN white paper, it appears that the answer depends on what you mean by "ad-hoc" with your queries, exactly. For example, a simple query like this:

SELECT ID, DESC FROM tblSTUFF WHERE ITEM_COUNT > 5

... 缓存其执行计划.此外,因为查询不包含某些不符合条件的元素(就像从一个表中的简单 SELECT 之外的几乎任何东西),SQL Server 实际上会自动参数化"查询并用参数替换文字常量5",并缓存参数化版本的执行计划.这意味着,如果您随后执行这个即席查询:

... will have its execution plan cached. Moreover, because the query does not contain certain disqualifying elements (like nearly anything other than a simple SELECT from one table), SQL Server will actually "auto-parameterize" the query and replace the literal constant "5" with a parameter, and cache the execution plan for the parameterized version. This means that if you then execute this ad-hoc query:

SELECT ID, DESC FROM tblSTUFF WHERE ITEM_COUNT > 23

...它将能够使用缓存的执行计划.

... it will be able to use the cached execution plan.

不幸的是,自动参数化的不合格查询元素列表很长(例如,忘记使用DISTINCTTOPUNIONGROUP BYOR 等),所以你真的不能指望这个来提高性能.

Unfortunately, the list of disqualifying query elements for auto-parameterization is long (for example, forget about using DISTINCT, TOP, UNION, GROUP BY, OR etc.), so you really cannot count on this for performance.

如果您确实有一个不会被自动参数化的超级复杂"查询,例如:

If you do have a "super complex" query that won't be auto-parameterized, like:

SELECT ID, DESC FROM tblSTUFF WHERE ITEM_COUNT > 5 OR ITEM_COUNT < 23

...它仍然会被查询的确切文本缓存,因此如果您的应用程序使用相同的文字硬编码"值重复调用此查询,则第一个查询之后的每个查询都将重新使用缓存的执行计划(因此与存储过程一样快).

... it will still be cached by the exact text of the query, so if your application calls this query with the same literal "hard-coded" values repeatedly, each query after the first will re-use the cached execution plan (and thus be as fast as a stored proc).

如果文字值发生变化(基于用户操作,例如过滤或排序查看的数据),那么查询将不会从缓存中受益(除非偶尔他们不小心与最近的查询完全匹配).

If the literal values change (based on user actions, for example, like filtering or sorting viewed data), then the queries will not benefit from caching (except occasionally when they accidentally match a recent query exactly).

从使用ad-hoc"查询缓存中受益的方法是将它们参数化.像这样在 C# 中动态创建查询:

The way to benefit from caching with "ad-hoc" queries is to parameterize them. Creating a query on the fly in C# like this:

int itemCount = 5;
string query = "DELETE FROM tblSTUFF WHERE ITEM_COUNT > " + 
        itemCount.ToString();

不正确.正确的方法(使用 ADO.Net)应该是这样的:

is incorrect. The correct way (using ADO.Net) would be something like this:

using (SqlConnection conn = new SqlConnection(connStr))
{
    SqlCommand com = new SqlCommand(conn);
    com.CommandType = CommandType.Text;
    com.CommandText = 
        "DELETE FROM tblSTUFF WHERE ITEM_COUNT > @ITEM_COUNT";
    int itemCount = 5;
    com.Parameters.AddWithValue("@ITEM_COUNT", itemCount);
    com.Prepare();
    com.ExecuteNonQuery();
}

查询不包含文字并且已经完全参数化,因此使用相同参数化语句的后续查询将使用缓存计划(即使使用不同的参数值调用).请注意,此处的代码实际上与您用于调用存储过程的代码相同(唯一的区别是 CommandType 和 CommandText),因此在某种程度上归结为您希望该查询的文本存活"的位置"(在您的应用程序代码或存储过程中).

The query contains no literals and is already fully parameterized, so subsequent queries using the identical parameterized statement would use the cached plan (even if called with different parameter values). Note that the code here is virtually the same as the code you would use for calling a stored procedure anyway (the only difference being the CommandType and the CommandText), so it somewhat comes down to where you want the text of that query to "live" (in your application code or in a stored procedure).

最后,如果临时"查询是指您正在动态构建具有不同列、表、过滤参数等的查询,例如:

Finally, if by "ad-hoc" queries you mean you're dynamically constructing queries with different columns, tables, filtering parameters and whatnot, like maybe these:

SELECT ID, DESC FROM tblSTUFF WHERE ITEM_COUNT > 5

SELECT ID, FIRSTNAME, LASTNAME FROM tblPEEPS 
    WHERE AGE >= 18 AND LASTNAME LIKE '%What the`

SELECT ID, FIRSTNAME, LASTNAME FROM tblPEEPS 
    WHERE AGE >= 18 AND LASTNAME LIKE '%What the`
    ORDER BY LASTNAME DESC

...那么你几乎不能用存储过程来做到这一点(没有 EXEC hack 在上流社会是不会被提及的),所以这一点没有实际意义.

... then you pretty much can't do this with stored procedures (without the EXEC hack which is not to be spoken of in polite society), so the point is moot.

更新 3:这是使用存储过程的唯一真正好的与性能相关的原因(无论如何我能想到).如果您的查询是一个长时间运行的查询,其中编译执行计划的过程花费的时间比实际执行的时间长得多,并且该查询只是很少被调用(例如每月报告),那么将它放在存储过程中可能会使 SQL Server 将已编译的计划在缓存中保留足够长的时间,以便它在下个月左右仍然存在.不管是真是假,我都打不过.

Update 3: Here is the only really good performance-related reason (that I can think of, anyway) for using a stored procedure. If your query is a long-running one where the process of compiling the execution plan takes significantly longer than the actual execution, and the query is only called infrequently (like a monthly report, for example), then putting it in a stored procedure might make SQL Server keep the compiled plan in the cache long enough for it to still be around next month. Beats me if that's true or not, though.