如何最好地在SQL Server存储过程中找到硬编码的英语字符串?

问题描述:

我们正在努力使我们的应用程序100%可本地化,并且我们大部分都在那儿.但是,有时我们会发现仍然在存储过程中对英语字符串进行了硬编码. (顺便说一下,我们使用的是SQL Server2005.)我们有成千上万的存储过程,因此手动进行存储是不切实际的.我正在尝试自动执行搜索的最准确方法.

We're working on making our application 100% localizable, and we're mostly there. However, we occasionally find an English string still hard-coded in stored procedures. (We're on SQL Server 2005 by the way.) We have thousands of stored procedures, so going through them by hand is impractical. I'm trying to think of the most accurate means of automating a search.

现在,我知道没有办法搜索英语"字符串-但是,搜索以单引号和大约20个以上的字符为界的字符串应该能清除大部分.对于我们的目的而言已经足够好了.但是在存储过程的注释中,我也期待很多假阳性.

Now, I know there's no means to search for "English" strings - but searching for strings bounded by single quotes and perhaps 20+ characters long should flush MOST of them out. Good enough for our purposes now. But I'm anticipating a lot of false-positives in the comments of the stored procedures, too.

那么您将如何处理呢? SMO是否可以让我从存储过程中的注释中挑出SQL?我必须使用OBJECT_DEFINITION()并开始破解一些令人恐惧的正则表达式吗?

So how would you approach this? Would SMO let me tease apart the SQL in a stored procedure from the comments in it? Do I have to use OBJECT_DEFINITION() and start hacking out some terrifying regular expressions?

人们非常感谢.

另一种想法:Microsoft通过Visual Studio提供了可以解析SQL的程序集.我已经用过了,而且使用起来很简单.您也许可以使用它来解析存储过程的文本;它可以返回语句中各种标记的列表,包括标记的类型.因此,它应该能够帮助您区分您可能会感兴趣的文本字符串与注释中哪些内容可以忽略.这里有更多详细信息: http://blogs.msdn.com/b/gertd/archive/2008/08/21/getting-to-the-crown-jewels.aspx .

Another thought: Microsoft provides, with Visual Studio, an assembly that can parse SQL. I've used it, and it's fairly simple to use. You might be able to use it to parse the text of your stored procedures; it can return a list of the various tokens in your statements, including the type of the token. So, it should be able to help you differentiate between what is a string of text you might be interested in vs. what is part of a comment and can be ignored. There are more details here: http://blogs.msdn.com/b/gertd/archive/2008/08/21/getting-to-the-crown-jewels.aspx.

基本上,从.NET,您将打开与数据库的连接,并在syscomments中查询存储过程的文本.您将遍历每个过程,并使用这些解析器对其进行解析.然后,您将使用Sql100ScriptGenerator从解析的文本中获取标记,遍历标记并查找其类型为ASCII或Unicode字符串文字的标记.对于那些字符串,请检查其长度以查看其长度是否超过20,如果存在,请标记字符串和procs,以供进一步检查.

Basically, from .NET, you'd open a connection to your database and query syscomments for your stored procedures' text. You'd loop through each procedure and parse it using these parser. Then you'd use the Sql100ScriptGenerator to get the tokens out of the parsed text, loop through the tokens and look for tokens whose types are either ASCII or Unicode string literals. For those strings, check their length to see if it's 20+, and if it is, flag the strings and the procs as needing further review.

我玩了一下,这是一个非常原始示例,用来说明基本原理:

I played around with it a bit, and here is a very raw example to illustrate the basic principle:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;
using System.Data.SqlClient;
using Microsoft.Data.Schema;
using Microsoft.Data.Schema.ScriptDom;
using Microsoft.Data.Schema.ScriptDom.Sql;

namespace FindHardCodedStrings
{
    class Program
    {
        static void Main(string[] args)
        {
            using (SqlConnection conn = new SqlConnection())
            {
                SqlConnectionStringBuilder bldr = new SqlConnectionStringBuilder();
                bldr.DataSource = "localhost\\sqlexpress";
                bldr.InitialCatalog = "msdb";
                bldr.IntegratedSecurity = true;

                conn.ConnectionString = bldr.ConnectionString;

                SqlCommand cmd = conn.CreateCommand();
                cmd.CommandType = System.Data.CommandType.Text;
                cmd.CommandText = "select [text] from syscomments";

                SqlDataAdapter da = new SqlDataAdapter(cmd);
                DataSet ds = new DataSet();
                da.Fill(ds);

                TSql100Parser parser = new TSql100Parser(false);
                Sql100ScriptGenerator gen = new Sql100ScriptGenerator();
                gen.Options.SqlVersion = SqlVersion.Sql100;

                foreach (DataRow proc in ds.Tables[0].Rows)
                {
                    string txt = proc[0].ToString();
                    using (System.IO.TextReader sr = new System.IO.StringReader(txt))
                    {
                        IList<ParseError> errs;
                        IScriptFragment frag = parser.Parse(sr, out errs);

                        if (null == frag)
                            continue;

                        IList<TSqlParserToken> tokens = gen.GenerateTokens((TSqlFragment)frag);

                        foreach (TSqlParserToken token in tokens)
                        {
                            if (token.TokenType == TSqlTokenType.UnicodeStringLiteral || token.TokenType == TSqlTokenType.AsciiStringLiteral)
                            {
                                if (token.Text.Length >= 20)
                                    Console.WriteLine("String found: " + token.Text);
                            }
                        }
                    }
                }

            }
        }
    }
}