PowerShell: search for a matching string in a Word document
I have a simple requirement: I need to search for a string in a Word document and, as the result, get the matching line or a few words around the match.
So far, I can successfully search for a string across a folder of Word documents, but the search only returns True/False depending on whether the string was found.
#ERROR REPORTING ALL
Set-StrictMode -Version latest
$path = "c:\MORLAB"
$files = Get-ChildItem $path -Include *.docx,*.doc -Recurse | Where-Object { !($_.PSIsContainer) }
$output = "c:\wordfiletry.txt"
$application = New-Object -ComObject word.application
$application.visible = $False
$findtext = "CRHPCD01"
Function getStringMatch
{
    # Loop through all *.doc and *.docx files in the $path directory
    Foreach ($file In $files)
    {
        # Open(FileName, ConfirmConversions, ReadOnly)
        $document = $application.documents.open($file.FullName,$false,$true)
        $range = $document.content
        # Find.Execute only reports whether the text was found (True/False)
        $wordFound = $range.find.execute($findText)
        if($wordFound)
        {
            # $( ) is needed to expand a property inside a double-quoted string
            "$($file.FullName) has $wordFound" | Out-File $output -Append
        }
        # Close each document before opening the next one
        $document.close()
    }
    $application.quit()
}
getStringMatch
#ERROR REPORTING ALL
Set-StrictMode -Version latest
$path = "c:\Temp"
$files = Get-ChildItem $path -Include *.docx,*.doc -Recurse | Where-Object { !($_.PSIsContainer) }
$output = "c:\temp\wordfiletry.csv"
$application = New-Object -ComObject word.application
$application.visible = $False
$findtext = "First"
$charactersAround = 30
# An array, not a hashtable: objects cannot be added to a hashtable with +=
$results = @()
Function getStringMatch
{
    # Loop through all *.doc and *.docx files in the $path directory
    Foreach ($file In $files)
    {
        $document = $application.documents.open($file.FullName,$false,$true)
        $range = $document.content
        # Match $findtext plus $charactersAround characters on each side;
        # [regex]::Escape makes the search text match literally
        If($range.Text -match ".{$($charactersAround)}$([regex]::Escape($findtext)).{$($charactersAround)}"){
            $properties = @{
                File = $file.FullName
                Match = $findtext
                TextAround = $Matches[0]
            }
            $results += New-Object -TypeName PsCustomObject -Property $properties
        }
        # Close each document before opening the next one
        $document.close()
    }
    If($results){
        $results | Export-Csv $output -NoTypeInformation
    }
    $application.quit()
}
getStringMatch
Import-Csv $output
There are a couple of ways to get what you want. A simple approach: since you already have the text of the document, run a regex match against it and return the match together with the text surrounding it. This addresses the requirement of getting some words around the match in the document.
The variable $charactersAround sets the number of characters to capture on each side of $findtext. I also thought the output was a better fit for a CSV file, so I used $results to collect the properties (File, Match, TextAround) of each matching file, which are written to a CSV file at the end.
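To see what such a pattern captures without opening Word, here is a minimal sketch that runs the same kind of match against a hard-coded string. The sample text and the variable name $sampleText are made up for illustration. The quantifier is relaxed to {0,N}, which is an assumption rather than what the script above uses, so that a hit close to the start or end of the text is still found.

```powershell
# Hypothetical sample text standing in for $document.Content.Text
$sampleText = "Bradley Air Services Limited dba First Air meets or exceeds all terms."
$findtext = "First"
$charactersAround = 10

# Up to $charactersAround characters on each side of the literal search text
$pattern = ".{0,$charactersAround}" + [regex]::Escape($findtext) + ".{0,$charactersAround}"

if ($sampleText -match $pattern) {
    # $Matches[0] holds the whole match: context + search text + context
    $textAround = $Matches[0]
    $textAround
}
```

With an exact count such as .{30}, a document that has fewer than 30 characters before or after the search text would not match at all, which is the reason for the relaxed quantifier in this sketch.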
Be sure to change the variables for your own testing. Now that we are using regex to locate the matches, this opens up a world of possibilities.
Sample output
Match TextAround                                                        File
----- ----------                                                        ----
First dley Air Services Limited dba First Air meets or exceeds all term C:\Temp\20120315132117214.docx
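As one example of where the regex approach can go: -match only reports the first hit in a text, whereas [regex]::Matches returns every hit. The sketch below is an illustration on a hard-coded string, not part of the original script. The helper name Get-TextAround, its parameters, and $sampleText are hypothetical.

```powershell
# Hypothetical helper: returns one object per occurrence of $FindText in $Text
function Get-TextAround {
    param(
        [string]$Text,
        [string]$FindText,
        [int]$CharactersAround = 30
    )
    # Up to $CharactersAround characters of context on each side of the literal text
    $pattern = ".{0,$CharactersAround}" + [regex]::Escape($FindText) + ".{0,$CharactersAround}"
    foreach ($m in [regex]::Matches($Text, $pattern)) {
        [PSCustomObject]@{
            Match      = $FindText
            Position   = $m.Index   # start of the captured context, not of $FindText
            TextAround = $m.Value
        }
    }
}

# Made-up text standing in for $document.Content.Text
$sampleText = "First Air flies north. Later, First Air returns south."
$hits = @(Get-TextAround -Text $sampleText -FindText "First" -CharactersAround 5)
$hits | Format-Table -AutoSize
```

In the script, the same call could take $range.Text as -Text and add the file name to each object before exporting. Regex matches do not overlap, so two occurrences closer together than the context width may end up inside a single TextAround value.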