Why did the site-to-site VPN connection between my Windows Server 2019 and Azure suddenly stop routing?

Problem description:

We're having a strange issue today brought on by an unexpected power failure back in our office while we're all working remotely. After dispatching someone to restart the equipment, our office internet connection came back up and we're able to reach some services, but our Site-to-Site (S2S) VPN between our office network and the cloud is no longer functioning. The odd part is that Azure indicates that the VPN is "Connected", and -- after some creative tunneling -- I was able to confirm that Windows Server 2019 in the office also indicates the connection as "Connected", so this looks like a routing issue. This VPN has worked faithfully for 10 months, through reboots and Windows Updates, and yet today it's inexplicably down.

Now, some history: Back in June 2019, we set up an S2S VPN between our office in LA and resources in Azure. The goal was to start using Windows Virtual Desktop on Azure for remote employee virtual desktops, while enabling them to access the same resources as on-site employees. Back then, we ran the following PowerShell script on the domain controller in LA to configure Windows Server 2019 with the S2S VPN to Azure:

Install-WindowsFeature Routing, RemoteAccess, RSAT-RemoteAccess-PowerShell

# Only needed if "RestartNeeded" is "Yes"
# Restart-Computer

# After the machine reboots. Launch PowerShell again to resume the configuration
Install-RemoteAccess -VpnType VpnS2S

# Setting variables
$rrasInterfaceName     = "Azure (vpn-subnet-to-la)"
$azureGatewayIpAddress = "12.74.131.73"
$virtualNetworkRange   = "10.3.0.0/16"
$sharedKey             = "redacted-psk"

Function Invoke-WindowsApi(
    [string] $dllName,
    [Type] $returnType,
    [string] $methodName,
    [Type[]] $parameterTypes,
    [Object[]] $parameters
    )
{
  ## Begin to build the dynamic assembly
  $domain = [AppDomain]::CurrentDomain
  $name = New-Object Reflection.AssemblyName 'PInvokeAssembly'
  $assembly = $domain.DefineDynamicAssembly($name, 'Run')
  $module = $assembly.DefineDynamicModule('PInvokeModule')
  $type = $module.DefineType('PInvokeType', "Public,BeforeFieldInit") 

  $inputParameters = @() 

  for($counter = 1; $counter -le $parameterTypes.Length; $counter++)
  {
     $inputParameters += $parameters[$counter - 1]
  } 

  $method = $type.DefineMethod($methodName, 'Public,HideBySig,Static,PinvokeImpl',$returnType, $parameterTypes) 

  ## Apply the P/Invoke constructor
  $ctor = [Runtime.InteropServices.DllImportAttribute].GetConstructor([string])
  $attr = New-Object Reflection.Emit.CustomAttributeBuilder $ctor, $dllName
  $method.SetCustomAttribute($attr) 

  ## Create the temporary type, and invoke the method.
  $realType = $type.CreateType() 

  $ret = $realType.InvokeMember($methodName, 'Public,Static,InvokeMethod', $null, $null, $inputParameters) 

  return $ret
}

Function Set-PrivateProfileString(
    $file,
    $category,
    $key,
    $value)
{
  ## Prepare the parameter types and parameter values for the Invoke-WindowsApi script
  $parameterTypes = [string], [string], [string], [string]
  $parameters = [string] $category, [string] $key, [string] $value, [string] $file

  ## Invoke the API
  [void] (Invoke-WindowsApi "kernel32.dll" ([UInt32]) "WritePrivateProfileString" $parameterTypes $parameters)
}

# Add and configure S2S VPN interface for VNet1
Add-VpnS2SInterface -Protocol IKEv2 -AuthenticationMethod PSKOnly -ResponderAuthenticationMethod PSKOnly `
 -Name $rrasInterfaceName -Destination $azureGatewayIpAddress -IPv4Subnet @("$($virtualNetworkRange):256")`
 -NumberOfTries 3 -SharedSecret $sharedKey

Set-VpnServerIPsecConfiguration -EncryptionType MaximumEncryption

# default value for Windows 2012 is 100MB, which is way too small. Increase it to 32GB.
Set-VpnServerIPsecConfiguration -SADataSizeForRenegotiationKilobytes 33553408

# TODO: Confirm why this setting is needed/what it does                                                                    
# Seems related to this: https://tools.ietf.org/html/draft-dukes-ikev2-config-payload-00
New-ItemProperty -Path HKLM:\System\CurrentControlSet\Services\RemoteAccess\Parameters\IKEV2 -Name SkipConfigPayload -PropertyType DWord -Value 1 -Force

# Set S2S VPN connections to be persistent by editing the router.pbk file (requires admin privileges). Note that IdleDisconnectSeconds and RedialOnLinkFailure are set for each adapter.
Set-PrivateProfileString $env:windir\System32\ras\router.pbk "$($rrasInterfaceName)" "IdleDisconnectSeconds" "0"
Set-PrivateProfileString $env:windir\System32\ras\router.pbk "$($rrasInterfaceName)" "RedialOnLinkFailure" "1"

# Restart the RRAS service
Restart-Service RemoteAccess

Connect-VpnS2SInterface -Name $rrasInterfaceName

route -p ADD 10.1.0.0 MASK 255.255.0.0 10.3.0.1 IF 30

The static route at the end ensures that packets destined for the Windows Virtual Desktop machines in the 10.1.0.0/16 range are routed through the gateway on the other side of the S2S VPN at 10.3.0.1. The S2S VPN VNet is peered with the VNet that WVD is connected to.

Again, I want to emphasize that we have made no changes to either the Azure VPN or the server configuration since this was set up in June.

The routing table is as follows:

===========================================================================
Interface List
 15...6c 4b 90 21 ab 9b ......Intel(R) Ethernet Connection (2) I219-LM
 27...........................Azure (vpn-subnet-to-la)
  1...........................Software Loopback Interface 1
===========================================================================

IPv4 Route Table
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0  192.168.100.254    192.168.100.1    281
         10.3.0.0      255.255.0.0         On-link      169.254.0.27    281
     10.3.255.255  255.255.255.255         On-link      169.254.0.27    281
        127.0.0.0        255.0.0.0         On-link         127.0.0.1    331
        127.0.0.1  255.255.255.255         On-link         127.0.0.1    331
  127.255.255.255  255.255.255.255         On-link         127.0.0.1    331
      169.254.0.0      255.255.0.0         On-link      169.254.0.27    281
     169.254.0.27  255.255.255.255         On-link      169.254.0.27    281
  169.254.255.255  255.255.255.255         On-link      169.254.0.27    281
    192.168.100.0    255.255.255.0         On-link     192.168.100.1    281
    192.168.100.1  255.255.255.255         On-link     192.168.100.1    281
  192.168.100.255  255.255.255.255         On-link     192.168.100.1    281
        224.0.0.0        240.0.0.0         On-link         127.0.0.1    331
        224.0.0.0        240.0.0.0         On-link     192.168.100.1    281
        224.0.0.0        240.0.0.0         On-link      169.254.0.27    281
  255.255.255.255  255.255.255.255         On-link         127.0.0.1    331
  255.255.255.255  255.255.255.255         On-link     192.168.100.1    281
  255.255.255.255  255.255.255.255         On-link      169.254.0.27    281
===========================================================================
Persistent Routes:
  Network Address          Netmask  Gateway Address  Metric
         10.1.0.0      255.255.0.0         10.3.0.1       1
          0.0.0.0          0.0.0.0  192.168.100.254  Default
===========================================================================

IPv6 Route Table
===========================================================================
Active Routes:
 If Metric Network Destination      Gateway
  1    331 ::1/128                  On-link
 15    281 fe80::/64                On-link
 15    281 fe80::6430:2788:424f:47fb/128
                                    On-link
  1    331 ff00::/8                 On-link
 15    281 ff00::/8                 On-link
===========================================================================
Persistent Routes:
  None

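Key points:

  • 192.168.100.1 is the domain controller that provides the VPN connection to Azure.
  • 192.168.100.254 is the router to the internet.
  • The DC's default gateway is 192.168.100.254 (so, by default, the DC routes traffic to the internet through the router).
  • The network is configured to obtain DHCP leases from the DC rather than from the router.
  • The DC is configured to issue DHCP leases that use the DC as the default gateway, so that packets from the rest of the office network destined for the cloud go through the VPN, while packets destined for the internet are forwarded to the router.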
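
As a rough sketch of the DHCP arrangement described in the last point, the default-gateway option could be set on the DC along these lines. The scope ID 192.168.100.0 is an assumption inferred from the routing table; it is not stated in the original setup.

```powershell
# Assumption: the office DHCP scope is 192.168.100.0 (inferred from the /24 on-link route).
$scopeId = "192.168.100.0"

# Hand out the DC (192.168.100.1) as the clients' default gateway (DHCP option 3, Router).
Set-DhcpServerv4OptionValue -ScopeId $scopeId -Router "192.168.100.1"

# Read the option back to confirm what clients will receive.
Get-DhcpServerv4OptionValue -ScopeId $scopeId -OptionId 3
```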

With this configuration, internet traffic is working fine. Everything on the office network is able to reach the internet just fine. But the cloud can't access anything on the local network and vice-versa.

Here's the status the server reports for the S2S interface:

Get-VpnS2SInterface -Name "Azure (vpn-subnet-to-la)"

RoutingDomain   Name                 Destination          AdminStatus  ConnectionState IPv4Subnet
-------------   ----                 -----------          -----------  --------------- ----------
 -              Azure (vpn-subnet... {12.74.131.73}       True         Connected       {10.3.0.0/16:256}

Here's a trace route showing that traffic destined for the cloud is being incorrectly routed through the router:

tracert 10.1.2.7

Tracing route to 10.1.2.7 over a maximum of 30 hops

  1    <1 ms    <1 ms    <1 ms  dsldevice.attlocal.net [192.168.100.254]
  2     *     *     *
  3     *     *     *
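
To check which route Windows actually selects for this destination, without waiting for a trace to time out, something like the following may help. It assumes the NetTCPIP cmdlets that ship with Windows Server 2019; the destination is the same WVD address used in the trace above.

```powershell
# Ask the IP stack which local address and route it would use to reach the WVD host.
Find-NetRoute -RemoteIPAddress "10.1.2.7" |
    Format-List InterfaceAlias, InterfaceIndex, IPAddress, DestinationPrefix, NextHop

# List any active routes for the WVD range. No output here would suggest
# that the static route is not actually in effect.
Get-NetRoute -DestinationPrefix "10.1.0.0/16" -PolicyStore ActiveStore -ErrorAction SilentlyContinue
```

If the first command reports the Ethernet adapter and a 0.0.0.0/0 prefix, the traffic is falling through to the default route rather than the VPN interface.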

Why is Windows not routing through the correct interface?

It appears that the unexpected power outage caused Windows to reinitialize the S2S interface so that it has a different interface ID. Note that in the original script I ran back in June, the interface number was 30.

But, when I deleted the static route and re-added it, I got:

route delete 10.1.0.0
route -p ADD 10.1.0.0 MASK 255.255.0.0 10.3.0.1 IF 30

The route addition failed: The system cannot find the file specified.

This prompted me to review the interface list at the top of the route print output:

===========================================================================
Interface List
 15...6c 4b 90 21 ab 9b ......Intel(R) Ethernet Connection (2) I219-LM
 27...........................Azure (vpn-subnet-to-la)
  1...........................Software Loopback Interface 1
===========================================================================

Note that the interface number is now 27. So I ran:

route -p ADD 10.1.0.0 MASK 255.255.0.0 10.3.0.1 IF 27
 OK!
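
Rather than reading the index off the route print output by eye, the lookup could be scripted. This is a minimal sketch, assuming the RRAS S2S interface is visible to Get-NetIPInterface under the same name RRAS uses; I have not verified that on every configuration.

```powershell
$rrasInterfaceName = "Azure (vpn-subnet-to-la)"

# Look up the current numeric index of the S2S interface by its name.
# -ErrorAction Stop aborts here if no interface with that name exists.
$ifIndex = (Get-NetIPInterface -InterfaceAlias $rrasInterfaceName -AddressFamily IPv4 -ErrorAction Stop).InterfaceIndex
Write-Host "Current interface index for '$rrasInterfaceName': $ifIndex"

# Remove the stale route (reports an error if it is already gone), then re-add it on the current index.
route delete 10.1.0.0
route -p ADD 10.1.0.0 MASK 255.255.0.0 10.3.0.1 IF $ifIndex
```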

Now when I run trace route:

tracert 10.1.2.7

Tracing route to 10.1.2.7 over a maximum of 30 hops

  1    <1 ms    <1 ms    <1 ms  server.subdomain.mydomain.com [192.168.100.1]
  2    34 ms    33 ms    35 ms  10.1.2.7

Trace complete.
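
Because the route above still hard-codes an interface number, it could break again the next time the index changes. One possible alternative, offered as an untested sketch rather than as part of the original setup, is to let RRAS own the route by listing the WVD range among the interface's IPv4 subnets. The metric of 256 simply mirrors the one used for 10.3.0.0/16 in the original script.

```powershell
$rrasInterfaceName = "Azure (vpn-subnet-to-la)"

# Assumption: -IPv4Subnet replaces the existing list rather than appending to it,
# so the original VNet range is repeated alongside the WVD range.
Set-VpnS2SInterface -Name $rrasInterfaceName -IPv4Subnet @("10.3.0.0/16:256", "10.1.0.0/16:256")

# Confirm which subnets are now attached to the interface.
Get-VpnS2SInterface -Name $rrasInterfaceName | Format-List Name, ConnectionState, IPv4Subnet
```

If this approach is used, the persistent static route would presumably need to be removed first (route delete 10.1.0.0) so the two do not conflict, and the interface may need to be reconnected before the new subnet takes effect.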