EmergingNetworking board - BGP multi-link failover question
u*****e
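Posts: 47
1
Coming back late at night with another question for everyone.
I have one router with two ISP uplinks, i.e. two BGP sessions. My goal is that when one of the links has a problem (for example, it goes down), my traffic switches to the other link in the shortest possible time, with as little packet loss as possible.
I originally set up the two links and two BGP sessions expecting that when one link failed, the other would take over all the traffic. That is in fact what happens, but I noticed a problem: the switchover seems to take quite long and does not meet our requirements.
Below is my analysis; please give me your opinions.
When the local router has not received a keepalive packet within 60 seconds (say the remote router went down), it waits for another hold timer of 180 seconds, after which the BGP session is considered completely down and the TCP connection is closed. Only then does the router delete the old routes and accept the routes from the other link. (Both links advertise full Internet routes.) I believe this process is right. But from the moment the link goes down until the local router finishes updating its routing table, it seems to take at least 240 seconds. Isn't that too long?
The approach I can think of now is to do a BGP soft clear as soon as I confirm the uplink is down. I have not tested it yet, so I do not know whether it would shorten the whole process. Or does anyone have another solution to this problem?
There is an assumption here: that I can detect that the remote link has a problem. How should I go about detecting that the link is down?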
f*****m
Posts: 416
2
Looking at the control plane alone, you need BFD.
On the data plane, you need FRR.

(In reply to post 1 by u*****e)

n**********l
Posts: 271
3
Are you concerned about inbound traffic at all?
b***p
Posts: 700
4
OK, your purpose is to speed up BGP convergence in response to an uplink or node failure. There are normally two ways to achieve this: one works at the sub-second level and the other at the tens-of-seconds level.
Let me correct your theory first. The Cisco IOS default BGP timers are 60/180, where the keepalive interval is 60 sec and the hold time is 180 sec. From RFC 4271:
If a system does not receive successive KEEPALIVE, UPDATE, and/or
NOTIFICATION messages within the period specified in the Hold Time
So the total time to wait before BGP realizes the peer is down is 180 sec, not 60+180=240.
That gives you the first solution: decrease the BGP timers. You have to check with your ISPs for the minimum BGP timers they support. If you select 10/30, your convergence time is at the 30-second level. But you should not use over-aggressive timers like 1/3: if the router's CPU is busy with other jobs, it may miss keepalive packets and tear down the BGP session.
The second solution is to use BFD, as the other poster mentioned. Again, you have to ask your ISP whether they support it. BFD is a lightweight keepalive process that can achieve sub-second failure detection. On the BGP side, it is a single-line config like "neighbor x.x.x.x fall-over bfd".

(In reply to post 1 by u*****e)
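To make the 180-versus-240 point concrete, here is a rough Python sketch of the hold-timer arithmetic. It assumes a simplified model that is not spelled out in the thread: the hold timer restarts on every received message, the peer sends a keepalive exactly every keepalive interval, and no UPDATE traffic or timer jitter is involved. The function name is made up for illustration.

```python
def hold_timer_detection_window(keepalive_s, hold_s):
    """Return (best, worst) seconds from peer failure to hold-timer expiry.

    Simplified model (an assumption): the hold timer restarts on each
    received message, and the peer fails somewhere between 0 and
    keepalive_s seconds after its last keepalive was received.
    """
    if not 0 < keepalive_s <= hold_s:
        raise ValueError("expected 0 < keepalive <= hold time")
    # Peer dies just before its next keepalive: most of one interval is already spent.
    best = hold_s - keepalive_s
    # Peer dies right after a keepalive: the full hold time must elapse.
    worst = hold_s
    return best, worst


for keepalive, hold in [(60, 180), (10, 30)]:
    best, worst = hold_timer_detection_window(keepalive, hold)
    print(f"{keepalive}/{hold}: failure noticed {best} to {worst} s after it happens")
```

Under this model the 60/180 timers never give a 240-second wait: the wait is bounded by the hold time itself.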

b***p
Posts: 700
5
If I am right, this is an enterprise network with only IPv4 unicast. Where does FRR come in?

(In reply to post 2 by f*****m)

f*****m
Posts: 416
6
IP FRR

(In reply to post 5 by b***p)

u*****e
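Posts: 47
7
First, thanks to all the experts for your replies. Thank you so much, especially buyup. Your reply was really detailed, and next I am going to study these two methods carefully. If I have more questions, I will come back with an update. Again, thank you very much!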
u*****e
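Posts: 47
8
Hi buyup,
Thanks again for your reply. I looked up some material online this afternoon about the BFD you mentioned, and I have a few questions I hope you can help explain.
Once we enable BFD, say I set the BFD liveness detection to one second: as soon as my local router stops receiving replies, does the router immediately kill the current BGP session and move over to the other BGP link? How does this process work? I could not find a clear explanation online.
You mentioned two methods. The first is at the 10-second level, by changing the timers; I feel this is not a good fit for our video service. BFD is probably better suited to our needs.
Also, I saw people bring up IP FRR: if we use BFD, do we still need it? All of our equipment is Juniper, and I see that Juniper seems to have a similar protocol: Host fast reroute (HFRR). Are you familiar with it?
Thanks!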

(In reply to post 4 by b***p)

b***p
Posts: 700
9
From your previous post, you have ONE router facing the SP network. If BFD detects that the other side is down and notifies BGP, BGP will withdraw all the prefixes learned from that peer and re-advertise the ones from the backup SP. I am not sure if this is what you are asking.
The above is fast failure detection between your network and the SP. Fulcrum's idea, IP FRR, is about fast-rerouting inside your internal network: if a node or link on the path goes down, the router can switch to a pre-calculated path while the IGP is still working out the new path. But I am very green on this.

(In reply to post 8 by u*****e)

c*****i
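Posts: 631
10
IP FRR is based on the IGP, so it is not much use in your case. Is the connection between you and the ISP directly connected eBGP, or is there something else in between? Another option to consider is using IP SLA for tracking; I am not sure whether it is feasible.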

(In reply to post 8 by u*****e)

L******t
Posts: 1985
11
As LZ (the original poster) mentioned two BGP sessions, I suppose it's multihoming.
BFD is definitely the way to go. Once a link failure is detected, LZ's router will start to reroute through the second link using the routes it already has.
Unless LZ's router is doing transit, there is no route to withdraw from the backup SP.
Correct me if I'm wrong.

(In reply to post 9 by b***p)

u*****e
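Posts: 47
12
First, thanks everyone for the helpful replies. Let me briefly describe my network topology.
(router1)-------(router2)
Each router connects to two ISPs and runs BGP with them, and I have also configured iBGP between the two routers. All BGP sessions accept full Internet routes. The purpose is that when one of the links goes down, traffic switches to another link as quickly as possible (whether that is the other link on the same router or a link on the other router; I would guess it prefers the other link on the same router, though).
It is now basically certain that BFD will be a must for me. But given my current topology, is IP FRR needed?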

(In reply to post 11 by L******t)

b***p
Posts: 700
13
I know what you mean. I thought LZ had other iBGP peers (R2) of the dual-homed edge router (R1); in that case R1 should withdraw the failed routes and re-advertise the prefixes from the backup to its iBGP peer (R2).
If LZ has only one BGP router at the edge, then there is no re-advertisement; it just selects the backup prefixes as best.

(In reply to post 11 by L******t)
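The single-edge-router case ("just best-select the backup prefixes") can be sketched as a toy model in Python. This is only an illustration under loud assumptions: the peer labels, the documentation prefix, and the attribute values are invented, and the comparison uses just two steps (highest local preference, then shortest AS path), whereas a real BGP decision process has many more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    peer: str          # hypothetical peer label, e.g. "ISP-A"
    local_pref: int
    as_path_len: int


def best_paths(routes):
    """Pick one route per prefix: highest local_pref, then shortest AS path."""
    best = {}
    for r in routes:
        cur = best.get(r.prefix)
        if cur is None or (-r.local_pref, r.as_path_len) < (-cur.local_pref, cur.as_path_len):
            best[r.prefix] = r
    return best


def peer_down(routes, peer):
    """Drop every route learned from the failed peer (what BFD would trigger)."""
    return [r for r in routes if r.peer != peer]


# Both ISPs advertise the same prefix; the router already holds both copies.
rib = [
    Route("203.0.113.0/24", "ISP-A", 200, 3),
    Route("203.0.113.0/24", "ISP-B", 100, 2),
]
before = best_paths(rib)
after = best_paths(peer_down(rib, "ISP-A"))
print(before["203.0.113.0/24"].peer, "->", after["203.0.113.0/24"].peer)
```

The point of the sketch is that the backup route is already in the table before the failure, so nothing has to be relearned from the second ISP; only the selection is rerun.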

b***p
Posts: 700
14
From a quick Google search, IP FRR targets 50-ms-level traffic convergence. If I were you, I would leave it to a second phase, after some experiments and tests, because your BFD target is 1-second-level convergence. You can try some techniques to tune your IGP for fast convergence at the level of seconds.

(In reply to post 12 by u*****e)

L******t
Posts: 1985
15
Makes sense.
A follow-up question: do MX or ASR routers support clustering, like VSS on Cat6k switches? If so, there is no need to run any IGP between the two routers.

(In reply to post 13 by b***p)

c*****i
Posts: 631
16
Actually, if you are connecting to an ISP, the ISP most likely does not support BFD, and IP FRR has even less to do with it.

(In reply to post 15 by L******t)

u*****e
Posts: 47
17
Why do you think ISPs mostly do not support BFD?

(In reply to post 16 by c*****i)
u*****e
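Posts: 47
18
We have recently been discussing how BFD triggers the switchover. I thought of another question: suppose BFD successfully detects the link failure and BGP switches over successfully. When the original link recovers, what changes will happen to the current BGP table and routing table? (This is somewhat like a VRRP question: when the original primary node recovers, will preemption occur?)
Thanks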
s*****g
Posts: 1055
19
That will be taken care of by the BGP protocol itself.
ISPs are very conservative about BGP configuration; for them stability is more important than flexibility or customer preferences (for example, Level3 does not even want to turn on the ORF capability). Your best bet for faster failover is to shorten the BGP hold timer if you are peering with loopbacks. (If you are peering with physical interface IPs, BGP fast external failover will kick in automatically without waiting for the hold timer to expire.)

(In reply to post 18 by u*****e)
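On the "will it preempt when the link comes back" question, a small Python sketch of one possible outcome. Everything here is an assumption for illustration: the peer labels are invented, the toy rule is "highest local preference wins, and on a tie the route learned earlier wins", and real routers apply a longer, vendor-specific list of tie-breakers, so check your own platform's behavior.

```python
def select(candidates):
    """candidates: list of (peer, local_pref, learned_at); return the winning peer.

    Toy rule (an assumption): highest local_pref wins; on a tie the route
    that was learned earlier wins.
    """
    return min(candidates, key=lambda c: (-c[1], c[2]))[0]


def failover_and_recover(pref_a, pref_b):
    """Active peer at three moments: steady state, ISP-A down, ISP-A back up."""
    steady = select([("ISP-A", pref_a, 0), ("ISP-B", pref_b, 1)])
    during = select([("ISP-B", pref_b, 1)])
    # After recovery ISP-A's routes are relearned, so they are now the newer ones.
    after = select([("ISP-A", pref_a, 2), ("ISP-B", pref_b, 1)])
    return steady, during, after


print(failover_and_recover(200, 100))  # policy prefers ISP-A
print(failover_and_recover(100, 100))  # no policy preference
```

In this model, traffic moves back to the recovered link only when policy makes its routes win the comparison; with equal attributes the routes that stayed up keep carrying the traffic.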

L******t
Posts: 1985
20
Can someone please confirm whether ISPs really do not let customers use BFD?
I thought money talks, and in most cases this would not affect the ISP's stability?
Good knowledge that you can't learn from books.

(In reply to post 16 by c*****i)
c*****i
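Posts: 631
21
What I meant is exactly that money talks. If you are a small customer going to AT&T with this request, they will most likely ignore you. If you are a big customer, of course it is a different story. It also depends on your access method.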

(In reply to post 20 by L******t)

u*****e
Posts: 47
22
Then may I ask: how do companies that offer very time-sensitive services, such as video and online communication, achieve millisecond-level switchover?


(In reply to post 19 by s*****g)

s*****g
Posts: 1055
23
Like the poster above said, if you are a small customer, ISPs won't pay attention to you, but if you are a big customer, they will do whatever you want them to do. I am not sure you can achieve millisecond-level ISP failover: most BFD implementations only support hello intervals of 10 ms or longer, so failure detection typically takes 30 ms or longer. IP/MPLS FRR is for a totally different failure scenario.
Did you check with your ISP? They might be able to accommodate you. First check with the ISP account person who sold you this service.

(In reply to post 22 by u*****e)
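The jump from 10-ms hellos to 30-ms detection is just interval times multiplier. A minimal Python sketch, assuming a symmetric session (both sides use the same negotiated interval) and a detect multiplier of 3, which is implied by the 10 ms to 30 ms figures above but not stated in the thread; the function name is made up.

```python
def bfd_detection_time_ms(interval_ms, multiplier=3):
    """Detection time = negotiated packet interval x detect multiplier.

    Assumes both ends use the same interval; a multiplier of 3 is an
    assumption chosen to match the 10 ms -> 30 ms figures in the thread.
    """
    if interval_ms <= 0 or multiplier < 1:
        raise ValueError("interval must be positive and multiplier at least 1")
    return interval_ms * multiplier


for interval in (10, 300, 1000):
    print(f"{interval} ms hellos x 3 -> down after {bfd_detection_time_ms(interval)} ms of silence")
```

This covers detection only; the time BGP then needs to reselect routes and reprogram forwarding comes on top of it.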

m**k
Posts: 290
24

~~~~~~~~~~~~~~~~~~~
(In reply to post 23 by s*****g)
