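k*******r Posts: 90 | 1 Most higher-end 10G NICs now support RDMA via iWARP.
The ibverbs implementations for these NICs support not only RDMA communication but also RDMA-style direct access to ordinary IP packets.
Put simply, the remote end still speaks plain TCP/IP,
while the local end reads and writes IP packets directly through ibverbs.
The hardware DMAs IP packets straight into user memory, which avoids the kernel copy and the system call overhead.
I'm honestly not sure how much faster this would be than the in-kernel iptables implementation,
but it would certainly be far more flexible, and it shouldn't be too hard to develop.
The same approach could also be used to build an L2 load balancer or an IPsec gateway.
Happy to discuss further if anyone is interested. |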
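To make the flexibility argument concrete, here is a minimal sketch of iptables-style rule matching done in user space. It assumes the raw IPv4 packet bytes have already been delivered to the application by whatever receive path is used; the names `Rule`, `parse_ipv4` and `decide` are hypothetical and only for illustration.

```python
import ipaddress
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """One hypothetical filter rule; None means "match anything"."""
    action: str                      # "ACCEPT" or "DROP"
    src_net: str = "0.0.0.0/0"       # source network in CIDR notation
    proto: Optional[int] = None      # 6 = TCP, 17 = UDP
    dst_port: Optional[int] = None


def parse_ipv4(packet: bytes):
    """Return (src, proto, dst_port) taken from a raw IPv4 packet."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ihl = (packet[0] & 0x0F) * 4                     # header length in bytes
    proto = packet[9]
    src = ipaddress.IPv4Address(packet[12:16])
    frag_offset = struct.unpack_from("!H", packet, 6)[0] & 0x1FFF
    dst_port = None
    # Only the first fragment of a TCP/UDP packet carries the port numbers.
    if proto in (6, 17) and frag_offset == 0 and len(packet) >= ihl + 4:
        dst_port = struct.unpack_from("!H", packet, ihl + 2)[0]
    return src, proto, dst_port


def decide(packet: bytes, rules, default: str = "DROP") -> str:
    """First matching rule wins, otherwise the default policy applies."""
    src, proto, dst_port = parse_ipv4(packet)
    for rule in rules:
        if src not in ipaddress.ip_network(rule.src_net):
            continue
        if rule.proto is not None and rule.proto != proto:
            continue
        if rule.dst_port is not None and rule.dst_port != dst_port:
            continue
        return rule.action
    return default
```

Because the rules are ordinary Python objects, adding a new match criterion is a few lines of application code rather than a kernel module, which is the kind of flexibility the post is pointing at. Whether it is faster than iptables is a separate question that this sketch does not address.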
m**k Posts: 290 | 2 Could you explain in more detail?
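[Quoting k*******r] : Most higher-end 10G NICs now support RDMA via iWARP. : The ibverbs implementations for these NICs support not only RDMA communication but also RDMA-style direct access to ordinary IP packets. : Put simply, the remote end still speaks plain TCP/IP, : while the local end reads and writes IP packets directly through ibverbs. : The hardware DMAs IP packets straight into user memory, which avoids the kernel copy and the system call overhead. : I'm honestly not sure how much faster this would be than the in-kernel iptables implementation, : but it would certainly be far more flexible, and it shouldn't be too hard to develop. : The same approach could also be used to build an L2 load balancer or an IPsec gateway. : Happy to discuss further if anyone is interested.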
|
k*******r Posts: 90 | 3 This Mellanox PDF gives a brief introduction to RDMA programming:
http://www.mellanox.com/related-docs/prod_software/RDMA_Aware_P
This mailing-list post describes how to use IBV_QPT_RAW_ETH to access L2 directly:
http://listarc.com/showthread.php?157108-Libibverbs:%2520add%25
Adding an IBV_QPT_RAW_ETH QP type enables kernel bypass for L2 traffic using the
user-space verbs API.
The L2 RAW_ETH acceleration assumes that the user application transmits and
receives whole L2 frames, including MAC/IP/UDP/TCP headers. Depending on
frame content and the filters available in hardware, any L2 traffic type could be
accelerated.
The sample code for IBV_QPT_RAW_ETH is provided by NES as an implementation of
IPv4 multicast acceleration.
In this sample, the application first creates an IBV_QPT_RAW_ETH QP with its
associated CQ, PD, and completion channels, as is done for an RDMA connection.
The next step is enabling L2 MAC address RX filters that direct received
multicasts to the RAW_ETH QPs, using the ibv_attach_multicast() verb.
From this point on, the application is ready to receive and transmit L2
traffic.
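Since the RX filter is keyed on an L2 MAC address, the application needs the Ethernet multicast address that corresponds to its IPv4 group. The sketch below shows the standard IPv4-to-MAC mapping (prefix 01:00:5e plus the low 23 bits of the group address); the helper name `multicast_mac` is hypothetical, and how the resulting address is handed to the verbs call is not shown here.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet multicast MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF            # only the low 23 bits are kept
    mac = (0x01005E << 24) | low23          # fixed 01:00:5e prefix
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))
```

Because only 23 bits survive, 32 different groups share each MAC address, so a MAC-level filter can still deliver frames for groups the application never joined. That is one reason the receive side has to inspect the IP header itself.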
In multicast acceleration the user application passes to ibv_post_send() the
whole multicast frame including MAC header, IP header, UDP header and UDP
payload. It is the user's responsibility to perform IP fragmentation when the
payload to send is bigger than the MTU. Every fragment is a separate L2 frame to
transmit. ibv_poll_cq() reports the status of the
transmit buffer.
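The fragmentation duty described above can be sketched as follows. This is a minimal illustration, assuming a 20-byte IPv4 header without options and ignoring the DF bit; `inet_checksum` and `fragment_ipv4` are hypothetical helper names, and prepending the MAC header to each fragment is left to the caller.

```python
import struct


def inet_checksum(data: bytes) -> int:
    """16-bit ones' complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def fragment_ipv4(header: bytes, payload: bytes, mtu: int = 1500):
    """Split an IP payload (e.g. UDP header + data) into IPv4 fragments."""
    if len(header) != 20:
        raise ValueError("this sketch only handles 20-byte headers without options")
    chunk = (mtu - 20) // 8 * 8             # fragment data must be a multiple of 8 bytes
    if chunk <= 0:
        raise ValueError("MTU too small")
    fragments = []
    for offset in range(0, max(len(payload), 1), chunk):
        data = payload[offset:offset + chunk]
        more = offset + chunk < len(payload)
        hdr = bytearray(header)
        struct.pack_into("!H", hdr, 2, 20 + len(data))                            # total length
        struct.pack_into("!H", hdr, 6, (0x2000 if more else 0) | (offset // 8))   # MF flag + offset
        struct.pack_into("!H", hdr, 10, 0)                                        # zero before summing
        struct.pack_into("!H", hdr, 10, inet_checksum(bytes(hdr)))
        fragments.append(bytes(hdr) + data)
    return fragments
```

All fragments keep the identification field of the original header, which is what lets the receiver reassemble them. Each returned fragment would then be wrapped in its own L2 frame and posted as a separate send.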
On the receive path, when ibv_poll_cq() reports a received L2
packet, the Rx buffer (previously posted by ibv_post_recv()) contains a
whole L2 frame including MAC header, IP header and UDP header. It is the user
application's responsibility to check that the received packet is a valid UDP frame,
so fragments must be checked and checksums must be computed.
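The receive-side checks could look roughly like the sketch below. It assumes an untagged Ethernet II frame (no VLAN header) and simply rejects fragments instead of reassembling them; `inet_checksum` and `parse_udp_frame` are hypothetical helper names.

```python
import struct


def inet_checksum(data: bytes) -> int:
    """16-bit ones' complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def parse_udp_frame(frame: bytes):
    """Return (src_port, dst_port, payload) for a valid UDP frame, else None."""
    if len(frame) < 14 + 20 + 8:
        return None
    if struct.unpack_from("!H", frame, 12)[0] != 0x0800:     # EtherType must be IPv4
        return None
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4
    if ip[0] >> 4 != 4 or ihl < 20 or len(ip) < ihl + 8:
        return None
    if inet_checksum(ip[:ihl]) != 0:                         # a valid header sums to zero
        return None
    total_len, = struct.unpack_from("!H", ip, 2)
    flags_off, = struct.unpack_from("!H", ip, 6)
    if flags_off & 0x3FFF:                                   # MF set or offset != 0: a fragment
        return None
    if ip[9] != 17 or total_len < ihl + 8 or total_len > len(ip):
        return None
    udp = ip[ihl:total_len]                                  # drops any Ethernet padding
    src_port, dst_port, udp_len, udp_csum = struct.unpack_from("!HHHH", udp, 0)
    if udp_len != len(udp):
        return None
    if udp_csum != 0:                                        # 0 means the sender skipped it
        pseudo = ip[12:20] + struct.pack("!BBH", 0, 17, udp_len)
        if inet_checksum(pseudo + udp) != 0:
            return None
    return src_port, dst_port, udp[8:]
```

A real receiver would queue fragments for reassembly rather than drop them; returning None here just keeps the sketch short.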
[Quoting m**k] : Could you explain in more detail?
|
L******t Posts: 1985 | 4 Have you heard of Intel DPDK?
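[Quoting k*******r] : Most higher-end 10G NICs now support RDMA via iWARP. : The ibverbs implementations for these NICs support not only RDMA communication but also RDMA-style direct access to ordinary IP packets. : Put simply, the remote end still speaks plain TCP/IP, : while the local end reads and writes IP packets directly through ibverbs. : The hardware DMAs IP packets straight into user memory, which avoids the kernel copy and the system call overhead. : I'm honestly not sure how much faster this would be than the in-kernel iptables implementation, : but it would certainly be far more flexible, and it shouldn't be too hard to develop. : The same approach could also be used to build an L2 load balancer or an IPsec gateway. : Happy to discuss further if anyone is interested.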
|
m**t Posts: 1292 | 5 Are there any interesting DPDK applications available?
[Quoting L******t] : Have you heard of Intel DPDK?
|
L******t Posts: 1985 | 6 Plenty nowadays.
Pure software, like Vyatta's virtual router.
Hardware appliances, like Fortinet's firewalls.
Lots of startups are building virtual appliances with Intel DPDK as the
enabling technology.
[Quoting m**t] : Are there any interesting DPDK applications available?
|