SPDK iSCSI Target with VPP

Ed Warnicke, Cisco
Tomasz Zawadzki, Intel

Agenda

SPDK iSCSI target overview

FD.io and VPP

SPDK iSCSI VPP integration

Q&A


Notices & Disclaimers

Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration.

No computer system can be absolutely secure.

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. For more complete information about performance and benchmark results, visit http://www.intel.com/benchmarks.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/benchmarks.

Benchmark results were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown." Implementation of these updates may make these results inapplicable to your device or system.

Intel® Advanced Vector Extensions (Intel® AVX)* provides higher throughput to certain processor operations. Due to varying processor power characteristics, utilizing AVX instructions may cause a) some parts to operate at less than the rated frequency and b) some parts with Intel® Turbo Boost Technology 2.0 to not achieve any or maximum turbo frequencies. Performance varies depending on hardware, software, and system configuration and you can learn more at http://www.intel.com/go/turbo.

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.

© 2018 Intel Corporation. Intel, the Intel logo, and Intel Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as property of others.


Moving to Userspace

• Alternate solutions (RDMA) are advancing in strides
• TCP/IP transport has been present for much longer
• There are still many use cases for TCP/IP
• Even the NVMe-oF transport will support TCP in the future


SPDK iSCSI Target Overview

Using POSIX sockets for the data path negates the benefits of userspace storage services by:

• Having syscalls go to the kernel and back
• Adding interrupts back into the I/O path

[Diagram: SPDK userspace stack (NVMe Driver, Block Device Abstraction, iSCSI target) connected through POSIX sockets to the kernel-space network stack (L4 TCP, L2/L3 MAC/IP, NIC Driver)]


FD.io: The Universal Dataplane

• Project at the Linux Foundation
  • Multi-party
  • Multi-project
• Software dataplane
  • High throughput
  • Low latency
  • Feature rich
  • Resource efficient
  • Bare metal/VM/container
  • Multiplatform
• FD.io scope:
  • Network IO – NIC/vNIC <-> cores/threads
  • Packet Processing – Classify/Transform/Prioritize/Forward/Terminate
  • Dataplane Management Agents – ControlPlane

[Diagram: layered stack of Dataplane Management Agent over Packet Processing over Network IO, running on bare metal/VM/container]


VPP – Vector Packet Processing

A compute-optimized software network platform:

• High performance
• Linux user space
• Runs on commodity compute CPUs, and "knows" how to run on them well!

Shipping at volume in server & embedded products.

[Diagram: Packet Processing, Dataplane Management Agent, and Network IO on bare metal/VM/container]


VPP – How does it work?

1. Packet processing is decomposed into a directed graph of nodes…
2. …packets move through the graph nodes in vectors (e.g. packets 0 through 10)…
3. …graph nodes are optimized to fit inside the instruction cache…
4. …packets are pre-fetched into the data cache.

This makes use of modern Intel® Xeon® processor microarchitectures: the instruction cache and data cache stay hot, minimizing memory latency and usage.

[Diagram: packet processing graph with input nodes (dpdk-input, af-packet-input, vhost-user-input) feeding ethernet-input, which dispatches to l2-input, ip4-input, ip6-input, mpls-input, arp-input, lldp-input, cdp-input, and onward through ip4-lookup, ip4-load-balance, ip4-rewrite-transit, ip4-midchain, mpls-policy-encap, and interface-output]

* Each graph node implements a "micro-NF", a "micro-NetworkFunction" processing packets.
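To make the vector idea concrete, here is a minimal, self-contained C sketch of nodes processing a whole vector of packets before handing it on. It only illustrates the technique: the types and node functions (packet_t, node_fn_t, ethernet_input, ...) are invented for this example, and VPP's real node API (vlib frames, runtime structs, per-packet next-node dispatch) is far richer.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical packet: real packets carry buffers and metadata. */
typedef struct { int id; } packet_t;

/* A graph node is just a function over a whole vector of packets. */
typedef void (*node_fn_t)(packet_t *pkts, size_t n);

/* Each node processes the WHOLE vector before handing it to the next
 * node, so the node's instructions stay hot in the i-cache. */
static void ethernet_input(packet_t *pkts, size_t n) {
    for (size_t i = 0; i < n; i++)
        /* Real code would prefetch pkts[i + 1] into the d-cache here. */
        printf("ethernet-input:   packet %d\n", pkts[i].id);
}

static void ip4_lookup(packet_t *pkts, size_t n) {
    for (size_t i = 0; i < n; i++)
        printf("ip4-lookup:       packet %d\n", pkts[i].id);
}

static void interface_output(packet_t *pkts, size_t n) {
    for (size_t i = 0; i < n; i++)
        printf("interface-output: packet %d\n", pkts[i].id);
}

int main(void) {
    /* A straight-line path through the graph for simplicity; real
     * graphs branch per packet (ip4 vs ip6 vs arp, ...). */
    node_fn_t graph[] = { ethernet_input, ip4_lookup, interface_output };
    packet_t vec[] = { {0}, {1}, {2}, {3} };
    size_t nvec = sizeof(vec) / sizeof(vec[0]);

    for (size_t g = 0; g < sizeof(graph) / sizeof(graph[0]); g++)
        graph[g](vec, nvec);  /* one node at a time, whole vector */
    return 0;
}
```

The payoff of this structure is in the output ordering: all "ethernet-input" lines print before any "ip4-lookup" line, because each node's code runs back-to-back over the batch instead of re-entering the i-cache once per packet.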


VPP Architecture: Packet Processing

[Diagram: a vector of n packets arriving at input graph nodes (dpdk-input, vhost-user-input, af-packet-input) and flowing through the packet processing graph: ethernet-input, then ip4-input/ip6-input/arp-input/mpls-input, then ip4-lookup/ip6-lookup, then ip4-rewrite/ip4-local/ip6-rewrite/ip6-local]


VPP Architecture: Splitting the Vector

[Diagram: the same packet processing graph, showing the vector of n packets being split as individual packets take different paths through the graph nodes]


VPP Architecture: Plugins

Plugins (e.g. /usr/lib/vpp_plugins/foo.so) are first-class citizens that can:

• Add graph nodes
• Add APIs
• Rearrange the graph

A hardware plugin (e.g. a hw-accel-input node) can skip software nodes where the work is done by hardware already.

Plugins can be built independently of the VPP source tree.

[Diagram: the packet processing graph extended with plugin nodes custom-1, custom-2, custom-3, plus a hardware plugin node hw-accel-input]
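Mechanically, a plugin of this kind is a shared object discovered and loaded at startup. The sketch below shows the generic POSIX dlopen pattern such a loader could use; the entry-point symbol vpp_plugin_register is hypothetical and this is not VPP's actual plugin ABI.

```c
/* Minimal sketch of a shared-object plugin loader (POSIX).
 * Compile with -ldl. Hypothetical entry point: void vpp_plugin_register(void). */
#include <dlfcn.h>
#include <stdio.h>

typedef void (*plugin_register_fn)(void);

static int load_plugin(const char *path) {
    void *handle = dlopen(path, RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }
    /* Look up the (hypothetical) registration symbol and call it so the
     * plugin can add its graph nodes / APIs. */
    plugin_register_fn reg =
        (plugin_register_fn) dlsym(handle, "vpp_plugin_register");
    if (!reg) {
        fprintf(stderr, "missing entry point: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }
    reg();
    return 0;
}

int main(void) {
    /* Directory from the slide; the plugin file name is illustrative. */
    return load_plugin("/usr/lib/vpp_plugins/foo.so");
}
```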


K8s Networking Microservice: Contiv-VPP

[Diagram: two Kubernetes nodes, each running a vswitch CNF pod (VPP plus VPP Agent); the Kubelet invokes the Contiv-VPP CNI plugin, application pods attach to VPP via veth/tapv2 interfaces, the K8s master coordinates both nodes, and node-to-node traffic crosses an IPv4/IPv6/SRv6 network]


Motivation: Container Networking

[Diagram: two processes (PID 1234 and PID 4321) communicating through the kernel: send() and recv() go through glibc, each traversing FIFO, TCP, IP (routing), and device layers in kernel space]


Why not this?

[Diagram: the same two processes instead exchanging data through VPP in user space: send() and recv() go through per-process FIFOs into VPP's session layer, then TCP, IP, and DPDK]


VPP Host Stack

[Diagram: an application attached to VPP through the binary API, with rx/tx FIFOs in a shared memory segment connecting the app to VPP's session layer, which sits on TCP and IP/DPDK]


VPP Host Stack: SVM FIFOs

• Allocated within shared memory segments, with or without file backing (ssvm/memfd)
• Fixed position and size
• Lock-free enqueue/dequeue, but atomic size increment
• Option to dequeue or peek data
• Support for out-of-order data enqueues

[Diagram: rx/tx FIFOs in a shared memory segment between the app and VPP's session layer]
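A minimal sketch of the lock-free single-producer/single-consumer discipline behind such FIFOs, using C11 atomics: each side owns its own cursor, and only the shared occupancy counter is updated atomically. This is an illustration under those assumptions, not VPP's svm_fifo code (which adds shared-memory placement and out-of-order enqueue support).

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define FIFO_SIZE 8  /* small for the demo; power of two keeps masking cheap */

typedef struct {
    uint8_t     data[FIFO_SIZE];
    uint32_t    head;        /* written only by the producer */
    uint32_t    tail;        /* written only by the consumer */
    atomic_uint size;        /* shared occupancy, updated atomically */
} spsc_fifo_t;

/* Producer side: copy in, then publish by incrementing size. */
static int fifo_enqueue(spsc_fifo_t *f, const uint8_t *buf, uint32_t len) {
    if (FIFO_SIZE - atomic_load(&f->size) < len)
        return -1;  /* not enough free space */
    for (uint32_t i = 0; i < len; i++)
        f->data[(f->head + i) % FIFO_SIZE] = buf[i];
    f->head = (f->head + len) % FIFO_SIZE;
    atomic_fetch_add(&f->size, len);  /* release the bytes to the consumer */
    return 0;
}

/* Consumer side: copy out, then retire by decrementing size. */
static int fifo_dequeue(spsc_fifo_t *f, uint8_t *buf, uint32_t len) {
    if (atomic_load(&f->size) < len)
        return -1;  /* not enough data yet */
    for (uint32_t i = 0; i < len; i++)
        buf[i] = f->data[(f->tail + i) % FIFO_SIZE];
    f->tail = (f->tail + len) % FIFO_SIZE;
    atomic_fetch_sub(&f->size, len);
    return 0;
}

int main(void) {
    spsc_fifo_t f = { .head = 0, .tail = 0, .size = 0 };
    fifo_enqueue(&f, (const uint8_t *)"hi", 2);
    uint8_t out[3] = {0};
    fifo_dequeue(&f, out, 2);
    printf("dequeued: %s\n", out);
    return 0;
}
```

Because each cursor is written by exactly one side, the single atomic counter is enough to order the data copy against its publication, so no locks are needed on the data path.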


VPP Host Stack: TCP

• Clean-slate implementation
• "Complete" state machine implementation
• Connection management and flow control (window management)
• Timers and retransmission, fast retransmit, SACK
• NewReno congestion control, SACK-based fast recovery
• Checksum offloading
• Linux compatibility tested with the IWL TCP protocol tester
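For readers who want the congestion-control bullet unpacked, here is a textbook-style sketch of NewReno's window arithmetic (in the spirit of RFC 5681/6582). It is a generic illustration, not VPP's implementation.

```c
#include <stdint.h>
#include <stdio.h>

/* Textbook NewReno-style congestion window bookkeeping. Illustrative
 * only; real stacks track far more state (recovery point, SACK scoreboard). */
typedef struct {
    uint32_t cwnd;      /* congestion window, bytes */
    uint32_t ssthresh;  /* slow start threshold, bytes */
    uint32_t mss;       /* max segment size, bytes */
} cc_state_t;

static void on_ack(cc_state_t *c, uint32_t acked_bytes) {
    if (c->cwnd < c->ssthresh)
        c->cwnd += acked_bytes;                /* slow start: exponential growth */
    else
        c->cwnd += c->mss * c->mss / c->cwnd;  /* congestion avoidance: ~1 MSS per RTT */
}

static void on_triple_dupack(cc_state_t *c) {
    /* Fast retransmit / fast recovery entry: halve the window. */
    c->ssthresh = c->cwnd / 2 > 2 * c->mss ? c->cwnd / 2 : 2 * c->mss;
    c->cwnd = c->ssthresh + 3 * c->mss;  /* inflate by the 3 duplicate ACKs */
}

int main(void) {
    cc_state_t c = { .cwnd = 1460, .ssthresh = 65535, .mss = 1460 };
    for (int i = 0; i < 5; i++) on_ack(&c, c.mss);
    printf("cwnd after 5 ACKs: %u\n", c.cwnd);
    on_triple_dupack(&c);
    printf("cwnd after loss:   %u\n", c.cwnd);
    return 0;
}
```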


SPDK w/ VPP Host Stack: More Network Options

With VPP as the host stack, iSCSI/SPDK gains access to additional transports and network features:

• SCTP, UDP, TLS
• IPv4, IPv6
• Bridging/routing
• MPLSoX, SRv6
• VXLAN{-GPE}, Geneve, GRE
• Much, much more

[Diagram: the VPP host stack (session layer, TCP, IP/DPDK, shared-memory rx/tx FIFOs) serving iSCSI/SPDK as the application]


Future: Storage – Unified Storage/Networking Graph

• A unified storage/networking graph allows hyper-efficient processing of blocks to packets and packets to blocks:
  • Avoids copies
  • Avoids cache misses
  • Utilizes other VPP performance tricks
• Most storage IO is connected to network IO
• Can be extended with additional protocols like RoCEv2

[Diagram: the VPP packet processing graph extended with storage nodes: spdk-input feeding block processing, which connects to iSCSI and RoCEv2 nodes and back out through tcp-output]


iSCSI Target Architecture Extension

[Diagram: side by side, the existing SPDK stack (NVMe Driver, Block Device Abstraction, iSCSI target) using POSIX sockets over the kernel network stack (L4 TCP, L2/L3 MAC/IP, NIC Driver), and the extended stack where the same SPDK components use the VPP API to reach VPP's TCP host stack, VPP graph nodes, and the DPDK NIC driver, keeping both storage services and network services in user space]


iSCSI Target Architecture with VPP

The SPDK iSCSI target uses the VPP Communications Library (VCL):

• No kernel syscalls from top to bottom
• Better CPU utilization
• Extensive VPP networking capabilities available

[Diagram: two userspace processes connected via shared memory: SPDK (NVMe Driver, Block Device Abstraction, iSCSI target, VCL API) and VPP (TCP host stack, VPP graph nodes, DPDK NIC driver)]


Net Framework Abstraction

• The iSCSI target is not aware of the socket types in use
• All net framework types can be used at the same time
• POSIX sockets are still available
• VPP support is optional at compile time
• Enables usage in other libraries in the future (such as the NVMe-oF target, where it is planned)

[Diagram: the SPDK iSCSI target, and in the future the NVMe-oF target, sitting on a net framework layer that dispatches to either POSIX sockets (kernel) or the VPP API]
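One common way to build such an abstraction is a per-socket table of function pointers selected at creation time, so the iSCSI target calls the same wrappers regardless of backend. The sketch below illustrates that pattern with invented names (net_framework_ops_t, posix_ops, vpp_ops); SPDK's actual net framework API differs.

```c
#include <stdio.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical backend-agnostic socket interface. */
typedef struct net_sock net_sock_t;
typedef struct {
    const char *name;
    ssize_t (*write_fn)(net_sock_t *s, const void *buf, size_t len);
} net_framework_ops_t;

struct net_sock {
    const net_framework_ops_t *ops;  /* picked once, at socket creation */
    int backend_handle;
};

/* Callers like the iSCSI target only ever use this wrapper: */
static ssize_t net_sock_write(net_sock_t *s, const void *buf, size_t len) {
    return s->ops->write_fn(s, buf, len);
}

/* --- Illustrative backend stubs --- */
static ssize_t posix_write(net_sock_t *s, const void *buf, size_t len) {
    (void)s; (void)buf;
    printf("posix backend: wrote %zu bytes\n", len);
    return (ssize_t)len;
}
static ssize_t vpp_write(net_sock_t *s, const void *buf, size_t len) {
    (void)s; (void)buf;
    printf("vpp backend: wrote %zu bytes\n", len);
    return (ssize_t)len;
}

static const net_framework_ops_t posix_ops = { "posix", posix_write };
static const net_framework_ops_t vpp_ops   = { "vpp",   vpp_write   };

int main(void) {
    net_sock_t a = { &posix_ops, 3 }, b = { &vpp_ops, 7 };
    net_sock_write(&a, "x", 1);  /* same call site, different backends */
    net_sock_write(&b, "x", 1);
    return 0;
}
```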


VPP Integration

Key steps for running the SPDK iSCSI target with VPP:

1. Build SPDK with VPP support
2. Run the VPP process
3. Configure interfaces using the vppctl utility
4. Start the SPDK iSCSI target, which can now utilize VPP interfaces

All configuration steps can be found in the iSCSI target documentation on spdk.io: http://www.spdk.io/doc/iscsi.html
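As a rough illustration of those four steps (the configure flag, paths, interface name, and addresses below are examples and vary by SPDK/VPP version; the spdk.io documentation linked above is authoritative):

```sh
# 1. Build SPDK with VPP support
./configure --with-vpp
make

# 2. Run the VPP process (path and startup config depend on your install)
vpp -c /etc/vpp/startup.conf &

# 3. Bring up an interface and assign an address with vppctl
#    (interface name and address here are examples)
vppctl set interface state GigabitEthernet0/8/0 up
vppctl set interface ip address GigabitEthernet0/8/0 10.0.0.1/24

# 4. Start the SPDK iSCSI target, which now uses VPP sockets
./app/iscsi_tgt/iscsi_tgt
```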


What about performance data?


Backup
