idnits 2.17.1 draft-yang-nwcrg-bats-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (November 17, 2019) is 1621 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- -- Looks like a reference, but probably isn't: '0' on line 377 -- Looks like a reference, but probably isn't: '1' on line 377 == Missing Reference: 'K-1' is mentioned on line 377, but not defined Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force S. Yang 3 Internet-Draft CUHK(SZ) 4 Intended status: Informational X. Huang 5 Expires: May 20, 2020 R.W. Yeung 6 CUHK 7 J.K. Zao 8 NCTU 9 November 17, 2019 11 BATS Coding Scheme for Multi-hop Data Transport 12 draft-yang-nwcrg-bats-02 14 Abstract 16 This document describes a BATS coding scheme for communication 17 through multi-hop networks. BATS code is a class of efficient linear 18 network coding scheme with a matrix generalization of fountain codes 19 as the outer code, and batch-based linear network coding as the inner 20 code. 22 Status of This Memo 24 This Internet-Draft is submitted in full conformance with the 25 provisions of BCP 78 and BCP 79. 27 Internet-Drafts are working documents of the Internet Engineering 28 Task Force (IETF). Note that other groups may also distribute 29 working documents as Internet-Drafts. The list of current Internet- 30 Drafts is at https://datatracker.ietf.org/drafts/current/. 32 Internet-Drafts are draft documents valid for a maximum of six months 33 and may be updated, replaced, or obsoleted by other documents at any 34 time. It is inappropriate to use Internet-Drafts as reference 35 material or to cite them other than as "work in progress." 37 This Internet-Draft will expire on May 20, 2020. 39 Copyright Notice 41 Copyright (c) 2019 IETF Trust and the persons identified as the 42 document authors. All rights reserved. 44 This document is subject to BCP 78 and the IETF Trust's Legal 45 Provisions Relating to IETF Documents 46 (https://trustee.ietf.org/license-info) in effect on the date of 47 publication of this document. Please review these documents 48 carefully, as they describe your rights and restrictions with respect 49 to this document. Code Components extracted from this document must 50 include Simplified BSD License text as described in Section 4.e of 51 the Trust Legal Provisions and are provided without warranty as 52 described in the Simplified BSD License. 54 Table of Contents 56 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 57 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3 58 2. Procedures . . . . . . . . . . . . . . . . . . . . . . . . . 3 59 2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 3 60 2.2. Data Partitioning and Padding . . . . . . . . . . . . . . 4 61 2.3. Data Delivery Procedures . . . . . . . . . . . . . . . . 4 62 2.3.1. Source Node Procedures . . . . . . . . . . . . . . . 4 63 2.3.2. Intermediate Node Procedures . . . . . . . . . . . . 5 64 2.3.3. Destination Node Procedures . . . . . . . . . . . . . 6 65 2.4. Recommendation for the Parameters . . . . . . . . . . . . 6 66 2.5. Example DDP Packet Format . . . . . . . . . . . . . . . . 7 67 2.5.1. Packet Header . . . . . . . . . . . . . . . . . . . . 7 68 2.5.2. Packet Payload . . . . . . . . . . . . . . . . . . . 7 69 2.5.3. Packet Footer . . . . . . . . . . . . . . . . . . . . 8 70 3. BATS Code Specification . . . . . . . . . . . . . . . . . . . 8 71 3.1. Background . . . . . . . . . . . . . . . . . . . . . . . 8 72 3.2. Outer Code Encoder . . . . . . . . . . . . . . . . . . . 9 73 3.3. Inner Code Encoder (Recoder) Recommendations . . . . . . 9 74 3.4. Decoder Recommendations . . . . . . . . . . . . . . . . . 10 75 4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 76 5. Security Considerations . . . . . . . . . . . . . . . . . . . 11 77 5.1. Provision of Confidentiality Protection . . . . . . . . . 11 78 5.2. Countermeasures against Pollution Attacks . . . . . . . . 11 79 6. References . . . . . . . . . . . . . . . . . . . . . . . . . 12 80 6.1. Normative References . . . . . . . . . . . . . . . . . . 12 81 6.2. Informative References . . . . . . . . . . . . . . . . . 12 82 Appendix A. Additional Stuff . . . . . . . . . . . . . . . . . . 13 83 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 13 85 1. Introduction 87 This document specifies a BATS code [BATS] scheme for data delivery 88 in multi-hop networks. The BATS code described here includes an 89 outer code and an inner code. The outer code is a matrix 90 generalization of fountain codes (see also the RapterQ code described 91 in RFC 6330 [RFC6330]), which inherits the advantages of reliability 92 and efficiency and possesses the extra desirable property of being 93 network coding compatible. The inner code is formed by linear 94 network coding for combating packet loss, improving the multicast 95 efficiency, etc. A detailed design and analysis of BATS codes are 96 provided in the BATS monograph [BATSMonograph]. 98 The BATS coding scheme can be applied in multi-hop networks formed by 99 wireless communication links, which are inherently unreliable due to 100 interference. Existing network protocols like TCP/IP use end-to-end 101 retransmission and store-and-forward at the relays, so that packet 102 loss would accumulate along the way. 104 The BATS coding scheme can be used for various data delivery 105 applications like file transmission, video streaming, etc. 107 1.1. Requirements Language 109 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 110 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 111 document are to be interpreted as described in RFC 2119 [RFC2119]. 113 2. Procedures 115 2.1. Introduction 117 A BATS coding scheme includes an outer code encoder (also called 118 encoder), an inner code encoder (also called recoder) and a decoder. 119 The BATS coding scheme described in this document can be used by a 120 Data Delivery Protocol (DDP) with the following procedures. 122 Encoding at a source node which has the data for transmission: 124 * The DDP provides the data to be delivered and the related 125 information to the BATS encoder. 127 * The BATS encoder generates a sequence of batches, each 128 consisting of a set of coded packets and the information 129 pertaining to the batch. 131 * The DDP forms and transmits the DDP packets using the batches 132 and the corresponding batch information. 134 Recoding at an intermediate node that does not need the data: 136 * The DDP extracts the batches and the corresponding batch 137 information from its received DDP packets. 139 * A BATS recoder generates recoded packets of a batch. 141 * The DDP forms and transmits DDP packets using the recoded 142 packets and the corresponding batch information. 144 Decoding at a destination node that needs the data: 146 * The DDP extracts the batches and the corresponding batch 147 information from its received DDP packets. 149 * A BATS decoder tries to recover the transmitted data using the 150 received batches. 152 * The DDP sends the decoded data to the application that needs 153 the data. 155 2.2. Data Partitioning and Padding 157 Suppose that the DDP has F octets of data for transmission. The 158 construction of source packets from the data depends on two 159 parameters K and T: 161 o K: the number of source packets. 163 o T: the size of a source packet, in octets. 165 If F is smaller than T*K, the data MUST be padded to have T*K octets, 166 so that the data can be partitioned into K source packets, each of 167 which has T octets. 169 2.3. Data Delivery Procedures 171 2.3.1. Source Node Procedures 173 A source node has the data for transmission. The DDP will first pad 174 and partition the data into K source packets, each containing T 175 octets. The DDP provides the BATS encoder with the following 176 information: 178 o Batch size (M): the number of coded packets in a batch. 180 o Recoding field size (q): the number of elements in the finite 181 field for recoding. 183 o The degree distribution (DD). 185 o A sequence of batch IDs (j, j = 0, 1, ...). 187 o Number of source packets (K). 189 o Packet size (T): the number of octets in a source packet. 191 o The source packets (b[i], i = 0, 1, ..., K-1). 193 Based on the above inputs, the parameters T' and O are calculated as 194 follows: 196 o O = ceil(M*log2(q)/8) 198 o T' = T + O 200 Using this information, the (outer code) encoder generates a batch 201 for each batch ID. For the batch ID j, the encoder returns the DDP 202 that contains 204 o a sparse degree d[j], and 206 o M coded packets (x'[j,i], i =0, 1, ..., M-1), each containing T' 207 octets. 209 The DDP will use the batches to form DDP packets to be transmitted to 210 other network nodes towards the destination nodes. The DDP MUST 211 deliver with each coded packet its 213 o d: sparse degree 215 o BID: batch ID 217 The DDP MUST deliver the following information to each recoder: 219 o M: batch size M 221 o q: recoding field size 223 The DDP MUST deliver the following information to each decoder: 225 o M: batch size 227 o q: recoding field size 229 o F: the data size 231 o K: the number of source packet 233 The coded packet length T' information MUST be known by all the 234 nodes. 236 2.3.2. Intermediate Node Procedures 238 An intermediate node does not need the data, but only helps to 239 deliver the data to the destination nodes. At an intermediate node, 240 the DDP only receives the DDP packets from the other network nodes, 241 and should be able to extract coded packets and the corresponding 242 batch information from these packets. 244 The DDP provides the recoder with the following information: 246 o the batch size M, 248 o the recoding field size q, 250 o a number of received coded packets of the same batch, each 251 containing T' octets, and 253 The recoder uses the information provided by the DDP to generate M' 254 recoded packets, each containing T' octets. The DDP uses the M' 255 recoded packets to form the DDP packets for transmitting. 257 2.3.3. Destination Node Procedures 259 A destination node needs the data transmitted by the source node. At 260 the destination node, the DDP receives DDP packets from the other 261 network nodes, and should be able to extract coded packets and the 262 corresponding batch information from these packets. 264 The DDP provides the decoder with the following information: 266 o F: number of octets in the data, 268 o M: batch size, 270 o q: recoding field size, 272 o K: number of source packets 274 o A sequence of batches, each of which is formed by a number of 275 coded packets belonging to the same batch, with their 276 corresponding batch IDs and degrees. 278 The decoder uses this information to decode the K source packets. If 279 successful, the decoder returns the recovered K source packets to the 280 DDP, which will use the source packets to form the source data. 282 2.4. Recommendation for the Parameters 284 The recommendation for the parameters M, K, T, and T' is shown as 285 follows: 287 o M is 1, 2, 4, 8, 16, 32, 64 and 128. 289 o q is 2, 256. When q = 2, M >= 8. 291 o T' is not larger than the maximum coded packet payload size. 293 o K = ceil(F/T). 295 It is RECOMMENDED that K is at least 128. However, the encoder/ 296 decoder SHALL support an arbitrary positive integer value of K. 298 2.5. Example DDP Packet Format 300 A DDP can form a DDP packet with a header, a footer and a payload. 302 2.5.1. Packet Header 304 The BATS packet header has 40 bits (5 octets) and includes F, M, K, 305 q, ID, and d 307 0 1 2 3 308 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 309 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 310 | F | K | M |q| BID | 311 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 312 | d | 313 +-+-+-+-+-+-+-+-+ 315 Figure: BATS packet header format. 317 o F: 8-bit unsigned integer. Value i represents F=2^i. 319 o K: 8-bit unsigned integer. 321 o M: 3-bit unsigned integer. The value i represents M=2^i. 323 o q: 1-bit unsigned integer. The value 0 represents q=1, while the 324 value 1 represents q=256. 326 o BID: 12-bit unsigned integer. 328 o d: 8-bit unsigned integer. 330 2.5.2. Packet Payload 331 O T 332 +-----------------------+-------------------------------+ 333 | coefficient vector | coded data | 334 +-----------------------+-------------------------------+ 336 Figure: BATS packet payload format. 338 The payload has T' octets, where the first O octets contain the 339 coefficient vector and the remaining T octets contain the coded data. 340 Information in both fields MAY be encoded in JSON (ASCII) or protobuf 341 (binary) formats. 343 o coefficient vector: O octets. 345 o coded data: T octets. 347 2.5.3. Packet Footer 349 0 1 2 350 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 351 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 352 | signature | parity check | 353 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 355 Figure: BATS packet footer format. 357 The footer has three octets. 359 o signature: 2 octets. A signature of the individual packet to 360 prevent pollution attack. 362 o parity check: 1 octet. A parity check field used to verity the 363 correctness of the packet. 365 3. BATS Code Specification 367 3.1. Background 369 The T octets of a source packets are treated as a column vector of T 370 elements in GF(256). Linear algebra and matrix operations over 371 finite fields are assumed in this section. 373 Suppose that a pseudo-random number generator Rand() is given. 375 3.2. Outer Code Encoder 377 Let b[0], b[1], ...,b[K-1] be the K source packets. A batch with 378 batch ID j is generated using the following steps. 380 First, a degree d[j] = DD() is sampled using the given degree 381 distribution DD() and pseudo-random generator Rand() (initialized 382 with any seed). 384 Second, initialize Rand() with j as the seed. Using Rand() sample d 385 packets among all the source packets. Suppose the indices of the 386 packets sampled are i_1, i_2, ..., i_d. Let B[j] = (b[i_1], b[i_2], 387 ..., b[i_d]). 389 Third, sample a d x M generator matrix G[j] with all the entries 390 uniformly distribution among {0,1,...,255}. 392 Fourth, form the batch X[j] = B[j]G[j], whose dimension is T x M. 394 Last, append coefficient vectors to the packets of the batches. Let 395 x[j,i], i=0,1,...,M-1, be the (i+1)th column of X[j]. The 396 coefficient vector of x[j,i] is the (i+1)th column of the M x M 397 identity matrix with entries in GF(q), which can be represented by O 398 octets. The coded packet x'[j,i] is formed by appending the 399 coefficient vector before x[j,i]. 401 3.3. Inner Code Encoder (Recoder) Recommendations 403 The inner code comprises (random) linear network coding applied on 404 the coded packets belonging to the same batch. At a particular 405 network node, recoded packets are generated by (random) linear 406 combinations of the received coded packets of a batch. The recoded 407 packets have the same batch ID, sparse degree and coded packet 408 length. 410 Suppose that coded packets x'[j,i], i = 0, 1, ..., r-1, which have 411 the same batch ID j, have been received at an intermediate node. 412 Using the recommended packet format, it can be verified that whether 413 the corresponding packet headers of these coded packets are the same. 414 Then a recoded packet can be generated by one of the following two 415 approaches: 417 o forwarding: when receiving x'[j,i], directly use x'[j,i] as a 418 recoded packet. 420 o linear combination recoding: (randomly) choose a sequence of 421 coefficients c[i], i = 0, 1, ..., r-1 from GF(q). Generate 422 c[0]x'[j,0]+c[1]x'[j,1]+...+c[r-1]x'[j,r-1] as a recoded packet. 424 A recoder can combine these two approaches to generate recoded 425 packets. For example, when M' > r, the recoder can output x'[j,i], i 426 = 0, 1, ..., r-1 as r systematic recoded packets and generate M'-r 427 recoded packets using linear combinations of randomly chosen 428 coefficients. The M' is recommended to be (1+e)M, where e is the 429 average loss rate. 431 3.4. Decoder Recommendations 433 A belief propagation (BP) decoder is described. 435 The decoder receives a sequence of batches Y'[j], j = 0, 1, ..., n-1, 436 each of which is a T'-row matrix over GF(256). Let Y[j] be the 437 submatrix of the last T rows of Y'[j]. When q = 256, let H[j] be the 438 first M rows of Y'[j]; when q = 2, let H[j] be the matrix over 439 GF(256) formed by embedding each bit in the first M/8 rows of Y'[j] 440 into GF(256). 442 The decoder is initialized with the following steps similar to those 443 of encoding. For each batch j, 445 o Initialize Rand() with batch ID j as the seed. Using Rand() 446 sample d[j] packets among all the source packets. Obtain the 447 indices of the packets sampled. 449 o Using Rand() sample a d x M generator matrix G[j] with all the 450 entries uniformly distributed among {0,1,...,255}. 452 A batch j is said to be decodable if the system of linear equations 453 Y[j] = B[j]G[j]H[j] with B[j] as the variable matrix has a unique 454 solution, i.e., rank(G[j]H[j]) = d[j]. The BP decoding algorithm has 455 multiple iterations. Each iteration is formed by the following 456 steps: 458 o decoding step: Find all batches j that are decodable. Solve the 459 corresponding system of linear equations Y[j] = B[j]G[j]H[j] and 460 decode B[j]. 462 o substituting step: Substitute the decoded source packets into 463 undecodable batches. Suppose that a decoded source packet b[k] is 464 used in generating a undecodable Y[j]. The substiting involves 1) 465 removing the column in B[j] corresponding to b[k], 2) removing the 466 row in G[j] corresponding to b[k], and 3) reducing d[j] by 1. 468 The BP decoder repeats the above steps until no batches are decodable 469 during the decoding step. 471 4. IANA Considerations 473 This memo includes no request to IANA. 475 5. Security Considerations 477 Subsuming both Random Linear Network Codes (RLNC) and fountain codes, 478 BATS codes naturally inherit both their desirable capability of 479 offering confidentiality protection as well as their vulnerability 480 towards pollution attacks. 482 5.1. Provision of Confidentiality Protection 484 Since the transported messages are linearly combined with random 485 coefficients at each recoding node, it is statistically impossible to 486 recover individual messages by capturing the coded messages at any 487 one or small number of nodes. As long as the coding matrices of the 488 transported messages cannot be fully recovered, any attempt of 489 decoding is equivalent to randomly guessing the transported messages. 490 Thus, with the use of BATS codes, information confidentiality 491 throughout the data transport process is assured. 493 The only thread towards confidentiality exists in the form of 494 eavesdropping onto the initial encoding process, which takes place at 495 the encoding nodes. In these nodes, the transported data are 496 presented in plain text and can be read along their transfer paths. 497 Hence, information isolation between the encoding process and all 498 other user processes running on the node must be assured. 500 In addition, the authenticity and trustworthiness of the encoding, 501 recoding and decoding program running on all the nodes must be 502 attested by a trusted authority. Such a measure is also necessary in 503 countering pollution attacks. 505 5.2. Countermeasures against Pollution Attacks 507 Like all network codes, BATS codes are vulnerable under pollution 508 attacks. In these attacks, one or more compromised coding node(s) 509 can pollute the coded messages or inject forged messages into the 510 coding network. These attacks can prevent the receivers from 511 recovering the transported data correctly. Although error detection 512 mechanisms can be put in place to prevent the receivers from getting 513 the wrong messages, detection and discard of the polluted messages 514 still constitute a form of denial-of-service (DoS) attack. 516 The research community has long been investigating the use of various 517 signature schemes (including homomorphic signatures) to identify the 518 forged messages and stall the attacks (see Zhao07 [Zhao07], Yu08 520 [Yu08], Agrawal09 [Agrawal09]). Nevertheless, these countermeasures 521 are regarded as being computationally too expensive to be employed in 522 broadband communications. A practical approach to protect against 523 pollution attacks consist of the following system-level 524 countermeasures: 526 1. Attestation and Validation of all encoding, recoding and decoding 527 nodes in the network. Remote attestation and repetitive 528 validation of a node based on valid public key certificates with 529 proper authorization MUST be a pre-requisite of admitting that 530 node into a network and permitting it to remain in that network. 532 2. Attestation of all encoding, recoding and decoding programs used 533 in the coding nodes. All programs used to perform the BATS 534 encoding, recoding and decoding processes MUST be remotely 535 attested before they are permitted to run on any of the coding 536 nodes. Reloading or alteration of programs MUST NOT be permitted 537 during an encoding session. Programs MUST be attested or 538 validated again when they are executed in new execution 539 environments instantiated even in the same node. 541 3. Original Authentication of all coded messages using network or 542 transport level secure protocols such as IP-sec or TLS/DTLS MUST 543 be used to provide Peer or Message Origin Authentication to every 544 coded message sent through the coding network. 546 6. References 548 6.1. Normative References 550 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 551 Requirement Levels", BCP 14, RFC 2119, 552 DOI 10.17487/RFC2119, March 1997, 553 . 555 6.2. Informative References 557 [Agrawal09] 558 Agrawal, S. and D. Boneh, "Homomorphic MACs: MAC-based 559 integrity for network coding", International Conference on 560 Applied Cryptography and Network Security , 2009. 562 [BATS] Yang, S. and R. Yeung, "Batched Sparse Codes", IEEE 563 Transactions on Information Theory 60(9), 5322-5346, 2014. 565 [BATSMonograph] 566 Yang, S. and R. Yeung, "BATS Codes: Theory and Practice", 567 Morgan & Claypool Publishers , 2017. 569 [RFC6330] Luby, M., Shokrollahi, A., Watson, M., Stockhammer, T., 570 and L. Minder, "RaptorQ Forward Error Correction Scheme 571 for Object Delivery", RFC 6330, DOI 10.17487/RFC6330, 572 August 2011, . 574 [Yu08] Yu, Z., Wei, Y., Ramkumar, B., and Y. Guan, "An Efficient 575 Signature-Based Scheme for Securing Network Coding Against 576 Pollution Attacks", INFOCOM , 2008. 578 [Zhao07] Zhao, F., Kalker, T., Medard, M., and K. Han, "Signatures 579 for content distribution with network coding", ISIT , 580 2007. 582 Appendix A. Additional Stuff 584 This becomes an Appendix. 586 Authors' Addresses 588 Shenghao Yang 589 CUHK(SZ) 590 Shenzhen, Guangdong 591 China 593 Phone: +86 755 8427 3827 594 Email: shyang@cuhk.edu.cn 596 Xuan Huang 597 CUHK 598 Hong Kong, Hong Kong SAR 599 China 601 Phone: +852 3943 8375 602 Email: 1155136647@link.cuhk.edu.hk 604 Raymond W. Yeung 605 CUHK 606 Hong Kong, Hong Kong SAR 607 China 609 Phone: +852 3943 8375 610 Email: whyeung@ie.cuhk.edu.hk 611 John K. Zao 612 NCTU 613 Hsinchu, Taiwan 614 China 616 Email: jkzao@ieee.org