idnits 2.17.1 draft-irtf-nwcrg-bats-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a Security Considerations section. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (9 December 2021) is 869 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- -- Looks like a reference, but probably isn't: '0' on line 653 == Missing Reference: 'K-1' is mentioned on line 653, but not defined == Missing Reference: 'K-2' is mentioned on line 294, but not defined == Missing Reference: 'T-P-1' is mentioned on line 297, but not defined == Missing Reference: 'T-P' is mentioned on line 297, but not defined == Missing Reference: 'T-1' is mentioned on line 449, but not defined == Missing Reference: 'WI' is mentioned on line 455, but not defined -- Looks like a reference, but probably isn't: '1' on line 653 Summary: 1 error (**), 0 flaws (~~), 7 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 NWCRG S. Yang 3 Internet-Draft CUHK(SZ) 4 Intended status: Informational X. Huang 5 Expires: 12 June 2022 R. W. Yeung 6 CUHK 7 J. K. Zao 8 NCTU 9 9 December 2021 11 BATS Coding Scheme for Multi-hop Data Transport 12 draft-irtf-nwcrg-bats-03 14 Abstract 16 BATS code is a class of efficient linear network coding scheme with a 17 matrix generalization of fountain codes as the outer code, and batch- 18 based linear network coding as the inner code. This document 19 describes a baseline BATS coding scheme for communication through 20 multi-hop networks, and discusses the related research issues towards 21 a more sophisticated BATS coding scheme. This document is a product 22 of the Coding for Efficient Network Communications Research Group 23 (NWCRG). 25 Status of This Memo 27 This Internet-Draft is submitted in full conformance with the 28 provisions of BCP 78 and BCP 79. 30 Internet-Drafts are working documents of the Internet Engineering 31 Task Force (IETF). Note that other groups may also distribute 32 working documents as Internet-Drafts. The list of current Internet- 33 Drafts is at https://datatracker.ietf.org/drafts/current/. 35 Internet-Drafts are draft documents valid for a maximum of six months 36 and may be updated, replaced, or obsoleted by other documents at any 37 time. It is inappropriate to use Internet-Drafts as reference 38 material or to cite them other than as "work in progress." 40 This Internet-Draft will expire on 12 June 2022. 42 Copyright Notice 44 Copyright (c) 2021 IETF Trust and the persons identified as the 45 document authors. All rights reserved. 47 This document is subject to BCP 78 and the IETF Trust's Legal 48 Provisions Relating to IETF Documents (https://trustee.ietf.org/ 49 license-info) in effect on the date of publication of this document. 50 Please review these documents carefully, as they describe your rights 51 and restrictions with respect to this document. Code Components 52 extracted from this document must include Revised BSD License text as 53 described in Section 4.e of the Trust Legal Provisions and are 54 provided without warranty as described in the Revised BSD License. 56 Table of Contents 58 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 59 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 4 60 2. A Use Case of BATS Coding Scheme . . . . . . . . . . . . . . 4 61 2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 5 62 2.2. Data Delivery Procedures . . . . . . . . . . . . . . . . 6 63 2.2.1. Source Node Data Partitioning and Padding . . . . . . 6 64 2.2.2. Source Node Outer Code Encoding Procedure . . . . . . 7 65 2.2.3. Recoding Procedures . . . . . . . . . . . . . . . . . 9 66 2.2.4. Destination Node Procedures . . . . . . . . . . . . . 10 67 2.3. Recommendation for the Parameters . . . . . . . . . . . . 10 68 2.4. Coding Parameters in DDP Packets . . . . . . . . . . . . 11 69 2.4.1. Coding Parameter Format . . . . . . . . . . . . . . . 11 70 2.4.2. Coded Packet Format . . . . . . . . . . . . . . . . . 12 71 3. BATS Code Specification . . . . . . . . . . . . . . . . . . . 13 72 3.1. Common Parts . . . . . . . . . . . . . . . . . . . . . . 13 73 3.2. Outer Code Encoder . . . . . . . . . . . . . . . . . . . 14 74 3.3. Inner Code Encoder (Recoder) . . . . . . . . . . . . . . 15 75 3.4. Outer Decoder . . . . . . . . . . . . . . . . . . . . . . 16 76 4. Research Issues . . . . . . . . . . . . . . . . . . . . . . . 17 77 4.1. Coding Design Issues . . . . . . . . . . . . . . . . . . 17 78 4.2. Protocol Design Issues . . . . . . . . . . . . . . . . . 18 79 4.3. Application Related Issues . . . . . . . . . . . . . . . 19 80 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 20 81 6. Security related Considerations . . . . . . . . . . . . . . . 20 82 6.1. Preventing Eavesdropping . . . . . . . . . . . . . . . . 20 83 6.2. Countermeasures against Pollution Attacks . . . . . . . . 21 84 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 22 85 7.1. Normative References . . . . . . . . . . . . . . . . . . 22 86 7.2. Informative References . . . . . . . . . . . . . . . . . 22 87 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 24 88 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 24 90 1. Introduction 92 This document specifies a baseline BATS code [Yang14] scheme for data 93 delivery in multi-hop networks, and discusses the related research 94 issues towards a more sophisticated scheme. The BATS code described 95 here includes an outer code and an inner code. The outer code is a 96 matrix generalization of fountain codes (see also the RapterQ code 97 described in RFC 6330 [RFC6330]), which inherits the advantages of 98 reliability and efficiency and possesses the extra desirable property 99 of being network coding compatible. The inner code, also called 100 recoding, is formed by linear network coding for combating packet 101 loss, improving the multicast efficiency, etc. A detailed design and 102 analysis of BATS codes are provided in the BATS monograph [Yang17]. 104 A BATS coding scheme can be applied in multi-hop networks formed by 105 wireless communication links, which are inherently unreliable due to 106 interference. Existing transport protocols like TCP use end-to-end 107 retransmission, while network protocols such as IP might enable 108 store-and-forward at the relays, so that packet loss would accumulate 109 along the way. 111 A BATS coding scheme can be used for various data delivery 112 applications like file transmission, video streaming over wireless 113 multi-hop networks, etc. Different from traditional forward error 114 correcting (FEC) schemes that are applied either hop-by-hop or end- 115 to-end, the BATS coding scheme combines the end-to-end coding (the 116 outer code) with certain hop-by-hop coding (the inner code), and 117 hence can potentially achieve better performance. 119 The baseline coding scheme described here considers a network with 120 multiple communication flows. For each flow, the source node encodes 121 the data for transmission separately. Inside the network, however, 122 it is possible to mix the packets from different flows for recoding. 123 In this document, we describe a simple case where recoding is 124 performed within each flow. Note that the same encoding/decoding 125 scheme described here can be used with different recoding schemes as 126 long as they follow the principle as we illustrate in this document. 128 The purpose of the baseline BATS coding scheme is twofold. First, it 129 provides researchers and engineers a starting point for developing 130 network communication applications/protocols based on BATS codes. 131 Second, it helps to make the research issues clearer towards a 132 sophisticated BATS code based network protocol. Important research 133 directions include the security issues, congestion control and 134 routing algorithms for BATS codes, etc. 136 This document is a product of and represents the collaborative work 137 and consensus of the Coding for Efficient Network Communications 138 Research Group (NWCRG); it is not an IETF product and is not an IETF 139 standard. 141 1.1. Requirements Language 143 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 144 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 145 document are to be interpreted as described in RFC 2119 [RFC2119]. 147 2. A Use Case of BATS Coding Scheme 149 The BATS coding scheme described in this document can be used, for 150 example, by a Data Delivery Protocol (DDP). Though this document is 151 not about a DDP, we briefly illustrate in this section how a BATS 152 coding scheme is employed by a DDP to make the role of the coding 153 scheme clear. Some terms that will be used in this section are 154 summarized here: 156 * DDP: data delivery protocol. 158 * DDP packet: the packet formed by a DDP employing a BATS coding 159 scheme. 161 * source packet: the packet formed by the data for delivery. 163 * outer encoder: the outer code encoder of a BATS code. 165 * recoder: the inner code encoder of a BATS code. 167 * outer decoder: the outer code decoder of a BATS code. 169 * coded packet: the packet generated by the outer code encoder or a 170 recoder. 172 * batch: a set of coded packets generated by a BATS coding scheme 173 from the same subset of the source packets. 175 * recoded packet: a coded packet generated by a recoder. 177 * degree: the number of source packets used to generate a batch by 178 the outer encoder. The degree can be different for different 179 batch. 181 Other common terms can be found in RFC 8406 [RFC8406]. 183 2.1. Introduction 185 We describe a data delivery process that involves one source node, 186 one destination node, and multiple intermediate nodes in between. A 187 BATS coding scheme includes an outer code encoder (also called outer 188 encoder), an inner code encoder (also called recoder), and an outer 189 decoder which decodes the outer code and the inner code jointly as 190 illustrated in Figure 1. The functions of the outer encoder, recoder 191 and outer decoder are described below: 193 | 194 | {set of source packets} 195 v 196 +-+-+-+-+-+-+-+-+ 197 | outer encoder | 198 | v | source node 199 | recoder | 200 +-+-+-+-+-+-+-+-+ 201 | 202 | {set of DDP packets} 203 v 204 +-+-+-+-+-+-+-+-+ 205 | | 206 | recoder | intermediate node 207 | | 208 +-+-+-+-+-+-+-+-+ 209 | 210 | {set of DDP packets} 211 v 212 ... 213 | 214 | {set of DDP packets} 215 v 216 +-+-+-+-+-+-+-+-+ 217 | | 218 | outer decoder | destination node 219 | | 220 +-+-+-+-+-+-+-+-+ 221 | 222 | {set of source packets} 223 v 225 Figure 1: A network model for data delivery. 227 At the source node, the DDP first processes the data to be delivered 228 into a number of source packets each of the same number of bits (see 229 details in Section 2.2.1), and then provides all the source packets 230 to the outer encoder. The outer encoder will further generate a 231 sequence of batches, each consisting of a fixed number of coded 232 packets (see the description in Section 2.2.2). 234 Each batch generated at the source node is further processed by the 235 recoder separately. The recoder may generate a number of new coded 236 packets using the existing coded packets of the batch (see the 237 description in Section 2.2.3). After processed by the recoder, the 238 DDP forms and transmits the DDP packets using the coded packets, 239 together with the corresponding batch information. 241 We assume that a DDP packet is either correctly received or 242 completely erased during the communication. The DDP extracts the 243 coded packets and the corresponding batch information from its 244 received DDP packets. A recoder is employed at an intermediate node 245 that does not need the data, and generates recoded packets for each 246 batch (see the description in Section 2.2.3). The DDP forms and 247 transmits DDP packets using the recoded packets and the corresponding 248 batch information. 250 The outer decoder is employed at the destination node that needs the 251 data. The DDP extracts the coded packets and the corresponding batch 252 information from its received DDP packets. The outer decoder tries 253 to recover the transmitted data using the received batches (see the 254 description in Section 2.2.4). The DDP sends the decoded data to the 255 application that needs the data. 257 2.2. Data Delivery Procedures 259 Suppose that the DDP has F octets of data for transmission. We 260 describe the procedures of one BATS session for transmitting the F 261 octets. There is a limit on F of a single BATS session. If the 262 total data has more than the limit, the data needs to be transmitted 263 using multiple BATS sessions. The limit on F of a single BATS 264 session depends on the coding parameters to be discussed in this 265 section, and will be calculated at the end of Section 2.4.2. 267 2.2.1. Source Node Data Partitioning and Padding 269 The DDP first determines the following parameters: 271 * Batch size (M): the number of coded packets in a batch generated 272 by the outer encoder. 274 * Recoding field size (q): the number of elements in the finite 275 field for recoding. q is 2 or 2^8 277 * BATS payload size (TO): the number of payload octets in a BATS 278 packet, including the coded data and the coefficient vector. 280 Based on the above parameters, the parameters T, O and K are 281 calculated as follows: 283 * O: the number of octets of a coefficient vector, calculated as O = 284 ceil(M*log2(q)/8), which is also called the coefficient vector 285 overhead. 287 * T: the number of data octets of a coded packet, calculated as T = 288 TO - O. 290 * K: number of source packets, calculated as K = floor(F/T)+1. 292 The data MUST be padded to have T*K octets, which will be partitioned 293 into K source packets b[0], ..., b[K-1], each of T octets. In our 294 padding scheme, b[0], ..., b[K-2] are filled with data octets, and 295 b[K-1] is filled with the remaining data octets and padding octets. 296 Let P = K*T-F denote the number of padding octets. We use b[K-1, 0], 297 ..., b[K-1, T-P-1] to denote the T-P source octets and b[K-1, T-P], 298 ..., b[K-1, T-1] to denote the P padding octets in b[K-1], 299 respectively. The padding insertion process is shown in Figure 2. 301 Z = T - P 302 j = 1 303 v = 1 304 Let bl be the last source packet b[K-1] 305 for i = Z, Z+1, ..., T-1 do 306 bl[i] = j 307 if i+1 >= v+Z do 308 j += 1 309 v += j 311 Figure 2: Data Padding Insertion Process 313 2.2.2. Source Node Outer Code Encoding Procedure 315 The DDP provides the BATS encoder with the following information: 317 * Batch size (M): the number of coded packets in a batch. 319 * Recoding field size (q): the number of elements in the finite 320 field for recoding. 322 * Maximum degree (MAX_DEG): a positive integer that specifies the 323 largest degree can be used. 325 * Degree distribution (DD): an unsigned integer array of size 326 MAX_DEG+1. The i-th entry DD[i] is the possibility that i is 327 chosen as the degree, where i is between 0 and MAX_DEG. 329 * A sequence of batch IDs (BID) (j, j = 0, 1, ...). 331 * Number of source packets (K). 333 * Packet size (T): the number of octets in a source packet. 335 * Source packets (b[i], i = 0, 1, ..., K-1). 337 Using this information, the outer encoder generates M coded packets 338 for each batch ID using the following steps to be described in 339 details at Section 3.2: 341 * Obtain a degree d by sampling DD. Roughly, the value d is chosen 342 with probability DD[d]. 344 * Choose d source packets uniformly at random from all the K source 345 packets. It is allowed that a source packet is used by mutiple 346 batches. 348 * Generate M coded packets using the d source packets. 350 The DDP receives from the outer encoder a sequence of batches, where 351 the batch with ID j has 353 * M coded packets (x[j,i], i =0, 1, ..., M-1), each containing TO 354 octets. 356 The DDP will use the batches to form DDP packets to be transmitted to 357 other network nodes towards the destination nodes. The DDP MUST 358 deliver with each coded packet with its 360 * BID: batch ID. 362 The DDP MUST deliver the following information to each recoder: 364 * M: batch size 366 * q: recoding field size. 368 The DDP MUST deliver the following information to each decoder: 370 * M: batch size 372 * q: recoding field size 374 * K: the number of source packets 376 * T: the number of octets in a source packet 378 * DD: the degree distribution. 380 The BID is used by both recoders and decoders. We will illustrate in 381 Section 2.4 that how to embed BID, M, q, and K into DDP packets. The 382 degree distribution DD does not need to be changed frequently. See 383 Section 6 in [Yang17] about how to design a good degree distribution. 384 Once designed, the degree distribution can be shared between the 385 source node and the destination node by the DDP, which is not further 386 discussed here. 388 2.2.3. Recoding Procedures 390 Both the source node and the intermediate nodes perform recoding on 391 the batches before transmission. At the source node, the recoder 392 receives the batches from the outer code encoding procedure. At an 393 intermediate node, the DDP receives the DDP packets from the other 394 network nodes. If the DDP choose not to recode, it just forwards the 395 DDP packets it received. Otherwise, the DDP should be able to 396 extract coded packets and the corresponding batch information from 397 these packets. 399 For a received batch, the DDP determines a positive integer Mr, the 400 number of recoded packets to be transmitted for the batch, and 401 provides the recoder with the following information: 403 * the batch size M, 405 * the recoding field size q, 407 * a number of received coded packets of the same batch, each 408 containing TO octets, and 410 * the number of recoded packets to be generated (Mr). 412 The recoder uses the information provided by the DDP to generate Mr 413 recoded packets, each containing TO octets, to be described in 414 Section 3.3. The DDP uses the Mr recoded packets to form the DDP 415 packets for transmitting. 417 2.2.4. Destination Node Procedures 419 A destination node needs the data transmitted by the source node. At 420 the destination node, the DDP receives DDP packets from an 421 intermediate network node, and should be able to extract coded 422 packets and the corresponding batch information from these packets. 424 The DDP provides the outer decoder (to be described in Section 3.4) 425 with the following information: 427 * M: batch size, 429 * q: recoding field size, 431 * K: the number of source packets 433 * T: the number of octets of a source packet 435 * A sequence of batches, each of which is formed by a number of 436 coded packets belonging to the same batch, with their 437 corresponding BIDs. 439 The decoder uses this information to decode the outer code and the 440 inner code jointly and recover the K source packets (see details in 441 Section 3.4). If successful, the decoder returns the recovered K 442 source packets to the DDP, which will use the K source packets to 443 form the F octets data. The recommended padding deletion process is 444 shown as follows: 446 // this procedure returns the number P of padding octets 447 // at the end of b[K-1] 448 Let bl be the last decoded source packet b[K-1] 449 PL = bl[T-1] 450 if PL == 1 do 451 return P = 1 452 WI = T - 1 453 while bl[WI] == PL do 454 WI = WI - 1 455 return P = (1 + bl[WI]) * bl[WI] + T - WI - 1 457 Figure 3: Data Padding Deletion Process 459 2.3. Recommendation for the Parameters 461 The recommendation for the parameters M and q is shown as follows: 463 * When q=2, M=16,32,64,128 464 * When q=256, M=4,8,16,32 466 It is RECOMMENDED that K is at least 128. The encoder/decoder SHALL 467 support an arbitrary positive integer value less than 2^16. However, 468 the BATS coding scheme to be described is not optimized for small K. 470 2.4. Coding Parameters in DDP Packets 472 Here we provide an example of embedding the aforementioned BATS 473 coding parameters into the DDP packets which will be used for 474 recoding and decoding. A DDP can form a DDP packet using a coded 475 packet by adding necessary information that can help to deliver the 476 DPP packet to the next node, e.g., the DDP protocol version, 477 addresses and session identifiers. We will not go into the details 478 of formatting these fields in a DDP packet, but focus on how to 479 format the coding parameters and the coded packet in a DDP packet. 481 2.4.1. Coding Parameter Format 483 Here we provide an example of using 32 bits (4 octets) to embed the 484 parameters K, M, q, and BID. The 32 bits are separated into three 485 subfields, denoted as K, Mq and BID, respectively, as illustrated in 486 Figure 4. 488 0 1 2 3 489 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 490 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 491 | K | Mq | BID | 492 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 494 Figure 4: Coding parameter field format. 496 * K: 16-bit unsigned integer, specifying the number of source 497 packets of the BATS session. 499 * Mq: 3-bit unsigned integer to specify the value of M and q as 500 Table 1. 502 * BID: 13-bit unsigned integer, specifying the batch ID of the batch 503 the packet belongs to. 505 +=====+=====+=====+ 506 | Mq | M | q | 507 +=====+=====+=====+ 508 | 000 | 16 | 2 | 509 +-----+-----+-----+ 510 | 010 | 32 | 2 | 511 +-----+-----+-----+ 512 | 100 | 64 | 2 | 513 +-----+-----+-----+ 514 | 110 | 128 | 2 | 515 +-----+-----+-----+ 516 | 001 | 4 | 256 | 517 +-----+-----+-----+ 518 | 011 | 8 | 256 | 519 +-----+-----+-----+ 520 | 101 | 16 | 256 | 521 +-----+-----+-----+ 522 | 111 | 32 | 256 | 523 +-----+-----+-----+ 525 Table 1: Values of Mq 526 field 528 The choice of the coding parameters depends on the computation cost, 529 the network conditions and the expected end-to-end coding 530 performance. Usually, a larger batch size M will have a better 531 coding performance, but higher computation cost for encoding, 532 recoding and decoding. The field size q affects the coefficient 533 vector overhead, and also the computation cost for recoding. Within 534 a BATS session, the BID field should be different for all batches, 535 and hence the maximum number of batches can be generated for the 536 outer encoder is 2^13. For different BATS sessions, batches can use 537 the same BID. 539 2.4.2. Coded Packet Format 541 O T 542 +-----------------------+-------------------------------+ 543 | coefficient vector | coded data | 544 +-----------------------+-------------------------------+ 546 Figure 5: Code packet format in a DDP packet. 548 A coded packet has TO=T+O octets, where the first O octets contain 549 the coefficient vector and the remaining T octets contain the coded 550 data. 552 * coefficient vector: O = M*log2(q)/8 octets. For the values of M 553 and q in Table 1, O is at most 32 octets when q is 256 and 6 554 octets when q is 2. 556 * coded data: T octets. T should be chosen so that the whole DDP 557 packet is at most PMTU. 559 Using the above formation, we can calculate the largest file size F 560 for different parameters. For example, when q=2 and M=128, we have O 561 = 6 octets. Counting the 4 octets for embedding the coding 562 parameters, we can choose T = PMTU-H-10, where H is the header length 563 of a DDP packet. As K can be at most 2^16-1, F can be at most (PMTU- 564 H-10)(2^16-1) octets. In this case, K/M is about 2^9 and the BID 565 field allows transmitting 2^4*K/M batches. 567 3. BATS Code Specification 569 3.1. Common Parts 571 The T octets of a source packets are treated as a column vector of T 572 elements in GF(256). The O octets of coefficient vector are treated 573 as a column vector of O elements in GF(q), where q=2 or q=256. 574 Linear algebra and matrix operations over finite fields are assumed 575 in this section. 577 For the two elements of GF(2), their multiplication corresponds to a 578 logical AND operation and their addition is an logical XOR operation. 579 An element of the field GF(256) can be represented by a polynomial 580 with binary coefficients and degree lower or equal to 7. The 581 addition between two elements of GF(256) is defined as the addition 582 of the two binary polynomials. The multiplication between two 583 elements of GF(256) is the multiplication of the two binary 584 polynomials modulo a certain irreducible polynomial of degree 8, 585 called a primitive polynomial. One example of such a primitive 586 polynomial for GF(256) is: 588 x^8 + x^4 + x^3 + x^2 + 1 590 A common primitive polynomial should be specified for all the finite 591 field multiplications over GF(256). Note that a binary polynomial of 592 degree less than 8 can be represented by a binary sequence of 8 bits, 593 i.e., an octet. 595 Suppose that a pseudorandom number generator Rand() which generates 596 an unsigned integer of 32 bits is shared by both encoding and 597 decoding. The pseudorandom generator can be initialized by 598 Rand_Init(S) with seed S. When S is not provided, the pseudorandom 599 generator is initialized arbitrarily. One example of such a 600 pseudorandom generator is defined in RFC 8682 [RFC8682]. 602 A function called BatchSampler is used in both encoding and decoding. 603 The function takes two integers j and d as input, and generates an 604 array idx of d integers and a d x M matrix G. The function first 605 initializes the pseudorandom generator with j, sample d distinct 606 integers from 0 to K-1 as idx, and sample d*M integers from 0 to 255 607 as G. See the pseudocode in Figure 6. 609 function BatchSampler(j,d) 610 // initialize the pseudorandom generator by seed j. 611 Rand_Init(j) 612 // sample d distinct integers between 0 and K-1. 613 for k = 0, ..., d-1 do 614 r = Rand() % K 615 while r already exists in idx do 616 r = Rand() % K 617 idx[k] = r 619 // sample d x M matrix 620 for r = 0, ..., d-1 do 621 for c = 0,...,M-1 do 622 G[r,c] = Rand() % 256 624 return idx, G 626 Figure 6: Batch Sampler Function 628 3.2. Outer Code Encoder 630 Define a function called DegreeSampler that returns an integer d 631 using the degree distribution DD. We expect that the empirical 632 distribution of the returning d converges to DD(d) when d < K. One 633 design of DegreeSampler is illustrated in Figure 7. Note that when K 634 < MAX_DEG, the degree value returned by DegreeSampler does not 635 exactly follow the distribution DD, which however would not affect 636 the practical decoding performance for the outer decoder to be 637 described in Section 3.4. 639 function DegreeSampler(j, DD) 640 Let CDF be an array 641 CDF[0] = 0 642 for i = 1, ..., MAX_DEG do 643 CDF[i] = CDF[i-1] + DD[i] 644 Rand_Init(j) 645 r = Rand() % CDF[MAX_DEG] 646 for d = 1, ..., MAX_DEG do 647 if r >= CDF[d] do 648 return min(d,K) 649 return min(MAX_DEG,K) 651 Figure 7: Degree Sampler Function. 653 Let b[0], b[1], ..., b[K-1] be the K source packets. A batch with 654 BID j is generated using the following steps. 656 * Obtain a degree d by calling DegreeSampler with input j. 658 * Obtain idx and G[j] by calling BatchSampler with input j and d. 660 * Let B[j] = (b[idx[0]], b[idx[1]], ..., b[idx[d-1]]). Form the 661 batch X[j] = B[j]*G[j], whose dimension is T x M. 663 * Form the TO x M matrix Xr[j], where the first O rows of Xr[j] form 664 the M x M identity matrix I with entries in GF(q), and the last T 665 rows of Xr[j] is X[j]. 667 See the pseudocode of the batch generating process in Figure 8. 669 function GenBatch(j) 670 d = DegreeSampler(j) 671 (idx, G) = BatchSampler(j,d) 672 B = (b[idx[0]], b[idx[i]], ..., b[idx[d-1]]) 673 X = B * G 674 Xr = [I; X] 675 return Xr 677 Figure 8: Batch Generation Function. 679 3.3. Inner Code Encoder (Recoder) 681 In general, the inner code of a BATS code comprises (random) linear 682 network coding applied on the coded packets belonging to the same 683 batch. The recoded packets have the same BID. Suppose that coded 684 packets xr[i], i = 0, 1, ..., r-1, which have the same BID j, have 685 been received at an intermediate node, and Mr recoded packets are 686 supposed to be generated. Following traditional random linear 687 network coding, a recoded packet can be generated by random linear 688 combination: (randomly) choose a sequence of coefficients c[i], i = 689 0, 1, ..., r-1 from GF(q), and generate 690 c[0]xr[0]+c[1]xr[1]+...+c[r-1]xr[r-1] as a recoded packet. This 691 recoding approach, called random linear recoding, achieves good 692 network coding performance for multicast when the batch size is 693 sufficiently large. 695 For unicast communications in a single path as illustrated in 696 Figure 1, it is not necessary to generate all the Mr recoded packets 697 using random linear combination. Instead, xr[i], i = 0, 1, ..., r-1, 698 are directly used as recoded packets, and max(Mr-r,0) recoded packets 699 are generated using linear combinations. Compared with random linear 700 recoding, this recoding approach, called systematic recoding, can 701 reduce both the computation cost and also the recoding latency that 702 accumulates linearly with the number of nodes. Note that the use of 703 systematic recoding may not always achieve the optimal network coding 704 performance as random linear recoding in more complicated 705 communication scenarios that include multiple paths and multiple 706 destination nodes. 708 3.4. Outer Decoder 710 The decoder receives a sequence of batches Yr[j], j = 0, 1, ..., n-1, 711 each of which is a TO-row matrix over GF(256). Let Y[j] be the 712 submatrix of the last T rows of Yr[j]. When q = 256, let H[j] be the 713 first M rows of Yr[j]; when q = 2, let H[j] be the matrix over 714 GF(256) formed by embedding each bit in the first M/8 rows of Yr[j] 715 into GF(256). For successful decoding, we require that the total 716 rank of all the batches is at least K. 718 The same degree distribution DD used for the outer encoder is 719 supposed to be known by the outer decoder. By calling DegreeSampler 720 and BatchSampler with input j, we obtain d[j], idx[j] and G[j]. 721 According to the encoding and recoding processes described in 722 Section 3.2 and Section 3.3, we have the system of linear equations 723 Y[j] = B[j]G[j]H[j] for each received batch with ID j, where B[j] = 724 (b[idx[j,0]], b[idx[j,1]], ..., b[idx[j,d-1]]) is unknown. 726 We first describe a belief propagation (BP) decoder that can 727 efficiently solve the source packets when a sufficient number of 728 batches have been received. A batch j is said to be decodable if 729 rank(G[j]H[j]) = d[j] (i.e., the system of linear equations Y[j] = 730 B[j]G[j]H[j] with B[j] as the variable matrix has a unique solution). 731 The BP decoding algorithm has multiple iterations. Each iteration is 732 formed by the following steps: 734 * Decoding step: Find a batch j that is decodable. Solve the 735 corresponding system of linear equations Y[j] = B[j]G[j]H[j] and 736 decode B[j]. 738 * Substitution step: Substitute the decoded source packets into 739 undecodable batches. Suppose that a decoded source packet b[k] is 740 used in generating an undecodable Y[j]. The substitution involves 741 1) removing the entry in idx[j] corresponding to k, 2) removing 742 the row in G[j] corresponding to b[k], and 3) reducing d[j] by 1. 744 The BP decoder repeats the above steps until no batches are decodable 745 during the decoding step. 747 When the degree distribution DD in the outer code encoder (see 748 Section 3.2) is properly designed, the BP decoder guarantees a high 749 probability for the recovery of a given fraction of the source 750 packets when K is large. To recover all the source packets, a 751 precode can be applied to the source packets to generate a fraction 752 of redundant packets before applying the outer code encoding. 753 Moreover, when the BP decoder stops which may happen with a high 754 probability when K is relatively small, it is possible to continue 755 with inactivation decoding, where certain source packets are treated 756 inactive so that a similar belief propagation process can be resumed. 757 The reader is referred to RFC 6330 [RFC6330] for the design of a 758 precode with a good inactivation decoding performance. 760 4. Research Issues 762 The baseline BATS coding scheme described in Section 2 and Section 3 763 needs various refinement and complement towards becoming a more 764 sophisticated network communication application. Various related 765 research issues are discussed in this section, but the security 766 related issues are left to Section 6. 768 4.1. Coding Design Issues 770 When the number of batches is sufficiently large, the BATS code 771 specification in Section 3 has nearly optimal performance in the 772 sense that the decoding can be successful with a high probability 773 when the total rank of all the batches used for decoding is just 774 slightly larger than the number of source packet K. But when K is 775 small, the degree sampler function in Figure 7 and the BatchSampler 776 function in Figure 6 based on a pseudorandom generator may not sample 777 all the source packets evenly, so that some of the source packets are 778 not well protected. One approach to solve this issue is to generate 779 a deterministic degree sequence when the number of batches is 780 relatively small, and design a special pseudorandom generator that 781 has a good sampling performance when K is small. 783 There are research issues related to recoding discussed in 784 Section 3.3. One question is how many recoded packets to generate 785 for each batch. Though it is asymptotically optimal when using the 786 same number of recoded packets for all batches, it has been shown 787 that transmitting a different number of recoded packets for different 788 batches can improve the recoding efficiency. The intuition is that 789 for a batch with a lower rank, a smaller number of recoded packets 790 need to be transmitted. This kind of recoding scheme is called 791 adaptive recoding [Yin19]. 793 Packet loss in network communication is usually bursty, which may 794 harm the recoding performance. One way to resolve this issue is to 795 transmit the packets of different batches in a mixed order, which is 796 also called batch interleaving [Yin20]. How to efficiently 797 interleave batches without increasing too much end-to-end latency is 798 a research issue. 800 Though we only focus on the BATS coding scheme with one source node 801 and one destination node, a BATS coding scheme can be used for 802 multiple source and destination nodes. To benefit from multiple 803 source nodes, we would need different source nodes to generate 804 statistically independent batches. For communicating the same data 805 to multiple destination nodes, which is also called multicast, it is 806 well-known that linear network coding [Li03] achieves the mulicast 807 capacity. BATS codes can benefit from network coding due to its 808 inner code, but how to efficiently implement multicast needs further 809 research. 811 4.2. Protocol Design Issues 813 The baseline scheme in this document focuses on reliable 814 communication. There are other issues to be considered towards 815 designing a fully functional DDP based on a BATS coding scheme. Here 816 we discuss some network management issues that are closely related to 817 a BATS coding scheme: routing, congestion control and media access 818 control. 820 The outer code of a BATS code can be regarded as a channel code for 821 the channel induced by the inner code, and hence the network 822 management algorithms should try to maximize the capacity of the 823 channel induced by the inner code. A network utility maximization 824 problem [Dong20] for BATS coding can be applied to study routing, 825 congestion control and media access control jointly. Compared with 826 the network utility maximization for Internet, there are two major 827 differences. First, the network flow rate is not measured by the 828 rate of the raw packets. Instead, a rank based measurement induced 829 by the inner code is applied for BATS coding schemes. Second, due to 830 recoding, the raw packet rate may not be the same for different links 831 of a flow, i.e., no flow conservation for BATS coding schemes. These 832 differences affect both the objective and the constraints of the 833 utility maximization problem. 835 Practical congestion control, routing and media access control 836 algorithms for BATS coding schemes deserve more research efforts. 837 Due to the recoding operation, congestion control cannot be only 838 performed end-to-end. The rate of transmitting batches can be 839 controlled end-to-end, but the number of recoded packets generated 840 for a batch must be controlled at the intermediate nodes, which 841 introduces new research issues for congestion control. For routing, 842 the BATS coding scheme is flexible for implementing multi-path data 843 transmission, and different batches can be transmitted on a different 844 path between a source node and a destination node. Under the 845 scenario of BATS coding schemes, media access control can have some 846 different considerations: Retransmission is not necessary, and a 847 reasonably high packet loss rate can be tolerated. 849 4.3. Application Related Issues 851 There are more research issues pertaining to different applications. 852 The reliable communication technique provided by BATS codes can be 853 used for a broad range of network communication scenarios. In 854 general, a BATS coding scheme is suitable for data delivery in 855 networks with multiple hops and unreliable links. 857 One class of typical application scenario is wireless mesh and ad hoc 858 networks [Toh02], including vehicular networks, wireless sensor 859 networks, smart lamppost networks, etc. These networks are 860 characterized by a large number of network devices connected 861 wirelessly with each other without a centralized network 862 infrastructure. A BATS coding scheme is suitable for high data load 863 delivery in such networks without the requirement that the point-to- 864 point/one-hop communication is highly reliable. Therefore, employing 865 a BATS coding scheme can provide more freedom for media access 866 control, including power control, and physical-layer design so that 867 the overall network throughput can be improved. 869 Another typical application scenario of BATS coding schemes is 870 underwater acoustic networks [Sprea19], where the propagation delay 871 of acoustic waves in underwater can be as long as several seconds. 872 Due to the long delay, feedback based mechanisms become inefficient. 873 Moreover, point-to-point/one-hop underwater acoustic communication 874 (for both the forward and reverse directions) is highly unreliable. 875 Due to these reasons, traditional networking techniques developed for 876 radio and wireline networks cannot be directly applied to underwater 877 networks. As a BATS coding scheme does not rely on the feedback for 878 reliability communication and can tolerate highly unreliable links, 879 it makes a good candidate for developing data delivery protocols for 880 underwater acoustic networks. 882 Last but not least, due to its capability of performing multi-source 883 multi-destination communications, a BATS coding scheme can be applied 884 in various content distribution scenarios. For example, a BATS 885 coding scheme can be a candidate for the erasure code used in the 886 liquid data networking framework [Byers20] of CCN (content centric 887 networking), and provides the extra benefit of network coding 888 [Zhang16]. 890 5. IANA Considerations 892 This memo includes no request to IANA. 894 6. Security related Considerations 896 Subsuming both random linear network codes (RLNC) and fountain codes, 897 BATS codes naturally inherit both their desirable security capability 898 of preventing eavesdropping, as well as their vulnerability towards 899 pollution attacks. In this section, we discuss some related research 900 issues. 902 6.1. Preventing Eavesdropping 904 Suppose that an eavesdropper obtains a batch where the degree value d 905 is strictly larger than the batch size M. Even the eavesdropper has 906 all the related encoding information, the system of linear equations 907 related to this batch does not have a unique solution, and the 908 probability that the eavesdropper can guess the d source packets used 909 for encoding the batch correctly is 2^(d-M)T>=2^T (see also 910 [Bhattad05]). When inactivation decoding is applied, we can design 911 the degree distribution DD so that the smallest degree is M+1, and 912 hence prevent the eavesdropper from decoding source packets from 913 individual batches. 915 If we allow the eavesdropper to collect multiple batches and use 916 inactivation decoding, the same security holds if the total rank of 917 all the batches collected by the eavesdropper is less than the number 918 of source packet. Therefore, if the DDP can manage to restrict the 919 eavesdropper from collecting a sufficiently number of coded packets, 920 the native security of BATS code is effective when T is sufficiently 921 large. Here by native security, we mean the security protection 922 provided by the BATS coding scheme without extra enhancement. 924 If the eavesdropper can collect a sufficient number of coded packets 925 for correctly decoding, the native security of BATS code is 926 ineffective. One solution in this case is to encrypt the whole data 927 before using the BATS code scheme. Better schemes are desired 928 towards reducing the computation cost of the whole data encryption 929 solution. This is a research issue that depends on specific BATS 930 code schemes, and will not be further discussed here. 932 The threat exists for eavesdropping on the initial encoding process, 933 which takes place at the encoding nodes. In these nodes, the 934 transported data are presented in plain text and can be read along 935 their transfer paths. Hence, information isolation between the 936 encoding process and all other user processes running on the source 937 node MUST be assured. 939 In addition, the authenticity and trustworthiness of the encoding, 940 recoding and decoding program running on all the nodes MUST be 941 attested by a trusted authority. Such a measure is also necessary in 942 countering pollution attacks. 944 6.2. Countermeasures against Pollution Attacks 946 Like all network codes, BATS codes are vulnerable to pollution 947 attacks. In these attacks, one or more compromised coding node(s) 948 can pollute the coded messages by injecting forged packets into the 949 network and thus prevent the receivers from recovering the 950 transported data correctly. 952 The research community has long been investigating the use of various 953 signature schemes (including homomorphic signatures) to identify the 954 forged packets and stall the attacks (see [Zhao07], [Yu08], 955 [Agrawal09]). However, these countermeasures are regarded as being 956 too computationally expensive to be employed in broadband 957 communications. Hence, a system-level approach based on Trusted 958 Computing [TC-Wikipedia] is proposed as a practical alternative to 959 protect BATS codes against pollution attacks. This Trusted Computing 960 based protection consists of the following countermeasures: 962 1. Attestation and Validation of all BATS encoding, recoding and 963 decoding nodes in the network. Remote attestation and repetitive 964 validation of the identity and capability of these node based on 965 valid public key certificates with proper authorization MUST be a 966 pre-requisite for admitting these nodes to a network and 967 permitting them to remain on that network. 969 2. Attestation of all encoding, recoding and decoding programs used 970 in the coding nodes. All programs used to perform the BATS 971 encoding, recoding and decoding processes MUST be remotely 972 attested before they are permitted to run on any of the coding 973 nodes. Reloading or alteration of programs MUST NOT be permitted 974 during an encoding session. Programs MUST be attested or 975 validated again when they are executed in new execution 976 environments instantiated even in the same node. 978 3. Origin Authentication of all coded messages using network level 979 security protocols such as IPsec or Peer Authentication over 980 session-based communication using transport level security 981 protocols such as TLS/DTLS MUST be employed in order to provide 982 Message Integrity or Origin Authentication to every coded packet 983 sent through the coding network. 985 7. References 987 7.1. Normative References 989 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 990 Requirement Levels", BCP 14, RFC 2119, 991 DOI 10.17487/RFC2119, March 1997, 992 . 994 [RFC8406] Adamson, B., Adjih, C., Bilbao, J., Firoiu, V., Fitzek, 995 F., Ghanem, S., Lochin, E., Masucci, A., Montpetit, M-J., 996 Pedersen, M., Peralta, G., Roca, V., Ed., Saxena, P., and 997 S. Sivakumar, "Taxonomy of Coding Techniques for Efficient 998 Network Communications", RFC 8406, DOI 10.17487/RFC8406, 999 June 2018, . 1001 [RFC8682] Saito, M., Matsumoto, M., Roca, V., Ed., and E. Baccelli, 1002 "TinyMT32 Pseudorandom Number Generator (PRNG)", RFC 8682, 1003 DOI 10.17487/RFC8682, January 2020, 1004 . 1006 7.2. Informative References 1008 [Agrawal09] 1009 Agrawal, S. and D. Boneh, "Homomorphic MACs: MAC-based 1010 integrity for network coding", International Conference on 1011 Applied Cryptography and Network Security , 2009. 1013 [Bhattad05] 1014 Bhattad, K. and K.R. Narayanan, "Weakly Secure Network 1015 Coding", ISIT , 2007. 1017 [Byers20] Byers, J.W. and M. Luby, "Liquid Data Networking", ICN , 1018 2020. 1020 [Dong20] Dong, Y., Jin, S., Yang, S., and H.H.F. Yin, "Network 1021 Utility Maximization for BATS Code enabled Multihop 1022 Wireless Networks", ICC , 2020. 1024 [Li03] Li, S.-Y.R., Yeung, R.W., and N. Cai, "Linear Network 1025 Coding", IEEE Transactions on Information Theory , 2003. 1027 [RFC6330] Luby, M., Shokrollahi, A., Watson, M., Stockhammer, T., 1028 and L. Minder, "RaptorQ Forward Error Correction Scheme 1029 for Object Delivery", RFC 6330, DOI 10.17487/RFC6330, 1030 August 2011, . 1032 [Sprea19] Sprea, N., Bashir, M., Truhachev, D., Srinivas, K.V., 1033 Schlegel, C., and C. Claudio Sacchi, "BATS Coding for 1034 Underwater Acoustic Communication Networks", OCEANS , 1035 2019. 1037 [TC-Wikipedia] 1038 "Trusted Computing", 1039 Wikipedia https://en.wikipedia.org/wiki/Trusted_Computing. 1041 [Toh02] Toh, C.K., "Ad Hoc Mobile Wireless Networks", Prentice 1042 Hall Publishers , 2002. 1044 [Yang14] Yang, S. and R.W. Yeung, "Batched Sparse Codes", IEEE 1045 Transactions on Information Theory 60(9), 5322-5346, 2014. 1047 [Yang17] Yang, S. and R.W. Yeung, "BATS Codes: Theory and 1048 Practice", Morgan & Claypool Publishers , 2017. 1050 [Yin19] Yin, H.H.F., Tang, B., Ng, K.H., Yang, S., Wang, X., and 1051 Q. Zhou, "A Unified Adaptive Recoding Framework for 1052 Batched Network Coding", ISIT , 2019. 1054 [Yin20] Yin, H.H.F., Yeung, R.W., and S. Yang, "A Protocol Design 1055 Paradigm for Batched Sparse Codes", Entroy , 2020. 1057 [Yu08] Yu, Z., Wei, Y., Ramkumar, B., and Y. Guan, "An Efficient 1058 Signature-Based Scheme for Securing Network Coding Against 1059 Pollution Attacks", INFOCOM , 2008. 1061 [Zhang16] Zhang, G. and Z. Xu, "Combing CCN with network coding: An 1062 architectural perspective", Computer Networks , 2016. 1064 [Zhao07] Zhao, F., Kalker, T., Medard, M., and K.J. Han, 1065 "Signatures for content distribution with network coding", 1066 ISIT , 2007. 1068 Acknowledgments 1070 The authors would like to thank the NWCRG chairs, Vincent Roca (our 1071 shepherd) and Marie-Jose Montpetit; and all those who provided 1072 comments -- namely (in alphabetical order), Emmanuel Lochin, David 1073 Oran, and Colin Perkins. 1075 Authors' Addresses 1077 Shenghao Yang 1078 CUHK(SZ) 1079 Shenzhen 1080 Guangdong, 1081 China 1083 Phone: +86 755 8427 3827 1084 Email: shyang@cuhk.edu.cn 1086 Xuan Huang 1087 CUHK 1088 Hong Kong 1089 Hong Kong SAR, 1090 China 1092 Phone: +852 3943 8375 1093 Email: 1155136647@link.cuhk.edu.hk 1095 Raymond W. Yeung 1096 CUHK 1097 Hong Kong 1098 Hong Kong SAR, 1099 China 1101 Phone: +852 3943 8375 1102 Email: whyeung@ie.cuhk.edu.hk 1103 John K. Zao 1104 NCTU 1105 Hsinchu 1106 Taiwan, 1107 China 1109 Email: jkzao@ieee.org