| rfc9768.original | rfc9768.txt | |||
|---|---|---|---|---|
| TCP Maintenance & Minor Extensions (tcpm) B. Briscoe | Internet Engineering Task Force (IETF) B. Briscoe | |||
| Internet-Draft Independent | Request for Comments: 9768 Independent | |||
| Updates: 3168 (if approved) M. Kühlewind | Updates: 3168 M. Kühlewind | |||
| Intended status: Standards Track Ericsson | Category: Standards Track Ericsson | |||
| Expires: 11 September 2025 R. Scheffenegger | ISSN: 2070-1721 R. Scheffenegger | |||
| NetApp | NetApp | |||
| 10 March 2025 | August 2025 | |||
| More Accurate Explicit Congestion Notification (AccECN) Feedback in TCP | More Accurate Explicit Congestion Notification (AccECN) Feedback in TCP | |||
| draft-ietf-tcpm-accurate-ecn-34 | ||||
| Abstract | Abstract | |||
| Explicit Congestion Notification (ECN) is a mechanism where network | Explicit Congestion Notification (ECN) is a mechanism by which | |||
| nodes can mark IP packets instead of dropping them to indicate | network nodes can mark IP packets instead of dropping them to | |||
| incipient congestion to the endpoints. Receivers with an ECN-capable | indicate incipient congestion to the endpoints. Receivers with an | |||
| transport protocol feed back this information to the sender. ECN was | ECN-capable transport protocol feed back this information to the | |||
| originally specified for TCP in such a way that only one feedback | sender. ECN was originally specified for TCP in such a way that only | |||
| signal can be transmitted per Round-Trip Time (RTT). Recent new TCP | one feedback signal can be transmitted per Round-Trip Time (RTT). | |||
| mechanisms like Congestion Exposure (ConEx), Data Center TCP (DCTCP) | Newer TCP mechanisms like Congestion Exposure (ConEx), Data Center | |||
| or Low Latency, Low Loss, and Scalable Throughput (L4S) need more | TCP (DCTCP), or Low Latency, Low Loss, and Scalable Throughput (L4S) | |||
| Accurate ECN (AccECN) feedback information whenever more than one | need more Accurate ECN (AccECN) feedback information whenever more | |||
| marking is received in one RTT. This document updates the original | than one marking is received in one RTT. This document updates the | |||
| ECN specification in RFC 3168 to specify a scheme that provides more | original ECN specification defined in RFC 3168 by specifying a scheme | |||
| than one feedback signal per RTT in the TCP header. Given TCP header | that provides more than one feedback signal per RTT in the TCP | |||
| space is scarce, it allocates a reserved header bit previously | header. Given TCP header space is scarce, it allocates a reserved | |||
| assigned to the ECN-Nonce. It also overloads the two existing ECN | header bit previously assigned to the ECN-nonce. It also overloads | |||
| flags in the TCP header. The resulting extra space is additionally | the two existing ECN flags in the TCP header. The resulting extra | |||
| exploited to feed back the IP-ECN field received during the TCP | space is additionally exploited to feed back the IP-ECN field | |||
| connection establishment. Supplementary feedback information can | received during the TCP connection establishment. Supplementary | |||
| optionally be provided in two new TCP option alternatives, which are | feedback information can optionally be provided in two new TCP option | |||
| never used on the TCP SYN. The document also specifies the treatment | alternatives, which are never used on the TCP SYN. The document also | |||
| of this updated TCP wire protocol by middleboxes. | specifies the treatment of this updated TCP wire protocol by | |||
| middleboxes. | ||||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This is an Internet Standards Track document. | |||
| provisions of BCP 78 and BCP 79. | ||||
| Internet-Drafts are working documents of the Internet Engineering | ||||
| Task Force (IETF). Note that other groups may also distribute | ||||
| working documents as Internet-Drafts. The list of current Internet- | ||||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
| Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
| and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
| time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
| material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
| Internet Standards is available in Section 2 of RFC 7841. | ||||
| This Internet-Draft will expire on 11 September 2025. | Information about the current status of this document, any errata, | |||
| and how to provide feedback on it may be obtained at | ||||
| https://www.rfc-editor.org/info/rfc9768. | ||||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2025 IETF Trust and the persons identified as the | Copyright (c) 2025 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
| license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
| and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
| extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
| described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
| provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
| in the Revised BSD License. | ||||
| This document may contain material from IETF Documents or IETF | This document may contain material from IETF Documents or IETF | |||
| Contributions published or made publicly available before November | Contributions published or made publicly available before November | |||
| 10, 2008. The person(s) controlling the copyright in some of this | 10, 2008. The person(s) controlling the copyright in some of this | |||
| material may not have granted the IETF Trust the right to allow | material may not have granted the IETF Trust the right to allow | |||
| modifications of such material outside the IETF Standards Process. | modifications of such material outside the IETF Standards Process. | |||
| Without obtaining an adequate license from the person(s) controlling | Without obtaining an adequate license from the person(s) controlling | |||
| the copyright in such materials, this document may not be modified | the copyright in such materials, this document may not be modified | |||
| outside the IETF Standards Process, and derivative works of it may | outside the IETF Standards Process, and derivative works of it may | |||
| not be created outside the IETF Standards Process, except to format | not be created outside the IETF Standards Process, except to format | |||
| it for publication as an RFC or to translate it into languages other | it for publication as an RFC or to translate it into languages other | |||
| than English. | than English. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction | |||
| 1.1. Document Roadmap . . . . . . . . . . . . . . . . . . . . 5 | 1.1. Document Roadmap | |||
| 1.2. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.2. Goals | |||
| 1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.3. Terminology | |||
| 1.4. Recap of Existing ECN feedback in IP/TCP . . . . . . . . 7 | 1.4. Recap of Existing ECN Feedback in IP/TCP | |||
| 2. AccECN Protocol Overview and Rationale . . . . . . . . . . . 9 | 2. AccECN Protocol Overview and Rationale | |||
| 2.1. Capability Negotiation . . . . . . . . . . . . . . . . . 10 | 2.1. Capability Negotiation | |||
| 2.2. Feedback Mechanism . . . . . . . . . . . . . . . . . . . 11 | 2.2. Feedback Mechanism | |||
| 2.3. Delayed ACKs and Resilience Against ACK Loss . . . . . . 11 | 2.3. Delayed ACKs and Resilience Against ACK Loss | |||
| 2.4. Feedback Metrics . . . . . . . . . . . . . . . . . . . . 12 | 2.4. Feedback Metrics | |||
| 2.5. Generic (Mechanistic) Reflector . . . . . . . . . . . . . 12 | 2.5. Generic (Mechanistic) Reflector | |||
| 3. AccECN Protocol Specification . . . . . . . . . . . . . . . . 13 | 3. AccECN Protocol Specification | |||
| 3.1. Negotiating to use AccECN . . . . . . . . . . . . . . . . 13 | 3.1. Negotiating to Use AccECN | |||
| 3.1.1. Negotiation during the TCP three-way handshake . . . 13 | 3.1.1. Negotiation During the TCP Three-Way Handshake | |||
| 3.1.2. Backward Compatibility . . . . . . . . . . . . . . . 15 | 3.1.2. Backward Compatibility | |||
| 3.1.3. Forward Compatibility . . . . . . . . . . . . . . . . 17 | 3.1.3. Forward Compatibility | |||
| 3.1.4. Multiple SYNs or SYN/ACKs . . . . . . . . . . . . . . 18 | 3.1.4. Multiple SYNs or SYN/ACKs | |||
| 3.1.4.1. Retransmitted SYNs . . . . . . . . . . . . . . . 18 | 3.1.4.1. Retransmitted SYNs | |||
| 3.1.4.2. Retransmitted SYN/ACKs . . . . . . . . . . . . . 19 | 3.1.4.2. Retransmitted SYN/ACKs | |||
| 3.1.5. Implications of AccECN Mode . . . . . . . . . . . . . 20 | 3.1.5. Implications of AccECN Mode | |||
| 3.2. AccECN Feedback . . . . . . . . . . . . . . . . . . . . . 24 | 3.2. AccECN Feedback | |||
| 3.2.1. Initialization of Feedback Counters . . . . . . . . . 25 | 3.2.1. Initialization of Feedback Counters | |||
| 3.2.2. The ACE Field . . . . . . . . . . . . . . . . . . . . 25 | 3.2.2. The ACE Field | |||
| 3.2.2.1. ACE Field on the ACK of the SYN/ACK . . . . . . . 26 | 3.2.2.1. ACE Field on the ACK of the SYN/ACK | |||
| 3.2.2.2. Encoding and Decoding Feedback in the ACE | 3.2.2.2. Encoding and Decoding Feedback in the ACE Field | |||
| Field . . . . . . . . . . . . . . . . . . . . . . . 29 | 3.2.2.3. Testing for Mangling of the IP/ECN Field | |||
| 3.2.2.3. Testing for Mangling of the IP/ECN Field . . . . 31 | 3.2.2.4. Testing for Zeroing of the ACE Field | |||
| 3.2.2.4. Testing for Zeroing of the ACE Field . . . . . . 33 | 3.2.2.5. Safety Against Ambiguity of the ACE Field | |||
| 3.2.2.5. Safety against Ambiguity of the ACE Field . . . . 34 | 3.2.3. The AccECN Option | |||
| 3.2.3. The AccECN Option . . . . . . . . . . . . . . . . . . 37 | ||||
| 3.2.3.1. Encoding and Decoding Feedback in the AccECN Option | 3.2.3.1. Encoding and Decoding Feedback in the AccECN Option | |||
| Fields . . . . . . . . . . . . . . . . . . . . . . 39 | Fields | |||
| 3.2.3.2. Path Traversal of the AccECN Option . . . . . . . 39 | 3.2.3.2. Path Traversal of the AccECN Option | |||
| 3.2.3.3. Usage of the AccECN TCP Option . . . . . . . . . 44 | 3.2.3.3. Usage of the AccECN TCP Option | |||
| 3.3. AccECN Compliance Requirements for TCP Proxies, Offload | 3.3. AccECN Compliance Requirements for TCP Proxies, Offload | |||
| Engines and other Middleboxes . . . . . . . . . . . . . . 46 | Engines, and Other Middleboxes | |||
| 3.3.1. Requirements for TCP Proxies . . . . . . . . . . . . 46 | 3.3.1. Requirements for TCP Proxies | |||
| 3.3.2. Requirements for Transparent Middleboxes and TCP | 3.3.2. Requirements for Transparent Middleboxes and TCP | |||
| Normalizers . . . . . . . . . . . . . . . . . . . . . 46 | Normalizers | |||
| 3.3.3. Requirements for TCP ACK Filtering . . . . . . . . . 47 | 3.3.3. Requirements for TCP ACK Filtering | |||
| 3.3.4. Requirements for TCP Segmentation Offload and Large | 3.3.4. Requirements for TCP Segmentation Offload and Large | |||
| Receive Offload . . . . . . . . . . . . . . . . . . . 48 | Receive Offload | |||
| 4. Updates to RFC 3168 . . . . . . . . . . . . . . . . . . . . . 49 | 4. Updates to RFC 3168 | |||
| 5. Interaction with TCP Variants . . . . . . . . . . . . . . . . 51 | 5. Interaction with TCP Variants | |||
| 5.1. Compatibility with SYN Cookies . . . . . . . . . . . . . 51 | 5.1. Compatibility with SYN Cookies | |||
| 5.2. Compatibility with TCP Experiments and Common TCP | 5.2. Compatibility with TCP Experiments and Common TCP Options | |||
| Options . . . . . . . . . . . . . . . . . . . . . . . . . 51 | 5.3. Compatibility with Feedback Integrity Mechanisms | |||
| 5.3. Compatibility with Feedback Integrity Mechanisms . . . . 52 | 6. Summary: Protocol Properties | |||
| 6. Summary: Protocol Properties . . . . . . . . . . . . . . . . 53 | 7. IANA Considerations | |||
| 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 55 | 8. Security and Privacy Considerations | |||
| 8. Security and Privacy Considerations . . . . . . . . . . . . . 57 | 9. References | |||
| 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 58 | 9.1. Normative References | |||
| 9.1. Normative References . . . . . . . . . . . . . . . . . . 58 | 9.2. Informative References | |||
| 9.2. Informative References . . . . . . . . . . . . . . . . . 59 | Appendix A. Example Algorithms | |||
| Appendix A. Example Algorithms . . . . . . . . . . . . . . . . . 62 | A.1. Example Algorithm to Encode/Decode the AccECN Option | |||
| A.1. Example Algorithm to Encode/Decode the AccECN Option . . 62 | ||||
| A.2. Example Algorithm for Safety Against Long Sequences of ACK | A.2. Example Algorithm for Safety Against Long Sequences of ACK | |||
| Loss . . . . . . . . . . . . . . . . . . . . . . . . . . 63 | Loss | |||
| A.2.1. Safety Algorithm without the AccECN Option . . . . . 64 | A.2.1. Safety Algorithm Without the AccECN Option | |||
| A.2.2. Safety Algorithm with the AccECN Option . . . . . . . 66 | A.2.2. Safety Algorithm with the AccECN Option | |||
| A.3. Example Algorithm to Estimate Marked Bytes from Marked | A.3. Example Algorithm to Estimate Marked Bytes from Marked | |||
| Packets . . . . . . . . . . . . . . . . . . . . . . . . . 68 | Packets | |||
| A.4. Example Algorithm to Count Not-ECT Bytes . . . . . . . . 68 | A.4. Example Algorithm to Count Not-ECT Bytes | |||
| Appendix B. Rationale for Usage of TCP Header Flags . . . . . . 69 | Appendix B. Rationale for Usage of TCP Header Flags | |||
| B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake . . . 69 | B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake | |||
| B.2. Four Codepoints in the SYN/ACK . . . . . . . . . . . . . 70 | B.2. Four Codepoints in the SYN/ACK | |||
| B.3. Space for Future Evolution . . . . . . . . . . . . . . . 70 | B.3. Space for Future Evolution | |||
| Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 72 | Acknowledgements | |||
| Comments Solicited . . . . . . . . . . . . . . . . . . . . . . . 72 | Authors' Addresses | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 73 | ||||
| 1. Introduction | 1. Introduction | |||
| Explicit Congestion Notification (ECN) [RFC3168] is a mechanism where | Explicit Congestion Notification (ECN) [RFC3168] is a mechanism by | |||
| network nodes can mark IP packets instead of dropping them to | which network nodes can mark IP packets instead of dropping them to | |||
| indicate incipient congestion to the endpoints. Receivers with an | indicate incipient congestion to the endpoints. Receivers with an | |||
| ECN-capable transport protocol feed back this information to the | ECN-capable transport protocol feed back this information to the | |||
| sender. In RFC 3168, ECN was specified for TCP in such a way that | sender. In RFC 3168, ECN was specified for TCP in such a way that | |||
| only one feedback signal could be transmitted per Round-Trip Time | only one feedback signal could be transmitted per Round-Trip Time | |||
| (RTT). This is sufficient for congestion control scheme like Reno | (RTT). This is sufficient for congestion control schemes like Reno | |||
| [RFC6582] and Cubic [RFC9438], as those schemes reduce their | [RFC6582] and CUBIC [RFC9438], as those schemes reduce their | |||
| congestion window by a fixed factor if congestion occurs within an | congestion window by a fixed factor if congestion occurs within an | |||
| RTT independent of the number of received congestion markings. | RTT independent of the number of received congestion markings. | |||
| Recently, proposed mechanisms like Congestion Exposure (ConEx | Recently, proposed mechanisms like Congestion Exposure (ConEx | |||
| [RFC7713]), DCTCP [RFC8257] or L4S [RFC9330] need to know when more | [RFC7713]), DCTCP [RFC8257], and L4S [RFC9330] need to know when more | |||
| than one marking is received in one RTT, which is information that | than one marking is received in one RTT, which is information that | |||
| cannot be provided by the feedback scheme as specified in [RFC3168]. | cannot be provided by the feedback scheme as specified in [RFC3168]. | |||
| This document specifies an update to the ECN feedback scheme of RFC | This document specifies an update to the ECN feedback scheme of RFC | |||
| 3168 that provides more accurate information and could be used by | 3168 that provides more accurate information and could be used by | |||
| these and potentially other future TCP extensions, while still also | these and potentially other future TCP extensions, while still also | |||
| supporting the pre-existing TCP congestion controllers that use just | supporting the pre-existing TCP congestion controllers that use just | |||
| one feedback signal per round. Congestion control is the term the | one feedback signal per round. Congestion control is the term the | |||
| IETF uses to describe data rate management. It is the algorithm that | IETF uses to describe data rate management. It is the algorithm that | |||
| a sender uses to optimize its sending rate so that it transmits data | a sender uses to optimize its sending rate so that it transmits data | |||
| as fast as the network can carry it, but no faster. A fuller | as fast as the network can carry it, but no faster. A fuller | |||
| treatment of the motivation for this specification is given in the | description of the motivation for this specification is given in the | |||
| associated requirements document [RFC7560]. | associated requirements document [RFC7560]. | |||
| This document specifies a standards track scheme for ECN feedback in | This document specifies a Standards Track scheme for ECN feedback in | |||
| the TCP header to provide more than one feedback signal per RTT. It | the TCP header to provide more than one feedback signal per RTT. It | |||
| will be called the more Accurate ECN feedback scheme, or AccECN for | is called the more "Accurate ECN" feedback scheme, or AccECN for | |||
| short. This document updates RFC 3168 with respect to negotiation | short. This document updates RFC 3168 with respect to negotiation | |||
| and use of the feedback scheme for TCP. All aspects of RFC 3168 | and use of the feedback scheme for TCP. All aspects of RFC 3168 | |||
| other than the TCP feedback scheme and its negotiation remain | other than the TCP feedback scheme and its negotiation remain | |||
| unchanged by this specification. In particular the definition of ECN | unchanged by this specification. In particular, the definition of | |||
| at the IP layer is unaffected. Section 4 gives a more detailed | ECN at the IP layer is unaffected. Section 4 details the aspects of | |||
| specification of exactly which aspects of RFC 3168 this document | RFC 3168 that are updated by this document. | |||
| updates. | ||||
| This document uses the term Classic ECN feedback when it needs to | This document uses the term "Classic ECN feedback" when it needs to | |||
| distinguish the TCP/ECN feedback scheme defined in [RFC3168] from the | distinguish the TCP/ECN feedback scheme defined in [RFC3168] from the | |||
| AccECN TCP feedback scheme. AccECN is intended to offer a complete | AccECN TCP feedback scheme. AccECN is intended to offer a complete | |||
| replacement for Classic TCP/ECN feedback, not a fork in the design of | replacement for Classic TCP/ECN feedback, not a fork in the design of | |||
| TCP. AccECN feedback complements TCP's loss feedback and it can | TCP. AccECN feedback complements TCP's loss feedback and it can | |||
| coexist alongside hosts using Classic TCP/ECN feedback. So its | coexist alongside hosts using Classic TCP/ECN feedback. So its | |||
| applicability is intended to include the public Internet as well as | applicability is intended to include the public Internet as well as | |||
| private IP network such as data centres (and even any non-IP networks | private IP networks such as data centres (and even any non-IP | |||
| over which TCP is used), whether or not any nodes on the path support | networks over which TCP is used), whether or not any nodes on the | |||
| ECN, of whatever flavour. | path support ECN, of whatever flavour. | |||
| AccECN feedback overloads the two existing ECN flags in the TCP | AccECN feedback overloads the two existing ECN flags in the TCP | |||
| header and allocates the currently reserved flag (previously called | header and allocates the currently reserved flag (previously called | |||
| NS) in the TCP header, to be used as one three-bit counter field for | NS) in the TCP header to be used as one 3-bit counter field for | |||
| feeding back the number of packets marked as congestion experienced | feeding back the number of packets marked as congestion experienced | |||
| (CE). Given the new definitions of these three bits, both ends have | (CE). Given the new definitions of these three bits, both ends have | |||
| to support the new wire protocol before it can be used. Therefore, | to support the new wire protocol before it can be used. Therefore, | |||
| during the TCP handshake, the two ends use these three bits in the | during the TCP handshake, the two ends use these three bits in the | |||
| TCP header to negotiate the most advanced feedback protocol that they | TCP header to negotiate the most advanced feedback protocol that they | |||
| can both support, in a way that is backward compatible with | can both support, in a way that is backward compatible with | |||
| [RFC3168]. | [RFC3168]. | |||
| AccECN is solely a change to the TCP wire protocol; it covers the | AccECN is solely a change to the TCP wire protocol; it covers the | |||
| negotiation and signaling of more Accurate ECN feedback from a TCP | negotiation and signaling of more Accurate ECN feedback from a TCP | |||
| Data Receiver to a Data Sender. It is completely independent of how | Data Receiver to a Data Sender. It is completely independent of how | |||
| TCP might respond to congestion feedback, which is out of scope, but | TCP might respond to congestion feedback, which is out of scope, but | |||
| ultimately the motivation for Accurate ECN feedback. Like Classic | ultimately the motivation for Accurate ECN feedback. Like Classic | |||
| ECN feedback, AccECN can be used by standard Reno or CUBIC congestion | ECN feedback, AccECN can be used by standard Reno or CUBIC congestion | |||
| control [RFC5681] [RFC9438] to respond to the existence of at least | control [RFC5681] [RFC9438] to respond to the existence of at least | |||
| one congestion notification within a round trip. Or, unlike Reno or | one congestion notification within a round trip. Or, unlike Reno or | |||
| CUBIC, AccECN can be used to respond to the extent of congestion | CUBIC, AccECN can be used to respond to the extent of congestion | |||
| notification over a round trip, as for example DCTCP does in | notification over a round trip, as for example DCTCP does in | |||
| controlled environments [RFC8257]. For congestion response, this | controlled environments [RFC8257]. For congestion response, this | |||
| specification refers to the original ECN specificiation adopted in | specification refers to the original ECN specification adopted in | |||
| 2001 [RFC3168], as updated by the more relaxed rules introduced in | 2001 [RFC3168], as updated by the more relaxed rules introduced in | |||
| 2018 to allow ECN experiments [RFC8311], namely: a TCP-based Low | 2018 to allow ECN experiments [RFC8311], namely: a TCP-based Low | |||
| Latency Low Loss Scalable (L4S) congestion control [RFC9330]; or | Latency Low Loss Scalable (L4S) congestion control [RFC9330]; or | |||
| Alternative Backoff with ECN (ABE) [RFC8511]. | Alternative Backoff with ECN (ABE) [RFC8511]. | |||
| Section 5.2 explains how AccECN is compatible with current commonly | Section 5.2 explains how AccECN is compatible with current commonly | |||
| used TCP options, and a number of current experimental modifications | used TCP options, and a number of current experimental modifications | |||
| to TCP, as well as SYN cookies. | to TCP, as well as SYN cookies. | |||
| 1.1. Document Roadmap | 1.1. Document Roadmap | |||
| The following introductory section outlines the goals of AccECN | The following introductory section outlines the goals of AccECN | |||
| (Section 1.2). Then, terminology is defined (Section 1.3) and a | (Section 1.2). Then, terminology is defined (Section 1.3) and a | |||
| recap of existing prerequisite technology is given (Section 1.4). | recap of existing prerequisite technology is given (Section 1.4). | |||
| Section 2 gives an informative overview of the AccECN protocol. Then | Section 2 gives an informative overview of the AccECN protocol. Then | |||
| Section 3 gives the normative protocol specification, and Section 3.3 | Section 3 gives the normative protocol specification, and Section 3.3 | |||
| collects together requirements for proxies, offload engines and other | collects requirements for proxies, offload engines, and other | |||
| middleboxes. Section 4 clarifies which aspects of RFC 3168 are | middleboxes. Section 4 clarifies which aspects of RFC 3168 are | |||
| updated by AccECN. Section 5 assesses the interaction of AccECN with | updated by AccECN. Section 5 assesses the interaction of AccECN with | |||
| commonly used variants of TCP, whether standardized or not. Then | commonly used variants of TCP, whether they are standardized or not. | |||
| Section 6 summarizes the features and properties of AccECN. | Then Section 6 summarizes the features and properties of AccECN. | |||
| Section 7 summarizes the protocol fields and numbers that IANA will | Section 7 summarizes the protocol fields and numbers that IANA | |||
| need to assign and Section 8 points to the aspects of the protocol | assigned, and Section 8 points to the aspects of the protocol that | |||
| that will be of interest to the security community. | will be of interest to the security community. | |||
| Appendix A gives pseudocode examples for the various algorithms that | Appendix A gives pseudocode examples for the various algorithms that | |||
| AccECN uses and Appendix B explains why AccECN uses flags in the main | AccECN uses, and Appendix B explains why AccECN uses flags in the | |||
| TCP header and quantifies the space left for future use. | main TCP header and quantifies the space left for future use. | |||
| 1.2. Goals | 1.2. Goals | |||
| [RFC7560] enumerates requirements that a candidate feedback scheme | [RFC7560] enumerates requirements that a candidate feedback scheme | |||
| will need to satisfy, under the headings: resilience, timeliness, | needs to satisfy, under the headings: resilience, timeliness, | |||
| integrity, accuracy (including ordering and lack of bias), | integrity, accuracy (including ordering and lack of bias), | |||
| complexity, overhead and compatibility (both backward and forward). | complexity, overhead, and compatibility (both backward and forward). | |||
| It recognizes that a perfect scheme that fully satisfies all the | It recognizes that a perfect scheme that fully satisfies all the | |||
| requirements is unlikely and trade-offs between requirements are | requirements is unlikely and trade-offs between requirements are | |||
| likely. Section 6 presents the properties of AccECN against these | likely. Section 6 considers the properties of AccECN against these | |||
| requirements and discusses the trade-offs made. | requirements and discusses the trade-offs. | |||
| The requirements document recognizes that a protocol as ubiquitous as | The requirements document recognizes that a protocol as ubiquitous as | |||
| TCP needs to be able to serve as-yet-unspecified requirements. | TCP needs to be able to serve as-yet-unspecified requirements. | |||
| Therefore an AccECN receiver acts as a generic (mechanistic) | Therefore, an AccECN receiver acts as a generic (mechanistic) | |||
| reflector of congestion information with the aim that in future new | reflector of congestion information with the aim that new sender | |||
| sender behaviours can be deployed unilaterally (see Section 2.5). | behaviours can be deployed unilaterally (see Section 2.5) in the | |||
| future. | ||||
| 1.3. Terminology | 1.3. Terminology | |||
| AccECN: The more Accurate ECN feedback scheme will be called AccECN | AccECN: The more Accurate ECN feedback scheme is called AccECN for | |||
| for short. | short. | |||
| Classic ECN: The ECN protocol specified in [RFC3168]. | Classic ECN: The ECN protocol specified in [RFC3168]. | |||
| Classic ECN feedback: The feedback aspect of the ECN protocol | Classic ECN feedback: The feedback aspect of the ECN protocol | |||
| specified in [RFC3168], including generation, encoding, | specified in [RFC3168], including generation, encoding, | |||
| transmission and decoding of feedback, but not the Data Sender's | transmission and decoding of feedback, but not the Data Sender's | |||
| subsequent response to that feedback. | subsequent response to that feedback. | |||
| ACK: A TCP acknowledgement, with or without a data payload (ACK=1). | ACK: A TCP acknowledgement, with or without a data payload (ACK=1). | |||
| skipping to change at page 7, line 30 | skipping to change at line 308 | |||
| data and sends AccECN feedback. | data and sends AccECN feedback. | |||
| Data Sender: The endpoint of a TCP half-connection that sends data | Data Sender: The endpoint of a TCP half-connection that sends data | |||
| and receives AccECN feedback. | and receives AccECN feedback. | |||
| In a mild abuse of terminology, this document sometimes refers to | In a mild abuse of terminology, this document sometimes refers to | |||
| 'TCP packets' instead of 'TCP segments'. | 'TCP packets' instead of 'TCP segments'. | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| "OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in | |||
| 14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
| capitals, as shown here. | capitals, as shown here. | |||
| 1.4. Recap of Existing ECN feedback in IP/TCP | 1.4. Recap of Existing ECN Feedback in IP/TCP | |||
| Explicit Congestion Notification (ECN) [RFC3168] can be split into | Explicit Congestion Notification (ECN) [RFC3168] can be split into | |||
| two parts conceptionally. In the forward direction, alongside the | two parts conceptionally. In the forward direction, alongside the | |||
| data stream, it uses a two-bit field in the IP header. This is | data stream, it uses a 2-bit field in the IP header. This is | |||
| referred to as IP-ECN later on. This signal carried in the IP (Layer | referred to as IP-ECN later on. This signal carried in the IP (Layer | |||
| 3) header is exposed to network devices and may be modified when such | 3) header is exposed to network devices and may be modified when such | |||
| a device starts to experience congestion (see Table 1). The second | a device starts to experience congestion (see Table 1). The second | |||
| part is the feedback mechanism, by which the original data sender is | part is the feedback mechanism, by which the original data sender is | |||
| notified of the current congestion state of the intermediate path. | notified of the current congestion state of the intermediate path. | |||
| That returned signal is carried in a protocol specific manner, and is | That returned signal is carried in a protocol-specific manner, and is | |||
| not to be modified by intermediate network devices. While ECN is in | not to be modified by intermediate network devices. While ECN is in | |||
| active use for protocols such as QUIC [RFC9000], SCTP [RFC9260], RTP | active use for protocols such as QUIC [RFC9000], SCTP [RFC9260], RTP | |||
| [RFC6679] and Remote Direct Memory Access over Converged Ethernet | [RFC6679], and Remote Direct Memory Access over Converged Ethernet | |||
| [RoCEv2], this document only concerns itself with the specific | [RoCEv2], this document only concerns itself with the specific | |||
| implementation for the TCP protocol. | implementation for the TCP protocol. | |||
| Once ECN has been negotiated for a transport layer connection, the | Once ECN has been negotiated for a transport layer connection, the | |||
| Data Sender for either half-connection can set two possible | Data Sender for either half-connection can set two possible | |||
| codepoints (ECT(0) or ECT(1)) in the IP header of a data packet to | codepoints (ECT(0) or ECT(1)) in the IP header of a data packet to | |||
| indicate an ECN-capable transport (ECT). If the ECN codepoint is | indicate an ECN-capable transport (ECT). If the ECN codepoint is | |||
| 0b00, the packet is considered to have been sent by a Not ECN-capable | 0b00, the packet is considered to have been sent by a Not ECN-capable | |||
| Transport (Not-ECT). When a network node experiences congestion, it | Transport (Not-ECT). When a network node experiences congestion, it | |||
| will occasionally either drop or mark a packet, with the choice | will occasionally either drop or mark a packet, with the choice | |||
| skipping to change at page 8, line 32 ¶ | skipping to change at line 356 ¶ | |||
| +------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| | 0b01 | ECT(1) | ECN-Capable Transport (1) | | | 0b01 | ECT(1) | ECN-Capable Transport (1) | | |||
| +------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| | 0b10 | ECT(0) | ECN-Capable Transport (0) | | | 0b10 | ECT(0) | ECN-Capable Transport (0) | | |||
| +------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| | 0b11 | CE | Congestion Experienced | | | 0b11 | CE | Congestion Experienced | | |||
| +------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| Table 1: The ECN Field in the IP Header | Table 1: The ECN Field in the IP Header | |||
| In the TCP header the first two bits in byte 14 (the TCP header flags | In the TCP header, the first two bits in byte 14 (the TCP header | |||
| at bit offsets 8 and 9 labelled Congestion Window Reduced (CWR) and | flags at bit offsets 8 and 9 labelled Congestion Window Reduced (CWR) | |||
| Explicit Congestion notification Echo (ECE) in Figure 1) are defined | and Explicit Congestion notification Echo (ECE) in Figure 1) are | |||
| as flags for the use of Classic ECN [RFC3168]. A TCP Client | defined as flags for the use of Classic ECN [RFC3168]. A TCP Client | |||
| indicates that it supports Classic ECN feedback by setting (CWR,ECE) | indicates that it supports Classic ECN feedback by setting (CWR,ECE) | |||
| = (1,1) in the SYN, and an ECN-enabled TCP Server confirms Classic | = (1,1) in the SYN, and an ECN-enabled TCP Server confirms Classic | |||
| ECN support by setting (CWR,ECE) = (0,1) in the SYN/ACK. On | ECN support by setting (CWR,ECE) = (0,1) in the SYN/ACK. On | |||
| reception of a CE-marked packet at the IP layer, the Data Receiver | reception of a CE-marked packet at the IP layer, the Data Receiver | |||
| for that half-connection starts to set the Echo Congestion | for that half-connection starts to set the Echo Congestion | |||
| Experienced (ECE) flag continuously in the TCP header of ACKs, which | Experienced (ECE) flag continuously in the TCP header of ACKs, which | |||
| gives the signal resilience to loss or reordering of ACKs. The Data | gives the signal resilience to loss or reordering of ACKs. The Data | |||
| Sender for the same half-connection confirms that it has received at | Sender for the same half-connection confirms that it has received at | |||
| least one ECE signal by responding with the congestion window reduced | least one ECE signal by responding with the CWR flag, which allows | |||
| (CWR) flag, which allows the Data Receiver to stop repeating the ECN- | the Data Receiver to stop repeating the ECN-Echo flag. This always | |||
| Echo flag. This always leads to a full RTT of ACKs with ECE set. | leads to a full RTT of ACKs with ECE set. Thus Classic ECN cannot | |||
| Thus Classic ECN cannot feed back any additional CE markings arriving | feed back any additional CE markings arriving within this RTT. | |||
| within this RTT. | ||||
| The last bit in byte 13 of the TCP header (the TCP header flag at bit | The last bit in byte 13 of the TCP header (the TCP header flag at bit | |||
| offset 7 in Figure 1) was defined as the Nonce Sum (NS) for the ECN | offset 7 in Figure 1) was defined as the Nonce Sum (NS) for the ECN- | |||
| Nonce [RFC3540]. In the absence of widespread deployment RFC 3540 | nonce [RFC3540]. In the absence of widespread deployment, RFC 3540 | |||
| has been reclassified as historic [RFC8311] and the respective flag | was reclassified as Historic [RFC8311] and the respective flag was | |||
| has been marked as "reserved", making this TCP flag available for use | marked as "Reserved", which made this TCP flag available for use by | |||
| by AccECN instead. | AccECN instead. | |||
| 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |||
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| | | | N | C | E | U | A | P | R | S | F | | | | | N | C | E | U | A | P | R | S | F | | |||
| | Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | | Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | |||
| | | | | R | E | G | K | H | T | N | N | | | | | | R | E | G | K | H | T | N | N | | |||
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| Figure 1: TCP header flags as defined before the Nonce Sum flag | Figure 1: TCP Header Flags as Defined Before the Nonce Sum Flag | |||
| reverted to Reserved | Reverted to Reserved | |||
| 2. AccECN Protocol Overview and Rationale | 2. AccECN Protocol Overview and Rationale | |||
| This section provides an informative overview of the AccECN protocol | This section provides an informative overview of the AccECN protocol | |||
| that will be normatively specified in Section 3 | that is normatively specified in Section 3. | |||
| Like the general TCP approach, the Data Receiver of each TCP half- | Like the general TCP approach, the Data Receiver of each TCP half- | |||
| connection sends AccECN feedback to the Data Sender on TCP | connection sends AccECN feedback to the Data Sender on TCP | |||
| acknowledgements, reusing data packets of the other half-connection | acknowledgements, reusing data packets of the other half-connection | |||
| whenever possible. | whenever possible. | |||
| The AccECN protocol has had to be designed in two parts: | The AccECN protocol has had to be designed in two parts: | |||
| * an essential feedback part that re-uses the TCP-ECN header bits | * an essential feedback part that reuses the TCP-ECN header bits for | |||
| for the Data Receiver to feed back the number of packets arriving | the Data Receiver to feed back the number of packets arriving with | |||
| with CE in the IP-ECN field. This provides more accuracy than | CE in the IP-ECN field. This provides more accuracy than Classic | |||
| Classic ECN feedback, but limited resilience against ACK loss; | ECN feedback, but limited resilience against ACK loss; | |||
| * a supplementary feedback part using one of two new alternative | * a supplementary feedback part using one of two new alternative | |||
| AccECN TCP options that provide additional feedback on the number | AccECN TCP options that provide additional feedback on the number | |||
| of payload bytes that arrive marked with each of the three ECN | of payload bytes that arrive marked with each of the three ECN | |||
| codepoints in the IP-ECN field (not just CE marks). See the BCP | codepoints in the IP-ECN field (not just CE marks). See the BCP | |||
| on Byte and Packet Congestion Notification [RFC7141] for the | on Byte and Packet Congestion Notification [RFC7141] for the | |||
| rationale determining that conveying congested payload bytes | rationale determining that conveying congested payload bytes | |||
| should be preferred over just providing feedback about congested | should be preferred over just providing feedback about congested | |||
| packets. This also provides greater resilience against ACK loss | packets. This also provides greater resilience against ACK loss | |||
| than the essential feedback, but it is currently more likely to | than the essential feedback, but it is currently more likely to | |||
| suffer from middlebox interference. | suffer from middlebox interference. | |||
| The two part design was necessary, given limitations on the space | The two part design was necessary, given limitations on the space | |||
| available for TCP options and given the possibility that certain | available for TCP options and given the possibility that certain | |||
| incorrectly designed middleboxes might prevent TCP using any new | incorrectly designed middleboxes might prevent TCP from using any new | |||
| options. | options. | |||
| The essential feedback part overloads the previous definition of the | The essential feedback part overloads the previous definition of the | |||
| three flags in the TCP header that had been assigned for use by | three flags in the TCP header that had been assigned for use by | |||
| Classic ECN. This design choice deliberately allows AccECN peers to | Classic ECN. This design choice deliberately allows AccECN peers to | |||
| replace the Classic ECN feedback protocol, rather than leaving | replace the Classic ECN feedback protocol, rather than leaving | |||
| Classic ECN feedback intact and adding more accurate feedback | Classic ECN feedback intact and adding more accurate feedback | |||
| separately because: | separately because: | |||
| * this efficiently reuses scarce TCP header space, given TCP option | * this efficiently reuses scarce TCP header space, given TCP option | |||
| space is approaching saturation; | space is approaching saturation; | |||
| * a single upgrade path for the TCP protocol is preferable to a fork | * a single upgrade path for the TCP protocol is preferable to a fork | |||
| in the design which modifies the TCP header to convey all ECN | in the design that modifies the TCP header to convey all ECN | |||
| feedback; | feedback; | |||
| * otherwise Classic and Accurate ECN feedback could give conflicting | * otherwise, Classic and Accurate ECN feedback could give | |||
| feedback about the same segment, which could open up new security | conflicting feedback about the same segment, which could open up | |||
| concerns and make implementations unnecessarily complex; | new security concerns and make implementations unnecessarily | |||
| | complex; | |||
| * middleboxes are more likely to faithfully forward the TCP ECN | * middleboxes are more likely to faithfully forward the TCP ECN | |||
| flags than newly defined areas of the TCP header. | flags than newly defined areas of the TCP header. | |||
| AccECN is designed to work even if the supplementary feedback part is | AccECN is designed to work even if the supplementary feedback part is | |||
| removed or zeroed out, as long as the essential feedback part gets | removed or zeroed out, as long as the essential feedback part gets | |||
| through. | through. | |||
| 2.1. Capability Negotiation | 2.1. Capability Negotiation | |||
| AccECN is a change to the wire protocol of the main TCP header, | AccECN changes the wire protocol of the main TCP header; therefore, | |||
| therefore it can only be used if both endpoints have been upgraded to | it can only be used if both endpoints have been upgraded to | |||
| understand it. The TCP Client signals support for AccECN on the | understand it. The TCP Client signals support for AccECN on the | |||
| initial SYN of a connection and the TCP Server signals whether it | initial SYN of a connection, and the TCP Server signals whether it | |||
| supports AccECN on the SYN/ACK. The TCP flags on the SYN that the | supports AccECN on the SYN/ACK. The TCP flags on the SYN that the | |||
| TCP Client uses to signal AccECN support have been carefully chosen | TCP Client uses to signal AccECN support have been carefully chosen | |||
| so that a TCP Server will interpret them as a request to support the | so that a TCP Server will interpret them as a request to support the | |||
| most recent variant of ECN feedback that it supports. Then the TCP | most recent variant of ECN feedback that it supports. Then the TCP | |||
| Client falls back to the same variant of ECN feedback. | Client falls back to the same variant of ECN feedback. | |||
| An AccECN TCP Client does not send an AccECN Option on the SYN as SYN | An AccECN TCP Client does not send an AccECN Option on the SYN as SYN | |||
| option space is limited. The TCP Server sends an AccECN Option on | option space is limited. The TCP Server sends an AccECN Option on | |||
| the SYN/ACK and the TCP Client sends one on the first ACK to test | the SYN/ACK, and the TCP Client sends one on the first ACK to test | |||
| whether the network path forwards these options correctly. | whether the network path forwards these options correctly. | |||
| 2.2. Feedback Mechanism | 2.2. Feedback Mechanism | |||
| A Data Receiver maintains four counters initialized at the start of | A Data Receiver maintains four counters initialized at the start of | |||
| the half-connection. Three count the number of arriving payload | the half-connection. Three count the number of arriving payload | |||
| bytes marked CE, ECT(1) and ECT(0) in the IP-ECN field. These byte | bytes marked CE, ECT(1), and ECT(0) in the IP-ECN field. These byte | |||
| counters reflect only the TCP payload length, excluding the TCP | counters reflect only the TCP payload length, excluding the TCP | |||
| header and TCP options. The fourth counter counts the number of | header and TCP options. The fourth counter counts the number of | |||
| packets arriving marked with a CE codepoint (including control | packets arriving marked with a CE codepoint (including control | |||
| packets without payload if they are CE-marked). | packets without payload if they are CE-marked). | |||
| The Data Sender maintains four equivalent counters for the half | The Data Sender maintains four equivalent counters for the half | |||
| connection, and the AccECN protocol is designed to ensure they will | connection, and the AccECN protocol is designed to ensure they will | |||
| match the values in the Data Receiver's counters, albeit after a | match the values in the Data Receiver's counters, albeit after a | |||
| little delay. | little delay. | |||
| Each ACK carries the three least significant bits (LSBs) of the | Each ACK carries the three least significant bits (LSBs) of the | |||
| packet-based CE counter using the ECN bits in the TCP header, now | packet-based CE counter using the ECN bits in the TCP header, now | |||
| renamed the Accurate ECN (ACE) field (see Figure 3 later). The 24 | renamed the Accurate ECN (ACE) field (see Figure 3). The 24 LSBs of | |||
| LSBs of some or all of the byte counters can be optionally carried in | some or all of the byte counters can be optionally carried in an | |||
| an AccECN Option. For efficient use of limited option space, two | AccECN Option. For efficient use of limited option space, two | |||
| alternative forms of AccECN Option are specified with the fields in | alternative forms of the AccECN Option are specified with the fields | |||
| the opposite order to each other. | in the opposite order to each other. | |||
| 2.3. Delayed ACKs and Resilience Against ACK Loss | 2.3. Delayed ACKs and Resilience Against ACK Loss | |||
| With both the ACE and the AccECN Option mechanisms, the Data Receiver | With both the ACE and the AccECN Option mechanisms, the Data Receiver | |||
| continually repeats the current LSBs of each of its respective | continually repeats the current LSBs of each of its respective | |||
| counters. There is no need to acknowledge these continually repeated | counters. There is no need to acknowledge these continually repeated | |||
| counters, so the congestion window reduced (CWR) mechanism of | counters, so the Congestion Window Reduced (CWR) mechanism of | |||
| [RFC3168] is no longer used. Even if some ACKs are lost, the Data | [RFC3168] is no longer used. Even if some ACKs are lost, the Data | |||
| Sender ought to be able to infer how much to increment its own | Sender ought to be able to infer how much to increment its own | |||
| counters, even if the protocol field has wrapped. | counters, even if the protocol field has wrapped. | |||
| The 3-bit ACE field can wrap fairly frequently. Therefore, even if | The 3-bit ACE field can wrap fairly frequently. Therefore, even if | |||
| it appears to have incremented by one (say), the field might have | it appears to have incremented by one (say), the field might have | |||
| actually cycled completely then incremented by one. The Data | actually cycled completely and then incremented by one. The Data | |||
| Receiver is not allowed to delay sending an ACK to such an extent | Receiver is not allowed to delay sending an ACK to such an extent | |||
| that the ACE field would cycle. However ACKs received at the Data | that the ACE field would cycle. However, ACKs received at the Data | |||
| Sender could still cycle because a whole sequence of ACKs carrying | Sender could still cycle because a whole sequence of ACKs carrying | |||
| intervening values of the field might all be lost or delayed in | intervening values of the field might all be lost or delayed in | |||
| transit. | transit. | |||
| The fields in an AccECN Option are larger, but they will increment in | The fields in an AccECN Option are larger, but they will increment in | |||
| larger steps because they count bytes not packets. Nonetheless, | larger steps because they count bytes not packets. Nonetheless, | |||
| their size has been chosen such that a whole cycle of the field would | their size has been chosen such that a whole cycle of the field would | |||
| never occur between ACKs unless there had been an infeasibly long | never occur between ACKs unless there has been an infeasibly long | |||
| sequence of ACK losses. Therefore, provided that an AccECN Option is | sequence of ACK losses. Therefore, provided that an AccECN Option is | |||
| available, it can be treated as a dependable feedback channel. | available, it can be treated as a dependable feedback channel. | |||
| If an AccECN Option is not available, e.g., it is being stripped by a | If an AccECN Option is not available, e.g., it is being stripped by a | |||
| middlebox, the AccECN protocol will only feed back information on CE | middlebox, the AccECN protocol will only feed back information on CE | |||
| markings (using the ACE field). Although not ideal, this will be | markings (using the ACE field). Although not ideal, this will be | |||
| sufficient, because it is envisaged that neither ECT(0) nor ECT(1) | sufficient, because it is envisaged that neither ECT(0) nor ECT(1) | |||
| will ever indicate more severe congestion than CE, even though future | will ever indicate more severe congestion than CE, even though future | |||
| uses for ECT(0) or ECT(1) are still unclear [RFC8311]. Because the | uses for ECT(0) or ECT(1) are still unclear [RFC8311]. Because the | |||
| 3-bit ACE field is so small, when it is the only field available, the | 3-bit ACE field is so small, when it is the only field available, the | |||
| skipping to change at page 12, line 26 ¶ | skipping to change at line 536 ¶ | |||
| AccECN Option on an ACK. The rules are designed to ensure that the | AccECN Option on an ACK. The rules are designed to ensure that the | |||
| order in which different markings arrive at the receiver is | order in which different markings arrive at the receiver is | |||
| communicated to the sender (as long as options are reaching the | communicated to the sender (as long as options are reaching the | |||
| sender and as long as there is no ACK loss). Implementations are | sender and as long as there is no ACK loss). Implementations are | |||
| encouraged to send an AccECN Option more frequently, but this is left | encouraged to send an AccECN Option more frequently, but this is left | |||
| up to the implementer. | up to the implementer. | |||
| 2.4. Feedback Metrics | 2.4. Feedback Metrics | |||
| The CE packet counter in the ACE field and the CE byte counter in | The CE packet counter in the ACE field and the CE byte counter in | |||
| AccECN Options both provide feedback on received CE-marks. The CE | AccECN Options both provide feedback on received CE marks. The CE | |||
| packet counter includes control packets that do not have payload | packet counter includes control packets that do not have payload | |||
| data, while the CE byte counter solely includes marked payload bytes. | data, while the CE byte counter solely includes marked payload bytes. | |||
| If both are present, the byte counter in an AccECN Option will | If both are present, the byte counter in an AccECN Option will | |||
| provide the more accurate information needed for modern congestion | provide the more accurate information needed for modern congestion | |||
| control and policing schemes, such as L4S, DCTCP or ConEx. If AccECN | control and policing schemes, such as L4S, DCTCP, or ConEx. If | |||
| Options are stripped, a simple algorithm to estimate the number of | AccECN Options are stripped, a simple algorithm to estimate the | |||
| marked bytes from the ACE field is given in Appendix A.3. | number of marked bytes from the ACE field is given in Appendix A.3. | |||
| The AccECN design has been generalized so that it ought to be able to | The AccECN design has been generalized so that it ought to be able to | |||
| support possible future uses of the experimental ECT(1) codepoint | support possible future uses of the experimental ECT(1) codepoint | |||
| other than the L4S experiment [RFC9330], such as a lower severity or | other than the L4S experiment [RFC9330], such as a lower severity or | |||
| a more instant congestion signal than CE. | a more instant congestion signal than CE. | |||
| Feedback in bytes is provided to protect against the receiver or a | Feedback in bytes is provided to protect against the receiver or a | |||
| middlebox using attacks similar to 'ACK-Division' to artificially | middlebox using attacks similar to 'ACK-Division' to artificially | |||
| inflate the congestion window, which is why [RFC5681] now recommends | inflate the congestion window, which is why [RFC5681] now recommends | |||
| that TCP counts acknowledged bytes not packets. | that TCP counts acknowledged bytes not packets. | |||
| 2.5. Generic (Mechanistic) Reflector | 2.5. Generic (Mechanistic) Reflector | |||
| The ACE field provides feedback about CE markings in the IP-ECN field | The ACE field provides feedback about CE markings in the IP-ECN field | |||
| of both data and control packets. According to [RFC3168] the Data | of both data and control packets. According to [RFC3168], the Data | |||
| Sender is meant to set the IP-ECN field of control packets to Not- | Sender is meant to set the IP-ECN field of control packets to Not- | |||
| ECT. However, mechanisms in certain private networks (e.g., data | ECT. However, mechanisms in certain private networks (e.g., data | |||
| centres) set control packets to be ECN capable because they are | centres) set control packets to be ECN-capable because they are | |||
| precisely the packets that performance depends on most. | precisely the packets that performance depends on most. | |||
| For this reason, AccECN is designed to be a generic reflector of | For this reason, AccECN is designed to be a generic reflector of | |||
| whatever ECN markings it sees, whether or not they are compliant with | whatever ECN markings it sees, whether or not they are compliant with | |||
| a current standard. Then as standards evolve, Data Senders can | a current standard. Then as standards evolve, Data Senders can | |||
| upgrade unilaterally without any need for receivers to upgrade too. | upgrade unilaterally without any need for receivers to upgrade too. | |||
| It is also useful to be able to rely on generic reflection behaviour | It is also useful to be able to rely on generic reflection behaviour | |||
| when senders need to test for unexpected interference with markings | when senders need to test for unexpected interference with markings | |||
| (for instance Section 3.2.2.3, Section 3.2.2.4 and Section 3.2.3.2 of | (for instance Sections 3.2.2.3, 3.2.2.4, and 3.2.3.2 of the present | |||
| the present document and paragraph 2 of Section 20.2 of [RFC3168]). | document and paragraph 2 of Section 20.2 of [RFC3168]). | |||
| The initial SYN and SYN/ACK are the most critical control packets, so | The initial SYN and SYN/ACK are the most critical control packets, so | |||
| AccECN feeds back their IP-ECN fields. Although RFC 3168 prohibits | AccECN feeds back their IP-ECN fields. Although RFC 3168 prohibits | |||
| ECN-capable SYNs and SYN/ACKs, providing feedback of ECN marking on | ECN-capable SYNs and SYN/ACKs, providing feedback of ECN marking on | |||
| the SYN and SYN/ACK supports future scenarios in which SYNs might be | the SYN and SYN/ACK supports future scenarios in which SYNs might be | |||
| ECN-enabled (without prejudging whether they ought to be). For | ECN-enabled (without prejudging whether they ought to be). For | |||
| instance, [RFC8311] updates this aspect of RFC 3168 to allow | instance, [RFC8311] updates this aspect of RFC 3168 to allow | |||
| experimentation with ECN-capable TCP control packets. | experimentation with ECN-capable TCP control packets. | |||
| Even if the TCP Client (or Server) has set the SYN (or SYN/ACK) to | Even if the TCP Client (or Server) has set the SYN (or SYN/ACK) to | |||
| not-ECT in compliance with RFC 3168, feedback on the state of the IP- | Not-ECT in compliance with RFC 3168, feedback on the state of the IP- | |||
| ECN field when it arrives at the receiver could still be useful, | ECN field when it arrives at the receiver could still be useful, | |||
| because middleboxes have been known to overwrite the IP-ECN field as | because middleboxes have been known to overwrite the IP-ECN field as | |||
| if it is still part of the old Type of Service (ToS) field | if it is still part of the old Type of Service (ToS) field | |||
| [Mandalari18]. For example, if a TCP Client has set the SYN to Not- | [Mandalari18]. For example, if a TCP Client has set the SYN to Not- | |||
| ECT, but receives feedback that the IP-ECN field on the SYN arrived | ECT, but receives feedback that the IP-ECN field on the SYN arrived | |||
| with a different codepoint, it can detect such middlebox | with a different codepoint, it can detect such middlebox | |||
| interference. Previously, neither end knew what IP-ECN field the | interference. Previously, neither end knew what IP-ECN field the | |||
| other had sent. So, if a TCP Server received ECT or CE on a SYN, it | other sent. So, if a TCP Server received ECT or CE on a SYN, it | |||
| could not know whether it was invalid because only the TCP Client | could not know whether it was invalid because only the TCP Client | |||
| knew whether it originally marked the SYN as Not-ECT (or ECT). | knew whether it originally marked the SYN as Not-ECT (or ECT). | |||
| Therefore, prior to AccECN, the Server's only safe course of action | Therefore, prior to AccECN, the Server's only safe course of action | |||
| in this example was to disable ECN for the connection. Instead, the | in this example was to disable ECN for the connection. Instead, the | |||
| AccECN protocol allows the Server and Client to feed back the ECN | AccECN protocol allows the Server and Client to feed back the ECN | |||
| field received on the SYN and SYN/ACK to their peer, which then has | field received on the SYN and SYN/ACK to their peer, which now has | |||
| all the information to decide whether the connection has to fall-back | all the information to decide whether the connection has to fall back | |||
| from supporting ECN (or not). | from supporting ECN (or not). | |||
| 3. AccECN Protocol Specification | 3. AccECN Protocol Specification | |||
| 3.1. Negotiating to use AccECN | 3.1. Negotiating to Use AccECN | |||
| 3.1.1. Negotiation during the TCP three-way handshake | 3.1.1. Negotiation During the TCP Three-Way Handshake | |||
| Given the ECN Nonce [RFC3540] has been reclassified as historic | Given the ECN-nonce [RFC3540] has been reclassified as Historic | |||
| [RFC8311], the TCP flag that was previously called NS (Nonce Sum) is | [RFC8311], the TCP flag that was previously called NS (Nonce Sum) is | |||
| renamed as the AE (Accurate ECN) flag (the TCP header flag at bit | renamed as the AE (Accurate ECN) flag (the TCP header flag at bit | |||
| offset 7 in Figure 2). See the IANA Considerations in Section 7. | offset 7 in Figure 2). See the IANA Considerations in Section 7. | |||
| 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |||
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| | | | A | C | E | U | A | P | R | S | F | | | | | A | C | E | U | A | P | R | S | F | | |||
| | Header Length | Reserved | E | W | C | R | C | S | S | Y | I | | | Header Length | Reserved | E | W | C | R | C | S | S | Y | I | | |||
| | | | | R | E | G | K | H | T | N | N | | | | | | R | E | G | K | H | T | N | N | | |||
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| Figure 2: The new definition of the TCP header flags during the | Figure 2: The New Definition of the TCP Header Flags During the | |||
| TCP three-way handshake | TCP Three-Way Handshake | |||
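
As an illustration of the layout in Figure 2, the following minimal sketch extracts the three TCP-ECN flags from the 16-bit word that carries Header Length, Reserved bits, and flags. It assumes the word has already been read from the header in network byte order as an integer; the names `tcp_ecn_flags`, `AE_MASK`, `CWR_MASK`, and `ECE_MASK` are hypothetical and do not come from the specification.

```python
# Bit offsets in Figure 2 count from the most significant bit of the
# 16-bit word, so bit offset n corresponds to the mask 1 << (15 - n).
AE_MASK = 1 << (15 - 7)   # 0x0100, formerly the NS flag
CWR_MASK = 1 << (15 - 8)  # 0x0080
ECE_MASK = 1 << (15 - 9)  # 0x0040


def tcp_ecn_flags(word16):
    """Return (AE, CWR, ECE) as 0/1 integers from the 16-bit flags word."""
    return (
        int(bool(word16 & AE_MASK)),
        int(bool(word16 & CWR_MASK)),
        int(bool(word16 & ECE_MASK)),
    )
```

For example, a SYN with a 5-word header and all three flags set would carry the word 0x51C2 under this layout, for which the helper returns (1, 1, 1).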
| During the TCP three-way handshake at the start of a connection, to | During the TCP three-way handshake at the start of a connection, to | |||
| request more Accurate ECN feedback the TCP Client (host A) MUST set | request more Accurate ECN feedback the TCP Client (host A) MUST set | |||
| the TCP flags (AE,CWR,ECE) = (1,1,1) in the initial SYN segment. | the TCP flags (AE,CWR,ECE) = (1,1,1) in the initial SYN segment. | |||
| If a TCP Server (host B) that is AccECN-enabled receives a SYN with | If a TCP Server (host B) that is AccECN-enabled receives a SYN with | |||
| the above three flags set, it MUST set both its half connections into | the above three flags set, it MUST set both its half connections into | |||
| AccECN mode. Then it MUST set the AE, CWR and ECE TCP flags on the | AccECN mode. Then it MUST set the AE, CWR, and ECE TCP flags on the | |||
| SYN/ACK to the combination in the top block of Table 2 that feeds | SYN/ACK to the combination in the top block of Table 2 that feeds | |||
| back the IP-ECN field that arrived on the SYN. This applies whether | back the IP-ECN field that arrived on the SYN. This applies whether | |||
| or not the Server itself supports setting the IP-ECN field on a SYN | or not the Server itself supports setting the IP-ECN field on a SYN | |||
| or SYN/ACK (see Section 2.5 for rationale). | or SYN/ACK (see Section 2.5 for rationale). | |||
| When the TCP Server returns any of the 4 combinations in the top | When the TCP Server returns any of the four combinations in the top | |||
| block of Table 2, it confirms that it supports AccECN. The TCP | block of Table 2, it confirms that it supports AccECN. The TCP | |||
| Server MUST NOT set one of these 4 combination of flags on the SYN/ | Server MUST NOT set one of these four combinations of flags on the | |||
| ACK unless the preceding SYN requested support for AccECN as above. | SYN/ACK unless the preceding SYN requested support for AccECN as | |||
| above. | ||||
| Once a TCP Client (A) has sent the above SYN to declare that it | Once a TCP Client (A) has sent the above SYN to declare that it | |||
| supports AccECN, and once it has received the above SYN/ACK segment | supports AccECN, and once it has received the above SYN/ACK segment | |||
| that confirms that the TCP Server supports AccECN, the TCP Client | that confirms that the TCP Server supports AccECN, the TCP Client | |||
| MUST set both its half connections into AccECN mode. The TCP Client | MUST set both its half connections into AccECN mode. The TCP Client | |||
| MUST NOT enter AccECN mode (or any feedback mode) before it has | MUST NOT enter AccECN mode (or any feedback mode) before it has | |||
| received the first SYN/ACK. | received the first SYN/ACK. | |||
| Once in AccECN mode, a TCP Client or Server has the rights and | Once in AccECN mode, a TCP Client or Server has the rights and | |||
| obligations to participate in the ECN protocol defined in | obligations to participate in the ECN protocol defined in | |||
| Section 3.1.5. | Section 3.1.5. | |||
| The procedures to follow for retransmission of SYNs or SYN/ACKs are | The procedures for retransmission of SYNs or SYN/ACKs are given in | |||
| given in Section 3.1.4. | Section 3.1.4. | |||
| It is RECOMMENDED that the AccECN protocol is implemented alongside | It is RECOMMENDED that the AccECN protocol be implemented alongside | |||
| Selective Acknowledgement (SACK) [RFC2018]. If SACK is implemented | Selective Acknowledgement (SACK) [RFC2018]. If SACK is implemented | |||
| with AccECN, Duplicate Selective Acknowledgement (D-SACK) [RFC2883] | with AccECN, Duplicate Selective Acknowledgement (D-SACK) [RFC2883] | |||
| MUST also be implemented. | MUST also be implemented. | |||
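
The Server's choice of SYN/ACK flags in the top block of Table 2 can be sketched as a simple lookup, shown below. This is an illustrative sketch rather than a normative encoding routine: the function and constant names are hypothetical, and it assumes the caller has already determined that the SYN requested AccECN and passes the 2-bit IP-ECN codepoint as an integer (Not-ECT = 0b00, ECT(1) = 0b01, ECT(0) = 0b10, CE = 0b11).

```python
# (AE, CWR, ECE) on the SYN/ACK, keyed by the IP-ECN codepoint seen on the SYN.
SYNACK_FLAGS_FOR_IP_ECN = {
    0b00: (0, 1, 0),  # Not-ECT on SYN
    0b01: (0, 1, 1),  # ECT(1) on SYN
    0b10: (1, 0, 0),  # ECT(0) on SYN
    0b11: (1, 1, 0),  # CE on SYN
}


def accecn_synack_flags(syn_ip_ecn):
    """Return the (AE, CWR, ECE) flags that feed back the SYN's IP-ECN field.

    Only meaningful when the preceding SYN requested AccECN.
    """
    return SYNACK_FLAGS_FOR_IP_ECN[syn_ip_ecn & 0b11]
```

Note that none of the four values collides with (0,0,1), (0,0,0), (1,0,1), or (1,1,1), which is what lets a Client distinguish an AccECN response from the other rows of Table 2.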
| 3.1.2. Backward Compatibility | 3.1.2. Backward Compatibility | |||
| The three flags set to 1 to indicate AccECN support on the SYN have | The three flags are set to 1 to indicate AccECN support on the SYN | |||
| been carefully chosen to enable natural fall-back to prior stages in | have been carefully chosen to enable natural fall-back to prior | |||
| the evolution of ECN. Table 2 tabulates all the negotiation | stages in the evolution of ECN. Table 2 tabulates all the | |||
| possibilities for ECN-related capabilities that involve at least one | negotiation possibilities for ECN-related capabilities that involve | |||
| AccECN-capable host. The entries in the first two columns have been | at least one AccECN-capable host. The entries in the first two | |||
| abbreviated, as follows: | columns have been abbreviated, as follows: | |||
| AccECN: Supports more Accurate ECN Feedback (the present | AccECN: Supports more Accurate ECN feedback (the present | |||
| specification) | specification) | |||
| Nonce: Supports ECN Nonce feedback [RFC3540] | Nonce: Supports ECN-nonce feedback [RFC3540] | |||
| ECN: Supports 'Classic' ECN feedback [RFC3168] | ECN: Supports 'Classic' ECN feedback [RFC3168] | |||
| No ECN: Not ECN-capable. Implicit congestion notification using | No ECN: Not ECN-capable. Implicit congestion notification using | |||
| packet drop. | packet drop. | |||
| +========+========+============+============+======================+ | +========+========+============+============+======================+ | |||
| | Host A | Host B | SYN | SYN/ACK | Feedback Mode | | | Host A | Host B | SYN | SYN/ACK | Feedback Mode | | |||
| | | | A->B | B->A | of Host A | | | | | A->B | B->A | of Host A | | |||
| | | | AE CWR ECE | AE CWR ECE | | | | | | AE CWR ECE | AE CWR ECE | | | |||
| +========+========+============+============+======================+ | +========+========+============+============+======================+ | |||
| | AccECN | AccECN | 1 1 1 | 0 1 0 | AccECN (Not-ECT SYN) | | | AccECN | AccECN | 1 1 1 | 0 1 0 | AccECN (Not-ECT SYN) | | |||
| | AccECN | AccECN | 1 1 1 | 0 1 1 | AccECN (ECT1 on SYN) | | | AccECN | AccECN | 1 1 1 | 0 1 1 | AccECN (ECT1 on SYN) | | |||
| | AccECN | AccECN | 1 1 1 | 1 0 0 | AccECN (ECT0 on SYN) | | | AccECN | AccECN | 1 1 1 | 1 0 0 | AccECN (ECT0 on SYN) | | |||
| | AccECN | AccECN | 1 1 1 | 1 1 0 | AccECN (CE on SYN) | | | AccECN | AccECN | 1 1 1 | 1 1 0 | AccECN (CE on SYN) | | |||
| +--------+--------+------------+------------+----------------------+ | +--------+--------+------------+------------+----------------------+ | |||
| +--------+--------+------------+------------+----------------------+ | +--------+--------+------------+------------+----------------------+ | |||
| | AccECN | Nonce | 1 1 1 | 1 0 1 | (Reserved) | | | AccECN | Nonce | 1 1 1 | 1 0 1 | (Reserved) | | |||
| | AccECN | ECN | 1 1 1 | 0 0 1 | Classic ECN | | | AccECN | ECN | 1 1 1 | 0 0 1 | Classic ECN | | |||
| | AccECN | No ECN | 1 1 1 | 0 0 0 | Not ECN | | | AccECN | No ECN | 1 1 1 | 0 0 0 | Not ECN | | |||
| +--------+--------+------------+------------+----------------------+ | +--------+--------+------------+------------+----------------------+ | |||
| +--------+--------+------------+------------+----------------------+ | +--------+--------+------------+------------+----------------------+ | |||
| | Nonce | AccECN | 0 1 1 | 0 0 1 | Classic ECN | | | Nonce | AccECN | 0 1 1 | 0 0 1 | Classic ECN | | |||
| | ECN | AccECN | 0 1 1 | 0 0 1 | Classic ECN | | | ECN | AccECN | 0 1 1 | 0 0 1 | Classic ECN | | |||
| | No ECN | AccECN | 0 0 0 | 0 0 0 | Not ECN | | | No ECN | AccECN | 0 0 0 | 0 0 0 | Not ECN | | |||
| +--------+--------+------------+------------+----------------------+ | +--------+--------+------------+------------+----------------------+ | |||
| +--------+--------+------------+------------+----------------------+ | +--------+--------+------------+------------+----------------------+ | |||
| | AccECN | Broken | 1 1 1 | 1 1 1 | Not ECN | | | AccECN | Broken | 1 1 1 | 1 1 1 | Not ECN | | |||
| +--------+--------+------------+------------+----------------------+ | +--------+--------+------------+------------+----------------------+ | |||
| Table 2: ECN capability negotiation between Client (A) and | Table 2: ECN Capability Negotiation Between Client (A) and | |||
| Server (B) | Server (B) | |||
| Table 2 is divided into blocks each separated by an empty row. | Table 2 is divided into blocks, with each block separated by an empty | |||
| row. | ||||
| 1. The top block shows the case already described in Section 3.1 | 1. The top block shows the case already described in Section 3.1 | |||
| where both endpoints support AccECN and how the TCP Server (B) | where both endpoints support AccECN and how the TCP Server (B) | |||
| indicates congestion feedback. | indicates congestion feedback. | |||
| 2. The second block shows the cases where the TCP Client (A) | 2. The second block shows the cases where the TCP Client (A) | |||
| supports AccECN but the TCP Server (B) supports some earlier | supports AccECN but the TCP Server (B) supports some earlier | |||
| variant of TCP feedback, indicated in its SYN/ACK. Therefore, as | variant of TCP feedback, as indicated in its SYN/ACK. Therefore, | |||
| soon as an AccECN-capable TCP Client (A) receives the SYN/ACK | as soon as an AccECN-capable TCP Client (A) receives the SYN/ACK | |||
| shown it MUST set both its half connections into the feedback | shown, it MUST set both its half connections into the feedback | |||
| mode shown in the rightmost column. If the TCP Client has set | mode shown in the rightmost column. If the TCP Client has set | |||
| itself into Classic ECN feedback mode it MUST then comply with | itself into Classic ECN feedback mode, it MUST comply with | |||
| [RFC3168]. | [RFC3168]. | |||
| An AccECN implementation has no need to recognize or support the | An AccECN implementation has no need to recognize or support the | |||
| Server response labelled 'Nonce' or ECN Nonce feedback more | Server response labelled 'Nonce' or ECN-nonce feedback more | |||
| generally [RFC3540], which has been reclassified as historic | generally [RFC3540], as RFC 3540 has been reclassified as | |||
| [RFC8311]. AccECN is compatible with alternative ECN feedback | Historic [RFC8311]. AccECN is compatible with alternative ECN | |||
| integrity approaches to the nonce (see Section 5.3). The SYN/ACK | feedback integrity approaches to the nonce (see Section 5.3). | |||
| labelled 'Nonce' with (AE,CWR,ECE) = (1,0,1) is reserved for | The SYN/ACK labelled 'Nonce' with (AE,CWR,ECE) = (1,0,1) is | |||
| future use. A TCP Client (A) that receives such a SYN/ACK | reserved for future use. A TCP Client (A) that receives such a | |||
| follows the procedure for forward compatibility given in | SYN/ACK follows the procedure for forward compatibility given in | |||
| Section 3.1.3. | Section 3.1.3. | |||
| 3. The third block shows the cases where the TCP Server (B) supports | 3. The third block shows the cases where the TCP Server (B) supports | |||
| AccECN but the TCP Client (A) supports some earlier variant of | AccECN but the TCP Client (A) supports some earlier variant of | |||
| TCP feedback, indicated in its SYN. | TCP feedback, as indicated in its SYN. | |||
| When an AccECN-enabled TCP Server (B) receives a SYN with | When an AccECN-enabled TCP Server (B) receives a SYN with | |||
| (AE,CWR,ECE) = (0,1,1) it MUST do one of the following: | (AE,CWR,ECE) = (0,1,1), it MUST do one of the following: | |||
| * set both its half connections into the Classic ECN feedback | * set both its half connections into the Classic ECN feedback | |||
| mode and return a SYN/ACK with (AE,CWR,ECE) = (0,0,1) as | mode and return a SYN/ACK with (AE,CWR,ECE) = (0,0,1) as | |||
| shown. Then it MUST comply with [RFC3168]. | shown. Then it MUST comply with [RFC3168]. | |||
| * set both its half-connections into Not ECN mode and return a | * set both its half-connections into Not ECN mode and return a | |||
| SYN/ACK with (AE,CWR,ECE) = (0,0,0), then continue with ECN | SYN/ACK with (AE,CWR,ECE) = (0,0,0), then continue with ECN | |||
| disabled. This latter case is unlikely to be desirable, but | disabled. This latter case is unlikely to be desirable, but | |||
| it is allowed as a possibility, e.g., for minimal TCP | it is allowed as a possibility, e.g., for minimal TCP | |||
| implementations. | implementations. | |||
| When an AccECN-enabled TCP Server (B) receives a SYN with | When an AccECN-enabled TCP Server (B) receives a SYN with | |||
| (AE,CWR,ECE) = (0,0,0) it MUST set both its half connections into | (AE,CWR,ECE) = (0,0,0), it MUST set both its half connections | |||
| the Not ECN feedback mode, return a SYN/ACK with (AE,CWR,ECE) = | into the Not ECN feedback mode, return a SYN/ACK with | |||
| (0,0,0) as shown and continue with ECN disabled. | (AE,CWR,ECE) = (0,0,0) as shown, and continue with ECN disabled. | |||
| 4. The fourth block displays a combination labelled `Broken'. Some | 4. The fourth block displays a combination labelled 'Broken'. Some | |||
| older TCP Server implementations incorrectly set the TCP-ECN | older TCP Server implementations incorrectly set the TCP-ECN | |||
| flags in the SYN/ACK by reflecting those in the SYN. Such broken | flags in the SYN/ACK by reflecting those in the SYN. Such broken | |||
| TCP Servers (B) cannot support ECN, so as soon as an AccECN- | TCP Servers (B) cannot support ECN; so as soon as an AccECN- | |||
| capable TCP Client (A) receives such a broken SYN/ACK it MUST | capable TCP Client (A) receives such a broken SYN/ACK, it MUST | |||
| fall back to Not ECN mode for both its half connections and | fall back to Not ECN mode for both its half connections and | |||
| continue with ECN disabled. | continue with ECN disabled. | |||
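
The Client-side reading of Table 2 described in the list above can be summarized in a short sketch. It assumes the Client sent a SYN with (AE,CWR,ECE) = (1,1,1) and has no logic specific to the reserved combination, so (1,0,1) is handled per Section 3.1.3. The function name and the string labels are hypothetical, chosen only for illustration.

```python
def client_feedback_mode(synack_flags):
    """Map the (AE, CWR, ECE) flags on the first SYN/ACK to a feedback mode.

    Returns (mode, ip_ecn_seen_on_syn); the second item is None unless the
    mode is AccECN.
    """
    accecn = {
        (0, 1, 0): "Not-ECT",
        (0, 1, 1): "ECT(1)",
        (1, 0, 0): "ECT(0)",
        (1, 1, 0): "CE",
    }
    if synack_flags in accecn:
        return ("AccECN", accecn[synack_flags])
    if synack_flags == (1, 0, 1):
        # Reserved: enable AccECN as if the SYN's IP-ECN arrived unchanged.
        return ("AccECN", "unchanged")
    if synack_flags == (0, 0, 1):
        return ("Classic ECN", None)
    # (0,0,0) is a non-ECN Server; (1,1,1) is a 'Broken' reflecting Server.
    return ("Not ECN", None)
```

All eight flag combinations are covered, so the Client always lands in exactly one mode on the first SYN/ACK it receives.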
| The following additional rules do not fit the structure of the table, | The following additional rules do not fit the structure of the table, | |||
| but they complement it: | but they complement it: | |||
| Simultaneous Open: An originating AccECN Host (A), having sent a SYN | Simultaneous Open: An originating AccECN Host (A), having sent a SYN | |||
| with (AE,CWR,ECE) = (1,1,1), might receive another SYN from host | with (AE,CWR,ECE) = (1,1,1), might receive another SYN from host | |||
| B. Host A MUST then enter the same feedback mode as it would have | B. Host A MUST then enter the same feedback mode as it would have | |||
| entered had it been a responding host and received the same SYN. | entered had it been a responding host and received the same SYN. | |||
| skipping to change at page 17, line 30 ¶ | skipping to change at line 782 ¶ | |||
| new TCP connection if they receive an in-window SYN packet during | new TCP connection if they receive an in-window SYN packet during | |||
| TIME-WAIT state. When a TCP host enters TIME-WAIT or CLOSED | TIME-WAIT state. When a TCP host enters TIME-WAIT or CLOSED | |||
| state, it ought to ignore any previous state about the negotiation | state, it ought to ignore any previous state about the negotiation | |||
| of AccECN for that connection and renegotiate the feedback mode | of AccECN for that connection and renegotiate the feedback mode | |||
| according to Table 2. | according to Table 2. | |||
| 3.1.3. Forward Compatibility | 3.1.3. Forward Compatibility | |||
| If a TCP Server that implements AccECN receives a SYN with the three | If a TCP Server that implements AccECN receives a SYN with the three | |||
| TCP header flags (AE,CWR,ECE) set to any combination other than | TCP header flags (AE,CWR,ECE) set to any combination other than | |||
| (0,0,0), (0,1,1) or (1,1,1) and it does not have logic specific to | (0,0,0), (0,1,1), or (1,1,1) and it does not have logic specific to | |||
| such a combination, the Server MUST negotiate the use of AccECN as if | such a combination, the Server MUST negotiate the use of AccECN as if | |||
| the three flags had been set to (1,1,1). However, an AccECN Client | the three flags had been set to (1,1,1). However, an AccECN Client | |||
| implementation MUST NOT send a SYN with any combination other than | implementation MUST NOT send a SYN with any combination other than | |||
| the three listed. | the three listed. | |||
| If a TCP Client has sent a SYN requesting AccECN feedback with | If a TCP Client sent a SYN requesting AccECN feedback with | |||
| (AE,CWR,ECE) = (1,1,1) then receives a SYN/ACK with the currently | (AE,CWR,ECE) = (1,1,1) and then receives a SYN/ACK with the currently | |||
| reserved combination (AE,CWR,ECE) = (1,0,1) but it does not have | reserved combination (AE,CWR,ECE) = (1,0,1) but it does not have | |||
| logic specific to such a combination, the Client MUST enable AccECN | logic specific to such a combination, the Client MUST enable AccECN | |||
| mode as if the SYN/ACK confirmed that the Server supported AccECN and | mode as if the SYN/ACK confirmed that the Server supported AccECN and | |||
| as if it fed back that the IP-ECN field on the SYN had arrived | as if it fed back that the IP-ECN field on the SYN had arrived | |||
| unchanged. However, an AccECN Server implementation MUST NOT send a | unchanged. However, an AccECN Server implementation MUST NOT send a | |||
| SYN/ACK with this combination (AE,CWR,ECE) = (1,0,1). | SYN/ACK with this combination (AE,CWR,ECE) = (1,0,1). | |||
| | For the avoidance of doubt, the behaviour described in the | | For the avoidance of doubt, the behaviour described in the | |||
| | present specification applies whether or not the three | | present specification applies whether or not the three | |||
| | remaining reserved TCP header flags are zero. | | remaining reserved TCP header flags are zero. | |||
| All these requirements ensure that future uses of all the Reserved | All of these requirements ensure that future uses of all the Reserved | |||
| combinations on a SYN or SYN/ACK can rely on consistent behaviour | combinations on a SYN or SYN/ACK can rely on consistent behaviour | |||
| from the installed base of AccECN implementations. See Appendix B.3 | from the installed base of AccECN implementations. See Appendix B.3 | |||
| for related discussion. | for related discussion. | |||
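
Combining the third block of Table 2 with the forward-compatibility rule above, the Server's mode selection on the first SYN might be sketched as follows. The function name and the `allow_classic_ecn` parameter are hypothetical; the parameter models the permitted choice of a minimal implementation to answer a Classic ECN SYN with ECN disabled. The sketch assumes the Server has no logic specific to any reserved combination.

```python
def server_feedback_mode(syn_flags, allow_classic_ecn=True):
    """Map the (AE, CWR, ECE) flags on the first SYN to a feedback mode."""
    if syn_flags == (0, 0, 0):
        return "Not ECN"
    if syn_flags == (0, 1, 1):
        # Either fall back to Classic ECN or disable ECN entirely.
        return "Classic ECN" if allow_classic_ecn else "Not ECN"
    # (1,1,1) and, for forward compatibility, every other combination
    # are negotiated as if the SYN had carried (1,1,1).
    return "AccECN"
```

Treating unknown combinations as (1,1,1) is what allows a future specification to assign them a meaning while still getting predictable behaviour from deployed Servers.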
| 3.1.4. Multiple SYNs or SYN/ACKs | 3.1.4. Multiple SYNs or SYN/ACKs | |||
| 3.1.4.1. Retransmitted SYNs | 3.1.4.1. Retransmitted SYNs | |||
| If the sender of an AccECN SYN (the TCP Client) times out before | If the sender of an AccECN SYN (the TCP Client) times out before | |||
| receiving the SYN/ACK, it SHOULD attempt to negotiate the use of | receiving the SYN/ACK, it SHOULD attempt to negotiate the use of | |||
| AccECN at least one more time by continuing to set all three TCP ECN | AccECN at least one more time by continuing to set all three TCP ECN | |||
| flags (AE,CWR,ECE) = (1,1,1) on the first retransmitted SYN (using | flags (AE,CWR,ECE) = (1,1,1) on the first retransmitted SYN (using | |||
| the usual retransmission time-outs). If this first retransmission | the usual retransmission timeouts). If this first retransmission | |||
| also fails to be acknowledged, in deployment scenarios where AccECN | also fails to be acknowledged, in deployment scenarios where AccECN | |||
| path traversal might be problematic, the TCP Client SHOULD send | path traversal might be problematic, the TCP Client SHOULD send | |||
| subsequent retransmissions of the SYN with the three TCP-ECN flags | subsequent retransmissions of the SYN with the three TCP-ECN flags | |||
| cleared (AE,CWR,ECE) = (0,0,0). Such a retransmitted SYN MUST use | cleared (AE,CWR,ECE) = (0,0,0). Such a retransmitted SYN MUST use | |||
| the same initial sequence number (ISN) as the original SYN. | the same initial sequence number (ISN) as the original SYN. | |||
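
The default retransmission behaviour in this paragraph can be sketched as a function of the attempt number. The names below are hypothetical, and the `path_may_block_accecn` parameter is an assumption standing in for "deployment scenarios where AccECN path traversal might be problematic". The text does not say what to send outside such scenarios; this sketch simply keeps requesting AccECN in that case.

```python
ACCECN_SYN = (1, 1, 1)
NON_ECN_SYN = (0, 0, 0)


def syn_flags_for_attempt(attempt, path_may_block_accecn=True):
    """Return (AE, CWR, ECE) for a SYN; attempt 0 is the original SYN.

    Every attempt is assumed to reuse the ISN of the original SYN.
    """
    if attempt <= 1:
        # Original SYN and first retransmission both request AccECN.
        return ACCECN_SYN
    if path_may_block_accecn:
        # Subsequent retransmissions fall back to a non-ECN SYN.
        return NON_ECN_SYN
    return ACCECN_SYN
```

With the default argument, the first three attempts carry (1,1,1), (1,1,1), and (0,0,0) respectively.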
| Retrying once before fall-back adds delay in the case where a | Retrying once before fall-back adds delay in the case where a | |||
| middlebox drops an AccECN (or ECN) SYN deliberately. However, recent | middlebox drops an AccECN (or ECN) SYN deliberately. However, recent | |||
| measurements [Mandalari18] imply that a drop is less likely to be due | measurements [Mandalari18] imply that a drop is less likely to be due | |||
| to middlebox interference than other intermittent causes of loss, | to middlebox interference than other intermittent causes of loss, | |||
| e.g., congestion, wireless transmission loss, etc. | e.g., congestion, wireless transmission loss, etc. | |||
| Implementers MAY use other fall-back strategies if they are found to | Implementers MAY use other fall-back strategies if they are found to | |||
| be more effective (e.g., attempting to negotiate AccECN on the SYN | be more effective (e.g., attempting to negotiate AccECN on the SYN | |||
| only once or more than twice (most appropriate during high levels of | only once or more than twice (most appropriate during high levels of | |||
| congestion)). | congestion)). | |||
| Further it might make sense to also remove any other new or | Further it might make sense to also remove any other new or | |||
| experimental fields or options on the SYN in case a middlebox might | experimental fields or options on the SYN in case a middlebox might | |||
| be blocking them, although the required behaviour will depend on the | be blocking them, although the required behaviour will depend on the | |||
| specification of the other option(s) and any attempt to co-ordinate | specification of the other option(s) and any attempt to coordinate | |||
| fall-back between different modules of the stack. For instance, even | fall-back between different modules of the stack. For instance, even | |||
| if taking part in an [RFC8311] experiment that allows ECT on a SYN, | if taking part in an [RFC8311] experiment that allows ECT on a SYN, | |||
| it would be advisable to try it without. | it would be advisable to try it without. | |||
| Whichever fall-back strategy is used, the TCP initiator SHOULD cache | Whichever fall-back strategy is used, the TCP initiator SHOULD cache | |||
| failed connection attempts. If it does, it SHOULD NOT give up | failed connection attempts. If it does, it SHOULD NOT give up | |||
| attempting to negotiate AccECN on the SYN of subsequent connection | attempting to negotiate AccECN on the SYN of subsequent connection | |||
| attempts until it is clear that the blockage is persistently and | attempts until it is clear that the blockage is persistently and | |||
| specifically due to AccECN. The cache needs to be arranged to expire | specifically due to AccECN. The cache needs to be arranged to expire | |||
| so that the initiator will infrequently attempt to check whether the | so that the initiator will infrequently attempt to check whether the | |||
| skipping to change at page 19, line 15 ¶ | skipping to change at line 858 ¶ | |||
| All fall-back strategies will need to follow all the normative rules | All fall-back strategies will need to follow all the normative rules | |||
| in Section 3.1.5, which concern behaviour when SYNs or SYN/ACKs | in Section 3.1.5, which concern behaviour when SYNs or SYN/ACKs | |||
| negotiating different types of feedback have been sent within the | negotiating different types of feedback have been sent within the | |||
| same connection, including the possibility that they arrive out of | same connection, including the possibility that they arrive out of | |||
| order. As examples, the following non-normative bullets call out | order. As examples, the following non-normative bullets call out | |||
| those rules from Section 3.1.5 that apply to the above fall-back | those rules from Section 3.1.5 that apply to the above fall-back | |||
| strategies: | strategies: | |||
| * Once the TCP Client has sent SYNs with (AE,CWR,ECE) = (1,1,1) and | * Once the TCP Client has sent SYNs with (AE,CWR,ECE) = (1,1,1) and | |||
| with (AE,CWR,ECE) = (0,0,0), it might eventually receive a SYN/ACK | with (AE,CWR,ECE) = (0,0,0), it might eventually receive a SYN/ACK | |||
| from the Server in response to one, the other, or both and | from the Server in response to one, the other, or both, and | |||
| possibly reordered; | possibly reordered; | |||
| * Such a TCP Client enters the feedback mode appropriate to the | * Such a TCP Client enters the feedback mode appropriate to the | |||
| first SYN/ACK it receives according to Table 2, and it does not | first SYN/ACK it receives according to Table 2, and it does not | |||
| switch to a different mode, whatever other SYN/ACKs it might | switch to a different mode, whatever other SYN/ACKs it might | |||
| receive or send; | receive or send; | |||
| * If a TCP Client has entered AccECN mode but then subsequently | * If a TCP Client has entered AccECN mode but then subsequently | |||
| sends a SYN or receives a SYN/ACK with (AE,CWR,ECE) = (0,0,0), it | sends a SYN or receives a SYN/ACK with (AE,CWR,ECE) = (0,0,0), it | |||
| is still allowed to set ECT on packets for the rest of the | is still allowed to set ECT on packets for the rest of the | |||
| connection. Note that this rule is different to that of a Server | connection. Note that this rule is different than that of a | |||
| in an equivalent position (Section 3.1.5 explains). | Server in an equivalent position (Section 3.1.5 explains). | |||
| * Having entered AccECN mode, in general a TCP Client commits to | * Having entered AccECN mode, in general a TCP Client commits to | |||
| respond to any incoming congestion feedback, whether or not it | respond to any incoming congestion feedback, whether or not it | |||
| sets ECT on outgoing packets (for rationale and some exceptions | sets ECT on outgoing packets (for rationale and some exceptions | |||
| see Section 3.2.2.3, Section 3.2.2.4); | see Section 3.2.2.3, Section 3.2.2.4); | |||
| * Having entered AccECN mode, a TCP Client commits to using AccECN | * Having entered AccECN mode, a TCP Client commits to using AccECN | |||
| to feed back the IP-ECN field in incoming packets for the rest of | to feed back the IP-ECN field in incoming packets for the rest of | |||
| the connection, as specified in Section 3.2, even if it is not | the connection, as specified in Section 3.2, even if it is not | |||
| itself setting ECT on outgoing packets. | itself setting ECT on outgoing packets. | |||
| skipping to change at page 20, line 10 ¶ | skipping to change at line 902 ¶ | |||
| negotiating different types of feedback are sent within the same | negotiating different types of feedback are sent within the same | |||
| connection, including the possibility that they arrive out of order. | connection, including the possibility that they arrive out of order. | |||
| As examples, the following non-normative bullets call out those rules | As examples, the following non-normative bullets call out those rules | |||
| from Section 3.1.5 that apply to the above fall-back strategies: | from Section 3.1.5 that apply to the above fall-back strategies: | |||
| * An AccECN-capable TCP Server enters the feedback mode appropriate | * An AccECN-capable TCP Server enters the feedback mode appropriate | |||
| to the first SYN it receives using Table 2, and it does not switch | to the first SYN it receives using Table 2, and it does not switch | |||
| to a different mode, whatever other SYNs it might receive and | to a different mode, whatever other SYNs it might receive and | |||
| whatever SYN/ACKs it might send; | whatever SYN/ACKs it might send; | |||
| * if a TCP Server in AccECN mode receives a SYN with (AE,CWR,ECE) = | * If a TCP Server in AccECN mode receives a SYN with (AE,CWR,ECE) = | |||
| (0,0,0), it preferably acknowledges it first using an AccECN SYN/ | (0,0,0), it preferably acknowledges it first using an AccECN SYN/ | |||
| ACK, but it can retry using a SYN/ACK with (AE,CWR,ECE) = (0,0,0); | ACK, but it can retry using a SYN/ACK with (AE,CWR,ECE) = (0,0,0); | |||
| * If a TCP Server in AccECN mode sends multiple AccECN SYN/ACKs, it | * If a TCP Server in AccECN mode sends multiple AccECN SYN/ACKs, it | |||
| uses the TCP-ECN flags in each SYN/ACK to feed back the IP-ECN | uses the TCP-ECN flags in each SYN/ACK to feed back the IP-ECN | |||
| field on the latest SYN to have arrived; | field on the latest SYN to have arrived; | |||
| * If a TCP Server enters AccECN mode then subsequently sends a SYN/ | * If a TCP Server enters AccECN mode and then subsequently sends a | |||
| ACK or receives a SYN with (AE,CWR,ECE) = (0,0,0), it is | SYN/ACK or receives a SYN with (AE,CWR,ECE) = (0,0,0), it is | |||
| prohibited from setting ECT on any packet for the rest of the | prohibited from setting ECT on any packet for the rest of the | |||
| connection; | connection; | |||
| * Having entered AccECN mode, in general a TCP Server commits to | * Having entered AccECN mode, in general a TCP Server commits to | |||
| respond to any incoming congestion feedback, whether or not it | respond to any incoming congestion feedback, whether or not it | |||
| sets ECT on outgoing packets (for rationale and some exceptions | sets ECT on outgoing packets (for rationale and some exceptions | |||
| see Section 3.2.2.3, Section 3.2.2.4); | see Sections 3.2.2.3, 3.2.2.4); | |||
| * Having entered AccECN mode, a TCP Server commits to using AccECN | * Having entered AccECN mode, a TCP Server commits to using AccECN | |||
| to feed back the IP-ECN field in incoming packets for the rest of | to feed back the IP-ECN field in incoming packets for the rest of | |||
| the connection, as specified in Section 3.2, even if it is not | the connection, as specified in Section 3.2, even if it is not | |||
| itself setting ECT on outgoing packets. | itself setting ECT on outgoing packets. | |||
| 3.1.5. Implications of AccECN Mode | 3.1.5. Implications of AccECN Mode | |||
| Section 3.1.1 describes the only ways that a host can enter AccECN | Section 3.1.1 describes the only ways that a host can enter AccECN | |||
| mode, whether as a Client or as a Server. | mode, whether as a Client or as a Server. | |||
| skipping to change at page 21, line 5 ¶ | skipping to change at line 946 ¶ | |||
| synchronization; | synchronization; | |||
| 'Valid SYN': A SYN that has the same port numbers and the same ISN | 'Valid SYN': A SYN that has the same port numbers and the same ISN | |||
| as the SYN that first caused the Server to open the connection. | as the SYN that first caused the Server to open the connection. | |||
| An 'Acceptable' packet is defined in Section 1.3. | An 'Acceptable' packet is defined in Section 1.3. | |||
| Handling SYNs or SYN/ACKs of multiple types (e.g., fall-back): | Handling SYNs or SYN/ACKs of multiple types (e.g., fall-back): | |||
| * Any implementation that supports AccECN: | * Any implementation that supports AccECN: | |||
| - MUST NOT switch into a different feedback mode to the one it | - MUST NOT switch into a different feedback mode than the one it | |||
| first entered according to Table 2, no matter whether it | first entered according to Table 2, no matter whether it | |||
| subsequently receives valid SYNs or Acceptable SYN/ACKs of | subsequently receives valid SYNs or Acceptable SYN/ACKs of | |||
| different types. | different types. | |||
| - SHOULD ignore the TCP-ECN flags in SYNs or SYN/ACKs that are | - SHOULD ignore the TCP-ECN flags in SYNs or SYN/ACKs that are | |||
| received after the implementation reaches the Established | received after the implementation reaches the Established | |||
| state, in line with the general TCP approach [RFC9293]; | state, in line with the general TCP approach [RFC9293]; | |||
| Reason: Reaching established state implies that at least one | Reason: Reaching established state implies that at least one | |||
| SYN and one SYN/ACK have successfully been delivered. And all | SYN and one SYN/ACK have successfully been delivered. And all | |||
| skipping to change at page 22, line 35 ¶ | skipping to change at line 1024 ¶ | |||
| - SHOULD respond to any subsequent valid SYN using a SYN/ACK with | - SHOULD respond to any subsequent valid SYN using a SYN/ACK with | |||
| (AE,CWR,ECE) = (0,0,0), even if the SYN is offering to | (AE,CWR,ECE) = (0,0,0), even if the SYN is offering to | |||
| negotiate Classic ECN or AccECN feedback mode; | negotiate Classic ECN or AccECN feedback mode; | |||
| Rationale: There would be no point in the Server offering any | Rationale: There would be no point in the Server offering any | |||
| type of ECN feedback, because the Client will not be using ECN. | type of ECN feedback, because the Client will not be using ECN. | |||
| However, there is no interoperability reason to make this rule | However, there is no interoperability reason to make this rule | |||
| mandatory. | mandatory. | |||
| If for any reason a host is not willing to provide ECN feedback on a | If for any reason a host is not willing to provide ECN feedback on a | |||
| particular TCP connection, it SHOULD clear the AE, CWR and ECE flags | particular TCP connection, it SHOULD clear the AE, CWR, and ECE flags | |||
| in all SYN and/or SYN/ACK packets that it sends. | in all SYN and/or SYN/ACK packets that it sends. | |||
| Sending ECT: | Sending ECT: | |||
| * Any implementation that supports AccECN: | * Any implementation that supports AccECN: | |||
| - MUST NOT set ECT if it is in Not ECN feedback mode. | - MUST NOT set ECT if it is in Not ECN feedback mode. | |||
| A Data Sender in AccECN mode: | A Data Sender in AccECN mode: | |||
| skipping to change at page 23, line 12 ¶ | skipping to change at line 1049 ¶ | |||
| - MAY not set ECT on any packet (for instance if it has reason to | - MAY not set ECT on any packet (for instance if it has reason to | |||
| believe such a packet would be blocked); | believe such a packet would be blocked); | |||
| A TCP Server in AccECN mode: | A TCP Server in AccECN mode: | |||
| - MUST NOT set ECT on any packet for the rest of the connection, | - MUST NOT set ECT on any packet for the rest of the connection, | |||
| if it has received or sent at least one valid SYN or Acceptable | if it has received or sent at least one valid SYN or Acceptable | |||
| SYN/ACK with (AE,CWR,ECE) = (0,0,0) during the handshake. | SYN/ACK with (AE,CWR,ECE) = (0,0,0) during the handshake. | |||
| This rule solely applies to a Server because, when a Server | This rule solely applies to a Server because, when a Server | |||
| enters AccECN mode it doesn't know for sure whether the Client | enters AccECN mode, it doesn't know for sure whether the Client | |||
| will end up in AccECN mode. But when a Client enters AccECN | will end up in AccECN mode. But when a Client enters AccECN | |||
| mode, it can be certain that the Server is already in AccECN | mode, it can be certain that the Server is already in AccECN | |||
| feedback mode. | feedback mode. | |||
| Congestion response: | Congestion response: | |||
| * A host in AccECN mode: | * A host in AccECN mode: | |||
| - is obliged to respond appropriately to AccECN feedback that | - is obliged to respond appropriately to AccECN feedback that | |||
| indicates there were ECN marks on packets it had previously | indicates there were ECN marks on packets it had previously | |||
| sent, where 'appropriately' is defined in Section 6.1 of | sent, where 'appropriately' is defined in Section 6.1 of | |||
| [RFC3168] and updated by Sections 2.1 and 4.1 of [RFC8311]; | [RFC3168] and updated by Sections 2.1 and 4.1 of [RFC8311]; | |||
| - is still obliged to respond appropriately to congestion | - is still obliged to respond appropriately to congestion | |||
| feedback, even when it is solely sending non-ECN-capable | feedback, even when it is solely sending non-ECN-capable | |||
| packets (for rationale, some examples and some exceptions see | packets (for rationale, some examples and some exceptions see | |||
| Section 3.2.2.3, Section 3.2.2.4). | Sections 3.2.2.3 and 3.2.2.4). | |||
| - is still obliged to respond appropriately to congestion | - is still obliged to respond appropriately to congestion | |||
| feedback, even if it has sent or received a SYN or SYN/ACK | feedback, even if it has sent or received a SYN or SYN/ACK | |||
| packet with (AE,CWR,ECE) = (0,0,0) during the handshake; | packet with (AE,CWR,ECE) = (0,0,0) during the handshake; | |||
| - MUST NOT set CWR to indicate that it has received and responded | - MUST NOT set CWR to indicate that it has received and responded | |||
| to indications of congestion. | to indications of congestion. | |||
| For the avoidance of doubt, this is unlike an RFC 3168 data | For the avoidance of doubt, this is unlike an RFC 3168 data | |||
| sender and this does not preclude the Data Sender from setting | sender and this does not preclude the Data Sender from setting | |||
| skipping to change at page 24, line 29 ¶ | skipping to change at line 1112 ¶ | |||
| - MUST NOT use reception of packets with ECT set in the IP-ECN | - MUST NOT use reception of packets with ECT set in the IP-ECN | |||
| field as an implicit signal that the peer is ECN-capable. | field as an implicit signal that the peer is ECN-capable. | |||
| Reason: ECT at the IP layer does not explicitly confirm the | Reason: ECT at the IP layer does not explicitly confirm the | |||
| peer has the correct ECN feedback logic, because the packets | peer has the correct ECN feedback logic, because the packets | |||
| could have been mangled at the IP layer. | could have been mangled at the IP layer. | |||
| 3.2. AccECN Feedback | 3.2. AccECN Feedback | |||
| Each Data Receiver of each half connection maintains four counters, | Each Data Receiver of each half connection maintains four counters, | |||
| r.cep, r.ceb, r.e0b and r.e1b: | r.cep, r.ceb, r.e0b, and r.e1b: | |||
| * The Data Receiver MUST increment the CE packet counter (r.cep), | * The Data Receiver MUST increment the CE packet counter (r.cep), | |||
| for every Acceptable packet that it receives with the CE code | for every Acceptable packet that it receives with the CE code | |||
| point in the IP ECN field, including CE marked control packets and | point in the IP-ECN field, including CE-marked control packets and | |||
| retransmissions but excluding CE on SYN packets (SYN=1; ACK=0). | retransmissions but excluding CE on SYN packets (SYN=1; ACK=0). | |||
| * A Data Receiver that supports sending of AccECN TCP Options MUST | * A Data Receiver that supports sending of AccECN TCP Options MUST | |||
| increment the r.ceb, r.e0b or r.e1b byte counters by the number of | increment the r.ceb, r.e0b, or r.e1b byte counters by the number | |||
| TCP payload octets in Acceptable packets marked with the CE, | of TCP payload octets in Acceptable packets marked with the CE, | |||
| ECT(0) and ECT(1) codepoint in their IP-ECN field, including any | ECT(0), and ECT(1) codepoint in their IP-ECN field, including any | |||
| payload octets on control packets and retransmissions, but not | payload octets on control packets and retransmissions, but not | |||
| including any payload octets on SYN packets (SYN=1; ACK=0). | including any payload octets on SYN packets (SYN=1; ACK=0). | |||
| Each Data Sender of each half connection maintains four counters, | Each Data Sender of each half connection maintains four counters, | |||
| s.cep, s.ceb, s.e0b and s.e1b intended to track the equivalent | s.cep, s.ceb, s.e0b, and s.e1b, intended to track the equivalent | |||
| counters at the Data Receiver. | counters at the Data Receiver. | |||
| A Data Receiver feeds back the CE packet counter using the Accurate | A Data Receiver feeds back the CE packet counter using the Accurate | |||
| ECN (ACE) field, as explained in Section 3.2.2. And it optionally | ECN (ACE) field, as explained in Section 3.2.2. And it optionally | |||
| feeds back all the byte counters using the AccECN TCP Option, as | feeds back all the byte counters using the AccECN TCP Option, as | |||
| specified in Section 3.2.3. | specified in Section 3.2.3. | |||
| Whenever a Data Receiver feeds back the value of any counter, it MUST | Whenever a Data Receiver feeds back the value of any counter, it MUST | |||
| report the most recent value, no matter whether it is in a pure ACK, | report the most recent value, no matter whether it is in a pure ACK, | |||
| or an ACK piggybacked on a packet used by the other half-connection, | or an ACK piggybacked on a packet used by the other half-connection, | |||
| whether new payload data or a retransmission. Therefore the feedback | whether new payload data or a retransmission. Therefore, the | |||
| piggybacked on a retransmitted packet is unlikely to be the same as | feedback piggybacked on a retransmitted packet is unlikely to be the | |||
| the feedback on the original packet. | same as the feedback on the original packet. | |||
| 3.2.1. Initialization of Feedback Counters | 3.2.1. Initialization of Feedback Counters | |||
| When a host first enters AccECN mode, in its role as a Data Receiver | When a host first enters AccECN mode, in its role as a Data Receiver, | |||
| it initializes its counters to r.cep = 5, r.e0b = r.e1b = 1 and r.ceb | it initializes its counters to r.cep = 5, r.e0b = r.e1b = 1, and | |||
| = 0, | r.ceb = 0. | |||
| Non-zero initial values are used to support a stateless handshake | Non-zero initial values are used to support a stateless handshake | |||
| (see Section 5.1) and to be distinct from cases where the fields are | (see Section 5.1) and to be distinct from cases where the fields are | |||
| incorrectly zeroed (e.g., by middleboxes - see Section 3.2.3.2.4). | incorrectly zeroed (e.g., by middleboxes -- see Section 3.2.3.2.4). | |||
| When a host enters AccECN mode, in its role as a Data Sender it | When a host enters AccECN mode, in its role as a Data Sender, it | |||
| initializes its counters to s.cep = 5, s.e0b = s.e1b = 1 and s.ceb = | initializes its counters to s.cep = 5, s.e0b = s.e1b = 1, and s.ceb = | |||
| 0. | 0. | |||
| 3.2.2. The ACE Field | 3.2.2. The ACE Field | |||
| After AccECN has been negotiated on the SYN and SYN/ACK, both hosts | After AccECN has been negotiated on the SYN and SYN/ACK, both hosts | |||
| overload the three TCP flags (AE, CWR and ECE) in the main TCP header | overload the three TCP flags (AE, CWR, and ECE) in the main TCP | |||
| as one 3-bit field. Then the field is given a new name, ACE, as | header as one 3-bit field. Then the field is given a new name, ACE, | |||
| shown in Figure 3. | as shown in Figure 3. | |||
| 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |||
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| | | | | U | A | P | R | S | F | | | | | | U | A | P | R | S | F | | |||
| | Header Length | Reserved | ACE | R | C | S | S | Y | I | | | Header Length | Reserved | ACE | R | C | S | S | Y | I | | |||
| | | | | G | K | H | T | N | N | | | | | | G | K | H | T | N | N | | |||
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| Figure 3: Definition of the ACE field within bytes 13 and 14 of | Figure 3: Definition of the ACE Field Within Bytes 13 and 14 of | |||
| the TCP Header (when AccECN has been negotiated and SYN=0). | the TCP Header (When AccECN Has Been Negotiated and SYN=0). | |||
| The original definition of these three flags in the TCP header, | The original definition of these three flags in the TCP header, | |||
| including the addition of support for the ECN Nonce, is shown for | including the addition of support for the ECN-nonce, is shown for | |||
| comparison in Figure 1. This specification does not rename these | comparison in Figure 1. This specification does not rename these | |||
| three TCP flags to ACE unconditionally; it merely overloads them with | three TCP flags to ACE unconditionally; it merely overloads them with | |||
| another name and definition once an AccECN connection has been | another name and definition once an AccECN connection has been | |||
| established. | established. | |||
| With one exception (Section 3.2.2.1), a host with both of its half- | With one exception (Section 3.2.2.1), a host with both of its half- | |||
| connections in AccECN mode MUST interpret the AE, CWR and ECE flags | connections in AccECN mode MUST interpret the AE, CWR, and ECE flags | |||
| as the 3-bit ACE counter on a segment with the SYN flag cleared | as the 3-bit ACE counter on a segment with the SYN flag cleared | |||
| (SYN=0). On such a packet, a Data Receiver MUST encode the three | (SYN=0). On such a packet, a Data Receiver MUST encode the 3 least | |||
| least significant bits of its r.cep counter into the ACE field that | significant bits of its r.cep counter into the ACE field that it | |||
| it feeds back to the Data Sender. The least significant bit is at | feeds back to the Data Sender. The least significant bit is at bit | |||
| bit offset 9 in Figure 3. A host MUST NOT interpret the 3 flags as a | offset 9 in Figure 3. A host MUST NOT interpret the three flags as a | |||
| 3-bit ACE field on any segment with SYN=1 (whether ACK is 0 or 1), or | 3-bit ACE field on any segment with SYN=1 (whether ACK is 0 or 1), or | |||
| if AccECN negotiation is incomplete or has not succeeded. | if AccECN negotiation is incomplete or has not succeeded. | |||
| Both parts of each of these conditions are equally important. For | Both parts of each of these conditions are equally important. For | |||
| instance, even if AccECN negotiation has been successful, the ACE | instance, even if AccECN negotiation has been successful, the ACE | |||
| field is not defined on any segments with SYN=1 (e.g., a | field is not defined on any segments with SYN=1 (e.g., a | |||
| retransmission of an unacknowledged SYN/ACK, or when both ends send | retransmission of an unacknowledged SYN/ACK, or when both ends send | |||
| SYN/ACKs after AccECN support has been successfully negotiated during | SYN/ACKs after AccECN support has been successfully negotiated during | |||
| a simultaneous open). | a simultaneous open). | |||
| skipping to change at page 26, line 46 ¶ | skipping to change at line 1221 ¶ | |||
| with a packet that does not satisfy these conditions (e.g., it has | with a packet that does not satisfy these conditions (e.g., it has | |||
| data to include on the ACK), it SHOULD first send a pure ACK that | data to include on the ACK), it SHOULD first send a pure ACK that | |||
| does satisfy these conditions (see Section 5.2), so that it can feed | does satisfy these conditions (see Section 5.2), so that it can feed | |||
| back which of the four values of the IP-ECN field arrived on the SYN/ | back which of the four values of the IP-ECN field arrived on the SYN/ | |||
| ACK. A valid exception to this "SHOULD" would be where the | ACK. A valid exception to this "SHOULD" would be where the | |||
| implementation will only be used in an environment where mangling of | implementation will only be used in an environment where mangling of | |||
| the ECN field is unlikely. | the ECN field is unlikely. | |||
| The TCP Client MUST also use the handshake encoding for the pure ACK | The TCP Client MUST also use the handshake encoding for the pure ACK | |||
| of any retransmitted SYN/ACK that confirms that the TCP Server | of any retransmitted SYN/ACK that confirms that the TCP Server | |||
| supports AccECN. The procedure for the TCP Server to follow if the | supports AccECN. If the final ACK of the handshake does not arrive | |||
| final ACK of the handshake does not arrive before its retransmission | before its retransmission timer expires, the TCP Server is to follow the | |||
| timer expires is given in Section 3.1.4.2. | procedure given in Section 3.1.4.2. | |||
| +==================+================+=====================+ | +==================+================+=====================+ | |||
| | IP-ECN codepoint | ACE on pure | r.cep of TCP Client | | | IP-ECN codepoint | ACE on pure | r.cep of TCP Client | | |||
| | on SYN/ACK | ACK of SYN/ACK | in AccECN mode | | | on SYN/ACK | ACK of SYN/ACK | in AccECN mode | | |||
| +==================+================+=====================+ | +==================+================+=====================+ | |||
| | Not-ECT | 0b010 | 5 | | | Not-ECT | 0b010 | 5 | | |||
| +------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| | ECT(1) | 0b011 | 5 | | | ECT(1) | 0b011 | 5 | | |||
| +------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| | ECT(0) | 0b100 | 5 | | | ECT(0) | 0b100 | 5 | | |||
| +------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| | CE | 0b110 | 6 | | | CE | 0b110 | 6 | | |||
| +------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| Table 3: The encoding of the ACE field in the ACK of | Table 3: The Encoding of the ACE Field in the ACK of | |||
| the SYN-ACK to reflect the SYN-ACK's IP-ECN field | the SYN/ACK to Reflect the SYN/ACK's IP-ECN Field | |||
| When an AccECN Server in SYN-RCVD state receives a pure ACK with | When an AccECN Server in SYN-RCVD state receives a pure ACK with | |||
| SYN=0 and no SACK blocks, instead of treating the ACE field as a | SYN=0 and no SACK blocks, instead of treating the ACE field as a | |||
| counter, it MUST infer the meaning of each possible value of the ACE | counter, it MUST infer the meaning of each possible value of the ACE | |||
| field from Table 4, which also shows the value that an AccECN Server | field from Table 4, which also shows the value that an AccECN Server | |||
| MUST set s.cep to as a result. | MUST set s.cep to as a result. | |||
| Given this encoding of the ACE field on the ACK of a SYN/ACK is | Given this encoding of the ACE field on the ACK of a SYN/ACK is | |||
| exceptional, an AccECN Server using large receive offload (LRO) might | exceptional, an AccECN Server using large receive offload (LRO) might | |||
| prefer to disable LRO until such an ACK has transitioned it out of | prefer to disable LRO until such an ACK has transitioned it out of | |||
| skipping to change at page 28, line 28 ¶ | skipping to change at line 1275 ¶ | |||
| +------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| | 0b101 | Currently Unused {Note | 5 | | | 0b101 | Currently Unused {Note | 5 | | |||
| | | 2} | | | | | 2} | | | |||
| +------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| | 0b110 | CE | 6 | | | 0b110 | CE | 6 | | |||
| +------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| | 0b111 | Currently Unused {Note | 5 | | | 0b111 | Currently Unused {Note | 5 | | |||
| | | 2} | | | | | 2} | | | |||
| +------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| Table 4: Meaning of the ACE field on the ACK of the SYN/ACK | Table 4: Meaning of the ACE Field on the ACK of the SYN/ACK | |||
| {Note 1}: If the Server is in AccECN mode and in SYN-RCVD state, and | Note 1: If the Server is in AccECN mode and in SYN-RCVD state, and | |||
| if it receives a value of zero on a pure ACK with SYN=0 and no SACK | if it receives a value of zero on a pure ACK with SYN=0 and | |||
| blocks, for the rest of the connection the Server MUST NOT set ECT on | no SACK blocks, for the rest of the connection the Server | |||
| outgoing packets and MUST NOT respond to AccECN feedback. | MUST NOT set ECT on outgoing packets and MUST NOT respond to | |||
| Nonetheless, as a Data Receiver it MUST NOT disable AccECN feedback. | AccECN feedback. Nonetheless, as a Data Receiver, it MUST | |||
| | NOT disable AccECN feedback. | |||
| Any of the circumstances below could cause a value of zero but, | Any of the circumstances below could cause a value of zero | |||
| whatever the cause, the actions above would be the appropriate | but, whatever the cause, the actions above would be the | |||
| response: | appropriate response: | |||
| * The TCP Client has somehow entered No ECN feedback mode (most | * The TCP Client has somehow entered No ECN feedback mode | |||
| likely if the Server received a SYN or sent a SYN/ACK with | (most likely if the Server received a SYN or sent a SYN/ | |||
| (AE,CWR,ECE) = (0,0,0) after entering AccECN mode, but possible | ACK with (AE,CWR,ECE) = (0,0,0) after entering AccECN | |||
| even if it didn't); | mode, but possible even if it didn't); | |||
| * The TCP Client genuinely might be in AccECN mode, but its count of | * The TCP Client genuinely might be in AccECN mode, but its | |||
| received CE marks might have caused the ACE field to wrap to zero. | count of received CE marks might have caused the ACE | |||
| This is highly unlikely, but not impossible because the Server | field to wrap to zero. This is highly unlikely, but not | |||
| might have already sent multiple packets while still in SYN-RCVD | impossible because the Server might have already sent | |||
| state, e.g., using TFO (see Section 5.2) and some might have been | multiple packets while still in SYN-RCVD state, e.g., | |||
| CE-marked. Then ACE on the first ACK seen by the Server might be | using TFO (see Section 5.2), and some might have been CE- | |||
| zero, due to previous ACKs experiencing an unfortunate pattern of | marked. Then ACE on the first ACK seen by the Server | |||
| loss or delay. | might be zero, due to previous ACKs experiencing an | |||
| | unfortunate pattern of loss or delay. | |||
| * Some form of non-compliance at the TCP Client or on the path (see | * There is some form of non-compliance at the TCP Client or | |||
| Section 3.2.2.4). | on the path (see Section 3.2.2.4). | |||
| {Note 2}: If the Server is in AccECN mode, these values are Currently | Note 2: If the Server is in AccECN mode, these values are Currently | |||
| Unused but the AccECN Server's behaviour is still defined for forward | Unused but the AccECN Server's behaviour is still defined | |||
| compatibility. Then the designer of a future protocol can know for | for forward compatibility. Then the designer of a future | |||
| certain what AccECN Servers will do with these codepoints. | protocol can know for certain what AccECN Servers will do | |||
| | with these codepoints. | |||
| {Note 3}: In the case where a Server that implements AccECN is also | Note 3: In the case where a Server that implements AccECN is also | |||
| using a stateless handshake (termed a SYN cookie) it will not | using a stateless handshake (termed a SYN cookie), it will | |||
| remember whether it entered AccECN mode. The values 0b000 or 0b001 | not remember whether it entered AccECN mode. The values | |||
| will remind it that it did not enter AccECN mode, because AccECN does | 0b000 or 0b001 will remind it that it did not enter AccECN | |||
| not use them (see Section 5.1 for details). If a Server that uses a | mode, because AccECN does not use them (see Section 5.1 for | |||
| stateless handshake and implements AccECN receives either of these | details). If a Server that uses a stateless handshake and | |||
| two values in the ACK, its action is implementation-dependent and | implements AccECN receives either of these two values in the | |||
| outside the scope of this document. It will certainly not take the | ACK, its action is implementation-dependent and outside the | |||
| action in the third column because, after it receives either of these | scope of this document. It will certainly not take the | |||
| values, it is not in AccECN mode. In example, it will not disable | action in the third column because, after it receives either | |||
| ECN (at least not just because ACE is 0b000) and it will not set | of these values, it is not in AccECN mode. For example, it | |||
| s.cep. | will not disable ECN (at least not just because ACE is | |||
| | 0b000) and it will not set s.cep. | |||
| 3.2.2.2. Encoding and Decoding Feedback in the ACE Field | 3.2.2.2. Encoding and Decoding Feedback in the ACE Field | |||
| Whenever the Data Receiver sends an ACK with SYN=0 (with or without | Whenever the Data Receiver sends an ACK with SYN=0 (with or without | |||
| data), unless the handshake encoding in Section 3.2.2.1 applies, the | data), unless the handshake encoding in Section 3.2.2.1 applies, the | |||
| Data Receiver MUST encode the least significant 3 bits of its r.cep | Data Receiver MUST encode the least significant 3 bits of its r.cep | |||
| counter into the ACE field (see Appendix A.2). | counter into the ACE field (see Appendix A.2). | |||
| Whenever the Data Sender receives an ACK with SYN=0 (with or without | Whenever the Data Sender receives an ACK with SYN=0 (with or without | |||
| data), it first checks whether it has already been superseded | data), it first checks whether it has already been superseded | |||
| (defined in Appendix A.1) by another ACK in which case it ignores the | (defined in Appendix A.1) by another ACK, in which case it ignores the | |||
| ECN feedback. If the ACK has not been superseded, and if the special | ECN feedback. If the ACK has not been superseded, and if the special | |||
| handshake encoding in Section 3.2.2.1 does not apply, the Data Sender | handshake encoding in Section 3.2.2.1 does not apply, the Data Sender | |||
| decodes the ACE field as follows (see Appendix A.2 for examples). | decodes the ACE field as follows (see Appendix A.2 for examples). | |||
| * It takes the least significant 3 bits of its local s.cep counter | * It takes the least significant 3 bits of its local s.cep counter | |||
| and subtracts them from the incoming ACE counter to work out the | and subtracts them from the incoming ACE counter to work out the | |||
| minimum positive increment it could apply to s.cep (assuming the | minimum positive increment it could apply to s.cep (assuming the | |||
| ACE field only wrapped at most once). | ACE field only wrapped once at most). | |||
| * It then follows the safety procedures in Section 3.2.2.5.2 to | * It then follows the safety procedures in Section 3.2.2.5.2 to | |||
| calculate or estimate how many packets the ACK could have | calculate or estimate how many packets the ACK could have | |||
| acknowledged under the prevailing conditions to determine whether | acknowledged under the prevailing conditions to determine whether | |||
| the ACE field might have wrapped more than once. | the ACE field might have wrapped more than once. | |||
| The encode/decode procedures during the three-way handshake are | The encode/decode procedures during the three-way handshake are | |||
| exceptions to the general rules given so far, so they are spelled out | exceptions to the general rules given so far, so they are spelled out | |||
| step by step below for clarity: | step by step below for clarity: | |||
| skipping to change at page 30, line 19 ¶ | skipping to change at line 1368 ¶ | |||
| Reason: It would be redundant for the Server to include CE-marked | Reason: It would be redundant for the Server to include CE-marked | |||
| SYNs in its r.cep counter, because it already reliably delivers | SYNs in its r.cep counter, because it already reliably delivers | |||
| feedback of any CE marking using the encoding in the top block of | feedback of any CE marking using the encoding in the top block of | |||
| Table 2 in the SYN/ACK. This also ensures that, when the Server | Table 2 in the SYN/ACK. This also ensures that, when the Server | |||
| starts using the ACE field, it has not unnecessarily consumed more | starts using the ACE field, it has not unnecessarily consumed more | |||
| than one initial value, given they can be used to negotiate | than one initial value, given they can be used to negotiate | |||
| variants of the AccECN protocol (see Appendix B.3). | variants of the AccECN protocol (see Appendix B.3). | |||
| * If a TCP Client in AccECN mode receives CE feedback in the TCP | * If a TCP Client in AccECN mode receives CE feedback in the TCP | |||
| flags of a SYN/ACK, it MUST NOT increment s.cep (it remains at its | flags of a SYN/ACK, it MUST NOT increment s.cep (it remains at its | |||
| initial value of 5), so that it stays in step with r.cep on the | initial value of 5) so that it stays in step with r.cep on the | |||
| Server. Nonetheless, the TCP Client still triggers the congestion | Server. Nonetheless, the TCP Client still triggers the congestion | |||
| control actions necessary to respond to the CE feedback. | control actions necessary to respond to the CE feedback. | |||
| * If a TCP Client in AccECN mode receives a CE mark in the IP-ECN | * If a TCP Client in AccECN mode receives a CE mark in the IP-ECN | |||
| field of a SYN/ACK, it MUST increment r.cep, but no more than once | field of a SYN/ACK, it MUST increment r.cep, but no more than once | |||
| no matter how many CE-marked SYN/ACKs it receives | no matter how many CE-marked SYN/ACKs it receives (i.e., | |||
| (i.e., incremented from 5 to 6, but no further). | incremented from 5 to 6, but no further). | |||
| Reason: Incrementing r.cep ensures the Client will eventually | Reason: Incrementing r.cep ensures the Client will eventually | |||
| deliver any CE marking to the Server reliably when it starts using | deliver any CE marking to the Server reliably when it starts using | |||
| the ACE field. Even though the Client also feeds back any CE | the ACE field. Even though the Client also feeds back any CE | |||
| marking on the ACK of the SYN/ACK using the encoding in Table 3, | marking on the ACK of the SYN/ACK using the encoding in Table 3, | |||
| this ACK is not delivered reliably, so it can be considered as a | this ACK is not delivered reliably, so it can be considered as a | |||
| timely notification that is redundant but unreliable. The Client | timely notification that is redundant but unreliable. The Client | |||
| does not increment r.cep more than once, because the Server can | does not increment r.cep more than once, because the Server can | |||
| only increment s.cep once (see next bullet). Also, this limits | only increment s.cep once (see next bullet). Also, this limits | |||
| the unnecessarily consumed initial values of the ACE field to two. | the unnecessarily consumed initial values of the ACE field to two. | |||
| * If a TCP Server in AccECN mode and in SYN-RCVD state receives CE | * If a TCP Server in AccECN mode and in SYN-RCVD state receives CE | |||
| feedback in the TCP flags of a pure ACK with no SACK blocks, it | feedback in the TCP flags of a pure ACK with no SACK blocks, it | |||
| MUST increment s.cep (from 5 to 6). The TCP Server then triggers | MUST increment s.cep (from 5 to 6). The TCP Server then triggers | |||
| the congestion control actions necessary to respond to the CE | the congestion control actions necessary to respond to the CE | |||
| feedback. | feedback. | |||
| Reasoning: The TCP Server can only increment s.cep once, because | Reasoning: The TCP Server can only increment s.cep once, because | |||
| the first ACK it receives will cause it to transition out of SYN- | the first ACK it receives will cause it to transition out of SYN- | |||
| RCVD state. The Server's congestion response would be no | RCVD state. The Server's congestion response would be no | |||
| different even if it could receive feedback of more than one CE- | different, even if it could receive feedback of more than one CE- | |||
| marked SYN/ACK. | marked SYN/ACK. | |||
| Once the TCP Server transitions to ESTABLISHED state, it might | Once the TCP Server transitions to ESTABLISHED state, it might | |||
| later receive other pure ACK(s) with the handshake encoding in the | later receive other pure ACK(s) with the handshake encoding in the | |||
| ACE field. A Server MAY implement a test for such a case, but it | ACE field. A Server MAY implement a test for such a case, but it | |||
| is not required. Therefore, once in the ESTABLISHED state, it | is not required. Therefore, once in the ESTABLISHED state, it | |||
| will be sufficient for the Server to consider the ACE field to be | will be sufficient for the Server to consider the ACE field to be | |||
| encoded as the normal ACE counter on all packets with SYN=0. | encoded as the normal ACE counter on all packets with SYN=0. | |||
| Reasoning: Such ACKs will be quite unusual, e.g., a SYN/ACK (or | Reasoning: Such ACKs will be quite unusual, e.g., a SYN/ACK (or | |||
| skipping to change at page 31, line 46 ¶ | skipping to change at line 1444 ¶ | |||
| comparison implies an invalid transition of the IP-ECN field, for | comparison implies an invalid transition of the IP-ECN field, for | |||
| the remainder of the half-connection the Server is advised to send | the remainder of the half-connection the Server is advised to send | |||
| non-ECN-capable packets, but it still ought to respond to any | non-ECN-capable packets, but it still ought to respond to any | |||
| feedback of CE markings (explained below). However, the Server | feedback of CE markings (explained below). However, the Server | |||
| MUST remain in the AccECN feedback mode and it MUST continue to | MUST remain in the AccECN feedback mode and it MUST continue to | |||
| feed back any ECN markings on arriving packets (in its role as | feed back any ECN markings on arriving packets (in its role as | |||
| Data Receiver). | Data Receiver). | |||
| If a Data Sender in AccECN mode starts sending non-ECN-capable | If a Data Sender in AccECN mode starts sending non-ECN-capable | |||
| packets because it has detected mangling, it is still advised to | packets because it has detected mangling, it is still advised to | |||
| respond to CE feedback. Reason: any CE-marking arriving at the Data | respond to CE feedback. Reason: Any CE marking arriving at the Data | |||
| Receiver could be due to something early in the path mangling the | Receiver could be due to something early in the path mangling the | |||
| non-ECN-capable IP-ECN field into an ECN-capable codepoint and then, | non-ECN-capable IP-ECN field into an ECN-capable codepoint and then, | |||
| later in the path, a network bottleneck might be applying CE-markings | later in the path, a network bottleneck might be applying CE markings | |||
| to indicate genuine congestion. This argument applies whether the | to indicate genuine congestion. This argument applies whether the | |||
| handshake packet originally sent by the TCP Client or Server was non- | handshake packet originally sent by the TCP Client or Server was non- | |||
| ECN-capable or ECN-capable because, in either case, an unsafe | ECN-capable or ECN-capable because, in either case, an unsafe | |||
| transition could imply that non-ECN-capable packets later in the | transition could imply that non-ECN-capable packets later in the | |||
| connection might get mangled. | connection might get mangled. | |||
| Once a Data Sender has entered AccECN mode it is advised to check | Once a Data Sender has entered AccECN mode it is advised to check | |||
| whether it is receiving continuous feedback of CE. Specifying | whether it is receiving continuous feedback of CE. Specifying | |||
| exactly how to do this is beyond the scope of the present | exactly how to do this is beyond the scope of the present | |||
| specification, but the sender might check whether the feedback for | specification, but the sender might check whether the feedback for | |||
| every packet it sends for the first three or four rounds indicates | every packet it sends for the first three or four rounds indicates CE | |||
| CE-marking. If continuous CE-marking is detected, for the remainder | marking. If continuous CE marking is detected, for the remainder of | |||
| of the half-connection, the Data Sender ought to send non-ECN-capable | the half-connection, the Data Sender ought to send non-ECN-capable | |||
| packets and it is advised not to respond to any feedback of CE | packets, and it is advised not to respond to any feedback of CE | |||
| markings. The Data Sender might occasionally test whether it can | markings. The Data Sender might occasionally test whether it can | |||
| resume sending ECN-capable packets. | resume sending ECN-capable packets. | |||
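
The specification deliberately leaves the continuous-CE check unspecified, so the following is only one possible sketch of the heuristic mentioned above. It assumes the sender keeps, for each round trip, a hypothetical pair of counts (packets sent, packets fed back as CE); the function name `continuous_ce_detected` and the `min_rounds` parameter are illustrative and not taken from the specification.

```python
def continuous_ce_detected(rounds, min_rounds=3):
    """Return True if every packet in each of the first `min_rounds`
    round trips was fed back as CE marked.

    `rounds` is a list of (packets_sent, packets_fed_back_as_ce) tuples,
    one per round trip, oldest first (a hypothetical bookkeeping format).
    """
    if len(rounds) < min_rounds:
        return False  # not enough evidence yet
    return all(sent > 0 and ce == sent for sent, ce in rounds[:min_rounds])
```

If the check returns True, the text above advises the Data Sender to send non-ECN-capable packets for the rest of the half-connection and not to respond to CE feedback.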
| The above advice on switching to sending non-ECN-capable packets but | The above advice on switching to sending non-ECN-capable packets but | |||
| still responding to CE-markings unless they become continuous is not | still responding to CE markings unless they become continuous is not | |||
| stated normatively (in capitals), because the best strategy might | stated normatively (in capitals), because the best strategy might | |||
| depend on experience of the most likely types of mangling, which can | depend on experience of the most likely types of mangling, which can | |||
| only be known at the time of deployment. The same is true for other | only be known at the time of deployment. The same is true for other | |||
| forms of mangling (or resumption of expected marking) during later | forms of mangling (or resumption of expected marking) during later | |||
| stages of a connection. | stages of a connection. | |||
| As always, once a host has entered AccECN mode, it follows the | As always, once a host has entered AccECN mode, it follows the | |||
| general mandatory requirements (Section 3.1.5) to remain in the same | general mandatory requirements (Section 3.1.5) to remain in the same | |||
| feedback mode and to continue feeding back any ECN markings on | feedback mode and to continue feeding back any ECN markings on | |||
| arriving packets using AccECN feedback. This follows the general | arriving packets using AccECN feedback. This follows the general | |||
| skipping to change at page 32, line 42 ¶ | skipping to change at line 1488 ¶ | |||
| whatever it receives (Section 2.5). | whatever it receives (Section 2.5). | |||
| The ACK of the SYN/ACK is not reliably delivered (nonetheless, the | The ACK of the SYN/ACK is not reliably delivered (nonetheless, the | |||
| count of CE marks is still eventually delivered reliably). If this | count of CE marks is still eventually delivered reliably). If this | |||
| ACK does not arrive, the Server is advised to continue to send ECN- | ACK does not arrive, the Server is advised to continue to send ECN- | |||
| capable packets without having tested for mangling of the IP-ECN | capable packets without having tested for mangling of the IP-ECN | |||
| field on the SYN/ACK. | field on the SYN/ACK. | |||
| All the fall-back behaviours in this section are necessary in case | All the fall-back behaviours in this section are necessary in case | |||
| mangling of the IP-ECN field is asymmetric, which is currently common | mangling of the IP-ECN field is asymmetric, which is currently common | |||
| over some mobile networks [Mandalari18]. Then one end might see no | over some mobile networks [Mandalari18]. In this case, one end might | |||
| unsafe transition and continue sending ECN-capable packets, while the | see no unsafe transition and continue sending ECN-capable packets, | |||
| other end sees an unsafe transition and stops sending ECN-capable | while the other end sees an unsafe transition and stops sending ECN- | |||
| packets. | capable packets. | |||
| Invalid transitions of the IP-ECN field are defined in section 18 of | Invalid transitions of the IP-ECN field are defined in Section 18 of | |||
| the Classic ECN specification [RFC3168] and repeated here for | the Classic ECN specification [RFC3168] and repeated here for | |||
| convenience: | convenience: | |||
| * the not-ECT codepoint changes; | * the Not-ECT codepoint changes; | |||
| * either ECT codepoint transitions to not-ECT; | * either ECT codepoint transitions to Not-ECT; | |||
| * the CE codepoint changes. | * the CE codepoint changes. | |||
| RFC 3168 says that a router that changes ECT to not-ECT is invalid | RFC 3168 says that a router that changes ECT to Not-ECT is invalid | |||
| but safe. However, from a host's viewpoint, this transition is | but safe. However, from a host's viewpoint, this transition is | |||
| unsafe because it could be the result of two transitions at different | unsafe because it could be the result of two transitions at different | |||
| routers on the path: ECT to CE (safe) then CE to not-ECT (unsafe). | routers on the path: ECT to CE (safe) then CE to Not-ECT (unsafe). | |||
| This scenario could well happen where an ECN-enabled home router | This scenario could well happen where an ECN-enabled home router | |||
| congests its upstream mobile broadband bottleneck link, then the | congests its upstream mobile broadband bottleneck link, then the | |||
| ingress to the mobile network clears the ECN field [Mandalari18]. | ingress to the mobile network clears the ECN field [Mandalari18]. | |||
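
As a minimal sketch, the three invalid transitions listed above can be expressed as a small classifier over the 2-bit IP-ECN codepoints. The constant names and the function `is_invalid_transition` are illustrative assumptions; the sketch covers only the three listed cases and treats every other change (for example ECT to CE) as not invalid.

```python
# IP-ECN codepoints (2-bit field values defined in RFC 3168).
NOT_ECT = 0b00
ECT1 = 0b01
ECT0 = 0b10
CE = 0b11

def is_invalid_transition(sent: int, received: int) -> bool:
    """Return True if sent -> received matches one of the three
    invalid transitions listed in the text."""
    if sent == received:
        return False  # no change at all
    if sent == NOT_ECT:
        return True   # the Not-ECT codepoint changes
    if sent == CE:
        return True   # the CE codepoint changes
    # sent is ECT(0) or ECT(1) from here on
    return received == NOT_ECT  # either ECT codepoint transitions to Not-ECT
```

From a host's viewpoint, the text above treats all three cases as unsafe, including ECT to Not-ECT, because the host cannot tell whether a CE mark was applied and then cleared along the path.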
| 3.2.2.4. Testing for Zeroing of the ACE Field | 3.2.2.4. Testing for Zeroing of the ACE Field | |||
| Section 3.2.2 required the Data Receiver to initialize the r.cep | Section 3.2.2 required the Data Receiver to initialize the r.cep | |||
| counter to a non-zero value. Therefore, in either direction the | counter to a non-zero value. Therefore, in either direction the | |||
| initial value of the ACE counter ought to be non-zero. | initial value of the ACE counter ought to be non-zero. | |||
| skipping to change at page 34, line 13 ¶ | skipping to change at line 1554 ¶ | |||
| the other half connection. | the other half connection. | |||
| If reordering occurs, the first feedback packet that arrives will not | If reordering occurs, the first feedback packet that arrives will not | |||
| necessarily be the same as the first packet in sequence order. The | necessarily be the same as the first packet in sequence order. The | |||
| test has been specified loosely like this to simplify implementation, | test has been specified loosely like this to simplify implementation, | |||
| and because it would not have been any more precise to have specified | and because it would not have been any more precise to have specified | |||
| the first packet in sequence order, which would not necessarily be | the first packet in sequence order, which would not necessarily be | |||
| the first ACE counter that the Data Receiver fed back anyway, given | the first ACE counter that the Data Receiver fed back anyway, given | |||
| it might have been a retransmission. | it might have been a retransmission. | |||
| The possibility of re-ordering means that there is a small chance | The possibility of reordering means that there is a small chance that | |||
| that the ACE field on the first packet to arrive is genuinely zero | the ACE field on the first packet to arrive is genuinely zero | |||
| (without middlebox interference). This would cause a host to | (without middlebox interference). This would cause a host to | |||
| unnecessarily disable ECN for a half connection. Therefore, in | unnecessarily disable ECN for a half connection. Therefore, in | |||
| environments where there is no evidence of the ACE field being | environments where there is no evidence of the ACE field being | |||
| zeroed, implementations MAY skip this test. | zeroed, implementations MAY skip this test. | |||
| Note that the Data Sender MUST NOT test whether the arriving counter | Note that the Data Sender MUST NOT test whether the arriving counter | |||
| in the initial ACE field has been initialized to a specific valid | in the initial ACE field has been initialized to a specific valid | |||
| value - the above check solely tests whether the ACE fields have been | value -- the above check solely tests whether the ACE fields have | |||
| incorrectly zeroed. This allows hosts to use different initial | been incorrectly zeroed. This allows hosts to use different initial | |||
| values as an additional signalling channel in future. | values as an additional signalling channel in the future. | |||
| 3.2.2.5. Safety against Ambiguity of the ACE Field | 3.2.2.5. Safety Against Ambiguity of the ACE Field | |||
| If too many CE-marked segments are acknowledged at once, or if a long | If too many CE-marked segments are acknowledged at once, or if a long | |||
| run of ACKs is lost or thinned out, the 3-bit counter in the ACE | run of ACKs is lost or thinned out, the 3-bit counter in the ACE | |||
| field might have cycled between two ACKs arriving at the Data Sender. | field might have cycled between two ACKs arriving at the Data Sender. | |||
| The following safety procedures minimize this ambiguity. | The following safety procedures minimize this ambiguity. | |||
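
To illustrate why a 3-bit counter becomes ambiguous, the sketch below lists every increase in CE-marked packets that is consistent with an observed change in the ACE field. It assumes, for illustration only, that the number of new CE marks cannot exceed the number of newly acknowledged packets; the function and parameter names are hypothetical.

```python
ACE_MODULUS = 8  # the ACE field is a 3-bit counter, so it wraps every 8 marks

def candidate_ce_counts(prev_ace: int, new_ace: int, newly_acked_pkts: int) -> list[int]:
    """List every CE-packet increase consistent with the ACE change,
    given an estimate of how many packets the ACK newly acknowledges."""
    base = (new_ace - prev_ace) % ACE_MODULUS
    # Each additional full cycle of the counter adds 8 to the candidate.
    return list(range(base, newly_acked_pkts + 1, ACE_MODULUS))
```

For example, `candidate_ce_counts(1, 3, 2)` has a single candidate, whereas `candidate_ce_counts(1, 3, 12)` has two (2 or 10 CE marks), which is the ambiguity the safety procedures aim to minimize.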
| 3.2.2.5.1. Packet Receiver Safety Procedures | 3.2.2.5.1. Packet Receiver Safety Procedures | |||
| The following rules define when the receiver of a packet in AccECN | The following rules define when the receiver of a packet in AccECN | |||
| mode emits an ACK: | mode emits an ACK: | |||
| Change-Triggered ACKs: An AccECN Data Receiver SHOULD emit an ACK | Change-Triggered ACKs: An AccECN Data Receiver SHOULD emit an ACK | |||
| whenever a data packet marked CE arrives after the previous packet | whenever a data packet marked CE arrives after the previous packet | |||
| was not CE. | was not CE. | |||
| Even though this rule is stated as a "SHOULD", it is important for | Even though this rule is stated as a "SHOULD", it is important for | |||
| a transition to trigger an ACK if at all possible, The only valid | a transition to trigger an ACK if at all possible. The only valid | |||
| exception to this rule is given below these bullets. | exception to this rule is given below these bullets. | |||
| For the avoidance of doubt, this rule is deliberately worded to | For the avoidance of doubt, this rule is deliberately worded to | |||
| apply solely when _data_ packets arrive, but the comparison with | apply solely when _data_ packets arrive, but the comparison with | |||
| the previous packet includes any packet, not just data packets. | the previous packet includes any packet, not just data packets. | |||
| Increment-Triggered ACKs: An AccECN receiver of a packet MUST emit | Increment-Triggered ACKs: An AccECN receiver of a packet MUST emit | |||
| an ACK if 'n' CE marks have arrived since the previous ACK. If | an ACK if 'n' CE marks have arrived since the previous ACK. If | |||
| there is unacknowledged data at the receiver, 'n' SHOULD be 2. If | there is unacknowledged data at the receiver, 'n' SHOULD be 2. If | |||
| there is no unacknowledged data at the receiver, 'n' SHOULD be 3 | there is no unacknowledged data at the receiver, 'n' SHOULD be 3 | |||
| and MUST be no less than 3. In either case, 'n' MUST be no | and MUST be no less than 3. In either case, 'n' MUST be no | |||
| greater than 7. | greater than 7. | |||
| The above rules for when to send an ACK are designed to be | The above rules for when to send an ACK are designed to be | |||
| complemented by those in Section 3.2.3.3, which concern whether an | complemented by those in Section 3.2.3.3, which concern whether an | |||
| AccECN TCP Option ought to be included on ACKs. | AccECN TCP Option ought to be included on ACKs. | |||
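
The two ACK-emission rules above can be sketched as a per-packet decision function. This is a simplified illustration, not a complete receiver: it uses the recommended values of 'n' (2 with unacknowledged data, 3 without), ignores ordinary delayed-ACK logic and receive offload, and all names (`AckState`, `on_packet`, `has_unacked_data`) are assumptions introduced here.

```python
from dataclasses import dataclass

@dataclass
class AckState:
    prev_packet_ce: bool = False  # was the previous packet (of any type) CE marked?
    ce_since_last_ack: int = 0    # CE marks that arrived since the previous ACK

def on_packet(state: AckState, is_ce: bool, is_data: bool,
              has_unacked_data: bool) -> bool:
    """Return True if the AccECN rules call for an ACK on this arrival."""
    if is_ce:
        state.ce_since_last_ack += 1
    n = 2 if has_unacked_data else 3  # recommended values of 'n'
    # Change-triggered: a CE-marked *data* packet follows a non-CE packet.
    change_triggered = is_data and is_ce and not state.prev_packet_ce
    # Increment-triggered: 'n' CE marks have arrived since the previous ACK.
    increment_triggered = state.ce_since_last_ack >= n
    state.prev_packet_ce = is_ce
    emit = change_triggered or increment_triggered
    if emit:
        state.ce_since_last_ack = 0
    return emit
```

Note that the comparison with the previous packet covers any packet type, while the change-triggered rule itself fires only on data packets, as the text above specifies.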
| If the arrivals of a number of data packets are all processed as one | If the arrivals of a number of data packets are all processed as one | |||
| event, e.g., using large receive offload (LRO) or generic receive | event, e.g., using large receive offload (LRO) or generic receive | |||
| offload (GRO), both the above rules SHOULD be interpreted as | offload (GRO), both the above rules SHOULD be interpreted as | |||
| requiring multiple ACKs to be emitted back-to-back (for each | requiring multiple ACKs to be emitted back to back (for each | |||
| transition and for each sequence of 'n' CE marks). If this is | transition and for each sequence of 'n' CE marks). If this is | |||
| problematic for high performance, either rule can be interpreted as | problematic for high performance, either rule can be interpreted as | |||
| requiring just a single ACK at the end of the whole receive event. | requiring just a single ACK at the end of the whole receive event. | |||
| Even if a number of data packets do not arrive as one event, the | Even if a number of data packets do not arrive as one event, the | |||
| 'Change-Triggered ACKs' rule could sometimes cause the ACK rate to be | 'Change-Triggered ACKs' rule could sometimes cause the ACK rate to be | |||
| problematic for high performance (although high performance protocols | problematic for high performance (although high performance protocols | |||
| such as DCTCP already successfully use change-triggered ACKs). The | such as DCTCP already successfully use change-triggered ACKs). The | |||
| rationale for change-triggered ACKs is so that the Data Sender can | rationale for change-triggered ACKs is so that the Data Sender can | |||
| rely on them to detect queue growth as soon as possible, particularly | rely on them to detect queue growth as soon as possible, particularly | |||
| at the start of a flow. The approach can lead to some additional | at the start of a flow. The approach can lead to some additional | |||
| ACKs but it feeds back the timing and the order in which ECN marks | ACKs but it feeds back the timing and the order in which ECN marks | |||
| are received with minimal additional complexity. If CE marks are | are received with minimal additional complexity. If CE marks are | |||
| infrequent, as is the case for most Active Queue Managment (AQM) | infrequent, as is the case for most Active Queue Management (AQM) | |||
| packet schedulers at the time of writing, or there are multiple marks | packet schedulers at the time of writing, or there are multiple marks | |||
| in a row, the additional load will be low. However, marking patterns | in a row, the additional load will be low. However, marking patterns | |||
| with numerous non-contiguous CE marks could increase the load | with numerous non-contiguous CE marks could increase the load | |||
| significantly. One possible compromise would be for the receiver to | significantly. One possible compromise would be for the receiver to | |||
| heuristically detect whether the sender is in slow-start, then to | heuristically detect whether the sender is in slow-start, then to | |||
| implement change-triggered ACKs while the sender is in slow-start, | implement change-triggered ACKs while the sender is in slow-start, | |||
| and offload otherwise. | and offload otherwise. | |||
| In a scenario where both endpoints support AccECN, if host B has | In a scenario where both endpoints support AccECN, if host B has | |||
| chosen to use ECN-capable pure ACKs (as allowed in [RFC8311] | chosen to use ECN-capable pure ACKs (as allowed in [RFC8311] | |||
| experiments) and enough of these ACKs become CE-marked, then the | experiments) and enough of these ACKs become CE marked, then the | |||
| 'Increment-Triggered ACKs' rule ensures that its peer (host A) gives | 'Increment-Triggered ACKs' rule ensures that its peer (host A) gives | |||
| B sufficient feedback about this congestion on the ACKs from B to A. | B sufficient feedback about this congestion on the ACKs from B to A. | |||
| Normally, for instance in a unidirectional data scenario from host A | Normally, for instance in a unidirectional data scenario from host A | |||
| to B, the Data Sender (A) can piggyback that feedback on its data. | to B, the Data Sender (A) can piggyback that feedback on its data. | |||
| But if A stops sending data, the second part of the 'Increment- | But if A stops sending data, the second part of the 'Increment- | |||
| Triggered ACKs' rule requires A to emit a pure ACK for at least every | Triggered ACKs' rule requires A to emit a pure ACK for at least every | |||
| third CE-marked incoming ACK over the subsequent round trip. | third CE-marked incoming ACK over the subsequent round trip. | |||
| Although TCP normally only ACKs data segments, in this case the | Although TCP normally only ACKs data segments, in this case the | |||
| increment-triggered ACK rule makes it mandatory for A to emit ACKs of | increment-triggered ACK rule makes it mandatory for A to emit ACKs of | |||
| skipping to change at page 36, line 21 ¶ | skipping to change at line 1655 ¶ | |||
| even if A also uses ECN-capable pure ACKs, and even if there is | even if A also uses ECN-capable pure ACKs, and even if there is | |||
| pathological congestion in both directions, any resulting ping-pong | pathological congestion in both directions, any resulting ping-pong | |||
| of ACKs will be rapidly damped. | of ACKs will be rapidly damped. | |||
| In the above bidirectional scenario, incoming ACKs of ACKs could be | In the above bidirectional scenario, incoming ACKs of ACKs could be | |||
| mistaken for duplicate ACKs. But ACKs of ACKs can be distinguished | mistaken for duplicate ACKs. But ACKs of ACKs can be distinguished | |||
| from duplicate ACKs because they do not contain any SACK blocks even | from duplicate ACKs because they do not contain any SACK blocks even | |||
| when SACK has been negotiated. It is outside the scope of this | when SACK has been negotiated. It is outside the scope of this | |||
| AccECN specification to normatively specify this additional test for | AccECN specification to normatively specify this additional test for | |||
| DupACKs, because ACKs of ACKs can only arise if the original ACKs are | DupACKs, because ACKs of ACKs can only arise if the original ACKs are | |||
| ECN-capable. Instead any specification that allows ECN-capable pure | ECN-capable. Instead, any specification that allows ECN-capable pure | |||
| ACKs MUST make sending ACKs of ACKs conditional on measures to | ACKs MUST make sending ACKs of ACKs conditional on measures to | |||
| distinguish ACKs of ACKs from DupACKs (see for example | distinguish ACKs of ACKs from DupACKs (see for example [ECN++]). All | |||
| [I-D.ietf-tcpm-generalized-ecn]). All that is necessary here is to | that is necessary here is to require that these ACKs of ACKs MUST NOT | |||
| require that these ACKs of ACKs MUST NOT contain any SACK blocks | contain any SACK blocks (which would normally not happen anyway). | |||
| (which would normally not happen anyway). | ||||
| 3.2.2.5.2. Data Sender Safety Procedures | 3.2.2.5.2. Data Sender Safety Procedures | |||
| If the Data Sender has not received AccECN TCP Options to give it | If the Data Sender has not received AccECN TCP Options to give it | |||
| more dependable information, and it detects that the ACE field could | more dependable information, and it detects that the ACE field could | |||
| have cycled, it SHOULD deem whether it cycled by taking the safest | have cycled, it SHOULD deem whether it cycled by taking the safest | |||
| likely case under the prevailing conditions. It can detect if the | likely case under the prevailing conditions. It can detect if the | |||
| counter could have cycled by using the jump in the acknowledgement | counter could have cycled by using the jump in the acknowledgement | |||
| number since the last ACK to calculate or estimate how many segments | number since the last ACK to calculate or estimate how many segments | |||
| could have been acknowledged. An example algorithm to implement this | could have been acknowledged. An example algorithm to implement this | |||
| skipping to change at page 37, line 33 ¶ | skipping to change at line 1715 ¶ | |||
| | Kind = 174 | Length = 11 | EE1B field | | | Kind = 174 | Length = 11 | EE1B field | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | EE1B (cont'd) | ECEB field | | | EE1B (cont'd) | ECEB field | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | EE0B field | Order 1 | | EE0B field | Order 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Figure 4: The Two Alternative AccECN TCP Options | Figure 4: The Two Alternative AccECN TCP Options | |||
| Figure 4 shows two option field orders; order 0 and order 1. They | Figure 4 shows two option field orders; order 0 and order 1. They | |||
| both consists of three 24-bit fields. Order 0 provides the 24 least | both consist of three 24-bit fields. Order 0 provides the 24 least | |||
| significant bits of the r.e0b, r.ceb and r.e1b counters, | significant bits of the r.e0b, r.ceb, and r.e1b counters, | |||
| respectively. Order 1 provides the same fields, but in the opposite | respectively. Order 1 provides the same fields, but in the opposite | |||
| order. On each packet, the Data Receiver can use whichever order is | order. On each packet, the Data Receiver can use whichever order is | |||
| more efficient. In either case, the bytes within the fields are in | more efficient. In either case, the bytes within the fields are in | |||
| network byte order (big-endian). | network byte order (big-endian). | |||
| The choice to use three bytes (24 bits) fields in the options was | The choice to use three bytes (24 bits) fields in the options was | |||
| made to strike a balance between TCP option space usage, and the | made to strike a balance between TCP option space usage, and the | |||
| required fidelity of the counters to accomodate typical scenarios | required fidelity of the counters to accommodate typical scenarios | |||
| such as hardware TCP segmentation offloading (TSO), and periods where | such as hardware TCP Segmentation Offloading (TSO), and periods | |||
| no option may be transmitted (e.g., SACK loss recovery). Providing | during which no option may be transmitted (e.g., SACK loss recovery). | |||
| only 2 bytes (16 bits) for these counters could easily roll over | Providing only 2 bytes (16 bits) for these counters could easily roll | |||
| within a single TSO transmission or large/generic receive offload | over within a single TSO transmission or large/generic receive | |||
| (LRO/GRO) event. Having two distinct orderings further allows the | offload (LRO/GRO) event. Having two distinct orderings further | |||
| transmission of the most pertinent changes in an abbreviated option | allows the transmission of the most pertinent changes in an | |||
| (see below). | abbreviated option (see below). | |||
| When a Data Receiver sends an AccECN Option, it MUST set the Kind | When a Data Receiver sends an AccECN Option, it MUST set the Kind | |||
| field to 172 if using Order 0, or to 174 if using Order 1. These two | field to 172 if using Order 0, or to 174 if using Order 1. These two | |||
| new TCP Option Kinds are registered in Section 7 and called | new TCP Option Kinds are registered in Section 7 and are called | |||
| respectively AccECN0 and AccECN1. | AccECN0 and AccECN1, respectively. | |||
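
As a minimal sketch of the full-length option layout in Figure 4, the following function builds the 11-byte option for either field order. The function name and argument names are illustrative; abbreviated (shorter) options are not covered here.

```python
ACCECN0_KIND = 172  # Order 0: EE0B, ECEB, EE1B
ACCECN1_KIND = 174  # Order 1: EE1B, ECEB, EE0B

def encode_accecn_option(r_e0b: int, r_ceb: int, r_e1b: int, order: int = 0) -> bytes:
    """Encode a full-length AccECN Option from the receiver's byte counters."""
    if order == 0:
        kind, counters = ACCECN0_KIND, (r_e0b, r_ceb, r_e1b)
    elif order == 1:
        kind, counters = ACCECN1_KIND, (r_e1b, r_ceb, r_e0b)
    else:
        raise ValueError("order must be 0 or 1")
    # Each field carries the 24 least significant bits, in network byte order.
    body = b"".join((c & 0xFFFFFF).to_bytes(3, "big") for c in counters)
    return bytes([kind, 2 + len(body)]) + body  # Kind, Length = 11, three fields
```

For instance, `encode_accecn_option(1, 2, 3, order=1)` yields Kind 174, Length 11, followed by the fields in the order EE1B, ECEB, EE0B.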
| Note that there is no field to feed back Not-ECT bytes. Nonetheless | Note that there is no field to feed back Not-ECT bytes. Nonetheless, | |||
| an algorithm for the Data Sender to calculate the number of payload | an algorithm for the Data Sender to calculate the number of payload | |||
| bytes received as Not-ECT is given in Appendix A.4. | bytes received as Not-ECT is given in Appendix A.4. | |||
| Whenever a Data Receiver sends an AccECN Option, the rules in | Whenever a Data Receiver sends an AccECN Option, the rules in | |||
| Section 3.2.3.3 allow it to omit unchanged fields from the tail of | Section 3.2.3.3 allow it to omit unchanged fields from the tail of | |||
| the option, to help cope with option space limitations, as long as it | the option, to help cope with option space limitations, as long as it | |||
| preserves the order of the remaining fields and includes any field | preserves the order of the remaining fields and includes any field | |||
| that has changed. The length field MUST indicate which fields are | that has changed. The length field MUST indicate which fields are | |||
| present as follows: | present as follows: | |||
| skipping to change at page 38, line 48 ¶ | skipping to change at line 1776 ¶ | |||
| but there is very limited space for the option. | but there is very limited space for the option. | |||
| All implementations of a Data Sender that read any AccECN Option MUST | All implementations of a Data Sender that read any AccECN Option MUST | |||
| be able to read AccECN Options of any of the above lengths. For | be able to read AccECN Options of any of the above lengths. For | |||
| forward compatibility, if the AccECN Option is of any other length, | forward compatibility, if the AccECN Option is of any other length, | |||
| implementations MUST use those whole 3-octet fields that fit within | implementations MUST use those whole 3-octet fields that fit within | |||
| the length and ignore the remainder of the option, treating it as | the length and ignore the remainder of the option, treating it as | |||
| padding. | padding. | |||
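
The forward-compatibility rule above can be sketched as a parser that keeps only the whole 3-octet fields that fit within the stated length and ignores any remainder. The function name and the dictionary return format are assumptions made for illustration.

```python
FIELD_ORDER = {
    172: ("EE0B", "ECEB", "EE1B"),  # AccECN0 (Order 0)
    174: ("EE1B", "ECEB", "EE0B"),  # AccECN1 (Order 1)
}

def parse_accecn_option(option: bytes) -> dict:
    """Decode the fields present in an AccECN Option of any length."""
    kind, length = option[0], option[1]
    names = FIELD_ORDER[kind]
    payload = option[2:length]  # bytes after the Kind and Length octets
    # Use whole 3-octet fields only; leftover bytes are treated as padding.
    n_fields = min(len(payload) // 3, len(names))
    return {
        names[i]: int.from_bytes(payload[3 * i:3 * i + 3], "big")
        for i in range(n_fields)
    }
```

With this approach an option of an unexpected length, such as 9, decodes the two whole fields it contains and silently skips the trailing byte.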
| AccECN Options have to be optional to implement, because both sender | AccECN Options have to be optional to implement, because both sender | |||
| and receiver have to be able to cope without options anyway - in | and receiver have to be able to cope without options anyway -- in | |||
| cases where they do not traverse a network path. It is RECOMMENDED | cases where they do not traverse a network path. It is RECOMMENDED | |||
| to implement both sending and receiving of AccECN Options. Support | to implement both sending and receiving of AccECN Options. Support | |||
| for AccECN Options is particularly valuable over paths that introduce | for AccECN Options is particularly valuable over paths that introduce | |||
| a high degree of ACK filtering, where the 3-bit ACE counter alone | a high degree of ACK filtering, where the 3-bit ACE counter alone | |||
| might sometimes be insufficient, when it is ambiguous whether it has | might sometimes be insufficient, when it is ambiguous whether it has | |||
| wrapped. If sending of AccECN Options is implemented, the fall-backs | wrapped. If sending of AccECN Options is implemented, the fall-backs | |||
| described in this document will need to be implemented as well | described in this document will need to be implemented as well | |||
| (unless solely for a controlled environment where path traversal is | (unless solely for a controlled environment where path traversal is | |||
| not considered a problem). Even if a developer does not implement | not considered a problem). Even if a developer does not implement | |||
| logic to understand received AccECN Options, it is RECOMMENDED that | logic to understand received AccECN Options, it is RECOMMENDED that | |||
| they implement logic to send AccECN Options. Otherwise, those remote | they implement logic to send AccECN Options. Otherwise, those remote | |||
| peers that implement the receiving logic will still be excluded from | peers that implement the receiving logic will still be excluded from | |||
| congestion feedback that is robust against the increasingly | congestion feedback that is robust against the increasingly | |||
| aggressive ACK filtering in the Internet. The logic to send AccECN | aggressive ACK filtering in the Internet. The logic to send AccECN | |||
| Options is the simpler to implement of the two sides. | Options is the simpler to implement of the two sides. | |||
| If a Data Receiver intends to send an AccECN Option at any time | If a Data Receiver intends to send an AccECN Option at any time | |||
| during the rest of the connection it is RECOMMENDED to also test path | during the rest of the connection, it is RECOMMENDED to also test | |||
| traversal of the AccECN Option as specified in Section 3.2.3.2. | path traversal of the AccECN Option as specified in Section 3.2.3.2. | |||
| 3.2.3.1. Encoding and Decoding Feedback in the AccECN Option Fields | 3.2.3.1. Encoding and Decoding Feedback in the AccECN Option Fields | |||
| Whenever the Data Receiver includes any of the counter fields (ECEB, | Whenever the Data Receiver includes any of the counter fields (ECEB, | |||
| EE0B, EE1B) in an AccECN Option, it MUST encode the 24 least | EE0B, EE1B) in an AccECN Option, it MUST encode the 24 least | |||
| significant bits of the current value of the associated counter into | significant bits of the current value of the associated counter into | |||
| the field (respectively r.ceb, r.e0b, r.e1b). | the field (respectively r.ceb, r.e0b, r.e1b). | |||
| Whenever the Data Sender receives an ACK carrying an AccECN Option, | Whenever the Data Sender receives an ACK carrying an AccECN Option, | |||
| it first checks whether the ACK has already been superseded by | it first checks whether the ACK has already been superseded by | |||
| another ACK, in which case it ignores the ECN feedback. If the ACK | another ACK, in which case it ignores the ECN feedback. If the ACK | |||
| has not been superseded, the Data Sender normally decodes the fields | has not been superseded, the Data Sender normally decodes the fields | |||
| in the AccECN Option as follows. For each field, it takes the least | in the AccECN Option as follows. For each field, it takes the least | |||
| significant 24 bits of its associated local counter (s.ceb, s.e0b or | significant 24 bits of its associated local counter (s.ceb, s.e0b, or | |||
| s.e1b) and subtracts them from the counter in the associated field of | s.e1b) and subtracts them from the counter in the associated field of | |||
| the incoming AccECN Option (respectively ECEB, EE0B, EE1B), to work | the incoming AccECN Option (respectively ECEB, EE0B, EE1B), to work | |||
| out the minimum positive increment it could apply to s.ceb, s.e0b or | out the minimum positive increment it could apply to s.ceb, s.e0b, or | |||
| s.e1b (assuming the field in the option only wrapped at most once). | s.e1b (assuming the field in the option only wrapped once at most). | |||
| Appendix A.1 gives an example algorithm for the Data Receiver to | Appendix A.1 gives an example algorithm for the Data Receiver to | |||
| encode its byte counters into an AccECN Option, and for the Data | encode its byte counters into an AccECN Option, and for the Data | |||
| Sender to decode the AccECN Option fields into its byte counters. | Sender to decode the AccECN Option fields into its byte counters. | |||
| Note that, as specified in Section 3.2, any data on the SYN (SYN=1, | Note that, as specified in Section 3.2, any data on the SYN (SYN=1, | |||
| ACK=0) is not included in any of the byte counters held locally for | ACK=0) is not included in any of the byte counters held locally for | |||
| each ECN marking nor in an AccECN Option on the wire. | each ECN marking nor in an AccECN Option on the wire. | |||
| 3.2.3.2. Path Traversal of the AccECN Option | 3.2.3.2. Path Traversal of the AccECN Option | |||
| 3.2.3.2.1. Testing the AccECN Option during the Handshake | ||||
| 3.2.3.2.1. Testing the AccECN Option During the Handshake | ||||
| The TCP Client MUST NOT include an AccECN TCP Option on the SYN. If | The TCP Client MUST NOT include an AccECN TCP Option on the SYN. If | |||
| there is somehow an AccECN Option on a SYN, it MUST be ignored when | there is somehow an AccECN Option on a SYN, it MUST be ignored when | |||
| forwarded or received. | forwarded or received. | |||
| A TCP Server that confirms its support for AccECN (in response to an | A TCP Server that confirms its support for AccECN (in response to an | |||
| AccECN SYN from the Client as described in Section 3.1) SHOULD | AccECN SYN from the Client as described in Section 3.1) SHOULD | |||
| include an AccECN TCP Option on the SYN/ACK. | include an AccECN TCP Option on the SYN/ACK. | |||
| A TCP Client that has successfully negotiated AccECN SHOULD include | A TCP Client that has successfully negotiated AccECN SHOULD include | |||
| an AccECN Option in the first ACK at the end of the three-way | an AccECN Option in the first ACK at the end of the three-way | |||
| handshake. However, this first ACK is not delivered reliably, so the | handshake. However, this first ACK is not delivered reliably, so the | |||
| TCP Client SHOULD also include an AccECN Option on the first data | TCP Client SHOULD also include an AccECN Option on the first data | |||
| segment it sends (if it ever sends one). | segment it sends (if it ever sends one). | |||
| A host MAY omit an AccECN Option in any of the above three cases due | A host MAY omit an AccECN Option in any of the above three cases | |||
| to insufficient option space or if it has cached knowledge that the | because of insufficient option space or because it has cached | |||
| packet would be likely to be blocked on the path to the other host if | knowledge that the packet would be likely to be blocked on the path | |||
| it included an AccECN Option. | to the other host if it included an AccECN Option. | |||
| 3.2.3.2.2. Testing for Loss of Packets Carrying the AccECN Option | 3.2.3.2.2. Testing for Loss of Packets Carrying the AccECN Option | |||
| If the TCP Server has not received an ACK to acknowledge its SYN/ACK | If the TCP Server has not received an ACK to acknowledge its SYN/ACK | |||
| after the normal TCP timeout or it receives a second SYN with a | after the normal TCP timeout or if it receives a second SYN with a | |||
| request for AccECN support, then either the SYN/ACK might just have | request for AccECN support, then either the SYN/ACK might just have | |||
| been lost, e.g., due to congestion, or a middlebox might be blocking | been lost, e.g., due to congestion, or a middlebox might be blocking | |||
| AccECN Options. To expedite connection setup in deployment scenarios | AccECN Options. To expedite connection setup in deployment scenarios | |||
| where AccECN path traversal might be problematic, the TCP Server | where AccECN path traversal might be problematic, the TCP Server | |||
| SHOULD retransmit the SYN/ACK, but with no AccECN Option. If this | SHOULD retransmit the SYN/ACK, but with no AccECN Option. If this | |||
| retransmission times out, to expedite connection setup, the TCP | retransmission times out, to expedite connection setup, the TCP | |||
| Server SHOULD retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0) and | Server SHOULD retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0) and | |||
| no AccECN Option, but it remains in AccECN feedback mode (per | no AccECN Option, but it remains in AccECN feedback mode (per | |||
| Section 3.1.5). | Section 3.1.5). | |||
| skipping to change at page 41, line 7 | skipping to change at line 1875 | |||
| The above fall-back approach limits any interference by middleboxes | The above fall-back approach limits any interference by middleboxes | |||
| that might drop packets with unknown options, even though it is more | that might drop packets with unknown options, even though it is more | |||
| likely that SYN/ACK loss is due to congestion. The TCP Server MAY | likely that SYN/ACK loss is due to congestion. The TCP Server MAY | |||
| try to send another packet with an AccECN Option at a later point | try to send another packet with an AccECN Option at a later point | |||
| during the connection but it ought to monitor if that packet got lost | during the connection but it ought to monitor if that packet got lost | |||
| as well, in which case it SHOULD disable the sending of AccECN | as well, in which case it SHOULD disable the sending of AccECN | |||
| Options for this half-connection. | Options for this half-connection. | |||
| Implementers MAY use other fall-back strategies if they are found to | Implementers MAY use other fall-back strategies if they are found to | |||
| be more effective (e.g., retrying an AccECN Option for a second time | be more effective (e.g., retrying an AccECN Option for a second time | |||
| before fall-back - most appropriate during high levels of | before fall-back -- most appropriate during high levels of | |||
| congestion). However, other fall-back strategies will need to follow | congestion). However, other fall-back strategies will need to follow | |||
| all the rules in Section 3.1.5, which concern behaviour when SYNs or | all the rules in Section 3.1.5, which concern behaviour when SYNs or | |||
| SYN/ACKs negotiating different types of feedback have been sent | SYN/ACKs negotiating different types of feedback have been sent | |||
| within the same connection. | within the same connection. | |||
| Further, it might make sense to also remove any other new or | Further, it might make sense to also remove any other new or | |||
| experimental fields or options on the SYN/ACK, although the required | experimental fields or options on the SYN/ACK, although the required | |||
| behaviour will depend on the specification of the other option(s) and | behaviour will depend on the specification of the other option(s) and | |||
| on any attempt to co-ordinate fall-back between different modules of | on any attempt to coordinate fall-back between different modules of | |||
| the stack. | the stack. | |||
| If the TCP Client detects that the first data segment it sent with an | If the TCP Client detects that the first data segment it sent with an | |||
| AccECN Option was lost, in deployment scenarios where AccECN path | AccECN Option was lost, in deployment scenarios where AccECN path | |||
| traversal might be problematic, it SHOULD fall back to no AccECN | traversal might be problematic, it SHOULD fall back to no AccECN | |||
| Option on the retransmission. Again, implementers MAY use other | Option on the retransmission. Again, implementers MAY use other | |||
| fall-back strategies such as attempting to retransmit a second | fall-back strategies such as attempting to retransmit a second | |||
| segment with an AccECN Option before fall-back, and/or caching | segment with an AccECN Option before fall-back, and/or caching | |||
| whether AccECN Options are blocked for subsequent connections. | whether AccECN Options are blocked for subsequent connections. | |||
| [RFC9040] further discusses caching of TCP parameters and status | [RFC9040] further discusses caching of TCP parameters and status | |||
| skipping to change at page 41, line 40 | skipping to change at line 1908 | |||
| recognize, a host that is sending little or no data but mostly pure | recognize, a host that is sending little or no data but mostly pure | |||
| ACKs will not inherently detect such losses. Such a host MAY detect | ACKs will not inherently detect such losses. Such a host MAY detect | |||
| loss of ACKs carrying the AccECN Option by detecting whether the | loss of ACKs carrying the AccECN Option by detecting whether the | |||
| acknowledged data always reappears as a retransmission. In such | acknowledged data always reappears as a retransmission. In such | |||
| cases, the host SHOULD disable the sending of the AccECN Option for | cases, the host SHOULD disable the sending of the AccECN Option for | |||
| this half-connection. | this half-connection. | |||
| If a host falls back to not sending AccECN Options, it will continue | If a host falls back to not sending AccECN Options, it will continue | |||
| to process any incoming AccECN Options as normal. | to process any incoming AccECN Options as normal. | |||
| Either host MAY include AccECN Options in a subsequent segment or | Either host MAY include AccECN Options in one or more subsequent | |||
| segments to retest whether AccECN Options can traverse the path. | segments to retest whether AccECN Options can traverse the path. | |||
| Similarly, an AccECN endpoint MAY separately memorize which data | Similarly, an AccECN endpoint MAY separately memorize which data | |||
| packets carried an AccECN Option and disable the sending of AccECN | packets carried an AccECN Option and disable the sending of AccECN | |||
| Options if the loss probability of those packets is significantly | Options if the loss probability of those packets is significantly | |||
| higher than that of all other data packets in the same connection. | higher than that of all other data packets in the same connection. | |||
| 3.2.3.2.3. Testing for Absence of the AccECN Option | 3.2.3.2.3. Testing for Absence of the AccECN Option | |||
| If the TCP Client has successfully negotiated AccECN but does not | If the TCP Client has successfully negotiated AccECN but does not | |||
| skipping to change at page 43, line 5 | skipping to change at line 1962 | |||
| the initial value of the EE0B field or EE1B field in an AccECN Option | the initial value of the EE0B field or EE1B field in an AccECN Option | |||
| (if one exists) ought to be non-zero. If AccECN has been negotiated: | (if one exists) ought to be non-zero. If AccECN has been negotiated: | |||
| * the TCP Server MAY check that the initial value of the EE0B field | * the TCP Server MAY check that the initial value of the EE0B field | |||
| or the EE1B field is non-zero in the first segment that | or the EE1B field is non-zero in the first segment that | |||
| acknowledges sequence space that at least covers the ISN plus 1. | acknowledges sequence space that at least covers the ISN plus 1. | |||
| If it runs a test and either initial value is zero, the Server | If it runs a test and either initial value is zero, the Server | |||
| will switch into a mode that ignores AccECN Options for this | will switch into a mode that ignores AccECN Options for this | |||
| half-connection. | half-connection. | |||
| * the TCP Client MAY check the initial value of the EE0B field or | * the TCP Client MAY check that the initial value of the EE0B field | |||
| the EE1B field is non-zero on the SYN/ACK. If it runs a test and | or the EE1B field is non-zero on the SYN/ACK. If it runs a test | |||
| either initial value is zero, the Client will switch into a mode | and either initial value is zero, the Client will switch into a | |||
| that ignores AccECN Options for this half-connection. | mode that ignores AccECN Options for this half-connection. | |||
| While a host is in the mode that ignores AccECN Options it MUST adopt | While a host is in the mode that ignores AccECN Options, it MUST | |||
| the conservative interpretation of the ACE field discussed in | adopt the conservative interpretation of the ACE field discussed in | |||
| Section 3.2.2.5. | Section 3.2.2.5. | |||
| Note that the Data Sender MUST NOT test whether the arriving byte | Note that the Data Sender MUST NOT test whether the arriving byte | |||
| counters in an initial AccECN Option have been initialized to | counters in an initial AccECN Option have been initialized to | |||
| specific valid values - the above checks solely test whether these | specific valid values -- the above checks solely test whether these | |||
| fields have been incorrectly zeroed. This allows hosts to use | fields have been incorrectly zeroed. This allows hosts to use | |||
| different initial values as an additional signalling channel in | different initial values as an additional signalling channel in the | |||
| future. Also note that the initial value of either field might be | future. Also note that the initial value of either field might be | |||
| greater than its expected initial value, because the counters might | greater than its expected initial value, because the counters might | |||
| already have been incremented. Nonetheless, the initial values of | already have been incremented. Nonetheless, the initial values of | |||
| the counters have been chosen so that they cannot wrap to zero on | the counters have been chosen so that they cannot wrap to zero on | |||
| these initial segments. | these initial segments. | |||
| 3.2.3.2.5. Consistency between AccECN Feedback Fields | 3.2.3.2.5. Consistency Between AccECN Feedback Fields | |||
| When AccECN Options are available they ought to provide more | When AccECN Options are available, they ought to provide more | |||
| unambiguous feedback. However, they supplement but do not replace | unambiguous feedback. However, they supplement but do not replace | |||
| the ACE field. An endpoint using AccECN feedback MUST always | the ACE field. An endpoint using AccECN feedback MUST always | |||
| reconcile the information provided in the ACE field with that in any | reconcile the information provided in the ACE field with that in any | |||
| AccECN Option, so that the state of the ACE-related packet counter | AccECN Option, so that the state of the ACE-related packet counter | |||
| can be relied on if future feedback does not carry an AccECN Option. | can be relied on if future feedback does not carry an AccECN Option. | |||
| If an AccECN Option is present, the s.cep counter might increase more | If an AccECN Option is present, the s.cep counter might increase more | |||
| than expected from the increase of the s.ceb counter (e.g., due to a | than expected from the increase of the s.ceb counter (e.g., due to a | |||
| CE-marked control packet). The sender's response to such a situation | CE-marked control packet). The sender's response to such a situation | |||
| is out of scope, and needs to be dealt with in a specification that | is out of scope, and needs to be dealt with in a specification that | |||
| skipping to change at page 44, line 8 | skipping to change at line 2012 | |||
| the s.cep has not (and by testing ACK coverage it is certain how much | the s.cep has not (and by testing ACK coverage it is certain how much | |||
| the ACE field has wrapped), and if there is no explanation other than | the ACE field has wrapped), and if there is no explanation other than | |||
| an invalid protocol transition due to some form of feedback mangling, | an invalid protocol transition due to some form of feedback mangling, | |||
| the Data Sender MUST disable sending ECN-capable packets for the | the Data Sender MUST disable sending ECN-capable packets for the | |||
| remainder of the half-connection by setting the IP-ECN field in all | remainder of the half-connection by setting the IP-ECN field in all | |||
| subsequent packets to Not-ECT. | subsequent packets to Not-ECT. | |||
| 3.2.3.3. Usage of the AccECN TCP Option | 3.2.3.3. Usage of the AccECN TCP Option | |||
| If a Data Receiver in AccECN mode intends to use AccECN TCP Options | If a Data Receiver in AccECN mode intends to use AccECN TCP Options | |||
| to provide feedback, the rules below determine when it includes an | to provide feedback, the rules below determine when to include an | |||
| AccECN TCP Option, and which fields to include, given other options | AccECN TCP Option, and which fields to include, given other options | |||
| might be competing for limited option space: | might be competing for limited option space: | |||
| Importance of Congestion Control: AccECN is for congestion control, | Importance of Congestion Control: AccECN is for congestion control, | |||
| which implementations SHOULD generally prioritize over other TCP | which implementations SHOULD generally prioritize over other TCP | |||
| options when there is insufficient space for all the options in | options when there is insufficient space for all the options in | |||
| use. | use. | |||
| If SACK has been negotiated [RFC2018], and the smallest | If SACK has been negotiated [RFC2018], and the smallest | |||
| recommended AccECN Option would leave insufficient space for two | recommended AccECN Option would leave insufficient space for two | |||
| skipping to change at page 44, line 38 | skipping to change at line 2042 | |||
| A scheduled ACK means an ACK that the Data Receiver would send by | A scheduled ACK means an ACK that the Data Receiver would send by | |||
| its regular delayed ACK rules. Recall that Section 1.3 defines an | its regular delayed ACK rules. Recall that Section 1.3 defines an | |||
| 'ACK' as either with data payload or without. But the above rule | 'ACK' as either with data payload or without. But the above rule | |||
| is worded so that, in the common case when most of the data is | is worded so that, in the common case when most of the data is | |||
| from a Server to a Client, the Server only includes an AccECN TCP | from a Server to a Client, the Server only includes an AccECN TCP | |||
| Option while it is acknowledging data from the Client. | Option while it is acknowledging data from the Client. | |||
| When available TCP option space is limited on particular packets, the | When available TCP option space is limited on particular packets, the | |||
| recommended scheme will need to include compromises. To guide the | recommended scheme will need to include compromises. To guide the | |||
| implementer the rules below are ranked in order of importance, but | implementer, the rules below are ranked in order of importance, but | |||
| the final decision has to be implementation-dependent, because | the final decision has to be implementation-dependent, because | |||
| tradeoffs will alter as new TCP options are defined and new use-cases | tradeoffs will alter as new TCP options are defined and new use-cases | |||
| arise. | arise. | |||
| Necessary Option Length: When TCP option space is limited, an AccECN | Necessary Option Length: When TCP option space is limited, an AccECN | |||
| TCP option MAY be truncated to omit one or two fields from the end | TCP option MAY be truncated to omit one or two fields from the end | |||
| of the option, as indicated by the permitted variants listed in | of the option, as indicated by the permitted variants listed in | |||
| Table 5, provided that the counter(s) that have changed since the | Table 5, provided that the counter(s) that have changed since the | |||
| previous AccECN TCP option are not omitted. | previous AccECN TCP option are not omitted. | |||
| skipping to change at page 45, line 51 | skipping to change at line 2104 | |||
| available for payload data with counter field(s) that have never | available for payload data with counter field(s) that have never | |||
| changed. | changed. | |||
| As an example of the recommended scheme, if ECT(0) is the only | As an example of the recommended scheme, if ECT(0) is the only | |||
| codepoint that has ever arrived in the IP-ECN field, the Data | codepoint that has ever arrived in the IP-ECN field, the Data | |||
| Receiver will feed back an AccECN0 TCP Option with only the EE0B | Receiver will feed back an AccECN0 TCP Option with only the EE0B | |||
| field on every packet that acknowledges new data. However, as soon | field on every packet that acknowledges new data. However, as soon | |||
| as even one CE-marked packet arrives, on every packet that | as even one CE-marked packet arrives, on every packet that | |||
| acknowledges new data it will start to include an option with two | acknowledges new data it will start to include an option with two | |||
| fields, EE0B and ECEB. As a second example, if the first packet to | fields, EE0B and ECEB. As a second example, if the first packet to | |||
| arrive happens to be CE-marked, the Data Receiver will have to | arrive happens to be CE marked, the Data Receiver will have to | |||
| arbitrarily choose whether to precede the ECEB field with an EE0B | arbitrarily choose whether to precede the ECEB field with an EE0B | |||
| field or an EE1B field. If it chooses, say, EE0B but it turns out | field or an EE1B field. If it chooses, say, EE0B but it turns out | |||
| never to receive ECT(0), it can start sending EE1B and ECEB instead - | never to receive ECT(0), it can start sending EE1B and ECEB instead | |||
| it does not have to include the EE0B field if the r.e0b counter has | -- it does not have to include the EE0B field if the r.e0b counter | |||
| never changed during the connection. | has never changed during the connection. | |||
| With the recommended scheme, if the data sending direction switches | With the recommended scheme, if the data sending direction switches | |||
| during a connection, there can be cases where the AccECN TCP Option | during a connection, there can be cases where the AccECN TCP Option | |||
| that is meant to feed back the counter values at the end of a volley | that is meant to feed back the counter values at the end of a volley | |||
| in one direction never reaches the other peer, due to packet loss. | in one direction never reaches the other peer due to packet loss. | |||
| ACE feedback ought to be sufficient to fill this gap, given accurate | ACE feedback ought to be sufficient to fill this gap, given accurate | |||
| feedback becomes moot after data transmission has paused. | feedback becomes moot after data transmission has paused. | |||
| Appendix A.3 gives an example algorithm to estimate the number of | Appendix A.3 gives an example algorithm to estimate the number of | |||
| marked bytes from the ACE field alone, if AccECN Options are not | marked bytes from the ACE field alone, if AccECN Options are not | |||
| available. | available. | |||
| If a host has determined that segments with AccECN Options always | If a host has determined that segments with AccECN Options always | |||
| seem to be discarded somewhere along the path, it is no longer | seem to be discarded somewhere along the path, it is no longer | |||
| obliged to follow any of the rules in this section. | obliged to follow any of the rules in this section. | |||
| 3.3. AccECN Compliance Requirements for TCP Proxies, Offload Engines | 3.3. AccECN Compliance Requirements for TCP Proxies, Offload Engines, | |||
| and other Middleboxes | and Other Middleboxes | |||
| Given AccECN alters the TCP protocol on the wire, this section | Given AccECN alters the TCP protocol on the wire, this section | |||
| specifies new requirements on certain networking equipment that | specifies new requirements on certain networking equipment that | |||
| forwards TCP and inspects TCP header information. | forwards TCP and inspects TCP header information. | |||
| 3.3.1. Requirements for TCP Proxies | 3.3.1. Requirements for TCP Proxies | |||
| A large class of middleboxes split TCP connections. Such a middlebox | A large class of middleboxes split TCP connections. Such a middlebox | |||
| would be compliant with the AccECN protocol if the TCP implementation | would be compliant with the AccECN protocol if the TCP implementation | |||
| on each side complied with the present AccECN specification and each | on each side complied with the present AccECN specification and each | |||
| side negotiated AccECN independently of the other side. | side negotiated AccECN independently of the other side. | |||
| 3.3.2. Requirements for Transparent Middleboxes and TCP Normalizers | 3.3.2. Requirements for Transparent Middleboxes and TCP Normalizers | |||
| Another large class of middleboxes intervenes to some degree at the | Another large class of middleboxes intervenes to some degree at the | |||
| transport layer, but attempts to be transparent (invisible) to the | transport layer, but attempts to be transparent (invisible) to the | |||
| end-to-end connection. A subset of this class of middleboxes | end-to-end connection. A subset of this class of middleboxes | |||
| attempts to `normalize' the TCP wire protocol by checking that all | attempts to 'normalize' the TCP wire protocol by checking that all | |||
| values in header fields comply with a rather narrow interpretation of | values in header fields comply with a rather narrow interpretation of | |||
| the TCP specifications that is also not always up to date. | the TCP specifications that is not always up to date. | |||
| A middlebox that is not normalizing the TCP protocol and does not | A middlebox that is not normalizing the TCP protocol and does not | |||
| itself act as a back-to-back pair of TCP endpoints (i.e., a middlebox | itself act as a back-to-back pair of TCP endpoints (i.e., a middlebox | |||
| that intends to be transparent or invisible at the transport layer) | that intends to be transparent or invisible at the transport layer) | |||
| ought to forward AccECN TCP Options unaltered, whether or not the | ought to forward AccECN TCP Options unaltered, whether or not the | |||
| length value matches one of those specified in Section 3.2.3, and | length value matches one of those specified in Section 3.2.3, and | |||
| whether or not the initial values of the byte-counter fields match | whether or not the initial values of the byte-counter fields match | |||
| those in Section 3.2.1. This is because blocking apparently invalid | those in Section 3.2.1. This is because blocking apparently invalid | |||
| values prevents the standardized set of values being extended in | values prevents the standardized set of values from being extended in | |||
| future (such outdated normalizers would block updated hosts from | the future (such outdated normalizers would block updated hosts from | |||
| using the extended AccECN standard). | using the extended AccECN standard). | |||
| A TCP normalizer is likely to block or alter an AccECN TCP Option if | A TCP normalizer is likely to block or alter an AccECN TCP Option if | |||
| the length value or the initial values of its byte-counter fields do | the length value or the initial values of its byte-counter fields do | |||
| not match one of those specified in Section 3.2.3 or Section 3.2.1. | not match one of those specified in Sections 3.2.3 or 3.2.1. | |||
| However, to comply with the present AccECN specification, a middlebox | However, to comply with the present AccECN specification, a middlebox | |||
| MUST NOT change the ACE field; or those fields of an AccECN Option | MUST NOT change the ACE field; or those fields of an AccECN Option | |||
| that are currently specified in Section 3.2.3; or any AccECN field | that are currently specified in Section 3.2.3; or any AccECN field | |||
| covered by integrity protection (e.g., [RFC5925]). | covered by integrity protection (e.g., [RFC5925]). | |||
| 3.3.3. Requirements for TCP ACK Filtering | 3.3.3. Requirements for TCP ACK Filtering | |||
| Section 5.2.1 of BCP 69 [RFC3449] gives best current practice on | Section 5.2.1 of [RFC3449] gives best current practice on filtering | |||
| filtering (aka. thinning or coalescing) of pure TCP ACKs. It advises | (aka thinning or coalescing) of pure TCP ACKs. It advises that | |||
| that filtering ACKs carrying ECN feedback ought to preserve the | filtering ACKs carrying ECN feedback ought to preserve the correct | |||
| correct operation of ECN feedback. As the present specification | operation of ECN feedback. As the present specification updates the | |||
| updates the operation of ECN feedback, this section discusses how an | operation of ECN feedback, this section discusses how an ACK filter | |||
| ACK filter might preserve correct operation of AccECN feedback as | might preserve correct operation of AccECN feedback as well. | |||
| well. | ||||
| The problem divides into two parts: determining if an ACK is part of | The problem divides into two parts: determining if an ACK is part of | |||
| a connection that is using AccECN and then preserving the correct | a connection that is using AccECN and then preserving the correct | |||
| operation of AccECN feedback: | operation of AccECN feedback: | |||
| * To determine whether a pure TCP ACK is part of an AccECN | * To determine whether a pure TCP ACK is part of an AccECN | |||
| connection without resorting to connection tracking and per-flow | connection without resorting to connection tracking and per-flow | |||
| state, a useful heuristic would be to check for a non-zero ECN | state, a useful heuristic would be to check for a non-zero ECN | |||
| field at the IP layer (because the ECN++ experiment only allows | field at the IP layer (because the ECN++ experiment only allows | |||
| TCP pure ACKs to be ECN-capable if AccECN has been negotiated | TCP pure ACKs to be ECN-capable if AccECN has been negotiated | |||
| [I-D.ietf-tcpm-generalized-ecn]). This heuristic is simple and | [ECN++]). This heuristic is simple and stateless. However, it | |||
| stateless. However, it might omit some AccECN ACKs, because | might omit some AccECN ACKs, because AccECN can be used without | |||
| AccECN can be used without ECN++ and even if it is, ECN++ does not | ECN++ and even if it is, ECN++ does not have to make pure ACKs | |||
| have to make pure ACKs ECN-capable - only deployment experience | ECN-capable -- only deployment experience will tell. Also, TCP | |||
| will tell. Also, TCP ACKs might be ECN-capable owing to some | ACKs might be ECN-capable owing to some scheme other than AccECN, | |||
| scheme other than AccECN, e.g., [RFC5690] or some future standards | e.g., [RFC5690] or some future standards action. Again, only | |||
| action. Again, only deployment experience will tell. | deployment experience will tell. | |||
| * The main concern with preserving correct AccECN operation involves | * The main concern with preserving correct AccECN operation involves | |||
| leaving enough ACKs for the Data Sender to work out whether the | leaving enough ACKs for the Data Sender to work out whether the | |||
| 3-bit ACE field has wrapped. In the worst case, in feedback about | 3-bit ACE field has wrapped. In the worst case, in feedback about | |||
| a run of received packets that were all ECN-marked, the ACE field | a run of received packets that were all ECN-marked, the ACE field | |||
| will wrap every 8 acknowledged packets. ACE field wrap might be | will wrap every 8 acknowledged packets. ACE field wrap might be | |||
| of less concern if packets also carry AccECN TCP Options. | of less concern if packets also carry AccECN TCP Options. | |||
| However, note that logic to read an AccECN TCP Option is optional | However, note that logic to read an AccECN TCP Option is optional | |||
| to implement (albeit recommended — see Section 3.2.3). So one end | to implement (albeit recommended -- see Section 3.2.3). So one | |||
| writing an AccECN TCP Option into a packet does not necessarily | end writing an AccECN TCP Option into a packet does not | |||
| imply that the other end will read it. | necessarily imply that the other end will read it. | |||
| Note that the present specification of AccECN in TCP does not presume | Note that the present specification of AccECN in TCP does not presume | |||
| to rely on any of the above ACK filtering behaviour in the network, | to rely on any of the above ACK filtering behaviour in the network, | |||
| because it has to be robust against pre-existing network nodes that | because it has to be robust against pre-existing network nodes that | |||
| do not distinguish AccECN ACKs, and robust against ACK loss during | do not distinguish AccECN ACKs, and robust against ACK loss during | |||
| overload more generally. | overload more generally. | |||
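The two parts of the ACK filtering problem described in Section 3.3.3 can be sketched as follows. This is an illustrative sketch only, not part of the specification. The function names, the per-ACK count of newly acknowledged packets (which a real filter would have to estimate, for instance from the advance of the acknowledgement number), and the default gap of 7 are all assumptions made for the example. The gap of 7 follows from the 3-bit ACE field wrapping every 8 acknowledged packets in the worst case.

```python
ECN_MASK = 0x03  # low two bits of the IPv4 TOS / IPv6 Traffic Class byte


def may_be_accecn_ack(tos_byte: int) -> bool:
    """Stateless heuristic: a pure ACK with a non-zero IP-ECN field.

    As the text notes, this can miss AccECN ACKs that are Not-ECT and
    can match ECN-capable ACKs produced by some other scheme.
    """
    return (tos_byte & ECN_MASK) != 0


def thin_acks(newly_acked, max_gap=7):
    """Return the indices of the ACKs to keep.

    newly_acked[i] is a (hypothetical) estimate of how many packets
    ACK i newly acknowledges. An ACK is dropped only if the next ACK
    would then still cover no more than max_gap packets, so that one
    wrap of the 3-bit ACE field cannot be hidden between kept ACKs.
    """
    kept = []
    pending = 0  # packets covered by ACKs dropped since the last kept ACK
    last = len(newly_acked) - 1
    for i, n in enumerate(newly_acked):
        if i == last or pending + n + newly_acked[i + 1] > max_gap:
            kept.append(i)
            pending = 0
        else:
            pending += n
    return kept
```

For example, with ten ACKs that each acknowledge one packet, this sketch keeps the seventh and the last ACK. As the note above says, AccECN itself does not rely on filters behaving this way.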
| 3.3.4. Requirements for TCP Segmentation Offload and Large Receive | 3.3.4. Requirements for TCP Segmentation Offload and Large Receive | |||
| Offload | Offload | |||
| skipping to change at page 48, line 30 ¶ | skipping to change at line 2227 ¶ | |||
| Offloading can happen in the transmit path, usually referred to as | Offloading can happen in the transmit path, usually referred to as | |||
| TCP Segmentation Offload (TSO), and the receive path where it is | TCP Segmentation Offload (TSO), and the receive path where it is | |||
| called Large Receive Offload (LRO). | called Large Receive Offload (LRO). | |||
| In the transmit direction, with AccECN, all segments created from the | In the transmit direction, with AccECN, all segments created from the | |||
| same super-segment should retain the same ACE field, which should | same super-segment should retain the same ACE field, which should | |||
| make TSO straightforward. | make TSO straightforward. | |||
| However, with TSO hardware that supports [RFC3168], the CWR bit is | However, with TSO hardware that supports [RFC3168], the CWR bit is | |||
| usually masked out on the middle and last segment. If applied to an | usually masked out on the middle and last segments. If applied to an | |||
| AccECN segment, this would change the ACE field, and would be | AccECN segment, this would change the ACE field, and would be | |||
| interpreted as having received numerous CE marks in the receive | interpreted as having received numerous CE marks in the receive | |||
| direction. Therefore, currently available TSO hardware with | direction. Therefore, currently available TSO hardware with | |||
| [RFC3168] support may need some minor driver changes, to adjust the | [RFC3168] support may need some minor driver changes, to adjust the | |||
| bitmask for the first, middle and last segment processed with TSO. | bitmask for the first, middle, and last segments processed with TSO. | |||
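The effect of an [RFC3168]-style CWR mask on the ACE field can be illustrated with a small model. This is a sketch under stated assumptions, not a description of any particular hardware: the function names are hypothetical, and the model only captures the behaviour described above (CWR cleared on the middle and last segments). The flag values are the standard positions of ECE, CWR and AE in the TCP header flags, with the ACE field read as AE, CWR, ECE from most to least significant bit.

```python
FLAG_ACK = 0x010
FLAG_ECE = 0x040
FLAG_CWR = 0x080
FLAG_AE = 0x100  # formerly the NS flag


def ace_of(flags: int) -> int:
    """Extract the 3-bit ACE field (AE, CWR, ECE) from the TCP flags."""
    return (flags >> 6) & 0x7


def tso_split_rfc3168(flags: int, n_segments: int) -> list:
    """Model of a TSO engine that masks out CWR on all but the first segment."""
    return [flags if i == 0 else flags & ~FLAG_CWR for i in range(n_segments)]


def tso_split_accecn(flags: int, n_segments: int) -> list:
    """Model of a TSO engine that leaves the whole ACE field untouched."""
    return [flags] * n_segments


# A super-segment whose ACE field is 0b011 (CWR and ECE set, AE clear).
super_flags = FLAG_ACK | FLAG_CWR | FLAG_ECE
masked = [ace_of(f) for f in tso_split_rfc3168(super_flags, 3)]
intact = [ace_of(f) for f in tso_split_accecn(super_flags, 3)]

# Apparent ACE increment seen by the peer between the first and second segment.
apparent_increment = (masked[1] - masked[0]) & 0x7
```

In this model, the masked segments carry ACE values 3, 1, 1 instead of 3, 3, 3, so the peer sees an apparent increment of 6 (modulo 8) even though no new CE marks were counted.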
| Initially, when Classic ECN [RFC3168] and Accurate ECN flows coexist | Initially, when Classic ECN [RFC3168] and Accurate ECN flows coexist | |||
| on the same offloading engine, the host software may need to work | on the same offloading engine, the host software may need to work | |||
| around incompatibilities (e.g., when only globally configurable TSO | around incompatibilities (e.g., when only globally configurable TSO | |||
| TCP Flag bitmasks are available); otherwise, this would cause some | TCP Flag bitmasks are available); otherwise, this would cause some | |||
| issues. | issues. | |||
| One way around this could be to only negotiate for Accurate ECN, but | One way around this could be to only negotiate for Accurate ECN, but | |||
| not offer a fall back to [RFC3168] ECN. Another way could be to | not offer a fall back to [RFC3168] ECN. Another way could be to | |||
| allow TSO only as long as the CWR flag in the TCP header is not set - | allow TSO only as long as the CWR flag in the TCP header is not set | |||
| at the cost of more processing overhead while the ACE field has this | -- at the cost of more processing overhead while the ACE field has | |||
| bit set. | this bit set. | |||
| For LRO in the receive direction, a different issue may get exposed | For LRO in the receive direction, a different issue may get exposed | |||
| with [RFC3168] ECN supporting hardware. | with [RFC3168] ECN supporting hardware. | |||
| The ACE field changes with every received CE marking, so today's | The ACE field changes with every received CE marking, so today's | |||
| receive offloading could lead to many interrupts in high congestion | receive offloading could lead to many interrupts in high congestion | |||
| situations. Although that would be useful (because congestion | situations. Although that would be useful (because congestion | |||
| information is received sooner), it could also significantly increase | information is received sooner), it could also significantly increase | |||
| processor load, particularly in scenarios such as DCTCP or L4S where | processor load, particularly in scenarios such as DCTCP or L4S where | |||
| the marking rate is generally higher. | the marking rate is generally higher. | |||
| Current offload hardware ejects a segment from the coalescing process | Current offload hardware ejects a segment from the coalescing process | |||
| whenever the TCP ECN flags change. In data centres it has been | whenever the TCP ECN flags change. In data centres, it has been | |||
| fortunate for this offload hardware that DCTCP-style feedback changes | fortunate for this offload hardware that DCTCP-style feedback changes | |||
| less often when there are long sequences of CE marks, which is more | less often when there are long sequences of CE marks, which is more | |||
| common with a step marking threshold (but less likely the more short | common with a step marking threshold (but less likely the more short | |||
| flows are in the mix). The ACE counter approach has been designed so | flows are in the mix). The ACE counter approach has been designed so | |||
| that coalescing can continue over arbitrary patterns of marking and | that coalescing can continue over arbitrary patterns of marking and | |||
| only needs to stop when the counter wraps. Nonetheless, until the | only needs to stop when the counter wraps. Nonetheless, until the | |||
| particular offload hardware in use implements this more efficient | particular offload hardware in use implements this more efficient | |||
| approach, it is likely to be more efficient for AccECN connections to | approach, it is likely to be more efficient for AccECN connections to | |||
| implement this counter-style logic using software segmentation | implement this counter-style logic using software segmentation | |||
| offload. | offload. | |||
| skipping to change at page 49, line 35 ¶ | skipping to change at line 2278 ¶ | |||
| ECN encodes a varying signal in the ACK stream, so it is inevitable | ECN encodes a varying signal in the ACK stream, so it is inevitable | |||
| that offload hardware will ultimately need to handle any form of ECN | that offload hardware will ultimately need to handle any form of ECN | |||
| feedback exceptionally. The ACE field has been designed as a counter | feedback exceptionally. The ACE field has been designed as a counter | |||
| so that it is straightforward for offload hardware to pass on the | so that it is straightforward for offload hardware to pass on the | |||
| highest counter, and to push a segment from its cache before the | highest counter, and to push a segment from its cache before the | |||
| counter wraps. The purpose of working towards standardized TCP ECN | counter wraps. The purpose of working towards standardized TCP ECN | |||
| feedback is to reduce the risk for hardware developers, who would | feedback is to reduce the risk for hardware developers, who would | |||
| otherwise have to guess which scheme is likely to become dominant. | otherwise have to guess which scheme is likely to become dominant. | |||
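The counter-style coalescing described above might look like the following sketch, whether done in offload hardware or in software. It is an illustration only: the class and attribute names are hypothetical, and the model tracks nothing but the ACE field of each arriving segment. It passes on the latest counter value of each coalesced run and pushes the cached run out just before the accumulated increment would reach 8, that is, before the 3-bit counter wraps relative to the value last delivered.

```python
class AceCoalescer:
    """Coalesce segments until the ACE counter is about to wrap."""

    def __init__(self, initial_ace: int = 0):
        self.delivered_ace = initial_ace & 0x7  # ACE last handed to the stack
        self.pending_ace = None                 # ACE of the cached run, if any
        self.pending_delta = 0                  # increment accumulated in the run
        self.delivered = []                     # ACE values handed to the stack

    def offer(self, ace: int) -> None:
        """Add one arriving segment's ACE field to the cached run."""
        ace &= 0x7
        current = self.delivered_ace if self.pending_ace is None else self.pending_ace
        step = (ace - current) & 0x7
        if self.pending_ace is not None and self.pending_delta + step > 7:
            # One more segment would hide a wrap, so push the cache first.
            self.flush()
        self.pending_ace = ace
        self.pending_delta += step

    def flush(self) -> None:
        """Hand the cached run to the stack with its latest ACE value."""
        if self.pending_ace is not None:
            self.delivered.append(self.pending_ace)
            self.delivered_ace = self.pending_ace
            self.pending_ace = None
            self.pending_delta = 0


coalescer = AceCoalescer(initial_ace=0)
for ace in [1, 2, 3, 4, 5, 6, 7, 0, 1]:
    coalescer.offer(ace)
coalescer.flush()
```

In this run, nine segments that each advance the counter by one are delivered as two coalesced runs with ACE values 7 and 1, so the stack sees increments of 7 and 2 and no increment is lost to wrapping. The sketch cannot detect a wrap that happens within a single step between two adjacent segments.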
| The above process has been designed to enable a continuing | The above process has been designed to enable a continuing | |||
| incremental deployment path - to more highly dynamic congestion | incremental deployment path -- to more highly dynamic congestion | |||
| control. Once offload hardware supports AccECN, it will be able to | control. Once offload hardware supports AccECN, it will be able to | |||
| coalesce efficiently for any sequence of marks, instead of relying | coalesce efficiently for any sequence of marks, instead of relying on | |||
| for efficiency on the long marking sequences from step marking. In | the long marking sequences from step marking for efficiency. In the | |||
| the next stage, marking can evolve from a step to a ramp function. | next stage, marking can evolve from a step to a ramp function. That | |||
| That in turn will allow host congestion control algorithms to respond | in turn will allow host congestion control algorithms to respond | |||
| faster to dynamics, while being backwards compatible with existing | faster to dynamics, while being backwards compatible with existing | |||
| host algorithms. | host algorithms. | |||
| 4. Updates to RFC 3168 | 4. Updates to RFC 3168 | |||
| This section clarifies which parts of RFC3168 are updated and maps | This section clarifies which parts of RFC 3168 are updated and maps | |||
| them to the sections of the present AccECN specification that update | them to the relevant updated sections of the present AccECN | |||
| them: | specification. | |||
| * The whole of "6.1.1 TCP Initialization" of [RFC3168] is updated by | * The whole of Section 6.1.1 of [RFC3168] is updated by Section 3.1 | |||
| Section 3.1 of the present specification. | of the present specification. | |||
| * In "6.1.2. The TCP Sender" of [RFC3168], all mentions of a | * In Section 6.1.2 of [RFC3168], all mentions of a congestion | |||
| congestion response to an ECN-Echo (ECE) ACK packet are updated by | response to an ECN-Echo (ECE) ACK packet are updated by | |||
| Section 3.2 of the present specification to mean an increment to | Section 3.2 of the present specification to mean an increment to | |||
| the sender's count of CE-marked packets, s.cep. And the | the sender's count of CE-marked packets, s.cep. And the | |||
| requirements to set the CWR flag no longer apply, as specified in | requirements to set the CWR flag no longer apply, as specified in | |||
| Section 3.1.5 of the present specification. Otherwise, the | Section 3.1.5 of the present specification. Otherwise, the | |||
| remaining requirements in "6.1.2. The TCP Sender" still stand. | remaining requirements in Section 6.1.2 of [RFC3168] still stand. | |||
| It will be noted that RFC 8311 already updates, or potentially | It will be noted that [RFC8311] already updates, or potentially | |||
| updates, a number of the requirements in "6.1.2. The TCP Sender". | updates, a number of the requirements in Section 6.1.2 of | |||
| Section 6.1.2 of RFC 3168 extended standard TCP congestion control | [RFC3168]. Section 6.1.2 of RFC 3168 extended standard TCP | |||
| [RFC5681] to cover ECN marking as well as packet drop. Whereas, | congestion control [RFC5681] to cover ECN marking as well as | |||
| RFC 8311 enables experimentation with alternative responses to ECN | packet drop. Whereas, [RFC8311] enables experimentation with | |||
| marking, if specified for instance by an experimental RFC on the | alternative responses to ECN marking, if specified for instance by | |||
| IETF document stream. RFC 8311 also strengthened the statement | an Experimental RFC produced by the IETF Stream. [RFC8311] also | |||
| that "ECT(0) SHOULD be used" to a "MUST" (see [RFC8311] for the | strengthened the statement that "ECT(0) SHOULD be used" to a | |||
| details). | "MUST" (see [RFC8311] for the details). | |||
| * The whole of "6.1.3. The TCP Receiver" of [RFC3168] is updated by | * The whole of Section 6.1.3 of [RFC3168] is updated by Section 3.2 | |||
| Section 3.2 of the present specification, with the exception of | of the present specification, with the exception of the last | |||
| the last paragraph (about congestion response to drop and ECN in | paragraph (about congestion response to drop and ECN in the same | |||
| the same round trip), which still stands. Incidentally, this last | round trip), which still stands. Incidentally, this last | |||
| paragraph is in the wrong section, because it relates to "TCP | paragraph is in the wrong section, because it relates to "TCP | |||
| Sender" behaviour. | Sender" behaviour. | |||
| * The following text within "6.1.5. Retransmitted TCP packets": | * The following text within Section 6.1.5 of [RFC3168]: | |||
| "the TCP data receiver SHOULD ignore the ECN field on arriving | | the TCP data receiver SHOULD ignore the ECN field on arriving | |||
| data packets that are outside of the receiver's current | | data packets that are outside of the receiver's current window. | |||
| window." | ||||
| is updated by more stringent acceptability tests for any packet | is updated by more stringent acceptability tests for any packet | |||
| (not just data packets) in the present specification. | (not just data packets) in the present specification. | |||
| Specifically, in the normative specification of AccECN (Section 3) | Specifically, in the normative specification of AccECN | |||
| only 'Acceptable' packets contribute to the ECN counters at the | (Section 3), only 'Acceptable' packets contribute to the ECN | |||
| AccECN receiver and Section 1.3 defines an Acceptable packet as | counters at the AccECN receiver and Section 1.3 defines an | |||
| one that passes acceptability tests equivalent in strength to | Acceptable packet as one that passes acceptability tests | |||
| those in both [RFC9293] and [RFC5961]. | equivalent in strength to those in both [RFC9293] and [RFC5961]. | |||
| * Sections 5.2, 6.1.1, 6.1.4, 6.1.5 and 6.1.6 of [RFC3168] prohibit | * Sections 5.2, 6.1.1, 6.1.4, 6.1.5, and 6.1.6 of [RFC3168] prohibit | |||
| use of ECN on TCP control packets and retransmissions. The | use of ECN on TCP control packets and retransmissions. The | |||
| present specification does not update that aspect of RFC 3168, but | present specification does not update that aspect of [RFC3168], | |||
| it does say what feedback an AccECN Data Receiver ought to provide | but it does say what feedback an AccECN Data Receiver ought to | |||
| if it receives an ECN-capable control packet or retransmission. | provide if it receives an ECN-capable control packet or | |||
| This ensures AccECN is forward compatible with any future scheme | retransmission. This ensures AccECN is forward compatible with | |||
| that allows ECN on these packets, as provided for in section 4.3 | any future scheme that allows ECN on these packets, as provided | |||
| of [RFC8311] and as proposed in [I-D.ietf-tcpm-generalized-ecn]. | for in Section 4.3 of [RFC8311] and as proposed in [ECN++]. | |||
| 5. Interaction with TCP Variants | 5. Interaction with TCP Variants | |||
| This section is informative, not normative. | This section is informative, not normative. | |||
| 5.1. Compatibility with SYN Cookies | 5.1. Compatibility with SYN Cookies | |||
| A TCP Server can use SYN Cookies (see Appendix A of [RFC4987]) to | A TCP Server can use SYN Cookies (see Appendix A of [RFC4987]) to | |||
| protect itself from SYN flooding attacks. It places minimal commonly | protect itself from SYN flooding attacks. It places minimal commonly | |||
| used connection state in the SYN/ACK, and deliberately does not hold | used connection state in the SYN/ACK, and deliberately does not hold | |||
| any state while waiting for the subsequent ACK (e.g., it closes the | any state while waiting for the subsequent ACK (e.g., it closes the | |||
| thread). Therefore it cannot record the fact that it entered AccECN | thread). Therefore, it cannot record the fact that it entered AccECN | |||
| mode for both half-connections. Indeed, it cannot even remember | mode for both half-connections. Indeed, it cannot even remember | |||
| whether it negotiated the use of Classic ECN [RFC3168]. | whether it negotiated the use of Classic ECN [RFC3168]. | |||
| Nonetheless, such a Server can determine that it negotiated AccECN as | Nonetheless, such a Server can determine that it negotiated AccECN as | |||
| follows. If a TCP Server using SYN Cookies supports AccECN and if it | follows. If a TCP Server using SYN Cookies supports AccECN and if it | |||
| receives a pure ACK that acknowledges an ISN that is a valid SYN | receives a pure ACK that acknowledges an ISN that is a valid SYN | |||
| cookie, and if the ACK contains an ACE field with the value 0b010 to | cookie, and if the ACK contains an ACE field with the value 0b010 to | |||
| 0b111 (decimal 2 to 7), the Server can infer the first two stages of | 0b111 (decimal 2 to 7), the Server can infer the first two stages of | |||
| the handshake: | the handshake: | |||
| * the TCP Client has to have requested AccECN support on the SYN; | * the TCP Client has to have requested AccECN support on the SYN; | |||
| * then, even though the Server kept no state, it has to have | * then, even though the Server kept no state, it has to have | |||
| confirmed that it supported AccECN. | confirmed that it supported AccECN. | |||
| Therefore the Server can switch itself into AccECN mode, and continue | Therefore, the Server can switch itself into AccECN mode, and | |||
| as if it had never forgotten that it switched itself into AccECN mode | continue as if it had never forgotten that it switched itself into | |||
| earlier. | AccECN mode earlier. | |||
| If the pure ACK that acknowledges a SYN cookie contains an ACE field | If the pure ACK that acknowledges a SYN cookie contains an ACE field | |||
| with the value 0b000 or 0b001, these values indicate that the TCP | with the value 0b000 or 0b001, these values indicate that the TCP | |||
| Client did not request support for AccECN and therefore the Server | Client did not request support for AccECN; therefore, the Server does | |||
| does not enter AccECN mode for this connection. Further, 0b001 on | not enter AccECN mode for this connection. Further, 0b001 on the ACK | |||
| the ACK implies that the Server sent an ECN-capable SYN/ACK, which | implies that the Server sent an ECN-capable SYN/ACK, which was marked | |||
| was marked CE in the network, and the non-AccECN TCP Client fed this | CE in the network, and the non-AccECN TCP Client fed this back by | |||
| back by setting ECE on the ACK of the SYN/ACK. | setting ECE on the ACK of the SYN/ACK. | |||
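The inference in Section 5.1 can be summarized in a few lines. This is a minimal sketch, not a definitive implementation: the enum and function names are hypothetical, and it assumes the caller has already checked that the Server supports AccECN and that the pure ACK acknowledges an ISN that is a valid SYN cookie.

```python
from enum import Enum


class CookieAckMode(Enum):
    ACCECN = "accecn"
    NOT_ACCECN = "not-accecn"
    NOT_ACCECN_CE_ON_SYNACK = "not-accecn, SYN/ACK was CE-marked"


def infer_mode_from_cookie_ack(ace: int) -> CookieAckMode:
    """Infer the negotiated mode from the ACE field of the ACK of a SYN cookie.

    Precondition (assumed, not checked here): the ACK is a pure ACK that
    acknowledges a valid SYN cookie, and the Server supports AccECN.
    """
    if not 0 <= ace <= 7:
        raise ValueError("ACE is a 3-bit field")
    if ace >= 0b010:
        # 0b010 to 0b111: the Client requested AccECN and the Server
        # must have confirmed it, so the Server enters AccECN mode.
        return CookieAckMode.ACCECN
    if ace == 0b001:
        # ECE set by a non-AccECN Client: the ECN-capable SYN/ACK was CE-marked.
        return CookieAckMode.NOT_ACCECN_CE_ON_SYNACK
    return CookieAckMode.NOT_ACCECN
```

A Server using this logic would switch itself into AccECN mode only for ACE values 2 to 7 and would otherwise continue without AccECN for the connection.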
| 5.2. Compatibility with TCP Experiments and Common TCP Options | 5.2. Compatibility with TCP Experiments and Common TCP Options | |||
| AccECN is compatible (at least on paper) with the most commonly used | AccECN is compatible (at least on paper) with the most commonly used | |||
| TCP options: MSS, time-stamp, window scaling, SACK and TCP-AO. It is | TCP options: MSS, time-stamp, window scaling, SACK, and TCP-AO. It | |||
| also compatible with Multipath TCP (MPTCP [RFC8684]) and the | is also compatible with Multipath TCP (MPTCP [RFC8684]) and the | |||
| experimental TCP option TCP Fast Open (TFO [RFC7413]). AccECN is | experimental TCP option TCP Fast Open (TFO [RFC7413]). AccECN is | |||
| friendly to all these protocols, because space for TCP options is | friendly to all these protocols, because space for TCP options is | |||
| particularly scarce on the SYN, where AccECN consumes zero additional | particularly scarce on the SYN, where AccECN consumes zero additional | |||
| header space. | header space. | |||
| When option space is under pressure from other options, | When option space is under pressure from other options, | |||
| Section 3.2.3.3 provides guidance on how important it is to send an | Section 3.2.3.3 provides guidance on how important it is to send an | |||
| AccECN Option relative to other options, and which fields are more | AccECN Option relative to other options, and which fields are more | |||
| important to include. | important to include. | |||
| Implementers of TFO need to take careful note of the recommendation | Implementers of TFO need to take careful note of the recommendation | |||
| in Section 3.2.2.1. That section recommends that, if the TCP Client | in Section 3.2.2.1. That section recommends that, if the TCP Client | |||
| has successfully negotiated AccECN, when acknowledging the SYN/ACK, | has successfully negotiated AccECN, when acknowledging the SYN/ACK, | |||
| even if it has data to send, it sends a pure ACK immediately before | even if it has data to send, it sends a pure ACK immediately before | |||
| the data. Then it can reflect the IP-ECN field of the SYN/ACK on | the data. Then it can reflect the IP-ECN field of the SYN/ACK on | |||
| this pure ACK, which allows the Server to detect ECN mangling. Note | this pure ACK, which allows the Server to detect ECN mangling. Note | |||
| that, as specified in Section 3.2, any data on the SYN (SYN=1, ACK=0) | that, as specified in Section 3.2, any data on the SYN (SYN=1, ACK=0) | |||
| is not included in any of the byte counters held locally for each ECN | is not included in any of the byte counters held locally for each ECN | |||
| marking, nor in the AccECN Option on the wire. | marking, nor in the AccECN Option on the wire. | |||
| AccECN feedback is compatible with the ECN++ | AccECN feedback is compatible with the ECN++ experiment [ECN++], | |||
| [I-D.ietf-tcpm-generalized-ecn] experiment, which allows TCP control | which allows TCP control packets and retransmissions to be ECN- | |||
| packets and retransmissions to be ECN-capable ([RFC3168] was updated | capable ([RFC3168] was updated by [RFC8311] to permit such | |||
| by [RFC8311] to permit such experiments). AccECN is likely to | experiments). AccECN is likely to inherently support any experiment | |||
| inherently support any experiment with ECN-capable packets, because | with ECN-capable packets, because it feeds back the contents of the | |||
| it feeds back the contents of the ECN field mechanistically, without | ECN field mechanistically, without judging whether or not a packet | |||
| judging whether a packet ought to use the ECN capability or not | ought to use the ECN capability (Section 2.5). This specification | |||
| (Section 2.5). This specification does not discuss implementing | does not discuss implementing AccECN alongside [RFC5562], which was | |||
| AccECN alongside [RFC5562], which was an earlier experimental | an earlier experimental protocol with narrower scope than ECN++ and a | |||
| protocol with narrower scope than ECN++ and a 5-way handshake. | 5-way handshake. | |||
| 5.3. Compatibility with Feedback Integrity Mechanisms | 5.3. Compatibility with Feedback Integrity Mechanisms | |||
| Three alternative mechanisms are available to assure the integrity of | Three alternative mechanisms are available to assure the integrity of | |||
| ECN and/or loss signals. AccECN is compatible with any of these | ECN and/or loss signals. AccECN is compatible with any of these | |||
| approaches: | approaches: | |||
| * The Data Sender can test the integrity of the receiver's ECN (or | * The Data Sender can test the integrity of the receiver's ECN (or | |||
| loss) feedback by occasionally setting the IP-ECN field to a value | loss) feedback by occasionally setting the IP-ECN field to a value | |||
| normally only set by the network (and/or deliberately leaving a | normally only set by the network (and/or deliberately leaving a | |||
| sequence number gap). Then it can test whether the Data | sequence number gap). Then it can test whether the Data | |||
| Receiver's feedback faithfully reports what it expects (similar to | Receiver's feedback faithfully reports what it expects (similar to | |||
| paragraph 2 of Section 20.2 of [RFC3168]). Unlike the ECN Nonce | paragraph 2 of Section 20.2 of [RFC3168]). Unlike the ECN-nonce | |||
| [RFC3540], this approach does not waste the ECT(1) codepoint in | [RFC3540], this approach does not waste the ECT(1) codepoint in | |||
| the IP header, it does not require standardization and it does not | the IP header, it does not require standardization, and it does | |||
| rely on misbehaving receivers volunteering to reveal feedback | not rely on misbehaving receivers volunteering to reveal feedback | |||
| information that allows them to be detected. However, setting the | information that allows them to be detected. However, setting the | |||
| CE mark by the sender might conceal actual congestion feedback | CE mark by the sender might conceal actual congestion feedback | |||
| from the network and therefore ought to only be done sparingly. | from the network and therefore ought to only be done sparingly. | |||
| * Networks generate congestion signals when they are becoming | * Networks generate congestion signals when they are becoming | |||
| congested, so networks are more likely than Data Senders to be | congested, so networks are more likely than Data Senders to be | |||
| concerned about the integrity of the receiver's feedback of these | concerned about the integrity of the receiver's feedback of these | |||
| signals. A network can enforce a congestion response to its ECN | signals. A network can enforce a congestion response to its ECN | |||
| markings (or packet losses) using congestion exposure (ConEx) | markings (or packet losses) using congestion exposure (ConEx) | |||
| audit [RFC7713]. Whether the receiver or a downstream network is | audit [RFC7713]. Whether the receiver or a downstream network is | |||
| suppressing congestion feedback or the sender is unresponsive to | suppressing congestion feedback, or the sender is unresponsive to | |||
| the feedback, or both, ConEx audit can neutralize any advantage | the feedback, or both, ConEx audit can neutralize any advantage | |||
| that any of these three parties would otherwise gain. | that any of these three parties would otherwise gain. | |||
| ConEx is an experimental change to the Data Sender that would be | ConEx is an experimental change to the Data Sender that would be | |||
| most useful when combined with AccECN. Without AccECN, the ConEx | most useful when combined with AccECN. Without AccECN, the ConEx | |||
| behaviour of a Data Sender would have to be more conservative than | behaviour of a Data Sender would have to be more conservative than | |||
| would be necessary if it had the accurate feedback of AccECN. | would be necessary if it had the accurate feedback of AccECN. | |||
| * The standards track TCP authentication option (TCP-AO [RFC5925]) | * The Standards Track TCP authentication option (TCP-AO [RFC5925]) | |||
| can be used to detect any tampering with AccECN feedback between | can be used to detect any tampering with AccECN feedback between | |||
| the Data Receiver and the Data Sender (whether malicious or | the Data Receiver and the Data Sender (whether malicious or | |||
| accidental). The AccECN fields are immutable end-to-end, so they | accidental). The AccECN fields are immutable end to end, so they | |||
| are amenable to TCP-AO protection, which covers TCP options by | are amenable to TCP-AO protection, which covers TCP options by | |||
| default. However, TCP-AO is often too brittle to use on many end- | default. However, TCP-AO is often too brittle to use on many end- | |||
| to-end paths, where middleboxes can make verification fail in | to-end paths, where middleboxes can make verification fail in | |||
| their attempts to improve performance or security, e.g., Network | their attempts to improve performance or security, e.g., Network | |||
| Address (and Port) Translation (NAT/NAPT), resegmentation or | Address Translation (NAT) and Network Address Port Translation | |||
| shifting the sequence space. | (NAPT), resegmentation, or shifting the sequence space. | |||
| 6. Summary: Protocol Properties | 6. Summary: Protocol Properties | |||
| This section is informative not normative. It describes how well the | This section is informative, not normative. It describes how well | |||
| protocol satisfies the agreed requirements for a more Accurate ECN | the protocol satisfies the agreed requirements for a more Accurate | |||
| feedback protocol [RFC7560]. | ECN feedback protocol [RFC7560]. | |||
| Accuracy: From each ACK, the Data Sender can infer the number of new | Accuracy: From each ACK, the Data Sender can infer the number of new | |||
| CE marked segments since the previous ACK. This provides better | CE-marked segments since the previous ACK. This provides better | |||
| accuracy on CE feedback than Classic ECN. In addition if an | accuracy on CE feedback than Classic ECN. In addition, if an | |||
| AccECN Option is present (not blocked by the network path) the | AccECN Option is present (not blocked by the network path), the | |||
| number of bytes marked with CE, ECT(1) and ECT(0) are provided. | number of bytes marked with CE, ECT(1), and ECT(0) are provided. | |||
| Overhead: The AccECN scheme is divided into two parts. The | Overhead: The AccECN scheme is divided into two parts. The | |||
| essential feedback part reuses the 3 flags already assigned to ECN | essential feedback part reuses the three flags already assigned to | |||
| in the TCP header. The supplementary feedback part adds an | ECN in the TCP header. The supplementary feedback part adds an | |||
| additional TCP option consuming up to 11 bytes. However, no TCP | additional TCP option consuming up to 11 bytes. However, no TCP | |||
| option space is consumed in the SYN. | option space is consumed in the SYN. | |||
| Ordering: The order in which marks arrive at the Data Receiver is | Ordering: The order in which marks arrive at the Data Receiver is | |||
| preserved in AccECN feedback, because the Data Receiver is | preserved in AccECN feedback, because the Data Receiver is | |||
| expected to send an ACK immediately whenever a different mark | expected to send an ACK immediately whenever a different mark | |||
| arrives. | arrives. | |||
| Timeliness: While the same ECN markings are arriving continually at | Timeliness: While the same ECN markings are arriving continually at | |||
| the Data Receiver, it can defer ACKs as TCP does normally, but it | the Data Receiver, it can defer ACKs as TCP does normally, but it | |||
| skipping to change at page 54, line 18 | skipping to change at line 2500 | |||
| Timeliness vs Overhead: Change-Triggered ACKs are intended to enable | Timeliness vs Overhead: Change-Triggered ACKs are intended to enable | |||
| latency-sensitive uses of ECN feedback by capturing the timing of | latency-sensitive uses of ECN feedback by capturing the timing of | |||
| transitions but not wasting resources while the state of the | transitions but not wasting resources while the state of the | |||
| signalling system is stable. Within the constraints of the | signalling system is stable. Within the constraints of the | |||
| change-triggered ACK rules, the receiver can control how | change-triggered ACK rules, the receiver can control how | |||
| frequently it sends AccECN TCP Options and therefore to some | frequently it sends AccECN TCP Options and therefore to some | |||
| extent it can control the overhead induced by AccECN. | extent it can control the overhead induced by AccECN. | |||
| Resilience: All information is provided based on counters. | Resilience: All information is provided based on counters. | |||
| Therefore if ACKs are lost, the counters on the first ACK | Therefore if ACKs are lost, the counters on the first ACK | |||
| following the losses allows the Data Sender to immediately recover | following the losses allow the Data Sender to immediately recover | |||
| the number of the ECN markings that it missed. And if data or | the number of the ECN markings that it missed. If data or ACKs | |||
| ACKs are reordered, stale congestion information can be identified | are reordered, stale congestion information can be identified and | |||
| and ignored. | ignored. | |||
| Resilience against Bias: Because feedback is based on repetition of | Resilience against Bias: Because feedback is based on repetition of | |||
| counters, random losses do not remove any information; they only | counters, random losses do not remove any information; they only | |||
| delay it. Therefore, even though some ACKs are change-triggered, | delay it. Therefore, even though some ACKs are change-triggered, | |||
| random losses will not alter the proportions of the different ECN | random losses will not alter the proportions of the different ECN | |||
| markings in the feedback. | markings in the feedback. | |||
| Resilience vs Overhead: If space is limited in some segments | Resilience vs Overhead: If space is limited in some segments (e.g., | |||
| (e.g., because more options are needed on some segments, such as | because more options are needed on some segments, such as the SACK | |||
| the SACK option after loss), the Data Receiver can send AccECN | option after loss), the Data Receiver can send AccECN Options less | |||
| Options less frequently or truncate fields that have not changed, | frequently or truncate fields that have not changed, usually down | |||
| usually down to as little as 5 bytes. | to as little as 5 bytes. | |||
| Resilience vs Timeliness and Ordering: Ordering information and the | Resilience vs Timeliness and Ordering: Ordering information and the | |||
| timing of transitions cannot be communicated in three cases: i) | timing of transitions cannot be communicated in three cases: i) | |||
| during ACK loss; ii) if something on the path strips AccECN | during ACK loss; ii) if something on the path strips AccECN | |||
| Options; or iii) if the Data Receiver is unable to support Change- | Options; or iii) if the Data Receiver is unable to support Change- | |||
| Triggered ACKs. Following ACK reordering, the Data Sender can | Triggered ACKs. Following ACK reordering, the Data Sender can | |||
| reconstruct the order in which feedback was sent, but not until | reconstruct the order in which feedback was sent, but not until | |||
| all the missing feedback has arrived. | all the missing feedback has arrived. | |||
| Complexity: An AccECN implementation solely involves simple counter | Complexity: An AccECN implementation solely involves simple counter | |||
| increments, some modulo arithmetic to communicate the least | increments, some modulo arithmetic to communicate the least | |||
| significant bits and allow for wrap, and some heuristics for | significant bits and allow for wrap, and some heuristics for | |||
| safety against fields cycling due to prolonged periods of ACK | safety against fields cycling due to prolonged periods of ACK | |||
| loss. Each host needs to maintain eight additional counters. The | loss. Each host needs to maintain eight additional counters. The | |||
| hosts have to apply some additional tests to detect tampering by | hosts have to apply some additional tests to detect tampering by | |||
| middleboxes, but in general the protocol is simple to understand, | middleboxes, but in general the protocol is simple to understand | |||
| simple to implement and requires few cycles per packet to execute. | and implement and requires few cycles per packet to execute. | |||
| Integrity: AccECN is compatible with at least three approaches that | Integrity: AccECN is compatible with at least three approaches that | |||
| can assure the integrity of ECN feedback. If AccECN Options are | can assure the integrity of ECN feedback. If AccECN Options are | |||
| stripped the resolution of the feedback is degraded, but the | stripped, the resolution of the feedback is degraded, but the | |||
| integrity of this degraded feedback can still be assured. | integrity of this degraded feedback can still be assured. | |||
| Backward Compatibility: If only one endpoint supports the AccECN | Backward Compatibility: If only one endpoint supports the AccECN | |||
| scheme, it will fall-back to the most advanced ECN feedback scheme | scheme, it will fall back to the most advanced ECN feedback scheme | |||
| supported by the other end. | supported by the other end. | |||
| If AccECN Options are stripped by a middlebox, AccECN still | If AccECN Options are stripped by a middlebox, AccECN still | |||
| provides basic congestion feedback in the ACE field. Further, | provides basic congestion feedback in the ACE field. Further, | |||
| AccECN can be used to detect mangling of the IP ECN field; | AccECN can be used to detect mangling of the IP-ECN field; | |||
| mangling of the TCP ECN flags; blocking of ECT-marked segments; | mangling of the TCP ECN flags; blocking of ECT-marked segments; | |||
| and blocking of segments carrying an AccECN Option. It can detect | and blocking of segments carrying an AccECN Option. It can detect | |||
| these conditions during TCP's three-way handshake so that it can | these conditions during TCP's three-way handshake so that it can | |||
| fall back to operation without ECN and/or operation without AccECN | fall back to operation without ECN and/or operation without AccECN | |||
| Options. | Options. | |||
| Forward Compatibility: The behaviour of endpoints and middleboxes is | Forward Compatibility: The behaviour of endpoints and middleboxes is | |||
| carefully defined for all reserved or currently unused codepoints | carefully defined for all reserved or currently unused codepoints | |||
| in the scheme. Then, the designers of security devices can | in the scheme. Then, the designers of security devices can | |||
| understand which currently unused values might appear in future. | understand which currently unused values might appear in the | |||
| So, even if they choose to treat such values as anomalous while | future. So, even if they choose to treat such values as anomalous | |||
| they are not widely used, any blocking will at least be under | while they are not widely used, any blocking will at least be | |||
| policy control not hard-coded. Then, if previously unused values | under policy control and not hard-coded. Then, if previously | |||
| start to appear on the Internet (or in standards), such policies | unused values start to appear on the Internet (or in standards), | |||
| could be quickly reversed. | such policies could be quickly reversed. | |||
| 7. IANA Considerations | 7. IANA Considerations | |||
| This document reassigns the TCP header flag at bit offset 7 to the | This document reassigns the TCP header flag at bit offset 7 to the | |||
| AccECN protocol. This bit was previously called the Nonce Sum (NS) | AccECN protocol. This bit was previously called the Nonce Sum (NS) | |||
| flag [RFC3540], but RFC 3540 has been reclassified as historic | flag [RFC3540], but RFC 3540 has been reclassified as Historic | |||
| [RFC8311]. The flag will now be defined as the following in the "TCP | [RFC8311]. The flag is now defined as the following in the "TCP | |||
| Header Flags" registry in the "Transmission Control Protocol (TCP) | Header Flags" registry in the "Transmission Control Protocol (TCP) | |||
| Parameters" registry group: | Parameters" registry group: | |||
| +=====+==============+===========+==============================+ | +=====+==============+===========+==============================+ | |||
| | Bit | Name | Reference | Assignment Notes | | | Bit | Name | Reference | Assignment Notes | | |||
| +=====+==============+===========+==============================+ | +=====+==============+===========+==============================+ | |||
| | 7 | AE (Accurate | RFC XXXX | Previously used as NS (Nonce | | | 7 | AE (Accurate | RFC 9768 | Previously used as NS (Nonce | | |||
| | | ECN) | | Sum) by [RFC3540], which is | | | | ECN) | | Sum) by [RFC3540], which is | | |||
| | | | | now historic [RFC8311] | | | | | | now Historic [RFC8311] | | |||
| +-----+--------------+-----------+------------------------------+ | +-----+--------------+-----------+------------------------------+ | |||
| Table 6: TCP header flag reassignment | Table 6: TCP Header Flag Reassignment | |||
| [TO BE REMOVED: IANA is requested to update the existing entry in the | ||||
| TCP Header Flags registry (https://www.iana.org/assignments/tcp- | ||||
| parameters/tcp-parameters.xhtml#tcp-header-flags) for Bit 7 to "AE | ||||
| (Accurate ECN)" and to change the reference to this RFC-to-be instead | ||||
| of RFC8311. Also IANA is requested to change the assignment note to | ||||
| "Previously used as NS (Nonce Sum) by [RFC3540], which is now | ||||
| historic [RFC8311]."] | ||||
| This document also defines two new TCP options for AccECN, assigned | This document also defines two new TCP options for AccECN from the | |||
| values of 172 and 174 (decimal) from the TCP option space. These | TCP option space. These values are defined as the following in the | |||
| values are defined as the following in the "TCP Option Kind Numbers" | "TCP Option Kind Numbers" registry in the "Transmission Control | |||
| registry in the "Transmission Control Protocol (TCP) Parameters" | Protocol (TCP) Parameters" registry group: | |||
| registry group: | ||||
| +======+========+================================+===========+ | +======+========+================================+===========+ | |||
| | Kind | Length | Meaning | Reference | | | Kind | Length | Meaning | Reference | | |||
| +======+========+================================+===========+ | +======+========+================================+===========+ | |||
| | 172 | N | Accurate ECN Order 0 (AccECN0) | RFC XXXX | | | 172 | N | Accurate ECN Order 0 (AccECN0) | RFC 9768 | | |||
| +------+--------+--------------------------------+-----------+ | +------+--------+--------------------------------+-----------+ | |||
| | 174 | N | Accurate ECN Order 1 (AccECN1) | RFC XXXX | | | 174 | N | Accurate ECN Order 1 (AccECN1) | RFC 9768 | | |||
| +------+--------+--------------------------------+-----------+ | +------+--------+--------------------------------+-----------+ | |||
| Table 7: New TCP Option assignments | Table 7: New TCP Option assignments | |||
| [TO BE REMOVED: These registrations have taken place using the early | ||||
| registration procedure, which may be temporary if this draft does not | ||||
| proceed, at the following location: http://www.iana.org/assignments/ | ||||
| tcp-parameters/tcp-parameters.xhtml#tcp-parameters-1 ] | ||||
| Early experimental implementations of the two AccECN Options used | Early experimental implementations of the two AccECN Options used | |||
| experimental option 254 per [RFC6994] with the 16-bit magic numbers | experimental option 254 per [RFC6994] with the 16-bit magic numbers | |||
| 0xACC0 and 0xACC1 respectively for Order 0 and 1, as allocated in the | 0xACC0 and 0xACC1, respectively, for Order 0 and 1, as allocated in | |||
| IANA "TCP Experimental Option Experiment Identifiers (TCP ExIDs)" | the IANA "TCP/UDP Experimental Option Experiment Identifiers (TCP/UDP | |||
| registry. Even earlier experimental implementations used the single | ExIDs)" registry. Even earlier experimental implementations used the | |||
| magic number 0xACCE (16 bits). Uses of these experimental options | single magic number 0xACCE (16 bits). Uses of these experimental | |||
| SHOULD migrate to use the new option kinds (172 & 174). | options SHOULD migrate to use the new option kinds (172 and 174). | |||
| [TO BE REMOVED: IANA is requested to replace the references for all | ||||
| three of the above experimental options (0xACC0, 0xACC1 and 0xACCE) | ||||
| with a reference to the present RFC XXXX.] | ||||
| [TO BE REMOVED: If the early registrations, which may be temporary, | ||||
| do not proceed, the three references to them in the TCP ExIDs | ||||
| registry at the following location will also need to be edited out: | ||||
| https://www.iana.org/assignments/tcp-parameters/tcp- | ||||
| parameters.xhtml#tcp-exids ] | ||||
| 8. Security and Privacy Considerations | 8. Security and Privacy Considerations | |||
| If ever the supplementary feedback part of AccECN based on one of the | If ever the supplementary feedback part of AccECN that is based on | |||
| new AccECN TCP Options is unusable (due for example to middlebox | one of the new AccECN TCP Options is unusable (due for example to | |||
| interference) the essential feedback part of AccECN's congestion | middlebox interference), the essential feedback part of AccECN's | |||
| feedback offers only limited resilience to long runs of ACK loss (see | congestion feedback offers only limited resilience to long runs of | |||
| Section 3.2.2.5). These problems are unlikely to be due to malicious | ACK loss (see Section 3.2.2.5). These problems are unlikely to be | |||
| intervention (because if an attacker could strip a TCP option or | due to malicious intervention (because if an attacker could strip a | |||
| discard a long run of ACKs it could wreak other arbitrary havoc). | TCP option or discard a long run of ACKs, it could wreak other | |||
| However, it would be of concern if AccECN's resilience could be | arbitrary havoc). However, it would be of concern if AccECN's | |||
| indirectly compromised during a flooding attack. AccECN is still | resilience could be indirectly compromised during a flooding attack. | |||
| considered safe though, because if AccECN Options are not present, | AccECN is still considered safe though, because if AccECN Options are | |||
| the AccECN Data Sender is then required to switch to more | not present, the AccECN Data Sender is then required to switch to | |||
| conservative assumptions about wrap of congestion indication counters | more conservative assumptions about wrap of congestion indication | |||
| (see Section 3.2.2.5 and Appendix A.2). | counters (see Section 3.2.2.5 and Appendix A.2). | |||
| Section 5.1 describes how a TCP Server can negotiate AccECN and use | Section 5.1 describes how a TCP Server can negotiate AccECN and use | |||
| the SYN cookie method for mitigating SYN flooding attacks. | the SYN cookie method for mitigating SYN flooding attacks. | |||
| There is concern that ECN feedback could be altered or suppressed, | There is concern that ECN feedback could be altered or suppressed, | |||
| particularly because a misbehaving Data Receiver could increase its | particularly because a misbehaving Data Receiver could increase its | |||
| own throughput at the expense of others. AccECN is compatible with | own throughput at the expense of others. AccECN is compatible with | |||
| the three schemes known to assure the integrity of ECN feedback (see | the three schemes known to assure the integrity of ECN feedback (see | |||
| Section 5.3 for details). If AccECN Options are stripped by an | Section 5.3 for details). If AccECN Options are stripped by an | |||
| incorrectly implemented middlebox, the resolution of the feedback | incorrectly implemented middlebox, the resolution of the feedback | |||
| will be degraded, but the integrity of this degraded information can | will be degraded, but the integrity of this degraded information can | |||
| still be assured. Assuring that Data Senders respond appropriately | still be assured. Assuring that Data Senders respond appropriately | |||
| to ECN feedback is possible, but the scope of the present document is | to ECN feedback is possible, but the scope of the present document is | |||
| confined to the feedback protocol, and excludes the response to this | confined to the feedback protocol and excludes the response to this | |||
| feedback. | feedback. | |||
| In Section 3.2.3 a Data Sender is allowed to ignore an unrecognized | In Section 3.2.3, a Data Sender is allowed to ignore an unrecognized | |||
| TCP AccECN Option length and read as many whole 3-octet fields from | TCP AccECN Option length and read as many whole 3-octet fields from | |||
| it as possible up to a maximum of 3, treating the remainder as | it as possible up to a maximum of 3, treating the remainder as | |||
| padding. This opens up a potential covert channel of up to 29 B, i.e., | padding. This opens up a potential covert channel of up to 29 B, i.e., | |||
| (40 - (2+3*3)) B. However, it is really an overt channel (not hidden) and | (40 - (2+3*3)) B. However, it is really an overt channel (not hidden) and | |||
| it is no different to the use of unknown TCP options with unknown | it is no different than the use of unknown TCP options with unknown | |||
| option lengths in general. Therefore, where this is of concern, it | option lengths in general. Therefore, where this is of concern, it | |||
| can already be adequately mitigated by regular TCP normalizer | can already be adequately mitigated by regular TCP normalizer | |||
| technology (see Section 3.3.2). | technology (see Section 3.3.2). | |||
| The AccECN protocol is not believed to introduce any new privacy | The AccECN protocol is not believed to introduce any new privacy | |||
| concerns, because it merely counts and feeds back signals at the | concerns, because it merely counts and feeds back signals at the | |||
| transport layer that had already been visible at the IP layer. A | transport layer that had already been visible at the IP layer. A | |||
| covert channel can be used to compromise privacy. However, as | covert channel can be used to compromise privacy. However, as | |||
| explained above, undefined TCP options in general open up such | explained above, undefined TCP options in general open up such | |||
| channels and common techniques are available to close them off. | channels, and common techniques are available to close them off. | |||
| There is a potential concern that a Data Receiver could deliberately | There is a potential concern that a Data Receiver could deliberately | |||
| omit AccECN Options, pretending that they had been stripped by a | omit AccECN Options, pretending that they had been stripped by a | |||
| middlebox. No known way can yet be contrived for a receiver to take | middlebox. No known way can yet be contrived for a receiver to take | |||
| advantage of this behaviour, which seems to always degrade its own | advantage of this behaviour, which seems to always degrade its own | |||
| performance. However, the concern is mentioned here for | performance. However, the concern is mentioned here for | |||
| completeness. | completeness. | |||
| A generic privacy concern of any new protocol is that for a while it | A generic privacy concern of any new protocol is that for a while it | |||
| will be used by a small population of hosts, and thus show up more | will be used by a small population of hosts, and thus show up more | |||
| easily. However, it is expected that this option will become | easily. However, it is expected that AccECN will become available in | |||
| available in operating systems over time, and eventually turned on by | operating systems over time and that it will eventually be turned on | |||
| default in them. Thus a individual identification of a particular | by default. Thus, an individual identification of a particular user | |||
| user is less of a concern than the fingerprinting of specific | is less of a concern than the fingerprinting of specific versions of | |||
| versions of operating systems. However, the latter can be done using | operating systems. However, the latter can be done using different | |||
| different means independent of Accurate ECN. | means independent of Accurate ECN. | |||
| As Accurate ECN exposes more bits in the TCP header which could be | As Accurate ECN exposes more bits in the TCP header that could be | |||
| tampered with without interfering with the transport excessively, it | tampered with without interfering with the transport excessively, it | |||
| may allow an additional way to identify specific data streams across | may allow an additional way to identify specific data streams across | |||
| a virtual private network (VPN) to an attacker which has access to | a virtual private network (VPN) to an attacker that has access to the | |||
| the datastream before and after the VPN tunnel endpoints. This may | datastream before and after the VPN tunnel endpoints. This may be | |||
| be achieved by injecting or modifying the ACE field in specific | achieved by injecting or modifying the ACE field in specific patterns | |||
| patters that can be recognized. | that can be recognized. | |||
| Overall, Accurate ECN does not change the risk profile on privacy to | Overall, Accurate ECN does not change the risk profile on privacy to | |||
| a user dramatically beyond what is already possible using Classic | a user dramatically beyond what is already possible using Classic | |||
| ECN. However, in order to prevent such attacks and means of easier | ECN. However, in order to prevent such attacks and means of easier | |||
| identification of flows, it is adviseable for privacy conscious users | identification of flows, it is advisable for privacy-conscious users | |||
| behind VPNs to not enable Accurate ECN, or Classic ECN for that | behind VPNs to not enable Accurate ECN, or Classic ECN for that | |||
| matter. | matter. | |||
| 9. References | 9. References | |||
| 9.1. Normative References | 9.1. Normative References | |||
| [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP | [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP | |||
| Selective Acknowledgment Options", RFC 2018, | Selective Acknowledgment Options", RFC 2018, | |||
| DOI 10.17487/RFC2018, October 1996, | DOI 10.17487/RFC2018, October 1996, | |||
| skipping to change at page 59, line 30 | skipping to change at line 2722 | |||
| [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
| 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
| May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
| [RFC9293] Eddy, W., Ed., "Transmission Control Protocol (TCP)", | [RFC9293] Eddy, W., Ed., "Transmission Control Protocol (TCP)", | |||
| STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022, | STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022, | |||
| <https://www.rfc-editor.org/info/rfc9293>. | <https://www.rfc-editor.org/info/rfc9293>. | |||
| 9.2. Informative References | 9.2. Informative References | |||
| [I-D.ietf-tcpm-generalized-ecn] | [ECN++] Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit | |||
| Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit | ||||
| Congestion Notification (ECN) to TCP Control Packets", | Congestion Notification (ECN) to TCP Control Packets", | |||
| Work in Progress, Internet-Draft, draft-ietf-tcpm- | Work in Progress, Internet-Draft, draft-ietf-tcpm- | |||
| generalized-ecn-16, 20 October 2024, | generalized-ecn-17, 21 April 2025, | |||
| <https://datatracker.ietf.org/doc/html/draft-ietf-tcpm- | <https://datatracker.ietf.org/doc/html/draft-ietf-tcpm- | |||
| generalized-ecn-16>. | generalized-ecn-17>. | |||
| [Mandalari18] | [Mandalari18] | |||
| Mandalari, A., Lutu, A., Briscoe, B., Bagnulo, M., and Ö. | Mandalari, A., Lutu, A., Briscoe, B., Bagnulo, M., and Ö. | |||
| Alay, "Measuring ECN++: Good News for ++, Bad News for ECN | Alay, "Measuring ECN++: Good News for ++, Bad News for ECN | |||
| over Mobile", IEEE Communications Magazine, March 2018, | over Mobile", IEEE Communications Magazine, March 2018, | |||
| <http://www.it.uc3m.es/amandala/ | <http://www.it.uc3m.es/amandala/ | |||
| ecn++/ecn_commag_2018.html>. | ecn++/ecn_commag_2018.html>. | |||
| [RFC3449] Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M. | [RFC3449] Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M. | |||
| Sooriyabandara, "TCP Performance Implications of Network | Sooriyabandara, "TCP Performance Implications of Network | |||
| skipping to change at page 62, line 21 | skipping to change at line 2852 | |||
| (L4S) Internet Service: Architecture", RFC 9330, | (L4S) Internet Service: Architecture", RFC 9330, | |||
| DOI 10.17487/RFC9330, January 2023, | DOI 10.17487/RFC9330, January 2023, | |||
| <https://www.rfc-editor.org/info/rfc9330>. | <https://www.rfc-editor.org/info/rfc9330>. | |||
| [RFC9438] Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed., | [RFC9438] Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed., | |||
| "CUBIC for Fast and Long-Distance Networks", RFC 9438, | "CUBIC for Fast and Long-Distance Networks", RFC 9438, | |||
| DOI 10.17487/RFC9438, August 2023, | DOI 10.17487/RFC9438, August 2023, | |||
| <https://www.rfc-editor.org/info/rfc9438>. | <https://www.rfc-editor.org/info/rfc9438>. | |||
| [RoCEv2] InfiniBand Trade Association, "InfiniBand Architecture | [RoCEv2] InfiniBand Trade Association, "InfiniBand Architecture | |||
| Specification Volume 1, Release 1.4", 2020, | Specification", Volume 1, Release 1.4, 2020, | |||
| <https://www.infinibandta.org/ibta-specification/>. | <https://www.infinibandta.org/ibta-specification/>. | |||
| Appendix A. Example Algorithms | Appendix A. Example Algorithms | |||
| This appendix is informative, not normative. It gives example | This appendix is informative, not normative. It gives example | |||
| algorithms that would satisfy the normative requirements of the | algorithms that would satisfy the normative requirements of the | |||
| AccECN protocol. However, implementers are free to choose other ways | AccECN protocol. However, implementers are free to choose other ways | |||
| to implement the requirements. | to implement the requirements. | |||
| A.1. Example Algorithm to Encode/Decode the AccECN Option | A.1. Example Algorithm to Encode/Decode the AccECN Option | |||
| skipping to change at page 62, line 46 | skipping to change at line 2877 | |||
| the ECEB field into its byte counter s.ceb. The other counters for | the ECEB field into its byte counter s.ceb. The other counters for | |||
| bytes marked ECT(0) and ECT(1) in an AccECN Option would be similarly | bytes marked ECT(0) and ECT(1) in an AccECN Option would be similarly | |||
| encoded and decoded. | encoded and decoded. | |||
| It is assumed that each local byte counter is an unsigned integer | It is assumed that each local byte counter is an unsigned integer | |||
| greater than 24 bits (probably 32 bits), and that the following constant has | greater than 24 bits (probably 32 bits), and that the following constant has | |||
| been assigned: | been assigned: | |||
| DIVOPT = 2^24 | DIVOPT = 2^24 | |||
| Every time a CE marked data segment arrives, the Data Receiver | Every time a CE-marked data segment arrives, the Data Receiver | |||
| increments its local value of r.ceb by the size of the TCP Data. | increments its local value of r.ceb by the size of the TCP Data. | |||
| Whenever it sends an ACK with an AccECN Option, the value it writes | Whenever it sends an ACK with an AccECN Option, the value it writes | |||
| into the ECEB field is | into the ECEB field is | |||
| ECEB = r.ceb % DIVOPT | ECEB = r.ceb % DIVOPT | |||
| where '%' is the remainder operator. | where '%' is the remainder operator. | |||
| On the arrival of an AccECN Option, the Data Sender first makes sure | On the arrival of an AccECN Option, the Data Sender first makes sure | |||
| the ACK has not been superseded in order to avoid winding the s.ceb | the ACK has not been superseded in order to avoid winding the s.ceb | |||
| counter backwards. It uses the TCP acknowledgement number and any | counter backwards. It uses the TCP acknowledgement number and any | |||
| SACK options [RFC2018] to calculate newlyAckedB, the amount of new | SACK options [RFC2018] to calculate newlyAckedB, the amount of new | |||
| data that the ACK acknowledges in bytes (newlyAckedB can be zero but | data that the ACK acknowledges in bytes (newlyAckedB can be zero but | |||
| not negative). If newlyAckedB is zero, either the ACK has been | not negative). If newlyAckedB is zero, either the ACK has been | |||
| superseded or CE-marked packet(s) without data could have arrived. | superseded or CE-marked packet(s) without data could have arrived. | |||
| To break the tie for the latter case, the Data Sender could use time- | To break the tie for the latter case, the Data Sender could use time- | |||
| stamps [RFC7323] (if present) to work out newlyAckedT, the amount of | stamps [RFC7323] (if present) to work out newlyAckedT, the amount of | |||
| new time that the ACK acknowledges. If the Data Sender determines | new time that the ACK acknowledges. If the Data Sender determines | |||
| that the ACK has been superseded it ignores the AccECN Option. | that the ACK has been superseded, it ignores the AccECN Option. | |||
| Otherwise, the Data Sender calculates the minimum non-negative | Otherwise, the Data Sender calculates the minimum non-negative | |||
| difference d.ceb between the ECEB field and its local s.ceb counter, | difference d.ceb between the ECEB field and its local s.ceb counter, | |||
| using modulo arithmetic as follows: | using modulo arithmetic as follows: | |||
| if ((newlyAckedB > 0) || (newlyAckedT > 0)) { | if ((newlyAckedB > 0) || (newlyAckedT > 0)) { | |||
| d.ceb = (ECEB + DIVOPT - (s.ceb % DIVOPT)) % DIVOPT | d.ceb = (ECEB + DIVOPT - (s.ceb % DIVOPT)) % DIVOPT | |||
| s.ceb += d.ceb | s.ceb += d.ceb | |||
| } | } | |||
| For example, if s.ceb is 33,554,433 and ECEB is 1461 (both decimal), | For example, if s.ceb is 33,554,433 and ECEB is 1461 (both decimal), | |||
| then | then | |||
| s.ceb % DIVOPT = 1 | s.ceb % DIVOPT = 1 | |||
| d.ceb = (1461 + 2^24 - 1) % 2^24 | d.ceb = (1461 + 2^24 - 1) % 2^24 | |||
| = 1460 | = 1460 | |||
| s.ceb = 33,554,433 + 1460 | s.ceb = 33,554,433 + 1460 | |||
| = 33,555,893 | = 33,555,893 | |||
| In practice an implementation might use heuristics to guess the | In practice, an implementation might use heuristics to guess the | |||
| feedback in missing ACKs, then when it subsequently receives feedback | feedback in missing ACKs. Then when it subsequently receives | |||
| it might find that it needs to correct its earlier heuristics as part | feedback, it might find that it needs to correct its earlier | |||
| of the decoding process. The above decoding process does not include | heuristics as part of the decoding process. The above decoding | |||
| any such heuristics. | process does not include any such heuristics. | |||
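As an illustration only, the ECEB decoding step above can be sketched in Python. DIVOPT is taken to be 2^24, as implied by the worked example; the function name decode_eceb and its argument names are hypothetical and are not part of the specification. No heuristics for missing ACKs are included.

```python
DIVOPT = 2 ** 24  # modulus of the 24-bit ECEB field, as used in the worked example


def decode_eceb(s_ceb, eceb, newly_acked_b, newly_acked_t=0):
    """Return (d_ceb, new_s_ceb) for one arriving ACK.

    s_ceb          local count of CE-marked bytes held by the Data Sender
    eceb           value of the ECEB field in the AccECN Option
    newly_acked_b  new bytes acknowledged (zero or more)
    newly_acked_t  new time acknowledged via timestamps (zero if unavailable)
    """
    if newly_acked_b > 0 or newly_acked_t > 0:
        # Minimum non-negative difference, modulo the field size.
        d_ceb = (eceb + DIVOPT - (s_ceb % DIVOPT)) % DIVOPT
        return d_ceb, s_ceb + d_ceb
    # The ACK is treated as superseded, so the option is ignored.
    return 0, s_ceb


# Worked example from the text: s.ceb = 33,554,433 and ECEB = 1461.
d_ceb, s_ceb = decode_eceb(33_554_433, 1461, newly_acked_b=1460)
print(d_ceb, s_ceb)
```

Adding DIVOPT before subtracting keeps the intermediate value non-negative, which matters in languages where the modulo of a negative number is negative.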
| A.2. Example Algorithm for Safety Against Long Sequences of ACK Loss | A.2. Example Algorithm for Safety Against Long Sequences of ACK Loss | |||
| The example algorithms below show how a Data Receiver in AccECN mode | The example algorithms below show how a Data Receiver in AccECN mode | |||
| could encode its CE packet counter r.cep into the ACE field, and how | could encode its CE packet counter r.cep into the ACE field, and how | |||
| the Data Sender in AccECN mode could decode the ACE field into its | the Data Sender in AccECN mode could decode the ACE field into its | |||
| s.cep counter. The Data Sender's algorithm includes code to | s.cep counter. The Data Sender's algorithm includes code to | |||
| heuristically detect a long enough unbroken string of ACK losses that | heuristically detect a long enough unbroken string of ACK losses that | |||
| could have concealed a cycle of the congestion counter in the ACE | could have concealed a cycle of the congestion counter in the ACE | |||
| field of the next ACK to arrive. | field of the next ACK to arrive. | |||
| Two variants of the algorithm are given: i) a more conservative | Two variants of the algorithm are given: i) a more conservative | |||
| variant for a Data Sender to use if it detects that AccECN Options | variant for a Data Sender to use if it detects that AccECN Options | |||
| are not available (see Section 3.2.2.5 and Section 3.2.3.2); and ii) | are not available (see Section 3.2.2.5 and Section 3.2.3.2); and ii) | |||
| a less conservative variant that is feasible when complementary | a less conservative variant that is feasible when complementary | |||
| information is available from AccECN Options. | information is available from AccECN Options. | |||
| A.2.1. Safety Algorithm without the AccECN Option | A.2.1. Safety Algorithm Without the AccECN Option | |||
| It is assumed that each local packet counter is a sufficiently sized | It is assumed that each local packet counter is a sufficiently sized | |||
| unsigned integer (probably 32b) and that the following constant has | unsigned integer (probably 32b) and that the following constant has | |||
| been assigned: | been assigned: | |||
| DIVACE = 2^3 | DIVACE = 2^3 | |||
| Every time an Acceptable CE marked packet arrives (Section 3.2.2.2), | Every time an Acceptable CE marked packet arrives (Section 3.2.2.2), | |||
| the Data Receiver increments its local value of r.cep by 1. It | the Data Receiver increments its local value of r.cep by 1. It | |||
| repeats the same value of ACE in every subsequent ACK until the next | repeats the same value of ACE in every subsequent ACK until the next | |||
| CE marking arrives, where | CE marking arrives, where | |||
| ACE = r.cep % DIVACE. | ACE = r.cep % DIVACE. | |||
| If the Data Sender received an earlier value of the counter that had | If the Data Sender received an earlier value of the counter that had | |||
| been delayed due to ACK reordering, it might incorrectly calculate | been delayed due to ACK reordering, it might incorrectly calculate | |||
| that the ACE field had wrapped. Therefore, on the arrival of every | that the ACE field had wrapped. Therefore, on the arrival of every | |||
| ACK, the Data Sender ensures the ACK has not been superseded using | ACK, the Data Sender ensures the ACK has not been superseded using | |||
| the TCP acknowledgement number, any SACK options and timestamps (if | the TCP acknowledgement number, any SACK options, and timestamps (if | |||
| available) to calculate newlyAckedB, as in Appendix A.1. If the ACK | available) to calculate newlyAckedB, as in Appendix A.1. If the ACK | |||
| has not been superseded, the Data Sender calculates the minimum | has not been superseded, the Data Sender calculates the minimum | |||
| difference d.cep between the ACE field and its local s.cep counter, | difference d.cep between the ACE field and its local s.cep counter, | |||
| using modulo arithmetic as follows: | using modulo arithmetic as follows: | |||
| if ((newlyAckedB > 0) || (newlyAckedT > 0)) | if ((newlyAckedB > 0) || (newlyAckedT > 0)) | |||
| d.cep = (ACE + DIVACE - (s.cep % DIVACE)) % DIVACE | d.cep = (ACE + DIVACE - (s.cep % DIVACE)) % DIVACE | |||
| Section 3.2.2.5 expects the Data Sender to assume that the ACE field | Section 3.2.2.5 expects the Data Sender to assume that the ACE field | |||
| cycled if it is the safest likely case under prevailing conditions. | cycled if it is the safest likely case under prevailing conditions. | |||
| The 3-bit ACE field in an arriving ACK could have cycled and become | The 3-bit ACE field in an arriving ACK could have cycled and become | |||
| ambiguous to the Data Sender if a sequence of ACKs goes missing that | ambiguous to the Data Sender if a sequence of ACKs goes missing that | |||
| covers a stream of data long enough to contain 8 or more CE marks. | covers a stream of data long enough to contain 8 or more CE marks. | |||
| We use the word `missing' rather than `lost', because some or all of | We use the word 'missing' rather than 'lost', because some or all of | |||
| the missing ACKs might arrive eventually, but out of order. Even if | the missing ACKs might arrive eventually, but out of order. Even if | |||
| some of the missing ACKs were piggy-backed on data (i.e., not pure | some of the missing ACKs were piggy-backed on data (i.e., not pure | |||
| ACKs), retransmissions will not repair the lost AccECN information, | ACKs), retransmissions will not repair the lost AccECN information, | |||
| because AccECN requires retransmissions to carry the latest AccECN | because AccECN requires retransmissions to carry the latest AccECN | |||
| counters, not the original ones. | counters, not the original ones. | |||
| The phrase `under prevailing conditions' allows for implementation- | The phrase 'under prevailing conditions' allows for implementation- | |||
| dependent interpretation. A Data Sender might take account of the | dependent interpretation. A Data Sender might take account of the | |||
| prevailing size of data segments and the prevailing CE marking rate | prevailing size of data segments and the prevailing CE marking rate | |||
| just before the sequence of missing ACKs. However, we shall start | just before the sequence of missing ACKs. However, we shall start | |||
| with the simplest algorithm, which assumes segments are all full- | with the simplest algorithm, which assumes segments are all full- | |||
| sized and, ultra-conservatively, that ECN marking was 100% on the | sized and, ultra-conservatively, that ECN marking was 100% on the | |||
| forward path when ACKs on the reverse path started to all be | forward path when ACKs on the reverse path started to all be | |||
| dropped. Specifically, if newlyAckedB is the amount of data that an | dropped. Specifically, if newlyAckedB is the amount of data that an | |||
| ACK acknowledges since the previous ACK, then the Data Sender could | ACK acknowledges since the previous ACK, then the Data Sender could | |||
| assume that this acknowledges newlyAckedPkt full-sized segments, | assume that this acknowledges newlyAckedPkt full-sized segments, | |||
| where newlyAckedPkt = newlyAckedB/MSS. Then it could assume that the | where newlyAckedPkt = newlyAckedB/MSS. Then it could assume that the | |||
| ACE field incremented by | ACE field incremented by | |||
| dSafer.cep = newlyAckedPkt - ((newlyAckedPkt - d.cep) % DIVACE), | dSafer.cep = newlyAckedPkt - ((newlyAckedPkt - d.cep) % DIVACE) | |||
| For example, imagine an ACK acknowledges newlyAckedPkt=9 more full- | For example, imagine an ACK acknowledges newlyAckedPkt=9 more full- | |||
| size segments than any previous ACK, and that ACE increments by a | size segments than any previous ACK, and that ACE increments by a | |||
| minimum of 2 CE marks (d.cep=2). The above formula works out that it | minimum of 2 CE marks (d.cep=2). The above formula works out that it | |||
| would still be safe to assume 2 CE marks (because 9 - ((9-2) % 8) = | would still be safe to assume 2 CE marks (because 9 - ((9-2) % 8) = | |||
| 2). However, if ACE increases by a minimum of 2 but acknowledges 10 | 2). However, if ACE increases by a minimum of 2 but acknowledges 10 | |||
| full-sized segments, then it would be necessary to assume that there | full-sized segments, then it would be necessary to assume that there | |||
| could have been 10 CE marks (because 10 - ((10-2) % 8) = 10). | could have been 10 CE marks (because 10 - ((10-2) % 8) = 10). | |||
| Note that checks would need to be added to the above pseudocode for | Note that checks would need to be added to the above pseudocode for | |||
| (d.cep > newlyAckedPkt), which could occur if newlyAckedPkt had been | (d.cep > newlyAckedPkt), which could occur if newlyAckedPkt had been | |||
| wrongly estimated using an inappropriate packet size. | wrongly estimated using an inappropriate packet size. | |||
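The ACE decoding and the conservative estimate dSafer.cep described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, newlyAckedPkt is rounded down with integer division, and the fallback used when d.cep exceeds newlyAckedPkt is only one possible form of the check that the text says would need to be added.

```python
DIVACE = 2 ** 3  # modulus of the 3-bit ACE field


def min_ace_delta(ace, s_cep):
    """Minimum non-negative difference d.cep between ACE and the local s.cep."""
    return (ace + DIVACE - (s_cep % DIVACE)) % DIVACE


def safer_ace_delta(newly_acked_b, mss, d_cep):
    """Conservative increment dSafer.cep, assuming full-sized, 100%-marked segments."""
    newly_acked_pkt = newly_acked_b // mss  # assumption: round down
    if d_cep > newly_acked_pkt:
        # Assumed guard: the packet estimate is too small to be trusted,
        # so fall back to the minimum difference.
        return d_cep
    return newly_acked_pkt - ((newly_acked_pkt - d_cep) % DIVACE)


# The two examples from the text, with MSS = 1460 chosen for illustration.
print(safer_ace_delta(9 * 1460, 1460, 2))   # 9 segments acknowledged
print(safer_ace_delta(10 * 1460, 1460, 2))  # 10 segments acknowledged
```

The result is always congruent to d.cep modulo 8 and never exceeds newlyAckedPkt, which is why 9 segments cannot hide an extra cycle on top of d.cep=2 but 10 segments can.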
| ACKs that acknowledge a large stretch of packets might be common in | ACKs that acknowledge a large stretch of packets might be common in | |||
| data centres to achieve a high packet rate or might be due to ACK | data centres to achieve a high packet rate or might be due to ACK | |||
| thinning by a middlebox. In these cases, cycling of the ACE field | thinning by a middlebox. In these cases, cycling of the ACE field | |||
| would often appear to have been possible, so the above algorithm | would often appear to have been possible, so the above algorithm | |||
| would be over-conservative, leading to a false high marking rate and | would be overly conservative, leading to a false high marking rate | |||
| poor performance. Therefore it would be reasonable to only use | and poor performance. Therefore, it would be reasonable to only use | |||
| dSafer.cep rather than d.cep if the moving average of newlyAckedPkt | dSafer.cep rather than d.cep if the moving average of newlyAckedPkt | |||
| was well below 8. | was well below 8. | |||
| Implementers could build in more heuristics to estimate prevailing | Implementers could build in more heuristics to estimate a prevailing | |||
| average segment size and prevailing ECN marking. For instance, | average segment size and prevailing ECN marking. For instance, | |||
| newlyAckedPkt in the above formula could be replaced with | newlyAckedPkt in the above formula could be replaced with | |||
| newlyAckedPktHeur = newlyAckedPkt*p*MSS/s, where s is the prevailing | newlyAckedPktHeur = newlyAckedPkt*p*MSS/s, where s is the prevailing | |||
| segment size and p is the prevailing ECN marking probability. | segment size and p is the prevailing ECN marking probability. | |||
| However, ultimately, if TCP's ECN feedback becomes inaccurate it | However, ultimately, if TCP's ECN feedback becomes inaccurate, it | |||
| still has loss detection to fall back on. Therefore, it would seem | still has loss detection to fall back on. Therefore, it would seem | |||
| safe to implement a simple algorithm, rather than a perfect one. | safe to implement a simple algorithm, rather than a perfect one. | |||
| The simple algorithm for dSafer.cep above requires no monitoring of | The simple algorithm for dSafer.cep above requires no monitoring of | |||
| prevailing conditions and it would still be safe if, for example, | prevailing conditions and it would still be safe if, for example, | |||
| segments were on average at least 5% of full-sized as long as ECN | segments were on average at least 5% of full-sized as long as ECN | |||
| marking was 5% or less. Assuming it was used, the Data Sender would | marking was 5% or less. Assuming it was used, the Data Sender would | |||
| increment its packet counter as follows: | increment its packet counter as follows: | |||
| s.cep += dSafer.cep | s.cep += dSafer.cep | |||
| If missing acknowledgement numbers arrive later (due to reordering), | If missing acknowledgement numbers arrive later (due to reordering), | |||
| Section 3.2.2.5 says "the Data Sender MAY attempt to neutralize the | Section 3.2.2.5.2 says "the Data Sender MAY attempt to neutralize the | |||
| effect of any action it took based on a conservative assumption that | effect of any action it took based on a conservative assumption that | |||
| it later found to be incorrect". To do this, the Data Sender would | it later found to be incorrect". To do this, the Data Sender would | |||
| have to store the values of all the relevant variables whenever it | have to store the values of all the relevant variables whenever it | |||
| made assumptions, so that it could re-evaluate them later. Given | made assumptions, so that it could re-evaluate them later. Given | |||
| this could become complex and it is not required, we do not attempt | this could become complex and it is not required, we do not attempt | |||
| to provide an example of how to do this. | to provide an example of how to do this. | |||
| A.2.2. Safety Algorithm with the AccECN Option | A.2.2. Safety Algorithm with the AccECN Option | |||
| When AccECN Options are available on the ACKs before and after the | When AccECN Options are available on the ACKs before and after the | |||
| possible sequence of ACK losses, if the Data Sender only needs CE- | possible sequence of ACK losses, if the Data Sender only needs CE- | |||
| marked bytes, it will have sufficient information in AccECN Options | marked bytes, it will have sufficient information in AccECN Options | |||
| without needing to process the ACE field. If for some reason it | without needing to process the ACE field. If for some reason it | |||
| needs CE-marked packets and dSafer.cep is different from d.cep, it | needs CE-marked packets and dSafer.cep is different from d.cep, it | |||
| can determine whether d.cep is likely to be a safe enough estimate by | can determine whether d.cep is likely to be a safe enough estimate by | |||
| checking whether the average marked segment size (s = d.ceb/d.cep) is | checking whether the average marked segment size (s = d.ceb/d.cep) is | |||
| less than the MSS (where d.ceb is the amount of newly CE-marked bytes | less than the MSS (where d.ceb is the amount of newly CE-marked bytes | |||
| - see Appendix A.1). Specifically, it could use the following | -- see Appendix A.1). Specifically, it could use the following | |||
| algorithm: | algorithm: | |||
| SAFETY_FACTOR = 2 | SAFETY_FACTOR = 2 | |||
| if (dSafer.cep > d.cep) { | if (dSafer.cep > d.cep) { | |||
| if (d.ceb <= MSS * d.cep) { % Same as (s <= MSS), but avoids divide-by-zero | if (d.ceb <= MSS * d.cep) { % Same as (s <= MSS), but avoids divide-by-zero | |||
| sSafer = d.ceb/dSafer.cep | sSafer = d.ceb/dSafer.cep | |||
| if (sSafer < MSS/SAFETY_FACTOR) | if (sSafer < MSS/SAFETY_FACTOR) | |||
| dSafer.cep = d.cep % d.cep is a safe enough estimate | dSafer.cep = d.cep % d.cep is a safe enough estimate | |||
| } % else | } % else | |||
| % No need for else; dSafer.cep is already correct, | % No need for else; dSafer.cep is already correct, | |||
| skipping to change at page 67, line 22 ¶ | skipping to change at line 3084 ¶ | |||
| MSS/SAFETY_FACTOR+--------------+ safest | MSS/SAFETY_FACTOR+--------------+ safest | |||
| | | | | | | |||
| | d.cep is safe| | | d.cep is safe| | |||
| | enough | | | enough | | |||
| +--------------------> | +--------------------> | |||
| MSS s | MSS s | |||
| The following examples give the reasoning behind the algorithm, | The following examples give the reasoning behind the algorithm, | |||
| assuming MSS=1460: | assuming MSS=1460: | |||
| * if d.cep=0, dSafer.cep=8 and d.ceb=1460, then s=infinity and | * if d.cep=0, dSafer.cep=8, and d.ceb=1460, then s=infinity and | |||
| sSafer=182.5. | sSafer=182.5. | |||
| Therefore even though the average size of 8 data segments is | Therefore, even though the average size of 8 data segments is | |||
| unlikely to have been as small as MSS/8, d.cep cannot have been | unlikely to have been as small as MSS/8, d.cep cannot have been | |||
| correct, because it would imply an average segment size greater | correct, because it would imply an average segment size greater | |||
| than the MSS. | than the MSS. | |||
| * if d.cep=2, dSafer.cep=10 and d.ceb=1460, then s=730 and | * if d.cep=2, dSafer.cep=10, and d.ceb=1460, then s=730 and | |||
| sSafer=146. | sSafer=146. | |||
| Therefore, d.cep is safe enough, because the average size of 10 | Therefore, d.cep is safe enough, because the average size of 10 | |||
| data segments is unlikely to have been as small as MSS/10. | data segments is unlikely to have been as small as MSS/10. | |||
| * if d.cep=7, dSafer.cep=15 and d.ceb=10200, then s=1457 and | * if d.cep=7, dSafer.cep=15, and d.ceb=10200, then s=1457 and | |||
| sSafer=680. | sSafer=680. | |||
| Therefore, d.cep is safe enough, because the average data segment | Therefore, d.cep is safe enough, because the average data segment | |||
| size is more likely to have been just less than one MSS, rather | size is more likely to have been just less than one MSS, rather | |||
| than below MSS/2. | than below MSS/2. | |||
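The three examples above can be checked with a short Python sketch of the part of the algorithm that is visible in this excerpt. The function name choose_cep_delta is hypothetical, and the sketch deliberately goes no further than the visible pseudocode.

```python
SAFETY_FACTOR = 2


def choose_cep_delta(d_cep, d_safer_cep, d_ceb, mss=1460):
    """Return the increment to apply to s.cep: d.cep if safe enough, else dSafer.cep."""
    if d_safer_cep > d_cep:
        # Same test as (d_ceb / d_cep <= mss), written without a division
        # so that d_cep == 0 is harmless.
        if d_ceb <= mss * d_cep:
            s_safer = d_ceb / d_safer_cep  # d_safer_cep >= 1 in this branch
            if s_safer < mss / SAFETY_FACTOR:
                return d_cep  # d.cep is a safe enough estimate
    return d_safer_cep


# (d.cep, dSafer.cep, d.ceb) triples from the examples, with MSS = 1460.
for d_cep, d_safer_cep, d_ceb in [(0, 8, 1460), (2, 10, 1460), (7, 15, 10200)]:
    print(d_cep, d_safer_cep, d_ceb, "->", choose_cep_delta(d_cep, d_safer_cep, d_ceb))
```

In the first example the outer size test fails, so dSafer.cep is kept; in the other two, sSafer falls below MSS/SAFETY_FACTOR and d.cep is accepted.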
| If pure ACKs were allowed to be ECN-capable, missing ACKs would be | If pure ACKs were allowed to be ECN-capable, missing ACKs would be | |||
| far less likely. However, because [RFC3168] currently precludes | far less likely. However, because [RFC3168] currently precludes | |||
| this, the above algorithm assumes that pure ACKs are not ECN-capable. | this, the above algorithm assumes that pure ACKs are not ECN-capable. | |||
| A.3. Example Algorithm to Estimate Marked Bytes from Marked Packets | A.3. Example Algorithm to Estimate Marked Bytes from Marked Packets | |||
| If AccECN Options are not available, the Data Sender can only decode | If AccECN Options are not available, the Data Sender can only decode | |||
| CE-marking from the ACE field in packets. Every time an ACK arrives, | a CE marking from the ACE field in packets. Every time an ACK | |||
| to convert this into an estimate of CE-marked bytes, it needs an | arrives, to convert this into an estimate of CE-marked bytes, it | |||
| average of the segment size, s_ave. Then it can add or subtract | needs an average of the segment size, s_ave. Then it can add or | |||
| s_ave from the value of d.ceb as the value of d.cep increments or | subtract s_ave from the value of d.ceb as the value of d.cep | |||
| decrements. Some possible ways to calculate s_ave are outlined | increments or decrements. Some possible ways to calculate s_ave are | |||
| below. The precise details will depend on why an estimate of marked | outlined below. The precise details will depend on why an estimate | |||
| bytes is needed. | of marked bytes is needed. | |||
| The implementation could keep a record of the byte numbers of all the | The implementation could keep a record of the byte numbers of all the | |||
| boundaries between packets in flight (including control packets), and | boundaries between packets in flight (including control packets), and | |||
| recalculate s_ave on every ACK. However it would be simpler to | recalculate s_ave on every ACK. However, it would be simpler to | |||
| merely maintain a counter packets_in_flight for the number of packets | merely maintain a counter packets_in_flight for the number of packets | |||
| in flight (including control packets), which is reset once per RTT. | in flight (including control packets), which is reset once per RTT. | |||
| Either way, it would estimate s_ave as: | Either way, it would estimate s_ave as: | |||
| s_ave ~= flightsize / packets_in_flight, | s_ave ~= flightsize / packets_in_flight, | |||
| where flightsize is the variable that TCP already maintains for the | where flightsize is the variable that TCP already maintains for the | |||
| number of bytes in flight and '~=' means 'approximately equal to'. | number of bytes in flight and '~=' means 'approximately equal to'. | |||
| To avoid floating point arithmetic, it could right-bit-shift by | To avoid floating point arithmetic, it could right-bit-shift by | |||
| lg(packets_in_flight), where lg() means log base 2. | lg(packets_in_flight), where lg() means log base 2. | |||
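A minimal sketch of the s_ave estimate follows, showing both plain integer division and the bit-shift approximation. The function names are hypothetical. The sketch assumes that lg(packets_in_flight) is rounded down to an integer, and that zero is returned when nothing is in flight; neither choice is taken from the text.

```python
def s_ave_exact(flightsize, packets_in_flight):
    """Average segment size in bytes, using integer division."""
    if packets_in_flight < 1:
        return 0  # assumption: nothing in flight, no estimate
    return flightsize // packets_in_flight


def s_ave_shift(flightsize, packets_in_flight):
    """Approximate average using a right shift by floor(lg(packets_in_flight))."""
    if packets_in_flight < 1:
        return 0  # assumption: nothing in flight, no estimate
    shift = packets_in_flight.bit_length() - 1  # floor of log base 2
    return flightsize >> shift


print(s_ave_exact(14600, 10), s_ave_shift(14600, 10))
print(s_ave_exact(11680, 8), s_ave_shift(11680, 8))
```

With lg() rounded down, the shift matches the division only when packets_in_flight is a power of two; otherwise it divides by a smaller number and so overestimates s_ave.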
| skipping to change at page 68, line 45 ¶ | skipping to change at line 3149 ¶ | |||
| where a is the decay constant for the EWMA. However, then it is | where a is the decay constant for the EWMA. However, then it is | |||
| necessary to choose a good value for this constant, which ought to | necessary to choose a good value for this constant, which ought to | |||
| depend on the number of packets in flight. Also, the decay constant | depend on the number of packets in flight. Also, the decay constant | |||
| needs to be a power of two to avoid floating point arithmetic. | needs to be a power of two to avoid floating point arithmetic. | |||
| A.4. Example Algorithm to Count Not-ECT Bytes | A.4. Example Algorithm to Count Not-ECT Bytes | |||
| A Data Sender in AccECN mode can infer the amount of TCP payload data | A Data Sender in AccECN mode can infer the amount of TCP payload data | |||
| arriving at the receiver marked Not-ECT from the difference between | arriving at the receiver marked Not-ECT from the difference between | |||
| the amount of newly ACKed data and the sum of the bytes with the | the amount of newly ACKed data and the sum of the bytes with the | |||
| other three markings, d.ceb, d.e0b and d.e1b. | other three markings, d.ceb, d.e0b, and d.e1b. | |||
| For this approach to be precise, it has to be assumed that spurious | For this approach to be precise, it has to be assumed that spurious | |||
| (unnecessary) retransmissions do not lead to double counting. This | (unnecessary) retransmissions do not lead to double counting. This | |||
| assumption is currently correct, given that RFC 3168 requires that | assumption is currently correct, given that RFC 3168 requires that | |||
| the Data Sender marks retransmitted segments as Not-ECT. However, | the Data Sender mark retransmitted segments as Not-ECT. However, the | |||
| the converse is not true; necessary retransmissions will result in | converse is not true; necessary retransmissions will result in | |||
| under-counting. | undercounting. | |||
| However, such precision is unlikely to be necessary. The only known | However, such precision is unlikely to be necessary. The only known | |||
| use of a count of Not-ECT marked bytes is to test whether equipment | use of a count of Not-ECT marked bytes is to test whether equipment | |||
| on the path is clearing the ECN field (perhaps due to an outdated | on the path is clearing the ECN field (perhaps due to an outdated | |||
| attempt to clear, or bleach, what used to be the IPv4 ToS byte or the | attempt to clear, or bleach, what used to be the IPv4 ToS byte or the | |||
| IPv6 Traffic Class field). To detect bleaching it will be sufficient | IPv6 Traffic Class field). To detect bleaching, it will be | |||
| to detect whether nearly all bytes arrive marked as Not-ECT. | sufficient to detect whether nearly all bytes arrive marked as Not- | |||
| Therefore there ought to be no need to keep track of the details of | ECT. Therefore, there ought to be no need to keep track of the | |||
| retransmissions. | details of retransmissions. | |||
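The inference of Not-ECT bytes and a coarse bleaching test could be sketched as below. Both function names, the clamping of negative results to zero, and the 95% threshold used for 'nearly all' are assumptions made for illustration; the text does not specify any of them.

```python
def not_ect_bytes(newly_acked_b, d_ceb, d_e0b, d_e1b):
    """Bytes inferred to have arrived Not-ECT for one ACK.

    Clamped at zero (an assumption) so that counting inaccuracies
    cannot produce a negative result.
    """
    return max(newly_acked_b - (d_ceb + d_e0b + d_e1b), 0)


def looks_bleached(total_acked_b, total_not_ect_b, threshold=0.95):
    """True if nearly all acknowledged bytes arrived marked Not-ECT.

    threshold is a hypothetical tuning value, not taken from the text.
    """
    if total_acked_b == 0:
        return False  # nothing acknowledged yet, no evidence either way
    return total_not_ect_b >= threshold * total_acked_b


print(not_ect_bytes(14600, 1460, 11680, 0))
print(looks_bleached(100_000, 99_000), looks_bleached(100_000, 1_460))
```

Because only a coarse decision is needed, the running totals can be accumulated without tracking which bytes were retransmitted.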
| Appendix B. Rationale for Usage of TCP Header Flags | Appendix B. Rationale for Usage of TCP Header Flags | |||
| B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake | B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake | |||
| AccECN uses a rather unorthodox approach to negotiate the highest | AccECN uses a rather unorthodox approach to negotiate the highest | |||
| version TCP ECN feedback scheme that both ends support, as justified | version TCP ECN feedback scheme that both ends support, as justified | |||
| below. It follows from the original TCP ECN capability negotiation | below. It follows from the original TCP ECN capability negotiation | |||
| [RFC3168], in which the Client set the 2 least significant of the | [RFC3168], in which the Client set the 2 least significant of the | |||
| original reserved flags in the TCP header, and fell back to no ECN | original reserved flags in the TCP header, and fell back to No ECN | |||
| support if the Server responded with the 2 flags cleared, which had | support if the Server responded with the 2 flags cleared, which had | |||
| previously been the default. | previously been the default. | |||
| Classic ECN used header flags rather than a TCP option because it was | Classic ECN used header flags rather than a TCP option because it was | |||
| considered more efficient to use a header flag for 1 bit of feedback | considered more efficient to use a header flag for 1 bit of feedback | |||
| per ACK, and this bit could be overloaded to indicate support for | per ACK, and this bit could be overloaded to indicate support for | |||
| Classic ECN during the handshake. During the development of ECN, 1 | Classic ECN during the handshake. During the development of ECN, 1 | |||
| bit crept up to 2, in order to deliver the feedback reliably and to | bit crept up to 2, in order to deliver the feedback reliably and to | |||
| work round some broken hosts that reflected the reserved flags during | work round some broken hosts that reflected the reserved flags during | |||
| the handshake. | the handshake. | |||
| In order to be backward compatible with RFC 3168, AccECN continues | In order to be backward compatible with RFC 3168, AccECN continues | |||
| this approach, using the 3rd least significant TCP header flag that | this approach, using the 3rd least significant TCP header flag that | |||
| had previously been allocated for the ECN nonce (now historic). | had previously been allocated for the ECN-nonce (now historic). | |||
| Then, whatever form of Server an AccECN Client encounters, the | Then, whatever form of Server an AccECN Client encounters, the | |||
| connection can fall back to the highest version of feedback protocol | connection can fall back to the highest version of feedback protocol | |||
| that both ends support, as explained in Section 3.1. | that both ends support, as explained in Section 3.1. | |||
| If AccECN capability negotiation had used the more orthodox approach | If AccECN capability negotiation had used the more orthodox approach | |||
| of a TCP option, it would still have had to set the two ECN flags in | of a TCP option, it would still have had to set the two ECN flags in | |||
| the main TCP header, in order to be able to fall back to Classic RFC | the main TCP header, in order to be able to fall back to Classic ECN | |||
| 3168 ECN, or to disable ECN support, without another round of | [RFC3168], or to disable ECN support, without another round of | |||
| negotiation. Then AccECN would also have had to handle all the | negotiation. Then AccECN would also have had to handle all the | |||
| different ways that Servers currently respond to settings of the ECN | different ways that Servers currently respond to settings of the ECN | |||
| flags in the main TCP header, including all the conflicting cases | flags in the main TCP header, including all of the conflicting cases | |||
| where a Server might have said it supported one approach in the flags | where a Server might have said it supported one approach in the flags | |||
| and another approach in a new TCP option. And AccECN would have had | and another approach in a new TCP option. And AccECN would have had | |||
| to deal with all the additional possibilities where a middlebox might | to deal with all of the additional possibilities where a middlebox | |||
| have mangled the ECN flags, or removed TCP options. Thus, usage of | might have mangled the ECN flags, or removed TCP options. Thus, | |||
| the 3rd reserved TCP header flag simplified the protocol. | usage of the 3rd reserved TCP header flag simplified the protocol. | |||
| The third flag was used in a way that could be distinguished from the | The third flag was used in a way that could be distinguished from the | |||
| ECN nonce, in case any nonce deployment was encountered. Previous | ECN-nonce, in case any nonce deployment was encountered. Previous | |||
| usage of this flag for the ECN nonce was integrated into the original | usage of this flag for the ECN-nonce was integrated into the original | |||
| ECN negotiation. This further justified the 3rd flag's use for | ECN negotiation. This further justified the third flag's use for | |||
| AccECN, because a non-ECN usage of this flag would have had to use it | AccECN, because a non-ECN usage of this flag would have had to use it | |||
| as a separate single bit, rather than in combination with the other 2 | as a separate single bit, rather than in combination with the other 2 | |||
| ECN flags. | ECN flags. | |||
| Indeed, having overloaded the original uses of these three flags for | Indeed, having overloaded the original uses of these three flags for | |||
| its handshake, AccECN overloads all three bits again as a 3-bit | its handshake, AccECN overloads all three bits again as a 3-bit | |||
| counter. | counter. | |||
| B.2. Four Codepoints in the SYN/ACK | B.2. Four Codepoints in the SYN/ACK | |||
| Of the 8 possible codepoints that the 3 TCP header flags can indicate | Of the eight possible codepoints that the three TCP header flags can | |||
| on the SYN/ACK, 4 already indicated earlier (or broken) versions of | indicate on the SYN/ACK, four already indicated earlier (or broken) | |||
| ECN support, 1 now being historic. In the early design of AccECN, an | versions of ECN support, one now being Historic. In the early design | |||
| AccECN Server could use only 2 of the 4 remaining codepoints. They | of AccECN, an AccECN Server could use only 2 of the 4 remaining | |||
| both indicated AccECN support, but one fed back that the SYN had | codepoints. They both indicated AccECN support, but one fed back | |||
| arrived marked as CE. Even though ECN support on a SYN is not yet on | that the SYN had arrived marked as CE. Even though ECN support on a | |||
| the standards track, the idea is for either end to act as a | SYN is not yet on the Standards Track, the idea is for either end to | |||
| mechanistic reflector, so that future capabilities can be | act as a mechanistic reflector, so that future capabilities can be | |||
| unilaterally deployed without requiring 2-ended deployment (justified | unilaterally deployed without requiring 2-ended deployment (justified | |||
| in Section 2.5). | in Section 2.5). | |||
| During traversal testing it was discovered that the IP-ECN field in | During traversal testing, it was discovered that the IP-ECN field in | |||
| the SYN was mangled on a non-negligible proportion of paths. | the SYN was mangled on a non-negligible proportion of paths. | |||
| Therefore it was necessary to allow the SYN/ACK to feed all four IP- | Therefore, it was necessary to allow the SYN/ACK to feed all four IP- | |||
| ECN codepoints that the SYN could arrive with back to the Client. | ECN codepoints that the SYN could arrive with back to the Client. | |||
| Without this, the Client could not know whether to disable ECN for | Without this, the Client could not know whether to disable ECN for | |||
| the connection due to mangling of the IP-ECN field (also explained in | the connection due to mangling of the IP-ECN field (also explained in | |||
| Section 2.5). This development consumed the remaining 2 codepoints | Section 2.5). This development consumed the remaining two codepoints | |||
| on the SYN/ACK that had been reserved for future use by AccECN in | on the SYN/ACK that had been reserved for future use by AccECN in | |||
| earlier versions. | earlier versions. | |||
| B.3. Space for Future Evolution | B.3. Space for Future Evolution | |||
| Despite availability of usable TCP header space being extremely | Despite availability of usable TCP header space being extremely | |||
| scarce, the AccECN protocol has taken all possible steps to ensure | scarce, the AccECN protocol has taken all possible steps to ensure | |||
| that there is space to negotiate possible future variants of the | that there is space to negotiate possible future variants of the | |||
| protocol, either if a variant of AccECN is required, or if a | protocol, either if a variant of AccECN is required, or if a | |||
| completely different ECN feedback approach is needed: | completely different ECN feedback approach is needed. | |||
| Future AccECN variants: When the AccECN capability is negotiated | Future AccECN variants: When the AccECN capability is negotiated | |||
| during TCP's three-way handshake, the rows in Table 2 tagged as | during TCP's three-way handshake, the rows in Table 2 tagged as | |||
| 'Nonce' and 'Broken' in the column for the capability of node B | 'Nonce' and 'Broken' in the column for the capability of node B | |||
| are unused by any current protocol in the RFC series. These could | are unused by any current protocol defined in the RFC series. | |||
| be used by TCP Servers in future to indicate a variant of the | These could be used by TCP Servers in the future to indicate a | |||
| AccECN protocol. In recent measurement studies in which the | variant of the AccECN protocol. In recent measurement studies in | |||
| response of large numbers of Servers to an AccECN SYN has been | which the response of large numbers of Servers to an AccECN SYN | |||
| tested, e.g., [Mandalari18], a very small number of SYN/ACKs | has been tested, e.g., [Mandalari18], a very small number of SYN/ | |||
| arrive with the pattern tagged as 'Nonce', and a small but more | ACKs arrive with the pattern tagged as 'Nonce', and a small but | |||
| significant number arrive with the pattern tagged as 'Broken'. | more significant number arrive with the pattern tagged as | |||
| The 'Nonce' pattern could be a sign that a few Servers have | 'Broken'. The 'Nonce' pattern could be a sign that a few Servers | |||
| implemented the ECN Nonce [RFC3540], which has now been | have implemented the ECN-nonce [RFC3540], which has now been | |||
| reclassified as historic [RFC8311], or it could be the random | reclassified as Historic [RFC8311], or it could be the random | |||
| result of some unknown middlebox behaviour. The greater | result of some unknown middlebox behaviour. The greater | |||
| prevalence of the 'Broken' pattern suggests that some instances | prevalence of the 'Broken' pattern suggests that some instances | |||
| still exist of the broken code that reflects the reserved flags on | still exist of the broken code that reflects the reserved flags on | |||
| the SYN. | the SYN. | |||
| The requirement not to reject unexpected initial values of the ACE | The requirement not to reject unexpected initial values of the ACE | |||
| counter (in the main TCP header) in the last paragraph of | counter (in the main TCP header) in the last paragraph of | |||
| Section 3.2.2.4 ensures that 3 unused codepoints on the ACK of the | Section 3.2.2.4 ensures that three unused codepoints on the ACK of | |||
| SYN/ACK, 6 unused values on the first SYN=0 data packet from the | the SYN/ACK, six unused values on the first SYN=0 data packet from | |||
| Client and 7 unused values on the first SYN=0 data packet from the | the Client, and seven unused values on the first SYN=0 data packet | |||
| Server could be used to declare future variants of the AccECN | from the Server could be used to declare future variants of the | |||
| protocol. The word 'declare' is used rather than 'negotiate' | AccECN protocol. The word 'declare' is used rather than | |||
| because, at this late stage in the three-way handshake, it would | 'negotiate' because, at this late stage in the three-way | |||
| be too late for a negotiation between the endpoints to be | handshake, it would be too late for a negotiation between the | |||
| completed. A similar requirement not to reject unexpected initial | endpoints to be completed. A similar requirement not to reject | |||
| values in AccECN TCP Options (Section 3.2.3.2.4) is for the same | unexpected initial values in AccECN TCP Options | |||
| purpose. If traversal of AccECN TCP Options were reliable, this | (Section 3.2.3.2.4) is for the same purpose. If traversal of | |||
| would have enabled a far wider range of future variation of the | AccECN TCP Options were reliable, this would have enabled a far | |||
| whole AccECN protocol. Nonetheless, it could be used to reliably | wider range of future variation of the whole AccECN protocol. | |||
| negotiate a wide range of variation in the semantics of the AccECN | Nonetheless, it could be used to reliably negotiate a wide range | |||
| Option. | of variation in the semantics of the AccECN Option. | |||
| Future non-AccECN variants: Five codepoints out of the 8 possible in | Future non-AccECN variants: Five codepoints out of the eight | |||
| the 3 TCP header flags used by AccECN are unused on the initial | possible in the three TCP header flags used by AccECN are unused | |||
| SYN (in the order (AE,CWR,ECE)): (0,0,1), (0,1,0), (1,0,0), | on the initial SYN (in the order (AE,CWR,ECE)): (0,0,1), (0,1,0), | |||
| (1,0,1), (1,1,0). Section 3.1.3 ensures that the installed base | (1,0,0), (1,0,1), (1,1,0). Section 3.1.3 ensures that the | |||
| of AccECN Servers will all assume these are equivalent to AccECN | installed base of AccECN Servers will all assume these are | |||
| negotiation with (1,1,1) on the SYN. These codepoints would not | equivalent to AccECN negotiation with (1,1,1) on the SYN. These | |||
| allow fall-back to Classic ECN support for a Server that did not | codepoints would not allow fall-back to Classic ECN support for a | |||
| understand them, but this approach ensures they are available in | Server that did not understand them, but this approach ensures | |||
| future, perhaps for uses other than ECN alongside the AccECN | they are available in the future, perhaps for uses other than ECN | |||
| scheme. All possible combinations of SYN/ACK could be used in | alongside the AccECN scheme. All possible combinations of SYN/ACK | |||
| response except either (0,0,0) or reflection of the same values | could be used in response except either (0,0,0) or reflection of | |||
| sent on the SYN. | the same values sent on the SYN. | |||
| In order to extend AccECN or ECN in future, other ways could be | In order to extend AccECN or ECN in the future, other ways could | |||
| resorted to, although their traversal properties are likely to be | be resorted to, although their traversal properties are likely to | |||
| inferior. They include a new TCP option; using the remaining | be inferior. They include a new TCP option; using the remaining | |||
| reserved flags in the main TCP header (preferably extending the | reserved flags in the main TCP header (preferably extending the | |||
| 3-bit combinations used by AccECN to 4-bit combinations, rather | 3-bit combinations used by AccECN to 4-bit combinations, rather | |||
| than burning one bit for just one state); a non-zero urgent | than burning one bit for just one state); a non-zero urgent | |||
| pointer in combination with the URG flag cleared; or some other | pointer in combination with the URG flag cleared; or some other | |||
| unexpected combination of fields yet to be invented. | unexpected combination of fields yet to be invented. | |||
| Acknowledgements | Acknowledgements | |||
| We want to thank Koen De Schepper, Praveen Balasubramanian, Michael | We want to thank Koen De Schepper, Praveen Balasubramanian, Michael | |||
| Welzl, Gorry Fairhurst, David Black, Spencer Dawkins, Michael Scharf, | Welzl, Gorry Fairhurst, David Black, Spencer Dawkins, Michael Scharf, | |||
| skipping to change at page 72, line 23 ¶ | skipping to change at line 3322 ¶ | |||
| Järvinen, Neal Cardwell, Yoshifumi Nishida, Martin Duke, Jonathan | Järvinen, Neal Cardwell, Yoshifumi Nishida, Martin Duke, Jonathan | |||
| Morton, Vidhi Goel, Alex Burr, Markku Kojo, Grenville Armitage and | Morton, Vidhi Goel, Alex Burr, Markku Kojo, Grenville Armitage and | |||
| Wes Eddy for their input and discussion. The idea of using the three | Wes Eddy for their input and discussion. The idea of using the three | |||
| ECN-related TCP flags as one field for more accurate TCP-ECN feedback | ECN-related TCP flags as one field for more accurate TCP-ECN feedback | |||
| was first introduced in the re-ECN protocol that was the ancestor of | was first introduced in the re-ECN protocol that was the ancestor of | |||
| ConEx. | ConEx. | |||
| The following contributed implementations of AccECN that validated | The following contributed implementations of AccECN that validated | |||
| and helped to improve this specification: | and helped to improve this specification: | |||
| Linux: Mirja Kühlewind, Ilpo Järvinen, Neal Cardwell and Chia-Yu | Linux: Mirja Kühlewind, Ilpo Järvinen, Neal Cardwell, and Chia-Yu | |||
| Chang; | Chang | |||
| FreeBSD: Richard Scheffenegger; | FreeBSD: Richard Scheffenegger | |||
| Apple OSs: Vidhi Goel. | Apple OSs: Vidhi Goel | |||
| Bob Briscoe was part-funded by Apple Inc, the Comcast Innovation | Bob Briscoe was part-funded by Apple Inc, the Comcast Innovation | |||
| Fund, the European Community under its Seventh Framework Programme | Fund, the European Community under its Seventh Framework Programme | |||
| through the Reducing Internet Transport Latency (RITE) project (ICT- | through the Reducing Internet Transport Latency (RITE) project (ICT- | |||
| 317700) and through the Trilogy 2 project (ICT-317756), and the | 317700) and through the Trilogy 2 project (ICT-317756), and the | |||
| Research Council of Norway through the TimeIn project. The views | Research Council of Norway through the TimeIn project. The views | |||
| expressed here are solely those of the authors. | expressed here are solely those of the authors. | |||
| Mirja Kühlewind was partly supported by the European Commission under | Mirja Kühlewind was partly supported by the European Commission under | |||
| Horizon 2020 grant agreement no. 688421 Measurement and Architecture | Horizon 2020 grant agreement no. 688421 Measurement and Architecture | |||
| for a Middleboxed Internet (MAMI), and by the Swiss State Secretariat | for a Middleboxed Internet (MAMI), and by the Swiss State Secretariat | |||
| for Education, Research, and Innovation under contract no. 15.0268. | for Education, Research, and Innovation under contract no. 15.0268. | |||
| This support does not imply endorsement. | This support does not imply endorsement. | |||
| Comments Solicited | ||||
| This section is to be removed before publishing as an RFC. | ||||
| Comments and questions are encouraged and very welcome. They can be | ||||
| addressed to the IETF TCP maintenance and minor modifications working | ||||
| group mailing list <tcpm@ietf.org>, and/or to the authors. | ||||
| Authors' Addresses | Authors' Addresses | |||
| Bob Briscoe | Bob Briscoe | |||
| Independent | Independent | |||
| United Kingdom | United Kingdom | |||
| Email: ietf@bobbriscoe.net | Email: ietf@bobbriscoe.net | |||
| URI: http://bobbriscoe.net/ | URI: http://bobbriscoe.net/ | |||
| Mirja Kühlewind | Mirja Kühlewind | |||
| Ericsson | Ericsson | |||
| End of changes. 286 change blocks. | ||||
| 740 lines changed or deleted | 706 lines changed or added | |||
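
The "Future non-AccECN variants" paragraph of Appendix B.3 lists five (AE,CWR,ECE) combinations that are unused on the initial SYN, and says that a SYN/ACK reply could use any combination except (0,0,0) or a reflection of the values sent on the SYN. The following is a minimal Python sketch of that enumeration, not part of either document version. The function names and the text labels in `DEFINED_ON_SYN` are hypothetical, chosen only for illustration; the three combinations treated as already defined on a SYN are (0,0,0) for no ECN, (0,1,1) for the RFC 3168 ECN-setup SYN, and (1,1,1) for AccECN.

```python
from itertools import product

# (AE, CWR, ECE) combinations that already have a meaning on the initial SYN.
# The label strings are illustrative names for this sketch only.
DEFINED_ON_SYN = {
    (0, 0, 0): "not ECN-capable",
    (0, 1, 1): "Classic ECN (RFC 3168)",
    (1, 1, 1): "AccECN",
}

# All eight values that the three TCP header flags can take.
ALL_CODEPOINTS = list(product((0, 1), repeat=3))


def unused_syn_codepoints():
    """Return the (AE, CWR, ECE) combinations with no defined meaning on a SYN."""
    return [cp for cp in ALL_CODEPOINTS if cp not in DEFINED_ON_SYN]


def candidate_synack_responses(syn_flags):
    """Return the SYN/ACK combinations a future scheme could use in reply.

    Per the text, everything is available except (0, 0, 0) and a
    reflection of the same flags that were sent on the SYN.
    """
    return [cp for cp in ALL_CODEPOINTS
            if cp != (0, 0, 0) and cp != syn_flags]


if __name__ == "__main__":
    for cp in unused_syn_codepoints():
        print(cp, len(candidate_synack_responses(cp)))
```

Under these assumptions, `unused_syn_codepoints()` yields the same five combinations, in the same order, as the list in the text, and each of them leaves six candidate SYN/ACK combinations.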