rfc9768.original | rfc9768.txt | |||
---|---|---|---|---|
TCP Maintenance & Minor Extensions (tcpm) B. Briscoe | Internet Engineering Task Force (IETF) B. Briscoe | |||
Internet-Draft Independent | Request for Comments: 9768 Independent | |||
Updates: 3168 (if approved) M. Kühlewind | Updates: 3168 M. Kühlewind | |||
Intended status: Standards Track Ericsson | Category: Standards Track Ericsson | |||
Expires: 11 September 2025 R. Scheffenegger | ISSN: 2070-1721 R. Scheffenegger | |||
NetApp | NetApp | |||
10 March 2025 | August 2025 | |||
More Accurate Explicit Congestion Notification (AccECN) Feedback in TCP | More Accurate Explicit Congestion Notification (AccECN) Feedback in TCP | |||
draft-ietf-tcpm-accurate-ecn-34 | ||||
Abstract | Abstract | |||
Explicit Congestion Notification (ECN) is a mechanism where network | Explicit Congestion Notification (ECN) is a mechanism by which | |||
nodes can mark IP packets instead of dropping them to indicate | network nodes can mark IP packets instead of dropping them to | |||
incipient congestion to the endpoints. Receivers with an ECN-capable | indicate incipient congestion to the endpoints. Receivers with an | |||
transport protocol feed back this information to the sender. ECN was | ECN-capable transport protocol feed back this information to the | |||
originally specified for TCP in such a way that only one feedback | sender. ECN was originally specified for TCP in such a way that only | |||
signal can be transmitted per Round-Trip Time (RTT). Recent new TCP | one feedback signal can be transmitted per Round-Trip Time (RTT). | |||
mechanisms like Congestion Exposure (ConEx), Data Center TCP (DCTCP) | Newer TCP mechanisms like Congestion Exposure (ConEx), Data Center | |||
or Low Latency, Low Loss, and Scalable Throughput (L4S) need more | TCP (DCTCP), or Low Latency, Low Loss, and Scalable Throughput (L4S) | |||
Accurate ECN (AccECN) feedback information whenever more than one | need more Accurate ECN (AccECN) feedback information whenever more | |||
marking is received in one RTT. This document updates the original | than one marking is received in one RTT. This document updates the | |||
ECN specification in RFC 3168 to specify a scheme that provides more | original ECN specification defined in RFC 3168 by specifying a scheme | |||
than one feedback signal per RTT in the TCP header. Given TCP header | that provides more than one feedback signal per RTT in the TCP | |||
space is scarce, it allocates a reserved header bit previously | header. Given TCP header space is scarce, it allocates a reserved | |||
assigned to the ECN-Nonce. It also overloads the two existing ECN | header bit previously assigned to the ECN-nonce. It also overloads | |||
flags in the TCP header. The resulting extra space is additionally | the two existing ECN flags in the TCP header. The resulting extra | |||
exploited to feed back the IP-ECN field received during the TCP | space is additionally exploited to feed back the IP-ECN field | |||
connection establishment. Supplementary feedback information can | received during the TCP connection establishment. Supplementary | |||
optionally be provided in two new TCP option alternatives, which are | feedback information can optionally be provided in two new TCP option | |||
never used on the TCP SYN. The document also specifies the treatment | alternatives, which are never used on the TCP SYN. The document also | |||
of this updated TCP wire protocol by middleboxes. | specifies the treatment of this updated TCP wire protocol by | |||
middleboxes. | ||||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This is an Internet Standards Track document. | |||
provisions of BCP 78 and BCP 79. | ||||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at https://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Further information on | |||
Internet Standards is available in Section 2 of RFC 7841. | ||||
This Internet-Draft will expire on 11 September 2025. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
https://www.rfc-editor.org/info/rfc9768. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2025 IETF Trust and the persons identified as the | Copyright (c) 2025 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents | |||
license-info) in effect on the date of publication of this document. | (https://trustee.ietf.org/license-info) in effect on the date of | |||
Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
and restrictions with respect to this document. Code Components | carefully, as they describe your rights and restrictions with respect | |||
extracted from this document must include Revised BSD License text as | to this document. Code Components extracted from this document must | |||
described in Section 4.e of the Trust Legal Provisions and are | include Revised BSD License text as described in Section 4.e of the | |||
provided without warranty as described in the Revised BSD License. | Trust Legal Provisions and are provided without warranty as described | |||
in the Revised BSD License. | ||||
This document may contain material from IETF Documents or IETF | This document may contain material from IETF Documents or IETF | |||
Contributions published or made publicly available before November | Contributions published or made publicly available before November | |||
10, 2008. The person(s) controlling the copyright in some of this | 10, 2008. The person(s) controlling the copyright in some of this | |||
material may not have granted the IETF Trust the right to allow | material may not have granted the IETF Trust the right to allow | |||
modifications of such material outside the IETF Standards Process. | modifications of such material outside the IETF Standards Process. | |||
Without obtaining an adequate license from the person(s) controlling | Without obtaining an adequate license from the person(s) controlling | |||
the copyright in such materials, this document may not be modified | the copyright in such materials, this document may not be modified | |||
outside the IETF Standards Process, and derivative works of it may | outside the IETF Standards Process, and derivative works of it may | |||
not be created outside the IETF Standards Process, except to format | not be created outside the IETF Standards Process, except to format | |||
it for publication as an RFC or to translate it into languages other | it for publication as an RFC or to translate it into languages other | |||
than English. | than English. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 | 1. Introduction | |||
1.1. Document Roadmap . . . . . . . . . . . . . . . . . . . . 5 | 1.1. Document Roadmap | |||
1.2. Goals . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.2. Goals | |||
1.3. Terminology . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.3. Terminology | |||
1.4. Recap of Existing ECN feedback in IP/TCP . . . . . . . . 7 | 1.4. Recap of Existing ECN Feedback in IP/TCP | |||
2. AccECN Protocol Overview and Rationale . . . . . . . . . . . 9 | 2. AccECN Protocol Overview and Rationale | |||
2.1. Capability Negotiation . . . . . . . . . . . . . . . . . 10 | 2.1. Capability Negotiation | |||
2.2. Feedback Mechanism . . . . . . . . . . . . . . . . . . . 11 | 2.2. Feedback Mechanism | |||
2.3. Delayed ACKs and Resilience Against ACK Loss . . . . . . 11 | 2.3. Delayed ACKs and Resilience Against ACK Loss | |||
2.4. Feedback Metrics . . . . . . . . . . . . . . . . . . . . 12 | 2.4. Feedback Metrics | |||
2.5. Generic (Mechanistic) Reflector . . . . . . . . . . . . . 12 | 2.5. Generic (Mechanistic) Reflector | |||
3. AccECN Protocol Specification . . . . . . . . . . . . . . . . 13 | 3. AccECN Protocol Specification | |||
3.1. Negotiating to use AccECN . . . . . . . . . . . . . . . . 13 | 3.1. Negotiating to Use AccECN | |||
3.1.1. Negotiation during the TCP three-way handshake . . . 13 | 3.1.1. Negotiation During the TCP Three-Way Handshake | |||
3.1.2. Backward Compatibility . . . . . . . . . . . . . . . 15 | 3.1.2. Backward Compatibility | |||
3.1.3. Forward Compatibility . . . . . . . . . . . . . . . . 17 | 3.1.3. Forward Compatibility | |||
3.1.4. Multiple SYNs or SYN/ACKs . . . . . . . . . . . . . . 18 | 3.1.4. Multiple SYNs or SYN/ACKs | |||
3.1.4.1. Retransmitted SYNs . . . . . . . . . . . . . . . 18 | 3.1.4.1. Retransmitted SYNs | |||
3.1.4.2. Retransmitted SYN/ACKs . . . . . . . . . . . . . 19 | 3.1.4.2. Retransmitted SYN/ACKs | |||
3.1.5. Implications of AccECN Mode . . . . . . . . . . . . . 20 | 3.1.5. Implications of AccECN Mode | |||
3.2. AccECN Feedback . . . . . . . . . . . . . . . . . . . . . 24 | 3.2. AccECN Feedback | |||
3.2.1. Initialization of Feedback Counters . . . . . . . . . 25 | 3.2.1. Initialization of Feedback Counters | |||
3.2.2. The ACE Field . . . . . . . . . . . . . . . . . . . . 25 | 3.2.2. The ACE Field | |||
3.2.2.1. ACE Field on the ACK of the SYN/ACK . . . . . . . 26 | 3.2.2.1. ACE Field on the ACK of the SYN/ACK | |||
3.2.2.2. Encoding and Decoding Feedback in the ACE | 3.2.2.2. Encoding and Decoding Feedback in the ACE Field | |||
Field . . . . . . . . . . . . . . . . . . . . . . . 29 | 3.2.2.3. Testing for Mangling of the IP/ECN Field | |||
3.2.2.3. Testing for Mangling of the IP/ECN Field . . . . 31 | 3.2.2.4. Testing for Zeroing of the ACE Field | |||
3.2.2.4. Testing for Zeroing of the ACE Field . . . . . . 33 | 3.2.2.5. Safety Against Ambiguity of the ACE Field | |||
3.2.2.5. Safety against Ambiguity of the ACE Field . . . . 34 | 3.2.3. The AccECN Option | |||
3.2.3. The AccECN Option . . . . . . . . . . . . . . . . . . 37 | ||||
3.2.3.1. Encoding and Decoding Feedback in the AccECN Option | 3.2.3.1. Encoding and Decoding Feedback in the AccECN Option | |||
Fields . . . . . . . . . . . . . . . . . . . . . . 39 | Fields | |||
3.2.3.2. Path Traversal of the AccECN Option . . . . . . . 39 | 3.2.3.2. Path Traversal of the AccECN Option | |||
3.2.3.3. Usage of the AccECN TCP Option . . . . . . . . . 44 | 3.2.3.3. Usage of the AccECN TCP Option | |||
3.3. AccECN Compliance Requirements for TCP Proxies, Offload | 3.3. AccECN Compliance Requirements for TCP Proxies, Offload | |||
Engines and other Middleboxes . . . . . . . . . . . . . . 46 | Engines, and Other Middleboxes | |||
3.3.1. Requirements for TCP Proxies . . . . . . . . . . . . 46 | 3.3.1. Requirements for TCP Proxies | |||
3.3.2. Requirements for Transparent Middleboxes and TCP | 3.3.2. Requirements for Transparent Middleboxes and TCP | |||
Normalizers . . . . . . . . . . . . . . . . . . . . . 46 | Normalizers | |||
3.3.3. Requirements for TCP ACK Filtering . . . . . . . . . 47 | 3.3.3. Requirements for TCP ACK Filtering | |||
3.3.4. Requirements for TCP Segmentation Offload and Large | 3.3.4. Requirements for TCP Segmentation Offload and Large | |||
Receive Offload . . . . . . . . . . . . . . . . . . . 48 | Receive Offload | |||
4. Updates to RFC 3168 . . . . . . . . . . . . . . . . . . . . . 49 | 4. Updates to RFC 3168 | |||
5. Interaction with TCP Variants . . . . . . . . . . . . . . . . 51 | 5. Interaction with TCP Variants | |||
5.1. Compatibility with SYN Cookies . . . . . . . . . . . . . 51 | 5.1. Compatibility with SYN Cookies | |||
5.2. Compatibility with TCP Experiments and Common TCP | 5.2. Compatibility with TCP Experiments and Common TCP Options | |||
Options . . . . . . . . . . . . . . . . . . . . . . . . . 51 | 5.3. Compatibility with Feedback Integrity Mechanisms | |||
5.3. Compatibility with Feedback Integrity Mechanisms . . . . 52 | 6. Summary: Protocol Properties | |||
6. Summary: Protocol Properties . . . . . . . . . . . . . . . . 53 | 7. IANA Considerations | |||
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 55 | 8. Security and Privacy Considerations | |||
8. Security and Privacy Considerations . . . . . . . . . . . . . 57 | 9. References | |||
9. References . . . . . . . . . . . . . . . . . . . . . . . . . 58 | 9.1. Normative References | |||
9.1. Normative References . . . . . . . . . . . . . . . . . . 58 | 9.2. Informative References | |||
9.2. Informative References . . . . . . . . . . . . . . . . . 59 | Appendix A. Example Algorithms | |||
Appendix A. Example Algorithms . . . . . . . . . . . . . . . . . 62 | A.1. Example Algorithm to Encode/Decode the AccECN Option | |||
A.1. Example Algorithm to Encode/Decode the AccECN Option . . 62 | ||||
A.2. Example Algorithm for Safety Against Long Sequences of ACK | A.2. Example Algorithm for Safety Against Long Sequences of ACK | |||
Loss . . . . . . . . . . . . . . . . . . . . . . . . . . 63 | Loss | |||
A.2.1. Safety Algorithm without the AccECN Option . . . . . 64 | A.2.1. Safety Algorithm Without the AccECN Option | |||
A.2.2. Safety Algorithm with the AccECN Option . . . . . . . 66 | A.2.2. Safety Algorithm with the AccECN Option | |||
A.3. Example Algorithm to Estimate Marked Bytes from Marked | A.3. Example Algorithm to Estimate Marked Bytes from Marked | |||
Packets . . . . . . . . . . . . . . . . . . . . . . . . . 68 | Packets | |||
A.4. Example Algorithm to Count Not-ECT Bytes . . . . . . . . 68 | A.4. Example Algorithm to Count Not-ECT Bytes | |||
Appendix B. Rationale for Usage of TCP Header Flags . . . . . . 69 | Appendix B. Rationale for Usage of TCP Header Flags | |||
B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake . . . 69 | B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake | |||
B.2. Four Codepoints in the SYN/ACK . . . . . . . . . . . . . 70 | B.2. Four Codepoints in the SYN/ACK | |||
B.3. Space for Future Evolution . . . . . . . . . . . . . . . 70 | B.3. Space for Future Evolution | |||
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 72 | Acknowledgements | |||
Comments Solicited . . . . . . . . . . . . . . . . . . . . . . . 72 | Authors' Addresses | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 73 | ||||
1. Introduction | 1. Introduction | |||
Explicit Congestion Notification (ECN) [RFC3168] is a mechanism where | Explicit Congestion Notification (ECN) [RFC3168] is a mechanism by | |||
network nodes can mark IP packets instead of dropping them to | which network nodes can mark IP packets instead of dropping them to | |||
indicate incipient congestion to the endpoints. Receivers with an | indicate incipient congestion to the endpoints. Receivers with an | |||
ECN-capable transport protocol feed back this information to the | ECN-capable transport protocol feed back this information to the | |||
sender. In RFC 3168, ECN was specified for TCP in such a way that | sender. In RFC 3168, ECN was specified for TCP in such a way that | |||
only one feedback signal could be transmitted per Round-Trip Time | only one feedback signal could be transmitted per Round-Trip Time | |||
(RTT). This is sufficient for congestion control scheme like Reno | (RTT). This is sufficient for congestion control schemes like Reno | |||
[RFC6582] and Cubic [RFC9438], as those schemes reduce their | [RFC6582] and CUBIC [RFC9438], as those schemes reduce their | |||
congestion window by a fixed factor if congestion occurs within an | congestion window by a fixed factor if congestion occurs within an | |||
RTT independent of the number of received congestion markings. | RTT independent of the number of received congestion markings. | |||
Recently, proposed mechanisms like Congestion Exposure (ConEx | Recently, proposed mechanisms like Congestion Exposure (ConEx | |||
[RFC7713]), DCTCP [RFC8257] or L4S [RFC9330] need to know when more | [RFC7713]), DCTCP [RFC8257], and L4S [RFC9330] need to know when more | |||
than one marking is received in one RTT, which is information that | than one marking is received in one RTT, which is information that | |||
cannot be provided by the feedback scheme as specified in [RFC3168]. | cannot be provided by the feedback scheme as specified in [RFC3168]. | |||
This document specifies an update to the ECN feedback scheme of RFC | This document specifies an update to the ECN feedback scheme of RFC | |||
3168 that provides more accurate information and could be used by | 3168 that provides more accurate information and could be used by | |||
these and potentially other future TCP extensions, while still also | these and potentially other future TCP extensions, while still also | |||
supporting the pre-existing TCP congestion controllers that use just | supporting the pre-existing TCP congestion controllers that use just | |||
one feedback signal per round. Congestion control is the term the | one feedback signal per round. Congestion control is the term the | |||
IETF uses to describe data rate management. It is the algorithm that | IETF uses to describe data rate management. It is the algorithm that | |||
a sender uses to optimize its sending rate so that it transmits data | a sender uses to optimize its sending rate so that it transmits data | |||
as fast as the network can carry it, but no faster. A fuller | as fast as the network can carry it, but no faster. A fuller | |||
treatment of the motivation for this specification is given in the | description of the motivation for this specification is given in the | |||
associated requirements document [RFC7560]. | associated requirements document [RFC7560]. | |||
This document specifies a standards track scheme for ECN feedback in | This document specifies a Standards Track scheme for ECN feedback in | |||
the TCP header to provide more than one feedback signal per RTT. It | the TCP header to provide more than one feedback signal per RTT. It | |||
will be called the more Accurate ECN feedback scheme, or AccECN for | is called the more "Accurate ECN" feedback scheme, or AccECN for | |||
short. This document updates RFC 3168 with respect to negotiation | short. This document updates RFC 3168 with respect to negotiation | |||
and use of the feedback scheme for TCP. All aspects of RFC 3168 | and use of the feedback scheme for TCP. All aspects of RFC 3168 | |||
other than the TCP feedback scheme and its negotiation remain | other than the TCP feedback scheme and its negotiation remain | |||
unchanged by this specification. In particular the definition of ECN | unchanged by this specification. In particular, the definition of | |||
at the IP layer is unaffected. Section 4 gives a more detailed | ECN at the IP layer is unaffected. Section 4 details the aspects of | |||
specification of exactly which aspects of RFC 3168 this document | RFC 3168 that are updated by this document. | |||
updates. | ||||
This document uses the term Classic ECN feedback when it needs to | This document uses the term "Classic ECN feedback" when it needs to | |||
distinguish the TCP/ECN feedback scheme defined in [RFC3168] from the | distinguish the TCP/ECN feedback scheme defined in [RFC3168] from the | |||
AccECN TCP feedback scheme. AccECN is intended to offer a complete | AccECN TCP feedback scheme. AccECN is intended to offer a complete | |||
replacement for Classic TCP/ECN feedback, not a fork in the design of | replacement for Classic TCP/ECN feedback, not a fork in the design of | |||
TCP. AccECN feedback complements TCP's loss feedback and it can | TCP. AccECN feedback complements TCP's loss feedback and it can | |||
coexist alongside hosts using Classic TCP/ECN feedback. So its | coexist alongside hosts using Classic TCP/ECN feedback. So its | |||
applicability is intended to include the public Internet as well as | applicability is intended to include the public Internet as well as | |||
private IP network such as data centres (and even any non-IP networks | private IP networks such as data centres (and even any non-IP | |||
over which TCP is used), whether or not any nodes on the path support | networks over which TCP is used), whether or not any nodes on the | |||
ECN, of whatever flavour. | path support ECN, of whatever flavour. | |||
AccECN feedback overloads the two existing ECN flags in the TCP | AccECN feedback overloads the two existing ECN flags in the TCP | |||
header and allocates the currently reserved flag (previously called | header and allocates the currently reserved flag (previously called | |||
NS) in the TCP header, to be used as one three-bit counter field for | NS) in the TCP header to be used as one 3-bit counter field for | |||
feeding back the number of packets marked as congestion experienced | feeding back the number of packets marked as congestion experienced | |||
(CE). Given the new definitions of these three bits, both ends have | (CE). Given the new definitions of these three bits, both ends have | |||
to support the new wire protocol before it can be used. Therefore, | to support the new wire protocol before it can be used. Therefore, | |||
during the TCP handshake, the two ends use these three bits in the | during the TCP handshake, the two ends use these three bits in the | |||
TCP header to negotiate the most advanced feedback protocol that they | TCP header to negotiate the most advanced feedback protocol that they | |||
can both support, in a way that is backward compatible with | can both support, in a way that is backward compatible with | |||
[RFC3168]. | [RFC3168]. | |||
AccECN is solely a change to the TCP wire protocol; it covers the | AccECN is solely a change to the TCP wire protocol; it covers the | |||
negotiation and signaling of more Accurate ECN feedback from a TCP | negotiation and signaling of more Accurate ECN feedback from a TCP | |||
Data Receiver to a Data Sender. It is completely independent of how | Data Receiver to a Data Sender. It is completely independent of how | |||
TCP might respond to congestion feedback, which is out of scope, but | TCP might respond to congestion feedback, which is out of scope, but | |||
ultimately the motivation for Accurate ECN feedback. Like Classic | ultimately the motivation for Accurate ECN feedback. Like Classic | |||
ECN feedback, AccECN can be used by standard Reno or CUBIC congestion | ECN feedback, AccECN can be used by standard Reno or CUBIC congestion | |||
control [RFC5681] [RFC9438] to respond to the existence of at least | control [RFC5681] [RFC9438] to respond to the existence of at least | |||
one congestion notification within a round trip. Or, unlike Reno or | one congestion notification within a round trip. Or, unlike Reno or | |||
CUBIC, AccECN can be used to respond to the extent of congestion | CUBIC, AccECN can be used to respond to the extent of congestion | |||
notification over a round trip, as for example DCTCP does in | notification over a round trip, as for example DCTCP does in | |||
controlled environments [RFC8257]. For congestion response, this | controlled environments [RFC8257]. For congestion response, this | |||
specification refers to the original ECN specificiation adopted in | specification refers to the original ECN specification adopted in | |||
2001 [RFC3168], as updated by the more relaxed rules introduced in | 2001 [RFC3168], as updated by the more relaxed rules introduced in | |||
2018 to allow ECN experiments [RFC8311], namely: a TCP-based Low | 2018 to allow ECN experiments [RFC8311], namely: a TCP-based Low | |||
Latency Low Loss Scalable (L4S) congestion control [RFC9330]; or | Latency Low Loss Scalable (L4S) congestion control [RFC9330]; or | |||
Alternative Backoff with ECN (ABE) [RFC8511]. | Alternative Backoff with ECN (ABE) [RFC8511]. | |||
Section 5.2 explains how AccECN is compatible with current commonly | Section 5.2 explains how AccECN is compatible with current commonly | |||
used TCP options, and a number of current experimental modifications | used TCP options, and a number of current experimental modifications | |||
to TCP, as well as SYN cookies. | to TCP, as well as SYN cookies. | |||
1.1. Document Roadmap | 1.1. Document Roadmap | |||
The following introductory section outlines the goals of AccECN | The following introductory section outlines the goals of AccECN | |||
(Section 1.2). Then, terminology is defined (Section 1.3) and a | (Section 1.2). Then, terminology is defined (Section 1.3) and a | |||
recap of existing prerequisite technology is given (Section 1.4). | recap of existing prerequisite technology is given (Section 1.4). | |||
Section 2 gives an informative overview of the AccECN protocol. Then | Section 2 gives an informative overview of the AccECN protocol. Then | |||
Section 3 gives the normative protocol specification, and Section 3.3 | Section 3 gives the normative protocol specification, and Section 3.3 | |||
collects together requirements for proxies, offload engines and other | collects requirements for proxies, offload engines, and other | |||
middleboxes. Section 4 clarifies which aspects of RFC 3168 are | middleboxes. Section 4 clarifies which aspects of RFC 3168 are | |||
updated by AccECN. Section 5 assesses the interaction of AccECN with | updated by AccECN. Section 5 assesses the interaction of AccECN with | |||
commonly used variants of TCP, whether standardized or not. Then | commonly used variants of TCP, whether they are standardized or not. | |||
Section 6 summarizes the features and properties of AccECN. | Then Section 6 summarizes the features and properties of AccECN. | |||
Section 7 summarizes the protocol fields and numbers that IANA will | Section 7 summarizes the protocol fields and numbers that IANA | |||
need to assign and Section 8 points to the aspects of the protocol | assigned, and Section 8 points to the aspects of the protocol that | |||
that will be of interest to the security community. | will be of interest to the security community. | |||
Appendix A gives pseudocode examples for the various algorithms that | Appendix A gives pseudocode examples for the various algorithms that | |||
AccECN uses and Appendix B explains why AccECN uses flags in the main | AccECN uses, and Appendix B explains why AccECN uses flags in the | |||
TCP header and quantifies the space left for future use. | main TCP header and quantifies the space left for future use. | |||
1.2. Goals | 1.2. Goals | |||
[RFC7560] enumerates requirements that a candidate feedback scheme | [RFC7560] enumerates requirements that a candidate feedback scheme | |||
will need to satisfy, under the headings: resilience, timeliness, | needs to satisfy, under the headings: resilience, timeliness, | |||
integrity, accuracy (including ordering and lack of bias), | integrity, accuracy (including ordering and lack of bias), | |||
complexity, overhead and compatibility (both backward and forward). | complexity, overhead, and compatibility (both backward and forward). | |||
It recognizes that a perfect scheme that fully satisfies all the | It recognizes that a perfect scheme that fully satisfies all the | |||
requirements is unlikely and trade-offs between requirements are | requirements is unlikely and trade-offs between requirements are | |||
likely. Section 6 presents the properties of AccECN against these | likely. Section 6 considers the properties of AccECN against these | |||
requirements and discusses the trade-offs made. | requirements and discusses the trade-offs. | |||
The requirements document recognizes that a protocol as ubiquitous as | The requirements document recognizes that a protocol as ubiquitous as | |||
TCP needs to be able to serve as-yet-unspecified requirements. | TCP needs to be able to serve as-yet-unspecified requirements. | |||
Therefore an AccECN receiver acts as a generic (mechanistic) | Therefore, an AccECN receiver acts as a generic (mechanistic) | |||
reflector of congestion information with the aim that in future new | reflector of congestion information with the aim that new sender | |||
sender behaviours can be deployed unilaterally (see Section 2.5). | behaviours can be deployed unilaterally (see Section 2.5) in the | |||
future. | ||||
1.3. Terminology | 1.3. Terminology | |||
AccECN: The more Accurate ECN feedback scheme will be called AccECN | AccECN: The more Accurate ECN feedback scheme is called AccECN for | |||
for short. | short. | |||
Classic ECN: The ECN protocol specified in [RFC3168]. | Classic ECN: The ECN protocol specified in [RFC3168]. | |||
Classic ECN feedback: The feedback aspect of the ECN protocol | Classic ECN feedback: The feedback aspect of the ECN protocol | |||
specified in [RFC3168], including generation, encoding, | specified in [RFC3168], including generation, encoding, | |||
transmission and decoding of feedback, but not the Data Sender's | transmission and decoding of feedback, but not the Data Sender's | |||
subsequent response to that feedback. | subsequent response to that feedback. | |||
ACK: A TCP acknowledgement, with or without a data payload (ACK=1). | ACK: A TCP acknowledgement, with or without a data payload (ACK=1). | |||
skipping to change at page 7, line 30 | skipping to change at line 308 | |||
data and sends AccECN feedback. | data and sends AccECN feedback. | |||
Data Sender: The endpoint of a TCP half-connection that sends data | Data Sender: The endpoint of a TCP half-connection that sends data | |||
and receives AccECN feedback. | and receives AccECN feedback. | |||
In a mild abuse of terminology, this document sometimes refers to | In a mild abuse of terminology, this document sometimes refers to | |||
'TCP packets' instead of 'TCP segments'. | 'TCP packets' instead of 'TCP segments'. | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in | |||
14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
1.4. Recap of Existing ECN feedback in IP/TCP | 1.4. Recap of Existing ECN Feedback in IP/TCP | |||
Explicit Congestion Notification (ECN) [RFC3168] can be split into | Explicit Congestion Notification (ECN) [RFC3168] can be split into | |||
two parts conceptionally. In the forward direction, alongside the | two parts conceptionally. In the forward direction, alongside the | |||
data stream, it uses a two-bit field in the IP header. This is | data stream, it uses a 2-bit field in the IP header. This is | |||
referred to as IP-ECN later on. This signal carried in the IP (Layer | referred to as IP-ECN later on. This signal carried in the IP (Layer | |||
3) header is exposed to network devices and may be modified when such | 3) header is exposed to network devices and may be modified when such | |||
a device starts to experience congestion (see Table 1). The second | a device starts to experience congestion (see Table 1). The second | |||
part is the feedback mechanism, by which the original data sender is | part is the feedback mechanism, by which the original data sender is | |||
notified of the current congestion state of the intermediate path. | notified of the current congestion state of the intermediate path. | |||
That returned signal is carried in a protocol specific manner, and is | That returned signal is carried in a protocol-specific manner, and is | |||
not to be modified by intermediate network devices. While ECN is in | not to be modified by intermediate network devices. While ECN is in | |||
active use for protocols such as QUIC [RFC9000], SCTP [RFC9260], RTP | active use for protocols such as QUIC [RFC9000], SCTP [RFC9260], RTP | |||
[RFC6679] and Remote Direct Memory Access over Converged Ethernet | [RFC6679], and Remote Direct Memory Access over Converged Ethernet | |||
[RoCEv2], this document only concerns itself with the specific | [RoCEv2], this document only concerns itself with the specific | |||
implementation for the TCP protocol. | implementation for the TCP protocol. | |||
Once ECN has been negotiated for a transport layer connection, the | Once ECN has been negotiated for a transport layer connection, the | |||
Data Sender for either half-connection can set two possible | Data Sender for either half-connection can set two possible | |||
codepoints (ECT(0) or ECT(1)) in the IP header of a data packet to | codepoints (ECT(0) or ECT(1)) in the IP header of a data packet to | |||
indicate an ECN-capable transport (ECT). If the ECN codepoint is | indicate an ECN-capable transport (ECT). If the ECN codepoint is | |||
0b00, the packet is considered to have been sent by a Not ECN-capable | 0b00, the packet is considered to have been sent by a Not ECN-capable | |||
Transport (Not-ECT). When a network node experiences congestion, it | Transport (Not-ECT). When a network node experiences congestion, it | |||
will occasionally either drop or mark a packet, with the choice | will occasionally either drop or mark a packet, with the choice | |||
skipping to change at page 8, line 32 ¶ | skipping to change at line 356 ¶ | |||
+------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| 0b01 | ECT(1) | ECN-Capable Transport (1) | | | 0b01 | ECT(1) | ECN-Capable Transport (1) | | |||
+------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| 0b10 | ECT(0) | ECN-Capable Transport (0) | | | 0b10 | ECT(0) | ECN-Capable Transport (0) | | |||
+------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| 0b11 | CE | Congestion Experienced | | | 0b11 | CE | Congestion Experienced | | |||
+------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
Table 1: The ECN Field in the IP Header | Table 1: The ECN Field in the IP Header | |||
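As an illustration of Table 1, the following minimal sketch extracts the IP-ECN codepoint from the octet that carries it. It assumes the 2-bit ECN field occupies the two least significant bits of the former IPv4 ToS octet (or IPv6 Traffic Class octet), as specified in RFC 3168. The names `IpEcn` and `ip_ecn_from_tos` are illustrative only and do not appear in the specification.

```python
from enum import IntEnum


class IpEcn(IntEnum):
    """The four codepoints of the 2-bit ECN field (names are illustrative)."""
    NOT_ECT = 0b00  # Not ECN-Capable Transport
    ECT_1 = 0b01    # ECN-Capable Transport (1)
    ECT_0 = 0b10    # ECN-Capable Transport (0)
    CE = 0b11       # Congestion Experienced


def ip_ecn_from_tos(tos_octet: int) -> IpEcn:
    """Return the ECN codepoint held in the two low-order bits of the octet."""
    return IpEcn(tos_octet & 0b11)
```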
In the TCP header the first two bits in byte 14 (the TCP header flags | In the TCP header, the first two bits in byte 14 (the TCP header | |||
at bit offsets 8 and 9 labelled Congestion Window Reduced (CWR) and | flags at bit offsets 8 and 9 labelled Congestion Window Reduced (CWR) | |||
Explicit Congestion Notification Echo (ECE) in Figure 1) are defined | and Explicit Congestion Notification Echo (ECE) in Figure 1) are | |||
as flags for the use of Classic ECN [RFC3168]. A TCP Client | defined as flags for the use of Classic ECN [RFC3168]. A TCP Client | |||
indicates that it supports Classic ECN feedback by setting (CWR,ECE) | indicates that it supports Classic ECN feedback by setting (CWR,ECE) | |||
= (1,1) in the SYN, and an ECN-enabled TCP Server confirms Classic | = (1,1) in the SYN, and an ECN-enabled TCP Server confirms Classic | |||
ECN support by setting (CWR,ECE) = (0,1) in the SYN/ACK. On | ECN support by setting (CWR,ECE) = (0,1) in the SYN/ACK. On | |||
reception of a CE-marked packet at the IP layer, the Data Receiver | reception of a CE-marked packet at the IP layer, the Data Receiver | |||
for that half-connection starts to set the Echo Congestion | for that half-connection starts to set the Echo Congestion | |||
Experienced (ECE) flag continuously in the TCP header of ACKs, which | Experienced (ECE) flag continuously in the TCP header of ACKs, which | |||
gives the signal resilience to loss or reordering of ACKs. The Data | gives the signal resilience to loss or reordering of ACKs. The Data | |||
Sender for the same half-connection confirms that it has received at | Sender for the same half-connection confirms that it has received at | |||
least one ECE signal by responding with the congestion window reduced | least one ECE signal by responding with the CWR flag, which allows | |||
(CWR) flag, which allows the Data Receiver to stop repeating the ECN- | the Data Receiver to stop repeating the ECN-Echo flag. This always | |||
Echo flag. This always leads to a full RTT of ACKs with ECE set. | leads to a full RTT of ACKs with ECE set. Thus Classic ECN cannot | |||
Thus Classic ECN cannot feed back any additional CE markings arriving | feed back any additional CE markings arriving within this RTT. | |||
within this RTT. | ||||
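The limitation described above can be sketched as a small receiver-side latch. This is a simplified model under stated assumptions, not the normative RFC 3168 state machine: the class name and method names are hypothetical, and only the ECE/CWR interaction is modelled.

```python
class ClassicEcnReceiver:
    """Receiver-side ECE latch for one half-connection (Classic ECN sketch)."""

    def __init__(self) -> None:
        self.echoing = False  # True while outgoing ACKs must carry ECE

    def on_data_packet(self, ce_marked: bool, cwr_flag: bool) -> None:
        # A CWR flag from the Data Sender confirms it saw ECE: stop echoing.
        if cwr_flag:
            self.echoing = False
        # A CE mark (re)starts the echo, even on the packet that carries CWR.
        if ce_marked:
            self.echoing = True

    def ece_for_next_ack(self) -> bool:
        return self.echoing


rx = ClassicEcnReceiver()
ece_seen = []
# Three CE marks arrive back to back, then an unmarked packet carrying CWR.
for ce, cwr in [(True, False), (True, False), (True, False), (False, True)]:
    rx.on_data_packet(ce, cwr)
    ece_seen.append(rx.ece_for_next_ack())
```

In this model the ACK stream carries the same ECE signal whether one or three CE marks arrived before the CWR, which is why the Data Sender cannot count marks within the RTT.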
The last bit in byte 13 of the TCP header (the TCP header flag at bit | The last bit in byte 13 of the TCP header (the TCP header flag at bit | |||
offset 7 in Figure 1) was defined as the Nonce Sum (NS) for the ECN | offset 7 in Figure 1) was defined as the Nonce Sum (NS) for the ECN- | |||
Nonce [RFC3540]. In the absence of widespread deployment RFC 3540 | nonce [RFC3540]. In the absence of widespread deployment, RFC 3540 | |||
has been reclassified as historic [RFC8311] and the respective flag | was reclassified as Historic [RFC8311] and the respective flag was | |||
has been marked as "reserved", making this TCP flag available for use | marked as "Reserved", which made this TCP flag available for use by | |||
by AccECN instead. | AccECN instead. | |||
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| | | N | C | E | U | A | P | R | S | F | | | | | N | C | E | U | A | P | R | S | F | | |||
| Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | | Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | |||
| | | | R | E | G | K | H | T | N | N | | | | | | R | E | G | K | H | T | N | N | | |||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
Figure 1: TCP header flags as defined before the Nonce Sum flag | Figure 1: TCP Header Flags as Defined Before the Nonce Sum Flag | |||
reverted to Reserved | Reverted to Reserved | |||
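The bit offsets in Figure 1 can be read directly from a raw TCP header. The sketch below assumes the 16 bits shown in the figure are the two octets at zero-based offsets 12 and 13 of the TCP header (the 13th and 14th octets), in network byte order. The function name `ecn_flags` is hypothetical.

```python
import struct


def ecn_flags(tcp_header: bytes) -> tuple[int, int, int]:
    """Return (bit 7, CWR, ECE) from a raw TCP header.

    Bit 7 is the flag formerly called NS. Bit offsets are counted from
    the left of the 16-bit word shown in Figure 1.
    """
    (word,) = struct.unpack_from("!H", tcp_header, 12)
    bit7 = (word >> 8) & 1  # bit offset 7
    cwr = (word >> 7) & 1   # bit offset 8
    ece = (word >> 6) & 1   # bit offset 9
    return bit7, cwr, ece
```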
2. AccECN Protocol Overview and Rationale | 2. AccECN Protocol Overview and Rationale | |||
This section provides an informative overview of the AccECN protocol | This section provides an informative overview of the AccECN protocol | |||
that will be normatively specified in Section 3 | that is normatively specified in Section 3. | |||
Like the general TCP approach, the Data Receiver of each TCP half- | Like the general TCP approach, the Data Receiver of each TCP half- | |||
connection sends AccECN feedback to the Data Sender on TCP | connection sends AccECN feedback to the Data Sender on TCP | |||
acknowledgements, reusing data packets of the other half-connection | acknowledgements, reusing data packets of the other half-connection | |||
whenever possible. | whenever possible. | |||
The AccECN protocol has had to be designed in two parts: | The AccECN protocol has had to be designed in two parts: | |||
* an essential feedback part that re-uses the TCP-ECN header bits | * an essential feedback part that reuses the TCP-ECN header bits for | |||
for the Data Receiver to feed back the number of packets arriving | the Data Receiver to feed back the number of packets arriving with | |||
with CE in the IP-ECN field. This provides more accuracy than | CE in the IP-ECN field. This provides more accuracy than Classic | |||
Classic ECN feedback, but limited resilience against ACK loss; | ECN feedback, but limited resilience against ACK loss; | |||
* a supplementary feedback part using one of two new alternative | * a supplementary feedback part using one of two new alternative | |||
AccECN TCP options that provide additional feedback on the number | AccECN TCP options that provide additional feedback on the number | |||
of payload bytes that arrive marked with each of the three ECN | of payload bytes that arrive marked with each of the three ECN | |||
codepoints in the IP-ECN field (not just CE marks). See the BCP | codepoints in the IP-ECN field (not just CE marks). See the BCP | |||
on Byte and Packet Congestion Notification [RFC7141] for the | on Byte and Packet Congestion Notification [RFC7141] for the | |||
rationale determining that conveying congested payload bytes | rationale determining that conveying congested payload bytes | |||
should be preferred over just providing feedback about congested | should be preferred over just providing feedback about congested | |||
packets. This also provides greater resilience against ACK loss | packets. This also provides greater resilience against ACK loss | |||
than the essential feedback, but it is currently more likely to | than the essential feedback, but it is currently more likely to | |||
suffer from middlebox interference. | suffer from middlebox interference. | |||
The two-part design was necessary, given limitations on the space | The two-part design was necessary, given limitations on the space | |||
available for TCP options and given the possibility that certain | available for TCP options and given the possibility that certain | |||
incorrectly designed middleboxes might prevent TCP using any new | incorrectly designed middleboxes might prevent TCP from using any new | |||
options. | options. | |||
The essential feedback part overloads the previous definition of the | The essential feedback part overloads the previous definition of the | |||
three flags in the TCP header that had been assigned for use by | three flags in the TCP header that had been assigned for use by | |||
Classic ECN. This design choice deliberately allows AccECN peers to | Classic ECN. This design choice deliberately allows AccECN peers to | |||
replace the Classic ECN feedback protocol, rather than leaving | replace the Classic ECN feedback protocol, rather than leaving | |||
Classic ECN feedback intact and adding more accurate feedback | Classic ECN feedback intact and adding more accurate feedback | |||
separately because: | separately because: | |||
* this efficiently reuses scarce TCP header space, given TCP option | * this efficiently reuses scarce TCP header space, given TCP option | |||
space is approaching saturation; | space is approaching saturation; | |||
* a single upgrade path for the TCP protocol is preferable to a fork | * a single upgrade path for the TCP protocol is preferable to a fork | |||
in the design which modifies the TCP header to convey all ECN | in the design that modifies the TCP header to convey all ECN | |||
feedback; | feedback; | |||
* otherwise Classic and Accurate ECN feedback could give conflicting | * otherwise, Classic and Accurate ECN feedback could give | |||
feedback about the same segment, which could open up new security | conflicting feedback about the same segment, which could open up | |||
concerns and make implementations unnecessarily complex; | new security concerns and make implementations unnecessarily | |||
complex; | ||||
* middleboxes are more likely to faithfully forward the TCP ECN | * middleboxes are more likely to faithfully forward the TCP ECN | |||
flags than newly defined areas of the TCP header. | flags than newly defined areas of the TCP header. | |||
AccECN is designed to work even if the supplementary feedback part is | AccECN is designed to work even if the supplementary feedback part is | |||
removed or zeroed out, as long as the essential feedback part gets | removed or zeroed out, as long as the essential feedback part gets | |||
through. | through. | |||
2.1. Capability Negotiation | 2.1. Capability Negotiation | |||
AccECN is a change to the wire protocol of the main TCP header, | AccECN changes the wire protocol of the main TCP header; therefore, | |||
therefore it can only be used if both endpoints have been upgraded to | it can only be used if both endpoints have been upgraded to | |||
understand it. The TCP Client signals support for AccECN on the | understand it. The TCP Client signals support for AccECN on the | |||
initial SYN of a connection and the TCP Server signals whether it | initial SYN of a connection, and the TCP Server signals whether it | |||
supports AccECN on the SYN/ACK. The TCP flags on the SYN that the | supports AccECN on the SYN/ACK. The TCP flags on the SYN that the | |||
TCP Client uses to signal AccECN support have been carefully chosen | TCP Client uses to signal AccECN support have been carefully chosen | |||
so that a TCP Server will interpret them as a request to support the | so that a TCP Server will interpret them as a request to support the | |||
most recent variant of ECN feedback that it supports. Then the TCP | most recent variant of ECN feedback that it supports. Then the TCP | |||
Client falls back to the same variant of ECN feedback. | Client falls back to the same variant of ECN feedback. | |||
An AccECN TCP Client does not send an AccECN Option on the SYN as SYN | An AccECN TCP Client does not send an AccECN Option on the SYN as SYN | |||
option space is limited. The TCP Server sends an AccECN Option on | option space is limited. The TCP Server sends an AccECN Option on | |||
the SYN/ACK and the TCP Client sends one on the first ACK to test | the SYN/ACK, and the TCP Client sends one on the first ACK to test | |||
whether the network path forwards these options correctly. | whether the network path forwards these options correctly. | |||
2.2. Feedback Mechanism | 2.2. Feedback Mechanism | |||
A Data Receiver maintains four counters initialized at the start of | A Data Receiver maintains four counters initialized at the start of | |||
the half-connection. Three count the number of arriving payload | the half-connection. Three count the number of arriving payload | |||
bytes marked CE, ECT(1) and ECT(0) in the IP-ECN field. These byte | bytes marked CE, ECT(1), and ECT(0) in the IP-ECN field. These byte | |||
counters reflect only the TCP payload length, excluding the TCP | counters reflect only the TCP payload length, excluding the TCP | |||
header and TCP options. The fourth counter counts the number of | header and TCP options. The fourth counter counts the number of | |||
packets arriving marked with a CE codepoint (including control | packets arriving marked with a CE codepoint (including control | |||
packets without payload if they are CE-marked). | packets without payload if they are CE-marked). | |||
The Data Sender maintains four equivalent counters for the half- | The Data Sender maintains four equivalent counters for the half- | |||
connection, and the AccECN protocol is designed to ensure they will | connection, and the AccECN protocol is designed to ensure they will | |||
match the values in the Data Receiver's counters, albeit after a | match the values in the Data Receiver's counters, albeit after a | |||
little delay. | little delay. | |||
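The four Data Receiver counters described above could be maintained roughly as follows. This is a sketch only: the class and field names are hypothetical, and the counters start at zero purely for illustration, because this overview does not state the initial values (they are set normatively elsewhere in the specification).

```python
from dataclasses import dataclass

NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


@dataclass
class ReceiverCounters:
    """Per-half-connection counters (initial values here are an assumption)."""
    ce_packets: int = 0  # packets marked CE, including payload-free ones
    ce_bytes: int = 0    # TCP payload bytes marked CE
    ect0_bytes: int = 0  # TCP payload bytes marked ECT(0)
    ect1_bytes: int = 0  # TCP payload bytes marked ECT(1)

    def on_packet(self, ip_ecn: int, payload_len: int) -> None:
        """Account for one arriving packet; payload_len excludes headers."""
        if ip_ecn == CE:
            self.ce_packets += 1
            self.ce_bytes += payload_len
        elif ip_ecn == ECT_0:
            self.ect0_bytes += payload_len
        elif ip_ecn == ECT_1:
            self.ect1_bytes += payload_len
        # Not-ECT packets change none of the four counters.
```

A CE-marked pure ACK (payload length 0) increments only the packet counter, which matches the distinction drawn between the packet counter and the byte counters.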
Each ACK carries the three least significant bits (LSBs) of the | Each ACK carries the three least significant bits (LSBs) of the | |||
packet-based CE counter using the ECN bits in the TCP header, now | packet-based CE counter using the ECN bits in the TCP header, now | |||
renamed the Accurate ECN (ACE) field (see Figure 3 later). The 24 | renamed the Accurate ECN (ACE) field (see Figure 3). The 24 LSBs of | |||
LSBs of some or all of the byte counters can be optionally carried in | some or all of the byte counters can be optionally carried in an | |||
an AccECN Option. For efficient use of limited option space, two | AccECN Option. For efficient use of limited option space, two | |||
alternative forms of AccECN Option are specified with the fields in | alternative forms of the AccECN Option are specified with the fields | |||
the opposite order to each other. | in the opposite order to each other. | |||
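Truncating a counter to its least significant bits is a simple mask. The sketch below shows the 3-bit and 24-bit truncations named above; the constant and function names are hypothetical, and the layout of the fields on the wire is not modelled.

```python
ACE_BITS = 3            # width of the ACE field in the TCP header
OPTION_FIELD_BITS = 24  # width of each byte-counter field in an AccECN Option


def ace_field(ce_packet_counter: int) -> int:
    """Three LSBs of the packet-based CE counter."""
    return ce_packet_counter & ((1 << ACE_BITS) - 1)


def option_field(byte_counter: int) -> int:
    """24 LSBs of one of the byte counters."""
    return byte_counter & ((1 << OPTION_FIELD_BITS) - 1)
```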
2.3. Delayed ACKs and Resilience Against ACK Loss | 2.3. Delayed ACKs and Resilience Against ACK Loss | |||
With both the ACE and the AccECN Option mechanisms, the Data Receiver | With both the ACE and the AccECN Option mechanisms, the Data Receiver | |||
continually repeats the current LSBs of each of its respective | continually repeats the current LSBs of each of its respective | |||
counters. There is no need to acknowledge these continually repeated | counters. There is no need to acknowledge these continually repeated | |||
counters, so the congestion window reduced (CWR) mechanism of | counters, so the Congestion Window Reduced (CWR) mechanism of | |||
[RFC3168] is no longer used. Even if some ACKs are lost, the Data | [RFC3168] is no longer used. Even if some ACKs are lost, the Data | |||
Sender ought to be able to infer how much to increment its own | Sender ought to be able to infer how much to increment its own | |||
counters, even if the protocol field has wrapped. | counters, even if the protocol field has wrapped. | |||
The 3-bit ACE field can wrap fairly frequently. Therefore, even if | The 3-bit ACE field can wrap fairly frequently. Therefore, even if | |||
it appears to have incremented by one (say), the field might have | it appears to have incremented by one (say), the field might have | |||
actually cycled completely then incremented by one. The Data | actually cycled completely and then incremented by one. The Data | |||
Receiver is not allowed to delay sending an ACK to such an extent | Receiver is not allowed to delay sending an ACK to such an extent | |||
that the ACE field would cycle. However ACKs received at the Data | that the ACE field would cycle. However, ACKs received at the Data | |||
Sender could still cycle because a whole sequence of ACKs carrying | Sender could still cycle because a whole sequence of ACKs carrying | |||
intervening values of the field might all be lost or delayed in | intervening values of the field might all be lost or delayed in | |||
transit. | transit. | |||
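The ambiguity caused by wrapping of the 3-bit field can be made concrete. In the sketch below, the Data Sender knows only the previous and the new ACE value, so every candidate increase differs by a multiple of 8. Bounding the candidates by the number of newly acknowledged packets is an assumption made here for illustration; the function names are hypothetical.

```python
ACE_MODULUS = 8  # a 3-bit field wraps after 8 increments


def ace_delta(prev_ace: int, new_ace: int) -> int:
    """Smallest non-negative increase consistent with the two field values."""
    return (new_ace - prev_ace) % ACE_MODULUS


def possible_increments(prev_ace: int, new_ace: int,
                        newly_acked_packets: int) -> list[int]:
    """All increases consistent with the field, capped by packets acked."""
    smallest = ace_delta(prev_ace, new_ace)
    return list(range(smallest, newly_acked_packets + 1, ACE_MODULUS))
```

For example, if the field moves from 3 to 4 while an ACK newly covers 20 packets, the true increase could be 1, 9, or 17; if the ACK covers only 5 packets, the increase is unambiguous.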
The fields in an AccECN Option are larger, but they will increment in | The fields in an AccECN Option are larger, but they will increment in | |||
larger steps because they count bytes not packets. Nonetheless, | larger steps because they count bytes not packets. Nonetheless, | |||
their size has been chosen such that a whole cycle of the field would | their size has been chosen such that a whole cycle of the field would | |||
never occur between ACKs unless there had been an infeasibly long | never occur between ACKs unless there has been an infeasibly long | |||
sequence of ACK losses. Therefore, provided that an AccECN Option is | sequence of ACK losses. Therefore, provided that an AccECN Option is | |||
available, it can be treated as a dependable feedback channel. | available, it can be treated as a dependable feedback channel. | |||
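The same modular arithmetic applies to the 24-bit byte-counter fields, but a full cycle needs far more traffic. The sketch below computes the wrap-safe increase and, as a rough illustration, how many segments of a given payload size would have to arrive with the same marking between two received ACKs before a field cycles. The 1460-byte payload used in the usage note is an assumption chosen for illustration only.

```python
FIELD_MODULUS = 1 << 24  # a 24-bit field wraps after 16,777,216 bytes


def byte_delta(prev_field: int, new_field: int) -> int:
    """Increase in a byte counter, assuming less than one full cycle."""
    return (new_field - prev_field) % FIELD_MODULUS


def segments_to_wrap(payload_bytes: int) -> int:
    """Identically marked segments needed to advance a field by a full cycle."""
    return -(-FIELD_MODULUS // payload_bytes)  # integer ceiling division
```

With 1460-byte payloads, `segments_to_wrap(1460)` gives 11492 segments, all carrying the same marking and all falling between two ACKs that reach the Data Sender.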
If an AccECN Option is not available, e.g., it is being stripped by a | If an AccECN Option is not available, e.g., it is being stripped by a | |||
middlebox, the AccECN protocol will only feed back information on CE | middlebox, the AccECN protocol will only feed back information on CE | |||
markings (using the ACE field). Although not ideal, this will be | markings (using the ACE field). Although not ideal, this will be | |||
sufficient, because it is envisaged that neither ECT(0) nor ECT(1) | sufficient, because it is envisaged that neither ECT(0) nor ECT(1) | |||
will ever indicate more severe congestion than CE, even though future | will ever indicate more severe congestion than CE, even though future | |||
uses for ECT(0) or ECT(1) are still unclear [RFC8311]. Because the | uses for ECT(0) or ECT(1) are still unclear [RFC8311]. Because the | |||
3-bit ACE field is so small, when it is the only field available, the | 3-bit ACE field is so small, when it is the only field available, the | |||
skipping to change at page 12, line 26 ¶ | skipping to change at line 536 ¶ | |||
AccECN Option on an ACK. The rules are designed to ensure that the | AccECN Option on an ACK. The rules are designed to ensure that the | |||
order in which different markings arrive at the receiver is | order in which different markings arrive at the receiver is | |||
communicated to the sender (as long as options are reaching the | communicated to the sender (as long as options are reaching the | |||
sender and as long as there is no ACK loss). Implementations are | sender and as long as there is no ACK loss). Implementations are | |||
encouraged to send an AccECN Option more frequently, but this is left | encouraged to send an AccECN Option more frequently, but this is left | |||
up to the implementer. | up to the implementer. | |||
2.4. Feedback Metrics | 2.4. Feedback Metrics | |||
The CE packet counter in the ACE field and the CE byte counter in | The CE packet counter in the ACE field and the CE byte counter in | |||
AccECN Options both provide feedback on received CE-marks. The CE | AccECN Options both provide feedback on received CE marks. The CE | |||
packet counter includes control packets that do not have payload | packet counter includes control packets that do not have payload | |||
data, while the CE byte counter solely includes marked payload bytes. | data, while the CE byte counter solely includes marked payload bytes. | |||
If both are present, the byte counter in an AccECN Option will | If both are present, the byte counter in an AccECN Option will | |||
provide the more accurate information needed for modern congestion | provide the more accurate information needed for modern congestion | |||
control and policing schemes, such as L4S, DCTCP or ConEx. If AccECN | control and policing schemes, such as L4S, DCTCP, or ConEx. If | |||
Options are stripped, a simple algorithm to estimate the number of | AccECN Options are stripped, a simple algorithm to estimate the | |||
marked bytes from the ACE field is given in Appendix A.3. | number of marked bytes from the ACE field is given in Appendix A.3. | |||
The AccECN design has been generalized so that it ought to be able to | The AccECN design has been generalized so that it ought to be able to | |||
support possible future uses of the experimental ECT(1) codepoint | support possible future uses of the experimental ECT(1) codepoint | |||
other than the L4S experiment [RFC9330], such as a lower severity or | other than the L4S experiment [RFC9330], such as a lower severity or | |||
a more instant congestion signal than CE. | a more instant congestion signal than CE. | |||
Feedback in bytes is provided to protect against the receiver or a | Feedback in bytes is provided to protect against the receiver or a | |||
middlebox using attacks similar to 'ACK-Division' to artificially | middlebox using attacks similar to 'ACK-Division' to artificially | |||
inflate the congestion window, which is why [RFC5681] now recommends | inflate the congestion window, which is why [RFC5681] now recommends | |||
that TCP counts acknowledged bytes not packets. | that TCP counts acknowledged bytes not packets. | |||
2.5. Generic (Mechanistic) Reflector | 2.5. Generic (Mechanistic) Reflector | |||
The ACE field provides feedback about CE markings in the IP-ECN field | The ACE field provides feedback about CE markings in the IP-ECN field | |||
of both data and control packets. According to [RFC3168] the Data | of both data and control packets. According to [RFC3168], the Data | |||
Sender is meant to set the IP-ECN field of control packets to Not- | Sender is meant to set the IP-ECN field of control packets to Not- | |||
ECT. However, mechanisms in certain private networks (e.g., data | ECT. However, mechanisms in certain private networks (e.g., data | |||
centres) set control packets to be ECN capable because they are | centres) set control packets to be ECN-capable because they are | |||
precisely the packets that performance depends on most. | precisely the packets that performance depends on most. | |||
For this reason, AccECN is designed to be a generic reflector of | For this reason, AccECN is designed to be a generic reflector of | |||
whatever ECN markings it sees, whether or not they are compliant with | whatever ECN markings it sees, whether or not they are compliant with | |||
a current standard. Then as standards evolve, Data Senders can | a current standard. Then as standards evolve, Data Senders can | |||
upgrade unilaterally without any need for receivers to upgrade too. | upgrade unilaterally without any need for receivers to upgrade too. | |||
It is also useful to be able to rely on generic reflection behaviour | It is also useful to be able to rely on generic reflection behaviour | |||
when senders need to test for unexpected interference with markings | when senders need to test for unexpected interference with markings | |||
(for instance Section 3.2.2.3, Section 3.2.2.4 and Section 3.2.3.2 of | (for instance Sections 3.2.2.3, 3.2.2.4, and 3.2.3.2 of the present | |||
the present document and paragraph 2 of Section 20.2 of [RFC3168]). | document and paragraph 2 of Section 20.2 of [RFC3168]). | |||
The initial SYN and SYN/ACK are the most critical control packets, so | The initial SYN and SYN/ACK are the most critical control packets, so | |||
AccECN feeds back their IP-ECN fields. Although RFC 3168 prohibits | AccECN feeds back their IP-ECN fields. Although RFC 3168 prohibits | |||
ECN-capable SYNs and SYN/ACKs, providing feedback of ECN marking on | ECN-capable SYNs and SYN/ACKs, providing feedback of ECN marking on | |||
the SYN and SYN/ACK supports future scenarios in which SYNs might be | the SYN and SYN/ACK supports future scenarios in which SYNs might be | |||
ECN-enabled (without prejudging whether they ought to be). For | ECN-enabled (without prejudging whether they ought to be). For | |||
instance, [RFC8311] updates this aspect of RFC 3168 to allow | instance, [RFC8311] updates this aspect of RFC 3168 to allow | |||
experimentation with ECN-capable TCP control packets. | experimentation with ECN-capable TCP control packets. | |||
Even if the TCP Client (or Server) has set the SYN (or SYN/ACK) to | Even if the TCP Client (or Server) has set the SYN (or SYN/ACK) to | |||
not-ECT in compliance with RFC 3168, feedback on the state of the IP- | Not-ECT in compliance with RFC 3168, feedback on the state of the IP- | |||
ECN field when it arrives at the receiver could still be useful, | ECN field when it arrives at the receiver could still be useful, | |||
because middleboxes have been known to overwrite the IP-ECN field as | because middleboxes have been known to overwrite the IP-ECN field as | |||
if it is still part of the old Type of Service (ToS) field | if it is still part of the old Type of Service (ToS) field | |||
[Mandalari18]. For example, if a TCP Client has set the SYN to Not- | [Mandalari18]. For example, if a TCP Client has set the SYN to Not- | |||
ECT, but receives feedback that the IP-ECN field on the SYN arrived | ECT, but receives feedback that the IP-ECN field on the SYN arrived | |||
with a different codepoint, it can detect such middlebox | with a different codepoint, it can detect such middlebox | |||
interference. Previously, neither end knew what IP-ECN field the | interference. Previously, neither end knew what IP-ECN field the | |||
other had sent. So, if a TCP Server received ECT or CE on a SYN, it | other sent. So, if a TCP Server received ECT or CE on a SYN, it | |||
could not know whether it was invalid because only the TCP Client | could not know whether it was invalid because only the TCP Client | |||
knew whether it originally marked the SYN as Not-ECT (or ECT). | knew whether it originally marked the SYN as Not-ECT (or ECT). | |||
Therefore, prior to AccECN, the Server's only safe course of action | Therefore, prior to AccECN, the Server's only safe course of action | |||
in this example was to disable ECN for the connection. Instead, the | in this example was to disable ECN for the connection. Instead, the | |||
AccECN protocol allows the Server and Client to feed back the ECN | AccECN protocol allows the Server and Client to feed back the ECN | |||
field received on the SYN and SYN/ACK to their peer, which then has | field received on the SYN and SYN/ACK to their peer, which now has | |||
all the information to decide whether the connection has to fall-back | all the information to decide whether the connection has to fall back | |||
from supporting ECN (or not). | from supporting ECN (or not). | |||
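The check that this feedback enables can be sketched as a comparison between the codepoint an endpoint sent on its SYN (or SYN/ACK) and the codepoint its peer reports having received. The classification below is illustrative only: the function name and the three labels are hypothetical, and the decision about whether to fall back is left to the normative sections.

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def check_syn_feedback(sent: int, fed_back: int) -> str:
    """Compare the IP-ECN codepoint sent with the one the peer reports."""
    if fed_back == sent:
        return "unchanged"
    # An ECN-capable packet may legitimately be marked CE by the network.
    if sent in (ECT_0, ECT_1) and fed_back == CE:
        return "congestion-marked"
    # Any other change, e.g. Not-ECT arriving as ECT or CE, suggests that
    # something on the path rewrote the field.
    return "interference"
```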
3. AccECN Protocol Specification | 3. AccECN Protocol Specification | |||
3.1. Negotiating to use AccECN | 3.1. Negotiating to Use AccECN | |||
3.1.1. Negotiation during the TCP three-way handshake | 3.1.1. Negotiation During the TCP Three-Way Handshake | |||
Given the ECN Nonce [RFC3540] has been reclassified as historic | Given the ECN-nonce [RFC3540] has been reclassified as Historic | |||
[RFC8311], the TCP flag that was previously called NS (Nonce Sum) is | [RFC8311], the TCP flag that was previously called NS (Nonce Sum) is | |||
renamed as the AE (Accurate ECN) flag (the TCP header flag at bit | renamed as the AE (Accurate ECN) flag (the TCP header flag at bit | |||
offset 7 in Figure 2). See the IANA Considerations in Section 7. | offset 7 in Figure 2). See the IANA Considerations in Section 7. | |||
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| | | A | C | E | U | A | P | R | S | F | | | | | A | C | E | U | A | P | R | S | F | | |||
| Header Length | Reserved | E | W | C | R | C | S | S | Y | I | | | Header Length | Reserved | E | W | C | R | C | S | S | Y | I | | |||
| | | | R | E | G | K | H | T | N | N | | | | | | R | E | G | K | H | T | N | N | | |||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
Figure 2: The new definition of the TCP header flags during the | Figure 2: The New Definition of the TCP Header Flags During the | |||
TCP three-way handshake | TCP Three-Way Handshake | |||
During the TCP three-way handshake at the start of a connection, to | During the TCP three-way handshake at the start of a connection, to | |||
request more Accurate ECN feedback the TCP Client (host A) MUST set | request more Accurate ECN feedback the TCP Client (host A) MUST set | |||
the TCP flags (AE,CWR,ECE) = (1,1,1) in the initial SYN segment. | the TCP flags (AE,CWR,ECE) = (1,1,1) in the initial SYN segment. | |||
If a TCP Server (host B) that is AccECN-enabled receives a SYN with | If a TCP Server (host B) that is AccECN-enabled receives a SYN with | |||
the above three flags set, it MUST set both its half-connections into | the above three flags set, it MUST set both its half-connections into | |||
AccECN mode. Then it MUST set the AE, CWR and ECE TCP flags on the | AccECN mode. Then it MUST set the AE, CWR, and ECE TCP flags on the | |||
SYN/ACK to the combination in the top block of Table 2 that feeds | SYN/ACK to the combination in the top block of Table 2 that feeds | |||
back the IP-ECN field that arrived on the SYN. This applies whether | back the IP-ECN field that arrived on the SYN. This applies whether | |||
or not the Server itself supports setting the IP-ECN field on a SYN | or not the Server itself supports setting the IP-ECN field on a SYN | |||
or SYN/ACK (see Section 2.5 for rationale). | or SYN/ACK (see Section 2.5 for rationale). | |||
When the TCP Server returns any of the 4 combinations in the top | When the TCP Server returns any of the four combinations in the top | |||
block of Table 2, it confirms that it supports AccECN. The TCP | block of Table 2, it confirms that it supports AccECN. The TCP | |||
Server MUST NOT set one of these 4 combination of flags on the SYN/ | Server MUST NOT set one of these four combinations of flags on the | |||
ACK unless the preceding SYN requested support for AccECN as above. | SYN/ACK unless the preceding SYN requested support for AccECN as | |||
above. | ||||
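As an illustration only, the Server-side choice of SYN/ACK flags described above might be sketched as follows. The names IpEcn, SYNACK_FLAGS, and synack_flags_for are hypothetical and do not come from the specification. The IP-ECN codepoint values are those of [RFC3168], and the flag combinations are those in the top block of Table 2.

```python
from enum import IntEnum


class IpEcn(IntEnum):
    """IP-ECN field codepoints, as defined in RFC 3168."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


# (AE, CWR, ECE) combinations from the top block of Table 2.
# The dictionary name is an assumption made for this sketch.
SYNACK_FLAGS = {
    IpEcn.NOT_ECT: (0, 1, 0),
    IpEcn.ECT1: (0, 1, 1),
    IpEcn.ECT0: (1, 0, 0),
    IpEcn.CE: (1, 1, 0),
}


def synack_flags_for(syn_flags, syn_ip_ecn):
    """Return the SYN/ACK (AE, CWR, ECE) flags for an AccECN-enabled Server.

    Returns None when the SYN did not request AccECN with (1,1,1), because
    the four AccECN combinations must not be used in that case.
    """
    if tuple(syn_flags) != (1, 1, 1):
        return None
    return SYNACK_FLAGS[IpEcn(syn_ip_ecn)]
```

For example, a SYN carrying (1,1,1) that arrived CE-marked would be answered with (1,1,0), whether or not the Server itself sets the IP-ECN field on its own SYN/ACK.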
Once a TCP Client (A) has sent the above SYN to declare that it | Once a TCP Client (A) has sent the above SYN to declare that it | |||
supports AccECN, and once it has received the above SYN/ACK segment | supports AccECN, and once it has received the above SYN/ACK segment | |||
that confirms that the TCP Server supports AccECN, the TCP Client | that confirms that the TCP Server supports AccECN, the TCP Client | |||
MUST set both its half connections into AccECN mode. The TCP Client | MUST set both its half connections into AccECN mode. The TCP Client | |||
MUST NOT enter AccECN mode (or any feedback mode) before it has | MUST NOT enter AccECN mode (or any feedback mode) before it has | |||
received the first SYN/ACK. | received the first SYN/ACK. | |||
Once in AccECN mode, a TCP Client or Server has the rights and | Once in AccECN mode, a TCP Client or Server has the rights and | |||
obligations to participate in the ECN protocol defined in | obligations to participate in the ECN protocol defined in | |||
Section 3.1.5. | Section 3.1.5. | |||
The procedures to follow for retransmission of SYNs or SYN/ACKs are | The procedures for retransmission of SYNs or SYN/ACKs are given in | |||
given in Section 3.1.4. | Section 3.1.4. | |||
It is RECOMMENDED that the AccECN protocol is implemented alongside | It is RECOMMENDED that the AccECN protocol be implemented alongside | |||
Selective Acknowledgement (SACK) [RFC2018]. If SACK is implemented | Selective Acknowledgement (SACK) [RFC2018]. If SACK is implemented | |||
with AccECN, Duplicate Selective Acknowledgement (D-SACK) [RFC2883] | with AccECN, Duplicate Selective Acknowledgement (D-SACK) [RFC2883] | |||
MUST also be implemented. | MUST also be implemented. | |||
3.1.2. Backward Compatibility | 3.1.2. Backward Compatibility | |||
The three flags set to 1 to indicate AccECN support on the SYN have | The three flags that are set to 1 to indicate AccECN support on the | |||
been carefully chosen to enable natural fall-back to prior stages in | SYN have been carefully chosen to enable natural fall-back to prior | |||
the evolution of ECN. Table 2 tabulates all the negotiation | stages in the evolution of ECN. Table 2 tabulates all the | |||
possibilities for ECN-related capabilities that involve at least one | negotiation possibilities for ECN-related capabilities that involve | |||
AccECN-capable host. The entries in the first two columns have been | at least one AccECN-capable host. The entries in the first two | |||
abbreviated, as follows: | columns have been abbreviated, as follows: | |||
AccECN: Supports more Accurate ECN Feedback (the present | AccECN: Supports more Accurate ECN feedback (the present | |||
specification) | specification) | |||
Nonce: Supports ECN Nonce feedback [RFC3540] | Nonce: Supports ECN-nonce feedback [RFC3540] | |||
ECN: Supports 'Classic' ECN feedback [RFC3168] | ECN: Supports 'Classic' ECN feedback [RFC3168] | |||
No ECN: Not ECN-capable. Implicit congestion notification using | No ECN: Not ECN-capable. Implicit congestion notification using | |||
packet drop. | packet drop. | |||
+========+========+============+============+======================+ | +========+========+============+============+======================+ | |||
| Host A | Host B | SYN | SYN/ACK | Feedback Mode | | | Host A | Host B | SYN | SYN/ACK | Feedback Mode | | |||
| | | A->B | B->A | of Host A | | | | | A->B | B->A | of Host A | | |||
| | | AE CWR ECE | AE CWR ECE | | | | | | AE CWR ECE | AE CWR ECE | | | |||
+========+========+============+============+======================+ | +========+========+============+============+======================+ | |||
| AccECN | AccECN | 1 1 1 | 0 1 0 | AccECN (Not-ECT SYN) | | | AccECN | AccECN | 1 1 1 | 0 1 0 | AccECN (Not-ECT SYN) | | |||
| AccECN | AccECN | 1 1 1 | 0 1 1 | AccECN (ECT1 on SYN) | | | AccECN | AccECN | 1 1 1 | 0 1 1 | AccECN (ECT1 on SYN) | | |||
| AccECN | AccECN | 1 1 1 | 1 0 0 | AccECN (ECT0 on SYN) | | | AccECN | AccECN | 1 1 1 | 1 0 0 | AccECN (ECT0 on SYN) | | |||
| AccECN | AccECN | 1 1 1 | 1 1 0 | AccECN (CE on SYN) | | | AccECN | AccECN | 1 1 1 | 1 1 0 | AccECN (CE on SYN) | | |||
+--------+--------+------------+------------+----------------------+ | +--------+--------+------------+------------+----------------------+ | |||
+--------+--------+------------+------------+----------------------+ | +--------+--------+------------+------------+----------------------+ | |||
| AccECN | Nonce | 1 1 1 | 1 0 1 | (Reserved) | | | AccECN | Nonce | 1 1 1 | 1 0 1 | (Reserved) | | |||
| AccECN | ECN | 1 1 1 | 0 0 1 | Classic ECN | | | AccECN | ECN | 1 1 1 | 0 0 1 | Classic ECN | | |||
| AccECN | No ECN | 1 1 1 | 0 0 0 | Not ECN | | | AccECN | No ECN | 1 1 1 | 0 0 0 | Not ECN | | |||
+--------+--------+------------+------------+----------------------+ | +--------+--------+------------+------------+----------------------+ | |||
+--------+--------+------------+------------+----------------------+ | +--------+--------+------------+------------+----------------------+ | |||
| Nonce | AccECN | 0 1 1 | 0 0 1 | Classic ECN | | | Nonce | AccECN | 0 1 1 | 0 0 1 | Classic ECN | | |||
| ECN | AccECN | 0 1 1 | 0 0 1 | Classic ECN | | | ECN | AccECN | 0 1 1 | 0 0 1 | Classic ECN | | |||
| No ECN | AccECN | 0 0 0 | 0 0 0 | Not ECN | | | No ECN | AccECN | 0 0 0 | 0 0 0 | Not ECN | | |||
+--------+--------+------------+------------+----------------------+ | +--------+--------+------------+------------+----------------------+ | |||
+--------+--------+------------+------------+----------------------+ | +--------+--------+------------+------------+----------------------+ | |||
| AccECN | Broken | 1 1 1 | 1 1 1 | Not ECN | | | AccECN | Broken | 1 1 1 | 1 1 1 | Not ECN | | |||
+--------+--------+------------+------------+----------------------+ | +--------+--------+------------+------------+----------------------+ | |||
Table 2: ECN capability negotiation between Client (A) and | Table 2: ECN Capability Negotiation Between Client (A) and | |||
Server (B) | Server (B) | |||
Table 2 is divided into blocks each separated by an empty row. | Table 2 is divided into blocks, with each block separated by an empty | |||
row. | ||||
1. The top block shows the case already described in Section 3.1 | 1. The top block shows the case already described in Section 3.1 | |||
where both endpoints support AccECN and how the TCP Server (B) | where both endpoints support AccECN and how the TCP Server (B) | |||
indicates congestion feedback. | indicates congestion feedback. | |||
2. The second block shows the cases where the TCP Client (A) | 2. The second block shows the cases where the TCP Client (A) | |||
supports AccECN but the TCP Server (B) supports some earlier | supports AccECN but the TCP Server (B) supports some earlier | |||
variant of TCP feedback, indicated in its SYN/ACK. Therefore, as | variant of TCP feedback, as indicated in its SYN/ACK. Therefore, | |||
soon as an AccECN-capable TCP Client (A) receives the SYN/ACK | as soon as an AccECN-capable TCP Client (A) receives the SYN/ACK | |||
shown it MUST set both its half connections into the feedback | shown, it MUST set both its half connections into the feedback | |||
mode shown in the rightmost column. If the TCP Client has set | mode shown in the rightmost column. If the TCP Client has set | |||
itself into Classic ECN feedback mode it MUST then comply with | itself into Classic ECN feedback mode, it MUST comply with | |||
[RFC3168]. | [RFC3168]. | |||
An AccECN implementation has no need to recognize or support the | An AccECN implementation has no need to recognize or support the | |||
Server response labelled 'Nonce' or ECN Nonce feedback more | Server response labelled 'Nonce' or ECN-nonce feedback more | |||
generally [RFC3540], which has been reclassified as historic | generally [RFC3540], as RFC 3540 has been reclassified as | |||
[RFC8311]. AccECN is compatible with alternative ECN feedback | Historic [RFC8311]. AccECN is compatible with alternative ECN | |||
integrity approaches to the nonce (see Section 5.3). The SYN/ACK | feedback integrity approaches to the nonce (see Section 5.3). | |||
labelled 'Nonce' with (AE,CWR,ECE) = (1,0,1) is reserved for | The SYN/ACK labelled 'Nonce' with (AE,CWR,ECE) = (1,0,1) is | |||
future use. A TCP Client (A) that receives such a SYN/ACK | reserved for future use. A TCP Client (A) that receives such a | |||
follows the procedure for forward compatibility given in | SYN/ACK follows the procedure for forward compatibility given in | |||
Section 3.1.3. | Section 3.1.3. | |||
3. The third block shows the cases where the TCP Server (B) supports | 3. The third block shows the cases where the TCP Server (B) supports | |||
AccECN but the TCP Client (A) supports some earlier variant of | AccECN but the TCP Client (A) supports some earlier variant of | |||
TCP feedback, indicated in its SYN. | TCP feedback, as indicated in its SYN. | |||
When an AccECN-enabled TCP Server (B) receives a SYN with | When an AccECN-enabled TCP Server (B) receives a SYN with | |||
(AE,CWR,ECE) = (0,1,1) it MUST do one of the following: | (AE,CWR,ECE) = (0,1,1), it MUST do one of the following: | |||
* set both its half connections into the Classic ECN feedback | * set both its half connections into the Classic ECN feedback | |||
mode and return a SYN/ACK with (AE,CWR,ECE) = (0,0,1) as | mode and return a SYN/ACK with (AE,CWR,ECE) = (0,0,1) as | |||
shown. Then it MUST comply with [RFC3168]. | shown. Then it MUST comply with [RFC3168]. | |||
* set both its half-connections into Not ECN mode and return a | * set both its half-connections into Not ECN mode and return a | |||
SYN/ACK with (AE,CWR,ECE) = (0,0,0), then continue with ECN | SYN/ACK with (AE,CWR,ECE) = (0,0,0), then continue with ECN | |||
disabled. This latter case is unlikely to be desirable, but | disabled. This latter case is unlikely to be desirable, but | |||
it is allowed as a possibility, e.g., for minimal TCP | it is allowed as a possibility, e.g., for minimal TCP | |||
implementations. | implementations. | |||
When an AccECN-enabled TCP Server (B) receives a SYN with | When an AccECN-enabled TCP Server (B) receives a SYN with | |||
(AE,CWR,ECE) = (0,0,0) it MUST set both its half connections into | (AE,CWR,ECE) = (0,0,0), it MUST set both its half connections | |||
the Not ECN feedback mode, return a SYN/ACK with (AE,CWR,ECE) = | into the Not ECN feedback mode, return a SYN/ACK with | |||
(0,0,0) as shown and continue with ECN disabled. | (AE,CWR,ECE) = (0,0,0) as shown, and continue with ECN disabled. | |||
4. The fourth block displays a combination labelled `Broken'. Some | 4. The fourth block displays a combination labelled 'Broken'. Some | |||
older TCP Server implementations incorrectly set the TCP-ECN | older TCP Server implementations incorrectly set the TCP-ECN | |||
flags in the SYN/ACK by reflecting those in the SYN. Such broken | flags in the SYN/ACK by reflecting those in the SYN. Such broken | |||
TCP Servers (B) cannot support ECN, so as soon as an AccECN- | TCP Servers (B) cannot support ECN; so as soon as an AccECN- | |||
capable TCP Client (A) receives such a broken SYN/ACK it MUST | capable TCP Client (A) receives such a broken SYN/ACK, it MUST | |||
fall back to Not ECN mode for both its half connections and | fall back to Not ECN mode for both its half connections and | |||
continue with ECN disabled. | continue with ECN disabled. | |||
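The Client side of Table 2 can be read as a lookup on the flags of the first SYN/ACK. The following is a minimal sketch, assuming the Client sent a SYN with (AE,CWR,ECE) = (1,1,1). The function name and the mode strings are hypothetical. The treatment of the reserved combination (1,0,1) follows the forward compatibility rule of Section 3.1.3 for a Client with no logic specific to that combination.

```python
# The four AccECN combinations in the top block of Table 2.
ACCECN_SYNACKS = {(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 1, 0)}


def client_feedback_mode(synack_flags):
    """Feedback mode of an AccECN Client that sent a SYN with (1,1,1).

    synack_flags is the (AE, CWR, ECE) tuple on the first SYN/ACK received.
    """
    flags = tuple(synack_flags)
    if len(flags) != 3 or any(bit not in (0, 1) for bit in flags):
        raise ValueError("expected three flag bits, each 0 or 1")
    if flags in ACCECN_SYNACKS:
        return "AccECN"
    if flags == (1, 0, 1):
        # Reserved combination: treated as AccECN (Section 3.1.3).
        return "AccECN"
    if flags == (0, 0, 1):
        return "Classic ECN"
    # (0,0,0) is a Server without ECN; (1,1,1) is the 'Broken' reflection.
    return "Not ECN"
```

All eight flag combinations are covered, so the sketch never leaves the Client without a mode.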
The following additional rules do not fit the structure of the table, | The following additional rules do not fit the structure of the table, | |||
but they complement it: | but they complement it: | |||
Simultaneous Open: An originating AccECN Host (A), having sent a SYN | Simultaneous Open: An originating AccECN Host (A), having sent a SYN | |||
with (AE,CWR,ECE) = (1,1,1), might receive another SYN from host | with (AE,CWR,ECE) = (1,1,1), might receive another SYN from host | |||
B. Host A MUST then enter the same feedback mode as it would have | B. Host A MUST then enter the same feedback mode as it would have | |||
entered had it been a responding host and received the same SYN. | entered had it been a responding host and received the same SYN. | |||
skipping to change at page 17, line 30 ¶ | skipping to change at line 782 ¶ | |||
new TCP connection if they receive an in-window SYN packet during | new TCP connection if they receive an in-window SYN packet during | |||
TIME-WAIT state. When a TCP host enters TIME-WAIT or CLOSED | TIME-WAIT state. When a TCP host enters TIME-WAIT or CLOSED | |||
state, it ought to ignore any previous state about the negotiation | state, it ought to ignore any previous state about the negotiation | |||
of AccECN for that connection and renegotiate the feedback mode | of AccECN for that connection and renegotiate the feedback mode | |||
according to Table 2. | according to Table 2. | |||
3.1.3. Forward Compatibility | 3.1.3. Forward Compatibility | |||
If a TCP Server that implements AccECN receives a SYN with the three | If a TCP Server that implements AccECN receives a SYN with the three | |||
TCP header flags (AE,CWR,ECE) set to any combination other than | TCP header flags (AE,CWR,ECE) set to any combination other than | |||
(0,0,0), (0,1,1) or (1,1,1) and it does not have logic specific to | (0,0,0), (0,1,1), or (1,1,1) and it does not have logic specific to | |||
such a combination, the Server MUST negotiate the use of AccECN as if | such a combination, the Server MUST negotiate the use of AccECN as if | |||
the three flags had been set to (1,1,1). However, an AccECN Client | the three flags had been set to (1,1,1). However, an AccECN Client | |||
implementation MUST NOT send a SYN with any combination other than | implementation MUST NOT send a SYN with any combination other than | |||
the three listed. | the three listed. | |||
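A rough sketch of how an AccECN Server without combination-specific logic might classify an incoming SYN is given below. The function name, the mode strings, and the allow_classic_ecn parameter are assumptions made for illustration. The parameter models the choice in Section 3.1.2 between Classic ECN and Not ECN when the SYN carries (0,1,1).

```python
def server_feedback_mode(syn_flags, allow_classic_ecn=True):
    """Feedback mode of an AccECN Server for the first SYN it receives.

    syn_flags is the (AE, CWR, ECE) tuple on the SYN.
    """
    flags = tuple(syn_flags)
    if len(flags) != 3 or any(bit not in (0, 1) for bit in flags):
        raise ValueError("expected three flag bits, each 0 or 1")
    if flags == (0, 0, 0):
        return "Not ECN"
    if flags == (0, 1, 1):
        # A minimal implementation may decline Classic ECN.
        return "Classic ECN" if allow_classic_ecn else "Not ECN"
    # (1,1,1), and any other combination, is negotiated as if it
    # had been (1,1,1).
    return "AccECN"
```

Under this sketch, a SYN with a currently unused combination such as (1,0,0) leads to AccECN mode, which is what lets future uses of those combinations rely on consistent behaviour.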
If a TCP Client has sent a SYN requesting AccECN feedback with | If a TCP Client sent a SYN requesting AccECN feedback with | |||
(AE,CWR,ECE) = (1,1,1) then receives a SYN/ACK with the currently | (AE,CWR,ECE) = (1,1,1) and then receives a SYN/ACK with the currently | |||
reserved combination (AE,CWR,ECE) = (1,0,1) but it does not have | reserved combination (AE,CWR,ECE) = (1,0,1) but it does not have | |||
logic specific to such a combination, the Client MUST enable AccECN | logic specific to such a combination, the Client MUST enable AccECN | |||
mode as if the SYN/ACK confirmed that the Server supported AccECN and | mode as if the SYN/ACK confirmed that the Server supported AccECN and | |||
as if it fed back that the IP-ECN field on the SYN had arrived | as if it fed back that the IP-ECN field on the SYN had arrived | |||
unchanged. However, an AccECN Server implementation MUST NOT send a | unchanged. However, an AccECN Server implementation MUST NOT send a | |||
SYN/ACK with this combination (AE,CWR,ECE) = (1,0,1). | SYN/ACK with this combination (AE,CWR,ECE) = (1,0,1). | |||
| For the avoidance of doubt, the behaviour described in the | | For the avoidance of doubt, the behaviour described in the | |||
| present specification applies whether or not the three | | present specification applies whether or not the three | |||
| remaining reserved TCP header flags are zero. | | remaining reserved TCP header flags are zero. | |||
All these requirements ensure that future uses of all the Reserved | All of these requirements ensure that future uses of all the Reserved | |||
combinations on a SYN or SYN/ACK can rely on consistent behaviour | combinations on a SYN or SYN/ACK can rely on consistent behaviour | |||
from the installed base of AccECN implementations. See Appendix B.3 | from the installed base of AccECN implementations. See Appendix B.3 | |||
for related discussion. | for related discussion. | |||
3.1.4. Multiple SYNs or SYN/ACKs | 3.1.4. Multiple SYNs or SYN/ACKs | |||
3.1.4.1. Retransmitted SYNs | 3.1.4.1. Retransmitted SYNs | |||
If the sender of an AccECN SYN (the TCP Client) times out before | If the sender of an AccECN SYN (the TCP Client) times out before | |||
receiving the SYN/ACK, it SHOULD attempt to negotiate the use of | receiving the SYN/ACK, it SHOULD attempt to negotiate the use of | |||
AccECN at least one more time by continuing to set all three TCP ECN | AccECN at least one more time by continuing to set all three TCP ECN | |||
flags (AE,CWR,ECE) = (1,1,1) on the first retransmitted SYN (using | flags (AE,CWR,ECE) = (1,1,1) on the first retransmitted SYN (using | |||
the usual retransmission time-outs). If this first retransmission | the usual retransmission timeouts). If this first retransmission | |||
also fails to be acknowledged, in deployment scenarios where AccECN | also fails to be acknowledged, in deployment scenarios where AccECN | |||
path traversal might be problematic, the TCP Client SHOULD send | path traversal might be problematic, the TCP Client SHOULD send | |||
subsequent retransmissions of the SYN with the three TCP-ECN flags | subsequent retransmissions of the SYN with the three TCP-ECN flags | |||
cleared (AE,CWR,ECE) = (0,0,0). Such a retransmitted SYN MUST use | cleared (AE,CWR,ECE) = (0,0,0). Such a retransmitted SYN MUST use | |||
the same initial sequence number (ISN) as the original SYN. | the same initial sequence number (ISN) as the original SYN. | |||
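The default fall-back schedule described above could be sketched as a function of the attempt number. The function name and the accecn_retries parameter are hypothetical. The default value of 1 corresponds to retrying AccECN once before clearing the flags. The ISN is outside the scope of this sketch and stays the same for every attempt.

```python
def syn_flags_for_attempt(attempt, accecn_retries=1):
    """Return (AE, CWR, ECE) for a SYN transmission.

    attempt 0 is the original SYN and attempt n is the n-th retransmission.
    accecn_retries is the number of retransmissions that still request
    AccECN before falling back to (0,0,0).
    """
    if attempt < 0 or accecn_retries < 0:
        raise ValueError("attempt and accecn_retries must be non-negative")
    if attempt <= accecn_retries:
        return (1, 1, 1)
    return (0, 0, 0)
```

With the default, the original SYN and the first retransmission carry (1,1,1), and the second and later retransmissions carry (0,0,0).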
Retrying once before fall-back adds delay in the case where a | Retrying once before fall-back adds delay in the case where a | |||
middlebox drops an AccECN (or ECN) SYN deliberately. However, recent | middlebox drops an AccECN (or ECN) SYN deliberately. However, recent | |||
measurements [Mandalari18] imply that a drop is less likely to be due | measurements [Mandalari18] imply that a drop is less likely to be due | |||
to middlebox interference than other intermittent causes of loss, | to middlebox interference than other intermittent causes of loss, | |||
e.g., congestion, wireless transmission loss, etc. | e.g., congestion, wireless transmission loss, etc. | |||
Implementers MAY use other fall-back strategies if they are found to | Implementers MAY use other fall-back strategies if they are found to | |||
be more effective (e.g., attempting to negotiate AccECN on the SYN | be more effective (e.g., attempting to negotiate AccECN on the SYN | |||
only once or more than twice (most appropriate during high levels of | only once or more than twice (most appropriate during high levels of | |||
congestion)). | congestion)). | |||
Further, it might make sense to also remove any other new or | Further, it might make sense to also remove any other new or | |||
experimental fields or options on the SYN in case a middlebox might | experimental fields or options on the SYN in case a middlebox might | |||
be blocking them, although the required behaviour will depend on the | be blocking them, although the required behaviour will depend on the | |||
specification of the other option(s) and any attempt to co-ordinate | specification of the other option(s) and any attempt to coordinate | |||
fall-back between different modules of the stack. For instance, even | fall-back between different modules of the stack. For instance, even | |||
if taking part in an [RFC8311] experiment that allows ECT on a SYN, | if taking part in an [RFC8311] experiment that allows ECT on a SYN, | |||
it would be advisable to try it without. | it would be advisable to try it without. | |||
Whichever fall-back strategy is used, the TCP initiator SHOULD cache | Whichever fall-back strategy is used, the TCP initiator SHOULD cache | |||
failed connection attempts. If it does, it SHOULD NOT give up | failed connection attempts. If it does, it SHOULD NOT give up | |||
attempting to negotiate AccECN on the SYN of subsequent connection | attempting to negotiate AccECN on the SYN of subsequent connection | |||
attempts until it is clear that the blockage is persistently and | attempts until it is clear that the blockage is persistently and | |||
specifically due to AccECN. The cache needs to be arranged to expire | specifically due to AccECN. The cache needs to be arranged to expire | |||
so that the initiator will infrequently attempt to check whether the | so that the initiator will infrequently attempt to check whether the | |||
skipping to change at page 19, line 15 ¶ | skipping to change at line 858 ¶ | |||
All fall-back strategies will need to follow all the normative rules | All fall-back strategies will need to follow all the normative rules | |||
in Section 3.1.5, which concern behaviour when SYNs or SYN/ACKs | in Section 3.1.5, which concern behaviour when SYNs or SYN/ACKs | |||
negotiating different types of feedback have been sent within the | negotiating different types of feedback have been sent within the | |||
same connection, including the possibility that they arrive out of | same connection, including the possibility that they arrive out of | |||
order. As examples, the following non-normative bullets call out | order. As examples, the following non-normative bullets call out | |||
those rules from Section 3.1.5 that apply to the above fall-back | those rules from Section 3.1.5 that apply to the above fall-back | |||
strategies: | strategies: | |||
* Once the TCP Client has sent SYNs with (AE,CWR,ECE) = (1,1,1) and | * Once the TCP Client has sent SYNs with (AE,CWR,ECE) = (1,1,1) and | |||
with (AE,CWR,ECE) = (0,0,0), it might eventually receive a SYN/ACK | with (AE,CWR,ECE) = (0,0,0), it might eventually receive a SYN/ACK | |||
from the Server in response to one, the other, or both and | from the Server in response to one, the other, or both, and | |||
possibly reordered; | possibly reordered; | |||
* Such a TCP Client enters the feedback mode appropriate to the | * Such a TCP Client enters the feedback mode appropriate to the | |||
first SYN/ACK it receives according to Table 2, and it does not | first SYN/ACK it receives according to Table 2, and it does not | |||
switch to a different mode, whatever other SYN/ACKs it might | switch to a different mode, whatever other SYN/ACKs it might | |||
receive or send; | receive or send; | |||
* If a TCP Client has entered AccECN mode but then subsequently | * If a TCP Client has entered AccECN mode but then subsequently | |||
sends a SYN or receives a SYN/ACK with (AE,CWR,ECE) = (0,0,0), it | sends a SYN or receives a SYN/ACK with (AE,CWR,ECE) = (0,0,0), it | |||
is still allowed to set ECT on packets for the rest of the | is still allowed to set ECT on packets for the rest of the | |||
connection. Note that this rule is different to that of a Server | connection. Note that this rule is different than that of a | |||
in an equivalent position (Section 3.1.5 explains). | Server in an equivalent position (Section 3.1.5 explains). | |||
* Having entered AccECN mode, in general a TCP Client commits to | * Having entered AccECN mode, in general a TCP Client commits to | |||
respond to any incoming congestion feedback, whether or not it | respond to any incoming congestion feedback, whether or not it | |||
sets ECT on outgoing packets (for rationale and some exceptions | sets ECT on outgoing packets (for rationale and some exceptions | |||
see Section 3.2.2.3, Section 3.2.2.4); | see Section 3.2.2.3, Section 3.2.2.4); | |||
* Having entered AccECN mode, a TCP Client commits to using AccECN | * Having entered AccECN mode, a TCP Client commits to using AccECN | |||
to feed back the IP-ECN field in incoming packets for the rest of | to feed back the IP-ECN field in incoming packets for the rest of | |||
the connection, as specified in Section 3.2, even if it is not | the connection, as specified in Section 3.2, even if it is not | |||
itself setting ECT on outgoing packets. | itself setting ECT on outgoing packets. | |||
skipping to change at page 20, line 10 ¶ | skipping to change at line 902 ¶ | |||
negotiating different types of feedback are sent within the same | negotiating different types of feedback are sent within the same | |||
connection, including the possibility that they arrive out of order. | connection, including the possibility that they arrive out of order. | |||
As examples, the following non-normative bullets call out those rules | As examples, the following non-normative bullets call out those rules | |||
from Section 3.1.5 that apply to the above fall-back strategies: | from Section 3.1.5 that apply to the above fall-back strategies: | |||
* An AccECN-capable TCP Server enters the feedback mode appropriate | * An AccECN-capable TCP Server enters the feedback mode appropriate | |||
to the first SYN it receives using Table 2, and it does not switch | to the first SYN it receives using Table 2, and it does not switch | |||
to a different mode, whatever other SYNs it might receive and | to a different mode, whatever other SYNs it might receive and | |||
whatever SYN/ACKs it might send; | whatever SYN/ACKs it might send; | |||
* if a TCP Server in AccECN mode receives a SYN with (AE,CWR,ECE) = | * If a TCP Server in AccECN mode receives a SYN with (AE,CWR,ECE) = | |||
(0,0,0), it preferably acknowledges it first using an AccECN SYN/ | (0,0,0), it preferably acknowledges it first using an AccECN SYN/ | |||
ACK, but it can retry using a SYN/ACK with (AE,CWR,ECE) = (0,0,0); | ACK, but it can retry using a SYN/ACK with (AE,CWR,ECE) = (0,0,0); | |||
* If a TCP Server in AccECN mode sends multiple AccECN SYN/ACKs, it | * If a TCP Server in AccECN mode sends multiple AccECN SYN/ACKs, it | |||
uses the TCP-ECN flags in each SYN/ACK to feed back the IP-ECN | uses the TCP-ECN flags in each SYN/ACK to feed back the IP-ECN | |||
field on the latest SYN to have arrived; | field on the latest SYN to have arrived; | |||
* If a TCP Server enters AccECN mode then subsequently sends a SYN/ | * If a TCP Server enters AccECN mode and then subsequently sends a | |||
ACK or receives a SYN with (AE,CWR,ECE) = (0,0,0), it is | SYN/ACK or receives a SYN with (AE,CWR,ECE) = (0,0,0), it is | |||
prohibited from setting ECT on any packet for the rest of the | prohibited from setting ECT on any packet for the rest of the | |||
connection; | connection; | |||
* Having entered AccECN mode, in general a TCP Server commits to | * Having entered AccECN mode, in general a TCP Server commits to | |||
respond to any incoming congestion feedback, whether or not it | respond to any incoming congestion feedback, whether or not it | |||
sets ECT on outgoing packets (for rationale and some exceptions | sets ECT on outgoing packets (for rationale and some exceptions | |||
see Section 3.2.2.3, Section 3.2.2.4); | see Sections 3.2.2.3, 3.2.2.4); | |||
* Having entered AccECN mode, a TCP Server commits to using AccECN | * Having entered AccECN mode, a TCP Server commits to using AccECN | |||
to feed back the IP-ECN field in incoming packets for the rest of | to feed back the IP-ECN field in incoming packets for the rest of | |||
the connection, as specified in Section 3.2, even if it is not | the connection, as specified in Section 3.2, even if it is not | |||
itself setting ECT on outgoing packets. | itself setting ECT on outgoing packets. | |||
3.1.5. Implications of AccECN Mode | 3.1.5. Implications of AccECN Mode | |||
Section 3.1.1 describes the only ways that a host can enter AccECN | Section 3.1.1 describes the only ways that a host can enter AccECN | |||
mode, whether as a Client or as a Server. | mode, whether as a Client or as a Server. | |||
skipping to change at page 21, line 5 ¶ | skipping to change at line 946 ¶ | |||
synchronization; | synchronization; | |||
'Valid SYN': A SYN that has the same port numbers and the same ISN | 'Valid SYN': A SYN that has the same port numbers and the same ISN | |||
as the SYN that first caused the Server to open the connection. | as the SYN that first caused the Server to open the connection. | |||
An 'Acceptable' packet is defined in Section 1.3. | An 'Acceptable' packet is defined in Section 1.3. | |||
Handling SYNs or SYN/ACKs of multiple types (e.g., fall-back): | Handling SYNs or SYN/ACKs of multiple types (e.g., fall-back): | |||
* Any implementation that supports AccECN: | * Any implementation that supports AccECN: | |||
- MUST NOT switch into a different feedback mode to the one it | - MUST NOT switch into a different feedback mode than the one it | |||
first entered according to Table 2, no matter whether it | first entered according to Table 2, no matter whether it | |||
subsequently receives valid SYNs or Acceptable SYN/ACKs of | subsequently receives valid SYNs or Acceptable SYN/ACKs of | |||
different types. | different types. | |||
- SHOULD ignore the TCP-ECN flags in SYNs or SYN/ACKs that are | - SHOULD ignore the TCP-ECN flags in SYNs or SYN/ACKs that are | |||
received after the implementation reaches the Established | received after the implementation reaches the Established | |||
state, in line with the general TCP approach [RFC9293]; | state, in line with the general TCP approach [RFC9293]; | |||
Reason: Reaching established state implies that at least one | Reason: Reaching established state implies that at least one | |||
SYN and one SYN/ACK have successfully been delivered. And all | SYN and one SYN/ACK have successfully been delivered. And all | |||
skipping to change at page 22, line 35 ¶ | skipping to change at line 1024 ¶ | |||
- SHOULD respond to any subsequent valid SYN using a SYN/ACK with | - SHOULD respond to any subsequent valid SYN using a SYN/ACK with | |||
(AE,CWR,ECE) = (0,0,0), even if the SYN is offering to | (AE,CWR,ECE) = (0,0,0), even if the SYN is offering to | |||
negotiate Classic ECN or AccECN feedback mode; | negotiate Classic ECN or AccECN feedback mode; | |||
Rationale: There would be no point in the Server offering any | Rationale: There would be no point in the Server offering any | |||
type of ECN feedback, because the Client will not be using ECN. | type of ECN feedback, because the Client will not be using ECN. | |||
However, there is no interoperability reason to make this rule | However, there is no interoperability reason to make this rule | |||
mandatory. | mandatory. | |||
If for any reason a host is not willing to provide ECN feedback on a | If for any reason a host is not willing to provide ECN feedback on a | |||
particular TCP connection, it SHOULD clear the AE, CWR and ECE flags | particular TCP connection, it SHOULD clear the AE, CWR, and ECE flags | |||
in all SYN and/or SYN/ACK packets that it sends. | in all SYN and/or SYN/ACK packets that it sends. | |||
Sending ECT: | Sending ECT: | |||
* Any implementation that supports AccECN: | * Any implementation that supports AccECN: | |||
- MUST NOT set ECT if it is in Not ECN feedback mode. | - MUST NOT set ECT if it is in Not ECN feedback mode. | |||
A Data Sender in AccECN mode: | A Data Sender in AccECN mode: | |||
skipping to change at page 23, line 12 ¶ | skipping to change at line 1049 ¶ | |||
- MAY not set ECT on any packet (for instance if it has reason to | - MAY not set ECT on any packet (for instance if it has reason to | |||
believe such a packet would be blocked); | believe such a packet would be blocked); | |||
A TCP Server in AccECN mode: | A TCP Server in AccECN mode: | |||
- MUST NOT set ECT on any packet for the rest of the connection, | - MUST NOT set ECT on any packet for the rest of the connection, | |||
if it has received or sent at least one valid SYN or Acceptable | if it has received or sent at least one valid SYN or Acceptable | |||
SYN/ACK with (AE,CWR,ECE) = (0,0,0) during the handshake. | SYN/ACK with (AE,CWR,ECE) = (0,0,0) during the handshake. | |||
This rule solely applies to a Server because, when a Server | This rule solely applies to a Server because, when a Server | |||
enters AccECN mode it doesn't know for sure whether the Client | enters AccECN mode, it doesn't know for sure whether the Client | |||
will end up in AccECN mode. But when a Client enters AccECN | will end up in AccECN mode. But when a Client enters AccECN | |||
mode, it can be certain that the Server is already in AccECN | mode, it can be certain that the Server is already in AccECN | |||
feedback mode. | feedback mode. | |||
Congestion response: | Congestion response: | |||
* A host in AccECN mode: | * A host in AccECN mode: | |||
- is obliged to respond appropriately to AccECN feedback that | - is obliged to respond appropriately to AccECN feedback that | |||
indicates there were ECN marks on packets it had previously | indicates there were ECN marks on packets it had previously | |||
sent, where 'appropriately' is defined in Section 6.1 of | sent, where 'appropriately' is defined in Section 6.1 of | |||
[RFC3168] and updated by Sections 2.1 and 4.1 of [RFC8311]; | [RFC3168] and updated by Sections 2.1 and 4.1 of [RFC8311]; | |||
- is still obliged to respond appropriately to congestion | - is still obliged to respond appropriately to congestion | |||
feedback, even when it is solely sending non-ECN-capable | feedback, even when it is solely sending non-ECN-capable | |||
packets (for rationale, some examples and some exceptions see | packets (for rationale, some examples and some exceptions see | |||
Section 3.2.2.3, Section 3.2.2.4). | Sections 3.2.2.3 and 3.2.2.4). | |||
- is still obliged to respond appropriately to congestion | - is still obliged to respond appropriately to congestion | |||
feedback, even if it has sent or received a SYN or SYN/ACK | feedback, even if it has sent or received a SYN or SYN/ACK | |||
packet with (AE,CWR,ECE) = (0,0,0) during the handshake; | packet with (AE,CWR,ECE) = (0,0,0) during the handshake; | |||
- MUST NOT set CWR to indicate that it has received and responded | - MUST NOT set CWR to indicate that it has received and responded | |||
to indications of congestion. | to indications of congestion. | |||
For the avoidance of doubt, this is unlike an RFC 3168 data | For the avoidance of doubt, this is unlike an RFC 3168 data | |||
sender and this does not preclude the Data Sender from setting | sender and this does not preclude the Data Sender from setting | |||
skipping to change at page 24, line 29 ¶ | skipping to change at line 1112 ¶ | |||
- MUST NOT use reception of packets with ECT set in the IP-ECN | - MUST NOT use reception of packets with ECT set in the IP-ECN | |||
field as an implicit signal that the peer is ECN-capable. | field as an implicit signal that the peer is ECN-capable. | |||
Reason: ECT at the IP layer does not explicitly confirm the | Reason: ECT at the IP layer does not explicitly confirm the | |||
peer has the correct ECN feedback logic, because the packets | peer has the correct ECN feedback logic, because the packets | |||
could have been mangled at the IP layer. | could have been mangled at the IP layer. | |||
3.2. AccECN Feedback | 3.2. AccECN Feedback | |||
Each Data Receiver of each half connection maintains four counters, | Each Data Receiver of each half connection maintains four counters, | |||
r.cep, r.ceb, r.e0b and r.e1b: | r.cep, r.ceb, r.e0b, and r.e1b: | |||
* The Data Receiver MUST increment the CE packet counter (r.cep), | * The Data Receiver MUST increment the CE packet counter (r.cep), | |||
for every Acceptable packet that it receives with the CE code | for every Acceptable packet that it receives with the CE code | |||
point in the IP ECN field, including CE marked control packets and | point in the IP-ECN field, including CE-marked control packets and | |||
retransmissions but excluding CE on SYN packets (SYN=1; ACK=0). | retransmissions but excluding CE on SYN packets (SYN=1; ACK=0). | |||
* A Data Receiver that supports sending of AccECN TCP Options MUST | * A Data Receiver that supports sending of AccECN TCP Options MUST | |||
increment the r.ceb, r.e0b or r.e1b byte counters by the number of | increment the r.ceb, r.e0b, or r.e1b byte counters by the number | |||
TCP payload octets in Acceptable packets marked with the CE, | of TCP payload octets in Acceptable packets marked with the CE, | |||
ECT(0) and ECT(1) codepoint in their IP-ECN field, including any | ECT(0), and ECT(1) codepoint in their IP-ECN field, including any | |||
payload octets on control packets and retransmissions, but not | payload octets on control packets and retransmissions, but not | |||
including any payload octets on SYN packets (SYN=1; ACK=0). | including any payload octets on SYN packets (SYN=1; ACK=0). | |||
Each Data Sender of each half connection maintains four counters, | Each Data Sender of each half connection maintains four counters, | |||
s.cep, s.ceb, s.e0b and s.e1b intended to track the equivalent | s.cep, s.ceb, s.e0b, and s.e1b, intended to track the equivalent | |||
counters at the Data Receiver. | counters at the Data Receiver. | |||
A Data Receiver feeds back the CE packet counter using the Accurate | A Data Receiver feeds back the CE packet counter using the Accurate | |||
ECN (ACE) field, as explained in Section 3.2.2. And it optionally | ECN (ACE) field, as explained in Section 3.2.2. And it optionally | |||
feeds back all the byte counters using the AccECN TCP Option, as | feeds back all the byte counters using the AccECN TCP Option, as | |||
specified in Section 3.2.3. | specified in Section 3.2.3. | |||
Whenever a Data Receiver feeds back the value of any counter, it MUST | Whenever a Data Receiver feeds back the value of any counter, it MUST | |||
report the most recent value, no matter whether it is in a pure ACK, | report the most recent value, no matter whether it is in a pure ACK, | |||
or an ACK piggybacked on a packet used by the other half-connection, | or an ACK piggybacked on a packet used by the other half-connection, | |||
whether new payload data or a retransmission. Therefore the feedback | whether new payload data or a retransmission. Therefore, the | |||
piggybacked on a retransmitted packet is unlikely to be the same as | feedback piggybacked on a retransmitted packet is unlikely to be the | |||
the feedback on the original packet. | same as the feedback on the original packet. | |||
3.2.1. Initialization of Feedback Counters | 3.2.1. Initialization of Feedback Counters | |||
When a host first enters AccECN mode, in its role as a Data Receiver | When a host first enters AccECN mode, in its role as a Data Receiver, | |||
it initializes its counters to r.cep = 5, r.e0b = r.e1b = 1 and r.ceb | it initializes its counters to r.cep = 5, r.e0b = r.e1b = 1, and | |||
= 0. | r.ceb = 0. | |||
Non-zero initial values are used to support a stateless handshake | Non-zero initial values are used to support a stateless handshake | |||
(see Section 5.1) and to be distinct from cases where the fields are | (see Section 5.1) and to be distinct from cases where the fields are | |||
incorrectly zeroed (e.g., by middleboxes - see Section 3.2.3.2.4). | incorrectly zeroed (e.g., by middleboxes -- see Section 3.2.3.2.4). | |||
When a host enters AccECN mode, in its role as a Data Sender it | When a host enters AccECN mode, in its role as a Data Sender, it | |||
initializes its counters to s.cep = 5, s.e0b = s.e1b = 1 and s.ceb = | initializes its counters to s.cep = 5, s.e0b = s.e1b = 1, and s.ceb = | |||
0. | 0. | |||
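
The counter rules of Section 3.2 and the initial values of Section 3.2.1 can be illustrated with a minimal Python sketch. It is not part of either document version. The class and method names are hypothetical, the caller is assumed to pass only Acceptable packets, and counter wrap is not modelled. The once-only rule for CE-marked SYN/ACKs at the TCP Client is also left out, and the byte counters only matter for a Data Receiver that supports sending AccECN TCP Options.

```python
from dataclasses import dataclass

# IP-ECN codepoints as 2-bit values (RFC 3168).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


@dataclass
class ReceiverCounters:
    """Hypothetical container for r.cep, r.ceb, r.e0b and r.e1b."""
    cep: int = 5   # CE packet counter, initial value 5
    ceb: int = 0   # CE byte counter, initial value 0
    e0b: int = 1   # ECT(0) byte counter, initial value 1
    e1b: int = 1   # ECT(1) byte counter, initial value 1

    def on_acceptable_packet(self, ip_ecn, payload_octets, syn=False, ack=True):
        # SYN packets (SYN=1; ACK=0) are excluded from all counters.
        if syn and not ack:
            return
        if ip_ecn == CE:
            self.cep += 1                 # one per CE-marked packet
            self.ceb += payload_octets    # payload octets of CE-marked packets
        elif ip_ecn == ECT0:
            self.e0b += payload_octets
        elif ip_ecn == ECT1:
            self.e1b += payload_octets
        # Not-ECT packets change no counter.
```

The same initial values apply to the Data Sender's s.cep, s.ceb, s.e0b and s.e1b, which are intended to track these counters.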
3.2.2. The ACE Field | 3.2.2. The ACE Field | |||
After AccECN has been negotiated on the SYN and SYN/ACK, both hosts | After AccECN has been negotiated on the SYN and SYN/ACK, both hosts | |||
overload the three TCP flags (AE, CWR and ECE) in the main TCP header | overload the three TCP flags (AE, CWR, and ECE) in the main TCP | |||
as one 3-bit field. Then the field is given a new name, ACE, as | header as one 3-bit field. Then the field is given a new name, ACE, | |||
shown in Figure 3. | as shown in Figure 3. | |||
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| | | | U | A | P | R | S | F | | | | | | U | A | P | R | S | F | | |||
| Header Length | Reserved | ACE | R | C | S | S | Y | I | | | Header Length | Reserved | ACE | R | C | S | S | Y | I | | |||
| | | | G | K | H | T | N | N | | | | | | G | K | H | T | N | N | | |||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
Figure 3: Definition of the ACE field within bytes 13 and 14 of | Figure 3: Definition of the ACE Field Within Bytes 13 and 14 of | |||
the TCP Header (when AccECN has been negotiated and SYN=0). | the TCP Header (When AccECN Has Been Negotiated and SYN=0). | |||
The original definition of these three flags in the TCP header, | The original definition of these three flags in the TCP header, | |||
including the addition of support for the ECN Nonce, is shown for | including the addition of support for the ECN-nonce, is shown for | |||
comparison in Figure 1. This specification does not rename these | comparison in Figure 1. This specification does not rename these | |||
three TCP flags to ACE unconditionally; it merely overloads them with | three TCP flags to ACE unconditionally; it merely overloads them with | |||
another name and definition once an AccECN connection has been | another name and definition once an AccECN connection has been | |||
established. | established. | |||
With one exception (Section 3.2.2.1), a host with both of its half- | With one exception (Section 3.2.2.1), a host with both of its half- | |||
connections in AccECN mode MUST interpret the AE, CWR and ECE flags | connections in AccECN mode MUST interpret the AE, CWR, and ECE flags | |||
as the 3-bit ACE counter on a segment with the SYN flag cleared | as the 3-bit ACE counter on a segment with the SYN flag cleared | |||
(SYN=0). On such a packet, a Data Receiver MUST encode the three | (SYN=0). On such a packet, a Data Receiver MUST encode the 3 least | |||
least significant bits of its r.cep counter into the ACE field that | significant bits of its r.cep counter into the ACE field that it | |||
it feeds back to the Data Sender. The least significant bit is at | feeds back to the Data Sender. The least significant bit is at bit | |||
bit offset 9 in Figure 3. A host MUST NOT interpret the 3 flags as a | offset 9 in Figure 3. A host MUST NOT interpret the three flags as a | |||
3-bit ACE field on any segment with SYN=1 (whether ACK is 0 or 1), or | 3-bit ACE field on any segment with SYN=1 (whether ACK is 0 or 1), or | |||
if AccECN negotiation is incomplete or has not succeeded. | if AccECN negotiation is incomplete or has not succeeded. | |||
Both parts of each of these conditions are equally important. For | Both parts of each of these conditions are equally important. For | |||
instance, even if AccECN negotiation has been successful, the ACE | instance, even if AccECN negotiation has been successful, the ACE | |||
field is not defined on any segments with SYN=1 (e.g., a | field is not defined on any segments with SYN=1 (e.g., a | |||
retransmission of an unacknowledged SYN/ACK, or when both ends send | retransmission of an unacknowledged SYN/ACK, or when both ends send | |||
SYN/ACKs after AccECN support has been successfully negotiated during | SYN/ACKs after AccECN support has been successfully negotiated during | |||
a simultaneous open). | a simultaneous open). | |||
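
As an illustration of the bit layout in Figure 3, the following Python sketch reads and writes the ACE field in the 16-bit word formed by bytes 13 and 14 of the TCP header. It is not taken from the specification. The function and constant names are hypothetical, and the sketch assumes that AccECN negotiation has already succeeded and that the handshake exception of Section 3.2.2.1 does not apply. Bit offset k in Figure 3 corresponds to mask 1 << (15 - k) in the big-endian word.

```python
ACE_SHIFT = 6                 # bit offset 9 (least significant ACE bit) -> 1 << 6
ACE_MASK = 0x7 << ACE_SHIFT   # bit offsets 7..9 -> 0x01C0
SYN_MASK = 0x0002             # bit offset 14


def set_ace(word16, r_cep):
    """Return word16 with ACE set to the 3 least significant bits of r_cep."""
    if word16 & SYN_MASK:
        # The ACE field is not defined on any segment with SYN=1.
        raise ValueError("ACE is undefined when SYN=1")
    return (word16 & ~ACE_MASK & 0xFFFF) | ((r_cep & 0x7) << ACE_SHIFT)


def get_ace(word16):
    """Return the 3-bit ACE value, or None if SYN=1."""
    if word16 & SYN_MASK:
        return None
    return (word16 >> ACE_SHIFT) & 0x7
```

For instance, with a header length of 5 and only the ACK flag set (0x5010), encoding r.cep = 5 gives 0x5150, and r.cep = 13 gives the same word because only the 3 least significant bits are carried.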
skipping to change at page 26, line 46 ¶ | skipping to change at line 1221 ¶ | |||
with a packet that does not satisfy these conditions (e.g., it has | with a packet that does not satisfy these conditions (e.g., it has | |||
data to include on the ACK), it SHOULD first send a pure ACK that | data to include on the ACK), it SHOULD first send a pure ACK that | |||
does satisfy these conditions (see Section 5.2), so that it can feed | does satisfy these conditions (see Section 5.2), so that it can feed | |||
back which of the four values of the IP-ECN field arrived on the SYN/ | back which of the four values of the IP-ECN field arrived on the SYN/ | |||
ACK. A valid exception to this "SHOULD" would be where the | ACK. A valid exception to this "SHOULD" would be where the | |||
implementation will only be used in an environment where mangling of | implementation will only be used in an environment where mangling of | |||
the ECN field is unlikely. | the ECN field is unlikely. | |||
The TCP Client MUST also use the handshake encoding for the pure ACK | The TCP Client MUST also use the handshake encoding for the pure ACK | |||
of any retransmitted SYN/ACK that confirms that the TCP Server | of any retransmitted SYN/ACK that confirms that the TCP Server | |||
supports AccECN. The procedure for the TCP Server to follow if the | supports AccECN. If the final ACK of the handshake does not arrive | |||
final ACK of the handshake does not arrive before its retransmission | before its retransmission timer expires, the TCP Server is to follow the | |||
timer expires is given in Section 3.1.4.2. | procedure given in Section 3.1.4.2. | |||
+==================+================+=====================+ | +==================+================+=====================+ | |||
| IP-ECN codepoint | ACE on pure | r.cep of TCP Client | | | IP-ECN codepoint | ACE on pure | r.cep of TCP Client | | |||
| on SYN/ACK | ACK of SYN/ACK | in AccECN mode | | | on SYN/ACK | ACK of SYN/ACK | in AccECN mode | | |||
+==================+================+=====================+ | +==================+================+=====================+ | |||
| Not-ECT | 0b010 | 5 | | | Not-ECT | 0b010 | 5 | | |||
+------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| ECT(1) | 0b011 | 5 | | | ECT(1) | 0b011 | 5 | | |||
+------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| ECT(0) | 0b100 | 5 | | | ECT(0) | 0b100 | 5 | | |||
+------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| CE | 0b110 | 6 | | | CE | 0b110 | 6 | | |||
+------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
Table 3: The encoding of the ACE field in the ACK of | Table 3: The Encoding of the ACE Field in the ACK of | |||
the SYN-ACK to reflect the SYN-ACK's IP-ECN field | the SYN-ACK to Reflect the SYN-ACK's IP-ECN Field | |||
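
Table 3 can be read as a simple lookup. The Python sketch below is only an illustration: the dictionary and function names are hypothetical, and the codepoints are keyed by the labels used in the table rather than by their wire values.

```python
# ACE value the TCP Client puts on the pure ACK of the SYN/ACK,
# keyed by the IP-ECN codepoint that arrived on the SYN/ACK (Table 3).
HANDSHAKE_ACE = {
    "Not-ECT": 0b010,
    "ECT(1)": 0b011,
    "ECT(0)": 0b100,
    "CE": 0b110,
}


def client_handshake_feedback(codepoint_on_synack):
    """Return (ACE on the pure ACK, r.cep of the TCP Client in AccECN mode)."""
    ace = HANDSHAKE_ACE[codepoint_on_synack]
    # Only a CE mark on the SYN/ACK moves r.cep off its initial value of 5.
    r_cep = 6 if codepoint_on_synack == "CE" else 5
    return ace, r_cep
```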
When an AccECN Server in SYN-RCVD state receives a pure ACK with | When an AccECN Server in SYN-RCVD state receives a pure ACK with | |||
SYN=0 and no SACK blocks, instead of treating the ACE field as a | SYN=0 and no SACK blocks, instead of treating the ACE field as a | |||
counter, it MUST infer the meaning of each possible value of the ACE | counter, it MUST infer the meaning of each possible value of the ACE | |||
field from Table 4, which also shows the value that an AccECN Server | field from Table 4, which also shows the value that an AccECN Server | |||
MUST set s.cep to as a result. | MUST set s.cep to as a result. | |||
Given this encoding of the ACE field on the ACK of a SYN/ACK is | Given this encoding of the ACE field on the ACK of a SYN/ACK is | |||
exceptional, an AccECN Server using large receive offload (LRO) might | exceptional, an AccECN Server using large receive offload (LRO) might | |||
prefer to disable LRO until such an ACK has transitioned it out of | prefer to disable LRO until such an ACK has transitioned it out of | |||
skipping to change at page 28, line 28 ¶ | skipping to change at line 1275 ¶ | |||
+------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| 0b101 | Currently Unused {Note | 5 | | | 0b101 | Currently Unused {Note | 5 | | |||
| | 2} | | | | | 2} | | | |||
+------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| 0b110 | CE | 6 | | | 0b110 | CE | 6 | | |||
+------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| 0b111 | Currently Unused {Note | 5 | | | 0b111 | Currently Unused {Note | 5 | | |||
| | 2} | | | | | 2} | | | |||
+------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
Table 4: Meaning of the ACE field on the ACK of the SYN/ACK | Table 4: Meaning of the ACE Field on the ACK of the SYN/ACK | |||
{Note 1}: If the Server is in AccECN mode and in SYN-RCVD state, and | Note 1: If the Server is in AccECN mode and in SYN-RCVD state, and | |||
if it receives a value of zero on a pure ACK with SYN=0 and no SACK | if it receives a value of zero on a pure ACK with SYN=0 and | |||
blocks, for the rest of the connection the Server MUST NOT set ECT on | no SACK blocks, for the rest of the connection the Server | |||
outgoing packets and MUST NOT respond to AccECN feedback. | MUST NOT set ECT on outgoing packets and MUST NOT respond to | |||
Nonetheless, as a Data Receiver it MUST NOT disable AccECN feedback. | AccECN feedback. Nonetheless, as a Data Receiver, it MUST | |||
| NOT disable AccECN feedback. | |||
Any of the circumstances below could cause a value of zero but, | Any of the circumstances below could cause a value of zero | |||
whatever the cause, the actions above would be the appropriate | but, whatever the cause, the actions above would be the | |||
response: | appropriate response: | |||
* The TCP Client has somehow entered No ECN feedback mode (most | * The TCP Client has somehow entered No ECN feedback mode | |||
likely if the Server received a SYN or sent a SYN/ACK with | (most likely if the Server received a SYN or sent a SYN/ | |||
(AE,CWR,ECE) = (0,0,0) after entering AccECN mode, but possible | ACK with (AE,CWR,ECE) = (0,0,0) after entering AccECN | |||
even if it didn't); | mode, but possible even if it didn't); | |||
* The TCP Client genuinely might be in AccECN mode, but its count of | * The TCP Client genuinely might be in AccECN mode, but its | |||
received CE marks might have caused the ACE field to wrap to zero. | count of received CE marks might have caused the ACE | |||
This is highly unlikely, but not impossible because the Server | field to wrap to zero. This is highly unlikely, but not | |||
might have already sent multiple packets while still in SYN-RCVD | impossible because the Server might have already sent | |||
state, e.g., using TFO (see Section 5.2) and some might have been | multiple packets while still in SYN-RCVD state, e.g., | |||
CE-marked. Then ACE on the first ACK seen by the Server might be | using TFO (see Section 5.2), and some might have been CE- | |||
zero, due to previous ACKs experiencing an unfortunate pattern of | marked. Then ACE on the first ACK seen by the Server | |||
loss or delay. | might be zero, due to previous ACKs experiencing an | |||
| unfortunate pattern of loss or delay. | |||
* Some form of non-compliance at the TCP Client or on the path (see | * There is some form of non-compliance at the TCP Client or | |||
Section 3.2.2.4). | on the path (see Section 3.2.2.4). | |||
{Note 2}: If the Server is in AccECN mode, these values are Currently | Note 2: If the Server is in AccECN mode, these values are Currently | |||
Unused but the AccECN Server's behaviour is still defined for forward | Unused but the AccECN Server's behaviour is still defined | |||
compatibility. Then the designer of a future protocol can know for | for forward compatibility. Then the designer of a future | |||
certain what AccECN Servers will do with these codepoints. | protocol can know for certain what AccECN Servers will do | |||
| with these codepoints. | |||
{Note 3}: In the case where a Server that implements AccECN is also | Note 3: In the case where a Server that implements AccECN is also | |||
using a stateless handshake (termed a SYN cookie) it will not | using a stateless handshake (termed a SYN cookie), it will | |||
remember whether it entered AccECN mode. The values 0b000 or 0b001 | not remember whether it entered AccECN mode. The values | |||
will remind it that it did not enter AccECN mode, because AccECN does | 0b000 or 0b001 will remind it that it did not enter AccECN | |||
not use them (see Section 5.1 for details). If a Server that uses a | mode, because AccECN does not use them (see Section 5.1 for | |||
stateless handshake and implements AccECN receives either of these | details). If a Server that uses a stateless handshake and | |||
two values in the ACK, its action is implementation-dependent and | implements AccECN receives either of these two values in the | |||
outside the scope of this document. It will certainly not take the | ACK, its action is implementation-dependent and outside the | |||
action in the third column because, after it receives either of these | scope of this document. It will certainly not take the | |||
values, it is not in AccECN mode. In example, it will not disable | action in the third column because, after it receives either | |||
ECN (at least not just because ACE is 0b000) and it will not set | of these values, it is not in AccECN mode. For example, it | |||
s.cep. | will not disable ECN (at least not just because ACE is | |||
| 0b000) and it will not set s.cep. | |||
3.2.2.2. Encoding and Decoding Feedback in the ACE Field | 3.2.2.2. Encoding and Decoding Feedback in the ACE Field | |||
Whenever the Data Receiver sends an ACK with SYN=0 (with or without | Whenever the Data Receiver sends an ACK with SYN=0 (with or without | |||
data), unless the handshake encoding in Section 3.2.2.1 applies, the | data), unless the handshake encoding in Section 3.2.2.1 applies, the | |||
Data Receiver MUST encode the least significant 3 bits of its r.cep | Data Receiver MUST encode the least significant 3 bits of its r.cep | |||
counter into the ACE field (see Appendix A.2). | counter into the ACE field (see Appendix A.2). | |||
Whenever the Data Sender receives an ACK with SYN=0 (with or without | Whenever the Data Sender receives an ACK with SYN=0 (with or without | |||
data), it first checks whether it has already been superseded | data), it first checks whether it has already been superseded | |||
(defined in Appendix A.1) by another ACK in which case it ignores the | (defined in Appendix A.1) by another ACK in which case it ignores the | |||
ECN feedback. If the ACK has not been superseded, and if the special | ECN feedback. If the ACK has not been superseded, and if the special | |||
handshake encoding in Section 3.2.2.1 does not apply, the Data Sender | handshake encoding in Section 3.2.2.1 does not apply, the Data Sender | |||
decodes the ACE field as follows (see Appendix A.2 for examples). | decodes the ACE field as follows (see Appendix A.2 for examples). | |||
* It takes the least significant 3 bits of its local s.cep counter | * It takes the least significant 3 bits of its local s.cep counter | |||
and subtracts them from the incoming ACE counter to work out the | and subtracts them from the incoming ACE counter to work out the | |||
minimum positive increment it could apply to s.cep (assuming the | minimum positive increment it could apply to s.cep (assuming the | |||
ACE field only wrapped at most once). | ACE field only wrapped once at most). | |||
* It then follows the safety procedures in Section 3.2.2.5.2 to | * It then follows the safety procedures in Section 3.2.2.5.2 to | |||
calculate or estimate how many packets the ACK could have | calculate or estimate how many packets the ACK could have | |||
acknowledged under the prevailing conditions to determine whether | acknowledged under the prevailing conditions to determine whether | |||
the ACE field might have wrapped more than once. | the ACE field might have wrapped more than once. | |||
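
The first decoding step above amounts to a subtraction modulo 8. A minimal Python sketch follows. The function name is hypothetical, and the sketch deliberately stops before the safety procedures of Section 3.2.2.5.2, so it assumes the ACE field wrapped at most once. An ACE value equal to the low 3 bits of s.cep yields an increment of zero.

```python
def min_ace_increment(s_cep, ace):
    """Smallest non-negative increment to s.cep consistent with the ACE field.

    s_cep: the Data Sender's local s.cep counter.
    ace:   the 3-bit ACE value on the incoming ACK (0..7).
    """
    # Python's % always returns a non-negative result for a positive modulus,
    # so a wrapped ACE field (e.g. 7 -> 1) is handled without a special case.
    return (ace - (s_cep & 0x7)) % 8
```

For example, with s.cep = 5 an incoming ACE of 6 implies at least one new CE mark, while with s.cep = 7 an incoming ACE of 1 implies at least two, because the field has wrapped.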
The encode/decode procedures during the three-way handshake are | The encode/decode procedures during the three-way handshake are | |||
exceptions to the general rules given so far, so they are spelled out | exceptions to the general rules given so far, so they are spelled out | |||
step by step below for clarity: | step by step below for clarity: | |||
skipping to change at page 30, line 19 ¶ | skipping to change at line 1368 ¶ | |||
Reason: It would be redundant for the Server to include CE-marked | Reason: It would be redundant for the Server to include CE-marked | |||
SYNs in its r.cep counter, because it already reliably delivers | SYNs in its r.cep counter, because it already reliably delivers | |||
feedback of any CE marking using the encoding in the top block of | feedback of any CE marking using the encoding in the top block of | |||
Table 2 in the SYN/ACK. This also ensures that, when the Server | Table 2 in the SYN/ACK. This also ensures that, when the Server | |||
starts using the ACE field, it has not unnecessarily consumed more | starts using the ACE field, it has not unnecessarily consumed more | |||
than one initial value, given they can be used to negotiate | than one initial value, given they can be used to negotiate | |||
variants of the AccECN protocol (see Appendix B.3). | variants of the AccECN protocol (see Appendix B.3). | |||
* If a TCP Client in AccECN mode receives CE feedback in the TCP | * If a TCP Client in AccECN mode receives CE feedback in the TCP | |||
flags of a SYN/ACK, it MUST NOT increment s.cep (it remains at its | flags of a SYN/ACK, it MUST NOT increment s.cep (it remains at its | |||
initial value of 5), so that it stays in step with r.cep on the | initial value of 5) so that it stays in step with r.cep on the | |||
Server. Nonetheless, the TCP Client still triggers the congestion | Server. Nonetheless, the TCP Client still triggers the congestion | |||
control actions necessary to respond to the CE feedback. | control actions necessary to respond to the CE feedback. | |||
* If a TCP Client in AccECN mode receives a CE mark in the IP-ECN | * If a TCP Client in AccECN mode receives a CE mark in the IP-ECN | |||
field of a SYN/ACK, it MUST increment r.cep, but no more than once | field of a SYN/ACK, it MUST increment r.cep, but no more than once | |||
no matter how many CE-marked SYN/ACKs it receives | no matter how many CE-marked SYN/ACKs it receives (i.e., | |||
(i.e., incremented from 5 to 6, but no further). | incremented from 5 to 6, but no further). | |||
Reason: Incrementing r.cep ensures the Client will eventually | Reason: Incrementing r.cep ensures the Client will eventually | |||
deliver any CE marking to the Server reliably when it starts using | deliver any CE marking to the Server reliably when it starts using | |||
the ACE field. Even though the Client also feeds back any CE | the ACE field. Even though the Client also feeds back any CE | |||
marking on the ACK of the SYN/ACK using the encoding in Table 3, | marking on the ACK of the SYN/ACK using the encoding in Table 3, | |||
this ACK is not delivered reliably, so it can be considered as a | this ACK is not delivered reliably, so it can be considered as a | |||
timely notification that is redundant but unreliable. The Client | timely notification that is redundant but unreliable. The Client | |||
does not increment r.cep more than once, because the Server can | does not increment r.cep more than once, because the Server can | |||
only increment s.cep once (see next bullet). Also, this limits | only increment s.cep once (see next bullet). Also, this limits | |||
the unnecessarily consumed initial values of the ACE field to two. | the unnecessarily consumed initial values of the ACE field to two. | |||
* If a TCP Server in AccECN mode and in SYN-RCVD state receives CE | * If a TCP Server in AccECN mode and in SYN-RCVD state receives CE | |||
feedback in the TCP flags of a pure ACK with no SACK blocks, it | feedback in the TCP flags of a pure ACK with no SACK blocks, it | |||
MUST increment s.cep (from 5 to 6). The TCP Server then triggers | MUST increment s.cep (from 5 to 6). The TCP Server then triggers | |||
the congestion control actions necessary to respond to the CE | the congestion control actions necessary to respond to the CE | |||
feedback. | feedback. | |||
Reasoning: The TCP Server can only increment s.cep once, because | Reasoning: The TCP Server can only increment s.cep once, because | |||
the first ACK it receives will cause it to transition out of SYN- | the first ACK it receives will cause it to transition out of SYN- | |||
RCVD state. The Server's congestion response would be no | RCVD state. The Server's congestion response would be no | |||
different even if it could receive feedback of more than one CE- | different, even if it could receive feedback of more than one CE- | |||
marked SYN/ACK. | marked SYN/ACK. | |||
Once the TCP Server transitions to ESTABLISHED state, it might | Once the TCP Server transitions to ESTABLISHED state, it might | |||
later receive other pure ACK(s) with the handshake encoding in the | later receive other pure ACK(s) with the handshake encoding in the | |||
ACE field. A Server MAY implement a test for such a case, but it | ACE field. A Server MAY implement a test for such a case, but it | |||
is not required. Therefore, once in the ESTABLISHED state, it | is not required. Therefore, once in the ESTABLISHED state, it | |||
will be sufficient for the Server to consider the ACE field to be | will be sufficient for the Server to consider the ACE field to be | |||
encoded as the normal ACE counter on all packets with SYN=0. | encoded as the normal ACE counter on all packets with SYN=0. | |||
Reasoning: Such ACKs will be quite unusual, e.g., a SYN/ACK (or | Reasoning: Such ACKs will be quite unusual, e.g., a SYN/ACK (or | |||
skipping to change at page 31, line 46 ¶ | skipping to change at line 1444 ¶ | |||
comparison implies an invalid transition of the IP-ECN field, for | comparison implies an invalid transition of the IP-ECN field, for | |||
the remainder of the half-connection the Server is advised to send | the remainder of the half-connection the Server is advised to send | |||
non-ECN-capable packets, but it still ought to respond to any | non-ECN-capable packets, but it still ought to respond to any | |||
feedback of CE markings (explained below). However, the Server | feedback of CE markings (explained below). However, the Server | |||
MUST remain in the AccECN feedback mode and it MUST continue to | MUST remain in the AccECN feedback mode and it MUST continue to | |||
feed back any ECN markings on arriving packets (in its role as | feed back any ECN markings on arriving packets (in its role as | |||
Data Receiver). | Data Receiver). | |||
If a Data Sender in AccECN mode starts sending non-ECN-capable | If a Data Sender in AccECN mode starts sending non-ECN-capable | |||
packets because it has detected mangling, it is still advised to | packets because it has detected mangling, it is still advised to | |||
respond to CE feedback. Reason: any CE-marking arriving at the Data | respond to CE feedback. Reason: Any CE marking arriving at the Data | |||
Receiver could be due to something early in the path mangling the | Receiver could be due to something early in the path mangling the | |||
non-ECN-capable IP-ECN field into an ECN-capable codepoint and then, | non-ECN-capable IP-ECN field into an ECN-capable codepoint and then, | |||
later in the path, a network bottleneck might be applying CE-markings | later in the path, a network bottleneck might be applying CE markings | |||
to indicate genuine congestion. This argument applies whether the | to indicate genuine congestion. This argument applies whether the | |||
handshake packet originally sent by the TCP Client or Server was non- | handshake packet originally sent by the TCP Client or Server was non- | |||
ECN-capable or ECN-capable because, in either case, an unsafe | ECN-capable or ECN-capable because, in either case, an unsafe | |||
transition could imply that non-ECN-capable packets later in the | transition could imply that non-ECN-capable packets later in the | |||
connection might get mangled. | connection might get mangled. | |||
Once a Data Sender has entered AccECN mode it is advised to check | Once a Data Sender has entered AccECN mode it is advised to check | |||
whether it is receiving continuous feedback of CE. Specifying | whether it is receiving continuous feedback of CE. Specifying | |||
exactly how to do this is beyond the scope of the present | exactly how to do this is beyond the scope of the present | |||
specification, but the sender might check whether the feedback for | specification, but the sender might check whether the feedback for | |||
every packet it sends for the first three or four rounds indicates | every packet it sends for the first three or four rounds indicates CE | |||
CE-marking. If continuous CE-marking is detected, for the remainder | marking. If continuous CE marking is detected, for the remainder of | |||
of the half-connection, the Data Sender ought to send non-ECN-capable | the half-connection, the Data Sender ought to send non-ECN-capable | |||
packets and it is advised not to respond to any feedback of CE | packets, and it is advised not to respond to any feedback of CE | |||
markings. The Data Sender might occasionally test whether it can | markings. The Data Sender might occasionally test whether it can | |||
resume sending ECN-capable packets. | resume sending ECN-capable packets. | |||
The above advice on switching to sending non-ECN-capable packets but | The above advice on switching to sending non-ECN-capable packets but | |||
still responding to CE-markings unless they become continuous is not | still responding to CE markings unless they become continuous is not | |||
stated normatively (in capitals), because the best strategy might | stated normatively (in capitals), because the best strategy might | |||
depend on experience of the most likely types of mangling, which can | depend on experience of the most likely types of mangling, which can | |||
only be known at the time of deployment. The same is true for other | only be known at the time of deployment. The same is true for other | |||
forms of mangling (or resumption of expected marking) during later | forms of mangling (or resumption of expected marking) during later | |||
stages of a connection. | stages of a connection. | |||
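The check for continuous CE feedback suggested above is left open by the specification. As a minimal sketch only, assuming the sender keeps a hypothetical per-round tally of packets sent and packets fed back as CE-marked, it might look like the following. The function name, the tuple layout, and the default of three rounds are illustrative assumptions, not part of the specification.

```python
def continuous_ce(rounds, min_rounds=3):
    """Return True if every packet in the first `min_rounds` rounds was fed back as CE.

    `rounds` is a hypothetical list of (packets_sent, packets_fed_back_as_ce)
    tuples, one per round trip, earliest first.
    """
    if len(rounds) < min_rounds:
        # Not enough rounds observed yet to draw a conclusion.
        return False
    # Continuous marking: in each early round, all sent packets came back as CE.
    return all(sent > 0 and ce >= sent for sent, ce in rounds[:min_rounds])
```

A sender that gets True from such a check would, per the advice above, switch to non-ECN-capable packets and stop responding to CE feedback for the rest of the half-connection, perhaps retesting occasionally.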
As always, once a host has entered AccECN mode, it follows the | As always, once a host has entered AccECN mode, it follows the | |||
general mandatory requirements (Section 3.1.5) to remain in the same | general mandatory requirements (Section 3.1.5) to remain in the same | |||
feedback mode and to continue feeding back any ECN markings on | feedback mode and to continue feeding back any ECN markings on | |||
arriving packets using AccECN feedback. This follows the general | arriving packets using AccECN feedback. This follows the general | |||
skipping to change at page 32, line 42 ¶ | skipping to change at line 1488 ¶ | |||
whatever it receives (Section 2.5). | whatever it receives (Section 2.5). | |||
The ACK of the SYN/ACK is not reliably delivered (nonetheless, the | The ACK of the SYN/ACK is not reliably delivered (nonetheless, the | |||
count of CE marks is still eventually delivered reliably). If this | count of CE marks is still eventually delivered reliably). If this | |||
ACK does not arrive, the Server is advised to continue to send ECN- | ACK does not arrive, the Server is advised to continue to send ECN- | |||
capable packets without having tested for mangling of the IP-ECN | capable packets without having tested for mangling of the IP-ECN | |||
field on the SYN/ACK. | field on the SYN/ACK. | |||
All the fall-back behaviours in this section are necessary in case | All the fall-back behaviours in this section are necessary in case | |||
mangling of the IP-ECN field is asymmetric, which is currently common | mangling of the IP-ECN field is asymmetric, which is currently common | |||
over some mobile networks [Mandalari18]. Then one end might see no | over some mobile networks [Mandalari18]. In this case, one end might | |||
unsafe transition and continue sending ECN-capable packets, while the | see no unsafe transition and continue sending ECN-capable packets, | |||
other end sees an unsafe transition and stops sending ECN-capable | while the other end sees an unsafe transition and stops sending ECN- | |||
packets. | capable packets. | |||
Invalid transitions of the IP-ECN field are defined in section 18 of | Invalid transitions of the IP-ECN field are defined in Section 18 of | |||
the Classic ECN specification [RFC3168] and repeated here for | the Classic ECN specification [RFC3168] and repeated here for | |||
convenience: | convenience: | |||
* the not-ECT codepoint changes; | * the Not-ECT codepoint changes; | |||
* either ECT codepoint transitions to not-ECT; | * either ECT codepoint transitions to Not-ECT; | |||
* the CE codepoint changes. | * the CE codepoint changes. | |||
RFC 3168 says that a router that changes ECT to not-ECT is invalid | RFC 3168 says that a router that changes ECT to Not-ECT is invalid | |||
but safe. However, from a host's viewpoint, this transition is | but safe. However, from a host's viewpoint, this transition is | |||
unsafe because it could be the result of two transitions at different | unsafe because it could be the result of two transitions at different | |||
routers on the path: ECT to CE (safe) then CE to not-ECT (unsafe). | routers on the path: ECT to CE (safe) then CE to Not-ECT (unsafe). | |||
This scenario could well happen where an ECN-enabled home router | This scenario could well happen where an ECN-enabled home router | |||
congests its upstream mobile broadband bottleneck link, then the | congests its upstream mobile broadband bottleneck link, then the | |||
ingress to the mobile network clears the ECN field [Mandalari18]. | ingress to the mobile network clears the ECN field [Mandalari18]. | |||
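The three invalid transitions listed above can be expressed as a small predicate. This is an illustrative sketch, not text from the specification: the function name is hypothetical, and the constants are the 2-bit IP-ECN codepoint values from RFC 3168. Following the host's viewpoint described above, an ECT to Not-ECT change is reported as invalid.

```python
# IP-ECN codepoints (2-bit field), as defined in RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def is_invalid_transition(sent: int, received: int) -> bool:
    """Return True if sent -> received is one of the invalid IP-ECN transitions."""
    if sent == received:
        return False              # no change at all
    if sent == NOT_ECT:
        return True               # the Not-ECT codepoint changes
    if sent == CE:
        return True               # the CE codepoint changes
    # sent is ECT(0) or ECT(1): only a transition to Not-ECT is listed as invalid.
    return received == NOT_ECT
```

For example, ECT(0) arriving as CE is ordinary congestion marking and returns False, whereas ECT(0) arriving as Not-ECT returns True.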
3.2.2.4. Testing for Zeroing of the ACE Field | 3.2.2.4. Testing for Zeroing of the ACE Field | |||
Section 3.2.2 required the Data Receiver to initialize the r.cep | Section 3.2.2 required the Data Receiver to initialize the r.cep | |||
counter to a non-zero value. Therefore, in either direction the | counter to a non-zero value. Therefore, in either direction the | |||
initial value of the ACE counter ought to be non-zero. | initial value of the ACE counter ought to be non-zero. | |||
skipping to change at page 34, line 13 ¶ | skipping to change at line 1554 ¶ | |||
the other half connection. | the other half connection. | |||
If reordering occurs, the first feedback packet that arrives will not | If reordering occurs, the first feedback packet that arrives will not | |||
necessarily be the same as the first packet in sequence order. The | necessarily be the same as the first packet in sequence order. The | |||
test has been specified loosely like this to simplify implementation, | test has been specified loosely like this to simplify implementation, | |||
and because it would not have been any more precise to have specified | and because it would not have been any more precise to have specified | |||
the first packet in sequence order, which would not necessarily be | the first packet in sequence order, which would not necessarily be | |||
the first ACE counter that the Data Receiver fed back anyway, given | the first ACE counter that the Data Receiver fed back anyway, given | |||
it might have been a retransmission. | it might have been a retransmission. | |||
The possibility of re-ordering means that there is a small chance | The possibility of reordering means that there is a small chance that | |||
that the ACE field on the first packet to arrive is genuinely zero | the ACE field on the first packet to arrive is genuinely zero | |||
(without middlebox interference). This would cause a host to | (without middlebox interference). This would cause a host to | |||
unnecessarily disable ECN for a half connection. Therefore, in | unnecessarily disable ECN for a half connection. Therefore, in | |||
environments where there is no evidence of the ACE field being | environments where there is no evidence of the ACE field being | |||
zeroed, implementations MAY skip this test. | zeroed, implementations MAY skip this test. | |||
Note that the Data Sender MUST NOT test whether the arriving counter | Note that the Data Sender MUST NOT test whether the arriving counter | |||
in the initial ACE field has been initialized to a specific valid | in the initial ACE field has been initialized to a specific valid | |||
value - the above check solely tests whether the ACE fields have been | value -- the above check solely tests whether the ACE fields have | |||
incorrectly zeroed. This allows hosts to use different initial | been incorrectly zeroed. This allows hosts to use different initial | |||
values as an additional signalling channel in future. | values as an additional signalling channel in the future. | |||
3.2.2.5. Safety against Ambiguity of the ACE Field | 3.2.2.5. Safety Against Ambiguity of the ACE Field | |||
If too many CE-marked segments are acknowledged at once, or if a long | If too many CE-marked segments are acknowledged at once, or if a long | |||
run of ACKs is lost or thinned out, the 3-bit counter in the ACE | run of ACKs is lost or thinned out, the 3-bit counter in the ACE | |||
field might have cycled between two ACKs arriving at the Data Sender. | field might have cycled between two ACKs arriving at the Data Sender. | |||
The following safety procedures minimize this ambiguity. | The following safety procedures minimize this ambiguity. | |||
3.2.2.5.1. Packet Receiver Safety Procedures | 3.2.2.5.1. Packet Receiver Safety Procedures | |||
The following rules define when the receiver of a packet in AccECN | The following rules define when the receiver of a packet in AccECN | |||
mode emits an ACK: | mode emits an ACK: | |||
Change-Triggered ACKs: An AccECN Data Receiver SHOULD emit an ACK | Change-Triggered ACKs: An AccECN Data Receiver SHOULD emit an ACK | |||
whenever a data packet marked CE arrives after the previous packet | whenever a data packet marked CE arrives after the previous packet | |||
was not CE. | was not CE. | |||
Even though this rule is stated as a "SHOULD", it is important for | Even though this rule is stated as a "SHOULD", it is important for | |||
a transition to trigger an ACK if at all possible, The only valid | a transition to trigger an ACK if at all possible. The only valid | |||
exception to this rule is given below these bullets. | exception to this rule is given below these bullets. | |||
For the avoidance of doubt, this rule is deliberately worded to | For the avoidance of doubt, this rule is deliberately worded to | |||
apply solely when _data_ packets arrive, but the comparison with | apply solely when _data_ packets arrive, but the comparison with | |||
the previous packet includes any packet, not just data packets. | the previous packet includes any packet, not just data packets. | |||
Increment-Triggered ACKs: An AccECN receiver of a packet MUST emit | Increment-Triggered ACKs: An AccECN receiver of a packet MUST emit | |||
an ACK if 'n' CE marks have arrived since the previous ACK. If | an ACK if 'n' CE marks have arrived since the previous ACK. If | |||
there is unacknowledged data at the receiver, 'n' SHOULD be 2. If | there is unacknowledged data at the receiver, 'n' SHOULD be 2. If | |||
there is no unacknowledged data at the receiver, 'n' SHOULD be 3 | there is no unacknowledged data at the receiver, 'n' SHOULD be 3 | |||
and MUST be no less than 3. In either case, 'n' MUST be no | and MUST be no less than 3. In either case, 'n' MUST be no | |||
greater than 7. | greater than 7. | |||
The above rules for when to send an ACK are designed to be | The above rules for when to send an ACK are designed to be | |||
complemented by those in Section 3.2.3.3, which concern whether an | complemented by those in Section 3.2.3.3, which concern whether an | |||
AccECN TCP Option ought to be included on ACKs. | AccECN TCP Option ought to be included on ACKs. | |||
If the arrivals of a number of data packets are all processed as one | If the arrivals of a number of data packets are all processed as one | |||
event, e.g., using large receive offload (LRO) or generic receive | event, e.g., using large receive offload (LRO) or generic receive | |||
offload (GRO), both the above rules SHOULD be interpreted as | offload (GRO), both the above rules SHOULD be interpreted as | |||
requiring multiple ACKs to be emitted back-to-back (for each | requiring multiple ACKs to be emitted back to back (for each | |||
transition and for each sequence of 'n' CE marks). If this is | transition and for each sequence of 'n' CE marks). If this is | |||
problematic for high performance, either rule can be interpreted as | problematic for high performance, either rule can be interpreted as | |||
requiring just a single ACK at the end of the whole receive event. | requiring just a single ACK at the end of the whole receive event. | |||
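The two ACK-triggering rules above can be sketched as per-packet receiver logic. This is a simplified illustration under stated assumptions: packets are processed one at a time (no LRO/GRO coalescing), 'n' takes the recommended values of 2 and 3, and ordinary delayed-ACK behaviour is ignored. The class and parameter names are hypothetical.

```python
class AckTrigger:
    """Decides whether an arriving packet forces an immediate ACK."""

    def __init__(self):
        self.prev_ce = False      # was the previous packet (of any kind) CE-marked?
        self.ce_since_ack = 0     # CE marks that arrived since the previous ACK

    def on_packet(self, is_data: bool, is_ce: bool, unacked_data: bool) -> bool:
        if is_ce:
            self.ce_since_ack += 1
        # Change-triggered: a CE-marked data packet after a packet that was not CE.
        change = is_data and is_ce and not self.prev_ce
        # Increment-triggered: 'n' CE marks since the last ACK
        # (n = 2 with unacknowledged data at the receiver, otherwise n = 3).
        n = 2 if unacked_data else 3
        increment = self.ce_since_ack >= n
        # The comparison with the previous packet covers any packet, not just data.
        self.prev_ce = is_ce
        if change or increment:
            self.ce_since_ack = 0
            return True
        return False
```

With no unacknowledged data, three CE-marked pure ACKs in a row cause `on_packet` to return True on the third, which corresponds to the ACK-of-ACKs case discussed later in this section.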
Even if a number of data packets do not arrive as one event, the | Even if a number of data packets do not arrive as one event, the | |||
'Change-Triggered ACKs' rule could sometimes cause the ACK rate to be | 'Change-Triggered ACKs' rule could sometimes cause the ACK rate to be | |||
problematic for high performance (although high performance protocols | problematic for high performance (although high performance protocols | |||
such as DCTCP already successfully use change-triggered ACKs). The | such as DCTCP already successfully use change-triggered ACKs). The | |||
rationale for change-triggered ACKs is so that the Data Sender can | rationale for change-triggered ACKs is so that the Data Sender can | |||
rely on them to detect queue growth as soon as possible, particularly | rely on them to detect queue growth as soon as possible, particularly | |||
at the start of a flow. The approach can lead to some additional | at the start of a flow. The approach can lead to some additional | |||
ACKs but it feeds back the timing and the order in which ECN marks | ACKs but it feeds back the timing and the order in which ECN marks | |||
are received with minimal additional complexity. If CE marks are | are received with minimal additional complexity. If CE marks are | |||
infrequent, as is the case for most Active Queue Managment (AQM) | infrequent, as is the case for most Active Queue Management (AQM) | |||
packet schedulers at the time of writing, or there are multiple marks | packet schedulers at the time of writing, or there are multiple marks | |||
in a row, the additional load will be low. However, marking patterns | in a row, the additional load will be low. However, marking patterns | |||
with numerous non-contiguous CE marks could increase the load | with numerous non-contiguous CE marks could increase the load | |||
significantly. One possible compromise would be for the receiver to | significantly. One possible compromise would be for the receiver to | |||
heuristically detect whether the sender is in slow-start, then to | heuristically detect whether the sender is in slow-start, then to | |||
implement change-triggered ACKs while the sender is in slow-start, | implement change-triggered ACKs while the sender is in slow-start, | |||
and offload otherwise. | and offload otherwise. | |||
In a scenario where both endpoints support AccECN, if host B has | In a scenario where both endpoints support AccECN, if host B has | |||
chosen to use ECN-capable pure ACKs (as allowed in [RFC8311] | chosen to use ECN-capable pure ACKs (as allowed in [RFC8311] | |||
experiments) and enough of these ACKs become CE-marked, then the | experiments) and enough of these ACKs become CE marked, then the | |||
'Increment-Triggered ACKs' rule ensures that its peer (host A) gives | 'Increment-Triggered ACKs' rule ensures that its peer (host A) gives | |||
B sufficient feedback about this congestion on the ACKs from B to A. | B sufficient feedback about this congestion on the ACKs from B to A. | |||
Normally, for instance in a unidirectional data scenario from host A | Normally, for instance in a unidirectional data scenario from host A | |||
to B, the Data Sender (A) can piggyback that feedback on its data. | to B, the Data Sender (A) can piggyback that feedback on its data. | |||
But if A stops sending data, the second part of the 'Increment- | But if A stops sending data, the second part of the 'Increment- | |||
Triggered ACKs' rule requires A to emit a pure ACK for at least every | Triggered ACKs' rule requires A to emit a pure ACK for at least every | |||
third CE-marked incoming ACK over the subsequent round trip. | third CE-marked incoming ACK over the subsequent round trip. | |||
Although TCP normally only ACKs data segments, in this case the | Although TCP normally only ACKs data segments, in this case the | |||
increment-triggered ACK rule makes it mandatory for A to emit ACKs of | increment-triggered ACK rule makes it mandatory for A to emit ACKs of | |||
skipping to change at page 36, line 21 ¶ | skipping to change at line 1655 ¶ | |||
even if A also uses ECN-capable pure ACKs, and even if there is | even if A also uses ECN-capable pure ACKs, and even if there is | |||
pathological congestion in both directions, any resulting ping-pong | pathological congestion in both directions, any resulting ping-pong | |||
of ACKs will be rapidly damped. | of ACKs will be rapidly damped. | |||
In the above bidirectional scenario, incoming ACKs of ACKs could be | In the above bidirectional scenario, incoming ACKs of ACKs could be | |||
mistaken for duplicate ACKs. But ACKs of ACKs can be distinguished | mistaken for duplicate ACKs. But ACKs of ACKs can be distinguished | |||
from duplicate ACKs because they do not contain any SACK blocks even | from duplicate ACKs because they do not contain any SACK blocks even | |||
when SACK has been negotiated. It is outside the scope of this | when SACK has been negotiated. It is outside the scope of this | |||
AccECN specification to normatively specify this additional test for | AccECN specification to normatively specify this additional test for | |||
DupACKs, because ACKs of ACKs can only arise if the original ACKs are | DupACKs, because ACKs of ACKs can only arise if the original ACKs are | |||
ECN-capable. Instead any specification that allows ECN-capable pure | ECN-capable. Instead, any specification that allows ECN-capable pure | |||
ACKs MUST make sending ACKs of ACKs conditional on measures to | ACKs MUST make sending ACKs of ACKs conditional on measures to | |||
distinguish ACKs of ACKs from DupACKs (see for example | distinguish ACKs of ACKs from DupACKs (see for example [ECN++]). All | |||
[I-D.ietf-tcpm-generalized-ecn]). All that is necessary here is to | that is necessary here is to require that these ACKs of ACKs MUST NOT | |||
require that these ACKs of ACKs MUST NOT contain any SACK blocks | contain any SACK blocks (which would normally not happen anyway). | |||
(which would normally not happen anyway). | ||||
3.2.2.5.2. Data Sender Safety Procedures | 3.2.2.5.2. Data Sender Safety Procedures | |||
If the Data Sender has not received AccECN TCP Options to give it | If the Data Sender has not received AccECN TCP Options to give it | |||
more dependable information, and it detects that the ACE field could | more dependable information, and it detects that the ACE field could | |||
have cycled, it SHOULD deem whether it cycled by taking the safest | have cycled, it SHOULD deem whether it cycled by taking the safest | |||
likely case under the prevailing conditions. It can detect if the | likely case under the prevailing conditions. It can detect if the | |||
counter could have cycled by using the jump in the acknowledgement | counter could have cycled by using the jump in the acknowledgement | |||
number since the last ACK to calculate or estimate how many segments | number since the last ACK to calculate or estimate how many segments | |||
could have been acknowledged. An example algorithm to implement this | could have been acknowledged. An example algorithm to implement this | |||
skipping to change at page 37, line 33 ¶ | skipping to change at line 1715 ¶ | |||
| Kind = 174 | Length = 11 | EE1B field | | | Kind = 174 | Length = 11 | EE1B field | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| EE1B (cont'd) | ECEB field | | | EE1B (cont'd) | ECEB field | | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| EE0B field | Order 1 | | EE0B field | Order 1 | |||
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
Figure 4: The Two Alternative AccECN TCP Options | Figure 4: The Two Alternative AccECN TCP Options | |||
Figure 4 shows two option field orders; order 0 and order 1. They | Figure 4 shows two option field orders; order 0 and order 1. They | |||
both consists of three 24-bit fields. Order 0 provides the 24 least | both consist of three 24-bit fields. Order 0 provides the 24 least | |||
significant bits of the r.e0b, r.ceb and r.e1b counters, | significant bits of the r.e0b, r.ceb, and r.e1b counters, | |||
respectively. Order 1 provides the same fields, but in the opposite | respectively. Order 1 provides the same fields, but in the opposite | |||
order. On each packet, the Data Receiver can use whichever order is | order. On each packet, the Data Receiver can use whichever order is | |||
more efficient. In either case, the bytes within the fields are in | more efficient. In either case, the bytes within the fields are in | |||
network byte order (big-endian). | network byte order (big-endian). | |||
The choice to use three bytes (24 bits) fields in the options was | The choice to use three bytes (24 bits) fields in the options was | |||
made to strike a balance between TCP option space usage, and the | made to strike a balance between TCP option space usage, and the | |||
required fidelity of the counters to accomodate typical scenarios | required fidelity of the counters to accommodate typical scenarios | |||
such as hardware TCP segmentation offloading (TSO), and periods where | such as hardware TCP Segmentation Offloading (TSO), and periods | |||
no option may be transmitted (e.g., SACK loss recovery). Providing | during which no option may be transmitted (e.g., SACK loss recovery). | |||
only 2 bytes (16 bits) for these counters could easily roll over | Providing only 2 bytes (16 bits) for these counters could easily roll | |||
within a single TSO transmission or large/generic receive offload | over within a single TSO transmission or large/generic receive | |||
(LRO/GRO) event. Having two distinct orderings further allows the | offload (LRO/GRO) event. Having two distinct orderings further | |||
transmission of the most pertinent changes in an abbreviated option | allows the transmission of the most pertinent changes in an | |||
(see below). | abbreviated option (see below). | |||
When a Data Receiver sends an AccECN Option, it MUST set the Kind | When a Data Receiver sends an AccECN Option, it MUST set the Kind | |||
field to 172 if using Order 0, or to 174 if using Order 1. These two | field to 172 if using Order 0, or to 174 if using Order 1. These two | |||
new TCP Option Kinds are registered in Section 7 and called | new TCP Option Kinds are registered in Section 7 and are called | |||
respectively AccECN0 and AccECN1. | AccECN0 and AccECN1, respectively. | |||
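As an illustration of the layout in Figure 4, the sketch below builds a full-length (Length = 11) AccECN Option in either order. The function and constant names are hypothetical; only the Kind values, the field orders, the 24-bit truncation, and the big-endian byte order are taken from the text above. Abbreviated options are not handled here.

```python
KIND_ACCECN0 = 172   # Order 0: EE0B, ECEB, EE1B
KIND_ACCECN1 = 174   # Order 1: EE1B, ECEB, EE0B
MASK24 = 0xFFFFFF    # keep the 24 least significant bits of each counter


def encode_accecn_option(e0b: int, ceb: int, e1b: int, order: int = 0) -> bytes:
    """Encode the r.e0b, r.ceb and r.e1b byte counters as a full AccECN Option."""
    if order == 0:
        kind, fields = KIND_ACCECN0, (e0b, ceb, e1b)
    else:
        kind, fields = KIND_ACCECN1, (e1b, ceb, e0b)
    # Each field is 3 octets in network byte order (big-endian).
    body = b"".join((value & MASK24).to_bytes(3, "big") for value in fields)
    # The TCP option Length covers the Kind and Length octets too: 2 + 9 = 11.
    return bytes([kind, 2 + len(body)]) + body
```

For instance, `encode_accecn_option(1, 2, 3, order=1)` yields an 11-octet option whose first counter field carries the value 3 (EE1B).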
Note that there is no field to feed back Not-ECT bytes. Nonetheless | Note that there is no field to feed back Not-ECT bytes. Nonetheless, | |||
an algorithm for the Data Sender to calculate the number of payload | an algorithm for the Data Sender to calculate the number of payload | |||
bytes received as Not-ECT is given in Appendix A.4. | bytes received as Not-ECT is given in Appendix A.4. | |||
Whenever a Data Receiver sends an AccECN Option, the rules in | Whenever a Data Receiver sends an AccECN Option, the rules in | |||
Section 3.2.3.3 allow it to omit unchanged fields from the tail of | Section 3.2.3.3 allow it to omit unchanged fields from the tail of | |||
the option, to help cope with option space limitations, as long as it | the option, to help cope with option space limitations, as long as it | |||
preserves the order of the remaining fields and includes any field | preserves the order of the remaining fields and includes any field | |||
that has changed. The length field MUST indicate which fields are | that has changed. The length field MUST indicate which fields are | |||
present as follows: | present as follows: | |||
skipping to change at page 38, line 48 ¶ | skipping to change at line 1776 ¶ | |||
but there is very limited space for the option. | but there is very limited space for the option. | |||
All implementations of a Data Sender that read any AccECN Option MUST | All implementations of a Data Sender that read any AccECN Option MUST | |||
be able to read AccECN Options of any of the above lengths. For | be able to read AccECN Options of any of the above lengths. For | |||
forward compatibility, if the AccECN Option is of any other length, | forward compatibility, if the AccECN Option is of any other length, | |||
implementations MUST use those whole 3-octet fields that fit within | implementations MUST use those whole 3-octet fields that fit within | |||
the length and ignore the remainder of the option, treating it as | the length and ignore the remainder of the option, treating it as | |||
padding. | padding. | |||
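The forward-compatibility rule above (use the whole 3-octet fields that fit within the length, treat the rest as padding) might be implemented along the following lines. This is a sketch with hypothetical names; it assumes `opt` starts at the Kind octet and that fields omitted from the tail simply reduce the Length.

```python
FIELD_ORDER = {
    172: ("EE0B", "ECEB", "EE1B"),   # AccECN0, Order 0
    174: ("EE1B", "ECEB", "EE0B"),   # AccECN1, Order 1
}


def parse_accecn_option(opt: bytes) -> dict:
    """Return the counter fields present in an AccECN Option, keyed by field name."""
    kind, length = opt[0], opt[1]
    if kind not in FIELD_ORDER:
        raise ValueError("not an AccECN Option")
    payload = opt[2:length]                  # octets after Kind and Length
    nfields = min(len(payload) // 3, 3)      # whole 3-octet fields that fit
    names = FIELD_ORDER[kind]
    # Any trailing octets beyond the last whole field are ignored as padding.
    return {
        names[i]: int.from_bytes(payload[3 * i:3 * i + 3], "big")
        for i in range(nfields)
    }
```

An option of an unexpected length, say 9, would therefore yield two fields, with the final octet ignored.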
AccECN Options have to be optional to implement, because both sender | AccECN Options have to be optional to implement, because both sender | |||
and receiver have to be able to cope without options anyway - in | and receiver have to be able to cope without options anyway -- in | |||
cases where they do not traverse a network path. It is RECOMMENDED | cases where they do not traverse a network path. It is RECOMMENDED | |||
to implement both sending and receiving of AccECN Options. Support | to implement both sending and receiving of AccECN Options. Support | |||
for AccECN Options is particularly valuable over paths that introduce | for AccECN Options is particularly valuable over paths that introduce | |||
a high degree of ACK filtering, where the 3-bit ACE counter alone | a high degree of ACK filtering, where the 3-bit ACE counter alone | |||
might sometimes be insufficient, when it is ambiguous whether it has | might sometimes be insufficient, when it is ambiguous whether it has | |||
wrapped. If sending of AccECN Options is implemented, the fall-backs | wrapped. If sending of AccECN Options is implemented, the fall-backs | |||
described in this document will need to be implemented as well | described in this document will need to be implemented as well | |||
(unless solely for a controlled environment where path traversal is | (unless solely for a controlled environment where path traversal is | |||
not considered a problem). Even if a developer does not implement | not considered a problem). Even if a developer does not implement | |||
logic to understand received AccECN Options, it is RECOMMENDED that | logic to understand received AccECN Options, it is RECOMMENDED that | |||
they implement logic to send AccECN Options. Otherwise, those remote | they implement logic to send AccECN Options. Otherwise, those remote | |||
peers that implement the receiving logic will still be excluded from | peers that implement the receiving logic will still be excluded from | |||
congestion feedback that is robust against the increasingly | congestion feedback that is robust against the increasingly | |||
aggressive ACK filtering in the Internet. The logic to send AccECN | aggressive ACK filtering in the Internet. The logic to send AccECN | |||
Options is the simpler to implement of the two sides. | Options is the simpler to implement of the two sides. | |||
If a Data Receiver intends to send an AccECN Option at any time | If a Data Receiver intends to send an AccECN Option at any time | |||
during the rest of the connection it is RECOMMENDED to also test path | during the rest of the connection, it is RECOMMENDED to also test | |||
traversal of the AccECN Option as specified in Section 3.2.3.2. | path traversal of the AccECN Option as specified in Section 3.2.3.2. | |||
3.2.3.1. Encoding and Decoding Feedback in the AccECN Option Fields | 3.2.3.1. Encoding and Decoding Feedback in the AccECN Option Fields | |||
Whenever the Data Receiver includes any of the counter fields (ECEB, | Whenever the Data Receiver includes any of the counter fields (ECEB, | |||
EE0B, EE1B) in an AccECN Option, it MUST encode the 24 least | EE0B, EE1B) in an AccECN Option, it MUST encode the 24 least | |||
significant bits of the current value of the associated counter into | significant bits of the current value of the associated counter into | |||
the field (respectively r.ceb, r.e0b, r.e1b). | the field (respectively r.ceb, r.e0b, r.e1b). | |||
Whenever the Data Sender receives an ACK carrying an AccECN Option, | Whenever the Data Sender receives an ACK carrying an AccECN Option, | |||
it first checks whether the ACK has already been superseded by | it first checks whether the ACK has already been superseded by | |||
another ACK in which case it ignores the ECN feedback. If the ACK | another ACK in which case it ignores the ECN feedback. If the ACK | |||
has not been superseded, the Data Sender normally decodes the fields | has not been superseded, the Data Sender normally decodes the fields | |||
in the AccECN Option as follows. For each field, it takes the least | in the AccECN Option as follows. For each field, it takes the least | |||
significant 24 bits of its associated local counter (s.ceb, s.e0b or | significant 24 bits of its associated local counter (s.ceb, s.e0b, or | |||
s.e1b) and subtracts them from the counter in the associated field of | s.e1b) and subtracts them from the counter in the associated field of | |||
the incoming AccECN Option (respectively ECEB, EE0B, EE1B), to work | the incoming AccECN Option (respectively ECEB, EE0B, EE1B), to work | |||
out the minimum positive increment it could apply to s.ceb, s.e0b or | out the minimum positive increment it could apply to s.ceb, s.e0b, or | |||
s.e1b (assuming the field in the option only wrapped at most once). | s.e1b (assuming the field in the option only wrapped once at most). | |||
Appendix A.1 gives an example algorithm for the Data Receiver to | Appendix A.1 gives an example algorithm for the Data Receiver to | |||
encode its byte counters into an AccECN Option, and for the Data | encode its byte counters into an AccECN Option, and for the Data | |||
Sender to decode the AccECN Option fields into its byte counters. | Sender to decode the AccECN Option fields into its byte counters. | |||
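The encoding and decoding steps described above amount to modular arithmetic over 24 bits. The following Python sketch is illustrative only and is not the algorithm of Appendix A.1; the names FIELD_MODULUS, encode_field, and decode_field are assumptions made for this example. It relies on the stated assumption that the field has wrapped at most once since the local counter was last updated.

```python
FIELD_MODULUS = 1 << 24  # each AccECN Option field carries 24 bits


def encode_field(counter: int) -> int:
    """Data Receiver side: keep the 24 least significant bits of a byte counter."""
    return counter % FIELD_MODULUS


def decode_field(local_counter: int, field: int) -> int:
    """Data Sender side: advance the local counter by the smallest
    non-negative increment that makes its low 24 bits equal `field`."""
    delta = (field - local_counter % FIELD_MODULUS) % FIELD_MODULUS
    return local_counter + delta
```

For instance, if the sender's local counter is 0xFFFFF0 and the incoming field is 5, the sketch infers an increment of 21 bytes across the wrap rather than a decrease.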
Note that, as specified in Section 3.2, any data on the SYN (SYN=1, | Note that, as specified in Section 3.2, any data on the SYN (SYN=1, | |||
ACK=0) is not included in any of the byte counters held locally for | ACK=0) is not included in any of the byte counters held locally for | |||
each ECN marking nor in an AccECN Option on the wire. | each ECN marking nor in an AccECN Option on the wire. | |||
3.2.3.2. Path Traversal of the AccECN Option | 3.2.3.2. Path Traversal of the AccECN Option | |||
3.2.3.2.1. Testing the AccECN Option during the Handshake | ||||
3.2.3.2.1. Testing the AccECN Option During the Handshake | ||||
The TCP Client MUST NOT include an AccECN TCP Option on the SYN. If | The TCP Client MUST NOT include an AccECN TCP Option on the SYN. If | |||
there is somehow an AccECN Option on a SYN, it MUST be ignored when | there is somehow an AccECN Option on a SYN, it MUST be ignored when | |||
forwarded or received. | forwarded or received. | |||
A TCP Server that confirms its support for AccECN (in response to an | A TCP Server that confirms its support for AccECN (in response to an | |||
AccECN SYN from the Client as described in Section 3.1) SHOULD | AccECN SYN from the Client as described in Section 3.1) SHOULD | |||
include an AccECN TCP Option on the SYN/ACK. | include an AccECN TCP Option on the SYN/ACK. | |||
A TCP Client that has successfully negotiated AccECN SHOULD include | A TCP Client that has successfully negotiated AccECN SHOULD include | |||
an AccECN Option in the first ACK at the end of the three-way | an AccECN Option in the first ACK at the end of the three-way | |||
handshake. However, this first ACK is not delivered reliably, so the | handshake. However, this first ACK is not delivered reliably, so the | |||
TCP Client SHOULD also include an AccECN Option on the first data | TCP Client SHOULD also include an AccECN Option on the first data | |||
segment it sends (if it ever sends one). | segment it sends (if it ever sends one). | |||
A host MAY omit an AccECN Option in any of the above three cases due | A host MAY omit an AccECN Option in any of the above three cases | |||
to insufficient option space or if it has cached knowledge that the | because of insufficient option space or because it has cached | |||
packet would be likely to be blocked on the path to the other host if | knowledge that the packet would be likely to be blocked on the path | |||
it included an AccECN Option. | to the other host if it included an AccECN Option. | |||
3.2.3.2.2. Testing for Loss of Packets Carrying the AccECN Option | 3.2.3.2.2. Testing for Loss of Packets Carrying the AccECN Option | |||
If the TCP Server has not received an ACK to acknowledge its SYN/ACK | If the TCP Server has not received an ACK to acknowledge its SYN/ACK | |||
after the normal TCP timeout or it receives a second SYN with a | after the normal TCP timeout or if it receives a second SYN with a | |||
request for AccECN support, then either the SYN/ACK might just have | request for AccECN support, then either the SYN/ACK might just have | |||
been lost, e.g., due to congestion, or a middlebox might be blocking | been lost, e.g., due to congestion, or a middlebox might be blocking | |||
AccECN Options. To expedite connection setup in deployment scenarios | AccECN Options. To expedite connection setup in deployment scenarios | |||
where AccECN path traversal might be problematic, the TCP Server | where AccECN path traversal might be problematic, the TCP Server | |||
SHOULD retransmit the SYN/ACK, but with no AccECN Option. If this | SHOULD retransmit the SYN/ACK, but with no AccECN Option. If this | |||
retransmission times out, to expedite connection setup, the TCP | retransmission times out, to expedite connection setup, the TCP | |||
Server SHOULD retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0) and | Server SHOULD retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0) and | |||
no AccECN Option, but it remains in AccECN feedback mode (per | no AccECN Option, but it remains in AccECN feedback mode (per | |||
Section 3.1.5). | Section 3.1.5). | |||
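The recommended SYN/ACK fall-back sequence above can be summarized as a small decision function. This is a hedged Python sketch, not a definitive implementation: the function name synack_plan, its dictionary keys, and the choice to keep using the last fall-back form for any further retransmissions are assumptions of this example.

```python
def synack_plan(attempt: int) -> dict:
    """Describe the SYN/ACK to send on a given attempt.

    attempt 0 is the original SYN/ACK, attempt 1 the first
    retransmission, and so on (hypothetical numbering).
    """
    if attempt == 0:
        # Original SYN/ACK: SHOULD carry an AccECN Option.
        return {"accecn_option": True, "zero_ecn_flags": False}
    if attempt == 1:
        # First retransmission: same flags, but no AccECN Option.
        return {"accecn_option": False, "zero_ecn_flags": False}
    # Later retransmissions: (AE,CWR,ECE) = (0,0,0) and no AccECN Option.
    # The Server nonetheless remains in AccECN feedback mode.
    return {"accecn_option": False, "zero_ecn_flags": True}
```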
skipping to change at page 41, line 7 ¶ | skipping to change at line 1875 ¶ | |||
The above fall-back approach limits any interference by middleboxes | The above fall-back approach limits any interference by middleboxes | |||
that might drop packets with unknown options, even though it is more | that might drop packets with unknown options, even though it is more | |||
likely that SYN/ACK loss is due to congestion. The TCP Server MAY | likely that SYN/ACK loss is due to congestion. The TCP Server MAY | |||
try to send another packet with an AccECN Option at a later point | try to send another packet with an AccECN Option at a later point | |||
during the connection but it ought to monitor if that packet got lost | during the connection but it ought to monitor if that packet got lost | |||
as well, in which case it SHOULD disable the sending of AccECN | as well, in which case it SHOULD disable the sending of AccECN | |||
Options for this half-connection. | Options for this half-connection. | |||
Implementers MAY use other fall-back strategies if they are found to | Implementers MAY use other fall-back strategies if they are found to | |||
be more effective (e.g., retrying an AccECN Option for a second time | be more effective (e.g., retrying an AccECN Option for a second time | |||
before fall-back - most appropriate during high levels of | before fall-back -- most appropriate during high levels of | |||
congestion). However, other fall-back strategies will need to follow | congestion). However, other fall-back strategies will need to follow | |||
all the rules in Section 3.1.5, which concern behaviour when SYNs or | all the rules in Section 3.1.5, which concern behaviour when SYNs or | |||
SYN/ACKs negotiating different types of feedback have been sent | SYN/ACKs negotiating different types of feedback have been sent | |||
within the same connection. | within the same connection. | |||
Further, it might make sense to also remove any other new or | Further, it might make sense to also remove any other new or | |||
experimental fields or options on the SYN/ACK, although the required | experimental fields or options on the SYN/ACK, although the required | |||
behaviour will depend on the specification of the other option(s) and | behaviour will depend on the specification of the other option(s) and | |||
on any attempt to co-ordinate fall-back between different modules of | on any attempt to coordinate fall-back between different modules of | |||
the stack. | the stack. | |||
If the TCP Client detects that the first data segment it sent with an | If the TCP Client detects that the first data segment it sent with an | |||
AccECN Option was lost, in deployment scenarios where AccECN path | AccECN Option was lost, in deployment scenarios where AccECN path | |||
traversal might be problematic, it SHOULD fall back to no AccECN | traversal might be problematic, it SHOULD fall back to no AccECN | |||
Option on the retransmission. Again, implementers MAY use other | Option on the retransmission. Again, implementers MAY use other | |||
fall-back strategies such as attempting to retransmit a second | fall-back strategies such as attempting to retransmit a second | |||
segment with an AccECN Option before fall-back, and/or caching | segment with an AccECN Option before fall-back, and/or caching | |||
whether AccECN Options are blocked for subsequent connections. | whether AccECN Options are blocked for subsequent connections. | |||
[RFC9040] further discusses caching of TCP parameters and status | [RFC9040] further discusses caching of TCP parameters and status | |||
skipping to change at page 41, line 40 ¶ | skipping to change at line 1908 ¶ | |||
recognize, a host that is sending little or no data but mostly pure | recognize, a host that is sending little or no data but mostly pure | |||
ACKs will not inherently detect such losses. Such a host MAY detect | ACKs will not inherently detect such losses. Such a host MAY detect | |||
loss of ACKs carrying the AccECN Option by detecting whether the | loss of ACKs carrying the AccECN Option by detecting whether the | |||
acknowledged data always reappears as a retransmission. In such | acknowledged data always reappears as a retransmission. In such | |||
cases, the host SHOULD disable the sending of the AccECN Option for | cases, the host SHOULD disable the sending of the AccECN Option for | |||
this half-connection. | this half-connection. | |||
If a host falls back to not sending AccECN Options, it will continue | If a host falls back to not sending AccECN Options, it will continue | |||
to process any incoming AccECN Options as normal. | to process any incoming AccECN Options as normal. | |||
Either host MAY include AccECN Options in a subsequent segment or | Either host MAY include AccECN Options in one or more subsequent | |||
segments to retest whether AccECN Options can traverse the path. | segments to retest whether AccECN Options can traverse the path. | |||
Similarly, an AccECN endpoint MAY separately memorize which data | Similarly, an AccECN endpoint MAY separately memorize which data | |||
packets carried an AccECN Option and disable the sending of AccECN | packets carried an AccECN Option and disable the sending of AccECN | |||
Options if the loss probability of those packets is significantly | Options if the loss probability of those packets is significantly | |||
higher than that of all other data packets in the same connection. | higher than that of all other data packets in the same connection. | |||
3.2.3.2.3. Testing for Absence of the AccECN Option | 3.2.3.2.3. Testing for Absence of the AccECN Option | |||
If the TCP Client has successfully negotiated AccECN but does not | If the TCP Client has successfully negotiated AccECN but does not | |||
skipping to change at page 43, line 5 ¶ | skipping to change at line 1962 ¶ | |||
the initial value of the EE0B field or EE1B field in an AccECN Option | the initial value of the EE0B field or EE1B field in an AccECN Option | |||
(if one exists) ought to be non-zero. If AccECN has been negotiated: | (if one exists) ought to be non-zero. If AccECN has been negotiated: | |||
* the TCP Server MAY check that the initial value of the EE0B field | * the TCP Server MAY check that the initial value of the EE0B field | |||
or the EE1B field is non-zero in the first segment that | or the EE1B field is non-zero in the first segment that | |||
acknowledges sequence space that at least covers the ISN plus 1. | acknowledges sequence space that at least covers the ISN plus 1. | |||
If it runs a test and either initial value is zero, the Server | If it runs a test and either initial value is zero, the Server | |||
will switch into a mode that ignores AccECN Options for this half | will switch into a mode that ignores AccECN Options for this half | |||
connection. | connection. | |||
* the TCP Client MAY check the initial value of the EE0B field or | * the TCP Client MAY check that the initial value of the EE0B field | |||
the EE1B field is non-zero on the SYN/ACK. If it runs a test and | or the EE1B field is non-zero on the SYN/ACK. If it runs a test | |||
either initial value is zero, the Client will switch into a mode | and either initial value is zero, the Client will switch into a | |||
that ignores AccECN Options for this half connection. | mode that ignores AccECN Options for this half connection. | |||
While a host is in the mode that ignores AccECN Options it MUST adopt | While a host is in the mode that ignores AccECN Options, it MUST | |||
the conservative interpretation of the ACE field discussed in | adopt the conservative interpretation of the ACE field discussed in | |||
Section 3.2.2.5. | Section 3.2.2.5. | |||
Note that the Data Sender MUST NOT test whether the arriving byte | Note that the Data Sender MUST NOT test whether the arriving byte | |||
counters in an initial AccECN Option have been initialized to | counters in an initial AccECN Option have been initialized to | |||
specific valid values - the above checks solely test whether these | specific valid values -- the above checks solely test whether these | |||
fields have been incorrectly zeroed. This allows hosts to use | fields have been incorrectly zeroed. This allows hosts to use | |||
different initial values as an additional signalling channel in | different initial values as an additional signalling channel in the | |||
future. Also note that the initial value of either field might be | future. Also note that the initial value of either field might be | |||
greater than its expected initial value, because the counters might | greater than its expected initial value, because the counters might | |||
already have been incremented. Nonetheless, the initial values of | already have been incremented. Nonetheless, the initial values of | |||
the counters have been chosen so that they cannot wrap to zero on | the counters have been chosen so that they cannot wrap to zero on | |||
these initial segments. | these initial segments. | |||
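The optional zeroing test described above only asks whether a field that is present has been set to zero; it deliberately does not compare against any specific initial value. A minimal Python sketch follows, in which the function name accecn_option_looks_zeroed and the use of None for a field omitted from the option are assumptions of this example.

```python
from typing import Optional


def accecn_option_looks_zeroed(ee0b: Optional[int], ee1b: Optional[int]) -> bool:
    """Return True if any EE0B/EE1B field present in the initial
    AccECN Option is zero, which suggests a middlebox zeroed it.

    None means the field was not present in the option.
    """
    present = [value for value in (ee0b, ee1b) if value is not None]
    # Only test for zero; specific valid initial values MUST NOT be tested.
    return any(value == 0 for value in present)
```

A host that sees True from such a check would switch into the mode that ignores AccECN Options for this half connection and rely on the conservative interpretation of the ACE field.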
3.2.3.2.5. Consistency between AccECN Feedback Fields | 3.2.3.2.5. Consistency Between AccECN Feedback Fields | |||
When AccECN Options are available they ought to provide more | When AccECN Options are available, they ought to provide more | |||
unambiguous feedback. However, they supplement but do not replace | unambiguous feedback. However, they supplement but do not replace | |||
the ACE field. An endpoint using AccECN feedback MUST always | the ACE field. An endpoint using AccECN feedback MUST always | |||
reconcile the information provided in the ACE field with that in any | reconcile the information provided in the ACE field with that in any | |||
AccECN Option, so that the state of the ACE-related packet counter | AccECN Option, so that the state of the ACE-related packet counter | |||
can be relied on if future feedback does not carry an AccECN Option. | can be relied on if future feedback does not carry an AccECN Option. | |||
If an AccECN Option is present, the s.cep counter might increase more | If an AccECN Option is present, the s.cep counter might increase more | |||
than expected from the increase of the s.ceb counter (e.g., due to a | than expected from the increase of the s.ceb counter (e.g., due to a | |||
CE-marked control packet). The sender's response to such a situation | CE-marked control packet). The sender's response to such a situation | |||
is out of scope, and needs to be dealt with in a specification that | is out of scope, and needs to be dealt with in a specification that | |||
skipping to change at page 44, line 8 ¶ | skipping to change at line 2012 ¶ | |||
the s.cep has not (and by testing ACK coverage it is certain how much | the s.cep has not (and by testing ACK coverage it is certain how much | |||
the ACE field has wrapped), and if there is no explanation other than | the ACE field has wrapped), and if there is no explanation other than | |||
an invalid protocol transition due to some form of feedback mangling, | an invalid protocol transition due to some form of feedback mangling, | |||
the Data Sender MUST disable sending ECN-capable packets for the | the Data Sender MUST disable sending ECN-capable packets for the | |||
remainder of the half-connection by setting the IP-ECN field in all | remainder of the half-connection by setting the IP-ECN field in all | |||
subsequent packets to Not-ECT. | subsequent packets to Not-ECT. | |||
3.2.3.3. Usage of the AccECN TCP Option | 3.2.3.3. Usage of the AccECN TCP Option | |||
If a Data Receiver in AccECN mode intends to use AccECN TCP Options | If a Data Receiver in AccECN mode intends to use AccECN TCP Options | |||
to provide feedback, the rules below determine when it includes an | to provide feedback, the rules below determine when to include an | |||
AccECN TCP Option, and which fields to include, given other options | AccECN TCP Option, and which fields to include, given other options | |||
might be competing for limited option space: | might be competing for limited option space: | |||
Importance of Congestion Control: AccECN is for congestion control, | Importance of Congestion Control: AccECN is for congestion control, | |||
which implementations SHOULD generally prioritize over other TCP | which implementations SHOULD generally prioritize over other TCP | |||
options when there is insufficient space for all the options in | options when there is insufficient space for all the options in | |||
use. | use. | |||
If SACK has been negotiated [RFC2018], and the smallest | If SACK has been negotiated [RFC2018], and the smallest | |||
recommended AccECN Option would leave insufficient space for two | recommended AccECN Option would leave insufficient space for two | |||
skipping to change at page 44, line 38 ¶ | skipping to change at line 2042 ¶ | |||
A scheduled ACK means an ACK that the Data Receiver would send by | A scheduled ACK means an ACK that the Data Receiver would send by | |||
its regular delayed ACK rules. Recall that Section 1.3 defines an | its regular delayed ACK rules. Recall that Section 1.3 defines an | |||
'ACK' as either with data payload or without. But the above rule | 'ACK' as either with data payload or without. But the above rule | |||
is worded so that, in the common case when most of the data is | is worded so that, in the common case when most of the data is | |||
from a Server to a Client, the Server only includes an AccECN TCP | from a Server to a Client, the Server only includes an AccECN TCP | |||
Option while it is acknowledging data from the Client. | Option while it is acknowledging data from the Client. | |||
When available TCP option space is limited on particular packets, the | When available TCP option space is limited on particular packets, the | |||
recommended scheme will need to include compromises. To guide the | recommended scheme will need to include compromises. To guide the | |||
implementer the rules below are ranked in order of importance, but | implementer, the rules below are ranked in order of importance, but | |||
the final decision has to be implementation-dependent, because | the final decision has to be implementation-dependent, because | |||
tradeoffs will alter as new TCP options are defined and new use-cases | tradeoffs will alter as new TCP options are defined and new use-cases | |||
arise. | arise. | |||
Necessary Option Length: When TCP option space is limited, an AccECN | Necessary Option Length: When TCP option space is limited, an AccECN | |||
TCP option MAY be truncated to omit one or two fields from the end | TCP option MAY be truncated to omit one or two fields from the end | |||
of the option, as indicated by the permitted variants listed in | of the option, as indicated by the permitted variants listed in | |||
Table 5, provided that the counter(s) that have changed since the | Table 5, provided that the counter(s) that have changed since the | |||
previous AccECN TCP option are not omitted. | previous AccECN TCP option are not omitted. | |||
skipping to change at page 45, line 51 ¶ | skipping to change at line 2104 ¶ | |||
available for payload data with counter field(s) that have never | available for payload data with counter field(s) that have never | |||
changed. | changed. | |||
As an example of the recommended scheme, if ECT(0) is the only | As an example of the recommended scheme, if ECT(0) is the only | |||
codepoint that has ever arrived in the IP-ECN field, the Data | codepoint that has ever arrived in the IP-ECN field, the Data | |||
Receiver will feed back an AccECN0 TCP Option with only the EE0B | Receiver will feed back an AccECN0 TCP Option with only the EE0B | |||
field on every packet that acknowledges new data. However, as soon | field on every packet that acknowledges new data. However, as soon | |||
as even one CE-marked packet arrives, on every packet that | as even one CE-marked packet arrives, on every packet that | |||
acknowledges new data it will start to include an option with two | acknowledges new data it will start to include an option with two | |||
fields, EE0B and ECEB. As a second example, if the first packet to | fields, EE0B and ECEB. As a second example, if the first packet to | |||
arrive happens to be CE-marked, the Data Receiver will have to | arrive happens to be CE marked, the Data Receiver will have to | |||
arbitrarily choose whether to precede the ECEB field with an EE0B | arbitrarily choose whether to precede the ECEB field with an EE0B | |||
field or an EE1B field. If it chooses, say, EE0B but it turns out | field or an EE1B field. If it chooses, say, EE0B but it turns out | |||
never to receive ECT(0), it can start sending EE1B and ECEB instead - | never to receive ECT(0), it can start sending EE1B and ECEB instead | |||
it does not have to include the EE0B field if the r.e0b counter has | -- it does not have to include the EE0B field if the r.e0b counter | |||
never changed during the connection. | has never changed during the connection. | |||
With the recommended scheme, if the data sending direction switches | With the recommended scheme, if the data sending direction switches | |||
during a connection, there can be cases where the AccECN TCP Option | during a connection, there can be cases where the AccECN TCP Option | |||
that is meant to feed back the counter values at the end of a volley | that is meant to feed back the counter values at the end of a volley | |||
in one direction never reaches the other peer, due to packet loss. | in one direction never reaches the other peer due to packet loss. | |||
ACE feedback ought to be sufficient to fill this gap, given accurate | ACE feedback ought to be sufficient to fill this gap, given accurate | |||
feedback becomes moot after data transmission has paused. | feedback becomes moot after data transmission has paused. | |||
Appendix A.3 gives an example algorithm to estimate the number of | Appendix A.3 gives an example algorithm to estimate the number of | |||
marked bytes from the ACE field alone, if AccECN Options are not | marked bytes from the ACE field alone, if AccECN Options are not | |||
available. | available. | |||
If a host has determined that segments with AccECN Options always | If a host has determined that segments with AccECN Options always | |||
seem to be discarded somewhere along the path, it is no longer | seem to be discarded somewhere along the path, it is no longer | |||
obliged to follow any of the rules in this section. | obliged to follow any of the rules in this section. | |||
3.3. AccECN Compliance Requirements for TCP Proxies, Offload Engines | 3.3. AccECN Compliance Requirements for TCP Proxies, Offload Engines, | |||
and other Middleboxes | and Other Middleboxes | |||
Given AccECN alters the TCP protocol on the wire, this section | Given AccECN alters the TCP protocol on the wire, this section | |||
specifies new requirements on certain networking equipment that | specifies new requirements on certain networking equipment that | |||
forwards TCP and inspects TCP header information. | forwards TCP and inspects TCP header information. | |||
3.3.1. Requirements for TCP Proxies | 3.3.1. Requirements for TCP Proxies | |||
A large class of middleboxes split TCP connections. Such a middlebox | A large class of middleboxes split TCP connections. Such a middlebox | |||
would be compliant with the AccECN protocol if the TCP implementation | would be compliant with the AccECN protocol if the TCP implementation | |||
on each side complied with the present AccECN specification and each | on each side complied with the present AccECN specification and each | |||
side negotiated AccECN independently of the other side. | side negotiated AccECN independently of the other side. | |||
3.3.2. Requirements for Transparent Middleboxes and TCP Normalizers | 3.3.2. Requirements for Transparent Middleboxes and TCP Normalizers | |||
Another large class of middleboxes intervenes to some degree at the | Another large class of middleboxes intervenes to some degree at the | |||
transport layer, but attempts to be transparent (invisible) to the | transport layer, but attempts to be transparent (invisible) to the | |||
end-to-end connection. A subset of this class of middleboxes | end-to-end connection. A subset of this class of middleboxes | |||
attempts to `normalize' the TCP wire protocol by checking that all | attempts to 'normalize' the TCP wire protocol by checking that all | |||
values in header fields comply with a rather narrow interpretation of | values in header fields comply with a rather narrow interpretation of | |||
the TCP specifications that is also not always up to date. | the TCP specifications that is not always up to date. | |||
A middlebox that is not normalizing the TCP protocol and does not | A middlebox that is not normalizing the TCP protocol and does not | |||
itself act as a back-to-back pair of TCP endpoints (i.e., a middlebox | itself act as a back-to-back pair of TCP endpoints (i.e., a middlebox | |||
that intends to be transparent or invisible at the transport layer) | that intends to be transparent or invisible at the transport layer) | |||
ought to forward AccECN TCP Options unaltered, whether or not the | ought to forward AccECN TCP Options unaltered, whether or not the | |||
length value matches one of those specified in Section 3.2.3, and | length value matches one of those specified in Section 3.2.3, and | |||
whether or not the initial values of the byte-counter fields match | whether or not the initial values of the byte-counter fields match | |||
those in Section 3.2.1. This is because blocking apparently invalid | those in Section 3.2.1. This is because blocking apparently invalid | |||
values prevents the standardized set of values being extended in | values prevents the standardized set of values from being extended in | |||
future (such outdated normalizers would block updated hosts from | the future (such outdated normalizers would block updated hosts from | |||
using the extended AccECN standard). | using the extended AccECN standard). | |||
A TCP normalizer is likely to block or alter an AccECN TCP Option if | A TCP normalizer is likely to block or alter an AccECN TCP Option if | |||
the length value or the initial values of its byte-counter fields do | the length value or the initial values of its byte-counter fields do | |||
not match one of those specified in Section 3.2.3 or Section 3.2.1. | not match one of those specified in Sections 3.2.3 or 3.2.1. | |||
However, to comply with the present AccECN specification, a middlebox | However, to comply with the present AccECN specification, a middlebox | |||
MUST NOT change the ACE field; or those fields of an AccECN Option | MUST NOT change the ACE field; or those fields of an AccECN Option | |||
that are currently specified in Section 3.2.3; or any AccECN field | that are currently specified in Section 3.2.3; or any AccECN field | |||
covered by integrity protection (e.g., [RFC5925]). | covered by integrity protection (e.g., [RFC5925]). | |||
3.3.3. Requirements for TCP ACK Filtering | 3.3.3. Requirements for TCP ACK Filtering | |||
Section 5.2.1 of BCP 69 [RFC3449] gives best current practice on | Section 5.2.1 of [RFC3449] gives best current practice on filtering | |||
filtering (aka. thinning or coalescing) of pure TCP ACKs. It advises | (aka thinning or coalescing) of pure TCP ACKs. It advises that | |||
that filtering ACKs carrying ECN feedback ought to preserve the | filtering ACKs carrying ECN feedback ought to preserve the correct | |||
correct operation of ECN feedback. As the present specification | operation of ECN feedback. As the present specification updates the | |||
updates the operation of ECN feedback, this section discusses how an | operation of ECN feedback, this section discusses how an ACK filter | |||
ACK filter might preserve correct operation of AccECN feedback as | might preserve correct operation of AccECN feedback as well. | |||
well. | ||||
The problem divides into two parts: determining if an ACK is part of | The problem divides into two parts: determining if an ACK is part of | |||
a connection that is using AccECN and then preserving the correct | a connection that is using AccECN and then preserving the correct | |||
operation of AccECN feedback: | operation of AccECN feedback: | |||
* To determine whether a pure TCP ACK is part of an AccECN | * To determine whether a pure TCP ACK is part of an AccECN | |||
connection without resorting to connection tracking and per-flow | connection without resorting to connection tracking and per-flow | |||
state, a useful heuristic would be to check for a non-zero ECN | state, a useful heuristic would be to check for a non-zero ECN | |||
field at the IP layer (because the ECN++ experiment only allows | field at the IP layer (because the ECN++ experiment only allows | |||
TCP pure ACKs to be ECN-capable if AccECN has been negotiated | TCP pure ACKs to be ECN-capable if AccECN has been negotiated | |||
[I-D.ietf-tcpm-generalized-ecn]). This heuristic is simple and | [ECN++]). This heuristic is simple and stateless. However, it | |||
stateless. However, it might omit some AccECN ACKs, because | might omit some AccECN ACKs, because AccECN can be used without | |||
AccECN can be used without ECN++ and even if it is, ECN++ does not | ECN++ and even if it is, ECN++ does not have to make pure ACKs | |||
have to make pure ACKs ECN-capable - only deployment experience | ECN-capable -- only deployment experience will tell. Also, TCP | |||
will tell. Also, TCP ACKs might be ECN-capable owing to some | ACKs might be ECN-capable owing to some scheme other than AccECN, | |||
scheme other than AccECN, e.g., [RFC5690] or some future standards | e.g., [RFC5690] or some future standards action. Again, only | |||
action. Again, only deployment experience will tell. | deployment experience will tell. | |||
* The main concern with preserving correct AccECN operation involves | * The main concern with preserving correct AccECN operation involves | |||
leaving enough ACKs for the Data Sender to work out whether the | leaving enough ACKs for the Data Sender to work out whether the | |||
3-bit ACE field has wrapped. In the worst case, in feedback about | 3-bit ACE field has wrapped. In the worst case, in feedback about | |||
a run of received packets that were all ECN-marked, the ACE field | a run of received packets that were all ECN-marked, the ACE field | |||
will wrap every 8 acknowledged packets. ACE field wrap might be | will wrap every 8 acknowledged packets. ACE field wrap might be | |||
of less concern if packets also carry AccECN TCP Options. | of less concern if packets also carry AccECN TCP Options. | |||
However, note that logic to read an AccECN TCP Option is optional | However, note that logic to read an AccECN TCP Option is optional | |||
to implement (albeit recommended — see Section 3.2.3). So one end | to implement (albeit recommended -- see Section 3.2.3). So one | |||
writing an AccECN TCP Option into a packet does not necessarily | end writing an AccECN TCP Option into a packet does not | |||
imply that the other end will read it. | necessarily imply that the other end will read it. | |||
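To illustrate why an ACK filter needs to leave enough ACKs, the following Python sketch lists the numbers of CE-marked packets that are consistent with a change in the 3-bit ACE field. It is a simplification under stated assumptions: the name possible_ce_counts is hypothetical, and the sketch assumes every CE mark counted was on one of the newly acknowledged packets (it ignores CE-marked control packets).

```python
ACE_MODULUS = 8  # the ACE field is 3 bits wide, so it wraps modulo 8


def possible_ce_counts(prev_ace: int, new_ace: int, newly_acked_pkts: int) -> list:
    """List every CE-marked packet count consistent with the ACE change,
    given how many packets the ACK newly acknowledges."""
    smallest = (new_ace - prev_ace) % ACE_MODULUS
    return list(range(smallest, newly_acked_pkts + 1, ACE_MODULUS))
```

With at most 7 newly acknowledged packets per surviving ACK the result always has a single entry; once filtering lets one ACK cover 8 or more packets, two or more counts can become possible and the Data Sender cannot tell from the ACE field alone whether it wrapped.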
Note that the present specification of AccECN in TCP does not presume | Note that the present specification of AccECN in TCP does not presume | |||
to rely on any of the above ACK filtering behaviour in the network, | to rely on any of the above ACK filtering behaviour in the network, | |||
because it has to be robust against pre-existing network nodes that | because it has to be robust against pre-existing network nodes that | |||
do not distinguish AccECN ACKs, and robust against ACK loss during | do not distinguish AccECN ACKs, and robust against ACK loss during | |||
overload more generally. | overload more generally. | |||
3.3.4. Requirements for TCP Segmentation Offload and Large Receive | 3.3.4. Requirements for TCP Segmentation Offload and Large Receive | |||
Offload | Offload | |||
skipping to change at page 48, line 30 ¶ | skipping to change at line 2227 ¶ | |||
Offloading can happen in the transmit path, usually referred to as | Offloading can happen in the transmit path, usually referred to as | |||
TCP Segmentation Offload (TSO), and the receive path where it is | TCP Segmentation Offload (TSO), and the receive path where it is | |||
called Large Receive Offload (LRO). | called Large Receive Offload (LRO). | |||
In the transmit direction, with AccECN, all segments created from the | In the transmit direction, with AccECN, all segments created from the | |||
same super-segment should retain the same ACE field, which should | same super-segment should retain the same ACE field, which should | |||
make TSO straightforward. | make TSO straightforward. | |||
However, with TSO hardware that supports [RFC3168], the CWR bit is | However, with TSO hardware that supports [RFC3168], the CWR bit is | |||
usually masked out on the middle and last segment. If applied to an | usually masked out on the middle and last segments. If applied to an | |||
AccECN segment, this would change the ACE field, and would be | AccECN segment, this would change the ACE field, and would be | |||
interpreted as having received numerous CE marks in the receive | interpreted as having received numerous CE marks in the receive | |||
direction. Therefore, currently available TSO hardware with | direction. Therefore, currently available TSO hardware with | |||
[RFC3168] support may need some minor driver changes, to adjust the | [RFC3168] support may need some minor driver changes, to adjust the | |||
bitmask for the first, middle and last segment processed with TSO. | bitmask for the first, middle, and last segments processed with TSO. | |||
Initially, when Classic ECN [RFC3168] and Accurate ECN flows coexist | Initially, when Classic ECN [RFC3168] and Accurate ECN flows coexist | |||
on the same offloading engine, the host software may need to work | on the same offloading engine, the host software may need to work | |||
around incompatibilities (e.g., when only globally configurable TSO TCP | around incompatibilities (e.g., when only globally configurable TSO TCP | |||
Flag bitmasks are available); otherwise, this would cause some issues. | Flag bitmasks are available); otherwise, this would cause some issues. | |||
One way around this could be to only negotiate for Accurate ECN, but | One way around this could be to only negotiate for Accurate ECN, but | |||
not offer a fall back to [RFC3168] ECN. Another way could be to | not offer a fall back to [RFC3168] ECN. Another way could be to | |||
allow TSO only as long as the CWR flag in the TCP header is not set - | allow TSO only as long as the CWR flag in the TCP header is not set | |||
at the cost of more processing overhead while the ACE field has this | -- at the cost of more processing overhead while the ACE field has | |||
bit set. | this bit set. | |||
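As a rough illustration of why masking CWR corrupts AccECN feedback, the sketch below assumes the ACE field is read from the AE, CWR and ECE flags of the TCP header, with AE as the most significant bit. The flag constants follow the usual TCP header bit layout. The helper names and the sample counter value are hypothetical and do not describe any particular TSO engine.

```python
TCP_FLAG_ECE = 0x040
TCP_FLAG_CWR = 0x080
TCP_FLAG_AE = 0x100


def ace_from_flags(flags: int) -> int:
    """Read the 3-bit ACE value from the AE, CWR and ECE flag bits."""
    ae = 1 if flags & TCP_FLAG_AE else 0
    cwr = 1 if flags & TCP_FLAG_CWR else 0
    ece = 1 if flags & TCP_FLAG_ECE else 0
    return (ae << 2) | (cwr << 1) | ece


def flags_with_ace(ace: int) -> int:
    """Encode a 3-bit ACE value into the AE, CWR and ECE flag bits."""
    flags = 0
    if ace & 0b100:
        flags |= TCP_FLAG_AE
    if ace & 0b010:
        flags |= TCP_FLAG_CWR
    if ace & 0b001:
        flags |= TCP_FLAG_ECE
    return flags


def mask_cwr(flags: int) -> int:
    """Hypothetical RFC 3168-style TSO behaviour: clear CWR on later segments."""
    return flags & ~TCP_FLAG_CWR


super_segment = flags_with_ace(0b111)
first = ace_from_flags(super_segment)             # first segment keeps ACE = 7
later = ace_from_flags(mask_cwr(super_segment))   # later segments carry ACE = 5
print(first, later, (later - first) % 8)          # 7 5 6
```

In this example the peer would read the change from 7 to 5 as six further CE marks, modulo 8, although the sender's counter never moved.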
For LRO in the receive direction, a different issue may get exposed | For LRO in the receive direction, a different issue may get exposed | |||
with [RFC3168] ECN supporting hardware. | with [RFC3168] ECN supporting hardware. | |||
The ACE field changes with every received CE marking, so today's | The ACE field changes with every received CE marking, so today's | |||
receive offloading could lead to many interrupts in high congestion | receive offloading could lead to many interrupts in high congestion | |||
situations. Although that would be useful (because congestion | situations. Although that would be useful (because congestion | |||
information is received sooner), it could also significantly increase | information is received sooner), it could also significantly increase | |||
processor load, particularly in scenarios such as DCTCP or L4S where | processor load, particularly in scenarios such as DCTCP or L4S where | |||
the marking rate is generally higher. | the marking rate is generally higher. | |||
Current offload hardware ejects a segment from the coalescing process | Current offload hardware ejects a segment from the coalescing process | |||
whenever the TCP ECN flags change. In data centres it has been | whenever the TCP ECN flags change. In data centres, it has been | |||
fortunate for this offload hardware that DCTCP-style feedback changes | fortunate for this offload hardware that DCTCP-style feedback changes | |||
less often when there are long sequences of CE marks, which is more | less often when there are long sequences of CE marks, which is more | |||
common with a step marking threshold (but less likely when more short | common with a step marking threshold (but less likely when more short | |||
flows are in the mix). The ACE counter approach has been designed so | flows are in the mix). The ACE counter approach has been designed so | |||
that coalescing can continue over arbitrary patterns of marking and | that coalescing can continue over arbitrary patterns of marking and | |||
only needs to stop when the counter wraps. Nonetheless, until the | only needs to stop when the counter wraps. Nonetheless, until the | |||
particular offload hardware in use implements this more efficient | particular offload hardware in use implements this more efficient | |||
approach, it is likely to be more efficient for AccECN connections to | approach, it is likely to be more efficient for AccECN connections to | |||
implement this counter-style logic using software segmentation | implement this counter-style logic using software segmentation | |||
offload. | offload. | |||
skipping to change at page 49, line 35 ¶ | skipping to change at line 2278 ¶ | |||
ECN encodes a varying signal in the ACK stream, so it is inevitable | ECN encodes a varying signal in the ACK stream, so it is inevitable | |||
that offload hardware will ultimately need to handle any form of ECN | that offload hardware will ultimately need to handle any form of ECN | |||
feedback exceptionally. The ACE field has been designed as a counter | feedback exceptionally. The ACE field has been designed as a counter | |||
so that it is straightforward for offload hardware to pass on the | so that it is straightforward for offload hardware to pass on the | |||
highest counter, and to push a segment from its cache before the | highest counter, and to push a segment from its cache before the | |||
counter wraps. The purpose of working towards standardized TCP ECN | counter wraps. The purpose of working towards standardized TCP ECN | |||
feedback is to reduce the risk for hardware developers, who would | feedback is to reduce the risk for hardware developers, who would | |||
otherwise have to guess which scheme is likely to become dominant. | otherwise have to guess which scheme is likely to become dominant. | |||
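The counter-style coalescing described above can be sketched as follows, under the assumption that the offload logic sees only the ACE value of each arriving segment and that fewer than 8 new CE marks separate consecutive segments. The function coalesce_by_ace, its last_delivered parameter and its flush policy are hypothetical. They illustrate the idea of passing on the latest counter and pushing a batch before the 3-bit counter can wrap, not the behaviour of any real offload hardware.

```python
from typing import List

ACE_MOD = 8  # the 3-bit ACE counter wraps after 8 increments


def coalesce_by_ace(ace_values: List[int], last_delivered: int = 0) -> List[List[int]]:
    """Group per-segment ACE values into batches whose total advance stays below 8.

    last_delivered is the ACE value most recently passed up the stack.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    advance = 0            # CE marks accumulated since the last delivered value
    prev = last_delivered
    for ace in ace_values:
        step = (ace - prev) % ACE_MOD
        if current and advance + step >= ACE_MOD:
            batches.append(current)      # push before the counter would wrap
            current, advance = [], 0
        current.append(ace)
        advance += step
        prev = ace
    if current:
        batches.append(current)
    return batches


# Nine segments with an irregular marking pattern collapse into two batches.
print(coalesce_by_ace([1, 2, 3, 3, 5, 7, 0, 2, 4]))
# [[1, 2, 3, 3, 5, 7], [0, 2, 4]]
```

The last ACE value of each batch is the one passed up the stack. Coalescing that stops on every flag change would instead split this input at almost every segment.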
The above process has been designed to enable a continuing | The above process has been designed to enable a continuing | |||
incremental deployment path - to more highly dynamic congestion | incremental deployment path -- to more highly dynamic congestion | |||
control. Once offload hardware supports AccECN, it will be able to | control. Once offload hardware supports AccECN, it will be able to | |||
coalesce efficiently for any sequence of marks, instead of relying | coalesce efficiently for any sequence of marks, instead of relying on | |||
for efficiency on the long marking sequences from step marking. In | the long marking sequences from step marking for efficiency. In the | |||
the next stage, marking can evolve from a step to a ramp function. | next stage, marking can evolve from a step to a ramp function. That | |||
That in turn will allow host congestion control algorithms to respond | in turn will allow host congestion control algorithms to respond | |||
faster to dynamics, while being backwards compatible with existing | faster to dynamics, while being backwards compatible with existing | |||
host algorithms. | host algorithms. | |||
4. Updates to RFC 3168 | 4. Updates to RFC 3168 | |||
This section clarifies which parts of RFC3168 are updated and maps | This section clarifies which parts of RFC 3168 are updated and maps | |||
them to the sections of the present AccECN specification that update | them to the relevant updated sections of the present AccECN | |||
them: | specification. | |||
* The whole of "6.1.1 TCP Initialization" of [RFC3168] is updated by | * The whole of Section 6.1.1 of [RFC3168] is updated by Section 3.1 | |||
Section 3.1 of the present specification. | of the present specification. | |||
* In "6.1.2. The TCP Sender" of [RFC3168], all mentions of a | * In Section 6.1.2 of [RFC3168], all mentions of a congestion | |||
congestion response to an ECN-Echo (ECE) ACK packet are updated by | response to an ECN-Echo (ECE) ACK packet are updated by | |||
Section 3.2 of the present specification to mean an increment to | Section 3.2 of the present specification to mean an increment to | |||
the sender's count of CE-marked packets, s.cep. And the | the sender's count of CE-marked packets, s.cep. And the | |||
requirements to set the CWR flag no longer apply, as specified in | requirements to set the CWR flag no longer apply, as specified in | |||
Section 3.1.5 of the present specification. Otherwise, the | Section 3.1.5 of the present specification. Otherwise, the | |||
remaining requirements in "6.1.2. The TCP Sender" still stand. | remaining requirements in Section 6.1.2 of [RFC3168] still stand. | |||
It will be noted that RFC 8311 already updates, or potentially | It will be noted that [RFC8311] already updates, or potentially | |||
updates, a number of the requirements in "6.1.2. The TCP Sender". | updates, a number of the requirements in Section 6.1.2 of | |||
Section 6.1.2 of RFC 3168 extended standard TCP congestion control | [RFC3168]. Section 6.1.2 of RFC 3168 extended standard TCP | |||
[RFC5681] to cover ECN marking as well as packet drop. In contrast, | congestion control [RFC5681] to cover ECN marking as well as | |||
RFC 8311 enables experimentation with alternative responses to ECN | packet drop. In contrast, [RFC8311] enables experimentation with | |||
marking, if specified for instance by an experimental RFC on the | alternative responses to ECN marking, if specified for instance by | |||
IETF document stream. RFC 8311 also strengthened the statement | an Experimental RFC produced by the IETF Stream. [RFC8311] also | |||
that "ECT(0) SHOULD be used" to a "MUST" (see [RFC8311] for the | strengthened the statement that "ECT(0) SHOULD be used" to a | |||
details). | "MUST" (see [RFC8311] for the details). | |||
* The whole of "6.1.3. The TCP Receiver" of [RFC3168] is updated by | * The whole of Section 6.1.3 of [RFC3168] is updated by Section 3.2 | |||
Section 3.2 of the present specification, with the exception of | of the present specification, with the exception of the last | |||
the last paragraph (about congestion response to drop and ECN in | paragraph (about congestion response to drop and ECN in the same | |||
the same round trip), which still stands. Incidentally, this last | round trip), which still stands. Incidentally, this last | |||
paragraph is in the wrong section, because it relates to "TCP | paragraph is in the wrong section, because it relates to "TCP | |||
Sender" behaviour. | Sender" behaviour. | |||
* The following text within "6.1.5. Retransmitted TCP packets": | * The following text within Section 6.1.5 of [RFC3168]: | |||
"the TCP data receiver SHOULD ignore the ECN field on arriving | | the TCP data receiver SHOULD ignore the ECN field on arriving | |||
data packets that are outside of the receiver's current | | data packets that are outside of the receiver's current window. | |||
window." | ||||
is updated by more stringent acceptability tests for any packet | is updated by more stringent acceptability tests for any packet | |||
(not just data packets) in the present specification. | (not just data packets) in the present specification. | |||
Specifically, in the normative specification of AccECN (Section 3) | Specifically, in the normative specification of AccECN | |||
only 'Acceptable' packets contribute to the ECN counters at the | (Section 3), only 'Acceptable' packets contribute to the ECN | |||
AccECN receiver and Section 1.3 defines an Acceptable packet as | counters at the AccECN receiver and Section 1.3 defines an | |||
one that passes acceptability tests equivalent in strength to | Acceptable packet as one that passes acceptability tests | |||
those in both [RFC9293] and [RFC5961]. | equivalent in strength to those in both [RFC9293] and [RFC5961]. | |||
* Sections 5.2, 6.1.1, 6.1.4, 6.1.5 and 6.1.6 of [RFC3168] prohibit | * Sections 5.2, 6.1.1, 6.1.4, 6.1.5, and 6.1.6 of [RFC3168] prohibit | |||
use of ECN on TCP control packets and retransmissions. The | use of ECN on TCP control packets and retransmissions. The | |||
present specification does not update that aspect of RFC 3168, but | present specification does not update that aspect of [RFC3168], | |||
it does say what feedback an AccECN Data Receiver ought to provide | but it does say what feedback an AccECN Data Receiver ought to | |||
if it receives an ECN-capable control packet or retransmission. | provide if it receives an ECN-capable control packet or | |||
This ensures AccECN is forward compatible with any future scheme | retransmission. This ensures AccECN is forward compatible with | |||
that allows ECN on these packets, as provided for in section 4.3 | any future scheme that allows ECN on these packets, as provided | |||
of [RFC8311] and as proposed in [I-D.ietf-tcpm-generalized-ecn]. | for in Section 4.3 of [RFC8311] and as proposed in [ECN++]. | |||
5. Interaction with TCP Variants | 5. Interaction with TCP Variants | |||
This section is informative, not normative. | This section is informative, not normative. | |||
5.1. Compatibility with SYN Cookies | 5.1. Compatibility with SYN Cookies | |||
A TCP Server can use SYN Cookies (see Appendix A of [RFC4987]) to | A TCP Server can use SYN Cookies (see Appendix A of [RFC4987]) to | |||
protect itself from SYN flooding attacks. It places minimal commonly | protect itself from SYN flooding attacks. It places minimal commonly | |||
used connection state in the SYN/ACK, and deliberately does not hold | used connection state in the SYN/ACK, and deliberately does not hold | |||
any state while waiting for the subsequent ACK (e.g., it closes the | any state while waiting for the subsequent ACK (e.g., it closes the | |||
thread). Therefore it cannot record the fact that it entered AccECN | thread). Therefore, it cannot record the fact that it entered AccECN | |||
mode for both half-connections. Indeed, it cannot even remember | mode for both half-connections. Indeed, it cannot even remember | |||
whether it negotiated the use of Classic ECN [RFC3168]. | whether it negotiated the use of Classic ECN [RFC3168]. | |||
Nonetheless, such a Server can determine that it negotiated AccECN as | Nonetheless, such a Server can determine that it negotiated AccECN as | |||
follows. If a TCP Server using SYN Cookies supports AccECN and if it | follows. If a TCP Server using SYN Cookies supports AccECN and if it | |||
receives a pure ACK that acknowledges an ISN that is a valid SYN | receives a pure ACK that acknowledges an ISN that is a valid SYN | |||
cookie, and if the ACK contains an ACE field with the value 0b010 to | cookie, and if the ACK contains an ACE field with the value 0b010 to | |||
0b111 (decimal 2 to 7), the Server can infer the first two stages of | 0b111 (decimal 2 to 7), the Server can infer the first two stages of | |||
the handshake: | the handshake: | |||
* the TCP Client has to have requested AccECN support on the SYN; | * the TCP Client has to have requested AccECN support on the SYN; | |||
* then, even though the Server kept no state, it has to have | * then, even though the Server kept no state, it has to have | |||
confirmed that it supported AccECN. | confirmed that it supported AccECN. | |||
Therefore the Server can switch itself into AccECN mode, and continue | Therefore, the Server can switch itself into AccECN mode, and | |||
as if it had never forgotten that it switched itself into AccECN mode | continue as if it had never forgotten that it switched itself into | |||
earlier. | AccECN mode earlier. | |||
If the pure ACK that acknowledges a SYN cookie contains an ACE field | If the pure ACK that acknowledges a SYN cookie contains an ACE field | |||
with the value 0b000 or 0b001, these values indicate that the TCP | with the value 0b000 or 0b001, these values indicate that the TCP | |||
Client did not request support for AccECN and therefore the Server | Client did not request support for AccECN; therefore, the Server does | |||
does not enter AccECN mode for this connection. Further, 0b001 on | not enter AccECN mode for this connection. Further, 0b001 on the ACK | |||
the ACK implies that the Server sent an ECN-capable SYN/ACK, which | implies that the Server sent an ECN-capable SYN/ACK, which was marked | |||
was marked CE in the network, and the non-AccECN TCP Client fed this | CE in the network, and the non-AccECN TCP Client fed this back by | |||
back by setting ECE on the ACK of the SYN/ACK. | setting ECE on the ACK of the SYN/ACK. | |||
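The inference rules in this section can be sketched as a small classifier. It assumes the Server supports AccECN and has already validated that the pure ACK acknowledges a valid SYN cookie. The enum and function names are hypothetical.

```python
from enum import Enum


class CookieAckResult(Enum):
    ACCECN = "enter AccECN mode"
    NOT_ACCECN = "client did not request AccECN"
    NOT_ACCECN_SYNACK_CE = "client did not request AccECN; SYN/ACK was CE-marked"


def classify_cookie_ack(ace: int) -> CookieAckResult:
    """Infer the forgotten handshake state from the ACE field of the pure ACK."""
    if not 0b000 <= ace <= 0b111:
        raise ValueError("ACE is a 3-bit field")
    if ace >= 0b010:
        # Values 0b010 to 0b111: the Client requested AccECN on the SYN and
        # the Server must have confirmed AccECN support on the SYN/ACK.
        return CookieAckResult.ACCECN
    if ace == 0b001:
        # ECE set by a non-AccECN Client feeding back a CE mark on the SYN/ACK.
        return CookieAckResult.NOT_ACCECN_SYNACK_CE
    return CookieAckResult.NOT_ACCECN


for value in range(8):
    print(format(value, "03b"), classify_cookie_ack(value).value)
```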
5.2. Compatibility with TCP Experiments and Common TCP Options | 5.2. Compatibility with TCP Experiments and Common TCP Options | |||
AccECN is compatible (at least on paper) with the most commonly used | AccECN is compatible (at least on paper) with the most commonly used | |||
TCP options: MSS, time-stamp, window scaling, SACK and TCP-AO. It is | TCP options: MSS, time-stamp, window scaling, SACK, and TCP-AO. It | |||
also compatible with Multipath TCP (MPTCP [RFC8684]) and the | is also compatible with Multipath TCP (MPTCP [RFC8684]) and the | |||
experimental TCP option TCP Fast Open (TFO [RFC7413]). AccECN is | experimental TCP option TCP Fast Open (TFO [RFC7413]). AccECN is | |||
friendly to all these protocols, because space for TCP options is | friendly to all these protocols, because space for TCP options is | |||
particularly scarce on the SYN, where AccECN consumes zero additional | particularly scarce on the SYN, where AccECN consumes zero additional | |||
header space. | header space. | |||
When option space is under pressure from other options, | When option space is under pressure from other options, | |||
Section 3.2.3.3 provides guidance on how important it is to send an | Section 3.2.3.3 provides guidance on how important it is to send an | |||
AccECN Option relative to other options, and which fields are more | AccECN Option relative to other options, and which fields are more | |||
important to include. | important to include. | |||
Implementers of TFO need to take careful note of the recommendation | Implementers of TFO need to take careful note of the recommendation | |||
in Section 3.2.2.1. That section recommends that, if the TCP Client | in Section 3.2.2.1. That section recommends that, if the TCP Client | |||
has successfully negotiated AccECN, when acknowledging the SYN/ACK, | has successfully negotiated AccECN, when acknowledging the SYN/ACK, | |||
even if it has data to send, it sends a pure ACK immediately before | even if it has data to send, it sends a pure ACK immediately before | |||
the data. Then it can reflect the IP-ECN field of the SYN/ACK on | the data. Then it can reflect the IP-ECN field of the SYN/ACK on | |||
this pure ACK, which allows the Server to detect ECN mangling. Note | this pure ACK, which allows the Server to detect ECN mangling. Note | |||
that, as specified in Section 3.2, any data on the SYN (SYN=1, ACK=0) | that, as specified in Section 3.2, any data on the SYN (SYN=1, ACK=0) | |||
is not included in any of the byte counters held locally for each ECN | is not included in any of the byte counters held locally for each ECN | |||
marking, nor in the AccECN Option on the wire. | marking, nor in the AccECN Option on the wire. | |||
AccECN feedback is compatible with the ECN++ | AccECN feedback is compatible with the ECN++ experiment [ECN++], | |||
[I-D.ietf-tcpm-generalized-ecn] experiment, which allows TCP control | which allows TCP control packets and retransmissions to be ECN- | |||
packets and retransmissions to be ECN-capable ([RFC3168] was updated | capable ([RFC3168] was updated by [RFC8311] to permit such | |||
by [RFC8311] to permit such experiments). AccECN is likely to | experiments). AccECN is likely to inherently support any experiment | |||
inherently support any experiment with ECN-capable packets, because | with ECN-capable packets, because it feeds back the contents of the | |||
it feeds back the contents of the ECN field mechanistically, without | ECN field mechanistically, without judging whether or not a packet | |||
judging whether a packet ought to use the ECN capability or not | ought to use the ECN capability (Section 2.5). This specification | |||
(Section 2.5). This specification does not discuss implementing | does not discuss implementing AccECN alongside [RFC5562], which was | |||
AccECN alongside [RFC5562], which was an earlier experimental | an earlier experimental protocol with narrower scope than ECN++ and a | |||
protocol with narrower scope than ECN++ and a 5-way handshake. | 5-way handshake. | |||
5.3. Compatibility with Feedback Integrity Mechanisms | 5.3. Compatibility with Feedback Integrity Mechanisms | |||
Three alternative mechanisms are available to assure the integrity of | Three alternative mechanisms are available to assure the integrity of | |||
ECN and/or loss signals. AccECN is compatible with any of these | ECN and/or loss signals. AccECN is compatible with any of these | |||
approaches: | approaches: | |||
* The Data Sender can test the integrity of the receiver's ECN (or | * The Data Sender can test the integrity of the receiver's ECN (or | |||
loss) feedback by occasionally setting the IP-ECN field to a value | loss) feedback by occasionally setting the IP-ECN field to a value | |||
normally only set by the network (and/or deliberately leaving a | normally only set by the network (and/or deliberately leaving a | |||
sequence number gap). Then it can test whether the Data | sequence number gap). Then it can test whether the Data | |||
Receiver's feedback faithfully reports what it expects (similar to | Receiver's feedback faithfully reports what it expects (similar to | |||
paragraph 2 of Section 20.2 of [RFC3168]). Unlike the ECN Nonce | paragraph 2 of Section 20.2 of [RFC3168]). Unlike the ECN-nonce | |||
[RFC3540], this approach does not waste the ECT(1) codepoint in | [RFC3540], this approach does not waste the ECT(1) codepoint in | |||
the IP header, it does not require standardization and it does not | the IP header, it does not require standardization, and it does | |||
rely on misbehaving receivers volunteering to reveal feedback | not rely on misbehaving receivers volunteering to reveal feedback | |||
information that allows them to be detected. However, setting the | information that allows them to be detected. However, setting the | |||
CE mark by the sender might conceal actual congestion feedback | CE mark by the sender might conceal actual congestion feedback | |||
from the network and therefore ought to only be done sparingly. | from the network and therefore ought to only be done sparingly. | |||
* Networks generate congestion signals when they are becoming | * Networks generate congestion signals when they are becoming | |||
congested, so networks are more likely than Data Senders to be | congested, so networks are more likely than Data Senders to be | |||
concerned about the integrity of the receiver's feedback of these | concerned about the integrity of the receiver's feedback of these | |||
signals. A network can enforce a congestion response to its ECN | signals. A network can enforce a congestion response to its ECN | |||
markings (or packet losses) using congestion exposure (ConEx) | markings (or packet losses) using congestion exposure (ConEx) | |||
audit [RFC7713]. Whether the receiver or a downstream network is | audit [RFC7713]. Whether the receiver or a downstream network is | |||
suppressing congestion feedback or the sender is unresponsive to | suppressing congestion feedback, or the sender is unresponsive to | |||
the feedback, or both, ConEx audit can neutralize any advantage | the feedback, or both, ConEx audit can neutralize any advantage | |||
that any of these three parties would otherwise gain. | that any of these three parties would otherwise gain. | |||
ConEx is an experimental change to the Data Sender that would be | ConEx is an experimental change to the Data Sender that would be | |||
most useful when combined with AccECN. Without AccECN, the ConEx | most useful when combined with AccECN. Without AccECN, the ConEx | |||
behaviour of a Data Sender would have to be more conservative than | behaviour of a Data Sender would have to be more conservative than | |||
would be necessary if it had the accurate feedback of AccECN. | would be necessary if it had the accurate feedback of AccECN. | |||
* The standards track TCP authentication option (TCP-AO [RFC5925]) | * The Standards Track TCP authentication option (TCP-AO [RFC5925]) | |||
can be used to detect any tampering with AccECN feedback between | can be used to detect any tampering with AccECN feedback between | |||
the Data Receiver and the Data Sender (whether malicious or | the Data Receiver and the Data Sender (whether malicious or | |||
accidental). The AccECN fields are immutable end-to-end, so they | accidental). The AccECN fields are immutable end to end, so they | |||
are amenable to TCP-AO protection, which covers TCP options by | are amenable to TCP-AO protection, which covers TCP options by | |||
default. However, TCP-AO is often too brittle to use on many end- | default. However, TCP-AO is often too brittle to use on many end- | |||
to-end paths, where middleboxes can make verification fail in | to-end paths, where middleboxes can make verification fail in | |||
their attempts to improve performance or security, e.g., Network | their attempts to improve performance or security, e.g., Network | |||
Address (and Port) Translation (NAT/NAPT), resegmentation or | Address Translation (NAT) and Network Address Port Translation | |||
shifting the sequence space. | (NAPT), resegmentation, or shifting the sequence space. | |||
6. Summary: Protocol Properties | 6. Summary: Protocol Properties | |||
This section is informative not normative. It describes how well the | This section is informative, not normative. It describes how well | |||
protocol satisfies the agreed requirements for a more Accurate ECN | the protocol satisfies the agreed requirements for a more Accurate | |||
feedback protocol [RFC7560]. | ECN feedback protocol [RFC7560]. | |||
Accuracy: From each ACK, the Data Sender can infer the number of new | Accuracy: From each ACK, the Data Sender can infer the number of new | |||
CE marked segments since the previous ACK. This provides better | CE-marked segments since the previous ACK. This provides better | |||
accuracy on CE feedback than Classic ECN. In addition if an | accuracy on CE feedback than Classic ECN. In addition, if an | |||
AccECN Option is present (not blocked by the network path) the | AccECN Option is present (not blocked by the network path), the | |||
number of bytes marked with CE, ECT(1) and ECT(0) are provided. | number of bytes marked with CE, ECT(1), and ECT(0) are provided. | |||
Overhead: The AccECN scheme is divided into two parts. The | Overhead: The AccECN scheme is divided into two parts. The | |||
essential feedback part reuses the 3 flags already assigned to ECN | essential feedback part reuses the three flags already assigned to | |||
in the TCP header. The supplementary feedback part adds an | ECN in the TCP header. The supplementary feedback part adds an | |||
additional TCP option consuming up to 11 bytes. However, no TCP | additional TCP option consuming up to 11 bytes. However, no TCP | |||
option space is consumed in the SYN. | option space is consumed in the SYN. | |||
Ordering: The order in which marks arrive at the Data Receiver is | Ordering: The order in which marks arrive at the Data Receiver is | |||
preserved in AccECN feedback, because the Data Receiver is | preserved in AccECN feedback, because the Data Receiver is | |||
expected to send an ACK immediately whenever a different mark | expected to send an ACK immediately whenever a different mark | |||
arrives. | arrives. | |||
Timeliness: While the same ECN markings are arriving continually at | Timeliness: While the same ECN markings are arriving continually at | |||
the Data Receiver, it can defer ACKs as TCP does normally, but it | the Data Receiver, it can defer ACKs as TCP does normally, but it | |||
skipping to change at page 54, line 18 ¶ | skipping to change at line 2500 ¶ | |||
Timeliness vs Overhead: Change-Triggered ACKs are intended to enable | Timeliness vs Overhead: Change-Triggered ACKs are intended to enable | |||
latency-sensitive uses of ECN feedback by capturing the timing of | latency-sensitive uses of ECN feedback by capturing the timing of | |||
transitions but not wasting resources while the state of the | transitions but not wasting resources while the state of the | |||
signalling system is stable. Within the constraints of the | signalling system is stable. Within the constraints of the | |||
change-triggered ACK rules, the receiver can control how | change-triggered ACK rules, the receiver can control how | |||
frequently it sends AccECN TCP Options and therefore to some | frequently it sends AccECN TCP Options and therefore to some | |||
extent it can control the overhead induced by AccECN. | extent it can control the overhead induced by AccECN. | |||
Resilience: All information is provided based on counters. | Resilience: All information is provided based on counters. | |||
Therefore, if ACKs are lost, the counters on the first ACK | Therefore, if ACKs are lost, the counters on the first ACK | |||
following the losses allows the Data Sender to immediately recover | following the losses allow the Data Sender to immediately recover | |||
the number of the ECN markings that it missed. And if data or | the number of the ECN markings that it missed. If data or ACKs | |||
ACKs are reordered, stale congestion information can be identified | are reordered, stale congestion information can be identified and | |||
and ignored. | ignored. | |||
Resilience against Bias: Because feedback is based on repetition of | Resilience against Bias: Because feedback is based on repetition of | |||
counters, random losses do not remove any information; they only | counters, random losses do not remove any information; they only | |||
delay it. Therefore, even though some ACKs are change-triggered, | delay it. Therefore, even though some ACKs are change-triggered, | |||
random losses will not alter the proportions of the different ECN | random losses will not alter the proportions of the different ECN | |||
markings in the feedback. | markings in the feedback. | |||
Resilience vs Overhead: If space is limited in some segments | Resilience vs Overhead: If space is limited in some segments (e.g., | |||
(e.g., because more options are needed on some segments, such as | because more options are needed on some segments, such as the SACK | |||
the SACK option after loss), the Data Receiver can send AccECN | option after loss), the Data Receiver can send AccECN Options less | |||
Options less frequently or truncate fields that have not changed, | frequently or truncate fields that have not changed, usually down | |||
usually down to as little as 5 bytes. | to as little as 5 bytes. | |||
Resilience vs Timeliness and Ordering: Ordering information and the | Resilience vs Timeliness and Ordering: Ordering information and the | |||
timing of transitions cannot be communicated in three cases: i) | timing of transitions cannot be communicated in three cases: i) | |||
during ACK loss; ii) if something on the path strips AccECN | during ACK loss; ii) if something on the path strips AccECN | |||
Options; or iii) if the Data Receiver is unable to support Change- | Options; or iii) if the Data Receiver is unable to support Change- | |||
Triggered ACKs. Following ACK reordering, the Data Sender can | Triggered ACKs. Following ACK reordering, the Data Sender can | |||
reconstruct the order in which feedback was sent, but not until | reconstruct the order in which feedback was sent, but not until | |||
all the missing feedback has arrived. | all the missing feedback has arrived. | |||
Complexity: An AccECN implementation solely involves simple counter | Complexity: An AccECN implementation solely involves simple counter | |||
increments, some modulo arithmetic to communicate the least | increments, some modulo arithmetic to communicate the least | |||
significant bits and allow for wrap, and some heuristics for | significant bits and allow for wrap, and some heuristics for | |||
safety against fields cycling due to prolonged periods of ACK | safety against fields cycling due to prolonged periods of ACK | |||
loss. Each host needs to maintain eight additional counters. The | loss. Each host needs to maintain eight additional counters. The | |||
hosts have to apply some additional tests to detect tampering by | hosts have to apply some additional tests to detect tampering by | |||
middleboxes, but in general the protocol is simple to understand, | middleboxes, but in general the protocol is simple to understand | |||
simple to implement and requires few cycles per packet to execute. | and implement and requires few cycles per packet to execute. | |||
Integrity: AccECN is compatible with at least three approaches that | Integrity: AccECN is compatible with at least three approaches that | |||
can assure the integrity of ECN feedback. If AccECN Options are | can assure the integrity of ECN feedback. If AccECN Options are | |||
stripped the resolution of the feedback is degraded, but the | stripped, the resolution of the feedback is degraded, but the | |||
integrity of this degraded feedback can still be assured. | integrity of this degraded feedback can still be assured. | |||
Backward Compatibility: If only one endpoint supports the AccECN | Backward Compatibility: If only one endpoint supports the AccECN | |||
scheme, it will fall-back to the most advanced ECN feedback scheme | scheme, it will fall back to the most advanced ECN feedback scheme | |||
supported by the other end. | supported by the other end. | |||
If AccECN Options are stripped by a middlebox, AccECN still | If AccECN Options are stripped by a middlebox, AccECN still | |||
provides basic congestion feedback in the ACE field. Further, | provides basic congestion feedback in the ACE field. Further, | |||
AccECN can be used to detect mangling of the IP ECN field; | AccECN can be used to detect mangling of the IP-ECN field; | |||
mangling of the TCP ECN flags; blocking of ECT-marked segments; | mangling of the TCP ECN flags; blocking of ECT-marked segments; | |||
and blocking of segments carrying an AccECN Option. It can detect | and blocking of segments carrying an AccECN Option. It can detect | |||
these conditions during TCP's three-way handshake so that it can | these conditions during TCP's three-way handshake so that it can | |||
fall back to operation without ECN and/or operation without AccECN | fall back to operation without ECN and/or operation without AccECN | |||
Options. | Options. | |||
Forward Compatibility: The behaviour of endpoints and middleboxes is | Forward Compatibility: The behaviour of endpoints and middleboxes is | |||
carefully defined for all reserved or currently unused codepoints | carefully defined for all reserved or currently unused codepoints | |||
in the scheme. Then, the designers of security devices can | in the scheme. Then, the designers of security devices can | |||
understand which currently unused values might appear in future. | understand which currently unused values might appear in the | |||
So, even if they choose to treat such values as anomalous while | future. So, even if they choose to treat such values as anomalous | |||
they are not widely used, any blocking will at least be under | while they are not widely used, any blocking will at least be | |||
policy control not hard-coded. Then, if previously unused values | under policy control and not hard-coded. Then, if previously | |||
start to appear on the Internet (or in standards), such policies | unused values start to appear on the Internet (or in standards), | |||
could be quickly reversed. | such policies could be quickly reversed. | |||
7. IANA Considerations | 7. IANA Considerations | |||
This document reassigns the TCP header flag at bit offset 7 to the | This document reassigns the TCP header flag at bit offset 7 to the | |||
AccECN protocol. This bit was previously called the Nonce Sum (NS) | AccECN protocol. This bit was previously called the Nonce Sum (NS) | |||
flag [RFC3540], but RFC 3540 has been reclassified as historic | flag [RFC3540], but RFC 3540 has been reclassified as Historic | |||
[RFC8311]. The flag will now be defined as the following in the "TCP | [RFC8311]. The flag is now defined as the following in the "TCP | |||
Header Flags" registry in the "Transmission Control Protocol (TCP) | Header Flags" registry in the "Transmission Control Protocol (TCP) | |||
Parameters" registry group: | Parameters" registry group: | |||
+=====+==============+===========+==============================+ | +=====+==============+===========+==============================+ | |||
| Bit | Name | Reference | Assignment Notes | | | Bit | Name | Reference | Assignment Notes | | |||
+=====+==============+===========+==============================+ | +=====+==============+===========+==============================+ | |||
| 7 | AE (Accurate | RFC XXXX | Previously used as NS (Nonce | | | 7 | AE (Accurate | RFC 9768 | Previously used as NS (Nonce | | |||
| | ECN) | | Sum) by [RFC3540], which is | | | | ECN) | | Sum) by [RFC3540], which is | | |||
| | | | now historic [RFC8311] | | | | | | now Historic [RFC8311] | | |||
+-----+--------------+-----------+------------------------------+ | +-----+--------------+-----------+------------------------------+ | |||
Table 6: TCP header flag reassignment | Table 6: TCP Header Flag Reassignment | |||
[TO BE REMOVED: IANA is requested to update the existing entry in the | ||||
TCP Header Flags registry (https://www.iana.org/assignments/tcp- | ||||
parameters/tcp-parameters.xhtml#tcp-header-flags) for Bit 7 to "AE | ||||
(Accurate ECN)" and to change the reference to this RFC-to-be instead | ||||
of RFC8311. Also IANA is requested to change the assignment note to | ||||
"Previously used as NS (Nonce Sum) by [RFC3540], which is now | ||||
historic [RFC8311]."] | ||||
This document also defines two new TCP options for AccECN, assigned | This document also defines two new TCP options for AccECN from the | |||
values of 172 and 174 (decimal) from the TCP option space. These | TCP option space. These values are defined as the following in the | |||
values are defined as the following in the "TCP Option Kind Numbers" | "TCP Option Kind Numbers" registry in the "Transmission Control | |||
registry in the "Transmission Control Protocol (TCP) Parameters" | Protocol (TCP) Parameters" registry group: | |||
registry group: | ||||
+======+========+================================+===========+ | +======+========+================================+===========+ | |||
| Kind | Length | Meaning | Reference | | | Kind | Length | Meaning | Reference | | |||
+======+========+================================+===========+ | +======+========+================================+===========+ | |||
| 172 | N | Accurate ECN Order 0 (AccECN0) | RFC XXXX | | | 172 | N | Accurate ECN Order 0 (AccECN0) | RFC 9768 | | |||
+------+--------+--------------------------------+-----------+ | +------+--------+--------------------------------+-----------+ | |||
| 174 | N | Accurate ECN Order 1 (AccECN1) | RFC XXXX | | | 174 | N | Accurate ECN Order 1 (AccECN1) | RFC 9768 | | |||
+------+--------+--------------------------------+-----------+ | +------+--------+--------------------------------+-----------+ | |||
Table 7: New TCP Option assignments | Table 7: New TCP Option assignments | |||
[TO BE REMOVED: These registrations have taken place using the early | ||||
registration procedure, which may be temporary if this draft does not | ||||
proceed, at the following location: http://www.iana.org/assignments/ | ||||
tcp-parameters/tcp-parameters.xhtml#tcp-parameters-1 ] | ||||
Early experimental implementations of the two AccECN Options used | Early experimental implementations of the two AccECN Options used | |||
experimental option 254 per [RFC6994] with the 16-bit magic numbers | experimental option 254 per [RFC6994] with the 16-bit magic numbers | |||
0xACC0 and 0xACC1 respectively for Order 0 and 1, as allocated in the | 0xACC0 and 0xACC1, respectively, for Order 0 and 1, as allocated in | |||
IANA "TCP Experimental Option Experiment Identifiers (TCP ExIDs)" | the IANA "TCP/UDP Experimental Option Experiment Identifiers (TCP/UDP | |||
registry. Even earlier experimental implementations used the single | ExIDs)" registry. Even earlier experimental implementations used the | |||
magic number 0xACCE (16 bits). Uses of these experimental options | single magic number 0xACCE (16 bits). Uses of these experimental | |||
SHOULD migrate to use the new option kinds (172 & 174). | options SHOULD migrate to use the new option kinds (172 and 174). | |||
[TO BE REMOVED: IANA is requested to replace the references for all | ||||
three of the above experimental options (0xACC0, 0xACC1 and 0xACCE) | ||||
with a reference to the present RFC XXXX.] | ||||
[TO BE REMOVED: If the early registrations, which may be temporary, | ||||
do not proceed, the three references to them in the TCP ExIDs | ||||
registry at the following location will also need to be edited out: | ||||
https://www.iana.org/assignments/tcp-parameters/tcp- | ||||
parameters.xhtml#tcp-exids ] | ||||
8. Security and Privacy Considerations | 8. Security and Privacy Considerations | |||
If ever the supplementary feedback part of AccECN based on one of the | If ever the supplementary feedback part of AccECN that is based on | |||
new AccECN TCP Options is unusable (due for example to middlebox | one of the new AccECN TCP Options is unusable (due for example to | |||
interference) the essential feedback part of AccECN's congestion | middlebox interference), the essential feedback part of AccECN's | |||
feedback offers only limited resilience to long runs of ACK loss (see | congestion feedback offers only limited resilience to long runs of | |||
Section 3.2.2.5). These problems are unlikely to be due to malicious | ACK loss (see Section 3.2.2.5). These problems are unlikely to be | |||
intervention (because if an attacker could strip a TCP option or | due to malicious intervention (because if an attacker could strip a | |||
discard a long run of ACKs it could wreak other arbitrary havoc). | TCP option or discard a long run of ACKs, it could wreak other | |||
However, it would be of concern if AccECN's resilience could be | arbitrary havoc). However, it would be of concern if AccECN's | |||
indirectly compromised during a flooding attack. AccECN is still | resilience could be indirectly compromised during a flooding attack. | |||
considered safe though, because if AccECN Options are not present, | AccECN is still considered safe though, because if AccECN Options are | |||
the AccECN Data Sender is then required to switch to more | not present, the AccECN Data Sender is then required to switch to | |||
conservative assumptions about wrap of congestion indication counters | more conservative assumptions about wrap of congestion indication | |||
(see Section 3.2.2.5 and Appendix A.2). | counters (see Section 3.2.2.5 and Appendix A.2). | |||
Section 5.1 describes how a TCP Server can negotiate AccECN and use | Section 5.1 describes how a TCP Server can negotiate AccECN and use | |||
the SYN cookie method for mitigating SYN flooding attacks. | the SYN cookie method for mitigating SYN flooding attacks. | |||
There is concern that ECN feedback could be altered or suppressed, | There is concern that ECN feedback could be altered or suppressed, | |||
particularly because a misbehaving Data Receiver could increase its | particularly because a misbehaving Data Receiver could increase its | |||
own throughput at the expense of others. AccECN is compatible with | own throughput at the expense of others. AccECN is compatible with | |||
the three schemes known to assure the integrity of ECN feedback (see | the three schemes known to assure the integrity of ECN feedback (see | |||
Section 5.3 for details). If AccECN Options are stripped by an | Section 5.3 for details). If AccECN Options are stripped by an | |||
incorrectly implemented middlebox, the resolution of the feedback | incorrectly implemented middlebox, the resolution of the feedback | |||
will be degraded, but the integrity of this degraded information can | will be degraded, but the integrity of this degraded information can | |||
still be assured. Assuring that Data Senders respond appropriately | still be assured. Assuring that Data Senders respond appropriately | |||
to ECN feedback is possible, but the scope of the present document is | to ECN feedback is possible, but the scope of the present document is | |||
confined to the feedback protocol, and excludes the response to this | confined to the feedback protocol and excludes the response to this | |||
feedback. | feedback. | |||
In Section 3.2.3 a Data Sender is allowed to ignore an unrecognized | In Section 3.2.3, a Data Sender is allowed to ignore an unrecognized | |||
TCP AccECN Option length and read as many whole 3-octet fields from | TCP AccECN Option length and read as many whole 3-octet fields from | |||
it as possible up to a maximum of 3, treating the remainder as | it as possible up to a maximum of 3, treating the remainder as | |||
padding. This opens up a potential covert channel of up to 29B (40 - | padding. This opens up a potential covert channel of up to 29B (40 - | |||
(2+3*3)) B. However, it is really an overt channel (not hidden) and | (2+3*3)) B. However, it is really an overt channel (not hidden) and | |||
it is no different to the use of unknown TCP options with unknown | it is no different than the use of unknown TCP options with unknown | |||
option lengths in general. Therefore, where this is of concern, it | option lengths in general. Therefore, where this is of concern, it | |||
can already be adequately mitigated by regular TCP normalizer | can already be adequately mitigated by regular TCP normalizer | |||
technology (see Section 3.3.2). | technology (see Section 3.3.2). | |||
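
The arithmetic behind the 29-byte figure, and the lenient parsing it follows from, can be sketched as below. This is an illustrative sketch only, not the normative parser of Section 3.2.3. The function names, the big-endian field decoding, and the check that the Length octet matches the buffer size are assumptions made for the example.

```python
TCP_OPTION_SPACE = 40  # maximum bytes of TCP options in one header
OPTION_HEADER = 2      # Kind octet plus Length octet
FIELD_SIZE = 3         # each AccECN counter field is 3 octets
MAX_FIELDS = 3         # at most three counter fields are read


def split_accecn_option(option: bytes):
    """Split an AccECN Option of any length into (fields, padding).

    Reads as many whole 3-octet fields as fit, up to MAX_FIELDS, and
    treats whatever remains as padding. Field values are decoded as
    big-endian unsigned integers (an assumption for this sketch).
    """
    if len(option) < OPTION_HEADER or option[1] != len(option):
        raise ValueError("malformed option")
    body = option[OPTION_HEADER:]
    n = min(len(body) // FIELD_SIZE, MAX_FIELDS)
    fields = [
        int.from_bytes(body[i * FIELD_SIZE:(i + 1) * FIELD_SIZE], "big")
        for i in range(n)
    ]
    padding = body[n * FIELD_SIZE:]
    return fields, padding


def max_unparsed_bytes() -> int:
    """Upper bound on bytes a sender could hide after the three fields."""
    return TCP_OPTION_SPACE - (OPTION_HEADER + MAX_FIELDS * FIELD_SIZE)
```

The ignored remainder is what forms the channel discussed above: a normalizer that wants to close it would strip or zero the bytes returned as padding.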
The AccECN protocol is not believed to introduce any new privacy | The AccECN protocol is not believed to introduce any new privacy | |||
concerns, because it merely counts and feeds back signals at the | concerns, because it merely counts and feeds back signals at the | |||
transport layer that had already been visible at the IP layer. A | transport layer that had already been visible at the IP layer. A | |||
covert channel can be used to compromise privacy. However, as | covert channel can be used to compromise privacy. However, as | |||
explained above, undefined TCP options in general open up such | explained above, undefined TCP options in general open up such | |||
channels and common techniques are available to close them off. | channels, and common techniques are available to close them off. | |||
There is a potential concern that a Data Receiver could deliberately | There is a potential concern that a Data Receiver could deliberately | |||
omit AccECN Options pretending that they had been stripped by a | omit AccECN Options pretending that they had been stripped by a | |||
middlebox. No known way can yet be contrived for a receiver to take | middlebox. No known way can yet be contrived for a receiver to take | |||
advantage of this behaviour, which seems to always degrade its own | advantage of this behaviour, which seems to always degrade its own | |||
performance. However, the concern is mentioned here for | performance. However, the concern is mentioned here for | |||
completeness. | completeness. | |||
A generic privacy concern of any new protocol is that for a while it | A generic privacy concern of any new protocol is that for a while it | |||
will be used by a small population of hosts, and thus show up more | will be used by a small population of hosts, and thus show up more | |||
easily. However, it is expected that this option will become | easily. However, it is expected that AccECN will become available in | |||
available in operating systems over time, and eventually turned on by | operating systems over time and that it will eventually be turned on | |||
default in them. Thus a individual identification of a particular | by default. Thus, an individual identification of a particular user | |||
user is less of a concern than the fingerprinting of specific | is less of a concern than the fingerprinting of specific versions of | |||
versions of operating systems. However, the latter can be done using | operating systems. However, the latter can be done using different | |||
different means independent of Accurate ECN. | means independent of Accurate ECN. | |||
As Accurate ECN exposes more bits in the TCP header which could be | As Accurate ECN exposes more bits in the TCP header that could be | |||
tampered with without interfering with the transport excessively, it | tampered with without interfering with the transport excessively, it | |||
may allow an additional way to identify specific data streams across | may allow an additional way to identify specific data streams across | |||
a virtual private network (VPN) to an attacker which has access to | a virtual private network (VPN) to an attacker that has access to the | |||
the datastream before and after the VPN tunnel endpoints. This may | datastream before and after the VPN tunnel endpoints. This may be | |||
be achieved by injecting or modifying the ACE field in specific | achieved by injecting or modifying the ACE field in specific patterns | |||
patters that can be recognized. | that can be recognized. | |||
Overall, Accurate ECN does not change the risk profile on privacy to | Overall, Accurate ECN does not change the risk profile on privacy to | |||
a user dramatically beyond what is already possible using classic | a user dramatically beyond what is already possible using classic | |||
ECN. However, in order to prevent such attacks and means of easier | ECN. However, in order to prevent such attacks and means of easier | |||
identification of flows, it is adviseable for privacy conscious users | identification of flows, it is advisable for privacy-conscious users | |||
behind VPNs to not enable the Accurate ECN, or Classic ECN for that | behind VPNs to not enable the Accurate ECN, or Classic ECN for that | |||
matter. | matter. | |||
9. References | 9. References | |||
9.1. Normative References | 9.1. Normative References | |||
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP | [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP | |||
Selective Acknowledgment Options", RFC 2018, | Selective Acknowledgment Options", RFC 2018, | |||
DOI 10.17487/RFC2018, October 1996, | DOI 10.17487/RFC2018, October 1996, | |||
skipping to change at page 59, line 30 ¶ | skipping to change at line 2722 ¶ | |||
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
[RFC9293] Eddy, W., Ed., "Transmission Control Protocol (TCP)", | [RFC9293] Eddy, W., Ed., "Transmission Control Protocol (TCP)", | |||
STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022, | STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022, | |||
<https://www.rfc-editor.org/info/rfc9293>. | <https://www.rfc-editor.org/info/rfc9293>. | |||
9.2. Informative References | 9.2. Informative References | |||
[I-D.ietf-tcpm-generalized-ecn] | [ECN++] Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit | |||
Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit | ||||
Congestion Notification (ECN) to TCP Control Packets", | Congestion Notification (ECN) to TCP Control Packets", | |||
Work in Progress, Internet-Draft, draft-ietf-tcpm- | Work in Progress, Internet-Draft, draft-ietf-tcpm- | |||
generalized-ecn-16, 20 October 2024, | generalized-ecn-17, 21 April 2025, | |||
<https://datatracker.ietf.org/doc/html/draft-ietf-tcpm- | <https://datatracker.ietf.org/doc/html/draft-ietf-tcpm- | |||
generalized-ecn-16>. | generalized-ecn-17>. | |||
[Mandalari18] | [Mandalari18] | |||
Mandalari, A., Lutu, A., Briscoe, B., Bagnulo, M., and Ö. | Mandalari, A., Lutu, A., Briscoe, B., Bagnulo, M., and Ö. | |||
Alay, "Measuring ECN++: Good News for ++, Bad News for ECN | Alay, "Measuring ECN++: Good News for ++, Bad News for ECN | |||
over Mobile", IEEE Communications Magazine, March 2018, | over Mobile", IEEE Communications Magazine, March 2018, | |||
<http://www.it.uc3m.es/amandala/ | <http://www.it.uc3m.es/amandala/ | |||
ecn++/ecn_commag_2018.html>. | ecn++/ecn_commag_2018.html>. | |||
[RFC3449] Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M. | [RFC3449] Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M. | |||
Sooriyabandara, "TCP Performance Implications of Network | Sooriyabandara, "TCP Performance Implications of Network | |||
skipping to change at page 62, line 21 ¶ | skipping to change at line 2852 ¶ | |||
(L4S) Internet Service: Architecture", RFC 9330, | (L4S) Internet Service: Architecture", RFC 9330, | |||
DOI 10.17487/RFC9330, January 2023, | DOI 10.17487/RFC9330, January 2023, | |||
<https://www.rfc-editor.org/info/rfc9330>. | <https://www.rfc-editor.org/info/rfc9330>. | |||
[RFC9438] Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed., | [RFC9438] Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed., | |||
"CUBIC for Fast and Long-Distance Networks", RFC 9438, | "CUBIC for Fast and Long-Distance Networks", RFC 9438, | |||
DOI 10.17487/RFC9438, August 2023, | DOI 10.17487/RFC9438, August 2023, | |||
<https://www.rfc-editor.org/info/rfc9438>. | <https://www.rfc-editor.org/info/rfc9438>. | |||
[RoCEv2] InfiniBand Trade Association, "InfiniBand Architecture | [RoCEv2] InfiniBand Trade Association, "InfiniBand Architecture | |||
Specification Volume 1, Release 1.4", 2020, | Specification", Volume 1, Release 1.4, 2020, | |||
<https://www.infinibandta.org/ibta-specification/>. | <https://www.infinibandta.org/ibta-specification/>. | |||
Appendix A. Example Algorithms | Appendix A. Example Algorithms | |||
This appendix is informative, not normative. It gives example | This appendix is informative, not normative. It gives example | |||
algorithms that would satisfy the normative requirements of the | algorithms that would satisfy the normative requirements of the | |||
AccECN protocol. However, implementers are free to choose other ways | AccECN protocol. However, implementers are free to choose other ways | |||
to implement the requirements. | to implement the requirements. | |||
A.1. Example Algorithm to Encode/Decode the AccECN Option | A.1. Example Algorithm to Encode/Decode the AccECN Option | |||
skipping to change at page 62, line 46 ¶ | skipping to change at line 2877 ¶ | |||
the ECEB field into its byte counter s.ceb. The other counters for | the ECEB field into its byte counter s.ceb. The other counters for | |||
bytes marked ECT(0) and ECT(1) in an AccECN Option would be similarly | bytes marked ECT(0) and ECT(1) in an AccECN Option would be similarly | |||
encoded and decoded. | encoded and decoded. | |||
It is assumed that each local byte counter is an unsigned integer | It is assumed that each local byte counter is an unsigned integer | |||
greater than 24b (probably 32b), and that the following constant has | greater than 24b (probably 32b), and that the following constant has | |||
been assigned: | been assigned: | |||
DIVOPT = 2^24 | DIVOPT = 2^24 | |||
Every time a CE marked data segment arrives, the Data Receiver | Every time a CE-marked data segment arrives, the Data Receiver | |||
increments its local value of r.ceb by the size of the TCP Data. | increments its local value of r.ceb by the size of the TCP Data. | |||
Whenever it sends an ACK with an AccECN Option, the value it writes | Whenever it sends an ACK with an AccECN Option, the value it writes | |||
into the ECEB field is | into the ECEB field is | |||
ECEB = r.ceb % DIVOPT | ECEB = r.ceb % DIVOPT | |||
where '%' is the remainder operator. | where '%' is the remainder operator. | |||
On the arrival of an AccECN Option, the Data Sender first makes sure | On the arrival of an AccECN Option, the Data Sender first makes sure | |||
the ACK has not been superseded in order to avoid winding the s.ceb | the ACK has not been superseded in order to avoid winding the s.ceb | |||
counter backwards. It uses the TCP acknowledgement number and any | counter backwards. It uses the TCP acknowledgement number and any | |||
SACK options [RFC2018] to calculate newlyAckedB, the amount of new | SACK options [RFC2018] to calculate newlyAckedB, the amount of new | |||
data that the ACK acknowledges in bytes (newlyAckedB can be zero but | data that the ACK acknowledges in bytes (newlyAckedB can be zero but | |||
not negative). If newlyAckedB is zero, either the ACK has been | not negative). If newlyAckedB is zero, either the ACK has been | |||
superseded or CE-marked packet(s) without data could have arrived. | superseded or CE-marked packet(s) without data could have arrived. | |||
To break the tie for the latter case, the Data Sender could use time- | To break the tie for the latter case, the Data Sender could use time- | |||
stamps [RFC7323] (if present) to work out newlyAckedT, the amount of | stamps [RFC7323] (if present) to work out newlyAckedT, the amount of | |||
new time that the ACK acknowledges. If the Data Sender determines | new time that the ACK acknowledges. If the Data Sender determines | |||
that the ACK has been superseded it ignores the AccECN Option. | that the ACK has been superseded, it ignores the AccECN Option. | |||
Otherwise, the Data Sender calculates the minimum non-negative | Otherwise, the Data Sender calculates the minimum non-negative | |||
difference d.ceb between the ECEB field and its local s.ceb counter, | difference d.ceb between the ECEB field and its local s.ceb counter, | |||
using modulo arithmetic as follows: | using modulo arithmetic as follows: | |||
if ((newlyAckedB > 0) || (newlyAckedT > 0)) { | if ((newlyAckedB > 0) || (newlyAckedT > 0)) { | |||
d.ceb = (ECEB + DIVOPT - (s.ceb % DIVOPT)) % DIVOPT | d.ceb = (ECEB + DIVOPT - (s.ceb % DIVOPT)) % DIVOPT | |||
s.ceb += d.ceb | s.ceb += d.ceb | |||
} | } | |||
For example, if s.ceb is 33,554,433 and ECEB is 1461 (both decimal), | For example, if s.ceb is 33,554,433 and ECEB is 1461 (both decimal), | |||
then | then | |||
s.ceb % DIVOPT = 1 | s.ceb % DIVOPT = 1 | |||
d.ceb = (1461 + 2^24 - 1) % 2^24 | d.ceb = (1461 + 2^24 - 1) % 2^24 | |||
= 1460 | = 1460 | |||
s.ceb = 33,554,433 + 1460 | s.ceb = 33,554,433 + 1460 | |||
= 33,555,893 | = 33,555,893 | |||
In practice an implementation might use heuristics to guess the | In practice, an implementation might use heuristics to guess the | |||
feedback in missing ACKs, then when it subsequently receives feedback | feedback in missing ACKs. Then when it subsequently receives | |||
it might find that it needs to correct its earlier heuristics as part | feedback, it might find that it needs to correct its earlier | |||
of the decoding process. The above decoding process does not include | heuristics as part of the decoding process. The above decoding | |||
any such heuristics. | process does not include any such heuristics. | |||
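
The encoding and decoding steps of this section can be collected into a short runnable sketch. It is one possible rendering under stated assumptions, not a reference implementation: the function names are hypothetical, the counters are passed and returned as plain integers, and a missing timestamp option is modelled by leaving newly_acked_t at zero. As in the text above, no heuristics for missing ACKs are included.

```python
DIVOPT = 2 ** 24  # the ECEB field carries the 24 least significant bits


def encode_eceb(r_ceb: int) -> int:
    """Data Receiver: value written into the ECEB field of an ACK."""
    return r_ceb % DIVOPT


def decode_eceb(s_ceb: int, eceb: int,
                newly_acked_b: int, newly_acked_t: int = 0) -> int:
    """Data Sender: return the updated s.ceb after reading ECEB.

    The field is ignored when the ACK acknowledges neither new data
    nor new time, which is how a superseded ACK is recognised here.
    """
    if newly_acked_b > 0 or newly_acked_t > 0:
        # Minimum non-negative difference, allowing for wrap of ECEB.
        d_ceb = (eceb + DIVOPT - (s_ceb % DIVOPT)) % DIVOPT
        s_ceb += d_ceb
    return s_ceb


# The worked example from the text: s.ceb = 33,554,433 and ECEB = 1461.
updated = decode_eceb(33_554_433, 1461, newly_acked_b=1460)
```

With these inputs, updated equals 33,555,893, matching the worked example. Because the difference is taken modulo DIVOPT, the sketch also handles a field that has wrapped once since the last ACK that was processed.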
A.2. Example Algorithm for Safety Against Long Sequences of ACK Loss | A.2. Example Algorithm for Safety Against Long Sequences of ACK Loss | |||
The example algorithms below show how a Data Receiver in AccECN mode | The example algorithms below show how a Data Receiver in AccECN mode | |||
could encode its CE packet counter r.cep into the ACE field, and how | could encode its CE packet counter r.cep into the ACE field, and how | |||
the Data Sender in AccECN mode could decode the ACE field into its | the Data Sender in AccECN mode could decode the ACE field into its | |||
s.cep counter. The Data Sender's algorithm includes code to | s.cep counter. The Data Sender's algorithm includes code to | |||
heuristically detect a long enough unbroken string of ACK losses that | heuristically detect a long enough unbroken string of ACK losses that | |||
could have concealed a cycle of the congestion counter in the ACE | could have concealed a cycle of the congestion counter in the ACE | |||
field of the next ACK to arrive. | field of the next ACK to arrive. | |||
Two variants of the algorithm are given: i) a more conservative | Two variants of the algorithm are given: i) a more conservative | |||
variant for a Data Sender to use if it detects that AccECN Options | variant for a Data Sender to use if it detects that AccECN Options | |||
are not available (see Section 3.2.2.5 and Section 3.2.3.2); and ii) | are not available (see Section 3.2.2.5 and Section 3.2.3.2); and ii) | |||
a less conservative variant that is feasible when complementary | a less conservative variant that is feasible when complementary | |||
information is available from AccECN Options. | information is available from AccECN Options. | |||
A.2.1. Safety Algorithm without the AccECN Option | A.2.1. Safety Algorithm Without the AccECN Option | |||
It is assumed that each local packet counter is a sufficiently sized | It is assumed that each local packet counter is a sufficiently sized | |||
unsigned integer (probably 32b) and that the following constant has | unsigned integer (probably 32b) and that the following constant has | |||
been assigned: | been assigned: | |||
DIVACE = 2^3 | DIVACE = 2^3 | |||
Every time an Acceptable CE-marked packet arrives (Section 3.2.2.2), | Every time an Acceptable CE-marked packet arrives (Section 3.2.2.2), | |||
the Data Receiver increments its local value of r.cep by 1. It | the Data Receiver increments its local value of r.cep by 1. It | |||
repeats the same value of ACE in every subsequent ACK until the next | repeats the same value of ACE in every subsequent ACK until the next | |||
CE marking arrives, where | CE marking arrives, where | |||
ACE = r.cep % DIVACE. | ACE = r.cep % DIVACE. | |||
If the Data Sender received an earlier value of the counter that had | If the Data Sender received an earlier value of the counter that had | |||
been delayed due to ACK reordering, it might incorrectly calculate | been delayed due to ACK reordering, it might incorrectly calculate | |||
that the ACE field had wrapped. Therefore, on the arrival of every | that the ACE field had wrapped. Therefore, on the arrival of every | |||
ACK, the Data Sender ensures the ACK has not been superseded using | ACK, the Data Sender ensures the ACK has not been superseded using | |||
the TCP acknowledgement number, any SACK options and timestamps (if | the TCP acknowledgement number, any SACK options, and timestamps (if | |||
available) to calculate newlyAckedB, as in Appendix A.1. If the ACK | available) to calculate newlyAckedB, as in Appendix A.1. If the ACK | |||
has not been superseded, the Data Sender calculates the minimum | has not been superseded, the Data Sender calculates the minimum | |||
difference d.cep between the ACE field and its local s.cep counter, | difference d.cep between the ACE field and its local s.cep counter, | |||
using modulo arithmetic as follows: | using modulo arithmetic as follows: | |||
if ((newlyAckedB > 0) || (newlyAckedT > 0)) | if ((newlyAckedB > 0) || (newlyAckedT > 0)) | |||
d.cep = (ACE + DIVACE - (s.cep % DIVACE)) % DIVACE | d.cep = (ACE + DIVACE - (s.cep % DIVACE)) % DIVACE | |||
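As a rough illustration of the modulo arithmetic above, the following Python sketch computes d.cep for an ACK that has not been superseded. The function name min_ace_delta and its parameter names are hypothetical, chosen only for this example; newlyAckedB and newlyAckedT are assumed to have been derived as in Appendix A.1.

```python
DIVACE = 2 ** 3  # size of the counter space of the 3-bit ACE field


def min_ace_delta(ace, s_cep, newly_acked_b, newly_acked_t):
    """Return d.cep, the minimum increase in CE-marked packets implied by ACE.

    Returns None when the ACK acknowledges nothing new, in which case the
    formula in the text is not applied (an assumption of this sketch).
    """
    if newly_acked_b > 0 or newly_acked_t > 0:
        # Adding DIVACE before subtracting keeps the intermediate value
        # non-negative, as in the pseudocode.
        return (ace + DIVACE - (s_cep % DIVACE)) % DIVACE
    return None
```

For instance, with s.cep=7 and an arriving ACE of 1, the minimum difference is 2, because the 3-bit counter has wrapped past 7.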
Section 3.2.2.5 expects the Data Sender to assume that the ACE field | Section 3.2.2.5 expects the Data Sender to assume that the ACE field | |||
cycled if it is the safest likely case under prevailing conditions. | cycled if it is the safest likely case under prevailing conditions. | |||
The 3-bit ACE field in an arriving ACK could have cycled and become | The 3-bit ACE field in an arriving ACK could have cycled and become | |||
ambiguous to the Data Sender if a sequence of ACKs goes missing that | ambiguous to the Data Sender if a sequence of ACKs goes missing that | |||
covers a stream of data long enough to contain 8 or more CE marks. | covers a stream of data long enough to contain 8 or more CE marks. | |||
We use the word `missing' rather than `lost', because some or all the | We use the word 'missing' rather than 'lost', because some or all the | |||
missing ACKs might arrive eventually, but out of order. Even if some | missing ACKs might arrive eventually, but out of order. Even if some | |||
of the missing ACKs were piggy-backed on data (i.e., not pure ACKs), | of the missing ACKs were piggy-backed on data (i.e., not pure ACKs), | |||
retransmissions will not repair the lost AccECN information, because | retransmissions will not repair the lost AccECN information, because | |||
AccECN requires retransmissions to carry the latest AccECN counters, | AccECN requires retransmissions to carry the latest AccECN counters, | |||
not the original ones. | not the original ones. | |||
The phrase `under prevailing conditions' allows for implementation- | The phrase 'under prevailing conditions' allows for implementation- | |||
dependent interpretation. A Data Sender might take account of the | dependent interpretation. A Data Sender might take account of the | |||
prevailing size of data segments and the prevailing CE marking rate | prevailing size of data segments and the prevailing CE marking rate | |||
just before the sequence of missing ACKs. However, we shall start | just before the sequence of missing ACKs. However, we shall start | |||
with the simplest algorithm, which assumes segments are all full- | with the simplest algorithm, which assumes segments are all full- | |||
sized and ultra-conservatively it assumes that ECN marking was 100% | sized and ultra-conservatively it assumes that ECN marking was 100% | |||
on the forward path when ACKs on the reverse path started to all be | on the forward path when ACKs on the reverse path started to all be | |||
dropped. Specifically, if newlyAckedB is the amount of data that an | dropped. Specifically, if newlyAckedB is the amount of data that an | |||
ACK acknowledges since the previous ACK, then the Data Sender could | ACK acknowledges since the previous ACK, then the Data Sender could | |||
assume that this acknowledges newlyAckedPkt full-sized segments, | assume that this acknowledges newlyAckedPkt full-sized segments, | |||
where newlyAckedPkt = newlyAckedB/MSS. Then it could assume that the | where newlyAckedPkt = newlyAckedB/MSS. Then it could assume that the | |||
ACE field incremented by | ACE field incremented by | |||
dSafer.cep = newlyAckedPkt - ((newlyAckedPkt - d.cep) % DIVACE), | dSafer.cep = newlyAckedPkt - ((newlyAckedPkt - d.cep) % DIVACE) | |||
For example, imagine an ACK acknowledges newlyAckedPkt=9 more full- | For example, imagine an ACK acknowledges newlyAckedPkt=9 more full- | |||
size segments than any previous ACK, and that ACE increments by a | size segments than any previous ACK, and that ACE increments by a | |||
minimum of 2 CE marks (d.cep=2). The above formula works out that it | minimum of 2 CE marks (d.cep=2). The above formula works out that it | |||
would still be safe to assume 2 CE marks (because 9 - ((9-2) % 8) = | would still be safe to assume 2 CE marks (because 9 - ((9-2) % 8) = | |||
2). However, if ACE increases by a minimum of 2 but acknowledges 10 | 2). However, if ACE increases by a minimum of 2 but acknowledges 10 | |||
full-sized segments, then it would be necessary to assume that there | full-sized segments, then it would be necessary to assume that there | |||
could have been 10 CE marks (because 10 - ((10-2) % 8) = 10). | could have been 10 CE marks (because 10 - ((10-2) % 8) = 10). | |||
Note that checks would need to be added to the above pseudocode for | Note that checks would need to be added to the above pseudocode for | |||
(d.cep > newlyAckedPkt), which could occur if newlyAckedPkt had been | (d.cep > newlyAckedPkt), which could occur if newlyAckedPkt had been | |||
wrongly estimated using an inappropriate packet size. | wrongly estimated using an inappropriate packet size. | |||
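The two worked examples above can be reproduced with a short Python sketch of the dSafer.cep formula. The function name safer_ace_delta is hypothetical. The guard for (d.cep > newlyAckedPkt) is only one possible check; falling back to d.cep in that case is an assumption of this sketch, not something the text specifies.

```python
DIVACE = 2 ** 3  # size of the counter space of the 3-bit ACE field


def safer_ace_delta(newly_acked_pkt, d_cep):
    """Return dSafer.cep, assuming every newly ACKed segment might be CE-marked."""
    if d_cep > newly_acked_pkt:
        # Assumed guard: newlyAckedPkt was probably underestimated, so keep
        # the minimum difference rather than evaluate a negative modulo.
        return d_cep
    return newly_acked_pkt - ((newly_acked_pkt - d_cep) % DIVACE)


print(safer_ace_delta(9, 2))   # 9 - ((9-2) % 8) = 2
print(safer_ace_delta(10, 2))  # 10 - ((10-2) % 8) = 10
```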
ACKs that acknowledge a large stretch of packets might be common in | ACKs that acknowledge a large stretch of packets might be common in | |||
data centres to achieve a high packet rate or might be due to ACK | data centres to achieve a high packet rate or might be due to ACK | |||
thinning by a middlebox. In these cases, cycling of the ACE field | thinning by a middlebox. In these cases, cycling of the ACE field | |||
would often appear to have been possible, so the above algorithm | would often appear to have been possible, so the above algorithm | |||
would be over-conservative, leading to a false high marking rate and | would be overly conservative, leading to a false high marking rate | |||
poor performance. Therefore it would be reasonable to only use | and poor performance. Therefore, it would be reasonable to only use | |||
dSafer.cep rather than d.cep if the moving average of newlyAckedPkt | dSafer.cep rather than d.cep if the moving average of newlyAckedPkt | |||
was well below 8. | was well below 8. | |||
Implementers could build in more heuristics to estimate prevailing | Implementers could build in more heuristics to estimate a prevailing | |||
average segment size and prevailing ECN marking. For instance, | average segment size and prevailing ECN marking. For instance, | |||
newlyAckedPkt in the above formula could be replaced with | newlyAckedPkt in the above formula could be replaced with | |||
newlyAckedPktHeur = newlyAckedPkt*p*MSS/s, where s is the prevailing | newlyAckedPktHeur = newlyAckedPkt*p*MSS/s, where s is the prevailing | |||
segment size and p is the prevailing ECN marking probability. | segment size and p is the prevailing ECN marking probability. | |||
However, ultimately, if TCP's ECN feedback becomes inaccurate it | However, ultimately, if TCP's ECN feedback becomes inaccurate, it | |||
still has loss detection to fall back on. Therefore, it would seem | still has loss detection to fall back on. Therefore, it would seem | |||
safe to implement a simple algorithm, rather than a perfect one. | safe to implement a simple algorithm, rather than a perfect one. | |||
The simple algorithm for dSafer.cep above requires no monitoring of | The simple algorithm for dSafer.cep above requires no monitoring of | |||
prevailing conditions and it would still be safe if, for example, | prevailing conditions and it would still be safe if, for example, | |||
segments were on average at least 5% of full-sized as long as ECN | segments were on average at least 5% of full-sized as long as ECN | |||
marking was 5% or less. Assuming it was used, the Data Sender would | marking was 5% or less. Assuming it was used, the Data Sender would | |||
increment its packet counter as follows: | increment its packet counter as follows: | |||
s.cep += dSafer.cep | s.cep += dSafer.cep | |||
If missing acknowledgement numbers arrive later (due to reordering), | If missing acknowledgement numbers arrive later (due to reordering), | |||
Section 3.2.2.5 says "the Data Sender MAY attempt to neutralize the | Section 3.2.2.5.2 says "the Data Sender MAY attempt to neutralize the | |||
effect of any action it took based on a conservative assumption that | effect of any action it took based on a conservative assumption that | |||
it later found to be incorrect". To do this, the Data Sender would | it later found to be incorrect". To do this, the Data Sender would | |||
have to store the values of all the relevant variables whenever it | have to store the values of all the relevant variables whenever it | |||
made assumptions, so that it could re-evaluate them later. Given | made assumptions, so that it could re-evaluate them later. Given | |||
this could become complex and it is not required, we do not attempt | this could become complex and it is not required, we do not attempt | |||
to provide an example of how to do this. | to provide an example of how to do this. | |||
A.2.2. Safety Algorithm with the AccECN Option | A.2.2. Safety Algorithm with the AccECN Option | |||
When AccECN Options are available on the ACKs before and after the | When AccECN Options are available on the ACKs before and after the | |||
possible sequence of ACK losses, if the Data Sender only needs CE- | possible sequence of ACK losses, if the Data Sender only needs CE- | |||
marked bytes, it will have sufficient information in AccECN Options | marked bytes, it will have sufficient information in AccECN Options | |||
without needing to process the ACE field. If for some reason it | without needing to process the ACE field. If for some reason it | |||
needs CE-marked packets, if dSafer.cep is different from d.cep, it | needs CE-marked packets, if dSafer.cep is different from d.cep, it | |||
can determine whether d.cep is likely to be a safe enough estimate by | can determine whether d.cep is likely to be a safe enough estimate by | |||
checking whether the average marked segment size (s = d.ceb/d.cep) is | checking whether the average marked segment size (s = d.ceb/d.cep) is | |||
less than the MSS (where d.ceb is the amount of newly CE-marked bytes | less than the MSS (where d.ceb is the amount of newly CE-marked bytes | |||
- see Appendix A.1). Specifically, it could use the following | -- see Appendix A.1). Specifically, it could use the following | |||
algorithm: | algorithm: | |||
SAFETY_FACTOR = 2 | SAFETY_FACTOR = 2 | |||
if (dSafer.cep > d.cep) { | if (dSafer.cep > d.cep) { | |||
if (d.ceb <= MSS * d.cep) { % Same as (s <= MSS), but no DBZ | if (d.ceb <= MSS * d.cep) { % Same as (s <= MSS), but no DBZ | |||
sSafer = d.ceb/dSafer.cep | sSafer = d.ceb/dSafer.cep | |||
if (sSafer < MSS/SAFETY_FACTOR) | if (sSafer < MSS/SAFETY_FACTOR) | |||
dSafer.cep = d.cep % d.cep is a safe enough estimate | dSafer.cep = d.cep % d.cep is a safe enough estimate | |||
} % else | } % else | |||
% No need for else; dSafer.cep is already correct, | % No need for else; dSafer.cep is already correct, | |||
skipping to change at page 67, line 22 ¶ | skipping to change at line 3084 ¶ | |||
MSS/SAFETY_FACTOR+--------------+ safest | MSS/SAFETY_FACTOR+--------------+ safest | |||
| | | | | | |||
| d.cep is safe| | | d.cep is safe| | |||
| enough | | | enough | | |||
+--------------------> | +--------------------> | |||
MSS s | MSS s | |||
The following examples give the reasoning behind the algorithm, | The following examples give the reasoning behind the algorithm, | |||
assuming MSS=1460 : | assuming MSS=1460 : | |||
* if d.cep=0, dSafer.cep=8 and d.ceb=1460, then s=infinity and | * if d.cep=0, dSafer.cep=8, and d.ceb=1460, then s=infinity and | |||
sSafer=182.5. | sSafer=182.5. | |||
Therefore even though the average size of 8 data segments is | Therefore, even though the average size of 8 data segments is | |||
unlikely to have been as small as MSS/8, d.cep cannot have been | unlikely to have been as small as MSS/8, d.cep cannot have been | |||
correct, because it would imply an average segment size greater | correct, because it would imply an average segment size greater | |||
than the MSS. | than the MSS. | |||
* if d.cep=2, dSafer.cep=10 and d.ceb=1460, then s=730 and | * if d.cep=2, dSafer.cep=10, and d.ceb=1460, then s=730 and | |||
sSafer=146. | sSafer=146. | |||
Therefore d.cep is safe enough, because the average size of 10 | Therefore d.cep is safe enough, because the average size of 10 | |||
data segments is unlikely to have been as small as MSS/10. | data segments is unlikely to have been as small as MSS/10. | |||
* if d.cep=7, dSafer.cep=15 and d.ceb=10200, then s=1457 and | * if d.cep=7, dSafer.cep=15, and d.ceb=10200, then s=1457 and | |||
sSafer=680. | sSafer=680. | |||
Therefore d.cep is safe enough, because the average data segment | Therefore d.cep is safe enough, because the average data segment | |||
size is more likely to have been just less than one MSS, rather | size is more likely to have been just less than one MSS, rather | |||
than below MSS/2. | than below MSS/2. | |||
If pure ACKs were allowed to be ECN-capable, missing ACKs would be | If pure ACKs were allowed to be ECN-capable, missing ACKs would be | |||
far less likely. However, because [RFC3168] currently precludes | far less likely. However, because [RFC3168] currently precludes | |||
this, the above algorithm assumes that pure ACKs are not ECN-capable. | this, the above algorithm assumes that pure ACKs are not ECN-capable. | |||
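The three examples above can be checked against the part of the pseudocode that is visible in this comparison. The Python sketch below covers only those visible lines; the function name choose_cep_delta is hypothetical, and MSS=1460 is taken from the examples.

```python
MSS = 1460
SAFETY_FACTOR = 2


def choose_cep_delta(d_cep, d_safer_cep, d_ceb, mss=MSS):
    """Return the CE packet increment to use: d.cep if safe enough, else dSafer.cep."""
    if d_safer_cep > d_cep:
        # Same test as (s <= MSS) with s = d.ceb/d.cep, but written as a
        # multiplication so that d.cep == 0 cannot cause a division by zero.
        if d_ceb <= mss * d_cep:
            s_safer = d_ceb / d_safer_cep
            if s_safer < mss / SAFETY_FACTOR:
                return d_cep  # d.cep is a safe enough estimate
    return d_safer_cep  # dSafer.cep is already correct


print(choose_cep_delta(0, 8, 1460))    # 8: d.cep=0 cannot be correct
print(choose_cep_delta(2, 10, 1460))   # 2: sSafer=146 is below MSS/2
print(choose_cep_delta(7, 15, 10200))  # 7: sSafer=680 is below MSS/2
```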
A.3. Example Algorithm to Estimate Marked Bytes from Marked Packets | A.3. Example Algorithm to Estimate Marked Bytes from Marked Packets | |||
If AccECN Options are not available, the Data Sender can only decode | If AccECN Options are not available, the Data Sender can only decode | |||
CE-marking from the ACE field in packets. Every time an ACK arrives, | a CE marking from the ACE field in packets. Every time an ACK | |||
to convert this into an estimate of CE-marked bytes, it needs an | arrives, to convert this into an estimate of CE-marked bytes, it | |||
average of the segment size, s_ave. Then it can add or subtract | needs an average of the segment size, s_ave. Then it can add or | |||
s_ave from the value of d.ceb as the value of d.cep increments or | subtract s_ave from the value of d.ceb as the value of d.cep | |||
decrements. Some possible ways to calculate s_ave are outlined | increments or decrements. Some possible ways to calculate s_ave are | |||
below. The precise details will depend on why an estimate of marked | outlined below. The precise details will depend on why an estimate | |||
bytes is needed. | of marked bytes is needed. | |||
The implementation could keep a record of the byte numbers of all the | The implementation could keep a record of the byte numbers of all the | |||
boundaries between packets in flight (including control packets), and | boundaries between packets in flight (including control packets), and | |||
recalculate s_ave on every ACK. However it would be simpler to | recalculate s_ave on every ACK. However, it would be simpler to | |||
merely maintain a counter packets_in_flight for the number of packets | merely maintain a counter packets_in_flight for the number of packets | |||
in flight (including control packets), which is reset once per RTT. | in flight (including control packets), which is reset once per RTT. | |||
Either way, it would estimate s_ave as: | Either way, it would estimate s_ave as: | |||
s_ave ~= flightsize / packets_in_flight, | s_ave ~= flightsize / packets_in_flight, | |||
where flightsize is the variable that TCP already maintains for the | where flightsize is the variable that TCP already maintains for the | |||
number of bytes in flight and '~=' means 'approximately equal to'. | number of bytes in flight and '~=' means 'approximately equal to'. | |||
To avoid floating point arithmetic, it could right-bit-shift by | To avoid floating point arithmetic, it could right-bit-shift by | |||
lg(packets_in_flight), where lg() means log base 2. | lg(packets_in_flight), where lg() means log base 2. | |||
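A minimal Python sketch of the shift-based estimate follows. The text does not say how lg() is rounded; taking the floor is an assumption here, as are the function name estimate_s_ave and the zero result for an empty flight.

```python
def estimate_s_ave(flightsize, packets_in_flight):
    """Approximate flightsize / packets_in_flight using only a right shift."""
    if packets_in_flight <= 0:
        return 0  # assumed behaviour when nothing is in flight
    # bit_length() - 1 is floor(log2(n)) for a positive integer n.
    shift = packets_in_flight.bit_length() - 1
    return flightsize >> shift


print(estimate_s_ave(14600, 8))   # 1825, exact because 8 is a power of two
print(estimate_s_ave(14600, 10))  # 1825, whereas the exact average is 1460
```

Because the shift rounds the packet count down to a power of two, this variant can overestimate s_ave by up to nearly a factor of two.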
skipping to change at page 68, line 45 ¶ | skipping to change at line 3149 ¶ | |||
where a is the decay constant for the EWMA. However, then it is | where a is the decay constant for the EWMA. However, then it is | |||
necessary to choose a good value for this constant, which ought to | necessary to choose a good value for this constant, which ought to | |||
depend on the number of packets in flight. Also, the decay constant | depend on the number of packets in flight. Also, the decay constant | |||
needs to be a power of two to avoid floating point arithmetic. | needs to be a power of two to avoid floating point arithmetic. | |||
A.4. Example Algorithm to Count Not-ECT Bytes | A.4. Example Algorithm to Count Not-ECT Bytes | |||
A Data Sender in AccECN mode can infer the amount of TCP payload data | A Data Sender in AccECN mode can infer the amount of TCP payload data | |||
arriving at the receiver marked Not-ECT from the difference between | arriving at the receiver marked Not-ECT from the difference between | |||
the amount of newly ACKed data and the sum of the bytes with the | the amount of newly ACKed data and the sum of the bytes with the | |||
other three markings, d.ceb, d.e0b and d.e1b. | other three markings, d.ceb, d.e0b, and d.e1b. | |||
For this approach to be precise, it has to be assumed that spurious | For this approach to be precise, it has to be assumed that spurious | |||
(unnecessary) retransmissions do not lead to double counting. This | (unnecessary) retransmissions do not lead to double counting. This | |||
assumption is currently correct, given that RFC 3168 requires that | assumption is currently correct, given that RFC 3168 requires that | |||
the Data Sender marks retransmitted segments as Not-ECT. However, | the Data Sender mark retransmitted segments as Not-ECT. However, the | |||
the converse is not true; necessary retransmissions will result in | converse is not true; necessary retransmissions will result in | |||
under-counting. | undercounting. | |||
However, such precision is unlikely to be necessary. The only known | However, such precision is unlikely to be necessary. The only known | |||
use of a count of Not-ECT marked bytes is to test whether equipment | use of a count of Not-ECT marked bytes is to test whether equipment | |||
on the path is clearing the ECN field (perhaps due to an out-dated | on the path is clearing the ECN field (perhaps due to an out-dated | |||
attempt to clear, or bleach, what used to be the IPv4 ToS byte or the | attempt to clear, or bleach, what used to be the IPv4 ToS byte or the | |||
IPv6 Traffic Class field). To detect bleaching it will be sufficient | IPv6 Traffic Class field). To detect bleaching, it will be | |||
to detect whether nearly all bytes arrive marked as Not-ECT. | sufficient to detect whether nearly all bytes arrive marked as Not- | |||
Therefore there ought to be no need to keep track of the details of | ECT. Therefore, there ought to be no need to keep track of the | |||
retransmissions. | details of retransmissions. | |||
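The inference described at the start of this section amounts to a single subtraction, sketched below in Python. The function name not_ect_bytes is hypothetical, and clamping the result at zero is an assumption added for this sketch to absorb the counting imprecision discussed above.

```python
def not_ect_bytes(newly_acked_b, d_ceb, d_e0b, d_e1b):
    """Infer newly ACKed payload bytes that arrived marked Not-ECT."""
    inferred = newly_acked_b - (d_ceb + d_e0b + d_e1b)
    # Assumed clamp: never report a negative byte count.
    return max(inferred, 0)


print(not_ect_bytes(14600, 1460, 10220, 0))  # 2920
```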
Appendix B. Rationale for Usage of TCP Header Flags | Appendix B. Rationale for Usage of TCP Header Flags | |||
B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake | B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake | |||
AccECN uses a rather unorthodox approach to negotiate the highest | AccECN uses a rather unorthodox approach to negotiate the highest | |||
version TCP ECN feedback scheme that both ends support, as justified | version TCP ECN feedback scheme that both ends support, as justified | |||
below. It follows from the original TCP ECN capability negotiation | below. It follows from the original TCP ECN capability negotiation | |||
[RFC3168], in which the Client set the 2 least significant of the | [RFC3168], in which the Client set the 2 least significant of the | |||
original reserved flags in the TCP header, and fell back to no ECN | original reserved flags in the TCP header, and fell back to No ECN | |||
support if the Server responded with the 2 flags cleared, which had | support if the Server responded with the 2 flags cleared, which had | |||
previously been the default. | previously been the default. | |||
Classic ECN used header flags rather than a TCP option because it was | Classic ECN used header flags rather than a TCP option because it was | |||
considered more efficient to use a header flag for 1 bit of feedback | considered more efficient to use a header flag for 1 bit of feedback | |||
per ACK, and this bit could be overloaded to indicate support for | per ACK, and this bit could be overloaded to indicate support for | |||
Classic ECN during the handshake. During the development of ECN, 1 | Classic ECN during the handshake. During the development of ECN, 1 | |||
bit crept up to 2, in order to deliver the feedback reliably and to | bit crept up to 2, in order to deliver the feedback reliably and to | |||
work round some broken hosts that reflected the reserved flags during | work round some broken hosts that reflected the reserved flags during | |||
the handshake. | the handshake. | |||
In order to be backward compatible with RFC 3168, AccECN continues | In order to be backward compatible with RFC 3168, AccECN continues | |||
this approach, using the 3rd least significant TCP header flag that | this approach, using the 3rd least significant TCP header flag that | |||
had previously been allocated for the ECN nonce (now historic). | had previously been allocated for the ECN-nonce (now historic). | |||
Then, whatever form of Server an AccECN Client encounters, the | Then, whatever form of Server an AccECN Client encounters, the | |||
connection can fall back to the highest version of feedback protocol | connection can fall back to the highest version of feedback protocol | |||
that both ends support, as explained in Section 3.1. | that both ends support, as explained in Section 3.1. | |||
If AccECN capability negotiation had used the more orthodox approach | If AccECN capability negotiation had used the more orthodox approach | |||
of a TCP option, it would still have had to set the two ECN flags in | of a TCP option, it would still have had to set the two ECN flags in | |||
the main TCP header, in order to be able to fall back to Classic RFC | the main TCP header, in order to be able to fall back to Classic ECN | |||
3168 ECN, or to disable ECN support, without another round of | [RFC3168], or to disable ECN support, without another round of | |||
negotiation. Then AccECN would also have had to handle all the | negotiation. Then AccECN would also have had to handle all the | |||
different ways that Servers currently respond to settings of the ECN | different ways that Servers currently respond to settings of the ECN | |||
flags in the main TCP header, including all the conflicting cases | flags in the main TCP header, including all of the conflicting cases | |||
where a Server might have said it supported one approach in the flags | where a Server might have said it supported one approach in the flags | |||
and another approach in a new TCP option. And AccECN would have had | and another approach in a new TCP option. And AccECN would have had | |||
to deal with all the additional possibilities where a middlebox might | to deal with all of the additional possibilities where a middlebox | |||
have mangled the ECN flags, or removed TCP options. Thus, usage of | might have mangled the ECN flags, or removed TCP options. Thus, | |||
the 3rd reserved TCP header flag simplified the protocol. | usage of the 3rd reserved TCP header flag simplified the protocol. | |||
The third flag was used in a way that could be distinguished from the | The third flag was used in a way that could be distinguished from the | |||
ECN nonce, in case any nonce deployment was encountered. Previous | ECN-nonce, in case any nonce deployment was encountered. Previous | |||
usage of this flag for the ECN nonce was integrated into the original | usage of this flag for the ECN-nonce was integrated into the original | |||
ECN negotiation. This further justified the 3rd flag's use for | ECN negotiation. This further justified the third flag's use for | |||
AccECN, because a non-ECN usage of this flag would have had to use it | AccECN, because a non-ECN usage of this flag would have had to use it | |||
as a separate single bit, rather than in combination with the other 2 | as a separate single bit, rather than in combination with the other 2 | |||
ECN flags. | ECN flags. | |||
Indeed, having overloaded the original uses of these three flags for | Indeed, having overloaded the original uses of these three flags for | |||
its handshake, AccECN overloads all three bits again as a 3-bit | its handshake, AccECN overloads all three bits again as a 3-bit | |||
counter. | counter. | |||
B.2. Four Codepoints in the SYN/ACK | B.2. Four Codepoints in the SYN/ACK | |||
Of the 8 possible codepoints that the 3 TCP header flags can indicate | Of the eight possible codepoints that the three TCP header flags can | |||
on the SYN/ACK, 4 already indicated earlier (or broken) versions of | indicate on the SYN/ACK, four already indicated earlier (or broken) | |||
ECN support, 1 now being historic. In the early design of AccECN, an | versions of ECN support, one now being Historic. In the early design | |||
AccECN Server could use only 2 of the 4 remaining codepoints. They | of AccECN, an AccECN Server could use only 2 of the 4 remaining | |||
both indicated AccECN support, but one fed back that the SYN had | codepoints. They both indicated AccECN support, but one fed back | |||
arrived marked as CE. Even though ECN support on a SYN is not yet on | that the SYN had arrived marked as CE. Even though ECN support on a | |||
the standards track, the idea is for either end to act as a | SYN is not yet on the Standards Track, the idea is for either end to | |||
mechanistic reflector, so that future capabilities can be | act as a mechanistic reflector, so that future capabilities can be | |||
unilaterally deployed without requiring 2-ended deployment (justified | unilaterally deployed without requiring 2-ended deployment (justified | |||
in Section 2.5). | in Section 2.5). | |||
During traversal testing it was discovered that the IP-ECN field in | During traversal testing, it was discovered that the IP-ECN field in | |||
the SYN was mangled on a non-negligible proportion of paths. | the SYN was mangled on a non-negligible proportion of paths. | |||
Therefore it was necessary to allow the SYN/ACK to feed all four IP- | Therefore, it was necessary to allow the SYN/ACK to feed all four IP- | |||
ECN codepoints that the SYN could arrive with back to the Client. | ECN codepoints that the SYN could arrive with back to the Client. | |||
Without this, the Client could not know whether to disable ECN for | Without this, the Client could not know whether to disable ECN for | |||
the connection due to mangling of the IP-ECN field (also explained in | the connection due to mangling of the IP-ECN field (also explained in | |||
Section 2.5). This development consumed the remaining 2 codepoints | Section 2.5). This development consumed the remaining two codepoints | |||
on the SYN/ACK that had been reserved for future use by AccECN in | on the SYN/ACK that had been reserved for future use by AccECN in | |||
earlier versions. | earlier versions. | |||
B.3. Space for Future Evolution | B.3. Space for Future Evolution | |||
Despite availability of usable TCP header space being extremely | Despite availability of usable TCP header space being extremely | |||
scarce, the AccECN protocol has taken all possible steps to ensure | scarce, the AccECN protocol has taken all possible steps to ensure | |||
that there is space to negotiate possible future variants of the | that there is space to negotiate possible future variants of the | |||
protocol, either if a variant of AccECN is required, or if a | protocol, either if a variant of AccECN is required, or if a | |||
completely different ECN feedback approach is needed: | completely different ECN feedback approach is needed. | |||
Future AccECN variants: When the AccECN capability is negotiated | Future AccECN variants: When the AccECN capability is negotiated | |||
during TCP's three-way handshake, the rows in Table 2 tagged as | during TCP's three-way handshake, the rows in Table 2 tagged as | |||
'Nonce' and 'Broken' in the column for the capability of node B | 'Nonce' and 'Broken' in the column for the capability of node B | |||
are unused by any current protocol in the RFC series. These could | are unused by any current protocol defined in the RFC series. | |||
be used by TCP Servers in future to indicate a variant of the | These could be used by TCP Servers in the future to indicate a | |||
AccECN protocol. In recent measurement studies in which the | variant of the AccECN protocol. In recent measurement studies in | |||
response of large numbers of Servers to an AccECN SYN has been | which the response of large numbers of Servers to an AccECN SYN | |||
tested, e.g., [Mandalari18], a very small number of SYN/ACKs | has been tested, e.g., [Mandalari18], a very small number of SYN/ | |||
arrive with the pattern tagged as 'Nonce', and a small but more | ACKs arrive with the pattern tagged as 'Nonce', and a small but | |||
significant number arrive with the pattern tagged as 'Broken'. | more significant number arrive with the pattern tagged as | |||
The 'Nonce' pattern could be a sign that a few Servers have | 'Broken'. The 'Nonce' pattern could be a sign that a few Servers | |||
implemented the ECN Nonce [RFC3540], which has now been | have implemented the ECN-nonce [RFC3540], which has now been | |||
reclassified as historic [RFC8311], or it could be the random | reclassified as Historic [RFC8311], or it could be the random | |||
result of some unknown middlebox behaviour. The greater | result of some unknown middlebox behaviour. The greater | |||
prevalence of the 'Broken' pattern suggests that some instances | prevalence of the 'Broken' pattern suggests that some instances | |||
still exist of the broken code that reflects the reserved flags on | still exist of the broken code that reflects the reserved flags on | |||
the SYN. | the SYN. | |||
The requirement not to reject unexpected initial values of the ACE | The requirement not to reject unexpected initial values of the ACE | |||
counter (in the main TCP header) in the last paragraph of | counter (in the main TCP header) in the last paragraph of | |||
Section 3.2.2.4 ensures that 3 unused codepoints on the ACK of the | Section 3.2.2.4 ensures that three unused codepoints on the ACK of | |||
SYN/ACK, 6 unused values on the first SYN=0 data packet from the | the SYN/ACK, six unused values on the first SYN=0 data packet from | |||
Client and 7 unused values on the first SYN=0 data packet from the | the Client, and seven unused values on the first SYN=0 data packet | |||
Server could be used to declare future variants of the AccECN | from the Server could be used to declare future variants of the | |||
protocol. The word 'declare' is used rather than 'negotiate' | AccECN protocol. The word 'declare' is used rather than | |||
because, at this late stage in the three-way handshake, it would | 'negotiate' because, at this late stage in the three-way | |||
be too late for a negotiation between the endpoints to be | handshake, it would be too late for a negotiation between the | |||
completed. A similar requirement not to reject unexpected initial | endpoints to be completed. A similar requirement not to reject | |||
values in AccECN TCP Options (Section 3.2.3.2.4) is for the same | unexpected initial values in AccECN TCP Options | |||
purpose. If traversal of AccECN TCP Options were reliable, this | (Section 3.2.3.2.4) is for the same purpose. If traversal of | |||
would have enabled a far wider range of future variation of the | AccECN TCP Options were reliable, this would have enabled a far | |||
whole AccECN protocol. Nonetheless, it could be used to reliably | wider range of future variation of the whole AccECN protocol. | |||
negotiate a wide range of variation in the semantics of the AccECN | Nonetheless, it could be used to reliably negotiate a wide range | |||
Option. | of variation in the semantics of the AccECN Option. | |||
Future non-AccECN variants: Five codepoints out of the 8 possible in | Future non-AccECN variants: Five codepoints out of the eight | |||
the 3 TCP header flags used by AccECN are unused on the initial | possible in the three TCP header flags used by AccECN are unused | |||
SYN (in the order (AE,CWR,ECE)): (0,0,1), (0,1,0), (1,0,0), | on the initial SYN (in the order (AE,CWR,ECE)): (0,0,1), (0,1,0), | |||
(1,0,1), (1,1,0). Section 3.1.3 ensures that the installed base | (1,0,0), (1,0,1), (1,1,0). Section 3.1.3 ensures that the | |||
of AccECN Servers will all assume these are equivalent to AccECN | installed base of AccECN Servers will all assume these are | |||
negotiation with (1,1,1) on the SYN. These codepoints would not | equivalent to AccECN negotiation with (1,1,1) on the SYN. These | |||
allow fall-back to Classic ECN support for a Server that did not | codepoints would not allow fall-back to Classic ECN support for a | |||
understand them, but this approach ensures they are available in | Server that did not understand them, but this approach ensures | |||
future, perhaps for uses other than ECN alongside the AccECN | they are available in the future, perhaps for uses other than ECN | |||
scheme. All possible combinations of SYN/ACK could be used in | alongside the AccECN scheme. All possible combinations of SYN/ACK | |||
response except either (0,0,0) or reflection of the same values | could be used in response except either (0,0,0) or reflection of | |||
sent on the SYN. | the same values sent on the SYN. | |||
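The codepoint count in the paragraph above can be checked with a short sketch. It assumes the flag order (AE,CWR,ECE) used in the text, and it assumes that (0,0,0) and (0,1,1) are the codepoints already assigned on the SYN for "no ECN" and Classic ECN negotiation respectively (per RFC 3168); only (1,1,1) is stated in the text itself. The function names are illustrative and are not taken from the specification.

```python
from itertools import product

# Flag order follows the text: (AE, CWR, ECE).
# (1,1,1) comes from the text above; the other two entries are
# assumptions based on RFC 3168 and are not spelled out in this section.
USED_ON_SYN = {
    (0, 0, 0): "no ECN support requested",
    (0, 1, 1): "Classic ECN negotiation",
    (1, 1, 1): "AccECN negotiation",
}

ALL_CODEPOINTS = list(product((0, 1), repeat=3))


def unused_syn_codepoints():
    """Return the (AE, CWR, ECE) combinations with no assigned meaning on a SYN."""
    return [cp for cp in ALL_CODEPOINTS if cp not in USED_ON_SYN]


def server_interpretation(syn_cp):
    """How an AccECN Server in the installed base treats a SYN codepoint.

    Unassigned codepoints are treated as equivalent to (1,1,1),
    as the paragraph above describes.
    """
    return USED_ON_SYN.get(syn_cp, USED_ON_SYN[(1, 1, 1)])


def possible_synack_responses(syn_cp):
    """SYN/ACK combinations a future variant could use in response:
    everything except (0,0,0) and a reflection of the SYN's own flags."""
    return [cp for cp in ALL_CODEPOINTS if cp != (0, 0, 0) and cp != syn_cp]


if __name__ == "__main__":
    for cp in unused_syn_codepoints():
        print(cp, "->", server_interpretation(cp),
              "| SYN/ACK options:", len(possible_synack_responses(cp)))
```

Under these assumptions, the enumeration yields the same five codepoints that the text lists, and each of them leaves six SYN/ACK combinations available for a response.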
In order to extend AccECN or ECN in future, other ways could be | In order to extend AccECN or ECN in the future, other ways could | |||
resorted to, although their traversal properties are likely to be | be resorted to, although their traversal properties are likely to | |||
inferior. They include a new TCP option; using the remaining | be inferior. They include a new TCP option; using the remaining | |||
reserved flags in the main TCP header (preferably extending the | reserved flags in the main TCP header (preferably extending the | |||
3-bit combinations used by AccECN to 4-bit combinations, rather | 3-bit combinations used by AccECN to 4-bit combinations, rather | |||
than burning one bit for just one state); a non-zero urgent | than burning one bit for just one state); a non-zero urgent | |||
pointer in combination with the URG flag cleared; or some other | pointer in combination with the URG flag cleared; or some other | |||
unexpected combination of fields yet to be invented. | unexpected combination of fields yet to be invented. | |||
Acknowledgements | Acknowledgements | |||
We want to thank Koen De Schepper, Praveen Balasubramanian, Michael | We want to thank Koen De Schepper, Praveen Balasubramanian, Michael | |||
Welzl, Gorry Fairhurst, David Black, Spencer Dawkins, Michael Scharf, | Welzl, Gorry Fairhurst, David Black, Spencer Dawkins, Michael Scharf, | |||
skipping to change at page 72, line 23 ¶ | skipping to change at line 3322 ¶ | |||
Järvinen, Neal Cardwell, Yoshifumi Nishida, Martin Duke, Jonathan | Järvinen, Neal Cardwell, Yoshifumi Nishida, Martin Duke, Jonathan | |||
Morton, Vidhi Goel, Alex Burr, Markku Kojo, Grenville Armitage and | Morton, Vidhi Goel, Alex Burr, Markku Kojo, Grenville Armitage and | |||
Wes Eddy for their input and discussion. The idea of using the three | Wes Eddy for their input and discussion. The idea of using the three | |||
ECN-related TCP flags as one field for more accurate TCP-ECN feedback | ECN-related TCP flags as one field for more accurate TCP-ECN feedback | |||
was first introduced in the re-ECN protocol that was the ancestor of | was first introduced in the re-ECN protocol that was the ancestor of | |||
ConEx. | ConEx. | |||
The following contributed implementations of AccECN that validated | The following contributed implementations of AccECN that validated | |||
and helped to improve this specification: | and helped to improve this specification: | |||
Linux: Mirja Kühlewind, Ilpo Järvinen, Neal Cardwell and Chia-Yu | Linux: Mirja Kühlewind, Ilpo Järvinen, Neal Cardwell, and Chia-Yu | |||
Chang; | Chang | |||
FreeBSD: Richard Scheffenegger; | FreeBSD: Richard Scheffenegger | |||
Apple OSs: Vidhi Goel. | Apple OSs: Vidhi Goel | |||
Bob Briscoe was part-funded by Apple Inc, the Comcast Innovation | Bob Briscoe was part-funded by Apple Inc, the Comcast Innovation | |||
Fund, the European Community under its Seventh Framework Programme | Fund, the European Community under its Seventh Framework Programme | |||
through the Reducing Internet Transport Latency (RITE) project (ICT- | through the Reducing Internet Transport Latency (RITE) project (ICT- | |||
317700) and through the Trilogy 2 project (ICT-317756), and the | 317700) and through the Trilogy 2 project (ICT-317756), and the | |||
Research Council of Norway through the TimeIn project. The views | Research Council of Norway through the TimeIn project. The views | |||
expressed here are solely those of the authors. | expressed here are solely those of the authors. | |||
Mirja Kühlewind was partly supported by the European Commission under | Mirja Kühlewind was partly supported by the European Commission under | |||
Horizon 2020 grant agreement no. 688421 Measurement and Architecture | Horizon 2020 grant agreement no. 688421 Measurement and Architecture | |||
for a Middleboxed Internet (MAMI), and by the Swiss State Secretariat | for a Middleboxed Internet (MAMI), and by the Swiss State Secretariat | |||
for Education, Research, and Innovation under contract no. 15.0268. | for Education, Research, and Innovation under contract no. 15.0268. | |||
This support does not imply endorsement. | This support does not imply endorsement. | |||
Comments Solicited | ||||
This section is to be removed before publishing as an RFC. | ||||
Comments and questions are encouraged and very welcome. They can be | ||||
addressed to the IETF TCP maintenance and minor modifications working | ||||
group mailing list <tcpm@ietf.org>, and/or to the authors. | ||||
Authors' Addresses | Authors' Addresses | |||
Bob Briscoe | Bob Briscoe | |||
Independent | Independent | |||
United Kingdom | United Kingdom | |||
Email: ietf@bobbriscoe.net | Email: ietf@bobbriscoe.net | |||
URI: http://bobbriscoe.net/ | URI: http://bobbriscoe.net/ | |||
Mirja Kühlewind | Mirja Kühlewind | |||
Ericsson | Ericsson | |||
End of changes. 286 change blocks. | ||||
740 lines changed or deleted | 706 lines changed or added | |||