rfc9768v1.txt | rfc9768.txt | |||
---|---|---|---|---|
skipping to change at line 20 | skipping to change at line 20 | |||
More Accurate Explicit Congestion Notification (AccECN) Feedback in TCP | More Accurate Explicit Congestion Notification (AccECN) Feedback in TCP | |||
Abstract | Abstract | |||
Explicit Congestion Notification (ECN) is a mechanism by which | Explicit Congestion Notification (ECN) is a mechanism by which | |||
network nodes can mark IP packets instead of dropping them to | network nodes can mark IP packets instead of dropping them to | |||
indicate incipient congestion to the endpoints. Receivers with an | indicate incipient congestion to the endpoints. Receivers with an | |||
ECN-capable transport protocol feed back this information to the | ECN-capable transport protocol feed back this information to the | |||
sender. ECN was originally specified for TCP in such a way that only | sender. ECN was originally specified for TCP in such a way that only | |||
one feedback signal can be transmitted per Round-Trip Time (RTT). | one feedback signal can be transmitted per Round-Trip Time (RTT). | |||
Newer TCP mechanisms like Congestion Exposure (ConEx), Data Center | More recently defined TCP mechanisms like Congestion Exposure | |||
TCP (DCTCP), or Low Latency, Low Loss, and Scalable Throughput (L4S) | (ConEx), Data Center TCP (DCTCP), or Low Latency, Low Loss, and | |||
need more Accurate ECN (AccECN) feedback information whenever more | Scalable Throughput (L4S) need more Accurate ECN (AccECN) feedback | |||
than one marking is received in one RTT. This document updates the | information whenever more than one marking is received in one RTT. | |||
original ECN specification defined in RFC 3168 by specifying a scheme | This document updates the original ECN specification defined in RFC | |||
that provides more than one feedback signal per RTT in the TCP | 3168 by specifying a scheme that provides more than one feedback | |||
header. Given TCP header space is scarce, it allocates a reserved | signal per RTT in the TCP header. Given TCP header space is scarce, | |||
header bit previously assigned to the ECN-nonce. It also overloads | it allocates a reserved header bit previously assigned to the ECN- | |||
the two existing ECN flags in the TCP header. The resulting extra | nonce. It also overloads the two existing ECN flags in the TCP | |||
space is additionally exploited to feed back the IP-ECN field | header. The resulting extra space is additionally exploited to feed | |||
received during the TCP connection establishment. Supplementary | back the IP ECN field received during the TCP connection | |||
feedback information can optionally be provided in two new TCP option | establishment. Supplementary feedback information can optionally be | |||
alternatives, which are never used on the TCP SYN. The document also | provided in two new TCP Option alternatives, which are never used on | |||
specifies the treatment of this updated TCP wire protocol by | the TCP SYN. The document also specifies the treatment of this | |||
middleboxes. | updated TCP wire protocol by middleboxes. | |||
Status of This Memo | Status of This Memo | |||
This is an Internet Standards Track document. | This is an Internet Standards Track document. | |||
This document is a product of the Internet Engineering Task Force | This document is a product of the Internet Engineering Task Force | |||
(IETF). It represents the consensus of the IETF community. It has | (IETF). It represents the consensus of the IETF community. It has | |||
received public review and has been approved for publication by the | received public review and has been approved for publication by the | |||
Internet Engineering Steering Group (IESG). Further information on | Internet Engineering Steering Group (IESG). Further information on | |||
Internet Standards is available in Section 2 of RFC 7841. | Internet Standards is available in Section 2 of RFC 7841. | |||
skipping to change at line 158 | skipping to change at line 158 | |||
Explicit Congestion Notification (ECN) [RFC3168] is a mechanism by | Explicit Congestion Notification (ECN) [RFC3168] is a mechanism by | |||
which network nodes can mark IP packets instead of dropping them to | which network nodes can mark IP packets instead of dropping them to | |||
indicate incipient congestion to the endpoints. Receivers with an | indicate incipient congestion to the endpoints. Receivers with an | |||
ECN-capable transport protocol feed back this information to the | ECN-capable transport protocol feed back this information to the | |||
sender. In RFC 3168, ECN was specified for TCP in such a way that | sender. In RFC 3168, ECN was specified for TCP in such a way that | |||
only one feedback signal could be transmitted per Round-Trip Time | only one feedback signal could be transmitted per Round-Trip Time | |||
(RTT). This is sufficient for congestion control schemes like Reno | (RTT). This is sufficient for congestion control schemes like Reno | |||
[RFC6582] and CUBIC [RFC9438], as those schemes reduce their | [RFC6582] and CUBIC [RFC9438], as those schemes reduce their | |||
congestion window by a fixed factor if congestion occurs within an | congestion window by a fixed factor if congestion occurs within an | |||
RTT independent of the number of received congestion markings. | RTT independent of the number of received congestion markings. More | |||
Recently, proposed mechanisms like Congestion Exposure (ConEx | recently defined mechanisms like Congestion Exposure (ConEx | |||
[RFC7713]), DCTCP [RFC8257], and L4S [RFC9330] need to know when more | [RFC7713]), DCTCP [RFC8257], and L4S [RFC9330] need to know when more | |||
than one marking is received in one RTT, which is information that | than one marking is received in one RTT, which is information that | |||
cannot be provided by the feedback scheme as specified in [RFC3168]. | cannot be provided by the feedback scheme as specified in [RFC3168]. | |||
This document specifies an update to the ECN feedback scheme of RFC | This document specifies an update to the ECN feedback scheme of RFC | |||
3168 that provides more accurate information and could be used by | 3168 that provides more accurate information and could be used by | |||
these and potentially other future TCP extensions, while still also | these and potentially other future TCP extensions, while still also | |||
supporting the pre-existing TCP congestion controllers that use just | supporting the pre-existing TCP congestion controllers that use just | |||
one feedback signal per round. Congestion control is the term the | one feedback signal per round. Congestion control is the term the | |||
IETF uses to describe data rate management. It is the algorithm that | IETF uses to describe data rate management. It is the algorithm that | |||
a sender uses to optimize its sending rate so that it transmits data | a sender uses to optimize its sending rate so that it transmits data | |||
skipping to change at line 224 | skipping to change at line 224 | |||
CUBIC, AccECN can be used to respond to the extent of congestion | CUBIC, AccECN can be used to respond to the extent of congestion | |||
notification over a round trip, as for example DCTCP does in | notification over a round trip, as for example DCTCP does in | |||
controlled environments [RFC8257]. For congestion response, this | controlled environments [RFC8257]. For congestion response, this | |||
specification refers to the original ECN specification adopted in | specification refers to the original ECN specification adopted in | |||
2001 [RFC3168], as updated by the more relaxed rules introduced in | 2001 [RFC3168], as updated by the more relaxed rules introduced in | |||
2018 to allow ECN experiments [RFC8311], namely: a TCP-based Low | 2018 to allow ECN experiments [RFC8311], namely: a TCP-based Low | |||
Latency Low Loss Scalable (L4S) congestion control [RFC9330]; or | Latency Low Loss Scalable (L4S) congestion control [RFC9330]; or | |||
Alternative Backoff with ECN (ABE) [RFC8511]. | Alternative Backoff with ECN (ABE) [RFC8511]. | |||
Section 5.2 explains how AccECN is compatible with current commonly | Section 5.2 explains how AccECN is compatible with current commonly | |||
used TCP options, and a number of current experimental modifications | used TCP Options, and a number of current experimental modifications | |||
to TCP, as well as SYN cookies. | to TCP, as well as SYN cookies. | |||
1.1. Document Roadmap | 1.1. Document Roadmap | |||
The following introductory section outlines the goals of AccECN | The following introductory section outlines the goals of AccECN | |||
(Section 1.2). Then, terminology is defined (Section 1.3) and a | (Section 1.2). Then, terminology is defined (Section 1.3) and a | |||
recap of existing prerequisite technology is given (Section 1.4). | recap of existing prerequisite technology is given (Section 1.4). | |||
Section 2 gives an informative overview of the AccECN protocol. Then | Section 2 gives an informative overview of the AccECN protocol. Then | |||
Section 3 gives the normative protocol specification, and Section 3.3 | Section 3 gives the normative protocol specification, and Section 3.3 | |||
skipping to change at line 317 | skipping to change at line 317 | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in | "OPTIONAL" in this document are to be interpreted as described in | |||
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
capitals, as shown here. | capitals, as shown here. | |||
1.4. Recap of Existing ECN Feedback in IP/TCP | 1.4. Recap of Existing ECN Feedback in IP/TCP | |||
Explicit Congestion Notification (ECN) [RFC3168] can be split into | Explicit Congestion Notification (ECN) [RFC3168] can be split into | |||
two parts conceptually. In the forward direction, alongside the | two parts conceptually. In the forward direction, alongside the | |||
data stream, it uses a 2-bit field in the IP header. This is | data stream, it uses a 2-bit field in the IP header. This is | |||
referred to as IP-ECN later on. This signal carried in the IP (Layer | referred to as IP ECN later on. This signal carried in the IP (Layer | |||
3) header is exposed to network devices and may be modified when such | 3) header is exposed to network devices and may be modified when such | |||
a device starts to experience congestion (see Table 1). The second | a device starts to experience congestion (see Table 1). The second | |||
part is the feedback mechanism, by which the original data sender is | part is the feedback mechanism, by which the original data sender is | |||
notified of the current congestion state of the intermediate path. | notified of the current congestion state of the intermediate path. | |||
That returned signal is carried in a protocol-specific manner, and is | That returned signal is carried in a protocol-specific manner, and is | |||
not to be modified by intermediate network devices. While ECN is in | not to be modified by intermediate network devices. While ECN is in | |||
active use for protocols such as QUIC [RFC9000], SCTP [RFC9260], RTP | active use for protocols such as QUIC [RFC9000], SCTP [RFC9260], RTP | |||
[RFC6679], and Remote Direct Memory Access over Converged Ethernet | [RFC6679], and Remote Direct Memory Access over Converged Ethernet | |||
[RoCEv2], this document only concerns itself with the specific | [RoCEv2], this document only concerns itself with the specific | |||
implementation for the TCP protocol. | implementation for the TCP protocol. | |||
skipping to change at line 343 | skipping to change at line 343 | |||
0b00, the packet is considered to have been sent by a Not ECN-capable | 0b00, the packet is considered to have been sent by a Not ECN-capable | |||
Transport (Not-ECT). When a network node experiences congestion, it | Transport (Not-ECT). When a network node experiences congestion, it | |||
will occasionally either drop or mark a packet, with the choice | will occasionally either drop or mark a packet, with the choice | |||
depending on the packet's ECN codepoint. If the codepoint is Not- | depending on the packet's ECN codepoint. If the codepoint is Not- | |||
ECT, only drop is appropriate. If the codepoint is ECT(0) or ECT(1), | ECT, only drop is appropriate. If the codepoint is ECT(0) or ECT(1), | |||
the node can mark the packet by setting the ECN codepoint to 0b11, | the node can mark the packet by setting the ECN codepoint to 0b11, | |||
which is termed 'Congestion Experienced' (CE), or loosely a | which is termed 'Congestion Experienced' (CE), or loosely a | |||
'congestion mark'. Table 1 summarises these codepoints. | 'congestion mark'. Table 1 summarises these codepoints. | |||
+==================+================+===========================+ | +==================+================+===========================+ | |||
| IP-ECN codepoint | Codepoint name | Description | | | IP ECN Codepoint | Codepoint Name | Description | | |||
+==================+================+===========================+ | +==================+================+===========================+ | |||
| 0b00 | Not-ECT | Not ECN-Capable Transport | | | 0b00 | Not-ECT | Not ECN-Capable Transport | | |||
+------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| 0b01 | ECT(1) | ECN-Capable Transport (1) | | | 0b01 | ECT(1) | ECN-Capable Transport (1) | | |||
+------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| 0b10 | ECT(0) | ECN-Capable Transport (0) | | | 0b10 | ECT(0) | ECN-Capable Transport (0) | | |||
+------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
| 0b11 | CE | Congestion Experienced | | | 0b11 | CE | Congestion Experienced | | |||
+------------------+----------------+---------------------------+ | +------------------+----------------+---------------------------+ | |||
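As an illustration of Table 1, the sketch below models the 2-bit IP ECN field and the drop-or-mark choice a congested node makes. It assumes the field occupies the two least significant bits of the IPv4 ToS / IPv6 Traffic Class octet; the names `IPECN`, `ecn_field`, and `on_congestion` are hypothetical, and leaving an already-CE packet as CE is a simplification of this sketch rather than a rule stated here.

```python
from enum import IntEnum
from typing import Optional


class IPECN(IntEnum):
    """Codepoints of the 2-bit IP ECN field (Table 1)."""
    NOT_ECT = 0b00  # Not ECN-Capable Transport
    ECT_1 = 0b01    # ECN-Capable Transport (1)
    ECT_0 = 0b10    # ECN-Capable Transport (0)
    CE = 0b11       # Congestion Experienced


def ecn_field(tos_octet: int) -> IPECN:
    """Extract the IP ECN field: the two least significant bits of the
    IPv4 ToS / IPv6 Traffic Class octet (assumption of this sketch)."""
    return IPECN(tos_octet & 0b11)


def on_congestion(codepoint: IPECN) -> Optional[IPECN]:
    """Codepoint a congested node leaves on the packet, or None if the
    packet has to be dropped instead."""
    if codepoint == IPECN.NOT_ECT:
        return None       # Not-ECT: only drop is appropriate
    return IPECN.CE       # ECT(0)/ECT(1) are marked CE; CE stays CE here
```

For example, `ecn_field(0xBA)` yields `IPECN.ECT_0`, and `on_congestion(IPECN.ECT_0)` returns `IPECN.CE`.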
skipping to change at line 404 | skipping to change at line 404 | |||
Like the general TCP approach, the Data Receiver of each TCP half- | Like the general TCP approach, the Data Receiver of each TCP half- | |||
connection sends AccECN feedback to the Data Sender on TCP | connection sends AccECN feedback to the Data Sender on TCP | |||
acknowledgements, reusing data packets of the other half-connection | acknowledgements, reusing data packets of the other half-connection | |||
whenever possible. | whenever possible. | |||
The AccECN protocol has had to be designed in two parts: | The AccECN protocol has had to be designed in two parts: | |||
* an essential feedback part that reuses the TCP-ECN header bits for | * an essential feedback part that reuses the TCP-ECN header bits for | |||
the Data Receiver to feed back the number of packets arriving with | the Data Receiver to feed back the number of packets arriving with | |||
CE in the IP-ECN field. This provides more accuracy than Classic | CE in the IP ECN field. This provides more accuracy than Classic | |||
ECN feedback, but limited resilience against ACK loss; | ECN feedback, but limited resilience against ACK loss. | |||
* a supplementary feedback part using one of two new alternative | * a supplementary feedback part using one of two new alternative | |||
AccECN TCP options that provide additional feedback on the number | AccECN TCP Options that provide additional feedback on the number | |||
of payload bytes that arrive marked with each of the three ECN | of payload bytes that arrive marked with each of the three ECN | |||
codepoints in the IP-ECN field (not just CE marks). See the BCP | codepoints in the IP ECN field (not just CE marks). See the BCP | |||
on Byte and Packet Congestion Notification [RFC7141] for the | on Byte and Packet Congestion Notification [RFC7141] for the | |||
rationale determining that conveying congested payload bytes | rationale determining that conveying congested payload bytes | |||
should be preferred over just providing feedback about congested | should be preferred over just providing feedback about congested | |||
packets. This also provides greater resilience against ACK loss | packets. This also provides greater resilience against ACK loss | |||
than the essential feedback, but it is currently more likely to | than the essential feedback, but it is currently more likely to | |||
suffer from middlebox interference. | suffer from middlebox interference. | |||
The two-part design was necessary, given limitations on the space | The two-part design was necessary, given limitations on the space | |||
available for TCP options and given the possibility that certain | available for TCP Options and given the possibility that certain | |||
incorrectly designed middleboxes might prevent TCP from using any new | incorrectly designed middleboxes might prevent TCP from using any new | |||
options. | options. | |||
The essential feedback part overloads the previous definition of the | The essential feedback part overloads the previous definition of the | |||
three flags in the TCP header that had been assigned for use by | three flags in the TCP header that had been assigned for use by | |||
Classic ECN. This design choice deliberately allows AccECN peers to | Classic ECN. This design choice deliberately allows AccECN peers to | |||
replace the Classic ECN feedback protocol, rather than leaving | replace the Classic ECN feedback protocol, rather than leaving | |||
Classic ECN feedback intact and adding more accurate feedback | Classic ECN feedback intact and adding more accurate feedback | |||
separately because: | separately because: | |||
* this efficiently reuses scarce TCP header space, given TCP option | * this efficiently reuses scarce TCP header space, given TCP Option | |||
space is approaching saturation; | space is approaching saturation. | |||
* a single upgrade path for the TCP protocol is preferable to a fork | * a single upgrade path for the TCP protocol is preferable to a fork | |||
in the design that modifies the TCP header to convey all ECN | in the design that modifies the TCP header to convey all ECN | |||
feedback; | feedback. | |||
* otherwise, Classic and Accurate ECN feedback could give | * otherwise, Classic and Accurate ECN feedback could give | |||
conflicting feedback about the same segment, which could open up | conflicting feedback about the same segment, which could open up | |||
new security concerns and make implementations unnecessarily | new security concerns and make implementations unnecessarily | |||
complex; | complex. | |||
* middleboxes are more likely to faithfully forward the TCP ECN | * middleboxes are more likely to faithfully forward the TCP ECN | |||
flags than newly defined areas of the TCP header. | flags than newly defined areas of the TCP header. | |||
AccECN is designed to work even if the supplementary feedback part is | AccECN is designed to work even if the supplementary feedback part is | |||
removed or zeroed out, as long as the essential feedback part gets | removed or zeroed out, as long as the essential feedback part gets | |||
through. | through. | |||
2.1. Capability Negotiation | 2.1. Capability Negotiation | |||
skipping to change at line 470 | skipping to change at line 470 | |||
An AccECN TCP Client does not send an AccECN Option on the SYN as SYN | An AccECN TCP Client does not send an AccECN Option on the SYN as SYN | |||
option space is limited. The TCP Server sends an AccECN Option on | option space is limited. The TCP Server sends an AccECN Option on | |||
the SYN/ACK, and the TCP Client sends one on the first ACK to test | the SYN/ACK, and the TCP Client sends one on the first ACK to test | |||
whether the network path forwards these options correctly. | whether the network path forwards these options correctly. | |||
2.2. Feedback Mechanism | 2.2. Feedback Mechanism | |||
A Data Receiver maintains four counters initialized at the start of | A Data Receiver maintains four counters initialized at the start of | |||
the half-connection. Three count the number of arriving payload | the half-connection. Three count the number of arriving payload | |||
bytes marked CE, ECT(1), and ECT(0) in the IP-ECN field. These byte | bytes marked CE, ECT(1), and ECT(0) in the IP ECN field. These byte | |||
counters reflect only the TCP payload length, excluding the TCP | counters reflect only the TCP payload length, excluding the TCP | |||
header and TCP options. The fourth counter counts the number of | header and TCP Options. The fourth counter counts the number of | |||
packets arriving marked with a CE codepoint (including control | packets arriving marked with a CE codepoint (including control | |||
packets without payload if they are CE-marked). | packets without payload if they are CE-marked). | |||
The Data Sender maintains four equivalent counters for the half | The Data Sender maintains four equivalent counters for the half- | |||
connection, and the AccECN protocol is designed to ensure they will | connection, and the AccECN protocol is designed to ensure they will | |||
match the values in the Data Receiver's counters, albeit after a | match the values in the Data Receiver's counters, albeit after a | |||
little delay. | little delay. | |||
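A minimal sketch of the Data Receiver's four counters described above. The class and field names (`AccECNCounters`, `ceb`, `e1b`, `e0b`, `cep`) are illustrative choices, and the counters start at zero here purely for simplicity; the protocol's own initial values are not reproduced in this excerpt. The Data Sender would hold an equivalent set that the feedback is meant to keep in step.

```python
from dataclasses import dataclass
from enum import IntEnum


class IPECN(IntEnum):
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


@dataclass
class AccECNCounters:
    """Per-half-connection counters (zero start is a simplification)."""
    ceb: int = 0  # payload bytes arriving marked CE
    e1b: int = 0  # payload bytes arriving marked ECT(1)
    e0b: int = 0  # payload bytes arriving marked ECT(0)
    cep: int = 0  # packets arriving marked CE, including control packets

    def on_segment(self, ecn: IPECN, payload_len: int) -> None:
        """Update on one arriving segment. payload_len is the TCP payload
        length only, excluding the TCP header and TCP Options."""
        if ecn == IPECN.CE:
            self.cep += 1            # counted even with zero payload
            self.ceb += payload_len
        elif ecn == IPECN.ECT_1:
            self.e1b += payload_len
        elif ecn == IPECN.ECT_0:
            self.e0b += payload_len
        # Not-ECT arrivals update none of the four counters.
```

A CE-marked pure ACK therefore increments `cep` but adds nothing to `ceb`, which is why the packet counter and the byte counters carry different information.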
Each ACK carries the three least significant bits (LSBs) of the | Each ACK carries the three least significant bits (LSBs) of the | |||
packet-based CE counter using the ECN bits in the TCP header, now | packet-based CE counter using the ECN bits in the TCP header, now | |||
renamed the Accurate ECN (ACE) field (see Figure 3). The 24 LSBs of | renamed the Accurate ECN (ACE) field (see Figure 3). The 24 LSBs of | |||
some or all of the byte counters can be optionally carried in an | some or all of the byte counters can be optionally carried in an | |||
AccECN Option. For efficient use of limited option space, two | AccECN Option. For efficient use of limited option space, two | |||
alternative forms of the AccECN Option are specified with the fields | alternative forms of the AccECN Option are specified with the fields | |||
skipping to change at line 557 | skipping to change at line 557 | |||
other than the L4S experiment [RFC9330], such as a lower severity or | other than the L4S experiment [RFC9330], such as a lower severity or | |||
a more instant congestion signal than CE. | a more instant congestion signal than CE. | |||
Feedback in bytes is provided to protect against the receiver or a | Feedback in bytes is provided to protect against the receiver or a | |||
middlebox using attacks similar to 'ACK-Division' to artificially | middlebox using attacks similar to 'ACK-Division' to artificially | |||
inflate the congestion window, which is why [RFC5681] now recommends | inflate the congestion window, which is why [RFC5681] now recommends | |||
that TCP counts acknowledged bytes, not packets. | that TCP counts acknowledged bytes, not packets. | |||
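The arithmetic behind the ACK-Division concern can be shown with the slow-start rule of [RFC5681], which grows cwnd by min(N, SMSS) per ACK, where N is the number of newly acknowledged bytes, rather than by one SMSS per ACK. The sketch below compares the two rules when one segment is acknowledged once versus in ten slices; SMSS = 1460 bytes is an illustrative value and `slow_start_growth` is a hypothetical helper.

```python
from typing import List

SMSS = 1460  # sender maximum segment size in bytes (illustrative value)


def slow_start_growth(acked_bytes_per_ack: List[int], byte_counting: bool) -> int:
    """Total cwnd increase (bytes) over a series of ACKs in slow start.

    Packet counting: +SMSS for every ACK, whatever it acknowledges.
    Byte counting:   +min(N, SMSS), N = newly acknowledged bytes.
    """
    growth = 0
    for n in acked_bytes_per_ack:
        growth += min(n, SMSS) if byte_counting else SMSS
    return growth


honest = [1460]       # one segment acknowledged by a single ACK
divided = [146] * 10  # the same segment acknowledged in ten slices
```

Packet counting lets the ten slices inflate growth tenfold (14600 versus 1460 bytes), whereas byte counting yields 1460 bytes either way. AccECN's byte counters apply the same reasoning to congestion feedback.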
2.5. Generic (Mechanistic) Reflector | 2.5. Generic (Mechanistic) Reflector | |||
The ACE field provides feedback about CE markings in the IP-ECN field | The ACE field provides feedback about CE markings in the IP ECN field | |||
of both data and control packets. According to [RFC3168], the Data | of both data and control packets. According to [RFC3168], the Data | |||
Sender is meant to set the IP-ECN field of control packets to Not- | Sender is meant to set the IP ECN field of control packets to Not- | |||
ECT. However, mechanisms in certain private networks (e.g., data | ECT. However, mechanisms in certain private networks (e.g., data | |||
centres) set control packets to be ECN-capable because they are | centres) set control packets to be ECN-capable because they are | |||
precisely the packets that performance depends on most. | precisely the packets that performance depends on most. | |||
For this reason, AccECN is designed to be a generic reflector of | For this reason, AccECN is designed to be a generic reflector of | |||
whatever ECN markings it sees, whether or not they are compliant with | whatever ECN markings it sees, whether or not they are compliant with | |||
a current standard. Then as standards evolve, Data Senders can | a current standard. Then as standards evolve, Data Senders can | |||
upgrade unilaterally without any need for receivers to upgrade too. | upgrade unilaterally without any need for receivers to upgrade too. | |||
It is also useful to be able to rely on generic reflection behaviour | It is also useful to be able to rely on generic reflection behaviour | |||
when senders need to test for unexpected interference with markings | when senders need to test for unexpected interference with markings | |||
(for instance Sections 3.2.2.3, 3.2.2.4, and 3.2.3.2 of the present | (for instance Sections 3.2.2.3, 3.2.2.4, and 3.2.3.2 of the present | |||
document and paragraph 2 of Section 20.2 of [RFC3168]). | document and paragraph 2 of Section 20.2 of [RFC3168]). | |||
The initial SYN and SYN/ACK are the most critical control packets, so | The initial SYN and SYN/ACK are the most critical control packets, so | |||
AccECN feeds back their IP-ECN fields. Although RFC 3168 prohibits | AccECN feeds back their IP ECN fields. Although RFC 3168 prohibits | |||
ECN-capable SYNs and SYN/ACKs, providing feedback of ECN marking on | ECN-capable SYNs and SYN/ACKs, providing feedback of ECN marking on | |||
the SYN and SYN/ACK supports future scenarios in which SYNs might be | the SYN and SYN/ACK supports future scenarios in which SYNs might be | |||
ECN-enabled (without prejudging whether they ought to be). For | ECN-enabled (without prejudging whether they ought to be). For | |||
instance, [RFC8311] updates this aspect of RFC 3168 to allow | instance, [RFC8311] updates this aspect of RFC 3168 to allow | |||
experimentation with ECN-capable TCP control packets. | experimentation with ECN-capable TCP control packets. | |||
Even if the TCP Client (or Server) has set the SYN (or SYN/ACK) to | Even if the TCP Client (or Server) has set the SYN (or SYN/ACK) to | |||
Not-ECT in compliance with RFC 3168, feedback on the state of the IP- | Not-ECT in compliance with RFC 3168, feedback on the state of the IP | |||
ECN field when it arrives at the receiver could still be useful, | ECN field when it arrives at the receiver could still be useful, | |||
because middleboxes have been known to overwrite the IP-ECN field as | because middleboxes have been known to overwrite the IP ECN field as | |||
if it is still part of the old Type of Service (ToS) field | if it is still part of the old Type of Service (ToS) field | |||
[Mandalari18]. For example, if a TCP Client has set the SYN to Not- | [Mandalari18]. For example, if a TCP Client has set the SYN to Not- | |||
ECT, but receives feedback that the IP-ECN field on the SYN arrived | ECT, but receives feedback that the IP ECN field on the SYN arrived | |||
with a different codepoint, it can detect such middlebox | with a different codepoint, it can detect such middlebox | |||
interference. Previously, neither end knew what IP-ECN field the | interference. Previously, neither end knew what IP ECN field the | |||
other sent. So, if a TCP Server received ECT or CE on a SYN, it | other sent. So, if a TCP Server received ECT or CE on a SYN, it | |||
could not know whether it was invalid because only the TCP Client | could not know whether it was invalid because only the TCP Client | |||
knew whether it originally marked the SYN as Not-ECT (or ECT). | knew whether it originally marked the SYN as Not-ECT (or ECT). | |||
Therefore, prior to AccECN, the Server's only safe course of action | Therefore, prior to AccECN, the Server's only safe course of action | |||
in this example was to disable ECN for the connection. Instead, the | in this example was to disable ECN for the connection. Instead, the | |||
AccECN protocol allows the Server and Client to feed back the ECN | AccECN protocol allows the Server and Client to feed back the ECN | |||
field received on the SYN and SYN/ACK to their peer, which now has | field received on the SYN and SYN/ACK to their peer, which now has | |||
all the information to decide whether the connection has to fall back | all the information to decide whether the connection has to fall back | |||
from supporting ECN (or not). | from supporting ECN (or not). | |||
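The interference check in the example above can be sketched as a comparison between the IP ECN codepoint a host sent on its SYN (or SYN/ACK) and the codepoint its peer reflected back. This is a hypothetical helper, not the normative procedure: it only classifies the outcome, treating ECT(0) or ECT(1) turned into CE as a legitimate congestion mark and any other change as possible middlebox interference. What the host then does, such as falling back from ECN, is left to the normative sections.

```python
from enum import IntEnum


class IPECN(IntEnum):
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def classify_syn_feedback(sent: IPECN, reflected: IPECN) -> str:
    """Compare the IP ECN field sent on a SYN (or SYN/ACK) with the value
    the peer reports it received.

    Returns "unchanged", "congestion" (ECT changed to CE, a legitimate
    mark), or "interference" (any other change, e.g. a middlebox
    rewriting the field as if it were still part of the old ToS byte).
    """
    if sent == reflected:
        return "unchanged"
    if sent in (IPECN.ECT_0, IPECN.ECT_1) and reflected == IPECN.CE:
        return "congestion"
    return "interference"
```

In the scenario described above, a Client that sent Not-ECT and sees ECT(0) reflected gets `"interference"`, information that was unavailable to either end before AccECN.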
skipping to change at line 627 | skipping to change at line 627 | |||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
Figure 2: The New Definition of the TCP Header Flags During the | Figure 2: The New Definition of the TCP Header Flags During the | |||
TCP Three-Way Handshake | TCP Three-Way Handshake | |||
During the TCP three-way handshake at the start of a connection, to | During the TCP three-way handshake at the start of a connection, to | |||
request more Accurate ECN feedback the TCP Client (host A) MUST set | request more Accurate ECN feedback the TCP Client (host A) MUST set | |||
the TCP flags (AE,CWR,ECE) = (1,1,1) in the initial SYN segment. | the TCP flags (AE,CWR,ECE) = (1,1,1) in the initial SYN segment. | |||
If a TCP Server (host B) that is AccECN-enabled receives a SYN with | If a TCP Server (host B) that is AccECN-enabled receives a SYN with | |||
the above three flags set, it MUST set both its half connections into | the above three flags set, it MUST set both its half-connections into | |||
AccECN mode. Then it MUST set the AE, CWR, and ECE TCP flags on the | AccECN mode. Then it MUST set the AE, CWR, and ECE TCP flags on the | |||
SYN/ACK to the combination in the top block of Table 2 that feeds | SYN/ACK to the combination in the top block of Table 2 that feeds | |||
back the IP-ECN field that arrived on the SYN. This applies whether | back the IP ECN field that arrived on the SYN. This applies whether | |||
or not the Server itself supports setting the IP-ECN field on a SYN | or not the Server itself supports setting the IP ECN field on a SYN | |||
or SYN/ACK (see Section 2.5 for rationale). | or SYN/ACK (see Section 2.5 for rationale). | |||
When the TCP Server returns any of the four combinations in the top | When the TCP Server returns any of the four combinations in the top | |||
block of Table 2, it confirms that it supports AccECN. The TCP | block of Table 2, it confirms that it supports AccECN. The TCP | |||
Server MUST NOT set one of these four combinations of flags on the | Server MUST NOT set one of these four combinations of flags on the | |||
SYN/ACK unless the preceding SYN requested support for AccECN as | SYN/ACK unless the preceding SYN requested support for AccECN as | |||
above. | above. | |||
Once a TCP Client (A) has sent the above SYN to declare that it | Once a TCP Client (A) has sent the above SYN to declare that it | |||
supports AccECN, and once it has received the above SYN/ACK segment | supports AccECN, and once it has received the above SYN/ACK segment | |||
that confirms that the TCP Server supports AccECN, the TCP Client | that confirms that the TCP Server supports AccECN, the TCP Client | |||
MUST set both its half connections into AccECN mode. The TCP Client | MUST set both its half-connections into AccECN mode. The TCP Client | |||
MUST NOT enter AccECN mode (or any feedback mode) before it has | MUST NOT enter AccECN mode (or any feedback mode) before it has | |||
received the first SYN/ACK. | received the first SYN/ACK. | |||
Once in AccECN mode, a TCP Client or Server has the rights and | Once in AccECN mode, a TCP Client or Server has the rights and | |||
obligations to participate in the ECN protocol defined in | obligations to participate in the ECN protocol defined in | |||
Section 3.1.5. | Section 3.1.5. | |||
The procedures for retransmission of SYNs or SYN/ACKs are given in | The procedures for retransmission of SYNs or SYN/ACKs are given in | |||
Section 3.1.4. | Section 3.1.4. | |||
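The Server's choice among the four top-block SYN/ACK combinations, and the Client's reading of them, can be illustrated with a short Python sketch. Only the Not-ECT row, (AE,CWR,ECE) = (0,1,0), is visible in this excerpt of Table 2. The rows for ECT(1), ECT(0) and CE below are assumed from the full table and should be checked against it. The names `IpEcn`, `encode_synack` and `decode_synack` are hypothetical.

```python
from enum import Enum
from typing import Optional, Tuple

Flags = Tuple[int, int, int]  # (AE, CWR, ECE)


class IpEcn(Enum):
    """Two-bit IP ECN codepoints as defined in RFC 3168."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


# Top block of Table 2: SYN/ACK flags that confirm AccECN and feed back the
# IP ECN field seen on the SYN. Only the NOT_ECT row appears in this excerpt;
# the other three rows are assumptions.
SYNACK_FOR_SYN_ECN = {
    IpEcn.NOT_ECT: (0, 1, 0),
    IpEcn.ECT1: (0, 1, 1),
    IpEcn.ECT0: (1, 0, 0),
    IpEcn.CE: (1, 1, 0),
}


def encode_synack(syn_ip_ecn: IpEcn) -> Flags:
    """Server side: TCP-ECN flags for a SYN/ACK answering an AccECN SYN."""
    return SYNACK_FOR_SYN_ECN[syn_ip_ecn]


def decode_synack(flags: Flags) -> Optional[IpEcn]:
    """Client side: IP ECN field the SYN arrived with, or None if `flags`
    is not one of the four top-block combinations."""
    for codepoint, combo in SYNACK_FOR_SYN_ECN.items():
        if combo == flags:
            return codepoint
    return None
```

Because the mapping is a bijection over the four codepoints, a Client can recover exactly what happened to its SYN on the forward path.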
skipping to change at line 669 ¶ | skipping to change at line 669 ¶ | |||
3.1.2. Backward Compatibility | 3.1.2. Backward Compatibility | |||
The three flags set to 1 to indicate AccECN support on the SYN | The three flags set to 1 to indicate AccECN support on the SYN | |||
have been carefully chosen to enable natural fall-back to prior | have been carefully chosen to enable natural fall-back to prior | |||
stages in the evolution of ECN. Table 2 tabulates all the | stages in the evolution of ECN. Table 2 tabulates all the | |||
negotiation possibilities for ECN-related capabilities that involve | negotiation possibilities for ECN-related capabilities that involve | |||
at least one AccECN-capable host. The entries in the first two | at least one AccECN-capable host. The entries in the first two | |||
columns have been abbreviated, as follows: | columns have been abbreviated, as follows: | |||
AccECN: Supports more Accurate ECN feedback (the present | AccECN: Supports more Accurate ECN feedback (the present | |||
specification) | specification). | |||
Nonce: Supports ECN-nonce feedback [RFC3540] | Nonce: Supports ECN-nonce feedback [RFC3540]. | |||
ECN: Supports 'Classic' ECN feedback [RFC3168] | ECN: Supports 'Classic' ECN feedback [RFC3168]. | |||
No ECN: Not ECN-capable. Implicit congestion notification using | No ECN: Not ECN-capable. Implicit congestion notification using | |||
packet drop. | packet drop. | |||
+========+========+============+============+======================+ | +========+========+============+============+======================+ | |||
| Host A | Host B | SYN | SYN/ACK | Feedback Mode | | | Host A | Host B | SYN | SYN/ACK | Feedback Mode | | |||
| | | A->B | B->A | of Host A | | | | | A->B | B->A | of Host A | | |||
| | | AE CWR ECE | AE CWR ECE | | | | | | AE CWR ECE | AE CWR ECE | | | |||
+========+========+============+============+======================+ | +========+========+============+============+======================+ | |||
| AccECN | AccECN | 1 1 1 | 0 1 0 | AccECN (Not-ECT SYN) | | | AccECN | AccECN | 1 1 1 | 0 1 0 | AccECN (Not-ECT SYN) | | |||
skipping to change at line 716 ¶ | skipping to change at line 716 ¶ | |||
row. | row. | |||
1. The top block shows the case already described in Section 3.1 | 1. The top block shows the case already described in Section 3.1 | |||
where both endpoints support AccECN and how the TCP Server (B) | where both endpoints support AccECN and how the TCP Server (B) | |||
indicates congestion feedback. | indicates congestion feedback. | |||
2. The second block shows the cases where the TCP Client (A) | 2. The second block shows the cases where the TCP Client (A) | |||
supports AccECN but the TCP Server (B) supports some earlier | supports AccECN but the TCP Server (B) supports some earlier | |||
variant of TCP feedback, as indicated in its SYN/ACK. Therefore, | variant of TCP feedback, as indicated in its SYN/ACK. Therefore, | |||
as soon as an AccECN-capable TCP Client (A) receives the SYN/ACK | as soon as an AccECN-capable TCP Client (A) receives the SYN/ACK | |||
shown, it MUST set both its half connections into the feedback | shown, it MUST set both its half-connections into the feedback | |||
mode shown in the rightmost column. If the TCP Client has set | mode shown in the rightmost column. If the TCP Client has set | |||
itself into Classic ECN feedback mode, it MUST comply with | itself into Classic ECN feedback mode, it MUST comply with | |||
[RFC3168]. | [RFC3168]. | |||
An AccECN implementation has no need to recognize or support the | An AccECN implementation has no need to recognize or support the | |||
Server response labelled 'Nonce' or ECN-nonce feedback more | Server response labelled 'Nonce' or ECN-nonce feedback more | |||
generally [RFC3540], as RFC 3540 has been reclassified as | generally [RFC3540], as RFC 3540 has been reclassified as | |||
Historic [RFC8311]. AccECN is compatible with alternative ECN | Historic [RFC8311]. AccECN is compatible with alternative ECN | |||
feedback integrity approaches to the nonce (see Section 5.3). | feedback integrity approaches to the nonce (see Section 5.3). | |||
The SYN/ACK labelled 'Nonce' with (AE,CWR,ECE) = (1,0,1) is | The SYN/ACK labelled 'Nonce' with (AE,CWR,ECE) = (1,0,1) is | |||
skipping to change at line 738 ¶ | skipping to change at line 738 ¶ | |||
SYN/ACK follows the procedure for forward compatibility given in | SYN/ACK follows the procedure for forward compatibility given in | |||
Section 3.1.3. | Section 3.1.3. | |||
3. The third block shows the cases where the TCP Server (B) supports | 3. The third block shows the cases where the TCP Server (B) supports | |||
AccECN but the TCP Client (A) supports some earlier variant of | AccECN but the TCP Client (A) supports some earlier variant of | |||
TCP feedback, as indicated in its SYN. | TCP feedback, as indicated in its SYN. | |||
When an AccECN-enabled TCP Server (B) receives a SYN with | When an AccECN-enabled TCP Server (B) receives a SYN with | |||
(AE,CWR,ECE) = (0,1,1), it MUST do one of the following: | (AE,CWR,ECE) = (0,1,1), it MUST do one of the following: | |||
* set both its half connections into the Classic ECN feedback | * set both its half-connections into the Classic ECN feedback | |||
mode and return a SYN/ACK with (AE,CWR,ECE) = (0,0,1) as | mode and return a SYN/ACK with (AE,CWR,ECE) = (0,0,1) as | |||
shown. Then it MUST comply with [RFC3168]. | shown. Then it MUST comply with [RFC3168]. | |||
* set both its half-connections into Not ECN mode and return a | * set both its half-connections into Not ECN mode and return a | |||
SYN/ACK with (AE,CWR,ECE) = (0,0,0), then continue with ECN | SYN/ACK with (AE,CWR,ECE) = (0,0,0), then continue with ECN | |||
disabled. This latter case is unlikely to be desirable, but | disabled. This latter case is unlikely to be desirable, but | |||
it is allowed as a possibility, e.g., for minimal TCP | it is allowed as a possibility, e.g., for minimal TCP | |||
implementations. | implementations. | |||
When an AccECN-enabled TCP Server (B) receives a SYN with | When an AccECN-enabled TCP Server (B) receives a SYN with | |||
(AE,CWR,ECE) = (0,0,0), it MUST set both its half connections | (AE,CWR,ECE) = (0,0,0), it MUST set both its half-connections | |||
into the Not ECN feedback mode, return a SYN/ACK with | into the Not ECN feedback mode, return a SYN/ACK with | |||
(AE,CWR,ECE) = (0,0,0) as shown, and continue with ECN disabled. | (AE,CWR,ECE) = (0,0,0) as shown, and continue with ECN disabled. | |||
4. The fourth block displays a combination labelled 'Broken'. Some | 4. The fourth block displays a combination labelled 'Broken'. Some | |||
older TCP Server implementations incorrectly set the TCP-ECN | older TCP Server implementations incorrectly set the TCP-ECN | |||
flags in the SYN/ACK by reflecting those in the SYN. Such broken | flags in the SYN/ACK by reflecting those in the SYN. Such broken | |||
TCP Servers (B) cannot support ECN; so as soon as an AccECN- | TCP Servers (B) cannot support ECN; so as soon as an AccECN- | |||
capable TCP Client (A) receives such a broken SYN/ACK, it MUST | capable TCP Client (A) receives such a broken SYN/ACK, it MUST | |||
fall back to Not ECN mode for both its half connections and | fall back to Not ECN mode for both its half-connections and | |||
continue with ECN disabled. | continue with ECN disabled. | |||
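Blocks 1, 2 and 4 of Table 2, together with the forward-compatibility rule for the reserved 'Nonce' combination (Section 3.1.3), amount to a lookup from the SYN/ACK flags to the Client's feedback mode. The following is a minimal Python sketch. It assumes the Client sent (AE,CWR,ECE) = (1,1,1), and it also assumes the top-block rows not reproduced in this excerpt. `client_feedback_mode` is a hypothetical name.

```python
from enum import Enum
from typing import Dict, Tuple

Flags = Tuple[int, int, int]  # (AE, CWR, ECE) on the SYN/ACK


class Mode(Enum):
    ACCECN = "AccECN"
    CLASSIC_ECN = "Classic ECN"
    NOT_ECN = "Not ECN"


# Feedback mode of an AccECN Client that sent a SYN with (1,1,1).
CLIENT_MODE: Dict[Flags, Mode] = {
    # Top block: Server confirms AccECN (rows other than (0,1,0) assumed).
    (0, 1, 0): Mode.ACCECN,
    (0, 1, 1): Mode.ACCECN,
    (1, 0, 0): Mode.ACCECN,
    (1, 1, 0): Mode.ACCECN,
    # Reserved 'Nonce' combination: treated as AccECN, with the SYN's IP ECN
    # field taken to have arrived unchanged (Section 3.1.3).
    (1, 0, 1): Mode.ACCECN,
    # Second block: Server supports only an earlier feedback variant.
    (0, 0, 1): Mode.CLASSIC_ECN,  # Client must then comply with RFC 3168
    (0, 0, 0): Mode.NOT_ECN,
    # Fourth block: 'Broken' Server reflected the SYN's flags.
    (1, 1, 1): Mode.NOT_ECN,
}


def client_feedback_mode(synack: Flags) -> Mode:
    """Mode for both half-connections on receiving the first SYN/ACK."""
    return CLIENT_MODE[synack]
```

All eight flag combinations appear in the table, so the lookup cannot fail for a Client that has sent (1,1,1).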
The following additional rules do not fit the structure of the table, | The following additional rules do not fit the structure of the table, | |||
but they complement it: | but they complement it: | |||
Simultaneous Open: An originating AccECN Host (A), having sent a SYN | Simultaneous Open: An originating AccECN Host (A), having sent a SYN | |||
with (AE,CWR,ECE) = (1,1,1), might receive another SYN from host | with (AE,CWR,ECE) = (1,1,1), might receive another SYN from host | |||
B. Host A MUST then enter the same feedback mode as it would have | B. Host A MUST then enter the same feedback mode as it would have | |||
entered had it been a responding host and received the same SYN. | entered had it been a responding host and received the same SYN. | |||
Then host A MUST send the same SYN/ACK as it would have sent had | Then host A MUST send the same SYN/ACK as it would have sent had | |||
skipping to change at line 793 ¶ | skipping to change at line 793 ¶ | |||
such a combination, the Server MUST negotiate the use of AccECN as if | such a combination, the Server MUST negotiate the use of AccECN as if | |||
the three flags had been set to (1,1,1). However, an AccECN Client | the three flags had been set to (1,1,1). However, an AccECN Client | |||
implementation MUST NOT send a SYN with any combination other than | implementation MUST NOT send a SYN with any combination other than | |||
the three listed. | the three listed. | |||
If a TCP Client sent a SYN requesting AccECN feedback with | If a TCP Client sent a SYN requesting AccECN feedback with | |||
(AE,CWR,ECE) = (1,1,1) and then receives a SYN/ACK with the currently | (AE,CWR,ECE) = (1,1,1) and then receives a SYN/ACK with the currently | |||
reserved combination (AE,CWR,ECE) = (1,0,1) but it does not have | reserved combination (AE,CWR,ECE) = (1,0,1) but it does not have | |||
logic specific to such a combination, the Client MUST enable AccECN | logic specific to such a combination, the Client MUST enable AccECN | |||
mode as if the SYN/ACK confirmed that the Server supported AccECN and | mode as if the SYN/ACK confirmed that the Server supported AccECN and | |||
as if it fed back that the IP-ECN field on the SYN had arrived | as if it fed back that the IP ECN field on the SYN had arrived | |||
unchanged. However, an AccECN Server implementation MUST NOT send a | unchanged. However, an AccECN Server implementation MUST NOT send a | |||
SYN/ACK with this combination (AE,CWR,ECE) = (1,0,1). | SYN/ACK with this combination (AE,CWR,ECE) = (1,0,1). | |||
| For the avoidance of doubt, the behaviour described in the | | For the avoidance of doubt, the behaviour described in the | |||
| present specification applies whether or not the three | | present specification applies whether or not the three | |||
| remaining reserved TCP header flags are zero. | | remaining reserved TCP header flags are zero. | |||
All of these requirements ensure that future uses of all the Reserved | All of these requirements ensure that future uses of all the Reserved | |||
combinations on a SYN or SYN/ACK can rely on consistent behaviour | combinations on a SYN or SYN/ACK (see Table 2) can rely on consistent | |||
from the installed base of AccECN implementations. See Appendix B.3 | behaviour from the installed base of AccECN implementations. See | |||
for related discussion. | Appendix B.3 for related discussion. | |||
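On the Server side, the third block of Table 2 and the forward-compatibility rule for SYNs can be combined as in the sketch below. It assumes that 'such a combination', in the paragraph whose start is skipped in this diff, means any SYN combination other than the three that a Client may send: (0,0,0), (0,1,1) and (1,1,1). The top-block SYN/ACK rows other than (0,1,0) are also assumed. `server_accept` and its `allow_classic` parameter are hypothetical.

```python
from typing import Tuple

Flags = Tuple[int, int, int]  # (AE, CWR, ECE)

# IP ECN codepoint on the SYN (RFC 3168 values) -> top-block SYN/ACK flags.
# Only the Not-ECT row (0,1,0) is shown in this excerpt; the others are assumed.
ACCECN_SYNACK = {0b00: (0, 1, 0), 0b01: (0, 1, 1), 0b10: (1, 0, 0), 0b11: (1, 1, 0)}


def server_accept(syn: Flags, syn_ip_ecn: int,
                  allow_classic: bool = True) -> Tuple[str, Flags]:
    """Return (feedback mode, SYN/ACK flags) for an AccECN-enabled Server."""
    if syn == (0, 0, 0):
        return "Not ECN", (0, 0, 0)
    if syn == (0, 1, 1):
        # Classic ECN (or ECN-nonce) Client: either Classic ECN feedback,
        # or Not ECN, e.g., for a minimal TCP implementation.
        if allow_classic:
            return "Classic ECN", (0, 0, 1)
        return "Not ECN", (0, 0, 0)
    # (1,1,1), or any reserved combination the Server has no specific
    # logic for, is negotiated as if it were (1,1,1).
    return "AccECN", ACCECN_SYNACK[syn_ip_ecn]
```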
3.1.4. Multiple SYNs or SYN/ACKs | 3.1.4. Multiple SYNs or SYN/ACKs | |||
3.1.4.1. Retransmitted SYNs | 3.1.4.1. Retransmitted SYNs | |||
If the sender of an AccECN SYN (the TCP Client) times out before | If the sender of an AccECN SYN (the TCP Client) times out before | |||
receiving the SYN/ACK, it SHOULD attempt to negotiate the use of | receiving the SYN/ACK, it SHOULD attempt to negotiate the use of | |||
AccECN at least one more time by continuing to set all three TCP ECN | AccECN at least one more time by continuing to set all three TCP ECN | |||
flags (AE,CWR,ECE) = (1,1,1) on the first retransmitted SYN (using | flags (AE,CWR,ECE) = (1,1,1) on the first retransmitted SYN (using | |||
the usual retransmission timeouts). If this first retransmission | the usual retransmission timeouts). If this first retransmission | |||
skipping to change at line 830 ¶ | skipping to change at line 830 ¶ | |||
Retrying once before fall-back adds delay in the case where a | Retrying once before fall-back adds delay in the case where a | |||
middlebox drops an AccECN (or ECN) SYN deliberately. However, recent | middlebox drops an AccECN (or ECN) SYN deliberately. However, recent | |||
measurements [Mandalari18] imply that a drop is less likely to be due | measurements [Mandalari18] imply that a drop is less likely to be due | |||
to middlebox interference than other intermittent causes of loss, | to middlebox interference than other intermittent causes of loss, | |||
e.g., congestion, wireless transmission loss, etc. | e.g., congestion, wireless transmission loss, etc. | |||
Implementers MAY use other fall-back strategies if they are found to | Implementers MAY use other fall-back strategies if they are found to | |||
be more effective (e.g., attempting to negotiate AccECN on the SYN | be more effective (e.g., attempting to negotiate AccECN on the SYN | |||
only once or more than twice (most appropriate during high levels of | only once or more than twice (most appropriate during high levels of | |||
congestion). | congestion)). | |||
Further, it might make sense to also remove any other new or | Further, it might make sense to also remove any other new or | |||
experimental fields or options on the SYN in case a middlebox might | experimental fields or options on the SYN in case a middlebox might | |||
be blocking them, although the required behaviour will depend on the | be blocking them, although the required behaviour will depend on the | |||
specification of the other option(s) and any attempt to coordinate | specification of the other option(s) and any attempt to coordinate | |||
fall-back between different modules of the stack. For instance, even | fall-back between different modules of the stack. For instance, if | |||
if taking part in an [RFC8311] experiment that allows ECT on a SYN, | taking part in an [RFC8311] experiment that allows ECT on a SYN, it | |||
it would be advisable to try it without. | would be advisable to have a fall-back strategy that tries use of | |||
AccECN without setting ECT on a SYN. | ||||
Whichever fall-back strategy is used, the TCP initiator SHOULD cache | Whichever fall-back strategy is used, the TCP initiator SHOULD cache | |||
failed connection attempts. If it does, it SHOULD NOT give up | failed connection attempts. If it does, it SHOULD NOT give up | |||
attempting to negotiate AccECN on the SYN of subsequent connection | attempting to negotiate AccECN on the SYN of subsequent connection | |||
attempts until it is clear that the blockage is persistently and | attempts until it is clear that the blockage is persistently and | |||
specifically due to AccECN. The cache needs to be arranged to expire | specifically due to AccECN. The cache needs to be arranged to expire | |||
so that the initiator will infrequently attempt to check whether the | so that the initiator will infrequently attempt to check whether the | |||
problem has been resolved. | problem has been resolved. | |||
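The retry schedule and the failure cache can be sketched as follows. The paragraph describing later retransmissions is partly skipped in this diff. The sketch assumes that retransmissions after the first clear all three flags, as the later bullets about SYNs sent with both (1,1,1) and (0,0,0) imply. The threshold and expiry values, and the use of a count of failures as a proxy for 'persistently and specifically due to AccECN', are illustrative assumptions rather than values from the specification. All names are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Flags = Tuple[int, int, int]  # (AE, CWR, ECE)


def syn_flags(attempt: int, try_accecn: bool = True) -> Flags:
    """TCP-ECN flags for SYN number `attempt` (0 = original SYN).

    AccECN is requested on the original SYN and on the first
    retransmission; later retransmissions fall back to (0,0,0).
    """
    if try_accecn and attempt <= 1:
        return (1, 1, 1)
    return (0, 0, 0)


class AccEcnFailureCache:
    """Hypothetical per-destination record of failed AccECN handshakes."""

    def __init__(self, threshold: int = 3, ttl_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold  # failures before giving up (assumed value)
        self.ttl_s = ttl_s          # entry lifetime, so the path is re-probed
        self.clock = clock
        # dest -> (failure count, time of first failure)
        self._entries: Dict[str, Tuple[int, float]] = {}

    def _live(self, dest: str) -> Optional[Tuple[int, float]]:
        entry = self._entries.get(dest)
        if entry is not None and self.clock() - entry[1] > self.ttl_s:
            del self._entries[dest]  # expired: forget and try AccECN again
            return None
        return entry

    def record_failure(self, dest: str) -> None:
        count, first = self._live(dest) or (0, self.clock())
        self._entries[dest] = (count + 1, first)

    def record_success(self, dest: str) -> None:
        self._entries.pop(dest, None)

    def should_try_accecn(self, dest: str) -> bool:
        entry = self._live(dest)
        return entry is None or entry[0] < self.threshold
```

A caller would consult `should_try_accecn` before sending the first SYN and pass the result to `syn_flags` as `try_accecn`. Expiring entries means that the initiator occasionally re-probes a destination, as the text requires.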
All fall-back strategies will need to follow all the normative rules | All fall-back strategies will need to follow all the normative rules | |||
in Section 3.1.5, which concern behaviour when SYNs or SYN/ACKs | in Section 3.1.5, which concern behaviour when SYNs or SYN/ACKs | |||
negotiating different types of feedback have been sent within the | negotiating different types of feedback have been sent within the | |||
same connection, including the possibility that they arrive out of | same connection, including the possibility that they arrive out of | |||
order. As examples, the following non-normative bullets call out | order. As examples, the following non-normative bullets call out | |||
those rules from Section 3.1.5 that apply to the above fall-back | those rules from Section 3.1.5 that apply to the above fall-back | |||
strategies: | strategies: | |||
* Once the TCP Client has sent SYNs with (AE,CWR,ECE) = (1,1,1) and | * Once the TCP Client has sent SYNs with (AE,CWR,ECE) = (1,1,1) and | |||
with (AE,CWR,ECE) = (0,0,0), it might eventually receive a SYN/ACK | with (AE,CWR,ECE) = (0,0,0), it might eventually receive a SYN/ACK | |||
from the Server in response to one, the other, or both, and | from the Server in response to one, the other, or both, and | |||
possibly reordered; | possibly reordered. | |||
* Such a TCP Client enters the feedback mode appropriate to the | * Such a TCP Client enters the feedback mode appropriate to the | |||
first SYN/ACK it receives according to Table 2, and it does not | first SYN/ACK it receives according to Table 2, and it does not | |||
switch to a different mode, whatever other SYN/ACKs it might | switch to a different mode, whatever other SYN/ACKs it might | |||
receive or send; | receive or send. | |||
* If a TCP Client has entered AccECN mode but then subsequently | * If a TCP Client has entered AccECN mode but then subsequently | |||
sends a SYN or receives a SYN/ACK with (AE,CWR,ECE) = (0,0,0), it | sends a SYN or receives a SYN/ACK with (AE,CWR,ECE) = (0,0,0), it | |||
is still allowed to set ECT on packets for the rest of the | is still allowed to set ECT on packets for the rest of the | |||
connection. Note that this rule is different from that of a | connection. Note that this rule is different from that of a | |||
Server in an equivalent position (Section 3.1.5 explains). | Server in an equivalent position (Section 3.1.5 explains). | |||
* Having entered AccECN mode, in general a TCP Client commits to | * Having entered AccECN mode, in general a TCP Client commits to | |||
respond to any incoming congestion feedback, whether or not it | respond to any incoming congestion feedback, whether or not it | |||
sets ECT on outgoing packets (for rationale and some exceptions | sets ECT on outgoing packets (for rationale and some exceptions | |||
see Section 3.2.2.3, Section 3.2.2.4); | see Section 3.2.2.3, Section 3.2.2.4). | |||
* Having entered AccECN mode, a TCP Client commits to using AccECN | * Having entered AccECN mode, a TCP Client commits to using AccECN | |||
to feed back the IP-ECN field in incoming packets for the rest of | to feed back the IP ECN field in incoming packets for the rest of | |||
the connection, as specified in Section 3.2, even if it is not | the connection, as specified in Section 3.2, even if it is not | |||
itself setting ECT on outgoing packets. | itself setting ECT on outgoing packets. | |||
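The Client-side bullets above can be summarised as a small state object whose mode is fixed by the first SYN/ACK. This is a hypothetical Python sketch. Its classification of SYN/ACK flags follows Table 2, with the top-block rows other than (0,1,0) assumed. Classic ECN behaviour is left to RFC 3168.

```python
from typing import Optional, Tuple

Flags = Tuple[int, int, int]  # (AE, CWR, ECE)

# Top-block SYN/ACKs plus the reserved (1,0,1): each puts a Client that
# sent (1,1,1) into AccECN mode.
_ACCECN_SYNACKS = {(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 1, 0), (1, 0, 1)}


class ClientHandshake:
    def __init__(self) -> None:
        self.mode: Optional[str] = None  # no feedback mode before a SYN/ACK

    def on_synack(self, flags: Flags) -> str:
        """Enter a mode on the first SYN/ACK; later ones never change it."""
        if self.mode is None:
            if flags in _ACCECN_SYNACKS:
                self.mode = "AccECN"
            elif flags == (0, 0, 1):
                self.mode = "Classic ECN"
            else:  # (0,0,0), or the 'Broken' reflection (1,1,1)
                self.mode = "Not ECN"
        return self.mode

    def feeds_back_with_accecn(self) -> bool:
        # In AccECN mode the Client feeds back the IP ECN field for the
        # rest of the connection, whether or not it sets ECT itself.
        return self.mode == "AccECN"
```

Nothing here depends on whether the Client also sent a SYN with (0,0,0). For a Client already in AccECN mode, sending such a SYN does not remove its right to set ECT.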
3.1.4.2. Retransmitted SYN/ACKs | 3.1.4.2. Retransmitted SYN/ACKs | |||
A TCP Server might send multiple SYN/ACKs indicating different | A TCP Server might send multiple SYN/ACKs indicating different | |||
feedback modes, for instance when falling back to sending a SYN/ACK | feedback modes, for instance when falling back to sending a SYN/ACK | |||
with (AE,CWR,ECE) = (0,0,0) after previous AccECN SYN/ACKs have timed | with (AE,CWR,ECE) = (0,0,0) after previous AccECN SYN/ACKs have timed | |||
out (Section 3.2.3.2.2), or to acknowledge different retransmissions | out (Section 3.2.3.2.2), or to acknowledge different retransmissions | |||
of the SYN (Section 3.1.4.1). | of the SYN (Section 3.1.4.1). | |||
skipping to change at line 900 ¶ | skipping to change at line 901 ¶ | |||
All fall-back strategies will need to follow all the normative rules | All fall-back strategies will need to follow all the normative rules | |||
in Section 3.1.5, which concern behaviour when SYNs or SYN/ACKs | in Section 3.1.5, which concern behaviour when SYNs or SYN/ACKs | |||
negotiating different types of feedback are sent within the same | negotiating different types of feedback are sent within the same | |||
connection, including the possibility that they arrive out of order. | connection, including the possibility that they arrive out of order. | |||
As examples, the following non-normative bullets call out those rules | As examples, the following non-normative bullets call out those rules | |||
from Section 3.1.5 that apply to the above fall-back strategies: | from Section 3.1.5 that apply to the above fall-back strategies: | |||
* An AccECN-capable TCP Server enters the feedback mode appropriate | * An AccECN-capable TCP Server enters the feedback mode appropriate | |||
to the first SYN it receives using Table 2, and it does not switch | to the first SYN it receives using Table 2, and it does not switch | |||
to a different mode, whatever other SYNs it might receive and | to a different mode, whatever other SYNs it might receive and | |||
whatever SYN/ACKs it might send; | whatever SYN/ACKs it might send. | |||
* If a TCP Server in AccECN mode receives a SYN with (AE,CWR,ECE) = | * If a TCP Server in AccECN mode receives a SYN with (AE,CWR,ECE) = | |||
(0,0,0), it preferably acknowledges it first using an AccECN SYN/ | (0,0,0), it preferably acknowledges it first using an AccECN SYN/ | |||
ACK, but it can retry using a SYN/ACK with (AE,CWR,ECE) = (0,0,0); | ACK, but it can retry using a SYN/ACK with (AE,CWR,ECE) = (0,0,0). | |||
* If a TCP Server in AccECN mode sends multiple AccECN SYN/ACKs, it | * If a TCP Server in AccECN mode sends multiple AccECN SYN/ACKs, it | |||
uses the TCP-ECN flags in each SYN/ACK to feed back the IP-ECN | uses the TCP-ECN flags in each SYN/ACK to feed back the IP ECN | |||
field on the latest SYN to have arrived; | field on the latest SYN to have arrived. | |||
* If a TCP Server enters AccECN mode and then subsequently sends a | * If a TCP Server enters AccECN mode and then subsequently sends a | |||
SYN/ACK or receives a SYN with (AE,CWR,ECE) = (0,0,0), it is | SYN/ACK or receives a SYN with (AE,CWR,ECE) = (0,0,0), it is | |||
prohibited from setting ECT on any packet for the rest of the | prohibited from setting ECT on any packet for the rest of the | |||
connection; | connection. | |||
* Having entered AccECN mode, in general a TCP Server commits to | * Having entered AccECN mode, in general a TCP Server commits to | |||
respond to any incoming congestion feedback, whether or not it | respond to any incoming congestion feedback, whether or not it | |||
sets ECT on outgoing packets (for rationale and some exceptions | sets ECT on outgoing packets (for rationale and some exceptions | |||
see Sections 3.2.2.3, 3.2.2.4); | see Sections 3.2.2.3, 3.2.2.4). | |||
* Having entered AccECN mode, a TCP Server commits to using AccECN | * Having entered AccECN mode, a TCP Server commits to using AccECN | |||
to feed back the IP-ECN field in incoming packets for the rest of | to feed back the IP ECN field in incoming packets for the rest of | |||
the connection, as specified in Section 3.2, even if it is not | the connection, as specified in Section 3.2, even if it is not | |||
itself setting ECT on outgoing packets. | itself setting ECT on outgoing packets. | |||
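The Server-side bullets can be sketched as a small state object with three properties. The first valid SYN fixes the mode. Each AccECN SYN/ACK echoes the IP ECN field of the latest valid SYN. Any (0,0,0) SYN received, or (0,0,0) SYN/ACK sent, during the handshake removes the right to set ECT. This is hypothetical Python. The SYN/ACK rows other than (0,1,0) are assumed from Table 2, and SYNs with reserved flag combinations are treated as (1,1,1).

```python
from typing import Optional, Tuple

Flags = Tuple[int, int, int]  # (AE, CWR, ECE)

# IP ECN field on the SYN -> AccECN SYN/ACK flags (rows other than Not-ECT assumed).
ACCECN_SYNACK = {0b00: (0, 1, 0), 0b01: (0, 1, 1), 0b10: (1, 0, 0), 0b11: (1, 1, 0)}


class ServerHandshake:
    def __init__(self) -> None:
        self.mode: Optional[str] = None
        self.ect_forbidden = False   # set by any (0,0,0) SYN in or SYN/ACK out
        self.latest_syn_ecn = 0b00   # IP ECN field on the latest valid SYN

    def on_valid_syn(self, flags: Flags, ip_ecn: int) -> None:
        if self.mode is None:        # only the first valid SYN sets the mode
            if flags == (0, 0, 0):
                self.mode = "Not ECN"
            elif flags == (0, 1, 1):
                self.mode = "Classic ECN"
            else:
                self.mode = "AccECN"
        if flags == (0, 0, 0):
            self.ect_forbidden = True
        self.latest_syn_ecn = ip_ecn

    def next_synack(self, fall_back: bool = False) -> Flags:
        """Flags for the next (re)transmitted SYN/ACK."""
        if self.mode == "Classic ECN":
            return (0, 0, 1)         # RFC 3168 rules apply from here on
        if self.mode == "AccECN" and not fall_back:
            # Preferred response, even to a (0,0,0) SYN.
            return ACCECN_SYNACK[self.latest_syn_ecn]
        # Not ECN mode, or an AccECN Server falling back to a non-ECN SYN/ACK.
        self.ect_forbidden = True
        return (0, 0, 0)
```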
3.1.5. Implications of AccECN Mode | 3.1.5. Implications of AccECN Mode | |||
Section 3.1.1 describes the only ways that a host can enter AccECN | Section 3.1.1 describes the only ways that a host can enter AccECN | |||
mode, whether as a Client or as a Server. | mode, whether as a Client or as a Server. | |||
An implementation that supports AccECN has the rights and obligations | An implementation that supports AccECN has the rights and obligations | |||
concerning the use of ECN defined below, which update those in | concerning the use of ECN defined below, which update those in | |||
Section 6.1.1 of [RFC3168]. This section uses the following | Section 6.1.1 of [RFC3168]. This section uses the following | |||
definitions: | definitions: | |||
'During the handshake': The connection states prior to | 'During the handshake': The connection states prior to | |||
synchronization; | synchronization. | |||
'Valid SYN': A SYN that has the same port numbers and the same ISN | 'Valid SYN': A SYN that has the same port numbers and the same ISN | |||
as the SYN that first caused the Server to open the connection. | as the SYN that first caused the Server to open the connection. | |||
An 'Acceptable' packet is defined in Section 1.3. | An 'Acceptable' packet is defined in Section 1.3. | |||
Handling SYNs or SYN/ACKs of multiple types (e.g., fall-back): | Handling SYNs or SYN/ACKs of multiple types (e.g., fall-back): | |||
* Any implementation that supports AccECN: | * Any implementation that supports AccECN: | |||
- MUST NOT switch into a different feedback mode than the one it | - MUST NOT switch into a different feedback mode than the one it | |||
first entered according to Table 2, no matter whether it | first entered according to Table 2, no matter whether it | |||
subsequently receives valid SYNs or Acceptable SYN/ACKs of | subsequently receives valid SYNs or Acceptable SYN/ACKs of | |||
different types. | different types. | |||
- SHOULD ignore the TCP-ECN flags in SYNs or SYN/ACKs that are | - SHOULD ignore the TCP-ECN flags in SYNs or SYN/ACKs that are | |||
received after the implementation reaches the Established | received after the implementation reaches the ESTABLISHED | |||
state, in line with the general TCP approach [RFC9293]; | state, in line with the general TCP approach [RFC9293]. | |||
Reason: Reaching established state implies that at least one | Reason: Reaching ESTABLISHED state implies that at least one | |||
SYN and one SYN/ACK have successfully been delivered. And all | SYN and one SYN/ACK have successfully been delivered. And all | |||
the rules for handshake fall-back are designed to work based on | the rules for handshake fall-back are designed to work based on | |||
those packets that successfully traverse the path, whatever | those packets that successfully traverse the path, whatever | |||
other handshake packets are lost or delayed. | other handshake packets are lost or delayed. | |||
- MUST NOT send a 'Classic' ECN-setup SYN [RFC3168] with | - MUST NOT send a 'Classic' ECN-setup SYN [RFC3168] with | |||
(AE,CWR,ECE) = (0,1,1) and a SYN with (AE,CWR,ECE) = (1,1,1) | (AE,CWR,ECE) = (0,1,1) and a SYN with (AE,CWR,ECE) = (1,1,1) | |||
requesting AccECN feedback within the same connection; | requesting AccECN feedback within the same connection; | |||
- MUST NOT send a 'Classic' ECN-setup SYN/ACK [RFC3168] with | - MUST NOT send a 'Classic' ECN-setup SYN/ACK [RFC3168] with | |||
skipping to change at line 986 ¶ | skipping to change at line 987 ¶ | |||
handshake; | handshake; | |||
The last four rules are necessary because, if one peer were to | The last four rules are necessary because, if one peer were to | |||
negotiate the feedback mode in two different types of handshake, | negotiate the feedback mode in two different types of handshake, | |||
it would not be possible for the other peer to know for certain | it would not be possible for the other peer to know for certain | |||
which handshake packet(s) the other end had eventually received or | which handshake packet(s) the other end had eventually received or | |||
in which order it received them. So, in the absence of these | in which order it received them. So, in the absence of these | |||
rules, the two peers could end up using different ECN feedback | rules, the two peers could end up using different ECN feedback | |||
modes without knowing it. | modes without knowing it. | |||
* A host in AccECN mode that is feeding back the IP-ECN field on a | * A host in AccECN mode that is feeding back the IP ECN field on a | |||
SYN or SYN/ACK: | SYN or SYN/ACK: | |||
- MUST feed back the IP-ECN field on the latest valid SYN or | - MUST feed back the IP ECN field on the latest valid SYN or | |||
Acceptable SYN/ACK to arrive. | Acceptable SYN/ACK to arrive. | |||
* A TCP Server already in AccECN mode: | * A TCP Server already in AccECN mode: | |||
- SHOULD acknowledge a valid SYN arriving with (AE,CWR,ECE) = | - SHOULD acknowledge a valid SYN arriving with (AE,CWR,ECE) = | |||
(0,0,0) by emitting an AccECN SYN/ACK (with the appropriate | (0,0,0) by emitting an AccECN SYN/ACK (with the appropriate | |||
combination of TCP-ECN flags to feed back the IP-ECN field of | combination of TCP-ECN flags to feed back the IP ECN field of | |||
this latest SYN); | this latest SYN). | |||
- MAY acknowledge a valid SYN arriving with (AE,CWR,ECE) = | - MAY acknowledge a valid SYN arriving with (AE,CWR,ECE) = | |||
(0,0,0) by sending a SYN/ACK with (AE,CWR,ECE) = (0,0,0); | (0,0,0) by sending a SYN/ACK with (AE,CWR,ECE) = (0,0,0). | |||
Rationale: When a SYN arrives with (AE,CWR,ECE) = (0,0,0) at a TCP | Rationale: When a SYN arrives with (AE,CWR,ECE) = (0,0,0) at a TCP | |||
Server that is already in AccECN mode, it implies that the TCP | Server that is already in AccECN mode, it implies that the TCP | |||
Client had probably not received the previous AccECN SYN/ACK | Client had probably not received the previous AccECN SYN/ACK | |||
emitted by the TCP Server. Therefore, the first bullet recommends | emitted by the TCP Server. Therefore, the first bullet recommends | |||
attempting at least one more AccECN SYN/ACK. Nonetheless, the | attempting at least one more AccECN SYN/ACK. Nonetheless, the | |||
second bullet recognizes that the Server might eventually need to | second bullet recognizes that the Server might eventually need to | |||
fall back to a non-ECN SYN/ACK. In either case, the TCP Server | fall back to a non-ECN SYN/ACK. In either case, the TCP Server | |||
remains in AccECN feedback mode (according to the earlier | remains in AccECN feedback mode (according to the earlier | |||
requirement not to switch modes). | requirement not to switch modes). | |||
* An AccECN-capable TCP Server already in Not ECN mode: | * An AccECN-capable TCP Server already in Not ECN mode: | |||
- SHOULD respond to any subsequent valid SYN using a SYN/ACK with | - SHOULD respond to any subsequent valid SYN using a SYN/ACK with | |||
(AE,CWR,ECE) = (0,0,0), even if the SYN is offering to | (AE,CWR,ECE) = (0,0,0), even if the SYN is offering to | |||
negotiate Classic ECN or AccECN feedback mode; | negotiate Classic ECN or AccECN feedback mode. | |||
Rationale: There would be no point in the Server offering any | Rationale: There would be no point in the Server offering any | |||
type of ECN feedback, because the Client will not be using ECN. | type of ECN feedback, because the Client will not be using ECN. | |||
However, there is no interoperability reason to make this rule | However, there is no interoperability reason to make this rule | |||
mandatory. | mandatory. | |||
If for any reason a host is not willing to provide ECN feedback on a | If for any reason a host is not willing to provide ECN feedback on a | |||
particular TCP connection, it SHOULD clear the AE, CWR, and ECE flags | particular TCP connection, it SHOULD clear the AE, CWR, and ECE flags | |||
in all SYN and/or SYN/ACK packets that it sends. | in all SYN and/or SYN/ACK packets that it sends. | |||
Sending ECT: | Sending ECT: | |||
* Any implementation that supports AccECN: | * Any implementation that supports AccECN: | |||
- MUST NOT set ECT if it is in Not ECN feedback mode. | - MUST NOT set ECT if it is in Not ECN feedback mode. | |||
A Data Sender in AccECN mode: | A Data Sender in AccECN mode: | |||
- SHOULD set an ECT codepoint in the IP header of packets to | - SHOULD set an ECT codepoint in the IP header of packets to | |||
indicate to the network that the transport is capable and | indicate to the network that the transport is capable and | |||
willing to participate in ECN for this packet; | willing to participate in ECN for this packet. | |||
- MAY not set ECT on any packet (for instance if it has reason to | - MAY not set ECT on any packet (for instance if it has reason to | |||
believe such a packet would be blocked); | believe such a packet would be blocked). | |||
A TCP Server in AccECN mode: | A TCP Server in AccECN mode: | |||
- MUST NOT set ECT on any packet for the rest of the connection, | - MUST NOT set ECT on any packet for the rest of the connection, | |||
if it has received or sent at least one valid SYN or Acceptable | if it has received or sent at least one valid SYN or Acceptable | |||
SYN/ACK with (AE,CWR,ECE) = (0,0,0) during the handshake. | SYN/ACK with (AE,CWR,ECE) = (0,0,0) during the handshake. | |||
This rule solely applies to a Server because, when a Server | This rule solely applies to a Server because, when a Server | |||
enters AccECN mode, it doesn't know for sure whether the Client | enters AccECN mode, it doesn't know for sure whether the Client | |||
will end up in AccECN mode. But when a Client enters AccECN | will end up in AccECN mode. But when a Client enters AccECN | |||
mode, it can be certain that the Server is already in AccECN | mode, it can be certain that the Server is already in AccECN | |||
feedback mode. | feedback mode. | |||
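The "Sending ECT" rules above boil down to a small decision. The sketch below is one possible reading, not the specification's own code. It models only Not ECN and AccECN feedback modes, and Classic ECN is deliberately left out. The names `FeedbackMode`, `should_set_ect`, and the boolean inputs are hypothetical. `expects_ect_blocked` stands in for whatever local heuristic gives an implementation "reason to believe such a packet would be blocked".

```python
from enum import Enum, auto


class FeedbackMode(Enum):
    # Only the two modes whose ECT rules appear above are modelled;
    # Classic ECN feedback mode is deliberately out of scope.
    ACCECN = auto()
    NOT_ECN = auto()


def should_set_ect(mode: FeedbackMode,
                   is_server: bool,
                   zero_flags_in_handshake: bool,
                   expects_ect_blocked: bool) -> bool:
    """Decide whether to set an ECT codepoint on an outgoing packet.

    zero_flags_in_handshake: a valid SYN or Acceptable SYN/ACK with
    (AE,CWR,ECE) = (0,0,0) was received or sent during the handshake.
    expects_ect_blocked: hypothetical local heuristic (assumption).
    """
    if mode is FeedbackMode.NOT_ECN:
        return False  # MUST NOT set ECT in Not ECN feedback mode
    if is_server and zero_flags_in_handshake:
        return False  # Server MUST NOT set ECT for the rest of the connection
    if expects_ect_blocked:
        return False  # MAY refrain from setting ECT
    return True       # SHOULD set ECT in AccECN mode
```

The Server-only condition reflects the rationale above: a Client that has entered AccECN mode already knows the Server is in AccECN mode, so the zero-flags check does not apply to it.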
Congestion response: | Congestion response: | |||
* A host in AccECN mode: | * A host in AccECN mode: | |||
- is obliged to respond appropriately to AccECN feedback that | - is obliged to respond appropriately to AccECN feedback that | |||
indicates there were ECN marks on packets it had previously | indicates there were ECN marks on packets it had previously | |||
sent, where 'appropriately' is defined in Section 6.1 of | sent, where 'appropriately' is defined in Section 6.1 of | |||
[RFC3168] and updated by Sections 2.1 and 4.1 of [RFC8311]; | [RFC3168] and updated by Sections 2.1 and 4.1 of [RFC8311]. | |||
- is still obliged to respond appropriately to congestion | - is still obliged to respond appropriately to congestion | |||
feedback, even when it is solely sending non-ECN-capable | feedback, even when it is solely sending non-ECN-capable | |||
packets (for rationale, some examples and some exceptions see | packets (for rationale, some examples and some exceptions see | |||
Sections 3.2.2.3 and 3.2.2.4). | Sections 3.2.2.3 and 3.2.2.4). | |||
- is still obliged to respond appropriately to congestion | - is still obliged to respond appropriately to congestion | |||
feedback, even if it has sent or received a SYN or SYN/ACK | feedback, even if it has sent or received a SYN or SYN/ACK | |||
packet with (AE,CWR,ECE) = (0,0,0) during the handshake; | packet with (AE,CWR,ECE) = (0,0,0) during the handshake. | |||
- MUST NOT set CWR to indicate that it has received and responded | - MUST NOT set CWR to indicate that it has received and responded | |||
to indications of congestion. | to indications of congestion. | |||
For the avoidance of doubt, this is unlike an RFC 3168 data | For the avoidance of doubt, this is unlike an RFC 3168 data | |||
sender and this does not preclude the Data Sender from setting | sender and this does not preclude the Data Sender from setting | |||
the bits of the ACE counter field, which includes an overloaded | the bits of the ACE counter field, which includes an overloaded | |||
use of the same bit. | use of the same bit. | |||
Receiving ECT: | Receiving ECT: | |||
* A host in AccECN mode: | * A host in AccECN mode: | |||
- MUST feed back the information in the IP-ECN field of incoming | - MUST feed back the information in the IP ECN field of incoming | |||
packets using Accurate ECN feedback, as specified in | packets using Accurate ECN feedback, as specified in | |||
Section 3.2. | Section 3.2. | |||
For the avoidance of doubt, this requirement stands even if the | For the avoidance of doubt, this requirement stands even if the | |||
AccECN host has also sent or received a SYN or SYN/ACK with | AccECN host has also sent or received a SYN or SYN/ACK with | |||
(AE,CWR,ECE) = (0,0,0). Reason: Such a SYN or SYN/ACK implies | (AE,CWR,ECE) = (0,0,0). Reason: Such a SYN or SYN/ACK implies | |||
some form of packet mangling might be present. Even if the | some form of packet mangling might be present. Even if the | |||
remote peer is not setting ECT, it could still be set | remote peer is not setting ECT, it could still be set | |||
erroneously by packet mangling at the IP layer (see | erroneously by packet mangling at the IP layer (see | |||
Section 3.2.2.3). In such cases, the Data Sender is best | Section 3.2.2.3). In such cases, the Data Sender is best | |||
placed to decide whether ECN markings are valid, but it can | placed to decide whether ECN markings are valid, but it can | |||
only do that if the Data Receiver mechanistically feeds back | only do that if the Data Receiver mechanistically feeds back | |||
any ECN markings. This approach will not lead to TCP Options | any ECN markings. This approach will not lead to TCP Options | |||
being generated unnecessarily if the recommended simple scheme | being generated unnecessarily if the recommended simple scheme | |||
in Section 3.2.3.3 is used, because no byte counters will | in Section 3.2.3.3 is used, because no byte counters will | |||
change if no packets are set to ECT. | change if no packets are set to ECT. | |||
- MUST NOT use reception of packets with ECT set in the IP-ECN | - MUST NOT use reception of packets with ECT set in the IP ECN | |||
field as an implicit signal that the peer is ECN-capable. | field as an implicit signal that the peer is ECN-capable. | |||
Reason: ECT at the IP layer does not explicitly confirm the | Reason: ECT at the IP layer does not explicitly confirm the | |||
peer has the correct ECN feedback logic, because the packets | peer has the correct ECN feedback logic, because the packets | |||
could have been mangled at the IP layer. | could have been mangled at the IP layer. | |||
3.2. AccECN Feedback | 3.2. AccECN Feedback | |||
Each Data Receiver of each half connection maintains four counters, | Each Data Receiver of each half-connection maintains four counters, | |||
r.cep, r.ceb, r.e0b, and r.e1b: | r.cep, r.ceb, r.e0b, and r.e1b: | |||
* The Data Receiver MUST increment the CE packet counter (r.cep), | * The Data Receiver MUST increment the CE packet counter (r.cep), | |||
for every Acceptable packet that it receives with the CE code | for every Acceptable packet that it receives with the CE code | |||
point in the IP-ECN field, including CE-marked control packets and | point in the IP ECN field, including CE-marked control packets and | |||
retransmissions but excluding CE on SYN packets (SYN=1; ACK=0). | retransmissions but excluding CE on SYN packets (SYN=1; ACK=0). | |||
* A Data Receiver that supports sending of AccECN TCP Options MUST | * A Data Receiver that supports sending of AccECN TCP Options MUST | |||
increment the r.ceb, r.e0b, or r.e1b byte counters by the number | increment the r.ceb, r.e0b, or r.e1b byte counters by the number | |||
of TCP payload octets in Acceptable packets marked with the CE, | of TCP payload octets in Acceptable packets marked with the CE, | |||
ECT(0), and ECT(1) codepoint in their IP-ECN field, including any | ECT(0), and ECT(1) codepoint in their IP ECN field, including any | |||
payload octets on control packets and retransmissions, but not | payload octets on control packets and retransmissions, but not | |||
including any payload octets on SYN packets (SYN=1; ACK=0). | including any payload octets on SYN packets (SYN=1; ACK=0). | |||
Each Data Sender of each half connection maintains four counters, | Each Data Sender of each half-connection maintains four counters, | |||
s.cep, s.ceb, s.e0b, and s.e1b, intended to track the equivalent | s.cep, s.ceb, s.e0b, and s.e1b, intended to track the equivalent | |||
counters at the Data Receiver. | counters at the Data Receiver. | |||
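The receiver-side counting rules above can be sketched as follows. This is illustrative only and assumes the caller has already decided that each packet is Acceptable. The initial r.cep value of 5 comes from the handshake rules later in this section. The zero initial values for the byte counters are placeholders, so a real implementation would follow the specification's counter-initialization rules instead. The handshake exception that a Client increments r.cep at most once for CE-marked SYN/ACKs is not modelled here. `ReceiverCounters` and `on_acceptable_packet` are hypothetical names.

```python
from dataclasses import dataclass

# IP ECN codepoints as defined in RFC 3168
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


@dataclass
class ReceiverCounters:
    cep: int = 5  # r.cep: initial value of 5
    # r.ceb, r.e0b, r.e1b: zero is a placeholder (assumption), not the
    # specification's initialization rule.
    ceb: int = 0
    e0b: int = 0
    e1b: int = 0

    def on_acceptable_packet(self, ip_ecn: int, payload_len: int,
                             syn: bool, ack: bool) -> None:
        """Update counters for one Acceptable packet, including control
        packets and retransmissions. The byte counters are only needed
        if the receiver supports sending AccECN TCP Options."""
        if syn and not ack:
            return  # SYN packets (SYN=1; ACK=0) are excluded
        if ip_ecn == CE:
            self.cep += 1
            self.ceb += payload_len
        elif ip_ecn == ECT0:
            self.e0b += payload_len
        elif ip_ecn == ECT1:
            self.e1b += payload_len
        # Not-ECT packets change no counter
```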
A Data Receiver feeds back the CE packet counter using the Accurate | A Data Receiver feeds back the CE packet counter using the Accurate | |||
ECN (ACE) field, as explained in Section 3.2.2. And it optionally | ECN (ACE) field, as explained in Section 3.2.2. And it optionally | |||
feeds back all the byte counters using the AccECN TCP Option, as | feeds back all the byte counters using the AccECN TCP Option, as | |||
specified in Section 3.2.3. | specified in Section 3.2.3. | |||
Whenever a Data Receiver feeds back the value of any counter, it MUST | Whenever a Data Receiver feeds back the value of any counter, it MUST | |||
report the most recent value, no matter whether it is in a pure ACK, | report the most recent value, no matter whether it is in a pure ACK, | |||
skipping to change at line 1200 ¶ | skipping to change at line 1201 ¶ | |||
Both parts of each of these conditions are equally important. For | Both parts of each of these conditions are equally important. For | |||
instance, even if AccECN negotiation has been successful, the ACE | instance, even if AccECN negotiation has been successful, the ACE | |||
field is not defined on any segments with SYN=1 (e.g., a | field is not defined on any segments with SYN=1 (e.g., a | |||
retransmission of an unacknowledged SYN/ACK, or when both ends send | retransmission of an unacknowledged SYN/ACK, or when both ends send | |||
SYN/ACKs after AccECN support has been successfully negotiated during | SYN/ACKs after AccECN support has been successfully negotiated during | |||
a simultaneous open). | a simultaneous open). | |||
3.2.2.1. ACE Field on the ACK of the SYN/ACK | 3.2.2.1. ACE Field on the ACK of the SYN/ACK | |||
A TCP Client (A) in AccECN mode MUST feed back which of the 4 | A TCP Client (A) in AccECN mode MUST feed back which of the 4 | |||
possible values of the IP-ECN field was on the SYN/ACK by writing it | possible values of the IP ECN field was on the SYN/ACK by writing it | |||
into the ACE field of a pure ACK with no SACK blocks using the binary | into the ACE field of a pure ACK with no SACK blocks using the binary | |||
encoding in Table 3 (which is the same as that used on the SYN/ACK in | encoding in Table 3 (which is the same as that used on the SYN/ACK in | |||
Table 2). This shall be called the handshake encoding of the ACE | Table 2). This shall be called the "handshake encoding" of the ACE | |||
field, and it is the only exception to the rule that the ACE field | field, and it is the only exception to the rule that the ACE field | |||
carries the 3 least significant bits of the r.cep counter on packets | carries the 3 least significant bits of the r.cep counter on packets | |||
with SYN=0. | with SYN=0. | |||
Normally, a TCP Client acknowledges a SYN/ACK with an ACK that | Normally, a TCP Client acknowledges a SYN/ACK with an ACK that | |||
satisfies the above conditions anyway (SYN=0, no data, no SACK | satisfies the above conditions anyway (SYN=0, no data, no SACK | |||
blocks). If an AccECN TCP Client intends to acknowledge the SYN/ACK | blocks). If an AccECN TCP Client intends to acknowledge the SYN/ACK | |||
with a packet that does not satisfy these conditions (e.g., it has | with a packet that does not satisfy these conditions (e.g., it has | |||
data to include on the ACK), it SHOULD first send a pure ACK that | data to include on the ACK), it SHOULD first send a pure ACK that | |||
does satisfy these conditions (see Section 5.2), so that it can feed | does satisfy these conditions (see Section 5.2), so that it can feed | |||
back which of the four values of the IP-ECN field arrived on the SYN/ | back which of the four values of the IP ECN field arrived on the SYN/ | |||
ACK. A valid exception to this "SHOULD" would be where the | ACK. A valid exception to this "SHOULD" would be where the | |||
implementation will only be used in an environment where mangling of | implementation will only be used in an environment where mangling of | |||
the ECN field is unlikely. | the ECN field is unlikely. | |||
The TCP Client MUST also use the handshake encoding for the pure ACK | The TCP Client MUST also use the handshake encoding for the pure ACK | |||
of any retransmitted SYN/ACK that confirms that the TCP Server | of any retransmitted SYN/ACK that confirms that the TCP Server | |||
supports AccECN. If the final ACK of the handshake does not arrive | supports AccECN. If the final ACK of the handshake does not arrive | |||
before its retransmission timer expires, the TCP Server is to follow the | before its retransmission timer expires, the TCP Server is to follow the | |||
procedure given in Section 3.1.4.2. | procedure given in Section 3.1.4.2. | |||
+==================+================+=====================+ | +==================+================+=====================+ | |||
| IP-ECN codepoint | ACE on pure | r.cep of TCP Client | | | IP ECN Codepoint | ACE on Pure | r.cep of TCP Client | | |||
| on SYN/ACK | ACK of SYN/ACK | in AccECN mode | | | on SYN/ACK | ACK of SYN/ACK | in AccECN Mode | | |||
+==================+================+=====================+ | +==================+================+=====================+ | |||
| Not-ECT | 0b010 | 5 | | | Not-ECT | 0b010 | 5 | | |||
+------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| ECT(1) | 0b011 | 5 | | | ECT(1) | 0b011 | 5 | | |||
+------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| ECT(0) | 0b100 | 5 | | | ECT(0) | 0b100 | 5 | | |||
+------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
| CE | 0b110 | 6 | | | CE | 0b110 | 6 | | |||
+------------------+----------------+---------------------+ | +------------------+----------------+---------------------+ | |||
Table 3: The Encoding of the ACE Field in the ACK of | Table 3: The Encoding of the ACE Field in the ACK of | |||
the SYN-ACK to Reflect the SYN-ACK's IP-ECN Field | the SYN-ACK to Reflect the SYN-ACK's IP ECN Field | |||
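Table 3 amounts to a fixed lookup on the Client side. A minimal sketch follows, where `TABLE_3` and `handshake_ace` are hypothetical names. It applies only to a pure ACK of the SYN/ACK (SYN=0, no data, no SACK blocks).

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11  # RFC 3168 codepoints

# Table 3: IP ECN codepoint on the SYN/ACK ->
#          (ACE on the pure ACK of the SYN/ACK, r.cep of the TCP Client)
TABLE_3 = {
    NOT_ECT: (0b010, 5),
    ECT1:    (0b011, 5),
    ECT0:    (0b100, 5),
    CE:      (0b110, 6),
}


def handshake_ace(synack_ip_ecn: int) -> int:
    """ACE value for a pure ACK (SYN=0, no data, no SACK) of the SYN/ACK."""
    ace, _r_cep = TABLE_3[synack_ip_ecn]
    return ace
```

None of the four values is 0b000 or 0b001, which Table 4 maps to its notes rather than to a codepoint.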
When an AccECN Server in SYN-RCVD state receives a pure ACK with | When an AccECN Server in SYN-RCVD state receives a pure ACK with | |||
SYN=0 and no SACK blocks, instead of treating the ACE field as a | SYN=0 and no SACK blocks, it MUST infer the meaning of each possible | |||
counter, it MUST infer the meaning of each possible value of the ACE | value of the ACE field from Table 4 instead of treating the ACE field | |||
field from Table 4, which also shows the value that an AccECN Server | as a counter. As a result, an AccECN Server MUST set s.cep to the | |||
MUST set s.cep to as a result. | respective value, also shown in Table 4. | |||
Given this encoding of the ACE field on the ACK of a SYN/ACK is | Given this encoding of the ACE field on the ACK of a SYN/ACK is | |||
exceptional, an AccECN Server using large receive offload (LRO) might | exceptional, an AccECN Server using large receive offload (LRO) might | |||
prefer to disable LRO until such an ACK has transitioned it out of | prefer to disable LRO until the ACK of the SYN/ACK was sent and it | |||
SYN-RCVD state. | has transitioned out of SYN-RCVD state. | |||
+============+==========================+=====================+ | +============+==========================+=====================+ | |||
| ACE on ACK | IP-ECN codepoint on SYN/ | s.cep of TCP Server | | | ACE on ACK | IP ECN Codepoint on SYN/ | s.cep of TCP Server | | |||
| of SYN/ACK | ACK inferred by Server | in AccECN mode | | | of SYN/ACK | ACK Inferred by Server | in AccECN Mode | | |||
+============+==========================+=====================+ | +============+==========================+=====================+ | |||
| 0b000 | {Notes 1, 3} | Disable s.cep | | | 0b000 | {Notes 1, 3} | Disable s.cep | | |||
+------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| 0b001 | {Notes 2, 3} | 5 | | | 0b001 | {Notes 2, 3} | 5 | | |||
+------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| 0b010 | Not-ECT | 5 | | | 0b010 | Not-ECT | 5 | | |||
+------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| 0b011 | ECT(1) | 5 | | | 0b011 | ECT(1) | 5 | | |||
+------------+--------------------------+---------------------+ | +------------+--------------------------+---------------------+ | |||
| 0b100 | ECT(0) | 5 | | | 0b100 | ECT(0) | 5 | | |||
skipping to change at line 1291 ¶ | skipping to change at line 1292 ¶ | |||
AccECN feedback. Nonetheless, as a Data Receiver, it MUST | AccECN feedback. Nonetheless, as a Data Receiver, it MUST | |||
NOT disable AccECN feedback. | NOT disable AccECN feedback. | |||
Any of the circumstances below could cause a value of zero | Any of the circumstances below could cause a value of zero | |||
but, whatever the cause, the actions above would be the | but, whatever the cause, the actions above would be the | |||
appropriate response: | appropriate response: | |||
* The TCP Client has somehow entered Not ECN feedback mode | * The TCP Client has somehow entered Not ECN feedback mode | |||
(most likely if the Server received a SYN or sent a SYN/ | (most likely if the Server received a SYN or sent a SYN/ | |||
ACK with (AE,CWR,ECE) = (0,0,0) after entering AccECN | ACK with (AE,CWR,ECE) = (0,0,0) after entering AccECN | |||
mode, but possible even if it didn't); | mode, but possible even if it didn't). | |||
* The TCP Client genuinely might be in AccECN mode, but its | * The TCP Client genuinely might be in AccECN mode, but its | |||
count of received CE marks might have caused the ACE | count of received CE marks might have caused the ACE | |||
field to wrap to zero. This is highly unlikely, but not | field to wrap to zero. This is highly unlikely, but not | |||
impossible because the Server might have already sent | impossible because the Server might have already sent | |||
multiple packets while still in SYN-RCVD state, e.g., | multiple packets while still in SYN-RCVD state, e.g., | |||
using TFO (see Section 5.2), and some might have been CE- | using TFO (see Section 5.2), and some might have been CE- | |||
marked. Then ACE on the first ACK seen by the Server | marked. Then ACE on the first ACK seen by the Server | |||
might be zero, due to previous ACKs experiencing an | might be zero, due to previous ACKs experiencing an | |||
unfortunate pattern of loss or delay. | unfortunate pattern of loss or delay. | |||
skipping to change at line 1354 ¶ | skipping to change at line 1355 ¶ | |||
* It then follows the safety procedures in Section 3.2.2.5.2 to | * It then follows the safety procedures in Section 3.2.2.5.2 to | |||
calculate or estimate how many packets the ACK could have | calculate or estimate how many packets the ACK could have | |||
acknowledged under the prevailing conditions to determine whether | acknowledged under the prevailing conditions to determine whether | |||
the ACE field might have wrapped more than once. | the ACE field might have wrapped more than once. | |||
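Outside the handshake encoding, the ACE field carries the 3 least significant bits of r.cep, so the Data Sender can only recover the increase in r.cep modulo 8. The sketch below shows that arithmetic under the stated assumption that fewer than 8 CE-marked packets arrived since s.cep was last brought into step. It does not implement the safety procedures referenced above, which exist to judge whether that assumption holds. Function names are hypothetical.

```python
ACE_MODULUS = 8  # the ACE field holds 3 bits


def ace_from_counter(r_cep: int) -> int:
    """Data Receiver: ACE carries the 3 least significant bits of r.cep."""
    return r_cep % ACE_MODULUS


def min_ce_delta(ace: int, s_cep: int) -> int:
    """Data Sender: smallest increase in r.cep consistent with the ACE value.

    Exact only if fewer than 8 CE-marked packets arrived since s.cep was
    last in step; otherwise the field has wrapped and this underestimates.
    """
    return (ace - s_cep) % ACE_MODULUS
```

For example, if s.cep is 5 and r.cep has reached 12, ACE is 4 and the sender infers an increase of 7. If r.cep had reached 14 instead, the true increase of 9 would appear as 1, which is exactly the ambiguity the safety procedures address.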
The encode/decode procedures during the three-way handshake are | The encode/decode procedures during the three-way handshake are | |||
exceptions to the general rules given so far, so they are spelled out | exceptions to the general rules given so far, so they are spelled out | |||
step by step below for clarity: | step by step below for clarity: | |||
* If a TCP Server in AccECN mode receives a CE mark in the IP-ECN | * If a TCP Server in AccECN mode receives a CE mark in the IP ECN | |||
field of a SYN (SYN=1, ACK=0), it MUST NOT increment r.cep (it | field of a SYN (SYN=1, ACK=0), it MUST NOT increment r.cep (it | |||
remains at its initial value of 5). | remains at its initial value of 5). | |||
Reason: It would be redundant for the Server to include CE-marked | Reason: It would be redundant for the Server to include CE-marked | |||
SYNs in its r.cep counter, because it already reliably delivers | SYNs in its r.cep counter, because it already reliably delivers | |||
feedback of any CE marking using the encoding in the top block of | feedback of any CE marking using the encoding in the top block of | |||
Table 2 in the SYN/ACK. This also ensures that, when the Server | Table 2 in the SYN/ACK. This also ensures that, when the Server | |||
starts using the ACE field, it has not unnecessarily consumed more | starts using the ACE field, it has not unnecessarily consumed more | |||
than one initial value, given they can be used to negotiate | than one initial value, given they can be used to negotiate | |||
variants of the AccECN protocol (see Appendix B.3). | variants of the AccECN protocol (see Appendix B.3). | |||
* If a TCP Client in AccECN mode receives CE feedback in the TCP | * If a TCP Client in AccECN mode receives CE feedback in the TCP | |||
flags of a SYN/ACK, it MUST NOT increment s.cep (it remains at its | flags of a SYN/ACK, it MUST NOT increment s.cep (it remains at its | |||
initial value of 5) so that it stays in step with r.cep on the | initial value of 5) so that it stays in step with r.cep on the | |||
Server. Nonetheless, the TCP Client still triggers the congestion | Server. Nonetheless, the TCP Client still triggers the congestion | |||
control actions necessary to respond to the CE feedback. | control actions necessary to respond to the CE feedback. | |||
* If a TCP Client in AccECN mode receives a CE mark in the IP-ECN | * If a TCP Client in AccECN mode receives a CE mark in the IP ECN | |||
field of a SYN/ACK, it MUST increment r.cep, but no more than once | field of a SYN/ACK, it MUST increment r.cep, but no more than once | |||
no matter how many CE-marked SYN/ACKs it receives (i.e., | no matter how many CE-marked SYN/ACKs it receives (i.e., | |||
incremented from 5 to 6, but no further). | incremented from 5 to 6, but no further). | |||
Reason: Incrementing r.cep ensures the Client will eventually | Reason: Incrementing r.cep ensures the Client will eventually | |||
deliver any CE marking to the Server reliably when it starts using | deliver any CE marking to the Server reliably when it starts using | |||
the ACE field. Even though the Client also feeds back any CE | the ACE field. Even though the Client also feeds back any CE | |||
marking on the ACK of the SYN/ACK using the encoding in Table 3, | marking on the ACK of the SYN/ACK using the encoding in Table 3, | |||
this ACK is not delivered reliably, so it can be considered as a | this ACK is not delivered reliably, so it can be considered as a | |||
timely notification that is redundant but unreliable. The Client | timely notification that is redundant but unreliable. The Client | |||
skipping to change at line 1417 ¶ | skipping to change at line 1418 ¶ | |||
ACK of the SYN/ACK) that is delayed for longer than the Server's | ACK of the SYN/ACK) that is delayed for longer than the Server's | |||
retransmission timeout; or packet duplication by the network. And | retransmission timeout; or packet duplication by the network. And | |||
the impact of any error in the feedback on such ACKs will only be | the impact of any error in the feedback on such ACKs will only be | |||
temporary. | temporary. | |||
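The Client-side steps in the handshake encode/decode list of this section can be sketched as below. This is an illustration only. `ClientHandshakeCounters` is a hypothetical name, `flags_report_ce` is an assumed input meaning the SYN/ACK's TCP flags (decoded per Table 2) report a CE-marked SYN, and `respond_to_congestion` stands in for the implementation's congestion-control hook.

```python
from typing import Callable

CE = 0b11  # RFC 3168 CE codepoint


class ClientHandshakeCounters:
    def __init__(self) -> None:
        self.r_cep = 5  # initial value of r.cep
        self.s_cep = 5  # initial value of s.cep
        self._counted_ce_synack = False

    def on_synack(self, ip_ecn: int, flags_report_ce: bool,
                  respond_to_congestion: Callable[[], None]) -> None:
        """Apply the Client-side handshake counter rules to one SYN/ACK."""
        if ip_ecn == CE and not self._counted_ce_synack:
            self.r_cep += 1  # 5 -> 6, but no further
            self._counted_ce_synack = True
        if flags_report_ce:
            # s.cep is not incremented, keeping it in step with the
            # Server's r.cep, which ignores CE on the SYN. The congestion
            # response is still triggered.
            respond_to_congestion()
```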
3.2.2.3. Testing for Mangling of the IP/ECN Field | 3.2.2.3. Testing for Mangling of the IP/ECN Field | |||
* TCP Client side: | * TCP Client side: | |||
The value of the TCP-ECN flags on the SYN/ACK indicates the value | The value of the TCP-ECN flags on the SYN/ACK indicates the value | |||
of the IP-ECN field when the SYN arrived at the Server. The TCP | of the IP ECN field when the SYN arrived at the Server. The TCP | |||
Client can compare this with how it originally set the IP-ECN | Client can compare this with how it originally set the IP ECN | |||
field on the SYN. If this comparison implies an invalid | field on the SYN. If this comparison implies an invalid | |||
transition (defined below) of the IP-ECN field, for the remainder | transition (defined below) of the IP ECN field, for the remainder | |||
of the half-connection the Client is advised to send non-ECN- | of the half-connection the Client is advised to send non-ECN- | |||
capable packets, but it still ought to respond to any feedback of | capable packets, but it still ought to respond to any feedback of | |||
CE markings (explained below). However, the TCP Client MUST | CE markings (explained below). However, the TCP Client MUST | |||
remain in the AccECN feedback mode and it MUST continue to feed | remain in the AccECN feedback mode and it MUST continue to feed | |||
back any ECN markings on arriving packets (in its role as Data | back any ECN markings on arriving packets (in its role as Data | |||
Receiver). | Receiver). | |||
* TCP Server side: | * TCP Server side: | |||
The value of the ACE field on the last ACK of the three-way | The value of the ACE field on the last ACK of the three-way | |||
handshake indicates the value of the IP-ECN field when the SYN/ACK | handshake indicates the value of the IP ECN field when the SYN/ACK | |||
arrived at the TCP Client. The Server can compare this with how | arrived at the TCP Client. The Server can compare this with how | |||
it originally set the IP-ECN field on the SYN/ACK. If this | it originally set the IP ECN field on the SYN/ACK. If this | |||
comparison implies an invalid transition of the IP-ECN field, for | comparison implies an invalid transition of the IP ECN field, for | |||
the remainder of the half-connection the Server is advised to send | the remainder of the half-connection the Server is advised to send | |||
non-ECN-capable packets, but it still ought to respond to any | non-ECN-capable packets, but it still ought to respond to any | |||
feedback of CE markings (explained below). However, the Server | feedback of CE markings (explained below). However, the Server | |||
MUST remain in the AccECN feedback mode and it MUST continue to | MUST remain in the AccECN feedback mode and it MUST continue to | |||
feed back any ECN markings on arriving packets (in its role as | feed back any ECN markings on arriving packets (in its role as | |||
Data Receiver). | Data Receiver). | |||
If a Data Sender in AccECN mode starts sending non-ECN-capable | If a Data Sender in AccECN mode starts sending non-ECN-capable | |||
packets because it has detected mangling, it is still advised to | packets because it has detected mangling, it is still advised to | |||
respond to CE feedback. Reason: Any CE marking arriving at the Data | respond to CE feedback. Reason: Any CE marking arriving at the Data | |||
Receiver could be due to something early in the path mangling the | Receiver could be due to something early in the path mangling the | |||
non-ECN-capable IP-ECN field into an ECN-capable codepoint and then, | non-ECN-capable IP ECN field into an ECN-capable codepoint and then, | |||
later in the path, a network bottleneck might be applying CE markings | later in the path, a network bottleneck might be applying CE markings | |||
to indicate genuine congestion. This argument applies whether the | to indicate genuine congestion. This argument applies whether the | |||
handshake packet originally sent by the TCP Client or Server was non- | handshake packet originally sent by the TCP Client or Server was non- | |||
ECN-capable or ECN-capable because, in either case, an unsafe | ECN-capable or ECN-capable because, in either case, an unsafe | |||
transition could imply that non-ECN-capable packets later in the | transition could imply that non-ECN-capable packets later in the | |||
connection might get mangled. | connection might get mangled. | |||
Once a Data Sender has entered AccECN mode it is advised to check | Once a Data Sender has entered AccECN mode it is advised to check | |||
whether it is receiving continuous feedback of CE. Specifying | whether it is receiving continuous feedback of CE. Specifying | |||
exactly how to do this is beyond the scope of the present | exactly how to do this is beyond the scope of the present | |||
skipping to change at line 1483 ¶ | skipping to change at line 1484 ¶ | |||
As always, once a host has entered AccECN mode, it follows the | As always, once a host has entered AccECN mode, it follows the | |||
general mandatory requirements (Section 3.1.5) to remain in the same | general mandatory requirements (Section 3.1.5) to remain in the same | |||
feedback mode and to continue feeding back any ECN markings on | feedback mode and to continue feeding back any ECN markings on | |||
arriving packets using AccECN feedback. This follows the general | arriving packets using AccECN feedback. This follows the general | |||
approach where an AccECN Data Receiver mechanistically reflects | approach where an AccECN Data Receiver mechanistically reflects | |||
whatever it receives (Section 2.5). | whatever it receives (Section 2.5). | |||
The ACK of the SYN/ACK is not reliably delivered (nonetheless, the | The ACK of the SYN/ACK is not reliably delivered (nonetheless, the | |||
count of CE marks is still eventually delivered reliably). If this | count of CE marks is still eventually delivered reliably). If this | |||
ACK does not arrive, the Server is advised to continue to send ECN- | ACK does not arrive, the Server is advised to continue to send ECN- | |||
capable packets without having tested for mangling of the IP-ECN | capable packets without having tested for mangling of the IP ECN | |||
field on the SYN/ACK. | field on the SYN/ACK. | |||
All the fall-back behaviours in this section are necessary in case | All the fall-back behaviours in this section are necessary in case | |||
mangling of the IP-ECN field is asymmetric, which is currently common | mangling of the IP ECN field is asymmetric, which is currently common | |||
over some mobile networks [Mandalari18]. In this case, one end might | over some mobile networks [Mandalari18]. In this case, one end might | |||
see no unsafe transition and continue sending ECN-capable packets, | see no unsafe transition and continue sending ECN-capable packets, | |||
while the other end sees an unsafe transition and stops sending ECN- | while the other end sees an unsafe transition and stops sending ECN- | |||
capable packets. | capable packets. | |||
Invalid transitions of the IP-ECN field are defined in Section 18 of | Invalid transitions of the IP ECN field are defined in Section 18 of | |||
the Classic ECN specification [RFC3168] and repeated here for | the Classic ECN specification [RFC3168] and repeated here for | |||
convenience: | convenience: | |||
* the Not-ECT codepoint changes; | * the Not-ECT codepoint changes. | |||
* either ECT codepoint transitions to Not-ECT; | * either ECT codepoint transitions to Not-ECT. | |||
* the CE codepoint changes. | * the CE codepoint changes. | |||
RFC 3168 says that a router that changes ECT to Not-ECT is invalid | RFC 3168 says that a router that changes ECT to Not-ECT is invalid | |||
but safe. However, from a host's viewpoint, this transition is | but safe. However, from a host's viewpoint, this transition is | |||
unsafe because it could be the result of two transitions at different | unsafe because it could be the result of two transitions at different | |||
routers on the path: ECT to CE (safe) then CE to Not-ECT (unsafe). | routers on the path: ECT to CE (safe) then CE to Not-ECT (unsafe). | |||
This scenario could well happen where an ECN-enabled home router | This scenario could well happen where an ECN-enabled home router | |||
congests its upstream mobile broadband bottleneck link, then the | congests its upstream mobile broadband bottleneck link, then the | |||
ingress to the mobile network clears the ECN field [Mandalari18]. | ingress to the mobile network clears the ECN field [Mandalari18]. | |||
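The three invalid transitions listed above, taken from a host's viewpoint so that ECT to Not-ECT counts as unsafe, can be checked with a small predicate. A minimal sketch follows, where `is_invalid_transition` is a hypothetical name. The Client would compare how it set the IP ECN field on the SYN with the value reported by the SYN/ACK's TCP-ECN flags. The Server would compare its SYN/ACK's codepoint with the value inferred from the ACE field via Table 4.

```python
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11  # RFC 3168 codepoints


def is_invalid_transition(sent: int, observed: int) -> bool:
    """True if sent -> observed is an invalid transition of the IP ECN field,
    per the three rules listed above (host viewpoint)."""
    if sent == NOT_ECT:
        return observed != NOT_ECT  # the Not-ECT codepoint changes
    if sent in (ECT0, ECT1):
        return observed == NOT_ECT  # an ECT codepoint becomes Not-ECT
    return observed != CE           # the CE codepoint changes
```

On an invalid result, the host is advised to send non-ECN-capable packets for the rest of the half-connection while still responding to CE feedback and staying in AccECN mode.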
skipping to change at line 1531 ¶ | skipping to change at line 1532 ¶ | |||
If AccECN has been successfully negotiated, the Data Sender MAY check | If AccECN has been successfully negotiated, the Data Sender MAY check | |||
the value of the ACE counter in the first feedback packet (with or | the value of the ACE counter in the first feedback packet (with or | |||
without data) that arrives after the three-way handshake. If the | without data) that arrives after the three-way handshake. If the | |||
value of this ACE field is found to be zero (0b000), for the | value of this ACE field is found to be zero (0b000), for the | |||
remainder of the half-connection the Data Sender ought to send non- | remainder of the half-connection the Data Sender ought to send non- | |||
ECN-capable packets and it is advised not to respond to any feedback | ECN-capable packets and it is advised not to respond to any feedback | |||
of CE markings. | of CE markings. | |||
Reason: the symptoms imply any or all of the following: | Reason: the symptoms imply any or all of the following: | |||
* the remote peer has somehow entered Not ECN feedback mode; | * the remote peer has somehow entered Not ECN feedback mode. | |||
* a broken remote TCP implementation; | * a broken remote TCP implementation. | |||
* potential mangling of the ECN fields in the TCP headers (although | * potential mangling of the ECN fields in the TCP headers (although | |||
unlikely given they clearly survived during the handshake). | unlikely given they clearly survived during the handshake). | |||
This advice is not stated normatively (in capitals), because the best | This advice is not stated normatively (in capitals), because the best | |||
strategy might depend on experience of the most likely scenarios, | strategy might depend on the likelihood to experience these | |||
which can only be known at the time of deployment. | scenarios, which can only be known at the time of deployment. | |||
Note that a host in AccECN mode MUST continue to provide Accurate ECN | Note that a host in AccECN mode MUST continue to provide Accurate ECN | |||
feedback to its peer, even if it is no longer sending ECT itself over | feedback to its peer, even if it is no longer sending ECT itself over | |||
the other half connection. | the other half-connection. | |||
If reordering occurs, the first feedback packet that arrives will not | If reordering occurs, the first feedback packet that arrives will not | |||
necessarily be the same as the first packet in sequence order. The | necessarily be the same as the first packet in sequence order. The | |||
test has been specified loosely like this to simplify implementation, | test has been specified loosely like this to simplify implementation, | |||
and because it would not have been any more precise to have specified | and because it would not have been any more precise to have specified | |||
the first packet in sequence order, which would not necessarily be | the first packet in sequence order, which would not necessarily be | |||
the first ACE counter that the Data Receiver fed back anyway, given | the first ACE counter that the Data Receiver fed back anyway, given | |||
it might have been a retransmission. | it might have been a retransmission. | |||
The possibility of reordering means that there is a small chance that | The possibility of reordering means that there is a small chance that | |||
the ACE field on the first packet to arrive is genuinely zero | the ACE field on the first packet to arrive is genuinely zero | |||
(without middlebox interference). This would cause a host to | (without middlebox interference). This would cause a host to | |||
unnecessarily disable ECN for a half connection. Therefore, in | unnecessarily disable ECN for a half-connection. Therefore, in | |||
environments where there is no evidence of the ACE field being | environments where there is no evidence of the ACE field being | |||
zeroed, implementations MAY skip this test. | zeroed, implementations MAY skip this test. | |||
Note that the Data Sender MUST NOT test whether the arriving counter | Note that the Data Sender MUST NOT test whether the arriving counter | |||
in the initial ACE field has been initialized to a specific valid | in the initial ACE field has been initialized to a specific valid | |||
value -- the above check solely tests whether the ACE fields have | value -- the above check solely tests whether the ACE fields have | |||
been incorrectly zeroed. This allows hosts to use different initial | been incorrectly zeroed. This allows hosts to use different initial | |||
values as an additional signalling channel in the future. | values as an additional signalling channel in the future. | |||
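The optional zeroing test on the first feedback packet might be tracked per half-connection roughly as follows. This is a hypothetical Python sketch; the class and attribute names are illustrative only. It tests solely for 0b000, never for a specific initial value, and it does not touch the AccECN feedback that the host keeps providing on the other half-connection.

```python
class AceZeroCheck:
    """Hypothetical per-half-connection state for the optional ACE zeroing test."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled      # MAY be disabled where no zeroing has been seen
        self.checked = False        # only the first feedback packet is tested
        self.send_ect = True        # whether outgoing packets stay ECN-capable
        self.respond_to_ce = True   # advised (not required) to stop responding

    def on_feedback(self, ace: int) -> None:
        if not 0 <= ace <= 0b111:
            raise ValueError("ACE is a 3-bit field")
        if self.checked or not self.enabled:
            return
        self.checked = True
        # Solely a zeroing test: any non-zero initial value is accepted, so
        # hosts remain free to use other initial values in future.
        if ace == 0b000:
            self.send_ect = False
            self.respond_to_ce = False
```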
3.2.2.5. Safety Against Ambiguity of the ACE Field | 3.2.2.5. Safety Against Ambiguity of the ACE Field | |||
skipping to change at line 1585 ¶ | skipping to change at line 1586 ¶ | |||
The following rules define when the receiver of a packet in AccECN | The following rules define when the receiver of a packet in AccECN | |||
mode emits an ACK: | mode emits an ACK: | |||
Change-Triggered ACKs: An AccECN Data Receiver SHOULD emit an ACK | Change-Triggered ACKs: An AccECN Data Receiver SHOULD emit an ACK | |||
whenever a data packet marked CE arrives after the previous packet | whenever a data packet marked CE arrives after the previous packet | |||
was not CE. | was not CE. | |||
Even though this rule is stated as a "SHOULD", it is important for | Even though this rule is stated as a "SHOULD", it is important for | |||
a transition to trigger an ACK if at all possible. The only valid | a transition to trigger an ACK if at all possible. The only valid | |||
exception to this rule is given below these bullets. | exception to this rule is due to large receive offload (LRO) or | |||
generic receive offload (GRO) as further described below. | ||||
For the avoidance of doubt, this rule is deliberately worded to | For the avoidance of doubt, this rule is deliberately worded to | |||
apply solely when _data_ packets arrive, but the comparison with | apply solely when _data_ packets arrive, but the comparison with | |||
the previous packet includes any packet, not just data packets. | the previous packet includes any packet, not just data packets. | |||
Increment-Triggered ACKs: An AccECN receiver of a packet MUST emit | Increment-Triggered ACKs: An AccECN receiver of a packet MUST emit | |||
an ACK if 'n' CE marks have arrived since the previous ACK. If | an ACK if 'n' CE marks have arrived since the previous ACK. If | |||
there is unacknowledged data at the receiver, 'n' SHOULD be 2. If | there is unacknowledged data at the receiver, 'n' SHOULD be 2. If | |||
there is no unacknowledged data at the receiver, 'n' SHOULD be 3 | there is no unacknowledged data at the receiver, 'n' SHOULD be 3 | |||
and MUST be no less than 3. In either case, 'n' MUST be no | and MUST be no less than 3. In either case, 'n' MUST be no | |||
skipping to change at line 1723 ¶ | skipping to change at line 1725 ¶ | |||
Figure 4 shows two option field orders; order 0 and order 1. They | Figure 4 shows two option field orders; order 0 and order 1. They | |||
both consist of three 24-bit fields. Order 0 provides the 24 least | both consist of three 24-bit fields. Order 0 provides the 24 least | |||
significant bits of the r.e0b, r.ceb, and r.e1b counters, | significant bits of the r.e0b, r.ceb, and r.e1b counters, | |||
respectively. Order 1 provides the same fields, but in the opposite | respectively. Order 1 provides the same fields, but in the opposite | |||
order. On each packet, the Data Receiver can use whichever order is | order. On each packet, the Data Receiver can use whichever order is | |||
more efficient. In either case, the bytes within the fields are in | more efficient. In either case, the bytes within the fields are in | |||
network byte order (big-endian). | network byte order (big-endian). | |||
The choice to use three-byte (24-bit) fields in the options was | The choice to use three-byte (24-bit) fields in the options was | |||
made to strike a balance between TCP option space usage, and the | made to strike a balance between TCP Option space usage, and the | |||
required fidelity of the counters to accommodate typical scenarios | required fidelity of the counters to accommodate typical scenarios | |||
such as hardware TCP Segmentation Offloading (TSO), and periods | such as hardware TCP Segmentation Offloading (TSO), and periods | |||
during which no option may be transmitted (e.g., SACK loss recovery). | during which no option may be transmitted (e.g., SACK loss recovery). | |||
Providing only 2 bytes (16 bits) for these counters could easily roll | Providing only 2 bytes (16 bits) for these counters could easily roll | |||
over within a single TSO transmission or large/generic receive | over within a single TSO transmission or large/generic receive | |||
offload (LRO/GRO) event. Having two distinct orderings further | offload (LRO/GRO) event. Having two distinct orderings further | |||
allows the transmission of the most pertinent changes in an | allows the transmission of the most pertinent changes in an | |||
abbreviated option (see below). | abbreviated option (see below). | |||
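A minimal Python sketch of building an option from the three byte counters and recovering counter increases from the 24-bit fields. The `kind` argument is left to the caller, because the Kind values are not given in this excerpt, and the function names are hypothetical. Fields are truncated from the end, and each field carries the 24 least significant bits of its counter in network byte order.

```python
from typing import Mapping

ORDER0 = ("e0b", "ceb", "e1b")   # EE0B, ECEB, EE1B
ORDER1 = ("e1b", "ceb", "e0b")   # EE1B, ECEB, EE0B
MASK24 = (1 << 24) - 1


def encode_accecn_option(kind: int, order: int, counters: Mapping[str, int],
                         n_fields: int = 3) -> bytes:
    """Kind, Length, then the first n_fields 24-bit fields of the chosen order."""
    if order not in (0, 1):
        raise ValueError("order must be 0 or 1")
    if not 0 <= n_fields <= 3:
        raise ValueError("at most three fields")
    names = (ORDER0 if order == 0 else ORDER1)[:n_fields]
    # Keep only the 24 least significant bits, big-endian (network byte order).
    body = b"".join((counters[name] & MASK24).to_bytes(3, "big") for name in names)
    return bytes([kind, 2 + len(body)]) + body


def counter_delta(new_field: int, old_field: int) -> int:
    """Increase of a byte counter between two 24-bit field values, modulo 2^24."""
    return (new_field - old_field) & MASK24
```

Because the fields carry only 24 bits, a Data Sender compares successive values modulo 2^24, which is why `counter_delta(0x000002, 0xFFFFFF)` yields 3.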
When a Data Receiver sends an AccECN Option, it MUST set the Kind | When a Data Receiver sends an AccECN Option, it MUST set the Kind | |||
skipping to change at line 1862 ¶ | skipping to change at line 1864 ¶ | |||
AccECN Options. To expedite connection setup in deployment scenarios | AccECN Options. To expedite connection setup in deployment scenarios | |||
where AccECN path traversal might be problematic, the TCP Server | where AccECN path traversal might be problematic, the TCP Server | |||
SHOULD retransmit the SYN/ACK, but with no AccECN Option. If this | SHOULD retransmit the SYN/ACK, but with no AccECN Option. If this | |||
retransmission times out, to expedite connection setup, the TCP | retransmission times out, to expedite connection setup, the TCP | |||
Server SHOULD retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0) and | Server SHOULD retransmit the SYN/ACK with (AE,CWR,ECE) = (0,0,0) and | |||
no AccECN Option, but it remains in AccECN feedback mode (per | no AccECN Option, but it remains in AccECN feedback mode (per | |||
Section 3.1.5). | Section 3.1.5). | |||
| Note that a retransmitted AccECN SYN/ACK will not necessarily | | Note that a retransmitted AccECN SYN/ACK will not necessarily | |||
| have the same TCP-ECN flags as the original SYN/ACK, because it | | have the same TCP-ECN flags as the original SYN/ACK, because it | |||
| feeds back the IP-ECN field of the latest SYN to have arrived | | feeds back the IP ECN field of the latest SYN to have arrived | |||
| (by the rule in Section 3.1.5). | | (by the rule in Section 3.1.5). | |||
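The SYN/ACK fall-back sequence described above can be summarized as a function of the transmission attempt. This is a hypothetical Python sketch: `feedback_flags` stands for the (AE,CWR,ECE) values the Server would derive from the latest SYN by the rule in Section 3.1.5, which the sketch does not compute, and it assumes, as the preceding text implies, that the original SYN/ACK carried an AccECN Option.

```python
from typing import NamedTuple, Tuple


class SynAckPlan(NamedTuple):
    include_accecn_option: bool
    tcp_ecn_flags: Tuple[int, int, int]   # (AE, CWR, ECE)


def plan_synack(attempt: int, feedback_flags: Tuple[int, int, int]) -> SynAckPlan:
    """attempt 0 = original SYN/ACK, 1 = first retransmission, 2+ = later ones."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    if attempt == 0:
        return SynAckPlan(True, feedback_flags)
    if attempt == 1:
        # SHOULD retransmit the SYN/ACK without the AccECN Option.
        return SynAckPlan(False, feedback_flags)
    # If that retransmission also times out, SHOULD send (0,0,0) and no
    # option; the Server nonetheless remains in AccECN feedback mode.
    return SynAckPlan(False, (0, 0, 0))
```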
The above fall-back approach limits any interference by middleboxes | The above fall-back approach limits any interference by middleboxes | |||
that might drop packets with unknown options, even though it is more | that might drop packets with unknown options, even though it is more | |||
likely that SYN/ACK loss is due to congestion. The TCP Server MAY | likely that SYN/ACK loss is due to congestion. The TCP Server MAY | |||
try to send another packet with an AccECN Option at a later point | try to send another packet with an AccECN Option at a later point | |||
during the connection but it ought to monitor if that packet got lost | during the connection but it ought to monitor if that packet got lost | |||
as well, in which case it SHOULD disable the sending of AccECN | as well, in which case it SHOULD disable the sending of AccECN | |||
Options for this half-connection. | Options for this half-connection. | |||
skipping to change at line 1922 ¶ | skipping to change at line 1924 ¶ | |||
packets carried an AccECN Option and disable the sending of AccECN | packets carried an AccECN Option and disable the sending of AccECN | |||
Options if the loss probability of those packets is significantly | Options if the loss probability of those packets is significantly | |||
higher than that of all other data packets in the same connection. | higher than that of all other data packets in the same connection. | |||
3.2.3.2.3. Testing for Absence of the AccECN Option | 3.2.3.2.3. Testing for Absence of the AccECN Option | |||
If the TCP Client has successfully negotiated AccECN but does not | If the TCP Client has successfully negotiated AccECN but does not | |||
receive an AccECN Option on the SYN/ACK (e.g., because it has been | receive an AccECN Option on the SYN/ACK (e.g., because it has been | |||
stripped by a middlebox or not sent by the Server), the Client | stripped by a middlebox or not sent by the Server), the Client | |||
switches into a mode that assumes that the AccECN Option is not | switches into a mode that assumes that the AccECN Option is not | |||
available for this half connection. | available for this half-connection. | |||
Similarly, if the TCP Server has successfully negotiated AccECN but | Similarly, if the TCP Server has successfully negotiated AccECN but | |||
does not receive an AccECN Option on the first segment that | does not receive an AccECN Option on the first segment that | |||
acknowledges sequence space at least covering the ISN, it switches | acknowledges sequence space at least covering the ISN, it switches | |||
into a mode that assumes that the AccECN Option is not available for | into a mode that assumes that the AccECN Option is not available for | |||
this half connection. | this half-connection. | |||
While a host is in this mode that assumes incoming AccECN Options are | While a host is in this mode that assumes incoming AccECN Options are | |||
not available, it MUST adopt the conservative interpretation of the | not available, it MUST adopt the conservative interpretation of the | |||
ACE field discussed in Section 3.2.2.5. However, it cannot make any | ACE field discussed in Section 3.2.2.5. However, it cannot make any | |||
assumption about support of outgoing AccECN Options on the other half | assumption about support of outgoing AccECN Options on the other | |||
connection, so it SHOULD continue to send AccECN Options itself | half-connection, so it SHOULD continue to send AccECN Options itself | |||
(unless it has established that sending AccECN Options is causing | (unless it has established that sending AccECN Options is causing | |||
packets to be blocked as in Section 3.2.3.2.2). | packets to be blocked as in Section 3.2.3.2.2). | |||
If a host is in the mode that assumes incoming AccECN Options are not | If a host is in the mode that assumes incoming AccECN Options are not | |||
available, but it receives an AccECN Option at any later point during | available, but it receives an AccECN Option at any later point during | |||
the connection, this clearly indicates that AccECN Options are no | the connection, this clearly indicates that AccECN Options are no | |||
longer blocked on the respective path, and the AccECN endpoint MAY | longer blocked on the respective path, and the AccECN endpoint MAY | |||
switch out of the mode that assumes AccECN Options are not available | switch out of the mode that assumes AccECN Options are not available | |||
for this half connection. | for this half-connection. | |||
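The mode switches for an absent incoming AccECN Option can be sketched as per-half-connection state. This is a hypothetical Python illustration: it elects to leave the mode when an option later arrives, which the text permits (MAY) but does not require, and it keeps the outgoing decision separate, because absence of incoming options says nothing about the other half-connection.

```python
class IncomingOptionState:
    """Hypothetical tracking of incoming AccECN Options for one half-connection."""

    def __init__(self) -> None:
        self.options_unavailable = False   # mode: assume no incoming options
        self.send_options = True           # outgoing half-connection, independent

    def on_first_feedback(self, has_accecn_option: bool) -> None:
        # Client: the SYN/ACK. Server: the first segment acknowledging
        # sequence space that at least covers the ISN.
        if not has_accecn_option:
            self.options_unavailable = True

    def on_later_segment(self, has_accecn_option: bool) -> None:
        # An option arriving later shows options are no longer blocked.
        if has_accecn_option and self.options_unavailable:
            self.options_unavailable = False

    @property
    def use_conservative_ace(self) -> bool:
        # MUST use the conservative ACE interpretation (Section 3.2.2.5).
        return self.options_unavailable
```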
3.2.3.2.4. Test for Zeroing of the AccECN Option | 3.2.3.2.4. Test for Zeroing of the AccECN Option | |||
For a related test for invalid initialization of the ACE field, see | For a related test for invalid initialization of the ACE field, see | |||
Section 3.2.2.4. | Section 3.2.2.4. | |||
Section 3.2.1 required the Data Receiver to initialize the r.e0b and | Section 3.2.1 required the Data Receiver to initialize the r.e0b and | |||
r.e1b counters to a non-zero value. Therefore, in either direction | r.e1b counters to a non-zero value. Therefore, in either direction | |||
the initial value of the EE0B field or EE1B field in an AccECN Option | the initial value of the EE0B field or EE1B field in an AccECN Option | |||
(if one exists) ought to be non-zero. If AccECN has been negotiated: | (if one exists) ought to be non-zero. If AccECN has been negotiated: | |||
* the TCP Server MAY check that the initial value of the EE0B field | * the TCP Server MAY check that the initial value of the EE0B field | |||
or the EE1B field is non-zero in the first segment that | or the EE1B field is non-zero in the first segment that | |||
acknowledges sequence space that at least covers the ISN plus 1. | acknowledges sequence space that at least covers the ISN plus 1. | |||
If it runs a test and either initial value is zero, the Server | If it runs a test and either initial value is zero, the Server | |||
will switch into a mode that ignores AccECN Options for this half | will switch into a mode that ignores AccECN Options for this half- | |||
connection. | connection. | |||
* the TCP Client MAY check that the initial value of the EE0B field | * the TCP Client MAY check that the initial value of the EE0B field | |||
or the EE1B field is non-zero on the SYN/ACK. If it runs a test | or the EE1B field is non-zero on the SYN/ACK. If it runs a test | |||
and either initial value is zero, the Client will switch into a | and either initial value is zero, the Client will switch into a | |||
mode that ignores AccECN Options for this half connection. | mode that ignores AccECN Options for this half-connection. | |||
While a host is in the mode that ignores AccECN Options, it MUST | While a host is in the mode that ignores AccECN Options, it MUST | |||
adopt the conservative interpretation of the ACE field discussed in | adopt the conservative interpretation of the ACE field discussed in | |||
Section 3.2.2.5. | Section 3.2.2.5. | |||
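A possible form of the optional zeroing test, in Python. This is a sketch that assumes the caller has already parsed the initial AccECN Option and passes `None` for a field omitted from a truncated option; the function name is hypothetical.

```python
from typing import Optional


def accecn_option_zeroed(ee0b: Optional[int], ee1b: Optional[int]) -> bool:
    """True if an EE0B or EE1B field present in the initial option is zero.

    None means the field was absent; only zeroing is tested, never a
    specific initial value.
    """
    return any(field == 0 for field in (ee0b, ee1b) if field is not None)
```

A host for which this returns True would switch into the mode that ignores AccECN Options and adopt the conservative interpretation of the ACE field.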
Note that the Data Sender MUST NOT test whether the arriving byte | Note that the Data Sender MUST NOT test whether the arriving byte | |||
counters in an initial AccECN Option have been initialized to | counters in an initial AccECN Option have been initialized to | |||
specific valid values -- the above checks solely test whether these | specific valid values -- the above checks solely test whether these | |||
fields have been incorrectly zeroed. This allows hosts to use | fields have been incorrectly zeroed. This allows hosts to use | |||
different initial values as an additional signalling channel in the | different initial values as an additional signalling channel in the | |||
skipping to change at line 2006 ¶ | skipping to change at line 2008 ¶ | |||
could also occur if a middlebox mangled an AccECN Option but not the | could also occur if a middlebox mangled an AccECN Option but not the | |||
ACE field. However, the Data Sender has to assume that the integrity | ACE field. However, the Data Sender has to assume that the integrity | |||
of AccECN Options is sound, based on the above test of the well-known | of AccECN Options is sound, based on the above test of the well-known | |||
initial values and optionally other integrity tests (Section 5.3). | initial values and optionally other integrity tests (Section 5.3). | |||
If either endpoint detects that the s.ceb counter has increased but | If either endpoint detects that the s.ceb counter has increased but | |||
the s.cep has not (and by testing ACK coverage it is certain how much | the s.cep has not (and by testing ACK coverage it is certain how much | |||
the ACE field has wrapped), and if there is no explanation other than | the ACE field has wrapped), and if there is no explanation other than | |||
an invalid protocol transition due to some form of feedback mangling, | an invalid protocol transition due to some form of feedback mangling, | |||
the Data Sender MUST disable sending ECN-capable packets for the | the Data Sender MUST disable sending ECN-capable packets for the | |||
remainder of the half-connection by setting the IP-ECN field in all | remainder of the half-connection by setting the IP ECN field in all | |||
subsequent packets to Not-ECT. | subsequent packets to Not-ECT. | |||
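The test above relies on knowing how far the 3-bit ACE field has wrapped. This is a hypothetical Python sketch, assuming the caller has already established the number of wraps from ACK coverage and indicates whether any explanation other than feedback mangling exists.

```python
ACE_MODULUS = 8  # the ACE field is a 3-bit counter


def cep_increase(ace_old: int, ace_new: int, known_wraps: int) -> int:
    """Increase in s.cep implied by two ACE values and a known wrap count."""
    return (ace_new - ace_old) % ACE_MODULUS + ACE_MODULUS * known_wraps


def must_disable_ect(delta_ceb: int, ace_old: int, ace_new: int,
                     known_wraps: int, other_explanation: bool = False) -> bool:
    """True if CE bytes were fed back but no CE-marked packets were."""
    if other_explanation:
        return False
    return delta_ceb > 0 and cep_increase(ace_old, ace_new, known_wraps) == 0
```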
3.2.3.3. Usage of the AccECN TCP Option | 3.2.3.3. Usage of the AccECN TCP Option | |||
If a Data Receiver in AccECN mode intends to use AccECN TCP Options | If a Data Receiver in AccECN mode intends to use AccECN TCP Options | |||
to provide feedback, the rules below determine when to include an | to provide feedback, the rules below determine when to include an | |||
AccECN TCP Option, and which fields to include, given other options | AccECN TCP Option, and which fields to include, given other options | |||
might be competing for limited option space: | might be competing for limited option space: | |||
Importance of Congestion Control: AccECN is for congestion control, | Importance of Congestion Control: AccECN is for congestion control, | |||
which implementations SHOULD generally prioritize over other TCP | which implementations SHOULD generally prioritize over other TCP | |||
options when there is insufficient space for all the options in | Options when there is insufficient space for all the options in | |||
use. | use. | |||
If SACK has been negotiated [RFC2018], and the smallest | If SACK has been negotiated [RFC2018], and the smallest | |||
recommended AccECN Option would leave insufficient space for two | recommended AccECN Option would leave insufficient space for two | |||
SACK blocks on a particular ACK, the Data Receiver MUST give | SACK blocks on a particular ACK, the Data Receiver MUST give | |||
precedence to the SACK option (total 18 octets), because loss | precedence to the SACK option (total 18 octets), because loss | |||
feedback is more critical. | feedback is more critical. | |||
Recommended Simple Scheme: The Data Receiver SHOULD include an | Recommended Simple Scheme: The Data Receiver SHOULD include an | |||
AccECN TCP Option on every scheduled ACK if any byte counter has | AccECN TCP Option on every scheduled ACK if any byte counter has | |||
skipping to change at line 2040 ¶ | skipping to change at line 2042 ¶ | |||
include a field for every byte counter that has changed at some | include a field for every byte counter that has changed at some | |||
time during the connection (see examples later). | time during the connection (see examples later). | |||
A scheduled ACK means an ACK that the Data Receiver would send by | A scheduled ACK means an ACK that the Data Receiver would send by | |||
its regular delayed ACK rules. Recall that Section 1.3 defines an | its regular delayed ACK rules. Recall that Section 1.3 defines an | |||
'ACK' as either with data payload or without. But the above rule | 'ACK' as either with data payload or without. But the above rule | |||
is worded so that, in the common case when most of the data is | is worded so that, in the common case when most of the data is | |||
from a Server to a Client, the Server only includes an AccECN TCP | from a Server to a Client, the Server only includes an AccECN TCP | |||
Option while it is acknowledging data from the Client. | Option while it is acknowledging data from the Client. | |||
When available TCP option space is limited on particular packets, the | When available TCP Option space is limited on particular packets, the | |||
recommended scheme will need to include compromises. To guide the | recommended scheme will need to include compromises. To guide the | |||
implementer, the rules below are ranked in order of importance, but | implementer, the rules below are ranked in order of importance, but | |||
the final decision has to be implementation-dependent, because | the final decision has to be implementation-dependent, because | |||
tradeoffs will alter as new TCP options are defined and new use-cases | tradeoffs will alter as new TCP Options are defined and new use-cases | |||
arise. | arise. | |||
Necessary Option Length: When TCP option space is limited, an AccECN | Necessary Option Length: When TCP Option space is limited, an AccECN | |||
TCP option MAY be truncated to omit one or two fields from the end | TCP Option MAY be truncated to omit one or two fields from the end | |||
of the option, as indicated by the permitted variants listed in | of the option, as indicated by the permitted variants listed in | |||
Table 5, provided that the counter(s) that have changed since the | Table 5, provided that the counter(s) that have changed since the | |||
previous AccECN TCP option are not omitted. | previous AccECN TCP Option are not omitted. | |||
If there is insufficient space to include an AccECN TCP option | If there is insufficient space to include an AccECN TCP Option | |||
containing the counter(s) that have changed since the previous | containing the counter(s) that have changed since the previous | |||
AccECN TCP option, then the entire AccECN TCP option MUST be | AccECN TCP Option, then the entire AccECN TCP Option MUST be | |||
omitted (see Section 3.2.3). | omitted (see Section 3.2.3). | |||
Change-Triggered AccECN TCP Options: If an arriving packet | Change-Triggered AccECN TCP Options: If an arriving packet | |||
increments a different byte counter to that incremented by the | increments a different byte counter to that incremented by the | |||
previous packet, the Data Receiver SHOULD feed it back in an | previous packet, the Data Receiver SHOULD feed it back in an | |||
AccECN Option on the next scheduled ACK. | AccECN Option on the next scheduled ACK. | |||
For the avoidance of doubt, this rule does not concern the arrival | For the avoidance of doubt, this rule does not concern the arrival | |||
of control packets with no payload, because they cannot alter any | of control packets with no payload, because they cannot alter any | |||
byte counters. | byte counters. | |||
Continual Repetition: Otherwise, if arriving packets continue to | Continual Repetition: Otherwise, if arriving packets continue to | |||
increment the same byte counter: | increment the same byte counter: | |||
* the Data Receiver SHOULD include a counter that has continued | * the Data Receiver SHOULD include a counter that has continued | |||
to increment on the next scheduled ACK following a change- | to increment on the next scheduled ACK following a change- | |||
triggered AccECN TCP Option; | triggered AccECN TCP Option. | |||
* while the same counter continues to increment, it SHOULD | * while the same counter continues to increment, it SHOULD | |||
include the counter every n ACKs as consistently as possible, | include the counter every n ACKs as consistently as possible, | |||
where n can be chosen by the implementer; | where n can be chosen by the implementer. | |||
* It SHOULD always include an AccECN Option if the r.ceb counter | * It SHOULD always include an AccECN Option if the r.ceb counter | |||
is incrementing and it MAY include an AccECN Option if r.e0b | is incrementing and it MAY include an AccECN Option if r.e0b | |||
or r.e1b is incrementing | or r.e1b is incrementing. | |||
* It SHOULD include each counter at least once for every 2^22 | * It SHOULD include each counter at least once for every 2^22 | |||
bytes incremented to prevent overflow during continual | bytes incremented to prevent overflow during continual | |||
repetition. | repetition. | |||
The above rules complement those in Section 3.2.2.5, which determine | The above rules complement those in Section 3.2.2.5, which determine | |||
when to generate an ACK irrespective of whether an AccECN TCP Option | when to generate an ACK irrespective of whether an AccECN TCP Option | |||
is to be included. | is to be included. | |||
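The truncation rule can be combined with the Recommended Simple Scheme to choose an option order and length for a given ACK. The following Python sketch is one possible policy, not a normative algorithm: it assumes `space` is the option space left after any SACK blocks that take precedence, prefers an untruncated option, then the shorter one, and returns None when the counters that changed since the previous option cannot fit. The caller still decides whether an option is due on this ACK at all; all names are hypothetical.

```python
from typing import Optional, Sequence, Set, Tuple

ORDER0 = ("e0b", "ceb", "e1b")   # EE0B, ECEB, EE1B
ORDER1 = ("e1b", "ceb", "e0b")   # EE1B, ECEB, EE0B


def option_length(n_fields: int) -> int:
    return 2 + 3 * n_fields      # Kind + Length + 24-bit fields


def fields_needed(order: Sequence[str], must_include: Set[str]) -> int:
    """Smallest prefix of `order` that covers every counter in must_include."""
    needed = 0
    for i, name in enumerate(order, start=1):
        if name in must_include:
            needed = i
    return needed


def choose_option(ever_changed: Set[str], changed_since_last: Set[str],
                  space: int) -> Optional[Tuple[int, int]]:
    """Return (order, n_fields), or None if the option MUST be omitted."""
    candidates = []
    for order_id, order in enumerate((ORDER0, ORDER1)):
        preferred = fields_needed(order, ever_changed | changed_since_last)
        minimum = fields_needed(order, changed_since_last)
        n = preferred
        # Truncate fields from the end, but never below the changed counters.
        while n > minimum and option_length(n) > space:
            n -= 1
        if option_length(n) <= space:
            candidates.append((n < preferred, n, order_id))
    if not candidates:
        return None
    _truncated, n, order_id = min(candidates)
    return order_id, n
```

For instance, once both ECT(0) and CE have arrived, `choose_option({"e0b", "ceb"}, {"ceb"}, 40)` selects order 0 with two fields (EE0B and ECEB), matching the first example in the text that follows.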
The recommended scheme is intended as a simple way to ensure that all | The recommended scheme is intended as a simple way to ensure that all | |||
the relevant byte counters will be carried on any ACK that reaches | the relevant byte counters will be carried on any ACK that reaches | |||
the Data Sender, no matter how many pure ACKs are filtered or | the Data Sender, no matter how many pure ACKs are filtered or | |||
coalesced along the network path, and without consuming the space | coalesced along the network path, and without consuming the space | |||
available for payload data with counter field(s) that have never | available for payload data with counter field(s) that have never | |||
changed. | changed. | |||
As an example of the recommended scheme, if ECT(0) is the only | As an example of the recommended scheme, if ECT(0) is the only | |||
codepoint that has ever arrived in the IP-ECN field, the Data | codepoint that has ever arrived in the IP ECN field, the Data | |||
Receiver will feed back an AccECN0 TCP Option with only the EE0B | Receiver will feed back an AccECN0 TCP Option with only the EE0B | |||
field on every packet that acknowledges new data. However, as soon | field on every packet that acknowledges new data. However, as soon | |||
as even one CE-marked packet arrives, on every packet that | as even one CE-marked packet arrives, on every packet that | |||
acknowledges new data it will start to include an option with two | acknowledges new data it will start to include an option with two | |||
fields, EE0B and ECEB. As a second example, if the first packet to | fields, EE0B and ECEB. As a second example, if the first packet to | |||
arrive happens to be CE marked, the Data Receiver will have to | arrive happens to be CE marked, the Data Receiver will have to | |||
arbitrarily choose whether to precede the ECEB field with an EE0B | arbitrarily choose whether to precede the ECEB field with an EE0B | |||
field or an EE1B field. If it chooses, say, EE0B but it turns out | field or an EE1B field. If it chooses, say, EE0B but it turns out | |||
never to receive ECT(0), it can start sending EE1B and ECEB instead | never to receive ECT(0), it can start sending EE1B and ECEB instead | |||
-- it does not have to include the EE0B field if the r.e0b counter | -- it does not have to include the EE0B field if the r.e0b counter | |||
skipping to change at line 2170 ¶ | skipping to change at line 2172 ¶ | |||
A TCP normalizer is likely to block or alter an AccECN TCP Option if | A TCP normalizer is likely to block or alter an AccECN TCP Option if | |||
the length value or the initial values of its byte-counter fields do | the length value or the initial values of its byte-counter fields do | |||
not match one of those specified in Sections 3.2.3 or 3.2.1. | not match one of those specified in Sections 3.2.3 or 3.2.1. | |||
However, to comply with the present AccECN specification, a middlebox | However, to comply with the present AccECN specification, a middlebox | |||
MUST NOT change the ACE field; or those fields of an AccECN Option | MUST NOT change the ACE field; or those fields of an AccECN Option | |||
that are currently specified in Section 3.2.3; or any AccECN field | that are currently specified in Section 3.2.3; or any AccECN field | |||
covered by integrity protection (e.g., [RFC5925]). | covered by integrity protection (e.g., [RFC5925]). | |||
3.3.3. Requirements for TCP ACK Filtering | 3.3.3. Requirements for TCP ACK Filtering | |||
Section 5.2.1 of [RFC3449] gives best current practice on filtering | Section 5.2.1 of RFC 3449 [BCP69] gives best current practice on | |||
(aka thinning or coalescing) of pure TCP ACKs. It advises that | filtering (aka thinning or coalescing) of pure TCP ACKs. It advises | |||
filtering ACKs carrying ECN feedback ought to preserve the correct | that filtering ACKs carrying ECN feedback ought to preserve the | |||
operation of ECN feedback. As the present specification updates the | correct operation of ECN feedback. As the present specification | |||
operation of ECN feedback, this section discusses how an ACK filter | updates the operation of ECN feedback, this section discusses how an | |||
might preserve correct operation of AccECN feedback as well. | ACK filter might preserve correct operation of AccECN feedback as | |||
well. | ||||
The problem divides into two parts: determining if an ACK is part of | The problem divides into two parts: determining if an ACK is part of | |||
a connection that is using AccECN and then preserving the correct | a connection that is using AccECN and then preserving the correct | |||
operation of AccECN feedback: | operation of AccECN feedback: | |||
* To determine whether a pure TCP ACK is part of an AccECN | * To determine whether a pure TCP ACK is part of an AccECN | |||
connection without resorting to connection tracking and per-flow | connection without resorting to connection tracking and per-flow | |||
state, a useful heuristic would be to check for a non-zero ECN | state, a useful heuristic would be to check for a non-zero ECN | |||
field at the IP layer (because the ECN++ experiment only allows | field at the IP layer (because the ECN++ experiment only allows | |||
TCP pure ACKs to be ECN-capable if AccECN has been negotiated | TCP pure ACKs to be ECN-capable if AccECN has been negotiated | |||
[ECN++]). This heuristic is simple and stateless. However, it | [ECN++]). This heuristic is simple and stateless. However, it | |||
might omit some AccECN ACKs, because AccECN can be used without | might omit some AccECN ACKs because AccECN can be used without | |||
ECN++ and even if it is, ECN++ does not have to make pure ACKs | ECN++. Even if ECN++ is used, pure ACKs do not necessarily have | |||
ECN-capable -- only deployment experience will tell. Also, TCP | to be marked as ECN-capable -- only deployment experience will | |||
ACKs might be ECN-capable owing to some scheme other than AccECN, | tell. Also, TCP ACKs might be ECN-capable owing to some scheme | |||
e.g., [RFC5690] or some future standards action. Again, only | other than AccECN, e.g., [RFC5690] or some future standards | |||
deployment experience will tell. | action. Again, only deployment experience will tell. | |||
* The main concern with preserving correct AccECN operation involves | * The main concern with preserving correct AccECN operation involves | |||
leaving enough ACKs for the Data Sender to work out whether the | leaving enough ACKs for the Data Sender to work out whether the | |||
3-bit ACE field has wrapped. In the worst case, in feedback about | 3-bit ACE field has wrapped. In the worst case, in feedback about | |||
a run of received packets that were all ECN-marked, the ACE field | a run of received packets that were all ECN-marked, the ACE field | |||
will wrap every 8 acknowledged packets. ACE field wrap might be | will wrap every 8 acknowledged packets. ACE field wrap might be | |||
of less concern if packets also carry AccECN TCP Options. | of less concern if packets also carry AccECN TCP Options. | |||
However, note that logic to read an AccECN TCP Option is optional | However, note that logic to read an AccECN TCP Option is optional | |||
to implement (albeit recommended -- see Section 3.2.3). So one | to implement (albeit recommended -- see Section 3.2.3). So one | |||
end writing an AccECN TCP Option into a packet does not | end writing an AccECN TCP Option into a packet does not | |||
skipping to change at line 2240 ¶ | skipping to change at line 2243 ¶ | |||
direction. Therefore, currently available TSO hardware with | direction. Therefore, currently available TSO hardware with | |||
[RFC3168] support may need some minor driver changes, to adjust the | [RFC3168] support may need some minor driver changes, to adjust the | |||
bitmask for the first, middle, and last segments processed with TSO. | bitmask for the first, middle, and last segments processed with TSO. | |||
Initially, when Classic ECN [RFC3168] and Accurate ECN flows coexist | Initially, when Classic ECN [RFC3168] and Accurate ECN flows coexist | |||
on the same offloading engine, the host software may need to work | on the same offloading engine, the host software may need to work | |||
around incompatibilities (e.g., when only globally configurable TSO TCP | around incompatibilities (e.g., when only globally configurable TSO TCP | |||
Flag bitmasks are available), otherwise this would cause some issues. | Flag bitmasks are available), otherwise this would cause some issues. | |||
One way around this could be to only negotiate for Accurate ECN, but | One way around this could be to only negotiate for Accurate ECN, but | |||
not offer a fall back to [RFC3168] ECN. Another way could be to | not offer a fall back to Classic ECN [RFC3168]. Another way could be | |||
allow TSO only as long as the CWR flag in the TCP header is not set | to allow TSO only as long as the CWR flag in the TCP header is not | |||
-- at the cost of more processing overhead while the ACE field has | set -- at the cost of more processing overhead while the ACE field | |||
this bit set. | has this bit set. | |||
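As a rough illustration of the second workaround, the sketch below extracts the 3-bit ACE field from the TCP header flags and only hands a burst to TSO while CWR is clear. It assumes the ACE field is formed from AE (most significant bit, the former NS bit), CWR and ECE (least significant bit), read from the 16-bit word made up of TCP header octets 12 and 13. The function names are hypothetical and do not correspond to any driver or kernel API.

```python
# Bit masks within the 16-bit word formed by TCP header octets 12 and 13
# (data offset, reserved bits, AE, then CWR, ECE, URG, ACK, PSH, RST, SYN, FIN).
AE = 0x0100   # formerly NS; most significant bit of the ACE field
CWR = 0x0080  # middle bit of the ACE field
ECE = 0x0040  # least significant bit of the ACE field


def ace_value(flags: int) -> int:
    """Return the 3-bit ACE field (AE, CWR, ECE) as an integer 0..7."""
    return (((flags >> 8) & 0b1) << 2) | ((flags >> 6) & 0b11)


def tso_allowed(flags: int) -> bool:
    """Hypothetical host-side gate: offload a burst only while CWR is clear.

    Hardware built for Classic ECN may clear CWR on segments after the
    first, which would alter the ACE value those segments carry.
    """
    return not (flags & CWR)
```

For example, `ace_value(AE | ECE)` evaluates to 5, and a burst whose flags are just ACK (0x0010) passes `tso_allowed`, whereas ACK plus CWR does not. Gating on CWR alone keeps the common case offloaded and falls back to software segmentation only for the ACE values that have that bit set.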
For LRO in the receive direction, a different issue may get exposed | For LRO in the receive direction, a different issue may get exposed | |||
with [RFC3168] ECN supporting hardware. | with Classic ECN [RFC3168] supporting hardware. | |||
The ACE field changes with every received CE marking, so today's | The ACE field changes with every received CE marking, so today's | |||
receive offloading could lead to many interrupts in high congestion | receive offloading could lead to many interrupts in high congestion | |||
situations. Although that would be useful (because congestion | situations. Although that would be useful (because congestion | |||
information is received sooner), it could also significantly increase | information is received sooner), it could also significantly increase | |||
processor load, particularly in scenarios such as DCTCP or L4S where | processor load, particularly in scenarios such as DCTCP or L4S where | |||
the marking rate is generally higher. | the marking rate is generally higher. | |||
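To see why the interrupt load grows with the marking rate, the following sketch models a receive coalescer that ejects its current aggregate whenever the ACE value on an arriving segment differs from the value the aggregate was built with. The input list of ACE values and the `max_segs` cap are hypothetical, and the model deliberately ignores the sequence-number, timer and other checks that real LRO/GRO engines also apply.

```python
from typing import Iterable, List, Optional


def coalesce(ace_values: Iterable[int], max_segs: int = 64) -> List[int]:
    """Simple LRO model: return the size of each aggregate passed up the stack.

    A new aggregate starts whenever a segment's ACE value differs from that
    of the current aggregate (like hardware that ejects on any change of the
    TCP ECN flags) or the current aggregate already holds max_segs segments.
    """
    sizes: List[int] = []
    current_ace: Optional[int] = None
    count = 0
    for ace in ace_values:
        if count and (ace != current_ace or count == max_segs):
            sizes.append(count)  # eject the aggregate built so far
            count = 0
        current_ace = ace
        count += 1
    if count:
        sizes.append(count)
    return sizes
```

With an unchanging ACE value, 16 segments collapse into a single aggregate (`coalesce([3] * 16)` gives `[16]`), but if every segment reports a new CE mark (`[i % 8 for i in range(16)]`) each segment is delivered on its own, i.e., 16 deliveries instead of one.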
Current offload hardware ejects a segment from the coalescing process | Current offload hardware ejects a segment from the coalescing process | |||
whenever the TCP ECN flags change. In data centres, it has been | whenever the TCP ECN flags change. In data centres, it has been | |||
skipping to change at line 2304 ¶ | skipping to change at line 2307 ¶ | |||
of the present specification. | of the present specification. | |||
* In Section 6.1.2 of [RFC3168], all mentions of a congestion | * In Section 6.1.2 of [RFC3168], all mentions of a congestion | |||
response to an ECN-Echo (ECE) ACK packet are updated by | response to an ECN-Echo (ECE) ACK packet are updated by | |||
Section 3.2 of the present specification to mean an increment to | Section 3.2 of the present specification to mean an increment to | |||
the sender's count of CE-marked packets, s.cep. And the | the sender's count of CE-marked packets, s.cep. And the | |||
requirements to set the CWR flag no longer apply, as specified in | requirements to set the CWR flag no longer apply, as specified in | |||
Section 3.1.5 of the present specification. Otherwise, the | Section 3.1.5 of the present specification. Otherwise, the | |||
remaining requirements in Section 6.1.2 of [RFC3168] still stand. | remaining requirements in Section 6.1.2 of [RFC3168] still stand. | |||
It will be noted that [RFC8311] already updates, or potentially | It will be noted that [RFC8311] already updates a number of the | |||
updates, a number of the requirements in Section 6.1.2 of | requirements in Section 6.1.2 of [RFC3168]. Section 6.1.2 of RFC | |||
[RFC3168]. Section 6.1.2 of RFC 3168 extended standard TCP | 3168 extended standard TCP congestion control [RFC5681] to cover | |||
congestion control [RFC5681] to cover ECN marking as well as | ECN marking as well as packet drop. Whereas, [RFC8311] enables | |||
packet drop. Whereas, [RFC8311] enables experimentation with | experimentation with alternative responses to ECN marking, if | |||
alternative responses to ECN marking, if specified for instance by | specified for instance by an Experimental RFC produced by the IETF | |||
an Experimental RFC produced by the IETF Stream. [RFC8311] also | Stream. [RFC8311] also strengthened the statement that "ECT(0) | |||
strengthened the statement that "ECT(0) SHOULD be used" to a | SHOULD be used" to a "MUST" (see [RFC8311] for the details). | |||
"MUST" (see [RFC8311] for the details). | ||||
* The whole of Section 6.1.3 of [RFC3168] is updated by Section 3.2 | * The whole of Section 6.1.3 of [RFC3168] is updated by Section 3.2 | |||
of the present specification, with the exception of the last | of the present specification, with the exception of the last | |||
paragraph (about congestion response to drop and ECN in the same | paragraph (about congestion response to drop and ECN in the same | |||
round trip), which still stands. Incidentally, this last | round trip), which still stands. Incidentally, this last | |||
paragraph is in the wrong section, because it relates to "TCP | paragraph is in the wrong section, because it relates to "TCP | |||
Sender" behaviour. | Sender" behaviour. | |||
* The following text within Section 6.1.5 of [RFC3168]: | * The following text within Section 6.1.5 of [RFC3168]: | |||
skipping to change at line 2384 ¶ | skipping to change at line 2386 ¶ | |||
with the value 0b000 or 0b001, these values indicate that the TCP | with the value 0b000 or 0b001, these values indicate that the TCP | |||
Client did not request support for AccECN; therefore, the Server does | Client did not request support for AccECN; therefore, the Server does | |||
not enter AccECN mode for this connection. Further, 0b001 on the ACK | not enter AccECN mode for this connection. Further, 0b001 on the ACK | |||
implies that the Server sent an ECN-capable SYN/ACK, which was marked | implies that the Server sent an ECN-capable SYN/ACK, which was marked | |||
CE in the network, and the non-AccECN TCP Client fed this back by | CE in the network, and the non-AccECN TCP Client fed this back by | |||
setting ECE on the ACK of the SYN/ACK. | setting ECE on the ACK of the SYN/ACK. | |||
5.2. Compatibility with TCP Experiments and Common TCP Options | 5.2. Compatibility with TCP Experiments and Common TCP Options | |||
AccECN is compatible (at least on paper) with the most commonly used | AccECN is compatible (at least on paper) with the most commonly used | |||
TCP options: MSS, time-stamp, window scaling, SACK, and TCP-AO. It | TCP Options: MSS, timestamp, window scaling, SACK, and TCP-AO. It is | |||
is also compatible with Multipath TCP (MPTCP [RFC8684]) and the | also compatible with Multipath TCP (MPTCP [RFC8684]) and the | |||
experimental TCP option TCP Fast Open (TFO [RFC7413]). AccECN is | experimental TCP Option TCP Fast Open (TFO [RFC7413]). AccECN is | |||
friendly to all these protocols, because space for TCP options is | friendly to all these protocols, because space for TCP Options is | |||
particularly scarce on the SYN, where AccECN consumes zero additional | particularly scarce on the SYN, where AccECN consumes zero additional | |||
header space. | header space. | |||
When option space is under pressure from other options, | Because option space is limited, Section 3.2.3.3 provides guidance on | |||
Section 3.2.3.3 provides guidance on how important it is to send an | how important it is to send an AccECN Option relative to other | |||
AccECN Option relative to other options, and which fields are more | options and specifies which fields are more important to include. | |||
important to include. | ||||
Implementers of TFO need to take careful note of the recommendation | Implementers of TFO need to take careful note of the recommendation | |||
in Section 3.2.2.1. That section recommends that, if the TCP Client | in Section 3.2.2.1. That section recommends that, if the TCP Client | |||
has successfully negotiated AccECN, when acknowledging the SYN/ACK, | has successfully negotiated AccECN, when acknowledging the SYN/ACK, | |||
even if it has data to send, it sends a pure ACK immediately before | even if it has data to send, it sends a pure ACK immediately before | |||
the data. Then it can reflect the IP-ECN field of the SYN/ACK on | the data. Then it can reflect the IP ECN field of the SYN/ACK on | |||
this pure ACK, which allows the Server to detect ECN mangling. Note | this pure ACK, which allows the Server to detect ECN mangling. Note | |||
that, as specified in Section 3.2, any data on the SYN (SYN=1, ACK=0) | that, as specified in Section 3.2, any data on the SYN (SYN=1, ACK=0) | |||
is not included in any of the byte counters held locally for each ECN | is not included in any of the byte counters held locally for each ECN | |||
marking, nor in the AccECN Option on the wire. | marking, nor in the AccECN Option on the wire. | |||
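A minimal sketch of that exclusion rule, assuming per-codepoint counters of TCP payload octets kept by the Data Receiver. The class and method names are hypothetical, and the counters start at zero purely for simplicity; the initial values the specification requires are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict

# IP ECN field codepoints (two bits).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


@dataclass
class ReceiverByteCounters:
    """Illustrative byte counters for ECT(0), ECT(1) and CE payload octets."""

    octets: Dict[int, int] = field(
        default_factory=lambda: {ECT0: 0, ECT1: 0, CE: 0})

    def on_segment(self, ip_ecn: int, payload_len: int,
                   syn: bool, ack: bool) -> None:
        if syn and not ack:
            return  # data on the SYN (SYN=1, ACK=0), e.g., TFO, is never counted
        if ip_ecn in self.octets:  # Not-ECT has no byte counter
            self.octets[ip_ecn] += payload_len
```

Feeding it 100 octets of TFO data on an ECT(0) SYN, followed by 1448 octets of ECT(0) data after the handshake, leaves the ECT(0) counter at 1448 because the SYN payload is skipped.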
AccECN feedback is compatible with the ECN++ experiment [ECN++], | AccECN feedback is compatible with the ECN++ experiment [ECN++], | |||
which allows TCP control packets and retransmissions to be ECN- | which allows TCP control packets and retransmissions to be ECN- | |||
capable ([RFC3168] was updated by [RFC8311] to permit such | capable ([RFC3168] was updated by [RFC8311] to permit such | |||
experiments). AccECN is likely to inherently support any experiment | experiments). AccECN is likely to inherently support any experiment | |||
with ECN-capable packets, because it feeds back the contents of the | with ECN-capable packets, because it feeds back the contents of the | |||
skipping to change at line 2424 ¶ | skipping to change at line 2425 ¶ | |||
an earlier experimental protocol with narrower scope than ECN++ and a | an earlier experimental protocol with narrower scope than ECN++ and a | |||
5-way handshake. | 5-way handshake. | |||
5.3. Compatibility with Feedback Integrity Mechanisms | 5.3. Compatibility with Feedback Integrity Mechanisms | |||
Three alternative mechanisms are available to assure the integrity of | Three alternative mechanisms are available to assure the integrity of | |||
ECN and/or loss signals. AccECN is compatible with any of these | ECN and/or loss signals. AccECN is compatible with any of these | |||
approaches: | approaches: | |||
* The Data Sender can test the integrity of the receiver's ECN (or | * The Data Sender can test the integrity of the receiver's ECN (or | |||
loss) feedback by occasionally setting the IP-ECN field to a value | loss) feedback by occasionally setting the IP ECN field to a value | |||
normally only set by the network (and/or deliberately leaving a | normally only set by the network (and/or deliberately leaving a | |||
sequence number gap). Then it can test whether the Data | sequence number gap). Then it can test whether the Data | |||
Receiver's feedback faithfully reports what it expects (similar to | Receiver's feedback faithfully reports what it expects (similar to | |||
paragraph 2 of Section 20.2 of [RFC3168]). Unlike the ECN-nonce | paragraph 2 of Section 20.2 of [RFC3168]). Unlike the ECN-nonce | |||
[RFC3540], this approach does not waste the ECT(1) codepoint in | [RFC3540], this approach does not waste the ECT(1) codepoint in | |||
the IP header, it does not require standardization, and it does | the IP header, it does not require standardization, and it does | |||
not rely on misbehaving receivers volunteering to reveal feedback | not rely on misbehaving receivers volunteering to reveal feedback | |||
information that allows them to be detected. However, setting the | information that allows them to be detected. However, setting the | |||
CE mark by the sender might conceal actual congestion feedback | CE mark by the sender might conceal actual congestion feedback | |||
from the network and therefore ought to only be done sparingly. | from the network and therefore ought to only be done sparingly. | |||
skipping to change at line 2455 ¶ | skipping to change at line 2456 ¶ | |||
ConEx is an experimental change to the Data Sender that would be | ConEx is an experimental change to the Data Sender that would be | |||
most useful when combined with AccECN. Without AccECN, the ConEx | most useful when combined with AccECN. Without AccECN, the ConEx | |||
behaviour of a Data Sender would have to be more conservative than | behaviour of a Data Sender would have to be more conservative than | |||
would be necessary if it had the accurate feedback of AccECN. | would be necessary if it had the accurate feedback of AccECN. | |||
* The Standards Track TCP authentication option (TCP-AO [RFC5925]) | * The Standards Track TCP authentication option (TCP-AO [RFC5925]) | |||
can be used to detect any tampering with AccECN feedback between | can be used to detect any tampering with AccECN feedback between | |||
the Data Receiver and the Data Sender (whether malicious or | the Data Receiver and the Data Sender (whether malicious or | |||
accidental). The AccECN fields are immutable end to end, so they | accidental). The AccECN fields are immutable end to end, so they | |||
are amenable to TCP-AO protection, which covers TCP options by | are amenable to TCP-AO protection, which covers TCP Options by | |||
default. However, TCP-AO is often too brittle to use on many end- | default. However, TCP-AO is often too brittle to use on many end- | |||
to-end paths, where middleboxes can make verification fail in | to-end paths, where middleboxes can make verification fail in | |||
their attempts to improve performance or security, e.g., Network | their attempts to improve performance or security, e.g., Network | |||
Address Translation (NAT) and Network Address Port Translation | Address Translation (NAT) and Network Address Port Translation | |||
(NAPT), resegmentation, or shifting the sequence space. | (NAPT), resegmentation, or shifting the sequence space. | |||
6. Summary: Protocol Properties | 6. Summary: Protocol Properties | |||
This section is informative, not normative. It describes how well | This section is informative, not normative. It describes how well | |||
the protocol satisfies the agreed requirements for a more Accurate | the protocol satisfies the agreed requirements for a more Accurate | |||
skipping to change at line 2477 ¶ | skipping to change at line 2478 ¶ | |||
Accuracy: From each ACK, the Data Sender can infer the number of new | Accuracy: From each ACK, the Data Sender can infer the number of new | |||
CE-marked segments since the previous ACK. This provides better | CE-marked segments since the previous ACK. This provides better | |||
accuracy on CE feedback than Classic ECN. In addition, if an | accuracy on CE feedback than Classic ECN. In addition, if an | |||
AccECN Option is present (not blocked by the network path), the | AccECN Option is present (not blocked by the network path), the | |||
number of bytes marked with CE, ECT(1), and ECT(0) are provided. | number of bytes marked with CE, ECT(1), and ECT(0) are provided. | |||
Overhead: The AccECN scheme is divided into two parts. The | Overhead: The AccECN scheme is divided into two parts. The | |||
essential feedback part reuses the three flags already assigned to | essential feedback part reuses the three flags already assigned to | |||
ECN in the TCP header. The supplementary feedback part adds an | ECN in the TCP header. The supplementary feedback part adds an | |||
additional TCP option consuming up to 11 bytes. However, no TCP | additional TCP Option consuming up to 11 bytes. However, no TCP | |||
option space is consumed in the SYN. | Option space is consumed in the SYN. | |||
Ordering: The order in which marks arrive at the Data Receiver is | Ordering: The order in which marks arrive at the Data Receiver is | |||
preserved in AccECN feedback, because the Data Receiver is | preserved in AccECN feedback, because the Data Receiver is | |||
expected to send an ACK immediately whenever a different mark | expected to send an ACK immediately whenever a different mark | |||
arrives. | arrives. | |||
Timeliness: While the same ECN markings are arriving continually at | Timeliness: While the same ECN markings are arriving continually at | |||
the Data Receiver, it can defer ACKs as TCP does normally, but it | the Data Receiver, it can defer ACKs as TCP does normally, but it | |||
will immediately send an ACK as soon as a different ECN marking | will immediately send an ACK as soon as a different ECN marking | |||
arrives. | arrives. | |||
skipping to change at line 2545 ¶ | skipping to change at line 2546 ¶ | |||
can assure the integrity of ECN feedback. If AccECN Options are | can assure the integrity of ECN feedback. If AccECN Options are | |||
stripped, the resolution of the feedback is degraded, but the | stripped, the resolution of the feedback is degraded, but the | |||
integrity of this degraded feedback can still be assured. | integrity of this degraded feedback can still be assured. | |||
Backward Compatibility: If only one endpoint supports the AccECN | Backward Compatibility: If only one endpoint supports the AccECN | |||
scheme, it will fall back to the most advanced ECN feedback scheme | scheme, it will fall back to the most advanced ECN feedback scheme | |||
supported by the other end. | supported by the other end. | |||
If AccECN Options are stripped by a middlebox, AccECN still | If AccECN Options are stripped by a middlebox, AccECN still | |||
provides basic congestion feedback in the ACE field. Further, | provides basic congestion feedback in the ACE field. Further, | |||
AccECN can be used to detect mangling of the IP-ECN field; | AccECN can be used to detect mangling of the IP ECN field; | |||
mangling of the TCP ECN flags; blocking of ECT-marked segments; | mangling of the TCP ECN flags; blocking of ECT-marked segments; | |||
and blocking of segments carrying an AccECN Option. It can detect | and blocking of segments carrying an AccECN Option. It can detect | |||
these conditions during TCP's three-way handshake so that it can | these conditions during TCP's three-way handshake so that it can | |||
fall back to operation without ECN and/or operation without AccECN | fall back to operation without ECN and/or operation without AccECN | |||
Options. | Options. | |||
Forward Compatibility: The behaviour of endpoints and middleboxes is | Forward Compatibility: The behaviour of endpoints and middleboxes is | |||
carefully defined for all reserved or currently unused codepoints | carefully defined for all reserved or currently unused codepoints | |||
in the scheme. Then, the designers of security devices can | in the scheme. Then, the designers of security devices can | |||
understand which currently unused values might appear in the | understand which currently unused values might appear in the | |||
skipping to change at line 2581 ¶ | skipping to change at line 2582 ¶ | |||
+=====+==============+===========+==============================+ | +=====+==============+===========+==============================+ | |||
| Bit | Name | Reference | Assignment Notes | | | Bit | Name | Reference | Assignment Notes | | |||
+=====+==============+===========+==============================+ | +=====+==============+===========+==============================+ | |||
| 7 | AE (Accurate | RFC 9768 | Previously used as NS (Nonce | | | 7 | AE (Accurate | RFC 9768 | Previously used as NS (Nonce | | |||
| | ECN) | | Sum) by [RFC3540], which is | | | | ECN) | | Sum) by [RFC3540], which is | | |||
| | | | now Historic [RFC8311] | | | | | | now Historic [RFC8311] | | |||
+-----+--------------+-----------+------------------------------+ | +-----+--------------+-----------+------------------------------+ | |||
Table 6: TCP Header Flag Reassignment | Table 6: TCP Header Flag Reassignment | |||
This document also defines two new TCP options for AccECN from the | This document also defines two new TCP Options for AccECN from the | |||
TCP option space. These values are defined as the following in the | TCP Option space. These values are defined as the following in the | |||
"TCP Option Kind Numbers" registry in the "Transmission Control | "TCP Option Kind Numbers" registry in the "Transmission Control | |||
Protocol (TCP) Parameters" registry group: | Protocol (TCP) Parameters" registry group: | |||
+======+========+================================+===========+ | +======+========+================================+===========+ | |||
| Kind | Length | Meaning | Reference | | | Kind | Length | Meaning | Reference | | |||
+======+========+================================+===========+ | +======+========+================================+===========+ | |||
| 172 | N | Accurate ECN Order 0 (AccECN0) | RFC 9768 | | | 172 | N | Accurate ECN Order 0 (AccECN0) | RFC 9768 | | |||
+------+--------+--------------------------------+-----------+ | +------+--------+--------------------------------+-----------+ | |||
| 174 | N | Accurate ECN Order 1 (AccECN1) | RFC 9768 | | | 174 | N | Accurate ECN Order 1 (AccECN1) | RFC 9768 | | |||
+------+--------+--------------------------------+-----------+ | +------+--------+--------------------------------+-----------+ | |||
Table 7: New TCP Option assignments | Table 7: New TCP Option Assignments | |||
Early experimental implementations of the two AccECN Options used | Early experimental implementations of the two AccECN Options used | |||
experimental option 254 per [RFC6994] with the 16-bit magic numbers | experimental option 254 per [RFC6994] with the 16-bit magic numbers | |||
0xACC0 and 0xACC1, respectively, for Order 0 and 1, as allocated in | 0xACC0 and 0xACC1, respectively, for Order 0 and 1, as allocated in | |||
the IANA "TCP/UDP Experimental Option Experiment Identifiers (TCP/UDP | the IANA "TCP/UDP Experimental Option Experiment Identifiers (TCP/UDP | |||
ExIDs)" registry. Even earlier experimental implementations used the | ExIDs)" registry. Even earlier experimental implementations used the | |||
single magic number 0xACCE (16 bits). Uses of these experimental | single magic number 0xACCE (16 bits). Uses of these experimental | |||
options SHOULD migrate to use the new option kinds (172 and 174). | options SHOULD migrate to use the new option kinds (172 and 174). | |||
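During migration, a receiver that chooses to recognize both the assigned kinds and the older experimental encodings might classify options as sketched below. The kind numbers and ExIDs come from the text above; the function name and labels are hypothetical. Whether to keep accepting the legacy encodings at all is a local decision, since the text only says that their uses SHOULD migrate.

```python
from typing import Optional

ACCECN0_KIND = 172  # Accurate ECN Order 0 (AccECN0)
ACCECN1_KIND = 174  # Accurate ECN Order 1 (AccECN1)
EXP_KIND = 254      # shared experimental option kind of RFC 6994

# 16-bit ExIDs used by early experimental implementations.
LEGACY_EXIDS = {
    0xACC0: "legacy AccECN0 (ExID 0xACC0)",
    0xACC1: "legacy AccECN1 (ExID 0xACC1)",
    0xACCE: "legacy single-ExID AccECN (0xACCE)",
}


def classify_accecn_option(option: bytes) -> Optional[str]:
    """Label a raw TCP Option (kind, length, ...) if it is AccECN-related.

    Returns None for anything else. The length octet is not validated here.
    """
    if len(option) < 2:
        return None
    kind = option[0]
    if kind == ACCECN0_KIND:
        return "AccECN0"
    if kind == ACCECN1_KIND:
        return "AccECN1"
    if kind == EXP_KIND and len(option) >= 4:
        # The ExID occupies the two octets after kind and length.
        return LEGACY_EXIDS.get(int.from_bytes(option[2:4], "big"))
    return None
```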
8. Security and Privacy Considerations | 8. Security and Privacy Considerations | |||
If ever the supplementary feedback part of AccECN that is based on | If ever the supplementary feedback part of AccECN that is based on | |||
one of the new AccECN TCP Options is unusable (due for example to | one of the new AccECN TCP Options is unusable (due for example to | |||
middlebox interference), the essential feedback part of AccECN's | middlebox interference), the essential feedback part of AccECN's | |||
congestion feedback offers only limited resilience to long runs of | congestion feedback offers only limited resilience to long runs of | |||
ACK loss (see Section 3.2.2.5). These problems are unlikely to be | ACK loss (see Section 3.2.2.5). These problems are unlikely to be | |||
due to malicious intervention (because if an attacker could strip a | due to malicious intervention (because if an attacker could strip a | |||
TCP option or discard a long run of ACKs, it could wreak other | TCP Option or discard a long run of ACKs, it could wreak other | |||
arbitrary havoc). However, it would be of concern if AccECN's | arbitrary havoc). However, it would be of concern if AccECN's | |||
resilience could be indirectly compromised during a flooding attack. | resilience could be indirectly compromised during a flooding attack. | |||
AccECN is still considered safe though, because if AccECN Options are | AccECN is still considered safe though, because if AccECN Options are | |||
not present, the AccECN Data Sender is then required to switch to | not present, the AccECN Data Sender is then required to switch to | |||
more conservative assumptions about wrap of congestion indication | more conservative assumptions about wrap of congestion indication | |||
counters (see Section 3.2.2.5 and Appendix A.2). | counters (see Section 3.2.2.5 and Appendix A.2). | |||
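The wrap ambiguity itself can be made concrete. ACE is a 3-bit counter of CE-marked packets, so from two successive ACKs a Data Sender can only recover the increase modulo 8. The sketch below contrasts that plain modular delta with a worst-case bound: the largest CE count consistent with the reading, given how many packets the ACK newly acknowledges. The bound is an illustration of what "more conservative assumptions" can mean; it is not the algorithm of Appendix A.2, and the function names are hypothetical.

```python
ACE_MOD = 8  # the ACE field is 3 bits wide


def ace_delta(prev_ace: int, new_ace: int) -> int:
    """Smallest increase in the CE packet counter consistent with two readings."""
    return (new_ace - prev_ace) % ACE_MOD  # Python's % is non-negative here


def ace_delta_worst_case(prev_ace: int, new_ace: int,
                         newly_acked_pkts: int) -> int:
    """Largest increase consistent with the readings, assuming no more than
    newly_acked_pkts packets can have been CE-marked since the last ACK."""
    d = ace_delta(prev_ace, new_ace)
    if newly_acked_pkts <= d:
        return d  # no room for a whole extra wrap of the counter
    return d + ACE_MOD * ((newly_acked_pkts - d) // ACE_MOD)
```

For instance, an ACE value that is unchanged across an ACK covering 8 packets yields 0 from `ace_delta` but 8 from `ace_delta_worst_case(6, 6, 8)`: every one of those packets may have been marked. In C the subtraction would need an explicit adjustment, because `%` can return a negative result there.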
Section 5.1 describes how a TCP Server can negotiate AccECN and use | Section 5.1 describes how a TCP Server can negotiate AccECN and use | |||
the SYN cookie method for mitigating SYN flooding attacks. | the SYN cookie method for mitigating SYN flooding attacks. | |||
skipping to change at line 2639 ¶ | skipping to change at line 2640 ¶ | |||
will be degraded, but the integrity of this degraded information can | will be degraded, but the integrity of this degraded information can | |||
still be assured. Assuring that Data Senders respond appropriately | still be assured. Assuring that Data Senders respond appropriately | |||
to ECN feedback is possible, but the scope of the present document is | to ECN feedback is possible, but the scope of the present document is | |||
confined to the feedback protocol and excludes the response to this | confined to the feedback protocol and excludes the response to this | |||
feedback. | feedback. | |||
In Section 3.2.3, a Data Sender is allowed to ignore an unrecognized | In Section 3.2.3, a Data Sender is allowed to ignore an unrecognized | |||
TCP AccECN Option length and read as many whole 3-octet fields from | TCP AccECN Option length and read as many whole 3-octet fields from | |||
it as possible up to a maximum of 3, treating the remainder as | it as possible up to a maximum of 3, treating the remainder as | |||
padding. This opens up a potential covert channel of up to 29B (40 - | padding. This opens up a potential covert channel of up to 29B (40 - | |||
(2+3*3)) B. However, it is really an overt channel (not hidden) and | (2+3*3)). However, it is really an overt channel (not hidden) and it | |||
it is no different than the use of unknown TCP options with unknown | is no different than the use of unknown TCP Options with unknown | |||
option lengths in general. Therefore, where this is of concern, it | option lengths in general. Therefore, where this is of concern, it | |||
can already be adequately mitigated by regular TCP normalizer | can already be adequately mitigated by regular TCP normalizer | |||
technology (see Section 3.3.2). | technology (see Section 3.3.2). | |||
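The lenient parsing rule and the 29-octet figure can be sketched as follows. `read_accecn_fields` is a hypothetical helper: it returns the raw 24-bit values in wire order and leaves mapping them to specific counters, which depends on whether the kind is 172 or 174, to the caller. The same arithmetic, 2 + 3 × 3 = 11, gives the largest option length that carries only recognized content.

```python
from typing import List

MAX_OPTION_SPACE = 40  # maximum TCP Option space, in octets
FIELD_LEN = 3          # each AccECN counter field is 3 octets
MAX_FIELDS = 3         # at most three counter fields are read


def read_accecn_fields(option: bytes) -> List[int]:
    """Read up to three whole 3-octet fields from a raw AccECN Option.

    `option` starts with the kind and length octets. Octets after the third
    whole field, or a trailing partial field, are treated as padding.
    """
    if len(option) < 2:
        raise ValueError("truncated option")
    body = option[2:option[1]]  # option body as delimited by the length octet
    n = min(len(body) // FIELD_LEN, MAX_FIELDS)
    return [int.from_bytes(body[i * FIELD_LEN:(i + 1) * FIELD_LEN], "big")
            for i in range(n)]


# Octets that an option of unrecognized length could carry beyond those read.
COVERT_BYTES = MAX_OPTION_SPACE - (2 + FIELD_LEN * MAX_FIELDS)  # 29
```

An option of length 15 still yields exactly three fields, and its last four octets are ignored. That ignored tail is the overt channel discussed above, which a normalizer can close by rewriting the option to a recognized length.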
The AccECN protocol is not believed to introduce any new privacy | The AccECN protocol is not believed to introduce any new privacy | |||
concerns, because it merely counts and feeds back signals at the | concerns, because it merely counts and feeds back signals at the | |||
transport layer that had already been visible at the IP layer. A | transport layer that had already been visible at the IP layer. A | |||
covert channel can be used to compromise privacy. However, as | covert channel can be used to compromise privacy. However, as | |||
explained above, undefined TCP options in general open up such | explained above, undefined TCP Options in general open up such | |||
channels, and common techniques are available to close them off. | channels, and common techniques are available to close them off. | |||
There is a potential concern that a Data Receiver could deliberately | There is a potential concern that a Data Receiver could deliberately | |||
omit AccECN Options pretending that they had been stripped by a | omit AccECN Options pretending that they had been stripped by a | |||
middlebox. No known way can yet be contrived for a receiver to take | middlebox. Currently, there is no known way for a receiver to take | |||
advantage of this behaviour, which seems to always degrade its own | advantage of this behaviour, which seems to always degrade its own | |||
performance. However, the concern is mentioned here for | performance. However, the concern is mentioned here for | |||
completeness. | completeness. | |||
A generic privacy concern of any new protocol is that for a while it | A generic privacy concern of any new protocol is that for a while it | |||
will be used by a small population of hosts, and thus show up more | will be used by a small population of hosts, and thus those hosts | |||
easily. However, it is expected that AccECN will become available in | could be more easily identified. However, it is expected that AccECN | |||
operating systems over time and that it will eventually be turned on | will become available in operating systems over time and that it will | |||
by default. Thus, an individual identification of a particular user | eventually be turned on by default. Thus, an individual | |||
is less of a concern than the fingerprinting of specific versions of | identification of a particular user is less of a concern than the | |||
operating systems. However, the latter can be done using different | fingerprinting of specific versions of operating systems. However, | |||
means independent of Accurate ECN. | the latter can be done using different means independent of Accurate | |||
ECN. | ||||
As Accurate ECN exposes more bits in the TCP header that could be | As Accurate ECN exposes more bits in the TCP header that could be | |||
tampered with without interfering with the transport excessively, it | tampered with without interfering with the transport excessively, it | |||
may allow an additional way to identify specific data streams across | may allow an additional way to identify specific data streams across | |||
a virtual private network (VPN) to an attacker that has access to the | a virtual private network (VPN) to an attacker that has access to the | |||
datastream before and after the VPN tunnel endpoints. This may be | datastream before and after the VPN tunnel endpoints. This may be | |||
achieved by injecting or modifying the ACE field in specific patterns | achieved by injecting or modifying the ACE field in specific patterns | |||
that can be recognized. | that can be recognized. | |||
Overall, Accurate ECN does not change the risk profile on privacy to | Overall, Accurate ECN does not change the risk profile on privacy to | |||
skipping to change at line 2722 ¶ | skipping to change at line 2724 ¶ | |||
[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | |||
May 2017, <https://www.rfc-editor.org/info/rfc8174>. | May 2017, <https://www.rfc-editor.org/info/rfc8174>. | |||
[RFC9293] Eddy, W., Ed., "Transmission Control Protocol (TCP)", | [RFC9293] Eddy, W., Ed., "Transmission Control Protocol (TCP)", | |||
STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022, | STD 7, RFC 9293, DOI 10.17487/RFC9293, August 2022, | |||
<https://www.rfc-editor.org/info/rfc9293>. | <https://www.rfc-editor.org/info/rfc9293>. | |||
9.2. Informative References | 9.2. Informative References | |||
[BCP69] Best Current Practice 69, | ||||
<https://www.rfc-editor.org/info/bcp69>. | ||||
At the time of writing, this BCP comprises the following: | ||||
Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M. | ||||
Sooriyabandara, "TCP Performance Implications of Network | ||||
Path Asymmetry", BCP 69, RFC 3449, DOI 10.17487/RFC3449, | ||||
December 2002, <https://www.rfc-editor.org/info/rfc3449>. | ||||
[ECN++] Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit | [ECN++] Bagnulo, M. and B. Briscoe, "ECN++: Adding Explicit | |||
Congestion Notification (ECN) to TCP Control Packets", | Congestion Notification (ECN) to TCP Control Packets", | |||
Work in Progress, Internet-Draft, draft-ietf-tcpm- | Work in Progress, Internet-Draft, draft-ietf-tcpm- | |||
generalized-ecn-17, 21 April 2025, | generalized-ecn-17, 21 April 2025, | |||
<https://datatracker.ietf.org/doc/html/draft-ietf-tcpm- | <https://datatracker.ietf.org/doc/html/draft-ietf-tcpm- | |||
generalized-ecn-17>. | generalized-ecn-17>. | |||
[Mandalari18] | [Mandalari18] | |||
Mandalari, A., Lutu, A., Briscoe, B., Bagnulo, M., and Ö. | Mandalari, A., Lutu, A., Briscoe, B., Bagnulo, M., and Ö. | |||
Alay, "Measuring ECN++: Good News for ++, Bad News for ECN | Alay, "Measuring ECN++: Good News for ++, Bad News for ECN | |||
over Mobile", IEEE Communications Magazine, March 2018, | over Mobile", IEEE Communications Magazine, March 2018, | |||
<http://www.it.uc3m.es/amandala/ | <http://www.it.uc3m.es/amandala/ | |||
ecn++/ecn_commag_2018.html>. | ecn++/ecn_commag_2018.html>. | |||
[RFC3449] Balakrishnan, H., Padmanabhan, V., Fairhurst, G., and M. | ||||
Sooriyabandara, "TCP Performance Implications of Network | ||||
Path Asymmetry", BCP 69, RFC 3449, DOI 10.17487/RFC3449, | ||||
December 2002, <https://www.rfc-editor.org/info/rfc3449>. | ||||
[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | |||
Congestion Notification (ECN) Signaling with Nonces", | Congestion Notification (ECN) Signaling with Nonces", | |||
RFC 3540, DOI 10.17487/RFC3540, June 2003, | RFC 3540, DOI 10.17487/RFC3540, June 2003, | |||
<https://www.rfc-editor.org/info/rfc3540>. | <https://www.rfc-editor.org/info/rfc3540>. | |||
[RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common | [RFC4987] Eddy, W., "TCP SYN Flooding Attacks and Common | |||
Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007, | Mitigations", RFC 4987, DOI 10.17487/RFC4987, August 2007, | |||
<https://www.rfc-editor.org/info/rfc4987>. | <https://www.rfc-editor.org/info/rfc4987>. | |||
[RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. | [RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. | |||
skipping to change at line 2852 ¶ | skipping to change at line 2858 ¶ | |||
(L4S) Internet Service: Architecture", RFC 9330, | (L4S) Internet Service: Architecture", RFC 9330, | |||
DOI 10.17487/RFC9330, January 2023, | DOI 10.17487/RFC9330, January 2023, | |||
<https://www.rfc-editor.org/info/rfc9330>. | <https://www.rfc-editor.org/info/rfc9330>. | |||
[RFC9438] Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed., | [RFC9438] Xu, L., Ha, S., Rhee, I., Goel, V., and L. Eggert, Ed., | |||
"CUBIC for Fast and Long-Distance Networks", RFC 9438, | "CUBIC for Fast and Long-Distance Networks", RFC 9438, | |||
DOI 10.17487/RFC9438, August 2023, | DOI 10.17487/RFC9438, August 2023, | |||
<https://www.rfc-editor.org/info/rfc9438>. | <https://www.rfc-editor.org/info/rfc9438>. | |||
[RoCEv2] InfiniBand Trade Association, "InfiniBand Architecture | [RoCEv2] InfiniBand Trade Association, "InfiniBand Architecture | |||
Specification", Volume 1, Release 1.4, 2020, | Specification", | |||
<https://www.infinibandta.org/ibta-specification/>. | <https://www.infinibandta.org/ibta-specification/>. | |||
Appendix A. Example Algorithms | Appendix A. Example Algorithms | |||
This appendix is informative, not normative. It gives example | This appendix is informative, not normative. It gives example | |||
algorithms that would satisfy the normative requirements of the | algorithms that would satisfy the normative requirements of the | |||
AccECN protocol. However, implementers are free to choose other ways | AccECN protocol. However, implementers are free to choose other ways | |||
to implement the requirements. | to satisfy the requirements. | |||
A.1. Example Algorithm to Encode/Decode the AccECN Option | A.1. Example Algorithm to Encode/Decode the AccECN Option | |||
The example algorithms below show how a Data Receiver in AccECN mode | The example algorithms below show how a Data Receiver in AccECN mode | |||
could encode its CE byte counter r.ceb into the ECEB field within an | could encode its CE byte counter r.ceb into the ECEB field within an | |||
AccECN TCP Option, and how a Data Sender in AccECN mode could decode | AccECN TCP Option, and how a Data Sender in AccECN mode could decode | |||
the ECEB field into its byte counter s.ceb. The other counters for | the ECEB field into its byte counter s.ceb. The other counters for | |||
bytes marked ECT(0) and ECT(1) in an AccECN Option would be similarly | bytes marked ECT(0) and ECT(1) in an AccECN Option would be similarly | |||
encoded and decoded. | encoded and decoded. | |||
skipping to change at line 2893 ¶ | skipping to change at line 2899 ¶ | |||
where '%' is the remainder operator. | where '%' is the remainder operator. | |||
On the arrival of an AccECN Option, the Data Sender first makes sure | On the arrival of an AccECN Option, the Data Sender first makes sure | |||
the ACK has not been superseded in order to avoid winding the s.ceb | the ACK has not been superseded in order to avoid winding the s.ceb | |||
counter backwards. It uses the TCP acknowledgement number and any | counter backwards. It uses the TCP acknowledgement number and any | |||
SACK options [RFC2018] to calculate newlyAckedB, the amount of new | SACK options [RFC2018] to calculate newlyAckedB, the amount of new | |||
data that the ACK acknowledges in bytes (newlyAckedB can be zero but | data that the ACK acknowledges in bytes (newlyAckedB can be zero but | |||
not negative). If newlyAckedB is zero, either the ACK has been | not negative). If newlyAckedB is zero, either the ACK has been | |||
superseded or CE-marked packet(s) without data could have arrived. | superseded or CE-marked packet(s) without data could have arrived. | |||
To break the tie for the latter case, the Data Sender could use time- | To break the tie for the latter case, the Data Sender could use | |||
stamps [RFC7323] (if present) to work out newlyAckedT, the amount of | timestamps [RFC7323] (if present) to work out newlyAckedT, the amount | |||
new time that the ACK acknowledges. If the Data Sender determines | of new time that the ACK acknowledges. If the Data Sender determines | |||
that the ACK has been superseded, it ignores the AccECN Option. | that the ACK has been superseded, it ignores the AccECN Option. | |||
Otherwise, the Data Sender calculates the minimum non-negative | Otherwise, the Data Sender calculates the minimum non-negative | |||
difference d.ceb between the ECEB field and its local s.ceb counter, | difference d.ceb between the ECEB field and its local s.ceb counter, | |||
using modulo arithmetic as follows: | using modulo arithmetic as follows: | |||
if ((newlyAckedB > 0) || (newlyAckedT > 0)) { | if ((newlyAckedB > 0) || (newlyAckedT > 0)) { | |||
d.ceb = (ECEB + DIVOPT - (s.ceb % DIVOPT)) % DIVOPT | d.ceb = (ECEB + DIVOPT - (s.ceb % DIVOPT)) % DIVOPT | |||
s.ceb += d.ceb | s.ceb += d.ceb | |||
} | } | |||
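The encode and decode steps above can be sketched in Python as follows. This is a minimal illustration, not text from either RFC version. It assumes DIVOPT = 2^24, because each AccECN Option counter field (including ECEB) is 24 bits wide. The function names encode_eceb and decode_eceb are hypothetical. The superseded-ACK test is reduced to the newlyAckedB/newlyAckedT condition in the pseudocode, and computing those two quantities from the acknowledgement number, SACK blocks and timestamps is left out.

```python
# Sketch of AccECN Option byte-counter encode/decode (Appendix A.1).
# Assumption: DIVOPT = 2**24, since each AccECN Option counter field is 24 bits.
DIVOPT = 2 ** 24


def encode_eceb(r_ceb: int) -> int:
    """Data Receiver: carry the low 24 bits of its CE byte counter in ECEB."""
    return r_ceb % DIVOPT


def decode_eceb(eceb: int, s_ceb: int, newly_acked_b: int, newly_acked_t: int) -> int:
    """Data Sender: return s.ceb updated from a received ECEB field.

    If the ACK acknowledges neither new bytes nor new time, it is treated
    as superseded and the option is ignored, so s.ceb never winds backwards.
    """
    if newly_acked_b > 0 or newly_acked_t > 0:
        # Minimum non-negative difference between ECEB and s.ceb, modulo DIVOPT.
        d_ceb = (eceb + DIVOPT - (s_ceb % DIVOPT)) % DIVOPT
        s_ceb += d_ceb
    return s_ceb
```

Suppose the receiver's counter has wrapped past 2^24 to 0x1000010 while the sender still holds s.ceb = 0xFFFFF0. The field then carries 0x10, and decoding adds d.ceb = 0x20, which brings s.ceb to 0x1000010.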
skipping to change at line 2982 ¶ | skipping to change at line 2988 ¶ | |||
of the missing ACKs were piggy-backed on data (i.e., not pure ACKs), | of the missing ACKs were piggy-backed on data (i.e., not pure ACKs), | |||
retransmissions will not repair the lost AccECN information, because | retransmissions will not repair the lost AccECN information, because | |||
AccECN requires retransmissions to carry the latest AccECN counters, | AccECN requires retransmissions to carry the latest AccECN counters, | |||
not the original ones. | not the original ones. | |||
The phrase 'under prevailing conditions' allows for implementation- | The phrase 'under prevailing conditions' allows for implementation- | |||
dependent interpretation. A Data Sender might take account of the | dependent interpretation. A Data Sender might take account of the | |||
prevailing size of data segments and the prevailing CE marking rate | prevailing size of data segments and the prevailing CE marking rate | |||
just before the sequence of missing ACKs. However, we shall start | just before the sequence of missing ACKs. However, we shall start | |||
with the simplest algorithm, which assumes segments are all full- | with the simplest algorithm, which assumes segments are all full- | |||
sized and ultra-conservatively it assumes that ECN marking was 100% | sized, and ultra-conservatively it assumes that ECN marking was 100% | |||
on the forward path when ACKs on the reverse path started to all be | on the forward path when ACKs on the reverse path started to all be | |||
dropped. Specifically, if newlyAckedB is the amount of data that an | dropped. Specifically, if newlyAckedB is the amount of data that an | |||
ACK acknowledges since the previous ACK, then the Data Sender could | ACK acknowledges since the previous ACK, then the Data Sender could | |||
assume that this acknowledges newlyAckedPkt full-sized segments, | assume that this acknowledges newlyAckedPkt full-sized segments, | |||
where newlyAckedPkt = newlyAckedB/MSS. Then it could assume that the | where newlyAckedPkt = newlyAckedB/MSS. Then it could assume that the | |||
ACE field incremented by | ACE field incremented by | |||
dSafer.cep = newlyAckedPkt - ((newlyAckedPkt - d.cep) % DIVACE) | dSafer.cep = newlyAckedPkt - ((newlyAckedPkt - d.cep) % DIVACE) | |||
For example, imagine an ACK acknowledges newlyAckedPkt=9 more full- | For example, imagine an ACK acknowledges newlyAckedPkt=9 more full- | |||
size segments than any previous ACK, and that ACE increments by a | size segments than any previous ACK, and that ACE increments by a | |||
minimum of 2 CE marks (d.cep=2). The above formula works out that it | minimum of 2 CE marks (d.cep=2). The above formula indicates that it | |||
would still be safe to assume 2 CE marks (because 9 - ((9-2) % 8) = | would still be safe to assume 2 CE marks (because 9 - ((9-2) % 8) = | |||
2). However, if ACE increases by a minimum of 2 but acknowledges 10 | 2). However, if ACE increases by a minimum of 2 but acknowledges 10 | |||
full-sized segments, then it would be necessary to assume that there | full-sized segments, then it would be necessary to assume that there | |||
could have been 10 CE marks (because 10 - ((10-2) % 8) = 10). | could have been 10 CE marks (because 10 - ((10-2) % 8) = 10). | |||
Note that checks would need to be added to the above pseudocode for | Note that checks would need to be added to the above pseudocode for | |||
(d.cep > newlyAckedPkt), which could occur if newlyAckedPkt had been | (d.cep > newlyAckedPkt), which could occur if newlyAckedPkt had been | |||
wrongly estimated using an inappropriate packet size. | wrongly estimated using an inappropriate packet size. | |||
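The dSafer.cep formula and the two worked examples above can be checked with the short Python sketch below. It assumes DIVACE = 8, matching the 3-bit ACE field and the '% 8' in the examples. The function name d_safer_cep is hypothetical. Returning d.cep when d.cep > newlyAckedPkt is only one possible form of the check that the text says would need to be added. Without some such check, Python's '%' would produce a negative estimate.

```python
# Sketch of the simplest "safer" estimate of CE marks from the ACE field
# (Appendix A.2). Assumption: DIVACE = 8, since the ACE field is 3 bits.
DIVACE = 8


def d_safer_cep(newly_acked_pkt: int, d_cep: int) -> int:
    """Largest CE-mark increment consistent with the observed ACE delta.

    newly_acked_pkt: full-sized segments newly acknowledged (newlyAckedB/MSS).
    d_cep: minimum increment of the ACE field, modulo DIVACE.
    """
    if d_cep > newly_acked_pkt:
        # Hypothetical handling of the check the text calls for:
        # newly_acked_pkt was probably underestimated, so keep d_cep.
        return d_cep
    return newly_acked_pkt - ((newly_acked_pkt - d_cep) % DIVACE)
```

Here d_safer_cep(9, 2) returns 2 and d_safer_cep(10, 2) returns 10, which reproduces the two examples in the text.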
ACKs that acknowledge a large stretch of packets might be common in | ACKs that acknowledge a large stretch of packets might be common in | |||
skipping to change at line 3024 ¶ | skipping to change at line 3030 ¶ | |||
average segment size and prevailing ECN marking. For instance, | average segment size and prevailing ECN marking. For instance, | |||
newlyAckedPkt in the above formula could be replaced with | newlyAckedPkt in the above formula could be replaced with | |||
newlyAckedPktHeur = newlyAckedPkt*p*MSS/s, where s is the prevailing | newlyAckedPktHeur = newlyAckedPkt*p*MSS/s, where s is the prevailing | |||
segment size and p is the prevailing ECN marking probability. | segment size and p is the prevailing ECN marking probability. | |||
However, ultimately, if TCP's ECN feedback becomes inaccurate, it | However, ultimately, if TCP's ECN feedback becomes inaccurate, it | |||
still has loss detection to fall back on. Therefore, it would seem | still has loss detection to fall back on. Therefore, it would seem | |||
safe to implement a simple algorithm, rather than a perfect one. | safe to implement a simple algorithm, rather than a perfect one. | |||
The simple algorithm for dSafer.cep above requires no monitoring of | The simple algorithm for dSafer.cep above requires no monitoring of | |||
prevailing conditions and it would still be safe if, for example, | prevailing conditions and it would still be safe if, for example, | |||
segments were on average at least 5% of full-sized as long as ECN | segments were on average at least 5% of a full-sized packet as long | |||
marking was 5% or less. Assuming it was used, the Data Sender would | as ECN marking was 5% or less. Assuming it was used, the Data Sender | |||
increment its packet counter as follows: | would increment its packet counter as follows: | |||
s.cep += dSafer.cep | s.cep += dSafer.cep | |||
If missing acknowledgement numbers arrive later (due to reordering), | If missing acknowledgement numbers arrive later (due to reordering), | |||
Section 3.2.2.5.2 says "the Data Sender MAY attempt to neutralize the | Section 3.2.2.5.2 says "the Data Sender MAY attempt to neutralize the | |||
effect of any action it took based on a conservative assumption that | effect of any action it took based on a conservative assumption that | |||
it later found to be incorrect". To do this, the Data Sender would | it later found to be incorrect". To do this, the Data Sender would | |||
have to store the values of all the relevant variables whenever it | have to store the values of all the relevant variables whenever it | |||
made assumptions, so that it could re-evaluate them later. Given | made assumptions, so that it could re-evaluate them later. Given | |||
this could become complex and it is not required, we do not attempt | this could become complex and it is not required, we do not attempt | |||
skipping to change at line 3063 ¶ | skipping to change at line 3069 ¶ | |||
if (dSafer.cep > d.cep) { | if (dSafer.cep > d.cep) { | |||
if (d.ceb <= MSS * d.cep) { % Same as (s <= MSS), but no DBZ | if (d.ceb <= MSS * d.cep) { % Same as (s <= MSS), but no DBZ | |||
sSafer = d.ceb/dSafer.cep | sSafer = d.ceb/dSafer.cep | |||
if (sSafer < MSS/SAFETY_FACTOR) | if (sSafer < MSS/SAFETY_FACTOR) | |||
dSafer.cep = d.cep % d.cep is a safe enough estimate | dSafer.cep = d.cep % d.cep is a safe enough estimate | |||
} % else | } % else | |||
% No need for else; dSafer.cep is already correct, | % No need for else; dSafer.cep is already correct, | |||
% because d.cep must have been too small | % because d.cep must have been too small | |||
} | } | |||
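The refinement above can be sketched in Python under stated assumptions. The helper name refine_d_safer_cep is hypothetical, and dSafer.cep is passed in already computed. SAFETY_FACTOR defaults to 2 here purely as an assumption; implementers would choose their own value. sSafer is computed with floating-point division so that its comparison with MSS/SAFETY_FACTOR is not affected by truncation.

```python
def refine_d_safer_cep(d_safer: int, d_cep: int, d_ceb: int, mss: int,
                       safety_factor: int = 2) -> int:
    """Fall back from dSafer.cep to d.cep when the AccECN Option byte delta
    (d_ceb) implies that the assumed CE-marked segments would be implausibly small.

    safety_factor is an assumed default, not a value fixed by the text.
    """
    if d_safer > d_cep:
        # Same test as (s <= MSS) with s = d_ceb / d_cep, but no division by zero.
        if d_ceb <= mss * d_cep:
            s_safer = d_ceb / d_safer  # d_safer > d_cep >= 0, so d_safer >= 1
            if s_safer < mss / safety_factor:
                return d_cep  # d.cep is a safe enough estimate
    # Otherwise dSafer.cep stands, because d.cep must have been too small.
    return d_safer
```

Take MSS = 1448 and an ACK that newly acknowledges 10 full-sized segments while ACE rose by at least 2, so dSafer.cep = 10. If the AccECN Option reports d.ceb = 2896 bytes, sSafer is about 290 bytes, well below MSS/2, and the sketch returns d.cep = 2. If the option reports 14480 bytes, the estimate of 10 marks stands.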
The chart below shows when the above algorithm will consider d.cep | The chart below shows when the above algorithm will replace | |||
can replace dSafer.cep as a safe enough estimate of the number of CE- | dSafer.cep with d.cep as a safe enough estimate of the number of CE | |||
marked packets: | marked packets: | |||
^ | ^ | |||
sSafer| | sSafer| | |||
| | | | |||
MSS+ | MSS+ | |||
| | | | |||
| dSafer.cep | | dSafer.cep | |||
| is | | is | |||
MSS/SAFETY_FACTOR+--------------+ safest | MSS/SAFETY_FACTOR+--------------+ safest | |||
skipping to change at line 3113 ¶ | skipping to change at line 3119 ¶ | |||
than below MSS/2. | than below MSS/2. | |||
If pure ACKs were allowed to be ECN-capable, missing ACKs would be | If pure ACKs were allowed to be ECN-capable, missing ACKs would be | |||
far less likely. However, because [RFC3168] currently precludes | far less likely. However, because [RFC3168] currently precludes | |||
this, the above algorithm assumes that pure ACKs are not ECN-capable. | this, the above algorithm assumes that pure ACKs are not ECN-capable. | |||
A.3. Example Algorithm to Estimate Marked Bytes from Marked Packets | A.3. Example Algorithm to Estimate Marked Bytes from Marked Packets | |||
If AccECN Options are not available, the Data Sender can only decode | If AccECN Options are not available, the Data Sender can only decode | |||
a CE marking from the ACE field in packets. Every time an ACK | a CE marking from the ACE field in packets. Every time an ACK | |||
arrives, to convert this into an estimate of CE-marked bytes, it | arrives, to convert the number of CE markings into an estimate of CE- | |||
needs an average of the segment size, s_ave. Then it can add or | marked bytes, it needs an average of the segment size, s_ave. Then | |||
subtract s_ave from the value of d.ceb as the value of d.cep | it can add or subtract s_ave from the value of d.ceb as the value of | |||
increments or decrements. Some possible ways to calculate s_ave are | d.cep increments or decrements. Some possible ways to calculate | |||
outlined below. The precise details will depend on why an estimate | s_ave are outlined below. The precise details will depend on why an | |||
of marked bytes is needed. | estimate of marked bytes is needed. | |||
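The s_ave approach described here can be sketched in Python using the per-RTT estimate s_ave ~= flightsize / packets_in_flight given in this section. The function names are hypothetical, and flightsize is assumed to be in bytes. Raising an error for an empty flight is an illustrative choice, not a requirement.

```python
def estimate_s_ave(flightsize: int, packets_in_flight: int) -> float:
    """Average segment size: s_ave ~= flightsize / packets_in_flight."""
    if packets_in_flight <= 0:
        raise ValueError("packets_in_flight must be positive")
    return flightsize / packets_in_flight


def update_d_ceb(d_ceb: float, delta_cep: int, s_ave: float) -> float:
    """Add s_ave to the marked-byte estimate for each increment of d.cep
    (or subtract it for each decrement)."""
    return d_ceb + delta_cep * s_ave
```

For example, 14480 bytes spread over 10 packets in flight gives s_ave = 1448.0, so an increase of 2 in d.cep adds 2896.0 to the estimate of CE-marked bytes.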
The implementation could keep a record of the byte numbers of all the | The implementation could keep a record of the byte numbers of all the | |||
boundaries between packets in flight (including control packets), and | boundaries between packets in flight (including control packets), and | |||
recalculate s_ave on every ACK. However, it would be simpler to | recalculate s_ave on every ACK. However, it would be simpler to | |||
merely maintain a counter packets_in_flight for the number of packets | merely maintain a counter packets_in_flight for the number of packets | |||
in flight (including control packets), which is reset once per RTT. | in flight (including control packets), which is reset once per RTT. | |||
Either way, it would estimate s_ave as: | Either way, it would estimate s_ave as: | |||
s_ave ~= flightsize / packets_in_flight, | s_ave ~= flightsize / packets_in_flight, | |||
skipping to change at line 3179 ¶ | skipping to change at line 3185 ¶ | |||
B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake | B.1. Three TCP Header Flags in the SYN-SYN/ACK Handshake | |||
AccECN uses a rather unorthodox approach to negotiate the highest | AccECN uses a rather unorthodox approach to negotiate the highest | |||
version TCP ECN feedback scheme that both ends support, as justified | version TCP ECN feedback scheme that both ends support, as justified | |||
below. It follows from the original TCP ECN capability negotiation | below. It follows from the original TCP ECN capability negotiation | |||
[RFC3168], in which the Client set the 2 least significant of the | [RFC3168], in which the Client set the 2 least significant of the | |||
original reserved flags in the TCP header, and fell back to No ECN | original reserved flags in the TCP header, and fell back to No ECN | |||
support if the Server responded with the 2 flags cleared, which had | support if the Server responded with the 2 flags cleared, which had | |||
previously been the default. | previously been the default. | |||
Classic ECN used header flags rather than a TCP option because it was | Classic ECN used header flags rather than a TCP Option because it was | |||
considered more efficient to use a header flag for 1 bit of feedback | considered more efficient to use a header flag for 1 bit of feedback | |||
per ACK, and this bit could be overloaded to indicate support for | per ACK, and this bit could be overloaded to indicate support for | |||
Classic ECN during the handshake. During the development of ECN, 1 | Classic ECN during the handshake. During the development of ECN, 1 | |||
bit crept up to 2, in order to deliver the feedback reliably and to | bit crept up to 2, in order to deliver the feedback reliably and to | |||
work round some broken hosts that reflected the reserved flags during | work round some broken hosts that reflected the reserved flags during | |||
the handshake. | the handshake. | |||
In order to be backward compatible with RFC 3168, AccECN continues | In order to be backward compatible with RFC 3168, AccECN continues | |||
this approach, using the 3rd least significant TCP header flag that | this approach, using the 3rd least significant TCP header flag that | |||
had previously been allocated for the ECN-nonce (now historic). | had previously been allocated for the ECN-nonce (now historic). | |||
Then, whatever form of Server an AccECN Client encounters, the | Then, whatever form of Server an AccECN Client encounters, the | |||
connection can fall back to the highest version of feedback protocol | connection can fall back to the highest version of feedback protocol | |||
that both ends support, as explained in Section 3.1. | that both ends support, as explained in Section 3.1. | |||
If AccECN capability negotiation had used the more orthodox approach | If AccECN capability negotiation had used the more orthodox approach | |||
of a TCP option, it would still have had to set the two ECN flags in | of a TCP Option, it would still have had to set the two ECN flags in | |||
the main TCP header, in order to be able to fall back to Classic ECN | the main TCP header, in order to be able to fall back to Classic ECN | |||
[RFC3168], or to disable ECN support, without another round of | [RFC3168], or to disable ECN support, without another round of | |||
negotiation. Then AccECN would also have had to handle all the | negotiation. Then AccECN would also have had to handle all the | |||
different ways that Servers currently respond to settings of the ECN | different ways that Servers currently respond to settings of the ECN | |||
flags in the main TCP header, including all of the conflicting cases | flags in the main TCP header, including all of the conflicting cases | |||
where a Server might have said it supported one approach in the flags | where a Server might have said it supported one approach in the flags | |||
and another approach in a new TCP option. And AccECN would have had | and another approach in a new TCP Option. And AccECN would have had | |||
to deal with all of the additional possibilities where a middlebox | to deal with all of the additional possibilities where a middlebox | |||
might have mangled the ECN flags, or removed TCP options. Thus, | might have mangled the ECN flags, or removed TCP Options. Thus, | |||
usage of the 3rd reserved TCP header flag simplified the protocol. | usage of the 3rd reserved TCP header flag simplified the protocol. | |||
The third flag was used in a way that could be distinguished from the | The third flag was used in a way that could be distinguished from the | |||
ECN-nonce, in case any nonce deployment was encountered. Previous | ECN-nonce, in case any nonce deployment was encountered. Previous | |||
usage of this flag for the ECN-nonce was integrated into the original | usage of this flag for the ECN-nonce was integrated into the original | |||
ECN negotiation. This further justified the third flag's use for | ECN negotiation. This further justified the third flag's use for | |||
AccECN, because a non-ECN usage of this flag would have had to use it | AccECN, because a non-ECN usage of this flag would have had to use it | |||
as a separate single bit, rather than in combination with the other 2 | as a separate single bit, rather than in combination with the other 2 | |||
ECN flags. | ECN flags. | |||
skipping to change at line 3232 ¶ | skipping to change at line 3238 ¶ | |||
indicate on the SYN/ACK, four already indicated earlier (or broken) | indicate on the SYN/ACK, four already indicated earlier (or broken) | |||
versions of ECN support, one now being Historic. In the early design | versions of ECN support, one now being Historic. In the early design | |||
of AccECN, an AccECN Server could use only 2 of the 4 remaining | of AccECN, an AccECN Server could use only 2 of the 4 remaining | |||
codepoints. They both indicated AccECN support, but one fed back | codepoints. They both indicated AccECN support, but one fed back | |||
that the SYN had arrived marked as CE. Even though ECN support on a | that the SYN had arrived marked as CE. Even though ECN support on a | |||
SYN is not yet on the Standards Track, the idea is for either end to | SYN is not yet on the Standards Track, the idea is for either end to | |||
act as a mechanistic reflector, so that future capabilities can be | act as a mechanistic reflector, so that future capabilities can be | |||
unilaterally deployed without requiring 2-ended deployment (justified | unilaterally deployed without requiring 2-ended deployment (justified | |||
in Section 2.5). | in Section 2.5). | |||
During traversal testing, it was discovered that the IP-ECN field in | During traversal testing, it was discovered that the IP ECN field in | |||
the SYN was mangled on a non-negligible proportion of paths. | the SYN was mangled on a non-negligible proportion of paths. | |||
Therefore, it was necessary to allow the SYN/ACK to feed all four IP- | Therefore, it was necessary to allow the SYN/ACK to feed all four IP | |||
ECN codepoints that the SYN could arrive with back to the Client. | ECN codepoints that the SYN could arrive with back to the Client. | |||
Without this, the Client could not know whether to disable ECN for | Without this, the Client could not know whether to disable ECN for | |||
the connection due to mangling of the IP-ECN field (also explained in | the connection due to mangling of the IP ECN field (also explained in | |||
Section 2.5). This development consumed the remaining two codepoints | Section 2.5). This development consumed the remaining two codepoints | |||
on the SYN/ACK that had been reserved for future use by AccECN in | on the SYN/ACK that had been reserved for future use by AccECN in | |||
earlier versions. | earlier draft versions of this document. | |||
B.3. Space for Future Evolution | B.3. Space for Future Evolution | |||
Despite availability of usable TCP header space being extremely | Despite availability of usable TCP header space being extremely | |||
scarce, the AccECN protocol has taken all possible steps to ensure | scarce, the AccECN protocol has taken all possible steps to ensure | |||
that there is space to negotiate possible future variants of the | that there is space to negotiate possible future variants of the | |||
protocol, either if a variant of AccECN is required, or if a | protocol, either if a variant of AccECN is required, or if a | |||
completely different ECN feedback approach is needed. | completely different ECN feedback approach is needed. | |||
Future AccECN variants: When the AccECN capability is negotiated | Future AccECN variants: When the AccECN capability is negotiated | |||
skipping to change at line 3300 ¶ | skipping to change at line 3306 ¶ | |||
equivalent to AccECN negotiation with (1,1,1) on the SYN. These | equivalent to AccECN negotiation with (1,1,1) on the SYN. These | |||
codepoints would not allow fall-back to Classic ECN support for a | codepoints would not allow fall-back to Classic ECN support for a | |||
Server that did not understand them, but this approach ensures | Server that did not understand them, but this approach ensures | |||
they are available in the future, perhaps for uses other than ECN | they are available in the future, perhaps for uses other than ECN | |||
alongside the AccECN scheme. All possible combinations of SYN/ACK | alongside the AccECN scheme. All possible combinations of SYN/ACK | |||
could be used in response except either (0,0,0) or reflection of | could be used in response except either (0,0,0) or reflection of | |||
the same values sent on the SYN. | the same values sent on the SYN. | |||
In order to extend AccECN or ECN in the future, other ways could | In order to extend AccECN or ECN in the future, other ways could | |||
be resorted to, although their traversal properties are likely to | be resorted to, although their traversal properties are likely to | |||
be inferior. They include a new TCP option; using the remaining | be inferior. They include a new TCP Option; using the remaining | |||
reserved flags in the main TCP header (preferably extending the | reserved flags in the main TCP header (preferably extending the | |||
3-bit combinations used by AccECN to 4-bit combinations, rather | 3-bit combinations used by AccECN to 4-bit combinations, rather | |||
than burning one bit for just one state); a non-zero urgent | than burning one bit for just one state); a non-zero urgent | |||
pointer in combination with the URG flag cleared; or some other | pointer in combination with the URG flag cleared; or some other | |||
unexpected combination of fields yet to be invented. | unexpected combination of fields yet to be invented. | |||
Acknowledgements | Acknowledgements | |||
We want to thank Koen De Schepper, Praveen Balasubramanian, Michael | We want to thank Koen De Schepper, Praveen Balasubramanian, Michael | |||
Welzl, Gorry Fairhurst, David Black, Spencer Dawkins, Michael Scharf, | Welzl, Gorry Fairhurst, David Black, Spencer Dawkins, Michael Scharf, | |||
End of changes. 149 change blocks. | ||||
234 lines changed or deleted | 240 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. |