rfc9841.original.xml   rfc9841.xml 
<?xml version='1.0' encoding='utf-8'?> <?xml version='1.0' encoding='utf-8'?>
<!-- [rfced] Because this document updates RFC 7932, please
review the errata reported for RFC 7932
(https://www.rfc-editor.org/errata/rfc7932)
and let us know if you confirm our opinion that none of them
are relevant to the content of this document.
-->
<!DOCTYPE rfc [ <!DOCTYPE rfc [
<!ENTITY nbsp "&#160;"> <!ENTITY nbsp "&#160;">
<!ENTITY zwsp "&#8203;"> <!ENTITY zwsp "&#8203;">
<!ENTITY nbhy "&#8209;"> <!ENTITY nbhy "&#8209;">
<!ENTITY wj "&#8288;"> <!ENTITY wj "&#8288;">
]> ]>
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" submissionType="IETF" docName="draft-vandevenne-shared-brotli-format-15" category="info" updates="7932" ipr="trust200902" obsoletes="" xml:lang="en" symRefs="true" sortRefs="true" tocInclude="true" version="3"> <rfc xmlns:xi="http://www.w3.org/2001/XInclude" submissionType="IETF" docName="draft-vandevenne-shared-brotli-format-15" number="9841" consensus="true" category="info" updates="7932" ipr="trust200902" obsoletes="" xml:lang="en" symRefs="true" sortRefs="true" tocInclude="true" version="3">
<!-- xml2rfc v2v3 conversion 3.27.0 -->
<!-- Generated by id2xml 1.5.2 on 2025-02-12T17:39:08Z -->
<front> <front>
<title abbrev="Shared Brotli Data Format">Shared Brotli Compressed Data Format</title> <title abbrev="Shared Brotli Data Format">Shared Brotli Compressed Data Format</title>
<seriesInfo name="Internet-Draft" value="draft-vandevenne-shared-brotli-format-15"/> <seriesInfo name="RFC" value="9841"/>
<author initials="J." surname="Alakuijala" fullname="Jyrki Alakuijala"> <author initials="J." surname="Alakuijala" fullname="Jyrki Alakuijala">
<organization abbrev="Google, Inc">Google, Inc.</organization> <organization abbrev="Google, Inc">Google, Inc.</organization>
<address> <address>
<email>jyrki@google.com</email> <email>jyrki@google.com</email>
</address> </address>
</author> </author>
<author initials="T." surname="Duong" fullname="Thai Duong"> <author initials="T." surname="Duong" fullname="Thai Duong">
<organization abbrev="Google, Inc">Google, Inc.</organization> <organization abbrev="Google, Inc">Google, Inc.</organization>
<address> <address>
<email>thaidn@google.com</email> <email>thaidn@google.com</email>
skipping to change at line 38 skipping to change at line 46
<address> <address>
<email>eustas@google.com</email> <email>eustas@google.com</email>
</address> </address>
</author> </author>
<author initials="Z." surname="Szabadka" fullname="Zoltan Szabadka"> <author initials="Z." surname="Szabadka" fullname="Zoltan Szabadka">
<organization abbrev="Google, Inc">Google, Inc.</organization> <organization abbrev="Google, Inc">Google, Inc.</organization>
<address> <address>
<email>szabadka@google.com</email> <email>szabadka@google.com</email>
</address> </address>
</author> </author>
<author initials="L." surname="Vandevenne" fullname="Lode Vandevenne"> <author initials="L." surname="Vandevenne" fullname="Lode Vandevenne" role="editor">
<organization abbrev="Google, Inc">Google, Inc.</organization> <organization abbrev="Google, Inc">Google, Inc.</organization>
<address> <address>
<email>lode@google.com</email> <email>lode@google.com</email>
</address> </address>
</author> </author>
<date year="2025" month="June"/> <date year="2025" month="August"/>
<!-- [rfced] Please verify that WIT is the correct area for this document.
-->
<area>WIT</area>
<!-- [rfced] Please insert any keywords (beyond those that appear in
the title) for use on https://www.rfc-editor.org/search.
-->
<abstract> <abstract>
<t> <t>
This specification defines a data format for shared brotli This specification defines a data format for shared brotli
compression, which adds support for shared dictionaries, large window compression, which adds support for shared dictionaries, large window,
and a container format to brotli (RFC 7932). Shared dictionaries and and a container format to brotli (RFC 7932). Shared dictionaries and
large window support allow significant compression gains compared to large window support allow significant compression gains compared to
regular brotli. This document updates RFC 7932.</t> regular brotli. This document updates RFC 7932.</t>
</abstract> </abstract>
</front> </front>
<middle> <middle>
<section anchor="sect-1" numbered="true" toc="default"> <section anchor="sect-1" numbered="true" toc="default">
<name>Introduction</name> <name>Introduction</name>
<section anchor="sect-1.1" numbered="true" toc="default"> <section anchor="sect-1.1" numbered="true" toc="default">
<name>Purpose</name> <name>Purpose</name>
<t> <t>
The purpose of this specification is to extend the brotli compressed The purpose of this specification is to extend the brotli compressed
data format (<xref target="RFC7932" format="default"/>) with new abilities that data format <xref target="RFC7932" format="default"/> with new abilities that
allow further allow further
compression gains:</t> compression gains.</t>
<ul spacing="normal"> <ul spacing="normal">
<li> <li>
<t>Shared dictionaries allow a static shared context between <t>Shared dictionaries allow a static shared context between
encoder and decoder for significant compression gains.</t> encoder and decoder for significant compression gains.</t>
</li> </li>
<li> <li>
<t>Large window brotli allows much larger back reference distances <t>Large window brotli allows much larger back reference distances
to give compression gains for files over 16MiB.</t> to give compression gains for files over 16 MiB.</t>
</li> </li>
<li> <li>
<t>The framing format is a container format that allows storage of <t>The framing format is a container format that allows storage of
multiple resources and that reference dictionaries.</t> multiple resources and references dictionaries.</t>
</li> </li>
</ul> </ul>
<t> <t>
This document is the authoritative specification of shared brotli This document is the authoritative specification of shared brotli
data formats and the backwards compatible changes to brotli, and data formats and the backwards compatible changes to brotli. This document also
defines:</t> defines the following:</t>
<ul> <ul>
<li> <li>
<t>The data format of serialized shared dictionaries</t> <t>The data format of serialized shared dictionaries</t>
</li> </li>
<li> <li>
<t>The data format of the framing format</t> <t>The data format of the framing format</t>
</li> </li>
<li> <li>
<t>The encoding of window bits and distances for large window <t>The encoding of window bits and distances for large window
brotli in the brotli data format</t> brotli in the brotli data format</t>
</li> </li>
<li> <li>
<t>The encoding of shared dictionary references in the brotli data <t>The encoding of shared dictionary references in the brotli data
format</t> format</t>
</li> </li>
</ul> </ul>
</section> </section>
<section anchor="sect-1.2" numbered="true" toc="default"> <section anchor="sect-1.2" numbered="true" toc="default">
<name>Intended audience</name> <name>Intended Audience</name>
<t> <t>
This specification is intended for use by software implementers to This specification is intended for use by software implementers to
compress data into and/or decompress data from the shared brotli compress data into and/or decompress data from the shared brotli
dictionary format.</t> dictionary format.</t>
<t> <t>
The text of the specification assumes a basic background in The text of the specification assumes a basic background in
programming at the level of bits and other primitive data programming at the level of bits and other primitive data
representations. Familiarity with the technique of LZ77 coding <xref target="LZ77"/> representations. Familiarity with the technique of LZ77 coding <xref target="LZ77"/>
is helpful but not required.</t> is helpful, but not required.</t>
</section> </section>
<section anchor="sect-1.3" numbered="true" toc="default"> <section anchor="sect-1.3" numbered="true" toc="default">
<name>Scope</name> <name>Scope</name>
<t> <t>
This specification defines a data format for shared brotli This specification defines a data format for shared brotli
compression, which adds support for dictionaries and extended compression, which adds support for dictionaries and extended
features to brotli <xref target="RFC7932" format="default"/>.</t> features to brotli <xref target="RFC7932" format="default"/>.</t>
</section> </section>
<section anchor="sect-1.4" numbered="true" toc="default"> <section anchor="sect-1.4" numbered="true" toc="default">
<name>Compliance</name> <name>Compliance</name>
<t> <t>
Unless otherwise indicated below, a compliant decompressor must be Unless otherwise indicated below, a compliant decompressor must be
able to accept and decompress any data set that conforms to all the able to accept and decompress any data set that conforms to all the
specifications presented here. A compliant compressor must produce specifications presented here. Additionally, a compliant compressor must produce
data sets that conform to all the specifications presented here.</t> data sets that conform to all the specifications presented here.</t>
</section> </section>
<section anchor="sect-1.5" numbered="true" toc="default"> <section anchor="sect-1.5" numbered="true" toc="default">
<name>Definitions of terms and conventions used</name> <name>Definitions of Terms and Conventions Used</name>
<dl> <dl>
<dt>Byte:</dt><dd> 8 bits stored or transmitted as a unit (same as an octet). For <dt>Byte:</dt><dd>8 bits stored or transmitted as a unit (same as an octet). For
this specification, a byte is exactly 8 bits, even on machines that this specification, a byte is exactly 8 bits, even on machines that
store a character on a number of bits different from eight. See store a character on a number of bits different from eight. See
below for the numbering of bits within a byte.</dd> below for the numbering of bits within a byte.</dd>
<dt>String:</dt><dd>a sequence of arbitrary bytes.</dd> <dt>String:</dt><dd>A sequence of arbitrary bytes.</dd>
</dl> </dl>
<t> <t>
Bytes stored within a computer do not have a "bit order", since they Bytes stored within a computer do not have a "bit order" since they are
are always treated as a unit. However, a byte considered as an always treated as a unit. However, a byte considered as an integer between
integer between 0 and 255 does have a most- and least-significant 0 and 255 does have a most significant bit (MSB) and least significant bit
bit, and since we write numbers with the most-significant digit on (LSB), and since we write numbers with the most significant digit on the left,
the left, we also write bytes with the most-significant bit on the
left. In the diagrams below, we number the bits of a byte so that bit bytes with the MSB are also written on the left. In the diagrams below, the
0 is the least-significant bit, i.e., the bits are numbered:</t> bits of a byte are written so that bit 0 is the LSB, i.e., the bits are
numbered as follows:</t>
<artwork name="" type="" align="left" alt=""><![CDATA[ <artwork name="" type="" align="left" alt=""><![CDATA[
+--------+ +--------+
|76543210| |76543210|
+--------+ +--------+
]]></artwork> ]]></artwork>
<t> <t>
Within a computer, a number may occupy multiple bytes. All multi-byte Within a computer, a number may occupy multiple bytes. All multi-byte
numbers in the format described here are unsigned and stored with the numbers in the format described here are unsigned and stored with the
least-significant byte first (at the lower memory address). For least significant byte first (at the lower memory address). For
example, the decimal 16-bit number 520 is stored as:</t> example, the decimal 16-bit number 520 is stored as:</t>
<!--[rfced] In the second figure in Section 5.1, is "more significant
byte" intended (we note that it was used in RFC 7932), or should
it be changed to "most significant byte", which is used more
often in the RFC Series?
Original:
0 1
+- - - - + - - - -+
|00001000|00000010|
+- - - - + - - - -+
^ ^
| |
| + more significant byte = 2 x 256
+ less significant byte = 8
-->
<artwork name="" type="" align="left" alt=""><![CDATA[ <artwork name="" type="" align="left" alt=""><![CDATA[
0 1 0 1
+--------+--------+ +--------+--------+
|00001000|00000010| |00001000|00000010|
+--------+--------+ +--------+--------+
^ ^ ^ ^
| | | |
| + more significant byte = 2 x 256 | + more significant byte = 2 x 256
+ less significant byte = 8 + less significant byte = 8
]]></artwork> ]]></artwork>
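<t>
As a minimal sketch of the byte order described above, an unsigned
multi-byte number could be read in Python as follows. The helper name
read_uint_le and its parameters are hypothetical and are not defined by
this specification.</t>
<sourcecode type="python"><![CDATA[
```python
def read_uint_le(data: bytes, offset: int = 0, size: int = 2) -> int:
    """Read an unsigned integer stored least significant byte first.

    Hypothetical helper for illustration; not part of the format itself.
    """
    if offset < 0 or size < 0 or offset + size > len(data):
        raise ValueError("not enough bytes for the requested integer")
    # "little" selects least-significant-byte-first order.
    return int.from_bytes(data[offset:offset + size], "little")


# The 16-bit example from the figure: byte 0 = 8, byte 1 = 2.
assert read_uint_le(bytes([0b00001000, 0b00000010])) == 8 + 2 * 256
```
]]></sourcecode>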
<section anchor="sect-1.5.1" numbered="true" toc="default"> <section anchor="sect-1.5.1" numbered="true" toc="default">
<name>Packing into bytes</name> <name>Packing into Bytes</name>
<t> <t>
This document does not address the issue of the order in which bits This document does not address the issue of the order in which bits
of a byte are transmitted on a bit-sequential medium, since the final of a byte are transmitted on a bit-sequential medium, since the final
data format described here is byte- rather than bit-oriented. data format described here is byte- rather than bit-oriented.
However, we describe the compressed block format below as a sequence However, the compressed block format is described below as a sequence
of data elements of various bit lengths, not a sequence of bytes. We of data elements of various bit lengths, not a sequence of bytes. Therefore,
must therefore specify how to pack these data elements into bytes to we must specify how to pack these data elements into bytes to
form the final compressed byte sequence:</t> form the final compressed byte sequence:</t>
<ul spacing="normal"> <ul spacing="normal">
<li> <li>
<t>Data elements are packed into bytes in order of <t>Data elements are packed into bytes in order of
increasing bit number within the byte, i.e., starting increasing bit number within the byte, i.e., starting
with the least-significant bit of the byte.</t> with the LSB of the byte.</t>
</li> </li>
<li> <li>
<t>Data elements other than prefix codes are packed <t>Data elements other than prefix codes are packed
starting with the least-significant bit of the data starting with the LSB of the data
element. These are referred to here as integer values element. These are referred to here as integer values
and are considered unsigned.</t> and are considered unsigned.</t>
</li> </li>
<li> <li>
<t>Prefix codes are packed starting with the most-significant <t>Prefix codes are packed starting with the MSB of the code.</t>
bit of the code.</t>
</li> </li>
</ul> </ul>
<t> <t>
In other words, if one were to print out the compressed data as a In other words, if one were to print out the compressed data as a
sequence of bytes, starting with the first byte at the *right* margin sequence of bytes starting with the first byte at the *right* margin
and proceeding to the *left*, with the most-significant bit of each and proceeding to the *left*, with the MSB of each
byte on the left as usual, one would be able to parse the result from byte on the left as usual, one would be able to parse the result from
right to left, with fixed-width elements in the correct MSB-to-LSB right to left with fixed-width elements in the correct MSB-to-LSB
order and prefix codes in bit-reversed order (i.e., with the first order and prefix codes in bit-reversed order (i.e., with the first
bit of the code in the relative LSB position).</t> bit of the code in the relative LSB position).</t>
<t> <t>
As an example, consider packing the following data elements into a As an example, consider packing the following data elements into a
sequence of 3 bytes: 3-bit integer value 6, 4-bit integer value 2, sequence of 3 bytes: 3-bit integer value 6, 4-bit integer value 2,
3-bit prefix code b'110, 2-bit prefix code b'10, 12-bit integer value 3-bit prefix code b'110, 2-bit prefix code b'10, and 12-bit integer value
3628.</t> 3628.</t>
<artwork name="" type="" align="left" alt=""><![CDATA[ <artwork name="" type="" align="left" alt=""><![CDATA[
byte 2 byte 1 byte 0 byte 2 byte 1 byte 0
+--------+--------+--------+ +--------+--------+--------+
|11100010|11000101|10010110| |11100010|11000101|10010110|
+--------+--------+--------+ +--------+--------+--------+
^ ^ ^ ^ ^ ^ ^ ^ ^ ^
| | | | | | | | | |
| | | | +------ integer value 6 | | | | +------ integer value 6
| | | +---------- integer value 2 | | | +---------- integer value 2
skipping to change at line 223 skipping to change at line 255
| +---------------- prefix code 10 | +---------------- prefix code 10
+----------------------------- integer value 3628 +----------------------------- integer value 3628
]]></artwork> ]]></artwork>
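<t>
The packing rules above can be sketched as a small bit writer. This is an
illustrative sketch only: the class BitWriter and its method names are
assumptions made for this example and do not appear in this specification
or in any brotli implementation referenced here.</t>
<sourcecode type="python"><![CDATA[
```python
class BitWriter:
    """Hypothetical bit packer following the rules listed above."""

    def __init__(self) -> None:
        self._acc = 0    # all bits written so far; bit 0 was written first
        self._nbits = 0  # number of bits written so far

    def write_int(self, value: int, width: int) -> None:
        # Integer values are packed starting with their LSB.
        self._acc |= (value & ((1 << width) - 1)) << self._nbits
        self._nbits += width

    def write_prefix_code(self, code: int, width: int) -> None:
        # Prefix codes are packed starting with their MSB.
        for i in range(width - 1, -1, -1):
            self._acc |= ((code >> i) & 1) << self._nbits
            self._nbits += 1

    def to_bytes(self) -> bytes:
        # Byte 0 holds the first 8 bits written, in increasing bit number.
        return self._acc.to_bytes((self._nbits + 7) // 8, "little")


# The worked example from the text: five elements packed into 3 bytes.
w = BitWriter()
w.write_int(6, 3)
w.write_int(2, 4)
w.write_prefix_code(0b110, 3)
w.write_prefix_code(0b10, 2)
w.write_int(3628, 12)
assert w.to_bytes() == bytes([0b10010110, 0b11000101, 0b11100010])
```
]]></sourcecode>
<t>
The final assertion reproduces byte 0, byte 1, and byte 2 of the figure
above.</t>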
</section> </section>
</section> </section>
</section> </section>
<section anchor="sect-2" numbered="true" toc="default"> <section anchor="sect-2" numbered="true" toc="default">
<name>Shared Brotli Overview</name> <name>Shared Brotli Overview</name>
<t> <t>
Shared brotli extends brotli <xref target="RFC7932" format="default"/> with support for shared Shared brotli extends brotli <xref target="RFC7932" format="default"/> with support for shared
dictionaries, larger LZ77 window and a framing format.</t> dictionaries, a larger LZ77 window, and a framing format.</t>
</section> </section>
<section anchor="sect-3" numbered="true" toc="default"> <section anchor="sect-3" numbered="true" toc="default">
<name>Shared Dictionaries</name> <name>Shared Dictionaries</name>
<t> <t>
A shared dictionary is a piece of data shared by a compressor and A shared dictionary is a piece of data shared by a compressor and
decompressor. The compressor can take advantage of the dictionary decompressor. The compressor can take advantage of the dictionary
context to encode the input in a more compact manner. The compressor context to encode the input in a more compact manner. The compressor
and the decompressor must use exactly the same dictionary. A shared and the decompressor must use exactly the same dictionary. A shared
dictionary is especially useful to compress short input sequences.</t> dictionary is especially useful to compress short input sequences.</t>
<t>A shared brotli dictionary can use two methods of sharing context:</t> <t>A shared brotli dictionary can use two methods of sharing context:</t>
<t>An LZ77 dictionary. The encoder and decoder could refer <dl><dt>LZ77 dictionary:</dt><dd>The encoder and decoder could refer
to a given sequence of bytes. Multiple LZ77 dictionaries to a given sequence of bytes. Multiple LZ77 dictionaries
can be set.</t> can be set.</dd>
<t>A custom static dictionary: a word list with transforms. The <dt>Custom static dictionary:</dt><dd>A word list with transforms. The
encoder and decoder will replace the static dictionary data encoder and decoder will replace the static dictionary data
with the data in the shared dictionary. The original static with the data in the shared dictionary. The original static
dictionary is described in <xref target="sect-8" format="default"/> in <xref target="RFC7932" format="default"/>. The original dictionary is described in <xref target="sect-8" format="default"/> in <xref target="RFC7932" format="default"/>. The original
data from Appendix A and Appendix B of <xref target="RFC7932" format="default"/> will be data from Appendices <xref section="A" target="RFC7932" sectionFormat="bare"/> and <xref section="B" target="RFC7932" sectionFormat="bare"/> of <xref target="RFC7932"/> will be
replaced. In addition, it is possible to dynamically switch replaced. In addition, it is possible to dynamically switch
this dictionary based on the data compression context, and/or this dictionary based on the data compression context and/or
to include a reference to the original dictionary in the custom include a reference to the original dictionary in the custom
dictionary.</t> dictionary.</dd></dl>
<t> <t>
If no shared dictionary is set the decoder behaves the same as in If no shared dictionary is set, the decoder behaves the same as in
<xref target="RFC7932" format="default"/> on a brotli stream.</t> <xref target="RFC7932" format="default"/> on a brotli stream.</t>
<t> <t>
<!-- [rfced] We have updated the following sentence for
clarity. Please let us know of any objections.
Original:
If a shared dictionary is set, then it can set any of: LZ77 If a shared dictionary is set, then it can set any of: LZ77
dictionaries, overriding static dictionary words, and/or overriding dictionaries, overriding static dictionary words, and/or overriding
transforms.</t> transforms.
Current:
If a shared dictionary is set, then it can set LZ77
dictionaries, override static dictionary words, and/or override
transforms.
-->
If a shared dictionary is set, then it can set LZ77 dictionaries, override
static dictionary words, and/or override transforms.</t>
<section anchor="sect-3.1" numbered="true" toc="default"> <section anchor="sect-3.1" numbered="true" toc="default">
<name>Custom Static Dictionaries</name> <name>Custom Static Dictionaries</name>
<t> <t>
<!-- [rfced] May we rephrase the following sentences to avoid using "RFC
7932" as an adjective and for subject-verb agreement (due to multiple
behaviors being overridden)?
i)
Current:
If a custom word list is set, then the following behavior of the RFC
7932 decoder [RFC7932] is overridden...
Perhaps:
If a custom word list is set, then the following behaviors of the
decoder defined in [RFC7932] are overridden...
ii)
Current:
If a custom transforms list is set without context dependency, then
the following behavior of the RFC 7932 decoder [RFC7932] is
overridden...
Perhaps:
If a custom transforms list is set without context dependency, then
the following behaviors of the decoder defined in [RFC7932] are
overridden...
-->
If a custom word list is set, then the following behavior of the RFC If a custom word list is set, then the following behavior of the RFC
7932 decoder <xref target="RFC7932" format="default"/> is overridden:</t> 7932 decoder <xref target="RFC7932" format="default"/> is overridden:</t>
<t indent="3"> <t indent="3">
Instead of the Static Dictionary Data from Appendix A Instead of the Static Dictionary Data from <xref section="A" target="RFC7932" format="default"/>, one or more word lists from the custom static
of <xref target="RFC7932" format="default"/>, one or more word lists from
the custom static
dictionary data are used.</t> dictionary data are used.</t>
<t indent="3"> <t indent="3">
Instead of NDBITS at the end of Appendix A, a custom Instead of NDBITS at the end of <xref section="A" target="RFC7932" format="default"/>, a custom
SIZE_BITS_BY_LENGTH per custom word list is used. SIZE_BITS_BY_LENGTH per custom word list is used.
</t> </t>
<t indent="3"> <t indent="3">
The copy length for a static dictionary reference must be The copy length for a static dictionary reference must be
between 4 and 31 and may not be a value for which between 4 and 31 and may not be a value for which
SIZE_BITS_BY_LENGTH of this dictionary is 0.</t> SIZE_BITS_BY_LENGTH of this dictionary is 0.</t>
<t> <t>
If a custom transforms list is set without context dependency, then If a custom transforms list is set without context dependency, then
the following behavior of the RFC 7932 decoder <xref target="RFC7932" format="default"/> is the following behavior of the RFC 7932 decoder <xref target="RFC7932" format="default"/> is
overridden:</t> overridden:</t>
<t indent="3"> <t indent="3">
The "List of Word Transformations" from Appendix B is The "List of Word Transformations" from <xref section="B" target="RFC7932"
overridden by one or more lists of custom prefixes, suffixes and format="default"/> is
overridden by one or more lists of custom prefixes, suffixes, and
transform operations.</t> transform operations.</t>
<t indent="3"> <t indent="3">
The transform_id must be smaller than the number of transforms The transform_id must be smaller than the number of transforms
given in the custom transforms list.</t> given in the custom transforms list.</t>
<t> <t>
If the dictionary is context dependent, it includes a lookup table of If the dictionary is context dependent, it includes a lookup table of
64 word list and transform list combinations. When resolving a static a 64-word list and transform list combinations. When resolving a static
dictionary word, the decoder computes the literal context id, as in dictionary word, the decoder computes the literal Context ID as described in
section 7.1. of <xref target="RFC7932" format="default"/>. The literal context <xref target="RFC7932" section="7.1"/>. The literal Context ID is used as the
id is used as index in index in
the lookup tables to select the word list and transforms to use. If the lookup tables to select the word list and transforms to use. If
the dictionary is not context dependent, this id is implicitely 0 the dictionary is not context dependent, this ID is implicitly 0
instead.</t> instead.</t>
<t> <t>
If a distance goes beyond the dictionary for the current id and <!-- [rfced] To improve the readability of this paragraph, may we
multiple word list / transform list combinations are defined, then a format the text into a list as follows? Also, should "a next
dictionary" be rephrased to "the next dictionary", "a dictionary
that follows", or otherwise to use the correct article?
Current:
If a distance goes beyond the dictionary for the current ID and
multiple word/transform list combinations are defined, then a next
dictionary is used in the following order: if not context dependent,
the same order as defined in the shared dictionary. If context
dependent, the index matching the current context is used first, the
same order as defined in the shared dictionary excluding the current
context are used next.
Perhaps:
If a distance goes beyond the dictionary for the current ID and
multiple word/transform list combinations are defined, then the next
dictionary is used in the following order:
* If context dependent:
* use the index matching the current context first, and then
* use the same order as defined in the shared dictionary
(excluding the current context) next.
* If not context dependent:
* use the same order as defined in the shared dictionary.
-->
If a distance goes beyond the dictionary for the current ID and
multiple word/transform list combinations are defined, then a
next dictionary is used in the following order: if not context next dictionary is used in the following order: if not context
dependent, the same order as defined in the shared dictionary. If dependent, the same order as defined in the shared dictionary. If
context dependent, the index matching the current context is used context dependent, the index matching the current context is used
first, the same order as defined in the shared dictionary excluding first, the same order as defined in the shared dictionary excluding
the current context are used next.</t> the current context are used next.</t>
<section anchor="sect-3.1.1" numbered="true" toc="default"> <section anchor="sect-3.1.1" numbered="true" toc="default">
<name>Transform Operations</name> <name>Transform Operations</name>
<t> <t>
A shared dictionary may include custom word transformations, to A shared dictionary may include custom word transformations to
replace those specified in <xref target="sect-8" format="default"/> and Appendix replace those specified in <xref target="sect-8" format="default"/> and <xref
B of <xref target="RFC7932" format="default"/>. A section="B" target="RFC7932" format="default"/>.
<!-- [rfced] Would the following proposed text retain the original
meaning of the sentence?
Current:
A transform consists of a possible prefix, a transform operation, for
some operations a parameter, and a possible suffix.
Perhaps:
A transform consists of a possible prefix, a transform operation, a
parameter (for some operations), and a possible suffix.
-->
A
transform consists of a possible prefix, a transform operation, for transform consists of a possible prefix, a transform operation, for
some operations a parameter, and a possible suffix. In the shared some operations a parameter, and a possible suffix. In the shared
dictionary format, the transform operation is represented by a dictionary format, the transform operation is represented by a
numerical ID, listed in the table below.</t> numerical ID, which is listed in the table below.</t>
<table anchor="operation-ids"> <!-- Assign an anchor --> <table anchor="operation-ids">
<name></name> <!-- Give the table a title --> <name></name>
<thead> <thead>
<tr> <tr>
<th>ID</th> <!-- <th>: header --> <th>ID</th>
<th>Operation</th> <th>Operation</th>
</tr> </tr>
</thead> </thead>
<tbody> <!-- The rows --> <tbody>
<tr> <tr>
<td>0</td> <td>Identity</td> <td>0</td> <td>Identity</td>
</tr><tr> </tr><tr>
<td>1</td> <td>OmitLast1</td> <td>1</td> <td>OmitLast1</td>
</tr><tr> </tr><tr>
<td>2</td> <td>OmitLast2</td> <td>2</td> <td>OmitLast2</td>
</tr><tr> </tr><tr>
<td>3</td> <td>OmitLast3</td> <td>3</td> <td>OmitLast3</td>
</tr><tr> </tr><tr>
<td>4</td> <td>OmitLast4</td> <td>4</td> <td>OmitLast4</td>
skipping to change at line 400 skipping to change at line 506
15 OmitFirst4 15 OmitFirst4
16 OmitFirst5 16 OmitFirst5
17 OmitFirst6 17 OmitFirst6
18 OmitFirst7 18 OmitFirst7
19 OmitFirst8 19 OmitFirst8
20 OmitFirst9 20 OmitFirst9
21 ShiftFirst (by PARAMETER) 21 ShiftFirst (by PARAMETER)
22 ShiftAll (by PARAMETER) 22 ShiftAll (by PARAMETER)
]]></artwork> ]]></artwork>
--> -->
<!-- [rfced] The original xref citation in the XML pointed to Section 8
of this document. We have updated as follows. Please let us know
any objections.
Current:
Operations 0 to 20 are specified in Section 8 of [RFC7932]. ShiftFirst
and ShiftAll transform specifically encoded SCALARs.
-->
<t> <t>
Operations 0 to 20 are specified in <xref target="sect-8" format="default"/> in <xref target="RFC7932" format="default"/>. Operations 0 to 20 are specified in <xref section="8" target="RFC7932" format="default"/>.
ShiftFirst and ShiftAll transform specifically encoded SCALARs.</t> ShiftFirst and ShiftAll transform specifically encoded SCALARs.</t>
<t> <t>
A SCALAR is a 7-, 11-, 16- or 21-bit unsigned integer encoded with 1, A SCALAR is a 7-, 11-, 16-, or 21-bit unsigned integer encoded with 1,
2, 3 or 4 bytes respectively with following bit contents:</t> 2, 3, or 4 bytes, respectively, with the following bit contents:</t>
<!-- SG: should these be individual figures? -->
<artwork name="" type="" align="left" alt=""><![CDATA[ <artwork name="" type="" align="left" alt=""><![CDATA[
7-bit SCALAR: 7-bit SCALAR:
+--------+ +--------+
|0sssssss| |0sssssss|
+--------+ +--------+
11-bit SCALAR: 11-bit SCALAR:
+--------+--------+ +--------+--------+
|110sssss|XXssssss| |110sssss|XXssssss|
+--------+--------+ +--------+--------+
skipping to change at line 431 skipping to change at line 544
+--------+--------+--------+ +--------+--------+--------+
|1110ssss|XXssssss|XXssssss| |1110ssss|XXssssss|XXssssss|
+--------+--------+--------+ +--------+--------+--------+
21-bit SCALAR: 21-bit SCALAR:
+--------+--------+--------+--------+ +--------+--------+--------+--------+
|11110sss|XXssssss|XXssssss|XXssssss| |11110sss|XXssssss|XXssssss|XXssssss|
+--------+--------+--------+--------+ +--------+--------+--------+--------+
]]></artwork> ]]></artwork>
<t> <t>
Given the input bytes matching SCALAR encoding pattern, the SCALAR Given the input bytes matching the SCALAR encoding pattern, the SCALAR
value is obtained by concatenation of the "s" bits, with the most value is obtained by concatenation of the "s" bits, with the MSBs coming from
significant bits coming from the earliest byte. The "X" bits could the earliest byte. The "X" bits could
have arbitrary value.</t> have arbitrary value.</t>
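<t>
The four SCALAR patterns can be decoded with a few mask tests. The
following is a sketch under stated assumptions, not a normative decoder:
the function name decode_scalar and its return convention (a value and
length pair, or None when no pattern applies) are hypothetical.</t>
<sourcecode type="python"><![CDATA[
```python
def decode_scalar(data: bytes, pos: int = 0):
    """Decode one SCALAR starting at data[pos].

    Returns (value, length_in_bytes), or None if the first byte matches
    no SCALAR pattern or too few bytes remain. Hypothetical helper.
    """
    if pos >= len(data):
        return None
    first = data[pos]
    if (first & 0x80) == 0x00:      # 0sssssss: 7-bit SCALAR
        length, value = 1, first & 0x7F
    elif (first & 0xE0) == 0xC0:    # 110sssss: 11-bit SCALAR
        length, value = 2, first & 0x1F
    elif (first & 0xF0) == 0xE0:    # 1110ssss: 16-bit SCALAR
        length, value = 3, first & 0x0F
    elif (first & 0xF8) == 0xF0:    # 11110sss: 21-bit SCALAR
        length, value = 4, first & 0x07
    else:
        return None
    if pos + length > len(data):
        return None
    # Later bytes each contribute six "s" bits; the two "X" bits are ignored.
    for b in data[pos + 1:pos + length]:
        value = (value << 6) | (b & 0x3F)
    return value, length


assert decode_scalar(bytes([0xC3, 0xA9])) == (233, 2)
# The "X" bits may hold any value without changing the result.
assert decode_scalar(bytes([0xC3, 0x29])) == (233, 2)
```
]]></sourcecode>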
<t> <t>
An ADDEND is defined as the result of limited sign extension of An ADDEND is defined as the result of limited sign extension of
16-bit unsigned PARAMETER:</t> a 16-bit unsigned PARAMETER:</t>
<t indent="3"> <t indent="3">
At first the PARAMETER is zero-extended to 32 bits. After this, At first, the PARAMETER is zero-extended to 32 bits. After this,
if the resulting value is greater or equal than 0x8000, if the resulting value is greater or equal than 0x8000,
then 0xFF0000 is added.</t> then 0xFF0000 is added.</t>
<t> <t>
ShiftAll starts at the beginning of the word and repetitively applies ShiftAll starts at the beginning of the word and repetitively applies
the following transform until the whole word is transformed:</t> the following transformation until the whole word is transformed:</t>
<t indent="3"> <t indent="3">
If the next untransformed byte matches the first byte of the 7-, If the next untransformed byte matches the first byte of the 7-,
11-, 16- or 21-bit SCALAR pattern, then:</t> 11-, 16-, or 21-bit SCALAR pattern, then:</t>
<t indent="6"> <t indent="6">
If the untransformed part of the word is not long enough to If the untransformed part of the word is not long enough to
match the whole SCALAR pattern, then the whole word is match the whole SCALAR pattern, then the whole word is
marked as transformed.</t> marked as transformed.</t>
<t indent="6"> <t indent="6">
Otherwise, let SHIFTED be the sum of the ADDEND and the <!-- [rfced] We have rephrased the following sentence for
readability. Please let us know any objections.
Original:
Next, 1, 2, 3 or 4 not transformed bytes marked as
transformed, according to the SCALAR pattern length.
Current:
Next, 1, 2, 3, or 4 untransformed bytes are marked as
transformed according to the SCALAR pattern length.
-->
Otherwise, let SHIFTED be the sum of the ADDEND and the
encoded SCALAR. The lowest bits from SHIFTED encoded SCALAR. The lowest bits from SHIFTED
are written back into the corresponding "s" bits. The "0", are written back into the corresponding "s" bits. The "0",
"1" and "X" bits remain unchanged. Next, 1, 2, 3 or "1", and "X" bits remain unchanged. Next, 1, 2, 3, or
4 not transformed bytes marked as transformed, according to 4 untransformed bytes are marked as transformed according to
the SCALAR pattern length.</t> the SCALAR pattern length.</t>
<t indent="3"> <t indent="3">
Otherwise, the next untransformed byte is marked as transformed.</t> Otherwise, the next untransformed byte is marked as transformed.</t>
<t> <t>
ShiftFirst applies the same transform as ShiftAll, but does not ShiftFirst applies the same transformation as ShiftAll, but does not
iterate.</t> iterate.</t>
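<t>
The ShiftFirst and ShiftAll steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation; the names SCALAR_PATTERNS, addend, shift_one, shift_first, and shift_all are hypothetical and do not appear in the specification.</t>

```python
# Hypothetical helper table, one entry per SCALAR pattern:
# (first-byte mask, first-byte tag, pattern length in bytes, "s" bits in first byte)
SCALAR_PATTERNS = [
    (0x80, 0x00, 1, 7),  # 0sssssss
    (0xE0, 0xC0, 2, 5),  # 110sssss XXssssss
    (0xF0, 0xE0, 3, 4),  # 1110ssss XXssssss XXssssss
    (0xF8, 0xF0, 4, 3),  # 11110sss XXssssss XXssssss XXssssss
]


def addend(parameter: int) -> int:
    """Limited sign extension of a 16-bit unsigned PARAMETER."""
    value = parameter & 0xFFFF          # zero-extended 16-bit value
    if value >= 0x8000:
        value += 0xFF0000
    return value


def shift_one(word: bytearray, pos: int, add: int) -> int:
    """Apply one transformation step at pos; return the next untransformed index."""
    first = word[pos]
    for mask, tag, length, first_bits in SCALAR_PATTERNS:
        if (first & mask) != tag:
            continue
        if pos + length > len(word):
            return len(word)            # too short: whole word marked as transformed
        first_mask = (1 << first_bits) - 1
        scalar = first & first_mask     # most significant "s" bits come first
        for byte in word[pos + 1:pos + length]:
            scalar = (scalar << 6) | (byte & 0x3F)
        shifted = scalar + add
        # Write the lowest bits of SHIFTED back into the "s" bits only,
        # leaving the "0", "1", and "X" bits untouched.
        for i in range(pos + length - 1, pos, -1):
            word[i] = (word[i] & 0xC0) | (shifted & 0x3F)
            shifted >>= 6
        word[pos] = (first & ~first_mask & 0xFF) | (shifted & first_mask)
        return pos + length
    return pos + 1                      # no pattern matched: skip this byte


def shift_first(word: bytes, parameter: int) -> bytes:
    out = bytearray(word)
    if out:
        shift_one(out, 0, addend(parameter))
    return bytes(out)


def shift_all(word: bytes, parameter: int) -> bytes:
    out = bytearray(word)
    add = addend(parameter)
    pos = 0
    while pos < len(out):
        pos = shift_one(out, pos, add)
    return bytes(out)
```

<t>
With a PARAMETER of 1, shift_all(b"ab", 1) yields b"bc", while shift_first(b"ab", 1) yields b"bb" because only the first SCALAR is shifted.</t>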
</section> </section>
</section> </section>
<section anchor="sect-3.2" numbered="true" toc="default"> <section anchor="sect-3.2" numbered="true" toc="default">
<name>LZ77 Dictionaries</name> <name>LZ77 Dictionaries</name>
<t> <t>
If an LZ77 dictionary is set, then the decoder treats this as a If an LZ77 dictionary is set, the decoder treats it as a
regular LZ77 copy, but behaves as if the bytes of this dictionary are regular LZ77 copy but behaves as if the bytes of this dictionary are
accessible as the uncompressed bytes outside of the regular LZ77 accessible as the uncompressed bytes outside of the regular LZ77
window for backwards references.</t> window for backwards references.</t>
<t> <t>
Let LZ77_DICTIONARY_LENGTH be the length of the LZ77 dictionary. Let LZ77_DICTIONARY_LENGTH be the length of the LZ77 dictionary.
Then word_id, described in <xref target="sect-8" format="default"/> in <xref target="RFC7932" format="default"/>, is redefined as:</t> Then word_id, described in <xref section="8" target="RFC7932" format="default"/>, is redefined as:</t>
<artwork name="" type="" align="left" alt=""><![CDATA[ <artwork name="" type="" align="left" alt=""><![CDATA[
word_id = distance - (max allowed distance + 1 + word_id = distance - (max allowed distance + 1 +
LZ77_DICTIONARY_LENGTH) LZ77_DICTIONARY_LENGTH)
]]></artwork> ]]></artwork>
<t> <t>
For the case when LZ77_DICTIONARY_LENGTH is 0, word_id matches the For the case when LZ77_DICTIONARY_LENGTH is 0, word_id matches the
<xref target="RFC7932" format="default"/> definition.</t> <xref target="RFC7932" format="default"/> definition.</t>
<t> <t>
Let dictionary_address be</t> Let dictionary_address be:</t>
<t>
LZ77_DICTIONARY_LENGTH + max allowed distance - distance</t> <t indent="3"> LZ77_DICTIONARY_LENGTH + max allowed distance - distance</t>
<t> <t>
Then distance values of &lt;length, distance&gt; pairs <xref target="RFC7932" format="default"/> in range Then distance values of &lt;length, distance&gt; pairs <xref target="RFC7932" format="default"/> in range
(max allowed distance + 1)..(LZ77_DICTIONARY_LENGTH + max allowed (max allowed distance + 1)..(LZ77_DICTIONARY_LENGTH + max allowed
distance) are interpreted as references starting in the LZ77 distance) are interpreted as references starting in the LZ77
dictionary at the byte at dictionary_address. If length is longer dictionary at the byte at dictionary_address. If length is longer
than (LZ77_DICTIONARY_LENGTH - dictionary_address), then the than (LZ77_DICTIONARY_LENGTH - dictionary_address), then the
reference continues to copy (length - LZ77_DICTIONARY_LENGTH + reference continues to copy (length - LZ77_DICTIONARY_LENGTH +
dictionary_address) bytes from the regular LZ77 window starting at dictionary_address) bytes from the regular LZ77 window starting at
the beginning.</t> the beginning.</t>
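<t>
The address arithmetic above can be sketched in Python as follows. This is an illustrative sketch only; the function names and the returned tuple layout are assumptions made for this example.</t>

```python
def redefined_word_id(distance, max_allowed_distance, lz77_dictionary_length):
    """word_id as redefined above for streams with an LZ77 dictionary."""
    return distance - (max_allowed_distance + 1 + lz77_dictionary_length)


def split_lz77_dictionary_copy(distance, length, max_allowed_distance,
                               lz77_dictionary_length):
    """Return (dictionary_address, bytes_from_dictionary, bytes_from_window),
    or None when the distance does not point into the LZ77 dictionary."""
    low = max_allowed_distance + 1
    high = lz77_dictionary_length + max_allowed_distance
    if not (low <= distance <= high):
        return None
    dictionary_address = (lz77_dictionary_length + max_allowed_distance
                          - distance)
    # Bytes available in the dictionary from dictionary_address to its end.
    available = lz77_dictionary_length - dictionary_address
    from_dictionary = min(length, available)
    # Any remainder continues from the beginning of the regular LZ77 window.
    from_window = length - from_dictionary
    return dictionary_address, from_dictionary, from_window
```

<t>
For example, with a max allowed distance of 100 and a 10-byte LZ77 dictionary, a copy of length 8 at distance 105 starts at dictionary_address 5, takes 5 bytes from the dictionary, and continues with 3 bytes from the start of the regular window.</t>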
</section> </section>
</section> </section>
<section anchor="sect-4" numbered="true" toc="default"> <section anchor="sect-4" numbered="true" toc="default">
<name>Varint Encoding</name> <name>Varint Encoding</name>
<t>A varint is encoded in base 128 in one or more bytes as follows:</t> <t>A varint is encoded in base 128 in one or more bytes as follows:</t>
<artwork name="" type="" align="left" alt=""><![CDATA[ <artwork name="" type="" align="left" alt=""><![CDATA[
+--------+--------+ +--------+ +--------+--------+ +--------+
|1xxxxxxx|1xxxxxxx| {0-8 times} |0xxxxxxx| |1xxxxxxx|1xxxxxxx| {0-8 times} |0xxxxxxx|
+--------+--------+ +--------+ +--------+--------+ +--------+
]]></artwork> ]]></artwork>
<t> <t>
where the "x" bits of the first byte are the least significant bits where the "x" bits of the first byte are the LSBs
of the value and the "x" bits of the last byte are the most of the value and the "x" bits of the last byte are the MSBs of the value.
significant bits of the value. The last byte must have its MSB set to
<!-- [rfced] May we rephrase as follows for clarity?
Current:
The last byte must have its MSB set to 0, all other bytes to 1 to
indicate there is a next byte.
Perhaps:
The last byte must have its MSB set to 0 and all other bytes must
have their MSBs set to 1 to indicate there is a next byte.
-->
The last byte must have its MSB set to
0, all other bytes to 1 to indicate there is a next byte.</t> 0, all other bytes to 1 to indicate there is a next byte.</t>
<t> <t>
The maximum allowed amount of bits to read is 63 bits, if the 9th The maximum allowed amount of bits to read is 63 bits; if the 9th
byte is present and has its MSB set then the stream must be byte is present and has its MSB set, then the stream must be
considered as invalid.</t> considered as invalid.</t>
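<t>
A varint reader following the description above might look like the Python sketch below. The function name read_varint, its (value, next position) return convention, and the use of ValueError for invalid input are assumptions made for illustration.</t>

```python
def read_varint(data: bytes, pos: int = 0):
    """Decode one varint starting at pos; return (value, next_pos)."""
    value = 0
    for i in range(9):                       # at most 9 bytes, i.e. 63 bits
        if pos + i >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos + i]
        # The first byte carries the least significant 7 bits.
        value |= (byte & 0x7F) << (7 * i)
        if (byte & 0x80) == 0:               # MSB clear: this was the last byte
            return value, pos + i + 1
    # The 9th byte still had its MSB set.
    raise ValueError("invalid varint: more than 63 bits")
```

<t>
For instance, the two bytes 0xAC 0x02 decode to 300: the low seven bits 0x2C come from the first byte and the next seven bits 0x02 from the second.</t>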
</section> </section>
<!-- [rfced] Upon converting this document to XML, we have done our
best to preserve the original indentation of definition lists
that start in Section 5. Please let us know if any specific
adjustments need to be made or if the current indentation is
satisfactory.
-->
<section anchor="sect-5" numbered="true" toc="default"> <section anchor="sect-5" numbered="true" toc="default">
<name>Shared Dictionary Stream</name> <name>Shared Dictionary Stream</name>
<t> <t>
The shared dictionary stream encodes a custom dictionary for brotli The shared dictionary stream encodes a custom dictionary for brotli,
including custom words and/or custom transformations. A shared including custom words and/or custom transformations. A shared
dictionary may appear standalone or as contents of a resource in a dictionary may appear as a standalone or as contents of a resource in a
framing format container.</t> framing format container.</t>
<t> <t>
A compliant shared brotli dictionary stream must have the following A compliant shared brotli dictionary stream must have the following
format:</t> format:</t>
<dl newline="false" spacing="normal" indent="3"> <dl newline="false" spacing="normal" indent="3">
<dt>2 bytes:</dt> <dt>2 bytes:</dt>
<!-- [rfced] May we rephrase the following for clarity?
Original:
2 bytes: file signature, in hexadecimal the bytes 91, 0.
Perhaps:
2 bytes: File signature in hexadecimal format (bytes 91 and 0).
-->
<dd> <dd>
file signature, in hexadecimal the bytes 91, 0.</dd> File signature, in hexadecimal the bytes 91, 0.</dd>
<dt>varint:</dt> <dd>LZ77_DICTIONARY_LENGTH, number of bytes for a LZ77 <dt>varint:</dt> <dd>LZ77_DICTIONARY_LENGTH. The number of bytes for an LZ77
dictionary, or 0 if there is none. dictionary or 0 if there is none.
The maximum allowed value is the maximum possible sliding The maximum allowed value is the maximum possible sliding
window size of brotli or of large window brotli. window size of brotli or large window brotli.
</dd> </dd>
<dt>
LZ77_DICTIONARY_LENGTH bytes:</dt><dd> contents of the LZ77 dictionary.</dd>
<dt>1 byte:</dt><dd>
<t>NUM_CUSTOM_WORD_LISTS, may have value 0 to 64</t>
<t> NUM_CUSTOM_WORD_LISTS times a word list, with the following <!--[rfced] In Section 5, may we add "in range" to these sentences for
format for each word list: clarity and consistency as shown below?
</t>
Original:
1 byte: NUM_CUSTOM_WORD_LISTS, may have value 0 to 64
1 byte: NUM_CUSTOM_TRANSFORM_LISTS, may have value 0 to 64
1 byte: NUM_DICTIONARIES, may have value 1 to 64
Perhaps:
1 byte: NUM_CUSTOM_WORD_LISTS. May have a value in range 0 to 64.
1 byte: NUM_CUSTOM_TRANSFORM_LISTS. May have a value in range 0 to 64.
1 byte: NUM_DICTIONARIES. May have a value in range 1 to 64.
-->
<dt>
LZ77_DICTIONARY_LENGTH bytes:</dt><dd>Contents of the LZ77 dictionary.</dd>
<dt>1 byte:</dt><dd><t>NUM_CUSTOM_WORD_LISTS. May have a value of 0 to 64.
</t></dd>
<dt>NUM_CUSTOM_WORD_LISTS times a word list with the following format for each word list:</dt>
<dd>
<t><br/></t>
<dl> <dl>
<dt>28 bytes:</dt><dd>SIZE_BITS_BY_LENGTH, array of 28 unsigned 8-bit <dt>28 bytes:</dt><dd>SIZE_BITS_BY_LENGTH. An array of 28 unsigned 8-bit
integers, indexed by word lengths 4 to 31. The value integers, indexed by word lengths 4 to 31. The value
represents log2(number of words of this length), represents log2(number of words of this length),
with the exception of 0 meaning 0 words of this with the exception of 0 meaning 0 words of this
length. The max allowed length value is 15 bits. length. The max allowed length value is 15 bits.
OFFSETS_BY_LENGTH is computed from this as OFFSETS_BY_LENGTH is computed from this as
OFFSETS_BY_LENGTH[i + 1] = OFFSETS_BY_LENGTH[i] + OFFSETS_BY_LENGTH[i + 1] = OFFSETS_BY_LENGTH[i] +
(SIZE_BITS_BY_LENGTH[i] ? (i &lt;&lt; SIZE_BITS_BY_LENGTH[i]) (SIZE_BITS_BY_LENGTH[i] ? (i &lt;&lt; SIZE_BITS_BY_LENGTH[i])
: 0) </dd> : 0).</dd>
<dt>N bytes:</dt><dd> words dictionary data, where N is
<dt>N bytes:</dt><dd>Words dictionary data, where N is
OFFSETS_BY_LENGTH[31] + (SIZE_BITS_BY_LENGTH[31] ? OFFSETS_BY_LENGTH[31] + (SIZE_BITS_BY_LENGTH[31] ?
(31 &lt;&lt; SIZE_BITS_BY_LENGTH[31]) : 0), first all the words of shortest length, then all words of the next length, and so on, where for each length there are either 0 or a positive power of two amount of words. </dd> (31 &lt;&lt; SIZE_BITS_BY_LENGTH[31]) : 0), with all the words of shortest length first, then all words of the next length, and so on, where there are either 0 or a positive power of two number of words for each length. </dd>
</dl></dd> </dl></dd>
<dt> <dt>
1 byte:</dt><dd><t>NUM_CUSTOM_TRANSFORM_LISTS, may have value 0 to 64</t> 1 byte:</dt><dd>NUM_CUSTOM_TRANSFORM_LISTS. May have a value of 0 to 64.</ dd>
<t> <dt>
NUM_CUSTOM_TRANSFORM_LISTS times a transform list, with the NUM_CUSTOM_TRANSFORM_LISTS times a transform list with the
following format for each transform list: following format for each transform list:
</t> </dt>
<dd>
<t><br/></t>
<dl> <dl>
<dt>2 bytes:</dt><dd> PREFIX_SUFFIX_LENGTH, the length of prefix/suffix <dt>2 bytes:</dt><dd> PREFIX_SUFFIX_LENGTH. The length of prefix/suffix
data. Must be at least 1 because the list must data. Must be at least 1 because the list must
always end with a zero-length stringlet even always end with a zero-length stringlet even
if empty. if it is empty.
</dd> </dd>
<dt>NUM_PREFIX_SUFFIX times:</dt> <dd><t>prefix/suffix stringlet.</t> <dt>NUM_PREFIX_SUFFIX times:</dt><dd><t>Prefix/suffix stringlet.
NUM_PREFIX_SUFFIX is the number of stringlets parsed and
<t>
NUM_PREFIX_SUFFIX is the amount of stringlets parsed and
must be in range 1..256. must be in range 1..256.
</t><dl> </t><dl>
<dt>1 byte:</dt><dd> STRING_LENGTH, the length of the entry <dt>1 byte:</dt><dd> STRING_LENGTH. The length of the entry contents.
0 for the last (terminating) entry of the 0 for the last (terminating) entry of the
transform list. For other entries STRING_LENGTH transform list. For other entries, STRING_LENGTH
must be in range 1..255. The 0 entry must be must be in range 1..255. The 0 entry must be
present and must be the last byte of the present and must be the last byte of the
PREFIX_SUFFIX_LENGTH bytes of prefix/suffix PREFIX_SUFFIX_LENGTH bytes of prefix/suffix
data, else the stream must be rejected as data, else the stream must be rejected as
invalid.</dd> invalid.</dd>
<dt>STRING_LENGTH bytes:</dt><dd> contents of the prefix/suffix.</dd> <dt>STRING_LENGTH bytes:</dt><dd> Contents of the prefix/suffix.</dd>
</dl></dd> </dl></dd>
<dt>1 byte:</dt><dd> NTRANSFORMS, amount of transformation triplets.</dd> <!--[rfced] In Section 5, please consider the following changes for
<dt>NTRANSFORMS times:</dt><dd><t> data for each transform:</t> consistency within the list as we note variance with "TERM times
foo" vs. "TERM times: Foo:" (for example, "NUM_CUSTOM_WORD_LISTS
times a word list" vs. "NUM_DICTIONARIES times: The
DICTIONARY_MAP:"
Additionally, should text be added such as "listed below",
"the following" and "which contains" to introduce the next
list of items?
i)
Current:
NTRANSFORMS times: Data for each transform:
Perhaps:
NTRANSFORMS times the data for each transform listed below:
ii)
Current:
If and only if at least one transform has operation index
ShiftFirst or ShiftAll:
NTRANSFORMS times:
Perhaps:
If and only if at least one transform has operation index
ShiftFirst or ShiftAll, then NTRANSFORMS times the following:
iii)
Current:
NUM_DICTIONARIES times: The DICTIONARY_MAP:
Perhaps:
NUM_DICTIONARIES times the DICTIONARY_MAP, which contains:
-->
<dt>1 byte:</dt><dd> NTRANSFORMS. Number of transformation triplets.</dd>
<dt>NTRANSFORMS times:</dt><dd><t>Data for each transform:</t>
<dl> <dl>
<dt> <dt>
1 byte:</dt><dd> index of prefix in prefix/suffix data; 1 byte:</dt><dd>Index of prefix in prefix/suffix data;
must be less than NUM_PREFIX_SUFFIX. must be less than NUM_PREFIX_SUFFIX.
</dd> </dd>
<dt>1 byte:</dt><dd> index of suffix in prefix/suffix data; <dt>1 byte:</dt><dd>Index of suffix in prefix/suffix data;
must be less than NUM_PREFIX_SUFFIX.</dd> must be less than NUM_PREFIX_SUFFIX.</dd>
<dt>1 byte:</dt><dd> operation index, must be an index in the table of <dt>1 byte:</dt><dd>Operation index; must be an index in the table of
operations listed in the Section operations listed in <xref target="sect-3.1.1"/>.</dd></dl></dd></dl>
"Transform Operations".</dd></dl> <dl><dt>
<t>
If and only if at least one transform has operation index If and only if at least one transform has operation index
ShiftFirst or ShiftAll: ShiftFirst or ShiftAll:</dt><dd>
</t> <t><br/></t>
<dl>
<t> NTRANSFORMS times:</t> <dt> NTRANSFORMS times:</dt><dd><t><br/></t>
<dl> <dl>
<dt> <dt>
2 bytes:</dt><dd> parameters for the transform. If the transform 2 bytes:</dt><dd>Parameters for the transform. If the transform
does not have type ShiftFirst or ShiftAll, the does not have type ShiftFirst or ShiftAll, the
value must be 0. ShiftFirst and ShiftAll value must be 0. ShiftFirst and ShiftAll
interpret these bytes as an unsigned 16-bit interpret these bytes as an unsigned 16-bit
integer. integer.
</dd></dl> </dd></dl></dd></dl></dd></dl></dd></dl>
<t>if NUM_CUSTOM_WORD_LISTS &gt; 0 or NUM_CUSTOM_TRANSFORM_LISTS &gt; 0 <dl>
<dt>If NUM_CUSTOM_WORD_LISTS &gt; 0 or NUM_CUSTOM_TRANSFORM_LISTS &gt; 0
(else implicitly NUM_DICTIONARIES is 1 and points to the (else implicitly NUM_DICTIONARIES is 1 and points to the
brotli built-in and there is no context map) brotli built-in and there is no context map):</dt>
</t> <dd>
<t><br/></t>
<dl> <dl>
<dt>1 byte:</dt><dd> NUM_DICTIONARIES, may have value 1 to 64. Each <dt>1 byte:</dt>
<dd>NUM_DICTIONARIES. May have value 1 to 64. Each
dictionary is a combination of a word list and a dictionary is a combination of a word list and a
transform list. Each next dictionary is used when the transform list. Each next dictionary is used when the
distance goes beyond the previous. If a CONTEXT_MAP is distance goes beyond the previous. If a CONTEXT_MAP is
enabled, then the dictionary matching the context is enabled, then the dictionary matching the context is
moved to the front in the order for this context. moved to the front in the order for this context.
</dd> </dd>
<dt>NUM_DICTIONARIES times:</dt><dd> <t>the DICTIONARY_MAP:</t> <dt>NUM_DICTIONARIES times:</dt><dd> <t>The DICTIONARY_MAP:</t>
<dl><dt> <dl><dt>
1 byte:</dt><dd> index into a custom word list, or value 1 byte:</dt><dd>Index into a custom word list or value
NUM_CUSTOM_WORD_LISTS to indicate to use the brotli NUM_CUSTOM_WORD_LISTS to indicate using the brotli
<xref target="RFC7932" format="default"/> built-in default word list <xref target="RFC7932" format="default"/> built-in default word list.
</dd> </dd>
<dt>1 byte:</dt><dd>index into a custom transform list, or value <dt>1 byte:</dt><dd>Index into a custom transform list or value
NUM_CUSTOM_TRANSFORM_LISTS to indicate to use the NUM_CUSTOM_TRANSFORM_LISTS to indicate using the
brotli <xref target="RFC7932" format="default"/> built-in default transform list brotli <xref target="RFC7932" format="default"/> built-in default transform list.
</dd> </dd>
</dl> </dl>
</dd> </dd>
<dt>1 byte:</dt><dd><t> CONTEXT_ENABLED, if 0 there is no context map, if 1 a <dt>1 byte:</dt><dd>CONTEXT_ENABLED. If 0, there is no context map. If 1, a
context map used to select the dictionary is encoded context map used to select the dictionary is encoded as
below</t></dd></dl> below.</dd>
</dl>
<dl>
<t>If CONTEXT_ENABLED is 1, a context map for the 64 brotli <dt>If CONTEXT_ENABLED is 1, there is a context map for the 64 brotli
<xref target="RFC7932" format="default"/> literals contexts: <xref target="RFC7932" format="default"/> literals contexts:
</t> </dt>
<dd><t><br/></t>
<dl> <dl>
<dt>64 bytes:</dt><dd> CONTEXT_MAP, index into the DICTIONARY_MAP for <dt>64 bytes:</dt><dd> CONTEXT_MAP. Index into the DICTIONARY_MAP for
the first dictionary to use for this context the first dictionary to use for this context.
</dd></dl></dd> </dd></dl></dd>
</dl> </dl></dd></dl>
</dd>
</dl>
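<t>
The OFFSETS_BY_LENGTH recurrence and the size N of the words dictionary data given in the word list format above can be sketched in Python as follows. The function name word_list_layout is hypothetical, and the sketch assumes that the offset for the shortest word length (4) is 0, which the recurrence itself does not state.</t>

```python
def word_list_layout(size_bits_by_length: bytes):
    """Return (offsets_by_length, n) for one custom word list.

    size_bits_by_length holds 28 unsigned bytes for word lengths 4..31.
    offsets_by_length maps each word length to the byte offset of its
    first word; n is the total size of the words dictionary data.
    """
    if len(size_bits_by_length) != 28:
        raise ValueError("SIZE_BITS_BY_LENGTH must be exactly 28 bytes")
    offsets_by_length = {}
    offset = 0                       # assumption: length-4 words start at 0
    for length in range(4, 32):
        bits = size_bits_by_length[length - 4]
        if bits > 15:
            raise ValueError("size bits above the 15-bit maximum")
        offsets_by_length[length] = offset
        if bits:                     # 0 means no words of this length
            offset += length << bits  # (1 << bits) words of `length` bytes
    return offsets_by_length, offset
```

<t>
As an example, a list with four 4-byte words (size bits 2) and two 5-byte words (size bits 1) places the 5-byte words at offset 16 and has N = 26.</t>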
</section> </section>
<section anchor="sect-6" numbered="true" toc="default"> <section anchor="sect-6" numbered="true" toc="default">
<name>Large Window Brotli Compressed Data Stream</name> <name>Large Window Brotli Compressed Data Stream</name>
<t> <t>
Large window brotli allows a sliding window beyond the 24-bit maximum Large window brotli allows a sliding window beyond the 24-bit maximum
of regular brotli <xref target="RFC7932" format="default"/>.</t> of regular brotli <xref target="RFC7932" format="default"/>.</t>
<t> <t>
The compressed data stream is backwards compatible to brotli The compressed data stream is backwards compatible to brotli
<xref target="RFC7932" format="default"/>, and may optionally have the following differences:</t> <xref target="RFC7932" format="default"/> and may optionally have the following differences:</t>
<dl><dt>Encoding of WBITS in the stream header:</dt><dd><t> the following new
pattern of 14 bits is supported:</t> <!--[rfced] Would the following text (second sentence that starts with
"Encoding") be easier to read if it was a complete sentence as
shown below (note that the first sentence is included for context
only)?
Also, under "6 bits", should "value" be singular or plural (e.g.,
"must have values in" or "must have a value in")?
Original:
The compressed data stream is backwards compatible to brotli
[RFC7932] and may optionally have the following differences:
Encoding of WBITS in the stream header: The following new
pattern of 14 bits is supported:
8 bits: Value 00010001 to indicate a large window brotli stream.
6 bits: WBITS. Must have value in range 10 to 62.
Perhaps:
The compressed data stream is backwards compatible to brotli
[RFC7932] and may optionally have the following differences.
If the encoding of WBITS is in the stream header, then the following
new pattern of 14 bits is supported:
8 bits: Value 00010001 to indicate a large window brotli stream.
6 bits: WBITS. Must have a value in range 10 to 62.
-->
<dl> <dl>
<dt>8 bits:</dt><dd> value 00010001, to indicate a large window <dt>Encoding of WBITS in the stream header:</dt><dd><t>The following new
brotli stream</dd> pattern of 14 bits is supported:</t>
<dl newline="false" spacing="normal">
<dt>8 bits:</dt><dd>Value 00010001 to indicate a large window
brotli stream.</dd>
<dt>6 bits:</dt><dd> WBITS, must have value in range 10 to 62</dd> <dt>6 bits:</dt><dd> WBITS. Must have value in range 10 to 62.</dd>
</dl></dd> </dl></dd>
<dt>Distance alphabet:</dt><dd> if the stream is a large window brotli <dt>Distance alphabet:</dt><dd>If the stream is a large window brotli
stream, the maximum number of extra bits is 62 and the stream, the maximum number of extra bits is 62 and the
theoretical maximum size of the distance alphabet is theoretical maximum size of the distance alphabet is
(16 + NDIRECT + (124 &lt;&lt; NPOSTFIX)). This overrides the value for the distance alphabet size given in <xref section="3.3" sectionFormat="of" target="RFC7932"/> and affects the amount of bits in the encoding of the Simple Prefix Code for distances as described in <xref section="3.4" sectionFormat="of" target="RFC7932"/>. An additional limitation to distances, despite the large allowed alphabet size, is that the alphabet is not allowed to contain a distance symbol able to represent a distance larger than ((1 &lt;&lt; 63) - 4) when its extra bits have their maximum value. It depends on NPOSTFIX and NDIRECT when this can occur. </dd> (16 + NDIRECT + (124 &lt;&lt; NPOSTFIX)). This overrides the value for the distance alphabet size given in <xref section="3.3" sectionFormat="of" target="RFC7932"/> and affects the number of bits in the encoding of the Simple Prefix Code for distances as described in <xref section="3.4" sectionFormat="of" target="RFC7932"/>. An additional limitation to distances, despite the large allowed alphabet size, is that the alphabet is not allowed to contain a distance symbol able to represent a distance larger than ((1 &lt;&lt; 63) - 4) when its extra bits have their maximum value. It depends on NPOSTFIX and NDIRECT when this can occur. </dd>
</dl> </dl>
<t> <t>
A decoder that does not support 64-bit integers may reject a stream A decoder that does not support 64-bit integers may reject a stream
if WBITS is higher than 30 or a distance symbol from the distance if WBITS is higher than 30 or a distance symbol from the distance
alphabet is able to encode a distance larger than 2147483644.</t> alphabet is able to encode a distance larger than 2147483644.</t>
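<t>
The numeric limits in this section can be collected in a short Python sketch. The function names are hypothetical, and the sketch takes the largest encodable distance as an input rather than deriving it from the distance code definition, which lives in RFC 7932 and is not reproduced here.</t>

```python
LARGE_WINDOW_MARKER = 0b00010001   # 8-bit value announcing large window brotli


def large_window_distance_alphabet_size(npostfix: int, ndirect: int) -> int:
    """Theoretical maximum distance alphabet size for large window brotli."""
    return 16 + ndirect + (124 << npostfix)


def is_valid_large_window_wbits(wbits: int) -> bool:
    """The 6-bit WBITS field must be in range 10 to 62."""
    return 10 <= wbits <= 62


def decoder_without_64bit_may_reject(wbits: int, largest_distance: int) -> bool:
    """True when a decoder lacking 64-bit integers is allowed to reject.

    largest_distance is the largest distance any symbol of the stream's
    distance alphabet can encode (computed by the caller).
    """
    return wbits > 30 or largest_distance > 2147483644
```

<t>
Note that 2147483644 equals (1 &lt;&lt; 31) - 4, which mirrors the ((1 &lt;&lt; 63) - 4) limit stated for decoders with 64-bit support.</t>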
</section> </section>
<section anchor="sect-7" numbered="true" toc="default"> <section anchor="sect-7" numbered="true" toc="default">
<name>Shared Brotli Compressed Data Stream</name> <name>Shared Brotli Compressed Data Stream</name>
<t> <t>
The format of a shared brotli compressed data stream without framing The format of a shared brotli compressed data stream without a framing
format is backwards compatible with brotli <xref target="RFC7932" format="default"/>, with the format is backwards compatible with brotli <xref target="RFC7932" format="default"/> with the
following optional differences:</t> following optional differences:</t>
<ul><li>LZ77 dictionaries as described above are supported</li> <ul><li>LZ77 dictionaries as described above are supported.</li>
<li> Custom static dictionaries replacing or extending the static <li> Custom static dictionaries replacing or extending the static
dictionary of brotli <xref target="RFC7932" format="default"/> with different words or dictionary of brotli <xref target="RFC7932" format="default"/> with different words or
transforms are supported</li> transforms are supported.</li>
<li>The stream may have the format of regular brotli <xref target="RFC7932"/>, <li>The stream may have the format of regular brotli <xref target="RFC7932"/>
or the format of large window brotli as described in section 6.</li> or the format of large window brotli as described in <xref target="sect-6" format="default"/>.</li>
</ul> </ul>
</section> </section>
<section anchor="sect-8" numbered="true" toc="default"> <section anchor="sect-8" numbered="true" toc="default">
<name>Shared Brotli Framing Format Stream</name> <name>Shared Brotli Framing Format Stream</name>
<t> <t>
A compliant shared brotli framing format stream has the format A compliant shared brotli framing format stream has the format
described below.</t> described below.</t>
<section anchor="sect-8.1" numbered="true" toc="default"> <section anchor="sect-8.1" numbered="true" toc="default">
<name>Main Format</name> <name>Main Format</name>
<!-- [rfced] May we rephrase the following sentence for clarity?
Current:
File signature, in hexadecimal the bytes 0x91, 0x0a, 0x42, 0x52.
Perhaps:
File signature in hexadecimal format (bytes 0x91, 0x0a, 0x42, and 0x52).
-->
<dl> <dl>
<dt>4 bytes:</dt><dd> file signature, in hexadecimal the bytes 0x91, 0x0a, <dt>4 bytes:</dt><dd> File signature, in hexadecimal the bytes 0x91, 0x0a,
0x42, 0x52. The first byte contains the invalid WBITS 0x42, 0x52. The first byte contains the invalid WBITS
combination for brotli <xref target="RFC7932" format="default"/> and large window brotli. combination for brotli <xref target="RFC7932" format="default"/> and large window brotli.
</dd> </dd>
<dt>1 byte:</dt><dd><t> container flags, 8 bits with meanings:</t> <dt>1 byte:</dt><dd><t>Container flags that are 8 bits and have the following meanings:</t>
<dl><dt> bit 0 and 1:</dt><dd> version indicator, must be b'00, otherwise the
<!--[rfced] Is it correct that "bit" is singular in "bit 0 and 1",
"bit 2 and 3", and "bit 4-7"? Or should these instances be
updated as "bits 0 and 1", "bits 2 and 3", and "bits 4-7"? We
note 1 instance of "bits 2-7" in Section 8.4.3.
Current:
bit 0 and 1: Version Indicator...
bit 0 and 1: Dictionary source:
bit 2 and 3: Dictionary type:
bit 4-7: Must be 0
-->
<dl><dt> bit 0 and 1:</dt><dd>Version indicator that must be b'00. Otherwise, the
decoder must reject the data stream as invalid. decoder must reject the data stream as invalid.
</dd> </dd>
<dt>bit 2:</dt> <dd>if 0, the file contains no final footer, may not contain <dt>bit 2:</dt> <dd>If 0, the file contains no final footer, may not contain
any metadata chunks, may not contain a central directory, any metadata chunks, may not contain a central directory,
and may encode only a single resource (using one or more and may encode only a single resource (using one or more
data chunks). If 1, the file may contain one or more data chunks). If 1, the file may contain one or more
resources, metadata, central directory, and must contain a resources, metadata, and a central directory, and it must contain a
final footer. final footer.
</dd> </dd>
</dl> </dl>
</dd> </dd>
<dt>multiple times:</dt><dd> a chunk, each with the format specified in section 8.2</dd> <dt>multiple times:</dt><dd>A chunk, each with the format specified in <xref target="sect-8.2"/>.</dd>
</dl> </dl>
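<t>
Reading the main format header described above might be sketched in Python as follows. The function name parse_container_header and its return convention are hypothetical. The sketch also assumes that "bit 0" denotes the least significant bit of the flags byte, and it inspects only the flag bits described above.</t>

```python
FRAMING_SIGNATURE = bytes([0x91, 0x0A, 0x42, 0x52])


def parse_container_header(data: bytes):
    """Return (multi_resource, next_pos) for a framing format stream.

    multi_resource reflects flag bit 2: when True the file may contain
    several resources, metadata, and a central directory, and it must
    end with a final footer.
    """
    if len(data) < 5:
        raise ValueError("truncated header")
    if data[:4] != FRAMING_SIGNATURE:
        raise ValueError("bad file signature")
    flags = data[4]
    # Assumption: bit 0 is the least significant bit of the flags byte.
    if (flags & 0b11) != 0:
        raise ValueError("unsupported version indicator")
    multi_resource = bool(flags & 0b100)
    return multi_resource, 5             # chunks start right after the flags
```

<t>
The returned position is where the first chunk begins; each chunk starts with a varint length as specified in the chunk format.</t>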
</section> </section>
<section anchor="sect-8.2" numbered="true" toc="default"> <section anchor="sect-8.2" numbered="true" toc="default">
<name>Chunk Format</name> <name>Chunk Format</name>
<dl> <dl>
<dt>varint:</dt> <dd>length of this chunk excluding this varint but <dt>varint:</dt><dd>Length of this chunk excluding this varint
including all next header bytes and data. If the value but including all next header bytes and data. If the value is 0,
is 0, then the chunk type byte is not present and the then the chunk type byte is not present and the chunk type is
chunk type is assumed to be 0.</dd> assumed to be 0.</dd>
<dt>1 byte:</dt><dd><t>CHUNK_TYPE</t> <dt>1 byte:</dt><dd><t>CHUNK_TYPE</t>
<dl> <dl indent="5" spacing="compact">
<dt> 0:</dt><dd> padding chunk</dd> <dt> 0:</dt><dd> padding chunk</dd>
<dt> 1:</dt><dd> metadata chunk</dd> <dt> 1:</dt><dd> metadata chunk</dd>
<dt> 2:</dt><dd> data chunk</dd> <dt> 2:</dt><dd> data chunk</dd>
<dt> 3:</dt><dd> first partial data chunk</dd> <dt> 3:</dt><dd> first partial data chunk</dd>
<dt> 4:</dt><dd> middle partial data chunk</dd> <dt> 4:</dt><dd> middle partial data chunk</dd>
<dt> 5:</dt><dd> last partial data chunk</dd> <dt> 5:</dt><dd> last partial data chunk</dd>
<dt> 6:</dt><dd> footer metadata chunk</dd> <dt> 6:</dt><dd> footer metadata chunk</dd>
<dt> 7:</dt><dd> global metadata chunk</dd> <dt> 7:</dt><dd> global metadata chunk</dd>
<dt> 8:</dt><dd> repeat metadata chunk</dd> <dt> 8:</dt><dd> repeat metadata chunk</dd>
<dt> 9:</dt><dd> central directory chunk</dd> <dt> 9:</dt><dd> central directory chunk</dd>
<dt> 10:</dt><dd> final footer</dd> <dt> 10:</dt><dd> final footer</dd>
</dl></dd></dl> </dl></dd>
<t>
if CHUNK_TYPE is not padding chunk, central directory or final
footer:</t>
<dt>If CHUNK_TYPE is not padding chunk, central directory, or final footer:</dt>
<dd>
<t><br/></t>
<dl><dt> 1 byte:</dt><dd><t> CODEC:</t> <dl><dt> 1 byte:</dt><dd><t> CODEC:</t>
<dl> <dl spacing="compact">
<dt>0:</dt><dd> uncompressed</dd> <dt>0:</dt><dd> uncompressed</dd>
<dt> 1:</dt><dd> keep decoder</dd> <dt> 1:</dt><dd> keep decoder</dd>
<dt> 2:</dt><dd> brotli</dd> <dt> 2:</dt><dd> brotli</dd>
<dt> 3:</dt><dd> shared brotli</dd> <dt> 3:</dt><dd> shared brotli</dd>
</dl> </dl>
</dd></dl> </dd>
</dl>
<t>if CODEC is not "uncompressed":</t> </dd>
<dt>If CODEC is not "uncompressed":</dt>
<dd>
<t><br/></t>
<dl><dt> <dl><dt>
varint:</dt><dd> uncompressed size in bytes of the data contained varint:</dt><dd>Uncompressed size in bytes of the data contained
within the compressed stream within the compressed stream.
</dd></dl> </dd></dl></dd>
<t>if CODEC is "shared brotli":</t> <dt>If CODEC is "shared brotli":</dt>
<dd><t><br/></t>
<dl><dt> <dl><dt>
1 byte:</dt><dd><t> amount of dictionary references. Multiple dictionary 1 byte:</dt><dd><t>Number of dictionary references. Multiple dictionary
references are possible with the following references are possible with the following
restrictions: there can be maximum 1 serialized restrictions: there can be 1 serialized
dictionary, and maximum 15 prefix dictionaries (a dictionary and 15 prefix dictionaries maximum (a
serialized dictionary may already contain one of serialized dictionary may already contain one of
those). Circular references are not allowed (any those). Circular references are not allowed (any
dictionary reference that directly or indirectly dictionary reference that directly or indirectly
uses this chunk itself as dictionary).</t></dd> uses this chunk itself as dictionary).</t></dd>
</dl>
<t> per dictionary reference:</t>
<dl><dt>1 byte:</dt><dd><t> flags:</t>
<dl><dt>bit 0 and 1:</dt><dd><t> dictionary source:</t>
<dl><dt>00:</dt><dd> Internal dictionary reference to a full resource <dt>Per dictionary reference:</dt>
<dd><t><br/></t>
<dl><dt>1 byte:</dt><dd><t> Flags:</t>
<dl><dt>bit 0 and 1:</dt><dd><t> Dictionary source:</t>
<dl indent="5"><dt>00:</dt><dd> Internal dictionary reference to a full resource
by pointer, which can span one or more chunks. by pointer, which can span one or more chunks.
Must point to a full data chunk or a first Must point to a full data chunk or a first
partial data chunk.</dd> partial data chunk.</dd>
<dt>01:</dt><dd> Internal dictionary reference to single chunk <dt>01:</dt><dd> Internal dictionary reference to single chunk
contents by pointer. May point to any chunk with contents by pointer. May point to any chunk with
content (data or metadata). If partial data content (data or metadata). If a partial data
chunk, only this part is the dictionary. In this chunk, only this part is the dictionary. In this
case, the dictionary type is not allowed to be a case, the dictionary type is not allowed to be a
serialised dictionary. serialized dictionary.
</dd> </dd>
<dt>10:</dt><dd> Reference to a dictionary by hash code of a <dt>10:</dt><dd> Reference to a dictionary by hash code of a
resource. The dictionary can come from an resource. The dictionary can come from an
external source such as a different container. external source, such as a different container.
The user of the decoder must be able to provide The user of the decoder must be able to provide
the dictionary contents given its hash code (even the dictionary contents given its hash code (even
if it comes from this container itself), or treat if it comes from this container itself) or treat
it as an error when the user does not have it it as an error when the user does not have it
available.</dd> available.</dd>
<dt>11:</dt><dd> invalid bit combination</dd> <dt>11:</dt><dd> Invalid bit combination</dd>
</dl>
</dd>
<dt> bit 2 and 3:</dt><dd> dictionary type:</dd>
<dt>00:</dt><dd> <t>prefix dictionary, set in front of the sliding
window</t>
<dl>
<dt> 01:</dt><dd> serialized dictionary in
the shared brotli
format as specified in section 5.</dd>
<dt>
10:</dt><dd> invalid bit combination</dd>
<dt>11:</dt><dd> invalid bit combination</dd>
<dt>bit 4-7:</dt><dd> must be 0</dd>
<dt>if hash-based:</dt>
<dd>
<dl><dt>1 byte:</dt><dd> type of hash used. Only supported value: 3,
indicating 256-bit Highwayhash <xref target="HWYHASH" format="default"/>.
</dd>
</dl>
</dd>
</dl> </dl>
</dd> </dd>
<dt> 32 bytes:</dt><dd><t> 256-bit Highwayhash <dt> bit 2 and 3:</dt><dd><t>Dictionary type:</t>
checksum to refer to
dictionary.</t> <dl indent="5">
<dl> <dt>00:</dt><dd> <t>Prefix dictionary, set in front of the sliding
<dt>if pointer based:</dt><dd> varint encoded pointer to window</t></dd>
its
chunk in this container. The chunk must come earlier
in the container than the current chunk.</dd>
<dt>X bytes:</dt><dd> extra header bytes, depending on
CHUNK_TYPE. If present,
they are specified in the subsequent sections.
</dd>
</dl>
</dd>
<dt>remaining bytes:</dt><dd> <t>the chunk contents. The <dt>01:</dt><dd>Serialized dictionary in the shared brotli
uncompressed data
format as specified in <xref target="sect-5"/>.</dd>
<dt>
10:</dt><dd> Invalid bit combination</dd>
<dt>11:</dt><dd> Invalid bit combination</dd></dl></dd>
<dt>bit 4-7:</dt><dd> Must be 0</dd></dl></dd>
<dt>If hash-based:</dt><dd><t><br/></t>
<dl><dt>1 byte:</dt><dd>Type of hash used. Only supported value: 3,
indicating 256-bit HighwayHash <xref target="HWYHASH" format="default"/>.
</dd>
<dt>32 bytes:</dt><dd><t> 256-bit HighwayHash checksum to refer to
dictionary.</t></dd></dl></dd>
<dt>If pointer based:</dt><dd>Varint-encoded pointer to
its
chunk in this container. The chunk must come in the container earlier
than the current chunk.</dd></dl></dd></dl></dd>
<dt>X bytes:</dt><dd><t>Extra header bytes, depending
on CHUNK_TYPE. If present,
they are specified in the subsequent sections.</t>
<dl>
<dt>remaining bytes:</dt><dd> <t>The chunk contents. The
uncompressed data
in the chunk content depends on CHUNK_TYPE in the chunk content depends on CHUNK_TYPE
and is specified in the subsequent sections. and is specified in the subsequent sections.
The compressed data has following The compressed data has following
format depending on CODEC:</t> format depending on CODEC:</t>
<ul><li>uncompressed: the raw bytes</li> <!-- [rfced] May we update the following unordered list into a
<li>if "keep decoder", the continuation of the compressed definition list for consistency with the rest of Section 8.2?
stream which was interrupted at the end of the previous
Original:
* uncompressed: the raw bytes
* if "keep decoder", the continuation of the compressed stream
which was interrupted at the end of the previous chunk. The
decoder from the previous chunk must be used and its state
it had at the end of the previous chunk must be kept at the
start of the decoding of this chunk.
* brotli: the bytes are in brotli format [RFC7932]
* shared brotli: the bytes are in the shared brotli format
specified in Section 7
Perhaps:
uncompressed: The raw bytes.
"keep decoder": If "keep decoder", the continuation of the compressed stream
that was interrupted at the end of the previous chunk. The
decoder from the previous chunk must be used and its state
it had at the end of the previous chunk must be kept at the
start of the decoding of this chunk.
brotli: The bytes are in brotli format [RFC7932].
shared brotli: The bytes are in the shared brotli format
specified in Section 7.
-->
<ul><li>uncompressed: The raw bytes.</li>
<li>If "keep decoder", the continuation of the compressed
stream that was interrupted at the end of the previous
chunk. The decoder from the previous chunk must be used chunk. The decoder from the previous chunk must be used
and its state it had at the end of the previous chunk and its state it had at the end of the previous chunk
must be kept at the start of the decoding of this chunk. must be kept at the start of the decoding of this chunk.
</li> </li>
<li>brotli: the bytes are in brotli format <li>brotli: The bytes are in brotli format
<xref target="RFC7932" format="default"/> <xref target="RFC7932" format="default"/>.
</li> </li>
<li>shared brotli: the bytes are in the <li>shared brotli: The bytes are in the
shared brotli format specified in section shared brotli format specified in <xref target="sect-7"/>.</li></ul></dd>
7</li></ul></dd>
</dl> </dl>
</dd> </dd>
</dl> </dl>
</section> </section>
<section anchor="sect-8.3" numbered="true" toc="default"> <section anchor="sect-8.3" numbered="true" toc="default">
<name>Metadata Format</name> <name>Metadata Format</name>
<t>All the metadata chunk types use the following format for the <t>All the metadata chunk types use the following format for the
uncompressed content:</t> uncompressed content:</t>
<dl newline="true"> <dl newline="true">
<dt>Per field:</dt> <dt>Per field:</dt>
<dd> <dd>
<dl><dt>2 bytes:</dt> <dl><dt>2 bytes:</dt>
<dd><t> code to identify this metadata field. This must be <dd><t>Code to identify this metadata field. This must be
two lowercase or two uppercase alpha ascii two lowercase or two uppercase alpha ASCII
characters. If the decoder encounters a lowercase characters. If the decoder encounters a lowercase
field that it does not recognise for the current field that it does not recognize for the current
chunk type, non-ascii characters or non-alpha chunk type, non-ASCII characters, or non-alpha
characters, the decoder must reject the data stream characters, the decoder must reject the data stream
as invalid. Uppercase codes may be used for custom as invalid. Uppercase codes may be used for custom
user metadata and can be ignored by a compliant user metadata and can be ignored by a compliant
decoder.</t></dd> decoder.</t></dd>
<dt>varint:</dt> <dt>varint:</dt>
<dd> <t>length of the content of this field in bytes, <dd><t>Length of the content of this field in bytes,
excluding the code bytes and this varint</t> excluding the code bytes and this varint.</t></dd>
<dl>
<dt>N bytes:</dt> <dt>N bytes:</dt>
<dd> the contents of this field</dd> <dd>The contents of this field.</dd>
</dl> </dl>
</dd> </dd>
</dl> </dl>
</dd>
</dl>
<t> <t>
The last field is reached when the chunk content end is reached. If The last field is reached when the chunk content end is reached. If
the length of the last field does not end at the same byte as the end the length of the last field does not end at the same byte as the end
of the uncompressed content of the chunk, the decoder must reject the of the uncompressed content of the chunk, the decoder must reject the
data stream as invalid.</t> data stream as invalid.</t>
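<t>The field layout above can be sketched as a small parser. This is a minimal illustration, not a reference implementation: the names read_varint, parse_metadata, and known_lowercase are hypothetical, and the varint decoder assumes the little-endian base-128 encoding (7 value bits per byte, high bit set when another byte follows) that this document defines outside this section.</t>

```python
def read_varint(buf, pos):
    """Decode an assumed little-endian base-128 varint; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def parse_metadata(content, known_lowercase):
    """Split uncompressed metadata content into (code, value) pairs.

    known_lowercase is the set of lowercase codes defined for the
    current chunk type (hypothetical parameter, e.g. {b"id", b"mt"}).
    """
    fields = []
    pos = 0
    while pos < len(content):
        code = content[pos:pos + 2]
        if len(code) != 2:
            raise ValueError("truncated field code")
        pos += 2
        is_lower = all(0x61 <= c <= 0x7A for c in code)
        is_upper = all(0x41 <= c <= 0x5A for c in code)
        if not (is_lower or is_upper):
            raise ValueError("code must be two lowercase or two uppercase ASCII letters")
        if is_lower and code not in known_lowercase:
            raise ValueError("unrecognized lowercase field")
        length, pos = read_varint(content, pos)
        # The last field must end exactly at the end of the content.
        if pos + length > len(content):
            raise ValueError("field runs past the end of the chunk content")
        fields.append((code, content[pos:pos + length]))
        pos += length
    return fields
```

<t>Uppercase codes are returned like any other field here; a caller that does not understand them can simply skip them.</t>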
</section> </section>
<section anchor="sect-8.4" numbered="true" toc="default"> <section anchor="sect-8.4" numbered="true" toc="default">
<name>Chunk Specifications</name> <name>Chunk Specifications</name>
<section anchor="sect-8.4.1" numbered="true" toc="default"> <section anchor="sect-8.4.1" numbered="true" toc="default">
<name>Padding Chunk (Type 0)</name> <name>Padding Chunk (Type 0)</name>
<t> <t>
All bytes in this chunk must be zero, except for the initial varint All bytes in this chunk must be zero except for the initial varint
that specifies the remaining chunk length.</t> that specifies the remaining chunk length.</t>
<t> <t>
Since the varint itself takes up bytes as well, when the goal is to Since the varint itself takes up bytes as well, when the goal is to
introduce an amount of padding bytes, the dependence of the length of introduce a number of padding bytes, the dependence of the length of
the varint on the value it encodes must be taken into account.</t> the varint on the value it encodes must be taken into account.</t>
<t> <t>
A single byte varint with value 0 is a padding chunk of length 1. A single byte varint with a value of 0 is a padding chunk of length 1.
For more padding, use higher varint values. Do not use multiple For more padding, use higher varint values. Do not use multiple
shorter padding chunks, since this is slower to decode.</t> shorter padding chunks since this is slower to decode.</t>
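<t>The interaction between the varint's own size and the value it encodes can be illustrated with a short sketch. It assumes the little-endian base-128 varint defined elsewhere in this document (values 0 to 127 fit in one byte, 128 to 16383 in two, and so on); the function names are hypothetical. Under that assumption, some total sizes, such as 129 bytes, cannot be produced by a single padding chunk.</t>

```python
def varint_len(value):
    """Number of bytes an assumed base-128 varint needs for value."""
    n = 1
    while value >= 0x80:
        value >>= 7
        n += 1
    return n


def encode_varint(value):
    """Encode value as an assumed little-endian base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def padding_chunk(total):
    """Return one padding chunk of exactly total bytes, or None if impossible."""
    if total < 1:
        return None
    # Try each varint width; the encoded value is the remaining length.
    for width in range(1, 10):
        value = total - width
        if value >= 0 and varint_len(value) == width:
            return encode_varint(value) + bytes(value)
    return None
```

<t>When the function returns None, the caller has to pick a different total size or fall back to more than one chunk.</t>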
</section> </section>
<section anchor="sect-8.4.2" numbered="true" toc="default"> <section anchor="sect-8.4.2" numbered="true" toc="default">
<name>Metadata Chunk (Type 1)</name> <name>Metadata Chunk (Type 1)</name>
<t> <t>
This chunk contains metadata that applies to the resource whose This chunk contains metadata that applies to the resource whose
beginning is encoded in the subsequent data chunk or first partial beginning is encoded in the subsequent data chunk or first partial
data chunk.</t> data chunk.</t>
<t> <t>
The contents of this chunk follows the format described in <xref target="sect-8.3" format="default"/>.</t> The contents of this chunk follows the format described in <xref target="sect-8.3" format="default"/>.</t>
<t>The following field types are recognised:</t> <t>The following field types are recognized:</t>
<dl><dt>id:</dt><dd><t> name field. May appear 0 or 1 times. Has the following <!--[rfced] In Section 8.4.2, the formatting of the list is hard to
format:</t> read. While we note that this style follows RFC 7932, please
<dl> consider if you would like to update to a format that is more
<dt> N bytes:</dt><dd><t> name in UTF-8 encoding, length typical in other RFCs. For example, see the excerpt below from
determined by the RFC 9188.
Original:
id: name field. May appear 0 or 1 times. Has the following format:
N bytes: name in UTF-8 encoding, length determined by the
field length. Treated generically but may be used as field length. Treated generically but may be used as
filename. If used as filename, forward slashes '/' filename. If used as filename, forward slashes '/'
should be used as directory separator, relative paths should be used as directory separator, relative paths
should be used and filenames ending in a slash with should be used and filenames ending in a slash with
0-length content in the matching data chunk should be 0-length content in the matching data chunk should be
treated as an empty directory.</t> treated as an empty directory.
<dl>
<dt>mt:</dt> <dd><t>modification type. May appear 0 or 1 times. mt: modification type. May appear 0 or 1 times. Has the following format:
Has the following format:</t>
8 bytes: microseconds since epoch, as a little endian signed
twos complement 64-bit integer
Perhaps:
id (N bytes): Name field. May appear 0 or 1 times. Has the following
format: name in UTF-8 encoding, length determined by the
field length. Treated generically but may be used as a
filename. If used as a filename, forward slashes '/'
should be used as directory separators, relative paths
should be used, and filenames ending in a slash with
0-length content in the matching data chunk should be
treated as an empty directory.
mt (8 bytes): Modification type. May appear 0 or 1 times. Has the following
format: contains microseconds since epoch, as a little-endian,
signed two's complement 64-bit integer.
Example from Section 4.1 of RFC 9188:
Checksum (1 byte): This contains the (one's complement) checksum sum
of all 8 bits in the trailer. For purposes of computing the checksum,
the value of the Checksum field is zero. This field is present only if
the Checksum Present bit is set to 1.
First SDU Length (2 bytes): This is the length of the first IP packet in
the PDU, only included if a PDU contains multiple IP packets. This
field is present only if the Concatenation Present bit is set to 1.
Connection ID (1 byte): This contains an unsigned integer to identify
the anchor and delivery connection of the GMA PDU. This field is
present only if the Connection ID Present bit is set to 1.
-->
<dl><dt>id:</dt><dd><t>Name field. May appear 0 or 1 times. Has the following
format:</t>
<dl>
<dt> N bytes:</dt><dd><t>Name in UTF-8 encoding, length
determined by the field length. Treated generically but may
be used as a filename. If used as a filename, forward slashes
'/' should be used as directory separators, relative paths
should be used, and filenames ending in a slash with 0-length
content in the matching data chunk should be treated as an
empty directory.</t></dd>
</dl></dd></dl>
<dl><dt>mt:</dt> <dd><t>Modification type. May appear 0 or 1 times.
Has the following format:</t>
<dl> <dl>
<dt>8 bytes:</dt><dd> microseconds since epoch, as a little endian <dt>8 bytes:</dt><dd><t>Microseconds since epoch, as a little-endian,
signed twos complement 64-bit integer</dd> signed two's complement 64-bit integer.</t></dd>
<dt> custom user field:</dt><dd> any two uppercase ASCII characters.</dd> </dl></dd>
</dl> <dt>custom user field:</dt><dd>Any two uppercase ASCII characters.</dd>
</dd> </dl> <!--</dd></dl>-->
</dl>
</dd>
</dl>
</dd>
</dl>
</section> </section>
<section anchor="sect-8.4.3" numbered="true" toc="default"> <section anchor="sect-8.4.3" numbered="true" toc="default">
<name>Data Chunk (Type 2)</name> <name>Data Chunk (Type 2)</name>
<t> <t>
A data chunk contains the actual data of a resource.</t> A data chunk contains the actual data of a resource.</t>
<t>This chunk has the following extra header bytes:</t> <t>This chunk has the following extra header bytes:</t>
<dl> <dl>
<dt>1 byte: </dt> <dd><t>flags:
<dt>1 byte: </dt> <dd><t>Flags:
</t> </t>
<dl> <dl>
<dt> bit 0:</dt><dd> if true, indicates this is not a resource that should be <dt> bit 0:</dt><dd> If true, indicates this is not a resource that should be
output implicitly as part of extracting resources from output implicitly as part of extracting resources from
this container. Instead, it may be referred to only this container. Instead, it may be referred to only
explicitly, e.g. as a dictionary reference by hash code explicitly, e.g., as a dictionary reference by hash code
or offset. This flag should be set for data used as or offset. This flag should be set for data used as
dictionary to improve compression of actual resources.</dd> dictionary to improve compression of actual resources.</dd>
<dt> <dt>
bit 1:</dt><dd> if true, hash code is given</dd> bit 1:</dt><dd>If true, hash code is given</dd>
<dt> <dt>
bits 2-7:</dt><dd> must be zero</dd></dl> bits 2-7:</dt><dd>Must be zero.</dd></dl></dd>
<t>if hash code is given:</t> <dt>If hash code is given:</dt><dd><t><br/></t>
<dl> <dl>
<dt>1 byte:</dt><dd> type of hash used. Only supported value: 3, <dt>1 byte:</dt><dd>Type of hash used. Only supported value: 3,
indicating 256-bit Highwayhash <xref target="HWYHASH" format="default"/>. indicating 256-bit HighwayHash <xref target="HWYHASH" format="default"/>.
</dd> </dd>
<dt> 32 bytes:</dt><dd> 256-bit Highwayhash checksum of the <dt> 32 bytes:</dt><dd> 256-bit HighwayHash checksum of the
uncompressed uncompressed
data</dd> data.</dd>
</dl> </dl>
</dd> </dd>
</dl> </dl>
<t> <t>
The uncompressed content bytes of this chunk are the actual data of The uncompressed content bytes of this chunk are the actual data of
the resource.</t> the resource.</t>
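<t>The extra header bytes of a data chunk can be decoded as sketched below. This is an illustrative sketch only: the function name and the returned dictionary keys are hypothetical, and it assumes that "bit 0" denotes the least significant bit of the flags byte.</t>

```python
HIGHWAYHASH_256 = 3  # the only hash type value listed as supported


def parse_data_chunk_extra_header(buf, pos=0):
    """Return (info, next_pos) for the extra header bytes of a data chunk."""
    if pos >= len(buf):
        raise ValueError("missing flags byte")
    flags = buf[pos]
    pos += 1
    if flags & 0xFC:
        raise ValueError("bits 2-7 of flags must be zero")
    info = {
        # Assumption: bit 0 is the least significant bit.
        "explicit_only": bool(flags & 0x01),
        "hash": None,
    }
    if flags & 0x02:  # hash code is given
        if pos + 33 > len(buf):
            raise ValueError("truncated hash header")
        if buf[pos] != HIGHWAYHASH_256:
            raise ValueError("unsupported hash type")
        info["hash"] = bytes(buf[pos + 1:pos + 33])  # 32-byte checksum
        pos += 33
    return info, pos
```

<t>For first and middle partial data chunks, a caller would additionally reject a set bit 1, and for middle and last partial data chunks a set bit 0, following the restrictions stated for those chunk types.</t>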
</section> </section>
<section anchor="sect-8.4.4" numbered="true" toc="default"> <section anchor="sect-8.4.4" numbered="true" toc="default">
<name>First Partial Data Chunk (Type 3)</name> <name>First Partial Data Chunk (Type 3)</name>
<t> <t>
This chunk contains partial data of a resource. This is the first This chunk contains partial data of a resource. This is the first
chunk in a series containing the entire data of the resource.</t> chunk in a series containing the entire data of the resource.</t>
<t> <t>
The format of this chunk is the same as the format of a Data Chunk The format of this chunk is the same as the format of a data chunk
(<xref target="sect-8.4.3" format="default"/>) except for the differences noted below.</t> (<xref target="sect-8.4.3" format="default"/>) except for the differences noted below.</t>
<t> <t>
The second bit of flags must be set to 0 and no hash code given.</t> The second bit of flags must be set to 0 and no hash code given.</t>
<t> <t>
The uncompressed data size is only of this part of the resource, not The uncompressed data size is only of this part of the resource, not
of the full resource.</t> of the full resource.</t>
</section> </section>
<section anchor="sect-8.4.5" numbered="true" toc="default"> <section anchor="sect-8.4.5" numbered="true" toc="default">
<name>Middle Partial Data Chunk (Type 4)</name> <name>Middle Partial Data Chunk (Type 4)</name>
<t> <t>
This chunk contains partial data of a resource, and is neither the This chunk contains partial data of a resource and is neither the
first nor the last part of the full resource.</t> first nor the last part of the full resource.</t>
<t> <t>
The format of this chunk is the same as the format of a Data Chunk The format of this chunk is the same as the format of a data chunk
(<xref target="sect-8.4.3" format="default"/>) except for the differences noted below.</t> (<xref target="sect-8.4.3" format="default"/>) except for the differences noted below.</t>
<t> <t>
The first and second bits of flags must be set to 0.</t> The first and second bits of flags must be set to 0.</t>
<t> <t>
The uncompressed data size is only of this part of the resource, not The uncompressed data size is only of this part of the resource, not
of the full resource.</t> of the full resource.</t>
</section> </section>
<section anchor="sect-8.4.6" numbered="true" toc="default"> <section anchor="sect-8.4.6" numbered="true" toc="default">
<name>Last Partial Data Chunk (Type 5)</name> <name>Last Partial Data Chunk (Type 5)</name>
<t> <t>
This chunk contains the final piece of partial data of a resource.</t> This chunk contains the final piece of partial data of a resource.</t>
<t> <t>
The format of this chunk is the same as the format of a Data Chunk The format of this chunk is the same as the format of a data chunk
(<xref target="sect-8.4.3" format="default"/>) except for the differences noted below.</t> (<xref target="sect-8.4.3" format="default"/>) except for the differences noted below.</t>
<t> <t>
The first bit of the flags must be set to 0.</t> The first bit of flags must be set to 0.</t>
<t> <t>
If a hash code is given, the hash code of the full resource If a hash code is given, the hash code of the full resource
(concatenated from all previous chunks and this chunk) is given in (concatenated from all previous chunks and this chunk) is given in
this chunk.</t> this chunk.</t>
<t> <t>
The uncompressed data size is only of this part of the resource, not The uncompressed data size is only of this part of the resource, not
of the full resource.</t> of the full resource.</t>
<t> <t>
The type of this chunk indicates that there are no further chunk The type of this chunk indicates that there are no further chunk
encoding this resource, so the full resource is now known.</t> encoding this resource, so the full resource is now known.</t>
</section> </section>
<section anchor="sect-8.4.7" numbered="true" toc="default"> <section anchor="sect-8.4.7" numbered="true" toc="default">
<name>Footer Metadata Chunk (Type 6)</name> <name>Footer Metadata Chunk (Type 6)</name>
<t> <t>
This metadata applies to the resource whose encoding ended in the This metadata applies to the resource whose encoding ended in the
preceding data chunk or last partial data chunk.</t> preceding data chunk or last partial data chunk.</t>
<t> <t>
The contents of this chunk follows the format described in <xref target="sect-8.3" format="default"/>.</t> The contents of this chunk follows the format described in <xref target="sect-8.3" format="default"/>.</t>
<t> <t>
There are no lowercase field types defined for global metadata. There are no lowercase field types defined for footer metadata.
Uppercase field types can be used as custom user data.</t> Uppercase field types can be used as custom user data.</t>
</section> </section>
<section anchor="sect-8.4.8" numbered="true" toc="default"> <section anchor="sect-8.4.8" numbered="true" toc="default">
<name>Global Metadata Chunk (Type 7)</name> <name>Global Metadata Chunk (Type 7)</name>
<t> <t>
This metadata applies to the whole container instead of a single This metadata applies to the whole container instead of a single
resource.</t> resource.</t>
<t> <t>
The contents of this chunk follows the format described in <xref target="sect-8.3" format="default"/>.</t> The contents of this chunk follows the format described in <xref target="sect-8.3" format="default"/>.</t>
<t> <t>
There are no lowercase field types defined for footer metadata. There are no lowercase field types defined for global metadata.
Uppercase field types can be used as custom user data.</t> Uppercase field types can be used as custom user data.</t>
</section> </section>
<section anchor="sect-8.4.9" numbered="true" toc="default"> <section anchor="sect-8.4.9" numbered="true" toc="default">
<name>Repeat Metadata Chunk (Type 8)</name> <name>Repeat Metadata Chunk (Type 8)</name>
<t> <t>
These chunks optionally repeat metadata that is interleaved between These chunks optionally repeat metadata that is interleaved between
data chunks. To use these chunks, it is necessary to also read data chunks. To use these chunks, it is necessary to also read
additional information, such as pointers to the original chunks, from additional information, such as pointers to the original chunks, from
the central directory.</t> the central directory.</t>
<t> <t>
The contents of this chunk follows the format described in <xref target="sect-8.3" format="default"/>.</t> The contents of this chunk follows the format described in <xref target="sect-8.3" format="default"/>.</t>
<t>This chunk has an extra header byte:</t> <t>This chunk has an extra header byte:</t>
<dl> <dt> <dl> <dt>
1 byte:</dt><dd> chunk type of repeated chunk (metadata chunk 1 byte:</dt><dd>Chunk type of repeated chunk (metadata chunk
or footer metadata chunk) or footer metadata chunk).
</dd></dl> </dd></dl>
<t>This set of chunks must follow the following restrictions:</t> <t>This set of chunks must follow the following restrictions:</t>
<ul><li> <ul><li>
It is optional whether or not repeat metadata chunks are It is optional whether or not repeat metadata chunks are
present.</li> present.</li>
<li>If they are present, then they must be present for all <li>If they are present, then they must be present for all
metadata chunks and footer metadata chunks. metadata chunks and footer metadata chunks.
</li> </li>
<li>There may be only 1 repeat metadata chunk per repeated metadata chunk.</li> <li>There may be only 1 repeat metadata chunk per repeated metadata chunk.</li>
<li>They must appear in the same order as the chunks appear in the container, which is also the same order as listed in the <li>They must appear in the same order as the chunks appear in the container, which is also the same order as listed in the
central directory. central directory.
</li> </li>
<li>Compression of these chunks is allowed, however it is not allowed <li>Compression of these chunks is allowed; however, it is not allowed
to use any internal dictionary except an earlier repeat to use any internal dictionary except an earlier repeat
metadata chunk of this series, and it is not allowed for a metadata chunk of this series, and it is not allowed for a
metadata chunk to keep the decoder state if the previous chunk metadata chunk to keep the decoder state if the previous chunk
is not a repeat metadata chunk. That is, the series of is not a repeat metadata chunk. That is, the series of
metadata chunks must be decompressible without using other metadata chunks must be decompressible without using other
chunks of the framing format file. chunks of the framing format file.
</li> </li>
</ul> </ul>
<t> <t>
The fields contained in this metadata chunk must follow the following The fields contained in this metadata chunk must follow the following
restrictions:</t> restrictions:</t>
<ul> <ul>
<li>If a field is present, it must <li>If a field is present, it must
exactly match the corresponding field of the copied chunk.</li> exactly match the corresponding field of the copied chunk.</li>
<li>It is allowed to leave out a field that is present <li>It is allowed to leave out a field that is present
in the copied chunk. in the copied chunk.
</li> </li>
<li>If a field is present, then it must be present in *all* other <!-- [rfced] We note that there are a few instances throughout the
document where asterisks "*" are used for emphasis. Would you
like to utilize the <strong> element in the XML? In the HTML and
PDF outputs, <strong> yields bold text. In the text output,
<strong> yields an asterisk before and after, similar to how it's
used currently.
Current:
* If a field is present, then it must be present in *all* other
repeat metadata chunks when the copied chunk contains this field.
-->
<li>If a field is present, then it must be present in *all* other
repeat metadata chunks when the copied chunk contains this repeat metadata chunks when the copied chunk contains this
field. In other words, if you know you can get the name field field. In other words, if you know you can get the name field
from a repeat chunk, you know that you will be able to get all from a repeat chunk, you know that you will be able to get all
names of all resources from all repeat chunks. names of all resources from all repeat chunks.
</li> </li>
</ul> </ul>
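<t>The field restrictions listed above can be expressed as a consistency check. The sketch below is illustrative only and uses a hypothetical in-memory model: originals and repeats are lists of dictionaries mapping field codes to field contents, with one entry per metadata or footer metadata chunk, in container order.</t>

```python
def check_repeat_fields(originals, repeats):
    """Return True if repeats satisfy the field restrictions for originals."""
    # Repeat chunks, when present, must exist for every repeated chunk.
    if len(originals) != len(repeats):
        return False
    # Collect every field code that appears in at least one repeat chunk.
    repeated_codes = set()
    for rep in repeats:
        repeated_codes.update(rep)
    for orig, rep in zip(originals, repeats):
        # A field that is present must exactly match the copied chunk.
        for code, value in rep.items():
            if orig.get(code) != value:
                return False
        # A field repeated anywhere must be repeated wherever the
        # copied chunk contains it; leaving it out everywhere is fine.
        for code in repeated_codes:
            if code in orig and code not in rep:
                return False
    return True
```

<t>With this check, finding a name field in any one repeat chunk implies that every named resource has its name available from the repeat chunks.</t>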
</section> </section>
<section anchor="sect-8.4.10" numbered="true" toc="default"> <section anchor="sect-8.4.10" numbered="true" toc="default">
<name>Central Directory Chunk (Type 9)</name> <name>Central Directory Chunk (Type 9)</name>
<t> <t>
The central directory chunk, along with the repeat metadata chunks, The central directory chunk along with the repeat metadata chunks
allow to quickly find and list compressed resources in the container allow quickly finding and listing compressed resources in the container
file.</t> file.</t>
<t> <t>
The central directory chunk is always uncompressed and does not have The central directory chunk is always uncompressed and does not have
the codec byte. It instead has the following format:</t> the codec byte. It instead has the following format:</t>
<dl> <dl>
<dt>varint:</dt><dd> <t>pointer into the file where the repeat metadata chunks are located, or 0 if they are not present per chunk listed:</t> <dt>varint:</dt><dd> <t>Pointer into the file where the repeat metadata chunks are located or 0 if they are not present per chunk listed:</t>
<dl> <dt> <dl> <dt>
varint:</dt><dd> pointer into the file where this chunk begins</dd> varint:</dt><dd>Pointer into the file where this chunk begins.</dd>
<dt>varint:</dt><dd> amount of header bytes N used below</dd> <dt>varint:</dt><dd>Number of header bytes N used below.</dd>
<dt>N bytes:</dt><dd> copy of all the header bytes of the pointed <dt>N bytes:</dt><dd>Copy of all the header bytes of the pointed
at chunk, at chunk,
including total size, chunk type byte, codec, including total size, chunk type byte, codec,
uncompressed size, dictionary references, X extra uncompressed size, dictionary references, and X extra
header bytes. The content is not repeated here. header bytes. The content is not repeated here.
</dd> </dd>
</dl> </dl>
</dd> </dd>
</dl> </dl>
<t> <t>
The last listed chunk is reached when the end of the contents of the The last listed chunk is reached when the end of the contents of the
central directory are reached. If the end does not match the last central directory are reached. If the end does not match the last
byte of the central directory, the decoder must reject the data byte of the central directory, the decoder must reject the data
stream as invalid.</t> stream as invalid.</t>
<t> <t>
If present, the central directory must list all data and metadata If present, the central directory must list all data and metadata
chunks of all types.</t> chunks of all types.</t>
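<t>Reading the central directory can be sketched as follows. This is a rough illustration under stated assumptions: content is taken to be the bytes of the chunk that follow its common chunk header, the varint is assumed to be the little-endian base-128 encoding defined elsewhere in this document, and the function names are hypothetical.</t>

```python
def read_varint(buf, pos):
    """Decode an assumed little-endian base-128 varint; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def parse_central_directory(content):
    """Return (repeat_metadata_pointer, [(chunk_pointer, header_bytes), ...])."""
    repeat_ptr, pos = read_varint(content, 0)  # 0 means no repeat metadata
    entries = []
    while pos < len(content):
        chunk_ptr, pos = read_varint(content, pos)
        n, pos = read_varint(content, pos)
        # The last entry must end exactly at the end of the directory.
        if pos + n > len(content):
            raise ValueError("entry runs past the end of the central directory")
        entries.append((chunk_ptr, bytes(content[pos:pos + n])))
        pos += n
    return repeat_ptr, entries
```

<t>Each header_bytes value is a copy of the listed chunk's header, so a reader can inspect chunk types and sizes without seeking to every chunk.</t>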
</section> </section>
<section anchor="sect-8.4.11" numbered="true" toc="default"> <section anchor="sect-8.4.11" numbered="true" toc="default">
<name>Final Footer Chunk (Type 10)</name> <name>Final Footer Chunk (Type 10)</name>
<t> <t>
<!--[rfced] In Section 8.4.11, how may we rephrase this sentence
for clarity? We note that "header" is not used elsewhere when
referring to "container flags"; should it be removed for
consistency?
Original:
Chunk that closes the file, only present if in the initial container Chunk that closes the file, only present if in the initial container
header flags bit 2 was set.
Perhaps:
The final footer chunk closes the file and is only present if bit 2
of the initial container flags was set.
-->
The final footer chunk closes the file and is only present if in the initial
container
header flags bit 2 was set.</t> header flags bit 2 was set.</t>
<t>This chunk has the following content, always uncompressed:</t> <t>This chunk has the following content, which is always uncompressed:</t>
<dl> <dl>
<dt> <dt>
reversed varint:</dt><dd><t> size of this entire framing format file, reversed varint:</dt><dd><t>Size of this entire framing format file,
including these bytes themselves, or 0 if this including these bytes themselves, or 0 if this
size is not given</t> size is not given.</t></dd>
<dl> <dt>reversed varint:</dt><dd>Pointer to the start of the central
<dt>reversed varint:</dt><dd> pointer to the start of the central directory, or 0 if there is none.
directory,or 0 if there is none
</dd> </dd>
</dl> </dl>
</dd>
</dl>
<t> <t>
A reversed varint has the same format as a varint, but has its bytes A reversed varint has the same format as a varint but its bytes
in reversed order and is designed to be parsed from end of file are in reversed order, and it is designed to be parsed from the end of the file
towards the beginning.</t> towards the beginning.</t>
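The backward parse described above can be sketched as follows. This is a minimal illustration, not the reference decoder: it assumes the forward varint stores 7 value bits per byte, least significant group first, with the high bit of a byte set when more bytes follow. The helper names and the 9-byte limit (MAX_VARINT_BYTES) are hypothetical and chosen only for this example; the varint definition elsewhere in the document is authoritative.

```python
MAX_VARINT_BYTES = 9  # assumed upper bound, for illustration only


def encode_varint(value):
    """Encode a non-negative integer as a forward varint (assumed layout)."""
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)  # more bytes follow
        else:
            out.append(low)  # last byte, high bit clear
            return bytes(out)


def encode_reversed_varint(value):
    """Same bytes as the forward varint, stored in reversed order."""
    return encode_varint(value)[::-1]


def read_reversed_varint(buf, end):
    """Parse a reversed varint whose first logical byte is buf[end - 1].

    Returns (value, new_end); new_end is the index of the lowest byte
    consumed, so the next backward read can continue from there.
    """
    value = 0
    shift = 0
    pos = end
    for _ in range(MAX_VARINT_BYTES):
        if pos == 0:
            raise ValueError("ran past the start of the buffer")
        pos -= 1
        byte = buf[pos]
        value |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:
            return value, pos
    raise ValueError("reversed varint is too long")
```

If the two footer fields are stored in the order listed, a decoder reading backward from the end of the chunk content would obtain the central directory pointer first and the file size second.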
</section> </section>
<section anchor="sect-8.4.12" numbered="true" toc="default"> <section anchor="sect-8.4.12" numbered="true" toc="default">
<name>Chunk ordering</name> <name>Chunk Ordering</name>
<t> <t>
The chunk ordering must follow the rules described below, if the The chunk ordering must follow the rules described below. If the
decoder sees otherwise, it must reject the data stream as invalid.</t> decoder sees otherwise, it must reject the data stream as invalid.</t>
<t indent="3"> <t indent="3">
Padding chunks may be inserted anywhere, even between chunks for Padding chunks may be inserted anywhere, even between chunks for
which the rules below say no other chunk types may come in which the rules below say no other chunk types may come in
between.</t> between.</t>
<t indent="3"> <t indent="3">
Metadata chunks must come immediately before the Data chunks of Metadata chunks must come immediately before the data chunks of
the resource they apply to.</t> the resource they apply to.</t>
<t indent="3"> <t indent="3">
Footer metadata chunks must come immediately after the Data Footer metadata chunks must come immediately after the data
chunks of the resource they apply to.</t> chunks of the resource they apply to.</t>
<t indent="3"> <t indent="3">
There may be only 0 or 1 metadata chunks per resource.</t> There may be only 0 or 1 metadata chunks per resource.</t>
<t indent="3"> <t indent="3">
There may be only 0 or 1 footer metadata chunks per resource.</t> There may be only 0 or 1 footer metadata chunks per resource.</t>
<t indent="3"> <t indent="3">
A resource must exist out of either 1 data chunk, or 1 first A resource must exist out of either 1 data chunk or 1 first
partial data chunk, 0 or more middle partial data partial data chunk, 0 or more middle partial data
chunks, and 1 last partial data chunk, in that order.</t> chunks, and 1 last partial data chunk, in that order.</t>
<t indent="3"> <t indent="3">
Repeat metadata chunks must follow the rules of section 8.4.9.</t> Repeat metadata chunks must follow the rules of <xref target="sect-8.4.9"/>.</t>
<t indent="3"> <t indent="3">
There may be only 0 or 1 central directory chunks.</t> There may be only 0 or 1 central directory chunks.</t>
<t indent="3"> <t indent="3">
If bit 2 of the container flags is set, there may be only a If bit 2 of the container flags is set, there may be only a
single resource, no metadata chunks of any type, no central single resource, no metadata chunks of any type, no central
directory, and no final footer.</t> directory, and no final footer.</t>
<t indent="3"> <t indent="3">
If bit 2 of the container flags is not set, there must be exactly If bit 2 of the container flags is not set, there must be exactly
1 final footer chunk and it must be the last chunk in the file.</t> 1 final footer chunk, and it must be the last chunk in the file.</t>
</section> </section>
</section> </section>
</section> </section>
<section anchor="sect-9" numbered="true" toc="default"> <section anchor="sect-9" numbered="true" toc="default">
<name>Security Considerations</name> <name>Security Considerations</name>
<t> <t>
The security considerations for brotli <xref target="RFC7932" format="default"/> apply to shared The security considerations for brotli <xref target="RFC7932" format="default"/> apply to shared
brotli as well.</t> brotli as well.</t>
<t> <t>
In addition, the same considerations apply to the decoding of new In addition, the same considerations apply to the decoding of new
file format streams for shared brotli, including shared dictionaries, file format streams for shared brotli, including shared dictionaries,
the framing format and the shared brotli format.</t> the framing format, and the shared brotli format.</t>
<t> <t>
The dictionary must be treated with the same security precautions as The dictionary must be treated with the same security precautions as
the content, because a change to the dictionary can result in a the content because a change to the dictionary can result in a
change to the decompressed content.</t> change to the decompressed content.</t>
<t> <t>
The CRIME attack <xref target="CRIME" format="default"/> shows that it's a bad idea to compress data The CRIME attack <xref target="CRIME" format="default"/> shows that it's a bad idea to compress data
from mixed (e.g. public and private) sources -- the data sources from mixed (e.g., public and private) sources -- the data sources
include not only the compressed data but also the dictionaries. For include not only the compressed data but also the dictionaries. For
example, if you compress secret cookies using a public-data-only example, if you compress secret cookies using a public-data-only
dictionary, you still leak information about the cookies.</t> dictionary, you still leak information about the cookies.</t>
<t> <t>
Not only can the dictionary reveal information about the compressed Not only can the dictionary reveal information about the compressed
data, but vice versa, data compressed with the dictionary can reveal data, but vice versa; data compressed with the dictionary can reveal
the contents of the dictionary when an adversary can control parts of the contents of the dictionary when an adversary can control parts of
data to compress and see the compressed size. On the other hand, if data to compress and see the compressed size. On the other hand, if
the adversary can control the dictionary, the adversary can learn the adversary can control the dictionary, the adversary can learn
information about the compressed data.</t> information about the compressed data.</t>
<t> <t>
The most robust defense against CRIME is not to compress private data The most robust defense against CRIME is not to compress private data, e.g.,
(e.g., sensitive headers like cookies or any content with PII). The sensitive headers like cookies or any content with personally identifiable information (PII). The
challenge has been to identify secrets within a vast amount of to be challenge has been to identify secrets within a vast amount of data to be compressed.
compressed data. Cloudflare uses a regular expression <xref target="CLOUDFLARE" format="default"/>. Cloudflare uses a regular expression <xref target="CLOUDFLARE" format="default"/>.
Another idea is to extend existing web template systems (e.g., Soy Another idea is to extend existing web template systems (e.g., Soy
<xref target="SOY" format="default"/>) to allow developers to mark secrets that must not be <xref target="SOY" format="default"/>) to allow developers to mark secrets that must not be
compressed.</t> compressed.</t>
<t> <t>
A less robust idea, but easier to implement, is to randomize the A less robust idea, but easier to implement, is to randomize the
compression algorithm, i.e., adding randomly generated padding, compression algorithm, i.e., adding randomly generated padding,
varying the compression ratio, etc. The tricky part is to find the varying the compression ratio, etc. The tricky part is to find the
right balance between cost and security, i.e., on one hand we don't right balance between cost and security (i.e., on one hand, we don't
want to add too much padding because it adds a cost to data, on the want to add too much padding because it adds a cost to data, but on the
other hand we don't want to add too little because the adversary can other hand, we don't want to add too little because the adversary can
detect a small amount of padding with traffic analysis.</t> detect a small amount of padding with traffic analysis).</t>
<t> <t>
Another defense in addition is to not use dictionaries for cross- Additionally, another defense is to not use dictionaries for cross-
domain requests, and only use shared brotli for the response when the domain requests and to only use shared brotli for the response when the
origin is the same as where the content is hosted (using CORS). This origin is the same as where the content is hosted (using CORS). This
prevents an adversary from using a private dictionary with user prevents an adversary from using a private dictionary with user
secrets to compress content hosted on the adversary's origin. It secrets to compress content hosted on the adversary's origin. It
also helps prevent CRIME attacks that try to benefit from a public also helps prevent CRIME attacks that try to benefit from a public
dictionary by preventing data compression with dictionaries for dictionary by preventing data compression with dictionaries for
requests that do not originate from the host itself.</t> requests that do not originate from the host itself.</t>
<t> <t>
The content of the dictionary itself should not be affected by The content of the dictionary itself should not be affected by
external users, allowing adversaries to control the dictionary allows external users; allowing adversaries to control the dictionary allows
a form of chosen plaintext attack. Instead, only base the dictionary a form of chosen plaintext attack. Instead, only base the dictionary
on content you control or generic large scale content such as a on content you control or generic large scale content such as a
spoken language, and update the dictionary with large time intervals spoken language and update the dictionary with large time intervals
(days, not seconds) to prevent fast probing.</t> (days, not seconds) to prevent fast probing.</t>
<t> <t>
The use of Highwayhash <xref target="HWYHASH" format="default"/> for dictionary identifiers does not The use of HighwayHash <xref target="HWYHASH" format="default"/> for dictionary identifiers does not
guarantee against collisions in an adversarial environment and is guarantee against collisions in an adversarial environment and is
intended to be used for identifying the dictionary within a trusted, intended to be used for identifying the dictionary within a trusted,
known set of dictionaries. In an adversarial environment, users of known set of dictionaries. In an adversarial environment, users of
shared brotli should use another mechanism to validate a negotiated shared brotli should use another mechanism to validate a negotiated
dictionary, such as using a cryptographically-proven secure hash.</t> dictionary such as a cryptographically proven secure hash.</t>
</section> </section>
<section anchor="sect-10" numbered="true" toc="default"> <section anchor="sect-10" numbered="true" toc="default">
<name>IANA Considerations</name> <name>IANA Considerations</name>
<t> <t>
This document has no IANA actions.</t> This document has no IANA actions.</t>
</section> </section>
</middle> </middle>
<back> <back>
<references> <references>
<name>References</name> <name>References</name>
<references> <references>
<name>Normative References</name> <name>Normative References</name>
<xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7932.xml"/> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7932.xml"/>
<reference anchor="HWYHASH" target="https://arxiv.org/abs/1612.06257"> <reference anchor="HWYHASH" target="https://arxiv.org/abs/1612.06257">
<front> <front>
<title>Fast keyed hash/pseudo-random function using SIMD multiply and permute</title> <title>Fast keyed hash/pseudo-random function using SIMD multiply and permute</title>
<author><organization> <author fullname="Jyrki Alakuijala"/>
Alakuijala, J., Cox, B., Wassenberg, J.</organization> <author fullname="Bill Cox"/>
</author> <author fullname="Jan Wassenberg"/>
<date month="February" year="2017"/>
<date/>
</front> </front>
<seriesInfo name="DOI" value="10.48550/arXiv.1612.06257"/>
</reference> </reference>
</references> </references>
<references> <references>
<name>Informative References</name> <name>Informative References</name>
<reference anchor="LZ77"> <reference anchor="LZ77">
<front> <front>
<title>A Universal Algorithm for Sequential Data Compression</title> <title>A Universal Algorithm for Sequential Data Compression</title>
<author initials="J." surname="Ziv" fullname="J. Ziv"/> <author initials="J." surname="Ziv" fullname="J. Ziv"/>
<author initials="A." surname="Lempel" fullname="A. Lempel"/> <author initials="A." surname="Lempel" fullname="A. Lempel"/>
<date month="May" year="1997"/> <date month="May" year="1977"/>
</front> </front>
<refcontent>IEEE Transactions on Information Theory. 23 (3): 337-343</refcontent> <seriesInfo name="DOI" value="10.1109/TIT.1977.1055714"/>
<refcontent>IEEE Transactions on Information Theory, vol. 23, no. 3, pp. 337-343</refcontent>
</reference> </reference>
<reference anchor="CLOUDFLARE" target="https://blog.cloudflare.com/a-solution-to-compression-oracles-on-the-web/"> <reference anchor="CLOUDFLARE" target="https://blog.cloudflare.com/a-solution-to-compression-oracles-on-the-web/">
<front> <front>
<title/> <title>A Solution to Compression Oracles on the Web</title>
<author> <author fullname="Blake Loring"/>
</author> <date day="27" month="March" year="2018"/>
<date/>
</front> </front>
<refcontent>The Cloudflare Blog</refcontent>
</reference> </reference>
<!-- [rfced] Please review the following reference. The original URL for this
reference (https://developers.google.com/closure/templates/) redirects to
a page titled "Closure Tools" (https://developers.google.com/closure).
Is this reference still correct or is an update needed? Note that the only
instance of this reference being cited in the text is shown below.
Current (text):
Another idea is to extend existing web template systems (e.g., Soy
[SOY]) to allow developers to mark secrets that must not be
compressed.
Current (reference):
[SOY] Google Developers, "Closure Tools",
<https://developers.google.com/closure/templates/>.
-->
<reference anchor="SOY" target="https://developers.google.com/closure/templates/"> <reference anchor="SOY" target="https://developers.google.com/closure/templates/">
<front> <front>
<title/> <title>Closure Tools</title>
<author> <author>
</author> <organization>Google Developers</organization>
</author>
<date/> <date/>
</front> </front>
</reference> </reference>
<reference anchor="CRIME" target="https://www.cve.org/CVERecord?id=CVE-2012-4929"> <reference anchor="CRIME" target="https://www.cve.org/CVERecord?id=CVE-2012-4929">
<front> <front>
<title/> <title>CVE-2012-4929</title>
<author> <author>
<organization>CVE Program</organization>
</author> </author>
<date/> <date/>
</front> </front>
</reference> </reference>
</references> </references>
</references> </references>
<section numbered="false" anchor="acknowledgments" toc="default"> <section numbered="false" anchor="acknowledgments" toc="default">
<name>Acknowledgments</name> <name>Acknowledgments</name>
<t> <t>
The authors would like to thank Robert Obryk for suggesting The authors would like to thank <contact fullname="Robert Obryk"/> for suggesting
improvements to the format and the text of the specification.</t> improvements to the format and the text of the specification.</t>
</section> </section>
</back>
</rfc> <!-- [rfced] FYI - We have added expansions for the following abbreviations
per Section 3.6 of RFC 7322 ("RFC Style Guide"). Please review each
expansion in the document carefully to ensure correctness.
most significant bit (MSB)
least significant bit (LSB)
personally identifiable information (PII)
-->
<!-- [rfced] Please review the "Inclusive Language" portion of the online
Style Guide <https://www.rfc-editor.org/styleguide/part2/#inclusive_language>
and let us know if any changes are needed. Updates of this nature typically
result in more precise language, which is helpful for readers.
Note that our script did not flag any words in particular, but this should
still be reviewed as a best practice.
-->
</back> </rfc>
 End of changes. 216 change blocks. 
417 lines changed or deleted 804 lines changed or added

This html diff was produced by rfcdiff 1.48.