<?xml version='1.0' encoding='utf-8'?> <!-- [rfced] Because this document updates RFC 7932, please review the errata reported for RFC 7932 (https://www.rfc-editor.org/errata/rfc7932) and let us know if you confirm our opinion that none of them are relevant to the content of this document. --> <!DOCTYPE rfc [ <!ENTITY nbsp "&#160;"> <!ENTITY zwsp "&#8203;"> <!ENTITY nbhy "&#8209;"> <!ENTITY wj "&#8288;"> ]> <rfc xmlns:xi="http://www.w3.org/2001/XInclude" submissionType="IETF" docName="draft-vandevenne-shared-brotli-format-15" number="9841" consensus="true" category="info" updates="7932" ipr="trust200902" obsoletes="" xml:lang="en" symRefs="true" sortRefs="true" tocInclude="true" version="3"><!-- xml2rfc v2v3 conversion 3.27.0 --> <!-- Generated by id2xml 1.5.2 on 2025-02-12T17:39:08Z --><front> <title abbrev="Shared Brotli Data Format">Shared Brotli Compressed Data Format</title> <seriesInfo name="RFC" value="9841"/> <author initials="J." surname="Alakuijala" fullname="Jyrki Alakuijala"> <organization abbrev="Google, Inc">Google, Inc.</organization> <address> <email>jyrki@google.com</email> </address> </author> <author initials="T." surname="Duong" fullname="Thai Duong"> <organization abbrev="Google, Inc">Google, Inc.</organization> <address> <email>thaidn@google.com</email> </address> </author> <author initials="E." surname="Kliuchnikov" fullname="Evgenii Kliuchnikov"> <organization abbrev="Google, Inc">Google, Inc.</organization> <address> <email>eustas@google.com</email> </address> </author> <author initials="Z." surname="Szabadka" fullname="Zoltan Szabadka"> <organization abbrev="Google, Inc">Google, Inc.</organization> <address> <email>szabadka@google.com</email> </address> </author> <author initials="L." 
surname="Vandevenne" fullname="Lode Vandevenne" role="editor"> <organization abbrev="Google, Inc">Google, Inc.</organization> <address> <email>lode@google.com</email> </address> </author> <date year="2025" month="August"/> <!-- [rfced] Please verify that WIT is the correct area for this document. --> <area>WIT</area> <!-- [rfced] Please insert any keywords (beyond those that appear in the title) for use on https://www.rfc-editor.org/search. --> <abstract> <t> This specification defines a data format for shared brotli compression, which adds support for shared dictionaries, large window, and a container format to brotli (RFC 7932). Shared dictionaries and large window support allow significant compression gains compared to regular brotli. This document updates RFC 7932.</t> </abstract> </front> <middle> <section anchor="sect-1" numbered="true" toc="default"> <name>Introduction</name> <section anchor="sect-1.1" numbered="true" toc="default"> <name>Purpose</name> <t> The purpose of this specification is to extend the brotli compressed data format <xref target="RFC7932" format="default"/> with new abilities that allow further compression gains.</t> <ul spacing="normal"> <li> <t>Shared dictionaries allow a static shared context between encoder and decoder for significant compression gains.</t> </li> <li> <t>Large window brotli allows much larger back reference distances to give compression gains for files over 16 MiB.</t> </li> <li> <t>The framing format is a container format that allows storage of multiple resources and references dictionaries.</t> </li> </ul> <t> This document is the authoritative specification of shared brotli data formats and the backwards-compatible changes to brotli. 
This document also defines the following:</t> <ul> <li> <t>The data format of serialized shared dictionaries</t> </li> <li> <t>The data format of the framing format</t> </li> <li> <t>The encoding of window bits and distances for large window brotli in the brotli data format</t> </li> <li> <t>The encoding of shared dictionary references in the brotli data format</t> </li> </ul> </section> <section anchor="sect-1.2" numbered="true" toc="default"> <name>Intended Audience</name> <t> This specification is intended for use by software implementers to compress data into and/or decompress data from the shared brotli dictionary format.</t> <t> The text of the specification assumes a basic background in programming at the level of bits and other primitive data representations. Familiarity with the technique of LZ77 coding <xref target="LZ77"/> is helpful, but not required.</t> </section> <section anchor="sect-1.3" numbered="true" toc="default"> <name>Scope</name> <t> This specification defines a data format for shared brotli compression, which adds support for dictionaries and extended features to brotli <xref target="RFC7932" format="default"/>.</t> </section> <section anchor="sect-1.4" numbered="true" toc="default"> <name>Compliance</name> <t> Unless otherwise indicated below, a compliant decompressor must be able to accept and decompress any data set that conforms to all the specifications presented here. Additionally, a compliant compressor must produce data sets that conform to all the specifications presented here.</t> </section> <section anchor="sect-1.5" numbered="true" toc="default"> <name>Definitions of Terms and Conventions Used</name> <dl><dt>Byte:</dt><dd>8 bits stored or transmitted as a unit (same as an octet). For this specification, a byte is exactly 8 bits, even on machines that store a character on a number of bits different from eight. 
See below for the numbering of bits within a byte.</dd><dt>String:</dt><dd>A sequence of arbitrary bytes.</dd> </dl> <t> Bytes stored within a computer do not have a "bit order" since they are always treated as a unit. However, a byte considered as an integer between 0 and 255 does have a most significant bit (MSB) and least significant bit (LSB), and since we write numbers with the most significant digit on the left, bytes are also written with the MSB on the left. In the diagrams below, the bits of a byte are written so that bit 0 is the LSB, i.e., the bits are numbered as follows:</t> <artwork name="" type="" align="left" alt=""><![CDATA[
+--------+
|76543210|
+--------+
]]></artwork> <t> Within a computer, a number may occupy multiple bytes. All multi-byte numbers in the format described here are unsigned and stored with the least significant byte first (at the lower memory address). For example, the decimal 16-bit number 520 is stored as:</t> <!--[rfced] In the second figure in Section 5.1, is "more significant byte" intended (we note that it was used in RFC 7932), or should it be changed to "most significant byte", which is used more often in the RFC Series? Original: 0 1 +- - - - + - - - -+ |00001000|00000010| +- - - - + - - - -+ ^ ^ | | | + more significant byte = 2 x 256 + less significant byte = 8 --> <artwork name="" type="" align="left" alt=""><![CDATA[
    0        1
+--------+--------+
|00001000|00000010|
+--------+--------+
 ^        ^
 |        |
 |        + more significant byte = 2 x 256
 + less significant byte = 8
]]></artwork> <section anchor="sect-1.5.1" numbered="true" toc="default"> <name>Packing into Bytes</name> <t> This document does not address the issue of the order in which bits of a byte are transmitted on a bit-sequential medium, since the final data format described here is byte- rather than bit-oriented. 
However, the compressed block format is described below as a sequence of data elements of various bit lengths, not a sequence of bytes. Therefore, we must specify how to pack these data elements into bytes to form the final compressed byte sequence:</t> <ul spacing="normal"> <li> <t>Data elements are packed into bytes in order of increasing bit number within the byte, i.e., starting with the LSB of the byte.</t> </li> <li> <t>Data elements other than prefix codes are packed starting with the LSB of the data element. These are referred to here as integer values and are considered unsigned.</t> </li> <li> <t>Prefix codes are packed starting with the MSB of the code.</t> </li> </ul> <t> In other words, if one were to print out the compressed data as a sequence of bytes starting with the first byte at the *right* margin and proceeding to the *left*, with the MSB of each byte on the left as usual, one would be able to parse the result from right to left with fixed-width elements in the correct MSB-to-LSB order and prefix codes in bit-reversed order (i.e., with the first bit of the code in the relative LSB position).</t> <t> As an example, consider packing the following data elements into a sequence of 3 bytes: 3-bit integer value 6, 4-bit integer value 2, 3-bit prefix code b'110, 2-bit prefix code b'10, and 12-bit integer value 3628.</t> <artwork name="" type="" align="left" alt=""><![CDATA[
   byte 2   byte 1   byte 0
+--------+--------+--------+
|11100010|11000101|10010110|
+--------+--------+--------+
 ^            ^ ^   ^   ^
 |            | |   |   |
 |            | |   |   +------ integer value 6
 |            | |   +---------- integer value 2
 |            | +-------------- prefix code 110
 |            +---------------- prefix code 10
 +----------------------------- integer value 3628
]]></artwork> </section> </section> </section> <section anchor="sect-2" numbered="true" toc="default"> <name>Shared Brotli Overview</name> <t> Shared brotli extends 
brotli <xref target="RFC7932" format="default"/> with support for shared dictionaries, a larger LZ77 window, and a framing format.</t> </section> <section anchor="sect-3" numbered="true" toc="default"> <name>Shared Dictionaries</name> <t> A shared dictionary is a piece of data shared by a compressor and decompressor. The compressor can take advantage of the dictionary context to encode the input in a more compact manner. The compressor and the decompressor must use exactly the same dictionary. A shared dictionary is especially useful to compress short input sequences.</t> <t>A shared brotli dictionary can use two methods of sharing context:</t> <dl><dt>LZ77 dictionary:</dt><dd>The encoder and decoder could refer to a given sequence of bytes. Multiple LZ77 dictionaries can be set.</dd> <dt>Custom static dictionary:</dt><dd>A word list with transforms. The encoder and decoder will replace the static dictionary data with the data in the shared dictionary. The original static dictionary is described in <xref section="8" target="RFC7932" format="default"/>. The original data from Appendices <xref section="A" target="RFC7932" sectionFormat="bare"/> and <xref section="B" target="RFC7932" sectionFormat="bare"/> of <xref target="RFC7932"/> will be replaced. In addition, it is possible to dynamically switch this dictionary based on the data compression context and/or include a reference to the original dictionary in the custom dictionary.</dd></dl> <t> If no shared dictionary is set, the decoder behaves the same as in <xref target="RFC7932" format="default"/> on a brotli stream.</t> <t> <!-- [rfced] We have updated the following sentence for clarity. Please let us know of any objections. 
Original: If a shared dictionary is set, then it can set any of: LZ77 dictionaries, overriding static dictionary words, and/or overriding transforms. Current: If a shared dictionary is set, then it can set LZ77 dictionaries, override static dictionary words, and/or override transforms. --> If a shared dictionary is set, then it can set LZ77 dictionaries, override static dictionary words, and/or override transforms.</t> <section anchor="sect-3.1" numbered="true" toc="default"> <name>Custom Static Dictionaries</name> <t> <!-- [rfced] May we rephrase the following sentences to avoid using "RFC 7932" as an adjective and for subject-verb agreement (due to multiple behaviors being overridden)? i) Current: If a custom word list is set, then the following behavior of the RFC 7932 decoder [RFC7932] is overridden... Perhaps: If a custom word list is set, then the following behaviors of the decoder defined in [RFC7932] are overridden... ii) Current: If a custom transforms list is set without context dependency, then the following behavior of the RFC 7932 decoder [RFC7932] is overridden... Perhaps: If a custom transforms list is set without context dependency, then the following behaviors of the decoder defined in [RFC7932] are overridden... --> If a custom word list is set, then the following behaviors of the decoder defined in <xref target="RFC7932" format="default"/> are overridden:</t> <t indent="3"> Instead of the Static Dictionary Data from <xref section="A" target="RFC7932" format="default"/>, one or more word lists from the custom static dictionary data are used.</t> <t indent="3"> Instead of NDBITS at the end of <xref section="A" target="RFC7932" format="default"/>, a custom SIZE_BITS_BY_LENGTH per custom word list is used. 
</t> <t indent="3"> The copy length for a static dictionary reference must be between 4 and 31 and must not be a value for which SIZE_BITS_BY_LENGTH of this dictionary is 0.</t> <t> If a custom transforms list is set without context dependency, then the following behaviors of the decoder defined in <xref target="RFC7932" format="default"/> are overridden:</t> <t indent="3"> The "List of Word Transformations" from <xref section="B" target="RFC7932" format="default"/> is overridden by one or more lists of custom prefixes, suffixes, and transform operations.</t> <t indent="3"> The transform_id must be smaller than the number of transforms given in the custom transforms list.</t> <t> If the dictionary is context dependent, it includes a lookup table of 64 word list and transform list combinations. When resolving a static dictionary word, the decoder computes the literal Context ID as described in <xref target="RFC7932" section="7.1"/>. The literal Context ID is used as the index in the lookup tables to select the word list and transforms to use. If the dictionary is not context dependent, this ID is implicitly 0 instead.</t> <t> <!-- [rfced] To improve the readability of this paragraph, may we format the text into a list as follows? Also, should "a next dictionary" be rephrased to "the next dictionary", "a dictionary that follows", or otherwise to use the correct article? Current: If a distance goes beyond the dictionary for the current ID and multiple word/transform list combinations are defined, then a next dictionary is used in the following order: if not context dependent, the same order as defined in the shared dictionary. If context dependent, the index matching the current context is used first, the same order as defined in the shared dictionary excluding the current context are used next. 
Perhaps: If a distance goes beyond the dictionary for the current ID and multiple word/transform list combinations are defined, then the next dictionary is used in the following order: * If context dependent: * use the index matching the current context first, and then * use the same order as defined in the shared dictionary (excluding the current context) next. * If not context dependent: * use the same order as defined in the shared dictionary. --> If a distance goes beyond the dictionary for the current ID and multiple word/transform list combinations are defined, then the next dictionary is used in the following order. If not context dependent, the dictionaries are used in the order defined in the shared dictionary. If context dependent, the dictionary at the index matching the current context is used first, followed by the remaining dictionaries in the order defined in the shared dictionary (excluding the one for the current context).</t> <section anchor="sect-3.1.1" numbered="true" toc="default"> <name>Transform Operations</name> <t> A shared dictionary may include custom word transformations to replace those specified in <xref section="8" target="RFC7932" format="default"/> and <xref section="B" target="RFC7932" format="default"/>. <!-- [rfced] Would the following proposed text retain the original meaning of the sentence? Current: A transform consists of a possible prefix, a transform operation, for some operations a parameter, and a possible suffix. Perhaps: A transform consists of a possible prefix, a transform operation, a parameter (for some operations), and a possible suffix. --> A transform consists of a possible prefix, a transform operation, a parameter (for some operations), and a possible suffix. 
In the shared dictionary format, the transform operation is represented by a numerical ID, which is listed in the table below.</t> <table anchor="operation-ids"><name>Transform Operation IDs</name><thead> <tr> <th>ID</th><th>Operation</th> </tr> </thead> <tbody><tr> <td>0</td> <td>Identity</td> </tr><tr> <td>1</td> <td>OmitLast1</td> </tr><tr> <td>2</td> <td>OmitLast2</td> </tr><tr> <td>3</td> <td>OmitLast3</td> </tr><tr> <td>4</td> <td>OmitLast4</td> </tr><tr> <td>5</td> <td>OmitLast5</td> </tr><tr> <td>6</td> <td>OmitLast6</td> </tr><tr> <td>7</td> <td>OmitLast7</td> </tr><tr> <td>8</td> <td>OmitLast8</td> </tr><tr> <td>9</td> <td>OmitLast9</td> </tr><tr> <td>10</td> <td>FermentFirst</td> </tr><tr> <td>11</td> <td>FermentAll</td> </tr><tr> <td>12</td> <td>OmitFirst1</td> </tr><tr> <td>13</td> <td>OmitFirst2</td> </tr><tr> <td>14</td> <td>OmitFirst3</td> </tr><tr> <td>15</td> <td>OmitFirst4</td> </tr><tr> <td>16</td> <td>OmitFirst5</td> </tr><tr> <td>17</td> <td>OmitFirst6</td> </tr><tr> <td>18</td> <td>OmitFirst7</td> </tr><tr> <td>19</td> <td>OmitFirst8</td> </tr><tr> <td>20</td> <td>OmitFirst9</td> </tr><tr> <td>21</td> <td>ShiftFirst (by PARAMETER)</td> </tr><tr> <td>22</td> <td>ShiftAll (by PARAMETER)</td> </tr> </tbody> </table> <!-- [rfced] The original xref citation in the XML pointed to Section 8 of this document. We have updated as follows. Please let us know any objections. 
Current: Operations 0 to 20 are specified in Section 8 of [RFC7932]. ShiftFirst and ShiftAll transform specifically encoded SCALARs. --> <t> Operations 0 to 20 are specified in <xref section="8" target="RFC7932" format="default"/>. ShiftFirst and ShiftAll transform specifically encoded SCALARs.</t> <t> A SCALAR is a 7-, 11-, 16-, or 21-bit unsigned integer encoded with 1, 2, 3, or 4 bytes, respectively, with the following bit contents:</t><artwork name="" type="" align="left" alt=""><![CDATA[
7-bit SCALAR:
+--------+
|0sssssss|
+--------+

11-bit SCALAR:
+--------+--------+
|110sssss|XXssssss|
+--------+--------+

16-bit SCALAR:
+--------+--------+--------+
|1110ssss|XXssssss|XXssssss|
+--------+--------+--------+

21-bit SCALAR:
+--------+--------+--------+--------+
|11110sss|XXssssss|XXssssss|XXssssss|
+--------+--------+--------+--------+
]]></artwork> <t> Given the input bytes matching the SCALAR encoding pattern, the SCALAR value is obtained by concatenation of the "s" bits, with the MSBs coming from the earliest byte. The "X" bits may have arbitrary values.</t> <t> An ADDEND is defined as the result of limited sign extension of a 16-bit unsigned PARAMETER:</t> <t indent="3"> At first, the PARAMETER is zero-extended to 32 bits. 
After this, 0xFF0000 is added if the resulting value is greater than or equal to 0x8000.</t> <t> ShiftAll starts at the beginning of the word and repeatedly applies the following transformation until the whole word is transformed:</t> <t indent="3"> If the next untransformed byte matches the first byte of the 7-, 11-, 16-, or 21-bit SCALAR pattern, then:</t> <t indent="6"> If the untransformed part of the word is not long enough to match the whole SCALAR pattern, then the whole word is marked as transformed.</t> <t indent="6"> <!-- [rfced] We have rephrased the following sentence for readability. Please let us know any objections. Original: Next, 1, 2, 3 or 4 not transformed bytes marked as transformed, according to the SCALAR pattern length. Current: Next, 1, 2, 3, or 4 untransformed bytes are marked as transformed according to the SCALAR pattern length. --> Otherwise, let SHIFTED be the sum of the ADDEND and the encoded SCALAR. The lowest bits from SHIFTED are written back into the corresponding "s" bits. The "0", "1", and "X" bits remain unchanged. Next, 1, 2, 3, or 4 untransformed bytes are marked as transformed according to the SCALAR pattern length.</t> <t indent="3"> Otherwise, the next untransformed byte is marked as transformed.</t> <t> ShiftFirst applies the same transformation as ShiftAll, but does not iterate.</t> </section> </section> <section anchor="sect-3.2" numbered="true" toc="default"> <name>LZ77 Dictionaries</name> <t> If an LZ77 dictionary is set, the decoder treats it as a regular LZ77 copy but behaves as if the bytes of this dictionary are accessible as the uncompressed bytes outside of the regular LZ77 window for backwards references.</t> <t> Let LZ77_DICTIONARY_LENGTH be the length of the LZ77 dictionary. 
Then word_id, described in <xref section="8" target="RFC7932" format="default"/>, is redefined as:</t> <artwork name="" type="" align="left" alt=""><![CDATA[
word_id = distance -
          (max allowed distance + 1 + LZ77_DICTIONARY_LENGTH)
]]></artwork> <t> For the case when LZ77_DICTIONARY_LENGTH is 0, word_id matches the <xref target="RFC7932" format="default"/> definition.</t> <t> Let dictionary_address be:</t> <t indent="3"> LZ77_DICTIONARY_LENGTH + max allowed distance - distance</t> <t> Then distance values of &lt;length, distance&gt; pairs <xref target="RFC7932" format="default"/> in range (max allowed distance + 1)..(LZ77_DICTIONARY_LENGTH + max allowed distance) are interpreted as references starting in the LZ77 dictionary at the byte at dictionary_address. If length is longer than (LZ77_DICTIONARY_LENGTH - dictionary_address), then the reference continues to copy (length - LZ77_DICTIONARY_LENGTH + dictionary_address) bytes from the regular LZ77 window starting at the beginning.</t> </section> </section> <section anchor="sect-4" numbered="true" toc="default"> <name>Varint Encoding</name> <t>A varint is encoded in base 128 in one or more bytes as follows:</t> <artwork name="" type="" align="left" alt=""><![CDATA[
+--------+--------+             +--------+
|1xxxxxxx|1xxxxxxx| {0-8 times} |0xxxxxxx|
+--------+--------+             +--------+
]]></artwork> <t> where the "x" bits of the first byte are the LSBs of the value and the "x" bits of the last byte are the MSBs of the value. <!-- [rfced] May we rephrase as follows for clarity? Current: The last byte must have its MSB set to 0, all other bytes to 1 to indicate there is a next byte. Perhaps: The last byte must have its MSB set to 0 and all other bytes must have their MSBs set to 1 to indicate there is a next byte. 
--> The last byte must have its MSB set to 0, and all other bytes must have their MSB set to 1 to indicate that another byte follows.</t> <t> The maximum allowed number of bits to read is 63 bits; if the 9th byte is present and has its MSB set, then the stream must be considered as invalid.</t> </section> <!-- [rfced] Upon converting this document to XML, we have done our best to preserve the original indentation of definition lists that start in Section 5. Please let us know if any specific adjustments need to be made or if the current indentation is satisfactory. --> <section anchor="sect-5" numbered="true" toc="default"> <name>Shared Dictionary Stream</name> <t> The shared dictionary stream encodes a custom dictionary for brotli, including custom words and/or custom transformations. A shared dictionary may appear standalone or as the contents of a resource in a framing format container.</t> <t> A compliant shared brotli dictionary stream must have the following format:</t> <dl newline="false" spacing="normal" indent="3"> <dt>2 bytes:</dt><!-- [rfced] May we rephrase the following for clarity? Original: 2 bytes: file signature, in hexadecimal the bytes 91, 0. Perhaps: 2 bytes: File signature in hexadecimal format (bytes 91 and 0). --> <dd>File signature: the bytes 0x91 and 0x00 (hexadecimal).</dd> <dt>varint:</dt><dd>LZ77_DICTIONARY_LENGTH. The number of bytes for an LZ77 dictionary or 0 if there is none. The maximum allowed value is the maximum possible sliding window size of brotli or large window brotli.</dd> <!--[rfced] In Section 5, may we add "in range" to these sentences for clarity and consistency as shown below? Original: 1 byte: NUM_CUSTOM_WORD_LISTS, may have value 0 to 64 1 byte: NUM_CUSTOM_TRANSFORM_LISTS, may have value 0 to 64 1 byte: NUM_DICTIONARIES, may have value 1 to 64 Perhaps: 1 byte: NUM_CUSTOM_WORD_LISTS. May have a value in range 0 to 64. 1 byte: NUM_CUSTOM_TRANSFORM_LISTS. May have a value in range 0 to 64. 
1 byte: NUM_DICTIONARIES. May have a value in range 1 to 64. --> <dt>LZ77_DICTIONARY_LENGTH bytes:</dt><dd>Contents of the LZ77 dictionary.</dd> <dt>1 byte:</dt><dd><t>NUM_CUSTOM_WORD_LISTS. May have a value of 0 to 64.</t></dd> <dt>NUM_CUSTOM_WORD_LISTS times a word list with the following format for each word list:</dt> <dd> <t><br/></t> <dl> <dt>28 bytes:</dt><dd>SIZE_BITS_BY_LENGTH. An array of 28 unsigned 8-bit integers, indexed by word lengths 4 to 31. The value represents log2(number of words of this length), with the exception of 0 meaning 0 words of this length. The max allowed length value is 15 bits. OFFSETS_BY_LENGTH is computed from this as OFFSETS_BY_LENGTH[i + 1] = OFFSETS_BY_LENGTH[i] + (SIZE_BITS_BY_LENGTH[i] ? (i &lt;&lt; SIZE_BITS_BY_LENGTH[i]) : 0).</dd> <dt>N bytes:</dt><dd>Words dictionary data, where N is OFFSETS_BY_LENGTH[31] + (SIZE_BITS_BY_LENGTH[31] ? (31 &lt;&lt; SIZE_BITS_BY_LENGTH[31]) : 0), with all the words of shortest length first, then all words of the next length, and so on, where there are either 0 or a positive power of two number of words for each length.</dd> </dl></dd> <dt>1 byte:</dt><dd>NUM_CUSTOM_TRANSFORM_LISTS. May have a value of 0 to 64.</dd> <dt>NUM_CUSTOM_TRANSFORM_LISTS times a transform list with the following format for each transform list:</dt> <dd> <t><br/></t> <dl> <dt>2 bytes:</dt><dd>PREFIX_SUFFIX_LENGTH. The length of prefix/suffix data. Must be at least 1 because the list must always end with a zero-length stringlet even if it is empty.</dd> <dt>NUM_PREFIX_SUFFIX times:</dt><dd><t>Prefix/suffix stringlet. 
NUM_PREFIX_SUFFIX is the number of stringlets parsed and must be in range 1..256.</t><dl> <dt>1 byte:</dt><dd>STRING_LENGTH. The length of the entry contents. 0 for the last (terminating) entry of the transform list. For other entries, STRING_LENGTH must be in range 1..255. The 0 entry must be present and must be the last byte of the PREFIX_SUFFIX_LENGTH bytes of prefix/suffix data; otherwise, the stream must be rejected as invalid.</dd> <dt>STRING_LENGTH bytes:</dt><dd>Contents of the prefix/suffix.</dd> </dl></dd> <!--[rfced] In Section 5, please consider the following changes for consistency within the list as we note variance with "TERM times foo" vs. "TERM times: Foo:" (for example, "NUM_CUSTOM_WORD_LISTS times a word list" vs. "NUM_DICTIONARIES times: The DICTIONARY_MAP:" Additionally, should text be added such as "listed below", "the following" and "which contains" to introduce the next list of items? i) Current: NTRANSFORMS times: Data for each transform: Perhaps: NTRANSFORMS times the data for each transform listed below: ii) Current: If and only if at least one transform has operation index ShiftFirst or ShiftAll: NTRANSFORMS times: Perhaps: If and only if at least one transform has operation index ShiftFirst or ShiftAll, then NTRANSFORMS times the following: iii) Current: NUM_DICTIONARIES times: The DICTIONARY_MAP: Perhaps: NUM_DICTIONARIES times the DICTIONARY_MAP, which contains: --> <dt>1 byte:</dt><dd>NTRANSFORMS. Number of transformation triplets.</dd> <dt>NTRANSFORMS times:</dt><dd><t>Data for each transform:</t> <dl> <dt>1 byte:</dt><dd>Index of prefix in prefix/suffix data; must be less than NUM_PREFIX_SUFFIX. 
</dd> <dt>1 byte:</dt><dd>Index of suffix in prefix/suffix data; must be less than NUM_PREFIX_SUFFIX.</dd> <dt>1 byte:</dt><dd>Operation index; must be an index in the table of operations listed in <xref target="sect-3.1.1"/>.</dd></dl></dd></dl> <dl><dt>If and only if at least one transform has operation index ShiftFirst or ShiftAll:</dt><dd> <t><br/></t> <dl> <dt>NTRANSFORMS times:</dt><dd><t><br/></t> <dl> <dt>2 bytes:</dt><dd>Parameters for the transform. If the transform does not have type ShiftFirst or ShiftAll, the value must be 0. ShiftFirst and ShiftAll interpret these bytes as an unsigned 16-bit integer.</dd></dl></dd></dl></dd></dl></dd></dl> <dl> <dt>If NUM_CUSTOM_WORD_LISTS &gt; 0 or NUM_CUSTOM_TRANSFORM_LISTS &gt; 0 (otherwise, NUM_DICTIONARIES is implicitly 1 and points to the brotli built-in dictionary, and there is no context map):</dt> <dd> <t><br/></t> <dl> <dt>1 byte:</dt> <dd>NUM_DICTIONARIES. May have value 1 to 64. Each dictionary is a combination of a word list and a transform list. Each subsequent dictionary is used when the distance goes beyond the previous one. If a CONTEXT_MAP is enabled, then the dictionary matching the context is moved to the front in the order for this context.</dd> <dt>NUM_DICTIONARIES times:</dt><dd><t>The DICTIONARY_MAP:</t> <dl><dt>1 byte:</dt><dd>Index into a custom word list or value NUM_CUSTOM_WORD_LISTS to indicate using the brotli <xref target="RFC7932" format="default"/> built-in default word list.</dd> <dt>1 byte:</dt><dd>Index into a custom transform list or value NUM_CUSTOM_TRANSFORM_LISTS to indicate using the brotli <xref target="RFC7932" format="default"/> built-in default transform list. 
</dd> </dl> </dd> <dt>1 byte:</dt><dd>CONTEXT_ENABLED. If 0, there is no context map. If 1, a context map used to select the dictionary is encoded as below.</dd> </dl> <dl> <dt>If CONTEXT_ENABLED is 1, there is a context map for the 64 brotli <xref target="RFC7932" format="default"/> literal contexts:</dt> <dd><t><br/></t> <dl> <dt>64 bytes:</dt><dd>CONTEXT_MAP. Index into the DICTIONARY_MAP for the first dictionary to use for this context.</dd></dl></dd></dl></dd></dl> </section> <section anchor="sect-6" numbered="true" toc="default"> <name>Large Window Brotli Compressed Data Stream</name> <t> Large window brotli allows a sliding window beyond the 24-bit maximum of regular brotli <xref target="RFC7932" format="default"/>.</t> <t> The compressed data stream is backwards compatible with brotli <xref target="RFC7932" format="default"/> and may optionally have the following differences:</t> <!--[rfced] Would the following text (second sentence that starts with "Encoding") be easier to read if it was a complete sentence as shown below (note that the first sentence is included for context only)? Also, under "6 bits", should "value" be singular or plural (e.g., "must have values in" or "must have a value in")? Original: The compressed data stream is backwards compatible to brotli [RFC7932] and may optionally have the following differences: Encoding of WBITS in the stream header: The following new pattern of 14 bits is supported: 8 bits: Value 00010001 to indicate a large window brotli stream. 6 bits: WBITS. Must have value in range 10 to 62. Perhaps: The compressed data stream is backwards compatible to brotli [RFC7932] and may optionally have the following differences. 
If the encoding of WBITS is in the stream header, then the following new pattern of 14 bits is supported: 8 bits: Value 00010001 to indicate a large window brotli stream. 6 bits: WBITS. Must have a value in range 10 to 62. --> <dl> <dt>Encoding of WBITS in the stream header:</dt><dd><t>The following new pattern of 14 bits is supported:</t> <dl newline="false" spacing="normal"> <dt>8 bits:</dt><dd>Value 00010001 to indicate a large window brotli stream.</dd> <dt>6 bits:</dt><dd>WBITS. Must have a value in the range 10 to 62.</dd> </dl></dd> <dt>Distance alphabet:</dt><dd>If the stream is a large window brotli stream, the maximum number of extra bits is 62 and the theoretical maximum size of the distance alphabet is (16 + NDIRECT + (124 &lt;&lt; NPOSTFIX)). This overrides the value for the distance alphabet size given in <xref section="3.3" sectionFormat="of" target="RFC7932"/> and affects the number of bits in the encoding of the Simple Prefix Code for distances as described in <xref section="3.4" sectionFormat="of" target="RFC7932"/>. An additional limitation to distances, despite the large allowed alphabet size, is that the alphabet is not allowed to contain a distance symbol able to represent a distance larger than ((1 &lt;&lt; 63) - 4) when its extra bits have their maximum value. Whether this can occur depends on NPOSTFIX and NDIRECT. 
</dd> </dl> <t> A decoder that does not support 64-bit integers may reject a stream if WBITS is higher than 30 or a distance symbol from the distance alphabet is able to encode a distance larger than 2147483644.</t> </section> <section anchor="sect-7" numbered="true" toc="default"> <name>Shared Brotli Compressed Data Stream</name> <t> The format of a shared brotli compressed data stream without a framing format is backwards compatible with brotli <xref target="RFC7932" format="default"/> with the following optional differences:</t> <ul><li>LZ77 dictionaries as described above are supported.</li> <li> Custom static dictionaries replacing or extending the static dictionary of brotli <xref target="RFC7932" format="default"/> with different words or transforms are supported.</li> <li>The stream may have the format of regular brotli <xref target="RFC7932"/> or the format of large window brotli as described in <xref target="sect-6" format="default"/>.</li> </ul> </section> <section anchor="sect-8" numbered="true" toc="default"> <name>Shared Brotli Framing Format Stream</name> <t> A compliant shared brotli framing format stream has the format described below.</t> <section anchor="sect-8.1" numbered="true" toc="default"> <name>Main Format</name> <!-- [rfced] May we rephrase the following sentence for clarity? Current: File signature, in hexadecimal the bytes 0x91, 0x0a, 0x42, 0x52. Perhaps: File signature in hexadecimal format (bytes 0x91, 0x0a, 0x42, and 0x52). --> <dl> <dt>4 bytes:</dt><dd>File signature, in hexadecimal the bytes 0x91, 0x0a, 0x42, 0x52. The first byte contains the invalid WBITS combination for brotli <xref target="RFC7932" format="default"/> and large window brotli. 
</dd> <dt>1 byte:</dt><dd><t>Container flags that are 8 bits and have the following meanings:</t> <!--[rfced] Is it correct that "bit" is singular in "bit 0 and 1", "bit 2 and 3", and "bit 4-7"? Or should these instances be updated as "bits 0 and 1", "bits 2 and 3", and "bits 4-7"? We note 1 instance of "bits 2-7" in Section 8.4.3. Current: bit 0 and 1: Version Indicator... bit 0 and 1: Dictionary source: bit 2 and 3: Dictionary type: bit 4-7: Must be 0 --> <dl><dt> bit 0 and 1:</dt><dd>Version indicator that must be b'00. Otherwise, the decoder must reject the data stream as invalid. </dd> <dt>bit 2:</dt><dd>If 0, the file contains no final footer, may not contain any metadata chunks, may not contain a central directory, and may encode only a single resource (using one or more data chunks). If 1, the file may contain one or more resources, metadata, and a central directory, and it must contain a final footer. </dd> </dl> </dd> <dt>multiple times:</dt><dd>A chunk, each with the format specified in <xref target="sect-8.2"/>.</dd> </dl> </section> <section anchor="sect-8.2" numbered="true" toc="default"> <name>Chunk Format</name> <dl><dt>varint:</dt><dd>Length of this chunk excluding this varint but including all subsequent header bytes and data. 
If the value is 0, then the chunk type byte is not present and the chunk type is assumed to be 0.</dd> <dt>1 byte:</dt><dd><t>CHUNK_TYPE</t><dl indent="5" spacing="compact"> <dt> 0:</dt><dd> padding chunk</dd> <dt> 1:</dt><dd> metadata chunk</dd> <dt> 2:</dt><dd> data chunk</dd> <dt> 3:</dt><dd> first partial data chunk</dd> <dt> 4:</dt><dd> middle partial data chunk</dd> <dt> 5:</dt><dd> last partial data chunk</dd> <dt> 6:</dt><dd> footer metadata chunk</dd> <dt> 7:</dt><dd> global metadata chunk</dd> <dt> 8:</dt><dd> repeat metadata chunk</dd> <dt> 9:</dt><dd> central directory chunk</dd> <dt> 10:</dt><dd> final footer</dd></dl></dd> <dt>If CHUNK_TYPE is not padding chunk, central directory, or final footer:</dt> <dd> <t><br/></t> <dl><dt> 1 byte:</dt><dd><t> CODEC:</t><dl spacing="compact"> <dt>0:</dt><dd> uncompressed</dd> <dt> 1:</dt><dd> keep decoder</dd> <dt> 2:</dt><dd> brotli</dd> <dt> 3:</dt><dd> shared brotli</dd> </dl></dd> </dl> </dd> <dt>If CODEC is not "uncompressed":</dt> <dd> <t><br/></t> <dl><dt>varint:</dt><dd>Uncompressed size in bytes of the data contained within the compressed stream. </dd></dl></dd> <dt>If CODEC is "shared brotli":</dt> <dd><t><br/></t> <dl><dt> 1 byte:</dt><dd><t>Number of dictionary references. Multiple dictionary references are possible with the following restrictions: there can be at most 1 serialized dictionary and at most 15 prefix dictionaries (a serialized dictionary may already contain one of those). 
Circular references are not allowed (any dictionary reference that directly or indirectly uses this chunk itself as dictionary).</t></dd> <dt>Per dictionary reference:</dt> <dd><t><br/></t> <dl><dt>1 byte:</dt><dd><t>Flags:</t> <dl><dt>bit 0 and 1:</dt><dd><t>Dictionary source:</t><dl indent="5"><dt>00:</dt><dd> Internal dictionary reference to a full resource by pointer, which can span one or more chunks. Must point to a full data chunk or a first partial data chunk.</dd> <dt>01:</dt><dd> Internal dictionary reference to single chunk contents by pointer. May point to any chunk with content (data or metadata). If it points to a partial data chunk, only this part is the dictionary. In this case, the dictionary type is not allowed to be a serialized dictionary. </dd> <dt>10:</dt><dd> Reference to a dictionary by hash code of a resource. The dictionary can come from an external source, such as a different container. The user of the decoder must be able to provide the dictionary contents given its hash code (even if it comes from this container itself) or treat it as an error when the user does not have it available.</dd> <dt>11:</dt><dd>Invalid bit combination</dd> </dl> </dd> <dt> bit 2 and 3:</dt><dd><t>Dictionary type:</t> <dl indent="5"> <dt>00:</dt><dd><t>Prefix dictionary, set in front of the sliding window</t></dd> <dt>01:</dt><dd>Serialized dictionary in the shared brotli format as specified in <xref target="sect-5"/>.</dd> <dt> 10:</dt><dd>Invalid bit combination</dd> <dt>11:</dt><dd>Invalid bit combination</dd></dl></dd> <dt>bit 4-7:</dt><dd>Must be 0</dd></dl></dd> <dt>If hash-based:</dt><dd><t><br/></t> <dl><dt>1 byte:</dt><dd>Type of hash used. 
Only supported value: 3, indicating 256-bit HighwayHash <xref target="HWYHASH" format="default"/>. </dd> <dt>32 bytes:</dt><dd><t> 256-bit HighwayHash checksum to refer to the dictionary.</t></dd></dl></dd> <dt>If pointer based:</dt><dd>Varint-encoded pointer to its chunk in this container. The chunk must come in the container earlier than the current chunk.</dd></dl></dd></dl></dd> <dt>X bytes:</dt><dd><t>Extra header bytes, depending on CHUNK_TYPE. If present, they are specified in the subsequent sections.</t> <dl> <dt>remaining bytes:</dt><dd><t>The chunk contents. The uncompressed data in the chunk content depends on CHUNK_TYPE and is specified in the subsequent sections. The compressed data has the following format depending on CODEC:</t> <!-- [rfced] May we update the following unordered list into a definition list for consistency with the rest of Section 8.2? Original: * uncompressed: the raw bytes * if "keep decoder", the continuation of the compressed stream which was interrupted at the end of the previous chunk. The decoder from the previous chunk must be used and its state it had at the end of the previous chunk must be kept at the start of the decoding of this chunk. * brotli: the bytes are in brotli format [RFC7932] * shared brotli: the bytes are in the shared brotli format specified in Section 7 Perhaps: uncompressed: The raw bytes. "keep decoder": If "keep decoder", the continuation of the compressed stream that was interrupted at the end of the previous chunk. The decoder from the previous chunk must be used and its state it had at the end of the previous chunk must be kept at the start of the decoding of this chunk. brotli: The bytes are in brotli format [RFC7932]. shared brotli: The bytes are in the shared brotli format specified in Section 7. 
--> <ul><li>uncompressed: The raw bytes.</li> <li>"keep decoder": The continuation of the compressed stream that was interrupted at the end of the previous chunk. The decoder from the previous chunk must be used, and the state it had at the end of the previous chunk must be kept at the start of the decoding of this chunk. </li> <li>brotli: The bytes are in brotli format <xref target="RFC7932" format="default"/>. </li> <li>shared brotli: The bytes are in the shared brotli format specified in <xref target="sect-7"/>.</li></ul></dd> </dl> </dd> </dl> </section> <section anchor="sect-8.3" numbered="true" toc="default"> <name>Metadata Format</name> <t>All the metadata chunk types use the following format for the uncompressed content:</t> <dl newline="true"> <dt>Per field:</dt> <dd> <dl><dt>2 bytes:</dt><dd><t>Code to identify this metadata field. This must be two lowercase or two uppercase alpha ASCII characters. If the decoder encounters a lowercase field that it does not recognize for the current chunk type, non-ASCII characters, or non-alpha characters, the decoder must reject the data stream as invalid. Uppercase codes may be used for custom user metadata and can be ignored by a compliant decoder.</t></dd> <dt>varint:</dt><dd><t>Length of the content of this field in bytes, excluding the code bytes and this varint.</t></dd> <dt>N bytes:</dt><dd>The contents of this field.</dd> </dl> </dd> </dl> <t> The last field is reached when the end of the chunk content is reached. 
If the last field does not end at the same byte as the end of the uncompressed content of the chunk, the decoder must reject the data stream as invalid.</t> </section> <section anchor="sect-8.4" numbered="true" toc="default"> <name>Chunk Specifications</name> <section anchor="sect-8.4.1" numbered="true" toc="default"> <name>Padding Chunk (Type 0)</name> <t> All bytes in this chunk must be zero except for the initial varint that specifies the remaining chunk length.</t> <t> Since the varint itself takes up bytes as well, when the goal is to introduce a number of padding bytes, the dependence of the length of the varint on the value it encodes must be taken into account.</t> <t> A single-byte varint with a value of 0 is a padding chunk of length 1. For more padding, use higher varint values. Do not use multiple shorter padding chunks since this is slower to decode.</t> </section> <section anchor="sect-8.4.2" numbered="true" toc="default"> <name>Metadata Chunk (Type 1)</name> <t> This chunk contains metadata that applies to the resource whose beginning is encoded in the subsequent data chunk or first partial data chunk.</t> <t> The contents of this chunk follow the format described in <xref target="sect-8.3" format="default"/>.</t> <t>The following field types are recognized:</t> <!--[rfced] In Section 8.4.2, the formatting of the list is hard to read. While we note that this style follows RFC 7932, please consider if you would like to update to a format that is more typical in other RFCs. For example, see the excerpt below from RFC 9188. Original: id: name field. May appear 0 or 1 times. Has the following format: N bytes: name in UTF-8 encoding, length determined by the field length. Treated generically but may be used as filename. 
If used as filename, forward slashes '/' should be used as directory separator, relative paths should be used and filenames ending in a slash with 0-length content in the matching data chunk should be treated as an empty directory. mt: modification type. May appear 0 or 1 times. Has the following format: 8 bytes: microseconds since epoch, as a little endian signed twos complement 64-bit integer Perhaps: id (N bytes): Name field. May appear 0 or 1 times. Has the following format: name in UTF-8 encoding, length determined by the field length. Treated generically but may be used as a filename. If used as a filename, forward slashes '/' should be used as directory separators, relative paths should be used, and filenames ending in a slash with 0-length content in the matching data chunk should be treated as an empty directory. mt (8 bytes): Modification type. May appear 0 or 1 times. Has the following format: contains microseconds since epoch, as a little-endian, signed two's complement 64-bit integer. Example from Section 4.1 of RFC 9188: Checksum (1 byte): This contains the (one's complement) checksum sum of all 8 bits in the trailer. For purposes of computing the checksum, the value of the Checksum field is zero. This field is present only if the Checksum Present bit is set to 1. First SDU Length (2 bytes): This is the length of the first IP packet in the PDU, only included if a PDU contains multiple IP packets. This field is present only if the Concatenation Present bit is set to 1. Connection ID (1 byte): This contains an unsigned integer to identify the anchor and delivery connection of the GMA PDU. This field is present only if the Connection ID Present bit is set to 1. --> <dl><dt>id:</dt><dd><t>Name field. May appear 0 or 1 times. Has the following format:</t> <dl> <dt>N bytes:</dt><dd><t>Name in UTF-8 encoding, length determined by the field length. 
Treated generically but may be used as a filename. If used as a filename, forward slashes '/' should be used as directory separators, relative paths should be used, and filenames ending in a slash with 0-length content in the matching data chunk should be treated as an empty directory.</t></dd> </dl></dd></dl> <dl><dt>mt:</dt> <dd><t>Modification time. May appear 0 or 1 times. Has the following format:</t> <dl> <dt>8 bytes:</dt><dd><t>Microseconds since epoch, as a little-endian, signed two's complement 64-bit integer.</t></dd> </dl></dd> <dt>custom user field:</dt><dd>Any two uppercase ASCII characters.</dd> </dl> </section> <section anchor="sect-8.4.3" numbered="true" toc="default"> <name>Data Chunk (Type 2)</name> <t> A data chunk contains the actual data of a resource.</t> <t>This chunk has the following extra header bytes:</t> <dl> <dt>1 byte: </dt><dd><t>Flags: </t> <dl> <dt> bit 0:</dt><dd>If true, indicates this is not a resource that should be output implicitly as part of extracting resources from this container. Instead, it may be referred to only explicitly, e.g., as a dictionary reference by hash code or offset. This flag should be set for data used as dictionary to improve compression of actual resources.</dd> <dt> bit 1:</dt><dd>If true, a hash code is given.</dd> <dt> bits 2-7:</dt><dd>Must be zero.</dd></dl></dd> <dt>If hash code is given:</dt><dd><t><br/></t> <dl> <dt>1 byte:</dt><dd>Type of hash used. Only supported value: 3, indicating 256-bit HighwayHash <xref target="HWYHASH" format="default"/>. 
</dd> <dt> 32 bytes:</dt><dd> 256-bit HighwayHash checksum of the uncompressed data.</dd> </dl> </dd> </dl> <t> The uncompressed content bytes of this chunk are the actual data of the resource.</t> </section> <section anchor="sect-8.4.4" numbered="true" toc="default"> <name>First Partial Data Chunk (Type 3)</name> <t> This chunk contains partial data of a resource. This is the first chunk in a series containing the entire data of the resource.</t> <t> The format of this chunk is the same as the format of a data chunk (<xref target="sect-8.4.3" format="default"/>) except for the differences noted below.</t> <t> The second bit of flags must be set to 0, and no hash code is given.</t> <t> The uncompressed data size is only of this part of the resource, not of the full resource.</t> </section> <section anchor="sect-8.4.5" numbered="true" toc="default"> <name>Middle Partial Data Chunk (Type 4)</name> <t> This chunk contains partial data of a resource and is neither the first nor the last part of the full resource.</t> <t> The format of this chunk is the same as the format of a data chunk (<xref target="sect-8.4.3" format="default"/>) except for the differences noted below.</t> <t> The first and second bits of flags must be set to 0.</t> <t> The uncompressed data size is only of this part of the resource, not of the full resource.</t> </section> <section anchor="sect-8.4.6" numbered="true" toc="default"> <name>Last Partial Data Chunk (Type 5)</name> <t> This chunk contains the final piece of partial data of a resource.</t> <t> The format of this chunk is the same as the format of a data chunk (<xref target="sect-8.4.3" format="default"/>) except for the differences noted below.</t> <t> The first bit of flags must be set to 0.</t> <t> If a hash code is given, the hash code of the full resource (concatenated from all previous chunks and this chunk) is given in this chunk.</t> <t> The uncompressed data size is only of this 
part of the resource, not of the full resource.</t> <t> The type of this chunk indicates that there are no further chunks encoding this resource, so the full resource is now known.</t> </section> <section anchor="sect-8.4.7" numbered="true" toc="default"> <name>Footer Metadata Chunk (Type 6)</name> <t> This metadata applies to the resource whose encoding ended in the preceding data chunk or last partial data chunk.</t> <t> The contents of this chunk follow the format described in <xref target="sect-8.3" format="default"/>.</t> <t> There are no lowercase field types defined for footer metadata. Uppercase field types can be used as custom user data.</t> </section> <section anchor="sect-8.4.8" numbered="true" toc="default"> <name>Global Metadata Chunk (Type 7)</name> <t> This metadata applies to the whole container instead of a single resource.</t> <t> The contents of this chunk follow the format described in <xref target="sect-8.3" format="default"/>.</t> <t> There are no lowercase field types defined for global metadata. Uppercase field types can be used as custom user data.</t> </section> <section anchor="sect-8.4.9" numbered="true" toc="default"> <name>Repeat Metadata Chunk (Type 8)</name> <t> These chunks optionally repeat metadata that is interleaved between data chunks. To use these chunks, it is necessary to also read additional information, such as pointers to the original chunks, from the central directory.</t> <t> The contents of this chunk follow the format described in <xref target="sect-8.3" format="default"/>.</t> <t>This chunk has an extra header byte:</t> <dl> <dt> 1 byte:</dt><dd>Chunk type of repeated chunk (metadata chunk or footer metadata chunk). </dd></dl> <t>This set of chunks must follow the following restrictions:</t> <ul><li> It is optional whether or not repeat metadata chunks are present.</li> <li>If they are present, then they must be present for all metadata chunks and footer metadata chunks. 
</li> <li>There may be only 1 repeat metadata chunk per repeated metadata chunk.</li> <li>They must appear in the same order as the chunks appear in the container, which is also the same order as listed in the central directory. </li> <li>Compression of these chunks is allowed; however, it is not allowed to use any internal dictionary except an earlier repeat metadata chunk of this series, and it is not allowed for a metadata chunk to keep the decoder state if the previous chunk is not a repeat metadata chunk. That is, the series of metadata chunks must be decompressible without using other chunks of the framing format file. </li> </ul> <t> The fields contained in this metadata chunk must follow the following restrictions:</t> <ul> <li>If a field is present, it must exactly match the corresponding field of the copied chunk.</li> <li>It is allowed to leave out a field that is present in the copied chunk. </li> <!-- [rfced] We note that there are a few instances throughout the document where asterisks "*" are used for emphasis. Would you like to utilize the <strong> element in the XML? In the HTML and PDF outputs, <strong> yields bold text. In the text output, <strong> yields an asterisk before and after, similar to how it's used currently. Current: * If a field is present, then it must be present in *all* other repeat metadata chunks when the copied chunk contains this field. --> <li>If a field is present, then it must be present in *all* other repeat metadata chunks when the copied chunk contains this field. In other words, if you know you can get the name field from a repeat chunk, you know that you will be able to get all names of all resources from all repeat chunks. 
</li> </ul> </section> <section anchor="sect-8.4.10" numbered="true" toc="default"> <name>Central Directory Chunk (Type 9)</name> <t> The central directory chunk along with the repeat metadata chunks allow quickly finding and listing compressed resources in the container file.</t> <t> The central directory chunk is always uncompressed and does not have the codec byte. It instead has the following format:</t> <dl> <dt>varint:</dt><dd><t>Pointer into the file where the repeat metadata chunks are located, or 0 if they are not present. Per chunk listed:</t> <dl> <dt>varint:</dt><dd>Pointer into the file where this chunk begins.</dd> <dt>varint:</dt><dd>Number of header bytes N used below.</dd> <dt>N bytes:</dt><dd>Copy of all the header bytes of the pointed-at chunk, including total size, chunk type byte, codec, uncompressed size, dictionary references, and X extra header bytes. The content is not repeated here. </dd> </dl> </dd> </dl> <t> The last listed chunk is reached when the end of the contents of the central directory is reached. If the end does not match the last byte of the central directory, the decoder must reject the data stream as invalid.</t> <t> If present, the central directory must list all data and metadata chunks of all types.</t> </section> <section anchor="sect-8.4.11" numbered="true" toc="default"> <name>Final Footer Chunk (Type 10)</name> <t> <!--[rfced] In Section 8.4.11, how may we rephrase this sentence for clarity? We note that "header" is not used elsewhere when referring to "container flags"; should it be removed for consistency? Original: Chunk that closes the file, only present if in the initial container header flags bit 2 was set. Perhaps: The final footer chunk closes the file and is only present if bit 2 of the initial container flags was set. 
--> The final footer chunk closes the file and is only present if bit 2 of the initial container flags was set.</t> <t>This chunk has the following content, which is always uncompressed:</t> <dl> <dt> reversed varint:</dt><dd><t>Size of this entire framing format file, including these bytes themselves, or 0 if this size is not given.</t></dd> <dt>reversed varint:</dt><dd>Pointer to the start of the central directory, or 0 if there is none. </dd> </dl> <t> A reversed varint has the same format as a varint but its bytes are in reversed order, and it is designed to be parsed from the end of the file towards the beginning.</t> </section> <section anchor="sect-8.4.12" numbered="true" toc="default"> <name>Chunk Ordering</name> <t> The chunk ordering must follow the rules described below. If the decoder sees otherwise, it must reject the data stream as invalid.</t> <t indent="3"> Padding chunks may be inserted anywhere, even between chunks for which the rules below say no other chunk types may come in between.</t> <t indent="3"> Metadata chunks must come immediately before the data chunks of the resource they apply to.</t> <t indent="3"> Footer metadata chunks must come immediately after the data chunks of the resource they apply to.</t> <t indent="3"> There may be only 0 or 1 metadata chunks per resource.</t> <t indent="3"> There may be only 0 or 1 footer metadata chunks per resource.</t> <t indent="3"> A resource must consist of either 1 data chunk or 1 first partial data chunk, 0 or more middle partial data chunks, and 1 last partial data chunk, in that order.</t> <t indent="3"> Repeat metadata chunks must follow the rules of <xref target="sect-8.4.9"/>.</t> <t indent="3"> There may be only 0 or 1 central directory chunks.</t> <t indent="3"> If bit 2 of the container flags is not set, there may be only a single 
resource, no metadata chunks of any type, no central directory, and no final footer.</t> <t indent="3"> If bit 2 of the container flags is set, there must be exactly 1 final footer chunk, and it must be the last chunk in the file.</t> </section> </section> </section> <section anchor="sect-9" numbered="true" toc="default"> <name>Security Considerations</name> <t> The security considerations for brotli <xref target="RFC7932" format="default"/> apply to shared brotli as well.</t> <t> In addition, the same considerations apply to the decoding of new file format streams for shared brotli, including shared dictionaries, the framing format, and the shared brotli format.</t> <t> The dictionary must be treated with the same security precautions as the content because a change to the dictionary can result in a change to the decompressed content.</t> <t> The CRIME attack <xref target="CRIME" format="default"/> shows that it's a bad idea to compress data from mixed (e.g., public and private) sources -- the data sources include not only the compressed data but also the dictionaries. For example, if you compress secret cookies using a public-data-only dictionary, you still leak information about the cookies.</t> <t> Not only can the dictionary reveal information about the compressed data, but vice versa; data compressed with the dictionary can reveal the contents of the dictionary when an adversary can control parts of data to compress and see the compressed size. On the other hand, if the adversary can control the dictionary, the adversary can learn information about the compressed data.</t> <t> The most robust defense against CRIME is not to compress private data, e.g., sensitive headers like cookies or any content with personally identifiable information (PII). The challenge has been to identify secrets within a vast amount of data to be compressed. 
Cloudflare uses a regular expression <xref target="CLOUDFLARE" format="default"/>. Another idea is to extend existing web template systems (e.g., Soy <xref target="SOY" format="default"/>) to allow developers to mark secrets that must not be compressed.</t> <t> A less robust idea, but easier to implement, is to randomize the compression algorithm, i.e., adding randomly generated padding, varying the compression ratio, etc. The tricky part is to find the right balance between cost and security (i.e., on one hand, we don't want to add too much padding because it adds a cost to data, but on the other hand, we don't want to add too little because the adversary can detect a small amount of padding with traffic analysis).</t> <t>Additionally, another defense is to not use dictionaries for cross-domain requests and to only use shared brotli for the response when the origin is the same as where the content is hosted (using CORS). This prevents an adversary from using a private dictionary with user secrets to compress content hosted on the adversary's origin. It also helps prevent CRIME attacks that try to benefit from a public dictionary by preventing data compression with dictionaries for requests that do not originate from the host itself.</t> <t> The content of the dictionary itself should not be affected by external users; allowing adversaries to control the dictionary allows a form of chosen plaintext attack. Instead, only base the dictionary on content you control or generic large scale content such as a spoken language and update the dictionary with large time intervals (days, not seconds) to prevent fast probing.</t> <t> The use of HighwayHash <xref target="HWYHASH" format="default"/> for dictionary identifiers does not guarantee against collisions in an adversarial environment and is intended to be used for identifying the dictionary within a trusted, known set of dictionaries. 
In an adversarial environment, users of shared brotli should use another mechanism to validate a negotiated dictionary such as a cryptographically proven secure hash.</t> </section> <section anchor="sect-10" numbered="true" toc="default"> <name>IANA Considerations</name> <t> This document has no IANA actions.</t> </section> </middle> <back> <references> <name>References</name> <references> <name>Normative References</name> <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.7932.xml"/> <reference anchor="HWYHASH" target="https://arxiv.org/abs/1612.06257"> <front> <title>Fast keyed hash/pseudo-random function using SIMD multiply and permute</title> <author fullname="Jyrki Alakuijala"/> <author fullname="Bill Cox"/> <author fullname="Jan Wassenberg"/> <date month="February" year="2017"/> </front> <seriesInfo name="DOI" value="10.48550/arXiv.1612.06257"/> </reference> </references> <references> <name>Informative References</name> <reference anchor="LZ77"> <front> <title>A Universal Algorithm for Sequential Data Compression</title> <author initials="J." surname="Ziv" fullname="J. Ziv"/> <author initials="A." surname="Lempel" fullname="A. Lempel"/> <date month="May" year="1977"/> </front> <seriesInfo name="DOI" value="10.1109/TIT.1977.1055714"/> <refcontent>IEEE Transactions on Information Theory, vol. 23, no. 3, pp. 337-343</refcontent> </reference> <reference anchor="CLOUDFLARE" target="https://blog.cloudflare.com/a-solution-to-compression-oracles-on-the-web/"> <front><title>A Solution to Compression Oracles on the Web</title> <author fullname="Blake Loring"/> <date day="27" month="March" year="2018"/> </front> <refcontent>The Cloudflare Blog</refcontent> </reference> <!-- [rfced] Please review the following reference. 
The original URL for this reference (https://developers.google.com/closure/templates/) redirects to a page titled "Closure Tools" (https://developers.google.com/closure). Is this reference still correct or is an update needed? Note that the only instance of this reference being cited in the text is shown below. Current (text): Another idea is to extend existing web template systems (e.g., Soy [SOY]) to allow developers to mark secrets that must not be compressed. Current (reference): [SOY] Google Developers, "Closure Tools", <https://developers.google.com/closure/templates/>. --> <reference anchor="SOY" target="https://developers.google.com/closure/templates/"> <front> <title>Closure Tools</title> <author> <organization>Google Developers</organization> </author> <date/> </front> </reference> <reference anchor="CRIME" target="https://www.cve.org/CVERecord?id=CVE-2012-4929"> <front> <title>CVE-2012-4929</title> <author> <organization>CVE Program</organization> </author> <date/> </front> </reference> </references> </references> <section numbered="false" anchor="acknowledgments" toc="default"> <name>Acknowledgments</name> <t> The authors would like to thank <contact fullname="Robert Obryk"/> for suggesting improvements to the format and the text of the specification.</t> </section> <!-- [rfced] FYI - We have added expansions for the following abbreviations per Section 3.6 of RFC 7322 ("RFC Style Guide"). Please review each expansion in the document carefully to ensure correctness. most significant bit (MSB) least significant bit (LSB) personally identifiable information (PII) --> <!-- [rfced] Please review the "Inclusive Language" portion of the online Style Guide <https://www.rfc-editor.org/styleguide/part2/#inclusive_language> and let us know if any changes are needed. Updates of this nature typically result in more precise language, which is helpful for readers.
Note that our script did not flag any words in particular, but this should still be reviewed as a best practice. --> </back> </rfc>