+File: APPNOTE.TXT - .ZIP File Format Specification\r
+Version: 6.3.2 \r
+Revised: September 28, 2007\r
+Copyright (c) 1989 - 2007 PKWARE Inc., All Rights Reserved.\r
+\r
+The use of certain technological aspects disclosed in the current\r
+APPNOTE is available pursuant to the below section entitled\r
+"Incorporating PKWARE Proprietary Technology into Your Product".\r
+\r
+I. Purpose\r
+----------\r
+\r
+This specification is intended to define a cross-platform,\r
+interoperable file storage and transfer format. Since its \r
+first publication in 1989, PKWARE has remained committed to \r
+ensuring the interoperability of the .ZIP file format through \r
+publication and maintenance of this specification. We trust that \r
+all .ZIP compatible vendors and application developers that have \r
+adopted and benefited from this format will share and support \r
+this commitment to interoperability.\r
+\r
+II. Contacting PKWARE\r
+---------------------\r
+\r
+ PKWARE, Inc.\r
+ 648 N. Plankinton Avenue, Suite 220\r
+ Milwaukee, WI 53203\r
+ +1-414-289-9788\r
+ +1-414-289-9789 FAX\r
+ zipformat@pkware.com\r
+\r
+III. Disclaimer\r
+---------------\r
+\r
+Although PKWARE will attempt to supply current and accurate\r
+information relating to its file formats, algorithms, and the\r
+subject programs, the possibility of error or omission cannot \r
+be eliminated. PKWARE therefore expressly disclaims any warranty \r
+that the information contained in the associated materials relating \r
+to the subject programs and/or the format of the files created or\r
+accessed by the subject programs and/or the algorithms used by\r
+the subject programs, or any other matter, is current, correct or\r
+accurate as delivered. Any risk of damage due to any possible\r
+inaccurate information is assumed by the user of the information.\r
+Furthermore, the information relating to the subject programs\r
+and/or the file formats created or accessed by the subject\r
+programs and/or the algorithms used by the subject programs is\r
+subject to change without notice.\r
+\r
+If the version of this file is marked as a NOTIFICATION OF CHANGE,\r
+the content defines an Early Feature Specification (EFS) change \r
+to the .ZIP file format that may be subject to modification prior \r
+to publication of the Final Feature Specification (FFS). This\r
+document may also contain information on Planned Feature \r
+Specifications (PFS) defining recognized future extensions.\r
+\r
+IV. Change Log\r
+--------------\r
+\r
+Version       Change Description                        Date\r
+-------       ------------------                        ----------\r
+5.2           -Single Password Symmetric Encryption     06/02/2003\r
+               storage\r
+\r
+6.1.0         -Smartcard compatibility                  01/20/2004\r
+              -Documentation on certificate storage\r
+\r
+6.2.0         -Introduction of Central Directory        04/26/2004\r
+               Encryption for encrypting metadata\r
+              -Added OS/X to Version Made By values\r
+\r
+6.2.1         -Added Extra Field placeholder for        04/01/2005\r
+               POSZIP using ID 0x4690\r
+\r
+              -Clarified size field on\r
+               "zip64 end of central directory record"\r
+\r
+6.2.2         -Documented Final Feature Specification   01/06/2006\r
+               for Strong Encryption\r
+\r
+              -Clarifications and typographical\r
+               corrections\r
+\r
+6.3.0         -Added tape positioning storage           09/29/2006\r
+               parameters\r
+\r
+              -Expanded list of supported hash algorithms\r
+\r
+              -Expanded list of supported compression\r
+               algorithms\r
+\r
+              -Expanded list of supported encryption\r
+               algorithms\r
+\r
+              -Added option for Unicode filename\r
+               storage\r
+\r
+              -Clarifications for consistent use\r
+               of Data Descriptor records\r
+\r
+              -Added additional "Extra Field"\r
+               definitions\r
+\r
+6.3.1         -Corrected standard hash values for       04/11/2007\r
+               SHA-256/384/512\r
+\r
+6.3.2         -Added compression method 97              09/28/2007\r
+\r
+              -Documented InfoZIP "Extra Field"\r
+               values for UTF-8 file name and\r
+               file comment storage\r
+\r
+V. General Format of a .ZIP file\r
+--------------------------------\r
+\r
+ Files are stored in arbitrary order. Large .ZIP files can span multiple\r
+ volumes or be split into user-defined segment sizes. All values\r
+ are stored in little-endian byte order unless otherwise specified. \r
+\r
+ Overall .ZIP file format:\r
+\r
+ [local file header 1]\r
+ [file data 1]\r
+ [data descriptor 1]\r
+ . \r
+ .\r
+ .\r
+ [local file header n]\r
+ [file data n]\r
+ [data descriptor n]\r
+ [archive decryption header] \r
+ [archive extra data record] \r
+ [central directory]\r
+ [zip64 end of central directory record]\r
+ [zip64 end of central directory locator] \r
+ [end of central directory record]\r
+\r
+\r
+ A. Local file header:\r
+\r
+ local file header signature 4 bytes (0x04034b50)\r
+ version needed to extract 2 bytes\r
+ general purpose bit flag 2 bytes\r
+ compression method 2 bytes\r
+ last mod file time 2 bytes\r
+ last mod file date 2 bytes\r
+ crc-32 4 bytes\r
+ compressed size 4 bytes\r
+ uncompressed size 4 bytes\r
+ file name length 2 bytes\r
+ extra field length 2 bytes\r
+\r
+ file name (variable size)\r
+ extra field (variable size)\r
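
The local file header layout above can be read with a fixed 30-byte
little-endian structure followed by the two variable-size fields. The
following is a minimal sketch, not a reference implementation; the
function name parse_local_header and the dictionary keys are
illustrative assumptions and do not appear in this specification.

```python
import struct

# Fixed-size part of the local file header: 30 bytes, little-endian.
# signature, version needed, flags, method, time, date,
# crc-32, compressed size, uncompressed size, name length, extra length
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIGNATURE = 0x04034B50


def parse_local_header(buf, offset=0):
    """Return (header dict, offset of the file data) for the header at offset."""
    (sig, version_needed, flags, method, mod_time, mod_date,
     crc32, comp_size, uncomp_size,
     name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, offset)
    if sig != LOCAL_SIGNATURE:
        raise ValueError("local file header signature not found")
    name_start = offset + LOCAL_HEADER.size
    extra_start = name_start + name_len
    data_start = extra_start + extra_len
    header = {
        "version_needed": version_needed,
        "flags": flags,
        "method": method,
        "mod_time": mod_time,
        "mod_date": mod_date,
        "crc32": crc32,
        "compressed_size": comp_size,
        "uncompressed_size": uncomp_size,
        "file_name": bytes(buf[name_start:extra_start]),
        "extra_field": bytes(buf[extra_start:data_start]),
    }
    return header, data_start
```

The returned offset points at the first byte of the file data
described in section B.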
+\r
+ B. File data\r
+\r
+ Immediately following the local header for a file\r
+ is the compressed or stored data for the file. \r
+ The series of [local file header][file data][data\r
+ descriptor] repeats for each file in the .ZIP archive. \r
+\r
+ C. Data descriptor:\r
+\r
+ crc-32 4 bytes\r
+ compressed size 4 bytes\r
+ uncompressed size 4 bytes\r
+\r
+ This descriptor exists only if bit 3 of the general\r
+ purpose bit flag is set (see below). It is byte aligned\r
+ and immediately follows the last byte of compressed data.\r
+ This descriptor is used only when it was not possible to\r
+ seek in the output .ZIP file, e.g., when the output .ZIP file\r
+ was standard output or a non-seekable device. For ZIP64(tm) format\r
+ archives, the compressed and uncompressed sizes are 8 bytes each.\r
+\r
+ When compressing files, compressed and uncompressed sizes\r
+ should be stored in ZIP64 format (as 8 byte values) when a\r
+ file's size exceeds 0xFFFFFFFF. However, ZIP64 format may be\r
+ used regardless of the size of a file. When extracting, if\r
+ the zip64 extended information extra field is present for\r
+ the file, the compressed and uncompressed sizes will be 8\r
+ byte values.\r
+\r
+ Although not originally assigned a signature, the value \r
+ 0x08074b50 has commonly been adopted as a signature value \r
+ for the data descriptor record. Implementers should be \r
+ aware that ZIP files may be encountered with or without this \r
+ signature marking data descriptors and should account for\r
+ either case when reading ZIP files to ensure compatibility.\r
+ When writing ZIP files, it is recommended to include the\r
+ signature value marking the data descriptor record. When\r
+ the signature is used, the fields currently defined for\r
+ the data descriptor record will immediately follow the\r
+ signature.\r
+\r
+ An extensible data descriptor will be released in a future\r
+ version of this APPNOTE. This new record is intended to\r
+ resolve conflicts with the use of this record going forward,\r
+ and to provide better support for streamed file processing.\r
+\r
+ When the Central Directory Encryption method is used, the data\r
+ descriptor record is not required, but may be used. If present,\r
+ and bit 3 of the general purpose bit field is set to indicate\r
+ its presence, the values in fields of the data descriptor\r
+ record should be set to binary zeros.\r
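
Because the data descriptor may appear with or without the 0x08074b50
signature, a reader has to handle both layouts. The sketch below shows
one possible approach; the function name parse_data_descriptor and the
zip64 parameter are assumptions made for illustration. Note that a
CRC-32 value that happens to equal the signature would be misread by
this simple check, so a careful reader may want to cross-check the
result against the central directory.

```python
import struct

DESCRIPTOR_SIGNATURE = 0x08074B50


def parse_data_descriptor(buf, offset, zip64=False):
    """Return (crc32, compressed_size, uncompressed_size, next_offset)."""
    # Skip the optional signature when the next four bytes match it.
    (first,) = struct.unpack_from("<I", buf, offset)
    if first == DESCRIPTOR_SIGNATURE:
        offset += 4
    # Sizes are 8 bytes each for ZIP64 entries, 4 bytes each otherwise.
    fmt = "<IQQ" if zip64 else "<III"
    crc32, comp_size, uncomp_size = struct.unpack_from(fmt, buf, offset)
    return crc32, comp_size, uncomp_size, offset + struct.calcsize(fmt)
```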
+\r
+ D. Archive decryption header: \r
+\r
+ The Archive Decryption Header is introduced in version 6.2\r
+ of the ZIP format specification. This record exists in support\r
+ of the Central Directory Encryption Feature implemented as part of \r
+ the Strong Encryption Specification as described in this document.\r
+ When the Central Directory Structure is encrypted, this decryption\r
+ header will precede the encrypted data segment. The encrypted\r
+ data segment will consist of the Archive extra data record (if\r
+ present) and the encrypted Central Directory Structure data.\r
+ The format of this data record is identical to the Decryption\r
+ header record preceding compressed file data. If the central \r
+ directory structure is encrypted, the location of the start of\r
+ this data record is determined using the Start of Central Directory\r
+ field in the Zip64 End of Central Directory record. Refer to the \r
+ section on the Strong Encryption Specification for information\r
+ on the fields used in the Archive Decryption Header record.\r
+\r
+\r
+ E. Archive extra data record: \r
+\r
+ archive extra data signature 4 bytes (0x08064b50)\r
+ extra field length 4 bytes\r
+ extra field data (variable size)\r
+\r
+ The Archive Extra Data Record is introduced in version 6.2\r
+ of the ZIP format specification. This record exists in support\r
+ of the Central Directory Encryption Feature implemented as part of \r
+ the Strong Encryption Specification as described in this document.\r
+ When present, this record immediately precedes the central \r
+ directory data structure. The size of this data record will be\r
+ included in the Size of the Central Directory field in the\r
+ End of Central Directory record. If the central directory structure\r
+ is compressed, but not encrypted, the location of the start of\r
+ this data record is determined using the Start of Central Directory\r
+ field in the Zip64 End of Central Directory record. \r
+\r
+\r
+ F. Central directory structure:\r
+\r
+ [file header 1]\r
+ .\r
+ .\r
+ . \r
+ [file header n]\r
+ [digital signature] \r
+\r
+ File header:\r
+\r
+ central file header signature 4 bytes (0x02014b50)\r
+ version made by 2 bytes\r
+ version needed to extract 2 bytes\r
+ general purpose bit flag 2 bytes\r
+ compression method 2 bytes\r
+ last mod file time 2 bytes\r
+ last mod file date 2 bytes\r
+ crc-32 4 bytes\r
+ compressed size 4 bytes\r
+ uncompressed size 4 bytes\r
+ file name length 2 bytes\r
+ extra field length 2 bytes\r
+ file comment length 2 bytes\r
+ disk number start 2 bytes\r
+ internal file attributes 2 bytes\r
+ external file attributes 4 bytes\r
+ relative offset of local header 4 bytes\r
+\r
+ file name (variable size)\r
+ extra field (variable size)\r
+ file comment (variable size)\r
+\r
+ Digital signature:\r
+\r
+ header signature 4 bytes (0x05054b50)\r
+ size of data 2 bytes\r
+ signature data (variable size)\r
+\r
+ With the introduction of the Central Directory Encryption \r
+ feature in version 6.2 of this specification, the Central \r
+ Directory Structure may be stored both compressed and encrypted. \r
+ Although not required, it is assumed when encrypting the\r
+ Central Directory Structure, that it will be compressed\r
+ for greater storage efficiency. Information on the\r
+ Central Directory Encryption feature can be found in the section\r
+ describing the Strong Encryption Specification. The Digital \r
+ Signature record will be neither compressed nor encrypted.\r
+\r
+ G. Zip64 end of central directory record\r
+\r
+ zip64 end of central dir\r
+ signature                       4 bytes  (0x06064b50)\r
+ size of zip64 end of central\r
+ directory record                8 bytes\r
+ version made by                 2 bytes\r
+ version needed to extract       2 bytes\r
+ number of this disk             4 bytes\r
+ number of the disk with the\r
+ start of the central directory  4 bytes\r
+ total number of entries in the\r
+ central directory on this disk  8 bytes\r
+ total number of entries in the\r
+ central directory               8 bytes\r
+ size of the central directory   8 bytes\r
+ offset of start of central\r
+ directory with respect to\r
+ the starting disk number        8 bytes\r
+ zip64 extensible data sector    (variable size)\r
+\r
+ The value stored into the "size of zip64 end of central\r
+ directory record" should be the size of the remaining\r
+ record and should not include the leading 12 bytes.\r
+ \r
+ Size = SizeOfFixedFields + SizeOfVariableData - 12.\r
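
As a worked example of the formula above: the fixed fields of the
Version 1 record total 56 bytes, and the leading 12 bytes are the
4-byte signature plus the 8-byte size field itself, so a record with
no extensible data stores 44. The helper name zip64_eocd_size_field
below is an illustrative assumption.

```python
import struct

# Fixed fields of the Version 1 record, in the order listed above:
# signature, size of record, version made by, version needed,
# number of this disk, disk with start of central directory,
# entries on this disk, total entries, directory size, directory offset.
ZIP64_EOCD_FIXED = struct.Struct("<IQHHIIQQQQ")


def zip64_eocd_size_field(extensible_data=b""):
    """Value to store in 'size of zip64 end of central directory record'."""
    # Size = SizeOfFixedFields + SizeOfVariableData - 12
    return ZIP64_EOCD_FIXED.size + len(extensible_data) - 12
```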
+\r
+ The above record structure defines Version 1 of the \r
+ zip64 end of central directory record. Version 1 was \r
+ implemented in versions of this specification preceding \r
+ 6.2 in support of the ZIP64 large file feature. The \r
+ introduction of the Central Directory Encryption feature \r
+ implemented in version 6.2 as part of the Strong Encryption \r
+ Specification defines Version 2 of this record structure. \r
+ Refer to the section describing the Strong Encryption \r
+ Specification for details on the version 2 format for \r
+ this record.\r
+\r
+ Special purpose data may reside in the zip64 extensible data\r
+ sector field following either a V1 or V2 version of this\r
+ record. To ensure identification of this special purpose data\r
+ it must include an identifying header block consisting of the\r
+ following:\r
+\r
+ Header ID - 2 bytes\r
+ Data Size - 4 bytes\r
+\r
+ The Header ID field indicates the type of data that is in the \r
+ data block that follows.\r
+\r
+ Data Size identifies the number of bytes that follow for this\r
+ data block type.\r
+\r
+ Multiple special purpose data blocks may be present, but each\r
+ must be preceded by a Header ID and Data Size field. Current\r
+ mappings of Header ID values supported in this field are as\r
+ defined in APPENDIX C.\r
+\r
+ H. Zip64 end of central directory locator\r
+\r
+ zip64 end of central dir locator\r
+ signature                       4 bytes  (0x07064b50)\r
+ number of the disk with the\r
+ start of the zip64 end of\r
+ central directory               4 bytes\r
+ relative offset of the zip64\r
+ end of central directory record 8 bytes\r
+ total number of disks           4 bytes\r
+ \r
+ I. End of central directory record:\r
+\r
+ end of central dir signature    4 bytes  (0x06054b50)\r
+ number of this disk             2 bytes\r
+ number of the disk with the\r
+ start of the central directory  2 bytes\r
+ total number of entries in the\r
+ central directory on this disk  2 bytes\r
+ total number of entries in\r
+ the central directory           2 bytes\r
+ size of the central directory   4 bytes\r
+ offset of start of central\r
+ directory with respect to\r
+ the starting disk number        4 bytes\r
+ .ZIP file comment length        2 bytes\r
+ .ZIP file comment               (variable size)\r
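
The end of central directory record has 22 bytes of fixed fields
followed by the comment. This specification does not prescribe how a
reader locates the record; the sketch below assumes the common
technique of searching backwards from the end of the file for the
signature and accepting a candidate only when its comment length
reaches exactly to the end of the data. The names find_eocd and the
dictionary keys are illustrative.

```python
import struct

# signature, this disk, disk with directory start, entries on this disk,
# total entries, directory size, directory offset, comment length
EOCD = struct.Struct("<IHHHHIIH")
EOCD_SIGNATURE = 0x06054B50


def find_eocd(buf):
    """Locate and decode the end of central directory record in buf."""
    sig = struct.pack("<I", EOCD_SIGNATURE)
    pos = buf.rfind(sig)
    while pos != -1:
        if pos + EOCD.size <= len(buf):
            (_, disk, cd_disk, entries_here, entries_total,
             cd_size, cd_offset, comment_len) = EOCD.unpack_from(buf, pos)
            comment_start = pos + EOCD.size
            # Assumption: the record is the last structure in the file.
            if comment_start + comment_len == len(buf):
                return {
                    "disk_number": disk,
                    "cd_start_disk": cd_disk,
                    "entries_on_disk": entries_here,
                    "total_entries": entries_total,
                    "cd_size": cd_size,
                    "cd_offset": cd_offset,
                    "comment": bytes(buf[comment_start:]),
                }
        # Keep searching for an earlier occurrence of the signature.
        pos = buf.rfind(sig, 0, pos + 3)
    raise ValueError("end of central directory record not found")
```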
+\r
+ J. Explanation of fields:\r
+\r
+ version made by (2 bytes)\r
+\r
+ The upper byte indicates the compatibility of the file\r
+ attribute information. If the external file attributes \r
+ are compatible with MS-DOS and can be read by PKZIP for \r
+ DOS version 2.04g then this value will be zero. If these \r
+ attributes are not compatible, then this value will \r
+ identify the host system on which the attributes are \r
+ compatible. Software can use this information to determine\r
+ the line record format for text files etc. The current\r
+ mappings are:\r
+\r
+  0 - MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems)\r
+  1 - Amiga                    2 - OpenVMS\r
+  3 - UNIX                     4 - VM/CMS\r
+  5 - Atari ST                 6 - OS/2 H.P.F.S.\r
+  7 - Macintosh                8 - Z-System\r
+  9 - CP/M                    10 - Windows NTFS\r
+ 11 - MVS (OS/390 - Z/OS)     12 - VSE\r
+ 13 - Acorn Risc              14 - VFAT\r
+ 15 - alternate MVS           16 - BeOS\r
+ 17 - Tandem                  18 - OS/400\r
+ 19 - OS/X (Darwin)           20 thru 255 - unused\r
+\r
+ The lower byte indicates the ZIP specification version \r
+ (the version of this document) supported by the software \r
+ used to encode the file. The value/10 indicates the major \r
+ version number, and the value mod 10 is the minor version \r
+ number. \r
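
The two bytes of "version made by" can be separated as sketched below.
The function name decode_version_made_by is an illustrative
assumption, and the lookup table holds only a subset of the host
mappings listed above.

```python
# Subset of the host system mappings listed above.
HOST_SYSTEMS = {
    0: "MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems)",
    3: "UNIX",
    10: "Windows NTFS",
    19: "OS/X (Darwin)",
}


def decode_version_made_by(value):
    """Return (host system, major version, minor version)."""
    host = (value >> 8) & 0xFF   # upper byte: attribute compatibility
    spec = value & 0xFF          # lower byte: specification version
    name = HOST_SYSTEMS.get(host, "host %d" % host)
    # value/10 is the major version, value mod 10 the minor version.
    return name, spec // 10, spec % 10
```

For example, the value 0x033F has an upper byte of 3 (UNIX) and a
lower byte of 63, which corresponds to version 6.3.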
+\r
+ version needed to extract (2 bytes)\r
+\r
+ The minimum supported ZIP specification version needed to \r
+ extract the file, mapped as above. This value is based on \r
+ the specific format features a ZIP program must support to \r
+ be able to extract the file. If multiple features are\r
+ applied to a file, the minimum version should be set to that of\r
+ the feature having the highest value. New features or feature\r
+ changes affecting the published format specification will be \r
+ implemented using higher version numbers than the last \r
+ published value to avoid conflict.\r
+\r
+ Current minimum feature versions are as defined below:\r
+\r
+ 1.0 - Default value\r
+ 1.1 - File is a volume label\r
+ 2.0 - File is a folder (directory)\r
+ 2.0 - File is compressed using Deflate compression\r
+ 2.0 - File is encrypted using traditional PKWARE encryption\r
+ 2.1 - File is compressed using Deflate64(tm)\r
+ 2.5 - File is compressed using PKWARE DCL Implode \r
+ 2.7 - File is a patch data set \r
+ 4.5 - File uses ZIP64 format extensions\r
+ 4.6 - File is compressed using BZIP2 compression*\r
+ 5.0 - File is encrypted using DES\r
+ 5.0 - File is encrypted using 3DES\r
+ 5.0 - File is encrypted using original RC2 encryption\r
+ 5.0 - File is encrypted using RC4 encryption\r
+ 5.1 - File is encrypted using AES encryption\r
+ 5.1 - File is encrypted using corrected RC2 encryption**\r
+ 5.2 - File is encrypted using corrected RC2-64 encryption**\r
+ 6.1 - File is encrypted using non-OAEP key wrapping***\r
+ 6.2 - Central directory encryption\r
+ 6.3 - File is compressed using LZMA\r
+ 6.3 - File is compressed using PPMd+\r
+ 6.3 - File is encrypted using Blowfish\r
+ 6.3 - File is encrypted using Twofish\r
+\r
+\r
+ * Early 7.x (pre-7.2) versions of PKZIP incorrectly set the\r
+ version needed to extract for BZIP2 compression to be 50\r
+ when it should have been 46.\r
+\r
+ ** Refer to the section on Strong Encryption Specification\r
+ for additional information regarding RC2 corrections.\r
+\r
+ *** Certificate encryption using non-OAEP key wrapping is the\r
+ intended mode of operation for all versions beginning with 6.1.\r
+ Support for OAEP key wrapping should only be used for\r
+ backward compatibility when sending ZIP files to be opened by\r
+ versions of PKZIP older than 6.1 (5.0 or 6.0).\r
+\r
+ + Files compressed using PPMd should set the version\r
+ needed to extract field to 6.3. However, not all ZIP\r
+ programs enforce this, and some may be unable to decompress\r
+ data files compressed using PPMd if this value is set.\r
+\r
+ When using ZIP64 extensions, the corresponding value in the\r
+ zip64 end of central directory record should also be set. \r
+ This field should be set appropriately to indicate whether \r
+ Version 1 or Version 2 format is in use. \r
+\r
+ general purpose bit flag: (2 bytes)\r
+\r
+ Bit 0: If set, indicates that the file is encrypted.\r
+\r
+ (For Method 6 - Imploding)\r
+ Bit 1: If the compression method used was type 6,\r
+ Imploding, then this bit, if set, indicates\r
+ an 8K sliding dictionary was used. If clear,\r
+ then a 4K sliding dictionary was used.\r
+ Bit 2: If the compression method used was type 6,\r
+ Imploding, then this bit, if set, indicates\r
+ 3 Shannon-Fano trees were used to encode the\r
+ sliding dictionary output. If clear, then 2\r
+ Shannon-Fano trees were used.\r
+\r
+ (For Methods 8 and 9 - Deflating)\r
+ Bit 2  Bit 1\r
+   0      0    Normal (-en) compression option was used.\r
+   0      1    Maximum (-exx/-ex) compression option was used.\r
+   1      0    Fast (-ef) compression option was used.\r
+   1      1    Super Fast (-es) compression option was used.\r
+\r
+ (For Method 14 - LZMA)\r
+ Bit 1: If the compression method used was type 14,\r
+ LZMA, then this bit, if set, indicates\r
+ an end-of-stream (EOS) marker is used to\r
+ mark the end of the compressed data stream.\r
+ If clear, then an EOS marker is not present\r
+ and the compressed data size must be known\r
+ to extract.\r
+\r
+ Note: Bits 1 and 2 are undefined if the compression\r
+ method is any other.\r
+\r
+ Bit 3: If this bit is set, the fields crc-32, compressed \r
+ size and uncompressed size are set to zero in the \r
+ local header. The correct values are put in the \r
+ data descriptor immediately following the compressed\r
+ data. (Note: PKZIP version 2.04g for DOS only\r
+ recognizes this bit for method 8 compression; newer\r
+ versions of PKZIP recognize this bit for any\r
+ compression method.)\r
+\r
+ Bit 4: Reserved for use with method 8, for enhanced\r
+ deflating. \r
+\r
+ Bit 5: If this bit is set, this indicates that the file is \r
+ compressed patched data. (Note: Requires PKZIP \r
+ version 2.70 or greater)\r
+\r
+ Bit 6: Strong encryption. If this bit is set, you should\r
+ set the version needed to extract value to at least\r
+ 50 and you must also set bit 0. If AES encryption\r
+ is used, the version needed to extract value must \r
+ be at least 51.\r
+\r
+ Bit 7: Currently unused.\r
+\r
+ Bit 8: Currently unused.\r
+\r
+ Bit 9: Currently unused.\r
+\r
+ Bit 10: Currently unused.\r
+\r
+ Bit 11: Language encoding flag (EFS). If this bit is set,\r
+ the filename and comment fields for this file\r
+ must be encoded using UTF-8. (see APPENDIX D)\r
+\r
+ Bit 12: Reserved by PKWARE for enhanced compression.\r
+\r
+ Bit 13: Used when encrypting the Central Directory to indicate \r
+ selected data values in the Local Header are masked to\r
+ hide their actual values. See the section describing \r
+ the Strong Encryption Specification for details.\r
+\r
+ Bit 14: Reserved by PKWARE.\r
+\r
+ Bit 15: Reserved by PKWARE.\r
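
Since the meaning of bits 1 and 2 depends on the compression method,
a decoder needs both values. The following sketch shows one way to
interpret the flag bits defined above; decode_flags and the dictionary
keys are illustrative names, and reserved or unused bits are ignored.

```python
# Deflate option keyed by (bit 2, bit 1), as in the table above.
DEFLATE_OPTIONS = {
    (False, False): "normal",
    (False, True): "maximum",
    (True, False): "fast",
    (True, True): "super fast",
}


def decode_flags(flags, method):
    """Interpret the general purpose bit flag for a compression method."""
    info = {
        "encrypted": bool(flags & 0x0001),            # bit 0
        "data_descriptor": bool(flags & 0x0008),      # bit 3
        "patched_data": bool(flags & 0x0020),         # bit 5
        "strong_encryption": bool(flags & 0x0040),    # bit 6
        "utf8_names": bool(flags & 0x0800),           # bit 11
        "local_header_masked": bool(flags & 0x2000),  # bit 13
    }
    bit1 = bool(flags & 0x0002)
    bit2 = bool(flags & 0x0004)
    if method == 6:            # Imploding
        info["dictionary_size"] = 8192 if bit1 else 4096
        info["shannon_fano_trees"] = 3 if bit2 else 2
    elif method in (8, 9):     # Deflating
        info["deflate_option"] = DEFLATE_OPTIONS[(bit2, bit1)]
    elif method == 14:         # LZMA
        info["lzma_eos_marker"] = bit1
    # For any other method, bits 1 and 2 are undefined and left out.
    return info
```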
+\r
+ compression method: (2 bytes)\r
+\r
+ (see accompanying documentation for algorithm\r
+ descriptions)\r
+\r
+ 0 - The file is stored (no compression)\r
+ 1 - The file is Shrunk\r
+ 2 - The file is Reduced with compression factor 1\r
+ 3 - The file is Reduced with compression factor 2\r
+ 4 - The file is Reduced with compression factor 3\r
+ 5 - The file is Reduced with compression factor 4\r
+ 6 - The file is Imploded\r
+ 7 - Reserved for Tokenizing compression algorithm\r
+ 8 - The file is Deflated\r
+ 9 - Enhanced Deflating using Deflate64(tm)\r
+ 10 - PKWARE Data Compression Library Imploding (old IBM TERSE)\r
+ 11 - Reserved by PKWARE\r
+ 12 - File is compressed using BZIP2 algorithm\r
+ 13 - Reserved by PKWARE\r
+ 14 - LZMA (EFS)\r
+ 15 - Reserved by PKWARE\r
+ 16 - Reserved by PKWARE\r
+ 17 - Reserved by PKWARE\r
+ 18 - File is compressed using IBM TERSE (new)\r
+ 19 - IBM LZ77 z Architecture (PFS)\r
+ 97 - WavPack compressed data\r
+ 98 - PPMd version I, Rev 1\r
+\r
+ date and time fields: (2 bytes each)\r
+\r
+ The date and time are encoded in standard MS-DOS format.\r
+ If input came from standard input, the date and time are\r
+ those at which compression was started for this data. \r
+ If encrypting the central directory and general purpose bit \r
+ flag 13 is set indicating masking, the value stored in the \r
+ Local Header will be zero. \r
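
This specification does not spell out the MS-DOS encoding. The sketch
below assumes the conventional MS-DOS layout: the date holds the year
offset from 1980 in bits 9-15, the month in bits 5-8 and the day in
bits 0-4; the time holds the hour in bits 11-15, the minute in bits
5-10 and the seconds divided by two in bits 0-4. The function name
dos_to_datetime is illustrative, and returning None for a zero month
or day (as in the masked case described above) is a choice made for
this example.

```python
import datetime


def dos_to_datetime(dos_date, dos_time):
    """Convert MS-DOS date and time words to a datetime, or None if unset."""
    year = ((dos_date >> 9) & 0x7F) + 1980
    month = (dos_date >> 5) & 0x0F
    day = dos_date & 0x1F
    hour = (dos_time >> 11) & 0x1F
    minute = (dos_time >> 5) & 0x3F
    second = (dos_time & 0x1F) * 2   # stored with 2-second resolution
    if month == 0 or day == 0:
        # A zero date, such as a masked local header value, has no
        # calendar meaning.
        return None
    return datetime.datetime(year, month, day, hour, minute, second)
```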
+\r
+ CRC-32: (4 bytes)\r
+\r
+ The CRC-32 algorithm was generously contributed by\r
+ David Schwaderer and can be found in his excellent\r
+ book "C Programmers Guide to NetBIOS" published by\r
+ Howard W. Sams & Co. Inc. The 'magic number' for\r
+ the CRC is 0xdebb20e3. The proper CRC pre and post\r
+ conditioning is used, meaning that the CRC register\r
+ is pre-conditioned with all ones (a starting value\r
+ of 0xffffffff) and the value is post-conditioned by\r
+ taking the one's complement of the CRC residual.\r
+ If bit 3 of the general purpose flag is set, this\r
+ field is set to zero in the local header and the correct\r
+ value is put in the data descriptor and in the central\r
+ directory. When encrypting the central directory, if the\r
+ local header is not in ZIP64 format and general purpose \r
+ bit flag 13 is set indicating masking, the value stored \r
+ in the Local Header will be zero. \r
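
The pre- and post-conditioning described above can be illustrated
with a bit-at-a-time computation. This sketch assumes the reflected
polynomial 0xEDB88320 commonly used for ZIP CRC-32 values, which is
not stated in the text above; the function names are illustrative.
Python's zlib.crc32 is imported only so the result can be compared
against an independent implementation.

```python
import zlib


def crc32_register(data, register=0xFFFFFFFF):
    """Run the CRC over data and return the raw register value.

    The register starts at 0xFFFFFFFF (pre-conditioning) and the
    result is returned without post-conditioning.
    """
    for byte in data:
        register ^= byte
        for _ in range(8):
            if register & 1:
                register = (register >> 1) ^ 0xEDB88320
            else:
                register >>= 1
    return register


def crc32(data):
    """CRC-32 as stored in the crc-32 field (one's complement of residual)."""
    return crc32_register(data) ^ 0xFFFFFFFF
```

Running crc32_register over a message followed by its own CRC-32,
stored in little-endian byte order, leaves the 'magic number'
0xdebb20e3 in the register.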
+\r
+ compressed size: (4 bytes)\r
+ uncompressed size: (4 bytes)\r
+\r
+ The size of the file compressed and uncompressed,\r
+ respectively. When a decryption header is present it will\r
+ be placed in front of the file data and the value of the\r
+ compressed file size will include the bytes of the decryption\r
+ header. If bit 3 of the general purpose bit flag is set, \r
+ these fields are set to zero in the local header and the \r
+ correct values are put in the data descriptor and\r
+ in the central directory. If an archive is in ZIP64 format\r
+ and the value in this field is 0xFFFFFFFF, the size will be\r
+ in the corresponding 8 byte ZIP64 extended information \r
+ extra field. When encrypting the central directory, if the\r
+ local header is not in ZIP64 format and general purpose bit \r
+ flag 13 is set indicating masking, the value stored for the \r
+ uncompressed size in the Local Header will be zero. \r
+\r
+ file name length: (2 bytes)\r
+ extra field length: (2 bytes)\r
+ file comment length: (2 bytes)\r
+\r
+ The length of the file name, extra field, and comment\r
+ fields respectively. The combined length of any\r
+ directory record and these three fields should not\r
+ generally exceed 65,535 bytes. If input came from standard\r
+ input, the file name length is set to zero. \r
+\r
+ disk number start: (2 bytes)\r
+\r
+ The number of the disk on which this file begins. If an\r
+ archive is in ZIP64 format and the value in this field is\r
+ 0xFFFF, the disk number will be in the corresponding 4 byte\r
+ zip64 extended information extra field.\r
+\r
+ internal file attributes: (2 bytes)\r
+\r
+ Bits 1 and 2 are reserved for use by PKWARE.\r
+\r
+ The lowest bit of this field indicates, if set, that\r
+ the file is apparently an ASCII or text file. If not\r
+ set, the file apparently contains binary data.\r
+ The remaining bits are unused in version 1.0.\r
+\r
+ The 0x0002 bit of this field indicates, if set, that a \r
+ 4 byte variable record length control field precedes each \r
+ logical record indicating the length of the record. The \r
+ record length control field is stored in little-endian byte\r
+ order. This flag is independent of text control characters, \r
+ and if used in conjunction with text data, includes any \r
+ control characters in the total length of the record. This \r
+ value is provided for mainframe data transfer support.\r
+\r
+ external file attributes: (4 bytes)\r
+\r
+ The mapping of the external attributes is\r
+ host-system dependent (see 'version made by'). For\r
+ MS-DOS, the low order byte is the MS-DOS directory\r
+ attribute byte. If input came from standard input, this\r
+ field is set to zero.\r
+\r
+ relative offset of local header: (4 bytes)\r
+\r
+ This is the offset from the start of the first disk on\r
+ which this file appears, to where the local header should\r
+ be found. If an archive is in ZIP64 format and the value\r
+ in this field is 0xFFFFFFFF, the offset will be in the\r
+ corresponding 8 byte zip64 extended information extra field.\r
+\r
+ file name: (Variable)\r
+\r
+ The name of the file, with optional relative path.\r
+ The path stored should not contain a drive or\r
+ device letter, or a leading slash. All slashes\r
+ should be forward slashes '/' as opposed to\r
+ backwards slashes '\' for compatibility with Amiga\r
+ and UNIX file systems etc. If input came from standard\r
+ input, there is no file name field. If encrypting\r
+ the central directory and general purpose bit flag 13 is set \r
+ indicating masking, the file name stored in the Local Header \r
+ will not be the actual file name. A masking value consisting \r
+ of a unique hexadecimal value will be stored. This value will \r
+ be sequentially incremented for each file in the archive. See\r
+ the section on the Strong Encryption Specification for details \r
+ on retrieving the encrypted file name. \r
+\r
+ extra field: (Variable)\r
+\r
+ This is for expansion. If additional information\r
+ needs to be stored for special needs or for specific\r
+ platforms, it should be stored here. Earlier versions\r
+ of the software can then safely skip this field, and\r
+ find the next file or header. This field will be 0\r
+ length in version 1.0.\r
+\r
+ In order to allow different programs and different types\r
+ of information to be stored in the 'extra' field in .ZIP\r
+ files, the following structure should be used for all\r
+ programs storing data in this field:\r
+\r
+ header1+data1 + header2+data2 . . .\r
+\r
+ Each header should consist of:\r
+\r
+ Header ID - 2 bytes\r
+ Data Size - 2 bytes\r
+\r
+ Note: all fields stored in Intel low-byte/high-byte order.\r
+\r
+ The Header ID field indicates the type of data that is in\r
+ the following data block.\r
+\r
+ Header IDs of 0 thru 31 are reserved for use by PKWARE.\r
+ The remaining IDs can be used by third party vendors for\r
+ proprietary usage.\r
+\r
+ The current Header ID mappings defined by PKWARE are:\r
+\r
+ 0x0001 Zip64 extended information extra field\r
+ 0x0007 AV Info\r
+ 0x0008 Reserved for extended language encoding data (PFS)\r
+ (see APPENDIX D)\r
+ 0x0009 OS/2\r
+ 0x000a NTFS \r
+ 0x000c OpenVMS\r
+ 0x000d UNIX\r
+ 0x000e Reserved for file stream and fork descriptors\r
+ 0x000f Patch Descriptor\r
+ 0x0014 PKCS#7 Store for X.509 Certificates\r
+ 0x0015 X.509 Certificate ID and Signature for \r
+ individual file\r
+ 0x0016 X.509 Certificate ID for Central Directory\r
+ 0x0017 Strong Encryption Header\r
+ 0x0018 Record Management Controls\r
+ 0x0019 PKCS#7 Encryption Recipient Certificate List\r
+ 0x0065 IBM S/390 (Z390), AS/400 (I400) attributes \r
+ - uncompressed\r
+ 0x0066 Reserved for IBM S/390 (Z390), AS/400 (I400) \r
+ attributes - compressed\r
+ 0x4690 POSZIP 4690 (reserved) \r
+\r
+ Third party mappings commonly used are:\r
+\r
+\r
+ 0x07c8 Macintosh\r
+ 0x2605 ZipIt Macintosh\r
+ 0x2705 ZipIt Macintosh 1.3.5+\r
+ 0x2805 ZipIt Macintosh 1.3.5+\r
+ 0x334d Info-ZIP Macintosh\r
+ 0x4341 Acorn/SparkFS \r
+ 0x4453 Windows NT security descriptor (binary ACL)\r
+ 0x4704 VM/CMS\r
+ 0x470f MVS\r
+ 0x4b46 FWKCS MD5 (see below)\r
+ 0x4c41 OS/2 access control list (text ACL)\r
+ 0x4d49 Info-ZIP OpenVMS\r
+ 0x4f4c Xceed original location extra field\r
+ 0x5356 AOS/VS (ACL)\r
+ 0x5455 extended timestamp\r
+ 0x554e Xceed unicode extra field\r
+ 0x5855 Info-ZIP UNIX (original, also OS/2, NT, etc)\r
+ 0x6375 Info-ZIP Unicode Comment Extra Field\r
+ 0x6542 BeOS/BeBox\r
+ 0x7075 Info-ZIP Unicode Path Extra Field\r
+ 0x756e ASi UNIX\r
+ 0x7855 Info-ZIP UNIX (new)\r
+ 0xa220 Microsoft Open Packaging Growth Hint\r
+ 0xfd4a SMS/QDOS\r
+\r
+ Detailed descriptions of Extra Fields defined by third \r
+ party mappings will be documented as information on\r
+ these data structures is made available to PKWARE. \r
+ PKWARE does not guarantee the accuracy of any published\r
+ third party data.\r
+\r
+ The Data Size field indicates the size of the following\r
+ data block. Programs can use this value to skip to the\r
+ next header block, passing over any data blocks that are\r
+ not of interest.\r
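
The header1+data1 + header2+data2 structure described above can be
walked by reading each 2-byte Header ID and 2-byte Data Size and then
stepping over the data block. The generator below is a minimal sketch;
the name iter_extra_blocks is an illustrative assumption, and raising
an error on a block that overruns the field is a choice made for this
example rather than a requirement of this specification.

```python
import struct


def iter_extra_blocks(extra):
    """Yield (header_id, data) for each block in an extra field."""
    offset = 0
    while offset + 4 <= len(extra):
        # Both header values are stored in low-byte/high-byte order.
        header_id, size = struct.unpack_from("<HH", extra, offset)
        offset += 4
        if offset + size > len(extra):
            raise ValueError("extra field block overruns the field")
        yield header_id, bytes(extra[offset:offset + size])
        # Data Size lets the reader skip blocks it does not understand.
        offset += size
```

A caller interested only in, say, the 0x0001 block can ignore every
other Header ID that the generator yields.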
+\r
+ Note: As stated above, the size of the entire .ZIP file\r
+ header, including the file name, comment, and extra\r
+ field should not exceed 64K in size.\r
+\r
+ In case two different programs should appropriate the same\r
+ Header ID value, it is strongly recommended that each\r
+ program place a unique signature of at least two bytes in\r
+ size (and preferably 4 bytes or bigger) at the start of\r
+ each data area. Every program should verify that its\r
+ unique signature is present, in addition to the Header ID\r
+ value being correct, before assuming that it is a block of\r
+ known type.\r
+\r
+ -Zip64 Extended Information Extra Field (0x0001):\r
+\r
+ The following is the layout of the zip64 extended \r
+ information "extra" block. If one of the size or\r
+ offset fields in the Local or Central directory\r
+ record is too small to hold the required data,\r
+ a Zip64 extended information record is created.\r
+ The order of the fields in the zip64 extended \r
+ information record is fixed, but the fields will\r
+ only appear if the corresponding Local or Central\r
+ directory record field is set to 0xFFFF or 0xFFFFFFFF.\r
+\r
+ Note: all fields stored in Intel low-byte/high-byte order.\r
+\r
+         Value      Size       Description\r
+         -----      ----       -----------\r
+ (ZIP64) 0x0001     2 bytes    Tag for this "extra" block type\r
+         Size       2 bytes    Size of this "extra" block\r
+         Original\r
+         Size       8 bytes    Original uncompressed file size\r
+         Compressed\r
+         Size       8 bytes    Size of compressed data\r
+         Relative Header\r
+         Offset     8 bytes    Offset of local header record\r
+         Disk Start\r
+         Number     4 bytes    Number of the disk on which\r
+                               this file starts\r
+\r
+ This entry in the Local header must include BOTH original\r
+ and compressed file size fields. If encrypting the \r
+ central directory and bit 13 of the general purpose bit\r
+ flag is set indicating masking, the value stored in the\r
+ Local Header for the original file size will be zero.\r
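
The rule that fields keep a fixed order but appear only when the corresponding directory field holds 0xFFFF or 0xFFFFFFFF can be sketched as follows. This is an illustration for the central directory form of the block, not a definitive implementation; `parse_zip64_extra` and its parameters are hypothetical names, and `data` is assumed to be the bytes that follow the Size field.

```python
import struct

def parse_zip64_extra(data: bytes, usize: int, csize: int, offset: int, disk: int):
    """Resolve the real values for a central directory record.

    usize, csize, offset and disk are the values read from the fixed-size
    record; a field is taken from the zip64 block only if it is a sentinel.
    """
    pos = 0

    def take(fmt: str, width: int) -> int:
        nonlocal pos
        if pos + width > len(data):
            raise ValueError("zip64 block is too short for the flagged fields")
        (value,) = struct.unpack_from(fmt, data, pos)
        pos += width
        return value

    # The order below is the fixed field order of the zip64 block.
    if usize == 0xFFFFFFFF:
        usize = take("<Q", 8)
    if csize == 0xFFFFFFFF:
        csize = take("<Q", 8)
    if offset == 0xFFFFFFFF:
        offset = take("<Q", 8)
    if disk == 0xFFFF:
        disk = take("<I", 4)
    return usize, csize, offset, disk
```

In a Local header the block carries both size fields, as stated above, so a reader of local headers would pass 0xFFFFFFFF for both sizes.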
+\r
+\r
+ -OS/2 Extra Field (0x0009):\r
+\r
+ The following is the layout of the OS/2 attributes "extra" \r
+ block. (Last Revision 09/05/95)\r
+\r
+ Note: all fields stored in Intel low-byte/high-byte order.\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ (OS/2) 0x0009 2 bytes Tag for this "extra" block type\r
+ TSize 2 bytes Size for the following data block\r
+ BSize 4 bytes Uncompressed Block Size\r
+ CType 2 bytes Compression type\r
+ EACRC 4 bytes CRC value for uncompressed block\r
+ (var) variable Compressed block\r
+\r
+ The OS/2 extended attribute structure (FEA2LIST) is \r
+ compressed and then stored in its entirety within this \r
+ structure. There will only ever be one "block" of data in \r
+ VarFields[].\r
+\r
+ -NTFS Extra Field (0x000a):\r
+\r
+ The following is the layout of the NTFS attributes \r
+ "extra" block. (Note: At this time the Mtime, Atime\r
+ and Ctime values may be used on any WIN32 system.) \r
+\r
+ Note: all fields stored in Intel low-byte/high-byte order.\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ (NTFS) 0x000a 2 bytes Tag for this "extra" block type\r
+ TSize 2 bytes Size of the total "extra" block\r
+ Reserved 4 bytes Reserved for future use\r
+ Tag1 2 bytes NTFS attribute tag value #1\r
+ Size1 2 bytes Size of attribute #1, in bytes\r
+ (var.) Size1 Attribute #1 data\r
+ .\r
+ .\r
+ .\r
+ TagN 2 bytes NTFS attribute tag value #N\r
+ SizeN 2 bytes Size of attribute #N, in bytes\r
+ (var.) SizeN Attribute #N data\r
+\r
+ For NTFS, values for Tag1 through TagN are as follows:\r
+ (currently only one set of attributes is defined for NTFS)\r
+\r
+ Tag Size Description\r
+ ----- ---- -----------\r
+ 0x0001 2 bytes Tag for attribute #1 \r
+ Size1 2 bytes Size of attribute #1, in bytes\r
+ Mtime 8 bytes File last modification time\r
+ Atime 8 bytes File last access time\r
+ Ctime 8 bytes File creation time\r
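
A hedged sketch of reading attribute tag 0x0001 from the NTFS block. It assumes the three 8-byte values are Win32 FILETIME values (100-nanosecond ticks since 1601-01-01 UTC), which this section does not state explicitly. The names `parse_ntfs_extra` and `filetime_to_datetime` are hypothetical, and `data` is assumed to be the bytes that follow TSize.

```python
import struct
from datetime import datetime, timedelta, timezone

# Assumption: times are Win32 FILETIME values counted from this epoch.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_datetime(ticks: int) -> datetime:
    """Convert 100-nanosecond ticks to a datetime (sub-microsecond part dropped)."""
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)

def parse_ntfs_extra(data: bytes):
    """Return the Mtime/Atime/Ctime of attribute 0x0001, or None if absent."""
    pos = 4  # skip the 4-byte Reserved field
    while pos + 4 <= len(data):
        tag, size = struct.unpack_from("<HH", data, pos)
        pos += 4
        if tag == 0x0001 and size >= 24 and pos + 24 <= len(data):
            mtime, atime, ctime = struct.unpack_from("<QQQ", data, pos)
            return {
                "mtime": filetime_to_datetime(mtime),
                "atime": filetime_to_datetime(atime),
                "ctime": filetime_to_datetime(ctime),
            }
        pos += size  # unknown attribute tags are skipped by their size
    return None
```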
+\r
+ -OpenVMS Extra Field (0x000c):\r
+\r
+ The following is the layout of the OpenVMS attributes \r
+ "extra" block.\r
+\r
+ Note: all fields stored in Intel low-byte/high-byte order.\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ (VMS) 0x000c 2 bytes Tag for this "extra" block type\r
+ TSize 2 bytes Size of the total "extra" block\r
+ CRC 4 bytes 32-bit CRC for remainder of the block\r
+ Tag1 2 bytes OpenVMS attribute tag value #1\r
+ Size1 2 bytes Size of attribute #1, in bytes\r
+ (var.) Size1 Attribute #1 data\r
+ .\r
+ .\r
+ .\r
+ TagN 2 bytes OpenVMS attribute tag value #N\r
+ SizeN 2 bytes Size of attribute #N, in bytes\r
+ (var.) SizeN Attribute #N data\r
+\r
+ Rules:\r
+\r
+ 1. There will be one or more attributes present, which \r
+ will each be preceded by the above TagX & SizeX values. \r
+ These values are identical to the ATR$C_XXXX and \r
+ ATR$S_XXXX constants which are defined in ATR.H under \r
+ OpenVMS C. Neither of these values will ever be zero.\r
+\r
+ 2. No word alignment or padding is performed.\r
+\r
+ 3. A well-behaved PKZIP/OpenVMS program should never produce\r
+ more than one sub-block with the same TagX value. Also,\r
+ there will never be more than one "extra" block of type\r
+ 0x000c in a particular directory record.\r
+\r
+ -UNIX Extra Field (0x000d):\r
+\r
+ The following is the layout of the UNIX "extra" block.\r
+ Note: all fields are stored in Intel low-byte/high-byte \r
+ order.\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ (UNIX) 0x000d 2 bytes Tag for this "extra" block type\r
+ TSize 2 bytes Size for the following data block\r
+ Atime 4 bytes File last access time\r
+ Mtime 4 bytes File last modification time\r
+ Uid 2 bytes File user ID\r
+ Gid 2 bytes File group ID\r
+ (var) variable Variable length data field\r
+\r
+ The variable length data field will contain file type \r
+ specific data. Currently the only values allowed are\r
+ the original "linked to" file names for hard or symbolic \r
+ links, and the major and minor device node numbers for\r
+ character and block device nodes. Since device nodes\r
+ cannot be either symbolic or hard links, only one set of\r
+ variable length data is stored. Link files will have the\r
+ name of the original file stored. This name is NOT NULL\r
+ terminated. Its size can be determined by checking TSize -\r
+ 12. Device entries will have eight bytes stored as two 4\r
+ byte entries (in little endian format). The first entry\r
+ will be the major device number, and the second the minor\r
+ device number.\r
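
The layout above can be read roughly as follows. This is a sketch only: `parse_unix_extra` is a hypothetical name, `data` is assumed to be the bytes that follow TSize (so its length equals TSize), and the caller is assumed to know from elsewhere whether the entry is a device node.

```python
import struct

def parse_unix_extra(data: bytes, is_device: bool = False) -> dict:
    """Decode the fixed 12 bytes and the variable-length data that follows."""
    if len(data) < 12:
        raise ValueError("UNIX extra block needs at least 12 bytes")
    atime, mtime, uid, gid = struct.unpack_from("<IIHH", data, 0)
    info = {"atime": atime, "mtime": mtime, "uid": uid, "gid": gid}
    var = data[12:]  # TSize - 12 bytes of file type specific data
    if is_device:
        if len(var) != 8:
            raise ValueError("device entries store exactly eight bytes")
        # Two 4-byte little endian values: major first, then minor.
        info["major"], info["minor"] = struct.unpack("<II", var)
    elif var:
        # Link target name; it is not NULL terminated.
        info["link_target"] = var
    return info
```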
+ \r
+ -PATCH Descriptor Extra Field (0x000f):\r
+\r
+ The following is the layout of the Patch Descriptor "extra"\r
+ block.\r
+\r
+ Note: all fields stored in Intel low-byte/high-byte order.\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ (Patch) 0x000f 2 bytes Tag for this "extra" block type\r
+ TSize 2 bytes Size of the total "extra" block\r
+ Version 2 bytes Version of the descriptor\r
+ Flags 4 bytes Actions and reactions (see below) \r
+ OldSize 4 bytes Size of the file about to be patched \r
+ OldCRC 4 bytes 32-bit CRC of the file to be patched \r
+ NewSize 4 bytes Size of the resulting file \r
+ NewCRC 4 bytes 32-bit CRC of the resulting file \r
+\r
+ Actions and reactions\r
+\r
+ Bits Description\r
+ ---- ----------------\r
+ 0 Use for auto detection\r
+ 1 Treat as a self-patch\r
+ 2-3 RESERVED\r
+ 4-5 Action (see below)\r
+ 6-7 RESERVED\r
+ 8-9 Reaction (see below) to absent file \r
+ 10-11 Reaction (see below) to newer file\r
+ 12-13 Reaction (see below) to unknown file\r
+ 14-15 RESERVED\r
+ 16-31 RESERVED\r
+\r
+ Actions\r
+\r
+ Action Value\r
+ ------ ----- \r
+ none 0\r
+ add 1\r
+ delete 2\r
+ patch 3\r
+\r
+ Reactions\r
+ \r
+ Reaction Value\r
+ -------- -----\r
+ ask 0\r
+ skip 1\r
+ ignore 2\r
+ fail 3\r
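
As an illustration of the bit layout of the Flags field, the sketch below unpacks the action and reaction values. It assumes bit 0 is the least significant bit of the 4-byte value; `decode_patch_flags` is a hypothetical name.

```python
ACTIONS = ("none", "add", "delete", "patch")
REACTIONS = ("ask", "skip", "ignore", "fail")

def decode_patch_flags(flags: int) -> dict:
    """Split the Patch Descriptor Flags value into its documented parts."""
    return {
        "auto_detect": bool(flags & 0x1),          # bit 0
        "self_patch": bool(flags & 0x2),           # bit 1
        "action": ACTIONS[(flags >> 4) & 0x3],     # bits 4-5
        "absent": REACTIONS[(flags >> 8) & 0x3],   # bits 8-9
        "newer": REACTIONS[(flags >> 10) & 0x3],   # bits 10-11
        "unknown": REACTIONS[(flags >> 12) & 0x3], # bits 12-13
    }
```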
+\r
+ Patch support is provided by PKPatchMaker(tm) technology and is \r
+ covered under U.S. Patents and Patents Pending. The use or \r
+ implementation in a product of certain technological aspects set\r
+ forth in the current APPNOTE, including those with regard to \r
+ strong encryption, patching, or extended tape operations requires\r
+ a license from PKWARE. Please contact PKWARE with regard to \r
+ acquiring a license. \r
+\r
+ -PKCS#7 Store for X.509 Certificates (0x0014):\r
+\r
+ This field contains information about each of the certificates \r
+ that files may be signed with. When the Central Directory Encryption \r
+ feature is enabled for a ZIP file, this record will appear in \r
+ the Archive Extra Data Record, otherwise it will appear in the \r
+ first central directory record and will be ignored in any \r
+ other record.\r
+ \r
+ Note: all fields stored in Intel low-byte/high-byte order.\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ (Store) 0x0014 2 bytes Tag for this "extra" block type\r
+ TSize 2 bytes Size of the store data\r
+ TData TSize Data about the store\r
+\r
+\r
+ -X.509 Certificate ID and Signature for individual file (0x0015):\r
+\r
+ This field contains the information about which certificate in \r
+ the PKCS#7 store was used to sign a particular file. It also \r
+ contains the signature data. This field can appear multiple \r
+ times, but can only appear once per certificate.\r
+\r
+ Note: all fields stored in Intel low-byte/high-byte order.\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ (CID) 0x0015 2 bytes Tag for this "extra" block type\r
+ TSize 2 bytes Size of data that follows\r
+ TData TSize Signature Data\r
+\r
+ -X.509 Certificate ID and Signature for central directory (0x0016):\r
+\r
+ This field contains the information about which certificate in \r
+ the PKCS#7 store was used to sign the central directory structure.\r
+ When the Central Directory Encryption feature is enabled for a \r
+ ZIP file, this record will appear in the Archive Extra Data Record, \r
+ otherwise it will appear in the first central directory record.\r
+\r
+ Note: all fields stored in Intel low-byte/high-byte order.\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ (CDID) 0x0016 2 bytes Tag for this "extra" block type\r
+ TSize 2 bytes Size of data that follows\r
+ TData TSize Data\r
+\r
+ -Strong Encryption Header (0x0017):\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ 0x0017 2 bytes Tag for this "extra" block type\r
+ TSize 2 bytes Size of data that follows\r
+ Format 2 bytes Format definition for this record\r
+ AlgID 2 bytes Encryption algorithm identifier\r
+ Bitlen 2 bytes Bit length of encryption key\r
+ Flags 2 bytes Processing flags\r
+ CertData TSize-8 Certificate decryption extra field data\r
+ (refer to the explanation for CertData\r
+ in the section describing the \r
+ Certificate Processing Method under \r
+ the Strong Encryption Specification)\r
+\r
+\r
+ -Record Management Controls (0x0018):\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+(Rec-CTL) 0x0018 2 bytes Tag for this "extra" block type\r
+ CSize 2 bytes Size of total extra block data\r
+ Tag1 2 bytes Record control attribute 1\r
+ Size1 2 bytes Size of attribute 1, in bytes\r
+ Data1 Size1 Attribute 1 data\r
+ .\r
+ .\r
+ .\r
+ TagN 2 bytes Record control attribute N\r
+ SizeN 2 bytes Size of attribute N, in bytes\r
+ DataN SizeN Attribute N data\r
+\r
+\r
+ -PKCS#7 Encryption Recipient Certificate List (0x0019): \r
+\r
+ This field contains information about each of the certificates\r
+ used in encryption processing and it can be used to identify who is\r
+ allowed to decrypt encrypted files. This field should only appear \r
+ in the archive extra data record. This field is not required and \r
+ serves only to aid archive modifications by preserving public \r
+ encryption key data. Individual security requirements may dictate \r
+ that this data be omitted to deter information exposure.\r
+\r
+ Note: all fields stored in Intel low-byte/high-byte order.\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ (CStore) 0x0019 2 bytes Tag for this "extra" block type\r
+ TSize 2 bytes Size of the store data\r
+ TData TSize Data about the store\r
+\r
+ TData:\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ Version 2 bytes Format version number - must be 0x0001 at this time\r
+ CStore (var) PKCS#7 data blob\r
+\r
+\r
+ -MVS Extra Field (0x0065):\r
+\r
+ The following is the layout of the MVS "extra" block.\r
+ Note: Some fields are stored in Big Endian format.\r
+ All text is in EBCDIC format unless otherwise specified.\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ (MVS) 0x0065 2 bytes Tag for this "extra" block type\r
+ TSize 2 bytes Size for the following data block\r
+ ID 4 bytes EBCDIC "Z390" 0xE9F3F9F0 or\r
+ "T4MV" for TargetFour\r
+ (var) TSize-4 Attribute data (see APPENDIX B)\r
+\r
+\r
+ -OS/400 Extra Field (0x0065):\r
+\r
+ The following is the layout of the OS/400 "extra" block.\r
+ Note: Some fields are stored in Big Endian format.\r
+ All text is in EBCDIC format unless otherwise specified.\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ (OS400) 0x0065 2 bytes Tag for this "extra" block type\r
+ TSize 2 bytes Size for the following data block\r
+ ID 4 bytes EBCDIC "I400" 0xC9F4F0F0 or\r
+ "T4MV" for TargetFour\r
+ (var) TSize-4 Attribute data (see APPENDIX A)\r
+\r
+\r
+ Third-party Mappings:\r
+ \r
+ -ZipIt Macintosh Extra Field (long) (0x2605):\r
+\r
+ The following is the layout of the ZipIt extra block \r
+ for Macintosh. The local-header and central-header versions \r
+ are identical. This block must be present if the file is \r
+ stored MacBinary-encoded and it should not be used if the file \r
+ is not stored MacBinary-encoded.\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ (Mac2) 0x2605 Short tag for this extra block type\r
+ TSize Short total data size for this block\r
+ "ZPIT" beLong extra-field signature\r
+ FnLen Byte length of FileName\r
+ FileName variable full Macintosh filename\r
+ FileType Byte[4] four-byte Mac file type string\r
+ Creator Byte[4] four-byte Mac creator string\r
+\r
+\r
+ -ZipIt Macintosh Extra Field (short, for files) (0x2705):\r
+\r
+ The following is the layout of a shortened variant of the\r
+ ZipIt extra block for Macintosh (without "full name" entry).\r
+ This variant is used by ZipIt 1.3.5 and newer for entries of\r
+ files (not directories) that do not have a MacBinary encoded\r
+ file. The local-header and central-header versions are identical.\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ (Mac2b) 0x2705 Short tag for this extra block type\r
+ TSize Short total data size for this block (12)\r
+ "ZPIT" beLong extra-field signature\r
+ FileType Byte[4] four-byte Mac file type string\r
+ Creator Byte[4] four-byte Mac creator string\r
+ fdFlags beShort attributes from FInfo.fdFlags,\r
+ may be omitted\r
+ 0x0000 beShort reserved, may be omitted\r
+\r
+\r
+ -ZipIt Macintosh Extra Field (short, for directories) (0x2805):\r
+\r
+ The following is the layout of a shortened variant of the\r
+ ZipIt extra block for Macintosh used only for directory\r
+ entries. This variant is used by ZipIt 1.3.5 and newer to \r
+ save some optional Mac-specific information about directories.\r
+ The local-header and central-header versions are identical.\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ (Mac2c) 0x2805 Short tag for this extra block type\r
+ TSize Short total data size for this block (12)\r
+ "ZPIT" beLong extra-field signature\r
+ frFlags beShort attributes from DInfo.frFlags, may\r
+ be omitted\r
+ View beShort ZipIt view flag, may be omitted\r
+\r
+\r
+ The View field specifies ZipIt-internal settings as follows:\r
+\r
+ Bits of the Flags:\r
+ bit 0 if set, the folder is shown expanded (open)\r
+ when the archive contents are viewed in ZipIt.\r
+ bits 1-15 reserved, zero;\r
+\r
+\r
+ -FWKCS MD5 Extra Field (0x4b46):\r
+\r
+ The FWKCS Contents_Signature System, used in\r
+ automatically identifying files independent of file name,\r
+ optionally adds and uses an extra field to support the\r
+ rapid creation of an enhanced contents_signature:\r
+\r
+ Header ID = 0x4b46\r
+ Data Size = 0x0013\r
+ Preface = 'M','D','5'\r
+ followed by 16 bytes containing the uncompressed file's\r
+ 128_bit MD5 hash(1), low byte first.\r
+\r
+ When FWKCS revises a .ZIP file central directory to add\r
+ this extra field for a file, it also replaces the\r
+ central directory entry for that file's uncompressed\r
+ file length with a measured value.\r
+\r
+ FWKCS provides an option to strip this extra field, if\r
+ present, from a .ZIP file central directory. In adding\r
+ this extra field, FWKCS preserves .ZIP file Authenticity\r
+ Verification; if stripping this extra field, FWKCS\r
+ preserves all versions of AV through PKZIP version 2.04g.\r
+\r
+ FWKCS, and FWKCS Contents_Signature System, are\r
+ trademarks of Frederick W. Kantor.\r
+\r
+ (1) R. Rivest, RFC1321.TXT, MIT Laboratory for Computer\r
+ Science and RSA Data Security, Inc., April 1992.\r
+ ll.76-77: "The MD5 algorithm is being placed in the\r
+ public domain for review and possible adoption as a\r
+ standard."\r
+\r
+\r
+ -Info-ZIP Unicode Comment Extra Field (0x6375):\r
+\r
+ Stores the UTF-8 version of the file comment as stored in the\r
+ central directory header. (Last Revision 20070912)\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ (UCom) 0x6375 Short tag for this extra block type ("uc")\r
+ TSize Short total data size for this block\r
+ Version 1 byte version of this extra field, currently 1\r
+ ComCRC32 4 bytes Comment Field CRC32 Checksum\r
+ UnicodeCom Variable UTF-8 version of the entry comment\r
+\r
+ Currently Version is set to the number 1. If there is a need\r
+ to change this field, the version will be incremented. Changes\r
+ may not be backward compatible so this extra field should not be\r
+ used if the version is not recognized.\r
+\r
+ The ComCRC32 is the standard zip CRC32 checksum of the File Comment\r
+ field in the central directory header. This is used to verify that\r
+ the comment field has not changed since the Unicode Comment extra field\r
+ was created. This can happen if a utility changes the File Comment \r
+ field but does not update the UTF-8 Comment extra field. If the CRC \r
+ check fails, this Unicode Comment extra field should be ignored and \r
+ the File Comment field in the header should be used instead.\r
+\r
+ The UnicodeCom field is the UTF-8 version of the File Comment field\r
+ in the header. As UnicodeCom is defined to be UTF-8, no UTF-8 byte\r
+ order mark (BOM) is used. The length of this field is determined by\r
+ subtracting the size of the previous fields from TSize. If both the\r
+ File Name and Comment fields are UTF-8, the new General Purpose Bit\r
+ Flag, bit 11 (Language encoding flag (EFS)), can be used to indicate\r
+ both the header File Name and Comment fields are UTF-8 and, in this\r
+ case, the Unicode Path and Unicode Comment extra fields are not\r
+ needed and should not be created. Note that, for backward\r
+ compatibility, bit 11 should only be used if the native character set\r
+ of the paths and comments being zipped up is already UTF-8. It is\r
+ expected that the same file comment storage method, either general\r
+ purpose bit 11 or extra fields, be used in both the Local and Central\r
+ Directory Header for a file.\r
+\r
+\r
+ -Info-ZIP Unicode Path Extra Field (0x7075):\r
+\r
+ Stores the UTF-8 version of the file name field as stored in the\r
+ local header and central directory header. (Last Revision 20070912)\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ (UPath) 0x7075 Short tag for this extra block type ("up")\r
+ TSize Short total data size for this block\r
+ Version 1 byte version of this extra field, currently 1\r
+ NameCRC32 4 bytes File Name Field CRC32 Checksum\r
+ UnicodeName Variable UTF-8 version of the entry File Name\r
+\r
+ Currently Version is set to the number 1. If there is a need\r
+ to change this field, the version will be incremented. Changes\r
+ may not be backward compatible so this extra field should not be\r
+ used if the version is not recognized.\r
+\r
+ The NameCRC32 is the standard zip CRC32 checksum of the File Name\r
+ field in the header. This is used to verify that the header\r
+ File Name field has not changed since the Unicode Path extra field\r
+ was created. This can happen if a utility renames the File Name but\r
+ does not update the UTF-8 path extra field. If the CRC check fails,\r
+ this UTF-8 Path Extra Field should be ignored and the File Name field\r
+ in the header should be used instead.\r
+\r
+ The UnicodeName is the UTF-8 version of the contents of the File Name\r
+ field in the header. As UnicodeName is defined to be UTF-8, no UTF-8\r
+ byte order mark (BOM) is used. The length of this field is determined\r
+ by subtracting the size of the previous fields from TSize. If both\r
+ the File Name and Comment fields are UTF-8, the new General Purpose\r
+ Bit Flag, bit 11 (Language encoding flag (EFS)), can be used to\r
+ indicate that both the header File Name and Comment fields are UTF-8\r
+ and, in this case, the Unicode Path and Unicode Comment extra fields\r
+ are not needed and should not be created. Note that, for backward\r
+ compatibility, bit 11 should only be used if the native character set\r
+ of the paths and comments being zipped up is already UTF-8. It is\r
+ expected that the same file name storage method, either general\r
+ purpose bit 11 or extra fields, be used in both the Local and Central\r
+ Directory Header for a file.\r
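
The version and CRC checks described for the Unicode Path extra field might be applied as in the sketch below; the Unicode Comment extra field follows the same pattern with the File Comment in place of the File Name. The name `unicode_path` is hypothetical, `data` is assumed to be the bytes that follow TSize, and `zlib.crc32` is used as the standard zip CRC32.

```python
import struct
import zlib

def unicode_path(data: bytes, header_name: bytes):
    """Return the UTF-8 name as str, or None if the field should be ignored."""
    if len(data) < 5:
        return None
    version, name_crc = struct.unpack_from("<BI", data, 0)
    if version != 1:
        # Unrecognized versions may not be backward compatible.
        return None
    if (zlib.crc32(header_name) & 0xFFFFFFFF) != name_crc:
        # The header File Name changed after this field was written.
        return None
    try:
        # The remaining TSize - 5 bytes are the UTF-8 name (no BOM).
        return data[5:].decode("utf-8")
    except UnicodeDecodeError:
        return None
```

When the function returns None, the caller falls back to the File Name field in the header, as the text above requires.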
+ \r
+\r
+ -Microsoft Open Packaging Growth Hint (0xa220):\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ 0xa220 Short tag for this extra block type\r
+ TSize Short size of Sig + PadVal + Padding\r
+ Sig Short verification signature (A028)\r
+ PadVal Short Initial padding value\r
+ Padding variable filled with NULL characters\r
+\r
+\r
+ file comment: (Variable)\r
+\r
+ The comment for this file.\r
+\r
+ number of this disk: (2 bytes)\r
+\r
+ The number of this disk, which contains the end of central\r
+ directory record. If an archive is in ZIP64 format\r
+ and the value in this field is 0xFFFF, the actual value will \r
+ be in the corresponding 4 byte zip64 end of central \r
+ directory field.\r
+\r
+\r
+ number of the disk with the start of the central\r
+ directory: (2 bytes)\r
+\r
+ The number of the disk on which the central\r
+ directory starts. If an archive is in ZIP64 format\r
+ and the value in this field is 0xFFFF, the actual value will \r
+ be in the corresponding 4 byte zip64 end of central \r
+ directory field.\r
+\r
+ total number of entries in the central dir on \r
+ this disk: (2 bytes)\r
+\r
+ The number of central directory entries on this disk.\r
+ If an archive is in ZIP64 format and the value in \r
+ this field is 0xFFFF, the actual count will be in the \r
+ corresponding 8 byte zip64 end of central \r
+ directory field.\r
+\r
+ total number of entries in the central dir: (2 bytes)\r
+\r
+ The total number of files in the .ZIP file. If an \r
+ archive is in ZIP64 format and the value in this field\r
+ is 0xFFFF, the actual count will be in the corresponding 8 byte \r
+ zip64 end of central directory field.\r
+\r
+ size of the central directory: (4 bytes)\r
+\r
+ The size (in bytes) of the entire central directory.\r
+ If an archive is in ZIP64 format and the value in \r
+ this field is 0xFFFFFFFF, the size will be in the \r
+ corresponding 8 byte zip64 end of central \r
+ directory field.\r
+\r
+ offset of start of central directory with respect to\r
+ the starting disk number: (4 bytes)\r
+\r
+ Offset of the start of the central directory on the\r
+ disk on which the central directory starts. If an \r
+ archive is in ZIP64 format and the value in this \r
+ field is 0xFFFFFFFF, the actual offset will be in the \r
+ corresponding 8 byte zip64 end of central \r
+ directory field.\r
+\r
+ .ZIP file comment length: (2 bytes)\r
+\r
+ The length of the comment for this .ZIP file.\r
+\r
+ .ZIP file comment: (Variable)\r
+\r
+ The comment for this .ZIP file. ZIP file comment data\r
+ is stored unsecured. No encryption or data authentication\r
+ is applied to this area at this time. Confidential information\r
+ should not be stored in this section.\r
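
The end of central directory fields listed above, together with their ZIP64 sentinel values, can be read as in this sketch. It assumes `body` holds the bytes that follow the record's 4-byte signature and that the fields appear in the order described; `parse_eocd_body` and the dictionary keys are hypothetical names.

```python
import struct

# (name, sentinel) pairs in record order; a sentinel means "see the zip64 record".
EOCD_FIELDS = (
    ("disk", 0xFFFF),
    ("cd_start_disk", 0xFFFF),
    ("entries_on_disk", 0xFFFF),
    ("entries_total", 0xFFFF),
    ("cd_size", 0xFFFFFFFF),
    ("cd_offset", 0xFFFFFFFF),
)

def parse_eocd_body(body: bytes) -> dict:
    """Decode the fixed fields and comment of an end of central directory record."""
    if len(body) < 18:
        raise ValueError("end of central directory record is truncated")
    values = struct.unpack_from("<HHHHIIH", body, 0)
    record = {name: value for (name, _), value in zip(EOCD_FIELDS, values)}
    comment_len = values[6]
    record["comment"] = body[18:18 + comment_len]
    # Fields holding the sentinel must be read from the zip64 record instead.
    record["zip64_fields"] = [
        name for (name, sentinel), value in zip(EOCD_FIELDS, values)
        if value == sentinel
    ]
    return record
```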
+\r
+ zip64 extensible data sector (variable size)\r
+\r
+ (currently reserved for use by PKWARE)\r
+\r
+\r
+ K. Splitting and Spanning ZIP files\r
+\r
+ Spanning is the process of segmenting a ZIP file across \r
+ multiple removable media. This support has typically only \r
+ been provided for DOS formatted floppy diskettes. \r
+\r
+ File splitting is a newer derivative of spanning. \r
+ Splitting follows the same segmentation process as\r
+ spanning; however, it does not require writing each\r
+ segment to a unique removable medium and instead supports\r
+ placing all pieces onto local or non-removable locations\r
+ such as file systems, local drives, folders, etc.\r
+\r
+ A key difference between spanned and split ZIP files is\r
+ that all pieces of a spanned ZIP file have the same name. \r
+ Since each piece is written to a separate volume, no name \r
+ collisions occur and each segment can reuse the original \r
+ .ZIP file name given to the archive.\r
+\r
+ Sequence ordering for DOS spanned archives uses the DOS \r
+ volume label to determine segment numbers. Volume labels\r
+ for each segment are written using the form PKBACK#xxx, \r
+ where xxx is the segment number written as a decimal \r
+ value from 001 - nnn.\r
+\r
+ Split ZIP files are typically written to the same location\r
+ and are subject to name collisions if the spanned name\r
+ format is used since each segment will reside on the same \r
+ drive. To avoid name collisions, split archives are named \r
+ as follows.\r
+\r
+ Segment 1 = filename.z01\r
+ Segment n-1 = filename.z(n-1)\r
+ Segment n = filename.zip\r
+\r
+ The .ZIP extension is used on the last segment to support\r
+ quickly reading the central directory. The segment number\r
+ n should be a decimal value.\r
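
A small sketch of the naming scheme for split archives. The two-digit zero padding is inferred from the "z01" example above and is an assumption for segment numbers the text does not show; `segment_names` is a hypothetical helper.

```python
def segment_names(base: str, count: int) -> list:
    """Return the file names for a split archive with `count` segments."""
    if count < 1:
        raise ValueError("an archive has at least one segment")
    # Segments 1 .. n-1 use .zNN with a decimal segment number.
    names = [f"{base}.z{number:02d}" for number in range(1, count)]
    # The last segment keeps the .zip extension so the central
    # directory can be located quickly.
    names.append(f"{base}.zip")
    return names
```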
+\r
+ Spanned ZIP files may be PKSFX Self-extracting ZIP files.\r
+ PKSFX files may also be split; however, in this case\r
+ the first segment must be named filename.exe. The first\r
+ segment of a split PKSFX archive must be large enough to\r
+ include the entire executable program.\r
+\r
+ Capacities for split archives are as follows.\r
+\r
+ Maximum number of segments = 4,294,967,295 - 1\r
+ Maximum .ZIP segment size = 4,294,967,295 bytes\r
+ Minimum segment size = 64K\r
+ Maximum PKSFX segment size = 2,147,483,647 bytes\r
+ \r
+ Segment sizes may differ; however, by convention all \r
+ segment sizes should be the same with the exception of the \r
+ last, which may be smaller. Local and central directory \r
+ header records must never be split across a segment boundary. \r
+ When writing a header record, if the number of bytes remaining \r
+ within a segment is less than the size of the header record,\r
+ end the current segment and write the header at the start\r
+ of the next segment. The central directory may span segment\r
+ boundaries, but no single record in the central directory\r
+ should be split across segments.\r
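
The rule that a header record must never straddle a segment boundary could be applied by a writer roughly as follows. This is a sketch under the assumption that the writer tracks the current segment index and the write position within it; `place_header` is a hypothetical name.

```python
def place_header(segment: int, position: int, header_size: int, segment_size: int):
    """Return (segment, offset) at which a header record should be written."""
    if header_size > segment_size:
        raise ValueError("header record cannot fit in any segment")
    remaining = segment_size - position
    if remaining < header_size:
        # Not enough room: end this segment and start the next one.
        return segment + 1, 0
    return segment, position
```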
+\r
+ Spanned/Split archives created using PKZIP for Windows\r
+ (V2.50 or greater), PKZIP Command Line (V2.50 or greater),\r
+ or PKZIP Explorer will include a special spanning \r
+ signature as the first 4 bytes of the first segment of\r
+ the archive. This signature (0x08074b50) will be \r
+ followed immediately by the local header signature for\r
+ the first file in the archive. \r
+\r
+ A special spanning marker may also appear in spanned/split \r
+ archives if the spanning or splitting process starts but \r
+ only requires one segment. In this case the 0x08074b50 \r
+ signature will be replaced with the temporary spanning \r
+ marker signature of 0x30304b50. Split archives can\r
+ only be uncompressed by other versions of PKZIP that\r
+ know how to create a split archive.\r
+\r
+ The signature value 0x08074b50 is also used by some\r
+ ZIP implementations as a marker for the Data Descriptor \r
+ record. Conflicts arising from this alternate assignment can be\r
+ avoided by checking the position of the signature\r
+ within the ZIP file to determine the use for which it\r
+ is intended. \r
+\r
+ L. General notes:\r
+\r
+ 1) All fields unless otherwise noted are unsigned and stored\r
+ in Intel low-byte:high-byte, low-word:high-word order.\r
+\r
+ 2) String fields are not null terminated, since the\r
+ length is given explicitly.\r
+\r
+ 3) The entries in the central directory may not necessarily\r
+ be in the same order that files appear in the .ZIP file.\r
+\r
+ 4) If one of the fields in the end of central directory\r
+ record is too small to hold required data, the field\r
+ should be set to -1 (0xFFFF or 0xFFFFFFFF) and the\r
+ ZIP64 format record should be created.\r
+\r
+ 5) The end of central directory record and the\r
+ Zip64 end of central directory locator record must\r
+ reside on the same disk when splitting or spanning\r
+ an archive.\r
+\r
+VI. Explanation of compression methods\r
+--------------------------------------\r
+\r
+UnShrinking - Method 1\r
+----------------------\r
+\r
+Shrinking is a Dynamic Ziv-Lempel-Welch compression algorithm\r
+with partial clearing. The initial code size is 9 bits, and\r
+the maximum code size is 13 bits. Shrinking differs from\r
+conventional Dynamic Ziv-Lempel-Welch implementations in several\r
+respects:\r
+\r
+1) The code size is controlled by the compressor, and is not\r
+ automatically increased when codes larger than the current\r
+ code size are created (but not necessarily used). When\r
+ the decompressor encounters the code sequence 256\r
+ (decimal) followed by 1, it should increase the code size\r
+ read from the input stream to the next bit size. No\r
+ blocking of the codes is performed, so the next code at\r
+ the increased size should be read from the input stream\r
+ immediately after where the previous code at the smaller\r
+ bit size was read. Again, the decompressor should not\r
+ increase the code size used until the sequence 256,1 is\r
+ encountered.\r
+\r
+2) When the table becomes full, total clearing is not\r
+ performed. Rather, when the compressor emits the code\r
+ sequence 256,2 (decimal), the decompressor should clear\r
+ all leaf nodes from the Ziv-Lempel tree, and continue to\r
+ use the current code size. The nodes that are cleared\r
+ from the Ziv-Lempel tree are then re-used, with the lowest\r
+ code value re-used first, and the highest code value\r
+ re-used last. The compressor can emit the sequence 256,2\r
+ at any time.\r
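
The partial clearing step can be sketched as below. The table representation is an assumption made for illustration: `parent` maps each assigned code (taken here to start at 257, after the 256 literal codes and the control code 256) to its prefix code. `partial_clear` is a hypothetical name.

```python
def partial_clear(parent: dict) -> list:
    """Remove all leaf nodes from the table and return the freed codes.

    A leaf is an assigned code that is not the prefix of any other code.
    The freed codes are returned lowest first, which is the order in
    which they are to be re-used.
    """
    has_child = set(parent.values())
    leaves = sorted(code for code in parent if code not in has_child)
    for code in leaves:
        del parent[code]
    return leaves
```

Leaves are determined once, before any entry is removed, so a node that only becomes childless as a result of the clearing stays in the table.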
+\r
+Expanding - Methods 2-5\r
+-----------------------\r
+\r
+The Reducing algorithm is actually a combination of two\r
+distinct algorithms. The first algorithm compresses repeated\r
+byte sequences, and the second algorithm takes the compressed\r
+stream from the first algorithm and applies a probabilistic\r
+compression method.\r
+\r
+The probabilistic compression stores an array of 'follower\r
+sets' S(j), for j=0 to 255, corresponding to each possible\r
+ASCII character. Each set contains between 0 and 32\r
+characters, to be denoted as S(j)[0],...,S(j)[m], where m<32.\r
+The sets are stored at the beginning of the data area for a\r
+Reduced file, in reverse order, with S(255) first, and S(0)\r
+last.\r
+\r
+The sets are encoded as { N(j), S(j)[0],...,S(j)[N(j)-1] },\r
+where N(j) is the size of set S(j). N(j) can be 0, in which\r
+case the follower set S(j) is empty. Each N(j) value is\r
+encoded in 6 bits, followed by N(j) eight bit character values\r
+corresponding to S(j)[0] to S(j)[N(j)-1] respectively. If\r
+N(j) is 0, then no values for S(j) are stored, and the value\r
+for N(j-1) immediately follows.\r
+\r
+Immediately after the follower sets is the compressed data\r
+stream. The compressed data stream can be interpreted for the\r
+probabilistic decompression as follows:\r
+\r
+let Last-Character <- 0.\r
+loop until done\r
+ if the follower set S(Last-Character) is empty then\r
+ read 8 bits from the input stream, and copy this\r
+ value to the output stream.\r
+ otherwise if the follower set S(Last-Character) is non-empty then\r
+ read 1 bit from the input stream.\r
+ if this bit is not zero then\r
+ read 8 bits from the input stream, and copy this\r
+ value to the output stream.\r
+ otherwise if this bit is zero then\r
+ read B(N(Last-Character)) bits from the input\r
+ stream, and assign this value to I.\r
+ Copy the value of S(Last-Character)[I] to the\r
+ output stream.\r
+\r
+ assign the last value placed on the output stream to\r
+ Last-Character.\r
+end loop\r
+\r
+B(N(j)) is defined as the minimal number of bits required to\r
+encode the value N(j)-1.\r
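
The follower set layout and the probabilistic decompression loop above can be sketched in Python as follows. Several points are assumptions that this text does not state: bits are taken least significant bit first, B(1) is treated as one bit, and the loop stops after a caller-supplied number of output bytes. All names (`BitReader`, `read_follower_sets`, `follower_bits`, `decode_probabilistic`) are hypothetical.

```python
class BitReader:
    """Reads bit fields from bytes, least significant bit first (assumed)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # position in bits

    def read(self, count: int) -> int:
        value = 0
        for i in range(count):
            byte_index, bit_index = divmod(self.pos, 8)
            if byte_index >= len(self.data):
                raise EOFError("input stream exhausted")
            value |= ((self.data[byte_index] >> bit_index) & 1) << i
            self.pos += 1
        return value

def read_follower_sets(bits: BitReader) -> list:
    """Read S(255) down to S(0): a 6-bit N(j), then N(j) 8-bit values."""
    sets = [[] for _ in range(256)]
    for j in range(255, -1, -1):
        n = bits.read(6)
        sets[j] = [bits.read(8) for _ in range(n)]
    return sets

def follower_bits(n: int) -> int:
    """B(N): bits needed to encode N-1, taken as at least one (assumed)."""
    return max(1, (n - 1).bit_length())

def decode_probabilistic(bits: BitReader, sets: list, out_len: int) -> bytes:
    """Run the decompression loop until out_len bytes have been produced."""
    out = bytearray()
    last = 0
    while len(out) < out_len:
        followers = sets[last]
        if not followers or bits.read(1):
            value = bits.read(8)  # literal byte
        else:
            index = bits.read(follower_bits(len(followers)))
            if index >= len(followers):
                raise ValueError("follower index out of range")
            value = followers[index]
        out.append(value)
        last = value
    return bytes(out)
```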
+\r
+The decompressed stream from above can then be expanded to\r
+re-create the original file as follows:\r
+\r
+let State <- 0.\r
+\r
+loop until done\r
+ read 8 bits from the input stream into C.\r
+ case State of\r
+ 0: if C is not equal to DLE (144 decimal) then\r
+ copy C to the output stream.\r
+ otherwise if C is equal to DLE then\r
+ let State <- 1.\r
+\r
+ 1: if C is non-zero then\r
+ let V <- C.\r
+ let Len <- L(V)\r
+ let State <- F(Len).\r
+ otherwise if C is zero then\r
+ copy the value 144 (decimal) to the output stream.\r
+ let State <- 0\r
+\r
+ 2: let Len <- Len + C\r
+ let State <- 3.\r
+\r
+ 3: move backwards D(V,C) bytes in the output stream\r
+ (if this position is before the start of the output\r
+ stream, then assume that all the data before the\r
+ start of the output stream is filled with zeros).\r
+ copy Len+3 bytes from this position to the output stream.\r
+ let State <- 0.\r
+ end case\r
+end loop\r
+\r
+The functions F, L, and D are dependent on the 'compression\r
+factor', 1 through 4, and are defined as follows:\r
+\r
+For compression factor 1:\r
+ L(X) equals the lower 7 bits of X.\r
+ F(X) equals 2 if X equals 127 otherwise F(X) equals 3.\r
+ D(X,Y) equals the (upper 1 bit of X) * 256 + Y + 1.\r
+For compression factor 2:\r
+ L(X) equals the lower 6 bits of X.\r
+ F(X) equals 2 if X equals 63 otherwise F(X) equals 3.\r
+ D(X,Y) equals the (upper 2 bits of X) * 256 + Y + 1.\r
+For compression factor 3:\r
+ L(X) equals the lower 5 bits of X.\r
+ F(X) equals 2 if X equals 31 otherwise F(X) equals 3.\r
+ D(X,Y) equals the (upper 3 bits of X) * 256 + Y + 1.\r
+For compression factor 4:\r
+ L(X) equals the lower 4 bits of X.\r
+ F(X) equals 2 if X equals 15 otherwise F(X) equals 3.\r
+ D(X,Y) equals the (upper 4 bits of X) * 256 + Y + 1.\r
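The expansion state machine and the per-factor definitions of L, F, and D above can be sketched in Python as follows. This is an illustrative sketch only: the function name `expand_reduced` and its signature are assumptions made for this example, and the input is taken to be the already "probabilistically decompressed" byte stream.

```python
DLE = 144  # escape byte used by the expansion stage


def expand_reduced(data: bytes, factor: int) -> bytes:
    """Expand the second stage of Reducing for compression factor 1..4."""
    if factor not in (1, 2, 3, 4):
        raise ValueError("compression factor must be 1 through 4")
    low_bits = 8 - factor              # L(X) keeps the lower 7, 6, 5 or 4 bits
    len_mask = (1 << low_bits) - 1     # also the value for which F(X) equals 2
    out = bytearray()
    state = 0
    v = 0
    length = 0
    for c in data:
        if state == 0:
            if c != DLE:
                out.append(c)
            else:
                state = 1
        elif state == 1:
            if c != 0:
                v = c
                length = v & len_mask                    # Len <- L(V)
                state = 2 if length == len_mask else 3   # State <- F(Len)
            else:
                out.append(DLE)                          # escaped literal 144
                state = 0
        elif state == 2:
            length += c
            state = 3
        else:
            distance = (v >> low_bits) * 256 + c + 1     # D(V, C)
            start = len(out) - distance
            for k in range(length + 3):
                pos = start + k
                # Data before the start of the output is treated as zeros.
                out.append(out[pos] if pos >= 0 else 0)
            state = 0
    return bytes(out)
```

For example, with factor 4 the input `b"abc"` followed by the bytes 144, 1, 2 copies 4 bytes from 3 positions back, giving `b"abcabca"`.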
+\r
+Imploding - Method 6\r
+--------------------\r
+\r
+The Imploding algorithm is actually a combination of two distinct\r
+algorithms. The first algorithm compresses repeated byte\r
+sequences using a sliding dictionary. The second algorithm is\r
+used to compress the encoding of the sliding dictionary output,\r
+using multiple Shannon-Fano trees.\r
+\r
+The Imploding algorithm can use a 4K or 8K sliding dictionary\r
+size. The dictionary size used can be determined by bit 1 in the\r
+general purpose flag word; a 0 bit indicates a 4K dictionary\r
+while a 1 bit indicates an 8K dictionary.\r
+\r
+The Shannon-Fano trees are stored at the start of the compressed\r
+file. The number of trees stored is defined by bit 2 in the\r
+general purpose flag word; a 0 bit indicates two trees stored, a\r
+1 bit indicates three trees are stored. If 3 trees are stored,\r
+the first Shannon-Fano tree represents the encoding of the\r
+Literal characters, the second tree represents the encoding of\r
+the Length information, the third represents the encoding of the\r
+Distance information. When 2 Shannon-Fano trees are stored, the\r
+Length tree is stored first, followed by the Distance tree.\r
+\r
+The Literal Shannon-Fano tree, if present, is used to represent\r
+the entire ASCII character set, and contains 256 values. This\r
+tree is used to compress any data not compressed by the sliding\r
+dictionary algorithm. When this tree is present, the Minimum\r
+Match Length for the sliding dictionary is 3. If this tree is\r
+not present, the Minimum Match Length is 2.\r
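As a small sketch of how the two flag bits described above drive the Imploding parameters, the helper below derives the dictionary size, tree count, and Minimum Match Length from the general purpose flag word. The function name and the returned dictionary keys are hypothetical, and bit 0 is assumed to be the least significant bit.

```python
def implode_parameters(general_purpose_flags: int) -> dict:
    """Derive Imploding parameters from the general purpose flag word."""
    dict_8k = bool(general_purpose_flags & 0x0002)      # bit 1
    three_trees = bool(general_purpose_flags & 0x0004)  # bit 2
    return {
        "dictionary_size": 8192 if dict_8k else 4096,
        "tree_count": 3 if three_trees else 2,
        # The Literal tree exists only when three trees are stored.
        "has_literal_tree": three_trees,
        "min_match_length": 3 if three_trees else 2,
    }
```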
+\r
+The Length Shannon-Fano tree is used to compress the Length part\r
+of the (length,distance) pairs from the sliding dictionary\r
+output. The Length tree contains 64 values, ranging from the\r
+Minimum Match Length, to 63 plus the Minimum Match Length.\r
+\r
+The Distance Shannon-Fano tree is used to compress the Distance\r
+part of the (length,distance) pairs from the sliding dictionary\r
+output. The Distance tree contains 64 values, ranging from 0 to\r
+63, representing the upper 6 bits of the distance value. The\r
+distance values themselves will be between 0 and the sliding\r
+dictionary size, either 4K or 8K.\r
+\r
+The Shannon-Fano trees themselves are stored in a compressed\r
+format. The first byte of the tree data represents the number of\r
+bytes of data representing the (compressed) Shannon-Fano tree\r
+minus 1. The remaining bytes represent the Shannon-Fano tree\r
+data encoded as:\r
+\r
+ High 4 bits: Number of values at this bit length + 1. (1 - 16)\r
+ Low 4 bits: Bit Length needed to represent value + 1. (1 - 16)\r
+\r
+The Shannon-Fano codes can be constructed from the bit lengths\r
+using the following algorithm:\r
+\r
+1) Sort the Bit Lengths in ascending order, while retaining the\r
+ order of the original lengths stored in the file.\r
+\r
+2) Generate the Shannon-Fano trees:\r
+\r
+    Code <- 0\r
+    CodeIncrement <- 0\r
+    LastBitLength <- 0\r
+    i <- number of Shannon-Fano codes - 1   (either 255 or 63)\r
+\r
+    loop while i >= 0\r
+        Code = Code + CodeIncrement\r
+        if BitLength(i) <> LastBitLength then\r
+            LastBitLength=BitLength(i)\r
+            CodeIncrement = 1 shifted left (16 - LastBitLength)\r
+        ShannonCode(i) = Code\r
+        i <- i - 1\r
+    end loop\r
+\r
+3) Reverse the order of all the bits in the above ShannonCode()\r
+ vector, so that the most significant bit becomes the least\r
+ significant bit. For example, the value 0x1234 (hex) would\r
+ become 0x2C48 (hex).\r
+\r
+4) Restore the order of Shannon-Fano codes as originally stored\r
+ within the file.\r
+\r
+Example:\r
+\r
+ This example will show the encoding of a Shannon-Fano tree\r
+ of size 8. Notice that the actual Shannon-Fano trees used\r
+ for Imploding are either 64 or 256 entries in size.\r
+\r
+Example: 0x02, 0x42, 0x01, 0x13\r
+\r
+ The first byte indicates 3 values in this table. Decoding the\r
+ bytes:\r
+ 0x42 = 5 codes of 3 bits long\r
+ 0x01 = 1 code of 2 bits long\r
+ 0x13 = 2 codes of 4 bits long\r
+\r
+ This would generate the original bit length array of:\r
+ (3, 3, 3, 3, 3, 2, 4, 4)\r
+\r
+ There are 8 codes in this table for the values 0 thru 7. Using \r
+ the algorithm to obtain the Shannon-Fano codes produces:\r
+\r
+                                Reversed  Order     Original\r
+Val  Sorted  Constructed Code   Value     Restored  Length\r
+---  ------  ----------------   --------  --------  --------\r
+0:   2       1100000000000000   11        101       3\r
+1:   3       1010000000000000   101       001       3\r
+2:   3       1000000000000000   001       110       3\r
+3:   3       0110000000000000   110       010       3\r
+4:   3       0100000000000000   010       100       3\r
+5:   3       0010000000000000   100       11        2\r
+6:   4       0001000000000000   1000      1000      4\r
+7:   4       0000000000000000   0000      0000      4\r
+\r
+The values in the Val, Order Restored and Original Length columns\r
+now represent the Shannon-Fano encoding tree that can be used for\r
+decoding the Shannon-Fano encoded data. How to parse the\r
+variable length Shannon-Fano values from the data stream is beyond\r
+the scope of this document. (See the references listed at the end of\r
+this document for more information.) However, traditional decoding\r
+schemes used for Huffman variable length decoding, such as the\r
+Greenlaw algorithm, can be successfully applied.\r
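The four construction steps and the worked example above can be reproduced with the following sketch. The helper names are invented for this illustration, and each returned code is an integer whose low BitLength bits hold the reversed Shannon-Fano code.

```python
def unpack_bit_lengths(tree_data: bytes) -> list:
    """Expand the compressed tree bytes into one bit length per value."""
    byte_count = tree_data[0] + 1            # first byte stores the count minus 1
    lengths = []
    for packed in tree_data[1:1 + byte_count]:
        count = (packed >> 4) + 1            # high 4 bits: number of values - 1
        bit_length = (packed & 0x0F) + 1     # low 4 bits: bit length - 1
        lengths.extend([bit_length] * count)
    return lengths


def reverse_bits16(value: int) -> int:
    """Reverse the order of the 16 bits of value (step 3)."""
    return int(format(value & 0xFFFF, "016b")[::-1], 2)


def build_shannon_fano_codes(lengths: list) -> list:
    """Return the reversed code for each value, in original file order."""
    # Step 1: stable sort by bit length, remembering the original positions.
    order = sorted(range(len(lengths)), key=lambda v: lengths[v])
    codes = [0] * len(lengths)
    code = increment = last_length = 0
    # Step 2: walk the sorted list from the longest length to the shortest.
    for value in reversed(order):
        code += increment
        if lengths[value] != last_length:
            last_length = lengths[value]
            increment = 1 << (16 - last_length)
        # Steps 3 and 4: reverse the bits and store at the original position.
        codes[value] = reverse_bits16(code)
    return codes
```

Applied to the example bytes 0x02, 0x42, 0x01, 0x13, this yields the bit lengths (3, 3, 3, 3, 3, 2, 4, 4) and the codes shown in the Order Restored column of the table.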
+\r
+The compressed data stream begins immediately after the\r
+compressed Shannon-Fano data. The compressed data stream can be\r
+interpreted as follows:\r
+\r
+loop until done\r
+    read 1 bit from input stream.\r
+\r
+    if this bit is non-zero then       (encoded data is literal data)\r
+        if Literal Shannon-Fano tree is present\r
+            read and decode character using Literal Shannon-Fano tree.\r
+        otherwise\r
+            read 8 bits from input stream.\r
+        copy character to the output stream.\r
+    otherwise              (encoded data is sliding dictionary match)\r
+        if 8K dictionary size\r
+            read 7 bits for offset Distance (lower 7 bits of offset).\r
+        otherwise\r
+            read 6 bits for offset Distance (lower 6 bits of offset).\r
+\r
+        using the Distance Shannon-Fano tree, read and decode the\r
+        upper 6 bits of the Distance value.\r
+\r
+        using the Length Shannon-Fano tree, read and decode\r
+        the Length value.\r
+\r
+        Length <- Length + Minimum Match Length\r
+\r
+        if Length = 63 + Minimum Match Length\r
+            read 8 bits from the input stream,\r
+            add this value to Length.\r
+\r
+        move backwards Distance+1 bytes in the output stream, and\r
+        copy Length characters from this position to the output\r
+        stream.  (if this position is before the start of the output\r
+        stream, then assume that all the data before the start of\r
+        the output stream is filled with zeros).\r
+end loop\r
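The match branch of the loop above can be sketched as two small helpers: one that assembles Distance and Length from the already decoded fields, and one that performs the copy with zero fill. Bit-level reading and Shannon-Fano decoding are deliberately left out, and both function names and their parameters are hypothetical.

```python
def match_parameters(low_bits: int, high_bits: int, length_symbol: int,
                     extra_byte: int, dict_8k: bool, has_literal_tree: bool):
    """Combine decoded fields into (Distance, Length) for one match."""
    shift = 7 if dict_8k else 6            # number of raw low-order offset bits
    distance = (high_bits << shift) | low_bits
    min_match = 3 if has_literal_tree else 2
    length = length_symbol + min_match
    if length_symbol == 63:                # Length = 63 + Minimum Match Length
        length += extra_byte               # extra 8 bits read from the input
    return distance, length


def copy_match(out: bytearray, distance: int, length: int) -> None:
    """Move back Distance+1 bytes and copy Length bytes to the output."""
    start = len(out) - (distance + 1)
    for k in range(length):
        pos = start + k
        # Positions before the start of the output read as zero.
        out.append(out[pos] if pos >= 0 else 0)
```

Copying byte by byte lets a match overlap the bytes it is producing, which is how short repeating patterns are expressed.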
+\r
+Tokenizing - Method 7\r
+---------------------\r
+\r
+This method is not used by PKZIP.\r
+\r
+Deflating - Method 8\r
+--------------------\r
+\r
+The Deflate algorithm is similar to the Implode algorithm using\r
+a sliding dictionary of up to 32K with secondary compression\r
+from Huffman/Shannon-Fano codes.\r
+\r
+The compressed data is stored in blocks with a header describing\r
+the block and the Huffman codes used in the data block. The header\r
+format is as follows:\r
+\r
+ Bit 0: Last Block bit This bit is set to 1 if this is the last\r
+ compressed block in the data.\r
+ Bits 1-2: Block type\r
+ 00 (0) - Block is stored - All stored data is byte aligned.\r
+ Skip bits until next byte, then next word = block \r
+ length, followed by the one's complement of the block\r
+ length word. Remaining data in block is the stored \r
+ data.\r
+\r
+ 01 (1) - Use fixed Huffman codes for literal and distance codes.\r
+ Lit Code Bits Dist Code Bits\r
+ --------- ---- --------- ----\r
+ 0 - 143 8 0 - 31 5\r
+ 144 - 255 9\r
+ 256 - 279 7\r
+ 280 - 287 8\r
+\r
+ Literal codes 286-287 and distance codes 30-31 are \r
+ never used but participate in the Huffman construction.\r
+\r
+ 10 (2) - Dynamic Huffman codes. (See expanding Huffman codes)\r
+\r
+ 11 (3) - Reserved - Flag a "Error in compressed data" if seen.\r
+\r
+Expanding Huffman Codes\r
+-----------------------\r
+If the data block is stored with dynamic Huffman codes, the Huffman\r
+codes are sent in the following compressed format:\r
+\r
+ 5 Bits: # of Literal codes sent - 257 (257 - 286)\r
+ All other codes are never sent.\r
+ 5 Bits: # of Dist codes - 1 (1 - 32)\r
+ 4 Bits: # of Bit Length codes - 4 (4 - 19)\r
+\r
+The Huffman codes are sent as bit lengths and the codes are built as\r
+described in the implode algorithm. The bit lengths themselves are\r
+compressed with Huffman codes. There are 19 bit length codes:\r
+\r
+ 0 - 15: Represent bit lengths of 0 - 15\r
+ 16: Copy the previous bit length 3 - 6 times.\r
+ The next 2 bits indicate repeat length (0 = 3, ... ,3 = 6)\r
+ Example: Codes 8, 16 (+2 bits 11), 16 (+2 bits 10) will\r
+ expand to 12 bit lengths of 8 (1 + 6 + 5)\r
+ 17: Repeat a bit length of 0 for 3 - 10 times. (3 bits of length)\r
+ 18: Repeat a bit length of 0 for 11 - 138 times (7 bits of length)\r
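The repeat codes 16, 17, and 18 can be illustrated with the sketch below. It assumes the bit length symbols have already been Huffman-decoded and paired with their extra-bit values, so the input format (a list of `(symbol, extra)` tuples) and the function name are assumptions of this example.

```python
def expand_code_lengths(symbols) -> list:
    """Expand (symbol, extra_bits_value) pairs into a list of bit lengths."""
    lengths = []
    for symbol, extra in symbols:
        if 0 <= symbol <= 15:
            lengths.append(symbol)                        # literal bit length
        elif symbol == 16:
            if not lengths:
                raise ValueError("code 16 has no previous bit length to copy")
            lengths.extend([lengths[-1]] * (3 + extra))   # 2 extra bits: 3 - 6
        elif symbol == 17:
            lengths.extend([0] * (3 + extra))             # 3 extra bits: 3 - 10
        elif symbol == 18:
            lengths.extend([0] * (11 + extra))            # 7 extra bits: 11 - 138
        else:
            raise ValueError("invalid bit length code")
    return lengths
```

The example from the list above, codes 8, 16 (+2 bits 11), 16 (+2 bits 10), corresponds to `[(8, 0), (16, 3), (16, 2)]` and expands to 12 bit lengths of 8.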
+\r
+The lengths of the bit length codes are sent packed 3 bits per value\r
+(0 - 7) in the following order:\r
+\r
+ 16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15\r
+\r
+The Huffman codes should be built as described in the Implode algorithm\r
+except codes are assigned starting at the shortest bit length, i.e. the\r
+shortest code should be all 0's rather than all 1's. Also, codes with\r
+a bit length of zero do not participate in the tree construction. The\r
+codes are then used to decode the bit lengths for the literal and \r
+distance tables.\r
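A minimal sketch of the Deflate code assignment follows. Codes are handed out in order of increasing bit length starting from all zeros, and symbols with a bit length of zero are skipped. It follows the commonly published canonical Huffman procedure for Deflate; the function name and the choice to return codes as bit strings are assumptions made for readability.

```python
def build_huffman_codes(lengths) -> dict:
    """Map each symbol with a non-zero bit length to its code as a bit string."""
    max_len = max(lengths, default=0)
    # Count how many symbols use each bit length (zero lengths are ignored).
    count = [0] * (max_len + 1)
    for length in lengths:
        if length:
            count[length] += 1
    # Compute the first code of every bit length, shortest first.
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + count[bits - 1]) << 1
        next_code[bits] = code
    # Assign consecutive codes to symbols in symbol order within each length.
    codes = {}
    for symbol, length in enumerate(lengths):
        if length:
            codes[symbol] = format(next_code[length], "0{}b".format(length))
            next_code[length] += 1
    return codes
```

For the bit lengths (3, 3, 3, 3, 3, 2, 4, 4) the shortest code is `00`, and for the fixed literal table the first 8-bit code is `00110000`.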
+\r
+The bit lengths for the literal tables are sent first with the number\r
+of entries sent described by the 5 bits sent earlier. There are up\r
+to 286 literal characters; the first 256 represent the respective 8\r
+bit character, code 256 represents the End-Of-Block code, the remaining\r
+29 codes represent copy lengths of 3 thru 258. There are up to 30\r
+distance codes representing distances from 1 thru 32k as described\r
+below.\r
+\r
+ Length Codes\r
+ ------------\r
+      Extra                Extra                Extra                Extra\r
+ Code Bits  Length(s) Code Bits  Length(s) Code Bits  Length(s) Code Bits  Length(s)\r
+ ---- ----  --------- ---- ----  --------- ---- ----  --------- ---- ----  ---------\r
+ 257  0     3         265  1     11,12     273  3     35-42     281  5     131-162\r
+ 258  0     4         266  1     13,14     274  3     43-50     282  5     163-194\r
+ 259  0     5         267  1     15,16     275  3     51-58     283  5     195-226\r
+ 260  0     6         268  1     17,18     276  3     59-66     284  5     227-257\r
+ 261  0     7         269  2     19-22     277  4     67-82     285  0     258\r
+ 262  0     8         270  2     23-26     278  4     83-98\r
+ 263  0     9         271  2     27-30     279  4     99-114\r
+ 264  0     10        272  2     31-34     280  4     115-130\r
+\r
+ Distance Codes\r
+ --------------\r
+      Extra                   Extra                   Extra                   Extra\r
+ Code Bits  Distance     Code Bits  Distance     Code Bits  Distance     Code Bits  Distance\r
+ ---- ----  -----------  ---- ----  -----------  ---- ----  -----------  ---- ----  -----------\r
+ 0    0     1            8    3     17-24        16   7     257-384      24   11    4097-6144\r
+ 1    0     2            9    3     25-32        17   7     385-512      25   11    6145-8192\r
+ 2    0     3            10   4     33-48        18   8     513-768      26   12    8193-12288\r
+ 3    0     4            11   4     49-64        19   8     769-1024     27   12    12289-16384\r
+ 4    1     5,6          12   5     65-96        20   9     1025-1536    28   13    16385-24576\r
+ 5    1     7,8          13   5     97-128       21   9     1537-2048    29   13    24577-32768\r
+ 6    2     9-12         14   6     129-192      22   10    2049-3072\r
+ 7    2     13-16        15   6     193-256      23   10    3073-4096\r
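The two tables above follow a regular pattern, so a decoder can generate them rather than hard-code them. The sketch below builds the base values from the extra-bit counts and resolves a code plus its extra-bit value into a copy length or distance. All names are illustrative.

```python
# Extra bits for length codes 257..285 (code 285 is the special case 258).
LENGTH_EXTRA_BITS = [0] * 8 + [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4 + [5] * 4 + [0]
LENGTH_BASE = []
_base = 3
for _bits in LENGTH_EXTRA_BITS[:-1]:
    LENGTH_BASE.append(_base)
    _base += 1 << _bits
LENGTH_BASE.append(258)

# Extra bits for distance codes 0..29.
DIST_EXTRA_BITS = [0] * 4 + [b for b in range(1, 14) for _ in range(2)]
DIST_BASE = []
_base = 1
for _bits in DIST_EXTRA_BITS:
    DIST_BASE.append(_base)
    _base += 1 << _bits


def decode_length(code: int, extra: int) -> int:
    """Return the copy length for a length code (257..285) and its extra bits."""
    index = code - 257
    if not 0 <= index < len(LENGTH_BASE) or extra >= (1 << LENGTH_EXTRA_BITS[index]):
        raise ValueError("invalid length code or extra bits")
    return LENGTH_BASE[index] + extra


def decode_distance(code: int, extra: int) -> int:
    """Return the distance for a distance code (0..29) and its extra bits."""
    if not 0 <= code < len(DIST_BASE) or extra >= (1 << DIST_EXTRA_BITS[code]):
        raise ValueError("invalid distance code or extra bits")
    return DIST_BASE[code] + extra
```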
+\r
+The compressed data stream begins immediately after the\r
+compressed header data. The compressed data stream can be\r
+interpreted as follows:\r
+\r
+do\r
+    read header from input stream.\r
+\r
+    if stored block\r
+        skip bits until byte aligned\r
+        read count and 1's complement of count\r
+        copy count bytes data block\r
+    otherwise\r
+        loop until end of block code sent\r
+            decode literal character from input stream\r
+            if literal < 256\r
+                copy character to the output stream\r
+            otherwise\r
+                if literal = end of block\r
+                    break from loop\r
+                otherwise\r
+                    derive length from literal (plus any extra bits)\r
+                    decode distance from input stream\r
+\r
+                    move backwards distance bytes in the output stream, and\r
+                    copy length characters from this position to the output\r
+                    stream.\r
+        end loop\r
+while not last block\r
+\r
+if data descriptor exists\r
+    skip bits until byte aligned\r
+    read crc and sizes\r
+endif\r
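In practice most implementations delegate this loop to an existing inflate library. As a hedged illustration, Python's standard `zlib` module can decode a raw Deflate stream (no zlib or gzip wrapper, as stored in a ZIP file) when given a negative window size. The wrapper name `inflate_raw` is an assumption of this example.

```python
import zlib


def inflate_raw(compressed: bytes) -> bytes:
    """Decompress a raw Deflate stream such as ZIP method 8 file data."""
    # wbits=-15 selects raw Deflate with a 32K window and no header/trailer.
    decoder = zlib.decompressobj(wbits=-15)
    return decoder.decompress(compressed) + decoder.flush()


# A hand-built final stored block: header byte 0x01 (last block, type 00),
# then the length 5 and its one's complement in low-byte/high-byte order.
STORED_HELLO = bytes([0x01, 0x05, 0x00, 0xFA, 0xFF]) + b"hello"
```

Calling `inflate_raw(STORED_HELLO)` exercises the stored-block branch of the pseudocode above.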
+\r
+Enhanced Deflating - Method 9\r
+-----------------------------\r
+\r
+The Enhanced Deflating algorithm is similar to Deflate but\r
+uses a sliding dictionary of up to 64K. Deflate64(tm) is supported\r
+by the Deflate extractor. \r
+\r
+BZIP2 - Method 12\r
+-----------------\r
+\r
+BZIP2 is an open-source data compression algorithm developed by \r
+Julian Seward. Information and source code for this algorithm\r
+can be found on the internet.\r
+\r
+LZMA - Method 14 (EFS)\r
+----------------------\r
+\r
+LZMA is a block-oriented, general purpose data compression algorithm \r
+developed and maintained by Igor Pavlov. It is a derivative of LZ77\r
+that utilizes Markov chains and a range coder. Information and \r
+source code for this algorithm can be found on the internet. Consult \r
+with the author of this algorithm for information on terms or \r
+restrictions on use.\r
+\r
+Support for LZMA within the ZIP format is defined as follows: \r
+\r
+The Compression method field within the ZIP Local and Central \r
+Header records will be set to the value 14 to indicate data was\r
+compressed using LZMA. \r
+\r
+The Version needed to extract field within the ZIP Local and \r
+Central Header records will be set to 6.3 to indicate the \r
+minimum ZIP format version supporting this feature.\r
+\r
+File data compressed using the LZMA algorithm must be placed \r
+immediately following the Local Header for the file. If a \r
+standard ZIP encryption header is required, it will follow \r
+the Local Header and will precede the LZMA compressed file \r
+data segment. The location of LZMA compressed data segment \r
+within the ZIP format will be as shown:\r
+\r
+ [local header file 1]\r
+ [encryption header file 1]\r
+ [LZMA compressed data segment for file 1]\r
+ [data descriptor 1]\r
+ [local header file 2]\r
+\r
+The encryption header and data descriptor records may\r
+be conditionally present. The LZMA Compressed Data Segment \r
+will consist of an LZMA Properties Header followed by the \r
+LZMA Compressed Data as shown:\r
+\r
+ [LZMA properties header for file 1]\r
+ [LZMA compressed data for file 1]\r
+\r
+The LZMA Compressed Data will be stored as provided by the \r
+LZMA compression library. Compressed size, uncompressed \r
+size and other file characteristics about the file being \r
+compressed must be stored in standard ZIP storage format.\r
+\r
+The LZMA Properties Header will store specific data required to \r
+decompress the LZMA compressed Data. This data is set by the \r
+LZMA compression engine using the function WriteCoderProperties() \r
+as documented within the LZMA SDK. \r
+ \r
+Storage fields for the property information within the LZMA \r
+Properties Header are as follows:\r
+\r
+ LZMA Version Information 2 bytes\r
+ LZMA Properties Size 2 bytes\r
+ LZMA Properties Data variable, defined by "LZMA Properties Size"\r
+\r
+LZMA Version Information - this field identifies which version of \r
+ the LZMA SDK was used to compress a file. The first byte will \r
+ store the major version number of the LZMA SDK and the second \r
+ byte will store the minor number. \r
+\r
+LZMA Properties Size - this field defines the size of the remaining \r
+ property data. Typically this size should be determined by the \r
+ version of the SDK. This size field is included as a convenience\r
+ and to help avoid any ambiguity should it arise in the future due\r
+ to changes in this compression algorithm. \r
+\r
+LZMA Property Data - this variable sized field records the required \r
+ values for the decompressor as defined by the LZMA SDK. The \r
+ data stored in this field should be obtained using the \r
+ WriteCoderProperties() in the version of the SDK defined by \r
+ the "LZMA Version Information" field. \r
+\r
+The layout of the "LZMA Properties Data" field is a function of the\r
+LZMA compression algorithm. It is possible that this layout may be\r
+changed by the author over time. The data layout in version 4.32 \r
+of the LZMA SDK defines a 5 byte array that uses 4 bytes to store \r
+the dictionary size in little-endian order. This is preceded by a \r
+single packed byte as the first element of the array that contains\r
+the following fields:\r
+\r
+ PosStateBits\r
+ LiteralPosStateBits\r
+ LiteralContextBits\r
+\r
+Refer to the LZMA documentation for a more detailed explanation of \r
+these fields. \r
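A sketch of parsing the LZMA Properties Header is shown below. It assumes the 2-byte size field is stored in low-byte/high-byte order like other ZIP fields, and it decodes the packed first byte using the conventional LZMA layout, `(PosStateBits * 5 + LiteralPosStateBits) * 9 + LiteralContextBits`, which should be verified against the SDK version named in the header. The function name and result keys are hypothetical.

```python
import struct


def parse_lzma_properties_header(data: bytes) -> dict:
    """Parse the LZMA Properties Header that precedes the compressed data."""
    major, minor, size = struct.unpack_from("<BBH", data, 0)
    props = data[4:4 + size]
    if len(props) != size:
        raise ValueError("truncated LZMA properties data")
    result = {"sdk_version": (major, minor), "properties": props,
              "header_length": 4 + size}
    if size == 5:
        packed = props[0]
        if packed >= 9 * 5 * 5:
            raise ValueError("invalid packed LZMA properties byte")
        result["literal_context_bits"] = packed % 9
        result["literal_pos_state_bits"] = (packed // 9) % 5
        result["pos_state_bits"] = packed // 45
        # Dictionary size: 4 bytes, little-endian.
        result["dictionary_size"] = struct.unpack_from("<I", props, 1)[0]
    return result
```

The `header_length` value tells the caller where the LZMA compressed data begins within the compressed data segment.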
+\r
+Data compressed with method 14, LZMA, may include an end-of-stream\r
+(EOS) marker ending the compressed data stream. This marker is not\r
+required, but its use is highly recommended to facilitate processing\r
+and implementers should include the EOS marker whenever possible.\r
+When the EOS marker is used, general purpose bit 1 must be set. If\r
+general purpose bit 1 is not set, the EOS marker is not present.\r
+\r
+WavPack - Method 97\r
+-------------------\r
+\r
+Information describing the use of compression method 97 is \r
+provided by WinZIP International, LLC. This method relies on the\r
+open source WavPack audio compression utility developed by David Bryant. \r
+Information on WavPack is available at www.wavpack.com. Please consult \r
+with the author of this algorithm for information on terms and \r
+restrictions on use.\r
+\r
+WavPack data for a file begins immediately after the end of the\r
+local header data. This data is the output from WavPack compression\r
+routines. Within the ZIP file, the use of WavPack compression is\r
+indicated by setting the compression method field to a value of 97 \r
+in both the local header and the central directory header. The Version \r
+needed to extract and version made by fields use the same values as are \r
+used for data compressed using the Deflate algorithm.\r
+\r
+An implementation note for storing digital sample data when using \r
+WavPack compression within ZIP files is that all of the bytes of\r
+the sample data should be compressed. This includes any unused\r
+bits up to the byte boundary. An example is a 2 byte sample that\r
+uses only 12 bits for the sample data with 4 unused bits. If only\r
+12 bits are passed as the sample size to the WavPack routines, the 4 \r
+unused bits will be set to 0 on extraction regardless of their original \r
+state. To avoid this, the full 16 bits of the sample data size\r
+should be provided. \r
+\r
+PPMd - Method 98\r
+----------------\r
+\r
+PPMd is a data compression algorithm developed by Dmitry Shkarin\r
+which includes a carryless rangecoder developed by Dmitry Subbotin.\r
+This algorithm is based on prediction by partial matching on multiple\r
+order contexts. Information and source code for this algorithm\r
+can be found on the internet. Consult with the author of this\r
+algorithm for information on terms or restrictions on use.\r
+\r
+Support for PPMd within the ZIP format currently is provided only \r
+for version I, revision 1 of the algorithm. Storage requirements\r
+for using this algorithm are as follows:\r
+\r
+Parameters needed to control the algorithm are stored in the two\r
+bytes immediately preceding the compressed data. These bytes are\r
+used to store the following fields:\r
+\r
+Model order - sets the maximum model order, default is 8, possible\r
+ values are from 2 to 16 inclusive\r
+\r
+Sub-allocator size - sets the size of sub-allocator in MB, default is 50,\r
+ possible values are from 1MB to 256MB inclusive\r
+\r
+Model restoration method - sets the method used to restart context\r
+ model at memory insufficiency, values are:\r
+\r
+ 0 - restarts model from scratch - default\r
+ 1 - cut off model - decreases performance by as much as 2x\r
+ 2 - freeze context tree - not recommended\r
+\r
+An example for packing these fields into the 2 byte storage field is\r
+illustrated below. These values are stored in Intel low-byte/high-byte\r
+order.\r
+\r
+wPPMd = (Model order - 1) + \r
+ ((Sub-allocator size - 1) << 4) + \r
+ (Model restoration method << 12)\r
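The packing formula above, and its inverse, can be sketched as follows. The function names are illustrative, and the range checks simply restate the limits given for each field.

```python
import struct


def pack_ppmd_params(model_order: int, suballoc_mb: int, restore_method: int) -> bytes:
    """Pack the PPMd parameters into the 2-byte field, low byte first."""
    if not 2 <= model_order <= 16:
        raise ValueError("model order must be 2..16")
    if not 1 <= suballoc_mb <= 256:
        raise ValueError("sub-allocator size must be 1..256 MB")
    if restore_method not in (0, 1, 2):
        raise ValueError("model restoration method must be 0, 1 or 2")
    word = (model_order - 1) + ((suballoc_mb - 1) << 4) + (restore_method << 12)
    return struct.pack("<H", word)


def unpack_ppmd_params(data: bytes) -> tuple:
    """Return (model order, sub-allocator size in MB, restoration method)."""
    (word,) = struct.unpack_from("<H", data)
    return (word & 0x0F) + 1, ((word >> 4) & 0xFF) + 1, word >> 12
```

With the default values (order 8, 50 MB, method 0) the packed word is 0x0317, stored as the bytes 0x17, 0x03.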
+\r
+\r
+VII. Traditional PKWARE Encryption\r
+----------------------------------\r
+\r
+The following information discusses the decryption steps\r
+required to support traditional PKWARE encryption. This\r
+form of encryption is considered weak by today's standards\r
+and its use is recommended only for situations with\r
+low security needs or for compatibility with older .ZIP \r
+applications.\r
+\r
+Decryption\r
+----------\r
+\r
+PKWARE is grateful to Mr. Roger Schlafly for his expert contribution \r
+towards the development of PKWARE's traditional encryption.\r
+\r
+PKZIP encrypts the compressed data stream. Encrypted files must\r
+be decrypted before they can be extracted.\r
+\r
+Each encrypted file has an extra 12 bytes stored at the start of\r
+the data area defining the encryption header for that file. The\r
+encryption header is originally set to random values, and then\r
+itself encrypted, using three, 32-bit keys. The key values are\r
+initialized using the supplied encryption password. After each byte\r
+is encrypted, the keys are then updated using pseudo-random number\r
+generation techniques in combination with the same CRC-32 algorithm\r
+used in PKZIP and described elsewhere in this document.\r
+\r
+The following are the basic steps required to decrypt a file:\r
+\r
+1) Initialize the three 32-bit keys with the password.\r
+2) Read and decrypt the 12-byte encryption header, further\r
+ initializing the encryption keys.\r
+3) Read and decrypt the compressed data stream using the\r
+ encryption keys.\r
+\r
+Step 1 - Initializing the encryption keys\r
+-----------------------------------------\r
+\r
+Key(0) <- 305419896\r
+Key(1) <- 591751049\r
+Key(2) <- 878082192\r
+\r
+loop for i <- 0 to length(password)-1\r
+ update_keys(password(i))\r
+end loop\r
+\r
+Where update_keys() is defined as:\r
+\r
+update_keys(char):\r
+    Key(0) <- crc32(Key(0),char)\r
+    Key(1) <- Key(1) + (Key(0) & 000000ffH)\r
+    Key(1) <- Key(1) * 134775813 + 1\r
+    Key(2) <- crc32(Key(2),Key(1) >> 24)\r
+end update_keys\r
+\r
+Where crc32(old_crc,char) is a routine that given a CRC value and a\r
+character, returns an updated CRC value after applying the CRC-32\r
+algorithm described elsewhere in this document.\r
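The key initialization can be sketched as below. One detail worth making explicit is that crc32(old_crc,char) is taken here to be the bare single-byte table update, without the initial and final bit inversion that a complete CRC-32 calculation applies; this matches common ZIP implementations but is an interpretation rather than something stated in this section. The helper `crc32_of` exists only to check the table against the well-known CRC-32 of "123456789", and all names are illustrative.

```python
def _make_crc_table() -> list:
    """Build the standard reflected CRC-32 table (polynomial 0xEDB88320)."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRC_TABLE = _make_crc_table()


def crc32_update(old_crc: int, byte: int) -> int:
    """crc32(old_crc, char): one table-driven CRC-32 step, no inversion."""
    return CRC_TABLE[(old_crc ^ byte) & 0xFF] ^ (old_crc >> 8)


def crc32_of(data: bytes) -> int:
    """Full CRC-32 with the usual pre- and post-inversion (for checking)."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = crc32_update(crc, b)
    return crc ^ 0xFFFFFFFF


def update_keys(keys: list, byte: int) -> None:
    """Advance the three 32-bit keys in place with one plaintext byte."""
    keys[0] = crc32_update(keys[0], byte)
    keys[1] = (keys[1] + (keys[0] & 0xFF)) & 0xFFFFFFFF
    keys[1] = (keys[1] * 134775813 + 1) & 0xFFFFFFFF
    keys[2] = crc32_update(keys[2], keys[1] >> 24)


def init_keys(password: bytes) -> list:
    """Step 1: initialize the keys and mix in every password byte."""
    keys = [305419896, 591751049, 878082192]
    for b in password:
        update_keys(keys, b)
    return keys
```

The explicit `& 0xFFFFFFFF` masks emulate 32-bit unsigned arithmetic, which Python integers do not provide on their own.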
+\r
+Step 2 - Decrypting the encryption header\r
+-----------------------------------------\r
+\r
+The purpose of this step is to further initialize the encryption\r
+keys, based on random data, to render a plaintext attack on the\r
+data ineffective.\r
+\r
+Read the 12-byte encryption header into Buffer, in locations\r
+Buffer(0) thru Buffer(11).\r
+\r
+loop for i <- 0 to 11\r
+    C <- Buffer(i) ^ decrypt_byte()\r
+    update_keys(C)\r
+    Buffer(i) <- C\r
+end loop\r
+\r
+Where decrypt_byte() is defined as:\r
+\r
+unsigned char decrypt_byte()\r
+ local unsigned short temp\r
+ temp <- Key(2) | 2\r
+ decrypt_byte <- (temp * (temp ^ 1)) >> 8\r
+end decrypt_byte\r
+\r
+After the header is decrypted, the last 1 or 2 bytes in Buffer\r
+should be the high-order word/byte of the CRC for the file being\r
+decrypted, stored in Intel low-byte/high-byte order. Versions of\r
+PKZIP prior to 2.0 used a 2 byte CRC check; a 1 byte CRC check is\r
+used on versions after 2.0. This can be used to test if the password\r
+supplied is correct or not.\r
+\r
+Step 3 - Decrypting the compressed data stream\r
+----------------------------------------------\r
+\r
+The compressed data stream can be decrypted as follows:\r
+\r
+loop until done\r
+    read a character into C\r
+    Temp <- C ^ decrypt_byte()\r
+    update_keys(Temp)\r
+    output Temp\r
+end loop\r
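Putting the three steps together, a self-contained sketch of traditional decryption might look like the following. The class name `TraditionalCipher` and the `decrypt`/`encrypt` helpers are hypothetical. The `encrypt` method is not described in this section; it is included only as the straightforward inverse so that the example can be exercised end to end. As in the pseudocode, the 12-byte encryption header is simply the first 12 bytes passed through `decrypt`.

```python
def _make_crc_table() -> list:
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


_CRC_TABLE = _make_crc_table()


class TraditionalCipher:
    """Key state for traditional PKWARE encryption (illustrative only)."""

    def __init__(self, password: bytes):
        self.keys = [305419896, 591751049, 878082192]
        for b in password:
            self._update_keys(b)

    def _crc32(self, old_crc: int, byte: int) -> int:
        return _CRC_TABLE[(old_crc ^ byte) & 0xFF] ^ (old_crc >> 8)

    def _update_keys(self, byte: int) -> None:
        k = self.keys
        k[0] = self._crc32(k[0], byte)
        k[1] = (k[1] + (k[0] & 0xFF)) & 0xFFFFFFFF
        k[1] = (k[1] * 134775813 + 1) & 0xFFFFFFFF
        k[2] = self._crc32(k[2], k[1] >> 24)

    def _decrypt_byte(self) -> int:
        temp = (self.keys[2] | 2) & 0xFFFF          # unsigned short
        return ((temp * (temp ^ 1)) >> 8) & 0xFF    # unsigned char

    def decrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for c in data:
            plain = c ^ self._decrypt_byte()
            self._update_keys(plain)                # keys track the plaintext
            out.append(plain)
        return bytes(out)

    def encrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for plain in data:
            out.append(plain ^ self._decrypt_byte())
            self._update_keys(plain)
        return bytes(out)


def decrypt(password: bytes, data: bytes) -> bytes:
    return TraditionalCipher(password).decrypt(data)


def encrypt(password: bytes, data: bytes) -> bytes:
    return TraditionalCipher(password).encrypt(data)
```

A caller would decrypt the header plus file data in one pass, check the last header byte (or two bytes for older archives) against the high-order part of the CRC, and then discard the first 12 bytes.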
+\r
+\r
+VIII. Strong Encryption Specification\r
+-------------------------------------\r
+\r
+The Strong Encryption technology defined in this specification is \r
+covered under a pending patent application. The use or implementation\r
+in a product of certain technological aspects set forth in the current\r
+APPNOTE, including those with regard to strong encryption, patching, \r
+or extended tape operations requires a license from PKWARE. Portions\r
+of this Strong Encryption technology are available for use at no charge.\r
+Contact PKWARE for licensing terms and conditions. Refer to section II\r
+of this APPNOTE (Contacting PKWARE) for information on how to \r
+contact PKWARE. \r
+\r
+Version 5.x of this specification introduced support for strong \r
+encryption algorithms. These algorithms can be used with either \r
+a password or an X.509v3 digital certificate to encrypt each file. \r
+This format specification supports either password or certificate \r
+based encryption to meet the security needs of today, to enable \r
+interoperability between users within both PKI and non-PKI \r
+environments, and to ensure interoperability between different \r
+computing platforms that are running a ZIP program. \r
+\r
+Password based encryption is the most common form of encryption \r
+people are familiar with. However, inherent weaknesses with \r
+passwords (e.g. susceptibility to dictionary/brute force attack) \r
+as well as password management and support issues make certificate \r
+based encryption a more secure and scalable option. Industry \r
+efforts and support are defining and moving towards more advanced \r
+security solutions built around X.509v3 digital certificates and \r
+Public Key Infrastructures (PKI) because of the greater scalability, \r
+administrative options, and more robust security over traditional \r
+password based encryption. \r
+\r
+Most standard encryption algorithms are supported with this\r
+specification. Reference implementations for many of these \r
+algorithms are available from either commercial or open source \r
+distributors. Readily available cryptographic toolkits make\r
+implementation of the encryption features straightforward. \r
+This document is not intended to provide a treatise on data \r
+encryption principles or theory. Its purpose is to document the \r
+data structures required for implementing interoperable data \r
+encryption within the .ZIP format. It is strongly recommended that \r
+you have a good understanding of data encryption before reading \r
+further.\r
+\r
+The algorithms introduced in Version 5.0 of this specification \r
+include:\r
+\r
+ RC2 40 bit, 64 bit, and 128 bit\r
+ RC4 40 bit, 64 bit, and 128 bit\r
+ DES\r
+ 3DES 112 bit and 168 bit\r
+ \r
+Version 5.1 adds support for the following:\r
+\r
+ AES 128 bit, 192 bit, and 256 bit\r
+\r
+\r
+Version 6.1 introduces encryption data changes to support \r
+interoperability with Smartcard and USB Token certificate storage \r
+methods which do not support the OAEP strengthening standard.\r
+\r
+Version 6.2 introduces support for encrypting metadata by compressing \r
+and encrypting the central directory data structure to reduce information \r
+leakage. Information leakage can occur in legacy ZIP applications \r
+through exposure of information about a file even though that file is \r
+stored encrypted. The information exposed consists of file \r
+characteristics stored within the records and fields defined by this \r
+specification. This includes data such as a file's name, its original \r
+size, timestamp and CRC32 value. \r
+\r
+Version 6.3 introduces support for encrypting data using the Blowfish\r
+and Twofish algorithms. These are symmetric block ciphers developed \r
+by Bruce Schneier. Blowfish supports using a variable length key from \r
+32 to 448 bits. Block size is 64 bits. Implementations should use 16\r
+rounds and the only mode supported within ZIP files is CBC. Twofish \r
+supports key sizes 128, 192 and 256 bits. Block size is 128 bits. \r
+Implementations should use 16 rounds and the only mode supported within\r
+ZIP files is CBC. Information and source code for both Blowfish and \r
+Twofish algorithms can be found on the internet. Consult with the author\r
+of these algorithms for information on terms or restrictions on use.\r
+\r
+Central Directory Encryption provides greater protection against \r
+information leakage by encrypting the Central Directory structure and \r
+by masking key values that are replicated in the unencrypted Local \r
+Header. ZIP compatible programs that cannot interpret an encrypted \r
+Central Directory structure cannot rely on the data in the corresponding \r
+Local Header for decompression information. \r
+\r
+Extra Field records that may contain information about a file that should \r
+not be exposed should not be stored in the Local Header and should only \r
+be written to the Central Directory where they can be encrypted. This \r
+design currently does not support streaming. Information in the End of \r
+Central Directory record, the Zip64 End of Central Directory Locator, \r
+and the Zip64 End of Central Directory records are not encrypted. Access \r
+to view data on files within a ZIP file with an encrypted Central Directory\r
+requires the appropriate password or private key for decryption prior to \r
+viewing any files, or any information about the files, in the archive. \r
+\r
+Older ZIP compatible programs not familiar with the Central Directory \r
+Encryption feature will no longer be able to recognize the Central \r
+Directory and may assume the ZIP file is corrupt. Programs that \r
+attempt streaming access using Local Headers will see invalid \r
+information for each file. Central Directory Encryption need not be \r
+used for every ZIP file. Its use is recommended for greater security. \r
+ZIP files not using Central Directory Encryption should operate as \r
+in the past. \r
+\r
+This strong encryption feature specification is intended to provide for \r
+scalable, cross-platform encryption needs ranging from simple password\r
+encryption to authenticated public/private key encryption. \r
+\r
+Encryption provides data confidentiality and privacy. It is \r
+recommended that you combine X.509 digital signing with encryption \r
+to add authentication and non-repudiation.\r
+\r
+\r
+Single Password Symmetric Encryption Method:\r
+-------------------------------------------\r
+\r
+The Single Password Symmetric Encryption Method using strong \r
+encryption algorithms operates similarly to the traditional \r
+PKWARE encryption defined in this format. Additional data \r
+structures are added to support the processing needs of the \r
+strong algorithms.\r
+\r
+The Strong Encryption data structures are:\r
+\r
+1. General Purpose Bits - Bits 0 and 6 of the General Purpose bit \r
+flag in both local and central header records. Both bits set \r
+indicates strong encryption. Bit 13, when set, indicates the Central\r
+Directory is encrypted and that selected fields in the Local Header\r
+are masked to hide their actual value.\r
+\r
+\r
+2. Extra Field 0x0017 in central header only.\r
+\r
+ Fields to consider in this record are:\r
+\r
+ Format - the data format identifier for this record. The only\r
+ value allowed at this time is the integer value 2.\r
+\r
+ AlgId - integer identifier of the encryption algorithm from the\r
+ following range\r
+\r
+ 0x6601 - DES\r
+ 0x6602 - RC2 (version needed to extract < 5.2)\r
+ 0x6603 - 3DES 168\r
+ 0x6609 - 3DES 112\r
+ 0x660E - AES 128 \r
+ 0x660F - AES 192 \r
+ 0x6610 - AES 256 \r
+ 0x6702 - RC2 (version needed to extract >= 5.2)\r
+ 0x6720 - Blowfish\r
+ 0x6721 - Twofish\r
+ 0x6801 - RC4\r
+ 0xFFFF - Unknown algorithm\r
+\r
+ Bitlen - Explicit bit length of key\r
+\r
+ 32 - 448 bits\r
+ \r
+ Flags - Processing flags needed for decryption\r
+\r
+ 0x0001 - Password is required to decrypt\r
+ 0x0002 - Certificates only\r
+ 0x0003 - Password or certificate required to decrypt\r
+\r
+ Values > 0x0003 reserved for certificate processing\r
+\r
+\r
+3. Decryption header record preceding compressed file data.\r
+\r
+ -Decryption Header:\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ IVSize 2 bytes Size of initialization vector (IV)\r
+ IVData IVSize Initialization vector for this file\r
+ Size 4 bytes Size of remaining decryption header data\r
+ Format 2 bytes Format definition for this record\r
+ AlgID 2 bytes Encryption algorithm identifier\r
+ Bitlen 2 bytes Bit length of encryption key\r
+ Flags 2 bytes Processing flags\r
+ ErdSize 2 bytes Size of Encrypted Random Data\r
+ ErdData ErdSize Encrypted Random Data\r
+ Reserved1 4 bytes Reserved certificate processing data\r
+ Reserved2 (var) Reserved for certificate processing data\r
+ VSize 2 bytes Size of password validation data\r
+ VData VSize-4 Password validation data\r
+ VCRC32 4 bytes Standard ZIP CRC32 of password validation data\r
+\r
+ IVData - The size of the IV should match the algorithm block size.\r
+ The IVData can be completely random data. If the size of\r
+ the randomly generated data does not match the block size\r
+ it should be complemented with zeros or truncated as\r
+ necessary. If IVSize is 0, then IV = CRC32 + Uncompressed\r
+ File Size (as a 64 bit little-endian, unsigned integer value).\r
+\r
+ Format - the data format identifier for this record. The only\r
+ value allowed at this time is the integer value 3.\r
+\r
+ AlgId - integer identifier of the encryption algorithm from the\r
+ following range\r
+\r
+ 0x6601 - DES\r
+ 0x6602 - RC2 (version needed to extract < 5.2)\r
+ 0x6603 - 3DES 168\r
+ 0x6609 - 3DES 112\r
+ 0x660E - AES 128 \r
+ 0x660F - AES 192 \r
+ 0x6610 - AES 256 \r
+ 0x6702 - RC2 (version needed to extract >= 5.2)\r
+ 0x6720 - Blowfish\r
+ 0x6721 - Twofish\r
+ 0x6801 - RC4\r
+ 0xFFFF - Unknown algorithm\r
+\r
+ Bitlen - Explicit bit length of key\r
+\r
+ 32 - 448 bits\r
+ \r
+ Flags - Processing flags needed for decryption\r
+\r
+ 0x0001 - Password is required to decrypt\r
+ 0x0002 - Certificates only\r
+ 0x0003 - Password or certificate required to decrypt\r
+\r
+ Values > 0x0003 reserved for certificate processing\r
+\r
+ ErdData - Encrypted random data is used to store random data that\r
+ is used to generate a file session key for encrypting \r
+ each file. SHA1 is used to calculate hash data used to \r
+ derive keys. File session keys are derived from a master \r
+ session key generated from the user-supplied password.\r
+ If the Flags field in the decryption header contains \r
+ the value 0x4000, then the ErdData field must be \r
+ decrypted using 3DES. If the value 0x4000 is not set,\r
+ then the ErdData field must be decrypted using AlgId.\r
+\r
+\r
+ Reserved1 - Reserved for certificate processing. If the value is\r
+ zero, then Reserved2 data is absent. See the explanation\r
+ under the Certificate Processing Method for details on\r
+ this data structure.\r
+\r
+ Reserved2 - If present, the size of the Reserved2 data structure \r
+ is located by skipping the first 4 bytes of this field \r
+ and using the next 2 bytes as the remaining size. See\r
+ the explanation under the Certificate Processing Method\r
+ for details on this data structure.\r
+\r
+ VSize - This size value will always include the 4 bytes of the\r
+ VCRC32 data and will be greater than 4 bytes.\r
+\r
+ VData - Random data for password validation. This data is VSize\r
+ in length and VSize must be a multiple of the encryption\r
+ block size. VCRC32 is a checksum value of VData. \r
+ VData and VCRC32 are stored encrypted and start the\r
+ stream of encrypted data for a file.\r
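
The Decryption Header layout in item 3 can be read with a short parser. The sketch below assumes little-endian fields (as for other ZIP structures) and that the Size field covers everything through the encrypted VData and VCRC32. It only handles the password case where Reserved1 is zero; the class and function names are hypothetical.

```python
import struct
from typing import NamedTuple


class DecryptionHeader(NamedTuple):
    iv: bytes
    format_id: int
    alg_id: int
    bitlen: int
    flags: int
    erd: bytes
    validation: bytes  # VData followed by VCRC32, still encrypted
    header_len: int    # total number of bytes consumed from the buffer


def parse_decryption_header(buf: bytes) -> DecryptionHeader:
    pos = 0
    (iv_size,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    iv = buf[pos:pos + iv_size]
    pos += iv_size
    (size,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    end = pos + size  # Size counts the bytes that follow the Size field
    if end > len(buf):
        raise ValueError("decryption header is truncated")
    format_id, alg_id, bitlen, flags, erd_size = struct.unpack_from("<HHHHH", buf, pos)
    pos += 10
    erd = buf[pos:pos + erd_size]
    pos += erd_size
    (reserved1,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    if reserved1 != 0:
        # Reserved2 (certificate data) would follow; not handled in this sketch.
        raise NotImplementedError("certificate recipients are not supported here")
    (v_size,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    if v_size <= 4:
        raise ValueError("VSize must be greater than 4")
    validation = buf[pos:pos + v_size]
    pos += v_size
    if pos != end:
        raise ValueError("Size field does not match the parsed fields")
    return DecryptionHeader(iv, format_id, alg_id, bitlen, flags, erd, validation, pos)
```

The returned `validation` bytes must still be decrypted with the file session key before VCRC32 can be compared against the CRC32 of VData.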
+\r
+\r
+4. Useful Tips\r
+\r
+Strong Encryption is always applied to a file after compression. The\r
+block oriented algorithms all operate in Cipher Block Chaining (CBC) \r
+mode. The block size used for AES encryption is 16. All other block\r
+algorithms use a block size of 8. Two IDs are defined for RC2 to \r
+account for a discrepancy found in the implementation of the RC2\r
+algorithm in the cryptographic library on Windows XP SP1 and all \r
+earlier versions of Windows. It is recommended that zero length files\r
+not be encrypted; however, programs should be prepared to extract them\r
+if they are found within a ZIP file.\r
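
The block size rule above, together with the IVData sizing rule from the Decryption Header, might be sketched like this. Padding with zeros on the right is an assumption (the text only says the data is complemented with zeros), and the function names are hypothetical.

```python
AES_ALG_IDS = {0x660E, 0x660F, 0x6610}  # AES 128, 192, 256
RC4_ALG_ID = 0x6801                     # stream cipher, so no block size


def block_size(alg_id: int) -> int:
    """Block size in bytes: 16 for AES, 8 for the other block algorithms."""
    if alg_id == RC4_ALG_ID:
        raise ValueError("RC4 is a stream cipher and has no block size")
    return 16 if alg_id in AES_ALG_IDS else 8


def normalize_iv(iv: bytes, alg_id: int) -> bytes:
    """Truncate or zero-pad the IV so it matches the algorithm block size."""
    n = block_size(alg_id)
    return iv[:n].ljust(n, b"\x00")
```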
+\r
+A pseudo-code representation of the encryption process is as follows:\r
+\r
+Password = GetUserPassword()\r
+MasterSessionKey = DeriveKey(SHA1(Password)) \r
+RD = CryptographicStrengthRandomData() \r
+For Each File\r
+ IV = CryptographicStrengthRandomData() \r
+ VData = CryptographicStrengthRandomData()\r
+ VCRC32 = CRC32(VData)\r
+ FileSessionKey = DeriveKey(SHA1(IV + RD))\r
+ ErdData = Encrypt(RD,MasterSessionKey,IV) \r
+ Encrypt(VData + VCRC32 + FileData, FileSessionKey,IV)\r
+Done\r
+\r
+The function names and parameter requirements will depend on\r
+the choice of the cryptographic toolkit selected. Almost any\r
+toolkit supporting the reference implementations for each\r
+algorithm can be used. The RSA BSAFE(r), OpenSSL, and Microsoft\r
+CryptoAPI libraries are all known to work well. \r
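
As an illustration only, the pseudo-code above could be arranged in Python as shown below. Because DeriveKey and Encrypt depend on the chosen toolkit, they are passed in as caller-supplied callables here. The `rd_size` default and all names are assumptions, and the input is taken to be file data that has already been compressed.

```python
import hashlib
import os
import struct
import zlib
from typing import Callable, Iterable, Iterator, Tuple

DeriveKey = Callable[[bytes], bytes]              # SHA1 digest -> key
Encrypt = Callable[[bytes, bytes, bytes], bytes]  # (data, key, iv) -> ciphertext


def encrypt_files(password: bytes, files: Iterable[bytes],
                  derive_key: DeriveKey, encrypt: Encrypt,
                  block_size: int = 16, rd_size: int = 16,
                  ) -> Iterator[Tuple[bytes, bytes, bytes]]:
    """Yield (IV, ErdData, encrypted stream) for each compressed file."""
    master_key = derive_key(hashlib.sha1(password).digest())
    rd = os.urandom(rd_size)
    for data in files:
        iv = os.urandom(block_size)
        # VData plus the 4-byte VCRC32 fills exactly one cipher block.
        vdata = os.urandom(block_size - 4)
        vcrc32 = struct.pack("<I", zlib.crc32(vdata))
        file_key = derive_key(hashlib.sha1(iv + rd).digest())
        erd = encrypt(rd, master_key, iv)
        yield iv, erd, encrypt(vdata + vcrc32 + data, file_key, iv)
```

Any padding required by the block cipher is left to the supplied `encrypt` callable.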
+\r
+\r
+Single Password - Central Directory Encryption:\r
+-----------------------------------------------\r
+\r
+Central Directory Encryption is achieved within the .ZIP format by \r
+encrypting the Central Directory structure. This encapsulates the metadata \r
+most often used for processing .ZIP files. Additional metadata is stored for \r
+redundancy in the Local Header for each file. The process of concealing \r
+metadata by encrypting the Central Directory does not protect the data within \r
+the Local Header. To avoid information leakage from the exposed metadata \r
+in the Local Header, the fields containing information about a file are masked. \r
+\r
+Local Header:\r
+\r
+Masking replaces the true content of the fields for a file in the Local \r
+Header with false information. When masked, the Local Header is not \r
+suitable for streaming access and the options for data recovery of damaged\r
+archives are reduced. Extra Data fields that may contain confidential\r
+data should not be stored within the Local Header. The value set into\r
+the Version needed to extract field should be the correct value needed to\r
+extract the file without regard to Central Directory Encryption. The fields \r
+within the Local Header targeted for masking when the Central Directory is \r
+encrypted are:\r
+\r
+ Field Name Mask Value\r
+ ------------------ ---------------------------\r
+ compression method 0\r
+ last mod file time 0\r
+ last mod file date 0\r
+ crc-32 0\r
+ compressed size 0\r
+ uncompressed size 0\r
+ file name (variable size) Base 16 value from the\r
+ range 1 - 0xFFFFFFFFFFFFFFFF\r
+ represented as a string whose\r
+ size will be set into the\r
+ file name length field\r
+\r
+The Base 16 value assigned as a masked file name is simply a sequentially\r
+incremented value for each file starting with 1 for the first file. \r
+Modifications to a ZIP file may cause different values to be stored for \r
+each file. For compatibility, the file name field in the Local Header \r
+should never be left blank. As of Version 6.2 of this specification, \r
+the Compression Method and Compressed Size fields are not yet masked.\r
+Fields having a value of 0xFFFF or 0xFFFFFFFF for the ZIP64 format\r
+should not be masked. \r
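
A small sketch of the masking rules follows. Upper-case hexadecimal digits with no prefix or padding are an assumption, since the text only calls for a Base 16 string. The helper names are hypothetical.

```python
ZIP64_SIZE_SENTINEL = 0xFFFFFFFF  # marks a size stored in the ZIP64 extra field


def masked_file_name(sequence: int) -> bytes:
    """Masked name for the Nth file, counting from 1, as a Base 16 string."""
    if not 1 <= sequence <= 0xFFFFFFFFFFFFFFFF:
        raise ValueError("sequence is outside the range 1 - 0xFFFFFFFFFFFFFFFF")
    return format(sequence, "X").encode("ascii")


def mask_size_field(value: int) -> int:
    """Mask a 4-byte size field with 0, leaving the ZIP64 sentinel intact."""
    return value if value == ZIP64_SIZE_SENTINEL else 0
```

The length of the returned name is what would be written into the file name length field of the Local Header.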
+\r
+Encrypting the Central Directory:\r
+\r
+Encryption of the Central Directory does not include encryption of the \r
+Central Directory Signature data, the Zip64 End of Central Directory\r
+record, the Zip64 End of Central Directory Locator, or the End\r
+of Central Directory record. The ZIP file comment data is never\r
+encrypted.\r
+\r
+Before encrypting the Central Directory, it may optionally be compressed.\r
+Compression is not required, but for storage efficiency it is assumed\r
+this structure will be compressed before encrypting. Similarly, this \r
+specification supports compressing the Central Directory without\r
+requiring that it also be encrypted. Early implementations of this\r
+feature will assume the encryption method applied to files matches the \r
+encryption applied to the Central Directory.\r
+\r
+Encryption of the Central Directory is done in a manner similar to\r
+that of file encryption. The encrypted data is preceded by a \r
+decryption header. The decryption header is known as the Archive\r
+Decryption Header. The fields of this record are identical to\r
+the decryption header preceding each encrypted file. The location\r
+of the Archive Decryption Header is determined by the value in the\r
+Start of the Central Directory field in the Zip64 End of Central\r
+Directory record. When the Central Directory is encrypted, the\r
+Zip64 End of Central Directory record will always be present.\r
+\r
+The layout of the Zip64 End of Central Directory record for all\r
+versions starting with 6.2 of this specification will follow the\r
+Version 2 format. The Version 2 format is as follows:\r
+\r
+The leading fixed size fields within the Version 1 format for this\r
+record remain unchanged. The record signature for both Version 1 \r
+and Version 2 will be 0x06064b50. Immediately following the last\r
+byte of the field known as the Offset of Start of Central \r
+Directory With Respect to the Starting Disk Number will begin the \r
+new fields defining Version 2 of this record. \r
+\r
+New fields for Version 2:\r
+\r
+Note: all fields stored in Intel low-byte/high-byte order.\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ Compression Method 2 bytes Method used to compress the\r
+ Central Directory\r
+ Compressed Size 8 bytes Size of the compressed data\r
+ Original Size 8 bytes Original uncompressed size\r
+ AlgId 2 bytes Encryption algorithm ID\r
+ BitLen 2 bytes Encryption key length\r
+ Flags 2 bytes Encryption flags\r
+ HashID 2 bytes Hash algorithm identifier\r
+ Hash Length 2 bytes Length of hash data\r
+ Hash Data (variable) Hash data\r
+\r
+The Compression Method accepts the same range of values as the \r
+corresponding field in the Central Header.\r
+\r
+The Compressed Size and Original Size values will not include the\r
+data of the Central Directory Signature which is compressed or\r
+encrypted.\r
+\r
+The AlgId, BitLen, and Flags fields accept the same range of values\r
+as the corresponding fields within the 0x0017 record. \r
+\r
+Hash ID identifies the algorithm used to hash the Central Directory \r
+data. This data does not have to be hashed, in which case the\r
+values for both the HashID and Hash Length will be 0. Possible \r
+values for HashID are:\r
+\r
+ Value Algorithm\r
+ ------ ---------\r
+ 0x0000 none\r
+ 0x0001 CRC32\r
+ 0x8003 MD5\r
+ 0x8004 SHA1\r
+ 0x8007 RIPEMD160\r
+ 0x800C SHA256\r
+ 0x800D SHA384\r
+ 0x800E SHA512\r
+\r
+When the Central Directory data is signed, the same hash algorithm\r
+used to hash the Central Directory for signing should be used.\r
+This is recommended for processing efficiency; however, it is \r
+permissible for any of the above algorithms to be used independent \r
+of the signing process.\r
+\r
+The Hash Data will contain the hash data for the Central Directory.\r
+The length of this data will vary depending on the algorithm used.\r
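
The Version 2 fields can be unpacked as in the sketch below. It assumes the caller passes the bytes that begin immediately after the Version 1 fixed-size fields; the names are hypothetical.

```python
import struct
from typing import NamedTuple

HASH_NAMES = {
    0x0000: "none", 0x0001: "CRC32", 0x8003: "MD5", 0x8004: "SHA1",
    0x8007: "RIPEMD160", 0x800C: "SHA256", 0x800D: "SHA384", 0x800E: "SHA512",
}

# Compression Method, Compressed Size, Original Size, AlgId, BitLen,
# Flags, HashID, Hash Length (little-endian, 28 bytes in total).
_V2_FIXED = struct.Struct("<HQQHHHHH")


class Zip64EocdV2(NamedTuple):
    compression_method: int
    compressed_size: int
    original_size: int
    alg_id: int
    bitlen: int
    flags: int
    hash_id: int
    hash_data: bytes


def parse_v2_fields(buf: bytes) -> Zip64EocdV2:
    if len(buf) < _V2_FIXED.size:
        raise ValueError("Version 2 fields are truncated")
    *fixed, hash_id, hash_len = _V2_FIXED.unpack_from(buf, 0)
    hash_data = buf[_V2_FIXED.size:_V2_FIXED.size + hash_len]
    if len(hash_data) != hash_len:
        raise ValueError("Hash Data is shorter than Hash Length")
    return Zip64EocdV2(*fixed, hash_id, hash_data)
```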
+\r
+The Version Needed to Extract should be set to 62.\r
+\r
+The value for the Total Number of Entries on the Current Disk will\r
+be 0. These records will no longer support random access when\r
+encrypting the Central Directory.\r
+\r
+When the Central Directory is compressed and/or encrypted, the\r
+End of Central Directory record will store the value 0xFFFFFFFF\r
+as the value for the Total Number of Entries in the Central\r
+Directory. The value stored in the Total Number of Entries in\r
+the Central Directory on this Disk field will be 0. The actual\r
+values will be stored in the equivalent fields of the Zip64\r
+End of Central Directory record.\r
+\r
+Decrypting and decompressing the Central Directory is accomplished\r
+in the same manner as decrypting and decompressing a file.\r
+\r
+Certificate Processing Method:\r
+-----------------------------\r
+\r
+The Certificate Processing Method for ZIP file encryption \r
+defines the following additional data fields:\r
+\r
+1. Certificate Flag Values\r
+\r
+Additional processing flags that can be present in the Flags field of both \r
+the 0x0017 field of the central directory Extra Field and the Decryption \r
+header record preceding compressed file data are:\r
+\r
+ 0x0007 - reserved for future use\r
+ 0x000F - reserved for future use\r
+ 0x0100 - Indicates non-OAEP key wrapping was used. If\r
+ this flag is set, the version needed to extract must\r
+ be at least 61. This means OAEP key wrapping is not\r
+ used when generating a Master Session Key using\r
+ ErdData.\r
+ 0x4000 - ErdData must be decrypted using 3DES-168, otherwise use the\r
+ same algorithm used for encrypting the file contents.\r
+ 0x8000 - reserved for future use\r
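
One way to interpret the Flags field is sketched below. Treating the low four bits as the password/certificate selector is an assumption made for illustration, and the names are hypothetical.

```python
MODE_MASK = 0x000F  # assumption: the low four bits select the key source
MODES = {
    0x0001: "password",
    0x0002: "certificate",
    0x0003: "password or certificate",
}
FLAG_NON_OAEP = 0x0100  # non-OAEP key wrapping was used
FLAG_ERD_3DES = 0x4000  # ErdData must be decrypted using 3DES-168


def describe_flags(flags: int) -> dict:
    """Break the Flags value into the pieces described above."""
    return {
        "mode": MODES.get(flags & MODE_MASK, "reserved"),
        "non_oaep_key_wrapping": bool(flags & FLAG_NON_OAEP),
        "erd_uses_3des": bool(flags & FLAG_ERD_3DES),
    }
```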
+\r
+\r
+2. CertData - Extra Field 0x0017 record certificate data structure\r
+\r
+The data structure used to store certificate data within the section\r
+of the Extra Field defined by the CertData field of the 0x0017\r
+record is as shown:\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ RCount 4 bytes Number of recipients. \r
+ HashAlg 2 bytes Hash algorithm identifier\r
+ HSize 2 bytes Hash size\r
+ SRList (var) Simple list of recipients hashed public keys\r
+\r
+ \r
+ RCount This defines the number of intended recipients whose \r
+ public keys were used for encryption. This identifies\r
+ the number of elements in the SRList.\r
+\r
+ HashAlg This defines the hash algorithm used to calculate\r
+ the public key hash of each public key used\r
+ for encryption. This field currently supports\r
+ only the following value for SHA-1\r
+\r
+ 0x8004 - SHA1\r
+\r
+ HSize This defines the size of a hashed public key.\r
+\r
+ SRList This is a variable length list of the hashed \r
+ public keys for each intended recipient. Each \r
+ element in this list is HSize. The total size of \r
+ SRList is determined using RCount * HSize.\r
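
Reading the CertData structure might look like the following sketch. It assumes little-endian fields, and the function name is hypothetical.

```python
import struct
from typing import List

HASH_ALG_SHA1 = 0x8004  # the only HashAlg value currently supported


def parse_cert_data(buf: bytes) -> List[bytes]:
    """Return the hashed public keys stored in SRList."""
    rcount, hash_alg, hsize = struct.unpack_from("<IHH", buf, 0)
    if hash_alg != HASH_ALG_SHA1:
        raise ValueError("unsupported HashAlg 0x%04X" % hash_alg)
    start = 8
    if len(buf) < start + rcount * hsize:  # total SRList size is RCount * HSize
        raise ValueError("SRList is truncated")
    return [buf[start + i * hsize:start + (i + 1) * hsize] for i in range(rcount)]
```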
+\r
+\r
+3. Reserved1 - Certificate Decryption Header Reserved1 Data:\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ RCount 4 bytes Number of recipients. \r
+ \r
+ RCount This defines the number of intended recipients whose \r
+ public keys were used for encryption. This defines\r
+ the number of elements in the REList field defined below.\r
+\r
+\r
+4. Reserved2 - Certificate Decryption Header Reserved2 Data Structures:\r
+\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ HashAlg 2 bytes Hash algorithm identifier\r
+ HSize 2 bytes Hash size\r
+ REList (var) List of recipient data elements\r
+\r
+\r
+ HashAlg This defines the hash algorithm used to calculate\r
+ the public key hash of each public key used\r
+ for encryption. This field currently supports\r
+ only the following value for SHA-1\r
+\r
+ 0x8004 - SHA1\r
+\r
+ HSize This defines the size of a hashed public key\r
+ defined in REHData.\r
+\r
+ REList This is a variable length list of recipient data. \r
+ Each element in this list consists of a Recipient\r
+ Element data structure as follows:\r
+\r
+\r
+ Recipient Element (REList) Data Structure:\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ RESize 2 bytes Size of REHData + REKData\r
+ REHData HSize Hash of recipients public key\r
+ REKData (var) Simple key blob\r
+\r
+\r
+ RESize This defines the size of an individual REList \r
+ element. This value is the combined size of the\r
+ REHData field + REKData field. REHData is defined by\r
+ HSize. REKData is variable and can be calculated\r
+ for each REList element using RESize and HSize.\r
+\r
+ REHData Hashed public key for this recipient.\r
+\r
+ REKData Simple Key Blob. The format of this data structure\r
+ is identical to that defined in the Microsoft\r
+ CryptoAPI and generated using the CryptExportKey()\r
+ function. The version of the Simple Key Blob\r
+ supported at this time is 0x02 as defined by\r
+ Microsoft.\r
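
The Reserved2 structure and its Recipient Elements can be walked as in this sketch. It assumes little-endian fields and takes the recipient count from the RCount value in Reserved1. The function name is hypothetical, and the Simple Key Blob is returned without being interpreted.

```python
import struct
from typing import List, Tuple


def parse_reserved2(buf: bytes, rcount: int) -> Tuple[int, List[Tuple[bytes, bytes]]]:
    """Return (HashAlg, [(REHData, REKData), ...]) for rcount recipients."""
    hash_alg, hsize = struct.unpack_from("<HH", buf, 0)
    pos = 4
    recipients = []
    for _ in range(rcount):
        (resize,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        # RESize is the combined size of REHData and REKData.
        if resize < hsize or pos + resize > len(buf):
            raise ValueError("malformed Recipient Element")
        reh_data = buf[pos:pos + hsize]
        rek_data = buf[pos + hsize:pos + resize]
        recipients.append((reh_data, rek_data))
        pos += resize
    return hash_alg, recipients
```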
+\r
+Certificate Processing - Central Directory Encryption:\r
+------------------------------------------------------\r
+\r
+Central Directory Encryption using Digital Certificates will \r
+operate in a manner similar to that of Single Password Central\r
+Directory Encryption. The Archive Extra Data record will only be\r
+present when there is data to place into it. Currently, data is\r
+placed into this record when digital certificates are used for\r
+either encrypting \r
+or signing the files within a ZIP file. When only password \r
+encryption is used with no certificate encryption or digital \r
+signing, this record is not currently needed. When present, this \r
+record will appear before the start of the actual Central Directory \r
+data structure and will be located immediately after the Archive \r
+Decryption Header if the Central Directory is encrypted.\r
+\r
+The Archive Extra Data record will be used to store the following\r
+information. Additional data may be added in future versions.\r
+\r
+Extra Data Fields:\r
+\r
+0x0014 - PKCS#7 Store for X.509 Certificates\r
+0x0016 - X.509 Certificate ID and Signature for central directory\r
+0x0019 - PKCS#7 Encryption Recipient Certificate List\r
+\r
+The 0x0014 and 0x0016 Extra Data records would otherwise be \r
+located in the first record of the Central Directory for digital \r
+certificate processing. When encrypting or compressing the Central \r
+Directory, the 0x0014 and 0x0016 records must be located in the \r
+Archive Extra Data record and they should not remain in the first \r
+Central Directory record. The Archive Extra Data record will also \r
+be used to store the 0x0019 data. \r
+\r
+When present, the size of the Archive Extra Data record will be\r
+included in the size of the Central Directory. The data of the\r
+Archive Extra Data record will also be compressed and encrypted\r
+along with the Central Directory data structure.\r
+\r
+Certificate Processing Differences:\r
+\r
+The Certificate Processing Method of encryption differs from the\r
+Single Password Symmetric Encryption Method as follows. Instead\r
+of using a user-defined password to generate a master session key,\r
+cryptographically random data is used. The key material is then\r
+wrapped using standard key-wrapping techniques. This key material\r
+is wrapped using the public key of each recipient that will need\r
+to decrypt the file using their corresponding private key.\r
+\r
+This specification currently assumes digital certificates will follow\r
+the X.509 V3 format for 1024 bit and higher RSA format digital\r
+certificates. Implementation of this Certificate Processing Method\r
+requires supporting logic for key access and management. This logic\r
+is outside the scope of this specification.\r
+\r
+OAEP Processing with Certificate-based Encryption:\r
+\r
+OAEP stands for Optimal Asymmetric Encryption Padding. It is a\r
+strengthening technique used for small encoded items such as decryption\r
+keys. This is commonly applied in cryptographic key-wrapping techniques\r
+and is supported by PKCS #1. Versions 5.0 and 6.0 of this specification \r
+were designed to support OAEP key-wrapping for certificate-based \r
+decryption keys for additional security. \r
+\r
+Support for private keys stored on Smartcards or Tokens introduced\r
+a conflict with this OAEP logic. Most card and token products do \r
+not support the additional strengthening applied to OAEP key-wrapped \r
+data. In order to resolve this conflict, versions 6.1 and above of this \r
+specification will no longer support OAEP when encrypting using \r
+digital certificates. \r
+\r
+Versions of PKZIP available during initial development of the \r
+certificate processing method set a value of 61 into the \r
+version needed to extract field for a file. This indicates that \r
+non-OAEP key wrapping is used. This affects certificate encryption \r
+only, and password encryption functions should not be affected by \r
+this value. This means values of 61 may be found on files encrypted\r
+with certificates only, or on files encrypted with both password\r
+encryption and certificate encryption. Files encrypted with both\r
+methods can safely be decrypted using the password methods documented.\r
+\r
+IX. Change Process\r
+------------------\r
+\r
+In order for the .ZIP file format to remain a viable definition, this\r
+specification should be considered as open for periodic review and\r
+revision. Although this format was originally designed with a \r
+certain level of extensibility, not all changes in technology\r
+(present or future) were or will be necessarily considered in its\r
+design. If your application requires new definitions to the\r
+extensible sections in this format, or if you would like to \r
+submit new data structures, please forward your request to\r
+zipformat@pkware.com. All submissions will be reviewed by the\r
+ZIP File Specification Committee for possible inclusion into\r
+future versions of this specification. Periodic revisions\r
+to this specification will be published to ensure interoperability. \r
+We encourage comments and feedback that may help improve clarity \r
+or content.\r
+\r
+X. Incorporating PKWARE Proprietary Technology into Your Product\r
+----------------------------------------------------------------\r
+\r
+PKWARE is committed to the interoperability and advancement of the\r
+.ZIP format. PKWARE offers a free license for certain technological\r
+aspects described above under certain restrictions and conditions.\r
+However, the use or implementation in a product of certain technological\r
+aspects set forth in the current APPNOTE, including those with regard to\r
+strong encryption, patching, or extended tape operations requires a \r
+license from PKWARE. Please contact PKWARE with regard to acquiring\r
+a license.\r
+\r
+XI. Acknowledgements\r
+---------------------\r
+\r
+In addition to the above mentioned contributors to PKZIP and PKUNZIP,\r
+I would like to extend special thanks to Robert Mahoney for suggesting\r
+the extension .ZIP for this software.\r
+\r
+XII. References\r
+---------------\r
+\r
+ Fiala, Edward R., and Greene, Daniel H., "Data compression with\r
+ finite windows", Communications of the ACM, Volume 32, Number 4,\r
+ April 1989, pages 490-505.\r
+\r
+ Held, Gilbert, "Data Compression, Techniques and Applications,\r
+ Hardware and Software Considerations", John Wiley & Sons, 1987.\r
+\r
+ Huffman, D.A., "A method for the construction of minimum-redundancy\r
+ codes", Proceedings of the IRE, Volume 40, Number 9, September 1952,\r
+ pages 1098-1101.\r
+\r
+ Nelson, Mark, "LZW Data Compression", Dr. Dobb's Journal, Volume 14,\r
+ Number 10, October 1989, pages 29-37.\r
+\r
+ Nelson, Mark, "The Data Compression Book", M&T Books, 1991.\r
+\r
+ Storer, James A., "Data Compression, Methods and Theory",\r
+ Computer Science Press, 1988\r
+\r
+ Welch, Terry, "A Technique for High-Performance Data Compression",\r
+ IEEE Computer, Volume 17, Number 6, June 1984, pages 8-19.\r
+\r
+ Ziv, J. and Lempel, A., "A universal algorithm for sequential data\r
+ compression", IEEE Transactions on Information Theory, Volume 23,\r
+ Number 3, May 1977, pages 337-343.\r
+\r
+ Ziv, J. and Lempel, A., "Compression of individual sequences via\r
+ variable-rate coding", IEEE Transactions on Information Theory,\r
+ Volume 24, Number 5, September 1978, pages 530-536.\r
+\r
+\r
+APPENDIX A - AS/400 Extra Field (0x0065) Attribute Definitions\r
+--------------------------------------------------------------\r
+\r
+Field Definition Structure:\r
+\r
+ a. field length including length 2 bytes\r
+ b. field code 2 bytes\r
+ c. data x bytes\r
+\r
+Field Code Description\r
+ 4001 Source type i.e. CLP etc\r
+ 4002 The text description of the library \r
+ 4003 The text description of the file\r
+ 4004 The text description of the member\r
+ 4005 x'F0' or 0 is PF-DTA, x'F1' or 1 is PF_SRC\r
+ 4007 Database Type Code 1 byte\r
+ 4008 Database file and fields definition\r
+ 4009 GZIP file type 2 bytes\r
+ 400B IFS code page 2 bytes\r
+ 400C IFS Creation Time 4 bytes\r
+ 400D IFS Access Time 4 bytes\r
+ 400E IFS Modification time 4 bytes\r
+ 005C Length of the records in the file 2 bytes\r
+ 0068 GZIP two words 8 bytes\r
+\r
+APPENDIX B - z/OS Extra Field (0x0065) Attribute Definitions\r
+------------------------------------------------------------\r
+\r
+Field Definition Structure:\r
+\r
+ a. field length including length 2 bytes\r
+ b. field code 2 bytes\r
+ c. data x bytes\r
+\r
+Field Code Description\r
+ 0001 File Type 2 bytes \r
+ 0002 NonVSAM Record Format 1 byte\r
+ 0003 Reserved \r
+ 0004 NonVSAM Block Size 2 bytes Big Endian\r
+ 0005 Primary Space Allocation 3 bytes Big Endian\r
+ 0006 Secondary Space Allocation 3 bytes Big Endian\r
+ 0007 Space Allocation Type 1 byte flag \r
+ 0008 Modification Date Retired with PKZIP 5.0 +\r
+ 0009 Expiration Date Retired with PKZIP 5.0 +\r
+ 000A PDS Directory Block Allocation 3 bytes Big Endian binary value\r
+ 000B NonVSAM Volume List variable \r
+ 000C UNIT Reference Retired with PKZIP 5.0 +\r
+ 000D DF/SMS Management Class 8 bytes EBCDIC Text Value\r
+ 000E DF/SMS Storage Class 8 bytes EBCDIC Text Value\r
+ 000F DF/SMS Data Class 8 bytes EBCDIC Text Value\r
+ 0010 PDS/PDSE Member Info. 30 bytes \r
+ 0011 VSAM sub-filetype 2 bytes \r
+ 0012 VSAM LRECL 13 bytes EBCDIC "(num_avg num_max)"\r
+ 0013 VSAM Cluster Name Retired with PKZIP 5.0 +\r
+ 0014 VSAM KSDS Key Information 13 bytes EBCDIC "(num_length num_position)"\r
+ 0015 VSAM Average LRECL 5 bytes EBCDIC num_value padded with blanks\r
+ 0016 VSAM Maximum LRECL 5 bytes EBCDIC num_value padded with blanks\r
+ 0017 VSAM KSDS Key Length 5 bytes EBCDIC num_value padded with blanks\r
+ 0018 VSAM KSDS Key Position 5 bytes EBCDIC num_value padded with blanks\r
+ 0019 VSAM Data Name 1-44 bytes EBCDIC text string\r
+ 001A VSAM KSDS Index Name 1-44 bytes EBCDIC text string\r
+ 001B VSAM Catalog Name 1-44 bytes EBCDIC text string\r
+ 001C VSAM Data Space Type 9 bytes EBCDIC text string\r
+ 001D VSAM Data Space Primary 9 bytes EBCDIC num_value left-justified\r
+ 001E VSAM Data Space Secondary 9 bytes EBCDIC num_value left-justified\r
+ 001F VSAM Data Volume List variable EBCDIC text list of 6-character Volume IDs\r
+ 0020 VSAM Data Buffer Space 8 bytes EBCDIC num_value left-justified\r
+ 0021 VSAM Data CISIZE 5 bytes EBCDIC num_value left-justified\r
+ 0022 VSAM Erase Flag 1 byte flag \r
+ 0023 VSAM Free CI % 3 bytes EBCDIC num_value left-justified\r
+ 0024 VSAM Free CA % 3 bytes EBCDIC num_value left-justified\r
+ 0025 VSAM Index Volume List variable EBCDIC text list of 6-character Volume IDs\r
+ 0026 VSAM Ordered Flag 1 byte flag \r
+ 0027 VSAM REUSE Flag 1 byte flag \r
+ 0028 VSAM SPANNED Flag 1 byte flag \r
+ 0029 VSAM Recovery Flag 1 byte flag \r
+ 002A VSAM WRITECHK Flag 1 byte flag \r
+ 002B VSAM Cluster/Data SHROPTS 3 bytes EBCDIC "n,y" \r
+ 002C VSAM Index SHROPTS 3 bytes EBCDIC "n,y" \r
+ 002D VSAM Index Space Type 9 bytes EBCDIC text string\r
+ 002E VSAM Index Space Primary 9 bytes EBCDIC num_value left-justified\r
+ 002F VSAM Index Space Secondary 9 bytes EBCDIC num_value left-justified\r
+ 0030 VSAM Index CISIZE 5 bytes EBCDIC num_value left-justified\r
+ 0031 VSAM Index IMBED 1 byte flag \r
+ 0032 VSAM Index Ordered Flag 1 byte flag \r
+ 0033 VSAM REPLICATE Flag 1 byte flag \r
+ 0034 VSAM Index REUSE Flag 1 byte flag \r
+ 0035 VSAM Index WRITECHK Flag 1 byte flag Retired with PKZIP 5.0 +\r
+ 0036 VSAM Owner 8 bytes EBCDIC text string\r
+ 0037 VSAM Index Owner 8 bytes EBCDIC text string\r
+ 0038 Reserved\r
+ 0039 Reserved\r
+ 003A Reserved\r
+ 003B Reserved\r
+ 003C Reserved\r
+ 003D Reserved\r
+ 003E Reserved\r
+ 003F Reserved\r
+ 0040 Reserved\r
+ 0041 Reserved\r
+ 0042 Reserved\r
+ 0043 Reserved\r
+ 0044 Reserved\r
+ 0045 Reserved\r
+ 0046 Reserved\r
+ 0047 Reserved\r
+ 0048 Reserved\r
+ 0049 Reserved\r
+ 004A Reserved\r
+ 004B Reserved\r
+ 004C Reserved\r
+ 004D Reserved\r
+ 004E Reserved\r
+ 004F Reserved\r
+ 0050 Reserved\r
+ 0051 Reserved\r
+ 0052 Reserved\r
+ 0053 Reserved\r
+ 0054 Reserved\r
+ 0055 Reserved\r
+ 0056 Reserved\r
+ 0057 Reserved\r
+ 0058 PDS/PDSE Member TTR Info. 6 bytes Big Endian\r
+ 0059 PDS 1st LMOD Text TTR 3 bytes Big Endian\r
+ 005A PDS LMOD EP Rec # 4 bytes Big Endian\r
+ 005B Reserved\r
+ 005C Max Length of records 2 bytes Big Endian\r
+ 005D PDSE Flag 1 byte flag\r
+ 005E Reserved\r
+ 005F Reserved\r
+ 0060 Reserved\r
+ 0061 Reserved\r
+ 0062 Reserved\r
+ 0063 Reserved\r
+ 0064 Reserved\r
+ 0065 Last Date Referenced 4 bytes Packed Hex "yyyymmdd"\r
+ 0066 Date Created 4 bytes Packed Hex "yyyymmdd"\r
+ 0068 GZIP two words 8 bytes\r
+ 0071 Extended NOTE Location 12 bytes Big Endian\r
+ 0072 Archive device UNIT 6 bytes EBCDIC\r
+ 0073 Archive 1st Volume 6 bytes EBCDIC\r
+ 0074 Archive 1st VOL File Seq# 2 bytes Binary\r
+\r
+APPENDIX C - Zip64 Extensible Data Sector Mappings (EFS)\r
+--------------------------------------------------------\r
+\r
+ -Z390 Extra Field:\r
+\r
+ The following is the general layout of the attributes for the \r
+ ZIP 64 "extra" block for extended tape operations. Portions of \r
+ this extended tape processing technology are covered under a \r
+ pending patent application. The use or implementation in a \r
+ product of certain technological aspects set forth in the \r
+ current APPNOTE, including those with regard to strong encryption,\r
+ patching or extended tape operations, requires a license from\r
+ PKWARE. Please contact PKWARE with regard to acquiring a license. \r
+ \r
+\r
+ Note: some fields stored in Big Endian format. All text is \r
+ in EBCDIC format unless otherwise specified.\r
+\r
+ Value Size Description\r
+ ----- ---- -----------\r
+ (Z390) 0x0065 2 bytes Tag for this "extra" block type\r
+ Size 4 bytes Size for the following data block\r
+ Tag 4 bytes EBCDIC "Z390"\r
+ Length71 2 bytes Big Endian\r
+ Subcode71 2 bytes Enote type code\r
+ FMEPos 1 byte\r
+ Length72 2 bytes Big Endian\r
+ Subcode72 2 bytes Unit type code\r
+ Unit 1 byte Unit\r
+ Length73 2 bytes Big Endian\r
+ Subcode73 2 bytes Volume1 type code\r
+ FirstVol 1 byte Volume\r
+ Length74 2 bytes Big Endian\r
+ Subcode74 2 bytes FirstVol file sequence\r
+ FileSeq 2 bytes Sequence \r
+\r
+APPENDIX D - Language Encoding (EFS)\r
+------------------------------------\r
+\r
+The ZIP format has historically supported only the original IBM PC character \r
+encoding set, commonly referred to as IBM Code Page 437. This limits storing \r
+file name characters to only those within the original MS-DOS range of values \r
+and does not properly support file names in other character encodings, or \r
+languages. To address this limitation, this specification will support the \r
+following change. \r
+\r
+If general purpose bit 11 is unset, the file name and comment should conform \r
+to the original ZIP character encoding. If general purpose bit 11 is set, the \r
+filename and comment must support The Unicode Standard, Version 4.1.0 or \r
+greater using the character encoding form defined by the UTF-8 storage \r
+specification. The Unicode Standard is published by The Unicode\r
+Consortium (www.unicode.org). UTF-8 encoded data stored within ZIP files \r
+is expected to not include a byte order mark (BOM). \r
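
The bit 11 rule can be illustrated with a short decoder. This is a minimal sketch that relies on Python's built-in "cp437" and "utf-8" codecs; the function name is hypothetical.

```python
GP_UTF8 = 1 << 11  # general purpose bit 11


def decode_name(raw: bytes, flags: int) -> str:
    """Decode a file name or comment according to general purpose bit 11."""
    encoding = "utf-8" if flags & GP_UTF8 else "cp437"
    return raw.decode(encoding)
```

Because every byte value maps to a character in Code Page 437, decoding with bit 11 unset does not fail, whereas invalid UTF-8 raises `UnicodeDecodeError` when bit 11 is set.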
+\r
+Applications may choose to supplement this file name storage through the use \r
+of the 0x0008 Extra Field. Storage for this optional field is currently \r
+undefined; however, it will be used to allow storing extended information \r
+on source or target encoding that may further assist applications with file \r
+name, or file content encoding tasks. Please contact PKWARE with any\r
+requirements on how this field should be used.\r
+\r
+The 0x0008 Extra Field storage may be used with either setting for general \r
+purpose bit 11. One intended usage for this field is to store \r
+whether "modified-UTF-8" (JAVA) is used, or UTF-8-MAC. Similarly, other \r
+commonly used character encoding (code page) designations can be indicated \r
+through this field. Formalized values for use of the 0x0008 record remain \r
+undefined at this time. The definition for the layout of the 0x0008 field\r
+will be published when available. Use of the 0x0008 Extra Field provides\r
+for storing data within a ZIP file in an encoding other than IBM Code\r
+Page 437 or UTF-8.\r
+\r
+General purpose bit 11 does not imply any encoding of file content or\r
+password. Values defining character encoding for file content or \r
+password must be stored within the 0x0008 Extended Language Encoding \r
+Extra Field.\r
+\r
+Ed Gordon of the Info-ZIP group has defined a pair of "extra field" records \r
+that can be used to store UTF-8 file name and file comment fields. These\r
+records can be used for cases when the general purpose bit 11 method\r
+for storing UTF-8 data in the standard file name and comment fields is\r
+not desirable. A common case for this alternate method is when backward\r
+compatibility with older programs is required.\r
+\r
+Definitions for the record structure of these fields are included above \r
+in the section on 3rd party mappings for "extra field" records. These\r
+records are identified by Header IDs 0x6375 (Info-ZIP Unicode Comment \r
+Extra Field) and 0x7075 (Info-ZIP Unicode Path Extra Field).\r
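+\r
+As an illustration, a reader might locate these records as sketched below.\r
+The sketch assumes the record layout given in the 3rd party mappings\r
+section (a 1-byte version, currently 1, a 4-byte CRC-32 of the standard\r
+field, then the UTF-8 text) and little-endian header values. The function\r
+names are hypothetical, and treating a CRC mismatch as "ignore the record"\r
+is this sketch's reading of the purpose of the checksum.\r

```python
import struct
import zlib

UNICODE_PATH_ID = 0x7075     # Info-ZIP Unicode Path Extra Field
UNICODE_COMMENT_ID = 0x6375  # Info-ZIP Unicode Comment Extra Field


def iter_extra_fields(extra: bytes):
    """Yield (header_id, data) for each record in an "extra field" area."""
    offset = 0
    while offset + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, offset)
        offset += 4
        data = extra[offset:offset + size]
        if len(data) < size:
            break  # truncated record; stop rather than guess
        yield header_id, data
        offset += size


def unicode_override(extra: bytes, header_id: int, standard_field: bytes):
    """Return the UTF-8 text of a 0x7075 or 0x6375 record, or None.

    standard_field is the raw file name (or comment) from the header;
    its CRC-32 must match the value stored in the record.
    """
    for found_id, data in iter_extra_fields(extra):
        if found_id != header_id or len(data) < 5:
            continue
        version, stored_crc = struct.unpack_from("<BI", data, 0)
        if version != 1:
            continue  # unknown version; skip
        if stored_crc != zlib.crc32(standard_field) & 0xFFFFFFFF:
            continue  # standard field changed since the record was written
        return data[5:].decode("utf-8")
    return None
```

+A caller would pass the raw "extra field" bytes of an entry together with\r
+the raw standard file name, and fall back to the standard file name when\r
+the function returns None.\r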
+\r
+The choice of which storage method to use when writing a ZIP file is left\r
+to the implementation. Developers should expect that a ZIP file may \r
+contain either method and should provide support for reading data in \r
+either format. Use of general purpose bit 11 reduces storage requirements \r
+for file name data by not requiring additional "extra field" data for\r
+each file, but can result in older ZIP programs not being able to extract \r
+files. Use of the 0x6375 and 0x7075 records will result in a ZIP file \r
+that should always be readable by older ZIP programs, but requires more \r
+storage per file to write file name and/or file comment fields.\r
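+\r
+The trade-off between the two methods can be sketched from the writer's\r
+side. The helper below is hypothetical. It assumes the 0x7075 record\r
+layout described in the 3rd party mappings section, and its choices to\r
+substitute "?" for characters outside IBM Code Page 437 and to omit the\r
+record when the name is fully representable are assumptions of this\r
+sketch, not requirements of this specification.\r

```python
import struct
import zlib


def encode_file_name(name: str, use_bit_11: bool):
    """Return (flag_bits, name_bytes, extra_bytes) for one entry."""
    utf8 = name.encode("utf-8")
    if use_bit_11:
        # Method 1: UTF-8 in the standard field, no extra storage.
        return 1 << 11, utf8, b""
    # Method 2: legacy name in the standard field for older programs.
    legacy = name.encode("cp437", errors="replace")
    if legacy.decode("cp437") == name:
        return 0, legacy, b""  # nothing lost; no record needed
    # Append a 0x7075 record: version 1, CRC-32 of the legacy name,
    # then the UTF-8 name.
    data = struct.pack("<BI", 1, zlib.crc32(legacy) & 0xFFFFFFFF) + utf8
    extra = struct.pack("<HH", 0x7075, len(data)) + data
    return 0, legacy, extra
```

+Under these assumptions the second method costs 9 bytes of record\r
+overhead plus a second copy of the name for each affected entry.\r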
+\r
+ \r
+\r
+\r