1 File: APPNOTE.TXT - .ZIP File Format Specification
3 Revised: September 28, 2007
4 Copyright (c) 1989 - 2007 PKWARE Inc., All Rights Reserved.
6 The use of certain technological aspects disclosed in the current
7 APPNOTE is available pursuant to the below section entitled
8 "Incorporating PKWARE Proprietary Technology into Your Product".
13 This specification is intended to define a cross-platform,
14 interoperable file storage and transfer format. Since its
15 first publication in 1989, PKWARE has remained committed to
16 ensuring the interoperability of the .ZIP file format through
17 publication and maintenance of this specification. We trust that
18 all .ZIP compatible vendors and application developers that have
19 adopted and benefited from this format will share and support
20 this commitment to interoperability.
26 648 N. Plankinton Avenue, Suite 220
35 Although PKWARE will attempt to supply current and accurate
36 information relating to its file formats, algorithms, and the
37 subject programs, the possibility of error or omission cannot
38 be eliminated. PKWARE therefore expressly disclaims any warranty
39 that the information contained in the associated materials relating
40 to the subject programs and/or the format of the files created or
41 accessed by the subject programs and/or the algorithms used by
42 the subject programs, or any other matter, is current, correct or
43 accurate as delivered. Any risk of damage due to any possible
44 inaccurate information is assumed by the user of the information.
45 Furthermore, the information relating to the subject programs
46 and/or the file formats created or accessed by the subject
47 programs and/or the algorithms used by the subject programs is
48 subject to change without notice.
50 If the version of this file is marked as a NOTIFICATION OF CHANGE,
51 the content defines an Early Feature Specification (EFS) change
52 to the .ZIP file format that may be subject to modification prior
53 to publication of the Final Feature Specification (FFS). This
54 document may also contain information on Planned Feature
55 Specifications (PFS) defining recognized future extensions.
60 Version Change Description Date
61 ------- ------------------ ----------
62 5.2 -Single Password Symmetric Encryption 06/02/2003
65 6.1.0 -Smartcard compatibility 01/20/2004
66 -Documentation on certificate storage
68 6.2.0 -Introduction of Central Directory 04/26/2004
69 Encryption for encrypting metadata
70 -Added OS/X to Version Made By values
72 6.2.1 -Added Extra Field placeholder for 04/01/2005
73 POSZIP using ID 0x4690
75 -Clarified size field on
76 "zip64 end of central directory record"
78 6.2.2 -Documented Final Feature Specification 01/06/2006
81 -Clarifications and typographical
84 6.3.0 -Added tape positioning storage 09/29/2006
87 -Expanded list of supported hash algorithms
89 -Expanded list of supported compression
92 -Expanded list of supported encryption
95 -Added option for Unicode filename
98 -Clarifications for consistent use
99 of Data Descriptor records
101 -Added additional "Extra Field"
104 6.3.1 -Corrected standard hash values for 04/11/2007
107 6.3.2 -Added compression method 97 09/28/2007
109 -Documented InfoZIP "Extra Field"
110 values for UTF-8 file name and
113 V. General Format of a .ZIP file
114 --------------------------------
116 Files are stored in arbitrary order. Large .ZIP files can span multiple
117 volumes or be split into user-defined segment sizes. All values
118 are stored in little-endian byte order unless otherwise specified.
120 Overall .ZIP file format:
122 [local file header 1]
128 [local file header n]
131 [archive decryption header]
132 [archive extra data record]
134 [zip64 end of central directory record]
135 [zip64 end of central directory locator]
136 [end of central directory record]
139 A. Local file header:
141 local file header signature 4 bytes (0x04034b50)
142 version needed to extract 2 bytes
143 general purpose bit flag 2 bytes
144 compression method 2 bytes
145 last mod file time 2 bytes
146 last mod file date 2 bytes
148 compressed size 4 bytes
149 uncompressed size 4 bytes
150 file name length 2 bytes
151 extra field length 2 bytes
153 file name (variable size)
154 extra field (variable size)
158 Immediately following the local header for a file
159 is the compressed or stored data for the file.
160 The series of [local file header][file data][data
161 descriptor] repeats for each file in the .ZIP archive.
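
As an illustration of the local file header layout, the following
Python sketch unpacks the fixed 30-byte portion and locates the file
data that follows it. The names parse_local_header and LOCAL_HEADER
are hypothetical and not part of this specification. The sketch
assumes the complete header is already in memory and includes the
crc-32 field, which sits between the date and the compressed size.

```python
import struct

LOCAL_HEADER_SIG = 0x04034B50
# signature, version needed, flags, method, mod time, mod date,
# crc-32, compressed size, uncompressed size, name length, extra length
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")  # 30 bytes, little-endian


def parse_local_header(buf, pos=0):
    """Return (fields, offset of the file data) for the header at buf[pos:]."""
    (sig, version_needed, flags, method, mod_time, mod_date,
     crc32, csize, usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, pos)
    if sig != LOCAL_HEADER_SIG:
        raise ValueError("not a local file header")
    name_start = pos + LOCAL_HEADER.size
    extra_start = name_start + name_len
    data_start = extra_start + extra_len
    fields = {
        "version_needed": version_needed,
        "flags": flags,
        "method": method,
        "mod_time": mod_time,
        "mod_date": mod_date,
        "crc32": crc32,
        "compressed_size": csize,
        "uncompressed_size": usize,
        "name": bytes(buf[name_start:extra_start]),
        "extra": bytes(buf[extra_start:data_start]),
    }
    return fields, data_start
```

The returned offset points at the first byte of compressed or stored
data. When bit 3 of the general purpose bit flag is set, the sizes
read here are zero and cannot be used to find the end of the data.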
166 compressed size 4 bytes
167 uncompressed size 4 bytes
169 This descriptor exists only if bit 3 of the general
170 purpose bit flag is set (see below). It is byte aligned
171 and immediately follows the last byte of compressed data.
172 This descriptor is used only when it was not possible to
173 seek in the output .ZIP file, e.g., when the output .ZIP file
174 was standard output or a non-seekable device. For ZIP64(tm) format
175 archives, the compressed and uncompressed sizes are 8 bytes each.
177 When compressing files, compressed and uncompressed sizes
178 should be stored in ZIP64 format (as 8 byte values) when a
179 file's size exceeds 0xFFFFFFFF. However, ZIP64 format may be
180 used regardless of the size of a file. When extracting, if
181 the zip64 extended information extra field is present for
182 the file the compressed and uncompressed sizes will be 8
185 Although not originally assigned a signature, the value
186 0x08074b50 has commonly been adopted as a signature value
187 for the data descriptor record. Implementers should be
188 aware that ZIP files may be encountered with or without this
189 signature marking data descriptors and should account for
190 either case when reading ZIP files to ensure compatibility.
191 When writing ZIP files, it is recommended to include the
192 signature value marking the data descriptor record. When
193 the signature is used, the fields currently defined for
194 the data descriptor record will immediately follow the
197 An extensible data descriptor will be released in a future
198 version of this APPNOTE. This new record is intended to
199 resolve conflicts with the use of this record going forward,
200 and to provide better support for streamed file processing.
202 When the Central Directory Encryption method is used, the data
203 descriptor record is not required, but may be used. If present,
204 and bit 3 of the general purpose bit field is set to indicate
205 its presence, the values in fields of the data descriptor
206 record should be set to binary zeros.
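
A reader that tolerates data descriptors both with and without the
0x08074b50 signature might look like the following Python sketch.
The function name and the zip64 parameter are hypothetical. The
check is only a heuristic: a crc-32 value that happens to equal the
signature cannot be told apart from the signature itself, so a
careful reader would cross-check against the central directory.

```python
import struct

DATA_DESCRIPTOR_SIG = 0x08074B50


def parse_data_descriptor(buf, pos=0, zip64=False):
    """Return (crc32, compressed_size, uncompressed_size, bytes_consumed)."""
    start = pos
    (first,) = struct.unpack_from("<I", buf, pos)
    if first == DATA_DESCRIPTOR_SIG:
        pos += 4  # optional signature is present; skip over it
    # Sizes are 8 bytes each for ZIP64 archives, 4 bytes otherwise.
    fmt = "<IQQ" if zip64 else "<III"
    crc32, csize, usize = struct.unpack_from(fmt, buf, pos)
    pos += struct.calcsize(fmt)
    return crc32, csize, usize, pos - start
```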
208 D. Archive decryption header:
210 The Archive Decryption Header is introduced in version 6.2
211 of the ZIP format specification. This record exists in support
212 of the Central Directory Encryption Feature implemented as part of
213 the Strong Encryption Specification as described in this document.
214 When the Central Directory Structure is encrypted, this decryption
215 header will precede the encrypted data segment. The encrypted
216 data segment will consist of the Archive extra data record (if
217 present) and the encrypted Central Directory Structure data.
218 The format of this data record is identical to the Decryption
219 header record preceding compressed file data. If the central
220 directory structure is encrypted, the location of the start of
221 this data record is determined using the Start of Central Directory
222 field in the Zip64 End of Central Directory record. Refer to the
223 section on the Strong Encryption Specification for information
224 on the fields used in the Archive Decryption Header record.
227 E. Archive extra data record:
229 archive extra data signature 4 bytes (0x08064b50)
230 extra field length 4 bytes
231 extra field data (variable size)
233 The Archive Extra Data Record is introduced in version 6.2
234 of the ZIP format specification. This record exists in support
235 of the Central Directory Encryption Feature implemented as part of
236 the Strong Encryption Specification as described in this document.
237 When present, this record immediately precedes the central
238 directory data structure. The size of this data record will be
239 included in the Size of the Central Directory field in the
240 End of Central Directory record. If the central directory structure
241 is compressed, but not encrypted, the location of the start of
242 this data record is determined using the Start of Central Directory
243 field in the Zip64 End of Central Directory record.
246 F. Central directory structure:
257 central file header signature 4 bytes (0x02014b50)
258 version made by 2 bytes
259 version needed to extract 2 bytes
260 general purpose bit flag 2 bytes
261 compression method 2 bytes
262 last mod file time 2 bytes
263 last mod file date 2 bytes
265 compressed size 4 bytes
266 uncompressed size 4 bytes
267 file name length 2 bytes
268 extra field length 2 bytes
269 file comment length 2 bytes
270 disk number start 2 bytes
271 internal file attributes 2 bytes
272 external file attributes 4 bytes
273 relative offset of local header 4 bytes
275 file name (variable size)
276 extra field (variable size)
277 file comment (variable size)
281 header signature 4 bytes (0x05054b50)
283 signature data (variable size)
285 With the introduction of the Central Directory Encryption
286 feature in version 6.2 of this specification, the Central
287 Directory Structure may be stored both compressed and encrypted.
288 Although not required, it is assumed when encrypting the
289 Central Directory Structure, that it will be compressed
290 for greater storage efficiency. Information on the
291 Central Directory Encryption feature can be found in the section
292 describing the Strong Encryption Specification. The Digital
293 Signature record will be neither compressed nor encrypted.
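
The central file header can be walked in the same way as the local
header. The Python sketch below assumes an unencrypted, uncompressed
central directory held in memory. The names iter_central_directory
and CENTRAL_HEADER are hypothetical. The format string includes the
crc-32 field that precedes the compressed size.

```python
import struct

CENTRAL_HEADER_SIG = 0x02014B50
# signature, version made by, version needed, flags, method, time, date,
# crc-32, compressed size, uncompressed size, name/extra/comment lengths,
# disk number start, internal attributes, external attributes, offset
CENTRAL_HEADER = struct.Struct("<IHHHHHHIIIHHHHHII")  # 46 bytes


def iter_central_directory(buf, pos, count):
    """Yield one dict per central file header, starting at buf[pos:]."""
    for _ in range(count):
        f = CENTRAL_HEADER.unpack_from(buf, pos)
        if f[0] != CENTRAL_HEADER_SIG:
            raise ValueError("bad central file header signature")
        name_len, extra_len, comment_len = f[10], f[11], f[12]
        name_start = pos + CENTRAL_HEADER.size
        extra_start = name_start + name_len
        comment_start = extra_start + extra_len
        end = comment_start + comment_len
        yield {
            "version_made_by": f[1],
            "version_needed": f[2],
            "flags": f[3],
            "method": f[4],
            "crc32": f[7],
            "compressed_size": f[8],
            "uncompressed_size": f[9],
            "disk_number_start": f[13],
            "internal_attributes": f[14],
            "external_attributes": f[15],
            "local_header_offset": f[16],
            "name": bytes(buf[name_start:extra_start]),
            "extra": bytes(buf[extra_start:comment_start]),
            "comment": bytes(buf[comment_start:end]),
        }
        pos = end  # the next header follows the variable-size fields
```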
295 G. Zip64 end of central directory record
297 zip64 end of central dir
298 signature 4 bytes (0x06064b50)
299 size of zip64 end of central
300 directory record 8 bytes
301 version made by 2 bytes
302 version needed to extract 2 bytes
303 number of this disk 4 bytes
304 number of the disk with the
305 start of the central directory 4 bytes
306 total number of entries in the
307 central directory on this disk 8 bytes
308 total number of entries in the
309 central directory 8 bytes
310 size of the central directory 8 bytes
311 offset of start of central
312 directory with respect to
313 the starting disk number 8 bytes
314 zip64 extensible data sector (variable size)
316 The value stored into the "size of zip64 end of central
317 directory record" should be the size of the remaining
318 record and should not include the leading 12 bytes.
320 Size = SizeOfFixedFields + SizeOfVariableData - 12.
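
As a worked example of this formula, the fixed fields listed above
total 56 bytes, so a Version 1 record with an empty extensible data
sector stores 56 + 0 - 12 = 44 in its size field. The Python sketch
below builds such a record. The function name and its parameter
names are hypothetical.

```python
import struct

ZIP64_EOCD_SIG = 0x06064B50
# signature, size of record, version made by, version needed,
# number of this disk, disk with start of central directory,
# entries on this disk, total entries, size of CD, offset of CD
ZIP64_EOCD_FIXED = struct.Struct("<IQHHIIQQQQ")  # 56 bytes


def build_zip64_eocd(made_by, needed, disk, cd_disk, entries_here,
                     entries_total, cd_size, cd_offset, extensible=b""):
    """Pack a Version 1 zip64 end of central directory record."""
    # The stored size excludes the leading 12 bytes
    # (4-byte signature plus the 8-byte size field itself).
    size = ZIP64_EOCD_FIXED.size + len(extensible) - 12
    return ZIP64_EOCD_FIXED.pack(
        ZIP64_EOCD_SIG, size, made_by, needed, disk, cd_disk,
        entries_here, entries_total, cd_size, cd_offset) + extensible
```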
322 The above record structure defines Version 1 of the
323 zip64 end of central directory record. Version 1 was
324 implemented in versions of this specification preceding
325 6.2 in support of the ZIP64 large file feature. The
326 introduction of the Central Directory Encryption feature
327 implemented in version 6.2 as part of the Strong Encryption
328 Specification defines Version 2 of this record structure.
329 Refer to the section describing the Strong Encryption
330 Specification for details on the version 2 format for
333 Special purpose data may reside in the zip64 extensible data
334 sector field following either a V1 or V2 version of this
335 record. To ensure identification of this special purpose data
336 it must include an identifying header block consisting of the
342 The Header ID field indicates the type of data that is in the
343 data block that follows.
345 Data Size identifies the number of bytes that follow for this
348 Multiple special purpose data blocks may be present, but each
349 must be preceded by a Header ID and Data Size field. Current
350 mappings of Header ID values supported in this field are as
351 defined in APPENDIX C.
353 H. Zip64 end of central directory locator
355 zip64 end of central dir locator
356 signature 4 bytes (0x07064b50)
357 number of the disk with the
358 start of the zip64 end of
359 central directory 4 bytes
360 relative offset of the zip64
361 end of central directory record 8 bytes
362 total number of disks 4 bytes
364 I. End of central directory record:
366 end of central dir signature 4 bytes (0x06054b50)
367 number of this disk 2 bytes
368 number of the disk with the
369 start of the central directory 2 bytes
370 total number of entries in the
371 central directory on this disk 2 bytes
372 total number of entries in
373 the central directory 2 bytes
374 size of the central directory 4 bytes
375 offset of start of central
376 directory with respect to
377 the starting disk number 4 bytes
378 .ZIP file comment length 2 bytes
379 .ZIP file comment (variable size)
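
Because the end of central directory record is followed only by a
variable-size comment, readers commonly locate it by searching
backward from the end of the file. The Python sketch below shows one
way to do this for a single-volume archive held in memory, and then
checks whether a zip64 end of central directory locator sits
immediately before it. The function names and the returned dictionary
keys are hypothetical, and the backward search is a common
convention, not something this specification mandates.

```python
import struct

EOCD_SIG = b"PK\x05\x06"            # 0x06054b50 in little-endian byte order
EOCD = struct.Struct("<IHHHHIIH")    # 22 bytes of fixed fields
LOCATOR_SIG = 0x07064B50
LOCATOR = struct.Struct("<IIQI")     # 20 bytes


def find_eocd(buf):
    """Return (position, record) of the end of central directory record."""
    # The comment length is a 2-byte field, so the record starts no
    # earlier than 22 + 0xFFFF bytes before the end of the file.
    lowest = max(0, len(buf) - (EOCD.size + 0xFFFF))
    pos = buf.rfind(EOCD_SIG, lowest)
    while pos != -1:
        if pos + EOCD.size <= len(buf):
            f = EOCD.unpack_from(buf, pos)
            comment_end = pos + EOCD.size + f[7]
            if comment_end == len(buf):  # comment must reach end of file
                return pos, {
                    "disk": f[1],
                    "cd_disk": f[2],
                    "entries_on_disk": f[3],
                    "total_entries": f[4],
                    "cd_size": f[5],
                    "cd_offset": f[6],
                    "comment": bytes(buf[pos + EOCD.size:comment_end]),
                }
        pos = buf.rfind(EOCD_SIG, lowest, pos)  # try an earlier candidate
    raise ValueError("end of central directory record not found")


def find_zip64_locator(buf, eocd_pos):
    """Return (disk, zip64 EOCD offset, total disks), or None if absent."""
    pos = eocd_pos - LOCATOR.size
    if pos < 0:
        return None
    sig, disk, offset, total_disks = LOCATOR.unpack_from(buf, pos)
    if sig != LOCATOR_SIG:
        return None
    return disk, offset, total_disks
```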
381 J. Explanation of fields:
383 version made by (2 bytes)
385 The upper byte indicates the compatibility of the file
386 attribute information. If the external file attributes
387 are compatible with MS-DOS and can be read by PKZIP for
388 DOS version 2.04g then this value will be zero. If these
389 attributes are not compatible, then this value will
390 identify the host system on which the attributes are
391 compatible. Software can use this information to determine
392 the line record format for text files etc. The current
395 0 - MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems)
396 1 - Amiga 2 - OpenVMS
398 5 - Atari ST 6 - OS/2 H.P.F.S.
399 7 - Macintosh 8 - Z-System
400 9 - CP/M 10 - Windows NTFS
401 11 - MVS (OS/390 - Z/OS) 12 - VSE
402 13 - Acorn Risc 14 - VFAT
403 15 - alternate MVS 16 - BeOS
404 17 - Tandem 18 - OS/400
405 19 - OS/X (Darwin) 20 thru 255 - unused
407 The lower byte indicates the ZIP specification version
408 (the version of this document) supported by the software
409 used to encode the file. The value/10 indicates the major
410 version number, and the value mod 10 is the minor version
413 version needed to extract (2 bytes)
415 The minimum supported ZIP specification version needed to
416 extract the file, mapped as above. This value is based on
417 the specific format features a ZIP program must support to
418 be able to extract the file. If multiple features are
419 applied to a file, the minimum version should be set to the
420 feature having the highest value. New features or feature
421 changes affecting the published format specification will be
422 implemented using higher version numbers than the last
423 published value to avoid conflict.
425 Current minimum feature versions are as defined below:
428 1.1 - File is a volume label
429 2.0 - File is a folder (directory)
430 2.0 - File is compressed using Deflate compression
431 2.0 - File is encrypted using traditional PKWARE encryption
432 2.1 - File is compressed using Deflate64(tm)
433 2.5 - File is compressed using PKWARE DCL Implode
434 2.7 - File is a patch data set
435 4.5 - File uses ZIP64 format extensions
436 4.6 - File is compressed using BZIP2 compression*
437 5.0 - File is encrypted using DES
438 5.0 - File is encrypted using 3DES
439 5.0 - File is encrypted using original RC2 encryption
440 5.0 - File is encrypted using RC4 encryption
441 5.1 - File is encrypted using AES encryption
442 5.1 - File is encrypted using corrected RC2 encryption**
443 5.2 - File is encrypted using corrected RC2-64 encryption**
444 6.1 - File is encrypted using non-OAEP key wrapping***
445 6.2 - Central directory encryption
446 6.3 - File is compressed using LZMA
447 6.3 - File is compressed using PPMd+
448 6.3 - File is encrypted using Blowfish
449 6.3 - File is encrypted using Twofish
452 * Early 7.x (pre-7.2) versions of PKZIP incorrectly set the
453 version needed to extract for BZIP2 compression to be 50
454 when it should have been 46.
456 ** Refer to the section on Strong Encryption Specification
457 for additional information regarding RC2 corrections.
459 *** Certificate encryption using non-OAEP key wrapping is the
460 intended mode of operation for all versions beginning with 6.1.
461 Support for OAEP key wrapping should only be used for
462 backward compatibility when sending ZIP files to be opened by
463 versions of PKZIP older than 6.1 (5.0 or 6.0).
465 + Files compressed using PPMd should set the version
466 needed to extract field to 6.3. However, not all ZIP
467 programs enforce this and may be unable to decompress
468 data files compressed using PPMd if this value is set.
470 When using ZIP64 extensions, the corresponding value in the
471 zip64 end of central directory record should also be set.
472 This field should be set appropriately to indicate whether
473 Version 1 or Version 2 format is in use.
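
The two version fields can be decoded with simple integer
arithmetic. The Python sketch below splits "version made by" into
its host system and specification version, and picks the "version
needed to extract" as the highest value among the features in use.
The function names and the feature keys are hypothetical, and the
FEATURE_VERSIONS table is only a subset of the list above.

```python
# Subset of the minimum feature versions listed above, stored as value*10.
FEATURE_VERSIONS = {
    "deflate": 20,
    "zip64": 45,
    "bzip2": 46,
    "aes": 51,
    "central_directory_encryption": 62,
    "lzma": 63,
}


def split_version_made_by(value):
    """Return (host system code, major version, minor version)."""
    host = value >> 8       # upper byte: attribute compatibility
    spec = value & 0xFF     # lower byte: specification version
    return host, spec // 10, spec % 10


def version_needed(features):
    """Return the field value required by the given feature names."""
    # When several features apply, the highest requirement wins.
    return max(FEATURE_VERSIONS[name] for name in features)
```

For example, a value of 0x0A3F decodes to host 10 (Windows NTFS) and
specification version 6.3, and a Deflated ZIP64 entry needs 45.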
475 general purpose bit flag: (2 bytes)
477 Bit 0: If set, indicates that the file is encrypted.
479 (For Method 6 - Imploding)
480 Bit 1: If the compression method used was type 6,
481 Imploding, then this bit, if set, indicates
482 an 8K sliding dictionary was used. If clear,
483 then a 4K sliding dictionary was used.
484 Bit 2: If the compression method used was type 6,
485 Imploding, then this bit, if set, indicates
486 3 Shannon-Fano trees were used to encode the
487 sliding dictionary output. If clear, then 2
488 Shannon-Fano trees were used.
490 (For Methods 8 and 9 - Deflating)
492 0 0 Normal (-en) compression option was used.
493 0 1 Maximum (-exx/-ex) compression option was used.
494 1 0 Fast (-ef) compression option was used.
495 1 1 Super Fast (-es) compression option was used.
497 (For Method 14 - LZMA)
498 Bit 1: If the compression method used was type 14,
499 LZMA, then this bit, if set, indicates
500 an end-of-stream (EOS) marker is used to
501 mark the end of the compressed data stream.
502 If clear, then an EOS marker is not present
503 and the compressed data size must be known
506 Note: Bits 1 and 2 are undefined if the compression
509 Bit 3: If this bit is set, the fields crc-32, compressed
510 size and uncompressed size are set to zero in the
511 local header. The correct values are put in the
512 data descriptor immediately following the compressed
513 data. (Note: PKZIP version 2.04g for DOS only
514 recognizes this bit for method 8 compression, newer
515 versions of PKZIP recognize this bit for any
518 Bit 4: Reserved for use with method 8, for enhanced
521 Bit 5: If this bit is set, this indicates that the file is
522 compressed patched data. (Note: Requires PKZIP
523 version 2.70 or greater)
525 Bit 6: Strong encryption. If this bit is set, you should
526 set the version needed to extract value to at least
527 50 and you must also set bit 0. If AES encryption
528 is used, the version needed to extract value must
531 Bit 7: Currently unused.
533 Bit 8: Currently unused.
535 Bit 9: Currently unused.
537 Bit 10: Currently unused.
539 Bit 11: Language encoding flag (EFS). If this bit is set,
540 the filename and comment fields for this file
541 must be encoded using UTF-8. (see APPENDIX D)
543 Bit 12: Reserved by PKWARE for enhanced compression.
545 Bit 13: Used when encrypting the Central Directory to indicate
546 selected data values in the Local Header are masked to
547 hide their actual values. See the section describing
548 the Strong Encryption Specification for details.
550 Bit 14: Reserved by PKWARE.
552 Bit 15: Reserved by PKWARE.
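
Since the meaning of bits 1 and 2 depends on the compression method,
a decoder has to look at both fields together. The Python sketch
below is one possible interpretation. The function name and the
dictionary keys are hypothetical, and the Deflate option lookup
assumes the two columns of the table above are Bit 2 followed by
Bit 1.

```python
# Keyed by (bit 2, bit 1); column order is an assumption, see above.
DEFLATE_OPTIONS = {
    (False, False): "normal",
    (False, True): "maximum",
    (True, False): "fast",
    (True, True): "super fast",
}


def describe_flags(flags, method):
    """Decode the general purpose bit flag for a given compression method."""
    info = {
        "encrypted": bool(flags & 0x0001),            # bit 0
        "data_descriptor": bool(flags & 0x0008),      # bit 3
        "patched_data": bool(flags & 0x0020),         # bit 5
        "strong_encryption": bool(flags & 0x0040),    # bit 6
        "utf8_names": bool(flags & 0x0800),           # bit 11
        "masked_local_header": bool(flags & 0x2000),  # bit 13
    }
    bit1 = bool(flags & 0x0002)
    bit2 = bool(flags & 0x0004)
    if method == 6:                                   # Imploding
        info["dictionary_size"] = 8192 if bit1 else 4096
        info["shannon_fano_trees"] = 3 if bit2 else 2
    elif method in (8, 9):                            # Deflating
        info["deflate_option"] = DEFLATE_OPTIONS[(bit2, bit1)]
    elif method == 14:                                # LZMA
        info["lzma_eos_marker"] = bit1
    return info
```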
554 compression method: (2 bytes)
556 (see accompanying documentation for algorithm
559 0 - The file is stored (no compression)
560 1 - The file is Shrunk
561 2 - The file is Reduced with compression factor 1
562 3 - The file is Reduced with compression factor 2
563 4 - The file is Reduced with compression factor 3
564 5 - The file is Reduced with compression factor 4
565 6 - The file is Imploded
566 7 - Reserved for Tokenizing compression algorithm
567 8 - The file is Deflated
568 9 - Enhanced Deflating using Deflate64(tm)
569 10 - PKWARE Data Compression Library Imploding (old IBM TERSE)
570 11 - Reserved by PKWARE
571 12 - File is compressed using BZIP2 algorithm
572 13 - Reserved by PKWARE
574 15 - Reserved by PKWARE
575 16 - Reserved by PKWARE
576 17 - Reserved by PKWARE
577 18 - File is compressed using IBM TERSE (new)
578 19 - IBM LZ77 z Architecture (PFS)
579 97 - WavPack compressed data
580 98 - PPMd version I, Rev 1
582 date and time fields: (2 bytes each)
584 The date and time are encoded in standard MS-DOS format.
585 If input came from standard input, the date and time are
586 those at which compression was started for this data.
587 If encrypting the central directory and general purpose bit
588 flag 13 is set indicating masking, the value stored in the
589 Local Header will be zero.
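
The standard MS-DOS format packs the date as year-1980 (7 bits),
month (4 bits) and day (5 bits), and the time as hours (5 bits),
minutes (6 bits) and seconds divided by two (5 bits). The Python
sketch below converts in both directions. The function names are
hypothetical, and returning None for a zero date (as stored when
masking is in effect) is an assumption of this sketch, not a rule of
the format.

```python
import datetime


def dos_to_datetime(dos_date, dos_time):
    """Convert MS-DOS date and time fields to a datetime, or None if zero."""
    if dos_date == 0:
        return None  # assumption: treat a zeroed (masked) date as "no value"
    year = 1980 + (dos_date >> 9)
    month = (dos_date >> 5) & 0x0F
    day = dos_date & 0x1F
    hour = dos_time >> 11
    minute = (dos_time >> 5) & 0x3F
    second = (dos_time & 0x1F) * 2   # stored with 2-second resolution
    return datetime.datetime(year, month, day, hour, minute, second)


def datetime_to_dos(dt):
    """Convert a datetime (year 1980 or later) to (dos_date, dos_time)."""
    dos_date = ((dt.year - 1980) << 9) | (dt.month << 5) | dt.day
    dos_time = (dt.hour << 11) | (dt.minute << 5) | (dt.second // 2)
    return dos_date, dos_time
```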
593 The CRC-32 algorithm was generously contributed by
594 David Schwaderer and can be found in his excellent
595 book "C Programmers Guide to NetBIOS" published by
596 Howard W. Sams & Co. Inc. The 'magic number' for
597 the CRC is 0xdebb20e3. The proper CRC pre and post
598 conditioning is used, meaning that the CRC register
599 is pre-conditioned with all ones (a starting value
600 of 0xffffffff) and the value is post-conditioned by
601 taking the one's complement of the CRC residual.
602 If bit 3 of the general purpose flag is set, this
603 field is set to zero in the local header and the correct
604 value is put in the data descriptor and in the central
605 directory. When encrypting the central directory, if the
606 local header is not in ZIP64 format and general purpose
607 bit flag 13 is set indicating masking, the value stored
608 in the Local Header will be zero.
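
The pre- and post-conditioning described above can be illustrated
with a bit-at-a-time Python sketch. It assumes the usual reflected
form of the CRC-32 polynomial, 0xEDB88320, which is not stated in
this section, and the function names are hypothetical. The 'magic
number' 0xdebb20e3 is the register value left after processing a
message followed by its own CRC in little-endian byte order.

```python
import zlib


def crc32_register(data, register=0xFFFFFFFF):
    """Run the CRC register over data, starting from all ones."""
    for byte in data:
        register ^= byte
        for _ in range(8):
            if register & 1:
                # 0xEDB88320: reflected CRC-32 polynomial (assumed, see above)
                register = (register >> 1) ^ 0xEDB88320
            else:
                register >>= 1
    return register


def crc32(data):
    """Return the CRC-32 value as stored in the crc-32 field."""
    # Post-conditioning: one's complement of the CRC residual.
    return crc32_register(data) ^ 0xFFFFFFFF


message = b"123456789"
value = crc32(message)
# The slow loop above should agree with the zlib implementation.
matches_zlib = value == zlib.crc32(message)
# Appending the CRC to the message leaves the magic number in the register.
residual = crc32_register(message + value.to_bytes(4, "little"))
```

In practice zlib.crc32 is far faster. The explicit loop is shown
only to make the conditioning steps visible.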
610 compressed size: (4 bytes)
611 uncompressed size: (4 bytes)
613 The size of the file compressed and uncompressed,
614 respectively. When a decryption header is present it will
615 be placed in front of the file data and the value of the
616 compressed file size will include the bytes of the decryption
617 header. If bit 3 of the general purpose bit flag is set,
618 these fields are set to zero in the local header and the
619 correct values are put in the data descriptor and
620 in the central directory. If an archive is in ZIP64 format
621 and the value in this field is 0xFFFFFFFF, the size will be
622 in the corresponding 8 byte ZIP64 extended information
623 extra field. When encrypting the central directory, if the
624 local header is not in ZIP64 format and general purpose bit
625 flag 13 is set indicating masking, the value stored for the
626 uncompressed size in the Local Header will be zero.
628 file name length: (2 bytes)
629 extra field length: (2 bytes)
630 file comment length: (2 bytes)
632 The length of the file name, extra field, and comment
633 fields respectively. The combined length of any
634 directory record and these three fields should not
635 generally exceed 65,535 bytes. If input came from standard
636 input, the file name length is set to zero.
638 disk number start: (2 bytes)
640 The number of the disk on which this file begins. If an
641 archive is in ZIP64 format and the value in this field is
642 0xFFFF, the disk number will be in the corresponding 4 byte zip64
643 extended information extra field.
645 internal file attributes: (2 bytes)
647 Bits 1 and 2 are reserved for use by PKWARE.
649 The lowest bit of this field indicates, if set, that
650 the file is apparently an ASCII or text file. If not
651 set, the file apparently contains binary data.
652 The remaining bits are unused in version 1.0.
654 The 0x0002 bit of this field indicates, if set, that a
655 4 byte variable record length control field precedes each
656 logical record indicating the length of the record. The
657 record length control field is stored in little-endian byte
658 order. This flag is independent of text control characters,
659 and if used in conjunction with text data, includes any
660 control characters in the total length of the record. This
661 value is provided for mainframe data transfer support.
663 external file attributes: (4 bytes)
665 The mapping of the external attributes is
666 host-system dependent (see 'version made by'). For
667 MS-DOS, the low order byte is the MS-DOS directory
668 attribute byte. If input came from standard input, this
669 field is set to zero.
671 relative offset of local header: (4 bytes)
673 This is the offset from the start of the first disk on
674 which this file appears, to where the local header should
675 be found. If an archive is in ZIP64 format and the value
676 in this field is 0xFFFFFFFF, the offset will be in the
677 corresponding 8 byte zip64 extended information extra field.
679 file name: (Variable)
681 The name of the file, with optional relative path.
682 The path stored should not contain a drive or
683 device letter, or a leading slash. All slashes
684 should be forward slashes '/' as opposed to
685 backwards slashes '\' for compatibility with Amiga
686 and UNIX file systems etc. If input came from standard
687 input, there is no file name field. If encrypting
688 the central directory and general purpose bit flag 13 is set
689 indicating masking, the file name stored in the Local Header
690 will not be the actual file name. A masking value consisting
691 of a unique hexadecimal value will be stored. This value will
692 be sequentially incremented for each file in the archive. See
693 the section on the Strong Encryption Specification for details
694 on retrieving the encrypted file name.
696 extra field: (Variable)
698 This is for expansion. If additional information
699 needs to be stored for special needs or for specific
700 platforms, it should be stored here. Earlier versions
701 of the software can then safely skip this file, and
702 find the next file or header. This field will be 0
703 length in version 1.0.
705 In order to allow different programs and different types
706 of information to be stored in the 'extra' field in .ZIP
707 files, the following structure should be used for all
708 programs storing data in this field:
710 header1+data1 + header2+data2 . . .
712 Each header should consist of:
717 Note: all fields stored in Intel low-byte/high-byte order.
719 The Header ID field indicates the type of data that is in
720 the following data block.
722 Header ID's of 0 thru 31 are reserved for use by PKWARE.
723 The remaining ID's can be used by third party vendors for
726 The current Header ID mappings defined by PKWARE are:
728 0x0001 Zip64 extended information extra field
730 0x0008 Reserved for extended language encoding data (PFS)
736 0x000e Reserved for file stream and fork descriptors
737 0x000f Patch Descriptor
738 0x0014 PKCS#7 Store for X.509 Certificates
739 0x0015 X.509 Certificate ID and Signature for
741 0x0016 X.509 Certificate ID for Central Directory
742 0x0017 Strong Encryption Header
743 0x0018 Record Management Controls
744 0x0019 PKCS#7 Encryption Recipient Certificate List
745 0x0065 IBM S/390 (Z390), AS/400 (I400) attributes
747 0x0066 Reserved for IBM S/390 (Z390), AS/400 (I400)
748 attributes - compressed
749 0x4690 POSZIP 4690 (reserved)
751 Third party mappings commonly used are:
755 0x2605 ZipIt Macintosh
756 0x2705 ZipIt Macintosh 1.3.5+
757 0x2805 ZipIt Macintosh 1.3.5+
758 0x334d Info-ZIP Macintosh
760 0x4453 Windows NT security descriptor (binary ACL)
763 0x4b46 FWKCS MD5 (see below)
764 0x4c41 OS/2 access control list (text ACL)
765 0x4d49 Info-ZIP OpenVMS
766 0x4f4c Xceed original location extra field
768 0x5455 extended timestamp
769 0x554e Xceed unicode extra field
770 0x5855 Info-ZIP UNIX (original, also OS/2, NT, etc)
771 0x6375 Info-ZIP Unicode Comment Extra Field
773 0x7075 Info-ZIP Unicode Path Extra Field
775 0x7855 Info-ZIP UNIX (new)
776 0xa220 Microsoft Open Packaging Growth Hint
779 Detailed descriptions of Extra Fields defined by third
780 party mappings will be documented as information on
781 these data structures is made available to PKWARE.
782 PKWARE does not guarantee the accuracy of any published
785 The Data Size field indicates the size of the following
786 data block. Programs can use this value to skip to the
787 next header block, passing over any data blocks that are
790 Note: As stated above, the size of the entire .ZIP file
791 header, including the file name, comment, and extra
792 field should not exceed 64K in size.
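
The header1+data1 + header2+data2 structure can be traversed with a
short loop that reads each 2-byte Header ID and 2-byte Data Size and
then skips over the data block. The Python sketch below is one way
to do this. The function name is hypothetical, and raising an error
on truncated or leftover bytes is a choice made for this sketch;
some readers prefer to ignore malformed trailing data.

```python
import struct


def iter_extra_fields(extra):
    """Yield (header_id, data) for each block in an extra field."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        if pos + size > len(extra):
            raise ValueError("extra field block is truncated")
        yield header_id, bytes(extra[pos:pos + size])
        pos += size  # skip to the next header block
    if pos != len(extra):
        raise ValueError("trailing bytes after last extra field block")
```

A program that only understands Header ID 0x0001, for example, can
iterate over all blocks and ignore every other ID.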
794 In case two different programs should appropriate the same
795 Header ID value, it is strongly recommended that each
796 program place a unique signature of at least two bytes in
797 size (and preferably 4 bytes or bigger) at the start of
798 each data area. Every program should verify that its
799 unique signature is present, in addition to the Header ID
800 value being correct, before assuming that it is a block of
803 -Zip64 Extended Information Extra Field (0x0001):
805 The following is the layout of the zip64 extended
806 information "extra" block. If one of the size or
807 offset fields in the Local or Central directory
808 record is too small to hold the required data,
809 a Zip64 extended information record is created.
810 The order of the fields in the zip64 extended
811 information record is fixed, but the fields will
812 only appear if the corresponding Local or Central
813 directory record field is set to 0xFFFF or 0xFFFFFFFF.
815 Note: all fields stored in Intel low-byte/high-byte order.
817 Value Size Description
818 ----- ---- -----------
819 (ZIP64) 0x0001 2 bytes Tag for this "extra" block type
820 Size 2 bytes Size of this "extra" block
822 Size 8 bytes Original uncompressed file size
824 Size 8 bytes Size of compressed data
826 Offset 8 bytes Offset of local header record
828 Number 4 bytes Number of the disk on which
831 This entry in the Local header must include BOTH original
832 and compressed file size fields. If encrypting the
833 central directory and bit 13 of the general purpose bit
834 flag is set indicating masking, the value stored in the
835 Local Header for the original file size will be zero.
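
Because the fields appear in a fixed order but only when the
corresponding header field holds 0xFFFF or 0xFFFFFFFF, a reader has
to consult the header values while parsing the block. The Python
sketch below takes the block data (the bytes after the tag and size)
together with the values read from the header. The function name and
the local parameter are hypothetical. The local flag reflects the
rule that the Local header entry carries both size fields.

```python
import struct


def apply_zip64_extra(data, usize, csize, offset=None, disk=None, local=False):
    """Return (usize, csize, offset, disk) with ZIP64 values substituted."""
    pos = 0

    def take(fmt):
        # Read one little-endian value and advance past it.
        nonlocal pos
        (value,) = struct.unpack_from(fmt, data, pos)
        pos += struct.calcsize(fmt)
        return value

    if local:
        # The Local header entry includes BOTH size fields.
        return take("<Q"), take("<Q"), offset, disk
    if usize == 0xFFFFFFFF:
        usize = take("<Q")      # original uncompressed file size
    if csize == 0xFFFFFFFF:
        csize = take("<Q")      # size of compressed data
    if offset == 0xFFFFFFFF:
        offset = take("<Q")     # offset of local header record
    if disk == 0xFFFF:
        disk = take("<I")       # disk on which the file starts
    return usize, csize, offset, disk
```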
838 -OS/2 Extra Field (0x0009):
840 The following is the layout of the OS/2 attributes "extra"
841 block. (Last Revision 09/05/95)
843 Note: all fields stored in Intel low-byte/high-byte order.
845 Value Size Description
846 ----- ---- -----------
847 (OS/2) 0x0009 2 bytes Tag for this "extra" block type
848 TSize 2 bytes Size for the following data block
849 BSize 4 bytes Uncompressed Block Size
850 CType 2 bytes Compression type
851 EACRC 4 bytes CRC value for uncompressed block
852 (var) variable Compressed block
854 The OS/2 extended attribute structure (FEA2LIST) is
855 compressed and then stored in its entirety within this
856 structure. There will only ever be one "block" of data in
859 -NTFS Extra Field (0x000a):
861 The following is the layout of the NTFS attributes
862 "extra" block. (Note: At this time the Mtime, Atime
863 and Ctime values may be used on any WIN32 system.)
865 Note: all fields stored in Intel low-byte/high-byte order.
867 Value Size Description
868 ----- ---- -----------
869 (NTFS) 0x000a 2 bytes Tag for this "extra" block type
870 TSize 2 bytes Size of the total "extra" block
871 Reserved 4 bytes Reserved for future use
872 Tag1 2 bytes NTFS attribute tag value #1
873 Size1 2 bytes Size of attribute #1, in bytes
874 (var.) Size1 Attribute #1 data
878 TagN 2 bytes NTFS attribute tag value #N
879 SizeN 2 bytes Size of attribute #N, in bytes
880 (var.) SizeN Attribute #N data
882 For NTFS, values for Tag1 through TagN are as follows:
883 (currently only one set of attributes is defined for NTFS)
886 ----- ---- -----------
887 0x0001 2 bytes Tag for attribute #1
888 Size1 2 bytes Size of attribute #1, in bytes
889 Mtime 8 bytes File last modification time
890 Atime 8 bytes File last access time
891 Ctime 8 bytes File creation time
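
The NTFS block nests a second level of tag and size pairs after its
4-byte reserved field. The Python sketch below extracts the three
timestamps of attribute tag 0x0001 from the block data (the bytes
after the 0x000a tag and TSize). The function names are
hypothetical. The conversion helper assumes the 8-byte values are
Win32 FILETIME counts of 100-nanosecond intervals since 1601-01-01,
which this section does not state explicitly.

```python
import datetime
import struct

FILETIME_EPOCH = datetime.datetime(1601, 1, 1)  # assumed FILETIME origin


def ntfs_times(data):
    """Return (mtime, atime, ctime) as raw 8-byte values, or None."""
    pos = 4  # skip the reserved field
    while pos + 4 <= len(data):
        tag, size = struct.unpack_from("<HH", data, pos)
        pos += 4
        if tag == 0x0001 and size >= 24 and pos + 24 <= len(data):
            return struct.unpack_from("<QQQ", data, pos)
        pos += size  # unknown attribute: skip its data
    return None


def filetime_to_datetime(filetime):
    """Convert an assumed FILETIME value to a naive UTC datetime."""
    return FILETIME_EPOCH + datetime.timedelta(microseconds=filetime // 10)
```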
893 -OpenVMS Extra Field (0x000c):
895 The following is the layout of the OpenVMS attributes
898 Note: all fields stored in Intel low-byte/high-byte order.
900 Value Size Description
901 ----- ---- -----------
902 (VMS) 0x000c 2 bytes Tag for this "extra" block type
903 TSize 2 bytes Size of the total "extra" block
904 CRC 4 bytes 32-bit CRC for remainder of the block
905 Tag1 2 bytes OpenVMS attribute tag value #1
906 Size1 2 bytes Size of attribute #1, in bytes
907 (var.) Size1 Attribute #1 data
911 TagN 2 bytes OpenVMS attribute tag value #N
912 SizeN 2 bytes Size of attribute #N, in bytes
913 (var.) SizeN Attribute #N data
917 1. There will be one or more attributes present, which
918 will each be preceded by the above TagX & SizeX values.
919 These values are identical to the ATR$C_XXXX and
920 ATR$S_XXXX constants which are defined in ATR.H under
921 OpenVMS C. Neither of these values will ever be zero.
923 2. No word alignment or padding is performed.
925 3. A well-behaved PKZIP/OpenVMS program should never produce
926 more than one sub-block with the same TagX value. Also,
927 there will never be more than one "extra" block of type
928 0x000c in a particular directory record.
930 -UNIX Extra Field (0x000d):
932 The following is the layout of the UNIX "extra" block.
933 Note: all fields are stored in Intel low-byte/high-byte
936 Value Size Description
937 ----- ---- -----------
938 (UNIX) 0x000d 2 bytes Tag for this "extra" block type
939 TSize 2 bytes Size for the following data block
940 Atime 4 bytes File last access time
941 Mtime 4 bytes File last modification time
942 Uid 2 bytes File user ID
943 Gid 2 bytes File group ID
944 (var) variable Variable length data field
946 The variable length data field will contain file type
947 specific data. Currently the only values allowed are
948 the original "linked to" file names for hard or symbolic
949 links, and the major and minor device node numbers for
950 character and block device nodes. Since device nodes
951 cannot be either symbolic or hard links, only one set of
952 variable length data is stored. Link files will have the
953 name of the original file stored. This name is NOT NULL
954 terminated. Its size can be determined by checking TSize -
955 12. Device entries will have eight bytes stored as two 4
956 byte entries (in little endian format). The first entry
957 will be the major device number, and the second the minor device number.
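
The UNIX block described above could be read roughly as follows. This is a sketch only: the function names are hypothetical, and the caller is assumed to know from elsewhere (for example the file type bits) whether the variable data holds a link target or device numbers.

```python
import struct


def parse_unix_extra(block):
    """Split a 0x000d block into its fixed fields and variable data.

    `block` starts at the 2 byte tag. The fixed fields occupy 12 bytes,
    so the variable data is TSize - 12 bytes long.
    """
    tag, tsize = struct.unpack_from("<HH", block, 0)
    if tag != 0x000D:
        raise ValueError("not a UNIX extra block")
    if tsize < 12:
        raise ValueError("TSize too small for the fixed fields")
    atime, mtime, uid, gid = struct.unpack_from("<IIHH", block, 4)
    var = bytes(block[16:4 + tsize])
    return {"atime": atime, "mtime": mtime, "uid": uid, "gid": gid, "var": var}


def device_numbers(var):
    """Interpret 8 bytes of variable data as (major, minor)."""
    if len(var) != 8:
        raise ValueError("device entries carry exactly eight bytes")
    return struct.unpack("<II", var)
```

For a link entry, the "var" bytes are the original file name without a terminating NULL.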
960 -PATCH Descriptor Extra Field (0x000f):
962 The following is the layout of the Patch Descriptor "extra"
965 Note: all fields stored in Intel low-byte/high-byte order.
967 Value Size Description
968 ----- ---- -----------
969 (Patch) 0x000f 2 bytes Tag for this "extra" block type
970 TSize 2 bytes Size of the total "extra" block
971 Version 2 bytes Version of the descriptor
972 Flags 4 bytes Actions and reactions (see below)
973 OldSize 4 bytes Size of the file about to be patched
974 OldCRC 4 bytes 32-bit CRC of the file to be patched
975 NewSize 4 bytes Size of the resulting file
976 NewCRC 4 bytes 32-bit CRC of the resulting file
978 Actions and reactions
981 ---- ----------------
982 0 Use for auto detection
983 1 Treat as a self-patch
985 4-5 Action (see below)
987 8-9 Reaction (see below) to absent file
988 10-11 Reaction (see below) to newer file
989 12-13 Reaction (see below) to unknown file
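
The bit ranges listed above can be extracted from the 4 byte Flags value as sketched below. The sketch assumes that bit 0 is the least significant bit; the function and key names are hypothetical, and the numeric action and reaction codes are returned uninterpreted.

```python
def decode_patch_flags(flags):
    """Split the Patch Descriptor Flags value into its bit fields."""

    def bits(low, high):
        # Extract bits low..high inclusive, with bit 0 as the LSB.
        width = high - low + 1
        return (flags >> low) & ((1 << width) - 1)

    return {
        "auto_detect": bool(bits(0, 0)),
        "self_patch": bool(bits(1, 1)),
        "action": bits(4, 5),
        "reaction_absent": bits(8, 9),
        "reaction_newer": bits(10, 11),
        "reaction_unknown": bits(12, 13),
    }
```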
1011 Patch support is provided by PKPatchMaker(tm) technology and is
1012 covered under U.S. Patents and Patents Pending. The use or
1013 implementation in a product of certain technological aspects set
1014 forth in the current APPNOTE, including those with regard to
1015 strong encryption, patching, or extended tape operations requires
1016 a license from PKWARE. Please contact PKWARE with regard to
1017 acquiring a license.
1019 -PKCS#7 Store for X.509 Certificates (0x0014):
1021 This field contains information about each of the certificates
1022 files may be signed with. When the Central Directory Encryption
1023 feature is enabled for a ZIP file, this record will appear in
1024 the Archive Extra Data Record, otherwise it will appear in the
1025 first central directory record and will be ignored in any
1028 Note: all fields stored in Intel low-byte/high-byte order.
1030 Value Size Description
1031 ----- ---- -----------
1032 (Store) 0x0014 2 bytes Tag for this "extra" block type
1033 TSize 2 bytes Size of the store data
1034 TData TSize Data about the store
1037 -X.509 Certificate ID and Signature for individual file (0x0015):
1039 This field contains the information about which certificate in
1040 the PKCS#7 store was used to sign a particular file. It also
1041 contains the signature data. This field can appear multiple
1042 times, but can only appear once per certificate.
1044 Note: all fields stored in Intel low-byte/high-byte order.
1046 Value Size Description
1047 ----- ---- -----------
1048 (CID) 0x0015 2 bytes Tag for this "extra" block type
1049 TSize 2 bytes Size of data that follows
1050 TData TSize Signature Data
1052 -X.509 Certificate ID and Signature for central directory (0x0016):
1054 This field contains the information about which certificate in
1055 the PKCS#7 store was used to sign the central directory structure.
1056 When the Central Directory Encryption feature is enabled for a
1057 ZIP file, this record will appear in the Archive Extra Data Record,
1058 otherwise it will appear in the first central directory record.
1060 Note: all fields stored in Intel low-byte/high-byte order.
1062 Value Size Description
1063 ----- ---- -----------
1064 (CDID) 0x0016 2 bytes Tag for this "extra" block type
1065 TSize 2 bytes Size of data that follows
1068 -Strong Encryption Header (0x0017):
1070 Value Size Description
1071 ----- ---- -----------
1072 0x0017 2 bytes Tag for this "extra" block type
1073 TSize 2 bytes Size of data that follows
1074 Format 2 bytes Format definition for this record
1075 AlgID 2 bytes Encryption algorithm identifier
1076 Bitlen 2 bytes Bit length of encryption key
1077 Flags 2 bytes Processing flags
1078 CertData TSize-8 Certificate decryption extra field data
1079 (refer to the explanation for CertData
1080 in the section describing the
1081 Certificate Processing Method under
1082 the Strong Encryption Specification)
1085 -Record Management Controls (0x0018):
1087 Value Size Description
1088 ----- ---- -----------
1089 (Rec-CTL) 0x0018 2 bytes Tag for this "extra" block type
1090 CSize 2 bytes Size of total extra block data
1091 Tag1 2 bytes Record control attribute 1
1092 Size1 2 bytes Size of attribute 1, in bytes
1093 Data1 Size1 Attribute 1 data
1097 TagN 2 bytes Record control attribute N
1098 SizeN 2 bytes Size of attribute N, in bytes
1099 DataN SizeN Attribute N data
1102 -PKCS#7 Encryption Recipient Certificate List (0x0019):
1104 This field contains information about each of the certificates
1105 used in encryption processing and it can be used to identify who is
1106 allowed to decrypt encrypted files. This field should only appear
1107 in the archive extra data record. This field is not required and
1108 serves only to aid archive modifications by preserving public
1109 encryption key data. Individual security requirements may dictate
1110 that this data be omitted to deter information exposure.
1112 Note: all fields stored in Intel low-byte/high-byte order.
1114 Value Size Description
1115 ----- ---- -----------
1116 (CStore) 0x0019 2 bytes Tag for this "extra" block type
1117 TSize 2 bytes Size of the store data
1118 TData TSize Data about the store
1122 Value Size Description
1123 ----- ---- -----------
1124 Version 2 bytes Format version number - must be 0x0001 at this time
1125 CStore (var) PKCS#7 data blob
1128 -MVS Extra Field (0x0065):
1130 The following is the layout of the MVS "extra" block.
1131 Note: Some fields are stored in Big Endian format.
1132 All text is in EBCDIC format unless otherwise specified.
1134 Value Size Description
1135 ----- ---- -----------
1136 (MVS) 0x0065 2 bytes Tag for this "extra" block type
1137 TSize 2 bytes Size for the following data block
1138 ID 4 bytes EBCDIC "Z390" 0xE9F3F9F0 or
1139 "T4MV" for TargetFour
1140 (var) TSize-4 Attribute data (see APPENDIX B)
1143 -OS/400 Extra Field (0x0065):
1145 The following is the layout of the OS/400 "extra" block.
1146 Note: Some fields are stored in Big Endian format.
1147 All text is in EBCDIC format unless otherwise specified.
1149 Value Size Description
1150 ----- ---- -----------
1151 (OS400) 0x0065 2 bytes Tag for this "extra" block type
1152 TSize 2 bytes Size for the following data block
1153 ID 4 bytes EBCDIC "I400" 0xC9F4F0F0 or
1154 "T4MV" for TargetFour
1155 (var) TSize-4 Attribute data (see APPENDIX A)
1158 Third-party Mappings:
1160 -ZipIt Macintosh Extra Field (long) (0x2605):
1162 The following is the layout of the ZipIt extra block
1163 for Macintosh. The local-header and central-header versions
1164 are identical. This block must be present if the file is
1165 stored MacBinary-encoded and it should not be used if the file
1166 is not stored MacBinary-encoded.
1168 Value Size Description
1169 ----- ---- -----------
1170 (Mac2) 0x2605 Short tag for this extra block type
1171 TSize Short total data size for this block
1172 "ZPIT" beLong extra-field signature
1173 FnLen Byte length of FileName
1174 FileName variable full Macintosh filename
1175 FileType Byte[4] four-byte Mac file type string
1176 Creator Byte[4] four-byte Mac creator string
1179 -ZipIt Macintosh Extra Field (short, for files) (0x2705):
1181 The following is the layout of a shortened variant of the
1182 ZipIt extra block for Macintosh (without "full name" entry).
1183 This variant is used by ZipIt 1.3.5 and newer for entries of
1184 files (not directories) that do not have a MacBinary encoded
1185 file. The local-header and central-header versions are identical.
1187 Value Size Description
1188 ----- ---- -----------
1189 (Mac2b) 0x2705 Short tag for this extra block type
1190 TSize Short total data size for this block (12)
1191 "ZPIT" beLong extra-field signature
1192 FileType Byte[4] four-byte Mac file type string
1193 Creator Byte[4] four-byte Mac creator string
1194 fdFlags beShort attributes from FInfo.fdFlags,
1196 0x0000 beShort reserved, may be omitted
1199 -ZipIt Macintosh Extra Field (short, for directories) (0x2805):
1201 The following is the layout of a shortened variant of the
1202 ZipIt extra block for Macintosh used only for directory
1203 entries. This variant is used by ZipIt 1.3.5 and newer to
1204 save some optional Mac-specific information about directories.
1205 The local-header and central-header versions are identical.
1207 Value Size Description
1208 ----- ---- -----------
1209 (Mac2c) 0x2805 Short tag for this extra block type
1210 TSize Short total data size for this block (12)
1211 "ZPIT" beLong extra-field signature
1212 frFlags beShort attributes from DInfo.frFlags, may
1214 View beShort ZipIt view flag, may be omitted
1217 The View field specifies ZipIt-internal settings as follows:
1220 bit 0 if set, the folder is shown expanded (open)
1221 when the archive contents are viewed in ZipIt.
1222 bits 1-15 reserved, zero;
1225 -FWKCS MD5 Extra Field (0x4b46):
1227 The FWKCS Contents_Signature System, used in
1228 automatically identifying files independent of file name,
1229 optionally adds and uses an extra field to support the
1230 rapid creation of an enhanced contents_signature:
1234 Preface = 'M','D','5'
1235 followed by 16 bytes containing the uncompressed file's
1236 128_bit MD5 hash(1), low byte first.
1238 When FWKCS revises a .ZIP file central directory to add
1239 this extra field for a file, it also replaces the
1240 central directory entry for that file's uncompressed
1241 file length with a measured value.
1243 FWKCS provides an option to strip this extra field, if
1244 present, from a .ZIP file central directory. In adding
1245 this extra field, FWKCS preserves .ZIP file Authenticity
1246 Verification; if stripping this extra field, FWKCS
1247 preserves all versions of AV through PKZIP version 2.04g.
1249 FWKCS, and FWKCS Contents_Signature System, are
1250 trademarks of Frederick W. Kantor.
1252 (1) R. Rivest, RFC1321.TXT, MIT Laboratory for Computer
1253 Science and RSA Data Security, Inc., April 1992.
1254 ll.76-77: "The MD5 algorithm is being placed in the
1255 public domain for review and possible adoption as a
1259 -Info-ZIP Unicode Comment Extra Field (0x6375):
1261 Stores the UTF-8 version of the file comment as stored in the
1262 central directory header. (Last Revision 20070912)
1264 Value Size Description
1265 ----- ---- -----------
1266 (UCom) 0x6375 Short tag for this extra block type ("uc")
1267 TSize Short total data size for this block
1268 Version 1 byte version of this extra field, currently 1
1269 ComCRC32 4 bytes Comment Field CRC32 Checksum
1270 UnicodeCom Variable UTF-8 version of the entry comment
1272 Currently Version is set to the number 1. If there is a need
1273 to change this field, the version will be incremented. Changes
1274 may not be backward compatible so this extra field should not be
1275 used if the version is not recognized.
1277 The ComCRC32 is the standard zip CRC32 checksum of the File Comment
1278 field in the central directory header. This is used to verify that
1279 the comment field has not changed since the Unicode Comment extra field
1280 was created. This can happen if a utility changes the File Comment
1281 field but does not update the UTF-8 Comment extra field. If the CRC
1282 check fails, this Unicode Comment extra field should be ignored and
1283 the File Comment field in the header should be used instead.
1285 The UnicodeCom field is the UTF-8 version of the File Comment field
1286 in the header. As UnicodeCom is defined to be UTF-8, no UTF-8 byte
1287 order mark (BOM) is used. The length of this field is determined by
1288 subtracting the size of the previous fields from TSize. If both the
1289 File Name and Comment fields are UTF-8, the new General Purpose Bit
1290 Flag, bit 11 (Language encoding flag (EFS)), can be used to indicate
1291 both the header File Name and Comment fields are UTF-8 and, in this
1292 case, the Unicode Path and Unicode Comment extra fields are not
1293 needed and should not be created. Note that, for backward
1294 compatibility, bit 11 should only be used if the native character set
1295 of the paths and comments being zipped up is already UTF-8. It is
1296 expected that the same file comment storage method, either general
1297 purpose bit 11 or extra fields, be used in both the Local and Central
1298 Directory Header for a file.
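
A writer could assemble the Unicode Comment extra field as in the following sketch. It assumes that TSize counts the Version, ComCRC32 and UnicodeCom bytes, and that "standard zip CRC32" is the CRC-32 computed by zlib. The function name is hypothetical.

```python
import struct
import zlib


def build_unicode_comment_field(header_comment, comment_text):
    """Build a 0x6375 extra field.

    header_comment: the File Comment bytes exactly as stored in the
                    central directory header.
    comment_text:   the same comment as a Python str.
    """
    crc = zlib.crc32(header_comment) & 0xFFFFFFFF
    # Version (1 byte, currently 1) + ComCRC32 + UTF-8 text, no BOM.
    body = struct.pack("<BI", 1, crc) + comment_text.encode("utf-8")
    return struct.pack("<HH", 0x6375, len(body)) + body
```

Because the CRC covers the header comment bytes, any tool that later rewrites the header comment without rebuilding this field causes readers to ignore the field.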
1301 -Info-ZIP Unicode Path Extra Field (0x7075):
1303 Stores the UTF-8 version of the file name field as stored in the
1304 local header and central directory header. (Last Revision 20070912)
1306 Value Size Description
1307 ----- ---- -----------
1308 (UPath) 0x7075 Short tag for this extra block type ("up")
1309 TSize Short total data size for this block
1310 Version 1 byte version of this extra field, currently 1
1311 NameCRC32 4 bytes File Name Field CRC32 Checksum
1312 UnicodeName Variable UTF-8 version of the entry File Name
1314 Currently Version is set to the number 1. If there is a need
1315 to change this field, the version will be incremented. Changes
1316 may not be backward compatible so this extra field should not be
1317 used if the version is not recognized.
1319 The NameCRC32 is the standard zip CRC32 checksum of the File Name
1320 field in the header. This is used to verify that the header
1321 File Name field has not changed since the Unicode Path extra field
1322 was created. This can happen if a utility renames the File Name but
1323 does not update the UTF-8 path extra field. If the CRC check fails,
1324 this UTF-8 Path Extra Field should be ignored and the File Name field
1325 in the header should be used instead.
1327 The UnicodeName is the UTF-8 version of the contents of the File Name
1328 field in the header. As UnicodeName is defined to be UTF-8, no UTF-8
1329 byte order mark (BOM) is used. The length of this field is determined
1330 by subtracting the size of the previous fields from TSize. If both
1331 the File Name and Comment fields are UTF-8, the new General Purpose
1332 Bit Flag, bit 11 (Language encoding flag (EFS)), can be used to
1333 indicate that both the header File Name and Comment fields are UTF-8
1334 and, in this case, the Unicode Path and Unicode Comment extra fields
1335 are not needed and should not be created. Note that, for backward
1336 compatibility, bit 11 should only be used if the native character set
1337 of the paths and comments being zipped up is already UTF-8. It is
1338 expected that the same file name storage method, either general
1339 purpose bit 11 or extra fields, be used in both the Local and Central
1340 Directory Header for a file.
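
On the reading side, the version and CRC checks described above might look like the sketch below. The function name is hypothetical, and the cp437 fallback encoding for the header File Name is an assumption of this sketch, not something stated in this section.

```python
import struct
import zlib


def effective_name(header_name, extra_block, fallback_encoding="cp437"):
    """Return the UTF-8 name if the 0x7075 field is usable, else the header name.

    header_name: File Name bytes exactly as stored in the header.
    extra_block: one extra block, starting at its 2 byte tag.
    """
    tag, tsize = struct.unpack_from("<HH", extra_block, 0)
    if tag == 0x7075 and tsize >= 5 and len(extra_block) >= 4 + tsize:
        version, name_crc = struct.unpack_from("<BI", extra_block, 4)
        crc_ok = name_crc == (zlib.crc32(header_name) & 0xFFFFFFFF)
        if version == 1 and crc_ok:
            # UnicodeName length = TSize minus the 5 preceding bytes.
            return extra_block[9:4 + tsize].decode("utf-8")
    # Unknown version or stale CRC: ignore the extra field.
    return header_name.decode(fallback_encoding)
```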
1343 -Microsoft Open Packaging Growth Hint (0xa220):
1345 Value Size Description
1346 ----- ---- -----------
1347 0xa220 Short tag for this extra block type
1348 TSize Short size of Sig + PadVal + Padding
1349 Sig Short verification signature (A028)
1350 PadVal Short Initial padding value
1351 Padding variable filled with NULL characters
1354 file comment: (Variable)
1356 The comment for this file.
1358 number of this disk: (2 bytes)
1360 The number of this disk, which contains central
1361 directory end record. If an archive is in ZIP64 format
1362 and the value in this field is 0xFFFF, the actual disk number
1363 will be in the corresponding 4 byte zip64 end of central directory field.
1367 number of the disk with the start of the central
1368 directory: (2 bytes)
1370 The number of the disk on which the central
1371 directory starts. If an archive is in ZIP64 format
1372 and the value in this field is 0xFFFF, the actual disk number
1373 will be in the corresponding 4 byte zip64 end of central directory field.
1376 total number of entries in the central dir on
1377 this disk: (2 bytes)
1379 The number of central directory entries on this disk.
1380 If an archive is in ZIP64 format and the value in
1381 this field is 0xFFFF, the actual count will be in the
1382 corresponding 8 byte zip64 end of central directory field.
1385 total number of entries in the central dir: (2 bytes)
1387 The total number of files in the .ZIP file. If an
1388 archive is in ZIP64 format and the value in this field
1389 is 0xFFFF, the actual count will be in the corresponding 8 byte
1390 zip64 end of central directory field.
1392 size of the central directory: (4 bytes)
1394 The size (in bytes) of the entire central directory.
1395 If an archive is in ZIP64 format and the value in
1396 this field is 0xFFFFFFFF, the size will be in the
1397 corresponding 8 byte zip64 end of central directory field.
1400 offset of start of central directory with respect to
1401 the starting disk number: (4 bytes)
1403 Offset of the start of the central directory on the
1404 disk on which the central directory starts. If an
1405 archive is in ZIP64 format and the value in this
1406 field is 0xFFFFFFFF, the actual offset will be in the
1407 corresponding 8 byte zip64 end of central directory field.
1410 .ZIP file comment length: (2 bytes)
1412 The length of the comment for this .ZIP file.
1414 .ZIP file comment: (Variable)
1416 The comment for this .ZIP file. ZIP file comment data
1417 is stored unsecured. No encryption or data authentication
1418 is applied to this area at this time. Confidential information
1419 should not be stored in this section.
1421 zip64 extensible data sector (variable size)
1423 (currently reserved for use by PKWARE)
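
The 0xFFFF and 0xFFFFFFFF convention used by the end of central directory fields above can be sketched as follows. The record signature 0x06054b50 and the 22 byte fixed layout are taken from the record definition elsewhere in this specification; the field and function names are hypothetical.

```python
import struct

EOCD_SIGNATURE = 0x06054B50
# signature, 4 x 2 byte fields, 2 x 4 byte fields, comment length
EOCD_STRUCT = struct.Struct("<IHHHHIIH")

# Field name and the saturated value that defers to the zip64 record.
FIELD_LIMITS = (
    ("disk_number", 0xFFFF),
    ("cd_start_disk", 0xFFFF),
    ("entries_on_disk", 0xFFFF),
    ("entries_total", 0xFFFF),
    ("cd_size", 0xFFFFFFFF),
    ("cd_offset", 0xFFFFFFFF),
)


def parse_eocd(record):
    """Unpack the fixed part of an end of central directory record."""
    values = EOCD_STRUCT.unpack_from(record, 0)
    if values[0] != EOCD_SIGNATURE:
        raise ValueError("not an end of central directory record")
    fields = dict(zip((name for name, _ in FIELD_LIMITS), values[1:7]))
    fields["comment_length"] = values[7]
    return fields


def fields_deferred_to_zip64(fields):
    """List the fields whose real value lives in the zip64 record.

    Only meaningful when the archive is in ZIP64 format.
    """
    return [name for name, limit in FIELD_LIMITS if fields[name] == limit]
```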
1426 K. Splitting and Spanning ZIP files
1428 Spanning is the process of segmenting a ZIP file across
1429 multiple removable media. This support has typically only
1430 been provided for DOS formatted floppy diskettes.
1432 File splitting is a newer derivative of spanning.
1433 Splitting follows the same segmentation process as
1434 spanning, however, it does not require writing each
1435 segment to a unique removable medium and instead supports
1436 placing all pieces onto local or non-removable locations
1437 such as file systems, local drives, folders, etc...
1439 A key difference between spanned and split ZIP files is
1440 that all pieces of a spanned ZIP file have the same name.
1441 Since each piece is written to a separate volume, no name
1442 collisions occur and each segment can reuse the original
1443 .ZIP file name given to the archive.
1445 Sequence ordering for DOS spanned archives uses the DOS
1446 volume label to determine segment numbers. Volume labels
1447 for each segment are written using the form PKBACK#xxx,
1448 where xxx is the segment number written as a decimal
1449 value from 001 - nnn.
1451 Split ZIP files are typically written to the same location
1452 and are subject to name collisions if the spanned name
1453 format is used since each segment will reside on the same
1454 drive. To avoid name collisions, split archives are named
1457 Segment 1 = filename.z01
1458 Segment n-1 = filename.z(n-1)
1459 Segment n = filename.zip
1461 The .ZIP extension is used on the last segment to support
1462 quickly reading the central directory. The segment number
1463 n should be a decimal value.
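
The naming scheme for split archives can be sketched as below. The two-digit zero padding follows the "z01" form shown above; how segment numbers of 100 and above are written is an assumption of this sketch (they are emitted without further padding). The function name is hypothetical.

```python
def split_segment_names(base, segment_count):
    """Return the file names of a split archive, first segment first.

    The last segment always carries the .zip extension so that the
    central directory can be located quickly.
    """
    if segment_count < 1:
        raise ValueError("an archive has at least one segment")
    names = [f"{base}.z{i:02d}" for i in range(1, segment_count)]
    names.append(f"{base}.zip")
    return names
```

For example, split_segment_names("backup", 3) yields backup.z01, backup.z02 and backup.zip.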
1465 Spanned ZIP files may be PKSFX Self-extracting ZIP files.
1466 PKSFX files may also be split, however, in this case
1467 the first segment must be named filename.exe. The first
1468 segment of a split PKSFX archive must be large enough to
1469 include the entire executable program.
1471 Capacities for split archives are as follows.
1473 Maximum number of segments = 4,294,967,295 - 1
1474 Maximum .ZIP segment size = 4,294,967,295 bytes
1475 Minimum segment size = 64K
1476 Maximum PKSFX segment size = 2,147,483,647 bytes
1478 Segment sizes may differ; however, by convention, all
1479 segment sizes should be the same with the exception of the
1480 last, which may be smaller. Local and central directory
1481 header records must never be split across a segment boundary.
1482 When writing a header record, if the number of bytes remaining
1483 within a segment is less than the size of the header record,
1484 end the current segment and write the header at the start
1485 of the next segment. The central directory may span segment
1486 boundaries, but no single record in the central directory
1487 should be split across segments.
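
The rule that a header record must not straddle a segment boundary reduces to a simple check before each header is written. The following sketch uses hypothetical names and tracks only a segment number and a byte offset; a real writer would also close and open the segment files.

```python
def place_header(segment, bytes_used, segment_size, header_size):
    """Return (segment, offset) at which a header record may be written.

    If the space left in the current segment is smaller than the
    header record, the header starts at offset 0 of the next segment.
    """
    if header_size > segment_size:
        raise ValueError("header record is larger than a whole segment")
    remaining = segment_size - bytes_used
    if remaining < header_size:
        return segment + 1, 0
    return segment, bytes_used
```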
1489 Spanned/Split archives created using PKZIP for Windows
1490 (V2.50 or greater), PKZIP Command Line (V2.50 or greater),
1491 or PKZIP Explorer will include a special spanning
1492 signature as the first 4 bytes of the first segment of
1493 the archive. This signature (0x08074b50) will be
1494 followed immediately by the local header signature for
1495 the first file in the archive.
1497 A special spanning marker may also appear in spanned/split
1498 archives if the spanning or splitting process starts but
1499 only requires one segment. In this case the 0x08074b50
1500 signature will be replaced with the temporary spanning
1501 marker signature of 0x30304b50. Split archives can
1502 only be uncompressed by other versions of PKZIP that
1503 know how to create a split archive.
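
A reader can recognise the two markers described above by inspecting the first 4 bytes of the first segment, as in this sketch. Values are stored low byte first, so 0x08074b50 appears on disk as the bytes 50 4B 07 08. The function name and the returned labels are hypothetical.

```python
import struct

SPANNING_SIGNATURE = 0x08074B50
TEMPORARY_SPANNING_MARKER = 0x30304B50


def classify_first_segment(data):
    """Classify the leading 4 bytes of the first archive segment."""
    if len(data) < 4:
        return "unknown"
    (signature,) = struct.unpack_from("<I", data, 0)
    if signature == SPANNING_SIGNATURE:
        return "spanned-or-split"
    if signature == TEMPORARY_SPANNING_MARKER:
        return "single-segment-marker"
    return "no-marker"
```

In either marked case, the local header signature of the first file is expected to follow at offset 4.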
1505 The signature value 0x08074b50 is also used by some
1506 ZIP implementations as a marker for the Data Descriptor
1507 record. Conflict in this alternate assignment can be
1508 avoided by using the position of the signature
1509 within the ZIP file to determine the use for which it is intended.
1514 1) All fields unless otherwise noted are unsigned and stored
1515 in Intel low-byte:high-byte, low-word:high-word order.
1517 2) String fields are not null terminated, since the
1518 length is given explicitly.
1520 3) The entries in the central directory may not necessarily
1521 be in the same order that files appear in the .ZIP file.
1523 4) If one of the fields in the end of central directory
1524 record is too small to hold required data, the field
1525 should be set to -1 (0xFFFF or 0xFFFFFFFF) and the
1526 ZIP64 format record should be created.
1528 5) The end of central directory record and the
1529 Zip64 end of central directory locator record must
1530 reside on the same disk when splitting or spanning an archive.
1533 VI. Explanation of compression methods
1534 --------------------------------------
1536 UnShrinking - Method 1
1537 ----------------------
1539 Shrinking is a Dynamic Ziv-Lempel-Welch compression algorithm
1540 with partial clearing. The initial code size is 9 bits, and
1541 the maximum code size is 13 bits. Shrinking differs from
1542 conventional Dynamic Ziv-Lempel-Welch implementations in several
1545 1) The code size is controlled by the compressor, and is not
1546 automatically increased when codes larger than the current
1547 code size are created (but not necessarily used). When
1548 the decompressor encounters the code sequence 256
1549 (decimal) followed by 1, it should increase the code size
1550 read from the input stream to the next bit size. No
1551 blocking of the codes is performed, so the next code at
1552 the increased size should be read from the input stream
1553 immediately after where the previous code at the smaller
1554 bit size was read. Again, the decompressor should not
1555 increase the code size used until the sequence 256,1 is
1558 2) When the table becomes full, total clearing is not
1559 performed. Rather, when the compressor emits the code
1560 sequence 256,2 (decimal), the decompressor should clear
1561 all leaf nodes from the Ziv-Lempel tree, and continue to
1562 use the current code size. The nodes that are cleared
1563 from the Ziv-Lempel tree are then re-used, with the lowest
1564 code value re-used first, and the highest code value
1565 re-used last. The compressor can emit the sequence 256,2
1568 Expanding - Methods 2-5
1569 -----------------------
1571 The Reducing algorithm is actually a combination of two
1572 distinct algorithms. The first algorithm compresses repeated
1573 byte sequences, and the second algorithm takes the compressed
1574 stream from the first algorithm and applies a probabilistic
1577 The probabilistic compression stores an array of 'follower
1578 sets' S(j), for j=0 to 255, corresponding to each possible
1579 ASCII character. Each set contains between 0 and 32
1580 characters, to be denoted as S(j)[0],...,S(j)[m], where m<32.
1581 The sets are stored at the beginning of the data area for a
1582 Reduced file, in reverse order, with S(255) first, and S(0)
1585 The sets are encoded as { N(j), S(j)[0],...,S(j)[N(j)-1] },
1586 where N(j) is the size of set S(j). N(j) can be 0, in which
1587 case the follower set for S(j) is empty. Each N(j) value is
1588 encoded in 6 bits, followed by N(j) eight bit character values
1589 corresponding to S(j)[0] to S(j)[N(j)-1] respectively. If
1590 N(j) is 0, then no values for S(j) are stored, and the value
1591 for N(j-1) immediately follows.
1593 Immediately after the follower sets is the compressed data
1594 stream. The compressed data stream can be interpreted for the
1595 probabilistic decompression as follows:
1597 let Last-Character <- 0.
1599 if the follower set S(Last-Character) is empty then
1600 read 8 bits from the input stream, and copy this
1601 value to the output stream.
1602 otherwise if the follower set S(Last-Character) is non-empty then
1603 read 1 bit from the input stream.
1604 if this bit is not zero then
1605 read 8 bits from the input stream, and copy this
1606 value to the output stream.
1607 otherwise if this bit is zero then
1608 read B(N(Last-Character)) bits from the input
1609 stream, and assign this value to I.
1610 Copy the value of S(Last-Character)[I] to the
1613 assign the last value placed on the output stream to
1617 B(N(j)) is defined as the minimal number of bits required to
1618 encode the value N(j)-1.
1620 The decompressed stream from above can then be expanded to
1621 re-create the original file as follows:
1626 read 8 bits from the input stream into C.
1628 0: if C is not equal to DLE (144 decimal) then
1629 copy C to the output stream.
1630 otherwise if C is equal to DLE then
1633 1: if C is non-zero then
1636 let State <- F(Len).
1637 otherwise if C is zero then
1638 copy the value 144 (decimal) to the output stream.
1641 2: let Len <- Len + C
1644 3: move backwards D(V,C) bytes in the output stream
1645 (if this position is before the start of the output
1646 stream, then assume that all the data before the
1647 start of the output stream is filled with zeros).
1648 copy Len+3 bytes from this position to the output stream.
1653 The functions F, L, and D are dependent on the 'compression
1654 factor', 1 through 4, and are defined as follows:
1656 For compression factor 1:
1657 L(X) equals the lower 7 bits of X.
1658 F(X) equals 2 if X equals 127 otherwise F(X) equals 3.
1659 D(X,Y) equals the (upper 1 bit of X) * 256 + Y + 1.
1660 For compression factor 2:
1661 L(X) equals the lower 6 bits of X.
1662 F(X) equals 2 if X equals 63 otherwise F(X) equals 3.
1663 D(X,Y) equals the (upper 2 bits of X) * 256 + Y + 1.
1664 For compression factor 3:
1665 L(X) equals the lower 5 bits of X.
1666 F(X) equals 2 if X equals 31 otherwise F(X) equals 3.
1667 D(X,Y) equals the (upper 3 bits of X) * 256 + Y + 1.
1668 For compression factor 4:
1669 L(X) equals the lower 4 bits of X.
1670 F(X) equals 2 if X equals 15 otherwise F(X) equals 3.
1671 D(X,Y) equals the (upper 4 bits of X) * 256 + Y + 1.
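
The functions L, F and D and the four-state expansion loop can be sketched together as follows. The input is taken to be the byte stream produced by the probabilistic decompression stage. State transitions that are not spelled out in the text above (returning to state 0 after a copy, and moving from state 2 to state 3) are assumptions of this sketch, and the function names are hypothetical.

```python
DLE = 144


def reduce_functions(factor):
    """Return (L, F, D) for compression factor 1 through 4."""
    low_bits = 8 - factor            # factor 1 -> lower 7 bits, etc.
    mask = (1 << low_bits) - 1       # 127, 63, 31 or 15

    def L(x):
        return x & mask

    def F(x):
        return 2 if x == mask else 3

    def D(x, y):
        return (x >> low_bits) * 256 + y + 1

    return L, F, D


def expand(data, factor):
    """Expand the repeated-sequence encoding of a Reduced stream."""
    L, F, D = reduce_functions(factor)
    out = bytearray()
    state = 0
    v = 0
    length = 0
    for c in data:
        if state == 0:
            if c != DLE:
                out.append(c)
            else:
                state = 1
        elif state == 1:
            if c != 0:
                v = c
                length = L(v)
                state = F(length)
            else:
                out.append(DLE)      # DLE followed by 0 is a literal 144
                state = 0
        elif state == 2:
            length += c
            state = 3
        else:
            start = len(out) - D(v, c)
            for k in range(length + 3):
                pos = start + k
                # Data before the start of the output is treated as zeros.
                out.append(out[pos] if pos >= 0 else 0)
            state = 0
    return bytes(out)
```

Copying one byte at a time lets a match overlap the bytes it is producing, which is how short repeating patterns are reproduced.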
1673 Imploding - Method 6
1674 --------------------
1676 The Imploding algorithm is actually a combination of two distinct
1677 algorithms. The first algorithm compresses repeated byte
1678 sequences using a sliding dictionary. The second algorithm is
1679 used to compress the encoding of the sliding dictionary output,
1680 using multiple Shannon-Fano trees.
1682 The Imploding algorithm can use a 4K or 8K sliding dictionary
1683 size. The dictionary size used can be determined by bit 1 in the
1684 general purpose flag word; a 0 bit indicates a 4K dictionary
1685 while a 1 bit indicates an 8K dictionary.
1687 The Shannon-Fano trees are stored at the start of the compressed
1688 file. The number of trees stored is defined by bit 2 in the
1689 general purpose flag word; a 0 bit indicates two trees stored, a
1690 1 bit indicates three trees are stored. If 3 trees are stored,
1691 the first Shannon-Fano tree represents the encoding of the
1692 Literal characters, the second tree represents the encoding of
1693 the Length information, the third represents the encoding of the
1694 Distance information. When 2 Shannon-Fano trees are stored, the
1695 Length tree is stored first, followed by the Distance tree.
1697 The Literal Shannon-Fano tree, if present, is used to represent
1698 the entire ASCII character set, and contains 256 values. This
1699 tree is used to compress any data not compressed by the sliding
1700 dictionary algorithm. When this tree is present, the Minimum
1701 Match Length for the sliding dictionary is 3. If this tree is
1702 not present, the Minimum Match Length is 2.
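
The way the general purpose flag word selects the Imploding parameters can be sketched as follows. The sketch assumes that bit 0 is the least significant bit of the flag word and that 4K and 8K mean 4096 and 8192 bytes; the function name is hypothetical.

```python
def implode_parameters(general_purpose_flags):
    """Return (dictionary_size, tree_count, minimum_match_length)."""
    # Bit 1: 0 = 4K sliding dictionary, 1 = 8K sliding dictionary.
    dictionary_size = 8192 if general_purpose_flags & 0x0002 else 4096
    # Bit 2: 0 = two trees (Length, Distance), 1 = three trees
    # (Literal, Length, Distance).
    has_literal_tree = bool(general_purpose_flags & 0x0004)
    tree_count = 3 if has_literal_tree else 2
    # The Literal tree raises the Minimum Match Length from 2 to 3.
    minimum_match_length = 3 if has_literal_tree else 2
    return dictionary_size, tree_count, minimum_match_length
```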
1704 The Length Shannon-Fano tree is used to compress the Length part
1705 of the (length,distance) pairs from the sliding dictionary
1706 output. The Length tree contains 64 values, ranging from the
1707 Minimum Match Length, to 63 plus the Minimum Match Length.
1709 The Distance Shannon-Fano tree is used to compress the Distance
1710 part of the (length,distance) pairs from the sliding dictionary
1711 output. The Distance tree contains 64 values, ranging from 0 to
1712 63, representing the upper 6 bits of the distance value. The
1713 distance values themselves will be between 0 and the sliding
1714 dictionary size, either 4K or 8K.
1716 The Shannon-Fano trees themselves are stored in a compressed
1717 format. The first byte of the tree data represents the number of
1718 bytes of data representing the (compressed) Shannon-Fano tree
1719 minus 1. The remaining bytes represent the Shannon-Fano tree
1722 High 4 bits: Number of values at this bit length + 1. (1 - 16)
1723 Low 4 bits: Bit Length needed to represent value + 1. (1 - 16)
1725 The Shannon-Fano codes can be constructed from the bit lengths
1726 using the following algorithm:
1728 1) Sort the Bit Lengths in ascending order, while retaining the
1729 order of the original lengths stored in the file.
1731 2) Generate the Shannon-Fano trees:
1736 i <- number of Shannon-Fano codes - 1 (either 255 or 63)
1739 Code = Code + CodeIncrement
1740 if BitLength(i) <> LastBitLength then
1741 LastBitLength=BitLength(i)
1742 CodeIncrement = 1 shifted left (16 - LastBitLength)
1743 ShannonCode(i) = Code
1747 3) Reverse the order of all the bits in the above ShannonCode()
1748 vector, so that the most significant bit becomes the least
1749 significant bit. For example, the value 0x1234 (hex) would
1750 become 0x2C48 (hex).
4) Restore the order of Shannon-Fano codes as originally stored
   within the file.
1757 This example will show the encoding of a Shannon-Fano tree
1758 of size 8. Notice that the actual Shannon-Fano trees used
1759 for Imploding are either 64 or 256 entries in size.
1761 Example: 0x02, 0x42, 0x01, 0x13
The first byte indicates 3 values in this table. Decoding the
bytes:

    0x42 = 5 codes of 3 bits long
    0x01 = 1 code of 2 bits long
    0x13 = 2 codes of 4 bits long
1769 This would generate the original bit length array of:
1770 (3, 3, 3, 3, 3, 2, 4, 4)
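As a rough illustration, the stored tree bytes can be expanded into the
bit length array with a few lines of Python. The function name
expand_tree_lengths is hypothetical and not part of this specification;
the sketch assumes the complete tree record is already in memory.

```python
def expand_tree_lengths(data):
    """Expand a compressed Shannon-Fano tree record into a list of bit lengths."""
    byte_count = data[0] + 1              # first byte stores (number of bytes - 1)
    lengths = []
    for b in data[1:1 + byte_count]:
        count = (b >> 4) + 1              # high 4 bits hold (number of values - 1)
        bit_length = (b & 0x0F) + 1       # low 4 bits hold (bit length - 1)
        lengths.extend([bit_length] * count)
    return lengths


# The example record from the text above.
example = expand_tree_lengths(bytes([0x02, 0x42, 0x01, 0x13]))
```

For the example record this yields the bit length array shown above.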
1772 There are 8 codes in this table for the values 0 thru 7. Using
1773 the algorithm to obtain the Shannon-Fano codes produces:
              Constructed       Reversed  Order     Original
Val  Sorted   Code              Value     Restored  Length
---  ------   ----------------  --------  --------  --------
 0:    2      1100000000000000  11        101       3
 1:    3      1010000000000000  101       001       3
 2:    3      1000000000000000  001       110       3
 3:    3      0110000000000000  110       010       3
 4:    3      0100000000000000  010       100       3
 5:    3      0010000000000000  100       11        2
 6:    4      0001000000000000  1000      1000      4
 7:    4      0000000000000000  0000      0000      4
1787 The values in the Val, Order Restored and Original Length columns
1788 now represent the Shannon-Fano encoding tree that can be used for
1789 decoding the Shannon-Fano encoded data. How to parse the
1790 variable length Shannon-Fano values from the data stream is beyond
1791 the scope of this document. (See the references listed at the end of
1792 this document for more information.) However, traditional decoding
1793 schemes used for Huffman variable length decoding, such as the
1794 Greenlaw algorithm, can be successfully applied.
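The four construction steps can be sketched in Python as follows. This
is an illustrative reading of the algorithm, not a reference
implementation; the names reverse16 and shannon_fano_codes are
hypothetical. The sketch relies on Python's sort being stable, which
preserves the original order among equal bit lengths as step 1 requires.

```python
def reverse16(value):
    """Reverse the order of the 16 bits in value (step 3)."""
    result = 0
    for _ in range(16):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def shannon_fano_codes(bit_lengths):
    """Return the bit-reversed code for each value, in original file order."""
    n = len(bit_lengths)
    # Step 1: stable sort of value indices by ascending bit length.
    order = sorted(range(n), key=lambda v: bit_lengths[v])
    code = 0
    increment = 0
    last_length = 0
    codes = [0] * n
    # Step 2: walk the sorted list from the last entry down to the first.
    for i in range(n - 1, -1, -1):
        value = order[i]
        code += increment
        if bit_lengths[value] != last_length:
            last_length = bit_lengths[value]
            increment = 1 << (16 - last_length)
        # Steps 3 and 4: reverse the bits and store under the original index.
        codes[value] = reverse16(code)
    return codes


codes = shannon_fano_codes([3, 3, 3, 3, 3, 2, 4, 4])
```

Each returned code is meaningful only in its low BitLength bits, which
correspond to the Order Restored column of the example table.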
1796 The compressed data stream begins immediately after the
1797 compressed Shannon-Fano data. The compressed data stream can be
1798 interpreted as follows:
loop until done
    read 1 bit from input stream.

    if this bit is non-zero then       (encoded data is literal data)
        if Literal Shannon-Fano tree is present
            read and decode character using Literal Shannon-Fano tree.
        otherwise
            read 8 bits from input stream.
        copy character to the output stream.
    otherwise              (encoded data is sliding dictionary match)
        if 8K dictionary size
            read 7 bits for offset Distance (lower 7 bits of offset).
        otherwise
            read 6 bits for offset Distance (lower 6 bits of offset).

        using the Distance Shannon-Fano tree, read and decode the
          upper 6 bits of the Distance value.

        using the Length Shannon-Fano tree, read and decode
          the Length value.

        Length <- Length + Minimum Match Length

        if Length = 63 + Minimum Match Length
            read 8 bits from the input stream,
            add this value to Length.

        move backwards Distance+1 bytes in the output stream, and
        copy Length characters from this position to the output
        stream.  (if this position is before the start of the output
        stream, then assume that all the data before the start of
        the output stream is filled with zeros).
end loop
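The match handling at the end of this loop can be sketched in Python as
shown below. The helper names combine_distance and copy_match are
hypothetical, and the way the two parts of the Distance are joined (the
decoded upper 6 bits placed above the 6 or 7 directly read low bits) is
an interpretation of the text above rather than something stated
explicitly.

```python
def combine_distance(upper6, lower_bits, dict_8k):
    """Join the tree-decoded upper 6 bits with the directly read low bits."""
    shift = 7 if dict_8k else 6       # 7 low bits for 8K dictionary, else 6
    return (upper6 << shift) | lower_bits


def copy_match(out, distance, length):
    """Copy length bytes starting distance+1 bytes back in the output.

    Positions before the start of the output are treated as zero bytes.
    Copying one byte at a time lets a match overlap its own output.
    """
    start = len(out) - (distance + 1)
    for k in range(length):
        pos = start + k
        out.append(out[pos] if pos >= 0 else 0)


buf = bytearray(b"ab")
copy_match(buf, 1, 4)                 # overlapping copy repeats "ab"

early = bytearray(b"x")
copy_match(early, 2, 3)               # reaches before the start of output
```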
1834 Tokenizing - Method 7
1835 ---------------------
1837 This method is not used by PKZIP.
1839 Deflating - Method 8
1840 --------------------
1842 The Deflate algorithm is similar to the Implode algorithm using
1843 a sliding dictionary of up to 32K with secondary compression
1844 from Huffman/Shannon-Fano codes.
1846 The compressed data is stored in blocks with a header describing
1847 the block and the Huffman codes used in the data block. The header
1848 format is as follows:
Bit 0: Last Block bit     This bit is set to 1 if this is the last
                          compressed block in the data.
Bits 1-2: Block type
   00 (0) - Block is stored - All stored data is byte aligned.
            Skip bits until next byte, then next word = block
            length, followed by the ones' complement of the block
            length word. Remaining data in block is the stored
            data.

   01 (1) - Use fixed Huffman codes for literal and distance codes.
            Lit Code    Bits        Dist Code   Bits
            ---------   ----        ---------   ----
              0 - 143    8            0 - 31     5
            144 - 255    9
            256 - 279    7
            280 - 287    8

            Literal codes 286-287 and distance codes 30-31 are
            never used but participate in the Huffman construction.

   10 (2) - Dynamic Huffman codes. (See expanding Huffman codes)

   11 (3) - Reserved - Flag an "Error in compressed data" if seen.
1874 Expanding Huffman Codes
1875 -----------------------
1876 If the data block is stored with dynamic Huffman codes, the Huffman
1877 codes are sent in the following compressed format:
5 Bits: # of Literal codes sent - 257 (257 - 286)
        All other codes are never sent.
5 Bits: # of Dist codes - 1           (1 - 32)
4 Bits: # of Bit Length codes - 4     (4 - 19)
1884 The Huffman codes are sent as bit lengths and the codes are built as
1885 described in the implode algorithm. The bit lengths themselves are
1886 compressed with Huffman codes. There are 19 bit length codes:
1888 0 - 15: Represent bit lengths of 0 - 15
1889 16: Copy the previous bit length 3 - 6 times.
1890 The next 2 bits indicate repeat length (0 = 3, ... ,3 = 6)
1891 Example: Codes 8, 16 (+2 bits 11), 16 (+2 bits 10) will
1892 expand to 12 bit lengths of 8 (1 + 6 + 5)
1893 17: Repeat a bit length of 0 for 3 - 10 times. (3 bits of length)
1894 18: Repeat a bit length of 0 for 11 - 138 times (7 bits of length)
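The repeat codes can be illustrated with a short Python sketch. The
function name expand_length_codes is hypothetical, and the input is
assumed to be already decoded into (code, extra bits value) pairs; the
extra value is ignored for codes 0 - 15.

```python
def expand_length_codes(tokens):
    """Expand (code, extra) pairs of the 19 bit length codes into bit lengths."""
    lengths = []
    for code, extra in tokens:
        if 0 <= code <= 15:
            lengths.append(code)                        # literal bit length
        elif code == 16:
            if not lengths:
                raise ValueError("code 16 with no previous bit length")
            lengths.extend([lengths[-1]] * (3 + extra)) # repeat previous 3 - 6 times
        elif code == 17:
            lengths.extend([0] * (3 + extra))           # 3 - 10 zeros
        elif code == 18:
            lengths.extend([0] * (11 + extra))          # 11 - 138 zeros
        else:
            raise ValueError("invalid bit length code")
    return lengths


# The example from the description of code 16: 8, 16 (+11b), 16 (+10b).
expanded = expand_length_codes([(8, 0), (16, 3), (16, 2)])
```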
1896 The lengths of the bit length codes are sent packed 3 bits per value
1897 (0 - 7) in the following order:
1899 16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
1901 The Huffman codes should be built as described in the Implode algorithm
1902 except codes are assigned starting at the shortest bit length, i.e. the
1903 shortest code should be all 0's rather than all 1's. Also, codes with
1904 a bit length of zero do not participate in the tree construction. The
1905 codes are then used to decode the bit lengths for the literal and
1908 The bit lengths for the literal tables are sent first with the number
1909 of entries sent described by the 5 bits sent earlier. There are up
1910 to 286 literal characters; the first 256 represent the respective 8
1911 bit character, code 256 represents the End-Of-Block code, the remaining
1912 29 codes represent copy lengths of 3 thru 258. There are up to 30
distance codes representing distances from 1 thru 32k as described
below.
                          Length Codes
                          ------------
      Extra             Extra              Extra              Extra
 Code Bits Length  Code Bits Lengths  Code Bits Lengths  Code Bits Length(s)
 ---- ---- ------  ---- ---- -------  ---- ---- -------  ---- ---- ---------
  257   0     3     265   1   11,12    273   3   35-42    281   5  131-162
  258   0     4     266   1   13,14    274   3   43-50    282   5  163-194
  259   0     5     267   1   15,16    275   3   51-58    283   5  195-226
  260   0     6     268   1   17,18    276   3   59-66    284   5  227-257
  261   0     7     269   2   19-22    277   4   67-82    285   0    258
  262   0     8     270   2   23-26    278   4   83-98
  263   0     9     271   2   27-30    279   4   99-114
  264   0    10     272   2   31-34    280   4  115-130
                          Distance Codes
                          --------------
      Extra           Extra             Extra               Extra
 Code Bits Dist  Code Bits  Dist   Code Bits Distance  Code Bits Distance
 ---- ---- ----  ---- ---- ------  ---- ---- --------  ---- ---- -----------
   0   0    1      8   3   17-24    16    7  257-384    24   11  4097-6144
   1   0    2      9   3   25-32    17    7  385-512    25   11  6145-8192
   2   0    3     10   4   33-48    18    8  513-768    26   12  8193-12288
   3   0    4     11   4   49-64    19    8  769-1024   27   12  12289-16384
   4   1   5,6    12   5   65-96    20    9  1025-1536  28   13  16385-24576
   5   1   7,8    13   5   97-128   21    9  1537-2048  29   13  24577-32768
   6   2   9-12   14   6   129-192  22   10  2049-3072
   7   2   13-16  15   6   193-256  23   10  3073-4096
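Both tables follow a regular pattern, so a decoder may generate them
instead of hard-coding them. The Python sketch below is one way to do
this; the function names are hypothetical. Each entry maps a code to
(extra bits, base value), and the decoded value is the base plus the
value of the extra bits read from the stream.

```python
def length_table():
    """Map length codes 257..285 to (extra_bits, base_length)."""
    table = {}
    base = 3
    for code in range(257, 285):
        extra = 0 if code < 265 else (code - 261) // 4
        table[code] = (extra, base)
        base += 1 << extra
    table[285] = (0, 258)             # code 285 is the special case for 258
    return table


def distance_table():
    """Map distance codes 0..29 to (extra_bits, base_distance)."""
    table = {}
    base = 1
    for code in range(30):
        extra = 0 if code < 4 else (code - 2) // 2
        table[code] = (extra, base)
        base += 1 << extra
    return table


LENGTHS = length_table()
DISTANCES = distance_table()


def decode_length(code, extra_value):
    """Return the copy length for a length code and its extra bits value."""
    _extra_bits, base = LENGTHS[code]
    return base + extra_value
```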
1944 The compressed data stream begins immediately after the
1945 compressed header data. The compressed data stream can be
1946 interpreted as follows:
do
   read header from input stream.

   if stored block
      skip bits until byte aligned
      read count and 1's complement of count
      copy count bytes data block
   otherwise
      loop until end of block code sent
         decode literal character from input stream
         if literal < 256
            copy character to the output stream
         otherwise
            if literal = end of block
               break from loop
            otherwise
               decode distance from input stream

               move backwards distance bytes in the output stream, and
               copy length characters from this position to the output
               stream.
      end loop
while not last block

if data descriptor exists
   skip bits until byte aligned
   read crc and sizes
endif
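In practice most implementations delegate this loop to an existing
Deflate library. The Python sketch below uses the standard zlib module
with a negative window size, which selects a raw Deflate stream with no
zlib wrapper, as stored in ZIP files. The helper names inflate_raw and
block_header are hypothetical. The example input is a hand-built stored
block: header byte 0x01 (last block, type 00), the length word 5, its
ones' complement, and 5 data bytes.

```python
import zlib


def inflate_raw(data):
    """Decompress a raw Deflate stream such as ZIP method 8 file data."""
    decompressor = zlib.decompressobj(-15)    # negative wbits: no zlib header
    return decompressor.decompress(data) + decompressor.flush()


def block_header(first_byte):
    """Return (is_last_block, block_type) from the first byte of a block.

    Header bits are taken starting at the least significant bit.
    """
    return bool(first_byte & 1), (first_byte >> 1) & 3


stored_block = bytes([0x01, 0x05, 0x00, 0xFA, 0xFF]) + b"hello"
```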
1977 Enhanced Deflating - Method 9
1978 -----------------------------
1980 The Enhanced Deflating algorithm is similar to Deflate but
1981 uses a sliding dictionary of up to 64K. Deflate64(tm) is supported
1982 by the Deflate extractor.
BZIP2 - Method 12
-----------------

BZIP2 is an open-source data compression algorithm developed by
1988 Julian Seward. Information and source code for this algorithm
1989 can be found on the internet.
1991 LZMA - Method 14 (EFS)
1992 ----------------------
1994 LZMA is a block-oriented, general purpose data compression algorithm
1995 developed and maintained by Igor Pavlov. It is a derivative of LZ77
1996 that utilizes Markov chains and a range coder. Information and
1997 source code for this algorithm can be found on the internet. Consult
1998 with the author of this algorithm for information on terms or
1999 restrictions on use.
2001 Support for LZMA within the ZIP format is defined as follows:
2003 The Compression method field within the ZIP Local and Central
2004 Header records will be set to the value 14 to indicate data was
2005 compressed using LZMA.
2007 The Version needed to extract field within the ZIP Local and
2008 Central Header records will be set to 6.3 to indicate the
2009 minimum ZIP format version supporting this feature.
2011 File data compressed using the LZMA algorithm must be placed
2012 immediately following the Local Header for the file. If a
2013 standard ZIP encryption header is required, it will follow
2014 the Local Header and will precede the LZMA compressed file
2015 data segment. The location of LZMA compressed data segment
2016 within the ZIP format will be as shown:
2018 [local header file 1]
2019 [encryption header file 1]
2020 [LZMA compressed data segment for file 1]
2022 [local header file 2]
2024 The encryption header and data descriptor records may
2025 be conditionally present. The LZMA Compressed Data Segment
2026 will consist of an LZMA Properties Header followed by the
2027 LZMA Compressed Data as shown:
2029 [LZMA properties header for file 1]
2030 [LZMA compressed data for file 1]
2032 The LZMA Compressed Data will be stored as provided by the
2033 LZMA compression library. Compressed size, uncompressed
2034 size and other file characteristics about the file being
2035 compressed must be stored in standard ZIP storage format.
2037 The LZMA Properties Header will store specific data required to
2038 decompress the LZMA compressed Data. This data is set by the
2039 LZMA compression engine using the function WriteCoderProperties()
2040 as documented within the LZMA SDK.
2042 Storage fields for the property information within the LZMA
2043 Properties Header are as follows:
2045 LZMA Version Information 2 bytes
2046 LZMA Properties Size 2 bytes
2047 LZMA Properties Data variable, defined by "LZMA Properties Size"
2049 LZMA Version Information - this field identifies which version of
2050 the LZMA SDK was used to compress a file. The first byte will
2051 store the major version number of the LZMA SDK and the second
2052 byte will store the minor number.
2054 LZMA Properties Size - this field defines the size of the remaining
2055 property data. Typically this size should be determined by the
2056 version of the SDK. This size field is included as a convenience
2057 and to help avoid any ambiguity should it arise in the future due
2058 to changes in this compression algorithm.
2060 LZMA Property Data - this variable sized field records the required
2061 values for the decompressor as defined by the LZMA SDK. The
2062 data stored in this field should be obtained using the
2063 WriteCoderProperties() in the version of the SDK defined by
2064 the "LZMA Version Information" field.
2066 The layout of the "LZMA Properties Data" field is a function of the
2067 LZMA compression algorithm. It is possible that this layout may be
2068 changed by the author over time. The data layout in version 4.32
2069 of the LZMA SDK defines a 5 byte array that uses 4 bytes to store
2070 the dictionary size in little-endian order. This is preceded by a
2071 single packed byte as the first element of the array that contains
the following fields:

    PosStateBits
    LiteralPosStateBits
    LiteralContextBits

Refer to the LZMA documentation for a more detailed explanation of
these fields.
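A parser for the LZMA Properties Header might look like the following
Python sketch. The function name is hypothetical. Two assumptions are
made that the text above does not spell out: the 2 byte size field is
little-endian like other ZIP fields, and the packed byte is encoded as
(PosStateBits * 5 + LiteralPosStateBits) * 9 + LiteralContextBits, which
is the convention used by the LZMA SDK.

```python
import struct


def parse_lzma_properties_header(buf):
    """Parse the LZMA Properties Header; return (info, bytes_consumed)."""
    major, minor, props_size = struct.unpack_from("<BBH", buf, 0)
    props = buf[4:4 + props_size]
    info = {"version": (major, minor), "props_size": props_size, "props": props}
    if props_size == 5:
        packed = props[0]
        info["lc"] = packed % 9               # LiteralContextBits
        info["lp"] = (packed // 9) % 5        # LiteralPosStateBits
        info["pb"] = packed // 45             # PosStateBits
        # Dictionary size: 4 bytes, little-endian.
        info["dict_size"] = struct.unpack_from("<I", props, 1)[0]
    return info, 4 + props_size


# Illustrative header: version 9.20, 5 property bytes, 8 MB dictionary.
sample = bytes([9, 20, 5, 0, 0x5D, 0x00, 0x00, 0x80, 0x00])
parsed, consumed = parse_lzma_properties_header(sample)
```

The LZMA compressed data starts at offset consumed within the LZMA
Compressed Data Segment.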
2081 Data compressed with method 14, LZMA, may include an end-of-stream
2082 (EOS) marker ending the compressed data stream. This marker is not
2083 required, but its use is highly recommended to facilitate processing
2084 and implementers should include the EOS marker whenever possible.
2085 When the EOS marker is used, general purpose bit 1 must be set. If
2086 general purpose bit 1 is not set, the EOS marker is not present.
WavPack - Method 97
-------------------

Information describing the use of compression method 97 is
2092 provided by WinZIP International, LLC. This method relies on the
2093 open source WavPack audio compression utility developed by David Bryant.
2094 Information on WavPack is available at www.wavpack.com. Please consult
2095 with the author of this algorithm for information on terms and
2096 restrictions on use.
2098 WavPack data for a file begins immediately after the end of the
2099 local header data. This data is the output from WavPack compression
2100 routines. Within the ZIP file, the use of WavPack compression is
2101 indicated by setting the compression method field to a value of 97
2102 in both the local header and the central directory header. The Version
2103 needed to extract and version made by fields use the same values as are
2104 used for data compressed using the Deflate algorithm.
2106 An implementation note for storing digital sample data when using
2107 WavPack compression within ZIP files is that all of the bytes of
2108 the sample data should be compressed. This includes any unused
2109 bits up to the byte boundary. An example is a 2 byte sample that
2110 uses only 12 bits for the sample data with 4 unused bits. If only
2111 12 bits are passed as the sample size to the WavPack routines, the 4
2112 unused bits will be set to 0 on extraction regardless of their original
state. To avoid this, the full 16 bits of the sample data size
should be provided.
PPMd - Method 98
----------------

PPMd is a data compression algorithm developed by Dmitry Shkarin
2120 which includes a carryless rangecoder developed by Dmitry Subbotin.
This algorithm is based on prediction by partial matching (PPM) over
contexts of multiple orders. Information and source code for this algorithm
2123 can be found on the internet. Consult with the author of this
2124 algorithm for information on terms or restrictions on use.
2126 Support for PPMd within the ZIP format currently is provided only
2127 for version I, revision 1 of the algorithm. Storage requirements
2128 for using this algorithm are as follows:
2130 Parameters needed to control the algorithm are stored in the two
2131 bytes immediately preceding the compressed data. These bytes are
2132 used to store the following fields:
2134 Model order - sets the maximum model order, default is 8, possible
2135 values are from 2 to 16 inclusive
2137 Sub-allocator size - sets the size of sub-allocator in MB, default is 50,
2138 possible values are from 1MB to 256MB inclusive
2140 Model restoration method - sets the method used to restart context
2141 model at memory insufficiency, values are:
2143 0 - restarts model from scratch - default
2144 1 - cut off model - decreases performance by as much as 2x
2145 2 - freeze context tree - not recommended
An example for packing these fields into the 2 byte storage field is
illustrated below. These values are stored in Intel low-byte/high-byte
order.

    wPPMd = (Model order - 1) +
            ((Sub-allocator size - 1) << 4) +
            (Model restoration method << 12)
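The packing formula and its inverse can be written in Python as a short
sketch. The function names pack_ppmd and unpack_ppmd are hypothetical;
the range checks follow the limits listed for each field above.

```python
import struct


def pack_ppmd(model_order, suballoc_mb, restore_method):
    """Pack the three PPMd parameters into the 2 byte field (low byte first)."""
    if not 2 <= model_order <= 16:
        raise ValueError("model order must be 2..16")
    if not 1 <= suballoc_mb <= 256:
        raise ValueError("sub-allocator size must be 1..256 MB")
    if restore_method not in (0, 1, 2):
        raise ValueError("model restoration method must be 0, 1 or 2")
    word = (model_order - 1) + ((suballoc_mb - 1) << 4) + (restore_method << 12)
    return struct.pack("<H", word)


def unpack_ppmd(raw):
    """Recover (model order, sub-allocator size in MB, restoration method)."""
    (word,) = struct.unpack("<H", raw[:2])
    return (word & 0x0F) + 1, ((word >> 4) & 0xFF) + 1, word >> 12


defaults = pack_ppmd(8, 50, 0)        # the default values named above
```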
2156 VII. Traditional PKWARE Encryption
2157 ----------------------------------
2159 The following information discusses the decryption steps
2160 required to support traditional PKWARE encryption. This
2161 form of encryption is considered weak by today's standards
2162 and its use is recommended only for situations with
low security needs or for compatibility with older .ZIP
applications.
2169 PKWARE is grateful to Mr. Roger Schlafly for his expert contribution
2170 towards the development of PKWARE's traditional encryption.
2172 PKZIP encrypts the compressed data stream. Encrypted files must
2173 be decrypted before they can be extracted.
2175 Each encrypted file has an extra 12 bytes stored at the start of
2176 the data area defining the encryption header for that file. The
2177 encryption header is originally set to random values, and then
2178 itself encrypted, using three, 32-bit keys. The key values are
2179 initialized using the supplied encryption password. After each byte
2180 is encrypted, the keys are then updated using pseudo-random number
2181 generation techniques in combination with the same CRC-32 algorithm
2182 used in PKZIP and described elsewhere in this document.
The following are the basic steps required to decrypt a file:
2186 1) Initialize the three 32-bit keys with the password.
2187 2) Read and decrypt the 12-byte encryption header, further
2188 initializing the encryption keys.
3) Read and decrypt the compressed data stream using the
   encryption keys.
2192 Step 1 - Initializing the encryption keys
2193 -----------------------------------------
Key(0) <- 305419896
Key(1) <- 591751049
Key(2) <- 878082192

loop for i <- 0 to length(password)-1
    update_keys(password(i))
end loop

Where update_keys() is defined as:

update_keys(char):
    Key(0) <- crc32(key(0),char)
    Key(1) <- Key(1) + (Key(0) & 000000ffH)
    Key(1) <- Key(1) * 134775813 + 1
    Key(2) <- crc32(key(2),key(1) >> 24)
end update_keys
2212 Where crc32(old_crc,char) is a routine that given a CRC value and a
2213 character, returns an updated CRC value after applying the CRC-32
2214 algorithm described elsewhere in this document.
2216 Step 2 - Decrypting the encryption header
2217 -----------------------------------------
2219 The purpose of this step is to further initialize the encryption
keys, based on random data, to render a plaintext attack on the
data ineffective.

Read the 12-byte encryption header into Buffer, in locations
Buffer(0) thru Buffer(11).

loop for i <- 0 to 11
    C <- buffer(i) ^ decrypt_byte()
    update_keys(C)
    buffer(i) <- C
end loop

Where decrypt_byte() is defined as:

unsigned char decrypt_byte()
    local unsigned short temp
    temp <- Key(2) | 2
    decrypt_byte <- (temp * (temp ^ 1)) >> 8
end decrypt_byte
2240 After the header is decrypted, the last 1 or 2 bytes in Buffer
2241 should be the high-order word/byte of the CRC for the file being
2242 decrypted, stored in Intel low-byte/high-byte order. Versions of
2243 PKZIP prior to 2.0 used a 2 byte CRC check; a 1 byte CRC check is
2244 used on versions after 2.0. This can be used to test if the password
2245 supplied is correct or not.
2247 Step 3 - Decrypting the compressed data stream
2248 ----------------------------------------------
2250 The compressed data stream can be decrypted as follows:
loop until done
    read a character into C
    Temp <- C ^ decrypt_byte()
    update_keys(Temp)
    output Temp
end loop
2260 VIII. Strong Encryption Specification
2261 -------------------------------------
2263 The Strong Encryption technology defined in this specification is
2264 covered under a pending patent application. The use or implementation
2265 in a product of certain technological aspects set forth in the current
2266 APPNOTE, including those with regard to strong encryption, patching,
2267 or extended tape operations requires a license from PKWARE. Portions
2268 of this Strong Encryption technology are available for use at no charge.
2269 Contact PKWARE for licensing terms and conditions. Refer to section II
of this APPNOTE (Contacting PKWARE) for information on how to
contact PKWARE.
2273 Version 5.x of this specification introduced support for strong
2274 encryption algorithms. These algorithms can be used with either
2275 a password or an X.509v3 digital certificate to encrypt each file.
2276 This format specification supports either password or certificate
2277 based encryption to meet the security needs of today, to enable
2278 interoperability between users within both PKI and non-PKI
2279 environments, and to ensure interoperability between different
2280 computing platforms that are running a ZIP program.
2282 Password based encryption is the most common form of encryption
2283 people are familiar with. However, inherent weaknesses with
2284 passwords (e.g. susceptibility to dictionary/brute force attack)
2285 as well as password management and support issues make certificate
2286 based encryption a more secure and scalable option. Industry
2287 efforts and support are defining and moving towards more advanced
2288 security solutions built around X.509v3 digital certificates and
2289 Public Key Infrastructures(PKI) because of the greater scalability,
2290 administrative options, and more robust security over traditional
2291 password based encryption.
2293 Most standard encryption algorithms are supported with this
2294 specification. Reference implementations for many of these
2295 algorithms are available from either commercial or open source
2296 distributors. Readily available cryptographic toolkits make
2297 implementation of the encryption features straight-forward.
2298 This document is not intended to provide a treatise on data
2299 encryption principles or theory. Its purpose is to document the
2300 data structures required for implementing interoperable data
2301 encryption within the .ZIP format. It is strongly recommended that
you have a good understanding of data encryption before reading
further.
The algorithms introduced in Version 5.0 of this specification
include:

    RC2    40 bit, 64 bit, and 128 bit
    RC4    40 bit, 64 bit, and 128 bit
    3DES   112 bit and 168 bit
2313 Version 5.1 adds support for the following:
2315 AES 128 bit, 192 bit, and 256 bit
2318 Version 6.1 introduces encryption data changes to support
2319 interoperability with Smartcard and USB Token certificate storage
2320 methods which do not support the OAEP strengthening standard.
2322 Version 6.2 introduces support for encrypting metadata by compressing
2323 and encrypting the central directory data structure to reduce information
2324 leakage. Information leakage can occur in legacy ZIP applications
2325 through exposure of information about a file even though that file is
2326 stored encrypted. The information exposed consists of file
2327 characteristics stored within the records and fields defined by this
specification. This includes data such as a file's name, its original
2329 size, timestamp and CRC32 value.
2331 Version 6.3 introduces support for encrypting data using the Blowfish
2332 and Twofish algorithms. These are symmetric block ciphers developed
2333 by Bruce Schneier. Blowfish supports using a variable length key from
2334 32 to 448 bits. Block size is 64 bits. Implementations should use 16
2335 rounds and the only mode supported within ZIP files is CBC. Twofish
2336 supports key sizes 128, 192 and 256 bits. Block size is 128 bits.
2337 Implementations should use 16 rounds and the only mode supported within
2338 ZIP files is CBC. Information and source code for both Blowfish and
2339 Twofish algorithms can be found on the internet. Consult with the author
2340 of these algorithms for information on terms or restrictions on use.
2342 Central Directory Encryption provides greater protection against
2343 information leakage by encrypting the Central Directory structure and
2344 by masking key values that are replicated in the unencrypted Local
2345 Header. ZIP compatible programs that cannot interpret an encrypted
2346 Central Directory structure cannot rely on the data in the corresponding
2347 Local Header for decompression information.
2349 Extra Field records that may contain information about a file that should
2350 not be exposed should not be stored in the Local Header and should only
2351 be written to the Central Directory where they can be encrypted. This
2352 design currently does not support streaming. Information in the End of
2353 Central Directory record, the Zip64 End of Central Directory Locator,
2354 and the Zip64 End of Central Directory records are not encrypted. Access
2355 to view data on files within a ZIP file with an encrypted Central Directory
2356 requires the appropriate password or private key for decryption prior to
2357 viewing any files, or any information about the files, in the archive.
2359 Older ZIP compatible programs not familiar with the Central Directory
2360 Encryption feature will no longer be able to recognize the Central
2361 Directory and may assume the ZIP file is corrupt. Programs that
2362 attempt streaming access using Local Headers will see invalid
2363 information for each file. Central Directory Encryption need not be
2364 used for every ZIP file. Its use is recommended for greater security.
ZIP files not using Central Directory Encryption should operate as
in the past.
2368 This strong encryption feature specification is intended to provide for
2369 scalable, cross-platform encryption needs ranging from simple password
2370 encryption to authenticated public/private key encryption.
2372 Encryption provides data confidentiality and privacy. It is
2373 recommended that you combine X.509 digital signing with encryption
2374 to add authentication and non-repudiation.
2377 Single Password Symmetric Encryption Method:
2378 -------------------------------------------
2380 The Single Password Symmetric Encryption Method using strong
2381 encryption algorithms operates similarly to the traditional
2382 PKWARE encryption defined in this format. Additional data
structures are added to support the processing needs of the
strong algorithms.
2386 The Strong Encryption data structures are:
2388 1. General Purpose Bits - Bits 0 and 6 of the General Purpose bit
2389 flag in both local and central header records. Both bits set
2390 indicates strong encryption. Bit 13, when set indicates the Central
2391 Directory is encrypted and that selected fields in the Local Header
2392 are masked to hide their actual value.
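Checking these bits is a simple mask test, sketched here in Python. The
function name encryption_flags is hypothetical, and bit 0 on its own is
taken to mean that the file is encrypted in some form, which this item
implies but does not state.

```python
def encryption_flags(gp_flags):
    """Interpret bits 0, 6 and 13 of the general purpose bit flag."""
    encrypted = bool(gp_flags & (1 << 0))
    return {
        "encrypted": encrypted,
        # Strong encryption requires both bit 0 and bit 6.
        "strong": encrypted and bool(gp_flags & (1 << 6)),
        # Bit 13: Central Directory encrypted, Local Header fields masked.
        "central_directory_encrypted": bool(gp_flags & (1 << 13)),
    }
```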
2395 2. Extra Field 0x0017 in central header only.
2397 Fields to consider in this record are:
2399 Format - the data format identifier for this record. The only
2400 value allowed at this time is the integer value 2.
2402 AlgId - integer identifier of the encryption algorithm from the
2406 0x6602 - RC2 (version needed to extract < 5.2)
2412 0x6702 - RC2 (version needed to extract >= 5.2)
2416 0xFFFF - Unknown algorithm
2418 Bitlen - Explicit bit length of key
2422 Flags - Processing flags needed for decryption
2424 0x0001 - Password is required to decrypt
2425 0x0002 - Certificates only
2426 0x0003 - Password or certificate required to decrypt
2428 Values > 0x0003 reserved for certificate processing
2431 3. Decryption header record preceding compressed file data.
2435 Value Size Description
2436 ----- ---- -----------
2437 IVSize 2 bytes Size of initialization vector (IV)
2438 IVData IVSize Initialization vector for this file
2439 Size 4 bytes Size of remaining decryption header data
2440 Format 2 bytes Format definition for this record
2441 AlgID 2 bytes Encryption algorithm identifier
2442 Bitlen 2 bytes Bit length of encryption key
2443 Flags 2 bytes Processing flags
2444 ErdSize 2 bytes Size of Encrypted Random Data
2445 ErdData ErdSize Encrypted Random Data
2446 Reserved1 4 bytes Reserved certificate processing data
2447 Reserved2 (var) Reserved for certificate processing data
2448 VSize 2 bytes Size of password validation data
2449 VData VSize-4 Password validation data
2450 VCRC32 4 bytes Standard ZIP CRC32 of password validation data
IVData - The size of the IV should match the algorithm block size.
The IVData can be completely random data. If the size of
the randomly generated data does not match the block size,
it should be padded with zeros or truncated as
necessary. If IVSize is 0, then IV = CRC32 + Uncompressed
File Size (as a 64 bit little-endian, unsigned integer value).
2459 Format - the data format identifier for this record. The only
2460 value allowed at this time is the integer value 3.
2462 AlgId - integer identifier of the encryption algorithm from the
2466 0x6602 - RC2 (version needed to extract < 5.2)
2472 0x6702 - RC2 (version needed to extract >= 5.2)
2476 0xFFFF - Unknown algorithm
2478 Bitlen - Explicit bit length of key
2482 Flags - Processing flags needed for decryption
2484 0x0001 - Password is required to decrypt
2485 0x0002 - Certificates only
2486 0x0003 - Password or certificate required to decrypt
2488 Values > 0x0003 reserved for certificate processing
2490 ErdData - Encrypted random data is used to store random data that
2491 is used to generate a file session key for encrypting
2492 each file. SHA1 is used to calculate hash data used to
2493 derive keys. File session keys are derived from a master
2494 session key generated from the user-supplied password.
2495 If the Flags field in the decryption header contains
2496 the value 0x4000, then the ErdData field must be
2497 decrypted using 3DES. If the value 0x4000 is not set,
2498 then the ErdData field must be decrypted using AlgId.
2501 Reserved1 - Reserved for certificate processing, if value is
2502 zero, then Reserved2 data is absent. See the explanation
2503 under the Certificate Processing Method for details on
2504 this data structure.
2506 Reserved2 - If present, the size of the Reserved2 data structure
2507 is located by skipping the first 4 bytes of this field
2508 and using the next 2 bytes as the remaining size. See
2509 the explanation under the Certificate Processing Method
2510 for details on this data structure.
2512 VSize - This size value will always include the 4 bytes of the
2513 VCRC32 data and will be greater than 4 bytes.
2515 VData - Random data for password validation. This data is VSize
2516 in length and VSize must be a multiple of the encryption
2517 block size. VCRC32 is a checksum value of VData.
2518 VData and VCRC32 are stored encrypted and start the
2519 stream of encrypted data for a file.
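The record layout can be walked with Python's struct module, as in the
sketch below. The function name is hypothetical, all multi-byte fields
are assumed to be little-endian, and only the password case is handled:
a non-zero Reserved1 (certificate data present) is rejected. Because
VData and VCRC32 are stored encrypted, they are returned as raw bytes
and are not interpreted.

```python
import struct


def parse_decryption_header(buf):
    """Split a decryption header record into its fields (password case only)."""
    pos = 0
    (iv_size,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    iv = buf[pos:pos + iv_size]
    pos += iv_size
    size, fmt, alg_id, bitlen, flags, erd_size = struct.unpack_from("<IHHHHH", buf, pos)
    pos += 14
    erd = buf[pos:pos + erd_size]
    pos += erd_size
    (reserved1,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    if reserved1 != 0:
        raise NotImplementedError("certificate data (Reserved2) not handled")
    (v_size,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    v_data = buf[pos:pos + v_size - 4]     # still encrypted
    pos += v_size - 4
    v_crc32 = buf[pos:pos + 4]             # still encrypted
    pos += 4
    return {
        "iv": iv, "size": size, "format": fmt, "alg_id": alg_id,
        "bitlen": bitlen, "flags": flags, "erd": erd,
        "v_data": v_data, "v_crc32": v_crc32, "length": pos,
    }
```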
2524 Strong Encryption is always applied to a file after compression. The
block oriented algorithms all operate in Cipher Block Chaining (CBC)
mode. The block size used for AES encryption is 16 bytes. All other block
algorithms use a block size of 8 bytes. Two IDs are defined for RC2 to
2528 account for a discrepancy found in the implementation of the RC2
2529 algorithm in the cryptographic library on Windows XP SP1 and all
2530 earlier versions of Windows. It is recommended that zero length files
2531 not be encrypted, however programs should be prepared to extract them
2532 if they are found within a ZIP file.
2534 A pseudo-code representation of the encryption process is as follows:
Password = GetUserPassword()
MasterSessionKey = DeriveKey(SHA1(Password))
RD = CryptographicStrengthRandomData()
For Each File
    IV = CryptographicStrengthRandomData()
    VData = CryptographicStrengthRandomData()
    VCRC32 = CRC32(VData)
    FileSessionKey = DeriveKey(SHA1(IV + RD))
    ErdData = Encrypt(RD,MasterSessionKey,IV)
    Encrypt(VData + VCRC32 + FileData, FileSessionKey,IV)
Done
2548 The function names and parameter requirements will depend on
2549 the choice of the cryptographic toolkit selected. Almost any
2550 toolkit supporting the reference implementations for each
2551 algorithm can be used. The RSA BSAFE(r), OpenSSL, and Microsoft
2552 CryptoAPI libraries are all known to work well.
2555 Single Password - Central Directory Encryption:
2556 -----------------------------------------------
2558 Central Directory Encryption is achieved within the .ZIP format by
2559 encrypting the Central Directory structure. This encapsulates the metadata
2560 most often used for processing .ZIP files. Additional metadata is stored for
2561 redundancy in the Local Header for each file. The process of concealing
2562 metadata by encrypting the Central Directory does not protect the data within
2563 the Local Header. To avoid information leakage from the exposed metadata
2564 in the Local Header, the fields containing information about a file are masked.
2568 Masking replaces the true content of the fields for a file in the Local
2569 Header with false information. When masked, the Local Header is not
2570 suitable for streaming access and the options for data recovery of damaged
2571 archives are reduced. Extra Data fields that may contain confidential
2572 data should not be stored within the Local Header. The value set into
2573 the Version needed to extract field should be the correct value needed to
2574 extract the file without regard to Central Directory Encryption. The fields
2575 within the Local Header targeted for masking when the Central Directory is encrypted are:
2578 Field Name                   Mask Value
2579 ------------------           ---------------------------
2580 compression method           0
2581 last mod file time           0
2582 last mod file date           0
2586 file name (variable size)    Base 16 value from the
2587                              range 1 - 0xFFFFFFFFFFFFFFFF
2588                              represented as a string whose
2589                              size will be set into the
2590                              file name length field
2592 The Base 16 value assigned as a masked file name is simply a sequentially
2593 incremented value for each file starting with 1 for the first file.
2594 Modifications to a ZIP file may cause different values to be stored for
2595 each file. For compatibility, the file name field in the Local Header
2596 should never be left blank. As of Version 6.2 of this specification,
2597 the Compression Method and Compressed Size fields are not yet masked.
2598 Fields having a value of 0xFFFF or 0xFFFFFFFF for the ZIP64 format
2599 should not be masked.
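The masking rules above can be illustrated with a short Python sketch. It rests on several assumptions: the Local Header is modeled as a plain dict with hypothetical key names, the masked file name is rendered as uppercase hexadecimal digits with no prefix, and only the time and date fields are zeroed, since the text notes that the Compression Method is not yet masked as of Version 6.2.

```python
ZIP64_SENTINELS = (0xFFFF, 0xFFFFFFFF)


def masked_file_name(index):
    """Masked name for the index-th file, counting from 1."""
    if not 1 <= index <= 0xFFFFFFFFFFFFFFFF:
        raise ValueError("index outside the range 1 - 0xFFFFFFFFFFFFFFFF")
    # Assumption: uppercase hex digits, no "0x" prefix, no padding.
    return format(index, "X").encode("ascii")


def mask_field(value):
    """Return 0 unless the field holds a ZIP64 sentinel value."""
    return value if value in ZIP64_SENTINELS else 0


def mask_local_header(header, index):
    """Return a masked copy of a Local Header modeled as a dict.

    The key names are hypothetical and chosen for this sketch only.
    """
    masked = dict(header)
    for key in ("last_mod_file_time", "last_mod_file_date"):
        masked[key] = mask_field(header[key])
    name = masked_file_name(index)
    masked["file_name"] = name
    masked["file_name_length"] = len(name)  # never left blank
    return masked
```

Because the index starts at 1, the masked file name is never empty, which matches the compatibility requirement stated above.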
2601 Encrypting the Central Directory:
2603 Encryption of the Central Directory does not include encryption of the
2604 Central Directory Signature data, the Zip64 End of Central Directory
2605 record, the Zip64 End of Central Directory Locator, or the End
2606 of Central Directory record. The ZIP file comment data is never encrypted.
2609 Before encrypting the Central Directory, it may optionally be compressed.
2610 Compression is not required, but for storage efficiency it is assumed
2611 this structure will be compressed before encrypting. Similarly, this
2612 specification supports compressing the Central Directory without
2613 requiring that it also be encrypted. Early implementations of this
2614 feature will assume the encryption method applied to files matches the
2615 encryption applied to the Central Directory.
2617 Encryption of the Central Directory is done in a manner similar to
2618 that of file encryption. The encrypted data is preceded by a
2619 decryption header. The decryption header is known as the Archive
2620 Decryption Header. The fields of this record are identical to
2621 the decryption header preceding each encrypted file. The location
2622 of the Archive Decryption Header is determined by the value in the
2623 Start of the Central Directory field in the Zip64 End of Central
2624 Directory record. When the Central Directory is encrypted, the
2625 Zip64 End of Central Directory record will always be present.
2627 The layout of the Zip64 End of Central Directory record for all
2628 versions starting with 6.2 of this specification will follow the
2629 Version 2 format. The Version 2 format is as follows:
2631 The leading fixed size fields within the Version 1 format for this
2632 record remain unchanged. The record signature for both Version 1
2633 and Version 2 will be 0x06064b50. Immediately following the last
2634 byte of the field known as the Offset of Start of Central
2635 Directory With Respect to the Starting Disk Number will begin the
2636 new fields defining Version 2 of this record.
2638 New fields for Version 2:
2640 Note: all fields stored in Intel low-byte/high-byte order.
2642 Value               Size        Description
2643 -----               ----        -----------
2644 Compression Method  2 bytes     Method used to compress the Central Directory
2646 Compressed Size     8 bytes     Size of the compressed data
2647 Original Size       8 bytes     Original uncompressed size
2648 AlgId               2 bytes     Encryption algorithm ID
2649 BitLen              2 bytes     Encryption key length
2650 Flags               2 bytes     Encryption flags
2651 HashID              2 bytes     Hash algorithm identifier
2652 Hash Length         2 bytes     Length of hash data
2653 Hash Data           (variable)  Hash data
2655 The Compression Method accepts the same range of values as the
2656 corresponding field in the Central Header.
2658 The Compressed Size and Original Size values will not include the
2659 data of the Central Directory Signature which is compressed or encrypted.
2662 The AlgId, BitLen, and Flags fields accept the same range of values as
2663 the corresponding fields within the 0x0017 record.
2665 Hash ID identifies the algorithm used to hash the Central Directory
2666 data. This data does not have to be hashed, in which case the
2667 values for both the HashID and Hash Length will be 0. Possible
2668 values for HashID are:
2681 When the Central Directory data is signed, the same hash algorithm
2682 used to hash the Central Directory for signing should be used.
2683 This is recommended for processing efficiency; however, it is
2684 permissible for any of the above algorithms to be used independent
2685 of the signing process.
2687 The Hash Data will contain the hash data for the Central Directory.
2688 The length of this data will vary depending on the algorithm used.
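As an illustration, the Version 2 fields listed above can be read with Python's struct module. This is a sketch under assumptions: the caller supplies the offset of the first new field (the byte immediately following the Offset of Start of Central Directory field), and the function and tuple names are hypothetical.

```python
import struct
from collections import namedtuple

# Fixed-size Version 2 fields, low-byte/high-byte order, in table order:
# Compression Method, Compressed Size, Original Size, AlgId, BitLen,
# Flags, HashID, Hash Length.
V2_FIXED = struct.Struct("<HQQHHHHH")

Zip64EocdV2 = namedtuple(
    "Zip64EocdV2",
    "compression_method compressed_size original_size "
    "alg_id bit_len flags hash_id hash_length hash_data",
)


def parse_zip64_eocd_v2(buf, offset=0):
    """Parse the Version 2 fields that start at offset within buf."""
    fixed = V2_FIXED.unpack_from(buf, offset)
    hash_length = fixed[-1]
    start = offset + V2_FIXED.size
    hash_data = bytes(buf[start:start + hash_length])
    if len(hash_data) != hash_length:
        raise ValueError("truncated Hash Data")
    return Zip64EocdV2(*fixed, hash_data)
```

When the Central Directory is not hashed, both hash_id and hash_length come back as 0 and hash_data is empty.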
2690 The Version Needed to Extract should be set to 62.
2692 The value for the Total Number of Entries on the Current Disk will
2693 be 0. These records will no longer support random access when
2694 encrypting the Central Directory.
2696 When the Central Directory is compressed and/or encrypted, the
2697 End of Central Directory record will store the value 0xFFFFFFFF
2698 as the value for the Total Number of Entries in the Central
2699 Directory. The value stored in the Total Number of Entries in
2700 the Central Directory on this Disk field will be 0. The actual
2701 values will be stored in the equivalent fields of the Zip64
2702 End of Central Directory record.
2704 Decrypting and decompressing the Central Directory is accomplished
2705 in the same manner as decrypting and decompressing a file.
2707 Certificate Processing Method:
2708 -----------------------------
2710 The Certificate Processing Method for ZIP file encryption
2711 defines the following additional data fields:
2713 1. Certificate Flag Values
2715 Additional processing flags that can be present in the Flags field of both
2716 the 0x0017 field of the central directory Extra Field and the Decryption
2717 header record preceding compressed file data are:
2719 0x0007 - reserved for future use
2720 0x000F - reserved for future use
2721 0x0100 - Indicates non-OAEP key wrapping was used. If this
2722          flag is set, the version needed to extract must
2723          be at least 61. This means OAEP key wrapping is not
2724          used when generating a Master Session Key using ErdData.
2726 0x4000 - ErdData must be decrypted using 3DES-168, otherwise use the
2727          same algorithm used for encrypting the file contents.
2728 0x8000 - reserved for future use
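A reader might interpret the two defined certificate flags as in the following Python sketch. It assumes the values are tested as bit masks against the Flags field; the function name and the returned dictionary keys are hypothetical.

```python
FLAG_NON_OAEP = 0x0100      # non-OAEP key wrapping was used
FLAG_ERD_3DES_168 = 0x4000  # ErdData must be decrypted using 3DES-168


def describe_certificate_flags(flags, version_needed):
    """Summarize the certificate flags documented above.

    Assumption: the flag values are bits that may be combined with
    other bits in the same Flags field.
    """
    non_oaep = bool(flags & FLAG_NON_OAEP)
    if non_oaep and version_needed < 61:
        raise ValueError("non-OAEP flag requires version needed >= 61")
    return {
        "non_oaep_key_wrapping": non_oaep,
        "erd_uses_3des_168": bool(flags & FLAG_ERD_3DES_168),
    }
```

When erd_uses_3des_168 is False, ErdData is decrypted with the same algorithm used for the file contents.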
2731 2. CertData - Extra Field 0x0017 record certificate data structure
2733 The data structure used to store certificate data within the section
2734 of the Extra Field defined by the CertData field of the 0x0017
2735 record is as shown:
2737 Value     Size     Description
2738 -----     ----     -----------
2739 RCount    4 bytes  Number of recipients.
2740 HashAlg   2 bytes  Hash algorithm identifier
2741 HSize     2 bytes  Hash size
2742 SRList    (var)    Simple list of recipients' hashed public keys

2745 RCount    This defines the number of intended recipients whose
2746           public keys were used for encryption. This identifies
2747           the number of elements in the SRList.
2749 HashAlg   This defines the hash algorithm used to calculate
2750           the public key hash of each public key used
2751           for encryption. This field currently supports
2752           only the following value for SHA-1
2756 HSize     This defines the size of a hashed public key.
2758 SRList    This is a variable length list of the hashed
2759           public keys for each intended recipient. Each
2760           element in this list is HSize bytes. The total size of
2761           SRList is determined using RCount * HSize.
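The CertData layout can be walked as shown in this Python sketch. It assumes the fields are stored in low-byte/high-byte order like the other records in this specification; the function name is hypothetical.

```python
import struct

CERTDATA_HEADER = struct.Struct("<IHH")  # RCount, HashAlg, HSize


def parse_cert_data(buf):
    """Return (hash_alg, list of hashed public keys) from CertData."""
    rcount, hash_alg, hsize = CERTDATA_HEADER.unpack_from(buf, 0)
    start = CERTDATA_HEADER.size
    end = start + rcount * hsize  # total size of SRList
    if end > len(buf):
        raise ValueError("SRList is truncated")
    srlist = [
        bytes(buf[start + n * hsize:start + (n + 1) * hsize])
        for n in range(rcount)
    ]
    return hash_alg, srlist
```

A reader holding a private key can hash the matching public key with the indicated algorithm and look for the result in the returned list.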
2764 3. Reserved1 - Certificate Decryption Header Reserved1 Data:
2766 Value     Size     Description
2767 -----     ----     -----------
2768 RCount    4 bytes  Number of recipients.

2770 RCount    This defines the number of intended recipients whose
2771           public keys were used for encryption. This defines
2772           the number of elements in the REList field defined below.
2775 4. Reserved2 - Certificate Decryption Header Reserved2 Data Structures:
2778 Value     Size     Description
2779 -----     ----     -----------
2780 HashAlg   2 bytes  Hash algorithm identifier
2781 HSize     2 bytes  Hash size
2782 REList    (var)    List of recipient data elements

2785 HashAlg   This defines the hash algorithm used to calculate
2786           the public key hash of each public key used
2787           for encryption. This field currently supports
2788           only the following value for SHA-1
2792 HSize     This defines the size of a hashed public key.
2795 REList    This is a variable length list of recipient data.
2796           Each element in this list consists of a Recipient
2797           Element data structure as follows:
2800 Recipient Element (REList) Data Structure:
2802 Value     Size     Description
2803 -----     ----     -----------
2804 RESize    2 bytes  Size of REHData + REKData
2805 REHData   HSize    Hash of recipient's public key
2806 REKData   (var)    Simple key blob

2809 RESize    This defines the size of an individual REList
2810           element. This value is the combined size of the
2811           REHData field + REKData field. REHData is defined by
2812           HSize. REKData is variable and can be calculated
2813           for each REList element using RESize and HSize.
2815 REHData   Hashed public key for this recipient.
2817 REKData   Simple Key Blob. The format of this data structure
2818           is identical to that defined in the Microsoft
2819           CryptoAPI and generated using the CryptExportKey()
2820           function. The version of the Simple Key Blob
2821           supported at this time is 0x02 as defined by Microsoft.
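The Reserved2 structure and its Recipient Elements can be traversed as in the following Python sketch. It assumes low-byte/high-byte field order and that the caller has already read RCount from the Reserved1 data; the function name is hypothetical, and the Simple Key Blob is returned as opaque bytes.

```python
import struct


def parse_reserved2(buf, rcount):
    """Parse Reserved2 given RCount taken from Reserved1.

    Returns (hash_alg, [(rehdata, rekdata), ...]).
    """
    hash_alg, hsize = struct.unpack_from("<HH", buf, 0)
    pos = 4
    recipients = []
    for _ in range(rcount):
        (resize,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        if resize < hsize or pos + resize > len(buf):
            raise ValueError("malformed Recipient Element")
        rehdata = bytes(buf[pos:pos + hsize])
        # REKData occupies the remaining RESize - HSize bytes.
        rekdata = bytes(buf[pos + hsize:pos + resize])
        recipients.append((rehdata, rekdata))
        pos += resize
    return hash_alg, recipients
```

Each returned pair holds the hashed public key used to select a recipient and the key blob that the recipient's private key would unwrap.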
2824 Certificate Processing - Central Directory Encryption:
2825 ------------------------------------------------------
2827 Central Directory Encryption using Digital Certificates will
2828 operate in a manner similar to that of Single Password Central
2829 Directory Encryption. The Archive Extra Data record will only be present when there
2830 is data to place into it. Currently, data is placed into this
2831 record when digital certificates are used for either encrypting
2832 or signing the files within a ZIP file. When only password
2833 encryption is used with no certificate encryption or digital
2834 signing, this record is not currently needed. When present, this
2835 record will appear before the start of the actual Central Directory
2836 data structure and will be located immediately after the Archive
2837 Decryption Header if the Central Directory is encrypted.
2839 The Archive Extra Data record will be used to store the following
2840 information. Additional data may be added in future versions.
2844 0x0014 - PKCS#7 Store for X.509 Certificates
2845 0x0016 - X.509 Certificate ID and Signature for central directory
2846 0x0019 - PKCS#7 Encryption Recipient Certificate List
2848 The 0x0014 and 0x0016 Extra Data records would otherwise be
2849 located in the first record of the Central Directory for digital
2850 certificate processing. When encrypting or compressing the Central
2851 Directory, the 0x0014 and 0x0016 records must be located in the
2852 Archive Extra Data record and they should not remain in the first
2853 Central Directory record. The Archive Extra Data record will also
2854 be used to store the 0x0019 data.
2856 When present, the size of the Archive Extra Data record will be
2857 included in the size of the Central Directory. The data of the
2858 Archive Extra Data record will also be compressed and encrypted
2859 along with the Central Directory data structure.
2861 Certificate Processing Differences:
2863 The Certificate Processing Method of encryption differs from the
2864 Single Password Symmetric Encryption Method as follows. Instead
2865 of using a user-defined password to generate a master session key,
2866 cryptographically random data is used. The key material is then
2867 wrapped using standard key-wrapping techniques. This key material
2868 is wrapped using the public key of each recipient that will need
2869 to decrypt the file using their corresponding private key.
2871 This specification currently assumes digital certificates will follow
2872 the X.509 V3 format for 1024 bit and higher RSA format digital
2873 certificates. Implementation of this Certificate Processing Method
2874 requires supporting logic for key access and management. This logic
2875 is outside the scope of this specification.
2877 OAEP Processing with Certificate-based Encryption:
2879 OAEP stands for Optimal Asymmetric Encryption Padding. It is a
2880 strengthening technique used for small encoded items such as decryption
2881 keys. This is commonly applied in cryptographic key-wrapping techniques
2882 and is supported by PKCS #1. Versions 5.0 and 6.0 of this specification
2883 were designed to support OAEP key-wrapping for certificate-based
2884 decryption keys for additional security.
2886 Support for private keys stored on Smartcards or Tokens introduced
2887 a conflict with this OAEP logic. Most card and token products do
2888 not support the additional strengthening applied to OAEP key-wrapped
2889 data. In order to resolve this conflict, versions 6.1 and above of this
2890 specification will no longer support OAEP when encrypting using
2891 digital certificates.
2893 Versions of PKZIP available during initial development of the
2894 certificate processing method set a value of 61 into the
2895 version needed to extract field for a file. This indicates that
2896 non-OAEP key wrapping is used. This affects certificate encryption
2897 only, and password encryption functions should not be affected by
2898 this value. This means values of 61 may be found on files encrypted
2899 with certificates only, or on files encrypted with both password
2900 encryption and certificate encryption. Files encrypted with both
2901 methods can safely be decrypted using the password methods documented.
2906 In order for the .ZIP file format to remain a viable definition, this
2907 specification should be considered as open for periodic review and
2908 revision. Although this format was originally designed with a
2909 certain level of extensibility, not all changes in technology
2910 (present or future) were or will be necessarily considered in its
2911 design. If your application requires new definitions to the
2912 extensible sections in this format, or if you would like to
2913 submit new data structures, please forward your request to
2914 zipformat@pkware.com. All submissions will be reviewed by the
2915 ZIP File Specification Committee for possible inclusion into
2916 future versions of this specification. Periodic revisions
2917 to this specification will be published to ensure interoperability.
2918 We encourage comments and feedback that may help improve clarity or content.
2921 X. Incorporating PKWARE Proprietary Technology into Your Product
2922 ----------------------------------------------------------------
2924 PKWARE is committed to the interoperability and advancement of the
2925 .ZIP format. PKWARE offers a free license for certain technological
2926 aspects described above under certain restrictions and conditions.
2927 However, the use or implementation in a product of certain technological
2928 aspects set forth in the current APPNOTE, including those with regard to
2929 strong encryption, patching, or extended tape operations requires a
2930 license from PKWARE. Please contact PKWARE with regard to acquiring a license.
2933 XI. Acknowledgements
2934 ---------------------
2936 In addition to the above mentioned contributors to PKZIP and PKUNZIP,
2937 I would like to extend special thanks to Robert Mahoney for suggesting
2938 the extension .ZIP for this software.
2974 APPENDIX A - AS/400 Extra Field (0x0065) Attribute Definitions
2975 --------------------------------------------------------------
2977 Field Definition Structure:
2979 a. field length including length     2 bytes
2980 b. field code                        2 bytes
2983 Field Code  Description
2984 4001        Source type i.e. CLP etc
2985 4002        The text description of the library
2986 4003        The text description of the file
2987 4004        The text description of the member
2988 4005        x'F0' or 0 is PF-DTA, x'F1' or 1 is PF_SRC
2989 4007        Database Type Code                  1 byte
2990 4008        Database file and fields definition
2991 4009        GZIP file type                      2 bytes
2992 400B        IFS code page                       2 bytes
2993 400C        IFS Creation Time                   4 bytes
2994 400D        IFS Access Time                     4 bytes
2995 400E        IFS Modification time               4 bytes
2996 005C        Length of the records in the file   2 bytes
2997 0068        GZIP two words                      8 bytes
2999 APPENDIX B - z/OS Extra Field (0x0065) Attribute Definitions
3000 ------------------------------------------------------------
3002 Field Definition Structure:
3004 a. field length including length     2 bytes
3005 b. field code                        2 bytes
3008 Field Code  Description
3009 0001  File Type                       2 bytes
3010 0002  NonVSAM Record Format           1 byte
3012 0004  NonVSAM Block Size              2 bytes Big Endian
3013 0005  Primary Space Allocation        3 bytes Big Endian
3014 0006  Secondary Space Allocation      3 bytes Big Endian
3015 0007  Space Allocation Type           1 byte flag
3016 0008  Modification Date               Retired with PKZIP 5.0 +
3017 0009  Expiration Date                 Retired with PKZIP 5.0 +
3018 000A  PDS Directory Block Allocation  3 bytes Big Endian binary value
3019 000B  NonVSAM Volume List             variable
3020 000C  UNIT Reference                  Retired with PKZIP 5.0 +
3021 000D  DF/SMS Management Class         8 bytes EBCDIC Text Value
3022 000E  DF/SMS Storage Class            8 bytes EBCDIC Text Value
3023 000F  DF/SMS Data Class               8 bytes EBCDIC Text Value
3024 0010  PDS/PDSE Member Info.           30 bytes
3025 0011  VSAM sub-filetype               2 bytes
3026 0012  VSAM LRECL                      13 bytes EBCDIC "(num_avg num_max)"
3027 0013  VSAM Cluster Name               Retired with PKZIP 5.0 +
3028 0014  VSAM KSDS Key Information       13 bytes EBCDIC "(num_length num_position)"
3029 0015  VSAM Average LRECL              5 bytes EBCDIC num_value padded with blanks
3030 0016  VSAM Maximum LRECL              5 bytes EBCDIC num_value padded with blanks
3031 0017  VSAM KSDS Key Length            5 bytes EBCDIC num_value padded with blanks
3032 0018  VSAM KSDS Key Position          5 bytes EBCDIC num_value padded with blanks
3033 0019  VSAM Data Name                  1-44 bytes EBCDIC text string
3034 001A  VSAM KSDS Index Name            1-44 bytes EBCDIC text string
3035 001B  VSAM Catalog Name               1-44 bytes EBCDIC text string
3036 001C  VSAM Data Space Type            9 bytes EBCDIC text string
3037 001D  VSAM Data Space Primary         9 bytes EBCDIC num_value left-justified
3038 001E  VSAM Data Space Secondary       9 bytes EBCDIC num_value left-justified
3039 001F  VSAM Data Volume List           variable EBCDIC text list of 6-character Volume IDs
3040 0020  VSAM Data Buffer Space          8 bytes EBCDIC num_value left-justified
3041 0021  VSAM Data CISIZE                5 bytes EBCDIC num_value left-justified
3042 0022  VSAM Erase Flag                 1 byte flag
3043 0023  VSAM Free CI %                  3 bytes EBCDIC num_value left-justified
3044 0024  VSAM Free CA %                  3 bytes EBCDIC num_value left-justified
3045 0025  VSAM Index Volume List          variable EBCDIC text list of 6-character Volume IDs
3046 0026  VSAM Ordered Flag               1 byte flag
3047 0027  VSAM REUSE Flag                 1 byte flag
3048 0028  VSAM SPANNED Flag               1 byte flag
3049 0029  VSAM Recovery Flag              1 byte flag
3050 002A  VSAM WRITECHK Flag              1 byte flag
3051 002B  VSAM Cluster/Data SHROPTS       3 bytes EBCDIC "n,y"
3052 002C  VSAM Index SHROPTS              3 bytes EBCDIC "n,y"
3053 002D  VSAM Index Space Type           9 bytes EBCDIC text string
3054 002E  VSAM Index Space Primary        9 bytes EBCDIC num_value left-justified
3055 002F  VSAM Index Space Secondary      9 bytes EBCDIC num_value left-justified
3056 0030  VSAM Index CISIZE               5 bytes EBCDIC num_value left-justified
3057 0031  VSAM Index IMBED                1 byte flag
3058 0032  VSAM Index Ordered Flag         1 byte flag
3059 0033  VSAM REPLICATE Flag             1 byte flag
3060 0034  VSAM Index REUSE Flag           1 byte flag
3061 0035  VSAM Index WRITECHK Flag        1 byte flag Retired with PKZIP 5.0 +
3062 0036  VSAM Owner                      8 bytes EBCDIC text string
3063 0037  VSAM Index Owner                8 bytes EBCDIC text string
3096 0058  PDS/PDSE Member TTR Info.       6 bytes Big Endian
3097 0059  PDS 1st LMOD Text TTR           3 bytes Big Endian
3098 005A  PDS LMOD EP Rec #               4 bytes Big Endian
3100 005C  Max Length of records           2 bytes Big Endian
3101 005D  PDSE Flag                       1 byte flag
3109 0065  Last Date Referenced            4 bytes Packed Hex "yyyymmdd"
3110 0066  Date Created                    4 bytes Packed Hex "yyyymmdd"
3111 0068  GZIP two words                  8 bytes
3112 0071  Extended NOTE Location          12 bytes Big Endian
3113 0072  Archive device UNIT             6 bytes EBCDIC
3114 0073  Archive 1st Volume              6 bytes EBCDIC
3115 0074  Archive 1st VOL File Seq#       2 bytes Binary
3117 APPENDIX C - Zip64 Extensible Data Sector Mappings (EFS)
3118 --------------------------------------------------------
3122 The following is the general layout of the attributes for the
3123 ZIP 64 "extra" block for extended tape operations. Portions of
3124 this extended tape processing technology are covered under a
3125 pending patent application. The use or implementation in a
3126 product of certain technological aspects set forth in the
3127 current APPNOTE, including those with regard to strong encryption,
3128 patching or extended tape operations, requires a license from
3129 PKWARE. Please contact PKWARE with regard to acquiring a license.
3132 Note: some fields stored in Big Endian format. All text is
3133 in EBCDIC format unless otherwise specified.
3135 Value           Size     Description
3136 -----           ----     -----------
3137 (Z390) 0x0065   2 bytes  Tag for this "extra" block type
3138 Size            4 bytes  Size for the following data block
3139 Tag             4 bytes  EBCDIC "Z390"
3140 Length71        2 bytes  Big Endian
3141 Subcode71       2 bytes  Enote type code
3143 Length72        2 bytes  Big Endian
3144 Subcode72       2 bytes  Unit type code
3146 Length73        2 bytes  Big Endian
3147 Subcode73       2 bytes  Volume1 type code
3148 FirstVol        1 byte   Volume
3149 Length74        2 bytes  Big Endian
3150 Subcode74       2 bytes  FirstVol file sequence
3151 FileSeq         2 bytes  Sequence
3153 APPENDIX D - Language Encoding (EFS)
3154 ------------------------------------
3156 The ZIP format has historically supported only the original IBM PC character
3157 encoding set, commonly referred to as IBM Code Page 437. This limits storing
3158 file name characters to only those within the original MS-DOS range of values
3159 and does not properly support file names in other character encodings, or
3160 languages. To address this limitation, this specification will support the following change.
3163 If general purpose bit 11 is unset, the file name and comment should conform
3164 to the original ZIP character encoding. If general purpose bit 11 is set, the
3165 filename and comment must support The Unicode Standard, Version 4.1.0 or
3166 greater using the character encoding form defined by the UTF-8 storage
3167 specification. The Unicode Standard is published by The Unicode
3168 Consortium (www.unicode.org). UTF-8 encoded data stored within ZIP files
3169 is expected to not include a byte order mark (BOM).
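The bit 11 rule above can be illustrated with a short Python sketch. It assumes Python's built-in cp437 codec is an acceptable stand-in for the original ZIP character encoding. The function name is hypothetical, and dropping a stray byte order mark is a leniency chosen for this sketch rather than a requirement of the specification.

```python
UTF8_FLAG = 1 << 11  # general purpose bit 11


def decode_zip_text(raw, general_purpose_flags):
    """Decode a file name or comment according to general purpose bit 11."""
    if general_purpose_flags & UTF8_FLAG:
        text = raw.decode("utf-8")
        # A BOM is not expected; tolerate one if a writer included it.
        return text[1:] if text.startswith("\ufeff") else text
    return raw.decode("cp437")
```

The same bytes can decode to different names depending on the flag, so a reader should consult bit 11 for every entry rather than once per archive.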
3171 Applications may choose to supplement this file name storage through the use
3172 of the 0x0008 Extra Field. Storage for this optional field is currently
3173 undefined; however, it will be used to allow storing extended information
3174 on source or target encoding that may further assist applications with file
3175 name, or file content encoding tasks. Please contact PKWARE with any
3176 requirements on how this field should be used.
3178 The 0x0008 Extra Field storage may be used with either setting for general
3179 purpose bit 11. Examples of the intended usage for this field are to store
3180 whether "modified-UTF-8" (JAVA) is used, or UTF-8-MAC. Similarly, other
3181 commonly used character encoding (code page) designations can be indicated
3182 through this field. Formalized values for use of the 0x0008 record remain
3183 undefined at this time. The definition for the layout of the 0x0008 field
3184 will be published when available. Use of the 0x0008 Extra Field provides
3185 for storing data within a ZIP file in an encoding other than IBM Code Page 437 or UTF-8.
3188 General purpose bit 11 will not imply any encoding of file content or
3189 password. Values defining character encoding for file content or
3190 password must be stored within the 0x0008 Extended Language Encoding Extra Field.
3193 Ed Gordon of the Info-ZIP group has defined a pair of "extra field" records
3194 that can be used to store UTF-8 file name and file comment fields. These
3195 records can be used for cases when the general purpose bit 11 method
3196 for storing UTF-8 data in the standard file name and comment fields is
3197 not desirable. A common case for this alternate method is if backward
3198 compatibility with older programs is required.
3200 Definitions for the record structure of these fields are included above
3201 in the section on 3rd party mappings for "extra field" records. These
3202 records are identified by Header IDs 0x6375 (Info-ZIP Unicode Comment
3203 Extra Field) and 0x7075 (Info-ZIP Unicode Path Extra Field).
3205 The choice of which storage method to use when writing a ZIP file is left
3206 to the implementation. Developers should expect that a ZIP file may
3207 contain either method and should provide support for reading data in
3208 either format. Use of general purpose bit 11 reduces storage requirements
3209 for file name data by not requiring additional "extra field" data for
3210 each file, but can result in older ZIP programs not being able to extract
3211 files. Use of the 0x6375 and 0x7075 records will result in a ZIP file
3212 that should always be readable by older ZIP programs, but requires more
3213 storage per file to write file name and/or file comment fields.