File:    APPNOTE.TXT - .ZIP File Format Specification
Revised: September 28, 2007
Copyright (c) 1989 - 2007 PKWARE Inc., All Rights Reserved.

The use of certain technological aspects disclosed in the current
APPNOTE is available pursuant to the below section entitled
"Incorporating PKWARE Proprietary Technology into Your Product".

This specification is intended to define a cross-platform,
interoperable file storage and transfer format. Since its
first publication in 1989, PKWARE has remained committed to
ensuring the interoperability of the .ZIP file format through
publication and maintenance of this specification. We trust that
all .ZIP compatible vendors and application developers that have
adopted and benefited from this format will share and support
this commitment to interoperability.

II. Contacting PKWARE
---------------------

648 N. Plankinton Avenue, Suite 220
zipformat@pkware.com

Although PKWARE will attempt to supply current and accurate
information relating to its file formats, algorithms, and the
subject programs, the possibility of error or omission cannot
be eliminated. PKWARE therefore expressly disclaims any warranty
that the information contained in the associated materials relating
to the subject programs and/or the format of the files created or
accessed by the subject programs and/or the algorithms used by
the subject programs, or any other matter, is current, correct or
accurate as delivered. Any risk of damage due to any possible
inaccurate information is assumed by the user of the information.
Furthermore, the information relating to the subject programs
and/or the file formats created or accessed by the subject
programs and/or the algorithms used by the subject programs is
subject to change without notice.

If the version of this file is marked as a NOTIFICATION OF CHANGE,
the content defines an Early Feature Specification (EFS) change
to the .ZIP file format that may be subject to modification prior
to publication of the Final Feature Specification (FFS). This
document may also contain information on Planned Feature
Specifications (PFS) defining recognized future extensions.

Version       Change Description                       Date
-------       ------------------                       ----------
5.2           -Single Password Symmetric Encryption    06/02/2003

6.1.0         -Smartcard compatibility                 01/20/2004
              -Documentation on certificate storage

6.2.0         -Introduction of Central Directory       04/26/2004
               Encryption for encrypting metadata
              -Added OS/X to Version Made By values

6.2.1         -Added Extra Field placeholder for       04/01/2005
               POSZIP using ID 0x4690

              -Clarified size field on
               "zip64 end of central directory record"

6.2.2         -Documented Final Feature Specification  01/06/2006
               for Strong Encryption

              -Clarifications and typographical

6.3.0         -Added tape positioning storage          09/29/2006

              -Expanded list of supported hash algorithms

              -Expanded list of supported compression

              -Expanded list of supported encryption

              -Added option for Unicode filename

              -Clarifications for consistent use
               of Data Descriptor records

              -Added additional "Extra Field"

6.3.1         -Corrected standard hash values for      04/11/2007

6.3.2         -Added compression method 97             09/28/2007

              -Documented InfoZIP "Extra Field"
               values for UTF-8 file name and
               file comment storage

V. General Format of a .ZIP file
--------------------------------

Files are stored in arbitrary order. Large .ZIP files can span multiple
volumes or be split into user-defined segment sizes. All values
are stored in little-endian byte order unless otherwise specified.

Overall .ZIP file format:

    [local file header 1]
    [file data 1]
    [data descriptor 1]
    .
    .
    .
    [local file header n]
    [file data n]
    [data descriptor n]
    [archive decryption header]
    [archive extra data record]
    [central directory]
    [zip64 end of central directory record]
    [zip64 end of central directory locator]
    [end of central directory record]

A. Local file header:

    local file header signature     4 bytes  (0x04034b50)
    version needed to extract       2 bytes
    general purpose bit flag        2 bytes
    compression method              2 bytes
    last mod file time              2 bytes
    last mod file date              2 bytes
    crc-32                          4 bytes
    compressed size                 4 bytes
    uncompressed size               4 bytes
    file name length                2 bytes
    extra field length              2 bytes

    file name (variable size)
    extra field (variable size)

B. File data:

Immediately following the local header for a file
is the compressed or stored data for the file.
The series of [local file header][file data][data
descriptor] repeats for each file in the .ZIP archive.
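
The local file header layout above can be read with a fixed-size
little-endian unpack followed by the two variable-size fields. The
following is a minimal sketch, not a reference implementation. The
function name and the returned dictionary keys are hypothetical, and
the sketch assumes the 4 byte crc-32 field sits between the file date
and the compressed size.

```python
import struct

LOCAL_HEADER_SIGNATURE = 0x04034B50

# signature, version needed, bit flag, method, time, date,
# crc-32, compressed size, uncompressed size, name length, extra length
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")


def parse_local_header(buf, offset=0):
    """Parse one local file header starting at offset (hypothetical helper)."""
    (signature, version_needed, flags, method, mod_time, mod_date,
     crc32, compressed_size, uncompressed_size,
     name_length, extra_length) = LOCAL_HEADER.unpack_from(buf, offset)
    if signature != LOCAL_HEADER_SIGNATURE:
        raise ValueError("local file header signature not found")
    name_start = offset + LOCAL_HEADER.size
    extra_start = name_start + name_length
    data_start = extra_start + extra_length
    return {
        "version_needed": version_needed,
        "flags": flags,
        "method": method,
        "mod_time": mod_time,
        "mod_date": mod_date,
        "crc32": crc32,
        "compressed_size": compressed_size,
        "uncompressed_size": uncompressed_size,
        "file_name": bytes(buf[name_start:extra_start]),
        "extra_field": bytes(buf[extra_start:data_start]),
        # The compressed or stored file data begins here.
        "data_start": data_start,
    }
```

When bit 3 of the general purpose bit flag is set, the crc-32 and size
values returned here are zero and the real values have to be taken from
the data descriptor or the central directory.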

C. Data descriptor:

    crc-32                          4 bytes
    compressed size                 4 bytes
    uncompressed size               4 bytes

This descriptor exists only if bit 3 of the general
purpose bit flag is set (see below). It is byte aligned
and immediately follows the last byte of compressed data.
This descriptor is used only when it was not possible to
seek in the output .ZIP file, e.g., when the output .ZIP file
was standard output or a non-seekable device. For ZIP64(tm) format
archives, the compressed and uncompressed sizes are 8 bytes each.

When compressing files, compressed and uncompressed sizes
should be stored in ZIP64 format (as 8 byte values) when a
file's size exceeds 0xFFFFFFFF. However, ZIP64 format may be
used regardless of the size of a file. When extracting, if
the zip64 extended information extra field is present for
the file, the compressed and uncompressed sizes will be 8
byte values.

Although not originally assigned a signature, the value
0x08074b50 has commonly been adopted as a signature value
for the data descriptor record. Implementers should be
aware that ZIP files may be encountered with or without this
signature marking data descriptors and should account for
either case when reading ZIP files to ensure compatibility.
When writing ZIP files, it is recommended to include the
signature value marking the data descriptor record. When
the signature is used, the fields currently defined for
the data descriptor record will immediately follow the
signature.

An extensible data descriptor will be released in a future
version of this APPNOTE. This new record is intended to
resolve conflicts with the use of this record going forward,
and to provide better support for streamed file processing.

When the Central Directory Encryption method is used, the data
descriptor record is not required, but may be used. If present,
and bit 3 of the general purpose bit field is set to indicate
its presence, the values in fields of the data descriptor
record should be set to binary zeros.
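
A reader that accepts the data descriptor both with and without the
0x08074b50 signature, in either 4 byte or 8 byte (ZIP64) size form,
might look like the sketch below. The function name and return shape
are hypothetical. The sketch assumes the descriptor starts with the
crc-32 value, as the bit 3 description states, and it does not resolve
the rare case where a crc-32 value happens to equal the signature; a
careful reader would cross-check against the central directory.

```python
import struct

DATA_DESCRIPTOR_SIGNATURE = 0x08074B50


def parse_data_descriptor(buf, offset=0, zip64=False):
    """Return (crc32, compressed_size, uncompressed_size, bytes_consumed)."""
    consumed = 0
    (first_word,) = struct.unpack_from("<I", buf, offset)
    if first_word == DATA_DESCRIPTOR_SIGNATURE:
        # Optional signature: skip it and read the defined fields after it.
        offset += 4
        consumed = 4
    # Sizes are 8 bytes each for ZIP64 entries, 4 bytes each otherwise.
    layout = "<IQQ" if zip64 else "<III"
    crc32, compressed_size, uncompressed_size = struct.unpack_from(
        layout, buf, offset)
    consumed += struct.calcsize(layout)
    return crc32, compressed_size, uncompressed_size, consumed
```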

D. Archive decryption header:

The Archive Decryption Header is introduced in version 6.2
of the ZIP format specification. This record exists in support
of the Central Directory Encryption Feature implemented as part of
the Strong Encryption Specification as described in this document.
When the Central Directory Structure is encrypted, this decryption
header will precede the encrypted data segment. The encrypted
data segment will consist of the Archive extra data record (if
present) and the encrypted Central Directory Structure data.
The format of this data record is identical to the Decryption
header record preceding compressed file data. If the central
directory structure is encrypted, the location of the start of
this data record is determined using the Start of Central Directory
field in the Zip64 End of Central Directory record. Refer to the
section on the Strong Encryption Specification for information
on the fields used in the Archive Decryption Header record.

E. Archive extra data record:

    archive extra data signature    4 bytes  (0x08064b50)
    extra field length              4 bytes
    extra field data                (variable size)

The Archive Extra Data Record is introduced in version 6.2
of the ZIP format specification. This record exists in support
of the Central Directory Encryption Feature implemented as part of
the Strong Encryption Specification as described in this document.
When present, this record immediately precedes the central
directory data structure. The size of this data record will be
included in the Size of the Central Directory field in the
End of Central Directory record. If the central directory structure
is compressed, but not encrypted, the location of the start of
this data record is determined using the Start of Central Directory
field in the Zip64 End of Central Directory record.

F. Central directory structure:

    [file header 1]
    .
    .
    .
    [file header n]
    [digital signature]

    File header:

      central file header signature   4 bytes  (0x02014b50)
      version made by                 2 bytes
      version needed to extract       2 bytes
      general purpose bit flag        2 bytes
      compression method              2 bytes
      last mod file time              2 bytes
      last mod file date              2 bytes
      crc-32                          4 bytes
      compressed size                 4 bytes
      uncompressed size               4 bytes
      file name length                2 bytes
      extra field length              2 bytes
      file comment length             2 bytes
      disk number start               2 bytes
      internal file attributes        2 bytes
      external file attributes        4 bytes
      relative offset of local header 4 bytes

      file name (variable size)
      extra field (variable size)
      file comment (variable size)

    Digital signature:

      header signature                4 bytes  (0x05054b50)
      size of data                    2 bytes
      signature data (variable size)

With the introduction of the Central Directory Encryption
feature in version 6.2 of this specification, the Central
Directory Structure may be stored both compressed and encrypted.
Although not required, it is assumed that when the Central
Directory Structure is encrypted, it will also be compressed
for greater storage efficiency. Information on the
Central Directory Encryption feature can be found in the section
describing the Strong Encryption Specification. The Digital
Signature record will be neither compressed nor encrypted.

G. Zip64 end of central directory record

    zip64 end of central dir
    signature                       4 bytes  (0x06064b50)
    size of zip64 end of central
    directory record                8 bytes
    version made by                 2 bytes
    version needed to extract       2 bytes
    number of this disk             4 bytes
    number of the disk with the
    start of the central directory  4 bytes
    total number of entries in the
    central directory on this disk  8 bytes
    total number of entries in the
    central directory               8 bytes
    size of the central directory   8 bytes
    offset of start of central
    directory with respect to
    the starting disk number        8 bytes
    zip64 extensible data sector    (variable size)

The value stored into the "size of zip64 end of central
directory record" should be the size of the remaining
record and should not include the leading 12 bytes.

Size = SizeOfFixedFields + SizeOfVariableData - 12.
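
As a worked example of the size formula above, the fixed fields of the
Version 1 record total 56 bytes, so a record with no extensible data
stores 44 in its size field. The sketch below derives that value from
the field widths listed above; the constant and function names are
hypothetical.

```python
import struct

# Fixed fields of the Version 1 record, in the order listed above:
# signature (4), size of record (8), version made by (2),
# version needed (2), disk number (4), disk with central directory (4),
# entries on this disk (8), total entries (8),
# central directory size (8), central directory offset (8).
ZIP64_EOCD_FIXED = struct.Struct("<IQHHIIQQQQ")


def zip64_eocd_size_field(extensible_data=b""):
    """Value to store in 'size of zip64 end of central directory record'."""
    # The leading 12 bytes (signature plus the size field itself)
    # are not counted.
    return ZIP64_EOCD_FIXED.size + len(extensible_data) - 12
```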

The above record structure defines Version 1 of the
zip64 end of central directory record. Version 1 was
implemented in versions of this specification preceding
6.2 in support of the ZIP64 large file feature. The
introduction of the Central Directory Encryption feature
implemented in version 6.2 as part of the Strong Encryption
Specification defines Version 2 of this record structure.
Refer to the section describing the Strong Encryption
Specification for details on the version 2 format for
this record.

Special purpose data may reside in the zip64 extensible data
sector field following either a V1 or V2 version of this
record. To ensure identification of this special purpose data
it must include an identifying header block consisting of the
following:

    Header ID  -  2 bytes
    Data Size  -  4 bytes

The Header ID field indicates the type of data that is in the
data block that follows.

Data Size identifies the number of bytes that follow for this
data block.

Multiple special purpose data blocks may be present, but each
must be preceded by a Header ID and Data Size field. Current
mappings of Header ID values supported in this field are as
defined in APPENDIX C.

H. Zip64 end of central directory locator

    zip64 end of central dir locator
    signature                       4 bytes  (0x07064b50)
    number of the disk with the
    start of the zip64 end of
    central directory               4 bytes
    relative offset of the zip64
    end of central directory record 8 bytes
    total number of disks           4 bytes

I. End of central directory record:

    end of central dir signature    4 bytes  (0x06054b50)
    number of this disk             2 bytes
    number of the disk with the
    start of the central directory  2 bytes
    total number of entries in the
    central directory on this disk  2 bytes
    total number of entries in
    the central directory           2 bytes
    size of the central directory   4 bytes
    offset of start of central
    directory with respect to
    the starting disk number        4 bytes
    .ZIP file comment length        2 bytes
    .ZIP file comment               (variable size)
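
Because the end of central directory record is the last record in the
file and ends with a variable-size comment, a reader usually locates it
by searching backwards for the signature. The following is a sketch of
that approach under stated assumptions: the whole archive is held in
memory as bytes, the function name and dictionary keys are
hypothetical, and a candidate is accepted only when its comment length
reaches exactly to the end of the data.

```python
import struct

EOCD_SIGNATURE = 0x06054B50
# signature, this disk, disk with central directory, entries on this disk,
# total entries, central directory size, central directory offset,
# comment length
EOCD = struct.Struct("<IHHHHIIH")


def find_end_of_central_directory(data):
    """Locate and parse the end of central directory record."""
    marker = struct.pack("<I", EOCD_SIGNATURE)
    # The comment length is a 2 byte field, so the record cannot start
    # more than 22 + 0xFFFF bytes before the end of the file.
    lowest = max(0, len(data) - (EOCD.size + 0xFFFF))
    pos = data.rfind(marker, lowest)
    while pos != -1:
        if pos + EOCD.size <= len(data):
            fields = EOCD.unpack_from(data, pos)
            comment_length = fields[7]
            if pos + EOCD.size + comment_length == len(data):
                return {
                    "offset": pos,
                    "disk_number": fields[1],
                    "central_directory_disk": fields[2],
                    "entries_on_this_disk": fields[3],
                    "total_entries": fields[4],
                    "central_directory_size": fields[5],
                    "central_directory_offset": fields[6],
                    "comment": data[pos + EOCD.size:],
                }
        # Candidate rejected (for example, signature bytes inside the
        # comment): keep searching strictly before this position.
        pos = data.rfind(marker, lowest, pos + len(marker) - 1)
    raise ValueError("end of central directory record not found")
```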

J. Explanation of fields:

version made by (2 bytes)

The upper byte indicates the compatibility of the file
attribute information. If the external file attributes
are compatible with MS-DOS and can be read by PKZIP for
DOS version 2.04g then this value will be zero. If these
attributes are not compatible, then this value will
identify the host system on which the attributes are
compatible. Software can use this information to determine
the line record format for text files etc. The current
mappings are:

     0 - MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems)
     1 - Amiga                     2 - OpenVMS
     3 - UNIX                      4 - VM/CMS
     5 - Atari ST                  6 - OS/2 H.P.F.S.
     7 - Macintosh                 8 - Z-System
     9 - CP/M                     10 - Windows NTFS
    11 - MVS (OS/390 - Z/OS)      12 - VSE
    13 - Acorn Risc               14 - VFAT
    15 - alternate MVS            16 - BeOS
    17 - Tandem                   18 - OS/400
    19 - OS/X (Darwin)            20 thru 255 - unused

The lower byte indicates the ZIP specification version
(the version of this document) supported by the software
used to encode the file. The value/10 indicates the major
version number, and the value mod 10 is the minor version
number.

version needed to extract (2 bytes)

The minimum supported ZIP specification version needed to
extract the file, mapped as above. This value is based on
the specific format features a ZIP program must support to
be able to extract the file. If multiple features are
applied to a file, the minimum version should be set to the
feature having the highest value. New features or feature
changes affecting the published format specification will be
implemented using higher version numbers than the last
published value to avoid conflict.
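
The upper byte / lower byte split of the version made by field, and the
value/10 and value mod 10 mapping shared with version needed to
extract, can be sketched as follows. The function name is hypothetical
and the host table is only a subset of the mappings listed above.

```python
# Subset of the host system mappings listed above.
HOST_SYSTEMS = {
    0: "MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems)",
    3: "UNIX",
    10: "Windows NTFS",
    19: "OS/X (Darwin)",
}


def decode_version(value):
    """Split a 2 byte version field into (host, major, minor)."""
    host = (value >> 8) & 0xFF    # upper byte: attribute compatibility
    spec = value & 0xFF           # lower byte: specification version
    return host, spec // 10, spec % 10
```

For example, a version made by value of 0x031E decodes to host 3 (UNIX)
and specification version 3.0, and a version needed to extract value of
45 decodes to version 4.5, the minimum for ZIP64 format extensions.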

Current minimum feature versions are as defined below:

    1.0 - Default value
    1.1 - File is a volume label
    2.0 - File is a folder (directory)
    2.0 - File is compressed using Deflate compression
    2.0 - File is encrypted using traditional PKWARE encryption
    2.1 - File is compressed using Deflate64(tm)
    2.5 - File is compressed using PKWARE DCL Implode
    2.7 - File is a patch data set
    4.5 - File uses ZIP64 format extensions
    4.6 - File is compressed using BZIP2 compression*
    5.0 - File is encrypted using DES
    5.0 - File is encrypted using 3DES
    5.0 - File is encrypted using original RC2 encryption
    5.0 - File is encrypted using RC4 encryption
    5.1 - File is encrypted using AES encryption
    5.1 - File is encrypted using corrected RC2 encryption**
    5.2 - File is encrypted using corrected RC2-64 encryption**
    6.1 - File is encrypted using non-OAEP key wrapping***
    6.2 - Central directory encryption
    6.3 - File is compressed using LZMA
    6.3 - File is compressed using PPMd+
    6.3 - File is encrypted using Blowfish
    6.3 - File is encrypted using Twofish

  * Early 7.x (pre-7.2) versions of PKZIP incorrectly set the
    version needed to extract for BZIP2 compression to be 50
    when it should have been 46.

 ** Refer to the section on Strong Encryption Specification
    for additional information regarding RC2 corrections.

*** Certificate encryption using non-OAEP key wrapping is the
    intended mode of operation for all versions beginning with 6.1.
    Support for OAEP key wrapping should only be used for
    backward compatibility when sending ZIP files to be opened by
    versions of PKZIP older than 6.1 (5.0 or 6.0).

  + Files compressed using PPMd should set the version
    needed to extract field to 6.3; however, not all ZIP
    programs enforce this and may be unable to decompress
    data files compressed using PPMd if this value is set.

When using ZIP64 extensions, the corresponding value in the
zip64 end of central directory record should also be set.
This field should be set appropriately to indicate whether
Version 1 or Version 2 format is in use.

general purpose bit flag: (2 bytes)

Bit 0: If set, indicates that the file is encrypted.

(For Method 6 - Imploding)
Bit 1: If the compression method used was type 6,
       Imploding, then this bit, if set, indicates
       an 8K sliding dictionary was used. If clear,
       then a 4K sliding dictionary was used.
Bit 2: If the compression method used was type 6,
       Imploding, then this bit, if set, indicates
       3 Shannon-Fano trees were used to encode the
       sliding dictionary output. If clear, then 2
       Shannon-Fano trees were used.

(For Methods 8 and 9 - Deflating)
Bit 2  Bit 1
  0      0    Normal (-en) compression option was used.
  0      1    Maximum (-exx/-ex) compression option was used.
  1      0    Fast (-ef) compression option was used.
  1      1    Super Fast (-es) compression option was used.

(For Method 14 - LZMA)
Bit 1: If the compression method used was type 14,
       LZMA, then this bit, if set, indicates
       an end-of-stream (EOS) marker is used to
       mark the end of the compressed data stream.
       If clear, then an EOS marker is not present
       and the compressed data size must be known
       to extract.

Note:  Bits 1 and 2 are undefined if the compression
       method is any other.

Bit 3: If this bit is set, the fields crc-32, compressed
       size and uncompressed size are set to zero in the
       local header. The correct values are put in the
       data descriptor immediately following the compressed
       data. (Note: PKZIP version 2.04g for DOS only
       recognizes this bit for method 8 compression; newer
       versions of PKZIP recognize this bit for any
       compression method.)

Bit 4: Reserved for use with method 8, for enhanced
       deflating.

Bit 5: If this bit is set, this indicates that the file is
       compressed patched data. (Note: Requires PKZIP
       version 2.70 or greater)

Bit 6: Strong encryption. If this bit is set, you should
       set the version needed to extract value to at least
       50 and you must also set bit 0. If AES encryption
       is used, the version needed to extract value must
       be at least 51.

Bit 7: Currently unused.

Bit 8: Currently unused.

Bit 9: Currently unused.

Bit 10: Currently unused.

Bit 11: Language encoding flag (EFS). If this bit is set,
        the filename and comment fields for this file
        must be encoded using UTF-8. (see APPENDIX D)

Bit 12: Reserved by PKWARE for enhanced compression.

Bit 13: Used when encrypting the Central Directory to indicate
        selected data values in the Local Header are masked to
        hide their actual values. See the section describing
        the Strong Encryption Specification for details.

Bit 14: Reserved by PKWARE.

Bit 15: Reserved by PKWARE.
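
Since the meaning of bits 1 and 2 depends on the compression method,
a decoder needs both values. The sketch below turns a flag word into a
list of short descriptions. The function name and the wording of the
descriptions are hypothetical, and the Deflate option mapping assumes
that bit 2 is the more significant of the two option bits.

```python
DEFLATE_OPTIONS = ["normal", "maximum", "fast", "super fast"]


def describe_flags(flags, method):
    """Describe the general purpose bit flag for a compression method."""
    notes = []
    if flags & 0x0001:
        notes.append("encrypted")
    if method == 6:        # Imploding
        notes.append("8K sliding dictionary" if flags & 0x0002
                     else "4K sliding dictionary")
        notes.append("3 Shannon-Fano trees" if flags & 0x0004
                     else "2 Shannon-Fano trees")
    elif method in (8, 9):  # Deflating
        # Assumption: bit 2 is the high bit and bit 1 the low bit.
        option = (flags >> 1) & 0x3
        notes.append(DEFLATE_OPTIONS[option] + " compression")
    elif method == 14:     # LZMA
        notes.append("EOS marker present" if flags & 0x0002
                     else "no EOS marker")
    if flags & 0x0008:
        notes.append("sizes and crc-32 in data descriptor")
    if flags & 0x0020:
        notes.append("compressed patched data")
    if flags & 0x0040:
        notes.append("strong encryption")
    if flags & 0x0800:
        notes.append("UTF-8 file name and comment")
    if flags & 0x2000:
        notes.append("local header values masked")
    return notes
```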

compression method: (2 bytes)

(see accompanying documentation for algorithm
descriptions)

     0 - The file is stored (no compression)
     1 - The file is Shrunk
     2 - The file is Reduced with compression factor 1
     3 - The file is Reduced with compression factor 2
     4 - The file is Reduced with compression factor 3
     5 - The file is Reduced with compression factor 4
     6 - The file is Imploded
     7 - Reserved for Tokenizing compression algorithm
     8 - The file is Deflated
     9 - Enhanced Deflating using Deflate64(tm)
    10 - PKWARE Data Compression Library Imploding (old IBM TERSE)
    11 - Reserved by PKWARE
    12 - File is compressed using BZIP2 algorithm
    13 - Reserved by PKWARE
    14 - LZMA (EFS)
    15 - Reserved by PKWARE
    16 - Reserved by PKWARE
    17 - Reserved by PKWARE
    18 - File is compressed using IBM TERSE (new)
    19 - IBM LZ77 z Architecture (PFS)
    97 - WavPack compressed data
    98 - PPMd version I, Rev 1

date and time fields: (2 bytes each)

The date and time are encoded in standard MS-DOS format.
If input came from standard input, the date and time are
those at which compression was started for this data.
If encrypting the central directory and general purpose bit
flag 13 is set indicating masking, the value stored in the
Local Header will be zero.
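
The MS-DOS format itself is not spelled out in this section. The sketch
below assumes the conventional packing: the date word holds the year
since 1980 in bits 9-15, the month in bits 5-8 and the day in bits 0-4;
the time word holds the hour in bits 11-15, the minute in bits 5-10 and
the second divided by two in bits 0-4. The function names are
hypothetical.

```python
import datetime


def decode_dos_datetime(dos_date, dos_time):
    """Convert the two 2 byte MS-DOS fields to a naive datetime."""
    if dos_date == 0:
        # A zero date is not a valid calendar date; this is what a
        # masked local header stores.
        return None
    year = ((dos_date >> 9) & 0x7F) + 1980
    month = (dos_date >> 5) & 0x0F
    day = dos_date & 0x1F
    hour = (dos_time >> 11) & 0x1F
    minute = (dos_time >> 5) & 0x3F
    second = (dos_time & 0x1F) * 2   # stored with 2 second resolution
    return datetime.datetime(year, month, day, hour, minute, second)


def encode_dos_datetime(moment):
    """Pack a datetime (1980 or later) into (dos_date, dos_time)."""
    dos_date = ((moment.year - 1980) << 9) | (moment.month << 5) | moment.day
    dos_time = (moment.hour << 11) | (moment.minute << 5) | (moment.second // 2)
    return dos_date, dos_time
```

Odd seconds cannot be represented, so a round trip through these two
functions rounds the seconds down to an even value.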

crc-32: (4 bytes)

The CRC-32 algorithm was generously contributed by
David Schwaderer and can be found in his excellent
book "C Programmers Guide to NetBIOS" published by
Howard W. Sams & Co. Inc. The 'magic number' for
the CRC is 0xdebb20e3. The proper CRC pre and post
conditioning is used, meaning that the CRC register
is pre-conditioned with all ones (a starting value
of 0xffffffff) and the value is post-conditioned by
taking the one's complement of the CRC residual.
If bit 3 of the general purpose flag is set, this
field is set to zero in the local header and the correct
value is put in the data descriptor and in the central
directory. When encrypting the central directory, if the
local header is not in ZIP64 format and general purpose
bit flag 13 is set indicating masking, the value stored
in the Local Header will be zero.
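
The pre and post conditioning described above can be illustrated with a
bit-at-a-time implementation. This is a sketch, not the contributed
algorithm: it assumes the reflected polynomial 0xEDB88320, which is not
stated in this section but is the one used by zlib's crc32, and the
function names are hypothetical. The last function shows where the
'magic number' comes from: running the register over the data followed
by its own CRC in little-endian order leaves 0xdebb20e3 before the
final complement.

```python
import struct
import zlib

POLYNOMIAL = 0xEDB88320  # assumption: reflected CRC-32 polynomial


def crc32_register(data, register=0xFFFFFFFF):
    """Run the CRC register over data, starting from all ones."""
    for byte in data:
        register ^= byte
        for _ in range(8):
            if register & 1:
                register = (register >> 1) ^ POLYNOMIAL
            else:
                register >>= 1
    return register


def crc32(data):
    """Post-condition the residual by taking its one's complement."""
    return crc32_register(data) ^ 0xFFFFFFFF


def residual_with_crc_appended(data):
    """Register value after processing data followed by its own CRC."""
    return crc32_register(data + struct.pack("<I", crc32(data)))
```

In practice zlib.crc32 from the Python standard library computes the
same value far faster; the loop above is only meant to make the
conditioning steps visible.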

compressed size: (4 bytes)
uncompressed size: (4 bytes)

The size of the file compressed and uncompressed,
respectively. When a decryption header is present it will
be placed in front of the file data and the value of the
compressed file size will include the bytes of the decryption
header. If bit 3 of the general purpose bit flag is set,
these fields are set to zero in the local header and the
correct values are put in the data descriptor and
in the central directory. If an archive is in ZIP64 format
and the value in this field is 0xFFFFFFFF, the size will be
in the corresponding 8 byte ZIP64 extended information
extra field. When encrypting the central directory, if the
local header is not in ZIP64 format and general purpose bit
flag 13 is set indicating masking, the value stored for the
uncompressed size in the Local Header will be zero.

file name length: (2 bytes)
extra field length: (2 bytes)
file comment length: (2 bytes)

The length of the file name, extra field, and comment
fields respectively. The combined length of any
directory record and these three fields should not
generally exceed 65,535 bytes. If input came from standard
input, the file name length is set to zero.

disk number start: (2 bytes)

The number of the disk on which this file begins. If an
archive is in ZIP64 format and the value in this field is
0xFFFF, the value will be in the corresponding 4 byte zip64
extended information extra field.

internal file attributes: (2 bytes)

Bits 1 and 2 are reserved for use by PKWARE.

The lowest bit of this field indicates, if set, that
the file is apparently an ASCII or text file. If not
set, that the file apparently contains binary data.
The remaining bits are unused in version 1.0.

The 0x0002 bit of this field indicates, if set, that a
4 byte variable record length control field precedes each
logical record indicating the length of the record. The
record length control field is stored in little-endian byte
order. This flag is independent of text control characters,
and if used in conjunction with text data, includes any
control characters in the total length of the record. This
value is provided for mainframe data transfer support.

external file attributes: (4 bytes)

The mapping of the external attributes is
host-system dependent (see 'version made by'). For
MS-DOS, the low order byte is the MS-DOS directory
attribute byte. If input came from standard input, this
field is set to zero.

relative offset of local header: (4 bytes)

This is the offset from the start of the first disk on
which this file appears, to where the local header should
be found. If an archive is in ZIP64 format and the value
in this field is 0xFFFFFFFF, the offset will be in the
corresponding 8 byte zip64 extended information extra field.

file name: (Variable)

The name of the file, with optional relative path.
The path stored should not contain a drive or
device letter, or a leading slash. All slashes
should be forward slashes '/' as opposed to
backwards slashes '\' for compatibility with Amiga
and UNIX file systems etc. If input came from standard
input, there is no file name field. If encrypting
the central directory and general purpose bit flag 13 is set
indicating masking, the file name stored in the Local Header
will not be the actual file name. A masking value consisting
of a unique hexadecimal value will be stored. This value will
be sequentially incremented for each file in the archive. See
the section on the Strong Encryption Specification for details
on retrieving the encrypted file name.

extra field: (Variable)

This is for expansion. If additional information
needs to be stored for special needs or for specific
platforms, it should be stored here. Earlier versions
of the software can then safely skip this file, and
find the next file or header. This field will be 0
length in version 1.0.

In order to allow different programs and different types
of information to be stored in the 'extra' field in .ZIP
files, the following structure should be used for all
programs storing data in this field:

    header1+data1 + header2+data2 . . .

Each header should consist of:

    Header ID - 2 bytes
    Data Size - 2 bytes

Note: all fields stored in Intel low-byte/high-byte order.

The Header ID field indicates the type of data that is in
the following data block.

Header IDs of 0 thru 31 are reserved for use by PKWARE.
The remaining IDs can be used by third party vendors for
proprietary usage.

The current Header ID mappings defined by PKWARE are:

    0x0001        Zip64 extended information extra field
    0x0008        Reserved for extended language encoding data (PFS)
    0x0009        OS/2
    0x000a        NTFS
    0x000c        OpenVMS
    0x000e        Reserved for file stream and fork descriptors
    0x000f        Patch Descriptor
    0x0014        PKCS#7 Store for X.509 Certificates
    0x0015        X.509 Certificate ID and Signature for
    0x0016        X.509 Certificate ID for Central Directory
    0x0017        Strong Encryption Header
    0x0018        Record Management Controls
    0x0019        PKCS#7 Encryption Recipient Certificate List
    0x0065        IBM S/390 (Z390), AS/400 (I400) attributes
    0x0066        Reserved for IBM S/390 (Z390), AS/400 (I400)
                  attributes - compressed
    0x4690        POSZIP 4690 (reserved)

Third party mappings commonly used are:

    0x2605        ZipIt Macintosh
    0x2705        ZipIt Macintosh 1.3.5+
    0x2805        ZipIt Macintosh 1.3.5+
    0x334d        Info-ZIP Macintosh
    0x4341        Acorn/SparkFS
    0x4453        Windows NT security descriptor (binary ACL)
    0x4b46        FWKCS MD5 (see below)
    0x4c41        OS/2 access control list (text ACL)
    0x4d49        Info-ZIP OpenVMS
    0x4f4c        Xceed original location extra field
    0x5356        AOS/VS (ACL)
    0x5455        extended timestamp
    0x554e        Xceed unicode extra field
    0x5855        Info-ZIP UNIX (original, also OS/2, NT, etc)
    0x6375        Info-ZIP Unicode Comment Extra Field
    0x7075        Info-ZIP Unicode Path Extra Field
    0x7855        Info-ZIP UNIX (new)
    0xa220        Microsoft Open Packaging Growth Hint

Detailed descriptions of Extra Fields defined by third
party mappings will be documented as information on
these data structures is made available to PKWARE.
PKWARE does not guarantee the accuracy of any published
third party data.

The Data Size field indicates the size of the following
data block. Programs can use this value to skip to the
next header block, passing over any data blocks that are
not of interest.

Note: As stated above, the size of the entire .ZIP file
header, including the file name, comment, and extra
field should not exceed 64K in size.
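
The header1+data1 + header2+data2 structure described above lets a
program walk the extra field and skip blocks it does not understand.
A minimal sketch, with a hypothetical function name:

```python
import struct


def iter_extra_fields(extra):
    """Yield (header_id, data) for each block in an extra field."""
    offset = 0
    while offset + 4 <= len(extra):
        # Header ID and Data Size, both 2 bytes, low byte first.
        header_id, data_size = struct.unpack_from("<HH", extra, offset)
        offset += 4
        if offset + data_size > len(extra):
            raise ValueError("extra field block overruns the extra field")
        yield header_id, extra[offset:offset + data_size]
        # Data Size is what allows unknown blocks to be passed over.
        offset += data_size
```

A caller interested only in the Zip64 block could, for instance, scan
for a header ID of 0x0001 and ignore everything else.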

In case two different programs should appropriate the same
Header ID value, it is strongly recommended that each
program place a unique signature of at least two bytes in
size (and preferably 4 bytes or bigger) at the start of
each data area. Every program should verify that its
unique signature is present, in addition to the Header ID
value being correct, before assuming that it is a block of
known type.
803 -Zip64 Extended Information Extra Field (0x0001):
\r
805 The following is the layout of the zip64 extended
\r
806 information "extra" block. If one of the size or
\r
807 offset fields in the Local or Central directory
\r
808 record is too small to hold the required data,
\r
809 a Zip64 extended information record is created.
\r
810 The order of the fields in the zip64 extended
\r
811 information record is fixed, but the fields will
\r
812 only appear if the corresponding Local or Central
\r
813 directory record field is set to 0xFFFF or 0xFFFFFFFF.
\r
815 Note: all fields stored in Intel low-byte/high-byte order.
\r
817 Value Size Description
\r
818 ----- ---- -----------
\r
819 (ZIP64) 0x0001 2 bytes Tag for this "extra" block type
\r
820 Size 2 bytes Size of this "extra" block
\r
822 Size 8 bytes Original uncompressed file size
\r
824 Size 8 bytes Size of compressed data
\r
826 Offset 8 bytes Offset of local header record
\r
828 Number 4 bytes Number of the disk on which
\r
831 This entry in the Local header must include BOTH original
\r
832 and compressed file size fields. If encrypting the
\r
833 central directory and bit 13 of the general purpose bit
\r
834 flag is set indicating masking, the value stored in the
\r
835 Local Header for the original file size will be zero.
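The rule that Zip64 fields appear in a fixed order, but only when the matching directory field holds its all-ones value, can be sketched as follows. The function name and argument names are hypothetical. The sketch follows the general rule only; it does not model the Local header requirement that BOTH size fields be present.

```python
import struct

def parse_zip64_extra(data, uncompressed, compressed, offset, disk):
    """Resolve four directory fields using the body of a 0x0001 block.

    `data` is the block body after the 4-byte tag/size header. The other
    arguments are the values read from the directory record itself.
    """
    pos = 0

    def take(fmt, size, current, sentinel):
        nonlocal pos
        if current != sentinel:
            return current  # the field fit, so nothing is stored for it
        (value,) = struct.unpack_from(fmt, data, pos)
        pos += size
        return value

    # Fixed order: original size, compressed size, header offset, disk number.
    uncompressed = take("<Q", 8, uncompressed, 0xFFFFFFFF)
    compressed = take("<Q", 8, compressed, 0xFFFFFFFF)
    offset = take("<Q", 8, offset, 0xFFFFFFFF)
    disk = take("<L", 4, disk, 0xFFFF)
    return uncompressed, compressed, offset, disk

# Only the original size and the header offset overflowed in this example.
body = struct.pack("<QQ", 5_000_000_000, 6_000_000_000)
resolved = parse_zip64_extra(body, 0xFFFFFFFF, 1000, 0xFFFFFFFF, 0)
```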
\r
838 -OS/2 Extra Field (0x0009):
\r
840 The following is the layout of the OS/2 attributes "extra"
\r
841 block. (Last Revision 09/05/95)
\r
843 Note: all fields stored in Intel low-byte/high-byte order.
\r
845 Value Size Description
\r
846 ----- ---- -----------
\r
847 (OS/2) 0x0009 2 bytes Tag for this "extra" block type
\r
848 TSize 2 bytes Size for the following data block
\r
849 BSize 4 bytes Uncompressed Block Size
\r
850 CType 2 bytes Compression type
\r
851 EACRC 4 bytes CRC value for uncompressed block
\r
852 (var) variable Compressed block
\r
854 The OS/2 extended attribute structure (FEA2LIST) is
\r
855 compressed and then stored in its entirety within this
\r
856 structure. There will only ever be one "block" of data in
\r
859 -NTFS Extra Field (0x000a):
\r
861 The following is the layout of the NTFS attributes
\r
862 "extra" block. (Note: At this time the Mtime, Atime
\r
863 and Ctime values may be used on any WIN32 system.)
\r
865 Note: all fields stored in Intel low-byte/high-byte order.
\r
867 Value Size Description
\r
868 ----- ---- -----------
\r
869 (NTFS) 0x000a 2 bytes Tag for this "extra" block type
\r
870 TSize 2 bytes Size of the total "extra" block
\r
871 Reserved 4 bytes Reserved for future use
\r
872 Tag1 2 bytes NTFS attribute tag value #1
\r
873 Size1 2 bytes Size of attribute #1, in bytes
\r
874 (var.) Size1 Attribute #1 data
\r
878 TagN 2 bytes NTFS attribute tag value #N
\r
879 SizeN 2 bytes Size of attribute #N, in bytes
\r
880 (var.) SizeN Attribute #N data
\r
882 For NTFS, values for Tag1 through TagN are as follows:
\r
883 (currently only one set of attributes is defined for NTFS)
\r
885 Tag Size Description
\r
886 ----- ---- -----------
\r
887 0x0001 2 bytes Tag for attribute #1
\r
888 Size1 2 bytes Size of attribute #1, in bytes
\r
889 Mtime 8 bytes File last modification time
\r
890 Atime 8 bytes File last access time
\r
891 Ctime 8 bytes File creation time
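A minimal sketch of reading the three time values from the NTFS block body follows. The name parse_ntfs_times is hypothetical. The sketch leaves the 8-byte values as raw integers, since this section does not state their units.

```python
import struct

def parse_ntfs_times(data: bytes):
    """Return (mtime, atime, ctime) as raw 64-bit integers, or None.

    `data` is the block body after the 4-byte tag/TSize header.
    """
    pos = 4  # skip the 4-byte Reserved field
    while pos + 4 <= len(data):
        tag, size = struct.unpack_from("<HH", data, pos)
        pos += 4
        if tag == 0x0001 and size >= 24 and pos + 24 <= len(data):
            return struct.unpack_from("<QQQ", data, pos)
        pos += size  # not attribute 0x0001: skip its data
    return None

# A body holding a made-up attribute 0x0002, then the time attribute.
ntfs_body = (b"\x00" * 4
             + struct.pack("<HH", 0x0002, 2) + b"zz"
             + struct.pack("<HH", 0x0001, 24) + struct.pack("<QQQ", 1, 2, 3))
ntfs_times = parse_ntfs_times(ntfs_body)
```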
\r
893 -OpenVMS Extra Field (0x000c):
\r
895 The following is the layout of the OpenVMS attributes
\r
898 Note: all fields stored in Intel low-byte/high-byte order.
\r
900 Value Size Description
\r
901 ----- ---- -----------
\r
902 (VMS) 0x000c 2 bytes Tag for this "extra" block type
\r
903 TSize 2 bytes Size of the total "extra" block
\r
904 CRC 4 bytes 32-bit CRC for remainder of the block
\r
905 Tag1 2 bytes OpenVMS attribute tag value #1
\r
906 Size1 2 bytes Size of attribute #1, in bytes
\r
907 (var.) Size1 Attribute #1 data
\r
911 TagN 2 bytes OpenVMS attribute tag value #N
\r
912 SizeN 2 bytes Size of attribute #N, in bytes
\r
913 (var.) SizeN Attribute #N data
\r
917 1. There will be one or more attributes present, which
\r
918 will each be preceded by the above TagX & SizeX values.
\r
919 These values are identical to the ATR$C_XXXX and
\r
920 ATR$S_XXXX constants which are defined in ATR.H under
\r
921 OpenVMS C. Neither of these values will ever be zero.
\r
923 2. No word alignment or padding is performed.
\r
925 3. A well-behaved PKZIP/OpenVMS program should never produce
\r
926 more than one sub-block with the same TagX value. Also,
\r
927 there will never be more than one "extra" block of type
\r
928 0x000c in a particular directory record.
\r
930 -UNIX Extra Field (0x000d):
\r
932 The following is the layout of the UNIX "extra" block.
\r
933 Note: all fields are stored in Intel low-byte/high-byte
\r
936 Value Size Description
\r
937 ----- ---- -----------
\r
938 (UNIX) 0x000d 2 bytes Tag for this "extra" block type
\r
939 TSize 2 bytes Size for the following data block
\r
940 Atime 4 bytes File last access time
\r
941 Mtime 4 bytes File last modification time
\r
942 Uid 2 bytes File user ID
\r
943 Gid 2 bytes File group ID
\r
944 (var) variable Variable length data field
\r
946 The variable length data field will contain file type
\r
947 specific data. Currently the only values allowed are
\r
948 the original "linked to" file names for hard or symbolic
\r
949 links, and the major and minor device node numbers for
\r
950 character and block device nodes. Since device nodes
\r
951 cannot be either symbolic or hard links, only one set of
\r
952 variable length data is stored. Link files will have the
\r
953 name of the original file stored. This name is NOT NULL
\r
954 terminated. Its size can be determined by checking TSize -
\r
955 12. Device entries will have eight bytes stored as two 4
\r
956 byte entries (in little endian format). The first entry
\r
957 will be the major device number, and the second the minor
\r
960 -PATCH Descriptor Extra Field (0x000f):
\r
962 The following is the layout of the Patch Descriptor "extra"
\r
965 Note: all fields stored in Intel low-byte/high-byte order.
\r
967 Value Size Description
\r
968 ----- ---- -----------
\r
969 (Patch) 0x000f 2 bytes Tag for this "extra" block type
\r
970 TSize 2 bytes Size of the total "extra" block
\r
971 Version 2 bytes Version of the descriptor
\r
972 Flags 4 bytes Actions and reactions (see below)
\r
973 OldSize 4 bytes Size of the file about to be patched
\r
974 OldCRC 4 bytes 32-bit CRC of the file to be patched
\r
975 NewSize 4 bytes Size of the resulting file
\r
976 NewCRC 4 bytes 32-bit CRC of the resulting file
\r
978 Actions and reactions
\r
981 ---- ----------------
\r
982 0 Use for auto detection
\r
983 1 Treat as a self-patch
\r
985 4-5 Action (see below)
\r
987 8-9 Reaction (see below) to absent file
\r
988 10-11 Reaction (see below) to newer file
\r
989 12-13 Reaction (see below) to unknown file
\r
1011 Patch support is provided by PKPatchMaker(tm) technology and is
\r
1012 covered under U.S. Patents and Patents Pending. The use or
\r
1013 implementation in a product of certain technological aspects set
\r
1014 forth in the current APPNOTE, including those with regard to
\r
1015 strong encryption, patching, or extended tape operations requires
\r
1016 a license from PKWARE. Please contact PKWARE with regard to
\r
1017 acquiring a license.
\r
1019 -PKCS#7 Store for X.509 Certificates (0x0014):
\r
1021 This field contains information about each of the certificates
\r
1022 files may be signed with. When the Central Directory Encryption
\r
1023 feature is enabled for a ZIP file, this record will appear in
\r
1024 the Archive Extra Data Record, otherwise it will appear in the
\r
1025 first central directory record and will be ignored in any
\r
1028 Note: all fields stored in Intel low-byte/high-byte order.
\r
1030 Value Size Description
\r
1031 ----- ---- -----------
\r
1032 (Store) 0x0014 2 bytes Tag for this "extra" block type
\r
1033 TSize 2 bytes Size of the store data
\r
1034 TData TSize Data about the store
\r
1037 -X.509 Certificate ID and Signature for individual file (0x0015):
\r
1039 This field contains the information about which certificate in
\r
1040 the PKCS#7 store was used to sign a particular file. It also
\r
1041 contains the signature data. This field can appear multiple
\r
1042 times, but can only appear once per certificate.
\r
1044 Note: all fields stored in Intel low-byte/high-byte order.
\r
1046 Value Size Description
\r
1047 ----- ---- -----------
\r
1048 (CID) 0x0015 2 bytes Tag for this "extra" block type
\r
1049 TSize 2 bytes Size of data that follows
\r
1050 TData TSize Signature Data
\r
1052 -X.509 Certificate ID and Signature for central directory (0x0016):
\r
1054 This field contains the information about which certificate in
\r
1055 the PKCS#7 store was used to sign the central directory structure.
\r
1056 When the Central Directory Encryption feature is enabled for a
\r
1057 ZIP file, this record will appear in the Archive Extra Data Record,
\r
1058 otherwise it will appear in the first central directory record.
\r
1060 Note: all fields stored in Intel low-byte/high-byte order.
\r
1062 Value Size Description
\r
1063 ----- ---- -----------
\r
1064 (CDID) 0x0016 2 bytes Tag for this "extra" block type
\r
1065 TSize 2 bytes Size of data that follows
\r
1068 -Strong Encryption Header (0x0017):
\r
1070 Value Size Description
\r
1071 ----- ---- -----------
\r
1072 0x0017 2 bytes Tag for this "extra" block type
\r
1073 TSize 2 bytes Size of data that follows
\r
1074 Format 2 bytes Format definition for this record
\r
1075 AlgID 2 bytes Encryption algorithm identifier
\r
1076 Bitlen 2 bytes Bit length of encryption key
\r
1077 Flags 2 bytes Processing flags
\r
1078 CertData TSize-8 Certificate decryption extra field data
\r
1079 (refer to the explanation for CertData
\r
1080 in the section describing the
\r
1081 Certificate Processing Method under
\r
1082 the Strong Encryption Specification)
\r
1085 -Record Management Controls (0x0018):
\r
1087 Value Size Description
\r
1088 ----- ---- -----------
\r
1089 (Rec-CTL) 0x0018 2 bytes Tag for this "extra" block type
\r
1090 CSize 2 bytes Size of total extra block data
\r
1091 Tag1 2 bytes Record control attribute 1
\r
1092 Size1 2 bytes Size of attribute 1, in bytes
\r
1093 Data1 Size1 Attribute 1 data
\r
1097 TagN 2 bytes Record control attribute N
\r
1098 SizeN 2 bytes Size of attribute N, in bytes
\r
1099 DataN SizeN Attribute N data
\r
1102 -PKCS#7 Encryption Recipient Certificate List (0x0019):
\r
1104 This field contains information about each of the certificates
\r
1105 used in encryption processing and it can be used to identify who is
\r
1106 allowed to decrypt encrypted files. This field should only appear
\r
1107 in the archive extra data record. This field is not required and
\r
1108 serves only to aid archive modifications by preserving public
\r
1109 encryption key data. Individual security requirements may dictate
\r
1110 that this data be omitted to deter information exposure.
\r
1112 Note: all fields stored in Intel low-byte/high-byte order.
\r
1114 Value Size Description
\r
1115 ----- ---- -----------
\r
1116 (CStore) 0x0019 2 bytes Tag for this "extra" block type
\r
1117 TSize 2 bytes Size of the store data
\r
1118 TData TSize Data about the store
\r
1122 Value Size Description
\r
1123 ----- ---- -----------
\r
1124 Version 2 bytes Format version number - must be 0x0001 at this time
\r
1125 CStore (var) PKCS#7 data blob
\r
1128 -MVS Extra Field (0x0065):
\r
1130 The following is the layout of the MVS "extra" block.
\r
1131 Note: Some fields are stored in Big Endian format.
\r
1132 All text is in EBCDIC format unless otherwise specified.
\r
1134 Value Size Description
\r
1135 ----- ---- -----------
\r
1136 (MVS) 0x0065 2 bytes Tag for this "extra" block type
\r
1137 TSize 2 bytes Size for the following data block
\r
1138 ID 4 bytes EBCDIC "Z390" 0xE9F3F9F0 or
\r
1139 "T4MV" for TargetFour
\r
1140 (var) TSize-4 Attribute data (see APPENDIX B)
\r
1143 -OS/400 Extra Field (0x0065):
\r
1145 The following is the layout of the OS/400 "extra" block.
\r
1146 Note: Some fields are stored in Big Endian format.
\r
1147 All text is in EBCDIC format unless otherwise specified.
\r
1149 Value Size Description
\r
1150 ----- ---- -----------
\r
1151 (OS400) 0x0065 2 bytes Tag for this "extra" block type
\r
1152 TSize 2 bytes Size for the following data block
\r
1153 ID 4 bytes EBCDIC "I400" 0xC9F4F0F0 or
\r
1154 "T4MV" for TargetFour
\r
1155 (var) TSize-4 Attribute data (see APPENDIX A)
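Because the MVS and OS/400 blocks share Header ID 0x0065, a reader has to inspect the 4-byte ID to tell them apart. The sketch below shows one way to do that. The function name is hypothetical. The sketch assumes the ID bytes are stored in text order and that "T4MV" is encoded in EBCDIC code page 037.

```python
def classify_0x0065(data: bytes) -> str:
    """Classify a 0x0065 block body (the bytes after tag and TSize) by its ID."""
    ident = data[:4]
    if ident == bytes.fromhex("E9F3F9F0"):  # EBCDIC "Z390"
        return "MVS"
    if ident == bytes.fromhex("C9F4F0F0"):  # EBCDIC "I400"
        return "OS/400"
    if ident == "T4MV".encode("cp037"):     # assumption: code page 037
        return "TargetFour"
    return "unknown"

mvs_kind = classify_0x0065(bytes.fromhex("E9F3F9F0") + b"\x00")
os400_kind = classify_0x0065(bytes.fromhex("C9F4F0F0"))
```

The "T4MV" ID is listed for both layouts, so the ID alone does not say which appendix describes the attribute data that follows it.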
\r
1158 Third-party Mappings:
\r
1160 -ZipIt Macintosh Extra Field (long) (0x2605):
\r
1162 The following is the layout of the ZipIt extra block
\r
1163 for Macintosh. The local-header and central-header versions
\r
1164 are identical. This block must be present if the file is
\r
1165 stored MacBinary-encoded and it should not be used if the file
\r
1166 is not stored MacBinary-encoded.
\r
1168 Value Size Description
\r
1169 ----- ---- -----------
\r
1170 (Mac2) 0x2605 Short tag for this extra block type
\r
1171 TSize Short total data size for this block
\r
1172 "ZPIT" beLong extra-field signature
\r
1173 FnLen Byte length of FileName
\r
1174 FileName variable full Macintosh filename
\r
1175 FileType Byte[4] four-byte Mac file type string
\r
1176 Creator Byte[4] four-byte Mac creator string
\r
1179 -ZipIt Macintosh Extra Field (short, for files) (0x2705):
\r
1181 The following is the layout of a shortened variant of the
\r
1182 ZipIt extra block for Macintosh (without "full name" entry).
\r
1183 This variant is used by ZipIt 1.3.5 and newer for entries of
\r
1184 files (not directories) that do not have a MacBinary encoded
\r
1185 file. The local-header and central-header versions are identical.
\r
1187 Value Size Description
\r
1188 ----- ---- -----------
\r
1189 (Mac2b) 0x2705 Short tag for this extra block type
\r
1190 TSize Short total data size for this block (12)
\r
1191 "ZPIT" beLong extra-field signature
\r
1192 FileType Byte[4] four-byte Mac file type string
\r
1193 Creator Byte[4] four-byte Mac creator string
\r
1194 fdFlags beShort attributes from FInfo.fdFlags,
\r
1196 0x0000 beShort reserved, may be omitted
\r
1199 -ZipIt Macintosh Extra Field (short, for directories) (0x2805):
\r
1201 The following is the layout of a shortened variant of the
\r
1202 ZipIt extra block for Macintosh used only for directory
\r
1203 entries. This variant is used by ZipIt 1.3.5 and newer to
\r
1204 save some optional Mac-specific information about directories.
\r
1205 The local-header and central-header versions are identical.
\r
1207 Value Size Description
\r
1208 ----- ---- -----------
\r
1209 (Mac2c) 0x2805 Short tag for this extra block type
\r
1210 TSize Short total data size for this block (12)
\r
1211 "ZPIT" beLong extra-field signature
\r
1212 frFlags beShort attributes from DInfo.frFlags, may
\r
1214 View beShort ZipIt view flag, may be omitted
\r
1217 The View field specifies ZipIt-internal settings as follows:
\r
1219 Bits of the Flags:
\r
1220 bit 0 if set, the folder is shown expanded (open)
\r
1221 when the archive contents are viewed in ZipIt.
\r
1222 bits 1-15 reserved, zero;
\r
1225 -FWKCS MD5 Extra Field (0x4b46):
\r
1227 The FWKCS Contents_Signature System, used in
\r
1228 automatically identifying files independent of file name,
\r
1229 optionally adds and uses an extra field to support the
\r
1230 rapid creation of an enhanced contents_signature:
\r
1232 Header ID = 0x4b46
\r
1233 Data Size = 0x0013
\r
1234 Preface = 'M','D','5'
\r
1235 followed by 16 bytes containing the uncompressed file's
\r
1236 128_bit MD5 hash(1), low byte first.
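The layout above can be sketched with Python's hashlib. The function name is hypothetical. The sketch assumes that "low byte first" matches the byte order in which RFC 1321 emits the digest, which is what digest() returns.

```python
import hashlib
import struct

def build_fwkcs_md5_extra(uncompressed: bytes) -> bytes:
    """Build a 0x4b46 extra field for the given uncompressed file contents."""
    digest = hashlib.md5(uncompressed).digest()  # 16 bytes
    body = b"MD5" + digest                       # 3 + 16 = 19 = 0x0013
    return struct.pack("<HH", 0x4B46, len(body)) + body

md5_field = build_fwkcs_md5_extra(b"")
```

The Data Size of 0x0013 follows directly from the 3-byte preface plus the 16-byte hash.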
\r
1238 When FWKCS revises a .ZIP file central directory to add
\r
1239 this extra field for a file, it also replaces the
\r
1240 central directory entry for that file's uncompressed
\r
1241 file length with a measured value.
\r
1243 FWKCS provides an option to strip this extra field, if
\r
1244 present, from a .ZIP file central directory. In adding
\r
1245 this extra field, FWKCS preserves .ZIP file Authenticity
\r
1246 Verification; if stripping this extra field, FWKCS
\r
1247 preserves all versions of AV through PKZIP version 2.04g.
\r
1249 FWKCS, and FWKCS Contents_Signature System, are
\r
1250 trademarks of Frederick W. Kantor.
\r
1252 (1) R. Rivest, RFC1321.TXT, MIT Laboratory for Computer
\r
1253 Science and RSA Data Security, Inc., April 1992.
\r
1254 ll.76-77: "The MD5 algorithm is being placed in the
\r
1255 public domain for review and possible adoption as a
\r
1259 -Info-ZIP Unicode Comment Extra Field (0x6375):
\r
1261 Stores the UTF-8 version of the file comment as stored in the
\r
1262 central directory header. (Last Revision 20070912)
\r
1264 Value Size Description
\r
1265 ----- ---- -----------
\r
1266 (UCom) 0x6375 Short tag for this extra block type ("uc")
\r
1267 TSize Short total data size for this block
\r
1268 Version 1 byte version of this extra field, currently 1
\r
1269 ComCRC32 4 bytes Comment Field CRC32 Checksum
\r
1270 UnicodeCom Variable UTF-8 version of the entry comment
\r
1272 Currently Version is set to the number 1. If there is a need
\r
1273 to change this field, the version will be incremented. Changes
\r
1274 may not be backward compatible so this extra field should not be
\r
1275 used if the version is not recognized.
\r
1277 The ComCRC32 is the standard zip CRC32 checksum of the File Comment
\r
1278 field in the central directory header. This is used to verify that
\r
1279 the comment field has not changed since the Unicode Comment extra field
\r
1280 was created. This can happen if a utility changes the File Comment
\r
1281 field but does not update the UTF-8 Comment extra field. If the CRC
\r
1282 check fails, this Unicode Comment extra field should be ignored and
\r
1283 the File Comment field in the header should be used instead.
\r
1285 The UnicodeCom field is the UTF-8 version of the File Comment field
\r
1286 in the header. As UnicodeCom is defined to be UTF-8, no UTF-8 byte
\r
1287 order mark (BOM) is used. The length of this field is determined by
\r
1288 subtracting the size of the previous fields from TSize. If both the
\r
1289 File Name and Comment fields are UTF-8, the new General Purpose Bit
\r
1290 Flag, bit 11 (Language encoding flag (EFS)), can be used to indicate
\r
1291 both the header File Name and Comment fields are UTF-8 and, in this
\r
1292 case, the Unicode Path and Unicode Comment extra fields are not
\r
1293 needed and should not be created. Note that, for backward
\r
1294 compatibility, bit 11 should only be used if the native character set
\r
1295 of the paths and comments being zipped up is already UTF-8. It is
\r
1296 expected that the same file comment storage method, either general
\r
1297 purpose bit 11 or extra fields, be used in both the Local and Central
\r
1298 Directory Header for a file.
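A writer could assemble the Unicode Comment extra field roughly as follows. This is a sketch, and the function and argument names are hypothetical. header_comment is the File Comment exactly as stored in the central directory header, and comment is the same text as a Unicode string.

```python
import struct
import zlib

def build_unicode_comment_extra(header_comment: bytes, comment: str) -> bytes:
    """Build a 0x6375 extra field carrying the UTF-8 form of a file comment."""
    crc = zlib.crc32(header_comment) & 0xFFFFFFFF
    # Version (1 byte, currently 1), ComCRC32 (4 bytes), then UTF-8 with no BOM.
    body = struct.pack("<BL", 1, crc) + comment.encode("utf-8")
    return struct.pack("<HH", 0x6375, len(body)) + body

ucom_field = build_unicode_comment_extra(b"cafe", "caf\u00e9")
ucom_tag, ucom_tsize = struct.unpack_from("<HH", ucom_field, 0)
```

Stored in low-byte/high-byte order, the tag 0x6375 appears in the file as the two ASCII bytes "uc".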
\r
1301 -Info-ZIP Unicode Path Extra Field (0x7075):
\r
1303 Stores the UTF-8 version of the file name field as stored in the
\r
1304 local header and central directory header. (Last Revision 20070912)
\r
1306 Value Size Description
\r
1307 ----- ---- -----------
\r
1308 (UPath) 0x7075 Short tag for this extra block type ("up")
\r
1309 TSize Short total data size for this block
\r
1310 Version 1 byte version of this extra field, currently 1
\r
1311 NameCRC32 4 bytes File Name Field CRC32 Checksum
\r
1312 UnicodeName Variable UTF-8 version of the entry File Name
\r
1314 Currently Version is set to the number 1. If there is a need
\r
1315 to change this field, the version will be incremented. Changes
\r
1316 may not be backward compatible so this extra field should not be
\r
1317 used if the version is not recognized.
\r
1319 The NameCRC32 is the standard zip CRC32 checksum of the File Name
\r
1320 field in the header. This is used to verify that the header
\r
1321 File Name field has not changed since the Unicode Path extra field
\r
1322 was created. This can happen if a utility renames the File Name but
\r
1323 does not update the UTF-8 path extra field. If the CRC check fails,
\r
1324 this UTF-8 Path Extra Field should be ignored and the File Name field
\r
1325 in the header should be used instead.
\r
1327 The UnicodeName is the UTF-8 version of the contents of the File Name
\r
1328 field in the header. As UnicodeName is defined to be UTF-8, no UTF-8
\r
1329 byte order mark (BOM) is used. The length of this field is determined
\r
1330 by subtracting the size of the previous fields from TSize. If both
\r
1331 the File Name and Comment fields are UTF-8, the new General Purpose
\r
1332 Bit Flag, bit 11 (Language encoding flag (EFS)), can be used to
\r
1333 indicate that both the header File Name and Comment fields are UTF-8
\r
1334 and, in this case, the Unicode Path and Unicode Comment extra fields
\r
1335 are not needed and should not be created. Note that, for backward
\r
1336 compatibility, bit 11 should only be used if the native character set
\r
1337 of the paths and comments being zipped up is already UTF-8. It is
\r
1338 expected that the same file name storage method, either general
\r
1339 purpose bit 11 or extra fields, be used in both the Local and Central
\r
1340 Directory Header for a file.
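On the reading side, the version and CRC checks described above decide whether the UTF-8 name may be trusted. The sketch below is one possible arrangement. The function name is hypothetical, and decoding the header File Name as code page 437 when the extra field is rejected is an assumption, not something this section specifies.

```python
import struct
import zlib

def resolve_file_name(header_name: bytes, upath_body: bytes,
                      fallback_encoding: str = "cp437") -> str:
    """Pick the UTF-8 name from a 0x7075 block body, or fall back to the header.

    `upath_body` is the block body after the 4-byte tag/TSize header.
    """
    if len(upath_body) >= 5:
        version, name_crc = struct.unpack_from("<BL", upath_body, 0)
        if version == 1 and name_crc == (zlib.crc32(header_name) & 0xFFFFFFFF):
            return upath_body[5:].decode("utf-8")
    # Unknown version or stale CRC: ignore the extra field.
    return header_name.decode(fallback_encoding)

upath_body = (struct.pack("<BL", 1, zlib.crc32(b"old.txt") & 0xFFFFFFFF)
              + "new-\u00e9.txt".encode("utf-8"))
fresh_name = resolve_file_name(b"old.txt", upath_body)
stale_name = resolve_file_name(b"renamed.txt", upath_body)
```

In the second call the header name was changed without updating the extra field, so the CRC no longer matches and the header name is used.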
\r
1343 -Microsoft Open Packaging Growth Hint (0xa220):
\r
1345 Value Size Description
\r
1346 ----- ---- -----------
\r
1347 0xa220 Short tag for this extra block type
\r
1348 TSize Short size of Sig + PadVal + Padding
\r
1349 Sig Short verification signature (A028)
\r
1350 PadVal Short Initial padding value
\r
1351 Padding variable filled with NULL characters
\r
1354 file comment: (Variable)
\r
1356 The comment for this file.
\r
1358 number of this disk: (2 bytes)
\r
1360 The number of this disk, which contains central
\r
1361 directory end record. If an archive is in ZIP64 format
\r
1362 and the value in this field is 0xFFFF, the value will
\r
1363 be in the corresponding 4 byte zip64 end of central
\r
1367 number of the disk with the start of the central
\r
1368 directory: (2 bytes)
\r
1370 The number of the disk on which the central
\r
1371 directory starts. If an archive is in ZIP64 format
\r
1372 and the value in this field is 0xFFFF, the value will
\r
1373 be in the corresponding 4 byte zip64 end of central
\r
1376 total number of entries in the central dir on
\r
1377 this disk: (2 bytes)
\r
1379 The number of central directory entries on this disk.
\r
1380 If an archive is in ZIP64 format and the value in
\r
1381 this field is 0xFFFF, the count will be in the
\r
1382 corresponding 8 byte zip64 end of central
\r
1385 total number of entries in the central dir: (2 bytes)
\r
1387 The total number of files in the .ZIP file. If an
\r
1388 archive is in ZIP64 format and the value in this field
\r
1389 is 0xFFFF, the count will be in the corresponding 8 byte
\r
1390 zip64 end of central directory field.
\r
1392 size of the central directory: (4 bytes)
\r
1394 The size (in bytes) of the entire central directory.
\r
1395 If an archive is in ZIP64 format and the value in
\r
1396 this field is 0xFFFFFFFF, the size will be in the
\r
1397 corresponding 8 byte zip64 end of central
\r
1400 offset of start of central directory with respect to
\r
1401 the starting disk number: (4 bytes)
\r
1403 Offset of the start of the central directory on the
\r
1404 disk on which the central directory starts. If an
\r
1405 archive is in ZIP64 format and the value in this
\r
1406 field is 0xFFFFFFFF, the offset will be in the
\r
1407 corresponding 8 byte zip64 end of central
\r
1410 .ZIP file comment length: (2 bytes)
\r
1412 The length of the comment for this .ZIP file.
\r
1414 .ZIP file comment: (Variable)
\r
1416 The comment for this .ZIP file. ZIP file comment data
\r
1417 is stored unsecured. No encryption or data authentication
\r
1418 is applied to this area at this time. Confidential information
\r
1419 should not be stored in this section.
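The end of central directory fields described above can be unpacked in one step. The following is a sketch: the function name and dictionary keys are hypothetical, and the signature value 0x06054b50 and field order are taken from the record layout defined elsewhere in this specification. It also flags the all-ones values that direct the reader to the zip64 end of central directory record.

```python
import struct

EOCD = struct.Struct("<LHHHHLLH")  # 22 bytes before the .ZIP file comment

def parse_eocd(record: bytes) -> dict:
    """Unpack an end of central directory record that starts at record[0]."""
    (sig, disk, cd_disk, entries_here, entries_total,
     cd_size, cd_offset, comment_len) = EOCD.unpack_from(record, 0)
    if sig != 0x06054B50:
        raise ValueError("not an end of central directory record")
    return {
        "disk": disk,
        "cd_disk": cd_disk,
        "entries_here": entries_here,
        "entries_total": entries_total,
        "cd_size": cd_size,
        "cd_offset": cd_offset,
        "comment": record[EOCD.size:EOCD.size + comment_len],
        # True when any field is all ones, so the real value lives in ZIP64.
        "needs_zip64": (0xFFFF in (disk, cd_disk, entries_here, entries_total)
                        or 0xFFFFFFFF in (cd_size, cd_offset)),
    }

eocd_info = parse_eocd(
    EOCD.pack(0x06054B50, 0, 0, 2, 2, 100, 0xFFFFFFFF, 2) + b"hi")
```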
\r
1421 zip64 extensible data sector (variable size)
\r
1423 (currently reserved for use by PKWARE)
\r
1426 K. Splitting and Spanning ZIP files
\r
1428 Spanning is the process of segmenting a ZIP file across
\r
1429 multiple removable media. This support has typically only
\r
1430 been provided for DOS formatted floppy diskettes.
\r
1432 File splitting is a newer derivative of spanning.
\r
1433 Splitting follows the same segmentation process as
\r
1434 spanning; however, it does not require writing each
\r
1435 segment to a unique removable medium and instead supports
\r
1436 placing all pieces onto local or non-removable locations
\r
1437 such as file systems, local drives, folders, etc...
\r
1439 A key difference between spanned and split ZIP files is
\r
1440 that all pieces of a spanned ZIP file have the same name.
\r
1441 Since each piece is written to a separate volume, no name
\r
1442 collisions occur and each segment can reuse the original
\r
1443 .ZIP file name given to the archive.
\r
1445 Sequence ordering for DOS spanned archives uses the DOS
\r
1446 volume label to determine segment numbers. Volume labels
\r
1447 for each segment are written using the form PKBACK#xxx,
\r
1448 where xxx is the segment number written as a decimal
\r
1449 value from 001 - nnn.
\r
1451 Split ZIP files are typically written to the same location
\r
1452 and are subject to name collisions if the spanned name
\r
1453 format is used since each segment will reside on the same
\r
1454 drive. To avoid name collisions, split archives are named
\r
1457 Segment 1 = filename.z01
\r
1458 Segment n-1 = filename.z(n-1)
\r
1459 Segment n = filename.zip
\r
1461 The .ZIP extension is used on the last segment to support
\r
1462 quickly reading the central directory. The segment number
\r
1463 n should be a decimal value.
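The naming scheme for split archives can be sketched as below. The function name is hypothetical, and padding the segment number to at least two digits is an assumption generalized from the filename.z01 example.

```python
def split_segment_names(stem: str, segments: int) -> list:
    """Return the file names of a split archive's segments, in order."""
    if segments < 1:
        raise ValueError("a split archive has at least one segment")
    names = [f"{stem}.z{i:02d}" for i in range(1, segments)]
    names.append(f"{stem}.zip")  # the last segment carries the .ZIP extension
    return names

segment_names = split_segment_names("backup", 3)
```

A reader can therefore open filename.zip first, read the central directory, and then locate the earlier segments by number.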
\r
1465 Spanned ZIP files may be PKSFX Self-extracting ZIP files.
\r
1466 PKSFX files may also be split; however, in this case
\r
1467 the first segment must be named filename.exe. The first
\r
1468 segment of a split PKSFX archive must be large enough to
\r
1469 include the entire executable program.
\r
1471 Capacities for split archives are as follows.
\r
1473 Maximum number of segments = 4,294,967,295 - 1
\r
1474 Maximum .ZIP segment size = 4,294,967,295 bytes
\r
1475 Minimum segment size = 64K
\r
1476 Maximum PKSFX segment size = 2,147,483,647 bytes
\r
1478 Segment sizes may differ; however, by convention, all
\r
1479 segment sizes should be the same with the exception of the
\r
1480 last, which may be smaller. Local and central directory
\r
1481 header records must never be split across a segment boundary.
\r
1482 When writing a header record, if the number of bytes remaining
\r
1483 within a segment is less than the size of the header record,
\r
1484 end the current segment and write the header at the start
\r
1485 of the next segment. The central directory may span segment
\r
1486 boundaries, but no single record in the central directory
\r
1487 should be split across segments.
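The rule for keeping header records whole can be expressed as a small check made before each header is written. This is a sketch with hypothetical names. It covers only the placement decision, not the writing of segments.

```python
def place_header(segment_size: int, bytes_used: int, header_size: int):
    """Return (start_new_segment, offset) for writing one header record."""
    if header_size > segment_size:
        raise ValueError("header record is larger than a whole segment")
    remaining = segment_size - bytes_used
    if remaining < header_size:
        return True, 0        # end this segment; header starts the next one
    return False, bytes_used  # header fits in the current segment

placement_tight = place_header(65536, 65530, 30)
placement_roomy = place_header(65536, 100, 30)
```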
\r
1489 Spanned/Split archives created using PKZIP for Windows
\r
1490 (V2.50 or greater), PKZIP Command Line (V2.50 or greater),
\r
1491 or PKZIP Explorer will include a special spanning
\r
1492 signature as the first 4 bytes of the first segment of
\r
1493 the archive. This signature (0x08074b50) will be
\r
1494 followed immediately by the local header signature for
\r
1495 the first file in the archive.
\r
1497 A special spanning marker may also appear in spanned/split
\r
1498 archives if the spanning or splitting process starts but
\r
1499 only requires one segment. In this case the 0x08074b50
\r
1500 signature will be replaced with the temporary spanning
\r
1501 marker signature of 0x30304b50. Split archives can
\r
1502 only be uncompressed by other versions of PKZIP that
\r
1503 know how to create a split archive.
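A reader can recognize either marker by examining the first 4 bytes of the first segment. The sketch below uses a hypothetical function name and inspects offset 0 only.

```python
import struct

def leading_marker(first_segment: bytes) -> str:
    """Describe the marker, if any, at the start of the first segment."""
    if len(first_segment) < 4:
        return "none"
    (sig,) = struct.unpack_from("<L", first_segment, 0)
    if sig == 0x08074B50:
        return "spanning signature"
    if sig == 0x30304B50:
        return "temporary spanning marker"
    return "none"

marker_spanned = leading_marker(b"PK\x07\x08" + b"PK\x03\x04")
marker_single = leading_marker(b"PK00" + b"PK\x03\x04")
```

When either marker is present, the local header signature of the first file begins at offset 4 rather than offset 0.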
\r
1505 The signature value 0x08074b50 is also used by some
\r
1506 ZIP implementations as a marker for the Data Descriptor
\r
1507 record. Conflict in this alternate assignment can be
\r
1508 avoided by checking the position of the signature
\r
1509 within the ZIP file to determine the use for which it
\r
1514 1) All fields unless otherwise noted are unsigned and stored
\r
1515 in Intel low-byte:high-byte, low-word:high-word order.
\r
1517 2) String fields are not null terminated, since the
\r
1518 length is given explicitly.
\r
1520 3) The entries in the central directory may not necessarily
\r
1521 be in the same order that files appear in the .ZIP file.
\r
1523 4) If one of the fields in the end of central directory
\r
1524 record is too small to hold required data, the field
\r
1525 should be set to -1 (0xFFFF or 0xFFFFFFFF) and the
\r
1526 ZIP64 format record should be created.
\r
1528 5) The end of central directory record and the
\r
1529 Zip64 end of central directory locator record must
\r
1530 reside on the same disk when splitting or spanning
\r
1533 VI. Explanation of compression methods
\r
1534 --------------------------------------
\r
1536 UnShrinking - Method 1
\r
1537 ----------------------
\r
1539 Shrinking is a Dynamic Ziv-Lempel-Welch compression algorithm
\r
1540 with partial clearing. The initial code size is 9 bits, and
\r
1541 the maximum code size is 13 bits. Shrinking differs from
\r
1542 conventional Dynamic Ziv-Lempel-Welch implementations in several
\r
1545 1) The code size is controlled by the compressor, and is not
\r
1546 automatically increased when codes larger than the current
\r
1547 code size are created (but not necessarily used). When
\r
1548 the decompressor encounters the code sequence 256
\r
1549 (decimal) followed by 1, it should increase the code size
\r
1550 read from the input stream to the next bit size. No
\r
1551 blocking of the codes is performed, so the next code at
\r
1552 the increased size should be read from the input stream
\r
1553 immediately after where the previous code at the smaller
\r
1554 bit size was read. Again, the decompressor should not
\r
1555 increase the code size used until the sequence 256,1 is
\r
1558 2) When the table becomes full, total clearing is not
\r
1559 performed. Rather, when the compressor emits the code
\r
1560 sequence 256,2 (decimal), the decompressor should clear
\r
1561 all leaf nodes from the Ziv-Lempel tree, and continue to
\r
1562 use the current code size. The nodes that are cleared
\r
1563 from the Ziv-Lempel tree are then re-used, with the lowest
\r
1564 code value re-used first, and the highest code value
\r
1565 re-used last. The compressor can emit the sequence 256,2
\r
1568 Expanding - Methods 2-5
\r
1569 -----------------------
\r
1571 The Reducing algorithm is actually a combination of two
\r
1572 distinct algorithms. The first algorithm compresses repeated
\r
1573 byte sequences, and the second algorithm takes the compressed
\r
1574 stream from the first algorithm and applies a probabilistic
\r
1575 compression method.
\r
1577 The probabilistic compression stores an array of 'follower
\r
1578 sets' S(j), for j=0 to 255, corresponding to each possible
\r
1579 ASCII character. Each set contains between 0 and 32
\r
1580 characters, to be denoted as S(j)[0],...,S(j)[m], where m<32.
\r
1581 The sets are stored at the beginning of the data area for a
\r
1582 Reduced file, in reverse order, with S(255) first, and S(0)
\r
1585 The sets are encoded as { N(j), S(j)[0],...,S(j)[N(j)-1] },
\r
1586 where N(j) is the size of set S(j). N(j) can be 0, in which
\r
1587 case the follower set for S(j) is empty. Each N(j) value is
\r
1588 encoded in 6 bits, followed by N(j) eight bit character values
\r
1589 corresponding to S(j)[0] to S(j)[N(j)-1] respectively. If
\r
1590 N(j) is 0, then no values for S(j) are stored, and the value
\r
1591 for N(j-1) immediately follows.
\r
1593 Immediately after the follower sets is the compressed data
\r
1594 stream. The compressed data stream can be interpreted for the
\r
1595 probabilistic decompression as follows:
\r
1597 let Last-Character <- 0.
\r
1599 if the follower set S(Last-Character) is empty then
\r
1600 read 8 bits from the input stream, and copy this
\r
1601 value to the output stream.
\r
1602 otherwise if the follower set S(Last-Character) is non-empty then
\r
1603 read 1 bit from the input stream.
\r
1604 if this bit is not zero then
\r
1605 read 8 bits from the input stream, and copy this
\r
1606 value to the output stream.
\r
1607 otherwise if this bit is zero then
\r
1608 read B(N(Last-Character)) bits from the input
\r
1609 stream, and assign this value to I.
\r
1610 Copy the value of S(Last-Character)[I] to the
\r
1613 assign the last value placed on the output stream to
\r
1617 B(N(j)) is defined as the minimal number of bits required to
\r
1618 encode the value N(j)-1.
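The definition of B(N(j)) can be sketched as a one-line function. The function name is hypothetical. Reading one bit when the follower set has a single member (so that N(j)-1 is 0) is an assumption; the text above does not address that case.

```python
def follower_index_bits(n: int) -> int:
    """B(N(j)): bits needed to encode the value n - 1, for 1 <= n <= 32."""
    if not 1 <= n <= 32:
        raise ValueError("follower set size must be between 1 and 32")
    # Assumption: at least one bit is read even when n - 1 is 0.
    return max(1, (n - 1).bit_length())

bit_widths = [follower_index_bits(n) for n in (1, 2, 3, 4, 5, 8, 9, 16, 17, 32)]
```

An empty follower set never reaches this function, because the decompressor reads a full 8-bit value in that case.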
\r
1620 The decompressed stream from above can then be expanded to
\r
1621 re-create the original file as follows:
\r
1626 read 8 bits from the input stream into C.
\r
1628 0: if C is not equal to DLE (144 decimal) then
\r
1629 copy C to the output stream.
\r
1630 otherwise if C is equal to DLE then
\r
1633 1: if C is non-zero then
\r
1636 let State <- F(Len).
\r
1637 otherwise if C is zero then
\r
1638 copy the value 144 (decimal) to the output stream.
\r
1641 2: let Len <- Len + C
\r
1644 3: move backwards D(V,C) bytes in the output stream
\r
1645 (if this position is before the start of the output
\r
1646 stream, then assume that all the data before the
\r
1647 start of the output stream is filled with zeros).
\r
1648 copy Len+3 bytes from this position to the output stream.
\r
1653 The functions F, L, and D depend on the 'compression
\r
1654 factor', 1 through 4, and are defined as follows:
\r
1656 For compression factor 1:
\r
1657 L(X) equals the lower 7 bits of X.
\r
1658 F(X) equals 2 if X equals 127 otherwise F(X) equals 3.
\r
1659 D(X,Y) equals the (upper 1 bit of X) * 256 + Y + 1.
\r
1660 For compression factor 2:
\r
1661 L(X) equals the lower 6 bits of X.
\r
1662 F(X) equals 2 if X equals 63 otherwise F(X) equals 3.
\r
1663 D(X,Y) equals the (upper 2 bits of X) * 256 + Y + 1.
\r
1664 For compression factor 3:
\r
1665 L(X) equals the lower 5 bits of X.
\r
1666 F(X) equals 2 if X equals 31 otherwise F(X) equals 3.
\r
1667 D(X,Y) equals the (upper 3 bits of X) * 256 + Y + 1.
\r
1668 For compression factor 4:
\r
1669 L(X) equals the lower 4 bits of X.
\r
1670 F(X) equals 2 if X equals 15 otherwise F(X) equals 3.
\r
1671 D(X,Y) equals the (upper 4 bits of X) * 256 + Y + 1.
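
The expansion state machine and the functions F, L, and D can be sketched together as follows. This is a minimal Python sketch, not a reference implementation: the name expand_reduced is hypothetical, and the assignments of V and Len in state 1, together with the transitions back to state 0, are assumptions inferred from how V, Len, and F are used above.

```python
DLE = 144  # the escape value named in the text


def expand_reduced(data: bytes, factor: int) -> bytes:
    """Expand the output of the probabilistic stage for compression factor 1..4."""
    if factor not in (1, 2, 3, 4):
        raise ValueError("compression factor must be 1 through 4")
    low_bits = 8 - factor            # L(X) keeps the lower 7, 6, 5 or 4 bits
    mask = (1 << low_bits) - 1       # also the value at which F(X) returns 2
    out = bytearray()
    state, v, length = 0, 0, 0
    for c in data:
        if state == 0:
            if c != DLE:
                out.append(c)
            else:
                state = 1
        elif state == 1:
            if c != 0:
                v = c                                 # assumed: V <- C
                length = v & mask                     # assumed: Len <- L(V)
                state = 2 if length == mask else 3    # State <- F(Len)
            else:
                out.append(DLE)                       # escaped literal 144
                state = 0                             # assumed transition
        elif state == 2:
            length += c
            state = 3                                 # assumed transition
        else:
            distance = (v >> low_bits) * 256 + c + 1  # D(V, C)
            start = len(out) - distance
            for i in range(length + 3):
                pos = start + i
                # Positions before the start of the output read as zero.
                out.append(out[pos] if pos >= 0 else 0)
            state = 0                                 # assumed transition
    return bytes(out)
```

The copy is done one byte at a time so that a match may overlap the bytes it is producing.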
\r
1673 Imploding - Method 6
\r
1674 --------------------
\r
1676 The Imploding algorithm is actually a combination of two distinct
\r
1677 algorithms. The first algorithm compresses repeated byte
\r
1678 sequences using a sliding dictionary. The second algorithm is
\r
1679 used to compress the encoding of the sliding dictionary output,
\r
1680 using multiple Shannon-Fano trees.
\r
1682 The Imploding algorithm can use a 4K or 8K sliding dictionary
\r
1683 size. The dictionary size used can be determined by bit 1 in the
\r
1684 general purpose flag word; a 0 bit indicates a 4K dictionary
\r
1685 while a 1 bit indicates an 8K dictionary.
\r
1687 The Shannon-Fano trees are stored at the start of the compressed
\r
1688 file. The number of trees stored is defined by bit 2 in the
\r
1689 general purpose flag word; a 0 bit indicates two trees stored, a
\r
1690 1 bit indicates three trees are stored. If 3 trees are stored,
\r
1691 the first Shannon-Fano tree represents the encoding of the
\r
1692 Literal characters, the second tree represents the encoding of
\r
1693 the Length information, the third represents the encoding of the
\r
1694 Distance information. When 2 Shannon-Fano trees are stored, the
\r
1695 Length tree is stored first, followed by the Distance tree.
\r
1697 The Literal Shannon-Fano tree, if present, is used to represent
\r
1698 the entire ASCII character set, and contains 256 values. This
\r
1699 tree is used to compress any data not compressed by the sliding
\r
1700 dictionary algorithm. When this tree is present, the Minimum
\r
1701 Match Length for the sliding dictionary is 3. If this tree is
\r
1702 not present, the Minimum Match Length is 2.
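
The parameters that follow from the general purpose flag word can be summarized in a short Python sketch. The function name implode_parameters and the dictionary keys are hypothetical; only the bit meanings come from the text above.

```python
def implode_parameters(general_purpose_flags: int) -> dict:
    """Derive the Imploding parameters from the general purpose flag word."""
    dict_8k = bool(general_purpose_flags & 0x0002)      # bit 1: dictionary size
    three_trees = bool(general_purpose_flags & 0x0004)  # bit 2: number of trees
    return {
        "dictionary_size": 8192 if dict_8k else 4096,
        "tree_count": 3 if three_trees else 2,
        "has_literal_tree": three_trees,
        # The Minimum Match Length is 3 only when the Literal tree is present.
        "min_match_length": 3 if three_trees else 2,
    }
```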
\r
1704 The Length Shannon-Fano tree is used to compress the Length part
\r
1705 of the (length,distance) pairs from the sliding dictionary
\r
1706 output. The Length tree contains 64 values, ranging from the
\r
1707 Minimum Match Length, to 63 plus the Minimum Match Length.
\r
1709 The Distance Shannon-Fano tree is used to compress the Distance
\r
1710 part of the (length,distance) pairs from the sliding dictionary
\r
1711 output. The Distance tree contains 64 values, ranging from 0 to
\r
1712 63, representing the upper 6 bits of the distance value. The
\r
1713 distance values themselves will be between 0 and the sliding
\r
1714 dictionary size, either 4K or 8K.
\r
1716 The Shannon-Fano trees themselves are stored in a compressed
\r
1717 format. The first byte of the tree data represents the number of
\r
1718 bytes of data representing the (compressed) Shannon-Fano tree
\r
1719 minus 1. The remaining bytes represent the Shannon-Fano tree
\r
1722 High 4 bits: Number of values at this bit length + 1. (1 - 16)
\r
1723 Low 4 bits: Bit Length needed to represent value + 1. (1 - 16)
\r
1725 The Shannon-Fano codes can be constructed from the bit lengths
\r
1726 using the following algorithm:
\r
1728 1) Sort the Bit Lengths in ascending order, while retaining the
\r
1729 order of the original lengths stored in the file.
\r
1731 2) Generate the Shannon-Fano trees:
\r
1734 CodeIncrement <- 0
\r
1735 LastBitLength <- 0
\r
1736 i <- number of Shannon-Fano codes - 1 (either 255 or 63)
\r
1739 Code = Code + CodeIncrement
\r
1740 if BitLength(i) <> LastBitLength then
\r
1741 LastBitLength=BitLength(i)
\r
1742 CodeIncrement = 1 shifted left (16 - LastBitLength)
\r
1743 ShannonCode(i) = Code
\r
1747 3) Reverse the order of all the bits in the above ShannonCode()
\r
1748 vector, so that the most significant bit becomes the least
\r
1749 significant bit. For example, the value 0x1234 (hex) would
\r
1750 become 0x2C48 (hex).
\r
1752 4) Restore the order of Shannon-Fano codes as originally stored
\r
1757 This example will show the encoding of a Shannon-Fano tree
\r
1758 of size 8. Notice that the actual Shannon-Fano trees used
\r
1759 for Imploding are either 64 or 256 entries in size.
\r
1761 Example: 0x02, 0x42, 0x01, 0x13
\r
1763 The first byte indicates 3 values in this table. Decoding the
\r
1765 0x42 = 5 codes of 3 bits long
\r
1766 0x01 = 1 code of 2 bits long
\r
1767 0x13 = 2 codes of 4 bits long
\r
1769 This would generate the original bit length array of:
\r
1770 (3, 3, 3, 3, 3, 2, 4, 4)
\r
1772 There are 8 codes in this table for the values 0 thru 7. Using
\r
1773 the algorithm to obtain the Shannon-Fano codes produces:
\r
1775                                  Reversed  Order     Original
\r
1776 Val  Sorted  Constructed Code    Value     Restored  Length
\r
1777 ---  ------  ----------------    --------  --------  --------
\r
1778 0:   2       1100000000000000    11        101       3
\r
1779 1:   3       1010000000000000    101       001       3
\r
1780 2:   3       1000000000000000    001       110       3
\r
1781 3:   3       0110000000000000    110       010       3
\r
1782 4:   3       0100000000000000    010       100       3
\r
1783 5:   3       0010000000000000    100       11        2
\r
1784 6:   4       0001000000000000    1000      1000      4
\r
1785 7:   4       0000000000000000    0000      0000      4
\r
1787 The values in the Val, Order Restored and Original Length columns
\r
1788 now represent the Shannon-Fano encoding tree that can be used for
\r
1789 decoding the Shannon-Fano encoded data. How to parse the
\r
1790 variable length Shannon-Fano values from the data stream is beyond
\r
1791 the scope of this document. (See the references listed at the end of
\r
1792 this document for more information.) However, traditional decoding
\r
1793 schemes used for Huffman variable length decoding, such as the
\r
1794 Greenlaw algorithm, can be successfully applied.
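
The tree decoding and code construction steps above can be sketched in Python as follows. The function names are hypothetical. The loop condition and the decrement of i, which the listing above does not show, are assumed to walk the sorted entries from last to first. The sketch reproduces the Order Restored column of the example.

```python
def read_bit_lengths(tree: bytes) -> list[int]:
    """Expand the compressed tree bytes into one bit length per value."""
    count = tree[0] + 1                      # first byte is the byte count minus 1
    lengths = []
    for b in tree[1:1 + count]:
        repeat = (b >> 4) + 1                # high 4 bits: number of values, minus 1
        bit_length = (b & 0x0F) + 1          # low 4 bits: bit length, minus 1
        lengths.extend([bit_length] * repeat)
    return lengths


def reverse16(value: int) -> int:
    """Reverse the order of the bits in a 16-bit value."""
    return int(f"{value:016b}"[::-1], 2)


def shannon_fano_codes(lengths: list[int]) -> list[int]:
    """Return the bit-reversed code for each value, in original value order."""
    # Step 1: stable sort by bit length, remembering the original positions.
    order = sorted(range(len(lengths)), key=lambda v: lengths[v])
    codes = [0] * len(lengths)
    code = increment = last_length = 0
    # Step 2: assign codes from the last sorted entry down to the first.
    for v in reversed(order):
        code += increment
        if lengths[v] != last_length:
            last_length = lengths[v]
            increment = 1 << (16 - last_length)
        # Steps 3 and 4: reverse the bits and store at the original position.
        codes[v] = reverse16(code)
    return codes
```

Applied to the example bytes 0x02, 0x42, 0x01, 0x13, read_bit_lengths returns (3, 3, 3, 3, 3, 2, 4, 4), and the low bits of each resulting code match the Order Restored column.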
\r
1796 The compressed data stream begins immediately after the
\r
1797 compressed Shannon-Fano data. The compressed data stream can be
\r
1798 interpreted as follows:
\r
1801 read 1 bit from input stream.
\r
1803 if this bit is non-zero then (encoded data is literal data)
\r
1804 if Literal Shannon-Fano tree is present
\r
1805 read and decode character using Literal Shannon-Fano tree.
\r
1807 read 8 bits from input stream.
\r
1808 copy character to the output stream.
\r
1809 otherwise (encoded data is sliding dictionary match)
\r
1810 if 8K dictionary size
\r
1811 read 7 bits for offset Distance (lower 7 bits of offset).
\r
1813 read 6 bits for offset Distance (lower 6 bits of offset).
\r
1815 using the Distance Shannon-Fano tree, read and decode the
\r
1816 upper 6 bits of the Distance value.
\r
1818 using the Length Shannon-Fano tree, read and decode
\r
1821 Length <- Length + Minimum Match Length
\r
1823 if Length = 63 + Minimum Match Length
\r
1824 read 8 bits from the input stream,
\r
1825 add this value to Length.
\r
1827 move backwards Distance+1 bytes in the output stream, and
\r
1828 copy Length characters from this position to the output
\r
1829 stream. (if this position is before the start of the output
\r
1830 stream, then assume that all the data before the start of
\r
1831 the output stream is filled with zeros).
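
The arithmetic of a sliding dictionary match can be sketched in Python as shown below. The sketch assumes the Shannon-Fano symbols and raw bits have already been decoded; the function names and the read_byte callback are hypothetical helpers, not part of this specification.

```python
def match_distance(high6: int, low: int, dict_8k: bool) -> int:
    """Combine the decoded upper 6 bits with the raw lower 7 (8K) or 6 (4K) bits."""
    low_bits = 7 if dict_8k else 6
    return (high6 << low_bits) | low


def match_length(symbol: int, min_match: int, read_byte) -> int:
    """Add the Minimum Match Length, reading one extra byte at the maximum."""
    length = symbol + min_match
    if length == 63 + min_match:
        length += read_byte()
    return length


def copy_match(out: bytearray, distance: int, length: int) -> None:
    """Copy length bytes starting distance + 1 bytes back in the output."""
    start = len(out) - (distance + 1)
    for i in range(length):
        pos = start + i
        # Data before the start of the output stream is treated as zeros.
        out.append(out[pos] if pos >= 0 else 0)
```

Copying one byte at a time lets a match run past the original end of the output, which is how short repeating patterns are expanded.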
\r
1834 Tokenizing - Method 7
\r
1835 ---------------------
\r
1837 This method is not used by PKZIP.
\r
1839 Deflating - Method 8
\r
1840 --------------------
\r
1842 The Deflate algorithm is similar to the Implode algorithm using
\r
1843 a sliding dictionary of up to 32K with secondary compression
\r
1844 from Huffman/Shannon-Fano codes.
\r
1846 The compressed data is stored in blocks with a header describing
\r
1847 the block and the Huffman codes used in the data block. The header
\r
1848 format is as follows:
\r
1850 Bit 0: Last Block bit This bit is set to 1 if this is the last
\r
1851 compressed block in the data.
\r
1852 Bits 1-2: Block type
\r
1853 00 (0) - Block is stored - All stored data is byte aligned.
\r
1854 Skip bits until next byte, then next word = block
\r
1855 length, followed by the ones' complement of the block
\r
1856 length word. Remaining data in block is the stored
\r
1859 01 (1) - Use fixed Huffman codes for literal and distance codes.
\r
1860 Lit Code Bits Dist Code Bits
\r
1861 --------- ---- --------- ----
\r
1862 0 - 143 8 0 - 31 5
\r
1867 Literal codes 286-287 and distance codes 30-31 are
\r
1868 never used but participate in the Huffman construction.
\r
1870 10 (2) - Dynamic Huffman codes. (See expanding Huffman codes)
\r
1872 11 (3) - Reserved - Flag an "Error in compressed data" if seen.
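
Reading the block header and a stored block can be sketched in Python as follows. The function names are hypothetical. The sketch assumes that header bits are taken starting from the least significant bit of the byte, and that the block length word and its complement are 16-bit little-endian values; both assumptions follow common Deflate practice rather than the text above.

```python
import struct


def parse_block_header(first_byte: int) -> tuple[bool, int]:
    """Return (is_last_block, block_type) from the 3 header bits."""
    is_last = bool(first_byte & 0x01)        # bit 0: Last Block bit
    block_type = (first_byte >> 1) & 0x03    # bits 1-2: block type
    if block_type == 3:
        raise ValueError("Error in compressed data")
    return is_last, block_type


def read_stored_block(data: bytes, pos: int) -> tuple[bytes, int]:
    """Read a stored block whose length word starts at byte offset pos.

    Returns the stored bytes and the offset just past the block.
    """
    length, complement = struct.unpack_from("<HH", data, pos)
    if length ^ complement != 0xFFFF:
        raise ValueError("stored block length does not match its complement")
    start = pos + 4
    return data[start:start + length], start + length
```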
\r
1874 Expanding Huffman Codes
\r
1875 -----------------------
\r
1876 If the data block is stored with dynamic Huffman codes, the Huffman
\r
1877 codes are sent in the following compressed format:
\r
1879 5 Bits: # of Literal codes sent - 257 (257 - 286)
\r
1880 All other codes are never sent.
\r
1881 5 Bits: # of Dist codes - 1 (1 - 32)
\r
1882 4 Bits: # of Bit Length codes - 4 (4 - 19)
\r
1884 The Huffman codes are sent as bit lengths and the codes are built as
\r
1885 described in the implode algorithm. The bit lengths themselves are
\r
1886 compressed with Huffman codes. There are 19 bit length codes:
\r
1888 0 - 15: Represent bit lengths of 0 - 15
\r
1889 16: Copy the previous bit length 3 - 6 times.
\r
1890 The next 2 bits indicate repeat length (0 = 3, ..., 3 = 6)
\r
1891 Example: Codes 8, 16 (+2 bits 11), 16 (+2 bits 10) will
\r
1892 expand to 12 bit lengths of 8 (1 + 6 + 5)
\r
1893 17: Repeat a bit length of 0 for 3 - 10 times. (3 bits of length)
\r
1894 18: Repeat a bit length of 0 for 11 - 138 times (7 bits of length)
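
The expansion of these 19 codes can be sketched in Python as follows. The function name expand_code_lengths is hypothetical, and the input is assumed to be a list of (code, extra bits value) pairs that have already been decoded from the stream.

```python
def expand_code_lengths(symbols: list[tuple[int, int]]) -> list[int]:
    """Expand (code, extra) pairs into the list of bit lengths they describe."""
    lengths: list[int] = []
    for code, extra in symbols:
        if 0 <= code <= 15:
            lengths.append(code)                          # literal bit length
        elif code == 16:
            if not lengths:
                raise ValueError("code 16 has no previous bit length to copy")
            lengths.extend([lengths[-1]] * (3 + extra))   # 2 extra bits: 3 - 6
        elif code == 17:
            lengths.extend([0] * (3 + extra))             # 3 extra bits: 3 - 10
        elif code == 18:
            lengths.extend([0] * (11 + extra))            # 7 extra bits: 11 - 138
        else:
            raise ValueError("invalid bit length code")
    return lengths
```

With the example from the text, the pairs (8, 0), (16, 3), (16, 2) expand to 12 bit lengths of 8.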
\r
1896 The lengths of the bit length codes are sent packed 3 bits per value
\r
1897 (0 - 7) in the following order:
\r
1899 16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
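
The transmission order can be applied with a short Python sketch. The function name is hypothetical, and giving a length of zero to any code that is not sent is an assumption taken from common Deflate practice.

```python
BIT_LENGTH_CODE_ORDER = (16, 17, 18, 0, 8, 7, 9, 6, 10, 5,
                         11, 4, 12, 3, 13, 2, 14, 1, 15)


def place_bit_length_code_lengths(sent: list[int]) -> list[int]:
    """Map the 3-bit values, in transmitted order, onto the 19 bit length codes."""
    lengths = [0] * 19                 # assumed: codes not sent have length 0
    for code, value in zip(BIT_LENGTH_CODE_ORDER, sent):
        lengths[code] = value
    return lengths
```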
\r
1901 The Huffman codes should be built as described in the Implode algorithm
\r
1902 except codes are assigned starting at the shortest bit length, i.e. the
\r
1903 shortest code should be all 0's rather than all 1's. Also, codes with
\r
1904 a bit length of zero do not participate in the tree construction. The
\r
1905 codes are then used to decode the bit lengths for the literal and distance tables.
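
One common way to assign codes that start at the shortest bit length and skip zero-length entries is the canonical construction sketched below in Python. The function name is hypothetical, and the sketch follows the usual Deflate practice rather than a procedure spelled out in this document.

```python
def canonical_huffman_codes(lengths: list[int]) -> list:
    """Return the code for each symbol, or None for symbols of bit length 0."""
    max_length = max(lengths, default=0)
    # Count how many codes exist at each bit length, ignoring length 0.
    count = [0] * (max_length + 1)
    for n in lengths:
        if n:
            count[n] += 1
    # The first code of each length follows the last code of the previous length.
    next_code = [0] * (max_length + 1)
    code = 0
    for bits in range(1, max_length + 1):
        code = (code + count[bits - 1]) << 1
        next_code[bits] = code
    # Within one length, codes are handed out in symbol order.
    codes = []
    for n in lengths:
        if n:
            codes.append(next_code[n])
            next_code[n] += 1
        else:
            codes.append(None)
    return codes
```

For the bit lengths (3, 3, 3, 3, 3, 2, 4, 4) this yields 010, 011, 100, 101, 110, 00, 1110, 1111, so the shortest code is all 0's.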
\r
1908 The bit lengths for the literal tables are sent first with the number
\r
1909 of entries sent described by the 5 bits sent earlier. There are up
\r
1910 to 286 literal characters; the first 256 represent the respective 8
\r
1911 bit character, code 256 represents the End-Of-Block code, the remaining
\r
1912 29 codes represent copy lengths of 3 thru 258. There are up to 30
\r
1913 distance codes representing distances from 1 thru 32k as described
\r
1918     Extra                Extra                Extra                Extra
\r
1919 Code Bits Lengths    Code Bits Lengths    Code Bits Lengths    Code Bits Lengths
\r
1920 ---- ---- ---------  ---- ---- ---------  ---- ---- ---------  ---- ---- ---------
\r
1921  257    0 3           265    1 11,12       273    3 35-42       281    5 131-162
\r
1922  258    0 4           266    1 13,14       274    3 43-50       282    5 163-194
\r
1923  259    0 5           267    1 15,16       275    3 51-58       283    5 195-226
\r
1924  260    0 6           268    1 17,18       276    3 59-66       284    5 227-257
\r
1925  261    0 7           269    2 19-22       277    4 67-82       285    0 258
\r
1926  262    0 8           270    2 23-26       278    4 83-98
\r
1927  263    0 9           271    2 27-30       279    4 99-114
\r
1928  264    0 10          272    2 31-34       280    4 115-130
\r
1932     Extra                  Extra                  Extra                  Extra
\r
1933 Code Bits Distance     Code Bits Distance     Code Bits Distance     Code Bits Distance
\r
1934 ---- ---- -----------  ---- ---- -----------  ---- ---- -----------  ---- ---- -----------
\r
1935    0    0 1               8    3 17-24          16    7 257-384        24   11 4097-6144
\r
1936    1    0 2               9    3 25-32          17    7 385-512        25   11 6145-8192
\r
1937    2    0 3              10    4 33-48          18    8 513-768        26   12 8193-12288
\r
1938    3    0 4              11    4 49-64          19    8 769-1024       27   12 12289-16384
\r
1939    4    1 5,6            12    5 65-96          20    9 1025-1536      28   13 16385-24576
\r
1940    5    1 7,8            13    5 97-128         21    9 1537-2048      29   13 24577-32768
\r
1941    6    2 9-12           14    6 129-192        22   10 2049-3072
\r
1942    7    2 13-16          15    6 193-256        23   10 3073-4096
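
The two tables can be turned into lookup arrays, as in the Python sketch below. The table values are transcribed from the rows above; the function names are hypothetical, and the extra bits are assumed to have been read already as an unsigned integer.

```python
LENGTH_BASE = [3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
               35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258]
LENGTH_EXTRA = [0] * 8 + [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4 + [5] * 4 + [0]

DIST_BASE = [1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
             257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
             8193, 12289, 16385, 24577]
DIST_EXTRA = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,
              7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13]


def decode_length(code: int, extra_value: int) -> int:
    """Return the copy length for literal/length codes 257 through 285."""
    index = code - 257
    if extra_value >> LENGTH_EXTRA[index]:
        raise ValueError("extra value does not fit the extra bit count")
    return LENGTH_BASE[index] + extra_value


def decode_distance(code: int, extra_value: int) -> int:
    """Return the distance for distance codes 0 through 29."""
    if extra_value >> DIST_EXTRA[code]:
        raise ValueError("extra value does not fit the extra bit count")
    return DIST_BASE[code] + extra_value
```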
\r
1944 The compressed data stream begins immediately after the
\r
1945 compressed header data. The compressed data stream can be
\r
1946 interpreted as follows:
\r
1949 read header from input stream.
\r
1952 skip bits until byte aligned
\r
1953 read count and 1's complement of count
\r
1954 copy count bytes data block
\r
1956 loop until end of block code sent
\r
1957 decode literal character from input stream
\r
1959 copy character to the output stream
\r
1961 if literal = end of block
\r
1964 decode distance from input stream
\r
1966 move backwards distance bytes in the output stream, and
\r
1967 copy length characters from this position to the output
\r
1970 while not last block
\r
1972 if data descriptor exists
\r
1973 skip bits until byte aligned
\r
1974 read crc and sizes
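
When testing a hand-written decoder, its output can be compared with an existing Deflate implementation. The Python sketch below uses the standard zlib module with a negative wbits value, which selects a raw Deflate stream without a zlib header. Treating method 8 data as such a raw stream is an assumption drawn from common practice, and the function names are hypothetical.

```python
import zlib


def inflate_raw(data: bytes) -> bytes:
    """Decompress a raw Deflate stream (no zlib or gzip wrapper)."""
    return zlib.decompress(data, wbits=-15)


def deflate_raw(data: bytes) -> bytes:
    """Compress to a raw Deflate stream, for round-trip testing only."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()
```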
\r
1977 Enhanced Deflating - Method 9
\r
1978 -----------------------------
\r
1980 The Enhanced Deflating algorithm is similar to Deflate but
\r
1981 uses a sliding dictionary of up to 64K. Deflate64(tm) is supported
\r
1982 by the Deflate extractor.
\r
1987 BZIP2 is an open-source data compression algorithm developed by
\r
1988 Julian Seward. Information and source code for this algorithm
\r
1989 can be found on the internet.
\r
1991 LZMA - Method 14 (EFS)
\r
1992 ----------------------
\r
1994 LZMA is a block-oriented, general purpose data compression algorithm
\r
1995 developed and maintained by Igor Pavlov. It is a derivative of LZ77
\r
1996 that utilizes Markov chains and a range coder. Information and
\r
1997 source code for this algorithm can be found on the internet. Consult
\r
1998 with the author of this algorithm for information on terms or
\r
1999 restrictions on use.
\r
2001 Support for LZMA within the ZIP format is defined as follows:
\r
2003 The Compression method field within the ZIP Local and Central
\r
2004 Header records will be set to the value 14 to indicate data was
\r
2005 compressed using LZMA.
\r
2007 The Version needed to extract field within the ZIP Local and
\r
2008 Central Header records will be set to 6.3 to indicate the
\r
2009 minimum ZIP format version supporting this feature.
\r
2011 File data compressed using the LZMA algorithm must be placed
\r
2012 immediately following the Local Header for the file. If a
\r
2013 standard ZIP encryption header is required, it will follow
\r
2014 the Local Header and will precede the LZMA compressed file
\r
2015 data segment. The location of LZMA compressed data segment
\r
2016 within the ZIP format will be as shown:
\r
2018 [local header file 1]
\r
2019 [encryption header file 1]
\r
2020 [LZMA compressed data segment for file 1]
\r
2021 [data descriptor 1]
\r
2022 [local header file 2]
\r
2024 The encryption header and data descriptor records may
\r
2025 be conditionally present. The LZMA Compressed Data Segment
\r
2026 will consist of an LZMA Properties Header followed by the
\r
2027 LZMA Compressed Data as shown:
\r
2029 [LZMA properties header for file 1]
\r
2030 [LZMA compressed data for file 1]
\r
2032 The LZMA Compressed Data will be stored as provided by the
\r
2033 LZMA compression library. Compressed size, uncompressed
\r
2034 size and other file characteristics about the file being
\r
2035 compressed must be stored in standard ZIP storage format.
\r
2037 The LZMA Properties Header will store specific data required to
\r
2038 decompress the LZMA compressed Data. This data is set by the
\r
2039 LZMA compression engine using the function WriteCoderProperties()
\r
2040 as documented within the LZMA SDK.
\r
2042 Storage fields for the property information within the LZMA
\r
2043 Properties Header are as follows:
\r
2045 LZMA Version Information 2 bytes
\r
2046 LZMA Properties Size 2 bytes
\r
2047 LZMA Properties Data variable, defined by "LZMA Properties Size"
\r
2049 LZMA Version Information - this field identifies which version of
\r
2050 the LZMA SDK was used to compress a file. The first byte will
\r
2051 store the major version number of the LZMA SDK and the second
\r
2052 byte will store the minor number.
\r
2054 LZMA Properties Size - this field defines the size of the remaining
\r
2055 property data. Typically this size should be determined by the
\r
2056 version of the SDK. This size field is included as a convenience
\r
2057 and to help avoid any ambiguity should it arise in the future due
\r
2058 to changes in this compression algorithm.
\r
2060 LZMA Property Data - this variable sized field records the required
\r
2061 values for the decompressor as defined by the LZMA SDK. The
\r
2062 data stored in this field should be obtained using the
\r
2063 WriteCoderProperties() in the version of the SDK defined by
\r
2064 the "LZMA Version Information" field.
\r
2066 The layout of the "LZMA Properties Data" field is a function of the
\r
2067 LZMA compression algorithm. It is possible that this layout may be
\r
2068 changed by the author over time. The data layout in version 4.32
\r
2069 of the LZMA SDK defines a 5 byte array that uses 4 bytes to store
\r
2070 the dictionary size in little-endian order. This is preceded by a
\r
2071 single packed byte as the first element of the array that contains
\r
2072 the following fields:
\r
2075 LiteralPosStateBits
\r
2076 LiteralContextBits
\r
2078 Refer to the LZMA documentation for a more detailed explanation of
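
Reading the LZMA Properties Header can be sketched in Python as follows. The function name and dictionary keys are hypothetical. The sketch assumes the size field is little-endian and that the packed byte uses the LZMA SDK convention (PosStateBits * 5 + LiteralPosStateBits) * 9 + LiteralContextBits; neither detail is stated in the text above.

```python
import struct


def parse_lzma_properties_header(data: bytes) -> dict:
    """Parse the version, size and (for 5-byte layouts) the property fields."""
    major, minor, size = struct.unpack_from("<BBH", data, 0)
    props = data[4:4 + size]
    if len(props) != size:
        raise ValueError("truncated LZMA Properties Data")
    info = {
        "sdk_version": (major, minor),
        "properties_size": size,
        "header_length": 4 + size,   # compressed data starts at this offset
    }
    if size == 5:
        packed = props[0]
        # Assumed packing: (pb * 5 + lp) * 9 + lc
        info["literal_context_bits"] = packed % 9
        info["literal_pos_state_bits"] = (packed // 9) % 5
        info["pos_state_bits"] = packed // 45
        info["dictionary_size"] = struct.unpack_from("<I", props, 1)[0]
    return info
```

A packed byte of 0x5D, for instance, unpacks to 3 literal context bits, 0 literal position bits and 2 position state bits under this convention.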
\r
2081 Data compressed with method 14, LZMA, may include an end-of-stream
\r
2082 (EOS) marker ending the compressed data stream. This marker is not
\r
2083 required, but its use is highly recommended to facilitate processing
\r
2084 and implementers should include the EOS marker whenever possible.
\r
2085 When the EOS marker is used, general purpose bit 1 must be set. If
\r
2086 general purpose bit 1 is not set, the EOS marker is not present.
\r
2088 WavPack - Method 97
\r
2089 -------------------
\r
2091 Information describing the use of compression method 97 is
\r
2092 provided by WinZIP International, LLC. This method relies on the
\r
2093 open source WavPack audio compression utility developed by David Bryant.
\r
2094 Information on WavPack is available at www.wavpack.com. Please consult
\r
2095 with the author of this algorithm for information on terms and
\r
2096 restrictions on use.
\r
2098 WavPack data for a file begins immediately after the end of the
\r
2099 local header data. This data is the output from WavPack compression
\r
2100 routines. Within the ZIP file, the use of WavPack compression is
\r
2101 indicated by setting the compression method field to a value of 97
\r
2102 in both the local header and the central directory header. The Version
\r
2103 needed to extract and version made by fields use the same values as are
\r
2104 used for data compressed using the Deflate algorithm.
\r
2106 An implementation note for storing digital sample data when using
\r
2107 WavPack compression within ZIP files is that all of the bytes of
\r
2108 the sample data should be compressed. This includes any unused
\r
2109 bits up to the byte boundary. An example is a 2 byte sample that
\r
2110 uses only 12 bits for the sample data with 4 unused bits. If only
\r
2111 12 bits are passed as the sample size to the WavPack routines, the 4
\r
2112 unused bits will be set to 0 on extraction regardless of their original
\r
2113 state. To avoid this, the full 16 bits of the sample data size
\r
2114 should be provided.
\r
2119 PPMd is a data compression algorithm developed by Dmitry Shkarin
\r
2120 which includes a carryless rangecoder developed by Dmitry Subbotin.
\r
2121 This algorithm is based on prediction by partial matching on multiple
\r
2122 order contexts. Information and source code for this algorithm
\r
2123 can be found on the internet. Consult with the author of this
\r
2124 algorithm for information on terms or restrictions on use.
\r
2126 Support for PPMd within the ZIP format currently is provided only
\r
2127 for version I, revision 1 of the algorithm. Storage requirements
\r
2128 for using this algorithm are as follows:
\r
2130 Parameters needed to control the algorithm are stored in the two
\r
2131 bytes immediately preceding the compressed data. These bytes are
\r
2132 used to store the following fields:
\r
2134 Model order - sets the maximum model order, default is 8, possible
\r
2135 values are from 2 to 16 inclusive
\r
2137 Sub-allocator size - sets the size of sub-allocator in MB, default is 50,
\r
2138 possible values are from 1MB to 256MB inclusive
\r
2140 Model restoration method - sets the method used to restart context
\r
2141 model at memory insufficiency, values are:
\r
2143 0 - restarts model from scratch - default
\r
2144 1 - cut off model - decreases performance by as much as 2x
\r
2145 2 - freeze context tree - not recommended
\r
2147 An example for packing these fields into the 2 byte storage field is
\r
2148 illustrated below. These values are stored in Intel low-byte/high-byte order.
\r
2151 wPPMd = (Model order - 1) +
\r
2152 ((Sub-allocator size - 1) << 4) +
\r
2153 (Model restoration method << 12)
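
The packing above, and its inverse, can be sketched in Python as follows. The function names are hypothetical; the field widths (4, 8 and 4 bits) follow from the shifts and value ranges given in the text.

```python
def pack_ppmd_parameters(order: int, sub_allocator_mb: int, restoration: int) -> int:
    """Pack the three PPMd parameters into the 16-bit wPPMd value."""
    if not 2 <= order <= 16:
        raise ValueError("model order must be 2 through 16")
    if not 1 <= sub_allocator_mb <= 256:
        raise ValueError("sub-allocator size must be 1MB through 256MB")
    if restoration not in (0, 1, 2):
        raise ValueError("model restoration method must be 0, 1 or 2")
    return (order - 1) + ((sub_allocator_mb - 1) << 4) + (restoration << 12)


def unpack_ppmd_parameters(word: int) -> tuple[int, int, int]:
    """Recover (model order, sub-allocator size in MB, restoration method)."""
    order = (word & 0x0F) + 1
    sub_allocator_mb = ((word >> 4) & 0xFF) + 1
    restoration = word >> 12
    return order, sub_allocator_mb, restoration
```

With the defaults (order 8, 50MB, method 0) the packed value is 0x0317, stored as the bytes 0x17, 0x03.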
\r
2156 VII. Traditional PKWARE Encryption
\r
2157 ----------------------------------
\r
2159 The following information discusses the decryption steps
\r
2160 required to support traditional PKWARE encryption. This
\r
2161 form of encryption is considered weak by today's standards
\r
2162 and its use is recommended only for situations with
\r
2163 low security needs or for compatibility with older .ZIP
\r
2169 PKWARE is grateful to Mr. Roger Schlafly for his expert contribution
\r
2170 towards the development of PKWARE's traditional encryption.
\r
2172 PKZIP encrypts the compressed data stream. Encrypted files must
\r
2173 be decrypted before they can be extracted.
\r
2175 Each encrypted file has an extra 12 bytes stored at the start of
\r
2176 the data area defining the encryption header for that file. The
\r
2177 encryption header is originally set to random values, and then
\r
2178 itself encrypted, using three, 32-bit keys. The key values are
\r
2179 initialized using the supplied encryption password. After each byte
\r
2180 is encrypted, the keys are then updated using pseudo-random number
\r
2181 generation techniques in combination with the same CRC-32 algorithm
\r
2182 used in PKZIP and described elsewhere in this document.
\r
2184 The following are the basic steps required to decrypt a file:
\r
2186 1) Initialize the three 32-bit keys with the password.
\r
2187 2) Read and decrypt the 12-byte encryption header, further
\r
2188 initializing the encryption keys.
\r
2189 3) Read and decrypt the compressed data stream using the encryption keys.
\r
2192 Step 1 - Initializing the encryption keys
\r
2193 -----------------------------------------
\r
2195 Key(0) <- 305419896
\r
2196 Key(1) <- 591751049
\r
2197 Key(2) <- 878082192
\r
2199 loop for i <- 0 to length(password)-1
\r
2200 update_keys(password(i))
\r
2203 Where update_keys() is defined as:
\r
2205 update_keys(char):
\r
2206 Key(0) <- crc32(key(0),char)
\r
2207 Key(1) <- Key(1) + (Key(0) & 000000ffH)
\r
2208 Key(1) <- Key(1) * 134775813 + 1
\r
2209 Key(2) <- crc32(key(2),key(1) >> 24)
\r
2212 Where crc32(old_crc,char) is a routine that given a CRC value and a
\r
2213 character, returns an updated CRC value after applying the CRC-32
\r
2214 algorithm described elsewhere in this document.
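
Key initialization can be sketched in Python as follows. The function names are hypothetical. The sketch assumes that crc32(old_crc,char) is a single table-driven update step of the reflected CRC-32 (polynomial 0xEDB88320) with no initial or final inversion, which is how this routine is commonly implemented; the text above does not spell this out.

```python
def _make_crc_table() -> list[int]:
    """Build the 256-entry table for the reflected CRC-32 polynomial."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRC_TABLE = _make_crc_table()


def crc32_update(old_crc: int, byte: int) -> int:
    """One CRC-32 step for a single byte, without pre- or post-inversion."""
    return CRC_TABLE[(old_crc ^ byte) & 0xFF] ^ (old_crc >> 8)


def update_keys(keys: list[int], byte: int) -> None:
    """Apply update_keys() to the three 32-bit keys in place."""
    keys[0] = crc32_update(keys[0], byte)
    keys[1] = (keys[1] + (keys[0] & 0xFF)) & 0xFFFFFFFF
    keys[1] = (keys[1] * 134775813 + 1) & 0xFFFFFFFF
    keys[2] = crc32_update(keys[2], keys[1] >> 24)


def init_keys(password: bytes) -> list[int]:
    """Step 1: start from the fixed constants and mix in each password byte."""
    keys = [305419896, 591751049, 878082192]
    for b in password:
        update_keys(keys, b)
    return keys
```

The explicit masks keep every key within 32 bits, since Python integers do not wrap on overflow.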
\r
2216 Step 2 - Decrypting the encryption header
\r
2217 -----------------------------------------
\r
2219 The purpose of this step is to further initialize the encryption
\r
2220 keys, based on random data, to render a plaintext attack on the
\r
2223 Read the 12-byte encryption header into Buffer, in locations
\r
2224 Buffer(0) thru Buffer(11).
\r
2226 loop for i <- 0 to 11
\r
2227 C <- buffer(i) ^ decrypt_byte()
\r
2232 Where decrypt_byte() is defined as:
\r
2234 unsigned char decrypt_byte()
\r
2235 local unsigned short temp
\r
2236 temp <- Key(2) | 2
\r
2237 decrypt_byte <- (temp * (temp ^ 1)) >> 8
\r
2240 After the header is decrypted, the last 1 or 2 bytes in Buffer
\r
2241 should be the high-order word/byte of the CRC for the file being
\r
2242 decrypted, stored in Intel low-byte/high-byte order. Versions of
\r
2243 PKZIP prior to 2.0 used a 2 byte CRC check; a 1 byte CRC check is
\r
2244 used on versions after 2.0. This can be used to test if the password
\r
2245 supplied is correct or not.
\r
2247 Step 3 - Decrypting the compressed data stream
\r
2248 ----------------------------------------------
\r
2250 The compressed data stream can be decrypted as follows:
\r
2253 read a character into C
\r
2254 Temp <- C ^ decrypt_byte()
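
The three decryption steps can be combined into one Python sketch, shown below. The class and function names are hypothetical. The sketch assumes that the keys are updated with each decrypted byte, in the header and in the data stream alike, and it checks only the single high-order CRC byte described for versions after 2.0. The CRC step is derived from zlib.crc32 by cancelling its built-in inversions.

```python
import zlib


class TraditionalCipher:
    """Key state for traditional PKWARE decryption (illustrative only)."""

    def __init__(self, password: bytes):
        self.keys = [305419896, 591751049, 878082192]
        for b in password:
            self._update_keys(b)

    @staticmethod
    def _crc32(old_crc: int, byte: int) -> int:
        # zlib.crc32 inverts its running value on entry and on exit; undo both.
        return zlib.crc32(bytes([byte]), old_crc ^ 0xFFFFFFFF) ^ 0xFFFFFFFF

    def _update_keys(self, byte: int) -> None:
        k = self.keys
        k[0] = self._crc32(k[0], byte)
        k[1] = (k[1] + (k[0] & 0xFF)) & 0xFFFFFFFF
        k[1] = (k[1] * 134775813 + 1) & 0xFFFFFFFF
        k[2] = self._crc32(k[2], k[1] >> 24)

    def _decrypt_byte(self) -> int:
        temp = (self.keys[2] | 2) & 0xFFFF       # unsigned short
        return ((temp * (temp ^ 1)) >> 8) & 0xFF

    def decrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for c in data:
            plain = c ^ self._decrypt_byte()
            self._update_keys(plain)             # assumed: keyed on plaintext
            out.append(plain)
        return bytes(out)


def decrypt_entry(password: bytes, encrypted: bytes, crc_high_byte=None) -> bytes:
    """Decrypt the 12-byte header, optionally check it, then decrypt the data."""
    cipher = TraditionalCipher(password)
    header = cipher.decrypt(encrypted[:12])
    if crc_high_byte is not None and header[11] != crc_high_byte:
        raise ValueError("password check failed")
    return cipher.decrypt(encrypted[12:])
```

A matching check byte does not prove the password is correct; about 1 in 256 wrong passwords will pass it.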
\r
2260 VIII. Strong Encryption Specification
\r
2261 -------------------------------------
\r
2263 The Strong Encryption technology defined in this specification is
\r
2264 covered under a pending patent application. The use or implementation
\r
2265 in a product of certain technological aspects set forth in the current
\r
2266 APPNOTE, including those with regard to strong encryption, patching,
\r
2267 or extended tape operations requires a license from PKWARE. Portions
\r
2268 of this Strong Encryption technology are available for use at no charge.
\r
2269 Contact PKWARE for licensing terms and conditions. Refer to section II
\r
2270 of this APPNOTE (Contacting PKWARE) for information on how to
\r
2273 Version 5.x of this specification introduced support for strong
\r
2274 encryption algorithms. These algorithms can be used with either
\r
2275 a password or an X.509v3 digital certificate to encrypt each file.
\r
2276 This format specification supports either password or certificate
\r
2277 based encryption to meet the security needs of today, to enable
\r
2278 interoperability between users within both PKI and non-PKI
\r
2279 environments, and to ensure interoperability between different
\r
2280 computing platforms that are running a ZIP program.
\r
2282 Password based encryption is the most common form of encryption
\r
2283 people are familiar with. However, inherent weaknesses with
\r
2284 passwords (e.g. susceptibility to dictionary/brute force attack)
\r
2285 as well as password management and support issues make certificate
\r
2286 based encryption a more secure and scalable option. Industry
\r
2287 efforts and support are defining and moving towards more advanced
\r
2288 security solutions built around X.509v3 digital certificates and
\r
2289 Public Key Infrastructures (PKI) because of the greater scalability,
\r
2290 administrative options, and more robust security over traditional
\r
2291 password based encryption.
\r
2293 Most standard encryption algorithms are supported with this
\r
2294 specification. Reference implementations for many of these
\r
2295 algorithms are available from either commercial or open source
\r
2296 distributors. Readily available cryptographic toolkits make
\r
2297 implementation of the encryption features straightforward.
\r
2298 This document is not intended to provide a treatise on data
\r
2299 encryption principles or theory. Its purpose is to document the
\r
2300 data structures required for implementing interoperable data
\r
2301 encryption within the .ZIP format. It is strongly recommended that
\r
2302 you have a good understanding of data encryption before reading
\r
2305 The algorithms introduced in Version 5.0 of this specification
\r
2308 RC2 40 bit, 64 bit, and 128 bit
\r
2309 RC4 40 bit, 64 bit, and 128 bit
\r
2311 3DES 112 bit and 168 bit
\r
2313 Version 5.1 adds support for the following:
\r
2315 AES 128 bit, 192 bit, and 256 bit
\r
2318 Version 6.1 introduces encryption data changes to support
\r
2319 interoperability with Smartcard and USB Token certificate storage
\r
2320 methods which do not support the OAEP strengthening standard.
\r
2322 Version 6.2 introduces support for encrypting metadata by compressing
\r
2323 and encrypting the central directory data structure to reduce information
\r
2324 leakage. Information leakage can occur in legacy ZIP applications
\r
2325 through exposure of information about a file even though that file is
\r
2326 stored encrypted. The information exposed consists of file
\r
2327 characteristics stored within the records and fields defined by this
\r
2328 specification. This includes data such as a file's name, its original
\r
2329 size, timestamp and CRC32 value.
\r
2331 Version 6.3 introduces support for encrypting data using the Blowfish
\r
2332 and Twofish algorithms. These are symmetric block ciphers developed
\r
2333 by Bruce Schneier. Blowfish supports using a variable length key from
\r
2334 32 to 448 bits. Block size is 64 bits. Implementations should use 16
\r
2335 rounds and the only mode supported within ZIP files is CBC. Twofish
\r
2336 supports key sizes 128, 192 and 256 bits. Block size is 128 bits.
\r
2337 Implementations should use 16 rounds and the only mode supported within
\r
2338 ZIP files is CBC. Information and source code for both Blowfish and
\r
2339 Twofish algorithms can be found on the internet. Consult with the author
\r
2340 of these algorithms for information on terms or restrictions on use.
\r
2342 Central Directory Encryption provides greater protection against
\r
2343 information leakage by encrypting the Central Directory structure and
\r
2344 by masking key values that are replicated in the unencrypted Local
\r
2345 Header. ZIP compatible programs that cannot interpret an encrypted
\r
2346 Central Directory structure cannot rely on the data in the corresponding
\r
2347 Local Header for decompression information.
\r
2349 Extra Field records that may contain information about a file that should
\r
2350 not be exposed should not be stored in the Local Header and should only
\r
2351 be written to the Central Directory where they can be encrypted. This
\r
2352 design currently does not support streaming. Information in the End of
\r
2353 Central Directory record, the Zip64 End of Central Directory Locator,
\r
2354 and the Zip64 End of Central Directory records are not encrypted. Access
\r
2355 to view data on files within a ZIP file with an encrypted Central Directory
\r
2356 requires the appropriate password or private key for decryption prior to
\r
2357 viewing any files, or any information about the files, in the archive.
\r
2359 Older ZIP compatible programs not familiar with the Central Directory
\r
2360 Encryption feature will no longer be able to recognize the Central
\r
2361 Directory and may assume the ZIP file is corrupt. Programs that
\r
2362 attempt streaming access using Local Headers will see invalid
\r
2363 information for each file. Central Directory Encryption need not be
\r
2364 used for every ZIP file. Its use is recommended for greater security.
\r
2365 ZIP files not using Central Directory Encryption should operate as
\r
2368 This strong encryption feature specification is intended to provide for
\r
2369 scalable, cross-platform encryption needs ranging from simple password
\r
2370 encryption to authenticated public/private key encryption.
\r
2372 Encryption provides data confidentiality and privacy. It is
\r
2373 recommended that you combine X.509 digital signing with encryption
\r
2374 to add authentication and non-repudiation.
\r
2377 Single Password Symmetric Encryption Method:
\r
2378 -------------------------------------------
\r
2380 The Single Password Symmetric Encryption Method using strong
\r
2381 encryption algorithms operates similarly to the traditional
\r
2382 PKWARE encryption defined in this format. Additional data
\r
2383 structures are added to support the processing needs of the
\r
2384 strong algorithms.
\r
2386 The Strong Encryption data structures are:
\r
2388 1. General Purpose Bits - Bits 0 and 6 of the General Purpose bit
\r
2389 flag in both local and central header records. Both bits set
\r
indicates strong encryption. Bit 13, when set, indicates the Central
\r
2391 Directory is encrypted and that selected fields in the Local Header
\r
2392 are masked to hide their actual value.
\r
2395 2. Extra Field 0x0017 in central header only.
\r
2397 Fields to consider in this record are:
\r
2399 Format - the data format identifier for this record. The only
\r
2400 value allowed at this time is the integer value 2.
\r
2402 AlgId - integer identifier of the encryption algorithm from the
\r
2406 0x6602 - RC2 (version needed to extract < 5.2)
\r
2412 0x6702 - RC2 (version needed to extract >= 5.2)
\r
2416 0xFFFF - Unknown algorithm
\r
2418 Bitlen - Explicit bit length of key
\r
2422 Flags - Processing flags needed for decryption
\r
2424 0x0001 - Password is required to decrypt
\r
2425 0x0002 - Certificates only
\r
2426 0x0003 - Password or certificate required to decrypt
\r
2428 Values > 0x0003 reserved for certificate processing
\r
2431 3. Decryption header record preceding compressed file data.
\r
2433 -Decryption Header:
\r
Value      Size      Description
-----      ----      -----------
IVSize     2 bytes   Size of initialization vector (IV)
IVData     IVSize    Initialization vector for this file
Size       4 bytes   Size of remaining decryption header data
Format     2 bytes   Format definition for this record
AlgID      2 bytes   Encryption algorithm identifier
Bitlen     2 bytes   Bit length of encryption key
Flags      2 bytes   Processing flags
ErdSize    2 bytes   Size of Encrypted Random Data
ErdData    ErdSize   Encrypted Random Data
Reserved1  4 bytes   Reserved certificate processing data
Reserved2  (var)     Reserved for certificate processing data
VSize      2 bytes   Size of password validation data
VData      VSize-4   Password validation data
VCRC32     4 bytes   Standard ZIP CRC32 of password validation data
\r
2452 IVData - The size of the IV should match the algorithm block size.
\r
2453 The IVData can be completely random data. If the size of
\r
2454 the randomly generated data does not match the block size
\r
it should be padded with zeros or truncated as
necessary. If IVSize is 0, then IV = CRC32 + Uncompressed
File Size (as a 64-bit little-endian unsigned integer value).
\r
2459 Format - the data format identifier for this record. The only
\r
2460 value allowed at this time is the integer value 3.
\r
2462 AlgId - integer identifier of the encryption algorithm from the
\r
2466 0x6602 - RC2 (version needed to extract < 5.2)
\r
2472 0x6702 - RC2 (version needed to extract >= 5.2)
\r
2476 0xFFFF - Unknown algorithm
\r
2478 Bitlen - Explicit bit length of key
\r
2482 Flags - Processing flags needed for decryption
\r
2484 0x0001 - Password is required to decrypt
\r
2485 0x0002 - Certificates only
\r
2486 0x0003 - Password or certificate required to decrypt
\r
2488 Values > 0x0003 reserved for certificate processing
\r
2490 ErdData - Encrypted random data is used to store random data that
\r
2491 is used to generate a file session key for encrypting
\r
2492 each file. SHA1 is used to calculate hash data used to
\r
2493 derive keys. File session keys are derived from a master
\r
2494 session key generated from the user-supplied password.
\r
2495 If the Flags field in the decryption header contains
\r
2496 the value 0x4000, then the ErdData field must be
\r
2497 decrypted using 3DES. If the value 0x4000 is not set,
\r
2498 then the ErdData field must be decrypted using AlgId.
\r
2501 Reserved1 - Reserved for certificate processing, if value is
\r
2502 zero, then Reserved2 data is absent. See the explanation
\r
2503 under the Certificate Processing Method for details on
\r
2504 this data structure.
\r
2506 Reserved2 - If present, the size of the Reserved2 data structure
\r
2507 is located by skipping the first 4 bytes of this field
\r
2508 and using the next 2 bytes as the remaining size. See
\r
2509 the explanation under the Certificate Processing Method
\r
2510 for details on this data structure.
\r
2512 VSize - This size value will always include the 4 bytes of the
\r
2513 VCRC32 data and will be greater than 4 bytes.
\r
2515 VData - Random data for password validation. This data is VSize
\r
2516 in length and VSize must be a multiple of the encryption
\r
2517 block size. VCRC32 is a checksum value of VData.
\r
2518 VData and VCRC32 are stored encrypted and start the
\r
2519 stream of encrypted data for a file.
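The Decryption Header layout described above can be walked field by
field. The following Python sketch is illustrative only and is not part
of this specification. It assumes that all integers are stored
little-endian, that the Size value covers every byte through the end of
VCRC32, and that no certificate data is present (Reserved1 is zero).
The names DecryptionHeader and parse_decryption_header are hypothetical.

```python
import struct
from typing import NamedTuple


class DecryptionHeader(NamedTuple):
    iv: bytes
    fmt: int
    alg_id: int
    bitlen: int
    flags: int
    erd: bytes
    validation: bytes  # still-encrypted VData followed by VCRC32


def parse_decryption_header(buf: bytes) -> DecryptionHeader:
    """Split a password-only Decryption Header into its fields."""
    (iv_size,) = struct.unpack_from("<H", buf, 0)
    pos = 2
    iv = buf[pos:pos + iv_size]
    pos += iv_size
    # Assumption: Size counts every header byte after the Size field itself.
    (size,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    end = pos + size
    if end > len(buf):
        raise ValueError("truncated decryption header")
    fmt, alg_id, bitlen, flags, erd_size = struct.unpack_from("<HHHHH", buf, pos)
    pos += 10
    erd = buf[pos:pos + erd_size]
    pos += erd_size
    (reserved1,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    if reserved1 != 0:
        # Reserved2 (certificate data) would follow; not handled in this sketch.
        raise NotImplementedError("certificate data present")
    (vsize,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    # VSize includes the 4 bytes of VCRC32 and must be greater than 4.
    if vsize <= 4 or pos + vsize != end:
        raise ValueError("inconsistent VSize")
    return DecryptionHeader(iv, fmt, alg_id, bitlen, flags, erd, buf[pos:end])
```

The validation bytes are returned still encrypted, because VData and
VCRC32 start the encrypted stream for the file.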
\r
2524 Strong Encryption is always applied to a file after compression. The
\r
block oriented algorithms all operate in Cipher Block Chaining (CBC)
mode. The block size used for AES encryption is 16 bytes. All other block
algorithms use a block size of 8 bytes. Two IDs are defined for RC2 to
\r
2528 account for a discrepancy found in the implementation of the RC2
\r
2529 algorithm in the cryptographic library on Windows XP SP1 and all
\r
2530 earlier versions of Windows. It is recommended that zero length files
\r
2531 not be encrypted, however programs should be prepared to extract them
\r
2532 if they are found within a ZIP file.
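The block sizes given here combine with the IVData rules in the
Decryption Header description above. The Python sketch below shows one
possible reading and is not normative. It assumes that, when IVSize is
0, the IV is the 4-byte little-endian CRC32 followed by the 8-byte
little-endian uncompressed size, fitted to the block size in the same
way as random IV data. The function names are hypothetical.

```python
import struct

AES_BLOCK_SIZE = 16    # bytes, per the paragraph above
OTHER_BLOCK_SIZE = 8   # bytes, for all other block algorithms


def normalize_iv(iv_data: bytes, block_size: int) -> bytes:
    """Zero-pad or truncate IVData so it matches the cipher block size."""
    return iv_data[:block_size].ljust(block_size, b"\x00")


def default_iv(crc32: int, uncompressed_size: int, block_size: int) -> bytes:
    """IV used when IVSize is 0.

    Assumption: "CRC32 + Uncompressed File Size" means byte concatenation
    of a 32-bit CRC and a 64-bit size, both little-endian.
    """
    raw = struct.pack("<IQ", crc32, uncompressed_size)
    return normalize_iv(raw, block_size)
```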
\r
2534 A pseudo-code representation of the encryption process is as follows:
\r
2536 Password = GetUserPassword()
\r
2537 MasterSessionKey = DeriveKey(SHA1(Password))
\r
2538 RD = CryptographicStrengthRandomData()
\r
2540 IV = CryptographicStrengthRandomData()
\r
2541 VData = CryptographicStrengthRandomData()
\r
2542 VCRC32 = CRC32(VData)
\r
FileSessionKey = DeriveKey(SHA1(IV + RD))
\r
2544 ErdData = Encrypt(RD,MasterSessionKey,IV)
\r
2545 Encrypt(VData + VCRC32 + FileData, FileSessionKey,IV)
\r
2548 The function names and parameter requirements will depend on
\r
2549 the choice of the cryptographic toolkit selected. Almost any
\r
2550 toolkit supporting the reference implementations for each
\r
2551 algorithm can be used. The RSA BSAFE(r), OpenSSL, and Microsoft
\r
2552 CryptoAPI libraries are all known to work well.
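Two steps of the pseudo-code above can be sketched without a cipher
library. This specification does not define DeriveKey. The derive_key
function below is an assumption that imitates the documented behavior
of the Microsoft CryptoAPI CryptDeriveKey function for SHA-1, and it
may differ from what a given toolkit does. The validation helpers show
how VData and VCRC32 relate before encryption and after decryption.
All function names here are hypothetical.

```python
import hashlib
import struct
import zlib


def derive_key(digest: bytes, key_len: int) -> bytes:
    """Stretch a SHA-1 digest to key_len bytes (assumed, CryptDeriveKey-like)."""
    def half(pad: int) -> bytes:
        block = bytearray([pad] * 64)
        for i, b in enumerate(digest):
            block[i] ^= b
        return hashlib.sha1(block).digest()

    # Two 20-byte halves give up to 40 bytes of key material.
    return (half(0x36) + half(0x5C))[:key_len]


def make_validation_block(vdata: bytes) -> bytes:
    """VData followed by VCRC32 (little-endian CRC32 of VData), pre-encryption."""
    return vdata + struct.pack("<I", zlib.crc32(vdata) & 0xFFFFFFFF)


def validation_ok(decrypted: bytes) -> bool:
    """Check decrypted VData + VCRC32; a mismatch suggests a wrong password."""
    vdata, stored = decrypted[:-4], decrypted[-4:]
    return struct.pack("<I", zlib.crc32(vdata) & 0xFFFFFFFF) == stored
```

A master session key for a 128-bit algorithm would then be obtained,
under these assumptions, as
derive_key(hashlib.sha1(password_bytes).digest(), 16).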
\r
2555 Single Password - Central Directory Encryption:
\r
2556 -----------------------------------------------
\r
2558 Central Directory Encryption is achieved within the .ZIP format by
\r
2559 encrypting the Central Directory structure. This encapsulates the metadata
\r
2560 most often used for processing .ZIP files. Additional metadata is stored for
\r
2561 redundancy in the Local Header for each file. The process of concealing
\r
2562 metadata by encrypting the Central Directory does not protect the data within
\r
2563 the Local Header. To avoid information leakage from the exposed metadata
\r
2564 in the Local Header, the fields containing information about a file are masked.
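A reader can decide whether Local Header values are masked by testing
the General Purpose bit flag described under the Strong Encryption data
structures. The short Python sketch below is illustrative; the constant
and function names are hypothetical.

```python
STRONG_ENCRYPTION_BITS = 0x0001 | 0x0040  # bits 0 and 6
CD_ENCRYPTED_BIT = 0x2000                 # bit 13


def uses_strong_encryption(gp_flags: int) -> bool:
    """True when both bit 0 and bit 6 are set."""
    return gp_flags & STRONG_ENCRYPTION_BITS == STRONG_ENCRYPTION_BITS


def local_header_is_masked(gp_flags: int) -> bool:
    """True when bit 13 marks the Central Directory as encrypted."""
    return bool(gp_flags & CD_ENCRYPTED_BIT)
```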
\r
2568 Masking replaces the true content of the fields for a file in the Local
\r
2569 Header with false information. When masked, the Local Header is not
\r
2570 suitable for streaming access and the options for data recovery of damaged
\r
archives are reduced. Extra Data fields that may contain confidential
\r
2572 data should not be stored within the Local Header. The value set into
\r
2573 the Version needed to extract field should be the correct value needed to
\r
2574 extract the file without regard to Central Directory Encryption. The fields
\r
within the Local Header targeted for masking when the Central Directory is
encrypted are:
\r
Field Name                  Mask Value
------------------          ---------------------------
compression method          0
last mod file time          0
last mod file date          0
uncompressed size           0
file name (variable size)   Base 16 value from the
                            range 1 - 0xFFFFFFFFFFFFFFFF
                            represented as a string whose
                            size will be set into the
                            file name length field
\r
2592 The Base 16 value assigned as a masked file name is simply a sequentially
\r
2593 incremented value for each file starting with 1 for the first file.
\r
2594 Modifications to a ZIP file may cause different values to be stored for
\r
2595 each file. For compatibility, the file name field in the Local Header
\r
2596 should never be left blank. As of Version 6.2 of this specification,
\r
2597 the Compression Method and Compressed Size fields are not yet masked.
\r
2598 Fields having a value of 0xFFFF or 0xFFFFFFFF for the ZIP64 format
\r
2599 should not be masked.
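The masking rules above can be sketched as follows. This Python example
is illustrative only. The exact text form of the Base 16 name is not
stated here, so the sketch assumes uppercase digits with no prefix and
no zero padding. The function names are hypothetical.

```python
def masked_file_name(index: int) -> bytes:
    """Masked name for the index-th file, counting from 1."""
    if not 1 <= index <= 0xFFFFFFFFFFFFFFFF:
        raise ValueError("index outside the permitted range")
    # Assumption: uppercase hex digits, no "0x" prefix, no padding.
    return format(index, "X").encode("ascii")


def mask_numeric_field(value: int, width_bytes: int = 4) -> int:
    """Return the mask value 0, leaving a ZIP64 marker value untouched."""
    zip64_marker = (1 << (8 * width_bytes)) - 1  # 0xFFFF or 0xFFFFFFFF
    return value if value == zip64_marker else 0
```

The length of the returned name is the value to place in the file name
length field of the Local Header.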
\r
2601 Encrypting the Central Directory:
\r
2603 Encryption of the Central Directory does not include encryption of the
\r
2604 Central Directory Signature data, the Zip64 End of Central Directory
\r
2605 record, the Zip64 End of Central Directory Locator, or the End
\r
of Central Directory record. The ZIP file comment data is never
encrypted.
\r
2609 Before encrypting the Central Directory, it may optionally be compressed.
\r
2610 Compression is not required, but for storage efficiency it is assumed
\r
2611 this structure will be compressed before encrypting. Similarly, this
\r
2612 specification supports compressing the Central Directory without
\r
2613 requiring that it also be encrypted. Early implementations of this
\r
2614 feature will assume the encryption method applied to files matches the
\r
2615 encryption applied to the Central Directory.
\r
2617 Encryption of the Central Directory is done in a manner similar to
\r
2618 that of file encryption. The encrypted data is preceded by a
\r
2619 decryption header. The decryption header is known as the Archive
\r
2620 Decryption Header. The fields of this record are identical to
\r
2621 the decryption header preceding each encrypted file. The location
\r
2622 of the Archive Decryption Header is determined by the value in the
\r
2623 Start of the Central Directory field in the Zip64 End of Central
\r
2624 Directory record. When the Central Directory is encrypted, the
\r
2625 Zip64 End of Central Directory record will always be present.
\r
2627 The layout of the Zip64 End of Central Directory record for all
\r
2628 versions starting with 6.2 of this specification will follow the
\r
2629 Version 2 format. The Version 2 format is as follows:
\r
2631 The leading fixed size fields within the Version 1 format for this
\r
2632 record remain unchanged. The record signature for both Version 1
\r
2633 and Version 2 will be 0x06064b50. Immediately following the last
\r
2634 byte of the field known as the Offset of Start of Central
\r
2635 Directory With Respect to the Starting Disk Number will begin the
\r
2636 new fields defining Version 2 of this record.
\r
2638 New fields for Version 2:
\r
2640 Note: all fields stored in Intel low-byte/high-byte order.
\r
Value                 Size        Description
-----                 ----        -----------
Compression Method    2 bytes     Method used to compress the
                                  Central Directory
Compressed Size       8 bytes     Size of the compressed data
Original Size         8 bytes     Original uncompressed size
AlgId                 2 bytes     Encryption algorithm ID
BitLen                2 bytes     Encryption key length
Flags                 2 bytes     Encryption flags
HashID                2 bytes     Hash algorithm identifier
Hash Length           2 bytes     Length of hash data
Hash Data             (variable)  Hash data
\r
2655 The Compression Method accepts the same range of values as the
\r
2656 corresponding field in the Central Header.
\r
2658 The Compressed Size and Original Size values will not include the
\r
data of the Central Directory Signature which is compressed or
encrypted.
\r
The AlgId, BitLen, and Flags fields accept the same range of values
as the corresponding fields within the 0x0017 record.
\r
HashID identifies the algorithm used to hash the Central Directory
\r
2666 data. This data does not have to be hashed, in which case the
\r
2667 values for both the HashID and Hash Length will be 0. Possible
\r
2668 values for HashID are:
\r
2681 When the Central Directory data is signed, the same hash algorithm
\r
2682 used to hash the Central Directory for signing should be used.
\r
2683 This is recommended for processing efficiency, however, it is
\r
2684 permissible for any of the above algorithms to be used independent
\r
2685 of the signing process.
\r
2687 The Hash Data will contain the hash data for the Central Directory.
\r
2688 The length of this data will vary depending on the algorithm used.
\r
2690 The Version Needed to Extract should be set to 62.
\r
2692 The value for the Total Number of Entries on the Current Disk will
\r
2693 be 0. These records will no longer support random access when
\r
2694 encrypting the Central Directory.
\r
2696 When the Central Directory is compressed and/or encrypted, the
\r
2697 End of Central Directory record will store the value 0xFFFFFFFF
\r
2698 as the value for the Total Number of Entries in the Central
\r
2699 Directory. The value stored in the Total Number of Entries in
\r
2700 the Central Directory on this Disk field will be 0. The actual
\r
2701 values will be stored in the equivalent fields of the Zip64
\r
2702 End of Central Directory record.
\r
2704 Decrypting and decompressing the Central Directory is accomplished
\r
2705 in the same manner as decrypting and decompressing a file.
\r
2707 Certificate Processing Method:
\r
2708 -----------------------------
\r
The Certificate Processing Method for ZIP file encryption
\r
2711 defines the following additional data fields:
\r
2713 1. Certificate Flag Values
\r
2715 Additional processing flags that can be present in the Flags field of both
\r
2716 the 0x0017 field of the central directory Extra Field and the Decryption
\r
2717 header record preceding compressed file data are:
\r
2719 0x0007 - reserved for future use
\r
2720 0x000F - reserved for future use
\r
0x0100 - Indicates non-OAEP key wrapping was used. If this
flag is set, the version needed to extract must
\r
2723 be at least 61. This means OAEP key wrapping is not
\r
2724 used when generating a Master Session Key using
\r
2726 0x4000 - ErdData must be decrypted using 3DES-168, otherwise use the
\r
2727 same algorithm used for encrypting the file contents.
\r
2728 0x8000 - reserved for future use
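The Flags values listed above, together with the password and
certificate values given for the 0x0017 record and the Decryption
Header, can be tested bit by bit. The Python sketch below is
illustrative; the constant names, the function name, and the dictionary
keys are hypothetical. Reserved bits are ignored.

```python
FLAG_PASSWORD = 0x0001     # password is required to decrypt
FLAG_CERTIFICATE = 0x0002  # certificates only
FLAG_NON_OAEP = 0x0100     # non-OAEP key wrapping was used
FLAG_ERD_3DES = 0x4000     # ErdData must be decrypted using 3DES


def describe_flags(flags: int) -> dict:
    """Report which documented processing flags are set."""
    return {
        # 0x0003 sets both of the first two entries: password or certificate.
        "password": bool(flags & FLAG_PASSWORD),
        "certificate": bool(flags & FLAG_CERTIFICATE),
        "non_oaep": bool(flags & FLAG_NON_OAEP),
        "erd_uses_3des": bool(flags & FLAG_ERD_3DES),
    }
```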
\r
2731 2. CertData - Extra Field 0x0017 record certificate data structure
\r
The data structure used to store certificate data within the section
of the Extra Field defined by the CertData field of the 0x0017
record is as shown:
\r
Value     Size     Description
-----     ----     -----------
RCount    4 bytes  Number of recipients.
HashAlg   2 bytes  Hash algorithm identifier
HSize     2 bytes  Hash size
SRList    (var)    Simple list of recipients' hashed public keys
\r
RCount This defines the number of intended recipients whose
\r
2746 public keys were used for encryption. This identifies
\r
2747 the number of elements in the SRList.
\r
2749 HashAlg This defines the hash algorithm used to calculate
\r
2750 the public key hash of each public key used
\r
2751 for encryption. This field currently supports
\r
2752 only the following value for SHA-1
\r
2756 HSize This defines the size of a hashed public key.
\r
2758 SRList This is a variable length list of the hashed
\r
2759 public keys for each intended recipient. Each
\r
2760 element in this list is HSize. The total size of
\r
2761 SRList is determined using RCount * HSize.
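Because the size of SRList is RCount * HSize, the CertData structure
can be split into one hash per recipient. The Python sketch below is
illustrative and assumes little-endian integers, as elsewhere in the
format. The function name parse_cert_data is hypothetical.

```python
import struct


def parse_cert_data(buf: bytes):
    """Return (HashAlg, list of hashed public keys) from a CertData block."""
    rcount, hash_alg, hsize = struct.unpack_from("<IHH", buf, 0)
    needed = 8 + rcount * hsize  # fixed fields plus RCount * HSize
    if len(buf) < needed:
        raise ValueError("CertData shorter than RCount * HSize requires")
    sr_list = [buf[8 + i * hsize: 8 + (i + 1) * hsize] for i in range(rcount)]
    return hash_alg, sr_list
```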
\r
2764 3. Reserved1 - Certificate Decryption Header Reserved1 Data:
\r
2766 Value Size Description
\r
2767 ----- ---- -----------
\r
2768 RCount 4 bytes Number of recipients.
\r
RCount This defines the number of intended recipients whose
\r
2771 public keys were used for encryption. This defines
\r
2772 the number of elements in the REList field defined below.
\r
2775 4. Reserved2 - Certificate Decryption Header Reserved2 Data Structures:
\r
Value     Size     Description
-----     ----     -----------
HashAlg   2 bytes  Hash algorithm identifier
HSize     2 bytes  Hash size
REList    (var)    List of recipient data elements
\r
2785 HashAlg This defines the hash algorithm used to calculate
\r
2786 the public key hash of each public key used
\r
2787 for encryption. This field currently supports
\r
2788 only the following value for SHA-1
\r
2792 HSize This defines the size of a hashed public key
\r
2793 defined in REHData.
\r
REList This is a variable-length list of recipient data.
\r
2796 Each element in this list consists of a Recipient
\r
2797 Element data structure as follows:
\r
2800 Recipient Element (REList) Data Structure:
\r
Value     Size     Description
-----     ----     -----------
RESize    2 bytes  Size of REHData + REKData
REHData   HSize    Hash of recipient's public key
REKData   (var)    Simple key blob
\r
2809 RESize This defines the size of an individual REList
\r
2810 element. This value is the combined size of the
\r
2811 REHData field + REKData field. REHData is defined by
\r
2812 HSize. REKData is variable and can be calculated
\r
2813 for each REList element using RESize and HSize.
\r
2815 REHData Hashed public key for this recipient.
\r
2817 REKData Simple Key Blob. The format of this data structure
\r
2818 is identical to that defined in the Microsoft
\r
2819 CryptoAPI and generated using the CryptExportKey()
\r
2820 function. The version of the Simple Key Blob
\r
2821 supported at this time is 0x02 as defined by
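The Reserved2 structure and its Recipient Elements can be walked using
RCount from Reserved1, with the REKData size computed from RESize and
HSize as described above. The Python sketch below is illustrative and
assumes little-endian integers. The function name parse_reserved2 is
hypothetical, and the Simple Key Blob is returned without being
interpreted.

```python
import struct


def parse_reserved2(buf: bytes, rcount: int):
    """Return (HashAlg, [(REHData, REKData), ...]) for rcount recipients."""
    hash_alg, hsize = struct.unpack_from("<HH", buf, 0)
    pos = 4
    recipients = []
    for _ in range(rcount):
        (resize,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        if resize < hsize or pos + resize > len(buf):
            raise ValueError("inconsistent RESize")
        key_hash = buf[pos:pos + hsize]           # REHData, HSize bytes
        key_blob = buf[pos + hsize:pos + resize]  # REKData, RESize - HSize bytes
        recipients.append((key_hash, key_blob))
        pos += resize
    return hash_alg, recipients
```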
\r
2824 Certificate Processing - Central Directory Encryption:
\r
2825 ------------------------------------------------------
\r
2827 Central Directory Encryption using Digital Certificates will
\r
2828 operate in a manner similar to that of Single Password Central
\r
Directory Encryption. The Archive Extra Data record will only be present when there
\r
2830 is data to place into it. Currently, data is placed into this
\r
2831 record when digital certificates are used for either encrypting
\r
2832 or signing the files within a ZIP file. When only password
\r
2833 encryption is used with no certificate encryption or digital
\r
2834 signing, this record is not currently needed. When present, this
\r
2835 record will appear before the start of the actual Central Directory
\r
2836 data structure and will be located immediately after the Archive
\r
2837 Decryption Header if the Central Directory is encrypted.
\r
2839 The Archive Extra Data record will be used to store the following
\r
2840 information. Additional data may be added in future versions.
\r
2842 Extra Data Fields:
\r
2844 0x0014 - PKCS#7 Store for X.509 Certificates
\r
2845 0x0016 - X.509 Certificate ID and Signature for central directory
\r
2846 0x0019 - PKCS#7 Encryption Recipient Certificate List
\r
The 0x0014 and 0x0016 Extra Data records would otherwise be
located in the first record of the Central Directory for digital
certificate processing. When encrypting or compressing the Central
\r
2851 Directory, the 0x0014 and 0x0016 records must be located in the
\r
2852 Archive Extra Data record and they should not remain in the first
\r
2853 Central Directory record. The Archive Extra Data record will also
\r
2854 be used to store the 0x0019 data.
\r
2856 When present, the size of the Archive Extra Data record will be
\r
2857 included in the size of the Central Directory. The data of the
\r
2858 Archive Extra Data record will also be compressed and encrypted
\r
2859 along with the Central Directory data structure.
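Once the Archive Extra Data record has been decrypted and decompressed,
its contents can be scanned for the records listed above. The Python
sketch below is illustrative. It assumes the usual extra field framing
of a 2-byte Header ID followed by a 2-byte data size, both
little-endian. The names ARCHIVE_EXTRA_IDS and iter_extra_fields are
hypothetical.

```python
import struct

ARCHIVE_EXTRA_IDS = {
    0x0014: "PKCS#7 Store for X.509 Certificates",
    0x0016: "X.509 Certificate ID and Signature for central directory",
    0x0019: "PKCS#7 Encryption Recipient Certificate List",
}


def iter_extra_fields(data: bytes):
    """Yield (header_id, payload) pairs from extra field data."""
    pos = 0
    while pos + 4 <= len(data):
        header_id, size = struct.unpack_from("<HH", data, pos)
        pos += 4
        if pos + size > len(data):
            raise ValueError("extra field runs past the end of the data")
        yield header_id, data[pos:pos + size]
        pos += size
```

Fewer than 4 trailing bytes are ignored by this sketch; a stricter
reader may prefer to treat them as an error.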
\r
2861 Certificate Processing Differences:
\r
2863 The Certificate Processing Method of encryption differs from the
\r
2864 Single Password Symmetric Encryption Method as follows. Instead
\r
2865 of using a user-defined password to generate a master session key,
\r
2866 cryptographically random data is used. The key material is then
\r
2867 wrapped using standard key-wrapping techniques. This key material
\r
2868 is wrapped using the public key of each recipient that will need
\r
2869 to decrypt the file using their corresponding private key.
\r
2871 This specification currently assumes digital certificates will follow
\r
2872 the X.509 V3 format for 1024 bit and higher RSA format digital
\r
2873 certificates. Implementation of this Certificate Processing Method
\r
2874 requires supporting logic for key access and management. This logic
\r
2875 is outside the scope of this specification.
\r
2877 OAEP Processing with Certificate-based Encryption:
\r
2879 OAEP stands for Optimal Asymmetric Encryption Padding. It is a
\r
2880 strengthening technique used for small encoded items such as decryption
\r
2881 keys. This is commonly applied in cryptographic key-wrapping techniques
\r
2882 and is supported by PKCS #1. Versions 5.0 and 6.0 of this specification
\r
2883 were designed to support OAEP key-wrapping for certificate-based
\r
2884 decryption keys for additional security.
\r
2886 Support for private keys stored on Smartcards or Tokens introduced
\r
2887 a conflict with this OAEP logic. Most card and token products do
\r
2888 not support the additional strengthening applied to OAEP key-wrapped
\r
2889 data. In order to resolve this conflict, versions 6.1 and above of this
\r
2890 specification will no longer support OAEP when encrypting using
\r
2891 digital certificates.
\r
2893 Versions of PKZIP available during initial development of the
\r
2894 certificate processing method set a value of 61 into the
\r
2895 version needed to extract field for a file. This indicates that
\r
2896 non-OAEP key wrapping is used. This affects certificate encryption
\r
2897 only, and password encryption functions should not be affected by
\r
2898 this value. This means values of 61 may be found on files encrypted
\r
2899 with certificates only, or on files encrypted with both password
\r
2900 encryption and certificate encryption. Files encrypted with both
\r
2901 methods can safely be decrypted using the password methods documented.
\r
2903 IX. Change Process
\r
2904 ------------------
\r
2906 In order for the .ZIP file format to remain a viable definition, this
\r
2907 specification should be considered as open for periodic review and
\r
2908 revision. Although this format was originally designed with a
\r
2909 certain level of extensibility, not all changes in technology
\r
2910 (present or future) were or will be necessarily considered in its
\r
2911 design. If your application requires new definitions to the
\r
2912 extensible sections in this format, or if you would like to
\r
2913 submit new data structures, please forward your request to
\r
2914 zipformat@pkware.com. All submissions will be reviewed by the
\r
2915 ZIP File Specification Committee for possible inclusion into
\r
2916 future versions of this specification. Periodic revisions
\r
2917 to this specification will be published to ensure interoperability.
\r
2918 We encourage comments and feedback that may help improve clarity
\r
2921 X. Incorporating PKWARE Proprietary Technology into Your Product
\r
2922 ----------------------------------------------------------------
\r
2924 PKWARE is committed to the interoperability and advancement of the
\r
2925 .ZIP format. PKWARE offers a free license for certain technological
\r
2926 aspects described above under certain restrictions and conditions.
\r
2927 However, the use or implementation in a product of certain technological
\r
2928 aspects set forth in the current APPNOTE, including those with regard to
\r
2929 strong encryption, patching, or extended tape operations requires a
\r
license from PKWARE. Please contact PKWARE with regard to acquiring
a license.
\r
2933 XI. Acknowledgements
\r
2934 ---------------------
\r
2936 In addition to the above mentioned contributors to PKZIP and PKUNZIP,
\r
2937 I would like to extend special thanks to Robert Mahoney for suggesting
\r
2938 the extension .ZIP for this software.
\r
XII. References
---------------

Fiala, Edward R., and Greene, Daniel H., "Data compression with
\r
2944 finite windows", Communications of the ACM, Volume 32, Number 4,
\r
2945 April 1989, pages 490-505.
\r
2947 Held, Gilbert, "Data Compression, Techniques and Applications,
\r
2948 Hardware and Software Considerations", John Wiley & Sons, 1987.
\r
2950 Huffman, D.A., "A method for the construction of minimum-redundancy
\r
2951 codes", Proceedings of the IRE, Volume 40, Number 9, September 1952,
\r
2954 Nelson, Mark, "LZW Data Compression", Dr. Dobbs Journal, Volume 14,
\r
2955 Number 10, October 1989, pages 29-37.
\r
2957 Nelson, Mark, "The Data Compression Book", M&T Books, 1991.
\r
2959 Storer, James A., "Data Compression, Methods and Theory",
\r
Computer Science Press, 1988.
\r
2962 Welch, Terry, "A Technique for High-Performance Data Compression",
\r
2963 IEEE Computer, Volume 17, Number 6, June 1984, pages 8-19.
\r
Ziv, J. and Lempel, A., "A universal algorithm for sequential data
compression", IEEE Transactions on Information Theory, Volume 23,
Number 3, May 1977, pages 337-343.
\r
2969 Ziv, J. and Lempel, A., "Compression of individual sequences via
\r
2970 variable-rate coding", IEEE Transactions on Information Theory,
\r
2971 Volume 24, Number 5, September 1978, pages 530-536.
\r
2974 APPENDIX A - AS/400 Extra Field (0x0065) Attribute Definitions
\r
2975 --------------------------------------------------------------
\r
2977 Field Definition Structure:
\r
2979 a. field length including length 2 bytes
\r
2980 b. field code 2 bytes
\r
Field Code  Description
4001        Source type i.e. CLP etc
4002        The text description of the library
4003        The text description of the file
4004        The text description of the member
4005        x'F0' or 0 is PF-DTA, x'F1' or 1 is PF_SRC
4007        Database Type Code                   1 byte
4008        Database file and fields definition
4009        GZIP file type                       2 bytes
400B        IFS code page                        2 bytes
400C        IFS Creation Time                    4 bytes
400D        IFS Access Time                      4 bytes
400E        IFS Modification time                4 bytes
005C        Length of the records in the file    2 bytes
0068        GZIP two words                       8 bytes
\r
2999 APPENDIX B - z/OS Extra Field (0x0065) Attribute Definitions
\r
3000 ------------------------------------------------------------
\r
3002 Field Definition Structure:
\r
3004 a. field length including length 2 bytes
\r
3005 b. field code 2 bytes
\r
Field Code  Description
0001        File Type                       2 bytes
0002        NonVSAM Record Format           1 byte
0004        NonVSAM Block Size              2 bytes Big Endian
0005        Primary Space Allocation        3 bytes Big Endian
0006        Secondary Space Allocation      3 bytes Big Endian
0007        Space Allocation Type           1 byte flag
0008        Modification Date               Retired with PKZIP 5.0 +
0009        Expiration Date                 Retired with PKZIP 5.0 +
000A        PDS Directory Block Allocation  3 bytes Big Endian binary value
000B        NonVSAM Volume List             variable
000C        UNIT Reference                  Retired with PKZIP 5.0 +
000D        DF/SMS Management Class         8 bytes EBCDIC Text Value
000E        DF/SMS Storage Class            8 bytes EBCDIC Text Value
000F        DF/SMS Data Class               8 bytes EBCDIC Text Value
0010        PDS/PDSE Member Info.           30 bytes
0011        VSAM sub-filetype               2 bytes
0012        VSAM LRECL                      13 bytes EBCDIC "(num_avg num_max)"
0013        VSAM Cluster Name               Retired with PKZIP 5.0 +
0014        VSAM KSDS Key Information       13 bytes EBCDIC "(num_length num_position)"
0015        VSAM Average LRECL              5 bytes EBCDIC num_value padded with blanks
0016        VSAM Maximum LRECL              5 bytes EBCDIC num_value padded with blanks
0017        VSAM KSDS Key Length            5 bytes EBCDIC num_value padded with blanks
0018        VSAM KSDS Key Position          5 bytes EBCDIC num_value padded with blanks
0019        VSAM Data Name                  1-44 bytes EBCDIC text string
001A        VSAM KSDS Index Name            1-44 bytes EBCDIC text string
001B        VSAM Catalog Name               1-44 bytes EBCDIC text string
001C        VSAM Data Space Type            9 bytes EBCDIC text string
001D        VSAM Data Space Primary         9 bytes EBCDIC num_value left-justified
001E        VSAM Data Space Secondary       9 bytes EBCDIC num_value left-justified
001F        VSAM Data Volume List           variable EBCDIC text list of 6-character Volume IDs
0020        VSAM Data Buffer Space          8 bytes EBCDIC num_value left-justified
0021        VSAM Data CISIZE                5 bytes EBCDIC num_value left-justified
0022        VSAM Erase Flag                 1 byte flag
0023        VSAM Free CI %                  3 bytes EBCDIC num_value left-justified
0024        VSAM Free CA %                  3 bytes EBCDIC num_value left-justified
0025        VSAM Index Volume List          variable EBCDIC text list of 6-character Volume IDs
0026        VSAM Ordered Flag               1 byte flag
0027        VSAM REUSE Flag                 1 byte flag
0028        VSAM SPANNED Flag               1 byte flag
0029        VSAM Recovery Flag              1 byte flag
002A        VSAM WRITECHK Flag              1 byte flag
002B        VSAM Cluster/Data SHROPTS       3 bytes EBCDIC "n,y"
002C        VSAM Index SHROPTS              3 bytes EBCDIC "n,y"
002D        VSAM Index Space Type           9 bytes EBCDIC text string
002E        VSAM Index Space Primary        9 bytes EBCDIC num_value left-justified
002F        VSAM Index Space Secondary      9 bytes EBCDIC num_value left-justified
0030        VSAM Index CISIZE               5 bytes EBCDIC num_value left-justified
0031        VSAM Index IMBED                1 byte flag
0032        VSAM Index Ordered Flag         1 byte flag
0033        VSAM REPLICATE Flag             1 byte flag
0034        VSAM Index REUSE Flag           1 byte flag
0035        VSAM Index WRITECHK Flag        1 byte flag Retired with PKZIP 5.0 +
0036        VSAM Owner                      8 bytes EBCDIC text string
0037        VSAM Index Owner                8 bytes EBCDIC text string
0058        PDS/PDSE Member TTR Info.       6 bytes Big Endian
0059        PDS 1st LMOD Text TTR           3 bytes Big Endian
005A        PDS LMOD EP Rec #               4 bytes Big Endian
005C        Max Length of records           2 bytes Big Endian
005D        PDSE Flag                       1 byte flag
0065        Last Date Referenced            4 bytes Packed Hex "yyyymmdd"
0066        Date Created                    4 bytes Packed Hex "yyyymmdd"
0068        GZIP two words                  8 bytes
0071        Extended NOTE Location          12 bytes Big Endian
0072        Archive device UNIT             6 bytes EBCDIC
0073        Archive 1st Volume              6 bytes EBCDIC
0074        Archive 1st VOL File Seq#       2 bytes Binary
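The 4-byte Packed Hex "yyyymmdd" values used by field codes 0065 and
0066 can be decoded as shown in the Python sketch below. The sketch is
illustrative and assumes that each byte carries two decimal digits, one
per nibble, so that the bytes 20 07 09 28 (hex) represent
September 28, 2007. The function name decode_packed_date is
hypothetical.

```python
import datetime


def decode_packed_date(raw: bytes) -> datetime.date:
    """Decode a 4-byte packed "yyyymmdd" value into a date."""
    if len(raw) != 4:
        raise ValueError("packed date must be exactly 4 bytes")
    text = raw.hex()  # eight digits, two per byte
    # int() rejects nibbles above 9, which are not valid decimal digits.
    return datetime.date(int(text[:4]), int(text[4:6]), int(text[6:]))
```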
\r
3117 APPENDIX C - Zip64 Extensible Data Sector Mappings (EFS)
\r
3118 --------------------------------------------------------
\r
3120 -Z390 Extra Field:
\r
3122 The following is the general layout of the attributes for the
\r
3123 ZIP 64 "extra" block for extended tape operations. Portions of
\r
this extended tape processing technology are covered under a
\r
3125 pending patent application. The use or implementation in a
\r
3126 product of certain technological aspects set forth in the
\r
3127 current APPNOTE, including those with regard to strong encryption,
\r
3128 patching or extended tape operations, requires a license from
\r
3129 PKWARE. Please contact PKWARE with regard to acquiring a license.
\r
3132 Note: some fields stored in Big Endian format. All text is
\r
3133 in EBCDIC format unless otherwise specified.
\r
Value          Size     Description
-----          ----     -----------
(Z390) 0x0065  2 bytes  Tag for this "extra" block type
Size           4 bytes  Size for the following data block
Tag            4 bytes  EBCDIC "Z390"
Length71       2 bytes  Big Endian
Subcode71      2 bytes  Enote type code
Length72       2 bytes  Big Endian
Subcode72      2 bytes  Unit type code
Length73       2 bytes  Big Endian
Subcode73      2 bytes  Volume1 type code
FirstVol       1 byte   Volume
Length74       2 bytes  Big Endian
Subcode74      2 bytes  FirstVol file sequence
FileSeq        2 bytes  Sequence
\r
3153 APPENDIX D - Language Encoding (EFS)
\r
3154 ------------------------------------
\r
3156 The ZIP format has historically supported only the original IBM PC character
\r
3157 encoding set, commonly referred to as IBM Code Page 437. This limits storing
\r
3158 file name characters to only those within the original MS-DOS range of values
\r
3159 and does not properly support file names in other character encodings, or
\r
3160 languages. To address this limitation, this specification will support the
\r
3161 following change.
\r
3163 If general purpose bit 11 is unset, the file name and comment should conform
\r
3164 to the original ZIP character encoding. If general purpose bit 11 is set, the
\r
file name and comment must support The Unicode Standard, Version 4.1.0 or
greater using the character encoding form defined by the UTF-8 storage
specification. The Unicode Standard is published by The Unicode
\r
3168 Consortium (www.unicode.org). UTF-8 encoded data stored within ZIP files
\r
3169 is expected to not include a byte order mark (BOM).
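The bit 11 rule above can be illustrated with a minimal Python sketch. The
names UTF8_FLAG and decode_zip_text are hypothetical and are not part of
this specification; the sketch assumes the caller has already read the
general purpose bit flag and the raw file name or comment bytes from a
header.

```python
UTF8_FLAG = 0x0800  # general purpose bit 11 (bit 0 is the least significant)


def decode_zip_text(raw: bytes, general_purpose_flags: int) -> str:
    """Decode a file name or comment field according to general purpose bit 11."""
    if general_purpose_flags & UTF8_FLAG:
        # Bit 11 set: the field holds UTF-8. Strict decoding rejects malformed input.
        return raw.decode("utf-8")
    # Bit 11 unset: original ZIP character encoding, IBM Code Page 437.
    return raw.decode("cp437")
```

This sketch does not strip a byte order mark. Python's "utf-8" codec leaves
a leading BOM in place as U+FEFF, so an application that wants to tolerate
such input would need to handle it separately.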
\r
3171 Applications may choose to supplement this file name storage through the use
\r
3172 of the 0x0008 Extra Field. Storage for this optional field is currently
\r
3173 undefined; however, it will be used to allow storing extended information
\r
3174 on source or target encoding that may further assist applications with file
\r
3175 name or file content encoding tasks. Please contact PKWARE with any
\r
3176 requirements on how this field should be used.
\r
3178 The 0x0008 Extra Field storage may be used with either setting for general
\r
3179 purpose bit 11. Examples of the intended usage for this field are to store
\r
3180 whether "modified-UTF-8" (JAVA) is used, or UTF-8-MAC. Similarly, other
\r
3181 commonly used character encoding (code page) designations can be indicated
\r
3182 through this field. Formalized values for use of the 0x0008 record remain
\r
3183 undefined at this time. The definition for the layout of the 0x0008 field
\r
3184 will be published when available. Use of the 0x0008 Extra Field provides
\r
3185 for storing data within a ZIP file in an encoding other than IBM Code
\r
3186 Page 437 or UTF-8.
\r
3188 General purpose bit 11 will not imply any encoding of file content or
\r
3189 password. Values defining character encoding for file content or
\r
3190 password must be stored within the 0x0008 Extended Language Encoding Extra Field.
\r
3193 Ed Gordon of the Info-ZIP group has defined a pair of "extra field" records
\r
3194 that can be used to store UTF-8 file name and file comment fields. These
\r
3195 records can be used for cases when the general purpose bit 11 method
\r
3196 for storing UTF-8 data in the standard file name and comment fields is
\r
3197 not desirable. A common case for this alternate method is if backward
\r
3198 compatibility with older programs is required.
\r
3200 Definitions for the record structure of these fields are included above
\r
3201 in the section on 3rd party mappings for "extra field" records. These
\r
3202 records are identified by Header IDs 0x6375 (Info-ZIP Unicode Comment
\r
3203 Extra Field) and 0x7075 (Info-ZIP Unicode Path Extra Field).
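As a hedged illustration of reading these records, the Python sketch below
walks an "extra field" byte string and returns the UTF-8 text from a 0x7075
or 0x6375 record. It assumes the record layout given in the 3rd party
mappings section (a 1 byte version, a 4 byte CRC-32 of the standard field,
then UTF-8 text) and the usual little-endian 2 byte Header ID and 2 byte
data size. The function names are hypothetical.

```python
import struct
import zlib

UNICODE_PATH_ID = 0x7075     # Info-ZIP Unicode Path Extra Field
UNICODE_COMMENT_ID = 0x6375  # Info-ZIP Unicode Comment Extra Field


def iter_extra_records(extra: bytes):
    """Yield (header_id, data) pairs from an "extra field" byte string."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        data = extra[pos:pos + size]
        if len(data) < size:
            break  # truncated record; stop rather than guess
        yield header_id, data
        pos += size


def unicode_override(extra: bytes, header_id: int, standard_field: bytes):
    """Return UTF-8 text from a matching record, or None if absent or stale."""
    for rec_id, data in iter_extra_records(extra):
        if rec_id != header_id or len(data) < 5:
            continue
        version, crc = struct.unpack_from("<BI", data, 0)
        if version != 1:
            continue  # assumption: only version 1 records are understood
        if crc != zlib.crc32(standard_field) & 0xFFFFFFFF:
            continue  # standard field changed after the record was written
        return data[5:].decode("utf-8")
    return None
```

When unicode_override returns None, a reader would fall back to the
standard file name or comment field.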
\r
3205 The choice of which storage method to use when writing a ZIP file is left
\r
3206 to the implementation. Developers should expect that a ZIP file may
\r
3207 contain either method and should provide support for reading data in
\r
3208 either format. Use of general purpose bit 11 reduces storage requirements
\r
3209 for file name data by not requiring additional "extra field" data for
\r
3210 each file, but can result in older ZIP programs not being able to extract
\r
3211 files. Use of the 0x6375 and 0x7075 records will result in a ZIP file
\r
3212 that should always be readable by older ZIP programs, but requires more
\r
3213 storage per file to write file name and/or file comment fields.
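The storage trade-off between the two methods can be sketched from the
writer's side. The Python below is illustrative only: the function names are
hypothetical, the 0x7075 layout is assumed from the 3rd party mappings
section, and substituting "?" for characters outside IBM Code Page 437 is an
assumption of this sketch, not a requirement of this specification.

```python
import struct
import zlib

UTF8_FLAG = 0x0800  # general purpose bit 11


def encode_name_bit11(name: str):
    """Method 1: UTF-8 in the standard field, bit 11 set, no extra data."""
    return UTF8_FLAG, name.encode("utf-8"), b""


def encode_name_extra(name: str):
    """Method 2: legacy name in the standard field plus a 0x7075 record."""
    # Assumption: unencodable characters become "?" in the legacy name.
    legacy = name.encode("cp437", errors="replace")
    payload = struct.pack("<BI", 1, zlib.crc32(legacy) & 0xFFFFFFFF)
    payload += name.encode("utf-8")
    extra = struct.pack("<HH", 0x7075, len(payload)) + payload
    return 0, legacy, extra
```

Each function returns a (flags, file name bytes, extra field bytes) tuple.
Under these assumptions the second method adds 9 bytes of record overhead
per file on top of the UTF-8 name itself, while the first adds none.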
\r