docs/concepts.rst

======================
Nanopb: Basic concepts
======================

.. include :: menu.rst

This page outlines the concepts underlying the nanopb design.

.. contents::

Proto files
===========
All Protocol Buffers implementations use .proto files to describe the message format.
These files serve as a portable interface description language.

Compiling .proto files for nanopb
---------------------------------
Nanopb uses Google's protoc compiler to parse the .proto file, and then a Python script to generate the C header and source code from it::

    user@host:~$ protoc -omessage.pb message.proto
    user@host:~$ python ../generator/nanopb_generator.py message.pb
    Writing to message.h and message.c
    user@host:~$

Compiling .proto files with nanopb options
------------------------------------------
Nanopb defines two extensions for message fields described in .proto files: *max_size* and *max_count*.
They set the maximum size of a string and the maximum number of items in an array::

    required string name = 1 [(nanopb).max_size = 40];
    repeated PhoneNumber phone = 4 [(nanopb).max_count = 5];

To use these extensions, place an import statement at the beginning of the file::

    import "nanopb.proto";

This file, in turn, requires the file *google/protobuf/descriptor.proto*, which is usually installed under */usr/include*. Therefore, to compile a .proto file that uses options, use a protoc command similar to::

    protoc -I/usr/include -Inanopb/generator -I. -omessage.pb message.proto

The options can be defined in file, message and field scopes::

    option (nanopb_fileopt).max_size = 20; // File scope
    message Message
    {
        option (nanopb_msgopt).max_size = 30; // Message scope
        required string fieldsize = 1 [(nanopb).max_size = 40]; // Field scope
    }

It is also possible to give the options on the command line, but then they affect the whole file. For example::

    user@host:~$ python ../generator/nanopb_generator.py -s 'max_size: 20' message.pb


Streams
=======

Nanopb uses streams for accessing the data in encoded format.
The stream abstraction is very lightweight, and consists of a structure (*pb_ostream_t* or *pb_istream_t*) which contains a pointer to a callback function.

There are a few generic rules for callback functions:

#) Return false on IO errors. The encoding or decoding process will abort immediately.
#) Use *state* to store your own data, such as a file descriptor.
#) *bytes_written* and *bytes_left* are updated by *pb_write* and *pb_read*.
#) Your callback may be used with substreams. In this case *bytes_left*, *bytes_written* and *max_size* have smaller values than the original stream. Don't use these values to calculate pointers.
#) Always read or write the full requested length of data. For example, POSIX *recv()* needs the *MSG_WAITALL* flag to accomplish this.
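
The last rule can be met on a plain file descriptor by looping until the whole request has been transferred. The following is a minimal sketch, assuming a POSIX system; the helper name *read_full* is hypothetical and is not part of nanopb:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: read exactly `count` bytes from `fd`.
 * Returns false on an IO error or if EOF arrives first. */
static bool read_full(int fd, uint8_t *buf, size_t count)
{
    size_t done = 0;

    assert(buf != NULL);

    while (done < count)
    {
        ssize_t n = read(fd, buf + done, count - done);

        if (n > 0)
            done += (size_t)n;      /* partial read: keep going */
        else if (n == 0)
            return false;           /* EOF before `count` bytes */
        else if (errno != EINTR)
            return false;           /* real IO error */
    }

    return true;
}
```

A stream callback could call such a helper with the descriptor it keeps in *state* and return the result directly.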

Output streams
--------------

::

 struct _pb_ostream_t
 {
    bool (*callback)(pb_ostream_t *stream, const uint8_t *buf, size_t count);
    void *state;
    size_t max_size;
    size_t bytes_written;
 };

The *callback* for an output stream may be NULL, in which case the stream simply counts the number of bytes written. In this case, *max_size* is ignored.

Otherwise, if *bytes_written* + bytes_to_be_written is larger than *max_size*, *pb_write* returns false before doing anything else. If you don't want to limit the size of the stream, pass SIZE_MAX.
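
The two cases above can be modelled in a few lines. This is a sketch of the described behaviour, not nanopb's actual *pb_write* source; the names *sketch_ostream_t*, *sketch_write* and *buffer_callback* are stand-ins chosen so that the example compiles without the nanopb headers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in with the same fields as the structure shown above. */
typedef struct sketch_ostream_s sketch_ostream_t;
struct sketch_ostream_s
{
    bool (*callback)(sketch_ostream_t *stream, const uint8_t *buf, size_t count);
    void *state;
    size_t max_size;
    size_t bytes_written;
};

/* Example callback: `state` points at the next free byte of a buffer. */
static bool buffer_callback(sketch_ostream_t *stream, const uint8_t *buf, size_t count)
{
    uint8_t *dest = (uint8_t *)stream->state;
    memcpy(dest, buf, count);
    stream->state = dest + count;
    return true;
}

/* Models the rules described above for writing `count` bytes. */
static bool sketch_write(sketch_ostream_t *stream, const uint8_t *buf, size_t count)
{
    assert(stream != NULL);

    if (stream->callback != NULL)
    {
        /* Enforce the limit before touching the callback.
         * Written as a subtraction so the sum cannot overflow. */
        if (stream->bytes_written > stream->max_size ||
            count > stream->max_size - stream->bytes_written)
            return false;

        if (!stream->callback(stream, buf, count))
            return false;
    }

    /* With a NULL callback the stream only counts bytes. */
    stream->bytes_written += count;
    return true;
}
```

Note that a rejected write leaves both the buffer and *bytes_written* untouched, which is what "before doing anything else" implies.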

**Example 1:**

This is the way to get the size of the message without storing it anywhere::

 Person myperson = ...;
 pb_ostream_t sizestream = {0};
 pb_encode(&sizestream, Person_fields, &myperson);
 printf("Encoded size is %lu\n", (unsigned long)sizestream.bytes_written);

**Example 2:**

Writing to stdout::

 bool callback(pb_ostream_t *stream, const uint8_t *buf, size_t count)
 {
    FILE *file = (FILE*) stream->state;
    return fwrite(buf, 1, count, file) == count;
 }

 pb_ostream_t stdoutstream = {&callback, stdout, SIZE_MAX, 0};

Input streams
-------------
For input streams, there is one extra rule:

#) You don't need to know the length of the message in advance. After getting an EOF error when reading, set *bytes_left* to 0 and return false. *pb_decode* will detect this and, if the EOF was in a proper position, it will return true.

Here is the structure::

 struct _pb_istream_t
 {
    bool (*callback)(pb_istream_t *stream, uint8_t *buf, size_t count);
    void *state;
    size_t bytes_left;
 };

The *callback* must always be a function pointer. *bytes_left* is an upper limit on the number of bytes that will be read. You can use SIZE_MAX if your callback handles EOF as described above.

**Example:**

This function binds an input stream to stdin:

::

 bool callback(pb_istream_t *stream, uint8_t *buf, size_t count)
 {
    FILE *file = (FILE*)stream->state;
    bool status;

    if (buf == NULL)
    {
        /* Skip count bytes; stop early on EOF. */
        while (count && fgetc(file) != EOF)
            count--;
        return count == 0;
    }

    status = (fread(buf, 1, count, file) == count);

    if (feof(file))
        stream->bytes_left = 0;

    return status;
 }

 pb_istream_t stdinstream = {&callback, stdin, SIZE_MAX};
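
The same callback shape works for data that is already in memory. The sketch below keeps its read position in *state* instead of deriving it from *bytes_left*, in line with the substream rule. It assumes that the caller checks the request against *bytes_left* before invoking the callback; *sketch_istream_t* and *buffer_read* are stand-in names so that the example compiles without the nanopb headers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in with the same fields as the structure shown above. */
typedef struct sketch_istream_s sketch_istream_t;
struct sketch_istream_s
{
    bool (*callback)(sketch_istream_t *stream, uint8_t *buf, size_t count);
    void *state;
    size_t bytes_left;
};

/* Reads from a memory buffer; `state` points at the next unread byte. */
static bool buffer_read(sketch_istream_t *stream, uint8_t *buf, size_t count)
{
    uint8_t *source;

    assert(stream != NULL && stream->state != NULL);
    source = (uint8_t *)stream->state;

    /* buf == NULL means "skip these bytes", as in the stdin example. */
    if (buf != NULL)
        memcpy(buf, source, count);

    stream->state = source + count;
    return true;
}
```

Because the position lives in *state*, the callback behaves the same whether it is called on the original stream or on a substream with a smaller *bytes_left*.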

Data types
==========

Most Protocol Buffers datatypes have directly corresponding C datatypes: for example, int32 maps to int32_t, float to float and bool to bool. However, the variable-length datatypes are more complex:

1) Strings, bytes and repeated fields of any type map to callback functions by default.
2) If the special option *(nanopb).max_size* is specified in the .proto file, a string maps to a null-terminated char array and bytes map to a structure containing a byte array and a size field.
3) If the special option *(nanopb).max_count* is specified on a repeated field, it maps to an array of whatever type is being repeated. Another field is created for the actual number of entries stored.

=============================================================================== =======================
      field in .proto                                                           autogenerated in .h
=============================================================================== =======================
required string name = 1;                                                       pb_callback_t name;
required string name = 1 [(nanopb).max_size = 40];                              char name[40];
repeated string name = 1 [(nanopb).max_size = 40];                              pb_callback_t name;
repeated string name = 1 [(nanopb).max_size = 40, (nanopb).max_count = 5];      | size_t name_count;
                                                                                | char name[5][40];
required bytes data = 1 [(nanopb).max_size = 40];                               | typedef struct {
                                                                                |    size_t size;
                                                                                |    uint8_t bytes[40];
                                                                                | } Person_data_t;
                                                                                | Person_data_t data;
=============================================================================== =======================

The maximum lengths are checked at runtime. If a string, bytes or array field exceeds the allocated length, *pb_decode* will return false.

Note: for the *bytes* datatype, the field length checking may not be exact.
The compiler may add some padding to the *pb_bytes_t* structure, and the nanopb runtime doesn't know how much of the structure size is padding. Therefore it uses the whole length of the structure for storing data, which is imprecise but should not cause problems. In practice, this means that if you specify *(nanopb).max_size=5* on a *bytes* field, you may be able to store 6 bytes there. For the *string* field type, the length limit is exact.
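
The effect of padding can be inspected with *sizeof* and *offsetof*. The following sketch uses a hypothetical structure, *Example_data_t*, laid out like the generated *bytes* structure in the table above with a declared maximum of 5 bytes; the helper names are illustrative only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout for a bytes field with (nanopb).max_size = 5. */
typedef struct
{
    size_t size;
    uint8_t bytes[5];
} Example_data_t;

/* Bytes available after `size`, counting any trailing padding. */
static size_t usable_bytes(void)
{
    return sizeof(Example_data_t) - offsetof(Example_data_t, bytes);
}

/* Print the declared and the effective capacity for this platform. */
static void print_layout(void)
{
    assert(usable_bytes() >= 5);
    printf("declared: 5, usable: %lu\n", (unsigned long)usable_bytes());
}
```

The usable figure depends on the compiler and the target's alignment rules, so it may or may not exceed the declared size on a given platform.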

Field callbacks
===============
When a field has dynamic length, nanopb cannot statically allocate storage for it. Instead, it allows you to handle the field in whatever way you want, using a callback function.

The `pb_callback_t`_ structure contains a function pointer and a *void* pointer called *arg* that you can use for passing data to the callback. If the function pointer is NULL, the field will be skipped. A pointer to *arg* is passed to the function, so that it can modify it and retrieve the value.

The actual behavior of the callback function is different in encoding and decoding modes. In encoding mode, the callback is called once and should write out everything, including field tags. In decoding mode, the callback is called repeatedly for every data item.

.. _`pb_callback_t`: reference.html#pb-callback-t

Encoding callbacks
------------------
::

    bool (*encode)(pb_ostream_t *stream, const pb_field_t *field, void * const *arg);

When encoding, the callback should write out complete fields, including the wire type and field number tag. It can write as many or as few fields as it likes. For example, if you want to write out an array as a *repeated* field, you should do it all in a single call.

Usually you can use `pb_encode_tag_for_field`_ to encode the wire type and tag number of the field. However, if you want to encode a repeated field as a packed array, you must call `pb_encode_tag`_ instead to specify a wire type of *PB_WT_STRING*.
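
For reference, the Protocol Buffers wire format combines the field number and the wire type into a single value, which is then written as a varint in front of the field data. A minimal sketch of that computation follows; *make_tag* and the *WT_* constants are hypothetical names, not nanopb identifiers:

```c
#include <assert.h>
#include <stdint.h>

/* Wire type numbers from the Protocol Buffers encoding specification. */
enum
{
    WT_VARINT = 0,            /* integers, bool, enum */
    WT_64BIT = 1,             /* fixed64, sfixed64, double */
    WT_LENGTH_DELIMITED = 2,  /* string, bytes, submessages, packed arrays */
    WT_32BIT = 5              /* fixed32, sfixed32, float */
};

/* Tag value = field number in the upper bits, wire type in the low 3 bits. */
static uint32_t make_tag(uint32_t field_number, uint32_t wire_type)
{
    assert(field_number >= 1 && wire_type <= 7);
    return (field_number << 3) | wire_type;
}
```

For a packed array in field 4, *make_tag(4, WT_LENGTH_DELIMITED)* gives 34 (0x22). Wire type 2 is the length-delimited type that the text above calls *PB_WT_STRING*.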

If the callback is used in a submessage, it will be called multiple times during a single call to `pb_encode`_. In this case, it must produce the same amount of data every time. If the callback is directly in the main message, it is called only once.

.. _`pb_encode`: reference.html#pb-encode
.. _`pb_encode_tag_for_field`: reference.html#pb-encode-tag-for-field
.. _`pb_encode_tag`: reference.html#pb-encode-tag

This callback writes out a dynamically sized string::

    bool write_string(pb_ostream_t *stream, const pb_field_t *field, void * const *arg)
    {
        char *str = get_string_from_somewhere();
        if (!pb_encode_tag_for_field(stream, field))
            return false;

        return pb_encode_string(stream, (uint8_t*)str, strlen(str));
    }

Decoding callbacks
------------------
::

    bool (*decode)(pb_istream_t *stream, const pb_field_t *field, void **arg);

When decoding, the callback receives a length-limited substream that reads the contents of a single field. The field tag has already been read. For *string* and *bytes*, the length value has already been parsed, and is available at *stream->bytes_left*.

The callback will be called multiple times for repeated fields. For packed fields, you can either read multiple values until the stream ends, or leave it to `pb_decode`_ to call your function over and over until all values have been read.

.. _`pb_decode`: reference.html#pb-decode

This callback reads multiple integers and prints them::

    bool read_ints(pb_istream_t *stream, const pb_field_t *field, void **arg)
    {
        while (stream->bytes_left)
        {
            uint64_t value;
            if (!pb_decode_varint(stream, &value))
                return false;
            printf("%llu\n", (unsigned long long)value);
        }
        return true;
    }
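
To make the "read until the stream ends" approach concrete without the nanopb headers, here is a self-contained sketch that decodes a packed run of varints from a memory buffer. The functions *decode_varint* and *sum_packed* and their signatures are hypothetical; *pb_decode_varint* reads from a stream rather than from a buffer:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one base-128 varint from buf[*pos..len).
 * Advances *pos past the bytes it consumed. */
static bool decode_varint(const uint8_t *buf, size_t len, size_t *pos, uint64_t *dest)
{
    uint64_t result = 0;
    unsigned shift = 0;

    assert(buf != NULL && pos != NULL && dest != NULL);

    while (*pos < len && shift < 64)
    {
        uint8_t byte = buf[(*pos)++];
        result |= (uint64_t)(byte & 0x7F) << shift;

        if ((byte & 0x80) == 0)   /* high bit clear: last byte */
        {
            *dest = result;
            return true;
        }
        shift += 7;
    }

    return false;   /* truncated input or more than 10 bytes */
}

/* Sum every value in a packed payload; false on malformed data. */
static bool sum_packed(const uint8_t *buf, size_t len, uint64_t *sum)
{
    size_t pos = 0;
    *sum = 0;

    while (pos < len)   /* plays the role of stream->bytes_left above */
    {
        uint64_t value;
        if (!decode_varint(buf, len, &pos, &value))
            return false;
        *sum += value;
    }
    return true;
}
```

For the payload *01 AC 02 7F*, which encodes the values 1, 300 and 127, *sum_packed* yields 428.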

Field description array
=======================

To use the *pb_encode* and *pb_decode* functions, you need an array of *pb_field_t* constants describing the structure you wish to encode. This description is usually autogenerated from the .proto file.

For example, this submessage in the Person.proto file::

 message Person {
    message PhoneNumber {
        required string number = 1 [(nanopb).max_size = 40];
        optional PhoneType type = 2 [default = HOME];
    }
 }

generates this field description array for the structure *Person_PhoneNumber*::

 const pb_field_t Person_PhoneNumber_fields[3] = {
    PB_FIELD(  1, STRING  , REQUIRED, STATIC, Person_PhoneNumber, number, number, 0),
    PB_FIELD(  2, ENUM    , OPTIONAL, STATIC, Person_PhoneNumber, type, number, &Person_PhoneNumber_type_default),
    PB_LAST_FIELD
 };


Return values and error handling
================================

Most functions in nanopb return bool: *true* means success, *false* means failure. There is also some support for error messages for debugging purposes: the error messages go in *stream->errmsg*.

The error messages help in identifying the underlying cause of the error. The most common error conditions are:

1) Running out of memory, i.e. stack overflow.
2) Invalid field descriptors (would usually mean a bug in the generator).
3) IO errors in your own stream callbacks.
4) Errors that happen in your callback functions.
5) Exceeding the max_size or bytes_left of a stream.
6) Exceeding the max_size of a string or array field.
7) Invalid protocol buffers binary message.