======================
Nanopb: Basic concepts
======================

.. include :: menu.rst

The concepts outlined here are common to both the encoder and the decoder.

.. contents::

Proto files
===========
All Protocol Buffers implementations use .proto files to describe the message format.
The point of these files is to be a portable interface description language.

Compiling .proto files for nanopb
---------------------------------
Nanopb uses Google's protoc compiler to parse the .proto file, and then a Python script to generate the C header and source files from it::

    user@host:~$ protoc -omessage.pb message.proto
    user@host:~$ python ../generator/nanopb_generator.py message.pb
    Writing to message.h and message.c
    user@host:~$

Compiling .proto files with nanopb options
------------------------------------------
Nanopb defines two extensions for message fields described in .proto files: *max_size* and *max_count*.
These set the maximum size of a string or bytes field and the maximum number of items in a repeated field::

    required string name = 1 [(nanopb).max_size = 40];
    repeated PhoneNumber phone = 4 [(nanopb).max_count = 5];

To use these extensions, you need to place an import statement at the beginning of the file::

    import "nanopb.proto";

This file, in turn, requires the file *google/protobuf/descriptor.proto*, which is usually installed under */usr/include*. Therefore, to compile a .proto file that uses these options, use a protoc command similar to::

    protoc -I/usr/include -Inanopb/generator -I. -omessage.pb message.proto

Streams
=======

Nanopb uses streams for accessing the data in encoded format.
The stream abstraction is very lightweight: it consists of a structure (*pb_ostream_t* or *pb_istream_t*) that contains a pointer to a callback function.

There are a few generic rules for callback functions:

#) Return false on IO errors. The encoding or decoding process will abort immediately.
#) Use the *state* field to store your own data, such as a file descriptor.
#) *bytes_written* and *bytes_left* are updated by pb_write and pb_read.
#) Your callback may be used with substreams. In this case, *bytes_left*, *bytes_written* and *max_size* have smaller values than in the original stream. Don't use these values to calculate pointers.
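
To make rules 1, 2 and 4 concrete, here is a minimal sketch of an output callback that writes into a fixed memory buffer. It is an illustration under stated assumptions, not nanopb code: *membuf_t* and *membuf_write* are hypothetical names, and the *pb_ostream_t* declaration merely mirrors the structure shown in the Output streams section so that the example compiles without nanopb's headers. The callback keeps its own write position in its state instead of deriving a pointer from *bytes_written*.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror of the pb_ostream_t layout described in this document
   (assumption: declared here only so the sketch compiles on its own). */
typedef struct _pb_ostream_t pb_ostream_t;
struct _pb_ostream_t
{
    bool (*callback)(pb_ostream_t *stream, const uint8_t *buf, size_t count);
    void *state;
    size_t max_size;
    size_t bytes_written;
};

/* Hypothetical user data reached through stream->state (rule 2). */
typedef struct
{
    uint8_t *data;    /* destination buffer */
    size_t capacity;  /* size of the destination buffer */
    size_t pos;       /* our own write position (rule 4) */
} membuf_t;

static bool membuf_write(pb_ostream_t *stream, const uint8_t *buf, size_t count)
{
    membuf_t *mb = (membuf_t *)stream->state;

    /* Rule 1: report failure instead of overrunning the buffer. */
    if (count > mb->capacity - mb->pos)
        return false;

    /* Rule 4: the pointer comes from pos, never from bytes_written. */
    memcpy(mb->data + mb->pos, buf, count);
    mb->pos += count;
    return true;
}
```

If the stream is set up with *max_size* equal to the buffer size, oversized writes are rejected before the callback runs, and the check inside the callback only acts as a second line of defense.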

Output streams
--------------

::

 struct _pb_ostream_t
 {
    bool (*callback)(pb_ostream_t *stream, const uint8_t *buf, size_t count);
    void *state;
    size_t max_size;
    size_t bytes_written;
 };

The *callback* for an output stream may be NULL, in which case the stream simply counts the number of bytes written and *max_size* is ignored.

Otherwise, if *bytes_written* plus the number of bytes to be written is larger than *max_size*, pb_write returns false before doing anything else. If you don't want to limit the size of the stream, pass SIZE_MAX.
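
The two rules above can be modelled in a few lines. The sketch below is a simplified stand-in written for illustration, not nanopb's actual *pb_write*: *sketch_write* and *discard_bytes* are hypothetical names, and the structure again mirrors *pb_ostream_t* as shown above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the pb_ostream_t layout shown above (assumption: declared
   locally so the sketch compiles without nanopb's headers). */
typedef struct _pb_ostream_t pb_ostream_t;
struct _pb_ostream_t
{
    bool (*callback)(pb_ostream_t *stream, const uint8_t *buf, size_t count);
    void *state;
    size_t max_size;
    size_t bytes_written;
};

/* Hypothetical callback that accepts and discards everything. */
static bool discard_bytes(pb_ostream_t *stream, const uint8_t *buf, size_t count)
{
    (void)stream;
    (void)buf;
    (void)count;
    return true;
}

/* Simplified model of the documented write rules (not nanopb's pb_write). */
static bool sketch_write(pb_ostream_t *stream, const uint8_t *buf, size_t count)
{
    if (stream->callback != NULL)
    {
        /* Reject the write up front if it would exceed max_size.
           Comparing against the remaining room avoids overflow in
           bytes_written + count (assumes bytes_written <= max_size). */
        if (count > stream->max_size - stream->bytes_written)
            return false;
        if (!stream->callback(stream, buf, count))
            return false;
    }
    /* A NULL callback only counts bytes; max_size is not consulted. */
    stream->bytes_written += count;
    return true;
}
```

With *max_size* set to SIZE_MAX, the limit check effectively never fires, which matches the advice above.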

**Example 1:**

This is how to get the size of the message without storing it anywhere::

 Person myperson = ...;
 pb_ostream_t sizestream = {0};
 pb_encode(&sizestream, Person_fields, &myperson);
 printf("Encoded size is %lu\n", (unsigned long)sizestream.bytes_written);

**Example 2:**

Writing to stdout::

 bool callback(pb_ostream_t *stream, const uint8_t *buf, size_t count)
 {
    FILE *file = (FILE*) stream->state;
    return fwrite(buf, 1, count, file) == count;
 }

 pb_ostream_t stdoutstream = {&callback, stdout, SIZE_MAX, 0};

Input streams
-------------
For input streams, there are a few extra rules:

#) If *buf* is NULL, read from the stream but don't store the data. This is used to skip unknown input.
#) You don't need to know the length of the message in advance. After getting an EOF error when reading, set *bytes_left* to 0 and return false. *pb_decode* will detect this and, if the EOF occurred at a proper position, return true.

Here is the structure::

 struct _pb_istream_t
 {
    bool (*callback)(pb_istream_t *stream, uint8_t *buf, size_t count);
    void *state;
    size_t bytes_left;
 };

The *callback* must always be a valid function pointer; unlike for output streams, NULL is not allowed. *bytes_left* is an upper limit on the number of bytes that will be read. You can use SIZE_MAX if your callback handles EOF as described above.
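
As a sketch of both input rules, here is a callback that reads from a memory region whose length the decoder does not know in advance, so the stream is created with *bytes_left* set to SIZE_MAX. The *memsrc_t* type and *memsrc_read* function are hypothetical, and *pb_istream_t* is re-declared to mirror the structure above so the example stands alone.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror of the pb_istream_t layout shown above (assumption: declared
   here only so the sketch compiles without nanopb's headers). */
typedef struct _pb_istream_t pb_istream_t;
struct _pb_istream_t
{
    bool (*callback)(pb_istream_t *stream, uint8_t *buf, size_t count);
    void *state;
    size_t bytes_left;
};

/* Hypothetical state: a memory region plus a read position. */
typedef struct
{
    const uint8_t *data;
    size_t len;
    size_t pos;
} memsrc_t;

static bool memsrc_read(pb_istream_t *stream, uint8_t *buf, size_t count)
{
    memsrc_t *src = (memsrc_t *)stream->state;

    /* Not enough data left: this is EOF. Signal it by setting
       bytes_left to 0 and returning false, as the rules require. */
    if (count > src->len - src->pos)
    {
        stream->bytes_left = 0;
        return false;
    }

    /* A NULL buf means "skip": advance without storing anything. */
    if (buf != NULL)
        memcpy(buf, src->data + src->pos, count);

    src->pos += count;
    return true;
}
```

The callback does not decrement *bytes_left* after a successful read; according to the generic stream rules, that is done by pb_read.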

**Example:**

This function binds an input stream to stdin:

::

 bool callback(pb_istream_t *stream, uint8_t *buf, size_t count)
 {
    FILE *file = (FILE*)stream->state;
    bool status;

    if (buf == NULL)
    {
        /* Skip 'count' bytes without storing them. */
        while (count > 0 && fgetc(file) != EOF)
            count--;
        status = (count == 0);
    }
    else
    {
        status = (fread(buf, 1, count, file) == count);
    }

    /* On EOF, signal it to pb_decode as described above. */
    if (!status && feof(file))
        stream->bytes_left = 0;

    return status;
 }

 pb_istream_t stdinstream = {&callback, stdin, SIZE_MAX};

Data types
==========

Most Protocol Buffers datatypes have directly corresponding C datatypes: for example, int32 maps to int32_t, float to float and bool to bool. However, the variable-length datatypes are more complex:

1) Strings, bytes and repeated fields of any type map to callback functions by default.
2) If the special option *(nanopb).max_size* is specified in the .proto file, a string maps to a null-terminated char array and bytes map to a structure containing a byte array and a size field.
3) If the special option *(nanopb).max_count* is specified on a repeated field, it maps to an array of whatever type is being repeated. An additional field is created to hold the actual number of entries stored.

=========================================================================== =======================
Field in .proto                                                             Generated in .h
=========================================================================== =======================
required string name = 1;                                                   pb_callback_t name;
required string name = 1 [(nanopb).max_size = 40];                          char name[40];
repeated string name = 1 [(nanopb).max_size = 40];                          pb_callback_t name;
repeated string name = 1 [(nanopb).max_size = 40, (nanopb).max_count = 5];  | size_t name_count;
                                                                            | char name[5][40];
required bytes data = 1 [(nanopb).max_size = 40];                           | typedef struct {
                                                                            |    size_t size;
                                                                            |    uint8_t bytes[40];
                                                                            | } Person_data_t;
                                                                            | Person_data_t data;
=========================================================================== =======================

The maximum lengths are checked at runtime. If a string, bytes or array field exceeds the allocated length, *pb_decode* returns false.
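
Because the fixed-size mappings in the table put the limits into the C types, the sending side can check them before encoding. The following sketch uses a hand-written struct modelled on the table; *Example*, *Example_data_t*, *set_string*, *add_name* and *set_bytes* are hypothetical names, not generator output. It assumes that a *max_size* of 40 for a string leaves room for 39 characters plus the terminating null, as the *char name[40]* type implies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Modelled on the table rows above (hypothetical, hand-written). */
typedef struct
{
    size_t size;
    uint8_t bytes[40];
} Example_data_t;

typedef struct
{
    char name[40];        /* string, max_size = 40 */
    size_t names_count;   /* repeated string, max_count = 5 */
    char names[5][40];
    Example_data_t data;  /* bytes, max_size = 40 */
} Example;

/* Copy src into a fixed-size string field, refusing (not truncating)
   strings that do not fit together with their terminating null. */
static bool set_string(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    if (len >= dst_size)
        return false;
    memcpy(dst, src, len + 1);
    return true;
}

/* Append to the repeated field and keep names_count in sync. */
static bool add_name(Example *msg, const char *src)
{
    const size_t max_count = sizeof msg->names / sizeof msg->names[0];
    if (msg->names_count >= max_count)
        return false;
    if (!set_string(msg->names[msg->names_count], sizeof msg->names[0], src))
        return false;
    msg->names_count++;
    return true;
}

/* Fill the bytes field: copy the data and record its length in size. */
static bool set_bytes(Example_data_t *dst, const uint8_t *src, size_t len)
{
    if (len > sizeof dst->bytes)
        return false;
    memcpy(dst->bytes, src, len);
    dst->size = len;
    return true;
}
```

Refusing oversized input on the sending side mirrors what *pb_decode* does on the receiving side, where such data makes decoding fail.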

Field callbacks
===============
When a field has dynamic length, nanopb cannot statically allocate storage for it. Instead, it allows you to handle the field in whatever way you want, using a callback function.

The `pb_callback_t`_ structure contains a function pointer and a *void* pointer you can use for passing data to the callback. The actual behavior of the callback function differs between encoding and decoding modes.

.. _`pb_callback_t`: reference.html#pb-callback-t

Encoding callbacks
------------------
::

    bool (*encode)(pb_ostream_t *stream, const pb_field_t *field, const void *arg);

When encoding, the callback should write out complete fields, including the wire type and field number tag. It can write as many or as few fields as it likes. For example, if you want to write out an array as a *repeated* field, you should do it all in a single call.

The callback may be called multiple times during a single call to `pb_encode`_. It must produce the same amount of data every time.

.. _`pb_encode`: reference.html#pb-encode

This callback writes out a dynamically sized string::

    bool write_string(pb_ostream_t *stream, const pb_field_t *field, const void *arg)
    {
        /* get_string_from_somewhere() is a placeholder for your own data source. */
        const char *str = get_string_from_somewhere();

        /* Write the tag (field number and wire type) first... */
        if (!pb_encode_tag_for_field(stream, field))
            return false;

        /* ...then the length-prefixed string contents. */
        return pb_encode_string(stream, (const uint8_t*)str, strlen(str));
    }
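
To show what a complete field looks like on the wire, here is a hand-written sketch of the bytes a string field occupies under the standard Protocol Buffers encoding: a varint tag equal to (field number << 3) | 2, where 2 is the length-delimited wire type, then a varint length, then the raw bytes. *put_varint* and *put_string_field* are hypothetical helpers that write into a plain array; in a real callback, *pb_encode_tag_for_field* and *pb_encode_string* take care of this through the stream.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write value as a base-128 varint; returns the number of bytes used
   (at most 10 for a 64-bit value). */
static size_t put_varint(uint8_t *out, uint64_t value)
{
    size_t n = 0;
    while (value >= 0x80)
    {
        out[n++] = (uint8_t)(value | 0x80);  /* low 7 bits + "more" flag */
        value >>= 7;
    }
    out[n++] = (uint8_t)value;
    return n;
}

/* Encode one complete string field into out (hypothetical helper):
   tag = (field_number << 3) | 2, then the length, then the bytes.
   out must have room for strlen(str) + 20 bytes. */
static size_t put_string_field(uint8_t *out, uint32_t field_number, const char *str)
{
    size_t len = strlen(str);
    size_t n = put_varint(out, ((uint64_t)field_number << 3) | 2u);
    n += put_varint(out + n, (uint64_t)len);
    memcpy(out + n, str, len);
    return n + len;
}
```

For example, field 1 containing "hi" encodes as the four bytes 0x0A 0x02 'h' 'i'.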

Decoding callbacks
------------------
::

    bool (*decode)(pb_istream_t *stream, const pb_field_t *field, void *arg);

When decoding, the callback receives a length-limited substream that reads the contents of a single field. The field tag has already been read.

The callback will be called multiple times for repeated fields. For packed fields, you can either read multiple values until the stream ends, or leave it to `pb_decode`_ to call your function over and over until all values have been read.

.. _`pb_decode`: reference.html#pb-decode

This callback reads multiple integers and prints them::

    bool read_ints(pb_istream_t *stream, const pb_field_t *field, void *arg)
    {
        /* Keep reading until the substream for this field is exhausted. */
        while (stream->bytes_left)
        {
            uint64_t value;
            if (!pb_decode_varint(stream, &value))
                return false;
            printf("%llu\n", (unsigned long long)value);
        }
        return true;
    }
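
The loop in *read_ints* relies on *pb_decode_varint* and on the substream ending exactly where the field ends. The sketch below models both on a plain memory buffer so it can run standalone: *substream_t*, *read_varint* and *read_packed* are hypothetical and only approximate what nanopb does internally. Each byte contributes its low seven bits, least significant group first, and a set high bit means another byte follows.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a length-limited substream over memory. */
typedef struct
{
    const uint8_t *data;
    size_t bytes_left;
} substream_t;

/* Decode one base-128 varint. Fails if the substream ends mid-value
   or the value needs more than 10 bytes (more than 64 bits). */
static bool read_varint(substream_t *s, uint64_t *value)
{
    uint64_t result = 0;
    unsigned shift;

    for (shift = 0; shift < 64; shift += 7)
    {
        uint8_t byte;
        if (s->bytes_left == 0)
            return false;
        byte = *s->data++;
        s->bytes_left--;
        result |= (uint64_t)(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0)
        {
            *value = result;
            return true;
        }
    }
    return false;
}

/* Same loop shape as read_ints: keep decoding until the substream is
   empty. Stores at most max values and reports how many were read. */
static bool read_packed(substream_t *s, uint64_t *values, size_t max, size_t *count)
{
    *count = 0;
    while (s->bytes_left > 0)
    {
        if (*count == max || !read_varint(s, &values[*count]))
            return false;
        (*count)++;
    }
    return true;
}
```

For example, the packed payload 0x03 0xAC 0x02 0x01 decodes to the three values 3, 300 and 1.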

Field description array
=======================

To use the *pb_encode* and *pb_decode* functions, you need an array of pb_field_t constants describing the structure you wish to encode or decode. This description is usually autogenerated from the .proto file.

For example, this submessage in the Person.proto file::

 message Person {
    message PhoneNumber {
        required string number = 1 [(nanopb).max_size = 40];
        optional PhoneType type = 2 [default = HOME];
    }
 }

generates this field description array for the structure *Person_PhoneNumber*::

 const pb_field_t Person_PhoneNumber_fields[3] = {
    {1, PB_HTYPE_REQUIRED | PB_LTYPE_STRING,
    offsetof(Person_PhoneNumber, number), 0,
    pb_membersize(Person_PhoneNumber, number), 0, 0},

    {2, PB_HTYPE_OPTIONAL | PB_LTYPE_VARINT,
    pb_delta(Person_PhoneNumber, type, number),
    pb_delta(Person_PhoneNumber, has_type, type),
    pb_membersize(Person_PhoneNumber, type), 0,
    &Person_PhoneNumber_type_default},

    PB_LAST_FIELD
 };
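
The point of the descriptor array is that generic code can locate each member of a message struct from data alone, using *offsetof* and member sizes. The sketch below shows that idea with a deliberately simplified, hypothetical descriptor type (*field_desc_t*) and struct (*PhoneNumberSketch*). The real *pb_field_t* carries more information, and the generated array above stores some offsets relative to the previous field (*pb_delta*), whereas this sketch stores plain *offsetof* values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified field descriptor. */
typedef struct
{
    uint8_t tag;         /* field number from the .proto file; 0 ends the list */
    size_t data_offset;  /* absolute offsetof() of the member */
    size_t data_size;    /* size of the member, like pb_membersize */
} field_desc_t;

/* Hand-written struct resembling Person_PhoneNumber (illustrative). */
typedef struct
{
    char number[40];
    int32_t type;        /* stands in for the PhoneType enum */
} PhoneNumberSketch;

#define MEMBER_SIZE(st, m) (sizeof(((st *)0)->m))

static const field_desc_t PhoneNumberSketch_fields[] = {
    {1, offsetof(PhoneNumberSketch, number), MEMBER_SIZE(PhoneNumberSketch, number)},
    {2, offsetof(PhoneNumberSketch, type), MEMBER_SIZE(PhoneNumberSketch, type)},
    {0, 0, 0}            /* terminator, playing the role of PB_LAST_FIELD */
};

/* Generic lookup: find a member's address from the descriptor alone. */
static void *field_ptr(void *msg, const field_desc_t *fields, uint8_t tag)
{
    const field_desc_t *f;
    for (f = fields; f->tag != 0; f++)
    {
        if (f->tag == tag)
            return (uint8_t *)msg + f->data_offset;
    }
    return NULL;
}
```

Because the lookup never names a struct member directly, the same code works for any message type that has a descriptor array.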

Return values and error handling
================================

Most functions in nanopb return bool: *true* means success, *false* means failure. If this is enough for you, skip this section.

For simplicity, nanopb doesn't define its own error codes. This might be added if there is a compelling need for it. You can, however, deduce something about the cause of an error:

1) Running out of memory. Because everything is allocated from the stack, nanopb can't detect this itself. Encoding or decoding the same type of message always takes the same amount of stack space. Therefore, if it works once, it always works.
2) Invalid field description. These are usually stored as constants, so if it works under the debugger, it always does.
3) IO errors in your own stream callbacks. Because encoding/decoding stops at the first error, you can overwrite the *state* field in the struct and store your own error code there.
4) Errors that happen in your field callback functions. You can use the *void* pointer in the callback structure to report them.
5) Exceeding the *max_size* or *bytes_left* of a stream.
6) Exceeding the *max_size* of a string or bytes field, or the *max_count* of an array field.
7) An invalid Protocol Buffers binary message. You couldn't recover from it anyway, so a simple failure should be enough.

In my opinion, it is enough that 1. and 2. can be resolved using a debugger.

However, you may be interested in which of the remaining conditions caused the error. For 3. and 4., you can set and check the state. If you have to detect 5. and 6., you should convert the fields to callback type. Any remaining problem is of type 7.
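
For condition 3, a minimal sketch of the "store your own error code" approach follows. Rather than overwriting *state* itself, it keeps the error code inside the structure that *state* points to, next to the output file. *app_error_t*, *file_sink_t*, *file_sink_write* and *describe_failure* are hypothetical names, and *pb_ostream_t* is re-declared to mirror the structure from the Output streams section.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Mirror of the pb_ostream_t layout (assumption: declared locally so
   the sketch compiles without nanopb's headers). */
typedef struct _pb_ostream_t pb_ostream_t;
struct _pb_ostream_t
{
    bool (*callback)(pb_ostream_t *stream, const uint8_t *buf, size_t count);
    void *state;
    size_t max_size;
    size_t bytes_written;
};

/* Hypothetical application-level error codes. */
typedef enum { APP_OK = 0, APP_ERR_WRITE } app_error_t;

/* State reached through stream->state: the output file plus the
   reason for the most recent failure. */
typedef struct
{
    FILE *file;
    app_error_t error;
} file_sink_t;

static bool file_sink_write(pb_ostream_t *stream, const uint8_t *buf, size_t count)
{
    file_sink_t *sink = (file_sink_t *)stream->state;
    if (fwrite(buf, 1, count, sink->file) != count)
    {
        sink->error = APP_ERR_WRITE;  /* record why, then abort the encoding */
        return false;
    }
    return true;
}

/* After the encoder returns false, tell condition 3 apart from the rest. */
static const char *describe_failure(const file_sink_t *sink)
{
    return sink->error == APP_ERR_WRITE ? "I/O error in the stream callback"
                                        : "not a stream I/O error";
}
```

Reset *error* to APP_OK before each encoding attempt so that a stale code from an earlier failure is not misreported.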