---
edit_link: ''
title: Message Signaling
origin_url: >-
  https://raw.githubusercontent.com/automotive-grade-linux/docs-sources/master/docs/signaling/architecture.md
---

<!-- WARNING: This file is generated by fetch_docs.js using /home/boron/Documents/AGL/docs-webtemplate/site/_data/tocs/apis_services/master/docs-source-signaling-signaling-book.yml -->

---
title: AGL - Message Signaling Architecture
author: Fulup Ar Foll (IoT.bzh)
date: 2016-06-30

categories: architecture, appfw
tags: architecture, signal, message
layout: techdoc

---

**Table of Contents**

1. TOC
{:toc}

## Context

Automotive applications need to understand in real time the context in which
vehicles operate. To do so, automotive applications must rely on a simple, fast
and secure method to access data generated by the multiple sensors/ECUs
embedded in modern cars.

This signaling problem is neither new nor unique to the automotive industry,
and multiple solutions, often described as Message Brokers or Signaling
Gateways, have been around for a while.

The architecture described in this document has been implemented since the AGL
Daring Dab release to handle existing signals/messages in a car. It relies on
the [[APbinder]] binder/bindings model to minimize complexity while keeping the
system fast and secure. We propose a model with multiple transport options and
a full set of security features to protect both the services generating signals
and those consuming them.

## Objectives

Our objectives are to solve the following 3 key issues:

1. reduce the amount of exchanged data as much as possible, down to the
   meaningful subset actually used by applications
1. offer a high level API that hides low level and proprietary interfaces, to
   improve the long-term stability of the code
1. hide the specificities of the low level implementation as well as of the
   chosen deployment distribution model.

To reach the first objective, event emission frequency should be controlled at
the lowest possible level. Aggregation, composition, processing and filtering of
signals should be supported at software level when not supported by the hardware.

The second objective, offering a long-term stable high level API while allowing
flexibility in changing the low level implementation, may look somewhat
conflicting. Nevertheless, by isolating the low level interface from the high
level one and allowing dynamic composition, it is possible to reconcile both
objectives.

## Architecture

Good practice is often based on modularity, with clearly separated components
assembled within a common framework. Such modularity ensures separation of
duties, robustness, resilience and achievable long-term maintenance.

This document uses the term "**Service**" to define a specific instance of this
proposed common framework, used to host a group of dedicated, separated
components that handle targeted signals/events. Each service exposes the
signals/events it is responsible for to other services/applications.

As an example, a CAN service may want to mix a non-public proprietary API with
CANopen-compatible devices while hiding this complexity from applications. The
goal is, on one hand, to isolate proprietary pieces of code so that they are as
transparent as possible for the rest of the architecture. On the other hand,
isolating the code related to a specific device provides a better separation of
responsibilities, keeping all specificities of a given component clearly
isolated and much easier to test or maintain. Last but not least, if needed,
this model may also help to ship some proprietary code directly as binaries
rather than as source code.

Communication between the car and regular apps should go through two levels of
AGL services, which have two distinct roles:

- the low level should handle communication with the CAN bus device (reading,
  decoding, basic and efficient filtering, caching, ...)
- the high level should handle more complex tasks (signal composition, complex
  algorithms like Kalman filters, business logic, ...)

![image](images/signal-service-arch.svg "Signal Agent Architecture")

To do so, the choice was made to use an architecture similar to [[OpenXC]], a
Ford project. The principle is simple: starting from a JSON file that describes
all the CAN signals to be handled (usually converted from a **dbc** file), the
AGL generator produces a C++ source file. This file is in turn compiled as part
of the low level CAN service. That service reads and decodes these CAN signals
and serves them to a high level CAN service that holds the business logic and
the high level features described above.

![image](images/can-generator.svg "AGL CAN generator")

While in some cases a single service responsible for everything may be chosen,
other scenarios may split responsibility between multiple services. Those
services may run on a single ECU or on multiple ECUs. The chosen deployment
distribution strategy should not impact the development of the components
responsible for signal/event capture, and it should only have a loose impact on
the applications/services consuming those events.

A distributed-capable architecture may provide multiple advantages:

- it avoids concentrating complexity in a single big/fat component
- it naturally leverages multiple ECUs and the existing network architecture
- it simplifies security by enabling isolation and sandboxing
- it clearly separates responsibilities and simplifies conflict resolution

The distributed architecture is still under discussion and is not fully
implemented yet. The low level CAN service is neither fully functional nor
tested for this feature, but its architecture leaves the possibility open, and
it will be implemented later.

![image](images/distributed-arch.png "Distributed Architecture")

Performance matters. There is a trade-off between modularity and efficiency.
This is especially critical for signals, whose propagation from one module to
another should be as fast as possible and consume as little computing resources
as possible.

A flexible solution should provide enough versatility to either compose modules
in separate processes or choose a model where everything is hosted within a
single process. The chosen deployment model should have minor or no impact on
development/integration processes. It should be easy to change: it should
remain a tactical decision and never become a structuring one.

Nevertheless, while grouping modules may improve performance and reduce resource
consumption, it has a clear impact on security. Some signals have very different
security levels from others, and mixing everything within a single process puts
the handling of all signals in a single security context. Such a decision may
significantly affect the level of confidence one may have in the global system.

Providing such flexibility constrains the communication model used by modules:

- The integration API of the modules (the API of the framework) that enables
  the connection of modules must be independent of the implementation of
  the communication layer
- The communication layer must be as transparent as possible: its
  implementation shouldn't impact how it is used
- The cost of the abstraction for modules grouped in the same process
  must be as low as possible
- The cost of separating modules with maximum security must remain as
  low as possible
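
The first three constraints above can be illustrated with a minimal C++ sketch of a transport-neutral interface. `Transport`, `InProcessTransport` and `publish_speed` are hypothetical names and are not part of the [[APbinder]] API. A plain string stands in for the JSON payload that a real binding would exchange.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical transport-neutral API: modules depend only on this interface,
// whatever the real communication layer is (in-process, socket, ...).
class Transport {
public:
    using Handler = std::function<void(const std::string& payload)>;
    virtual ~Transport() = default;
    virtual void listen(const std::string& topic, Handler handler) = 0;
    virtual void send(const std::string& topic, const std::string& payload) = 0;
};

// In-process implementation: delivery is a plain function call, so grouping
// modules in one process adds almost no abstraction cost.
class InProcessTransport : public Transport {
public:
    void listen(const std::string& topic, Handler handler) override {
        handlers_[topic].push_back(std::move(handler));
    }
    void send(const std::string& topic, const std::string& payload) override {
        auto it = handlers_.find(topic);
        if (it == handlers_.end()) return;  // no listener: nothing to deliver
        for (const auto& handler : it->second) handler(payload);
    }

private:
    std::map<std::string, std::vector<Handler>> handlers_;
};

// A module written against Transport& works unchanged with any implementation.
inline void publish_speed(Transport& transport, double kmh) {
    transport.send("vehicle.speed", std::to_string(kmh));
}
```

A socket-based or D-Bus-based class could implement the same two methods. The deployment model could then change without touching `publish_speed`.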

Another point impacting performance is a smart limitation of the number of
emitted signals. Reducing the cost of sending a signal is one thing; reducing
the number of signals is another. The sooner a useless signal is ignored, the
better. The best way to achieve this is to filter useless signals as close as
possible to the component generating them, and when possible directly at the
hardware level.

To enable the right component to filter useless signals, consumer clients must
describe precisely the data they need. Since the Daring Dab release, a filter on
frequency is provided, as well as minimum and maximum limits. These filters can
be specified at subscription time. Also, data not required by any client should
never be transmitted. Only changed data is transmitted: a service that needs a
value at a regular interval has to assume that, as long as no event is received,
the value hasn't changed. Furthermore, when possible, unrequested data should not
even be computed: a CAN signal received on the socket is simply ignored if no
one has subscribed to it.
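
The subscription-time filters described above can be modelled as a small per-subscriber object. The C++ sketch below is not the low level CAN service's implementation. `SubscriptionFilter`, `FilteredSubscription` and the microsecond timestamps are assumptions made for illustration. The object drops values outside the min/max bounds, values that did not change, and values arriving faster than the requested frequency.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical per-subscription options, mirroring the filters described above.
struct SubscriptionFilter {
    double max_frequency_hz = 0.0;    // 0 means "no rate limit" in this sketch
    std::optional<double> min_value;  // values below this bound are dropped
    std::optional<double> max_value;  // values above this bound are dropped
};

class FilteredSubscription {
public:
    explicit FilteredSubscription(SubscriptionFilter filter) : filter_(filter) {}

    // Returns true when a new sample should be pushed to the client.
    // timestamp_us is assumed to be monotonic, in microseconds.
    bool should_emit(double value, std::uint64_t timestamp_us) {
        if (filter_.min_value && value < *filter_.min_value) return false;
        if (filter_.max_value && value > *filter_.max_value) return false;
        // Only changed data is transmitted.
        if (last_value_ && *last_value_ == value) return false;
        // Enforce the maximum notification frequency.
        if (filter_.max_frequency_hz > 0.0 && last_emit_us_) {
            const double min_period_us = 1e6 / filter_.max_frequency_hz;
            if (static_cast<double>(timestamp_us - *last_emit_us_) < min_period_us) return false;
        }
        last_value_ = value;
        last_emit_us_ = timestamp_us;
        return true;
    }

private:
    SubscriptionFilter filter_;
    std::optional<double> last_value_;
    std::optional<std::uint64_t> last_emit_us_;
};
```

With a 10 Hz limit, a changed value arriving 50 ms after the previous notification is dropped, while one arriving 100 ms later is pushed.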

Describing expected data in a precise yet simple manner remains a challenge. It
implies managing:

- the requested frequency of expected data
- the accuracy of data, to avoid reporting changes below the required precision
- when signaling is required (rising edge, falling edge,
  on maintained state, ...)
- filtering of data to avoid glitches and noise
- composition of signals both numerically and logically (adding,
  subtracting, running logical operators like AND/OR/XOR, computing the mean, ...)
- etc.
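
Two of the requirements listed above, accuracy and edge-triggered signaling, can be sketched as small stateful helpers. `Edge`, `EdgeDetector` and `Deadband` are hypothetical names, not an existing AGL API. They only show the kind of logic a module could apply before emitting an event.

```cpp
#include <cmath>
#include <optional>

enum class Edge { Rising, Falling, Both };

// Hypothetical helper: report a boolean signal only on the requested edge.
class EdgeDetector {
public:
    explicit EdgeDetector(Edge mode) : mode_(mode) {}

    bool on_sample(bool value) {
        bool fire = false;
        if (last_ && *last_ != value) {
            const bool rising = value;  // previous sample was false, new one is true
            fire = mode_ == Edge::Both ||
                   (mode_ == Edge::Rising && rising) ||
                   (mode_ == Edge::Falling && !rising);
        }
        last_ = value;
        return fire;
    }

private:
    Edge mode_;
    std::optional<bool> last_;
};

// Hypothetical helper: ignore changes smaller than the requested accuracy.
class Deadband {
public:
    explicit Deadband(double accuracy) : accuracy_(accuracy) {}

    bool on_sample(double value) {
        if (!last_reported_ || std::fabs(value - *last_reported_) >= accuracy_) {
            last_reported_ = value;
            return true;
        }
        return false;
    }

private:
    double accuracy_;
    std::optional<double> last_reported_;
};
```

`Deadband` compares each sample with the last *reported* value rather than the last sample. A slow drift is therefore still reported once it accumulates beyond the accuracy threshold.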

It is critical to support multiple features in signal queries so that modules
can implement the best computing method. The best computing method may affect
which device to query as well as which filters should be applied. Furthermore,
filtering should happen as early as possible and, when possible, directly at
hardware level.

### Transport Solutions

D-Bus is the standard choice on Linux; nevertheless, it has some serious
performance limitations due to its internal verbosity. However, because it is
available and pre-integrated with almost every Linux component, D-Bus may still
remain an acceptable choice for signals with a low emission rate (e.g. HMI).

For faster communication, Jaguar Land Rover proposes a shared-memory signal
infrastructure. Unfortunately, this solution is far from solving all issues and
has some drawbacks. Its open issues are:

- there is no management of which data is actually requested. This
  translates into computing data even when it is not needed.
- on top of shared memory, an extra side channel is required for processes
  to communicate with the daemon.
- a single shared memory implies a lot of concurrency handling. This might
  introduce drawbacks that would otherwise be solved through communication
  buffering.

ZeroMQ, NanoMSG and equivalent libraries focus on fast communication. Some
(e.g. ZeroMQ) come with a commercial licensing model while others (e.g. NanoMSG)
use open source licensing. Those solutions are well suited both for
communication inside a single ECU and across several ECUs. However, most of them
use Unix domain sockets and TCP sockets and typically do not use shared memory
for inter-process communication.

Last but not least, Android binder, Kdbus and others leverage shared memory and
zero copy, and sit directly within the Linux kernel. While this may boost
information passing between local processes, it also has some limitations. The
first one is the lack of support for multi-ECU or vehicle-to-cloud distribution.
The second one is that none of them is approved upstream in the kernel tree.
This last point may create some extra burden each time a new version of the
Linux kernel is needed or when porting to new hardware is required.

### Query and Filtering Language

Description languages for filtering expected data remain an almost green field
where nothing really fits signal service requirements. Languages like Simulink
or graphical signal processing languages are valuable modelling tools.
Unfortunately, they cannot be embedded in the car. Furthermore, those languages
have many features that are not useful in the proposed signal service context,
and the cost of integrating such complex languages might not be justified for
something as simple as a signal service. The same remarks apply to automation
languages.

Further investigation leads to some existing specifications, like the one from
Jaguar Land Rover, [[VISS]] (**Vehicle Information Service Specification**),
and another one from Volkswagen AG named [[ViWi]], which stands for
**Volkswagen Infotainment Web Interface**. They differ and provide different
approaches serving the same goal:

| VISS                                                            | ViWi                                                      |
|-----------------------------------------------------------------|-----------------------------------------------------------|
| Filtering on a node (not possible on several nodes or branches) | Describes a protocol                                      |
| Access restrictions to signals                                  | Ability to specify custom signals                         |
| Uses high level development languages                           | RESTful HTTP calls                                        |
| One big server that handles requests                            | Stateless                                                 |
| Filtering                                                       | Filtering, sorting                                        |
| Static signal tree, not extensible ([[VSS]])                    | Uses JSON objects to communicate                          |
| Use of AMB?                                                     | Identification of resources using UUIDs may be cumbersome |
| Uses WebSocket                                                  |                                                           |

Regarding the **[[VISS]]** specification, the major problem comes from the fact
that signals are specified in the [[VSS]], **Vehicle Signal Specification**. It
is difficult, if not impossible, to make a full inventory of all signals
existing for each car. More importantly, each evolution in signals must be
reported in the specification, not to mention that car makers have their own
names and sets of signals, which mostly would not comply with the [[VSS]]. VISS
doesn't seem to be a valuable way to handle car signals: relying on a big
component that responds to requests and on the **Automotive Message Broker**,
which uses D-Bus, is a performance problem. A recent Fujitsu Ten study[[1]]
highlights that the processor can't handle a heavy load on the CAN bus, and that
the low level binding adopted for AGL has about 10 times[[2]] less impact on
performance.

## Describing Signal Subscriptions using JSON

JSON is a rich structured representation of data. For requested data, it allows
the expression of multiple features and constraints. JSON is both very flexible
and efficient. There are significant advantages in describing requested data at
subscription time using a language like JSON. Another advantage of JSON is that
no dedicated parser has to be written to analyse the request, since standard
JSON libraries can be used.

Existing work on describing signals comes first from Vector with its
proprietary database format (`DBC`), which is widely used in the industry.
Basing the description on this format appears to be a good solution, and the
Open Source community already has tools to convert this proprietary file format
to an open one. So, a JSON description based on work from [[OpenXC]] is
specified [here](https://github.com/openxc/vi-firmware/blob/master/docs/config/reference.rst),
which in turn is used in the low level CAN service in AGL:

```json
{
    "name": "example",
    "extra_sources": [],
    "initializers": [],
    "loopers": [],
    "buses": {},
    "commands": [],
    "messages": {
        "0x3D9": {
            "bus": "hs",
            "signals": {
                "PT_FuelLevelPct": {
                    "generic_name": "fuel.level",
                    "bit_position": 8,
                    "bit_size": 8,
                    "factor": 0.392157,
                    "offset": 0
                },
                "PT_EngineSpeed": {
                    "generic_name": "engine.speed",
                    "bit_position": 16,
                    "bit_size": 16,
                    "factor": 0.25,
                    "offset": 0
                },
                "PT_FuelLevelLow": {
                    "generic_name": "fuel.level.low",
                    "bit_position": 55,
                    "bit_size": 1,
                    "factor": 1,
                    "offset": 0,
                    "decoder": "decoder_t::booleanDecoder"
                }
            }
        }
    }
}
```

From a description like the one above, the low level CAN generator outputs a
C++ source file that the low level CAN service uses to handle such signal
definitions.
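
The sketch below shows what `bit_position`, `bit_size`, `factor` and `offset` mean in practice. It decodes a signal from an 8-byte CAN payload as `raw * factor + offset`. The bit numbering (bit 0 is the most significant bit of the first byte) is an assumption of this sketch; check the generator documentation for the convention actually used. `SignalDef`, `extract_bits` and `decode_signal` are hypothetical helpers, not the generated code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical mirror of one entry of the "signals" object in the JSON description.
struct SignalDef {
    unsigned bit_position;  // first bit of the field (numbering convention assumed below)
    unsigned bit_size;      // width of the raw field, in bits
    double factor;
    double offset;
};

// Reads bit_size bits starting at bit_position from an 8-byte CAN payload.
// Assumed convention: bit 0 is the most significant bit of data[0],
// bit 7 its least significant bit, bit 8 the most significant bit of data[1], ...
inline std::uint64_t extract_bits(const std::array<std::uint8_t, 8>& data,
                                  unsigned bit_position, unsigned bit_size) {
    assert(bit_size >= 1 && bit_size <= 64 && bit_position + bit_size <= 64);
    std::uint64_t raw = 0;
    for (unsigned i = 0; i < bit_size; ++i) {
        const unsigned bit = bit_position + i;
        const unsigned shift = 7 - (bit % 8);  // MSB-first inside each byte
        raw = (raw << 1) | ((data[bit / 8] >> shift) & 1u);
    }
    return raw;
}

// Linear conversion from raw value to physical value.
inline double decode_signal(const std::array<std::uint8_t, 8>& data, const SignalDef& def) {
    const std::uint64_t raw = extract_bits(data, def.bit_position, def.bit_size);
    return static_cast<double>(raw) * def.factor + def.offset;
}
```

Take the frame `{0x00, 0xFF, 0x1F, 0x40, 0x00, 0x00, 0x01, 0x00}` with the `PT_EngineSpeed` definition (bit 16, 16 bits, factor 0.25). The raw value read is 0x1F40 = 8000, which decodes to 2000.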

## Naming Signals

Naming and defining signals is very complex. For example, just ***speed***, as a
signal, is difficult to define.
What unit is used (km/h, mph, m/s, ...)?
From which source (wheels, GPS, accelerometer)?
How was it captured (period of measure, instantaneous, mean, filtered)?

In order to simplify application development, we should nevertheless agree on
a naming convention for key signals. Those names might be relatively complex
and rich: they may include a unit, a rate, a precision, etc.

How these names should be registered, documented and managed is out of the
scope of this document, but it is extremely important and should be addressed
at some point. Nevertheless, this issue should not prevent us from moving
forward with developing a modern architecture. Developers should be warned that
naming is a complex task, that the naming scheme may be redefined in the future,
and that adjustments would then be required.

For low level CAN signal naming, a dotted notation, like the one used by Jaguar
Land Rover, is a good compromise, as it describes a path to a car element. It
separates and organizes names into a hierarchy: names start with the most
common ancestor on the left and become more specific towards the right. Using
this notation lets you subscribe to or unsubscribe from several signals at once
using a globbing expression.

Example using OBD2 standard PIDs:

```path
engine.load
engine.coolant.temperature
fuel.pressure
intake.manifold.pressure
engine.speed
vehicle.speed
intake.air.temperature
mass.airflow
throttle.position
running.time
EGR.error
fuel.level
barometric.pressure
commanded.throttle.position
ethanol.fuel.percentage
accelerator.pedal.position
hybrid.battery-pack.remaining.life
engine.oil.temperature
engine.torque
```

Here you can choose to subscribe to all engine components using an expression
like `engine.*`.
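
A subscription pattern such as `engine.*` can be resolved with a simple glob matcher. In this hypothetical sketch, `*` matches any run of characters, dots included, so `engine.*` also selects `engine.coolant.temperature`. Whether a wildcard should stop at a dot is a design choice that the real service may make differently. `glob_match` and `select_signals` are illustrative names only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Minimal glob matcher: '*' matches any run of characters (dots included),
// '?' matches exactly one character.
inline bool glob_match(const std::string& pattern, const std::string& name) {
    std::size_t p = 0, n = 0;
    std::size_t star = std::string::npos, mark = 0;
    while (n < name.size()) {
        if (p < pattern.size() && (pattern[p] == '?' || pattern[p] == name[n])) {
            ++p;
            ++n;
        } else if (p < pattern.size() && pattern[p] == '*') {
            star = p++;  // remember the star, first try matching it to nothing
            mark = n;
        } else if (star != std::string::npos) {
            p = star + 1;  // backtrack: let the last star absorb one more character
            n = ++mark;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') ++p;  // trailing stars match empty
    return p == pattern.size();
}

// Returns the signal names selected by a subscription pattern, in input order.
inline std::vector<std::string> select_signals(const std::string& pattern,
                                               const std::vector<std::string>& names) {
    std::vector<std::string> selected;
    for (const auto& name : names)
        if (glob_match(pattern, name)) selected.push_back(name);
    return selected;
}
```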

## Reusing existing/legacy code

The services currently provided use:

- **Low Level**: the [[OpenXC]] project provides logic and some useful libraries
  to access a CAN bus. It is the choice for AGL.

- **High Level**: in many cases, accessing low level signals is not enough.
  Low level information might need to be composed (e.g. GPS+Gyro+Accel).
  Writing this composition logic might be quite complex, and reusing existing
  libraries like LibEkNav for Kalman filtering [[9]] or Vrgimbal for 3-axis
  control [[10]] may save a lot of time. AGL apps should access CAN signals
  through a high level service, which can rely on as many low level services as
  needed to compute its **virtual signals** coming from different sources. The
  ViWi protocol seems to be a good solution for this.
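
As an example of the composition logic a high level service may host, here is a minimal scalar Kalman filter. It could smooth a noisy low level reading before exposing it as a virtual signal. It is a textbook constant-value model written for illustration, not code from LibEkNav. The class name `ScalarKalman` and the noise parameters `q` (process) and `r` (measurement) are assumptions that would need tuning per signal.

```cpp
// Minimal scalar Kalman filter for a value assumed to be roughly constant
// between samples (random-walk process model).
class ScalarKalman {
public:
    ScalarKalman(double initial_estimate, double initial_variance, double q, double r)
        : x_(initial_estimate), p_(initial_variance), q_(q), r_(r) {}

    // One predict/update cycle; returns the new estimate.
    double update(double measurement) {
        p_ += q_;                      // predict: uncertainty grows by process noise
        const double k = p_ / (p_ + r_);  // Kalman gain
        x_ += k * (measurement - x_);  // correct the estimate toward the measurement
        p_ *= (1.0 - k);               // uncertainty shrinks after the update
        return x_;
    }

    double estimate() const { return x_; }

private:
    double x_;  // current estimate
    double p_;  // estimate variance
    double q_;  // process noise variance
    double r_;  // measurement noise variance
};
```

A larger `r` relative to `q` makes the filter trust the model more and react more slowly to new readings.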

## Leveraging AGL binder

Such a model is loosely coupled with the AGL binder. The low level CAN service as
well as virtual signal components may potentially run within any hosting
environment that provides the right API with the corresponding required
facilities. Nevertheless, leveraging [[APbinder]] has multiple advantages. It
already implements event notification to support a messaging/signaling model
for distributed services. It enables a subscription model that meets the
requirements, and finally it uses JSON natively.

This messaging/signaling model already enforces the notion of subscription for
receiving data. It implies that unrequested data is not sent and not even
computed. When requested data is available, it is pushed only once to each
waiting subscriber.
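
The subscription behaviour described above can be sketched as follows: nothing is computed without a subscriber, and each value is computed once and pushed to every subscriber. `SignalHub` is a hypothetical class, not the [[APbinder]] event API. Values are plain `double`s rather than JSON objects.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical lazy publisher keyed by signal name.
class SignalHub {
public:
    using Callback = std::function<void(double)>;

    void subscribe(const std::string& name, Callback cb) {
        subscribers_[name].push_back(std::move(cb));
    }

    // compute() is only invoked when at least one subscriber exists.
    void publish(const std::string& name, const std::function<double()>& compute) {
        auto it = subscribers_.find(name);
        if (it == subscribers_.end() || it->second.empty()) return;  // nobody asked: skip decoding
        const double value = compute();                 // computed once...
        for (const auto& cb : it->second) cb(value);    // ...pushed to each subscriber
    }

private:
    std::map<std::string, std::vector<Callback>> subscribers_;
};
```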

The [[APbinder]] provides transparency of communication.
It currently implements this transparency over D-Bus/Kdbus and WebSocket.
Its communication transparency mechanism is easy to extend to other
technologies: pools of shared memory or any proprietary transport model.

When bindings/services are loaded by the same binder, it transparently provides
`in-memory` communication. This in-memory communication is really efficient: on
one hand, the exchanged JSON objects are not serialized (because they are not
streamed); on the other hand, those JSON objects provide a high level of
abstraction able to transfer any data.

Technically, a service is a standard [[APbinder]] binding which is also handled
by the system and launched as a daemon by systemd.
Therefore, the Signal/Agent inherits security protection through SMACK, access
control through Cynara, transparency of the API with respect to the transport
layer, life cycle management, ... Like any other [[APbinder]] process, it is
composed of a set of bindings. In the specific case of the signal service,
those bindings are in fact the `signal modules`.

The proposed model allows low level dependencies to be implemented as
independent signal modules. Once developed, those modules are somewhat like
"Lego bricks": they can be spread or grouped within one or multiple services
depending on deployment constraints (performance, multi-ECU, security &
isolation constraints, ...).

On top of these low level signal modules, you should use a high level service.
A first implementation of [[ViWi]] is available [here](https://github.com/iotbzh/high-level-viwi-service)
and can be used to integrate business logic and high level features.

The model naturally uses JSON to represent data.

## Multi-ECU and Vehicle to Cloud interactions

While this might not be a show stopper for current projects, it is obvious that
in the near future the Signal/Agent should support fully distributed
architectures. Some events may come from the cloud (e.g. a request to start
monitoring a given feature), some may come from SmartCity infrastructure and
nearby vehicles, and last but not least some may come from another ECU within
the same vehicle or from a virtualized OS within the same ECU (e.g. cluster &
IVI). To do so, signal modules should enable composition within one or more
[[APbinder]] instances inside the same ECU. Furthermore, they should also
support chaining with the outside world.

![image](images/cloud-arch.svg "Cloud & Multi-ECU Architecture")

1. The application requests a virtual signal exactly as if it were a low level signal
1. The signal agent has a direct relation to the low level signal
1. The agent needs to proxy to another service inside the same ECU to access the signal
1. The signal is not present on the current ECU: the request has to be proxied to the outside world
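
Cases 2 to 4 above come down to a routing decision inside the agent. In this hypothetical sketch, `SignalDirectory` lists the signals the agent reads directly and those owned by another service on the same ECU; any other signal is forwarded outside. `Route`, `SignalDirectory` and `resolve` are illustrative names, not an AGL API. The application-facing request stays identical in every case.

```cpp
#include <set>
#include <string>

// Hypothetical outcome of resolving a requested signal.
enum class Route { LocalLowLevel, ProxySameEcu, ProxyRemote };

struct SignalDirectory {
    std::set<std::string> local;     // signals this agent reads directly (case 2)
    std::set<std::string> same_ecu;  // signals owned by another service on this ECU (case 3)
};

// The application always sends the same request; only the agent's route changes.
inline Route resolve(const SignalDirectory& dir, const std::string& signal) {
    if (dir.local.count(signal) > 0) return Route::LocalLowLevel;
    if (dir.same_ecu.count(signal) > 0) return Route::ProxySameEcu;
    return Route::ProxyRemote;  // case 4: not on this ECU, forward to the outside world
}
```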

[AppFw]: http://iot.bzh/download/public/2016/appfw/01_Introduction-to-AppFW-for-AGL-1.0.pdf "Application Framework"
[APcore]: http://iot.bzh/download/public/2016/appfw/03_Documentation-AppFW-Core-1.0.pdf "AppFw Core"
[APmain]: https://gerrit.automotivelinux.org/gerrit/#/q/project:src/app-framework-main "AppFw Main"
[APbinder]: https://gerrit.automotivelinux.org/gerrit/#/q/project:src/app-framework-binder "AppFw Binder"
[APsamples]: https://gerrit.automotivelinux.org/gerrit/gitweb?p=src/app-framework-binder.git;a=tree;f=bindings/samples "AppFw Samples"
[Signal-K]: http://signalk.org/overview.html
[1]: http://schd.ws/hosted_files/aglmmwinter2017/37/20170201_AGL-AMM_F10_kusakabe.pdf
[2]: https://wiki.automotivelinux.org/_media/agl-distro/20170402_ften_can_kusakabe_v2.pdf
[6]: https://github.com/otcshare/automotive-message-broker
[7]: http://ardupilot.org/rover/index.html
[8]: https://github.com/ArduPilot/ardupilot/tree/master/libraries
[9]: https://bitbucket.org/jbrandmeyer/libeknav/wiki/Home
[10]: http://ardupilot.org/rover/docs/common-vrgimbal.html
[11]: http://elinux.org/R-Car/Boards/Porter:PEXT01
[12]: https://github.com/gpsnavi/gpsnavi
[VISS]: http://rawgit.com/w3c/automotive/gh-pages/vehicle_data/vehicle_information_service.html
[VSS]: https://github.com/GENIVI/vehicle_signal_specification
[ViWi]: https://www.w3.org/Submission/2016/SUBM-viwi-protocol-20161213/
[OpenXC]: http://openxcplatform.com/
[low level CAN service]: https://gerrit.automotivelinux.org/gerrit/#/admin/projects/src/low-level-can-generator
[high level ViWi]: https://github.com/iotbzh/high-level-viwi-service