System integration

Roles and responsibilities

This section discusses the different participants in the Hexagon software ecosystem, their roles and responsibilities. This SDK is primarily targeted towards independent developers and OEM/integrator partners.

System integrators

"System integrators" in this document refers to organizations building and integrating complete software packages for Qualcomm Snapdragon SoC based devices. In the mobile phone ecosystem the device manufacturer typically acts as the system integrator; in other cases the integrator may be a third-party solution provider or a subsystem vendor. Regardless, the system integrator has capabilities and responsibilities beyond independent developers, and is ultimately responsible for the software running on their product.

As part of building the application CPU software image, system integrators can manage which applications can run on the DSPs and the resource access privileges:

Starting with Lahaina, integrators can mark selected cDSP client applications or system services as privileged clients. See Marking clients as privileged for more details.
Starting with Lahaina, integrators can create additional VTCM partitions for use by privileged clients in specific use cases. See VTCM partitions for more details. Before partitioning VTCM, integrators should ensure that the functional and performance requirements of their applications and dependent libraries are met. They can work with Qualcomm Customer Engineering (CE) to determine an appropriate configuration for their device.
System integrators can sign dynamically-loaded DSP modules for running in Signed PDs. See Device signing for more details. However, to improve product security, all DSP applications should run as unsigned PDs unless they require services not available in them.
System integrators can whitelist clients to enable them to create Signed PDs on the cDSP in Lahaina and later products. Starting with Lahaina, other clients can only create Unsigned PDs.
System integrators can build and integrate software running in CPZ PDs or a secure PD.
System integrators must ensure application priority levels are set appropriately: Critical system services typically need higher priority than integrated applications, which may also be given higher priority than downloaded third-party applications. Note that prior to Lahaina, Signed PDs have access to a higher priority range than unsigned PDs. Starting with Lahaina, higher priorities are available to privileged clients, regardless of whether they use signed or unsigned PDs.

OEM/integrator partners

"OEM/integrator partners" in this document refers to organizations developing custom DSP software to be integrated by a device manufacturer or a System integrator. A common example would be an independent company developing camera algorithms that a phone OEM integrates into their camera application. Such developers can work as independent developers to develop their software using stand-alone test applications, but will need to work with their system integrator partners to determine how the software is ultimately integrated in the product.

Since system integrators have the ability to sign dynamically loadable modules, software designed for integration directly into a product can run in a signed PD. Developers can install test signatures on test devices to facilitate testing their software in signed PDs before integration to the product.

Independent developers

"Independent developers" or ISVs (Independent Software Vendors) in this document refers to people or organizations developing installable applications that have a DSP component. This includes applications deployed on mobile devices via application stores, internal applications within organizations, and possibly other deployment models in other markets. Independent developers are not expected to require direct support from system integrators or Qualcomm.

Independent developers can use the SDK to develop and test software running on the Hexagon Compute DSP in Unsigned PDs. They can distribute their DSP modules as part of their application and call the DSP module directly from native code using FastRPC. Such applications always act as unprivileged clients on Lahaina and later products that make this distinction.

Independent developers cannot easily sign modules for loading into signed PDs on production devices. Doing so requires working with system integrators to sign the modules or install the appropriate credentials on devices at build time. Developers can, however, install test signatures on test devices to facilitate testing code in signed PDs.

Independent developers may also achieve DSP acceleration through existing CPU software libraries and frameworks that internally offload to the system's DSPs, such as SNPE, QNN, FastCV, etc. Such usage is outside of the scope of this SDK.

Privileged and unprivileged clients

Starting with Lahaina, CPU-side client processes for the cDSP are categorized as privileged and unprivileged clients. Privileged clients can have access to higher priority levels and resources unprivileged clients do not have. This mechanism is designed to ensure system services and critical pre-installed applications can access the cDSP with priority over installable third-party applications while still keeping the cDSP open for all clients.

The distinction between privileged and unprivileged clients currently only applies to the Compute DSP (cDSP). Other DSPs are not widely open for installable applications.

Marking clients as privileged

System services

System services are identified as privileged based on the Group ID (GID) of the process. The GID used may vary between products. The current GIDs are:

Lahaina: 2908

On Android, system integrators can set system services as privileged by adding the correct group (here oem_2908) to the service's .rc file. For example, vppservice is configured as privileged as follows (assuming it also needs to be in the camera group):

service vendor.vppservice /vendor/bin/vppservice
    class hal
    user media
    group camera oem_2908

See $ANDROID_BUILD_TOP/system/core/init/README.md in the Android source tree for a discussion on the Android Init language and .rc files.

Built-in applications

System integrators can configure built-in device applications as privileged clients by using the com.qualcomm.permission.qti.FASTRPCPRIVILEGE permission in the application's AndroidManifest XML file:

<uses-permission android:name="com.qualcomm.permission.qti.FASTRPCPRIVILEGE"/>
...
<permission android:name="com.qualcomm.permission.qti.FASTRPCPRIVILEGE"
    android:protectionLevel="signatureOrSystem" />

Test applications

Test applications can also use the setgid() system call to change their group. The process must call setgid(2908) before opening a session on the DSP and making any FastRPC calls, including memory mapping operations. This can be useful for testing code intended for privileged clients before system integration. Calling setgid() requires root privileges and is not available for regular applications on production devices.
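The call order described above can be sketched as follows, assuming a rooted test device. The helper name become_privileged_client is hypothetical, 2908 is the Lahaina GID listed earlier, and the DSP session setup that would follow is left out.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* GID that marks privileged cDSP clients on Lahaina (from the list above);
 * other products may use a different value. */
#define FASTRPC_PRIVILEGED_GID_LAHAINA ((gid_t)2908)

/*
 * Hypothetical helper: switch the calling process to the given group.
 * Call it before opening a DSP session, making any FastRPC call, or
 * mapping memory to the DSP. Returns 0 on success or a negative errno
 * value (typically -EPERM when the process is not running as root).
 */
static int become_privileged_client(gid_t gid)
{
    if (setgid(gid) != 0) {
        return -errno;
    }
    return 0;
}
```

A test application would call become_privileged_client(FASTRPC_PRIVILEGED_GID_LAHAINA) at the very start of main() and skip the privileged test path if the call returns a non-zero value.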

Priority levels

The system caps DSP thread priority levels for unprivileged processes on the cDSP. This ensures critical system services can use the DSP at a higher priority than installable applications. The priority ranges may be changed, but are currently:

Unprivileged clients: 64 through 254 (cDSP only)
Privileged clients: 1 through 254

Prior to Lahaina priority limits are based on whether the process is running as a signed or unsigned PD. See Unsigned PD services and limitations.

Note that QuRT, the DSP RTOS, uses lower priority values for higher priorities. In other words, a thread at priority 1 has higher execution priority than one at priority 2.
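The ranges and the inverted ordering can be illustrated with a small sketch. Both helper names are hypothetical, the limits are the Lahaina values quoted above, and the code only shows the arithmetic: it does not describe how the system itself handles out-of-range requests.

```c
#include <stdbool.h>

/* Ranges quoted above for Lahaina; the source notes they may change. */
enum {
    PRIO_CEILING_PRIVILEGED   = 1,   /* highest priority for privileged clients */
    PRIO_CEILING_UNPRIVILEGED = 64,  /* highest priority for unprivileged clients */
    PRIO_LOWEST               = 254  /* lowest priority available to either kind */
};

/*
 * Hypothetical helper: clamp a requested QuRT priority value into the range
 * the client may use. Lower numbers mean higher execution priority, so the
 * "ceiling" is the numerically smallest allowed value.
 */
static int clamp_thread_priority(int requested, bool privileged)
{
    int ceiling = privileged ? PRIO_CEILING_PRIVILEGED : PRIO_CEILING_UNPRIVILEGED;

    if (requested < ceiling) {
        return ceiling;       /* asked for more priority than allowed */
    }
    if (requested > PRIO_LOWEST) {
        return PRIO_LOWEST;   /* numerically past the usable range */
    }
    return requested;
}

/* True when a thread at priority value a runs ahead of one at value b. */
static bool runs_ahead_of(int a, int b)
{
    return a < b;
}
```

For example, an unprivileged client asking for priority 48 would end up at 64 under this sketch, while a privileged client would keep 48.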

The priority limits apply both to the priority used for FastRPC threads (set with the FASTRPC_THREAD_PARAMS or FASTRPC_RELATIVE_THREAD_PRIORITY session control requests) and threads created locally on the DSP. The DSP process can query its thread priority ceiling using the HAP_get_thread_priority_ceiling() API defined in HAP_ps.h.

Thread priorities are also used as the resource reservation priority with the Compute Resource Manager.

To ensure different applications are prioritized appropriately, they all need to use a consistent approach to choosing priority values. The table below gives a set of recommended priority values for applications on the cDSP, depending on the use case; audio and sensor use cases should follow technology-specific recommendations.

Priority (dec)	Priority (hex)	Thread type	Examples
2	2	Hard real-time use cases	Camera Streamer (microsecond timelines); limit the number of active threads below the number of HW threads.
4	4	High-priority ISRs
32	20	Regular ISRs	Inter-processor communication; driver interrupt threads
40	28	System hardware control	Driver control threads
44	2c	System inter-processor communication	Internal FastRPC communication threads
48	30	System application real-time use cases	Built-in camera application preview or camcorder per-frame video processing (millisecond timelines)

64	40	Application inter-processor communication	Internal FastRPC communication threads
96	60	Application real-time use cases	Third-party camera application preview or camcorder per-frame video processing
160	a0	Application high-priority use cases	High-priority machine learning inference operations
192	c0	FastRPC default priority; Application default	Default priority machine learning inference operations
224	e0	Application low-priority use cases	Still image post-capture processing; batch use cases
255	ff	Idle task

Additional notes:

It is generally best to use the same priority for worker and control threads for consistent execution and to ensure resource reservations and code using those resources use the same priority levels.
The priority for threads used to execute FastRPC calls on the DSP can be set with the remote_session_control() API before the first library is opened on a DSP. After this, applications can change thread priorities with the QuRT API qurt_thread_set_priority(), and can also control their own worker thread priorities as appropriate. See Remote APIs.
Drivers that perform data processing on a client application's behalf should use thread priorities consistent with the client's. ISR and control threads should use higher priority levels as noted in the table.
On devices with multiple hard real-time use cases (e.g. camera streamer and real-time control), the system integrator should consider all system use cases when assigning real-time use case priorities. For example, it may be appropriate for some real-time ISR threads to run at a higher priority than the streamer. On mobile devices, however, the camera streamer is generally the only use case with microsecond-level timeline requirements.
Choosing appropriate thread priorities is ultimately the developer's and system integrator's responsibility. The recommendations in this section are guidelines and not enforced by the system.

VTCM partitions

Lahaina introduces the ability to create multiple partitions within the cDSP's VTCM. Multiple partitions are intended for device-specific use cases such as camera streaming, which may need their VTCM allocations to be guaranteed to succeed while other applications are also using VTCM. Most applications should continue to use the default partition, and many devices will ship with a single default VTCM partition only.

Some VTCM partitions in the system can be restricted to privileged clients only. All clients can access regular non-privileged partitions.

Note that some CDSP libraries, notably neural network runtimes, may run at reduced performance levels or have functional limitations if they do not get access to the whole VTCM in the system. System integrators should discuss their partitioning requirements and plans with Qualcomm CE as part of designing their products.

System integrators configure VTCM partitions in the Linux Kernel Device Tree under the RPMSG section of the appropriate DSP:

qcom,msm_cdsprm_rpmsg {
    qcom,msm_cdsp_rm {
        qcom,vtcm-partition-info = <0 2048 0x1>,
                                   <1 1024 0x2>,
                                   <2 512 0x4>,
                                   <3 512 0x4>;
        qcom,vtcm-partition-map = <0 0>,
                                  <1 0>,
                                  <2 1>,
                                  <30 2>,
                                  <31 3>;
    };
};

The configuration has two parts:

qcom,vtcm-partition-info specifies the partitions, their sizes and flags in <index size_in_KB flags> format.
- index
  - Partitions must be defined with consecutive partition indices running from 0 through (number of VTCM partitions - 1).
  - VTCM memory will be partitioned in the order provided (0 being the first partition).
- size_in_KB
  - The size of each partition should be a multiple of 256 KB.
  - 256 KB is the minimum VTCM allocation size; the supported page sizes are 256 KB, 1 MB and 4 MB. A 3 MB partition allows a maximum page size of 1 MB (3 pages). Similarly, a 512 KB partition consists of 256 KB pages (2 pages).
- flags
  - Flags can be used to set some partitions as privileged, i.e. only available to privileged clients. Currently 0x1 (PRIMARY), 0x2 (SECONDARY) and 0x4 (PRIVILEGED) are the supported flags per partition (only one per partition).
  - PRIMARY and SECONDARY partitions are available to all the clients while the PRIMARY partition is used by default. Partition selection is controlled by the vtcm-partition-map information.
  - There must be only one PRIMARY partition.
qcom,vtcm-partition-map maps application type identifiers to partitions using <Application_ID partition_ID>. Clients use application type IDs to request non-default partitions. The application identifier is specified as a value [0...31] in the device tree.
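The rules above can be checked with a short sketch. The struct and function names are hypothetical and this is not driver code. Three points are assumptions: a size of zero is rejected, "only one PRIMARY" is read as exactly one, and the maximum page size is taken to be the largest supported page size that is no larger than the partition, which matches the 3 MB and 512 KB examples.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values listed above. */
#define VTCM_FLAG_PRIMARY    0x1u
#define VTCM_FLAG_SECONDARY  0x2u
#define VTCM_FLAG_PRIVILEGED 0x4u

/* One <index size_in_KB flags> entry; the struct itself is hypothetical. */
struct vtcm_partition {
    uint32_t index;
    uint32_t size_kb;
    uint32_t flags;
};

/* Returns true when the table follows the rules listed above. */
static bool vtcm_partition_table_is_valid(const struct vtcm_partition *p, size_t count)
{
    size_t primaries = 0;

    for (size_t i = 0; i < count; i++) {
        if (p[i].index != i) {
            return false;   /* indices must run 0, 1, ..., count - 1 */
        }
        if (p[i].size_kb == 0 || p[i].size_kb % 256 != 0) {
            return false;   /* size must be a non-zero multiple of 256 KB */
        }
        switch (p[i].flags) {   /* exactly one flag per partition */
        case VTCM_FLAG_PRIMARY:
            primaries++;
            break;
        case VTCM_FLAG_SECONDARY:
        case VTCM_FLAG_PRIVILEGED:
            break;
        default:
            return false;
        }
    }
    return primaries == 1;      /* assumption: exactly one PRIMARY partition */
}

/*
 * Largest supported page size (in KB) usable in a partition, or 0 if the
 * partition is smaller than the minimum page. Assumption: a page size is
 * usable when it is no larger than the partition.
 */
static uint32_t vtcm_max_page_kb(uint32_t size_kb)
{
    static const uint32_t page_kb[] = { 4096, 1024, 256 };

    for (size_t i = 0; i < sizeof(page_kb) / sizeof(page_kb[0]); i++) {
        if (page_kb[i] <= size_kb) {
            return page_kb[i];
        }
    }
    return 0;
}
```

Run against the device tree example above (2048, 1024, 512 and 512 KB with flags 0x1, 0x2, 0x4 and 0x4), the table passes validation.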

Client applications use the HAP_compute_res_attr_set_app_type() API to select a non-default VTCM partition for their allocation. The application type ID must match the values configured in the device tree. Developers wishing to use non-default VTCM partitions must work with their System integrator partners to determine appropriate application types to use.
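To make the mapping concrete, the sketch below resolves an application type ID to a partition index using the qcom,vtcm-partition-map values from the device tree example above. The struct and function are hypothetical and do not reflect the driver implementation. Falling back to a caller-supplied default for unmapped IDs is an assumption based on the PRIMARY partition being used by default.

```c
#include <stddef.h>
#include <stdint.h>

/* One <Application_ID partition_ID> entry; the struct is hypothetical. */
struct vtcm_map_entry {
    uint32_t app_type;
    uint32_t partition;
};

/* The qcom,vtcm-partition-map values from the device tree example above. */
static const struct vtcm_map_entry example_map[] = {
    { 0, 0 }, { 1, 0 }, { 2, 1 }, { 30, 2 }, { 31, 3 },
};

/*
 * Returns the partition mapped to app_type, or fallback when the ID has no
 * entry. Assumption: unmapped IDs land in the default (PRIMARY) partition.
 */
static uint32_t vtcm_partition_for_app_type(const struct vtcm_map_entry *map,
                                            size_t count,
                                            uint32_t app_type,
                                            uint32_t fallback)
{
    for (size_t i = 0; i < count; i++) {
        if (map[i].app_type == app_type) {
            return map[i].partition;
        }
    }
    return fallback;
}
```

With the example map, application type 30 resolves to partition 2, which the example marks as PRIVILEGED, so only a privileged client could allocate from it.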

VTCM partitioning can be temporarily disabled at runtime. This will treat the entire VTCM as a single default partition, and may be useful for testing purposes or to implement device-wide "performance mode" settings. There are two options available:

debugfs node: /d/compute/vtcm_partition_state
- A read returns '0' if VTCM partitioning is disabled, '1' if enabled.
- Write '0' to disable partitioning.
- Write '1' to switch back to the default partition table specified in the device tree.
Kernel API: int cdsprm_compute_vtcm_set_partition_map(unsigned int b_vtcm_partitioning);
- b_vtcm_partitioning = 0 disables partitioning if it was enabled via the device tree configuration.
- b_vtcm_partitioning = 1 resets the configuration to the default specified in the device tree.
- Returns 0 on success.
- Declared in include/linux/soc/qcom/cdsprm.h
- List QCOM_CDSP_RM as a dependency for using the API. Access will be restricted to only dynamically loadable kernel modules when QCOM_CDSP_RM is compiled as a module (default).
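For the debugfs option, a user-space test tool could toggle the node along the following lines. This is a minimal sketch: both helper names are hypothetical, and the node path is passed in as a parameter so the code can also be exercised against an ordinary file.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: write '0' (disable partitioning) or '1' (restore the
 * device tree partition table) to the VTCM partition state node.
 * Returns 0 on success or a negative errno value.
 */
static int vtcm_write_partition_state(const char *node_path, int enable)
{
    FILE *f = fopen(node_path, "w");
    if (f == NULL) {
        return -errno;
    }

    int rc = 0;
    if (fputc(enable ? '1' : '0', f) == EOF) {
        rc = -EIO;
    }
    if (fclose(f) != 0 && rc == 0) {
        rc = -errno;
    }
    return rc;
}

/*
 * Hypothetical helper: read the node. Returns 1 if partitioning is enabled,
 * 0 if disabled, or a negative errno value on failure.
 */
static int vtcm_read_partition_state(const char *node_path)
{
    FILE *f = fopen(node_path, "r");
    if (f == NULL) {
        return -errno;
    }

    int c = fgetc(f);
    fclose(f);

    if (c == '1') {
        return 1;
    }
    if (c == '0') {
        return 0;
    }
    return -EINVAL;
}
```

On a device the path would be /d/compute/vtcm_partition_state, and the calling process needs permission to access debugfs.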

Signed and unsigned PDs

Most Hexagon DSPs in the system require code running on the DSP to be cryptographically signed. This includes libraries built with the Hexagon SDK and installed to the device at runtime, such as most examples shipped in the SDK. To avoid having to separately sign software for the Hexagon DSP, the Compute DSP (cDSP) supports dedicated sandboxed Unsigned PDs that can load and execute unsigned code.

Unsigned PDs

Unsigned PDs are sandboxed DSP processes used to offload computation workloads to the cDSP. From a system security point of view they are considered extensions to their CPU client processes: They operate in memory provided by the client process and do not have access to drivers or other resources the client could not access directly. Since code running in unsigned PDs does not need to be signed, it can be easily shipped with installable applications. Application developers not working directly with OEMs or system integrators should design their DSP software to run in unsigned PDs on the cDSP.

System unsigned PDs

System unsigned PDs are unsigned PDs with certain additional privileges, available only on the cDSP on targets with the v75 architecture or later. Regular unsigned PDs have access to a limited set of guestOS services on the cDSP, which may not be sufficient for all use cases. A limited set of additional guestOS services is available to system unsigned PDs, giving them some of the advantages of signed PDs but with significantly lower dynamic loading latency (as no signature verification of the Hexagon library is required).

Currently, the additional services available to system unsigned PDs are UBWCDMA and L2 cache line-locking.

Only CPU applications whitelisted to offload to a signed PD can offload to a system unsigned PD. To whitelist a CPU application, add the appropriate sepolicy to give it permission to open the FastRPC device node.

The steps to offload to a system unsigned PD are the same as for a regular unsigned PD. A whitelisted application that requests offload to an unsigned PD is automatically launched as a system unsigned PD on the cDSP.

Unsigned PD support

Unsigned PDs are only supported on the cDSP and are not available on other DSPs. To check whether a DSP supports unsigned PDs, use the remote APIs to perform a capability query with the DSP attribute UNSIGNED_PD_SUPPORT.

Note that all devices supported by this SDK support unsigned PDs on the cDSP. Other system DSPs only support Signed PDs and are not available for third-party application use.

Unsigned PD services and limitations

To allow running unsigned code, unsigned PDs are more tightly sandboxed than regular PDs and have fewer services available. The intent is to support all system services required by compute offload, and while the list can change in the future the current list includes:

Inter-processor communication with FastRPC
Thread creation and thread services - mutexes, semaphores, signals, etc., including lock/wait operations with timeouts.
Memory allocation, mapping, and memory management
Full DSP instruction set, including HVX and HMX
Cache management operations
Clock and power management
VTCM allocation and usage

The most notable omissions are drivers, such as UBWC/DMA and the camera streamer, and access to the cache locking and timer APIs.

Prior to Lahaina, unsigned PD thread and resource management priorities are limited to a pre-defined range (64-254). For Lahaina and later products, priorities are determined by the client privilege level instead; see section Priority levels above. Additionally, unsigned PDs can create a maximum of 128 threads.

Activate Unsigned PD

All DSP processes are started as signed PDs by default. To request an unsigned PD instead, clients must use the remote_session_control API as follows before starting to use the DSP:

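#include "remote.h"  /* declares remote_session_control() */

/* Weak reference: the symbol is NULL if the FastRPC library lacks this API. */
#pragma weak remote_session_control

int err = 0;
if ( remote_session_control ) {
    /* Request an unsigned PD on the cDSP before any other use of the DSP. */
    struct remote_rpc_control_unsigned_module data;
    data.enable = 1;
    data.domain = CDSP_DOMAIN_ID;
    err = remote_session_control(DSPRPC_CONTROL_UNSIGNED_MODULE, (void*)&data, sizeof(data));
}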

The calculator example illustrates how to run code in an unsigned PD using this approach.

The client's DSP process is instantiated as a signed or unsigned PD when it is first launched, and this cannot be changed afterwards. Because of this, the call to remote_session_control must be made before using the DSP: before making any FastRPC calls, mapping memory to the DSP, etc.

Signed PDs

Signed PDs execute as regular DSP processes and can have access to drivers and resources not available to Unsigned PDs. They are intended to be used as extensions to the underlying operating system and frameworks, and are isolated from their CPU-side client processes. Only signed libraries can be loaded to signed PDs.

On the cDSP Signed PDs are used for system services such as camera processing: The camera streamer cannot be accessed from Unsigned PDs. Other DSPs in the system only support signed PDs, and all code must be signed before it can be loaded on them.

Most Hexagon system libraries are shipped as signed so that they can be loaded to both signed and unsigned PDs.

Lahaina and later products restrict launching signed PDs to specified whitelisted applications only. To avoid issues, all clients should use unsigned PDs for their cDSP offload if possible.

All code loaded into Signed PDs must be signed unless the device has a test signature installed. See Device signing for discussion on how to install test signatures on test devices. For production devices System integrators can hash libraries as part of the image build process or create and install new Trusted Code Groups (TCGs) used to sign modules for multiple builds. System integrators should contact Qualcomm Customer Engineering for documentation and support on signing software for production devices.

CPZ

Content Protection Zone (CPZ) PDs are sandboxed user processes that can be used to post-process DRM-protected video content. Only System integrators can develop or integrate software that runs in CPZ PDs, and CPZ processing can only be used as part of the system video processing framework. For more details contact Qualcomm Customer Engineering.

Secure PD

The cDSP supports a sandboxed Secure PD that can be used to improve the performance of secure use cases such as face authentication by offloading algorithms from secure execution environments on the application CPU to the cDSP. Only System integrators can develop or integrate software that runs in the secure PD. For more details contact Qualcomm Customer Engineering.

DSP rebuilding/hashing

System integrators can rebuild the DSP image. This may be necessary to add or modify code running in static processes, or to change certain configuration settings. This SDK focuses on developing dynamically loadable DSP modules that can be installed separately, and rebuilding the image is outside of the scope. System integrators should work with their Qualcomm Customer Engineering contacts to get more information.

As part of the image build process system integrators can statically hash dynamically loadable modules so that they can be loaded in Signed PDs for the corresponding DSP build.

Binary Compatibility

System integrators must ensure that all Hexagon binaries to be loaded on a given target DSP (whether through static or dynamic linking) are compiled with the same major Hexagon Tools revision and for the same architecture flavor as the image running on that DSP. This is required to guarantee binary compatibility between the DSP image and the loaded objects. Target DSPs supported by this SDK are listed, along with their architecture versions and the Hexagon Tools versions used to build their run-time images, in the Hexagon SDK Feature Matrix.

As an example, the Feature Matrix column for the Lahaina target shows that its cDSP architecture version is v68, and its image is built with Hexagon Tools 8.4.04. Hence the system integrator must ensure that all objects to be run on the Lahaina cDSP are built with an 8.4.x toolchain version, for the v68 architecture flavor, to be assured run-time compatibility.
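A build script or test harness could encode this rule roughly as follows. The helper names are hypothetical, and the sketch assumes that "same major Hexagon Tools revision" means the first two version components match, as the 8.4.x example suggests. The version strings other than 8.4.04 in the usage note are made up for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: true when two Hexagon Tools version strings of the
 * form "A.B.CC" share the same first two components (A and B).
 */
static bool tools_versions_compatible(const char *image_ver, const char *object_ver)
{
    int img_major, img_minor, obj_major, obj_minor;

    if (sscanf(image_ver, "%d.%d", &img_major, &img_minor) != 2) {
        return false;   /* not a parsable version string */
    }
    if (sscanf(object_ver, "%d.%d", &obj_major, &obj_minor) != 2) {
        return false;
    }
    return img_major == obj_major && img_minor == obj_minor;
}

/*
 * Hypothetical helper: true when an object built for architecture
 * object_arch (for example 68 for v68) with tools version object_ver
 * matches the DSP image on both criteria.
 */
static bool binary_compatible(int image_arch, const char *image_ver,
                              int object_arch, const char *object_ver)
{
    return image_arch == object_arch &&
           tools_versions_compatible(image_ver, object_ver);
}
```

For the Lahaina cDSP (v68, Hexagon Tools 8.4.04), an object built for v68 with an illustrative 8.4.12 toolchain would pass this check, while one built with 8.5.01 or for v66 would not.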

OEM config

System integrators and OEM/integrator partners can prevent shared objects with known security vulnerabilities from being loaded onto the DSP by programming oemconfig.so with the blacklisted versions.