sysMonApp
sysMonApp is an Android application that configures and displays data from multiple performance-related services on the cDSP, aDSP, and sDSP. These services are explained in the following sections.
Service | Description |
---|---|
Profiler | Profile any DSP to gather clock information, resource usage, processor and thread load distribution, bus bandwidth metrics, and so on. |
Getstate | Get the current clock voting information and heap statistics of each static Protection Domain. |
DCVS | Enable or disable the Dynamic Clock Voltage Scaling (DCVS) feature executing on each DSP. |
Clocks | Set the DSP core clock and bus clocks. |
Thread info | For each active software thread, display its priority, declared stack size, and maximum stack used. |
TLP | Thread-level profiling (TLP) that monitors software thread activity. |
ETM trace | Enable the ETM to trace the instruction and data memory and provide information about dynamic modules loaded on the cDSP. |
HTA profiler | Profile the HTA to gather clock information, DDR bandwidth, and various HTA activity statistics. |
getPowerStats | Report the time spent in power collapse, in low power island (LPI), and at each core clock level of the selected Q6 subsystem since device boot or the last reset. |
pinfo | Display the statistics (e.g. thread priority, stack size) of FastRPC-spawned user processes. |
getinfo | Display information about the DSPs. |
getload | Display DSP load statistics. |
sysMonApp supports all targets covered by the Hexagon SDK with the exception of Lahaina. For more details, see the feature matrix.
Setup
The Hexagon SDK includes the following LA and LE variants of sysMonApp. These variants are present at the indicated locations:
- LA 64-bit version:
${HEXAGON_SDK_ROOT}/tools/utils/sysmon/sysMonApp
- LA 32-bit version:
${HEXAGON_SDK_ROOT}/tools/utils/sysmon/sysMonApp_32Bit
- LE 64-bit version:
${HEXAGON_SDK_ROOT}/tools/utils/sysmon/sysMonAppLE
- LE 32-bit version:
${HEXAGON_SDK_ROOT}/tools/utils/sysmon/sysMonAppLE_32Bit
Push the version for your device to a location of your choice, and then change the permissions to make the file executable. For example, on rooted Android devices, we recommend pushing the executable to a folder on the data partition:
adb push ${HEXAGON_SDK_ROOT}/tools/utils/sysmon/sysMonApp /data/local/tmp/
adb shell chmod 777 /data/local/tmp/sysMonApp
Execute sysMonApp from the ADB shell by entering the following command:
adb shell /data/local/tmp/sysMonApp <service> <arguments related to the service>
When sysMonApp is executed without a service name, the help page is displayed:
adb shell /data/local/tmp/sysMonApp
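The variant list above can be folded into a small helper that returns the binary name for a given OS flavor and bitness. This is only a sketch: the function name sysmon_binary and its argument convention are invented for illustration, while the four file names are the ones listed above.

```shell
# Hypothetical helper (not part of the SDK): map an OS flavor (LA or LE) and
# a bitness (64 or 32) to the sysMonApp binary name listed in this section.
sysmon_binary() {
    case "$1-$2" in
        LA-64) echo "sysMonApp" ;;
        LA-32) echo "sysMonApp_32Bit" ;;
        LE-64) echo "sysMonAppLE" ;;
        LE-32) echo "sysMonAppLE_32Bit" ;;
        *) echo "usage: sysmon_binary <LA|LE> <64|32>" >&2; return 1 ;;
    esac
}
```

For example, `adb push "${HEXAGON_SDK_ROOT}/tools/utils/sysmon/$(sysmon_binary LA 32)" /data/local/tmp/` would push the 32-bit LA build.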
Profiler service
Use the sysMonApp profiler option to profile services running on one of the Hexagon DSPs. Gather information such as:
- The clocks voted for
- Resource usage
- Load distribution across available hardware threads
- Load on the processor
- Bus bandwidth metrics
- And other profiling metrics
The metrics are useful in measuring performance, debugging performance-related issues, and identifying possible optimizations.
To see the profiler service help page:
adb shell /data/local/tmp/sysMonApp profiler -help
Usage
sysMonApp profiler [options]
Options | Expected values | Default value | Description |
---|---|---|---|
--samplingPeriod | Integer >=0 | 0 if debugLevel==1, 1 otherwise | Sampling period (in milliseconds) at which the eight PMU events are collected. When samplingPeriod is set to 0, the sampling interval is chipset and subsystem dependent but usually on the order of 1 through 50 ms. |
--debugLevel | 0/1/2 | Chipset dependent | Select a sampling mode. 0 - User mode with four customized PMU counters. 1 - Default mode. 2 - User mode with eight customized PMU counters. See below for more details. |
--dcvsOption | 0/1 | 1 if debugLevel==1, 0 otherwise | 0 - Disable DCVS for the profiling duration. 1 - No DCVS override; DCVS enablement is controlled by the applications running on the selected DSP. See below for more details. |
--profileFastrpcTimeline | 0/1 | 0 | Disable (0) or enable (1) the option of collecting the time spent by the FastRPC interface. |
--noMeasured | 0/1 | 0 | Enable (0) or disable (1) reporting measured bus clock frequencies. |
--stidArray | Comma-separated Software Thread ID (STID) values | disabled | User-provided list (comma separated) of STID values. For more details, see below. |
--q6 | adsp, cdsp, sdsp | adsp | Select the Hexagon DSP to profile. |
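Two of the defaults in the table depend on the value of --debugLevel. The following sketch restates that dependency as a shell function; the name profiler_defaults and its output format are hypothetical, and the values are taken directly from the table above.

```shell
# Hypothetical helper: print the documented defaults of --samplingPeriod and
# --dcvsOption for a given --debugLevel (0, 1, or 2).
profiler_defaults() {
    case "$1" in
        1)   echo "samplingPeriod=0 dcvsOption=1" ;;   # Default mode
        0|2) echo "samplingPeriod=1 dcvsOption=0" ;;   # User modes
        *)   echo "debugLevel must be 0, 1, or 2" >&2; return 1 ;;
    esac
}
```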
Sampling modes
The 'debugLevel' option controls the PMU sampling modes. Three sampling modes are supported:
- Default mode ('--debugLevel 1')
- User mode with four customized PMU counters ('--debugLevel 0')
- User mode with eight customized PMU counters ('--debugLevel 2')
In Default mode, a fixed set of eight PMU events (MPPS, AXI read and write bandwidth, HVX MPPS, and so on) is monitored. (For more details, see the sysMon parser metric description.)
The eight PMU events chosen in this mode are specific to the chipset and subsystem (aDSP/cDSP/sDSP). You cannot override these events.
In User mode, you can configure which PMU events to assign to four (or eight) of the PMU counters. Furthermore, you can use the --stidArray option to configure the counters to increment only when the configured events occur on specified software threads. List any number of PMU events to count in a /data/pmu_events.txt file. sysMonApp rotates through the requested events, four (or eight) at a time, to generate a sampling of all the requested events. If the --stidArray option is used to confine the counting to specified software threads, sysMonApp further rotates through the specified software threads for each set of four (or eight) PMU events assigned to the PMU counters at a given time.
For example, in User mode 0 (four user-defined PMUs), assume an STID array specifies two threads and 10 user-defined events:
- The sysMon profiler service collects a set of samples for the first four user-defined PMUs for the first STID, and then for the second STID.
- It then moves on to collect samples for the next four user-defined PMU events for the first STID, and then the second STID.
- Next, it collects samples for both the last two and again the first two user-defined PMU events, for the first STID, and then the second STID, and so on.
- Throughout the entire time, the profiler also collects samples for four predefined PMU events. These events are not filtered by STID: their STID mask is 0, so they are collected for all software threads.
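The rotation in the example above can be sketched as a small shell function. It only illustrates the scheduling order as this section describes it and is not sysMonApp's implementation; the function name pmu_rotation, its arguments, and the 1-based event numbering are assumptions made for the sketch.

```shell
# Illustrative only: print the order in which (event window, STID) pairs are
# sampled when the user-defined events are rotated through the counters.
# usage: pmu_rotation <num_events> <counters> <windows> <stid>...
pmu_rotation() {
    num_events=$1; counters=$2; windows=$3
    shift 3
    w=0
    start=0
    while [ "$w" -lt "$windows" ]; do
        # Build the comma-separated event indices for this window,
        # wrapping around to the first event when the list is exhausted.
        events=""
        i=0
        while [ "$i" -lt "$counters" ]; do
            idx=$(( (start + i) % num_events + 1 ))
            events="${events}${events:+,}${idx}"
            i=$((i + 1))
        done
        # Each window is sampled once per STID before moving on.
        for stid in "$@"; do
            echo "events=${events} stid=${stid}"
        done
        start=$(( (start + counters) % num_events ))
        w=$((w + 1))
    done
}
```

Calling `pmu_rotation 10 4 3 1 2` reproduces the example: events 1 to 4, then 5 to 8, then 9, 10, 1, 2, each window listed first for STID 1 and then for STID 2.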
DCVS monitors a set of PMU events for decision making. When --debugLevel 0 is selected, DCVS will be restricted to using the available 4 PMU counters (while the other four PMU counters will be used for user profiling), hence the DCVS decisions may not be coherent with the default mode (where all 8 PMU counters are available to DCVS). DCVS will remain disabled in '--debugLevel 2' profiling as all the available PMU counters are used for user profiling.
PMU selection
As explained earlier, STIDs allow you to filter some PMU events captured by the profiler service.
To filter PMU counters by software thread, follow these steps:
- Programmatically assign an STID to each thread that is to be filtered by the profiler.
  Use the qurt_thread_attr_enable_stid API at the time of thread creation. The second parameter of this API is an 8-bit unsigned number that has the following meaning:
  - 0: No STID is assigned. No PMU-based filtering is applied to this thread.
  - 1: QuRT assigns an STID that is not already in use.
  - 2 through 255: This value is used as the STID value. QuRT does not check whether that STID is already in use, and thus multiple threads might share the same STID.
  The following code snippet shows how to use qurt_thread_attr_enable_stid:
      // Standard way of setting some thread attributes
      qurt_thread_attr_t attr;
      qurt_thread_attr_init (&attr);
      qurt_thread_attr_set_name(&attr, p_ThreadName);
      qurt_thread_attr_set_stack_addr (&attr, p_StackBase);
      qurt_thread_attr_set_stack_size (&attr, stackSize);
      /* The following line enables STID allocation to the software thread being created by
       * requesting QuRT to assign an available STID to the thread during qurt_thread_create. */
      qurt_thread_attr_enable_stid(&attr, 1);
      result = qurt_thread_create(&tid, &attr, (void*)entry, NULL);
- Use the tinfo service while the application is running to determine or confirm which STIDs are assigned to the threads on which STID filtering is required.
- Run the sysMonApp profiler service with the --stidArray option that lists the STIDs.
  For example, to start the profiler service on the cDSP, enter the following command:
  adb shell /data/local/tmp/sysMonApp profiler --debugLevel 0 --stidArray 1,2,3,4 --q6 cdsp
  This command starts the profiler in User mode with four customized PMU events (debugLevel 0). The --q6 option overrides the default, which is adsp.
  As explained previously, four user-configured PMU events are selected per sample in User mode, while the other four are the defaults and remain constant. STID filtering only applies to the four user-configured PMU events; the defaults are not filtered (STID mask is 0) and are thus collected continuously. In a profiling sample captured over a provided sampling period, only one STID value at a time is applied as a filter to the four user-configured PMU counters. In this example, a sequence of 50 PMU events for threads with STID values of 1, 2, 3, and 4 are collected one at a time.
NOTES:
- By default, STID-based filtering is disabled. You can enable it by using the --stidArray option in conjunction with --debugLevel to select one of the two user modes.
- Not all PMU events are maskable under STID.
- An STID cannot be assigned to a thread that has already been created.
- Due to time division multiplexing (iterating over the provided STIDs across profiling samples), selecting multiple STIDs as filters might not provide a complete picture for a given STID (time slots in which another STID is configured are missing). In cases where continuous (time domain) filtering is required per STID, pass only one STID to sysMonApp.
- The --stidArray list has a limit of eight STIDs.
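Because the list is limited to eight entries and STIDs are 8-bit values, a value intended for --stidArray can be checked on the host before it is passed to sysMonApp. The following is a minimal sketch: validate_stid_array is a hypothetical name, and rejecting 0 is an assumption based on 0 meaning that no STID is assigned.

```shell
# Hypothetical pre-check: return 0 if the argument looks like a usable
# --stidArray value (1 to 8 comma-separated integers in 1..255), else 1.
validate_stid_array() {
    list=$1
    case "$list" in
        # Reject empty input, stray characters, misplaced commas,
        # and any number with more than three digits.
        ""|*[!0-9,]*|,*|*,|*,,*|*[0-9][0-9][0-9][0-9]*) return 1 ;;
    esac
    count=0
    old_ifs=$IFS
    IFS=,
    for stid in $list; do
        count=$((count + 1))
        if [ "$stid" -lt 1 ] || [ "$stid" -gt 255 ]; then
            IFS=$old_ifs
            return 1
        fi
    done
    IFS=$old_ifs
    [ "$count" -le 8 ]
}
```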
Usage examples
- Run the sysMonApp profiler service in User mode.
  adb shell /data/local/tmp/sysMonApp profiler --debugLevel 0
  Starting Profiler with parameters:
  Q6 Processor: adsp
  Sampling Interval in ms : 1
  Total samples :0
  samplesInSet: 50
  Default Mode : 0
  dcvs enable : 0
  no. of stids: 0
  Domain Configured ADSP
  Q6 architecture detected as v66...
  Opening /data/pmu_events.txt file
  /data/pmu_events.txt not found, going ahead with default events, no. of events = 84
  opening outputfile @/sdcard/sysmon.bin
  Enabling DSP SysMon using FastRPC
  Allocating output buffer
  >> Starting thread to Query DSP SysMon for samples
  >> Waiting for a keyboard input...
- Run the sysMonApp profiler service in Default mode.
  adb shell /data/local/tmp/sysMonApp profiler --debugLevel 1
  Starting Profiler with parameters:
  Q6 Processor: adsp
  Sampling Interval in ms : 0
  Total samples :0
  samplesInSet: 50
  Default Mode : 1
  dcvs enable : 1
  no. of stids: 0
  Domain Configured ADSP
  Q6 architecture detected as v66...
  opening outputfile @/sdcard/sysmon.bin
  Enabling DSP SysMon using FastRPC
  Allocating output buffer
  >> Starting thread to Query DSP SysMon for samples
  >> Waiting for a keyboard input...
- Run the sysMonApp profiler and capture samples at sampling intervals of 10 ms on the cDSP in User mode, and capture the FastRPC timeline packets.
  adb shell /data/local/tmp/sysMonApp profiler --debugLevel 0 --samplingPeriod 10 --q6 cdsp --profileFastrpcTimeline 1
  Starting Profiler with parameters:
  Q6 Processor: cdsp
  Sampling Interval in ms : 10
  Total samples :0
  samplesInSet: 50
  Default Mode : 0
  dcvs enable : 0
  no. of stids: 0
  Domain Configured Compute DSP
  Running FastRPC Timeline Profiling in parallel...
  Q6 architecture detected as v66...
  Opening /data/pmu_events.txt file
  /data/pmu_events.txt not found, going ahead with default events, no. of events = 84
  opening outputfile @/sdcard/sysmon_cdsp.bin
  Enabling DSP SysMon using FastRPC
  Allocating output buffer
  >> Starting thread to Query DSP SysMon for samples
  >> Profiling FastRPC Timelines in parallel
  >> Waiting for a keyboard input...
Data collection
The sysMonApp profiler stores raw profiling data on the device in either the /sdcard or /data folder:
- For the aDSP, the file name is sysmon.bin.
- For the cDSP, the file name is sysmon_cdsp.bin.
- For the sDSP, the file name is sysmon_sdsp.bin.
The sysMonApp profiler also prints the output file path with the appropriate file name in the standard output. When you finish profiling, pull the file from the device and postprocess it using the sysmon parser on a host machine.
To pull the profiler output file from device using ADB, enter the following command:
adb pull /sdcard/sysmon<DSP>.bin <destination directory>
Where <DSP> takes the value _cdsp or _sdsp to designate the cDSP or sDSP, respectively. For the aDSP, omit <DSP>.
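The naming rule above is easy to encode when scripting the pull step. The helper below is a sketch under that rule only; sysmon_output_file is a hypothetical name, and the caller is still expected to check whether the file landed in /sdcard or /data.

```shell
# Hypothetical helper: map a --q6 value to the profiler output file name
# described in this section.
sysmon_output_file() {
    case "$1" in
        adsp)      echo "sysmon.bin" ;;
        cdsp|sdsp) echo "sysmon_$1.bin" ;;
        *) echo "unknown DSP: $1" >&2; return 1 ;;
    esac
}
```

A pull for the cDSP could then be written as `adb pull "/sdcard/$(sysmon_output_file cdsp)" .`.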
Data postprocessing
To postprocess the profiling data, see the instructions on how to use the sysmon parser.
Getstate service
Use the getstate service to determine the current clock voting information and heap statistics of each static Protection Domain.
Usage
sysMonApp getstate [--getVotes <0/1>] [--q6 <dsp>]
Where the --q6 option allows you to specify the DSP to query: adsp, cdsp, or sdsp.
Use --getVotes 1 to display the DCVS_V2/V3 votes made through HAP_power_set() calls.
Example
The following command queries the clock and heap statistics of the ADSP:
adb shell /data/local/tmp/sysMonApp getstate
Domain Configured ADSP
DSP Core clock :576.00MHz
SNOC Vote:1.26MHz
MEMNOC Vote:0.00MHz
GuestOS : Total Heap:1792.00KB
Available Heap:514.49KB
Max.Free Bin:433.02KB
Audio PD : Total Heap:9216.00KB
Available Heap:8170.21KB
Max.Free Bin:8059.44KB
Measured SNOC (/clk/snoc) :200.00MHz
Measured BIMC (/clk/bimc) :681.66MHz
Measured CPU L3 clock :1612.80MHz
The following command queries the clock and heap statistics of the CDSP:
adb shell /data/local/tmp/sysMonApp getstate --q6 cdsp
Domain Configured Compute DSP
DSP Core clock :384.00MHz
SNOC Vote:0.00MHz
MEMNOC Vote:0.62MHz
GuestOS : Total Heap:2560.00KB
Available Heap:1622.56KB
Max.Free Bin:1575.28KB
Measured SNOC (/clk/snoc) :200.00MHz
Measured BIMC (/clk/bimc) :681.66MHz
Measured CPU L3 clock :1612.80MHz
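When tracking heap consumption over time, it can help to reduce the getstate output to one line per protection domain. The following awk-based filter is a sketch that assumes the line layout shown in the examples above (a "Total Heap:" line followed by an "Available Heap:" line); heap_usage is a hypothetical name and the real output layout may differ between targets.

```shell
# Hypothetical filter: read getstate output on stdin and print the heap in
# use (total minus available) for each protection domain.
heap_usage() {
    awk '
        /Total Heap:/ {
            # The PD name is the text before ": Total Heap:".
            pd = $0
            sub(/[ \t]*:[ \t]*Total Heap:.*/, "", pd)
            sub(/^[ \t]+/, "", pd)
            total = $0
            sub(/.*Total Heap:/, "", total)
            sub(/KB.*/, "", total)
        }
        /Available Heap:/ {
            avail = $0
            sub(/.*Available Heap:/, "", avail)
            sub(/KB.*/, "", avail)
            printf "%s used=%.2fKB of %.2fKB\n", pd, total - avail, total
        }
    '
}
```

It is meant to be used as `adb shell /data/local/tmp/sysMonApp getstate | heap_usage`.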
getPowerStats service
This service reports the time spent in power collapse, in low power island (LPI), and at each core clock level of the selected Q6 subsystem since device boot or the last reset.
Usage
sysMonApp getPowerStats [--clear 1] [--q6 <dsp>]
Where the --q6 option allows you to specify the DSP to query: adsp, cdsp, or sdsp.
Use --clear 1 to clear the statistics collected so far and start logging afresh.
Example
The following command queries the power statistics of the ADSP:
adb shell /data/local/tmp/sysMonApp getPowerStats
Domain Configured ADSP
Clock freq.(MHz) Active time(seconds)
307.20 7.28
576.00 0.01
614.40 9.60
768.00 0.78
940.80 1.93
960.00 0.10
1171.20 0.05
1324.80 0.00
1401.60 0.03
Power collapse time(seconds): 17181.12
Low Power Island time(seconds): 38.59
Current core clock(MHz): 307.20
Total time(seconds): 17239.49
The following command queries the power statistics of the CDSP:
adb shell /data/local/tmp/sysMonApp getPowerStats --q6 cdsp
Domain Configured Compute DSP
Clock freq.(MHz) Active time(seconds)
364.80 0.10
556.80 0.00
768.00 0.08
960.00 0.00
1171.20 0.18
1324.80 0.00
1382.40 0.00
Power collapse time(seconds): 17276.43
Low Power Island time(seconds): 0.00
Current core clock(MHz): 364.80
Total time(seconds): 17276.79
The following command queries the power statistics of the SDSP:
adb shell /data/local/tmp/sysMonApp getPowerStats --q6 sdsp
Domain Configured Sensors DSP
Clock freq.(MHz) Active time(seconds)
423.00 0.23
557.00 0.00
672.00 0.22
845.00 1.18
960.00 0.00
1075.00 0.01
Power collapse time(seconds): 13988.50
Low Power Island time(seconds): 39.77
Current core clock(MHz): 423.00
Total time(seconds): 14029.91
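The per-level active times can be condensed into a single summary line. The sketch below sums the active time across all clock levels and expresses power collapse as a share of the total time; it assumes the line layout shown in the examples above, and power_summary is a hypothetical name.

```shell
# Hypothetical filter: read getPowerStats output on stdin and print the total
# active time and the percentage of time spent in power collapse.
power_summary() {
    awk '
        # Clock table rows consist of exactly two numeric fields.
        NF == 2 && $1 ~ /^[0-9.]+$/ && $2 ~ /^[0-9.]+$/ { active += $2 }
        /Power collapse time/ { collapse = $NF }
        /Total time/          { total = $NF }
        END {
            if (total > 0)
                printf "active=%.2fs collapse=%.2f%%\n", active, 100 * collapse / total
        }
    '
}
```

It is meant to be used as `adb shell /data/local/tmp/sysMonApp getPowerStats --q6 cdsp | power_summary`.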
pinfo service
This service displays FastRPC-spawned user process statistics.
Usage
sysMonApp pinfo [--maxT] [--q6 <dsp>]
- Use --q6 to specify the DSP to query (adsp, cdsp, or sdsp).
- Use the --maxT option to create multiple threads on the DSP subsystem within the sysMonApp process and report the maximum number of successful thread creations.
Example
The following command displays the FastRPC-spawned user process statistics of the CDSP:
adb shell /data/local/tmp/sysMonApp pinfo --q6 cdsp
Domain Configured Compute DSP
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Info of UserPD : 1
NAME is : /frpc/f05b8b10 sysMonApp ASID is : 772 PIDA is : 6493 PD State is : InitDone PD Type is : SignedDynamic
User heap Used Bytes in KB : 53.58
RPC Total Memory in KB : 512.00 RPC Memory Used in KB : 35.28
UserPD Threads info:
DSP FastRPC Threads:
T1 : Name : /frpc/f05b8b10 tidQ : 5494 tidA : 6493 Thread state is : Running Priority : 192 Allocated Stack : 16384
T2 : Name : /frpc/f05b8b10 tidQ : 7553 tidA : 6495 Thread state is : Running Priority : 192 Allocated Stack : 16384
T3 : Name : tidQ : 6534 Thread state is : Running
DSP Non-FastRPC Threads:
T4 : Thread Name : user_reaper Priority : 64 Allocated Stack : 4096
T5 : Thread Name : gc_thread Priority : 192 Allocated Stack : 4096
T6 : Thread Name : HAP_par_thread_ Priority : 192 Allocated Stack : 4096
T7 : Thread Name : HAP_par_thread_ Priority : 192 Allocated Stack : 4096
T8 : Thread Name : exception_handl Priority : 192 Allocated Stack : 4096
T9 : Thread Name : HAP_par_thread_ Priority : 192 Allocated Stack : 4096
T10 : Thread Name : HAP_par_thread_ Priority : 192 Allocated Stack : 4096
T11 : Thread Name : HAP_par_thread_ Priority : 192 Allocated Stack : 4096
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
**********************************************************************************
Total memory borrowed from HLOS by all PDs in KB : 512.00
Number of times memory borrowed from HLOS : 1
Total Threads created on CDSP by all PDs: 11
**********************************************************************************
**********************************************************************************
Maximum Threads Can be Created: 58
**********************************************************************************
getinfo service
Use the getinfo service to display details about a DSP. This service displays the following information for a given DSP:
- Q6 version
- Maximum number of hardware threads
- Core clock frequencies
- L2 and VTCM memory sizes
- HMX clock frequencies (v75 onwards)
Usage
sysMonApp getinfo [--q6 <dsp>]
- Use --q6 to specify the DSP to query (adsp, cdsp, or sdsp).
Example
The following command displays details about the ADSP:
adb shell /data/local/tmp/sysMonApp getinfo --q6 adsp
Domain Configured ADSP
Q6 Version : v66
Max HW Threads : 4
L2 Cache Size : 0.5 MB
L2 TCM Size : 1.5 MB
Core Clock plans(in MHz) : {
562.500 , LOW_MINUS ,
750.000 , LOW ,
918.750 , LOW_PLUS ,
1143.750 , NOMINAL ,
1293.750 , NOMINAL_PLUS ,
1368.750 , HIGH ,
}
getload service
This service displays instantaneous Q6 load statistics of the selected DSP subsystem.
Usage
sysMonApp getload [--time <duration_ms>] [--iterations <num_iter>] [--q6 <dsp>]
- Use --q6 to specify the DSP to query (adsp, cdsp, or sdsp).
- Use --time <duration_ms> to specify the duration in ms for making load measurements (default: 1000 ms).
- Use --iterations <num_iter> to provide profiling statistics for <num_iter> iterations (default: 1).
Example
The following command displays load statistics for the CDSP:
adb shell /data/local/tmp/sysMonApp getload --q6 cdsp
Time programed is:1000
No of Iterations:1
Domain Configured Compute DSP
Q6 architecture detected as v68...
opening outputfile @/sdcard/sysmon_cdsp.bin
Enabling DSP SysMon using FastRPC
Allocating output buffer
>> Starting thread to Query DSP SysMon for samples
>> Waiting for profile duration (2 s) to elapse...
=========================================================
=========================================================
METRICS Units AVG MAX
=========================================================
MPPS Packets/Sec 0.52 80.10
pCPP Cycles/Packet 5.22 6.21
AXI Cached Rd BW MBPS 2.32 282.02
AXI Cached Wr BW MBPS 0.02 2.88
Eff Q6 Freq MHz 2.73
=========================================================
=========================================================
***************************EXITING!***************************
>> Time elapsed, sending kill to Query thread...
>> Waiting for the Query thread to join...
The output bin file is placed @ /sdcard/sysmon_cdsp.bin
Domain Configured Compute DSP
DSP Core clock :375.00MHz
SNOC Vote:0.00MHz
MEMNOC Vote:0.62MHz
GuestOS : Total Heap:3072.00KB
Available Heap:1917.71KB
Max.Free Bin:1859.91KB
Measured SNOC (/clk/snoc) :240.00MHz
Measured BIMC (/clk/bimc) :2096.44MHz
DCVS service
This service is used to enable or disable the Dynamic Clock Voltage Scaling (DCVS) feature executing on each DSP.
Usage
sysMonApp dcvs <enable/disable> [--q6 <dsp>]
Where:
- enable or disable enables or disables DCVS on the selected DSP until the target is rebooted or this setting is modified again.
- The --q6 option allows you to specify the DSP: adsp, cdsp, or sdsp.
Example
To enable DCVS on the aDSP:
adb shell /data/local/tmp/sysMonApp dcvs enable
Domain Configured ADSP
Successfully enabled adsp DCVS
Clocks service
Use the clocks service to set or reset the DSP core clock and bus clocks of the specified DSP. For architecture v75 onwards, the clocks service can also be used to set or reset the HMX clock and Q6-CENG bus clock.
Usage
Three clocks service actions are available:
Clocks set
Use the clocks set service to set the minimum clock frequency for a given subsystem or vote for a sleep latency:
sysMonApp clocks set [options]
Options | Values | Description |
---|---|---|
--coreClock | MHz | Sets the minimum clock speed at which the DSP should run. |
--busClock | MHz | Sets the minimum AXI (DSP <-> AXI) bus frequency when the DSP is active. |
--hcpBusClock | MHz | Sets the minimum HCP (HCP <-> DDR) bus frequency (cDSP only). |
--dmaBusClock | MHz | Sets the minimum DMA (DMA <-> DDR) bus frequency (cDSP only). |
--sleepLatency | uSec | Specified DSP sleep latency vote. Used by the sleep driver to choose one of the appropriate low power modes that satisfy the latency requirement. |
--hmxClock | MHz | Sets the minimum clock speed at which the HMX unit should run (v75 onwards). |
--cengClock | MHz | Sets the minimum clock for the Q6-CENG bus interface (v75 onwards). |
--q6 | adsp, cdsp, sdsp | Execute the service for the selected DSP. |
NOTE: A vote of 0 resets the settings for that option.
Clocks limit
Use the clocks limit service to put an upper limit (maximum) on the selected DSP core clock:
sysMonApp clocks limit --coreClock <frequency_MHz> [--q6 <dsp>]
The DSP core clock is capped at the nearest available clock frequency even when the applications request an available higher frequency.
Given that each target has a finite set of available clock rates, you can select exactly one of these clock rates by setting the minimum (with the clocks set service) and maximum (with the clocks limit service) rates. If a selected range does not include a valid clock rate, the system will be forced to pick a clock rate outside the requested range.
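Pinning the core clock therefore takes two invocations with the same frequency, one through clocks set and one through clocks limit. The sketch below only prints the two commands so that they can be reviewed before being passed to adb shell; pin_core_clock_cmds is a hypothetical name, and the frequency is assumed to be one of the rates available on the target.

```shell
# Hypothetical helper: print (not run) the two sysMonApp invocations that
# bound the core clock from below and from above at the same frequency.
# usage: pin_core_clock_cmds <frequency_MHz> [dsp]
pin_core_clock_cmds() {
    freq_mhz=$1
    dsp=${2:-adsp}
    echo "sysMonApp clocks set --coreClock ${freq_mhz} --q6 ${dsp}"
    echo "sysMonApp clocks limit --coreClock ${freq_mhz} --q6 ${dsp}"
}
```

The clocks remove service described in this section resets both the vote and the limit afterwards.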
Starting with v75, the clocks limit service can also be used to limit the HMX clock frequency and the Q6-CENG bus clock frequency on the CDSP, by using the --hmxClock and --cengClock options as follows:
sysMonApp clocks limit --hmxClock <frequency_MHz> --q6 cdsp
sysMonApp clocks limit --cengClock <frequency_MHz> --q6 cdsp
Clocks remove
Use the clocks remove service to reset all clock settings to their default values:
sysMonApp clocks remove [--q6 <dsp>]
Examples
NOTE: All sysMonApp invocations are followed by calling the getstate service to show the new DSP clock settings. For the sake of brevity, these calls are not reproduced in the following command output examples.
To set the CDSP core clock speed to 400 MHz and the DMA clock speed to 100 MHz:
adb shell /data/local/tmp/sysMonApp clocks set --coreClock 400 --dmaBusClock 100 --q6 cdsp
Domain Configured Compute DSP
Calling CDSP set clocks function with following parameters:
Core clock : 400 MHz
DMA AXI clock : 100 MHz
Successfully set the required clock configurations, Call the remove API once done...
Example: sysMonApp clocks remove
To reinitialize the clock speeds to their default values:
adb shell /data/local/tmp/sysMonApp clocks remove --q6 cdsp
Domain Configured Compute DSP
Resetting clock votes and limits...
Successful in resetting the votes and limits...
To disable the power collapse of the aDSP, use a low sleep latency:
adb shell /data/local/tmp/sysMonApp clocks set --sleepLatency 10
Domain Configured ADSP
Calling adsp set clocks function with following parameters:
Sleep latency vote : 10usec
tinfo service
For each active software thread, use the tinfo service to display the thread's priority, declared stack size, and maximum stack used.
Usage
sysMonApp tinfo [--getstack <PD>] [--q6 <dsp>]
Where <PD> specifies the protection domain (PD) for which to request information. The values are:
- audio
- sensors
- user
- all (to display information for all PDs)
Examples
To see the software thread information for the audio PD on the aDSP:
adb shell /data/local/tmp/sysMonApp tinfo --getstack audio
Domain Configured ADSP
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
ThreadName ThreadID ThreadPrio PID STID DeclaredStackSize(Bytes) MaxUsedStackSize(Bytes)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
audio_process_r 243 32 1 0 4096 484
user_reaper 240 32 1 0 4096 248
rcinit 239 112 1 0 6144 492
rcinit_worker 238 94 1 0 6144 1528
UIST 237 60 1 0 1952 0
TMR_CLNT_1 236 19 1 0 4096 276
DIAG_LSM 235 237 1 0 4096 476
/frpc/AudioPD 234 192 1 0 4096 844
dog_vir_task1 233 93 1 0 4096 476
dog_hb 232 121 1 0 4096 412
NPA_ASYNC_EVENT 231 48 1 0 8192 136
NPA_ASYNC_REQUE 230 29 1 0 8192 136
GPIOINT_SRV 229 4 1 0 4096 136
DALTF_TH_0 228 252 1 0 8192 48
DALTF_TH_1 227 253 1 0 8192 48
DALTF_TH_2 226 254 1 0 8192 48
DALTF_TH_3 225 255 1 0 8192 48
IPCRTR_RDR 224 204 1 0 4096 640
QMI_PING_SVC 223 10 1 0 2560 540
gen_cb_ctxt 222 205 1 0 2048 204
sr_notif_worker 221 93 1 0 4096 960
sr_notif_signal 220 93 1 0 2048 80
UTMR_CLNT_1 218 18 1 0 4096 168
217 94 1 0 4096 248
qdssc_svc_task 216 10 1 0 4096 48
smlworker 215 152 1 0 4096 840
i2c_qdi_cb 214 14 1 0 4096 344
APR_QDI_USR 213 50 1 0 8064 608
AMDB0 212 69 1 0 8192 728
AMDB1 211 69 1 0 8192 548
AMDB2 210 69 1 0 8192 648
AMDB3 209 69 1 0 8192 728
AVT 208 1 1 0 4096 104
hw_af_ist 207 2 1 0 1024 80
VTM 206 11 1 0 2560 184
VDS1 205 12 1 0 2560 208
VMX1 204 32 1 0 4096 296
VMX2 203 32 1 0 4096 296
VPM 202 52 1 0 12288 336
VSM 201 35 1 0 16384 368
AfeS 200 34 1 0 5120 1520
MXAT 199 38 1 0 8192 336
MXAR 198 38 1 0 8192 820
dma_typ_dflt_is 197 2 1 0 1024 80
dma_typ_hdmi_is 196 2 1 0 1024 80
AfeDataMgr 195 31 1 0 4096 192
CodecIntHldr 194 37 1 0 4096 860
RXSR 193 38 1 0 4096 312
TXSR 192 38 1 0 4096 312
ADM 191 37 1 0 12544 812
ASM 190 37 1 0 16640 736
VfrD 189 10 1 0 4096 480
LSM 188 71 1 0 2048 336
USM 187 69 1 0 4096 384
AfeANC 186 90 1 0 5120 320
ACS 184 37 1 0 16384 792
MVM 183 50 1 0 4096 712
CVD_CAL_LOGG 182 109 1 0 2048 364
SlimBusQmiSvc 181 39 1 0 4096 1068
usb 180 32 1 0 2040 420
irq11 177 4 1 0 2048 344
SlimBusMsg 176 207 1 0 4088 624
SlimBusMaster 175 207 1 0 4088 616
irq86 173 4 1 0 2048 96
SlimBusMsg 172 207 1 0 4088 584
SlimBusMaster 171 207 1 0 4088 544
irq87 4266 4 1 0 2048 336
err_ex_pd_1 167 1 1 0 4096 248
mem_gc_thread 166 192 1 0 4096 56
/frpc/audiopd 165 192 1 0 16384 1172
/frpc/audiopd 164 192 1 0 16384 988
irq12 94382 4 1 0 2048 336
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
To see the software thread information for the user PD on the cDSP:
adb shell /data/local/tmp/sysMonApp tinfo --getstack user --q6 cdsp
Domain Configured Compute DSP
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
ThreadName ThreadID ThreadPrio PID STID DeclaredStackSize(Bytes) MaxUsedStackSize(Bytes)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
/frpc/c04d3240 41193 32 12 0 65536 940
user_reaper 41190 32 12 0 4096 248
mem_gc_thread 61666 192 12 0 4096 64
HAP_par_thread_ 61662 192 12 0 8192 540
exception_handl 61669 192 12 0 4096 248
/frpc/c04d3240 41187 192 12 0 16384 1196
/frpc/c04d3240 61663 192 12 0 4096 3136
HAP_par_thread_ 61674 192 12 0 8192 2220
HAP_par_thread_ 209124 192 12 0 8192 1008
HAP_par_thread_ 41191 192 12 0 8192 752
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
NOTE: The PID value allows you to distinguish between the different user PDs that are running.
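The tinfo table lends itself to a quick check for threads that are close to exhausting their declared stack. The following filter is a sketch that assumes the column order shown above, with the declared and maximum used stack sizes as the last two columns; stack_hotspots and its percentage threshold argument are hypothetical.

```shell
# Hypothetical filter: read tinfo output on stdin and list the threads whose
# maximum used stack reaches at least the given percentage (default 50) of
# their declared stack size.
stack_hotspots() {
    threshold=${1:-50}
    awk -v limit="$threshold" '
        # Data rows end with numeric columns; the thread name may be missing.
        NF >= 6 && $NF ~ /^[0-9]+$/ && $(NF-1) ~ /^[0-9]+$/ && $(NF-1) > 0 {
            name = (NF >= 7) ? $1 : "(unnamed)"
            pct = 100 * $NF / $(NF-1)
            if (pct >= limit)
                printf "%s %d/%d bytes (%.1f%%)\n", name, $NF, $(NF-1), pct
        }
    '
}
```

It is meant to be used as `adb shell /data/local/tmp/sysMonApp tinfo --getstack user --q6 cdsp | stack_hotspots 75`.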
Thread level profile (TLP) service
The TLP service provides profiling information for specified software threads. The profiling information includes cycles spent and packets executed per software thread.
The service also provides a --profile option to run the sysmon profiler service in parallel to collect all DSP profiling data along with the software thread-specific information.
Usage
sysMonApp tlp [options]
Options | Values | Description |
---|---|---|
--samplingPeriod | Integer>=1. Default=50 | Sampling period in milliseconds for collecting profile statistics for a given software thread |
--profile | 0 (default)/1 | Execute the sysmon profiler in parallel (1). Execute the TLP service only (0). |
--tname | Comma-separated thread names (case sensitive). Default=all active threads | Names of the software threads to profile. |
--duration | Integer>=1. Default=10 | Duration (in seconds) after which the thread-level profiling stops. |
--q6 | adsp, cdsp, sdsp | Execute the service for the selected DSP. |
NOTE: Thread names can be accessed by using the tinfo service.
To end the profiling service, press Enter. Then enter the following command to retrieve the output binary file created in /sdcard (or /data):
adb pull /sdcard/sysmontlp_<x>dsp.bin
Where <x> is a, c, or s to designate the aDSP, cDSP, or sDSP, respectively.
If the sysmon profiler was executed in parallel using the --profile 1 option, follow the data collection instructions above to retrieve the generated output.
Postprocessing
Refer to the sysMon parser documentation for postprocessing TLP output binary files. sysmon parser can parse the output from the TLP service with or without the sysmon profiler output.
ETM trace service
Use the ETM service to trace the instruction and data memory on the cDSP only. The service also provides information about the dynamic modules that are loaded on the cDSP.
Usage
Two ETM trace service commands are available:
DLL command
The DLL command provides information about all the dynamic modules loaded on the cDSP. It also provides the information required to run the Hexagon Trace Analyzer tool.
Usage:
sysMonApp etmTrace --command dll
For each loaded DLL, the following information is provided:
- ELF_NAME: Name of the dynamic module that was loaded.
- LOAD_ADDRESS: Virtual address where the dynamic module was loaded.
- LOAD_TIMESTAMP: Timestamp when the dynamic module was loaded.
- UNLOAD_TIMESTAMP: Timestamp when the dynamic module was unloaded.
NOTE: This command can take around 30 seconds to complete.
Following is an example of output data showing when the FastRPC shell was loaded and unloaded.
data.ELF_NAME = fastrpc_shell_3
data.LOAD_ADDRESS = 0xe0d00000
data.ELF_IDENTIFIER = 0x00000000
data.LOAD_TIMESTAMP = 0x14b46e27fb1a
data.UNLOAD_TIMESTAMP = 0x14b46e3abe76
NOTE: A load address of 0x0 means the library was loaded statically (it is part of the DSP image).
ETM command
The ETM command traces instruction and data memory on the cDSP. For information on how to retrieve and parse the generated trace file, see the documentation on the Hexagon Trace Analyzer.
There are three supported trace modes:
Trace mode | Description |
---|---|
default | ETM does not send any cycle information. Keeps the amount of data sent by ETM to a minimum. This mode is helpful in analyzing the flow trace but does not help for any performance analysis. |
cycle-accurate | ETM sends cycle information for each packet. This results in emitting data at a higher rate per instruction and can occasionally lead to data loss due to buffer overflow. This mode is required to postprocess the trace with the Hexagon Trace Analyzer. |
cycle-coarse | ETM still tracks cycles but the tracking is not done per instruction, and thus it results in a lower data rate per instruction. It might be a good compromise for some performance analysis where the cycle-accurate mode overflows. |
Usage
sysMonApp etmTrace --command etm [--etmType <etm_type>]
Where the following values are supported for etm_type:
etm_type | Description |
---|---|
pc_mem | Trace instruction and data memory. (Default.) |
pc | Trace instruction memory. |
mem | Trace data memory. |
cc_pc | Trace instruction memory with cycle-coarse mode. |
cc_mem | Trace data memory with cycle-coarse mode. |
cc_pc_mem | Trace instruction and data memory with cycle-coarse mode. |
ca_pc | Trace instruction memory with cycle-accurate mode. |
ca_mem | Trace data memory with cycle-accurate mode. |
ca_pc_mem | Trace instruction and data memory with cycle-accurate mode. |
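The etm_type values follow a regular pattern: a trace target (pc, mem, or pc_mem) with an optional cc_ or ca_ prefix for the cycle-coarse and cycle-accurate modes. The helper below is a sketch of that pattern; the name etm_type and its two-argument form are hypothetical, and treating the unprefixed values as the default trace mode is an inference from the two tables above.

```shell
# Hypothetical helper: compose an --etmType value from a trace mode
# (default, cycle-coarse, cycle-accurate) and a target (pc, mem, pc_mem).
etm_type() {
    case "$1" in
        default)        prefix="" ;;
        cycle-coarse)   prefix="cc_" ;;
        cycle-accurate) prefix="ca_" ;;
        *) echo "unknown trace mode: $1" >&2; return 1 ;;
    esac
    case "$2" in
        pc|mem|pc_mem) echo "${prefix}$2" ;;
        *) echo "unknown trace target: $2" >&2; return 1 ;;
    esac
}
```

For example, `sysMonApp etmTrace --command etm --etmType "$(etm_type cycle-accurate pc_mem)"` expands to the ca_pc_mem value from the table.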
HTA profiler service
Use the HTA profiler service to profile services running on the HTA to gather the following information:
- Clocks voted for
- DDR read and write bandwidths
- HTA activity statistics like HTA active time
- Measured and maximum inferences per second
- Number of layers
- Bus bandwidth metrics
These profiling metrics are useful in measuring performance, debugging performance-related issues, and identifying possible optimizations.
Usage
adb shell /data/local/tmp/sysMonApp profiler --q6 HTA [--cdsp 1]
Where the --cdsp 1 option allows the cDSP to be profiled in parallel with the HTA.
Data collection
The HTA profiler service stores raw profiling data in the sysmon_HTA.bin file when only the HTA is profiled, or in the sysmon_cdsp_HTA.bin file when the cDSP is profiled in parallel with the HTA. These files are stored in the /sdcard/ or /data/ folder.
The sysMonApp profiler also prints the output file path and file name in the standard output. When you are finished with profiling, pull the file from the device and postprocess it using the sysmon parser on a host machine.
To pull the profiler output file using ADB:
adb pull /sdcard/sysmon[_cdsp]_HTA.bin <destination directory>
Data postprocessing
To postprocess the HTA profiling data, see the instructions on how to use the sysmon parser.