Profiling resources
The Hexagon SDK provides a number of profiling tools serving different purposes. The table below summarizes these options and their availability on the Hexagon simulator, target devices, or both. The most common profiling tools and techniques are illustrated in the profiling example.
Tool | Description | Simulator | Device |
---|---|---|---|
HAP_perf timer APIs | APIs to access elapsed cycles or time | Yes | Yes |
Hexagon simulator and profiler | Generate instruction and data cache analysis as well as function-level profiling reports | Yes | No |
Hexagon Trace Analyzer | An ETM trace analyzer utility that generates various analysis reports for cDSP | Yes | Yes |
sysMon | A set of monitoring tools that collect and display high-level DSP profiling information | No | Yes |
itrace | A library monitoring CPU and DSP events, including PMU counters, and generating various reports including gprof, Flamegraph, and Chrome trace output | No | Yes |
Timers
The timer APIs may be used to time a specific section of code by accessing the elapsed time or number of processor cycles.
Measuring time
The most accurate way to measure time is to read the Hexagon UTIMERHI:UTIMERLO registers. This pair of 32-bit registers directly reports the elapsed time in ticks, where one tick lasts 1/(19.2 MHz) seconds, or about 52 ns.
These registers can be read from C with the HAP_perf_get_qtimer_count API, which is declared in HAP_perf.h.
Alternatively, you can call HAP_perf_get_time_us, which is also declared in HAP_perf.h. This API returns the elapsed time directly in microseconds. It derives that value from the tick count by performing a division, which consumes some extra cycles.
Note: Measuring time is only meaningful when running on device.
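The tick arithmetic described above can be sketched in C as follows. This is a minimal illustration, not SDK code: the helper names combine_hi_lo and qtimer_ticks_to_us are hypothetical, and only the 19.2 MHz tick rate comes from the text. On device the tick count would come from HAP_perf_get_qtimer_count; the sketch does not call it so that it also compiles on a host.

```c
#include <stdint.h>

/* Tick rate stated in the text: one tick lasts 1/(19.2 MHz) seconds. */
#define QTIMER_HZ 19200000ULL

/* Hypothetical helper: join a HI:LO pair of 32-bit register values
 * (such as UTIMERHI:UTIMERLO) into a single 64-bit count. */
static inline uint64_t combine_hi_lo(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Hypothetical helper: convert a tick count to whole microseconds,
 * truncating toward zero. Whole seconds and the remainder are converted
 * separately so the intermediate product cannot overflow 64 bits. */
static inline uint64_t qtimer_ticks_to_us(uint64_t ticks)
{
    uint64_t seconds   = ticks / QTIMER_HZ;
    uint64_t remainder = ticks % QTIMER_HZ;
    return seconds * 1000000ULL + (remainder * 1000000ULL) / QTIMER_HZ;
}
```

A typical use would be to read the tick count before and after the code of interest and pass the difference to qtimer_ticks_to_us. Keeping the raw tick delta and converting it afterwards avoids paying for the division inside the timed region.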
Measuring processor cycles
To measure elapsed processor cycles, read the UPCYCLEHI:UPCYCLELO registers with the HAP_perf_get_pcycles API, which is declared in HAP_perf.h.
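As a rough sketch of how two cycle-counter readings might be turned into a per-item cost, consider the helpers below. The names counter_delta and cycles_per_item are hypothetical and not part of the SDK. The usage shown in the comment assumes that HAP_perf_get_pcycles takes no arguments and returns a 64-bit count, and the process function in it is a placeholder workload.

```c
#include <stdint.h>

/* Hypothetical helper: difference between two readings of a 64-bit
 * counter. Unsigned subtraction stays correct across a single wraparound. */
static inline uint64_t counter_delta(uint64_t start, uint64_t end)
{
    return end - start;
}

/* Hypothetical helper: average cycles spent per processed item.
 * Returns 0.0 when no items were processed to avoid dividing by zero. */
static inline double cycles_per_item(uint64_t start, uint64_t end,
                                     uint64_t items)
{
    if (items == 0) {
        return 0.0;
    }
    return (double)counter_delta(start, end) / (double)items;
}

/* Assumed on-device usage (not compiled here):
 *
 *     uint64_t t0 = HAP_perf_get_pcycles();
 *     process(buf, n);                    // placeholder workload
 *     uint64_t t1 = HAP_perf_get_pcycles();
 *     double cost = cycles_per_item(t0, t1, n);
 */
```

Reporting cycles per item rather than a raw total makes runs with different input sizes easier to compare.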
Simulator profiling
The simulator generates profiling data that helps locate the bottlenecks in an application. This information is useful for identifying which functions will yield the greatest savings when optimized.
When profiling an application on the simulator, ensure that the simulator runs in its timing mode. That mode is enabled with the --timing option. With this option, the simulator counts cycles accurately at the cost of a longer simulation time.
The --statsfile and --pmu_statsfile options direct the simulator to gather statistics on instruction and cache utilization. For more information about these options, see the Simulator statistics files and PMU statistics files sections in the Hexagon Simulator User Guide (80-N2040-17).
The simulator can also be used to understand how cycles are spent in specific code regions. To do so, first run the simulator with the --packet_analyze option. When run in timing mode with this option, the simulator generates a packet analysis file. The Hexagon profiler can then postprocess this file into a user-friendly HTML file that displays the following information:
- Total number of cycles executed
- Total number of stall cycles
- Highest cycle or stall counts (by function or instruction packet)
- Commit and stall statistics (by function or instruction packet)
- PMU event counts (by event type or instruction packet)
- Annotated disassembly of instruction packets
- Assembly instruction counts
For more information about the Hexagon profiler, see the Hexagon Profiler User Guide (80-N2040-10).
Load address of dynamic libraries
The profiler takes a comma-separated list of ELF files, but for shared objects you need to provide the run-time load address of these objects using the [:reloc] option:
hexagon-profiler --help
--elf=<file>[:reloc][,<file>[:reloc],...]
Input one or more Hexagon elf/obj/lib files for disassembly, with optional relocation offsets
to match their memory locations when the Hexagon code was run by hexagon-sim
The relocation address is displayed by the SDK loader when executing the code on the simulator:
%DEFAULT_HEXAGON_TOOLS_ROOT%\Tools\bin\hexagon-sim -mv66g_1024 --packet_analyze calculator.json --simulated_returnval --usefs hexagon_ReleaseG_toolv87_v66 --pmu_statsfile hexagon_ReleaseG_toolv87_v66/pmu_stats.txt --cosim_file hexagon_ReleaseG_toolv87_v66/q6ss.cfg --l2tcm_base 0xd800 --rtos hexagon_ReleaseG_toolv87_v66\osam.cfg %HEXAGON_SDK_ROOT%\rtos\qurt\computev66\sdksim_bin\runelf.pbn -- %HEXAGON_SDK_ROOT%\libs\run_main_on_hexagon\ship\hexagon_toolv87_v66\run_main_on_hexagon_sim -- calculator_q.so
...
try ./calculator_q.so: HIGH:0x5A:81:search.c
fs_region_create request with addr=0, size=1000
fs_region_create map region with vaddr=d8041000, paddr=5009d000, size=1000, perm=b, region_handle=1e03fc90
read headers 0x0 -> d8041000 (0x1000 B): HIGH:0x5A:539:map_object.c
_rtld_map_object_ex: sigverify skipped for ./calculator_q.so, no function specified!: HIGH:0x5A:598:map_object.c
fs_region_create request with addr=0, size=9000
fs_region_create map region with vaddr=d81f7000, paddr=500f7000, size=9000, perm=b, region_handle=1e03fcf0
mapped [d81f7000 - d8200000] (36864 Bytes): HIGH:0x5A:729:map_object.c
This indicates that the load address of calculator_q.so is 0xd81f7000, which can then be passed to hexagon-profiler as follows:
%DEFAULT_HEXAGON_TOOLS_ROOT%\Tools\bin\hexagon-profiler --packet_analyze --json=calculator.json --elf=%HEXAGON_SDK_ROOT%\libs\run_main_on_hexagon\ship\hexagon_toolv87_v66\run_main_on_hexagon_sim,hexagon_ReleaseG_toolv87_v66\ship\calculator_q.so:0xd81f7000 -o calculator.html
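Reading the load address out of the loader log by hand is error-prone, so the lookup can be scripted. The sketch below is a minimal illustration in C under stated assumptions: it assumes the loader prints a line of the form "mapped [start - end]" exactly as in the log above, and the helper names parse_mapped_range and format_elf_arg are hypothetical, not part of the SDK or its tools.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: look for "mapped [<start> - <end>]" in one loader
 * log line. Returns 1 and fills in both addresses on a match, else 0.
 * The addresses are hexadecimal without a 0x prefix, as in the log. */
static int parse_mapped_range(const char *line, uint64_t *start, uint64_t *end)
{
    const char *p = strstr(line, "mapped [");
    uint64_t s = 0, e = 0;

    if (p == NULL) {
        return 0;
    }
    if (sscanf(p, "mapped [%" SCNx64 " - %" SCNx64 "]", &s, &e) != 2) {
        return 0;
    }
    if (e < s) {
        return 0; /* reject a nonsensical range */
    }
    *start = s;
    *end = e;
    return 1;
}

/* Hypothetical helper: build the "<file>:<reloc>" form accepted by the
 * --elf option. Returns the string length, or -1 if buf is too small. */
static int format_elf_arg(char *buf, size_t size, const char *path,
                          uint64_t load_addr)
{
    int n = snprintf(buf, size, "%s:0x%" PRIx64, path, load_addr);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Feeding each log line to parse_mapped_range and keeping the range reported for the shared object gives the start address, which format_elf_arg then appends to the file name. The size of the parsed range can be checked against the byte count printed on the same log line as a sanity test.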
Hexagon Trace Analyzer
The Hexagon Trace Analyzer, or HexTA, is a software trace analysis tool. It processes Hexagon ETM (Embedded Trace Macrocell) traces generated by the software running on the cDSP and derives the flow of each thread of the processor. It is a valuable tool for giving insights into code execution, and allows in-depth analysis and optimization. It can process traces generated on target or by the simulator.
Hexagon Trace Analyzer requires ETM traces to be collected from the cDSP. These traces are then parsed against the binaries that were loaded and post-processed to present the data in a meaningful manner. The outputs of Hexagon Trace Analyzer include various .csv files that give per-function, per-instruction, and per-section statistics. It also generates flame graphs, which provide a graphical view of the execution tree.
Hexagon Trace Analyzer can be found in the Hexagon SDK at $HEXAGON_SDK_ROOT/tools/debug/hexagon-trace-analyzer.
Prerequisites
The prerequisites for using Hexagon Trace Analyzer are as follows:
- Hexagon Trace Analyzer executable: <HEXAGON_SDK_ROOT>/tools/debug/hexagon-trace-analyzer/hexagon-trace-analyzer
- Linux
- Python 3.7.0, installed with the required Python packages listed under $HEXAGON_SDK_ROOT/utils/scripts/python_requirements.txt
- This application is used to collect ETM traces and retrieve Hexagon shared object load addresses.
- Two visualization tools, FlameGraph and Perfetto, are needed to display the output from the Hexagon Trace Analyzer.
  - The Perfetto visualizer installs automatically when its URL is accessed for the first time. An internet connection is not needed once it is installed.
  - FlameGraph should be installed in the hexagon-trace-analyzer directory as follows (using Linux instructions):
    cd ${HEXAGON_SDK_ROOT}/tools/debug/hexagon-trace-analyzer
    wget https://github.com/brendangregg/FlameGraph/archive/master.zip && unzip master.zip && rm master.zip && mv FlameGraph-master FlameGraph
Usage
See the profiling example for how to collect a trace on the simulator or on a device and process it with the Hexagon Trace Analyzer tool. The example also describes the various files generated by the tool and how to interpret them.
sysMon
Qualcomm provides a set of tools for monitoring high-level statistics for an application running on target. These tools are:
- The sysMonApp, which is an Android executable that provides user functionalities such as profiling DSP workload, getting or setting various core and bus clocks, or collecting various software thread information.
- The sysMon DSP profiler, which is an Android UI application using the sysMonApp profiler service to profile DSP workloads. This application is useful to monitor load distribution and bus activity over time.
- The sysMon Parser, which is an executable used to postprocess profiling data collected by sysMonApp or the sysMon DSP profiler.
- The sysMon marker API, which enables the collection of profiling data for a specific code region.