Download: 18Jan2014 (current), 15Dec2013, 13Dec2013, 02Nov2013, 25Oct2013, 05Oct2013, 29Sep2013, 23Sep2013, 20Sep2013, 18Sep2013, 15Sep2013
This package provides a couple of tools and libraries to support the C64+ DSP found in TI's OMAP3 hardware.
It includes the following components:
"c64_load" -- A COFF2 DSP image loader.
Must be run with root privileges.
This utility loads DSP images created with CodeComposerStudio / DSPBIOS 5.xx.
Licensed under the terms of the GPL.
"c64.ko" -- A Linux kernel module that handles messaging between the ARM (GPP) and the C64+ (DSP).
It also handles shared memory allocation via the Linux CMA (Contiguous Memory Allocator)
and provides a couple of utility functions to write back/invalidate the ARM data caches.
When the module is loaded, it creates the "/dev/c64" device.
Licensed under the terms of the GPL.
"libc64" -- A utility library to be used with client applications.
It provides access to the "c64.ko" kernel module features and lets applications use
remote procedure calls to call functions on the DSP side.
Licensed under the terms of the LGPL.
"libc64_dsp" -- The DSP-side counterpart to "libc64".
It allows registration of DSP components whose commands (methods) can then be called from the GPP side.
Licensed under the terms of the MIT license.
"go64.sh" -- A utility script that loads and starts a DSP image.
"c64_tools" supports multiple processes / multiple clients, although each process is limited to one client connection.
- changed DSP power on/off sequences to match those found in TI's SysLink.
This seems to fix stability issues on older OMAP3530-based Pandora devices
(newer DM3730-based devices worked fine with the old sequences).
Tests indicated that the stability issues were related to the way the DSP
was powered off in previous releases.
With the new power on/off code, the power consumption during c64_pwrbench
(power-cycling stress test) dropped from ~2.7W to ~1.6W.
DSP idle power consumption also dropped from ~80mW to ~10mW
(when the 'c64' device is not in use, the DSP will still be completely
powered off, though).
The new power up sequence also keeps the IVA2 video sequencer (ARM926) in reset.
- added: 'pwroff' kernel module parameter
Setting this to 0 (i.e. 'insmod c64.ko pwroff=0') will prevent the kernel module
from powering off the DSP when there are no client applications connected or
when all DSP applications have called dsp_suspend().
The DSP will still be powered off when the kernel module is unloaded via 'rmmod'.
- changed: updated go64.sh script to support the new 'pwroff' kmod parameter, e.g.:
# export C64_PWROFF=0
# export C64_PWRLOG=1
- changed: removed dsp_send_forced_message() call in c64_load/main.c
(this was only needed by very early versions of c64_tools)
- added TC_L2SRAM_RAND_CHKSUM_* testcases (46..49) to 'c64_tc'
- changed: the CROSS_ROOT env.var. is optional now
(if it is not set, CROSS_KERNEL _must_ be set)
- added support for CROSS_KERNEL env.var. to force specific kernel source directory
- changed some debug messages to use KERN_DEBUG instead of KERN_INFO
(cosmetic change, only relevant for real (non-X11) consoles)
- fixed: dsp_power_notify() was not called if 'pwrlog' kmod option was set to 0 (false)
- changed: decreased udelay()s in kmod/dsp_c64.c
- added proper DSP startup sync (wait for signal/flag from DSP instead of udelay())
- added 'c64_pwrbench' testcase
(benchmarks DSP suspend/resume latency: ~1.37 millisec @800MHz, ~2.84 millisec @200MHz)
- added c64_memview graphics example
(a real-time memory monitor; quite useless but fun. It looks best after a fresh
reboot (unfragmented memory), e.g. when you then load LibreOffice)
- added 'sramtest' DSP component
(various L1/L2 access pattern tests)
- added 'c64_sramtest' example
- optimized 'dsprite' DSP graphics component
Performance @800MHz is now (2000 32x32 ARGB32 sprites):
copy: 249 MPixel/sec
alphatest: 133 MPixel/sec
premul_srcover_saturate: 132 MPixel/sec
srcover: 113 MPixel/sec
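For a rough feel of what these fill rates mean on the DSP, they can be converted into average clock cycles per pixel by dividing the clock rate by the pixel rate (e.g. 800 / 249 = ~3.2 cycles per copied pixel). The following is a minimal C sketch of that arithmetic only, assuming the whole 800MHz budget goes to sprite rendering; the table and function names are made up for illustration and are not part of c64_tools.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* 'dsprite' fill rates from the list above, measured at 800MHz. */
struct fill_rate {
    const char *mode;
    double mpixel_per_sec;
};

static const struct fill_rate dsprite_rates[] = {
    { "copy",                    249.0 },
    { "alphatest",               133.0 },
    { "premul_srcover_saturate", 132.0 },
    { "srcover",                 113.0 },
};

/* MHz is cycles per microsecond and MPixel/sec is pixels per microsecond,
 * so their ratio is the average number of DSP cycles spent per pixel. */
static double cycles_per_pixel(double clock_mhz, double mpixel_per_sec)
{
    assert(mpixel_per_sec > 0.0);
    return clock_mhz / mpixel_per_sec;
}

/* Print the average cycle budget per pixel for every mode. */
static void print_cycles_per_pixel(double clock_mhz)
{
    size_t i;

    for (i = 0; i < sizeof dsprite_rates / sizeof dsprite_rates[0]; i++)
        printf("%-24s %5.2f cycles/pixel\n", dsprite_rates[i].mode,
               cycles_per_pixel(clock_mhz, dsprite_rates[i].mpixel_per_sec));
}
```

Calling print_cycles_per_pixel(800.0) yields roughly 3.2, 6.0, 6.1 and 7.1 cycles/pixel for the four modes.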
- added 'age' DSP graphics component (WIP)
(Amiga "copper"-style software display controller that currently emulates
up to 8 bitplanes and up to 8 scroll registers. Work in progress;
currently performs at ~90 MPixel/sec (8 bitplanes, ARGB32 target).)
- added c64_age graphics example (WIP)
- added qdma_link1d() syscall
- added qdma_link2d() syscall
- fixed sS16 typedef (now signed)
- refactored SRAM malloc code to minim.h/minim.c
- renamed dsp_fshm_*() to dsp_l1sram_*() (old calls are marked deprecated)
- added dsp_physgpp_to_physdsp() API function (L3 to local interconnect addr. translation)
- added C64_IOCTL_L2SRAM_ALLOC to c64_dev.h
- added C64_IOCTL_L2SRAM_FREE to c64_dev.h
- added C64_IOCTL_L2SRAM_DEBUG to c64_dev.h
- added dsp_l2sram_alloc() API function
- added dsp_l2sram_free() API function
- added dsp_l2sram_debug() (private) API function
- added new c64_tc testcase TC_L2SRAM_ALLOCATOR
- added CMD_LINK1D to 'test_qdma' DSP component
- added CMD_LINK2D to 'test_qdma' DSP component
- added new c64_tc testcase TC_QDMA_LINK1D
- added new c64_tc testcase TC_QDMA_LINK2D
- added DSP_MAILBOX_RESET compile time option to dsp_config.h
- added C64_TOOLS_ROOT env.var. export to dsp/setenv.sh (for DSP out of tree builds)
- added setenv.sh (for GPP out of tree builds)
- moved GPP graphics examples from tests/ to gfx_tests/
- moved DSP graphics components from dsp/components/ to gfx_tests/dsp/components/
- moved dsp_component_id_find_by_name() to dsp_priv.h
- fixed multiprocess sync. in dsp_component_load()
- added CORE_FC_CMD_COM_OVERLAY_FIND fastcall to core DSP component
- added some cache writeback calls to fix sporadic DSP side component (un-)registration issue
- added dsp_suspend() API function
- deprecated dsp_power_off() (dsp_priv.h) function (falls back to dsp_suspend())
- moved dsp_resume() to public header file (dsp.h)
- fixed DSP resume which was essentially broken due to some forgotten test/debug code
(registered components were lost during dsp_resume())
- increased RPC timeouts
- added retry loop to dsp_rpc_send() to wait until DSP exits fastcall mode
- increased accuracy of 'c64_nops' utility
(measures DSP clock speed by executing/benchmarking a lot of NOP instructions)
- improved DSP power management: applications can now suspend/resume the DSP at any time
(DSP is powered off when all client apps. have requested the suspend state, and
resumes when first client app. calls dsp_resume())
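The suspend/resume rule above amounts to a small piece of bookkeeping. The following C sketch is a hypothetical model of it (struct dsp_pm, the pm_* functions and MAX_CLIENTS are invented for illustration and are not the c64.ko code): the DSP stays powered while at least one connected client has not requested suspend, and connecting or resuming a client powers it back up. The 'pwroff' module parameter is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CLIENTS 8   /* hypothetical limit, for illustration only */

/* Hypothetical model of the DSP power bookkeeping, not the c64.ko code. */
struct dsp_pm {
    bool connected[MAX_CLIENTS];
    bool suspend_requested[MAX_CLIENTS];
    bool powered;
};

/* Recompute the power state: the DSP may be powered off only when every
 * connected client has requested suspend (trivially true with no clients). */
static void pm_update(struct dsp_pm *pm)
{
    int i;

    pm->powered = false;
    for (i = 0; i < MAX_CLIENTS; i++)
        if (pm->connected[i] && !pm->suspend_requested[i])
            pm->powered = true;
}

static void pm_init(struct dsp_pm *pm)
{
    int i;

    for (i = 0; i < MAX_CLIENTS; i++)
        pm->connected[i] = pm->suspend_requested[i] = false;
    pm_update(pm);
}

/* A newly connected client is active, so the DSP resumes. */
static void pm_connect(struct dsp_pm *pm, int id)
{
    assert(id >= 0 && id < MAX_CLIENTS);
    pm->connected[id] = true;
    pm->suspend_requested[id] = false;
    pm_update(pm);
}

static void pm_disconnect(struct dsp_pm *pm, int id)
{
    assert(id >= 0 && id < MAX_CLIENTS);
    pm->connected[id] = false;
    pm_update(pm);
}

/* cf. dsp_suspend() */
static void pm_suspend(struct dsp_pm *pm, int id)
{
    assert(id >= 0 && id < MAX_CLIENTS && pm->connected[id]);
    pm->suspend_requested[id] = true;
    pm_update(pm);
}

/* cf. dsp_resume(): the first resuming client powers the DSP back up. */
static void pm_resume(struct dsp_pm *pm, int id)
{
    assert(id >= 0 && id < MAX_CLIENTS && pm->connected[id]);
    pm->suspend_requested[id] = false;
    pm_update(pm);
}
```

With two clients connected, the DSP only powers off after both have called the suspend function, and it powers up again as soon as either one resumes.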
- added 'c64_pwrtest' testcase
- added 'pwrlog' c64.ko kernel module option (default is 1 / true)
(if enabled, trace DSP suspend/resume actions in klog)
- updated 'go64.sh' to support pwrlog kmod option (C64_PWRLOG env. var. overrides def. pwrlog setting)
full list of 'c64_tc' testcases:
- added pnd_src/ folder
- added -nops cmdline option to c64_dsprite
(executes 800 million NOPs on DSP side, can be used to calibrate the DSP clock rate)
- added DSP_CACHE_W (caches writes / uncached reads) cache type
- added dsp_cache_inv_virt() API function
- added dsp_cache_wb_virt() API function
- added dsp_cache_wbinv_virt() API function
- added USE_DSP_POWER_NOTIFY build option to kmod.h
(if defined, call dsp_power_notify() before/after DSP is resumed/suspended)
- added USE_FORCED_LOWPOWER_BYPASS build option to kmod.h
- fixed: kmod/dev.c cache actions (inv/wb/wbinv) now use customized versions of v7_dma_map_area()
instead of falling back to (the slow) flush_cache_all()
- added support for huge pages (DSP_CACHE_HUGETLB cache type)
- added (a lot of) new c64_tc testcases (run "./c64_tc" to see a list)
- fixed: DSP build dependencies
- changed: DSP overlays can now be compiled w/o installing DSP/BIOS
- changed: build versioned libc64.so.1
- changed: renamed c64_msg_t to dsp_msg_t
- changed: renamed dsp_shm_init()/exit() to dsp_shm_alloc()/free()
- added: dsp_shm_alloc() memory attributes DSP_CACHE_NONE, DSP_CACHE_R (write-through), DSP_CACHE_RW
- added: dsp_fshm_alloc() and free() (and associated ioctls). Used for L1DSRAM mem. allocation in userspace.
The allocation granularity is 64 bytes (one cacheline).
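With a 64-byte granularity every request is rounded up to whole cachelines, so e.g. a 200-byte request occupies 256 bytes (4 lines). A minimal sketch of that rounding, assuming the allocator simply rounds each size up to the next multiple of 64; the macro and function names are hypothetical and the actual allocator code may differ.

```c
#include <assert.h>
#include <stddef.h>

#define L1SRAM_GRANULE 64u   /* one cacheline, per the changelog entry above */

/* Round a request up to a whole number of 64-byte granules.
 * The mask trick requires the granule to be a power of two. */
static size_t l1sram_round_size(size_t n)
{
    assert((L1SRAM_GRANULE & (L1SRAM_GRANULE - 1u)) == 0);
    return (n + (L1SRAM_GRANULE - 1u)) & ~(size_t)(L1SRAM_GRANULE - 1u);
}

/* Number of granules (cachelines) a request of n bytes occupies. */
static size_t l1sram_granules(size_t n)
{
    return l1sram_round_size(n) / L1SRAM_GRANULE;
}
```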
- added: malloc()/free() style memory allocator for e.g. shared memory. Uses Doug Lea's "dlmalloc".
New API functions: dsp_mspace_create()/malloc()/memalign()/free()/destroy()
- fixed: kmod/dev.c now terminates the fastcall sequence in case the client forgot to do so
- fixed: multiprocess overlay load race condition
- added: dsp_virt_to_phys() and dsp_phys_to_virt() API calls
- added: dsp_cache_wbinvall() API call
- added: DSP ringbuffer for printf(). New DSP syscalls puts()/printf()/vprintf(). New API call dsp_logbuf_print()
- added: new DSP syscalls qdma_init()/wait()/copy1d()/copy2d()
- added: new DSP example components test_logbuf, test_qdma, dsprite
- added: new GPP examples "c64_minimal" and "c64_dsprite"
- added: new "tests/c64_tc.c" testcases:
Download fixes: linker_fix-07Oct2013d.tar.gz
- fixed makefile dependencies for "dsp/components/" build
- fixed makefile dependencies for "dsp/core" build
- added prebuilt .o64 files for "dsp/core" and "dsp/libc64_dsp" projects
- fixed: removed references to "lnknone.a64P" and "rtdx64xplus.lib" in "pre.cmd" linker script.
Now the overlay images can be built without having the "bios_5_42_01_09" package installed
("xdctools_3_25_03_72" is not required by c64_tools at all).
To build just the overlay DSP images (example components):
1) Go to the "c64_tools/dsp/" directory, edit "setenv.sh" and adjust the TI_ROOT directory as necessary,
then run the script by issuing ". setenv.sh"
2) Type "m overlays" to rebuild the overlay images
Alternatively, type "m overlays_bin" to build the .out images incrementally or
"m overlays_clean" to remove the output files.
3) Use "m scp" to copy the images to the target
(first edit "c64_tools/scp.mk" and enter the hostname/IP address of your Pandora and your credentials)
- added: DSP auto suspend / resume (in suspend, DSP consumes no power).
The DSP is automatically suspended when the last application using it quits.
Vice versa, it resumes execution / is restarted when the next app. connects.
- added: support for dynamically loaded overlay images (4*256k + 1*1024k).
Overlays are unregistered when dsp_close() is called and their refcount becomes 0.
In case an application crashes, an "emergency unload" mechanism in the kernel module
takes care of this to prevent DSP code areas from getting lost.
The DSP build system automatically creates 4 .out file variants for the 256k areas
(linked to different base addresses).
Which of these 4 variants is loaded when dsp_component_load() is called depends on
what area is available / not already in use by other applications.
"demo_calc", "demo_checksum", "demo_calc_fastcall" are examples of "area2" (256k) overlays.
"demo_fractal" is an example of an "area3" (1024k) overlay.
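The area selection for the 256k overlays can be pictured as a first-free-slot search with per-area reference counts. The sketch below is a simplified, hypothetical model: overlay_acquire()/overlay_release() are invented names, the 1024k "area3" is not modeled, and sharing an already loaded component between applications is an assumption based on the refcount description above. The returned index selects which of the 4 linked .out variants would be loaded.

```c
#include <assert.h>
#include <string.h>

#define NUM_256K_AREAS 4   /* one linked .out variant per 256k area */

/* Hypothetical bookkeeping for one 256k overlay area. */
struct overlay_area {
    char name[32];   /* component loaded in this area ("" = free) */
    int refcount;    /* number of applications using it */
};

static struct overlay_area areas_256k[NUM_256K_AREAS];

/* Returns the area index (= which .out variant to load), or -1 if all
 * areas are in use. A component that is already loaded is shared. */
static int overlay_acquire(const char *name)
{
    int i, free_idx = -1;

    for (i = 0; i < NUM_256K_AREAS; i++) {
        if (areas_256k[i].refcount > 0 &&
            strcmp(areas_256k[i].name, name) == 0) {
            areas_256k[i].refcount++;
            return i;
        }
        if (areas_256k[i].refcount == 0 && free_idx < 0)
            free_idx = i;
    }
    if (free_idx >= 0) {
        strncpy(areas_256k[free_idx].name, name,
                sizeof areas_256k[free_idx].name - 1);
        areas_256k[free_idx].name[sizeof areas_256k[free_idx].name - 1] = '\0';
        areas_256k[free_idx].refcount = 1;
    }
    return free_idx;
}

/* Drop one reference; at refcount 0 the overlay is unregistered and the
 * area becomes available to other applications again. */
static void overlay_release(int idx)
{
    assert(idx >= 0 && idx < NUM_256K_AREAS && areas_256k[idx].refcount > 0);
    if (--areas_256k[idx].refcount == 0)
        areas_256k[idx].name[0] = '\0';
}
```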
- added: DSP-side global syscalls table. Currently includes some c64_tools utility functions,
cache management utilities, and mem/string handling functions.
- changed: DSP components no longer require DSP/BIOS (=> considerable decrease in code size)
For DSP component development, the following (free as in beer) TI packages are required:
- changed: cleaned up DSP build system and moved common .mk code into include files
- changed: cleaned up "core" tconf and split it into multiple .tcis (see dsp/tci/ folder).
"config.tci" is the central memory configuration file.
It is included by the new gen_link_areas.tks script, which generates the DSP linker
command files (.cmd) and the "overlay_sections.c" source, which is also included
by the GPP-side build.
Unless you want to change the memory layout, you do not need to worry about this, though.
- fixed: added mtx_clients mutex to kernel module to fix some (potential) multiprocess concurrency issues
- changed: partially rewrote fastcall protocol to use two separate ARM/C64+ cachelines.
A more complex handshake protocol is used now, but fortunately that
did not have a negative effect on performance.
- changed: removed CCS projects and renamed "c64_ccs_projects/" to "dsp/".
- added: makefile based DSP build system. This requires the following TI packages:
In order to build the DSP libraries and .out image, change to the "dsp/" directory,
run ". setenv.sh", then "m all" to start the build.
- added "reg_read" and "reg_write" utilities (for dev. purposes only)
- changed: when "c64_fractal" is compiled with no DSP support, it no longer requires "/dev/c64" to be present.
- fixed: DSP powerdown sequence when c64.ko module is unloaded
- added: DSP powersave mode. System power consumption should now be about the same whether or not the DSP is running.
** NOTE ** The Open Pandora SuperZaxxon 1.55 kernel update is now located in "pandora_sz1.55_kernel_update/".
- added Linux build scripts / makefiles for the DSP libraries and image: c64_ccs_projects-26Sep2013.tar.gz
- added: the c64.ko kernel module now supports select() (=> OS friendly message reception)
- added dsp_poll_enable() API function (increases message throughput by ~500% but is
not OS friendly (high GPP usage). default=use select).
- dsp_cache_inv() and dsp_cache_wbinv() now fall back to flush_cache_all() since all
other functions (see "c64_kmod/dev.c") cause system instabilities. Disabling interrupts
during cache invalidates did not change that.
- added "tests/omapfb.c" utility code for framebuffer access on OMAP3 / Open Pandora.
- added "c64_fractal" GPP example and "demo_fractal" component on DSP-side.
(a single precision floating point heavy effect. uses TI's IQmath library on the DSP side.
the source also contains several other alternative (slower) implementations.)
- changed: GPP applications do not require root privileges anymore (a bug in the last release)
- GPP applications now have access to the second half of the L1DSRAM of the DSP (24 kbytes)
The last 16 bytes of that area are used as IPC 'registers'. See "include/dsp_common.h".
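The location of the IPC registers follows from simple offset arithmetic: 24 kbytes are 24576 bytes, so the 16-byte register block starts at offset 24560, which is room for four 32-bit words. A sketch, assuming the window is already mapped into the process and the registers are plain 32-bit words; the real layout and meaning are defined in "include/dsp_common.h", and the macro and function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define GPP_L1DSRAM_SIZE (24u * 1024u)   /* GPP-accessible half of the L1DSRAM */
#define IPC_REGS_SIZE    16u             /* last 16 bytes: IPC 'registers' */
#define IPC_REGS_OFFSET  (GPP_L1DSRAM_SIZE - IPC_REGS_SIZE)   /* 24560 */
#define IPC_NUM_REGS     (IPC_REGS_SIZE / sizeof(uint32_t))   /* 4 */

/* Hypothetical helper: address of IPC register 'idx', given the start of an
 * already mapped 24 kbyte window. Only the offset arithmetic is shown. */
static volatile uint32_t *ipc_reg(volatile uint8_t *window, unsigned idx)
{
    assert(idx < IPC_NUM_REGS);
    return (volatile uint32_t *)(window + IPC_REGS_OFFSET + idx * sizeof(uint32_t));
}
```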
- added dsp_fastcall_rpc() (can be called after the DSP has initiated a fastcall sequence)
- added dsp_fastcall_end() (must be called to finish the fastcall sequence)
- added dsp_rpc_send() and dsp_rpc_recv() (so that applications can do something else while
the DSP is busy)
- added new DSP "dsp_calc_fastcall" demo component that shows how to implement fastcall RPCs
- added testcase #6 to "tests/c64_tc.c" (fastcall RPC example)
- added testcase #7 to "tests/c64_tc.c" (SRAM access benchmark: ~3.89 million 32bit reads/writes per second)
- the "c64.ko" kernel module now allocates the DSP image area (fixed address) during init.
this means that it is no longer necessary to statically reserve memory for the DSP at boottime.
(thanks to Notaz for providing a new Linux kernel and a small example module !)
** NOTE ** You need the updated kernel (see "bin-20Sep2013/uImage-3", copy it to "/boot" on the Pandora)
for the kernel module binary to work.
Please use the "scripts/cma_dsp_mem/autoboot.txt" boot script (works on all Pandora editions).
In case you do not want to update your kernel for some reason, you have to undefine
"USE_PLATFORM_DRIVER" in "c64_kmod/kmod.c" and rebuild the module. You will also need
to use one of the "scripts/static_dsp_mem/*" boot scripts (depends on what edition you have).
- code cleanup
- improved IPC/messaging speed by a factor of 1.7 (~23500 message roundtrips per second)
- support for Linux CMA (contiguous memory allocator) (for dynamic GPP/DSP shared memory allocation)
- GPP-side cache utility functions (writeback/invalidate/writeback+invalidate)
- lots of new API functions (see "include/dsp.h")
- added Pandora "autoboot.txt" boot scripts to scripts/
(use autoboot.txt__CLASSIC for Pandoras with 256 MBytes RAM; rename it to autoboot.txt before copying
it to the left SD card)
- now uses 4 MBytes DSP memory config (instead of 32 MBytes)
- this "readme.txt"
- initial release. ~13000 message roundtrips per second (~10 times faster than DspBridge)
- an OMAP3530 or DM3730 based board. I am working with an Open Pandora
(you should get one -- it has a keyboard, a WVGA touchscreen, game controls,
a quality audio output and an awesome battery life! and yes, lots of
- an ARM cross compiler (tested with CodeSourcery 2011.09).
- the Linux kernel sources (if you want to rebuild the kernel module).
The binary included with this release works with the Linux 3.2.45
kernel shipped with the Open Pandora Super Zaxxon 1.55 firmware.
- CodeComposerStudio v5.x (or better) for building DSP images
(note: CCS is free (as in beer) for non-commercial/education use)
(you can skip this part if you just want to try the precompiled binaries)
First of all, set the following shell variables:
$ export CROSS_COMPILE=arm-none-linux-gnueabi-
$ export CROSS_ROOT=/bsp/pandora-dev/arm-2011.09
(change this to point to the target filesystem crosscompile root, which
must contain "usr/src/pandora-kernel")
$ make -f makefile.linux all
to rebuild the sources.
Copy the binary files (precompiled ones can be found in the "bin-18Sep2013/"
directory) to the target device.
If you have built the sources you can use the "scp" toplevel makefile
target to copy the binaries to the device (edit the makefile and insert
your user/hostname first).
On the device, run "./go64.sh" to boot the DSP and load the kernel module:
$ sudo su
# export C64_DEBUG=<lvl>      (lvl=0..30; 0 if this line is omitted)
# ./go64.sh
The verbosity of the kernel module and libc64 can be controlled by setting
the C64_DEBUG environment variable prior to running "go64".
The "c64_tc" test application lets you run a couple of testcases:
testcase_nr can be one of:
Start with running the IPC (interprocessor communication) benchmark/stresstest:
$ ./c64_tc 3
The stresstest can be started multiple times, in order to test the multi-client
feature (add an '&' to the cmdline above, then issue the command multiple times).
** Design considerations
The remote procedure call interface used by "c64_tools" is very simple:
- The DSP side registers a number of "components" (with ids 1..15).
- Each component has an optional init() function, a required
exec() function, and another optional exit() function which is
never called at the moment.
- Each component can execute up to 65535 "commands".
- A component command takes up to two 32bit input arguments and can
return up to two 32bit output values.
It can also set an error number (the return value of the exec() function).
- The components used by a specific DSP image are enumerated in
the file "components.h", located in the DSP project directory.
- The GPP-side includes "components.h" so it can use the component
IDs in remote procedure calls.
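A minimal C sketch of this component/command model follows. It is a hypothetical illustration only: component_t, component_register(), rpc_dispatch() and the "calc" commands are invented names, and the real exec() signature lives in the libc64_dsp sources. It shows the shape of the interface: ids 1..15, a 16-bit command number, two 32-bit inputs, two 32-bit outputs, and an error number as exec()'s return value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_COMPONENT_ID 15   /* component ids 1..15 */

/* Hypothetical component descriptor mirroring the description above. */
typedef struct {
    void (*init_fn)(void);                       /* optional */
    uint32_t (*exec_fn)(uint16_t cmd,            /* required; returns error nr. */
                        uint32_t arg1, uint32_t arg2,
                        uint32_t *ret1, uint32_t *ret2);
    void (*exit_fn)(void);                       /* optional, never called yet */
} component_t;

static const component_t *components[MAX_COMPONENT_ID + 1];  /* [0] unused */

static int component_register(uint32_t id, const component_t *c)
{
    if (id < 1 || id > MAX_COMPONENT_ID || c == NULL || c->exec_fn == NULL)
        return -1;
    components[id] = c;
    if (c->init_fn != NULL)
        c->init_fn();
    return 0;
}

/* Roughly what happens on the DSP side for each incoming RPC. */
static uint32_t rpc_dispatch(uint32_t id, uint16_t cmd, uint32_t arg1,
                             uint32_t arg2, uint32_t *ret1, uint32_t *ret2)
{
    assert(id >= 1 && id <= MAX_COMPONENT_ID && components[id] != NULL);
    return components[id]->exec_fn(cmd, arg1, arg2, ret1, ret2);
}

/* Example component with two made-up commands. */
enum { CALC_CMD_ADD = 0, CALC_CMD_DIVMOD = 1 };
#define CALC_ERR_BADCMD 1u
#define CALC_ERR_DIV0   2u

static uint32_t calc_exec(uint16_t cmd, uint32_t a, uint32_t b,
                          uint32_t *ret1, uint32_t *ret2)
{
    switch (cmd) {
    case CALC_CMD_ADD:
        *ret1 = a + b;
        *ret2 = 0;
        return 0;
    case CALC_CMD_DIVMOD:
        if (b == 0)
            return CALC_ERR_DIV0;
        *ret1 = a / b;
        *ret2 = a % b;
        return 0;
    default:
        return CALC_ERR_BADCMD;
    }
}

static const component_t calc_component = { NULL, calc_exec, NULL };
```

In the real system the GPP side would only know the component id (from "components.h") and send the command and arguments in an RPC message; the dispatch happens on the DSP.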
Please take a look at the example DSP project ("c64p_simple_dm3730"),
the code should be easy to understand.
Something to consider:
- please do not use multithreading on the DSP side: only one
component exec() function can be executed at a time.
- remember to write back memory regions on processor B after modifying them,
and to invalidate them on processor A before reading them
(A/B=GPP or DSP)
On the ARM-side, use the following libc64 functions:
dsp_cache_wb() -- write back data caches for the given range
dsp_cache_inv() -- invalidate data caches for the given range
dsp_cache_wbinv() -- write back and invalidate data caches for the given range
On the DSP-side, use the following DSPBIOS functions:
BCACHE_wb() -- write back data caches for the given range
BCACHE_inv() -- invalidate data caches for the given range
BCACHE_wbInv() -- write back and invalidate data caches for the given range
- the DSP side can set up to 4 debug values via "mlb_debug_usr()".
- the GPP side can read those debug values via "dsp_debug_usr_get()".
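Why both the writeback and the invalidate are needed can be seen in a toy simulation of two private write-back caches in front of shared memory. This is a simplified, hypothetical model (word granularity instead of cache lines, single-threaded, all names invented), not how the OMAP3 caches are actually implemented: the writer's wb makes its data visible in memory, and the reader's inv discards a stale copy it may already hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: each processor has a private write-back data cache in front
 * of a small shared memory. Real caches work on whole lines, not words. */
#define WORDS 4

static uint32_t shared_mem[WORDS];

struct cpu_cache {
    uint32_t data[WORDS];
    bool valid[WORDS];
    bool dirty[WORDS];
};

static uint32_t cpu_read(struct cpu_cache *c, int i)
{
    if (!c->valid[i]) {            /* miss: fill from memory */
        c->data[i] = shared_mem[i];
        c->valid[i] = true;
        c->dirty[i] = false;
    }
    return c->data[i];             /* hit: may be stale! */
}

static void cpu_write(struct cpu_cache *c, int i, uint32_t v)
{
    c->data[i] = v;                /* write-allocate, write-back */
    c->valid[i] = true;
    c->dirty[i] = true;
}

/* "wb": copy dirty entries back to memory (cf. dsp_cache_wb() / BCACHE_wb()). */
static void cache_wb(struct cpu_cache *c, int first, int count)
{
    for (int i = first; i < first + count; i++)
        if (c->valid[i] && c->dirty[i]) {
            shared_mem[i] = c->data[i];
            c->dirty[i] = false;
        }
}

/* "inv": drop entries so the next read fetches memory again
 * (cf. dsp_cache_inv() / BCACHE_inv()). Unwritten dirty data is lost. */
static void cache_inv(struct cpu_cache *c, int first, int count)
{
    for (int i = first; i < first + count; i++)
        c->valid[i] = c->dirty[i] = false;
}
```

If the GPP writes a value and the DSP had cached the old one, the DSP only sees the new value after the GPP has written back and the DSP has invalidated.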
When developing DSP code, you can use the C64+ simulator included with TI's CodeComposerStudio.
If your DSP code is plain "C", you may want to simulate the DSP environment
by using multi-threading. Maybe I will include an emulation-version of
libc64 in future releases (but that's much easier to do on a higher application level).
About the fastcall RPCs:
This feature does not use "slow" interrupts to talk to the DSP but rather uses the L1DSRAM.
The general idea is:
1) GPP sends regular RPC to DSP (via dsp_rpc_send())
2) DSP component initiates fastcall sequence using mlb_fastcall_initiate()
3) GPP calls dsp_fastcall_rpc() to send requests to the DSP
4) GPP calls dsp_fastcall_end() when all requests have been sent
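The two-cacheline handshake mentioned in the changelog can be sketched roughly as follows. This is a hypothetical, single-threaded model of the idea (the struct layout, field names and the gpp_*/dsp_* helpers are invented; the real protocol is more complex, and steps 1 and 2 are not modeled): the GPP only ever writes the request line, the DSP only ever writes the reply line, and a sequence counter tells each side when the other has published something new.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHELINE 64

/* GPP-owned line: only the GPP writes it. */
struct fc_req {
    uint32_t seq;        /* incremented for every new request */
    uint32_t end;        /* set by the GPP to finish the sequence */
    uint32_t cmd, arg1, arg2;
};

/* DSP-owned line: only the DSP writes it. */
struct fc_resp {
    uint32_t seq;        /* copy of the request seq once handled */
    uint32_t ret1, ret2;
};

struct fastcall_area {
    _Alignas(CACHELINE) struct fc_req req;
    _Alignas(CACHELINE) struct fc_resp resp;
};

/* GPP side (cf. dsp_fastcall_rpc()): fill in the request, publish seq last. */
static void gpp_post(struct fastcall_area *fc, uint32_t cmd, uint32_t a1, uint32_t a2)
{
    fc->req.cmd = cmd;
    fc->req.arg1 = a1;
    fc->req.arg2 = a2;
    fc->req.seq++;
}

static bool gpp_reply_ready(const struct fastcall_area *fc)
{
    return fc->resp.seq == fc->req.seq;
}

/* GPP side (cf. dsp_fastcall_end()). */
static void gpp_end(struct fastcall_area *fc)
{
    fc->req.end = 1;
}

/* DSP side: one polling iteration. Returns false once the sequence ended. */
static bool dsp_poll_once(struct fastcall_area *fc)
{
    if (fc->req.seq != fc->resp.seq) {                 /* new request */
        fc->resp.ret1 = fc->req.arg1 + fc->req.arg2;   /* placeholder work */
        fc->resp.ret2 = fc->req.cmd;
        fc->resp.seq = fc->req.seq;                    /* publish seq last */
        return true;
    }
    return fc->req.end == 0;
}
```

On real hardware both sides run concurrently, so each "publish seq last" write would additionally need the appropriate cache maintenance and memory ordering.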
Although quite new, this software has proven stable in the tests so far.
** Known issues
- the mailbox hardware on my Pandora seems buggy and needs to be reset
for each message. Fortunately, this does not seem to take a lot of time.
However, this has the consequence that for each message sent from the
GPP to the DSP, the DSP *must* send a reply.
Please leave the config define in the kernel enabled -- who knows how
many units are affected by this.
- the DSP MMU is currently disabled before booting the DSP.
This will be changed in future releases, so that the GPP OS is protected
from misbehaving DSP software.
- the kernel module should support message queues, so that when a client
application calls write(), the call returns successfully after queuing
the message. not a terribly important feature, though.
- The maximum DSP image size is currently limited to 4MBytes (probably not really an issue).
- The DSP-side "mlb.c" MUST NOT be compiled with full optimizations or the DSP image
will not load. This looks like a c6x compiler bug.
- The dsp_cache_inv() and dsp_cache_wbinv() currently fall back to flush_cache_all()
because the other kernel functions (see "c64_kmod/kmod.c") cause system instabilities
(tested on Open Pandora kernel 3.2.45).
This software was written by Bastian Spiegel. Feel free to contact me
at "bs AT tkscript DOT de", if you have any questions or want to contribute.
Thanks to Notaz for the platform_driver reference code (see "c64_kmod/kmod.c").
Some small portions (a few lines of code) are loosely based on Texas Instruments'
"CMEM" kernel module, which is distributed under the terms of the GPL.
In particular: The GPP-side cache wb/inv/wbinv function calls. While these
are just standard ARM Linux function calls, it was quite helpful to read the
CMEM source instead of digging through documentation to find out about these calls.
So, thank you guys'n'gals !
The source code that generates the C64+ branch instruction to jump to the
actual DSP image entry point is also loosely based on code found in DspLink.
It's not a verbatim copy but looking at the DspLink source saved me some time
I otherwise would have needed for finding out the hex-values for the