At SEGGER, we pretty much use our own tools and products to develop our products. That includes using our middleware, such as embOS, emNet, emUSB, emFile, web and FTP servers and so on, as part of the firmware of our J-Link, J-Trace and Flasher products. It also works the other way round: we use the same hardware products, most of all the J-Link, to develop, test and constantly improve our middleware. Using our own products in house helps us to check their usability and improve them. I think we have come a long way and have great products in pretty much every area.
For an IDE, we have completely switched to Embedded Studio. It is a great piece of software that can be used free of charge for non-commercial purposes! Ready for development out of the box, with both GCC and Clang/LLVM compilers.
Embedded Studio comes with our own runtime library, which I believe is second to none. Optimized to the bone, it leaves not only the GNU runtime, but also its commercial counterparts, in the dust.
When comparing Embedded Studio to other commercial products, we realized that one weak point is the GNU linker.
Old-school GNU linker blues
The GNU linker has evolved from the Unix world, where megabytes of linear virtual address space are commonplace, disk space is unbounded, and processing power is plentiful. This is as far from a low-end embedded system as you could imagine.
Small embedded systems, usually microcontrollers with built-in memories, are complex. Typically they have separate memory areas for flash and RAM. On top of that, to enhance performance, RAM is usually divided into distinct regions so they can be accessed simultaneously by the CPU and peripherals, or even other CPUs in the same device.
The GNU linker has a number of deficiencies in this alien world:
- It’s not flexible enough to deal with typical “keep-out” areas common to embedded firmware environments, e.g. calibration data, flash protection bytes, fixed-address jump tables for ROM or bootloader APIs and so on.
- With multiple RAM regions, it cannot automatically split data over those RAM regions, requiring the user to choose the placement of data manually across the regions.
- Linkage speed is acceptable, but not fast. When linking large megabyte-order firmware images, with even larger multi-megabyte debug data, time can seep away when linking over and over again.
- It does not automatically handle initialization of read-write sections, delegating that to the loader in Unix systems. In embedded systems the user is responsible for copying the initialized image from flash to RAM and zeroing “bss” sections before entering main(). And the GNU linker cannot compress the initialization image to reduce flash use or automatically compute a CRC to support image integrity checks.
- The map file is almost incomprehensible. Because memory allocation, function and data sizes, and what goes into a microcontroller is highly important, not having an accurate, easy-to-read map file is unforgivable.
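To make the point about multiple RAM regions concrete, here is a minimal Python sketch of the kind of automatic placement a linker could perform. It is an illustration only: the region names, the section names and sizes, and the first-fit-decreasing strategy are all assumptions made for this example, not a description of how the GNU or SEGGER linker works internally. Alignment and keep-out areas are ignored for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str
    size: int                 # capacity in bytes
    used: int = 0             # bytes already allocated
    placed: list = field(default_factory=list)


def place_sections(sections, regions):
    """Assign each (name, size) section to the first region with room.

    Sections are tried largest first (first-fit decreasing), a common
    bin-packing heuristic. Alignment is ignored in this sketch.
    Raises MemoryError if a section fits in no region.
    """
    for name, size in sorted(sections, key=lambda s: s[1], reverse=True):
        for region in regions:
            if region.used + size <= region.size:
                region.placed.append(name)
                region.used += size
                break
        else:
            raise MemoryError(f"no RAM region has room for {name} ({size} bytes)")
    return {r.name: r.placed for r in regions}


# Hypothetical device with two separate RAM banks (names and sizes invented).
regions = [Region("SRAM1", 64 * 1024), Region("SRAM2", 32 * 1024)]
sections = [("net_buffers", 40 * 1024), ("heap", 30 * 1024),
            ("stacks", 16 * 1024), ("bss_misc", 8 * 1024)]
layout = place_sections(sections, regions)
print(layout)
```

With a linker that cannot do this, the user has to work out such an assignment by hand and revisit it whenever section sizes change.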
So…we decided to write our own linker!
Yes, we would write the SEGGER Linker, from scratch and without any legacy code or legacy thinking. The linker’s design brief is simply to avoid the disadvantages of the GNU linker, making linking simple, and solving linking problems for the embedded developer.
A new, zero-legacy SEGGER linker
The design goals of the SEGGER linker are easily stated:
- High linkage speed, even for large applications
- Modular linkage: only link in what is required, automatically
- Straightforward, easy-to-read map file
- Option to compute how much code / data is pulled in because of one or multiple particular symbols. This is important for measuring the code size of middleware options, such as “How much flash (RO) memory does a particular cipher (for emSSL) need?”
- Compression options to minimize flash-copy space for initialized data and code in RAM
- Compatibility options for other popular linkers such as IAR and ARM
- Tail optimization for code that calls another function as last operation, so that the tail can be merged with the called function
- Different ways of sorting input fragments (functions and data): alphabetically, by call distance to improve locality, by alignment to improve packing, by size — these are but a few
- Automatic inlining of small functions
- Optionally eliminating functions that have identical bodies at the instruction level
- …and more…
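Two of these goals, modular linkage and measuring what a particular symbol pulls in, can be pictured as reachability over a reference graph. The following Python sketch shows the idea under stated assumptions: the symbol names, sizes, and references are invented for illustration, and the approach (a plain graph traversal) is not claimed to be the SEGGER linker's actual algorithm.

```python
def reachable(roots, refs):
    """Return every symbol reachable from `roots` by following references."""
    seen, stack = set(), list(roots)
    while stack:
        sym = stack.pop()
        if sym not in seen:
            seen.add(sym)
            stack.extend(refs.get(sym, ()))
    return seen


def cost_of(symbol, roots, refs, sizes):
    """Bytes that get linked in only because `symbol` is also kept."""
    with_sym = reachable(list(roots) + [symbol], refs)
    without = reachable(roots, refs)
    return sum(sizes[s] for s in with_sym - without)


# Hypothetical symbols: size in bytes, and which symbols each one references.
sizes = {"main": 200, "tls_connect": 1500, "sha256": 1800, "memcpy": 100,
         "aes_gcm": 4000, "aes_core": 2500, "unused_des": 3000}
refs = {"main": ["tls_connect", "memcpy"],
        "tls_connect": ["sha256", "memcpy"],
        "aes_gcm": ["aes_core", "memcpy"]}

linked = reachable(["main"], refs)          # only what main actually needs
image_size = sum(sizes[s] for s in linked)
aes_cost = cost_of("aes_gcm", ["main"], refs, sizes)
print(sorted(linked), image_size, aes_cost)
```

In this toy example nothing references `unused_des`, so it never reaches the image, and `aes_gcm` costs only its own size plus `aes_core`, because `memcpy` is linked in anyway.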
First results
Here it is. Although it is not yet ready for release, the first results are very encouraging.
We used a 350 kB application (debug build) that combines embOS with our TCP/IP stack emNet and an emSSL test program with multiple cipher suites, public key algorithms, and so on: a representative application.
Where the GNU linker needs about 1 second to link (not so bad), our new linker needs only 60 ms to link the application, whilst also providing better section placement to minimize application size. Its built-in timing analysis shows where the time goes:
Copyright (c) 2017 SEGGER Microcontroller GmbH & Co. KG    www.segger.com
SEGGER Linker v1.00 compiled Sep 20 2017 22:03:06

Performance:
  File I/O
    677 ELF modules from:
      32 ELF files
      15 archive files
    Data in: 49543 KB
  Processing
    Ingest                44.95 ms
    Linker symbols         0.00 ms
    Find sections          4.79 ms
    Parse script           0.69 ms
    Map image              0.11 ms
    Rewrite headers        0.02 ms
    Relocate image #0      1.99 ms
    Create inittab         0.71 ms
    Relocate image #1      0.73 ms
    Write image            6.62 ms
    Print map              0.00 ms
  Link total:             60.66 ms
It’s interesting to observe that 50 MB of ELF input produces just 350 KB of application code and read-only data, and that file I/O dominates linking: the Ingest and Write image phases together account for 51.6 ms of the total 60.7 ms link time.
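The breakdown can be checked with a few lines of Python. The timings are copied from the linker output above. Treating Ingest and Write image as the file I/O phases follows the paragraph above; the variable names are just for this sketch.

```python
# Phase timings in milliseconds, copied from the linker output.
phases = {
    "Ingest": 44.95, "Linker symbols": 0.00, "Find sections": 4.79,
    "Parse script": 0.69, "Map image": 0.11, "Rewrite headers": 0.02,
    "Relocate image #0": 1.99, "Create inittab": 0.71,
    "Relocate image #1": 0.73, "Write image": 6.62, "Print map": 0.00,
}
link_total = 60.66  # "Link total" as reported by the linker

io_ms = phases["Ingest"] + phases["Write image"]   # phases dominated by file I/O
io_share = io_ms / link_total
other_ms = sum(phases.values()) - io_ms            # everything else combined

print(f"file I/O: {io_ms:.2f} ms ({io_share:.0%} of the link)")
print(f"all other phases: {other_ms:.2f} ms")
```

So roughly 85% of the link time is spent reading inputs and writing the image, and all remaining phases together take about 9 ms.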
This is one fast linker!
The application obviously runs without any problem, and loading the resulting file into the debugger is also faster since the debug information in the generated ELF file is neat and only contains information related to what is actually linked in. Here is the output of the sample program:
0:000 MainTask - INIT: emNet init started. Version 3.23b
0:000 MainTask - *********************************************************************
0:000 MainTask - * emNet Configuration *
0:000 MainTask - *********************************************************************
0:000 MainTask - * IP_DEBUG: 2
0:000 MainTask - * Memory added: 24576 bytes
0:000 MainTask - * Buffer configuration:
0:000 MainTask - * 12 buffers of 256 bytes
0:000 MainTask - * 6 buffers of 1516 bytes
0:001 MainTask - * TCP Tx/Rx window size per socket: 4380/4380 bytes
0:001 MainTask - * Number of interfaces added: 1
0:001 MainTask - * Interface #0 configuration:
0:001 MainTask - * Type: ETH
0:001 MainTask - * MTU: 1500
0:001 MainTask - * HW addr.: 00:22:C7:AB:FF:22
0:001 MainTask - *********************************************************************
0:018 MainTask - INIT: Link is down
0:018 MainTask - DRIVER: Found PHY with Id 0x181 at addr 0x0
0:018 MainTask -
0:022 MainTask -
3:000 IP_Task - LINK: Link state changed: Full duplex, 100MHz
4:000 IP_Task - DHCPc: Sending discover!
4:000 IP_Task - DHCPc: IFace 0: Offer: IP: 10.0.0.183, Mask: 255.255.255.0, GW: 10.0.0.3.
5:000 IP_Task - DHCPc: IP addr. checked, no conflicts
5:000 IP_Task - DHCPc: Sending Request.
5:002 IP_Task - DHCPc: IFace 0: Using IP: 10.0.0.183, Mask: 255.255.255.0, GW: 10.0.0.3.
Scanning cipher suites on http://www.google.com:443
C009 TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA TLS 1.2 1637 ms, 26 ms socket, 1611 ms connect
C02B TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 TLS 1.2 1662 ms, 25 ms socket, 1637 ms connect
C00A TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA TLS 1.2 1648 ms, 26 ms socket, 1622 ms connect
...
SEGGER vs GNU linker
We benchmarked the SEGGER linker against the GNU linker using identical application object code and libraries: the same ELF files and archives were provided, in the same order, to both linkers.
Because the SEGGER linker automatically creates initialization code for the application (to initialize data before entering main), a small section of one startup file (thumb_crt0.s) required modification; the rest of the file was unchanged. In fact, the startup file for the SEGGER linker is far shorter and simpler than the GNU equivalent. With the GNU linker, explicit user-written initialization code is always included, even for seldom-used sections. With the SEGGER linker, you don’t need to write that code (or forget to write it), as the linker does it for you so that “it simply works!”
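For readers who have not written such startup code, the work that has to happen before main() can be modeled in a few lines. The Python sketch below simulates flash and RAM as byte buffers and walks a small initialization table. The table format (tuples tagged "copy" and "zero") is purely hypothetical and chosen for readability; it is not the format the SEGGER linker generates, and real startup code would be C or assembly operating on actual addresses.

```python
def run_init_table(table, flash, ram):
    """Apply each init-table entry to `ram`, as startup code would before main()."""
    for entry in table:
        if entry[0] == "copy":        # initialized data: copy image from flash to RAM
            _, src, dst, length = entry
            ram[dst:dst + length] = flash[src:src + length]
        elif entry[0] == "zero":      # bss-style data: clear to zero
            _, dst, length = entry
            ram[dst:dst + length] = bytes(length)
        else:
            raise ValueError(f"unknown init entry: {entry[0]!r}")


flash = bytes(range(16))              # stand-in for the flash image
ram = bytearray(b"\xAA" * 16)         # RAM holds arbitrary junk after reset
table = [("copy", 4, 0, 4),           # 4 bytes of initialized data at flash offset 4
         ("zero", 8, 6)]              # 6 bytes of zero-initialized data at RAM offset 8
run_init_table(table, flash, ram)
print(ram.hex())
```

The point of the sketch is the division of labor: if the linker emits the table and the loop that walks it, the startup file no longer needs hand-written copy and clear loops for every section.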
Here is the outcome:
          Flash (bytes)   RAM (bytes)
GNU       354,936         80,160
SEGGER    348,824         80,132
The new SEGGER linker is more than 10 times as fast and reduces flash usage by almost 2% (6,112 bytes)!
Best of all, this does not even use compression for the initialized segments and does not put any code in RAM. So the GNU linker loses about 2% efficiency right from the starting line. We will investigate further and keep you posted on the progress and findings.
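For anyone who wants to reproduce the headline figures, the arithmetic is short. The Python sketch below uses the flash and RAM sizes from the comparison and the link times quoted earlier. The GNU link time is only given as "about 1 second", so the speedup is an approximation.

```python
# Sizes in bytes, from the GNU vs SEGGER comparison.
gnu_flash, segger_flash = 354_936, 348_824
gnu_ram, segger_ram = 80_160, 80_132
# Link times in ms: "about 1 second" for GNU, reported link total for SEGGER.
gnu_ms, segger_ms = 1000.0, 60.66

flash_saved = gnu_flash - segger_flash
flash_saved_pct = 100 * flash_saved / gnu_flash
ram_saved = gnu_ram - segger_ram
speedup = gnu_ms / segger_ms

print(f"flash saved: {flash_saved} bytes ({flash_saved_pct:.2f}%)")
print(f"RAM saved:   {ram_saved} bytes")
print(f"speedup:     about {speedup:.0f}x")
```

The flash saving works out to about 1.7%, the RAM difference is 28 bytes, and the speedup is roughly 16x, which is consistent with "more than 10 times as fast".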
An exciting project … And lots of fun. Obviously, the SEGGER Linker will be free for non-commercial use just like all of Embedded Studio.