
CH32V003 driving WS2812B LEDs with SPI – Part 12

10 March 2025

Here’s an interesting data point that just came to my attention: the on-going experiment with the WCH-official development board for the CH32V003F4P6 device has hung up after 229,552,000,000+ loops. That’s 229 billion with a ‘b’. What caused the hang-up? Unclear.

I was about to shut down the experiment as I thought it was no longer providing any useful data. Well, I was wrong about that. However, it looks more like a failure of the testing apparatus than of the ‘unit under test’ (UUT), as trying to reset the device provided no indication of resumption on the serial console. Unplugging the WCH-LinkE caused the serial terminal to disconnect, as it does, and re-starting the connection showed the recently-reset device counting its millions of loops again. A more self-contained diagnostic setup is certainly worth thinking about at this stage. I’ll just note that here and move on with the other experiments.

I’ll get back to dusting off my C-language framework for these devices now. We’ve got freshly-minted new header files describing all the peripheral registers and all the single-bit-wide settings therein. I’ll take a peek at the -003 support file first and see if everything looks correct.

The first peripheral defined in the SVD file is the PWR power control system. It mostly controls the low power modes, power monitoring facility and ‘automatic wake up’ function. There are only five registers implemented and those are sparsely populated.

There’s going to be a certain amount of “What was I thinking?” involved in this kind of archeology. I see the things that I know for sure should be present, such as the ‘structure of structures’ that I define for each different peripheral, as well as some of the single-bit fields and register addresses.

But a closer look at those single-bit fields has me scratching my head. There seems to be a disconnect between the laid-out format of the structures and the fields. There are values defined that do not appear in the structure at all. Here’s what I’m looking at:

//------------------------------------------------------------------------------
// PWR
//------------------------------------------------------------------------------

typedef volatile struct { // PWR Power control

    union {
        uint32_t        CTLR;   // 0x0 Power control register (PWR_CTRL)
        struct {
            const uint32_t  CTLR_reserved_0:1;  // 0 - reserved
            uint32_t    PDDS:1; // 1 Power Down Deep Sleep
            const uint32_t  CTLR_reserved_2:1;  // 2 - reserved
            const uint32_t  CTLR_reserved_3:1;  // 3 - reserved
            uint32_t    PVDE:1; // 4 Power Voltage Detector Enable
            uint32_t    PLS:3;  // 5 PVD Level Selection
        };
    };
    uint32_t            CSR;    // 0x04 Power control state register (PWR_CSR)
    uint32_t            AWUCSR; // 0x08 Automatic wake-up control state register (PWR_AWUCSR)
    uint32_t            AWUWR;  // 0x0C Automatic wake window comparison value register (PWR_AWUWR)
    uint32_t            AWUPSC; // 0x10 Automatic wake-up prescaler register (PWR_AWUPSC)

} PWR_t;

#define PWR ((PWR_t *) 0x40007000) // peripheral pointer

// peripheral register single-bit values

#define PWR_PDDS    (1 << 1)
#define PWR_PVDE    (1 << 4)
#define PWR_PVDO    (1 << 2)
#define PWR_AWUEN   (1 << 1)

// peripheral register addresses

#define PWR_CTLR (*((volatile uint32_t *) 0x40007000))
#define PWR_CSR (*((volatile uint32_t *) 0x40007004))
#define PWR_AWUCSR (*((volatile uint32_t *) 0x40007008))
#define PWR_AWUWR (*((volatile uint32_t *) 0x4000700c))
#define PWR_AWUPSC (*((volatile uint32_t *) 0x40007010))

I’m specifically talking about the PWR_PVDO and PWR_AWUEN fields. Why are they not broken out as bit fields within the structure for their enclosing registers?

Ah, now I remember. I made the executive decision to not break out fields if there were only one field within a register. This seemed to make sense for registers such as the USART_DATAR register, where the register and the bit field were effectively the same thing.

But in this case, for whatever reason, the two bit fields within the registers do not start at bit position 0. Additionally, there would be no way for me to refer to the field within the register without knowing which register it belonged to – which is something I was hoping to avoid, because it’s possible, if properly encoded.

Were there naming ambiguities between registers and bit fields? That sounds like the kind of problem that this ‘solution’ addresses.

Well, I can always go back into the script and omit the ‘single field omission’ conditional. But before I do that, there’s some more fundamental testing I can do on these new include files. For example, do you even compile, bro? But those single-bit values need comments describing where they belong, for sure.
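For illustration, here is a minimal sketch of what the PWR structure might look like with the ‘single field omission’ rule switched off. The type name PWR_sketch_t, the reserved-field names, the helper pvdo_from_csr() and the placement of PVDO in CSR and AWUEN in AWUCSR are assumptions made for this example, not output from the actual script. It also assumes the compiler allocates bit fields starting at bit 0, as GCC does for this little-endian target. The static_assert lines check the register offsets at compile time, and the code builds on a desktop compiler because nothing touches real hardware.

```c
#include <assert.h>   // static_assert
#include <stddef.h>   // offsetof
#include <stdint.h>

// Hypothetical variant of PWR_t with every field broken out,
// even when a register holds only a single field.
typedef volatile struct {
    union {
        uint32_t CTLR;                      // 0x00 Power control register
        struct {
            uint32_t CTLR_reserved_0:1;     // 0 - reserved
            uint32_t PDDS:1;                // 1 Power Down Deep Sleep
            uint32_t CTLR_reserved_2:2;     // 2..3 - reserved
            uint32_t PVDE:1;                // 4 Power Voltage Detector Enable
            uint32_t PLS:3;                 // 5..7 PVD Level Selection
        };
    };
    union {
        uint32_t CSR;                       // 0x04 Power control state register
        struct {
            uint32_t CSR_reserved_0:2;      // 0..1 - reserved
            uint32_t PVDO:1;                // 2 (assumed to belong to CSR)
        };
    };
    union {
        uint32_t AWUCSR;                    // 0x08 Automatic wake-up control state
        struct {
            uint32_t AWUCSR_reserved_0:1;   // 0 - reserved
            uint32_t AWUEN:1;               // 1 (assumed to belong to AWUCSR)
        };
    };
    uint32_t AWUWR;                         // 0x0C
    uint32_t AWUPSC;                        // 0x10
} PWR_sketch_t;

// Compile-time checks that the layout matches the documented offsets.
static_assert(offsetof(PWR_sketch_t, CSR)    == 0x04, "CSR offset");
static_assert(offsetof(PWR_sketch_t, AWUCSR) == 0x08, "AWUCSR offset");
static_assert(offsetof(PWR_sketch_t, AWUPSC) == 0x10, "AWUPSC offset");
static_assert(sizeof(PWR_sketch_t)           == 0x14, "PWR block size");

// Write a raw value to CSR of a local stand-in and read back the PVDO field.
uint32_t pvdo_from_csr(uint32_t csr_value) {
    PWR_sketch_t pwr;        // on the chip this would be the block at 0x40007000
    pwr.CSR = csr_value;
    return pwr.PVDO;
}
```

With the fields laid out this way, a name like PVDO no longer needs a separate `(1 << 2)` definition to say where it lives.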

The simplest test of this is to create a new project that has just one source file in it that includes the device header and has a main() function. As it won’t have any need (yet) for interrupt support or a proper C runtime package, it will only fail because there is no ‘start’ function declared, which is what the linker script says is the ‘entry point’ of the program. But it should compile, if not link properly. Here’s what I think it should look like:

// filename:  F4-test.c
// part of bare-metal C framework test project
// 10 March 2025 - Dale Wheat

#include "ch32v003.h"

void main(void) { // main program function

    while(true) { // an endless loop
    }
}

// F4-test.c [end-of-file]

Whereas, this is all that would actually be required:

#include "ch32v003.h"
void main(void) {}

But you know how I am about these things. Ask a writer to solve a problem and it’s likely that “more writing” will be included near the top of the list.

I borrowed a makefile from another project and made various modifications to it to fit the present needs. And the result is that the header file induces a couple of errors, which is both bad news as well as good news. It could have been so much worse.

It looks like there’s a couple of typos in the WCH-supplied SVD file. Within the definition of the Programmable Fast Interrupt Controller (PFIC), there is a register called PFIC Interrupt Enable Status Register 1 (PFIC_ISR1). It contains fields to indicate which interrupts are enabled. They are called:

INTENSTA2   IRQ2
INTENSTA3   IRQ3
INTENSTA12  IRQ12
INTENSTA14  IRQ14
INTENSTA16_31   IRQ16-IRQ31

Farther down the list is the PFIC Interrupt Pending Status Register 1 (PFIC_IPR1). It contains a similarly named set of fields to indicate which interrupts are currently pending, but where we should see ‘PENDSTA14’ and ‘PENDSTA31_16’, we see ‘INTENSTA14’ and ‘INTENSTA16_31’ repeated.

As a scribbler of codes myself, whose fingers already know how to both copy and paste all by themselves, I think I see how this might have happened. I will let WCH know about this. They were both very prompt and exceedingly polite when I addressed a potential typo in the reference manual. But I will wait until I have made sure there are no other similar issues to report.

So for the moment, I will correct my local copy of the SVD file in question and re-run the conversion script.

This overcomes the compilation error. I get a warning (not an error) about the missing start symbol:

ld: warning: cannot find entry symbol start; defaulting to 0000000000000000

The default of so many zeros will work quite nicely, I think. That’s exactly where I wanted it to go, anyway. And it produces an ELF file! Here’s the meaningful part of the output listing:

Disassembly of section .text:

00000000 <main>:

#include "ch32v003.h"

void main(void) { // main program function

    while(true) { // an endless loop
   0:   a001                    j   0 <main>
   2:   0000                    unimp

Which is perfect. An endless loop, as expressed in C as “while(true) {}” gets translated, as it should, to RISC-V assembly as “0: j 0” or “jump to address 0”. Since it was such a short jump, relatively speaking, the compiler even used the ‘compressed’ version of the ‘j’ instruction, taking up only 16 bits of program memory. So even though the file is reported to be four (4) bytes long, in truth only the first two are doing anything important. I can even flash it to the chip and check it in the debugger. So we’re certainly on track for Great Things at this point.
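For the curious, the 16-bit value a001 in that listing can be picked apart by hand. The sketch below is a small desktop-side decoder for the compressed jump, using the field layout from the RISC-V ‘C’ extension; the function names is_c_j() and c_j_offset() are made up for this example. A compressed jump has 101 in bits 15:13 and 01 in bits 1:0, and the signed offset is scattered across bits 12:2.

```c
#include <stdbool.h>
#include <stdint.h>

// True if the 16-bit parcel is a compressed jump (C.J):
// funct3 (bits 15:13) == 0b101 and opcode (bits 1:0) == 0b01.
bool is_c_j(uint16_t insn) {
    return (insn & 0xE003u) == 0xA001u;
}

// Reassemble the signed jump offset of a C.J instruction.
// Bits 12:2 of the instruction hold imm[11|4|9:8|10|6|7|3:1|5].
int32_t c_j_offset(uint16_t insn) {
    uint32_t imm = 0;
    imm |= ((insn >> 12) & 0x1u) << 11;
    imm |= ((insn >> 11) & 0x1u) << 4;
    imm |= ((insn >>  9) & 0x3u) << 8;
    imm |= ((insn >>  8) & 0x1u) << 10;
    imm |= ((insn >>  7) & 0x1u) << 6;
    imm |= ((insn >>  6) & 0x1u) << 7;
    imm |= ((insn >>  3) & 0x7u) << 1;
    imm |= ((insn >>  2) & 0x1u) << 5;
    if (imm & 0x800u) {
        imm |= 0xFFFFF000u;   // sign-extend from bit 11
    }
    return (int32_t)imm;
}
```

Feeding it 0xa001 reports a compressed jump with an offset of zero, which is a jump to itself, matching the “j 0” at address 0 in the listing.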

Eliminating the “ENTRY(start)” command in the linker script gets rid of the warning. Per the GNU documentation for the linker found at:

https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_node/ld_24.html

ENTRY is only one of several ways of choosing the entry point. You may indicate it in any of the following ways (shown in descending order of priority: methods higher in the list override methods lower down).

the `-e' entry command-line option;
the ENTRY(symbol) command in a linker control script;
the value of the symbol start, if present;
the address of the first byte of the .text section, if present;
The address 0.

As far as I know at the moment, the entry point for these devices will always be address 0, so we should be good. I prefer to be specific about these things, when I can, and not trust assumptions that might change in the future, as they often do. I could specify the entry point as a command line parameter to the linker, but I would normally rather have it in a document of some kind, such as the linker script. When we’re done setting up this framework, I’ll have decided one way or the other about this issue.

So do we have enough machinery in place to blink an LED? Let’s find out.

First, we have to enable the peripheral clock for the GPIO port where the LED has been installed. I’m using PA1, which is bit position 1 of GPIO port A. The GPIO ports are all on the PB2 bus, so we just set the IOPAEN clock enable bit in the PB2 Peripheral Clock Enable Register (RCC_APB2PCENR). It should only take this much C code to do this:

RCC->IOPAEN = ENABLE;

Since the definition for the RCC maps out all the fields, and because all the fields have unique names, we, as lazy human programmers, need not keep up with which register is which, and can just wave in the general vicinity of the peripheral in question. I remembered it was in the RCC, and that was enough. I also remembered that the field was called IOPAEN, a mnemonic for “input output port A enable”. Additionally, I have taken the liberty of #define’ing the binary values ENABLE (1) and DISABLE (0) in the generated header file. I find that this family of chips largely uses a 1 to turn things on and a 0 to turn things off. This is, sadly, not universally true with other manufacturers. Good job, WCH! There are a few other goodies packed in there as well, which I’ll describe by and by.

Step two on the journey to blinking an LED is to configure the now-clocked GPIOA, or at least the pin we want. Do you remember my ‘cheat sheet’ of GPIO initialization codes? It comes in very handy for this kind of thing. The code I want is for a push-pull output with a maximum output frequency of 2 MHz. Why that exact frequency? It happens to be the slowest one available, the other choices being 10 MHz and 30 MHz. It’s an LED that we are going to be looking at with our human eyes, not a microwave signal being sent to outer space. The code for that is ‘2’. Now we just place that code in the right position. Each pin gets its own four-bit field in the register, so the code for PA1 goes in hexadecimal digit position 1 (bits 4 through 7), the code for PA2 in digit position 2, etc. Or we could just initialize all eight positions at once, even though we have Scientifically Proven that there are only two bits implemented in this port. The code looks like this:

GPIOA->CFGLR = 0x88888828;

All those 8s represent the setting for all the other bits, which is ‘input with pull-up or pull-down resistors’. This is the setting that uses the least amount of power, which will become more important once we need to put the chip to sleep when it needs to wait for something interesting to happen.
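To see where 0x88888828 comes from, here is a short sketch that packs eight four-bit pin codes into one CFGLR value. The macro and function names are invented for this example; only the codes ‘8’ and ‘2’ come from the text above. It runs on a desktop machine since it only does arithmetic.

```c
#include <stdint.h>

// Per-pin configuration codes taken from the text above.
#define PIN_INPUT_PULL   0x8u  // input with pull-up or pull-down
#define PIN_OUT_PP_2MHZ  0x2u  // push-pull output, 2 MHz max

// Pack eight 4-bit pin codes into a CFGLR value; pins[0] is pin 0.
uint32_t cfglr_pack(const uint8_t pins[8]) {
    uint32_t value = 0;
    for (int pin = 0; pin < 8; pin++) {
        value |= (uint32_t)(pins[pin] & 0xFu) << (4 * pin);
    }
    return value;
}

// Every pin as a pulled input except pin 1, which drives the LED.
uint32_t cfglr_for_led_on_pa1(void) {
    const uint8_t pins[8] = {
        PIN_INPUT_PULL, PIN_OUT_PP_2MHZ, PIN_INPUT_PULL, PIN_INPUT_PULL,
        PIN_INPUT_PULL, PIN_INPUT_PULL,  PIN_INPUT_PULL, PIN_INPUT_PULL,
    };
    return cfglr_pack(pins);
}
```

The hexadecimal digits read right to left, so the ‘2’ for pin 1 ends up second from the right.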

Now another way to do this would be to access the individual MODE and CNF fields for this GPIO pin and assign them their proper values, like this:

GPIOA->MODE1 = GPIO_MODE_OUTPUT_2MHz;
GPIOA->CNF1 = GPIO_CNF_PUSH_PULL;

But this requires some enumerated values that I haven’t bothered to put into the collection just yet, as well as much more code. It looks like it’s just two writes instead of one, as in the previous example, but since the compiler is granting our wish to deal with embedded device registers intelligently by using predefined bit fields, it’s going to involve a read-modify-write cycle on each field, along with all the bit shifting and masking that is required to do that.
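As a rough picture of that hidden work, the sketch below imitates on a desktop machine what two bit-field assignments cost. The numeric values given to GPIO_MODE_OUTPUT_2MHz and GPIO_CNF_PUSH_PULL are assumptions (the text says those enumerations do not exist yet), and fake_CFGLR is a plain variable standing in for the real register. It counts each simulated read and write so the difference from a single whole-register write is visible.

```c
#include <stdint.h>

// Hypothetical values for the enumerations named above.
#define GPIO_MODE_OUTPUT_2MHz 0x2u
#define GPIO_CNF_PUSH_PULL    0x0u

static uint32_t fake_CFGLR;      // stand-in for GPIOA->CFGLR
static unsigned access_count;    // counts simulated register reads and writes

// Replace `width` bits at `shift` with `value`: one read, one write.
static void field_write(unsigned shift, unsigned width, uint32_t value) {
    uint32_t mask = ((1u << width) - 1u) << shift;
    uint32_t reg = fake_CFGLR;                       // read
    access_count++;
    reg = (reg & ~mask) | ((value << shift) & mask); // mask and merge
    fake_CFGLR = reg;                                // write
    access_count++;
}

// Configure pin 1 field by field; returns the number of register accesses.
unsigned configure_pa1_by_fields(uint32_t initial) {
    fake_CFGLR = initial;
    access_count = 0;
    field_write(4, 2, GPIO_MODE_OUTPUT_2MHz); // MODE1 occupies bits 5:4
    field_write(6, 2, GPIO_CNF_PUSH_PULL);    // CNF1 occupies bits 7:6
    return access_count;
}

// Current value of the simulated register.
uint32_t cfglr_value(void) {
    return fake_CFGLR;
}
```

Starting from 0x88888888, the two field writes arrive at the same 0x88888828 as the single assignment, but take four register accesses plus the masking to get there.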

Now everything is set up properly and we can just blink that LED all we want to now. Add this code inside the inner-most while() loop:

while(true) { // an endless loop
    GPIOA->ODR1 = ENABLE; // LED on
    GPIOA->ODR1 = DISABLE; // LED off
}

‘ODR1’ is the bit field for pin 1 within the output data register, or OUTDR. Setting it to ENABLE is the same as writing a 1 to it, which turns on the LED, as I have wired it up in the ‘active high’ configuration. Similarly, writing the DISABLE value of 0 turns it off.

You might be surprised to find when you run this program that the LED just comes on and stays on. Well, that’s a bit of an optical illusion. It’s actually blinking so fast you can’t see it. Try running it within the debugger and step through each program statement one at a time and you’ll see the expected behavior.

We can add a little loop in between the LED commands to slow it down. How about counting to a million? How long should that take? Here’s what the code would look like:

while(true) { // an endless loop
    GPIOA->ODR1 = ENABLE; // LED on
    for(uint32_t i = 0; i < 1000000; i++); // short delay
    GPIOA->ODR1 = DISABLE; // LED off
    for(uint32_t i = 0; i < 1000000; i++); // short delay
}

I’m seeing almost one second on and almost one second off. Those for() loops each create a new 32 bit unsigned integer variable called ‘i’, set it to zero initially, then increment it until it is no longer less than one million. Pretty quick!
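A quick sanity check on that timing: a million passes in roughly one second at ~8 MHz works out to about eight clock cycles per pass through the loop. The sketch below does that arithmetic and also shows a delay loop with a volatile counter, which is one common way to keep an optimizing compiler from deleting an otherwise empty loop. The function names are made up for this example, and the one-second figure is the eyeball estimate from above, not a measurement.

```c
#include <stdint.h>

// Busy-wait for `count` iterations. The volatile counter keeps an
// optimizing compiler from removing the otherwise empty loop.
uint32_t delay_loops(uint32_t count) {
    volatile uint32_t i;
    for (i = 0; i < count; i++) {
        // nothing to do but count
    }
    return i; // iterations actually performed
}

// Rough number of clock cycles spent per loop iteration,
// given the clock rate and the observed delay.
uint32_t cycles_per_iteration(uint32_t clock_hz, uint32_t delay_ms,
                              uint32_t iterations) {
    uint64_t cycles = (uint64_t)clock_hz * delay_ms / 1000u;
    return (uint32_t)(cycles / iterations);
}
```

Calling cycles_per_iteration(8000000, 1000, 1000000) gives 8, which is plausible for a debug build that keeps its counter in memory.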

Again, the compiler is doing some heavy lifting for us in the background here. Using the bit fields for the individual pins within the GPIO port has it reading, masking, OR’ing or perhaps AND’ing, as required, then finally writing for each transition. The chip itself has a more elegant way to address this frequently-occurring need.

In addition to the output data register, OUTDR, each of the GPIO ports has both a ‘bit set and reset’ register as well as a ‘bit clear’ register. Writing a 1 to any of the lower 8 bits of the BSHR register will set those bits, and only those bits, to 1. Writing a zero there does nothing, and leaves alone whatever is already there. Handy! The ‘lower byte of the upper half’ (?) of the BSHR register does the opposite: Any 1s written there will ‘reset’ that individual bit, and again any zeros written are ignored.

The BCR or ‘bit clear’ register does the same thing as that upper half of the BSHR register. Writing a 1 to a bit position clears that bit in the OUTDR and leaves the others intact. Why have two registers that do the same thing? You’re asking the wrong person. It’s not a ‘wrong question’; I genuinely don’t know the answer. You’ll find the exact same thing on the STM32 devices, so go figure.
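Here is a small desktop-side model of the behavior just described, which may help make the ‘ones act, zeros are ignored’ idea concrete. The function names are invented for this example. One detail is an assumption borrowed from the STM32 parts rather than something stated above: if the same pin is asked to be both set and reset in one BSHR write, the model lets the set win.

```c
#include <stdint.h>

// Model of how OUTDR changes after a write to BSHR.
// Low byte: ones set pins. Low byte of the upper half: ones reset pins.
// Assumption: a set request wins if both are given for the same pin.
uint32_t bshr_write(uint32_t outdr, uint32_t value) {
    uint32_t set   = value & 0xFFu;
    uint32_t reset = (value >> 16) & 0xFFu;
    return ((outdr & ~reset) | set) & 0xFFu;
}

// Model of a write to BCR: ones clear pins, zeros are ignored.
uint32_t bcr_write(uint32_t outdr, uint32_t value) {
    return outdr & ~(value & 0xFFu) & 0xFFu;
}
```

On the chip, the LED on PA1 could then be switched with a single write each way, along the lines of `GPIOA->BSHR = 1 << 1;` and `GPIOA->BCR = 1 << 1;`, assuming the header exposes those registers as whole words.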

So now we have a blinky example program that takes up all of 100 bytes. This is with the compiler’s ‘optimization’ setting of ‘for debug’, so it could probably go lower.

If we can blink an LED, what keeps us from configuring the SPI port and controlling a WS2812B addressable LED? Not much. Let’s do it.

The system clock was running at ~8 MHz for our very simple LED blinky test program. That’s what you get right after a reset with these chips. The internal HSI is running at ~24 MHz and is being divided by three (3) by the HPRE prescaler.

Here is the code to get it to run at 48 MHz, using the built-in PLL to double the HSI frequency:

RCC->HPRE = RCC_HPRE_1; // disable HCLK prescaler
RCC->PLLSRC = RCC_PLLSRC_HSI; // select HSI as PLL input
RCC->PLLON = ENABLE; // enable PLL
RCC->SW = RCC_SW_PLL; // select PLL as system clock, once it locks

Waiting for the PLL to lock is quite a bit simpler, from a coding standpoint, using this framework:

while(RCC->SWS != RCC_SWS_PLL) {
    // wait for PLL to lock
}

This while() loop just checks the system clock status bits to see when they eventually change over to the PLL, which will happen after the PLL locks. It will wait forever, if necessary, but it’s almost always a microscopically short time. Measure it, if you like. Let me know what you find.
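If waiting forever is a worry, one hedged alternative is a bounded poll that gives up after a fixed number of reads. The sketch below is written against a plain pointer so it can be tried on a desktop machine; the helper name wait_for_field(), the SWS_MASK and SWS_PLL definitions and the bit positions they encode are assumptions for this example, not names from the generated header.

```c
#include <stdbool.h>
#include <stdint.h>

#define SWS_MASK  (0x3u << 2)  // assumed position of the SWS field
#define SWS_PLL   (0x2u << 2)  // assumed encoding for "PLL is the system clock"

// Poll a status register until (reg & mask) == expected,
// or give up after max_polls reads. Returns true on success.
bool wait_for_field(const volatile uint32_t *reg, uint32_t mask,
                    uint32_t expected, uint32_t max_polls) {
    while (max_polls-- > 0) {
        if ((*reg & mask) == expected) {
            return true;
        }
    }
    return false;
}
```

A caller could then decide what to do if the PLL never locks, such as staying on the HSI clock, instead of hanging silently.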

Let’s initialize the SPI peripheral, again. Here’s what we need to get the setup we require for our special application of its unique talents. First, remember to enable GPIOC, which will be hosting our SDO on PC6. We configure it to be ‘output push-pull multiplexed 10 MHz max’:

RCC->IOPCEN = ENABLE; // enable GPIOC peripheral clock
GPIOC->CFGLR = 0x89888888; // PC6/SDO

Next, enable the SPI peripheral clock by setting the SPI1EN bit somewhere, we don’t care where, within the RCC:

RCC->SPI1EN = ENABLE; // enable SPI peripheral clock

Then there’s just a short list of bits to flip in the SPI control register, and away we go:

SPI1->BIDIMODE = ENABLE;
SPI1->BIDIOE = ENABLE;
SPI1->SSM = ENABLE;
SPI1->SSI = ENABLE;
SPI1->BR = SPI_BR_8;
// note:  only now can we set these bits
SPI1->MSTR = ENABLE;
SPI1->SPE = ENABLE;

Again, the framework knows which bits are in which register, so we don’t have to.

One critical thing, among a list of other critical things that a proper C framework should do for us, that is not being handled (yet) is setting up the stack pointer. My previous code just happened to work because previous programs had set the stack pointer to the end of SRAM, 0x20000800, and there it remained, until I deliberately unplugged it and plugged it back in again to see what the stack pointer would be. And it was some random number, not pointing anywhere near the SRAM area at all. The code actually worked up to the point of the little delay loops, mostly because we were not calling any functions. Once the compiler tried to set up the ‘automatic variable’ i within the scope of each for() loop, it was reading and writing to No Where. This is considered a Bad Thing.

So let’s fix that by setting up the stack pointer. This ‘ought’ to be done in the C-runtime (which doesn’t yet exist) along with things like initializing variables and possibly setting up the system clock.

The C language, by itself, has no way to know how to set up the stack pointer on this chip, has no way to directly access any of the registers or CSRs, or anything like that. It’s intended to be ‘platform agnostic’ as far as that is possible. The GNU C Compiler Collection, on the other hand, has some extensions that let these things happen. We’ll use one now to set up the stack pointer.

    __asm__("la sp, 0x20000800"); // initialize stack pointer to end of SRAM

The ‘la’ instruction is actually a pseudo-instruction to ‘load address’. Now that’s a hard-wired ‘magic number’, if I ever saw one. And it will change the second we move to a different chip within the family that has a different amount of SRAM. Let’s fix that by referring to a variable that’s ‘calculated’ in the linker script, called, so imaginatively, ‘end_of_RAM’:

__asm__("la sp, end_of_RAM"); // initialize stack pointer to end of SRAM

And that works, too, plus it gives us a clue as to where that value came from and what it means.

Now we can call functions! Well, almost, but we’ll fix that in a second. Let’s just splice in the already-working code from way back there.

void spi_send(uint8_t data) { // send 8 bit data via SPI

    while(SPI1->TXE == DISABLE) {
        // wait for transmit register to be empty before transmitting
    }

    SPI1->DATAR = data; // send data
}

void ws2812b_rgb(uint8_t red, uint8_t green, uint8_t blue) { // send RGB data to the WS2812B LED

    uint8_t i; // iterator

    for(i = 0; i < 8; i++) { // send 8 bits, MSB first
        spi_send(green & 0x80 ? 0x7E : 0x60); // send a one or a zero, depending
        green <<= 1; // shift all the bits in the byte
    }

    for(i = 0; i < 8; i++) { // send 8 bits, MSB first
        spi_send(red & 0x80 ? 0x7E : 0x60); // send a one or a zero, depending
        red <<= 1; // shift all the bits in the byte
    }

    for(i = 0; i < 8; i++) { // send 8 bits, MSB first
        spi_send(blue & 0x80 ? 0x7E : 0x60); // send a one or a zero, depending
        blue <<= 1; // shift all the bits in the byte
    }
}

void ws2812b_reset(void) { // hold WS2812B LED data line low for ~ 50 us

    spi_send(0x00); // send a zero
    for(uint32_t i = 0; i < 500; i++); // hold at least 50 us
}
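To make the 0x7E and 0x60 patterns a little less mysterious, the sketch below expands one pixel into its 24 SPI bytes and works out how long each pattern holds the line high. It runs on a desktop machine. The function names are invented for this example, and the 6 MHz SPI bit rate used in the note afterwards is an assumption (48 MHz divided by 8, if SPI_BR_8 means what its name suggests).

```c
#include <stddef.h>
#include <stdint.h>

#define WS_ONE  0x7Eu  // six of eight SPI bits high: a long pulse
#define WS_ZERO 0x60u  // two of eight SPI bits high: a short pulse

// Expand one color byte into eight SPI bytes, MSB first.
static size_t ws2812b_expand(uint8_t value, uint8_t *out) {
    for (int bit = 7; bit >= 0; bit--) {
        *out++ = ((value >> bit) & 1u) ? WS_ONE : WS_ZERO;
    }
    return 8;
}

// Fill out[24] with the SPI bytes for one pixel, in the same
// green, red, blue order used by ws2812b_rgb() above.
void ws2812b_encode(uint8_t red, uint8_t green, uint8_t blue, uint8_t out[24]) {
    size_t n = 0;
    n += ws2812b_expand(green, out + n);
    n += ws2812b_expand(red,   out + n);
    ws2812b_expand(blue, out + n);
}

// Nanoseconds the line stays high for one SPI byte at a given SPI bit rate.
uint32_t high_time_ns(uint8_t spi_byte, uint32_t spi_hz) {
    uint32_t ones = 0;
    for (int bit = 0; bit < 8; bit++) {
        ones += (spi_byte >> bit) & 1u;
    }
    return (uint32_t)((uint64_t)ones * 1000000000u / spi_hz);
}
```

At an assumed 6 MHz, 0x7E holds the line high for about 1000 ns and 0x60 for about 333 ns, so each WS2812B bit takes one SPI byte, or roughly 1.33 µs.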

Now the problem with having more than one function in a program, i.e., something other than main() and only main(), is that the compiler has to guess which one comes first in the binary image. It had better be main()! Well, it wasn’t. So I put back the ‘ENTRY(start)’ in the linker script and created a new function called start(). I moved the stack pointer initialization code to start(), then added some mumbo-jumbo (they are actually called “function attributes”) to make the compiler understand what I was doing. Here’s what it ended up looking like:

void start(void) __attribute__((naked, noreturn, section(".start")));
void start(void) { // what passes for a C-runtime

    __asm__("la sp, end_of_RAM"); // initialize stack pointer to end of SRAM
    __asm__("j main"); // continue to main()
}

So now the compiler knows that the start() function is ever so special, truly it is. It is what is known as ‘naked’, in that it has no procedural prologue or epilogue automatically added to it and has no built-in ‘return’ instruction appended to the end. It is also marked as ‘noreturn’, which simply means that it doesn’t return in any normal way, which it most certainly does not. There’s also that bit about it belonging to section “.start”, which is a special section that I invented and described in the linker script. It comes before the “main” part of the program code, so that’s how the linker knows to put that first in the binary image.

I also added a ‘jump’ instruction to the end of start() to tell it to jump to the main() function.

So now the program boots into the start() function, sets up the stack pointer and then jumps to the main function. I added a call to ws2812b_rgb() that sets the blue LED on at its minimum level when it turns on the other LED (which happens to be blue) and then sets all the internal LEDs to black when it turns off the other LED. And it just works.

I didn’t even bother calling the ws2812b_reset() function as there was enough time between LED togglings for it to get the idea.

So ideally I will move all this extraneous “support” scaffolding into a separate file and add that to the project makefile. I have been thinking about calling it ‘system.c’ and giving it its very own header file, ‘system.h’. My previous scheme had a handful of different files and it ended up making it quite difficult to create a completely new project, so I ended up coding up a bit of automation for that as well. If a little software is good, then a lot more is better, right?

To give it a good test overnight, I took out the human-perceivable delay and replaced it with the ws2812b_reset() function, which we need now. It’s back to just under 12,000 transmissions per second. I also left the other LED “blinking”, but at ~6 kHz it’s just a dim blur. I did add a line to the spi_send() function to turn off the LED while it was waiting for the TXE bit to be set. If it hangs there, I’ll see that there’s no dim blue blur, and be able to check the waveform on the oscilloscope to be doubly sure. Let’s see how well it does after a few million (or billion) cycles.
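As a back-of-the-envelope check, the ‘just under 12,000’ figure lines up with the numbers elsewhere in this post. The sketch below assumes a 6 MHz SPI bit rate (48 MHz divided by 8) and bytes sent back to back, neither of which has been measured here; the function names are made up for this example.

```c
#include <stdint.h>

// Time on the wire for one pixel: 24 SPI bytes of 8 bits each.
uint32_t pixel_time_ns(uint32_t spi_hz) {
    return (uint32_t)(24ull * 8ull * 1000000000ull / spi_hz);
}

// Time left over in each update once the pixel data has been sent.
uint32_t gap_ns(uint32_t updates_per_second, uint32_t spi_hz) {
    uint32_t period_ns = 1000000000u / updates_per_second;
    return period_ns - pixel_time_ns(spi_hz);
}
```

Under those assumptions a pixel takes 32 µs to send, and at 12,000 updates per second that leaves about 51 µs per update, which is right around the ~50 µs the reset delay is meant to provide.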
