CH32V003 driving WS2812B LEDs with SPI – Part 2

27 February 2025

I firmly believe that the only reason this article is not finished and already published on my web site (well, for you, Dear Reader, it is, but for me, your Humble Narrator, it is yet to be) is that I listed two goals in my introduction and only achieved the first. In retrospect, and following the also-applicable “one topic per email” rule of writing, I should have edited myself, restated my goal (singular) and been done with it.

But here we are. The first goal, as you recall, was to use the hardware SPI on the chip to create a suitable wave form to drive the WS2812B addressable LED on my little development board. That goal is mostly achieved, in that I have seen it working and verified the signal using an oscilloscope. Mostly, but not completely, as it seems to “hang up” from time to time in a most frustratingly random manner.

The second, and arguably less critical, goal was to be able to adjust the apparent brightness of the LED in real time for demonstration purposes. These are two completely different things that very well could have been two completely different articles, although I feel that the first goal outweighs the second in value and practicality. We’ll get to that second goal today, I hope.

Another mystery presented itself yesterday and I was tempted to just ignore it, but I think you know “how I am” about these things. When verifying that the SPI output signal was not conflicting with the supposedly high-impedance default state of PA2, I was debugging the program and saw that the GPIO configuration register for GPIOA was set to all zeros. Now the reference manual (RM) explicitly states that the reset value is supposed to be 0x44444444, indicating all eight pins of GPIOA are configured as floating inputs, with no pull-up or pull-down resistors. All zeros, or 0x00000000, represents a configuration of all “analog inputs”, which is a different thing. But this profound mystery will have to wait for its investigation, as “randomly hanging up” is not a thing I can tolerate at all.
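To make the difference between 0x44444444 and 0x00000000 concrete, here is a small sketch that decodes one pin's four-bit field from a configuration register value. It assumes the usual layout for this family, with the mode in the low two bits of each field and the configuration in the high two bits; the enum and the function name gpio_pin_config() are my own inventions for illustration and are not part of the SDK.

```c
#include <stdint.h>

/* Illustrative names only; not taken from the WCH SDK. */
enum gpio_cfg {
    GPIO_CFG_ANALOG_IN,    /* CNF = 00, MODE = 00 */
    GPIO_CFG_FLOATING_IN,  /* CNF = 01, MODE = 00 */
    GPIO_CFG_PULLED_IN,    /* CNF = 10, MODE = 00 */
    GPIO_CFG_RESERVED,     /* CNF = 11, MODE = 00 */
    GPIO_CFG_OUTPUT        /* MODE != 00 (any output speed) */
};

/* Decode the 4-bit field for one pin (0..7) from a CFGLR-style value.
   Assumed layout: pin n occupies bits [4n+3:4n]; MODE is bits [1:0]
   of the field and CNF is bits [3:2]. */
enum gpio_cfg gpio_pin_config(uint32_t cfglr, unsigned pin)
{
    unsigned field = (unsigned)(cfglr >> (4u * pin)) & 0xFu;
    unsigned mode  = field & 0x3u;
    unsigned cnf   = field >> 2;

    if (mode != 0u) {
        return GPIO_CFG_OUTPUT;
    }
    switch (cnf) {
    case 0u:  return GPIO_CFG_ANALOG_IN;
    case 1u:  return GPIO_CFG_FLOATING_IN;
    case 2u:  return GPIO_CFG_PULLED_IN;
    default:  return GPIO_CFG_RESERVED;
    }
}
```

Feeding it the documented reset value gives a floating input for PA2, while the all-zeros value I actually observed decodes as an analog input.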

And by “randomly” I mean very randomly. I happened to notice the LED “not blinking” when it was supposed to be cycling through eight basic combinations of red, green and blue. Then I added a debug message on the console for each pass through the entire loop. As there is a 250 ms delay after each LED combination is set, that means the loop takes two seconds (or so) to complete. I left it running overnight and it stalled at loop #27,719. That means that everything worked splendidly for at least 55,438 seconds, which is more easily comprehended as a bit over 15 hours.

I had left it running under the debugger, so that when (or if) it should misbehave, I would be able to examine its state. I was able to do so, and discovered that it was hanging up at the only place that it possibly could, assuming as I always do that it was my code that was causing the problem. This was in the spi_send() function that first waits for the transmit register to be empty before sending the next byte out the SPI port. And sure enough, the TXE bit of the SPI’s STATR status register is reading a solid zero, meaning that the transmit register is not empty and that more waiting is indicated. Something is amiss here.

Assuming that the SPI is still receiving clock pulses from its prescaler, anything “transmitted” should clock itself out in eight bit times, or roughly 1.333 us at the 6 MHz SPI clock. I’m not using any sort of handshaking controls or other possibly interfering mechanisms here.

Now it has hung up after only 31 loops. It’s bad. Really bad.

So at the moment it seems the only logical thing to do in this situation is to add a timeout feature to the spi_send() function. How long to wait before declaring a ‘mayday’ and implementing Directive Omega? We should know within 2 us if there is a problem, given any eight-bit byte should transmit completely in eight cycles of the 6 MHz clock. The little chip can only execute at 48 MHz, which is eight system clocks for every SPI clock, so even if it were executing one instruction in every clock cycle, a whole byte would only take 64 clock cycles. It’s not, because at system clocks of 24 MHz or over, an additional wait state is introduced for every flash memory access. It’s not entirely clear to me how that maps to the final instructions-per-second equation, but it’s got to be in there somewhere.
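The arithmetic above is simple enough to write down as code. This is only a sketch: the prescaler of 8 is inferred from the 48 MHz and 6 MHz figures in the text, and the macro and function names are made up for the occasion.

```c
#include <stdint.h>

#define SYSCLK_HZ      48000000u  /* system clock, from the text */
#define SPI_PRESCALER  8u         /* assumed: 48 MHz / 8 = 6 MHz SPI clock */
#define BITS_PER_BYTE  8u

/* SPI bit clock in Hz. */
uint32_t spi_clock_hz(void)
{
    return SYSCLK_HZ / SPI_PRESCALER;
}

/* Time to shift one byte out, in nanoseconds (rounded down).
   The multiplication is done in 64 bits because 8e9 overflows 32 bits. */
uint32_t spi_byte_time_ns(void)
{
    return (uint32_t)((uint64_t)BITS_PER_BYTE * 1000000000u / spi_clock_hz());
}

/* Number of system clock cycles that elapse while one byte shifts out. */
uint32_t sysclks_per_spi_byte(void)
{
    return BITS_PER_BYTE * SPI_PRESCALER;
}
```

With these numbers the byte time comes out at 1333 ns and the budget at 64 system clocks, which is where the "within 2 us" figure comes from.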

So a very safe and humanly undetectable amount of time would be a maximum of 64 iterations of the wait loop. Each iteration takes more than one clock cycle, so 64 iterations is comfortably longer than the 64 cycles a byte needs. If this were a more time-critical matter, we could enlist the help of the system timer, which is a 32-bit counter that can be clocked by the system clock either directly or after being divided by eight. It is in many ways almost identical to the SysTick peripheral in ARM Cortex devices.
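A bounded wait along those lines might look like the sketch below. To keep it compilable away from the chip, the register block is a hypothetical stand-in struct (spi_regs_t) rather than the SDK's own definitions, and spi_send_timeout() is an illustrative name, not the function from my project. The TXE flag is bit 1 of the status register.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPI_TXE_BIT     0x0002u  /* TXE is bit 1 of STATR */
#define SPI_WAIT_LIMIT  64u      /* loop iterations, per the estimate above */

/* Hypothetical stand-in for the SPI register block, so the sketch
   builds off-target. On the chip this would be the real peripheral. */
typedef struct {
    volatile uint16_t STATR;  /* status register */
    volatile uint16_t DATAR;  /* data register */
} spi_regs_t;

/* Poll TXE at most SPI_WAIT_LIMIT times, then write the byte.
   Returns true if the byte was handed to the SPI, false on timeout. */
bool spi_send_timeout(spi_regs_t *spi, uint8_t value)
{
    for (uint32_t tries = 0; tries < SPI_WAIT_LIMIT; tries++) {
        if (spi->STATR & SPI_TXE_BIT) {
            spi->DATAR = value;
            return true;
        }
    }
    return false;  /* TXE never came up: caller decides what to do */
}
```

Returning a flag rather than recovering inside the wait loop keeps the decision about what to do next in the caller's hands.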

But again, we’re blinking an LED and not landing on the moon or anything of material impact, so ‘close enough’ on this fail-safe device is sufficient.

Now that we’ve calculated a reasonable time frame for the transmit register to report itself empty and ready for new data, what exactly do we do when (not “if”, it seems) this failure occurs?

The only thing that seems to work with things like this is to turn it off and on again. “Have you tried turning it off and on again?” is a classic for a reason. We can just re-initialize the SPI device and just start over again. Just to be safe, it would be prudent to send a protocol reset signal, i.e., a low-level signal of ~50 us, before resuming our attempts to transmit.

I originally coded the SPI initialization code within the main() function, as I had originally only ever intended to execute it once. Now it is its own little function, which I lovingly named spi_init(), which in no way conflicts with the SDK-provided SPI_Init() function.

Well, I almost fell into a trap here. By adding the ‘reset’ function to the end of the recovery procedure, my little function would have been, in effect, calling itself, as the ws2812b_reset() function in turn calls the spi_send() function. Now we’re talking about an exceptional condition here, not something that is guaranteed to happen every time. But the one thing we know about this situation is that we don’t know what is causing it (yet) or why it is happening, much less if or when it will recur.
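One way to stay out of that trap, sketched here under several assumptions, is to split the send into two layers: a raw, bounded send that never attempts recovery, and a public send that calls the recovery once and then retries once. The recovery only ever uses the raw layer, so it cannot call itself. The hardware is replaced by stubs so the logic can be exercised on a desktop; spi_send_raw(), spi_recover(), the stuck_polls counter and the figure of 38 zero bytes for the reset (38 x 1.333 us is a little over 50 us) are all illustrative and are not lifted from my actual code.

```c
#include <stdbool.h>
#include <stdint.h>

#define RESET_BYTES 38u  /* assumed: 38 zero bytes is just over 50 us low */

/* Stubs standing in for the hardware, for off-target testing only. */
unsigned stuck_polls;  /* simulated: number of upcoming sends that time out */
unsigned init_calls;   /* counts how often the SPI was re-initialized */

/* Bounded send with no recovery of its own (stubbed here). */
bool spi_send_raw(uint8_t value)
{
    (void)value;
    if (stuck_polls > 0u) {
        stuck_polls--;
        return false;  /* pretend TXE never came up */
    }
    return true;
}

/* Stub for the real peripheral setup. */
void spi_init(void)
{
    init_calls++;
}

/* Turn it off and on again, then hold the line low for a protocol reset.
   Only spi_send_raw() is used, so this can never re-enter itself. */
bool spi_recover(void)
{
    spi_init();
    for (unsigned i = 0; i < RESET_BYTES; i++) {
        if (!spi_send_raw(0x00)) {
            return false;
        }
    }
    return true;
}

/* Public send: one recovery attempt, one retry, then give up. */
bool spi_send(uint8_t value)
{
    if (spi_send_raw(value)) {
        return true;
    }
    if (!spi_recover()) {
        return false;
    }
    return spi_send_raw(value);
}
```

The depth of the call chain is fixed no matter how badly the peripheral behaves, and a persistent failure surfaces as a false return instead of a runaway stack.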

And now we wait, while the code ‘tests itself’. In the meantime, I’ll describe the original code that I was using to break down the transmission protocol into manageable chunks.

You’ll recall that at the lowest level, we were using the SPI to generate some arbitrary wave forms for us. A short-ish pulse was emitted when we transmitted a 0x60 via the SPI port, and that represented a zero, while a longer-ish pulse was created by shifting out 0x7E, to be interpreted as a one. I wrote a function called ws2812b_bit() which took a single argument, either a zero or something other than a zero and transmitted the appropriate value via the spi_send() function.

Then on top of that, I wrote a function to send the eight bits in a byte by sending the MSB of the byte via the ws2812b_bit() function, then shifting the entire byte to the left, so as to move the next bit down the line up into the MSB position. This happened a total of eight times and the single byte was transmitted.

The top layer was a function called ws2812b_rgb() which took three eight-bit values for the red, green and blue components of the signal, and called the ws2812b_byte() function once for each, except in green, red, then blue order, which is the order the LED expects.
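The three layers just described can be sketched as follows. The function names and the 0x60 and 0x7E patterns are the ones from the text; the spi_send() here is a hypothetical stand-in that records bytes in a buffer instead of touching the SPI, so the bodies are a reconstruction for illustration and not a copy of my original code.

```c
#include <stdint.h>

#define WS_ZERO 0x60u  /* short pulse: 2 of 8 SPI bits high */
#define WS_ONE  0x7Eu  /* long pulse: 6 of 8 SPI bits high */

/* Hypothetical capture buffer standing in for the SPI port. */
uint8_t  spi_log[24];
unsigned spi_log_len;

void spi_send(uint8_t value)
{
    if (spi_log_len < sizeof spi_log) {
        spi_log[spi_log_len++] = value;
    }
}

/* Bottom layer: one LED bit becomes one SPI byte. */
void ws2812b_bit(uint8_t bit)
{
    spi_send(bit ? WS_ONE : WS_ZERO);
}

/* Middle layer: send eight bits, MSB first, shifting left each time. */
void ws2812b_byte(uint8_t value)
{
    for (int i = 0; i < 8; i++) {
        ws2812b_bit(value & 0x80u);
        value = (uint8_t)(value << 1);
    }
}

/* Top layer: arguments in R, G, B order; wire order is G, R, B. */
void ws2812b_rgb(uint8_t r, uint8_t g, uint8_t b)
{
    ws2812b_byte(g);
    ws2812b_byte(r);
    ws2812b_byte(b);
}
```

A call such as ws2812b_rgb(0xFF, 0x00, 0x80) therefore produces 24 SPI bytes: eight short pulses for green, eight long pulses for red, then one long pulse followed by seven short ones for blue.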

The application could use the ws2812b_rgb() function to send out a string of RGB values to a string of LEDs, even a string of only one LED. After all the values had been sent, the ws2812b_reset() function would confirm their election and shift all the transmitted data values to the appropriate departments within each LED and start to display them accordingly.

It was totally working and we could have totally gotten away with it, had I not turned the blinding spotlight of the oscilloscope on the signal. The signal was nowhere near running at the throughput I had hoped for. There were biiiig gaps between the individual pulses, and while it still met the ever-so-relaxed requirements of the LED, it was only running at about 250 kHz, and not the 750 kHz theoretical maximum we should have seen, given our SPI clocking constraints (6 MHz divided by eight SPI bits per LED bit).

So I played with about a bazillion combinations of different timing setups, including “unrolling” my functions to eliminate any excessive call overhead, all to no avail. Then I discovered by re-reading the reference manual for the tenth time, that I was relying on the SPI’s ‘busy’ flag instead of the ‘TXE’ flag. You go read the RM and tell me how clear that would have been to you. Here’s what it says about the ‘busy’ flag:

Busy flag. This flag is set and cleared by hardware.
1：SPI is busy in communication or Tx buffer is not empty.
0：SPI (or I2S) not busy.

And here is what it says about the ‘TXE’ flag:

Transmit buffer empty.
1：Tx buffer empty.
0：Tx buffer not empty.
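The practical difference between the two flags shows up in the moment when a byte is still shifting out but the transmit buffer has already been emptied into the shift register. The sketch below expresses the two "may I send?" tests as plain functions over a status register value. The bit positions are TXE at bit 1 and BSY at bit 7; the macro and function names are mine, chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATR_TXE_BIT 0x0002u  /* bit 1: Tx buffer empty */
#define STATR_BSY_BIT 0x0080u  /* bit 7: SPI busy */

/* Ready as soon as the buffer is free, even if the previous byte is
   still being shifted out. The next byte can be queued back to back. */
bool spi_ready_txe(uint16_t statr)
{
    return (statr & STATR_TXE_BIT) != 0u;
}

/* Ready only once the whole peripheral is idle, so every byte must
   finish completely before the next one is even loaded. */
bool spi_ready_not_busy(uint16_t statr)
{
    return (statr & STATR_BSY_BIT) == 0u;
}
```

In the mid-transfer state, with both TXE and BSY set, the first test says "go" and the second says "wait", which would account for the big gaps between pulses that I saw on the oscilloscope.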

Not interchangeable! And now I know. Well, I think I know. Something is still very messed up. Continued testing has revealed multiple failures after only 256 loops. And these are sequential errors, occurring right after the SPI reboot. Sometimes it’s four or five errors, and sometimes it’s more than I can count, as the error messages scroll off the top of the screen.

The good news is that it always, eventually, recovers and starts playing nice again.

As this is my first real exposure to this chip’s SPI hardware, it’s not entirely unreasonable that my expectations and its actual behavior have diverged. But I really think that I am asking the ‘bare minimum’ from this peripheral. It’s not expecting any sort of input at all and we’re not even using the clock signal that it is providing. I just don’t know what else could be causing these randomly-spaced events to occur. Yet.

As a sanity check, I will try this again on the official WCH CH32V003F4 development board, with just a single WS2812B LED attached directly to PC6, without all this also-connected-to-PA2 nonsense, and see if this happens there as well.