CH32V003 driving WS2812B LEDs with SPI – Part 10

7 March 2025

Today I find the WCH -F4P6 dev board has clocked over 35 billion loops without hanging up.

The STK system timer is available in all the CH32V devices. On the QingKe V2 devices, such as our -003 test subject, it is a 32 bit counter that can count up or down and can trigger an interrupt when it hits a particular value. This makes it very useful for basic timing tasks as well as providing periodic interrupts or a measure of uptime. The STK on the QingKe V4 devices is 64 bits long.

There’s not a lot needed to initialize the STK as there just aren’t that many options. One choice is whether to use the system clock directly as its clock source, or to divide it by eight. We’re only going to be using it to measure an approximately 50 microsecond pulse, and it doesn’t have to be excruciatingly precise. I’ll use the prescaled clock as the timing source.

The only other options are to have it trigger an interrupt or to compare the current count to a value, neither of which I need at the moment. So the clock source selection is the only configuration bit in the STK_CTLR control register that I will need to set, other than the “STE” system timer enable control bit.

Time to add more enumerated values to my collection in my CH32V003.inc include file:

# STK - System Timer

STK_STE     = (1 << 0) # STK enable
STK_STIE    = (1 << 1) # interrupt enable
STK_STCLK   = (1 << 2) # clock source selection
STK_STRE    = (1 << 3) # auto-reload counter enable
STK_SWIE    = (1 << 31) # software interrupt trigger

# STK_STCLK values

STK_STCLK_HCLK_8 = (0 << 2) # clock source is HCLK / 8
STK_STCLK_HCLK   = (1 << 2) # clock source is HCLK

The code to initialize the STK is pretty simple:

# initialize STK - clock = HCLK/8 = 6 MHz

    la x3, STK_BASE
    li x4, STK_STCLK_HCLK_8 | STK_STE
    sw x4, STK_CTLR(x3)

Technically, we can omit the STK_STCLK_HCLK_8 parameter, as it is a zero, but I like to include it to make my intention clearer to Future Me.

The delay_us function just needs to take the requested number of microseconds, as passed into it via function argument register a0, multiply it by six, as there are six STK timer clock cycles per microsecond, then add that time duration to the current time, as represented by the value in the STK_CNTL register.

The function then loops until the current timer count is no longer less than the calculated ‘future time’.

I also added a quick exit in the case of the caller asking for a zero microsecond delay. We’ll still be late getting back, but not as late as if we went ahead and preserved all the registers, etc.
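
One thing worth checking with any "loop until now reaches end" test is what happens when the 32 bit counter rolls over. Here is a minimal C sketch of the comparison logic only, assuming the STK is left free-running and wraps modulo 2^32. The helper names are hypothetical and are not part of the project. It contrasts a signed "now < end" test (which is what a blt instruction performs) with an unsigned "elapsed < duration" test:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not from the project: still waiting if fewer than
   `ticks` counts have elapsed since `start`. The unsigned subtraction
   wraps modulo 2^32, so the answer is right across any rollover. */
static bool delay_pending(uint32_t now, uint32_t start, uint32_t ticks)
{
    return (uint32_t)(now - start) < ticks;
}

/* For contrast: a signed "now < end" test, as a blt would do it.
   The casts assume a two's-complement target, as on RISC-V. */
static bool delay_pending_signed(uint32_t now, uint32_t end)
{
    return (int32_t)now < (int32_t)end;
}
```

At 6 MHz the count takes 2^32 / 6,000,000, or roughly 716 seconds, to go all the way around, and it crosses 0x80000000 once on each trip. A signed comparison that straddles that crossing sees a positive "now" and a negative "end" and gives up immediately, so the occasional delay would come back early. The elapsed-time form would be a sub followed by a bltu in assembly, both of which exist on RV32EC.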

Here is the code for the delay_us function:

delay_us: # delay in microseconds

    # on entry: a0 delay time in microseconds
    # on exit: none

    # register usage:
    #   x3:  pointer to STK_BASE
    #   x4:  calculated end time
    #   x5:  read timer count

    STK_TICKS_PER_MICROSECOND = ((HCLK / 1000000) / 8)

    beqz a0, 9f # exit on 0 microsecond request

    addi sp, sp, -16 # allocate space on stack
    sw ra, 12(sp) # preserve return address
    sw x3, 8(sp) # preserve x3
    sw x4, 4(sp) # preserve x4
    sw x5, 0(sp) # preserve x5

    la x3, STK_BASE

    # calculate future end time

    slli x4, a0, 1 # x4 = a0 * 2
    slli x5, a0, 2 # x5 = a0 * 4
    add x4, x4, x5 # x4 = x4 + x5
    lw x5, STK_CNTL(x3) # read current timer count
    add x4, x4, x5

1:  lw x5, STK_CNTL(x3) # read system timer count
    blt x5, x4, 1b # loop if x5 < x4, i.e., end time not yet reached

    lw ra, 12(sp) # restore return address
    lw x3, 8(sp) # restore x3
    lw x4, 4(sp) # restore x4
    lw x5, 0(sp) # restore x5
    addi sp, sp, 16 # restore stack pointer

9:  ret # return from function

And while I am providing a perfectly mathematical solution to the question of how many STK cycles or ‘ticks’ are in a single microsecond, via the STK_TICKS_PER_MICROSECOND symbol (the answer is six here), the QingKe V2 does not support the ‘mul’ (integer multiply) instruction.

If you put an integer multiply instruction in the code, the assembler assembles it, as assemblers do, but the chip throws an exception when it tries to execute it. But why does the assembler allow it to get that far down the chain?

It’s most likely because I just copy/pasted the makefile from another project and it specifically says that the architecture of the chip is “-march=rv32imac_zicsr”, which it most decidedly is not. Changing the “AS_OPTS” variable in the makefile to “-march=rv32ec_zicsr” fixes this, and the assembler throws the very correct error:

src/F4-WS2812B-SPI-asm.S:10: Error: unrecognized opcode `mul a0,a0,a0'

It now also catches my earlier error when I used the non-existent s2 register. These are powerful tools if you will just let them be so.

So there being no integer multiply instruction, it’s not too terribly difficult to multiply two integers together using shifts and adds. In fact, with a constant multiplier such as six, it’s just a matter of shifting the multiplicand to the left, one time using a single bit shift and then again using two bit shifts, then adding those two numbers together.
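
The same arithmetic as a small C sketch, for checking the numbers on a desktop machine. The HCLK value of 48 MHz is taken from the "HCLK/8 = 6 MHz" comment in the initialization code; the us_to_ticks name is made up for this illustration:

```c
#include <stdint.h>

#define HCLK 48000000u /* assumed: 48 MHz system clock, so HCLK/8 = 6 MHz */
#define STK_TICKS_PER_MICROSECOND ((HCLK / 1000000u) / 8u) /* works out to 6 */

/* Hypothetical helper: microseconds to STK ticks without a multiply.
   6 * us == 2 * us + 4 * us == (us << 1) + (us << 2) */
static uint32_t us_to_ticks(uint32_t us)
{
    return (us << 1) + (us << 2);
}
```

A 50 microsecond request comes out as 300 ticks. The shift-and-add form only matches STK_TICKS_PER_MICROSECOND while that symbol stays at six, so changing the clock setup means revisiting the shifts as well.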

Now we have a reasonably accurate delay function that does nothing but waste time for a reasonably accurate amount of time. We can use that to send the ~50 us reset signal to the WS2812B LEDs by sending out a 0x00 via the SPI and then just waiting it out. It makes for a pretty simple function:

ws2812b_reset: # send 'reset' signal to WS2812B LED

    # on entry: none
    # on exit: none

    # register usage:
    #   x3:  function arguments

    addi sp, sp, -16 # allocate space on stack
    sw ra, 12(sp) # preserve return address
    sw x3, 8(sp) # preserve x3

    li a0, 0x00
    call spi_send # set SDO low

    li a0, 50
    call delay_us # ~ 50 us low level

    lw ra, 12(sp) # restore return address
    lw x3, 8(sp) # restore x3
    addi sp, sp, 16 # restore stack pointer

    ret # return from function

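The delay_us function is technically a ‘leaf’ function as it does not ‘branch’ out to any other functions in the performance of its duties, so I could have skipped the ‘preservation’ of the return address register there and it would have worked perfectly. This one is not: ws2812b_reset calls both spi_send and delay_us, and each ‘call’ overwrites ra, so here the save and restore are required. I tend to leave it in everywhere anyway, as it’s fast and it’s better to have it and not need it than to need it and not have it.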

I would really like to come up with a way to streamline the creation of these assembly language functions as they do contain a moderate quantity of boiler-plate code.

If you’ll recall, I had originally built up a hierarchy of function calls to send the right wave forms to the LEDs, but then de-optimized the code to eliminate perceived overhead. Well, that was in the C programming language, and it tends to encourage that sort of algebraic abstraction. At least, it encourages me to do so. Now we’re in the Wild West of bare-metal assembly language and everything comes at a price. So to keep the complexity of each function to a minimum, I’ll reinvent my cascade of function calls here.

The lowest level function sends out an encoded one or a zero. A zero has a shorter high period and a one has a longer high period. We are using the bit patterns 0x60 and 0x7E as zero and one, respectively. Here is the ws2812b_bit function:

ws2812b_bit: # send an encoded zero or one to the WS2812B LED via SPI

    WS2812B_ZERO    = 0x60
    WS2812B_ONE     = 0x7E

    # on entry: a0[0] bit to transmit
    # on exit: a0[7..0] bit pattern sent

    # register usage:
    #   x3:  function arguments

    addi sp, sp, -16 # allocate space on stack
    sw ra, 12(sp) # preserve return address
    sw x3, 8(sp) # preserve x3

    li x3, WS2812B_ZERO # assume it's a zero
    beqz a0, 1f
    li x3, WS2812B_ONE # well it wasn't

1:  mv a0, x3
    call spi_send

    lw ra, 12(sp) # restore return address
    lw x3, 8(sp) # restore x3
    addi sp, sp, 16 # restore stack pointer

    ret # return from function

I preload the x3 register with a bit pattern for a zero, WS2812B_ZERO or 0x60, assuming that it will be a zero. If it is a zero, it skips the next instruction, which loads the WS2812B_ONE code, or 0x7E. In either case, the contents of x3 are mv’d (moved) over to function argument a0 and the spi_send function is called.

Now that we can write a bit, let’s write a byte. It’s not too terribly difficult, but I think you’re starting to see why I wanted to split this medium-sized problem up into tiny-problem chunks. Tiny problems I can handle. Here’s the ws2812b_byte function:

ws2812b_byte: # send a byte's worth of encoded ones and zeros to the WS2812B LED

    # on entry: a0[7..0] byte to transmit, MSB first
    # on exit: none

    # register usage:
    #   x3:  argument save
    #   x4:  bit counter

    addi sp, sp, -16 # allocate space on stack
    sw ra, 12(sp) # preserve return address
    sw x3, 8(sp) # preserve x3
    sw x4, 4(sp) # preserve x4
    sw a0, 0(sp) # preserve a0

    mv x3, a0 # save byte argument in x3
    li x4, 8 # initialize bit counter

1:  andi a0, x3, 0x80 # test MSB
    snez a0, a0 # convert 0x00/0x80 to 0/1
    call ws2812b_bit # transmit the bit
    slli x3, x3, 1 # shift all bits one place toward MSB
    addi x4, x4, -1 # decrement bit counter
    bnez x4, 1b # loop if needed

    lw ra, 12(sp) # restore return address
    lw x3, 8(sp) # restore x3
    lw x4, 4(sp) # restore x4
    lw a0, 0(sp) # restore a0
    addi sp, sp, 16 # restore stack pointer

    ret # return from function

I took the extra step of preserving the function argument so that the caller can just load up a single value and call the function three times in a row without having to reload the argument. That’s just to make the debugging easier, as the final form won’t need that.

And here is the final form: the ws2812b_rgb function, wherein the caller sends the three bytes representing the red, green and blue components of the color they want on the LED:

ws2812b_rgb: # send red, green and blue color components to WS2812B LEDs

    # on entry:
    #   a0[0..7] red data
    #   a1[0..7] green data
    #   a2[0..7] blue data
    # on exit: none

    # register usage:
    #   a0:  function argument
    #   x3:  swap register

    addi sp, sp, -16 # allocate space on stack
    sw ra, 12(sp) # preserve return address
    sw x3, 8(sp) # preserve x3

    mv x3, a0 # save red data
    mv a0, a1 # green data
    call ws2812b_byte # send green data
    mv a0, x3 # return red data
    call ws2812b_byte # send red data
    mv a0, a2
    call ws2812b_byte # send blue data

    lw ra, 12(sp) # restore return address
    lw x3, 8(sp) # restore x3
    addi sp, sp, 16 # restore stack pointer

    ret # return from function

Note the register-swapping shenanigans to be able to state the color data as R, then G, then B, but transmit in GRB order, as the WS2812B thinks proper.
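
The whole bit, byte and RGB cascade can be modeled off-target with a few lines of C. This is only a sketch of the encoding: instead of calling spi_send it writes the SPI bytes into a buffer, and the ws2812b_encode_* names are invented for the illustration. The 0x60 and 0x7E patterns are the ones used above.

```c
#include <stddef.h>
#include <stdint.h>

#define WS2812B_ZERO 0x60u /* shorter high period */
#define WS2812B_ONE  0x7Eu /* longer high period */

/* One data bit becomes one SPI byte. */
static uint8_t ws2812b_encode_bit(unsigned bit)
{
    return bit ? WS2812B_ONE : WS2812B_ZERO;
}

/* One data byte becomes eight SPI bytes, MSB first. Returns bytes written. */
static size_t ws2812b_encode_byte(uint8_t value, uint8_t *out)
{
    for (int i = 0; i < 8; i++) {
        out[i] = ws2812b_encode_bit(value & 0x80u); /* test MSB */
        value = (uint8_t)(value << 1);              /* next bit up */
    }
    return 8;
}

/* One LED: caller states R, G, B; the wire order is G, R, B.
   `out` must have room for 24 bytes. */
static size_t ws2812b_encode_rgb(uint8_t r, uint8_t g, uint8_t b, uint8_t *out)
{
    size_t n = 0;
    n += ws2812b_encode_byte(g, out + n);
    n += ws2812b_encode_byte(r, out + n);
    n += ws2812b_encode_byte(b, out + n);
    return n;
}
```

Encoding full red (0xFF, 0x00, 0x00) should give eight 0x60 bytes for green, then eight 0x7E bytes for red, then eight more 0x60 bytes for blue. Comparing a buffer like this against a logic analyzer capture is one way to check the assembly version.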

Now to let the little chip send this sequence a bazillion times and see if it gets confused. I’m actually feeling sort of confident that it won’t at this point, but the proper thing to do is to test it.

CH32V003 driving WS2812B LEDs with SPI – Part 9

6 March 2025

Hmmm. Good news? Well, news. The absolutely simplest test I could envisage ran all night and did not hang up. You saw the code. It was only checking the SPI status register to see if the transmit register was empty (and waiting forever for it to be so) and only then shipping out a 0x55 test pattern to the SDO pin on PC6, and repeat, ad infinitum.

This test was conducted on the most symptom-prone variation of the -003 chips I have on hand, the CH32V003F4U6 QFN20. Now, to be fair, this was done using the upgraded and augmented “robust” prototype, and not the original test platform, a very small solderless breadboard. Should I go back and test the original circuit? Of course! My scientific rigor knows no bounds.

Now I should create a similarly minimalistic diagnostic using the WCH SDK. I copied the same SDK initialization function that I was using in the previous code and added this within the while(1) loop in the main() function:

while(SPI_I2S_GetFlagStatus(SPI1, SPI_I2S_FLAG_TXE) == RESET) {
    // wait for SPI transmit register to be empty
}

SPI_I2S_SendData(SPI1, 0x55); // test pattern 01010101

And it was running along quite nicely, until it wasn’t. Just hung up again. Let’s add a little instrumentation to the code and have it spit out some statistics from time to time. I added some variables to track the counting:

uint32_t loops = 0, millions = 0;

And added this code to occasionally print out a report:

loops++; // count the loops

if(loops == 1000000) { // report only every 1,000,000 loops
    millions++; // count those millions
    loops = 0; // reset loop counter
    printf("Loops (millions): %u\r\n", millions);
}

And off it goes! And stops after “only” 51 million loops. Try again, little machine! OK, 382 million loops this time, but hung up solid again. And this is only taking a few minutes each time.

So the immediate conclusion is that there is something in the SDK that is gunking up the SPI state. I’ll have to look at the SPI_I2S_GetFlagStatus() function in more detail and see how that could be acting up. The SPI_I2S_SendData() function literally only writes the passed value to the SPI data register.

Well, the SPI_I2S_GetFlagStatus() function also is only doing the minimum necessary things to check the status of an individual flag, i.e., read the SPI status register and mask out the status bit of interest, returning either ‘SET’ or ‘RESET’ as appropriate.
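
For reference, the read-and-mask logic described there amounts to something like the following. This is not the SDK source, just a sketch of the idea with made-up names. It takes the status register value as an argument so it can be exercised off-target, and it assumes TXE is bit 1 of the SPI status register:

```c
#include <stdint.h>

typedef enum { FLAG_RESET = 0, FLAG_SET = 1 } flag_status_t; /* hypothetical names */

#define SPI_FLAG_TXE 0x0002u /* assumed: TXE is bit 1 of the status register */

/* Mask out the one bit of interest and report it as set or reset. */
static flag_status_t spi_flag_status(uint16_t statr, uint16_t flag)
{
    return (statr & flag) ? FLAG_SET : FLAG_RESET;
}
```

There is not much room in a function of this shape for anything to go wrong, which fits with the observation that the polling loop itself is unlikely to be what upsets the SPI state.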

Not surprisingly, the WCH development board with the CH32V003F4P6 TSSOP20 package runs flawlessly.

At this point, I see two ways forward with this investigation. I can implement the WS2812B-SPI driver in assembly language and see if that works as expected. The other option is to update my C language framework using the new -003 SVD file and fitting some optimizations into the project wizard, which will take more work than the assembly language framework.

But why choose? Can’t I do both?

I’ll attempt the more full-featured LED demo in assembly first, as the base is already in place for that. But before I forget, there’s something very interesting that I noticed and almost failed to note here. Once I cranked up the system clock from 8 MHz to 48 MHz, the chip still worked. Even though I didn’t configure the flash memory controller to use an additional wait state. Even though the RM says the “prefetch buffer” must be enabled, although it never says how. Even though the CH32X035 yurked and horked all over the place when I did the same exact thing. To be fair, the CH32X is a QingKe V4 and the -003 is a V2. I had noticed this behavior before and had decided to err on the side of caution in the future. But now I want to see if it’s an issue or not. If weird and random things happen over and above the current weird and random things that are happening, we’ll know where to look.

I recall hearing that with great power comes great responsibility. I’m so totally feeling that right now as I struggle to come up with a register usage policy that 1) makes sense and 2) I like. I have 15 registers at my disposal and I can use them as I see fit. The GNU assembler adheres to the ABI (application binary interface) that assigns a few of the registers to specific tasks, such as the stack pointer and return address register. There are some other assumptions made in the implementations of the pseudo-instructions that prove quite useful. But I need a system that is simple to remember.

I may have mentioned it before, but here is a RISC-V reference page that I come back to all the time:

https://projectf.io/posts/riscv-jump-function/#functions

Here are the general-purpose registers that I have access to in this RV32EC architecture:

Register Alias Notes
-------- ----- -----
x0       zero  All the values you want, as long as you want a zero
x1       ra    Return address
x2       sp    Stack pointer
x3       gp    Global variable pointer
x4       tp    Thread pointer
x5       t0    Temporary register 0
x6       t1    Temporary register 1
x7       t2    Temporary register 2
x8       s0    Saved register 0
x9       s1    Saved register 1
x10      a0    Function argument 0
x11      a1    Function argument 1
x12      a2    Function argument 2
x13      a3    Function argument 3
x14      a4    Function argument 4
x15      a5    Function argument 5

There is simultaneously so much and so little you can do with the zero register, x0. It’s really handy when you want to write a zero somewhere, or compare something to zero, or subtract something from zero… you get the idea. But writing anything to it doesn’t do anything.

I’ve become accustomed to using the function argument registers, a0-a5, in their intended manner, so I think I will continue doing so on this project, at least.

The return address register, ra, under the current ABI, is generally x1 but can be x5 in some circumstances. The GNU assembler will assume you want to use x1 as the return address register when it expands the pseudo instruction ‘call’ into an ‘auipc’/‘jalr’ pair, which the linker can later relax into a single ‘jal’ or ‘jump and link’. In truth, you can ‘jump and link’ using any register you want. But I’m willing to go along with this idea for the time being. The same thing applies to the ‘ret’ (return from function/subroutine) pseudo instruction.

The stack pointer register, x2/sp, is really more up for grabs. Unlike many other microcontroller architectures that I have used in the past, the RISC-V instruction set does not have a predetermined idea of which of the registers ‘should be’ the stack pointer. Use any one you want. Really.

I already ran into the issue of trying to use s2/x18 on this project. It’s not there. Only x0-x15 are available on the RV32EC platform.

So instead of trying to figure out which of the other ‘suggested usage’ aliases for the remaining registers to use, I think I’ll just use x3-x9 for my random register needs. If I need more, I think it would be OK to use some of the function argument registers as well. Also, if I can’t keep up with x3-x9, I can always rename them to something else more memorable using a macro.

Since I will be needing a reasonably accurate timer function in order to send the ‘reset’ signal to the string of WS2812B LEDs, I’ll need to configure the STK system timer to help with that.

CH32V003 driving WS2812B LEDs with SPI – Part 8

5 March 2025

Now I am going to write a bare-metal diagnostic for this bizarre SPI timeout behavior. This will eliminate the possibility of some odd malfunction in the vendor-supplied SDK. It will also introduce the possibility of some odd malfunction as a result of my own programming.

As I mentioned yesterday, I have had some limited success with writing bare-metal code for these chips, both in the C programming language as well as native RISC-V assembly code. Both of these approaches rely heavily on the vendor-supplied SVD file for these chips. SVD files are ‘system view descriptor’ files containing machine-readable descriptions of the chip’s on-board resources. In the case of the CH32V003 SVD file, this is limited to the peripheral registers and their respective bit-field contents. Alas, no ‘enumerated values’ are included, so I am forced to supply those myself.

Since the release of MounRiver Studio 2, we have an updated version of the SVD for the -003 family of chips. It is contained within the MRS2 app itself, here:

/Applications
/MounRiver Studio 2.app
/Contents
/Resources
/app
/resources
/darwin
/components
/WCH
/SDK
/default
/RISC-V
/CH32V003
/NoneOS/
CH32V003xx.svd

Now that was a deep dive!

The file, 321 KB in length, as distributed by the vendor, is dated 23 December 2024 at 4:59 AM. Within it is a version number of 1.2. The previous version, labelled “1.1” was what I used when I was first starting to get to know these chips.

We’ll need both the register addresses and their bit field information in order to manipulate the chip into doing what we want. What I originally did was use a Python script to examine the SVD file and emit a C header file that created typedef’d structs that encapsulated the needed register information. I then included that header file in a makefile project that used the custom version of GCC supplied by the original MRS toolset. This would allow me to reference individual bits within a given peripheral without having to know which exact register was indicated, like this:

RCC->SW = RCC_SW_HSI; // select HSI as system clock source

Where the “RCC” is the pointer to the base address of the “Reset and Clock Control” peripheral, “SW” is the “system clock source selection” bitfield within the “Clock Configuration Register 0” and “RCC_SW_HSI” was an enumerated value (constant) that I created and #define’d elsewhere. Notice that I didn’t have to keep track of which register it was in. The data structure keeps all that information for me. Now I don’t have to check the reference manual for register addresses or bit positions. I still have to look up specific bitfield values because the manufacturer decided to omit those from the SVD file as defined enumerated values.
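
A stripped-down sketch of the idea, covering a single register rather than a whole peripheral. The type name, the field widths and the enumerated values here are assumptions made for the illustration, not the output of the conversion script:

```c
#include <stdint.h>

/* Illustrative layout only: the low fields of a clock configuration
   register, with everything else lumped together. Widths are assumed. */
typedef union {
    uint32_t raw;
    struct {
        uint32_t SW   : 2;  /* system clock source selection */
        uint32_t SWS  : 2;  /* system clock source status */
        uint32_t HPRE : 4;  /* HCLK prescaler */
        uint32_t rest : 24; /* remaining fields, not modeled here */
    } bits;
} rcc_cfgr0_t;

enum { RCC_SW_HSI = 0, RCC_SW_HSE = 1, RCC_SW_PLL = 2 }; /* assumed encodings */

/* Change only the SW field; the compiler generates the read-modify-write. */
static void select_clock(volatile rcc_cfgr0_t *cfgr0, unsigned source)
{
    cfgr0->bits.SW = source;
}
```

One caveat with this style: C leaves the ordering of bit-fields within a word up to the implementation. GCC on a little-endian target such as RISC-V allocates them starting from the least significant bit, which is what makes the trick work with a known toolchain, but it is not portable in general.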

I also created a handful of boilerplate source files that coordinated some of the other, lower-level necessities of the project. These are sometimes referred to as the “C runtime support” files.

I eventually started wanting to use the same technique with RISC-V assembly language projects. I modified the original Python conversion script to emit an assembly language header file with the peripheral register addresses and a bit mask representing the bitfield assignments.

In both cases, I created a ‘Makefile’ that allowed me to compile or assemble the project from the command line. I also created my own linker script to coalesce the variously-transmogrified source files into an executable binary image. The makefile also added ‘phony’ targets to perform such actions as erase, program or launch the debugger.

Since the resulting projects had multiple but formulaic folder structures as well as project-unique headers and footers where appropriate, I wrote a console application that would create a new project and populate the required files for me. This only worked for the -003 devices, however. Well, in truth, it “worked” for the simplest of 203 or 307 projects, as well, but not as comprehensively as I would have liked.

Most of this effort was spurred by the fact that version 1 of the MounRiver Studio was only supported on Windows or Linux/x86 hosts. A set of command line tools was provided for macOS, and that’s what I used.

Now I’d like to review the process in light of the MRS2 native support of macOS, which includes Apple Silicon. Let’s start with the RISC-V assembly language version first, as that is a little more straight-forward in that it will need to do less for us than the C language version.

First let’s see what my Python script thinks of the new SVD file. I recall that there were a few issues with the original version 1.1 SVD file, but the details escape me.

Well, it didn’t burp. It created a new file called ‘ch32v003xx.svd.inc’, which is simply the input filename with ‘.inc’ appended to the end. It also generated this report to the console:

svd2inc.py - SVD to RISC-V ASM header file converter
SVD filename: ch32v003xx.svd
Parsing ch32v003xx.svd... done
Filename 'ch32v003xx.svd.inc' already exists.  Overwrite? (Y/N) y
Note:  Overwriting existing file 'ch32v003xx.svd.inc'
Creating 'ch32v003xx.svd.inc'

Peripherals
PWR/PWR, 0x40007000, Power control
RCC/RCC, 0x40021000, Reset and clock control
EXTEN/EXTEN, 0x40023800, Extend configuration
GPIO/GPIOA, 0x40010800, General purpose I/O
GPIO/GPIOC, 0x40011000, derived from GPIOA
GPIO/GPIOD, 0x40011400, derived from GPIOA
AFIO/AFIO, 0x40010000, Alternate function I/O
EXTI/EXTI, 0x40010400, EXTI
DMA1/DMA1, 0x40020000, DMA1 controller
IWDG/IWDG, 0x40003000, Independent watchdog
WWDG/WWDG, 0x40002C00, Window watchdog
TIM/TIM1, 0x40012C00, Advanced timer
TIM/TIM2, 0x40000000, General purpose timer
I2C/I2C1, 0x40005400, Inter integrated circuit
SPI/SPI1, 0x40013000, Serial peripheral interface
USART/USART1, 0x40013800, Universal synchronous asynchronous receiver transmitter
ADC1/ADC1, 0x40012400, Analog to digital converter
DBG/DBG, 0xE000D000, Debug support
ESIG/ESIG, 0x1FFFF7E0, Device electronic signature
FLASH/FLASH, 0x40022000, FLASH
PFIC/PFIC, 0xE000E000, Programmable Fast Interrupt Controller

Interrupts
2: NMI - Non-maskable interrupt
3: HardFault - Exception interrupt
5: Ecall_M - Callback interrupt in machine mode
8: Ecall_U - Callback interrupt in user mode
9: BreakPoint - Breakpoint callback interrupt
12: STK - System timer interrupt
14: SW - Software interrupt
16: WWDG - Window Watchdog interrupt
17: PVD - PVD through EXTI line detection interrupt
18: FLASH - Flash global interrupt
19: RCC - Reset and clock control interrupt
20: EXTI7_0 - EXTI Line[7:0] interrupt
21: AWU - AWU global interrupt
22: DMA1_Channel1 - DMA1 Channel 1 global interrupt
23: DMA1_Channel2 - DMA1 Channel 2 global interrupt
24: DMA1_Channel3 - DMA1 Channel 3 global interrupt
25: DMA1_Channel4 - DMA1 Channel 4 global interrupt
26: DMA1_Channel5 - DMA1 Channel 5 global interrupt
27: DMA1_Channel6 - DMA1 Channel 6 global interrupt
28: DMA1_Channel7 - DMA1 Channel 7 global interrupt
29: ADC - ADC global interrupt
30: I2C1_EV - I2C1 event interrupt
31: I2C1_ER - I2C1 error interrupt
32: USART1 - USART1 global interrupt
33: SPI1 - SPI1 global interrupt
34: TIM1BRK - TIM1 Break interrupt
35: TIM1UP - TIM1 Update interrupt
36: TIM1RG - TIM1 Trigger and Commutation interrupts
37: TIM1CC - TIM1 Capture Compare interrupt
38: TIM2 - TIM2 global interrupt

Creating interrupt vectors
2: NMI_handler
3: HardFault_handler
5: Ecall_M_handler
8: Ecall_U_handler
9: BreakPoint_handler
12: STK_handler
14: SW_handler
Created 7 system vectors
16: WWDG_handler
17: PVD_handler
18: FLASH_handler
19: RCC_handler
20: EXTI7_0_handler
21: AWU_handler
22: DMA1_Channel1_handler
23: DMA1_Channel2_handler
24: DMA1_Channel3_handler
25: DMA1_Channel4_handler
26: DMA1_Channel5_handler
27: DMA1_Channel6_handler
28: DMA1_Channel7_handler
29: ADC_handler
30: I2C1_EV_handler
31: I2C1_ER_handler
32: USART1_handler
33: SPI1_handler
34: TIM1BRK_handler
35: TIM1UP_handler
36: TIM1RG_handler
37: TIM1CC_handler
38: TIM2_handler
Created 23 device vectors
Created 30 vectors in total

So it actually saw that there was already a file with the proposed new filename, and very politely asked permission to over-write it. How courteous!

The new file is 103 KB long. There are still some rough edges in the script, as it tends to emit duplicate definitions for some of the repeated registers, such as the various DMA channel configuration registers. But I think they are “true duplicates” in that they all just redefine the same symbol with the same value, which wastes file space and assembly compute cycles but will still “work”.

Instead of adding newly-minted enumerated values directly to each new source file that needed them, I decided to collect them in a more generic include file for each device, and have that include file subsequently include the generated register definition include file. This file I will creatively and bravely name, “ch32v003.inc”. You can’t stop me!

Here is the as-yet empty generic include file:

# filename:  ch32v003.inc
# register definitions for WCH CH32V003 devices
# 5 March 2025 - Dale Wheat

.ifndef CH32V003_INC # prevent recursive inclusion
CH32V003_INC = 0 # arbitrary but required value

.include "ch32v003xx.svd.inc"

# hand-crafted enumerated values go here

.endif # end of include guard conditional CH32V003_INC

# ch32v003.inc [end-of-file]

Now each new assembly source file that we create need only add this line to become fully (or mostly-enoughly) aware of the inner workings of the -003 family:

.include "ch32v003.inc"

This is assuming that your makefile knows where we’ve stashed this master record of all -003 knowledge.

I looked through the archive for a suitably simple project to use as a template, and I found a likely candidate, “J4-blink-asm”. But why is this blinky project source file over 20 K-bytes long?

Ah, it seems that once I got the basic blinky goodness developed, I just kept adding on to it, one little bit at a time. It’s got a lot of stuff in there that I’m not going to immediately need. Here’s what I’m going to start with for this bare-metal diagnostic:

# filename:  F4-WS2812B-SPI-asm.S
# Diagnostic for WS2812B via SPI
# 5 March 2025 - Dale Wheat

.include "CH32V003xx.svd.inc"

.global start
start:

# F4-WS2812B-SPI-asm.S [end-of-file]

Notice that the filename ends with an upper-case “.S”, which, when the file is assembled through the GCC driver, tells it to run the C preprocessor over the source first, so any preprocessor macros it finds in the assembler source file get expanded.

Now we just need a makefile for the project. I will again borrow this from the J4-blink-asm project. The linker script for the -003 devices is already in place.

I need to update the “CC_PATH” variable in the makefile to reflect the newest version of the GCC compiler suite, as provided by the MRS2 application. I also had to put single quotes around the path because it now contains spaces. How modern!

Additionally, they also changed the names of all the GCC utilities from the ‘riscv-none-elf’ triple to ‘riscv-wch-elf’, so that must be updated in the new makefile, as well.

Now since this very simple example needed no interrupt support, I failed to define a symbol to indicate what I wanted. This is a scenario I hadn’t tested before, because it most definitely does not work. I changed the generated include file by hand from:

.if INTERRUPT_VECTOR_TABLE # use vector table interrupts

to:

.ifdef INTERRUPT_VECTOR_TABLE # use vector table interrupts

and now I have to go back and update the Python script, re-run it and copy the resulting output to the distribution folder. Now I can successfully assemble my little program into a file that has exactly nothing in it. But that’s OK, because that is, after all, what I told it to do.

So like with any other new framework, we have to blink that LED. I’ll start with my own -F4P6 development board and attach a jumper from PA1 to the built-in green LED. It’s set up to be ‘active high’, so writing a 1 to PA1 should turn it on and a zero would turn it off.

But before we can do that, we have to enable the GPIOA peripheral clock and then configure PA1 as an output. Neither of those things are the way we want them to be when the chip first wakes up.

In RISC-V assembly language, to write to a memory location, you first have to load the address into one of the registers and then the value that you want to write into another register. That is, unless you want to write a zero, and you can just use the “zero register”, x0, which already has a zero in it. But we want to write a 1, so we’ll use something else.

I won’t bore you with yet another blinking LED code example, but here’s a fun snippet for about a 1/2 second delay, if your clock is running at ~8 MHz, as the CH32V003 does if not otherwise configured:

    li a2, 1000000
1:  addi a2, a2, -1 # decrement a2
    bnez a2, 1b

This sample uses register a2, one of the ‘function argument’ registers, to count down from one million. It could be any other register available on the RV32EC platform. Don’t try, like I did, to use registers like s2, as they are not present here and will just have the chip restart unless you have a HardFault handler set up to catch them.
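
Working backward from the "about 1/2 second" figure: one million passes taking half a second at 8 MHz implies roughly four clock cycles per pass through the two-instruction loop. That is an inference from the numbers above, not a figure from the reference manual. A small C helper (hypothetical, for desktop arithmetic only) makes it easy to redo the estimate for other clock speeds:

```c
#include <stdint.h>

/* Hypothetical helper: estimated busy-loop time in milliseconds.
   cycles_per_pass is an assumption, not a measured value. */
static uint32_t loop_delay_ms(uint32_t passes, uint32_t cycles_per_pass,
                              uint32_t hclk_hz)
{
    uint64_t cycles = (uint64_t)passes * cycles_per_pass; /* avoid 32 bit overflow */
    return (uint32_t)(cycles * 1000u / hclk_hz);
}
```

With four cycles per pass, one million passes is 500 ms at 8 MHz, and the same loop would finish in about 83 ms once the clock is raised to 48 MHz. That is one reason to prefer a timer-based delay over a counted loop.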

I will share with you some working code that spits out a 12 MHz square wave on PC6. It really won’t help you much without the not-supplied header file, but you’ll see some of what I’ve been talking about:

# filename:  F4-WS2812B-SPI-asm.S
# Diagnostic for WS2812B via SPI
# 5 March 2025 - Dale Wheat

.include "CH32V003.inc"

.global start
start:

# set up system clock for HSI * 2 via PLL = 48 MHz

    la a0, RCC_BASE
    lw a1, RCC_CTLR(a0)
    li a2, RCC_PLLON
    or a1, a1, a2
    sw a1, RCC_CTLR(a0) # enable PLL

    li a1, RCC_SW_PLL
    sw a1, RCC_CFGR0(a0) # select PLL as system clock

# enable required peripheral clocks

    li a1, RCC_SPI1EN | RCC_IOPCEN | RCC_IOPAEN
    sw a1, RCC_APB2PCENR(a0)

# initialize GPIO

    # GPIOA

    #   PA1 - LED, active high

    la a0, GPIOA_BASE
    li a1, 0x88888828
    sw a1, GPIO_CFGLR(a0)

    sw zero, GPIO_OUTDR(a0) # LED off

    # GPIOC

    #   PC6 - SPI data out

    la a0, GPIOC_BASE
    li a1, 0x89888888
    sw a1, GPIO_CFGLR(a0)

# initialize SPI

    la a0, SPI1_BASE
    li a1, SPI_BIDIMODE | SPI_BIDIOE | SPI_SSM | SPI_SSI # 0xC300
    sw a1, SPI_CTLR1(a0)
    ori a1, a1, SPI_SPE | SPI_MSTR # 0xC344 enable SPI1 as coordinator
    sw a1, SPI_CTLR1(a0)

# set up for blinking LED, sending SPI data

    la a0, GPIOA_BASE
    li a1, (1 << 1) # PA1

    la a3, SPI1_BASE
    li a4, 0x55

# endless loop

main:

    sw a1, GPIO_OUTDR(a0) # LED on

    sw zero, GPIO_OUTDR(a0) # LED off

1:  lb a5, SPI_STATR(a3) # read SPI status register
    andi a5, a5, SPI_TXE # check for transmit register empty
    beqz a5, 1b # wait for TXE to be set

    sw a4, SPI_DATAR(a3) # SPI data out

    j main # endlessly looping

# F4-WS2812B-SPI-asm.S [end-of-file]

There’s a trick to setting up the SPI peripheral. You have to configure all the communications parameters first, then and only then enable the ‘MSTR’ and ‘SPE’ bits in the control register. Otherwise, it just doesn’t work.
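The same two-step ordering can be sketched in C. This is only an illustration: the bit positions below are chosen to agree with the 0xC300 and 0xC344 comments in the listing, the helper name is hypothetical, and the pointer can be aimed at an ordinary variable to try it on a host.

```c
#include <assert.h>
#include <stdint.h>

/* CTLR1 bit positions, chosen to match the 0xC300 / 0xC344 comments in the listing */
#define SPI_MSTR     (1u << 2)   /* coordinator mode */
#define SPI_SPE      (1u << 6)   /* SPI enable */
#define SPI_SSI      (1u << 8)   /* internal NSS level */
#define SPI_SSM      (1u << 9)   /* software NSS management */
#define SPI_BIDIOE   (1u << 14)  /* output enable in one-wire mode */
#define SPI_BIDIMODE (1u << 15)  /* one-wire bidirectional mode */

/* Hypothetical helper: performs the two writes in the required order.
 * ctlr1 points at the SPI control register (or a stand-in variable on a host). */
static void spi_init_two_step(volatile uint16_t *ctlr1)
{
    assert(ctlr1 != 0);
    uint16_t cfg = (uint16_t)(SPI_BIDIMODE | SPI_BIDIOE | SPI_SSM | SPI_SSI);
    *ctlr1 = cfg;                                  /* step 1: parameters only (0xC300) */
    *ctlr1 = (uint16_t)(cfg | SPI_SPE | SPI_MSTR); /* step 2: now enable (0xC344) */
}
```

Keeping both writes in one function makes it harder to enable the peripheral before its parameters are in place.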

Now the funny thing is that this code executes flawlessly on both the -F4P6 and -A4M6 packages. I’m going to let it run overnight on the -F4U6 prototype and see if it has managed to hang up at any point. As you can see, there’s no timeout checking or restarting of the peripheral should a timeout occur.

CH32V003 driving WS2812B LEDs with SPI – Part 7

4 March 2025

Today I see that my overnight testing of the -F4P6 packaged CH32V003 chips shows no errors for over 144 million loops on the WCH board and over 91 million loops for my own development board. While I was able to induce some “power glitches” in my own board (enhanced wiggling), the WCH board proves to be more robust.

Part of me thinks that this issue of the SPI locking up has to be a software issue. But how could that be? Am I not running the exact same software across all the various chips?

Well, maybe I am and maybe I’m not. If the factory can program in a different “device identifier” in the Vendor Bytes area of the flash memory, depending on package type, could they also change some other memory contents? There’s a boot loader in there, somewhere, and who knows what else.

A quick search for “0x1FFF” through the code created by the MRS2 new project minion reveals some interesting items. This is the first part of the address for the “System FLASH” section of the memory map.

The device description header file, ch32v00x.h, contains these #define’d values:

#define OB_BASE          ((uint32_t)0x1FFFF800)    /* Flash Option Bytes base address */
#define VENDOR_CFG0_BASE ((uint32_t)0x1FFFF7D4)

We also see the reference to address 0x1FFFF7C4 in the DBGMCU_Get…ID() functions mentioned previously.

A surprising find is the GPIO_IPD_Unused() function in the /Peripheral/src/ch32v00x_gpio.c file. It specifically configures GPIOC and GPIOD’s unused pins as inputs with pull-up resistors, but only for the less-than-twenty-pin packages, the -A4M6 and -J4M6. I can’t find where this function is actually being called in the supplied source code, but that doesn’t mean that it’s not being called from a pre-compiled library.

So what exactly is “VENDOR_CFG0_BASE” used for? It’s only referenced by the very next line in the device definition file, like this:

#define CFG0_PLL_TRIM (VENDOR_CFG0_BASE)

This address is referenced by the RCC_SYSCLKConfig() and SetSysClockTo_48MHZ_HSI() functions, which I assume help to trim the HSI when it is being used as the system clock.

So there’s another “magic number” being stored in the flash, as set up by the factory.

But what about those boot loaders? Let’s grab all those and hold them up to the light.

To get the contents of a memory region from inside the chip and into a file for our examination, I will use the ‘wlink’ utility, available from:

https://github.com/ch32-rs/wlink

It’s a Rust application that speaks directly to the WCH-LinkE and similar devices. The command to read memory contents is called ‘dump’, and the specific syntax to capture the boot loader area of the memory is:

wlink dump 0x1FFFF000 1920 --out bootloader_xxx.bin

where bootloader_xxx.bin will be renamed for each of the four samples we seek.

For the -F4P6 version, the wlink utility responded with:

00:04:04 [INFO] Connected to WCH-Link v2.15(v35) (WCH-LinkE-CH32V305)
00:04:04 [INFO] Attached chip: CH32V003 [CH32V003F4P6] (ChipID: 0x00300500)
00:04:04 [INFO] Read memory from 0x1ffff000 to 0x1ffff780
00:04:04 [INFO] 1920 bytes written to file bootloader_CH32V003F4P6.bin

I’m not 100% sure what the time-stamp at the beginning of each line means, but in any case it happened pretty quickly. Interestingly, the utility did not disturb the little chip too much in the performance of its duties, as it kept right along, and didn’t lose count of its statistics.

Now for the CH32V003A4M6 variant:

00:09:26 [INFO] Connected to WCH-Link v2.15(v35) (WCH-LinkE-CH32V305)
00:09:26 [INFO] Attached chip: CH32V003 [CH32V003A4M6] (ChipID: 0x00320500)
00:09:26 [INFO] Read memory from 0x1ffff000 to 0x1ffff780
00:09:26 [INFO] 1920 bytes written to file bootloader_CH32V003A4P6.bin

These two files are identical. Let’s gather more data. The -J4M6 variant produces this message:

00:14:50 [INFO] Connected to WCH-Link v2.15(v35) (WCH-LinkE-CH32V305)
00:14:50 [INFO] Attached chip: CH32V003 [CH32V003J4M6] (ChipID: 0x00330500)
00:14:50 [INFO] Read memory from 0x1ffff000 to 0x1ffff780
00:14:50 [INFO] 1920 bytes written to file bootloader_CH32V003J4P6.bin

It’s also the same file. Only one more candidate to investigate: the original troublemaker, the CH32V003F4U6:

00:18:21 [INFO] Connected to WCH-Link v2.15(v35) (WCH-LinkE-CH32V305)
00:18:21 [INFO] Attached chip: CH32V003 [CH32V003F4U6] (ChipID: 0x00310510)
00:18:21 [INFO] Read memory from 0x1ffff000 to 0x1ffff780
00:18:21 [INFO] 1920 bytes written to file bootloader_CH32V003F4U6.bin

All boot loaders are identical. Or at least all the boot loaders in the chips before me are identical. It’s also nice to have the ChipID recorded for each of these samples, as well.

Now we should look at the “Vendor Bytes” section of flash, which is 64 bytes long and starts at address 0x1FFFF7C0. The wlink command line would look like this:

wlink dump 0x1FFFF7C0 64 --out vendor_bytes_CH32V003F4P6.bin

As expected, the files differ. The ‘diff’ utility confirms this quite tersely, without going into much detail:

Binary files vendor_bytes_CH32V003F4P6.bin and vendor_bytes_CH32V003A4P6.bin differ

Well, yeah. Let’s gather the rest of the ‘Vendor Bytes’ images from the remaining chips.

Here’s what’s in the ‘Vendor Bytes’ section of the -F4P6’s flash memory:

00000000 34 FE 78 DC 00 05 30 00 09 18 2A 13 03 5A 00 00
00000010 FF FF FF FF FF FF FF FF 00 00 00 00 05 FA AA 55
00000020 10 00 FF FF FF FF FF FF CD AB B3 A5 49 BC C9 0D
00000030 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF

We can see the ChipID word at offset 0x04 (address 0x1FFFF7C4), stored in “little endian” fashion as “00 05 30 00”, or 0x00300500 in hexadecimal. I’m guessing that the ‘FF’ data is not being used at this time. That’s the “erased” state of this type of memory.
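To make the byte ordering concrete, here is a minimal C sketch that reassembles that word from the first row of the dump above. The helper name read_le32 is invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First 16 bytes of the -F4P6 Vendor Bytes dump shown above */
static const uint8_t vendor_bytes_f4p6[16] = {
    0x34, 0xFE, 0x78, 0xDC, 0x00, 0x05, 0x30, 0x00,
    0x09, 0x18, 0x2A, 0x13, 0x03, 0x5A, 0x00, 0x00
};

/* Hypothetical helper: assemble a little-endian 32-bit word from a byte buffer.
 * The lowest-addressed byte becomes the least significant byte of the result. */
static uint32_t read_le32(const uint8_t *buf, size_t len, size_t offset)
{
    assert(buf != NULL && offset + 4 <= len);
    return (uint32_t)buf[offset]
         | ((uint32_t)buf[offset + 1] << 8)
         | ((uint32_t)buf[offset + 2] << 16)
         | ((uint32_t)buf[offset + 3] << 24);
}
```

Calling read_le32(vendor_bytes_f4p6, sizeof vendor_bytes_f4p6, 4) reproduces the 0x00300500 that wlink reported for this chip.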

Now let’s compare that to the -A4M6 data:

00000000 34 FE 78 DC 00 05 32 00 09 18 3F 13 03 5A 00 00
00000010 FF FF FF FF FF FF FF FF 00 00 00 00 05 FA AA 55
00000020 10 00 FF FF FF FF FF FF CD AB D8 A8 05 BC AA 10
00000030 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF

The ChipID is different, of course, being 0x00320500 for the -A4M6 package. There’s also a change at offset 0x0A and several starting at offset 0x2A. Those might be the HSI calibration data, but I’m totally speculating here. Of course, there’s a way to find out, but let’s continue with this particular exercise, shall we?
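Since ‘diff’ only says that the files differ, a byte-by-byte comparison is more useful for spotting where. The sketch below is one way to do it in C, using the two dumps above as data. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two 64-byte Vendor Bytes dumps shown above */
static const uint8_t vendor_f4p6[64] = {
    0x34,0xFE,0x78,0xDC,0x00,0x05,0x30,0x00,0x09,0x18,0x2A,0x13,0x03,0x5A,0x00,0x00,
    0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0x00,0x00,0x00,0x00,0x05,0xFA,0xAA,0x55,
    0x10,0x00,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xCD,0xAB,0xB3,0xA5,0x49,0xBC,0xC9,0x0D,
    0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF
};
static const uint8_t vendor_a4m6[64] = {
    0x34,0xFE,0x78,0xDC,0x00,0x05,0x32,0x00,0x09,0x18,0x3F,0x13,0x03,0x5A,0x00,0x00,
    0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0x00,0x00,0x00,0x00,0x05,0xFA,0xAA,0x55,
    0x10,0x00,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xCD,0xAB,0xD8,0xA8,0x05,0xBC,0xAA,0x10,
    0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF
};

/* Hypothetical helper: record the offsets where two equal-length buffers differ.
 * Returns the number of differing bytes; at most max_out offsets are stored. */
static size_t diff_offsets(const uint8_t *a, const uint8_t *b, size_t len,
                           size_t *out, size_t max_out)
{
    assert(a != NULL && b != NULL);
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            if (out != NULL && count < max_out) {
                out[count] = i;
            }
            count++;
        }
    }
    return count;
}
```

Run against these two images it finds seven differing bytes, and the same function works unchanged on the other pairs of dumps.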

The -J4M6 SOP8 package has this data:

00000000 34 FE 78 DC 00 05 32 00 0A 18 4C 13 03 5A 00 00
00000010 FF FF FF FF FF FF FF FF 00 00 00 00 05 FA AA 55
00000020 10 00 FF FF FF FF FF FF CD AB DB 87 59 BC 01 F0
00000030 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF

This shows the same pattern of similarities and differences.

Lastly, here is the CH32V003F4U6 data:

00000000 34 FE 78 DC 10 05 31 00 09 18 3C 13 03 5A 00 00
00000010 FF FF FF FF 0E 00 00 00 FF FF FF FF 05 FA AA 55
00000020 10 00 FF FF FF FF FF FF CD AB EA 1A F0 BC A7 83
00000030 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF

There’s something different here, starting at offsets 0x14 and 0x18. Would you like to guess? That’s all I can do at the moment.

The RM, p. 169, Table 15-1 “ESIG-related registers list”, does tell us about some of the fields in this range:

Address    Offset Name
---------- ------ -----------------------
0x1FFFF7E0 0x20   Flash capacity register
0x1FFFF7E8 0x28   UID register 1
0x1FFFF7EC 0x2C   UID register 2
0x1FFFF7F0 0x30   UID register 3

It’s interesting to me that in every case, the “unique identification code”, which is specified as 96 bits long, has all ones (FF FF FF FF) as the upper 32 bits, yielding “only” 64 bits of ID.
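Here is a small C sketch that pulls those fields out of the second half of the -F4P6 dump, using the offsets from the table above. Treating the flash capacity register as a 16-bit count of kilobytes is my assumption, and the struct and function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes 0x20..0x3F of the -F4P6 Vendor Bytes dump (addresses 0x1FFFF7E0..0x1FFFF7FF) */
static const uint8_t esig_f4p6[32] = {
    0x10,0x00,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xCD,0xAB,0xB3,0xA5,0x49,0xBC,0xC9,0x0D,
    0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF,0xFF
};

typedef struct {
    uint16_t flash_capacity; /* assumed to be a 16-bit value in KB */
    uint32_t uid[3];         /* UID registers 1..3 */
} esig_t;

/* Assemble a little-endian 32-bit word from four consecutive bytes */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical decoder: block must hold the 32 bytes starting at 0x1FFFF7E0 */
static esig_t esig_decode(const uint8_t *block)
{
    assert(block != 0);
    esig_t e;
    e.flash_capacity = (uint16_t)(block[0x00] | (block[0x01] << 8));
    e.uid[0] = le32(block + 0x08); /* 0x1FFFF7E8 */
    e.uid[1] = le32(block + 0x0C); /* 0x1FFFF7EC */
    e.uid[2] = le32(block + 0x10); /* 0x1FFFF7F0 */
    return e;
}
```

For this chip the decoded capacity is 16, which lines up with the 16 KB of flash on the -003, and the third UID word comes back as 0xFFFFFFFF.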

Another way to test this “same or different software” question is to write my own bare-metal code for this testing, instead of relying on the WCH-supplied SDK. I have been exploring a couple of alternatives already, one for C language programming and another for RISC-V assembly language use. I’ll describe these in more detail for you tomorrow.

CH32V003 driving WS2812B LEDs with SPI – Part 6

3 March 2025

After building and testing what I consider to be a more robust prototype for our experiments, I was dismayed to find that it misbehaved in precisely the same way. So today we are back to the “gold standard” of the WCH CH32V003 development board. Let’s see how well it does on a longer-term run, say 100,000,000 loops.

Now this brings up an interesting question: Does the code need to change for different packages of the -003 chip? The WCH dev board has a -F4P6 TSSOP20 20 pin package, which is different from the -F4U6 QFN20 package on the dev board I designed. I also have a different dev board of my own design that uses the -F4P6 TSSOP20 package, as well as a -A4M6 SOP16 version. I suppose I should haul those out and see what happens. Unfortunately, the -J4M6 SOP8 of “10 cents!” fame does not bring out the PC6 pin, so it can’t help us in this particular investigation.

Or can it? I’m not actually connecting any WS2812B LEDs in these latest tests. The code fails all by itself. The PC6 pin and associated circuitry is still probably in there, somewhere. Unlike my previous experiment to uncover the hidden pins of GPIOA besides PA1 and PA2, there’s no reason to think the chips in different packages are in any way different themselves. The ‘identifying’ codes are programmed into the flash memory at the factory.

Well, what do you know? The -J4 parts fail just like the other ones: randomly and often enough to be problematic.

One thing I can check in the MRS boiler-plate project code is what, if anything, it does with the “Device” information in the project properties settings, General -> Device.

Which, necessarily, brings up the question of whether any -003 device can determine, through code only, what package it is in? Well, I found the answer to that question first, while looking for the answer to the previous question. Weird, but I’ll take it.

Almost ironically, it’s been giving me this information this whole time. What I misunderstood to be the “unique chip ID” code printed out at the beginning of the example application as “ChipID” was in reality the device revision identifier “003” and package identifier, per this list:

Package         ID
------------    ----------
CH32V003F4P6    0x003005x0
CH32V003F4U6    0x003105x0
CH32V003A4M6    0x003205x0
CH32V003J4M6    0x003305x0

This information is retrieved from memory location 0x1FFFF7C4, which falls in the “Vendor Bytes” area, per RM p. 3.

The code in /Peripheral/src/ch32v00x_dbgmcu.c contains three functions that return some or all of this data:

DBGMCU_GetCHIPID()  returns the entire 32 bits as the "chip identifier"
DBGMCU_GetREVID()   returns the upper 16 bits as the "revision identifier"
DBGMCU_GetDEVID()   returns the lower 16 bits as the "device identifier"

However, this does not match the mapping given in the “ChipID List” comment of the DBGMCU_GetCHIPID() function source code. If correct, it would mean each different package was a different chip revision, and that seems unlikely.

But how am I to pursue these interesting and important questions if I have the whole system running an extended diagnostic? That’s right! Set up an entirely new system! So that only took an unreasonable amount of time, involving putting away the not-playing-with-them-right-now toys and making a nice spot adjacent to my desk to run the extended diagnostic. And almost 2,000,000 loops into it, I’m seeing zero errors or glitches, despite my vigorous wiggling of the cables and other apparatus. Go, go, gadget WCH board! You’re the best!

And immediately upon setting it up, my -A4 dev board exhibits the “behavior”. Well, we’re not here to figure out that particular problem at the moment. I’m only wanting to explore the chip’s internal identifiers and see what I can do with that information.

The console prologue gives me this: “ChipID:00320500”. Now that’s based on the DBGMCU_GetCHIPID() function provided by the manufacturer’s SDK. Remember, that returns the entirety of the “chip identifier” information burned into the chip’s flash memory at the factory.

Let’s see what the other two functions actually return. First, the DBGMCU_GetREVID() function, which returns 0x0032. Next, the DBGMCU_GetDEVID() function, which returns 0x0500.

So it looks like bits 16-19 of the identifier word specify the package. In this case, it’s the -A4, just like the code comments indicated it would be. Note that I haven’t got all the other data points from other packages yet, but it’s a good start.
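A minimal C sketch of that lookup follows, using only the four package codes from the list above. The function name is hypothetical, and anything outside those four codes is simply reported as unknown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical lookup: map bits 16-19 of the ChipID word to a package name,
 * per the Package/ID list earlier in this article. */
static const char *ch32v003_package(uint32_t chip_id)
{
    switch ((chip_id >> 16) & 0xFu) {
    case 0x0: return "CH32V003F4P6";
    case 0x1: return "CH32V003F4U6";
    case 0x2: return "CH32V003A4M6";
    case 0x3: return "CH32V003J4M6";
    default:  return "unknown";
    }
}

/* Quick check against the two ChipID values seen on the console so far */
static void package_demo(void)
{
    assert(strcmp(ch32v003_package(0x00320500u), "CH32V003A4M6") == 0);
    assert(strcmp(ch32v003_package(0x00310510u), "CH32V003F4U6") == 0);
}
```

Masking just the four package bits means the ‘x’ digit in the lower half of the word (0 on some chips, 1 on others) does not affect the result.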

Re-attaching the original problem board, my little breadboard-based exploratory vehicle, we get this: “ChipID:00310510”, which corresponds to the F4U6 package. This is correct. We also get a 0x0510, where the x in the source code comment is a ‘1’ in this instance.

Now when I’m setting up a -J4 in the SOP8 package for testing, I am reminded that I have to remap the USART1 TX and RX pins due to the very limited number of available pins. But do I want a special version of the software just for J4 packages?

Now that I can just ask the chip itself what sort of package it has, I don’t need to. I can just test for the one exception and do the pin swap then.

The WCH-supplied SDK already has provision for remapping the USART pins in the example application. I just added a test in the debug.c code to re-#define the DEBUG variable:

// swap USART1 TX & RX pins if it's a J4 SOP8 package

volatile uint16_t dev_id;
dev_id = DBGMCU_GetREVID();

if(dev_id == 0x0033) {
    #undef DEBUG
    #define DEBUG DEBUG_UART1_Remap2
}

I also commented-out the call to the USARTx_CFG() function, as it goes and overrides the USART settings, in this one case incorrectly.

All this confirms that the -J4 packaged devices return 0x0033 from the DBGMCU_GetREVID() function.

Now I’ve built up yet another -F4P6-based development board, and it reports 0x0030 and 0x0500, as it should.

Additionally, I have just now discovered that my ever-so-clever device-check to remap only -J4 devices does not work at all, or rather it works all the time and declares every chip a -J4. That’s because the compiler sees the “#define” and does it at compile time, not at run time.
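One way around this is to carry the decision in an ordinary value that is computed while the program runs, and leave the preprocessor out of it. The C sketch below shows the idea only; the enum and function names are stand-ins I made up, not the SDK’s own DEBUG_UART1_* machinery.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the SDK's pin-mapping choices; these names are hypothetical */
typedef enum {
    DEBUG_PINS_DEFAULT, /* normal USART1 TX/RX pins */
    DEBUG_PINS_REMAP2   /* remapped pins for the -J4 SOP8 package */
} debug_pins_t;

#define REVID_J4M6 0x0033u /* upper 16 bits of the ChipID on a -J4M6 */

/* Decide at run time, from the value read out of the chip, which mapping to use.
 * Both outcomes are compiled in; the comparison picks one when the code executes. */
static debug_pins_t select_debug_pins(uint16_t rev_id)
{
    return (rev_id == REVID_J4M6) ? DEBUG_PINS_REMAP2 : DEBUG_PINS_DEFAULT;
}

/* Quick check with the values reported by the -J4 and -F4P6 boards */
static void select_demo(void)
{
    assert(select_debug_pins(0x0033u) == DEBUG_PINS_REMAP2);
    assert(select_debug_pins(0x0030u) == DEBUG_PINS_DEFAULT);
}
```

The USART setup code would then switch on the returned value instead of on a macro, so a single binary could serve every package.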

Now I’ve been able to cause power glitches on this new board, but no timeout errors yet. Again, I’ll have to leave it running for a while, pretend not to be looking at it and get up and sit down… you know, all the standard and accepted ways of making it misbehave.

Other than in the main() function, where the ChipID is reported at the beginning of the program run, I can’t find any other reference to this function being called anywhere. So at this point it looks like the supplied code does not execute differently if a different package is being used.

This doesn’t help me explain why the CH32V003F4P6 packaged devices work perfectly, but every other variant fails consistently. Your thoughts?

CH32V003 driving WS2812B LEDs with SPI – Part 5

2 March 2025

The test apparatus doesn’t like it when I get up or sit down in front of it. This adds more evidence to the theory that the problem is an intermittent connection, and random vibration from the environment is causing something to conduct either better or worse than it was. There were over 100,000,000 loops and only 1,355 errors overnight, but there was a run of errors just as I came in to view. Should I take this personally?

So I decidedly wish to rebuild a more substantial test platform, but part of me wants to understand exactly what is going wrong with the present system. One of the many un-followed-up-on trouble-shooting ideas was to make the power supply monitor generate an interrupt, as only occasionally glancing at the status bit in code has not revealed a correlation between the failures and the power status.

Adding an interrupt routine to an MRS2 project is not difficult, as most of the required coding gymnastics have already been performed for us. The PVD interrupt is only a little more involved, as it is routed through the external interrupt controller, EXTI.

Here is the code to enable voltage monitoring:

// initialize power monitoring

RCC_APB1PeriphClockCmd(RCC_APB1Periph_PWR, ENABLE); // enable peripheral clock
//PWR_DeInit(); // reset peripheral - hope it doesn't brick the chip! (it does)
//PWR_PVDLevelConfig(PWR_PVDLevel_2V9); // lowest voltage monitoring
PWR_PVDLevelConfig(PWR_PVDLevel_4V4); // highest voltage monitoring

PWR_PVDCmd(ENABLE); // enable programmable voltage detector
//Delay_Ms(100); // short delay for voltage detector to "warm up"
printf("Power is %s\r\n", PWR_GetFlagStatus(PWR_FLAG_PVDO) == SET ? "*** LOW ***" : "OK");

EXTI_InitTypeDef EXTI_InitStruct = { 0 };
EXTI_StructInit(&EXTI_InitStruct); // set default values
EXTI_InitStruct.EXTI_Line = EXTI_Line8; // PVD is connected to EXTI8
EXTI_InitStruct.EXTI_Mode = EXTI_Mode_Interrupt;
EXTI_InitStruct.EXTI_Trigger = EXTI_Trigger_Rising; // rising edge on PVD means voltage is dropping out of specified range
EXTI_InitStruct.EXTI_LineCmd = ENABLE;
EXTI_Init(&EXTI_InitStruct); // initialize EXTI8/PVD

NVIC_EnableIRQ(PVD_IRQn);

And here is the simple interrupt handler I wrote to catch those pesky power glitches:

void PVD_IRQHandler(void) __attribute__((interrupt("WCH-Interrupt-fast")));
void PVD_IRQHandler(void) { // programmable voltage detector interrupt handler

    // the supply voltage has dropped below 4.4 VDC

    power_glitch++; // count this power glitch

    EXTI_ClearITPendingBit(EXTI_Line8); // clear interrupt pending bit

    printf("*** POWER GLITCH! ***\r\n");

    //while(true); // *** debug *** stop here for now

    while(PWR_GetFlagStatus(PWR_FLAG_PVDO) == SET) {
        // wait for power to return to normal values, i.e., > 4.4 VDC
    }
}

I originally added a while(true); loop in the interrupt handler to stop and let me see when a power glitch was detected, and sure enough, I was rewarded very quickly. So the power is dipping down enough to confuse the SPI peripheral but not actually reset the core. This is not as surprising as it sounds, as the core power-up and power-down reset voltage levels are set at 2.5V. We’re losing some voltage, somewhere, for only a moment, but not enough to trigger a full system reset.

Again, to me it seems that all this points to a low-quality connection somewhere in the mix. It’s time to build that improved test fixture.

CH32V003 driving WS2812B LEDs with SPI – Part 4

1 March 2025

I am both displeased and a little disappointed to find my experiment still running today, with over 40,000 successful loops and zero errors.

In order to help convince myself that this is not just a statistically unlikely run of “good luck”, I will eliminate the delays between color changes that I had inserted into the code to make the color changes more obvious to us slow(er)-brained humans.

Once upon a time, a long, long time ago, I worked on a project that needed to work perfectly every time, and be able to check that it had worked perfectly every time. We were running millions of encryption and decryption cycles on blocks of data and were seeing some very rare cases of mistakes creeping in. We were able to catch the mistakes, but were ever so curious as to what was causing them. Sound familiar? So I set up a test to run over the weekend and let the test machine run full speed ahead. Returning on Monday, we found it had caught five errors in just over 50,000,000 transactions. Unacceptable!

We contacted the manufacturer of one of the critical components of the device and were told that, because of the way we had configured the circuit, the component could encounter a “race condition” and double-clock itself, if a particular signal arrived within a one nanosecond window that varied plus or minus five nanoseconds over temperature. That’s a tiny window! But we were hitting it on a reproducible scale.

The solution in that case was to clock the device synchronously by providing our own clock signal to the chip instead of depending on its internal clock. That way all the transactions would have been rigidly in step and not exploring the spaces of all possible timing combinations. Unfortunately, we had already committed to a PCB design and were on the verge of production when the whole outfit went south. Printed circuit board design and production were things to be Taken Seriously back in the day.

And by “went south” I mean the owner cleaned out the bank account and disappeared, literally leaving us at the office saying, “Stay here and I’ll go get your paychecks from the bank myself.”

But away from past mistakes and back to present mistakes. Over five million loops with no errors seems to indicate to me that the code, when properly enhardwared, works as designed. Now I need to run it up on the lift and swap the original circuit back in and see if we can continue to reproduce the error states we were previously seeing.

So at first it looked like the impossible was happening: everything now worked and yet I had changed nothing. But patience won by asking me to take a break and come back in a few minutes. When I did, I saw, just as I was sitting down, a run of errors being logged on the console.

So the next variation on the testing got underway. I wanted to try disconnecting the SPI output from the PA2 line and drive a different WS2812B externally to the board. I set about finding another suitable LED module and building another little test cable for it. Again, when I sat back down at the desk, another run of errors was simultaneously occurring. What are the odds?, one might ask.

A judicious tap-tap-tapping on the little breadboard circuit rewarded me with my answer: 100% guaranteed to fail when vigorously agitated. An intermittent connection is somehow to blame for all this mess.

Here is my list of possible candidates for where the issue lies, in decreasing order of probability (in my mind):

1.  Janky test cables made from exceedingly economical jumper wires
2.  Interconnects in the no-name solderless breadboard hosting the circuit
3.  My soldering of the header pins to the PCB
4.  Manufacturing error or tolerances in the board itself

Before I completely disassemble this prototype and build it up again in a more resilient form factor, I will go ahead and try the new LED module. Same problem, as errors continue to be encountered, even with the PA2-PC6 bridge disconnected. Disconnecting the new LED module completely, while at the same time not re-connecting the onboard LED still encounters errors. I really thought it would have no measurable effect, and I seem to be right this one time.

Usually in these situations, the first thing I look for is some sort of power interruption or brown-out condition on the power supply. The reason I don’t think this is the culprit is because the chip does not seem to be resetting itself when these errors occur, as both the loop and error counts seem to persist across these error states. Additionally, I feel that both the CH32V003 and the WS2812B are correctly and adequately decoupled using their respective manufacturers’ suggested values of capacitors.

Now one thing I have not yet done is to activate the chip’s inbuilt power monitoring circuitry. Perhaps that could tell me if there are sufficient variations in the chip’s internal power distribution occurring that could cause individual peripherals to misbehave without triggering a complete system reset.

Reading about the power control peripheral, I see that it can monitor the system voltage and potentially trigger an interrupt if certain parameters are exceeded. Using the SDK, I see the first reasonable thing to do is to ‘de-initialize’ the peripheral using the PWR_DeInit() function, contained in the /Peripheral/inc/ch32v00x_pwr.h and /Peripheral/src/ch32v00x_pwr.c files. Here’s what the function does:

RCC_APB1PeriphResetCmd(RCC_APB1Periph_PWR, ENABLE);
RCC_APB1PeriphResetCmd(RCC_APB1Periph_PWR, DISABLE);

But… wait a minute. Isn’t that the “brick myself so hard” sequence I found earlier? Let’s find out!

And the answer is… yes. Yes, it does. Recovery consists of the following steps.

In the MRS2 IDE, go to the Flash -> Download Configuration menu option.
In the “Download Parameters” panel, go to the bottom and check (enable) these options:

1.  Turn off WCH-Link Power Output
2.  Clear CodeFlash by Power-Off
3.  Disable MCU Code-Protect

Now take that PWR_DeInit() function call out of your program! Your program should download properly and run again.

Now having configured and enabled the “programmable voltage detector” circuit (and don’t forget to enable the PWR peripheral’s clock, like I did!), I see that the chip thinks its supply voltage is just fine. I set it to the highest voltage, ~4.4 VDC, and actually measured 4.75 VDC at the board. The chip is rated to run at full speed all the way down to 2.7 V, or 2.8 V if’n you’re wanting ADC function, so it’s able to detect any voltage anomalies in this manner. Of course, the next step is to make the voltage monitoring an asynchronous process and have it trigger an interrupt, but we both know it’s my wiring.

I’ll wire up a more robust test fixture on the morrow.

CH32V003 driving WS2812B LEDs with SPI – Part 3

28 February 2025

So running overnight, there were over 40,000 loops of the blinking demonstration program, and it was still going. What I shoulda but didna do was add in an error counter that updated with each loop. There’s still time!

Moving the experimental apparatus over to the “official” WCH CH32V003 development board was simple enough. I built another programming cable for power, programming and serial communications, as well as a little cable for another WS2812B module I had in the WS2812B bucket. Building bespoke, modular cables for these little devices takes a little bit of time but saves so much more time in their subsequent reuse.

And it works! Well, I expected it to work, at least as well as it was working previously, which was “mostly”. But I really do need to add that error counter to the program so that if it’s not immediately going to fail, I can leave it running overnight and see what happened in the morning.

The error count is being kept (I strongly suspect) and it is being printed alongside each loop message. It has gone through several hundred loops by now and no errors have occurred.

Using a “known good device” is a proven trouble-shooting stratagem, when that is possible. It is, however, not an “apples to apples” comparison, at least the way I have it set up. The WCH board actually does have a 24 MHz quartz crystal mounted on it, even though my code is still telling the clock control unit to use the 24 MHz HSI oscillator, turbo’d up to 48 MHz by the magic of a phase-locked loop. The WCH board hosts a CH32V003F4P6, which is a 20 pin TSSOP20 package (thin shrink small outline package), while my little board has the 20 pin QFN20 (quad flat no lead) package. And the WS2812B LED is different, although it’s not clear to me how that could affect the outcome, but I list it for completeness.

I am going to be both displeased and a little disappointed if the code works perfectly on the “known good” board and not on mine. While I have already designed and shipped several different PCB-based products using this family of chips, I’m the first one to admit that I still have a lot to learn. Lay some wisdom on me, little chips!

CH32V003 driving WS2812B LEDs with SPI – Part 2

27 February 2025

I firmly believe that the only reason this article is not finished and already published on my web site (well, for you, Dear Reader, it is, but for me, your Humble Narrator, it is yet to be) is that I listed two goals in my introduction and only achieved the first. In retrospect, and following the also-applicable “one topic per email” rule of writing, I should have edited myself, restated my goal (singular) and been done with it.

But here we are. The first goal, as you recall, was to use the hardware SPI on the chip to create a suitable wave form to drive the WS2812B addressable LED on my little development board. That goal is mostly achieved, in that I have seen it working and verified the signal using an oscilloscope. Mostly, but not completely, as it seems to “hang up” from time to time in a most frustratingly random manner.

The second, and arguably less critical goal was to be able to adjust the apparent brightness of the LED in real time for demonstration purposes. Two completely different things that very well could have been two completely different articles, although I feel that the first goal outweighs the second in value and practicality. We’ll get to that second goal today, I hope.

Another mystery presented itself yesterday and I was tempted to just ignore it, but I think you know “how I am” about these things. When verifying that the SPI output signal was not conflicting with the supposedly high-impedance default state of PA2, I was debugging the program and saw that the GPIO configuration register for GPIOA was set to all zeros. Now the RM explicitly states that the reset value is supposed to be 0x44444444, indicating all eight pins of GPIOA are configured as inputs with no pull-up or pull-down resistors. Being all zeros, or 0x00000000, this represents a configuration of all “analog inputs”, which is a different thing. But this profound mystery will have to wait for its investigation as “randomly hanging up” is not a thing I can tolerate at all.
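The configuration register packs one 4-bit field per pin, eight pins to the 32-bit word, which is why 0x44444444 reads as “every pin set to 4”. The C sketch below shows one way to pick those fields apart and put them back together. The helper names are hypothetical, and the two example values in the usage note are the ones from the assembly listing elsewhere in this series.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the 4-bit configuration field for one pin (0..7) from a CFGLR value */
static uint8_t gpio_cfg_field(uint32_t cfglr, unsigned pin)
{
    assert(pin < 8);
    return (uint8_t)((cfglr >> (pin * 4u)) & 0xFu);
}

/* Return a copy of cfglr with one pin's 4-bit field replaced */
static uint32_t gpio_cfg_set(uint32_t cfglr, unsigned pin, uint8_t field)
{
    assert(pin < 8 && field <= 0xFu);
    unsigned shift = pin * 4u;
    return (cfglr & ~((uint32_t)0xFu << shift)) | ((uint32_t)field << shift);
}
```

With these, gpio_cfg_field(0x44444444, 2) gives 4 for PA2 (floating input) while the all-zeros value gives 0 (analog input). Going the other way, gpio_cfg_set(0x88888888, 1, 0x2) produces the 0x88888828 written to GPIOA in the assembly listing, and gpio_cfg_set(0x88888888, 6, 0x9) produces the 0x89888888 written to GPIOC.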

And by “randomly” I mean very randomly. I happened to notice the LED “not blinking” as it was supposed to be cycling through eight basic combinations of red, green and blue. Then I added a debug message on the console for each pass through the entire loop. As there is a 250 ms delay after each LED combination is set, that means the loop takes two seconds (or so) to complete. I left it running overnight and it stalled at loop #27,719. That means that everything worked splendidly for at least 55,438 seconds, which is more easily comprehended as a bit over 15 hours.

I had left it running under the debugger, so that when (or if) it should misbehave, I would be able to examine its state. I was able to do so, and discovered that it was hanging up at the only place that it possibly could, assuming as I always do that it was my code that was causing the problem. This was in the spi_send() function that first waits for the transmit register to be empty before sending the next byte out the SPI port. And sure enough, the TXE bit of the SPI’s STATR status register is reading a solid zero, meaning that the transmit register is not empty and that more waiting is indicated. Something is amiss here.

Assuming that the SPI is still receiving clock pulses from its prescaler, anything “transmitted” should clock itself out in eight bit times, or roughly 1.333 us. I’m not using any sort of handshaking controls or other possibly interfering mechanisms here.

Now it has hung up after only 31 loops. It’s bad. Really bad.

So at the moment it seems the only logical thing to do in this situation is to add a timeout feature to the spi_send() function. How long to wait before declaring a ‘mayday’ and implementing Directive Omega? We should know within 2 us if there is a problem, given any eight bit byte should transmit completely in eight cycles of the 6 MHz SPI clock, or about 1.333 us. The little chip can only execute at 48 MHz, and even if it were executing one instruction in every clock cycle, that would only be 64 instructions in that time. It’s not, because at system clocks of 24 MHz or over, an additional wait state is introduced for every flash memory access. It’s not entirely clear to me how that maps to the final instructions-per-second equation, but it’s got to be in there somewhere.

So a very safe and humanly undetectable amount of time would be a maximum of 64 iterations of the wait loop. If this were a more time-critical matter, we could enlist the help of the system timer, which is a 32 bit counter that can be clocked by the system clock either directly or after being divided by eight. It is in many ways almost identical to the SysTick peripheral in ARM Cortex devices.

But again, we’re blinking an LED and not landing on the moon or anything of material impact, so ‘close enough’ on this fail-safe device is sufficient.

Now that we’ve calculated a reasonable time frame for the transmit register to report itself empty and ready for new data, what exactly do we do when (not “if”, it seems) this failure occurs?

The only thing that seems to work with things like this is to turn it off and on again. “Have you tried turning it off and on again?” is a classic for a reason. We can just re-initialize the SPI device and just start over again. Just to be safe, it would be prudent to send a protocol reset signal, i.e., a low-level signal of ~50 us, before resuming our attempts to transmit.

I originally coded the SPI initialization code within the main() function, as I had originally only ever intended to execute it once. Now it is its own little function, which I lovingly named spi_init(), which in no way conflicts with the SDK-provided SPI_Init() function.

Well, I almost fell into a trap here. By adding the ‘reset’ function to the end of the recovery procedure, my little function would have been, in effect, calling itself, as the ws2812b_reset() function in turn calls the spi_send() function. Now we’re talking about an exceptional condition here, not something that is guaranteed to happen every time. But the one thing we know about this situation is that we don’t know what is causing it (yet) or why it is happening, much less if or when it will recur.
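To make the shape of that fix concrete, here is a minimal sketch of a bounded wait plus a recovery path that never calls back into the send function. It is not the code I am actually running. The register struct is a stand-in so the sketch can be compiled off-target, and spi_send_checked(), spi_wait_txe(), spi_reinit(), spi_recover() and spi_error_count are hypothetical names. The TXE bit position and the idea of producing the ~50 us reset by clocking out 38 zero bytes are assumptions as well:

```c
#include <stdint.h>
#include <stdbool.h>

/* Stand-in for the two SPI registers used here (not the real register layout). */
typedef struct {
    volatile uint16_t STATR; /* status register */
    volatile uint16_t DATAR; /* data register */
} spi_regs_t;

#define SPI_FLAG_TXE   0x0002u /* assumption: TXE is bit 1 of STATR */
#define SPI_WAIT_LIMIT 64u     /* wait-loop iterations, per the estimate above */
#define RESET_BYTES    38u     /* 38 x 1.333 us of low level is just over 50 us */

unsigned spi_error_count = 0;  /* hypothetical counter for the debug console */

/* Bounded wait: gives up and returns false instead of spinning forever. */
static bool spi_wait_txe(spi_regs_t *spi) {
    for (uint32_t i = 0; i < SPI_WAIT_LIMIT; i++) {
        if (spi->STATR & SPI_FLAG_TXE) return true;
    }
    return false;
}

/* Placeholder: on the real chip this would rerun the SPI setup, i.e. spi_init(). */
static void spi_reinit(spi_regs_t *spi) {
    spi->STATR = SPI_FLAG_TXE; /* simulate a freshly reset peripheral */
}

/* Recovery writes DATAR directly and never calls the send function,
   so it cannot end up calling itself. */
static void spi_recover(spi_regs_t *spi) {
    spi_error_count++;
    spi_reinit(spi);
    for (uint32_t n = 0; n < RESET_BYTES; n++) {
        if (!spi_wait_txe(spi)) return; /* still stuck: abandon this attempt */
        spi->DATAR = 0x00;              /* hold the line low */
    }
}

/* Send one byte; returns false only if the port is still stuck after recovery. */
bool spi_send_checked(spi_regs_t *spi, uint8_t data) {
    if (!spi_wait_txe(spi)) {
        spi_recover(spi);
        if (!spi_wait_txe(spi)) return false;
    }
    spi->DATAR = data;
    return true;
}
```

Whether the byte that triggered the timeout should still go out after the recovery is a judgment call. The LED frame in progress is probably ruined either way, so a caller might prefer to restart the whole frame.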

And now we wait, while the code ‘tests itself’. In the meantime, I’ll describe the original code that I was using to break down the transmission protocol into manageable chunks.

You’ll recall that at the lowest level, we were using the SPI to generate some arbitrary wave forms for us. A short-ish pulse was emitted when we transmitted a 0x60 via the SPI port, and that represented a zero, while a longer-ish pulse was created by shifting out 0x7E, to be interpreted as a one. I wrote a function called ws2812b_bit() which took a single argument, either a zero or something other than a zero and transmitted the appropriate value via the spi_send() function.

Then on top of that, I wrote a function, ws2812b_byte(), to send the eight bits in a byte by sending the MSB of the byte via the ws2812b_bit() function, then shifting the entire byte to the left, so as to move the next most significant bit up to the MSB position. This happened a total of eight times and the single byte was transmitted.

The top layer was a function called ws2812b_rgb() which took three eight-bit values for the red, green and blue components of the signal, and called the ws2812b_byte() function once for each, but in green, red, then blue order, which is the order the WS2812B expects.

The application could use the ws2812b_rgb() function to send out a string of RGB values to a string of LEDs, even a string of only one LED. After all the values had been sent, the ws2812b_reset() function would confirm their election and shift all the transmitted data values to the appropriate departments within each LED and start to display them accordingly.
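Reconstructed from that description, the three layers might look roughly like the sketch below. This is not a verbatim copy of my source. The spi_send() here is a stand-in that just records bytes in a capture buffer so the layering can be checked off-target, and the WS_ZERO and WS_ONE macro names are made up for the example:

```c
#include <stdint.h>
#include <stddef.h>

#define WS_ZERO 0x60u /* short-ish pulse = zero, per the text */
#define WS_ONE  0x7Eu /* longer-ish pulse = one, per the text */

/* Hypothetical capture buffer standing in for the SPI port. */
uint8_t sent[64];
size_t  sent_count = 0;

void spi_send(uint8_t data) {
    if (sent_count < sizeof sent) sent[sent_count++] = data;
}

/* Bottom layer: one LED bit becomes one SPI byte. */
void ws2812b_bit(uint8_t bit) {
    spi_send(bit ? WS_ONE : WS_ZERO);
}

/* Middle layer: eight LED bits, MSB first. */
void ws2812b_byte(uint8_t value) {
    for (int i = 0; i < 8; i++) {
        ws2812b_bit(value & 0x80u);    /* send the current MSB */
        value = (uint8_t)(value << 1); /* move the next bit up to the MSB */
    }
}

/* Top layer: arguments in RGB order, transmitted in GRB order. */
void ws2812b_rgb(uint8_t red, uint8_t green, uint8_t blue) {
    ws2812b_byte(green);
    ws2812b_byte(red);
    ws2812b_byte(blue);
}
```

A single call such as ws2812b_rgb(0x00, 0x80, 0x01) produces 24 SPI bytes: the first is the “one” pattern for the top bit of green, and the last is the “one” pattern for the bottom bit of blue.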

It was totally working and we could have totally gotten away with it, had I not turned the blinding spotlight of the oscilloscope on the signal. The signal was nowhere near running at the throughput I had hoped for. There were biiiig gaps between the individual pulses, and while it still met the ever-so-relaxed requirements of the LED, it was only running at about 250 kHz, and not the 750 kHz theoretical maximum we should have seen, given our SPI clocking constraints.

So I played with about a bazillion combinations of different timing setups, including “unrolling” my functions to eliminate any excessive call overhead, all to no avail. Then I discovered by re-reading the reference manual for the tenth time, that I was relying on the SPI’s ‘busy’ flag instead of the ‘TXE’ flag. You go read the RM and tell me how clear that would have been to you. Here’s what it says about the ‘busy’ flag:

Busy flag. This flag is set and cleared by hardware.
1:SPI is busy in communication or Tx buffer is not empty.
0:SPI (or I2S) not busy.

And here is what it says about the ‘TXE’ flag:

Transmit buffer empty.
1:Tx buffer empty.
0:Tx buffer not empty.
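Read side by side, the two descriptions suggest two different questions you can ask the status register. Here is a minimal sketch of that distinction. The helper names are hypothetical, and the bit positions (TXE as bit 1, BSY as bit 7 of STATR) are my assumption about the register layout:

```c
#include <stdint.h>
#include <stdbool.h>

#define SPI_FLAG_TXE 0x0002u /* assumption: STATR bit 1 */
#define SPI_FLAG_BSY 0x0080u /* assumption: STATR bit 7 */

/* True when the next byte may be written to DATAR, even though the
   previous byte may still be shifting out. Polling this keeps the
   stream gap-free. */
bool spi_can_queue(uint16_t statr) {
    return (statr & SPI_FLAG_TXE) != 0;
}

/* True only when the buffer is empty AND the shifter has finished.
   Polling this before every byte leaves a gap after each one. */
bool spi_is_idle(uint16_t statr) {
    return (statr & SPI_FLAG_TXE) != 0 && (statr & SPI_FLAG_BSY) == 0;
}
```

In the middle of a transfer, with TXE and BSY both set, spi_can_queue() says go while spi_is_idle() says wait.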

Not interchangeable! And now I know. Well, I think I know. Something is still very messed up. Continued testing has revealed multiple failures after only 256 loops. And these are sequential errors, occurring right after the SPI reboot. Sometimes it’s four or five errors, and sometimes it’s more than I can count, as the error messages scroll off the top of the screen.

The good news is that it always, eventually, recovers and starts playing nice again.

As this is my first real exposure to this chip’s SPI hardware, it’s not entirely unreasonable that my expectations and its actual behavior have diverged. But I really think that I am asking the ‘bare minimum’ from this peripheral. It’s not expecting any sort of input at all and we’re not even using the clock signal that it is providing. I just don’t know what else could be causing these randomly-spaced events to occur. Yet.

As a sanity check, I will try this again on the official WCH CH32V003F4 development board, with just a single WS2812B LED attached directly to PC6, without all this also-connected-to-PA2 nonsense, and see if this happens there as well.

CH32V003 driving WS2812B LEDs with SPI – Part 1

26 February 2025

After thinking about the WS2812B driver (if you can call it that) for the CH32V003 chip that I described a few days ago, I determined to make a couple of small improvements:

1.  Use the hardware SPI to deliver a full-speed bit stream to the addressable LED
2.  Be able to adjust the overall brightness of the demo program in real time

I created a new MounRiver Studio 2 (MRS2) project called, imaginatively, “F4-WS2812B-SPI”. This time I adjusted the system clock to the full 48 MHz, but using the internal HSI oscillator as the base instead of the external quartz crystal that is still not there.

In the MRS2-supplied file, system_ch32v00x.c, I un-commented the desired setting, like this:

//#define SYSCLK_FREQ_8MHz_HSI    8000000
//#define SYSCLK_FREQ_24MHZ_HSI   HSI_VALUE
#define SYSCLK_FREQ_48MHZ_HSI   48000000
//#define SYSCLK_FREQ_8MHz_HSE    8000000
//#define SYSCLK_FREQ_24MHz_HSE   HSE_VALUE
//#define SYSCLK_FREQ_48MHz_HSE   48000000

I find the best test of your system operating frequency is a serial terminal. If your USART is setting the baud rate based on the assumed clock frequency, you’re going to find out quickly if it is right or not. The generic, boiler-plate code created by the MRS2 new project wizard for this chip family (-003) sets up USART1 to be able to use the printf() family of console output functions. It also prints out the “System Clock” value and the unique Chip ID before entering the main loop of the application. I added the program name announcement to this list just so I can keep track of which program is actually running on the terminal. So I normally get this output every time the chip is either re-programmed or reset:

SystemClk:48000000
ChipID:00310510
F4-WS2812B-SPI

So I have confirmation that the system clock is somewhere in the neighborhood of 48 MHz. First, it told me itself. Second, I can actually read what it wrote, so that’s another Good Sign.

Now I’m having some curiosity spring up around exactly how “unique” this “ChipID” really is. But perhaps I can follow up on that in the near future. It’s not looking altogether unique at this very moment.

So to talk to the WS2812B addressable LED with a ‘serial peripheral interface’ (SPI), um, peripheral, I should warn you that we are not going to use the SPI as it was originally intended. You already know that the WS2812B uses its own proprietary bit stream protocol, which I vaguely described in a very hand-wavy manner in the previous article. It’s certainly not SPI-compliant, on the face of it.

But since SPI is a protocol of Very Little Brain, we can use it more as a ‘waveform generator’ than strictly a data transmission protocol. Any eight bit byte that you transmit through the SPI emerges as a sequence of bits from a single pin, along with a synchronized clock signal on another pin. We will not be using the clock pin at all, just the data line.

Now the SPI is a versatile beastie with ever so many options for configuring the data stream. This works out well because there are ever so many different SPI-enabled devices and every one of them has its own idea of what is a right and proper configuration.

As a peripheral of the first rank on this chip, it gets an entire chapter (Chapter 14) in the Reference Manual (RM). And here we see again the lingering legacy of “master” and “slave” devices. I’ve described my opinion on this topic in the past, so I will be referring to these two roles as “coordinator” and “participant” from now on. Our chip will coordinate the data flow and the LED will participate in this activity.

The SPI peripheral, which is unfortunately but irrevocably redundant, has access to up to four (4) input and output pins, depending on the required configuration. As previously stated, we will only need one, which is the output data line, called “MOSI” which translates to “coordinator out, participant in”. Other chips from other manufacturers sometimes refer to this pin simply as “SDO”, for ‘serial data out’. This pin is routed to PC6 (GPIO port C, pin 6), which is pinned out on the CH32V003F4U6 package on physical pin 13.

Now while the CH32V device is housed in a tiny (3x3mm) square plastic package with teensy weensy pads on the bottom of it, I had the foresight to route all the signals to the correspondingly numbered pins of a 20 pin DIP package, which is the form factor of the little development board I’m using on this project. So pin 13 on the QFN20 (quad flat no leads, 20 pins) maps directly to pin 13 on the dual in-line (DIP) footprint of the board.

Of course, before I go too far on congratulating myself on what a great job I did on laying out this board, let’s consider that I routed the output to the LED on the wrong pin entirely. I picked PA2 only because I had used that pin in the past as an output in a similar project. Now I need to figure out how to “correct” this error and get the signal from the SPI output to the LED.

Well, it’s not at all hard to do. Since the default state of most of the device pins is a high-impedance input, there should be no conflict if I just short PC6 to PA2 using a short jumper wire. I might mention at this point that I have installed the little DIP prototype development board onto a small solderless breadboard. Adding more components and attaching them to the device becomes very easy. Also, I don’t have to do any micro-circuit-surgery on the little board.

The down side is that I won’t be able to use PA2 for anything else.

So now let’s configure the SPI for our purposes. This begins with setting up PC6 as an “alternate function, push-pull output”, i.e., an output driven by one of the internal peripherals and not by the GPIO port. Then configure the SPI port to blast out those bits. Here is the configuration code:

// configure SPI

RCC_APB2PeriphClockCmd(RCC_APB2Periph_SPI1 | RCC_APB2Periph_GPIOC, ENABLE); // enable SPI1 and GPIOC peripheral clocks

GPIO_InitTypeDef GPIO_init_struct = { 0 }; // GPIO initialization parameter structure

GPIO_StructInit(&GPIO_init_struct); // set default values
GPIO_init_struct.GPIO_Pin = GPIO_Pin_6; // PC6 is SDO
GPIO_init_struct.GPIO_Speed = GPIO_Speed_10MHz; // need 6 MHz
GPIO_init_struct.GPIO_Mode = GPIO_Mode_AF_PP; // alternate function, push-pull output
GPIO_Init(GPIOC, &GPIO_init_struct); // initialize PC6
GPIO_WriteBit(GPIOC, GPIO_Pin_6, Bit_RESET); // clear PC6

SPI_InitTypeDef SPI_init_struct = { 0 }; // SPI initialization parameter structure

SPI_I2S_DeInit(SPI1); // reset peripheral
SPI_StructInit(&SPI_init_struct); // set default values
SPI_init_struct.SPI_Direction = SPI_Direction_1Line_Tx; // one line for output only
SPI_init_struct.SPI_Mode = SPI_Mode_Master; // or 'coordinator', if you prefer
SPI_init_struct.SPI_DataSize = SPI_DataSize_8b; // 8 bits
SPI_init_struct.SPI_BaudRatePrescaler = SPI_BaudRatePrescaler_8; // 48 MHz / 8 = 6 MHz SPI clock
SPI_init_struct.SPI_FirstBit = SPI_FirstBit_MSB; // MSB first
SPI_Init(SPI1, &SPI_init_struct); // initialize SPI
SPI_Cmd(SPI1, ENABLE); // enable SPI

So to send the individual ‘wave forms’ that make up the binary ones and zeros that the WS2812B understands, we’ll shift out a few ones as a zero and a few more ones as a one. Yes? Yes!

For example, to send the code for a zero, we send a shorter high-level pulse, followed by a longer low-level pulse. I use a 0x60 byte value, or 01100000 in binary. To send a one, I use the value 0xFC, or 11111100 in binary, instead.
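To put rough numbers on “shorter” and “longer”, here is a small sketch that turns a byte pattern into the width of its high pulse at a given SPI clock. The function names are made up for illustration, and it assumes the one bits in the pattern form a single run, which is true of both values above:

```c
#include <stdint.h>

/* Count the one bits in a pattern. */
unsigned ones_in(uint8_t pattern) {
    unsigned n = 0;
    while (pattern) {
        n += pattern & 1u;
        pattern >>= 1;
    }
    return n;
}

/* Width of the high pulse in nanoseconds, rounded down.
   Each SPI bit lasts 1/spi_hz seconds. */
uint32_t high_time_ns(uint8_t pattern, uint32_t spi_hz) {
    return (uint32_t)(((uint64_t)ones_in(pattern) * 1000000000u) / spi_hz);
}
```

At a 6 MHz SPI clock, high_time_ns(0x60, 6000000) comes out to 333 ns and high_time_ns(0xFC, 6000000) to 1000 ns.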

I wrote a simple function that sends the data byte out the SPI port, while waiting for any previously-transmitted bytes to clear first. It looks like this:

void spi_send(uint8_t data) { // send 8-bit data out via SPI

    while((SPI1->STATR & SPI_I2S_FLAG_TXE) == 0) {
        // wait for transmit register to be empty
    }

    SPI1->DATAR = data;
}

Now if you’ve done the math, and I know you’ve done the math, you’ll quickly figure out that the timing is still not exactly right on these transmissions. This is due to the limited number of SPI clock prescalers available. The system is running at 48 MHz, and we are only provided with powers-of-two for clock divisors. For our purposes, we use “/8” so that we get a 6 MHz clock running the SPI. This means that each “bit” in the eight bit byte that gets sent out occupies ~167 ns, and eight of them add up to 1.33 us, which is longer than the 1.25 us minimum bit cell duration. So we’re getting 750 kHz instead of 800 kHz. Not perfect, and not 100% of what is possible, but much better than before.
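If you would rather let the compiler do the math, here is the same arithmetic as a small sketch. The function names are hypothetical and exist only for this example. It assumes one LED bit per eight-bit SPI byte, as described above:

```c
#include <stdint.h>

/* SPI clock after the power-of-two prescaler. */
uint32_t spi_clock_hz(uint32_t sysclk_hz, uint32_t prescaler) {
    return sysclk_hz / prescaler;
}

/* Duration of one LED bit cell (eight SPI bits) in nanoseconds, rounded down. */
uint32_t bit_cell_ns(uint32_t spi_hz) {
    return (uint32_t)((8ull * 1000000000ull) / spi_hz);
}

/* LED bits per second when each LED bit costs eight SPI bits. */
uint32_t led_bit_rate_hz(uint32_t spi_hz) {
    return spi_hz / 8u;
}
```

With a 48 MHz system clock and the /8 prescaler this gives a 6 MHz SPI clock, a 1333 ns bit cell and 750,000 LED bits per second. The neighboring divisors, /4 and /16, would give cells of about 667 ns and 2,667 ns.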

So that’s the first of my two goals accomplished. Now to “adjust” the apparent brightness of the LEDs in real time for demonstration purposes.