
CH32V003 Driving WS2812B Addressable LEDs

23 February 2025

I’d like to be able to control some WS2812B addressable RGB LEDs using the CH32V003 chips. I’ve designed a little development board with a CH32V003F4U6 QFN20 device and it has a microscopically tiny WS2812B-compatible LED on it.

When I designed this board, I connected the data pin of the WS2812B LED to PA2, but only because I had written some earlier code that already used that pin to drive the LEDs. In retrospect, I should have used PC6, as it is also the SPI MOSI output, which is ideal for shifting out bits in a serial fashion, which is what the WS2812B type of LEDs want.

But here we are and I already have a stack of these boards here to play with, so play with them I will.

I’m using the new MounRiver Studio 2 (MRS2) IDE for this project. I created a new project called F4-WS2812B because that’s how creatively I name things.

I made a few tweaks to the project settings. Instead of attempting to crank up a 24 MHz quartz crystal attached to PA1 and PA2 that is not there, I set the system clock to 8 MHz. This is coincidentally the default system clock for the CH32V003 chips in their native state, before MRS2 decides these things for you. The chip powers up with the high-speed internal (HSI) oscillator running at ~24 MHz and sets up a prescaler of 3 on the system clock. For the speed crazy fans out there, you can speed up this little chip to 48 MHz, no problem. But for this project, I am currently estimating (i.e., totally guessing) that 8 MHz should be sufficient for driving the serial bitstream out to the LED in accordance with its timing constraints.
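
As covered in an earlier post, this selection lives in the “system_ch32v00x.c” source file that the new project wizard generates:

#define SYSCLK_FREQ_8MHz_HSI 8000000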

I also switched the default compiler from GCC8 to GCC12 and made a handful of other, less significant changes to the project settings.

So the first thing to do is to configure PA2 as a push-pull output with modest speed capabilities. The slowest setting is 2 MHz, and the target signal specification is 800 KHz. I’m not 100% clear on exactly what this setting changes in the output pin drive circuitry, but I assume it reduces some of the EMI that might otherwise be emitted if driven at the maximum speeds. Here is the code to do that, using the supplied HAL library:

// configure PA2 as push-pull output, 2 MHz max
RCC_APB2PeriphClockCmd(RCC_APB2Periph_GPIOA, ENABLE);
GPIO_InitTypeDef  GPIO_InitStructure = {0};
GPIO_InitStructure.GPIO_Pin = GPIO_Pin_2;
GPIO_InitStructure.GPIO_Speed = GPIO_Speed_2MHz;
GPIO_InitStructure.GPIO_Mode = GPIO_Mode_Out_PP;
GPIO_Init(GPIOA, &GPIO_InitStructure);
GPIO_WriteBit(GPIOA, GPIO_Pin_2, Bit_RESET); // PA2 low

The next thing to do is to start sending out some pulses and see how close we can get to the WS2812B’s timing requirements. Running at 8 MHz, each instruction cycle lasts 125 nanoseconds, and the RISC-V CPU in this chip executes most instructions in a single cycle.

There are three types of pulses that we need to send to talk to this LED. A short high pulse followed by a longer low pulse counts as a zero. A longer high pulse followed by a shorter low pulse counts as a one. A long low pulse of at least 50 microseconds acts as a ‘reset’ signal, telling the LED to latch in any data that has been shifted into it and using that data to light up the red, green or blue LEDs accordingly.
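
For reference, the nominal timings from the WS2812B datasheet, as I read them, are roughly these (the high and low times each carry a ±150 ns tolerance):

Signal  High time   Low time
------  ---------   --------
Zero    0.40 us     0.85 us
One     0.80 us     0.45 us
Reset   -           > 50 us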

Now much has been said about the strictness of these timing requirements. The only thing that is really critical is the difference in the “short” and “long” periods of the high portion of the pulses. There can be quite a bit of variability on the low portion of the pulse, as long as it’s not so long as to be interpreted as the reset signal.

My simple adaptation of this timing protocol has pretty good control over the high part of the pulse, but the low parts tend to go on just a bit too long – and yet it still works just fine. The downside is that the overall bit frequency is only about 250 KHz, much lower than the maximum of 800 KHz. Right now I’m only trying to light up a single addressable LED, so this works fine, but if I wanted to talk to a lengthy string of LEDs, this would seriously limit the maximum update rate for the entire string.

At the lowest level, I created a function called ws2812b_pulse() that takes a single argument. For a short pulse, you send it a zero. For the longer pulse, send it a 1. To send the reset signal, send it a 2. Here is the code:

void ws2812b_pulse(uint8_t length) { // send out a pulse

    // note:  system clock assumed to be 8 MHz

    // 0 = short pulse
    // 1 = long pulse
    // 2 = 'reset' signal

    switch(length) {
    case 0: // short pulse, 250 ns
        GPIOA->BSHR = GPIO_Pin_2; // high
        GPIOA->BCR = GPIO_Pin_2; // low
        break;
    case 1: // long pulse, 750 ns
        GPIOA->BSHR = GPIO_Pin_2; // high
        __asm__("nop"); // extend that pulse by 125 ns
        __asm__("nop"); // extend that pulse by 125 ns
        __asm__("nop"); // extend that pulse by 125 ns
        __asm__("nop"); // extend that pulse by 125 ns
        GPIOA->BCR = GPIO_Pin_2; // low
        break;
    case 2: // reset, > 50 us
        GPIOA->BCR = GPIO_Pin_2; // low
        Delay_Us(50);
        break;
    }
}

Next up the chain, I wrote a function that repeatedly calls the ws2812b_pulse() function with the data bits of a single byte, starting with the most significant bit (MSB) and going down to the least significant bit (LSB), as this is how the WS2812B listens for bytes. Here is the code:

void ws2812b_byte(uint8_t byte) { // send a byte one bit at a time, MSB first

    uint8_t i; // bit counter

    for(i = 0; i < 8; i++) { // loop through all the bits in the byte
        if(byte & 0x80) { // send a 1
            ws2812b_pulse(1);
        } else { // send a 0
            ws2812b_pulse(0);
        }
        byte <<= 1; // shift all the bits
    }
}

The actual protocol of the WS2812B is to get three bytes worth of data, in green, red, blue order. If you send in another three bytes, it will shift out the previous bits to the next LED. Once you’re ready to commit, you send the reset signal, and the chips latch all their data and start showing the corresponding colors with their LEDs.

Here is the code to send all three bytes in the proper order:

void ws2812b_rgb(uint8_t red, uint8_t green, uint8_t blue) { // send RGB data to LED

    ws2812b_byte(green); // green data
    ws2812b_byte(red); // red data
    ws2812b_byte(blue); // blue data
}

To latch in that data, I created a macro that sends the reset pulse:

#define ws2812b_reset() ws2812b_pulse(2) // send 'reset' pulse to latch data
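
With those pieces in place, a helper for driving a short strip would be a natural extension. Here is a hypothetical sketch (the function name and array layout are my own inventions) that shifts out all the colors back to back, then latches them with a single reset:

void ws2812b_strip(const uint8_t colors[][3], uint8_t count) { // send RGB data for 'count' LEDs

    uint8_t i; // LED counter

    for(i = 0; i < count; i++) { // each new color shifts the previous data down the line
        ws2812b_rgb(colors[i][0], colors[i][1], colors[i][2]);
    }
    ws2812b_reset(); // latch all the shifted data at once
}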

As a demonstration of all the possible colors at the lowest possible brightness, I wrote this simple loop that repeats endlessly:

while(true) { // an endless loop

    ws2812b_rgb(0, 0, 0); // black
    ws2812b_reset(); // reset
    Delay_Ms(250);

    ws2812b_rgb(1, 0, 0); // red
    ws2812b_reset(); // reset
    Delay_Ms(250);

    ws2812b_rgb(1, 1, 0); // yellow
    ws2812b_reset(); // reset
    Delay_Ms(250);

    ws2812b_rgb(0, 1, 0); // green
    ws2812b_reset(); // reset
    Delay_Ms(250);

    ws2812b_rgb(0, 1, 1); // cyan
    ws2812b_reset(); // reset
    Delay_Ms(250);

    ws2812b_rgb(0, 0, 1); // blue
    ws2812b_reset(); // reset
    Delay_Ms(250);

    ws2812b_rgb(1, 0, 1); // magenta
    ws2812b_reset(); // reset
    Delay_Ms(250);

    ws2812b_rgb(1, 1, 1); // white
    ws2812b_reset(); // reset
    Delay_Ms(250);
}

You’ll find that these little LEDs are quite bright when told to shine at their utmost capacity. In fact, you need to be careful when you are working with more than just a very few of these in a string, as the current consumption goes way up way quick, and they tend to produce a good amount of heat in the process. But one little LED on my little dev board is going to continue to behave itself and blink merrily along into the night.


Getting getchar() to Get Characters

22 February 2025

Yesterday I was poking around inside the GPIO ports of the CH32V003 chip, and “printing” out the results to the “console”. The default application created by the MounRiver Studio 2 “new project wizard” sets up a nice mechanism whereby us old-school programmers can use the printf() function from the standard I/O library, stdio. I tried, unsuccessfully, to use the corresponding getchar() function to read a single character back from the console, but it flat didn’t work at all.

I totally guessed yesterday, albeit correctly, that this was due to a lack of a lower-level function to redirect console input from the USART. Today, I did some research and discovered that it takes that lower-level function and an additional incantation to get the thing to work as I wish it to work.

That lower-level function is called _read(), with a leading underscore, and it is expected to have the following prototype:

int _read(int file, char *result, size_t len);

Since I’m not using a bunch of files and other niceties, I can just ignore the first parameter. If, in the future, I wanted to support “getting characters” from other sources, I could go back and match up the file number specified with the various sources. Today I skip it. If you like to compile with lots of error checking, you will most likely get an “unused parameter” error, so you might need to flag it as used. The default setup provided by the MRS2 project is quite forgiving in this regard.

The “result” parameter is a pointer to a buffer where we will store the incoming bytes. The final parameter indicates how many characters are wanted by the caller.

The little “helper function” needs to send back a return code that either represents the number of characters actually read or -1 in the case of an error.

Here’s what mine ended up looking like:

int _read(int file, char *result, size_t len) { // support stdio.h getchar(), etc.
    
    int return_code = 0; // code we will return
    size_t bytes_to_return = len; // capture number of requested bytes to return

    if(len == 0) return 0; // that was easy

    while(bytes_to_return) {

        if((USART1->STATR & USART_STATR_RXNE) == USART_STATR_RXNE) {

            // there is a character ready to be read from the USART

            *result = USART1->DATAR; // read character from USART data register and store it in the requested buffer
            result++; // advance buffer address
            bytes_to_return--; // decrement the number of bytes still to be read
            return_code++; // count how many bytes have been returned so far

        } else {
            // probably should add some sort of time-out mechanism here
        }
    }

    return return_code; // number of bytes returned;  no error states to report as yet
}

You’ll note a special case at the beginning where if the caller asks for exactly zero bytes, we throw up our hands in joy and say, “Your wish is granted!” and just return. After all, we did everything that was asked.

Next the code goes into a loop to retrieve the requested number of characters, one at a time. Within the loop, we just look at the status bit RXNE (receive register not empty) in the USART STATR status register. If it is set, we read the character from the USART data register DATAR and stuff it into the receive buffer. If it is not set, we effectively wait forever until something comes in.

The number of bytes still to be read is tracked in bytes_to_return, while return_code counts the bytes actually stored. The loop eventually exits and return_code (the number of bytes returned, in this case) is returned to the caller.

This code by itself is not enough to make the getchar() function get any characters; well, at least not in the way you would expect it to. Since all these standard I/O functions are inherited from Linux/UNIX and were originally written for much larger systems, the default behavior of the function is to gather up a bunch of characters and buffer them before actually returning to the caller. This makes sense when your program is being run alongside many other programs as well as an operating system. On the scale of our little CH32V chip, maybe not so much.

So here is the incantation I promised you:

setvbuf(stdin, NULL, _IONBF, 0); // disable buffering on stdin

As you see explained in my little code comment, this otherwise cryptic and mysterious function disables buffering on ‘stdin’, the default input device for the imaginary console we are using.

Now we can use the very handy getchar() function to wait for and get a character from the USART, which is in turn connected via a system of tubes and hoses to our host development system, somehow.

Future improvements to the very basic _read() function demonstrated here would include both a time-out mechanism and the ability to differentiate between the various possible sources for input to our little chip. Perhaps you can think of some more improvements, as well. Please do share them in the comments.
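
Just to sketch out the time-out idea before leaving it for that future day, here is one possible variant using a crude busy-wait iteration budget. The loop count is an arbitrary number of my own choosing and would need tuning against the actual system clock:

#define READ_TIMEOUT_LOOPS 1000000 // arbitrary patience budget; tune to taste

int _read(int file, char *result, size_t len) { // variant with a crude time-out

    size_t got = 0; // bytes stored so far
    uint32_t patience = READ_TIMEOUT_LOOPS;

    if(len == 0) return 0; // that was easy

    while(got < len && patience) {
        if(USART1->STATR & USART_STATR_RXNE) { // a character is waiting
            *result++ = USART1->DATAR; // store it in the caller's buffer
            got++;
            patience = READ_TIMEOUT_LOOPS; // top up the budget after each character
        } else {
            patience--; // nothing yet; spend a little patience
        }
    }

    return got ? (int)got : -1; // -1 signals a time-out with nothing read
}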


An Experiment to Satisfy My Curiosity

21 February 2025

I was setting up a small CH32V003 demo project to see which GPIO pin got toggled in the default MounRiver Studio 2 application. On the CH32X035 default application, it’s PA0 (GPIO port A, pin 0). But there’s no PA0 on the CH32V003, even on the largest package. So which pin is it?

The answer surprised me.

It turns out that the default application generated by MRS2 does not blink an LED or toggle a GPIO pin at all. It sets up the USART to receive and echo characters via the virtual serial port on the WCH-LinkE programming adapter. Technically, it doesn’t faithfully echo the character; it inverts all the bits of any received character then transmits that back to the console.

So as not to feel entirely shut down, I plowed ahead and made it blink an LED. No PA0, you say? No problem, I answer. I see that there is a PA1, which is pin 2 on the CH32V003F4U6 I’m using, and that should do just as well.

I added the requisite code to enable GPIOA and set up PA1 as a push-pull output of modest speed:

    RCC_APB2PeriphClockCmd(RCC_APB2Periph_GPIOA, ENABLE); // enable GPIOA peripheral clock

    GPIO_InitTypeDef GPIO_InitStructure = { 0 };

    GPIO_InitStructure.GPIO_Pin = GPIO_Pin_1; // PA1
    GPIO_InitStructure.GPIO_Mode = GPIO_Mode_Out_PP; // output, push-pull
    GPIO_InitStructure.GPIO_Speed = GPIO_Speed_2MHz; // doesn't have to be fast

    GPIO_Init(GPIOA, &GPIO_InitStructure);

Actually, I cut and pasted that code from the existing, pre-generated code from the project that sets up the USART. Now since the default mapping of the USART’s transmit (PD5) and receive (PD6) pins belongs entirely to GPIOD, not GPIOA, it was a while before I noticed that my initialization code was wrong. It took an embarrassingly long time to find, even single-stepping the code and looking at the bits in the configuration register for the GPIOA peripheral.

So once I had discovered that I was, in fact, re-initializing GPIOD, at least pin 1 in any case, I assumed it would just start working. I had attached one of my very favorite LEDs, a 5mm blue LED from waaay back. Indeed, it might be one of the first blue LEDs I ever obtained. I still remember the moment that Billy Gage of BG Micro showed these to me. It was an amazing experience.

So I’ve attached the blue LED with its requisite 270Ω resistor between PA1 and ground. You know the steps: save the file, recompile and download. Blinky blue goodness?

Goodness, no. Still no blinkage. I’m getting just a little exasperated at this point. Blinking an LED is the “Hello, world!” of the embedded development world. It is both a rite of passage and a trivial accomplishment at the same time. I would have assumed at this point in my career and at this particular point on my learning curve of these devices that I would be seeing a blinking blue LED. I had done it before, countless times. I would do it again!

The only other thing I could think of was that the PA1 pin had been re-mapped to a different function. These new microcontrollers have so many internal peripherals that not all of them get their very own pins. Scarce resources must be thoughtfully allocated. I looked up the re-mapping options for this pin in the CH32V003 data sheet. Yes, it can be re-purposed as the external crystal input, OSCI. But I hadn’t asked it to do that.

Cranking up the debugger again (how would I even survive without this tool?), I look at the remap register PCFR1 in the AFIO peripheral, and there is PA12_RM. Now that’s not the best possible name for it, is it? It’s the ‘remap option bit for PA1 and PA2’, but it sure looks like they are referring to a mysterious PA12, which totally does not exist on this chip.

And yes, the bit is set, meaning that the function of PA1 (and PA2) has been shifted over to quartz crystal oscillator duty, and not GPIO function, as I intended.

Now this is not the default state of this re-mapping option. Someone, somewhere, was sneaking in during the night, replacing everything with an exact duplicate and setting that bit in blatant contradiction to my wishes.

Something told me to review the clock options located in the “system_ch32v00x.c” source file, which is created by the MRS2 new project wizard. Sure enough, it had selected “#define SYSCLK_FREQ_48MHz_HSE 48000000” as the default clock for the system. The HSE is the “High Speed External” oscillator. My circuit has no quartz crystal attached to PA1 and PA2. You might remember that I have a very special blue LED attached to PA1. PA2 happens to have a WS2812 programmable LED attached to it, but I wasn’t even going to play with that (yet).

Changing the selection to “#define SYSCLK_FREQ_8MHz_HSI 8000000”, saving, recompiling and downloading finally gave me the blinky blue triumph I felt that I deserved at this point. Whew!

Now you may be asking, “How could the system even run at all with no crystal attached, if that was how it was configured to run?” And that would be an excellent question. The answer is that the startup code goes through a sequence of steps to get to that point, and when any of those steps fail, it just continues on. There is an internal variant of the high speed oscillator, properly named the HSI oscillator, that is always present and is on by default when the chip first powers up. It runs at a nominal 24 MHz, but can be divided by a selection of integer prescalers (1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128 and 256). Once I specified the clock correctly, it divided the 24 MHz signal by 3 to give me my selected 8 MHz clock. Previously, it was running at 24 MHz, clocked from the HSI oscillator, since the HSE failed to start, and it never even tried to enable the built-in phase-locked loop to double the frequency to 48 MHz. Additionally, the mechanism to switch system clocks will just silently ignore your request if the required signal is not available and stable.

So now I have my blinking blue LED and all is well with the world. I should stop here, right? Always quit a winner, they say.

Well, of course not. Now is the time to answer all the other nagging questions I have had about certain aspects of this chip, and specifically some of the functions of the pins.

Having first been introduced to this family of chips by the smaller, eight pin packaged CH32V003J4, I had struggled to understand the availability of pins and functions. That particular beastie has multiple GPIO pins tied to each physical pin – but not all of them! PD7, GPIO port D, pin 7, which can double as the external reset signal NRST, was not pinned out at all. On the expansive F4U6 package (QFN20, quad flat no leads 20 pins) sitting before me, PD7 is brought out to pin 1. Now what will it take to actually be able to use this pin as a chip reset?

The answer might surprise you.

Nothing, actually. It’s already set up from the factory to be the reset input signal. In fact, you would have to go into the ‘user option bytes’ and change the configuration of the RST_MODE field to allow PD7 to be used as a GPIO pin. Then you would have to reset the chip for the new setting to take effect.

I set out to confirm this theory by connecting a momentary push button switch between PD7 and ground. When I press the button, the chip resets. If I hold the button down, the chip does nothing at all.

Now a clever sort of developer could enable PD7 as a GPIO pin, then connect an external interrupt to it, so that it could be an ‘intelligent’ reset input, while still being completely asynchronous. The interrupt handler would consider the ‘request for reset’ and decide, based on what was important at the time, whether to reset or not. Resetting the chip from code can actually be done in a number of ways. How many do you know? Share your favorites in the comments.
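
For the record, one of mine is the helper supplied in WCH’s core library (assuming your SDK version includes it, as mine appears to):

NVIC_SystemReset(); // software-triggered system reset via the PFIC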

So that experiment was quick and satisfying for me. PD7, which is available on every package except the -J4 SOP8, is a perfectly cromulent nRST input, and works exactly as one would expect it to work.

So does that wrap up all the experimentation for today? Can you not see the scroll bar on the right side of this screen? Of course it doesn’t!

Reviewing the pinout of the F4U6 package, I see that GPIO port A has only two pins present, PA1 and PA2, while ports C and D both have eight pins each. Still thinking that these devices are bigger on the inside than they are on the outside, as far as mapping available peripheral connections to available physical pins is concerned, it seemed odd to me that GPIOA only had two pins. It probably wasn’t even going to have any pins, as those two pins would or could be allocated to an external crystal for system clocking purposes. But it would seem to be a waste of two perfectly good pins if the end-application did not require the exquisitely precise timing that a quartz-based oscillator can provide. So they wisely put in another GPIO port on the chip.

But does it really only have two pins in it? Or is it, and this was my suspicion, actually an exact copy of the other two ports, GPIOC and GPIOD, with a total of eight ‘pins’ internally and only two of those pins brought out to physical pins on the package?

Now without decapsulating the device and taking some pictures through a microscope, which is completely a reasonable thing to do in someone else’s laboratory (not mine), how could we determine if those phantom pins exist or not?

One way would be to write various bit patterns to the output register, OUTDR, then read them back in and see if they all toggled on and off in unison. So just to be exhaustive, I wrote a short loop that wrote all 256 combinations of ones and zeros to GPIOA->OUTDR, then read them back in and compared the results. If they all matched, it meant that all eight bits were realized internally and just not pinned out. If there were mismatches, it would indicate that some or all of the other bits were, in fact, unimplemented.
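
A minimal sketch of that test loop, assuming printf() is already wired up to the USART console as before:

uint32_t pattern; // test pattern counter

for(pattern = 0; pattern < 256; pattern++) { // all 256 combinations
    GPIOA->OUTDR = pattern; // write the test pattern to the output register
    uint32_t readback = GPIOA->OUTDR & 0xFF; // read it right back
    if(readback != pattern) { // report any bits that failed to stick
        printf("mismatch: wrote 0x%02X, read 0x%02X\r\n", (unsigned)pattern, (unsigned)readback);
    }
}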

So I got tons and tons of mismatches. But since I was writing them out one line at a time on the serial terminal, the first results scrolled past too fast to examine.

I added a dummy ‘getchar()’ call to wait for the user (me) to hit a key on the keyboard after every 16 lines, for a very simple sort of pagination of the output.

For some reason that I have yet to investigate, the getchar() function simply returns without ‘getting’ any ‘chars’ at all. It probably has something to do with the fact that I have not provided a low-level read() function for the stdio library to use, to let it know whence the aforementioned characters should come. An experiment for a future day.

Since I already had the USART initialized for the console output using the printf() function, et al., I just called the SDK-provided function to wait for a character to arrive, then read and discard said character. Pagination accomplished.

Now while my progression of output test patterns went from 0x00 to 0xFF in the expected order, the returned values that were read back in consisted only of 0x00, 0x02, 0x04 and 0x06. These values represent the four possible states in binary of bit positions 1 and 2, or PA1 and PA2 as we know them.

The conclusion I reach at this point is that only the two published GPIO pins, PA1 and PA2, are actually implemented on this chip. Do you agree or disagree with my conclusion? What other testing methodology should I apply to dive deeper into this Important Scientific Investigation?

Just to help me feel better about the testing I had done on GPIOA, I repeated the same test on both GPIOC and GPIOD. In all 256 cases, each port read back the exact expected value as had been written to it. All eight bits of GPIOC and GPIOD are implemented, which is not surprising at all as they have all been routed to different pins on the package. But it does give me a positive result to help me have a little confidence in my testing strategy.

What I found especially interesting about the testing on GPIOD was that it ‘succeeded’ even when some of the pins were being used for other functions, such as the USART (PD5 and PD6) and the nRST input (PD7).

But you may be asking, “Wait a minute… what happened to GPIO port B?” And that would be an excellent question. So I set out to try to discover if there was any vestige of a GPIO port B on the chip.

The first thing I did was to try to set the ‘peripheral reset’ bit for GPIOB in the RCC peripheral. There are bits defined to reset the GPIO ports A, C and D, as well as the AFIO (alternate function input output controller) peripheral. There is a suspiciously ‘reserved’ spot between IOPARST and IOPCRST bits in the APB2PRSTR register within RCC. I fudged my own definition for this missing bit, as well as the upcoming IOPBEN peripheral clock enable bits, like this:

#define RCC_IOPBRST (1 << 3) // not defined in device header (for a reason)
#define RCC_IOPBEN  (1 << 3) // not defined in device header (for a reason)

If you write all ones into this peripheral reset register (there are two of them, actually), then read back that register, you will find that only some of the bits still have ones in them. Those ones represent peripherals that are 1) implemented and 2) able to be reset. GPIOB, as represented by my completely fake RCC_IOPBRST bit, was a zero.

Now remember to release all those peripherals that you just reset or they will remain in a reset state in perpetuity.
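
Putting that together, a sketch of the probe using the fudged bit definition from above (and the register names from the device header):

RCC->APB2PRSTR = 0xFFFFFFFF; // assert reset on every APB2 peripheral
uint32_t held = RCC->APB2PRSTR; // read back which reset bits actually stuck
RCC->APB2PRSTR = 0x00000000; // release everything from reset again

if(held & RCC_IOPBRST) { // test our completely fake GPIOB bit
    printf("GPIOB reset bit is implemented\r\n");
} else {
    printf("no GPIOB reset bit on this chip\r\n");
}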

You can do the same thing with the peripheral clock enable registers (there are three of these in total). Again, GPIOB fails to stick when writing a one to the RCC_IOPBEN bit.

So there really is no GPIOB implemented on this chip.

Now we know.


Notes on RISC-V Assembly Language Programming – Part 19

14 February 2025

I spent some more time debating with myself about expanding my coverage of these lovely Hershey fonts to some of the other sets, but have decided for the moment to pause with the plain and simplex versions of the Roman set. These more than meet my immediate requirements for the present project and are nice to look at as well.

Now I need to go beyond plotting a single character in the center of the screen and write a little code to send them out to the screen in a more utilitarian manner. For now I’m going to use the ‘native’ resolution of 21 ‘raster units’ in height and see if I can get three lines of readable type on the screen at once.

Without scaling the fonts, the most I can get is two lines of text. But that is when I don’t accommodate the ‘tall bois’, like the ‘[’ and ‘]’ brackets and, surprisingly, the lower case ‘j’. Expanding all the margins so that everything actually fits only allows a single line of text, sometimes with as few as 4 characters, for important messages such as “mmmm” or “----”.

Revisiting the ever-so-fascinating statistics of a few days ago, we see where this is coming from:

Statistic   Value   Character
---------   -----   ---------
Max width   30      613  m
Max x       11      613  m
Min x       -11     613  m
Max y       16      607  g
Min y       -16     719  $

Well, there’s our friend, the expansive ‘m’ and the other titans of the simplex set.

So now it’s time to scale the fonts and see if I can get a more useful number of characters on the screen at the same time and still have them be legible.


Notes on RISC-V Assembly Language Programming – Part 18

12 February 2025

I can fit the scalar values for each glyph into a one-dimensional array. Then I need an array of pointers to variable-length arrays of coordinates. Others have been able to do all this with a single array, but I see a lot of wasted space in there.

I’m trying to decide ahead of time if I need to reproduce the left column and right column values in the representation array, or if I can just get away with character widths. Or do I even need to keep track of the character widths? I could just treat these as monospaced characters and just pick a number.

Here are the leftest and rightest columns from the plain set:

Max left = (-2, 9)
Min left = (-8, 1241)
Max right = (9, 1273)
Min right = (2, 9)

And here are the same statistics from the simplex set:

Max left = (-4, 509)
Min left = (-15, 613)
Max right = (15, 613)
Min right = (4, 509)

After I’m ‘done’ with these scalable fonts, there’s one more bit-mapped font trick I want to try. I can take my existing 5×8 font and double or triple it in size, giving a blocky character. That might better represent the types of letters and numbers seen on temporary highway signs, as those still tend to be composed of 5×7 (or so) LED matrices. But when am I ever ‘done’ with anything?

So I am going to assume that I need all this data for now, and incorporate it into some data structures and try to port them over to the project and see if I can plot some nice looking characters onto the little OLED screen.

The first array encodes the ASCII value of the character as the index, so that doesn’t need an actual slot in the data file. In reality, since the first 32 ASCII characters are technically unprintable, our array index [0] points to ASCII value 32, the space, which, ironically, while a ‘printable’ character, does not print anything. This offset is just something that has to be remembered.

Each entry in the array will be a typedef’d structure containing the requisite information:

Information         Plain       Simplex
------------------  --------    ---------
Number of vertices  (0, 38)     (0, 56)
Left hand column    (-8, -2)    (-15, -4)
Right hand column   (2, 9)      (4, 15)
Coordinate index
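
Rendered into C, that might look something like this sketch, with the field widths chosen from the ranges above (the type and member names are my own):

typedef struct { // per-glyph header data
    uint8_t     vertices;    // number of vertices (up to 56 for simplex)
    int8_t      left;        // left hand column (-15 to -4)
    int8_t      right;       // right hand column (4 to 15)
    uint16_t    coord_index; // index into the shared coordinate array
} GLYPH_t;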

Note that these sampled data values only represent the two subsets, roman plain and roman simplex. Using any of the other styles will have different values. Just for completeness, here are the statistics for the entire occidental glyph set:

Statistic           Value   Character
------------------- ------  ---------
Max vertices        143     3323
Max left            0       197
Min left            -41     907
Max right           41      907
Min right           0       197
Max character width 82      907
Max x               41      907
Min x               -41     907
Max y               41      907
Min y               -48     2411
Max dx              40      796
Min dx              -29     2825
Max dy              78      2405
Min dy              -80     2411
------------------- ------
Total vertices      47,465

Just looking at the total number of vertices, and remembering that each vertex will require a minimum of two bytes for storage, we see that this little device with its 62K of flash memory will not be big enough to hold every one of these characters without adding an external memory device of some sort. So for now, I’ll content myself with the plain and simplex roman variations.

The vertex encoding gets tantalizingly close to a single byte per coordinate pair. However, I want to also encode the ‘pen up’ information, which I use to distinguish ‘move to’ and ‘draw to’ commands. If I felt like running histograms on these data sets, I might be able to see a further pattern or trend that would allow me to use a look-up table for these values. But I am going to leave that as an exercise for you, my Dear Reader. I have to draw the line, somewhere.

So it looks like our benefactor, Dr. Hershey, was on to something when he originally encoded his coordinates as pairs of single digits. I’m not going to use his precise technique, although it will still end up as 16 bits of data per vertex. I’m just folding in the out-of-band ‘pen up’ condition to each coordinate pair.

Reviewing the summary, it looks like our friend character 907 is bringing home all the gold medals. It’s the ‘very large circle’ glyph, and I’m going to disqualify it for being an outlier. This is the one that broke my Python script and simplistic transmission encoding. It’s a lovely pentacontagon, or fifty-sided polygon, and therefore the smoothest of the approximated circles in the repertory.

Statistic           Value   Character
------------------- ------  ---------
Max vertices        143     3323
Max left            0       197
Min left            -27     2411
Max right           24      2381
Min right           0       197
Max character width 46      992
Max x               22      906
Min x               -24     2411
Max y               39      2403
Min y               -48     2411
Max dx              40      796
Min dx              -29     2825
Max dy              78      2405
Min dy              -80     2411
------------------- ------
Total vertices      47,415

So for the vector array, each vector will be a typedef’d struct holding the x and y coordinates as signed integers, as well as a boolean ‘pen up’ flag to distinguish ‘move to’ from ‘draw to’. Since the x axis shows a slightly smaller range of values, I’ll squeeze the ‘pen up’ flag into the x side, perhaps like this:

typedef struct { // vertex data
    int         x:7;        // x coordinate
    PEN_UP_t    pen_up:1;   // 'pen up' flag
    int         y:8;        // y coordinate
} VERTEX_t;

So I’ll need to add some more to my little Python script to generate the data for these two arrays, then emit it in a close approximation of my C coding style.

It took a bit of fiddling and also some back-and-forth to get the data structures ‘just right’, but I was able to port over both the plain and simplex roman character sets and have them plot out on the OLED screen. One thing that tripped me up was the vertex count. The original definition file described a ‘vertex count’ that also included the left and right column data as an additional vertex. Also, it counted, as it should, the ‘PEN_UP’ codes. These two little deviations that I introduced into the True Form sure made things look weird on the little screen for a while. But I eventually realized the error of my ways and corrected the code. Now it runs through either the plain set or the simplex set with the greatest of ease. Drawing a single character at a time happens so quickly, it seems almost instantaneous. I’ll have to try printing out a whole screen of text and see if I can tell how long it’s taking.

Next I’ll need to see about scaling these ‘scalable’ fonts to fit my imagined sizes for the different formats I’d like to support. I also need to look at the big-blocky font I suggested previously.


Notes on RISC-V Assembly Language Programming – Part 17

11 February 2025

Now I can focus on compacting the vector data for the glyphs I need for the project. But first, I have to identify them. This has already been done many times in the past by many people, but I feel that I have to do it myself. Unless I change my mind, which is something I can totally do.

A clever collection of interesting Hershey font information has been published by Paul Bourke:

https://paulbourke.net/dataformats/hershey/

Included in this archive, dated 1997, are two files, romanp.hmp (Roman Plain) and romans.hmp (Roman Simplex). These files contain the ASCII mapping data for the plain and simplex varieties, respectively.

The ‘plain’ subset consists of the smaller glyphs. There are no lower case versions (minuscules). The upper case glyphs (majuscules) are repeated in their stead. Some statistics I gathered from the plain subset include:

Statistic       Value   Character
--------------  -----   ---------
Max vertices    38      1225 {
Max width       17      1273 @
Max x           7       1273 @
Min x           -6      1246 ~
Max y           10      1223 [
Min y           -10     1223 [
                -----
Total vertices  764

These glyphs can be encoded with 4 bits for the x coordinate and 5 bits for the y coordinate.

The ‘simplex’ subset contains the larger glyphs, including upper and lower case, numerals and punctuation. They are also much more detailed. Here are the same statistics from the simplex set:

Statistic       Value   Character
--------------  -----   ---------
Max vertices    56      2273 @
Max width       30      613  m
Max x           11      613  m
Min x           -11     613  m
Max y           16      607  g
Min y           -16     719  $
                -----
Total vertices  1303

These larger glyphs can be encoded using 5 bits for the x coordinate and can almost squeeze the y coordinate into 5 bits… almost.

So far we’ve only been using absolute coordinates for these mappings. I wonder how much space we could save by using a relative distance from point to point? Start with an absolute coordinate and then just specify relative motion along each axis?

For the plain set, we get these statistics for relative distances:

Statistic   Value   Character
---------   -----   ---------
Max dx      10      809  \
Min dx      -12     1246 ~
Max dy      20      1223 [
Min dy      -20     1223 [

For the simplex set, we get these numbers:

Statistic   Value   Character
---------   -----   ---------
Max dx      18      724 -
Min dx      -18     720 /
Max dy      32      720 /
Min dy      -32     733 #

So the answer is no: the relative values have a greater range than the absolute values. I find this result entirely counter-intuitive, although the data does explain it: a single stroke can run from one extreme to the other, as with the ‘[’ glyph, whose y values run from -10 to 10, producing a dy of 20.


Notes on RISC-V Assembly Language Programming – Part 16

10 February 2025

The little Python script I wrote last night was able to open the ‘occident’ file of Hershey font descriptions and then import them into a list of lines. I then iterated over the list, line by line, extracting the character number, the number of vertices and the left- and right-hand extents of each of the characters, then wrote them to the console.

I added some more analysis to the script to get a better feel for the data. Each line can be a different length, as each character can have as many or as few strokes defined as it needs. The number of vertices should give me a clue to the actual length of the line. Since each vertex is exactly two bytes long (and, I’m just remembering, the count includes the character extent pair as the first vertex), and the line header is fixed at 8 bytes, the formula:

vertices * 2 + 8

gives us the expected length of the line. This is the case except for these character numbers:

                                 Actual  Calculated
Number  Vertices Width           Length  Length
------  -------- -------------   ------  ----------
2,331   3        <-13,20>=[33]   214     14
3,258   4        <-10,11>=[21]   216     16
3,313   28       <-16,16>=[32]   264     64
3,323   43       <-16,17>=[33]   294     94
3,502   10       <-12,12>=[24]   228     28
3,508   12       <-12,12>=[24]   232     32
3,511   15       <-12,12>=[24]   238     38
3,513   7        <-14,14>=[28]   222     22
3,518   8        <-12,12>=[24]   224     24

And I see a pattern. Looking at the original data for character number 2,331, we have this very long line:

2331103EfNSOUQVSVUUVSVQUOSNQNOONPMSMVNYP[S\V\Y[[Y\W]T]P\MZJXIUHRHOIMJKLIOHSHXI]KaMcPeTfYf]e`cba RKLJNIRIXJ\L`NbQdUeYe]d_cba RPOTO ROPUP RNQVQ RNRVR RNSVS ROTUT RPUTU RaLaNcNcLaL RbLbN RaMcM RaVaXcXcVaV RbVbX RaWcW

It very clearly declares that there are 103 vertices, but my conversion resulted in a 3, so I’m obviously not pointing to the right segment of the string when extracting that value, missing out on the hundreds digit, for the very small number of characters that have over 100 vertices.

And that’s what it was. I incorrectly specified the ‘slice’ parameters of the vertex segment of the string. I am not very good at the Pythoning yet, but I am getting better.

So now I have some faith in the internal consistency of the data preserved lo these many years. Now I can move on to actually extracting the coordinate pairs from each string, knowing the exact moment that I should stop.

More Python trial and error has produced a working model that will output the coordinate pairs for each character, along with the ‘pen up’ commands.

Now I need to translate that into a series of simple commands that I can send to the OLED device via the serial link and have them drawn on the screen to visualize the characters.

I needed to install PySerial as a module so that Python can talk to the serial port:

python3 -m pip install pyserial

It installed pyserial-3.5.

The serial port available via the WCH-LinkE is found in the /dev folder as:

/dev/cu.usbmodemC0F98F0645CF2

I’ve got a good start on the Python script. It’s pushing out the coordinates both to the console and the serial port. I re-formatted the encoding going out the serial port into ‘move to’ commands and ‘draw to’ commands. ‘Move to’ just updates the coordinates and ‘draw to’ actually draws the vector between the points.

As an intermediate stage, I was totally faking it by just drawing the endpoints of the vectors, and you could tell the overall shape of the character that way. I already had the point() function working, so that was an easy step. Adapting Bresenham’s line algorithm to the code was also straightforward. It’s a delightful thought experiment and has been around longer than I have.
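
A sketch of such an adaptation, built on the point() function from the Part 14 notes (the signature is my own choice, and abs() comes from stdlib.h):

void line(int x0, int y0, int x1, int y1, COLOR_t color) { // Bresenham's line algorithm, all octants

    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy; // error term balances the x and y steps

    while(1) {
        point(x0, y0, color); // plot the current position
        if(x0 == x1 && y0 == y1) break; // reached the far endpoint
        int e2 = 2 * err;
        if(e2 >= dy) { err += dy; x0 += sx; } // step along x
        if(e2 <= dx) { err += dx; y0 += sy; } // step along y
    }
}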

There are still some edge cases that bring the whole thing to its knees, such as character 907 and its -41 y-coordinate. I had added a +32 offset to the data points for serial transmission as a single byte, but that just didn’t work for our friend number 907. But I’ve seen enough of the characters drawn on the target OLED now to be sure that I want to go ahead and build these into the project.


Notes on RISC-V Assembly Language Programming – Part 15

8 February 2025

It’s time to rename this series of posts, as I haven’t been using any sort of RISC-V assembly language at all in this project lately.

So now on to bigger and bolder fonts.

9 February 2025

So not a lot of work got done on the project yesterday, but I did have some time to think about it. And it occurred to me that bit-mapped fonts are great when they’re small, but start to take up a lot of resources, i.e., memory space, when they get bigger.

My mental arithmetic last night suggests that the biggest possible font on a 128 x 64 pixel display would be 64 x 64 pixels per character, giving a total of two characters on a single line of text. They would be perhaps a bit too squarish for my taste, so I could slim them down a bit and have 42 x 64 pixel characters, allowing up to 3 characters, but still only a single line of them. As I have defined 96 glyphs in my first font design for this project, I project that it would take 32K of memory space for just this one font (42 x 64 bits = 336 bytes per glyph; 336 x 96 = 32,256 bytes). The target chip at the moment has 62K of memory available, so perhaps we’ve arrived at both a good minimum and a good maximum size for this display. As a point of comparison, the existing font that I have lovingly named font_5x8 takes up 480 bytes of memory.

Pondering further, a font sized to allow two lines of text would be 32 pixels tall, 3 lines would allow characters up to 21 pixels tall and four lines would divide nicely into characters 16 pixels tall. It was at this point that it occurred to me that bit-mapped fonts were not the only way to go, especially on a resource-constrained device such as I’d prefer to use.

Another option is stroke, or vector, fonts. Instead of a predetermined array of ones and zeros mapping out the appearance of the individual characters, a series of lines and perhaps arcs is described for each glyph.

A famous set of vector fonts was developed around 1967 by Dr. Allen Vincent Hershey. Like myself, he struggled with the age-old question of “but which font should I use?” as well as how to do so in an efficient way. These fonts are now referred to collectively as “Hershey fonts”. They use a relatively compact notation to describe a set of strokes between integer coordinates on a Cartesian plane, resulting in very legible characters.

Now while I smile quietly to myself for my efforts to give the world lower case characters with descenders, Dr. Hershey spent untold hours designing and transcribing characters in as many languages as he could find.

I found a copy of the original data file as part of an archive on:

https://media.unpythonic.net/emergent-files/software/hershey/tex-hershey.zip

Within this archive, a file called, simply, ‘occident’ contains a number of lines (1,610, to be exact), each defining the appearance of a single character. They are numbered from 1 to 3,926, as not all the characters are present in this file.

Now I would like to write a simple-ish program to plot these characters to the OLED module and see what they look like. This ‘program’ will be more of a system that has a portion that runs on my laptop and another that is running on the embedded device.

I’ll start writing the big-end of the system in Python and the little-end in C. The big-end will read in the data file in its entirety and convert the provided encoding into a series of ‘move to’ and ‘draw to’ commands for the OLED. So it turns out I’ll be needing those line generating functions, after all.


Notes on RISC-V Assembly Language Programming – Part 14

7 February 2025

Now for some odd reason the display is not working at all today. Ummm, well, no, I’m wrong. It was working just fine. It was just displaying a screen full of zeros, as was right and proper for it to be doing. I was messing around with the screen initialization values, poking various bit patterns in to see where they showed up. Yesterday, the dots would show up in a random-seeming column. As I had not specifically programmed the column address, that was fine and to be expected. But today, oddly, the column pointer was randomly set to one of the ‘invisible’ columns: 0, 1, 130, 131. The SH1106 supports a 132 x 64 display, but this module has a 128 x 64 OLED attached. The designers decided to put it in the middle of the columns, starting with column 2. Again, fine and something that I was already aware of. But disconcerting when you think things are ‘going great’ and suddenly nothing works anymore.

One good thing about this diversion was that I had the opportunity to measure the screen update time to be ~24 ms, which gives an effective frame rate just over 40 Hz. So that’s not going to be the bottleneck that I thought it might be. I’m really not motivated at this point to try to up the SCL frequency in hopes of a maximized data rate.

Because of the way the SH1106 wraps around from the end of a page to the beginning of the same page, it truly doesn’t matter where you start writing values, as long as you write 132 of them. If it’s all zeros, you can’t see any difference. If it’s a proper image, then it does matter.

The reason I was tinkering with the initialization values is that I had been experimenting with it yesterday and was not happy with the outcome. I eventually added a separate ‘clear screen’ loop that wrote zeros to all the memory and that did the trick. So instead of initializing the data in the frame buffer declaration as ‘{ 0 }’, which I thought would populate all of the elements with zeros, I just specify ‘{ }’, and the compiler treats it as ‘uninitialized’ and writes zeros in there for me.

Having a frame buffer for the display is nice. I no longer have to think about accessing the display’s memory buffer in pages and stacks of pixels. This allows me the freedom to think about designing glyphs in their appropriate sizes, not what is mathematically convenient.

I’d like to be able to use a Cartesian coordinate system to refer to the individual pixels on the display, in furtherance of my graphical ambitions. In one respect, half of the work has already been done for me, as the abscissa, also known as the x coordinate or column, maps directly to the index of an array I set up to represent the frame buffer. The ordinate, or y coordinate or row, has to be broken down into two components: the memory page index and a bitmask.

The frame buffer is built as an array of pages, with each page containing a three byte header and another array of 132 bytes. The three byte header contains the secret language of the SH1106 and allows me to just blast the entire 135 byte payload to the module and have it magically go to the right place within the OLED’s memory map.

Each page is defined by this typedef’d structure:

typedef struct { // data structure for holding display data with OLED header
    uint8_t     control_byte_1;
    uint8_t     page_address_command;
    uint8_t     control_byte_2;
    uint8_t     page_data[SH1106_WIDTH];
} SH1106_PAGE_t;

My frame buffer is just an array of these pages:

SH1106_PAGE_t SH1106_frame_buffer[SH1106_PAGES];

where I have previously #define’d various dimensions as:

// specific SH1106 module parameters are defined here

#define SH1106_WIDTH 132
#define SH1106_HEIGHT 64
#define SH1106_PAGES 8

Assuming we stay in Quadrant I of the Cartesian plane, arguably the best quadrant, with the origin (0,0) in the lower left corner, the x coordinate maps directly to the index of the page_data[] array. That part was easy.

The y coordinate is only a bit more complex. Given the range of 0-63 of possible y values, we can represent that with a 6 bit integer. The upper 3 bits determine the page number, which is the index into the frame buffer array, and the lower 3 bits identify a single bit within what I refer to as a ‘stripe’ in the SH1106 memory. It’s a short, vertical space, one bit wide and 8 bits tall. The lowest bit is the top-most spot within the stripe.

Now if we acted like we didn’t care, we could just take the three upper bits of the y coordinate and call that the page number. That would have the consequence of giving us a plane mirrored about the x axis, as page 0 is at the top and page 7 is at the bottom. We just need to subtract the upper 3 bits from 7 to get the right-side-up, happy Quadrant I orientation that I happen to prefer. So a little more complex, but not much.
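
As a worked example, a pixel at y = 53 lands in page 7 - (53 >> 3) = 7 - 6 = 1, and sits at bit position 53 & 0x07 = 5 within its stripe.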

So having now spelt this out in people jibber-jabber, it’s time to encode this into a series of mathematical transformations and some hopefully readable source code.

My first function will be the point() function. Technically, a point has no dimension, only a location. Our ‘points’ actually have a size of ‘one’ in both dimensions, but they do have a location that can be specified as offsets from the origin of our Cartesian coordinate system.

The parameters of the point() function should include the x and y coordinates as well as a ‘color’ value. Being a display of modest ambition, this OLED supports the binary options of ‘on’ or ‘off’. We can represent that as a one or a zero in the code.

I have taken the liberty of formalizing the available color palette:

typedef enum { // all the colors
    COLOR_OFF = 0,
    COLOR_ON = 1
} COLOR_t;

Now I am making an executive-level decision to have the graphics functions pretend that the display is only 128 x 64 pixels in extent. Perhaps this will save me some time in the future and keep me from looking for ‘invisible’ pixels that are there but hiding just off stage.

I will have to try to remember to update the display after these functions, as they only manipulate the contents of the frame buffer but do not actually communicate with the OLED.

So here is the point() function as it currently stands:

void point(uint8_t x, uint8_t y, COLOR_t color) { // plot a single point of color at (x,y)

    uint8_t page = (SH1106_PAGES - 1) - (y >> 3); // top three bits represent page number, reversed to be in Quadrant I
    uint8_t bit_mask = 1 << (y & 0x07); // bit mask of pixel location within display memory stripe

    x += 2; // move into visible portion of OLED screen

    if(color == COLOR_OFF) { // we'll reset a bit in the memory array
        SH1106_frame_buffer[page].page_data[x] &= ~bit_mask; // clear bit
    } else { // we'll set a bit in the memory array
        SH1106_frame_buffer[page].page_data[x] |= bit_mask; // set bit
    }
}

I realized later that I could just invert the top three bits of the y coordinate instead of subtracting them from ‘one less than the number of pages’. Either way seems equally obtuse.

And it works! Why am I always so surprised when anything works as expected?

Now to see how performant this little manifestation of my algorithm can be. I’ll write a loop that sets and then clears all the pixels, one by one. If it’s visibly slow, I’ll have to think about spending some time optimizing the process. If not, I’m not going to worry about it.

It’s pretty fast. It causes a brief flash on the screen, and then it goes blank again, all pretty quickly. There is a visible ‘tearing’ artifact across the bottom of the screen in this process.

Looking at the oscilloscope, I measure ~18 ms to write ones to the screen, and ~16.4 ms to write zeros. That’s a surprising difference. Given there are 8,192 individual pixels to be written, the setting function, including the loop overhead, is taking ~2.2us and the clearing function is taking ~2us per pixel.

So it takes less time to set or clear all the pixels in the frame buffer than it does to send them to the display via I2C. Good to know.

Here is where, historically, I go nuts writing a bunch of optimized graphics primitives, such as vertical, horizontal and ‘other’ lines, filled and unfilled rectangles and circles, etc.

But for now I want to pretend to focus on actually finishing this project and resist the urge to write yet another library of functions that may or may not ever get used.

So now we will proceed to fonts or glyphs, as you prefer. The first one is always the most interesting. I’ve already got one that I like and will start there, but it was designed to be small and permit a larger amount of text on the screen at one time. One of the overall goals of this project is to make it at least somewhat visible and legible at a distance, so larger formats will be needed.

This brings me back to the need for a better font design tool. I’ve spent way too much time typing in ones and zeros and squinting at the screen while transcribing hexadecimal numbers. I have searched for a more appropriate tool that is already in existence but have yet to find anything that works within the constraints of this project. I feel yet another tangent coming.

Well, before embarking on the world’s greatest font design tool tangent, I’ll have to be happy with a tiny side quest. I noticed that to accommodate the discrepancy between the SH1106 memory and the physical OLED screen width, I had hard-coded a “+2” to the x coordinate in the point() function. The solution was to add a couple of new fields to the page structure to align with the ‘invisible’ columns on the left and the right side of the screen.

That part was easy. To modify the dimension of the page_data member without using what looks like (and totally is) another magic number, I used a uint16_t, which is exactly two bytes long, as the requisite padding on each side, then used the friendly-looking (not really) equation:

SH1106_WIDTH - (2 * sizeof(uint16_t))

as the number of elements. So it should still send out all 132 data bytes of each page, but we don’t have to offset the x coordinate every single time. That saves about 20ns per pixel!

Now that that’s fixed, I went back and checked the ‘single pixel at the origin’ test and noticed that sometimes the pixel seemed to travel along the bottom edge of the screen. That’s because nowhere was I setting the column address to 0, or to anything else, either. It was going to be whatever it happened to end up being. After a power on reset, the module is supposed to reset the column address to zero, and I’m sure it does. But I have updated the initialization sequence to specifically set the column address to zero. This is done in two steps, as there is a single byte command to set the lower four bits of the column address and another to set the four upper bits. Here is the new sequence:

uint8_t SH1106_init_sequence[] = {
    SH1106_NO_CONTINUATION | SH1106_COMMAND, // control byte
    SH1106_COMMON_REVERSE, // swap row (common) scanning order
    SH1106_SEGMENT_REVERSE, // swap column (segment) scanning order
    SH1106_COLUMN_LOWER, // column address lower 4 bits = 0
    SH1106_COLUMN_UPPER, // column address upper 4 bits = 0
    SH1106_DISPLAY_ON, // command to turn on display
};

Now my little pixel is just where it belongs… or is it? Honestly, it’s pretty hard to see. One way to test this is to draw a single-line rectangle around the edge of the screen and make sure all four edges are visible.

Which reminds me that I am not checking the input arguments to the point() function. I’ll just do a quick test and silently return on out-of-bound values.

So I added a couple of quick argument checks to the point() function that just return on out-of-bounds values. Another option would be to simply mask off the invalid bits and look like we’ve “wrapped around” after passing the edge of the screen.

So the rectangle test shows that there is still room for improvement in my equations. It’s hard to describe, exactly, but it looks like each page starts writing a little to the right of the previous page, so that the ‘vertical’ lines are distinctly leaning.

One thing is for sure, and that’s that my bit mask formula is exactly backwards. In retrospect, I see it now. The larger the y value, the lower the bit position within the strip should be, not the other way. I replaced:

1 << (y & 0x07)

with:

0x80 >> (y & 0x07)

and the horizontal lines seem to be right on the edge of the screen now.
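Putting the pieces so far together, here’s a sketch of point() with the bounds check and the corrected mask in place, reusing my hypothetical page structure from above; the mapping between y and the page index is a guess on my part:

void point(uint8_t x, uint8_t y, uint8_t color)
{
    if(x >= 128 || y >= 64) {
        return; // silently ignore out-of-bounds coordinates
    }

    uint8_t page = y >> 3;             // eight pixel rows per page (mapping assumed)
    uint8_t mask = 0x80 >> (y & 0x07); // larger y, lower bit within the strip

    if(color == COLOR_ON) {
        frame_buffer[page].page_data[x] |= mask;  // set the pixel
    } else {
        frame_buffer[page].page_data[x] &= ~mask; // clear the pixel
    }
}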

But each page is still scooched over one pixel to the right relative to the previous page. This could be caused by sending out one too many bytes per page in the update function. As the function uses the reported size of the page structure as the byte count, it occurred to me that the compiler was padding the struct somehow. Adding the modifier ‘__attribute__((packed))’ to the struct declaration fixed the problem. This is not the first time that structure packing issues have created off-by-a-little-bit errors for me, especially in communication protocols.
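That theory fits my sketch above: with an odd number of command bytes ahead of pad_left, the compiler inserts one alignment byte so the uint16_t lands on an even offset, which is exactly a one-extra-byte-per-page error. With the attribute applied, plus a _Static_assert to catch any regression at compile time:

typedef struct __attribute__((packed)) {
    uint8_t  page_cmd;  // 'set page address' command byte for this row
    uint16_t pad_left;  // the two invisible columns left of the glass
    uint8_t  page_data[SH1106_WIDTH - (2 * sizeof(uint16_t))]; // 128 visible columns
    uint16_t pad_right; // the two invisible columns on the right
} sh1106_page_t;

// one page command byte plus 132 column bytes, nothing more
_Static_assert(sizeof(sh1106_page_t) == 1 + SH1106_WIDTH, "unexpected padding in sh1106_page_t");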

Now my rectangle looks properly rectangular. Going back, I also check that the origin pixel is very decidedly in the lowest leftest spot. With just the right amount of background light, I can barely see the edge of the OLED grid.

Now I can import my existing, hand-crafted OLED font from another, similar project. The font is contained in a C source code file named ‘font_5x8.c’ from the previously-mentioned C8-SH1106 project for the 203.

Copying the bits out of the font definition array and writing them to the frame buffer works like a charm.

I put that code in a little loop to go through and print all the available characters, and it goes by a bit too quickly to be able to see what is happening. I added a short delay to the loop and it’s quite satisfying to see it working so well. Here is the code:

for(uint8_t glyph = 0x20; glyph < 0x80; glyph++) { // all the characters in the font file

    for(uint8_t x = 0; x < 5; x++) { // columns
        for(uint8_t y = 0; y < 8; y++) { // rows

            if(font_5x8[glyph][x] & (0x80 >> y)) {
                point(x, y, COLOR_ON); // draw the pixel
            } else {
                point(x, y, COLOR_OFF); // erase the pixel
            }
        }
    }

    SH1106_update(); // let's see what happened
    Delay_Ms(250); // short delay
}

The Delay_Ms() function is provided by the boilerplate example project generated by the MRS2 software when asked to create a new project.

Posted on Leave a comment

Notes on RISC-V Assembly Language Programming – Part 13

6 February 2025

Today’s first objective is to capture and measure the SCL signal and see how close it gets to the requested 400 KHz that I specified in the I2C initialization function.

After attaching an extension cable in order to tap into the SCL line going to the OLED module, I measure an SCL signal trying so hard to wiggle at 423 KHz, which is almost 6% over what I specified. Again, it’s not a critical value, as I have successfully run these OLED displays at 1 MHz in the recent past.

Debugging the program, I can look at the I2C registers directly and see what has been set up for me. The CTLR2 register has a field named FREQ, and it has been set to 48. This is in line with what the RM indicates should be done. The CLKCFGR register has a field called CCR, the clock division factor, and it is set to 0x04.

The actual timing calculations are shrouded in mystery, at least from the standpoint of trying to understand what the RM says. My experimentation suggests that the FREQ field has zero effect on the SCL frequency, and that the CCR field alone sets the pace. The result also depends on whether you are using ‘fast mode’, as well as on the selected clock duty cycle.

Also worthy of note is that the waveform has a very slow rise time and quite abrupt fall time, as would be expected from an open-drain output with no pull-up resistor to help. I have a second OLED module set up with 1KΩ pull-up resistors installed, which is considered quite stiff in the I2C community. This module’s SCL line shows much sharper rise times. So I think that in the lab it’s OK to “get away with” no pull-up resistors for testing purposes, but any final product design should certainly incorporate them. Surface mount resistors are not expensive.

The first improvement I would like to make on the existing system is to use interrupts to pace the transmission and reception of data over the bus, instead of spin loops that may or may not be monitored for a time-out condition. There are two interrupts available for each I2C peripheral on these chips, one for ‘events’ and the other for ‘errors’. I’ll need to define a state machine that is set up before any communications and serviced by the two interrupt handlers.
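Sketching ahead, the bookkeeping might look something like this; all of these names are hypothetical at this point:

typedef enum {
    I2C_XFER_IDLE,       // no transfer in progress
    I2C_XFER_START_SENT, // START issued, waiting for master mode select
    I2C_XFER_DATA        // address accepted, shifting out payload bytes
} i2c_xfer_state_t;

static volatile i2c_xfer_state_t i2c_state = I2C_XFER_IDLE;
static const uint8_t *i2c_buffer;   // payload being transmitted
static volatile uint16_t i2c_index; // next byte to send
static uint16_t i2c_length;         // total bytes in the payload
static uint8_t  i2c_address;        // target device address, pre-shifted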

I will also need to come up with a suitable API to be able to hand off various payloads to the display. While the OLED controller chip allows for both reading and writing, I am not immediately seeing a strong case for ever reading anything back. So I’m thinking that the majority of transfers will be writing some combination of commands and data to the display.

The first case is the initialization phase. Ideally, the screen memory needs to be either cleared or preset to a boot splash screen, followed by commands to adjust any operating parameters. The controller chip’s built-in power-on reset sequence does almost everything we need as far as setting up its internal timing. We only need to flip the ‘on’ switch to see dots. But as I alluded to yesterday, the screen on this module is mounted upside down and backwards. While there is no single “rotate 180°” command available, there are two other commands that will do effectively the same thing. One reverses the column scanning order and the other reverses the row scanning order. So we’ll need to send those two commands before we turn on the display. There’s also a setting called ‘contrast’ that might more accurately be called ‘brightness’ that defaults to spang in the middle of the range.

Unlike the other popular OLED controller, the SSD1306, the SH1106 does not automatically roll over to the next ‘page’ of memory once it gets to the end of the row. This means that the ‘screen fill’ task must be broken up into eight page fills. Each of these must be preceded with a page address command. So the initialization ‘payload’ begins to take shape:

Address page 7, fill with 132 bytes of some pattern
Address page 6, fill with 132 bytes of some pattern
Address page 5, fill with 132 bytes of some pattern
Address page 4, fill with 132 bytes of some pattern
Address page 3, fill with 132 bytes of some pattern
Address page 2, fill with 132 bytes of some pattern
Address page 1, fill with 132 bytes of some pattern
Address page 0, fill with 132 bytes of some pattern
Reverse column scan
Reverse row scan
Optionally set contrast level
Display on

I fill the pages in ‘reverse’ order so that it ends up addressing page 0, which seems the logical place to start in the next stage. It will save at most one page address command, so this trick might get axed to favor clarity over cleverness.

The SH1106, much like the SSD1306, is a simple matrix display controller and does not offer any sort of built-in text capabilities. We have to supply our own fonts, which translates into “We get to supply our own fonts”.

I had originally used an 8×8 font that was very easy to read, but ultimately went with a 6×8 font that was, to me, much nicer looking. I then spent a lot of time writing what I considered ‘optimized’ routines to place characters on the screen in what seemed a sensible manner. Mostly this had to do with working within the constraints of the memory organization of the controller chip’s display RAM. This resulted in feeling very much boxed in to using either 8×8 or 16×16 fonts.

What I’m thinking about doing now is very different. Instead of writing each character directly to the screen’s memory, I’m going to introduce an intermediate frame buffer within the CH32X035 memory space. It’s only 1,056 bytes if we map every display location, but only 1,024 bytes if we map just the visible 128 columns that are supported by the physical OLED screen of this module. Each byte contains, as you know, 8 bits, and each bit corresponds to a single screen pixel. There are no shades of gray; it’s either on or off.
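Concretely, the two candidate layouts look something like this (the names are mine):

#define SH1106_PAGES       8   // eight pages of eight pixel rows each
#define SH1106_RAM_WIDTH   132 // columns in the controller's display RAM
#define SH1106_GLASS_WIDTH 128 // columns actually visible on the glass

uint8_t frame_full[SH1106_PAGES][SH1106_RAM_WIDTH];      // 1,056 bytes: every display location
uint8_t frame_visible[SH1106_PAGES][SH1106_GLASS_WIDTH]; // 1,024 bytes: visible columns only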

So my ‘print’ and ‘plot’ functions will actually only write to an internal SRAM-based frame buffer, and when ‘the time is right’, the whole memory will be transferred to the OLED display. This could be aided by DMA and interrupts to help off-load some of this burden from the CPU.

So that’s my plan for completely over-engineering this project and multiplying the amount of effort required to get to the finish line.

A couple of little experiments to try before diving into the big stuff. I noticed in the SDK that the GPIO initialization for the I2C port used two separate calls to the GPIO_Init() function, one for each of the two I2C signals. The library can actually set up as many pins on a single port as you need. You just indicate the pins needing initialization with a bitmap passed in as the GPIO_Pin structure member. So I was able to combine the two calls into one:

// configure SCL/SDA pins

GPIO_InitTypeDef GPIO_InitStructure = {0};

// GPIO_InitStructure.GPIO_Pin = GPIO_Pin_10;
// GPIO_InitStructure.GPIO_Mode = GPIO_Mode_AF_PP;
// GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
// GPIO_Init( GPIOA, &GPIO_InitStructure );

// GPIO_InitStructure.GPIO_Pin = GPIO_Pin_11;
// GPIO_InitStructure.GPIO_Mode = GPIO_Mode_AF_PP;
// GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
// GPIO_Init( GPIOA, &GPIO_InitStructure );

GPIO_InitStructure.GPIO_Pin = GPIO_Pin_10 | GPIO_Pin_11; // PA10/SCL, PA11/SDA
GPIO_InitStructure.GPIO_Mode = GPIO_Mode_AF_PP;
GPIO_InitStructure.GPIO_Speed = GPIO_Speed_50MHz;
GPIO_Init(GPIOA, &GPIO_InitStructure);

I also tried bumping up the SCL frequency to 1 MHz, via the I2C_ClockSpeed structure member passed to the I2C_Init() function. No dice. I don’t know why yet, but I might find out in the future. Right now it’s chugging along at over 400 KHz, and that should be fine for now. In theory, I should be able to push almost 38 frames per second to the display at this speed.

And now on to the Great Embiggening of the I2C API. First, I want to enable the available interrupts and get a feel for how and when they are triggered, and then build the new code around that.

The SDK provides a function to enable or disable the various combinations of available interrupts. There appears to be an additional type of event interrupt for when either the TXE or RXNE status bits are set, indicating space is now available for more of whatever was going on at the time.

Right now I just want to look at the event interrupts, then I will look into the error interrupts and once I get the DMA configured, I’ll have another look at the buffer interrupts.
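Enabling just the event interrupt at the peripheral level is a one-liner:

// enable only the event interrupt for now; I2C_IT_BUF and I2C_IT_ERR
// can be OR'd in later for the buffer and error interrupts
I2C_ITConfig(I2C1, I2C_IT_EVT, ENABLE);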

Note that the I2C_ITConfig() function only enables the interrupts at the device level. It does not enable any interrupts at the system level.

To do that, we use the SDK function NVIC_EnableIRQ(). The argument to pass is the interrupt number, and it took a bit of sleuthing on my part to track it down. There is an enumerated type IRQn_Type in ch32x035.h that contains the values of all the interrupt numbers. The one we want right now is I2C1_EV_IRQn, which has a value of 30. I was able to find the value in the RM, but I much prefer to have a defined value referenced and not a “magic number”.
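For the event interrupt, that boils down to a single call:

NVIC_EnableIRQ(I2C1_EV_IRQn); // I2C1 event interrupt, number 30 in ch32x035.h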

There is also an SDK function called NVIC_Init() that will let you either enable or disable an interrupt, as well as set the preemption priority and subpriority.
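For completeness, the NVIC_Init() version looks like this, with both priorities left at zero:

NVIC_InitTypeDef NVIC_InitStructure = {0};
NVIC_InitStructure.NVIC_IRQChannel = I2C1_EV_IRQn;
NVIC_InitStructure.NVIC_IRQChannelPreemptionPriority = 0;
NVIC_InitStructure.NVIC_IRQChannelSubPriority = 0;
NVIC_InitStructure.NVIC_IRQChannelCmd = ENABLE;
NVIC_Init(&NVIC_InitStructure);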

Note that the system-level global interrupts are enabled in the supplied startup_ch32x035.S file.

The SDK also defines labels for all the interrupts. The I2C interrupts are:

I2C1_EV_IRQHandler
I2C1_ER_IRQHandler

So at this point, I need to define a function for this interrupt handler. It also needs to specify that it is an interrupt handler, so it gets the proper signature and whatever else the compiler wants.
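WCH’s example projects declare handlers with a function attribute, and I’m following that pattern:

// the attribute tells the compiler to generate an interrupt
// prologue/epilogue instead of a normal function's
void I2C1_EV_IRQHandler(void) __attribute__((interrupt("WCH-Interrupt-fast")));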

The first thing the interrupt handler needs to do is figure out why it was invoked. Going in the order of things we actually did, the first thing to look for would be the start bit (SB, bit 0 of STAR1) being set, indicating that a START condition has been generated.

I have seen the I2C event interrupt being triggered as expected. I added code to examine the status registers and respond accordingly. There are really only three conditions of note.

1.  I2C_EVENT_MASTER_MODE_SELECT
2.  I2C_EVENT_MASTER_TRANSMITTER_MODE_SELECTED
3.  I2C_EVENT_MASTER_BYTE_TRANSMITTED

The first happens after a START condition is set to indicate that the device has entered MASTER mode.

The second happens after the device address and direction bit have been successfully transmitted.

The third happens after each byte has been transmitted.

Additionally, and for no obvious reason, one more interrupt occurs after the STOP condition is set, even though the status registers all read zero. I choose to ignore this.
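Here is a sketch of the handler built around those three events, reusing the hypothetical state variables from earlier; the real code will differ in its details:

void I2C1_EV_IRQHandler(void)
{
    if(I2C_CheckEvent(I2C1, I2C_EVENT_MASTER_MODE_SELECT)) {
        // START sent; follow up with the device address
        I2C_Send7bitAddress(I2C1, i2c_address, I2C_Direction_Transmitter);
    } else if(I2C_CheckEvent(I2C1, I2C_EVENT_MASTER_TRANSMITTER_MODE_SELECTED)) {
        i2c_state = I2C_XFER_DATA; // address acknowledged
        I2C_SendData(I2C1, i2c_buffer[i2c_index++]); // first payload byte
    } else if(I2C_CheckEvent(I2C1, I2C_EVENT_MASTER_BYTE_TRANSMITTED)) {
        if(i2c_index < i2c_length) {
            I2C_SendData(I2C1, i2c_buffer[i2c_index++]); // more to send
        } else {
            I2C_GenerateSTOP(I2C1, ENABLE); // payload finished
            i2c_state = I2C_XFER_IDLE;
        }
    }
    // the spurious all-zero-status interrupt after STOP falls through here
}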

So I replaced the entire SH1106_init() function with a call to the new i2c_write() function, passing the SH1106 device address, a pointer to the initialization sequence, and its length:

uint8_t SH1106_init_sequence[] = {
    SH1106_NO_CONTINUATION | SH1106_COMMAND, // control byte
    SH1106_COMMON_REVERSE, // swap row scanning order (COM lines)
    SH1106_SEGMENT_REVERSE, // swap column scanning order (SEG lines)
    SH1106_DISPLAY_ON, // command to turn on display
};
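The call itself is then just this, with the address macro and i2c_write()’s signature being my guesses, since I haven’t shown either:

i2c_write(SH1106_I2C_ADDRESS, SH1106_init_sequence, sizeof(SH1106_init_sequence));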

So now the display should be neither umop apisdn (upside down) nor backwards, as well as on. And it works!

Now I need to dive a little deeper into the SH1106 data sheet and try to understand the ways to send data and commands to the controller chip. I’m still a little fuzzy on how the ‘continuation bit’ is supposed to work when sending larger packages of data and commands to the module.

The next communique I would like to send to the module is a ‘page fill’ command. This is composed of a ‘page address’ command, from 0-7, followed by 132 of your favorite numbers.

I added a ‘state’ variable to the I2C API, as it exists now, so that it doesn’t clobber itself. Clobbering is possible because starting a transfer is quick and the function returns immediately, but the transfer itself takes a small but non-zero amount of time to complete.
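A sketch of how that guard and the hand-off to the interrupt machinery might look, again with the hypothetical names from earlier:

uint8_t i2c_write(uint8_t address, const uint8_t *data, uint16_t length)
{
    if(i2c_state != I2C_XFER_IDLE) {
        return 1; // previous transfer still in flight; caller can retry
    }

    i2c_address = address << 1; // 7-bit address, shifted for the bus
    i2c_buffer  = data;
    i2c_index   = 0;
    i2c_length  = length;
    i2c_state   = I2C_XFER_START_SENT;

    I2C_GenerateSTART(I2C1, ENABLE); // the event interrupt takes it from here
    return 0; // started; completion is asynchronous
}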

I had a bright idea to break up the page fill routine into sending a ‘preamble’ with the page address command preformatted, then send the data as a separate function call. This doesn’t work, because each call to i2c_write() is a self-contained thing, with its own START and STOP conditions. This does not seem to sit well with the SH1106.

I reformatted the frame buffer to actually have some space between the data rows to fit in the OLED commands, and this seems to be working fine. Right now I’m just zeroing out the memory and it clears the screen. Ultimately, I would like to have a ‘splash’ screen that shows up for a second when the device is first powered on.

So the first of my goals (using interrupts) has been realized. I’m debating the value of pursuing the DMA option at this point. I think I will spend some time trying to get some reasonable looking dots onto the screen, such as text and maybe some geometric graphics.