Frustrations of Raspberry Pi Baremetal

leon de boer

Nov 21, 2016

CPOL

Failings and musings from two weekends with the Raspberry Pi

Download source code - 11.9 KB

PART 2 now available: https://www.codeproject.com/Articles/1158245/More-Fun-frustration-and-laughs-with-the-Pi-Part-I

Introduction

I have had a Raspberry Pi sitting in the drawer for about 6 months, waiting for me to have a play with it. Coming into Christmas, work had slowed down a bit, so I thought I would have some time to play with a new toy. Coming from an embedded background, the Raspberry Pi is on paper my idea of heaven: it has far more memory and speed than is the norm for me.

Background

Now the question was what I would program onto it. A graphics screen is something of a novelty in my line of work, so you can bet that whatever I did would involve the screen. Going way back to my early programming life, I first came to the attention of companies by doing a full graphical port of Turbo Vision in C++ and Pascal from Borland's original release. Much of that code still exists in the FreePascal & FreeVision libraries.

For my first large commercial project, I started a clean-room port of Windows onto a generic graphics framework, which was called Darkside. We had basic versions running without any great problems, but events were about to overtake us: Microsoft relaunched Windows CE, this time with acceptable pricing, and Wine had been released on Linux. For our commercial project it was silly to reinvent the wheel for what amounted to a small licensing cost, so we readily switched to Windows CE and I filed the Darkside code in the archives.

So I thought this might be a good chance to dust off the old code and try putting Darkside onto the Raspberry Pi as a small O/S, a sort of micro Windows.

Using the Code

Having decided on the task, I dusted off the old code. Like Turbo Vision (Free Vision), Darkside is essentially an event drive: you can couple multitasking underneath it, but that isn't strictly required. So to keep life easy initially, I would get the event drive working and look at putting a pthreads implementation under it at a later stage.

The normal Windows message loop sets up an ideal event drive. That is no surprise given its heritage: Windows 3.1 was an event drive with a co-operative multitasking kernel woven into it.

The normal windows message loop takes a fairly familiar form like this:

    // Standard message handler loop BY THE MSDN BOOK
    // https://msdn.microsoft.com/en-us/library/ms644936(v=vs.85).aspx
    MSG msg = { 0 };
    BOOL bRet;
    while ((bRet = GetMessage(&msg, 0, 0, 0)) != 0) {
        if (bRet == -1) {
            // handle the error and possibly exit
        } else {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
    }

So GetMessage gets events and DispatchMessage delivers them.

To make Darkside portable, the hardware layer needs to implement just three external functions, covering the three main input event sources: mouse, keyboard and timer. The code in those three functions is unique to each hardware setup; you write your own routines for them and install each one as a handler through a dedicated function in the Darkside API.
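A minimal sketch of that hook arrangement, assuming plain C function pointers. Only the timer setter (SetGetTimerTickHandler) appears in this article; SetTickHook, SetKeyHook, SetMouseHook, PollInput and the Event layout below are hypothetical names for illustration, not the Darkside API. FakeKeyboard and FakeTick stand in for real hardware so the idea can be exercised on a desktop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook signatures, one per hardware input source. */
typedef uint32_t (*TickHook)(void);                 /* free-running 1 ms tick count */
typedef int (*KeyHook)(int *key);                   /* 1 if a key was read into *key */
typedef int (*MouseHook)(int *x, int *y, int *btn); /* 1 if the mouse state changed */

static TickHook  tickHook;
static KeyHook   keyHook;
static MouseHook mouseHook;

/* Hypothetical setters, in the spirit of SetGetTimerTickHandler. */
void SetTickHook(TickHook f)   { tickHook = f; }
void SetKeyHook(KeyHook f)     { keyHook = f; }
void SetMouseHook(MouseHook f) { mouseHook = f; }

typedef enum { EV_NONE, EV_KEY, EV_MOUSE } EventType;
typedef struct { EventType type; int a, b, c; uint32_t time; } Event;

/* Poll the installed hooks once. A hook that was never installed is
   skipped, so a board with no mouse simply never produces mouse events. */
int PollInput(Event *ev) {
    ev->type = EV_NONE;
    ev->time = tickHook ? tickHook() : 0;
    if (keyHook && keyHook(&ev->a)) {
        ev->type = EV_KEY;
        return 1;
    }
    if (mouseHook && mouseHook(&ev->a, &ev->b, &ev->c)) {
        ev->type = EV_MOUSE;
        return 1;
    }
    return 0;
}

/* Host-side stand-ins for real hardware. */
static int pendingKey = -1;
static int FakeKeyboard(int *key) {
    if (pendingKey < 0) return 0;   /* nothing waiting */
    *key = pendingKey;
    pendingKey = -1;                /* consume the key */
    return 1;
}
static uint32_t FakeTick(void) { return 42; }
```

The design choice is that an unset hook is simply skipped, so a board without a mouse needs no stub at all.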

So I decided to do the easiest first, which was to set up the timers. In my Darkside implementation, you are expected to provide a function that returns a 1 millisecond tick count, matching the GetTickCount API function in Windows. At the bottom of a call to GetMessage is a NextTimerMsg function, which fetches the next timer that has passed its timeout as a WM_TIMER message. It is a rather simple function: it walks the array of set timers and checks whether the interval set on each timer has elapsed. If it has, it prepares a matching WM_TIMER message and records where to resume, so the next time NextTimerMsg is entered it starts from the following timer. You don't want to always start the search for timeouts from the first entry in the array, for the obvious reason that you want all timers to be treated equally. My implementation looked like this:

/*-INTERNAL: NextTimerMsg---------------------------------------------------
 If there is a timer has exceeded its timeout then return timer message.
 Internal message routine so msg pointer guaranteed to be valid
 12Nov16 LdB
 --------------------------------------------------------------------------*/
static void NextTimerMsg (LPMSG msg) {
    if ((TimerCount > 0) && (TimerTickFunc)) {     // Basic check that timers are set and 
                                                   // system is running
        unsigned short i;
        i = TimerChkStart;                         // We will resume at next timer check position
        do {
            if (TimerQueue[i].TimerId > 0) {       // Check timer is in use
                DWORD tick, elapsed;
                tick = TimerTickFunc();            // Get timer tick
                if (tick < TimerQueue[i].LastTime) {  // Timer rolled (** Remember it rolls min 
                                                      // every 49 days **)
                    elapsed = ULONG_MAX - TimerQueue[i].LastTime;  // Amount of ticks left until it 
                                                                   // would have rolled
                    elapsed += (tick + 1);         // Now add the current tick to the amount from above
                } else {                           // Time was in correct order
                    elapsed = tick - TimerQueue[i].LastTime;       // Simply subtract the two times
                }
                if (elapsed > TimerQueue[i].Interval) {        // Elapsed time exceeds timer interval
                    TimerQueue[i].LastTime += TimerQueue[i].Interval; // Add the timer interval .. 
                                                                      // we want pulses near interval.
                    msg->hwnd = TimerQueue[i].TimerWindow;            // Set the message window handle
                    if (msg->hwnd == 0) msg->hwnd = (HWND)FocusedWnd; // If window still zero try the 
                                                                      // focused window
                    if (msg->hwnd == 0) msg->hwnd = (HWND)AppWindow;  // If window handle still zero 
                                                                      // try app window
                    msg->message = WM_TIMER;                          // WM_TIMER message
                    msg->wParam = (WPARAM)TimerQueue[i].TimerId;      // Set timer id to wParam
                    msg->lParam = (LPARAM)TimerQueue[i].UserTimerFunc;// Set user timer function 
                                                                      // to lParam
                }
            }
            i++;                                  // Increment to next timer to check
            if (i >= TimerMax) i = 0;             // Roll back to first timer if it exceeds array
        } while ((msg->message == WM_NULL) && (i != TimerChkStart));// Exit if we get a timer message 
                                                            // or we have checked each timer
        TimerChkStart = i;                                  // Hold the timer we start next check at
    }
}
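NextTimerMsg handles the roughly 49.7-day rollover of a 32-bit millisecond tick with an explicit branch. Because C defines unsigned arithmetic modulo 2^N, a plain subtraction of two 32-bit tick values gives the same elapsed count even across the wrap. The sketch below compares the two, with uint32_t and UINT32_MAX standing in for DWORD and ULONG_MAX (an assumption that both are 32 bits, as on Windows and 32-bit ARM).

```c
#include <stdint.h>

/* Elapsed ticks computed the way NextTimerMsg does it: an explicit branch
   for the case where the 32-bit tick has wrapped past zero. */
uint32_t ElapsedBranch(uint32_t tick, uint32_t last) {
    if (tick < last)
        return (UINT32_MAX - last) + tick + 1;  /* ticks up to the wrap, plus ticks after it */
    return tick - last;
}

/* The same value with plain unsigned subtraction: uint32_t arithmetic is
   modulo 2^32, so the wrap is handled without a branch. */
uint32_t ElapsedModular(uint32_t tick, uint32_t last) {
    return (uint32_t)(tick - last);
}
```

For example, with last = 0xFFFFFFF0 and tick = 5, both return 21: 16 ticks up to the wrap plus 5 after it.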

Our version of the Windows DispatchMessage API function sees a WM_TIMER message and, if the timer function pointer is set, calls it directly; otherwise it sends the WM_TIMER message to the selected window. The latter is out of play in our tinker code because we haven't set up a window yet. Anyhow, DispatchMessage took this form:

/*-DispatchMessage----------------------------------------------------------
 The DispatchMessage function dispatches a message to a window procedure.
 It is typically used to dispatch a message retrieved by GetMessage function.
 11Nov16 LdB
 --------------------------------------------------------------------------*/
BOOL DispatchMessage (LPMSG lpMsg){
    PWNDSTRUCT Wp;
    if ((lpMsg) && (lpMsg->message != WM_NULL)) {                // Check ptr and message valid
        Wp = (PWNDSTRUCT)lpMsg->hwnd;                            // Typecast window handle to ptr
        if (Wp == NULL) Wp = AppWindow;                          // Zero means application window
        if ((lpMsg->message == WM_TIMER) && (lpMsg->lParam)){    // Timer messages with 
                                                             // valid function pointer called directly
            TIMER* timer = TimerFromID((UINT_PTR)lpMsg->wParam); // Find the timer record
            if (timer == NULL) return (0);                       // Timer no longer exists
            timer->UserTimerFunc(lpMsg->hwnd, WM_TIMER,
                timer->TimerId, timer->LastTime);                // Call timer function pointer
            return (1);                                          // Return successful dispatch
        }
        if (Wp) return ((INT) Wp->lpfnWndProc(lpMsg->hwnd,
            lpMsg->message,    lpMsg->wParam, lpMsg->lParam));   // Call the handle if Wp valid
    }
    return (0);                                                  // Either message or handler invalid
}
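DispatchMessage relies on a TimerFromID lookup that the article does not show. Here is a minimal sketch of a fixed-size timer table it could sit on, assuming the field names used by NextTimerMsg. TIMER_MAX, AddTimer and RemoveTimer are hypothetical (they are not the Windows SetTimer/KillTimer signatures), and the handle types are simplified to void* and uintptr_t. The lookup returns NULL for an id that is no longer in the table, which is worth checking before calling through the record.

```c
#include <stddef.h>
#include <stdint.h>

#define TIMER_MAX 16                      /* hypothetical table size */

typedef void (*TimerProc)(void *hwnd, unsigned msg, uintptr_t id, uint32_t time);

/* Field names follow the ones NextTimerMsg and DispatchMessage use. */
typedef struct {
    uintptr_t TimerId;                    /* 0 marks a free slot */
    void     *TimerWindow;
    uint32_t  Interval;                   /* ms */
    uint32_t  LastTime;                   /* tick at the last firing (or at creation) */
    TimerProc UserTimerFunc;
} TIMER;

static TIMER TimerQueue[TIMER_MAX];
static uintptr_t NextId = 1;

/* Linear search over a small table. Returns NULL if the id is not in use. */
TIMER *TimerFromID(uintptr_t id) {
    if (id == 0) return NULL;
    for (size_t i = 0; i < TIMER_MAX; i++)
        if (TimerQueue[i].TimerId == id) return &TimerQueue[i];
    return NULL;
}

/* Claim a free slot. Returns the new timer id, or 0 if the table is full. */
uintptr_t AddTimer(void *wnd, uint32_t interval, uint32_t now, TimerProc fn) {
    for (size_t i = 0; i < TIMER_MAX; i++) {
        if (TimerQueue[i].TimerId == 0) {
            TimerQueue[i] = (TIMER){ NextId++, wnd, interval, now, fn };
            return TimerQueue[i].TimerId;
        }
    }
    return 0;
}

/* Free the slot so the timer scan skips it. Returns 1 if the id existed. */
int RemoveTimer(uintptr_t id) {
    TIMER *t = TimerFromID(id);
    if (t == NULL) return 0;
    t->TimerId = 0;
    return 1;
}
```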

Everything looked okay, so I thought I had better check it all worked. To do that, I set up a Windows console program, BUT without including Windows.h. Instead, it includes our "Darkside.h", plus a simple five-line C file that links the actual Windows GetTickCount in for me. I called the pair User.h/User.c, and they are pretty straightforward.

//User.H
#ifndef _USER_
#define _USER_
DWORD GetTimerTick (void);
#endif

//User.C
#include <windows.h>
#include "user.h"
DWORD GetTimerTick(void) {
    return (GetTickCount());
}

The trick is that User.c is the only file that includes Windows.h, so I can import the Windows GetTickCount without dragging the whole of Windows.h into the Darkside code.
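For anyone wanting to run the same test harness on a Linux or other POSIX host rather than Windows, here is a sketch of an equivalent shim, assuming DWORD is a 32-bit unsigned type so uint32_t can stand in for it. It uses clock_gettime with CLOCK_MONOTONIC, which is unaffected by wall-clock changes, and truncates to 32 bits so it wraps the way GetTickCount does.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Same role as GetTimerTick in User.c, but for a POSIX host: milliseconds
   from a monotonic clock, truncated to 32 bits so it wraps like GetTickCount
   (about every 49.7 days). */
uint32_t GetTimerTick(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    uint64_t ms = (uint64_t)ts.tv_sec * 1000u + (uint64_t)ts.tv_nsec / 1000000u;
    return (uint32_t)ms;
}
```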

Now my test code:

#include <stdio.h>            // Needed for printf
#include "Darkside.h"        // Our windows replacement
#include "User.h"            // Gives us the GetTimerTick function

// Some count variables
int count1 = 0;
int count2 = 0;

// Forward declare our functions
void CALLBACK MyTimerFunc1(HWND hwnd, UINT uMsg, UINT_PTR idEvent, DWORD dwTime);
void CALLBACK MyTimerFunc2(HWND hwnd, UINT uMsg, UINT_PTR idEvent, DWORD dwTime);

int main(void) {

    printf("Press escape to exit\r\n\r\n");
    printf("TIMER 1 = %i\r\n", count1);
    printf("TIMER 2 = %i\r\n", count2);

    /* OKAY COUPLE THE TIMER TICK FUNCTION TO DARKSIDE */
    SetGetTimerTickHandler(GetTimerTick);

    // Start some timers like you do in windows
    SetTimer(0, 0, 1000, MyTimerFunc1);  // 1 second
    SetTimer(0, 0, 2500, MyTimerFunc2);  // 2.5 sec
  
    // Standard message handler loop BY THE MSDN BOOK
    // https://msdn.microsoft.com/en-us/library/ms644936(v=vs.85).aspx
    MSG msg = { 0 };
    BOOL bRet;
    while ((bRet = GetMessage(&msg, 0, 0, 0)) != 0) {
        if (bRet == -1) {
            // handle the error and possibly exit
        } else {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
    }

    return(0);
}

/* FUNCTION TIMER 1 */
void CALLBACK MyTimerFunc1(HWND hwnd, UINT uMsg, UINT_PTR idEvent, DWORD dwTime) {
    count1++;
    printf("TIMER 1 = %i\r\n", count1);
}

/* FUNCTION TIMER 2 */
void CALLBACK  MyTimerFunc2(HWND hwnd, UINT uMsg, UINT_PTR idEvent, DWORD dwTime) {
    count2++;
    printf("TIMER 2 = %i\r\n", count2);
}

Everything worked and I got the expected output.

With all working, I then started implementation on the Raspberry Pi. There are some nice articles on BareMetal programming of the Pi by Brian Sidebotham from the website valvers (http://www.valvers.com/open-software/raspberry-pi/step01-bare-metal-programming-in-cpt1/).

I worked through tutorials 1-3 with no real issues. I did hit a couple of quirky compiler bugs with his code, because I had gone against his advice and used GCC version 5.4 rather than the 4.7 he suggested. The bugs appear to be in the optimizer, and I haven't got my head around whether it was his assembler or the compiler that was at fault (my ARM assembler experience is next to nil). Anyhow, the solution was easy: use pragmas and/or wrappers to stop the optimizer, and everything was fine. That got me to the detail on the system timer, which was easy: a nice 1 MHz system clock counting in a 64-bit timer structure. So it was easy to produce the function I needed for the interface.

/*-GetTickCount-------------------------------------------------------------
 We are going to try and match GetTickCount on windows which returns the
 number of milliseconds that have elapsed since the system was started, so
 its frequency is 1000Hz. The Raspberry Pi timer is 1MHz so we need to
 divide it down by 1000. Being 64 bits we are fortunate, as we have more
 bits than needed for the 32 bit result. The two 32 bit halves are read
 separately, so the high word is re-read to catch a low word rollover.
 20Nov16 LdB
 --------------------------------------------------------------------------*/
unsigned long GetTickCount (void) {
    unsigned long hi, lo;
    unsigned long long resVal;
    do {
        hi = rpiSystemTimer->counter_hi;                      // Fetch high 32 bits
        lo = rpiSystemTimer->counter_lo;                      // Fetch low 32 bits
    } while (hi != rpiSystemTimer->counter_hi);               // Low word rolled over, read again
    resVal = ((unsigned long long)hi << 32) | lo;             // Combine into 64 bit microseconds
    resVal /= 1000;                                           // Divide by 1000 for milliseconds
    return((unsigned long)resVal);                            // Return the 32 bit result
}
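Reading the 64-bit counter as two 32-bit registers has a hazard: the counter keeps running between the two reads, so if the low word rolls over after the high word has been read, the combined value is about 2^32 microseconds (roughly 71.6 minutes) too small. The host-side sketch below simulates the two registers (simCounter, ReadLo and ReadHi are stand-ins, not the real hardware access) and compares a naive high-then-low read with a read that re-checks the high word and retries if it changed.

```c
#include <stdint.h>

/* Simulated 1 MHz counter: every register read advances it by simStep
   microseconds, so a test can force a low-word rollover between reads. */
static uint64_t simCounter;
static uint32_t simStep;

static uint32_t ReadLo(void) { uint32_t v = (uint32_t)simCounter; simCounter += simStep; return v; }
static uint32_t ReadHi(void) { uint32_t v = (uint32_t)(simCounter >> 32); simCounter += simStep; return v; }

/* Naive: high word first, then low word. A rollover in between loses 2^32 us. */
uint64_t ReadNaive(void) {
    uint32_t hi = ReadHi();
    uint32_t lo = ReadLo();
    return ((uint64_t)hi << 32) | lo;
}

/* Read high, low, then high again; retry if the high word moved meanwhile. */
uint64_t ReadConsistent(void) {
    uint32_t hi, lo;
    do {
        hi = ReadHi();
        lo = ReadLo();
    } while (hi != ReadHi());
    return ((uint64_t)hi << 32) | lo;
}
```

Starting the simulated counter at 0xFFFFFFFF with a step of 1, the naive read returns 0, while the consistent read retries once and returns 0x100000003.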

I thought I was home and dry, and only when I started to compile the code did the big problem hit home. I got linker errors on the console write of my simple program, and at that moment it sank in what Brian had been doing with the startup assembler files. What had been lost on me in the detail was that, on bare metal, the standard GCC toolchain gives you absolutely no console output... whoa, I haven't run across this in a very long time. Even on the most basic embedded C system, the console would be directed to at least a serial port.
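Here is a minimal sketch of routing console output to a serial port. It relies on a few assumptions. With newlib, which GNU ARM embedded toolchains commonly ship, printf eventually calls a _write system call that a bare-metal program must supply. Forwarding that stub to something like ConsoleWrite below is the usual approach, but that wiring is not shown. SetConsolePutc, ConsoleWrite and the capture buffer are hypothetical names. The PL011 part is compiled only when RPI_BAREMETAL is defined. It assumes the original Pi's peripheral base (0x20000000; the Pi 2 and 3 use 0x3F000000) with the UART pins and baud rate already configured.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output hook: one byte at a time to whatever device is set. */
typedef void (*PutcHook)(char c);
static PutcHook consolePutc;

void SetConsolePutc(PutcHook f) { consolePutc = f; }

/* Write len bytes to the console device. Returns the number of bytes
   written, or -1 if no output device has been installed. */
int ConsoleWrite(const char *buf, int len) {
    if (consolePutc == NULL) return -1;
    for (int i = 0; i < len; i++) consolePutc(buf[i]);
    return len;
}

/* Host-side stand-in used for testing: collect output in a buffer. */
static char captured[64];
static int capturedLen;
static void CapturePutc(char c) {
    if (capturedLen < (int)sizeof captured) captured[capturedLen++] = c;
}

#ifdef RPI_BAREMETAL
/* PL011 UART0 on the BCM2835: data register at +0x00, flag register at +0x18. */
#define UART0_DR (*(volatile uint32_t *)0x20201000u)
#define UART0_FR (*(volatile uint32_t *)0x20201018u)
static void Pl011Putc(char c) {
    while (UART0_FR & (1u << 5)) { }   /* wait while the transmit FIFO is full (TXFF) */
    UART0_DR = (uint32_t)(unsigned char)c;
}
#endif
```

On the board, SetConsolePutc(Pl011Putc) would be called once at startup; on a desktop, CapturePutc lets the same code be checked.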

I went back to do Brian Sidebotham's tutorials 4 and 5, and the reason for no console output became obvious: most of the graphics engine is totally hidden from you in a horrible blob, and the documentation is scant, incomplete and mostly hearsay. I suspect the compiler programmers baulked at this like I did. Brian has a small section in tutorial 5 where he does some wiring and code to get console output to a serial port, which would at least be the minimum for me and is what even the smallest Micro C compiler does. However, in a patch to his article, he had become aware of an issue with the Vector Floating Point (VFP) unit not being enabled. This sent shudders through me: am I really going to try to put something as graphics intensive as a graphics O/S on top of something I have no specifications for, with no real idea what it is doing or what timings are involved? Even in Brian Sidebotham's tutorial 5, you could see screen tearing because he could not organize better access to the screen.

Reading the forums, the font libraries were problematic for many. That was trivial for me, as I have both bitmap fonts and TrueType fonts in C code. Scan-converting the quadratic Bezier outlines effectively with font hints is well outside the range of novice programmers, but it is something I carry as a standard library out of commercial necessity. Again, for many the graphics primitives were problematic, which for me is also trivial. I have large libraries of video routines, and most allow for granularity sections (like the Pi's framebuffer) because the code harks back to the VESA compliance days, when all SuperVGA cards had granular windows.
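The granular windows of the VESA days come down to simple arithmetic: video memory is seen through a window of fixed size, so a pixel's linear byte offset splits into a bank number and an offset inside the window. Here is a minimal sketch, assuming a linear pitch in bytes and a 64 KB granularity in the example. PixelToBank and BankAddr are hypothetical names.

```c
#include <stdint.h>

/* A pixel address expressed as (bank, offset within the bank window). */
typedef struct { uint32_t bank; uint32_t offset; } BankAddr;

/* pitch is bytes per scanline; granularity is the window size in bytes. */
BankAddr PixelToBank(uint32_t x, uint32_t y, uint32_t pitch,
                     uint32_t bytesPerPixel, uint32_t granularity) {
    uint32_t linear = y * pitch + x * bytesPerPixel;   /* byte offset into video memory */
    BankAddr r = { linear / granularity, linear % granularity };
    return r;
}
```

With 640x480 at 16 bits per pixel (a 1280-byte pitch), row 51 starts in bank 0 but crosses into bank 1 at x = 128, which is why a scanline routine has to be prepared to switch banks mid-line.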

My issue is none of the above things that may have stopped others. It is that even a mouse cursor on the screen ideally needs to be masked onto and off the screen. That means two-way communication, both reading and writing the screen, and the timing detail is not available. You talk to the video unit via mailbox channels on the Raspberry Pi, and I have no idea what the synchronization is like. If I issue a write message and immediately issue a read message, is the pixel I get back the original pixel OR the pixel I just sent a message for? Even if I test it on my board, I have no way to know that all boards will behave the same way, because what I really need is the specification of the video unit, which apparently is subject to a large non-disclosure agreement. This dual buffering is used a lot on a Windows O/S: every time you drag a window you want to redraw the minimum amount of screen, so you organize read/write buffers with clip limits. Basically, almost everything I need to do with graphics on this project requires read/write and masking access to the video unit.
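For readers unfamiliar with the masking described above, here is a minimal sketch of a software cursor with a save-under buffer and AND/XOR masks, in the style of Windows cursors. The pixels under the cursor are saved, each is replaced by (screen AND andMask) XOR xorMask, and hiding the cursor writes the saved pixels back. It assumes a framebuffer the CPU can read and write directly as 32-bit pixels, which is precisely the behaviour in question on the Pi. Surface, Cursor, ShowCursorAt and HideCursor are hypothetical names, and the 4x4 size just keeps the example short.

```c
#include <stdint.h>

#define CUR_W 4
#define CUR_H 4

typedef struct {
    uint32_t *pixels;   /* framebuffer, assumed directly readable and writable */
    int width, height;  /* visible size in pixels */
    int pitch;          /* pixels per row */
} Surface;

typedef struct {
    uint32_t andMask[CUR_H][CUR_W];  /* all ones keeps the screen pixel, 0 clears it */
    uint32_t xorMask[CUR_H][CUR_W];  /* XORed in after the AND: cursor colour, or 0 */
    uint32_t saved[CUR_H][CUR_W];    /* background that was under the cursor */
    int x, y, visible;
} Cursor;

/* Save the background, then draw (screen AND and) XOR xor. Off-screen
   pixels are clipped (skipped) in both directions. */
void ShowCursorAt(Surface *s, Cursor *c, int x, int y) {
    c->x = x; c->y = y; c->visible = 1;
    for (int j = 0; j < CUR_H; j++)
        for (int i = 0; i < CUR_W; i++) {
            int px = x + i, py = y + j;
            if (px < 0 || py < 0 || px >= s->width || py >= s->height) continue;
            uint32_t *p = &s->pixels[py * s->pitch + px];
            c->saved[j][i] = *p;
            *p = (*p & c->andMask[j][i]) ^ c->xorMask[j][i];
        }
}

/* Restore exactly the pixels that ShowCursorAt overwrote. */
void HideCursor(Surface *s, Cursor *c) {
    if (!c->visible) return;
    for (int j = 0; j < CUR_H; j++)
        for (int i = 0; i < CUR_W; i++) {
            int px = c->x + i, py = c->y + j;
            if (px < 0 || py < 0 || px >= s->width || py >= s->height) continue;
            s->pixels[py * s->pitch + px] = c->saved[j][i];
        }
    c->visible = 0;
}
```

Moving the cursor is then HideCursor followed by ShowCursorAt at the new position, which needs exactly the read-after-write guarantee discussed above.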

I then went to the Linux source files to look at how they deal with the screen and, as I feared, it is all hidden inside thin shim routines commented in what can best be described as a foreign language.

I now see why almost all the bare-metal Raspberry Pi stuff you see on the net is blinking the lights or pulsing the IO ports. Using the graphics much beyond anything basic would mean going into Linux, so you have access to the driver and the routines developed by the vendor (who of course has all the details).

I have nothing against Linux; it is just too large for almost anything embedded I work on. I have used uClinux on SnapGear firewall code and on a couple of FPGA Cortex core boards, but they are about the only products I have done that are large enough to accommodate it.

So using the Pi for my Darkside project is dead. I am just too uncomfortable putting a large amount of time into something when I can't be sure of the specification for video access.

Points of Interest

So what did I learn about the Raspberry Pi? Foremost, that it is a really bad target for bare metal if you want to do graphics. Bluntly, I wouldn't waste my time porting a full graphics engine to something where you have no idea what performance you are going to get, and with such poor documentation.

So I guess my Raspberry Pi will be going back into the drawer until I need a really small Linux setup for something.

I was reasonably disappointed, but I have dragged out the old code, so I will look around for another target board and probably continue on. As with a lot of programming, the hardest part is getting started.

I have included the source code which is the basic message event drive system if anyone wants to play around with it.

Update and New Article Coming

It's here: Part 2

https://www.codeproject.com/Articles/1158245/More-Fun-frustration-and-laughs-with-the-Pi-Part-I

Not bad so far for a 51K IMG file and just the 3 files on the SD card.