Everything you have written so far is a co-operative system. For longer tasks you need pre-emptive context switching, and for the asynchronous ordering problem you need a semaphore, mutex or spinlock implementation.
For pre-emptive context switching you usually have a timer interrupt that fires 10 to 1000 times a second. When it fires, it asks the scheduler which task should run next. It then forcibly saves every CPU register and every FPU register of the current task to that task's context stack, so the task can later resume from exactly that point. Next it loads the CPU and FPU registers of the next task from its context stack and runs that task until a later timer tick stops it and passes control to another task. You will also need critical sections: small regions of code that, once entered, the task switcher cannot interrupt. Often that is as simple as switching off interrupts so the timer interrupt cannot fire.
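To make the save/load sequence concrete, here is a host-side simulation in C. It is only a model under stated assumptions: on real hardware the save and restore are written in assembler inside the timer interrupt, and the `struct cpu_context` layout (16 core registers plus 32 single-precision FPU registers, roughly a Cortex-M4F) as well as `cpu`, `saved`, `context_switch`, `scheduler_pick_next` and `timer_tick_isr` are hypothetical names chosen for illustration.

```c
#include <stdint.h>
#include <string.h>

#define NUM_TASKS 3

/* Hypothetical register file: 16 core registers (r0-r12, sp, lr, pc)
   plus 32 single-precision FPU registers, roughly a Cortex-M4F. */
struct cpu_context {
    uint32_t r[16];
    float    s[32];
};

/* Stand-in for the physical CPU registers seen by the interrupt. */
static struct cpu_context cpu;

/* One saved context per task (the per-task "context stack"). */
static struct cpu_context saved[NUM_TASKS];
static int current_task = 0;

/* Save everything the running task owns, then load the next task's state.
   One pointer says where to save to, the other where to load from. */
static void context_switch(struct cpu_context *save,
                           const struct cpu_context *load)
{
    memcpy(save, &cpu, sizeof cpu);
    memcpy(&cpu, load, sizeof cpu);
}

/* Round-robin policy: every task simply gets the next turn. */
static int scheduler_pick_next(void)
{
    return (current_task + 1) % NUM_TASKS;
}

/* What the periodic timer interrupt (e.g. a 1 kHz tick) would do. */
static void timer_tick_isr(void)
{
    int next = scheduler_pick_next();
    context_switch(&saved[current_task], &saved[next]);
    current_task = next;
}
```

Each call to `timer_tick_isr()` parks the running task's registers in `saved[]` and resumes the next task, so a value a task left in `r[0]` is back in `cpu.r[0]` when its turn comes round again.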
Semaphores or spinlocks are the usual way of sharing I/O hardware between tasks. In your DMA example, every use of the DMA would be wrapped in an acquire and release step. Any function that wants to do a DMA transfer first asks for ownership of the DMA. If the DMA is in use, it has already been acquired, and the caller must wait until whoever holds it releases it. When the owner finishes, it releases the DMA, and that release lets a waiting caller proceed. So it looks like this:
void DMA_Write(/* some variables */) {
    DMA_Acquire();   /* placeholder name: wait until the DMA is free, then take ownership */
    ActualDMAFunction(/* some variables */);
    DMA_Release(); } /* placeholder name: release the DMA so a waiting caller can run */
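A minimal sketch of such a semaphore, assuming a single core and a C11 compiler: a binary semaphore built on `atomic_flag`. The names `binary_sem`, `dma_sem`, `sem_try_acquire`, `sem_acquire`, `sem_release` and `dma_transfers` are hypothetical, `task_yield()` is an empty stub where a real kernel would switch to another task instead of spinning, and `ActualDMAFunction` is reduced to a counter so the example compiles.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical binary semaphore: flag clear = free, flag set = owned. */
typedef struct {
    atomic_flag taken;
} binary_sem;

static binary_sem dma_sem = { ATOMIC_FLAG_INIT };

/* Hypothetical hook: a real scheduler would run another task here. */
static void task_yield(void) { }

/* Try once; returns true if the caller now owns the resource. */
static bool sem_try_acquire(binary_sem *s)
{
    return !atomic_flag_test_and_set_explicit(&s->taken, memory_order_acquire);
}

/* Wait until the current owner releases, then take ownership. */
static void sem_acquire(binary_sem *s)
{
    while (!sem_try_acquire(s))
        task_yield();
}

/* Give the resource back so a waiting caller can take it. */
static void sem_release(binary_sem *s)
{
    atomic_flag_clear_explicit(&s->taken, memory_order_release);
}

/* Hypothetical stand-in for the real driver call. */
static int dma_transfers;
static void ActualDMAFunction(void) { dma_transfers++; }

void DMA_Write(void)
{
    sem_acquire(&dma_sem);      /* blocks while another task owns the DMA */
    ActualDMAFunction();
    sem_release(&dma_sem);      /* lets the next waiting caller in */
}
```

Yielding inside `sem_acquire` rather than spinning matters on a single core: a task that spins until its time slice ends only delays the owner that has to release the DMA.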
For the processor you are on, search for "ARM Synchronization Primitives" and you will find an ARM whitepaper on typical setups and minimum requirements. It will usually involve the special opcodes LDREX/STREX (plus WFE/WFI on multicore systems if you want to sleep the core while it waits).
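LDREX/STREX implement a "load, modify, store only if nobody else wrote in between, otherwise retry" loop. Portable C11 can express the same loop with `atomic_compare_exchange_weak_explicit`; on ARMv7-class cores GCC and Clang typically compile it to an LDREX/STREX sequence, while ARMv6-M parts such as the Cortex-M0 lack those instructions, so there you fall back to disabling interrupts. The counting semaphore below is a sketch under those assumptions; `counting_sem`, `csem_try_take` and `csem_give` are hypothetical names, and it does not sleep the core with WFE.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counting semaphore: count = number of free units. */
typedef struct {
    atomic_int count;
} counting_sem;

/* The load / compare / store-if-unchanged loop that LDREX/STREX
   implement in hardware. Returns false if no unit is free. */
static bool csem_try_take(counting_sem *s)
{
    int seen = atomic_load_explicit(&s->count, memory_order_relaxed);
    while (seen > 0) {
        /* Succeeds only if count still equals 'seen'; on failure 'seen'
           is refreshed with the current value and the loop retries. */
        if (atomic_compare_exchange_weak_explicit(&s->count, &seen, seen - 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;
    }
    return false;
}

/* Return one unit; an atomic add is itself an LDREX/STREX loop on ARMv7. */
static void csem_give(counting_sem *s)
{
    atomic_fetch_add_explicit(&s->count, 1, memory_order_release);
}
```

The weak form may fail spuriously, which is exactly what a failed STREX looks like, so it always sits inside a retry loop.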
You can also find the code for the context switch by searching for "ARM context switch" plus your processor name; that will turn up an ARM whitepaper describing it. They all generally work the same way: on entry, one register points to where the current registers are to be saved and another points to where the next task's registers are to be loaded from.
There are some really simple AVR task-switcher projects that would be easy to adapt. You simply need to change the context_switch assembler and replace the ATOMIC_BLOCK store and release with interrupt disable/enable (which is all it generally does on smaller AVR processors). It is a basic CRITICAL SECTION control, so the code inside the curly brackets does not get interrupted.
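When porting such a switcher to a Cortex-M, a common replacement for avr-libc's `ATOMIC_BLOCK(ATOMIC_RESTORESTATE)` is to save the interrupt mask, disable interrupts, and afterwards restore the saved mask rather than blindly re-enabling, so nested critical sections do not turn interrupts back on too early. The sketch below models that on a host: `primask` is a plain variable standing in for the PRIMASK register, and `irq_save`, `irq_restore`, `ready_mask` and the `mark_*` helpers are hypothetical names. On the target, the bodies would use the CMSIS intrinsics `__get_PRIMASK()`, `__disable_irq()` and `__set_PRIMASK()`.

```c
#include <stdint.h>

/* Host stand-in for the Cortex-M PRIMASK register: 1 = interrupts masked. */
static uint32_t primask = 0;

/* Enter a critical section: remember the old mask, then mask interrupts.
   On target: uint32_t old = __get_PRIMASK(); __disable_irq(); return old; */
static uint32_t irq_save(void)
{
    uint32_t old = primask;
    primask = 1;
    return old;
}

/* Leave a critical section: restore whatever state was there before.
   On target: __set_PRIMASK(old); */
static void irq_restore(uint32_t old)
{
    primask = old;
}

/* Example shared state touched by tasks and by the timer interrupt. */
static volatile uint32_t ready_mask;

static void mark_task_ready(unsigned task)
{
    uint32_t s = irq_save();
    ready_mask |= 1u << task;   /* read-modify-write must not be interrupted */
    irq_restore(s);
}

/* Nested use: the inner restore leaves interrupts masked for the outer one. */
static void mark_two_ready(unsigned a, unsigned b)
{
    uint32_t s = irq_save();
    mark_task_ready(a);
    mark_task_ready(b);
    irq_restore(s);
}
```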
GitHub - kcuzner/kos-avr: Kevin's RTOS for AVR microcontrollers
It is basically one source file, kos.c, and one header, kos.h.
It uses a simple round-robin scheduler (every task has the same priority). If you want a more advanced priority-based scheduler you can simply replace the scheduler code, but I would suggest getting the simplest version running first and then adding the advanced scheduler. All you will end up doing is replacing the scheduler function with a more advanced one, and your task-creation call will take a priority that you assign.
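Swapping the round-robin scheduler for a priority-based one mostly means changing the function that picks the next task. A sketch, assuming a fixed-size task table where a larger `priority` number wins and tasks of equal priority take turns; `struct tcb`, `tasks`, `MAX_TASKS` and `pick_next_task` are hypothetical names, not kos-avr's actual API.

```c
#include <stdbool.h>

#define MAX_TASKS 8

/* Hypothetical task control block. */
struct tcb {
    bool          ready;     /* false while blocked, e.g. waiting on a semaphore */
    unsigned char priority;  /* assigned at task creation; higher runs first */
};

static struct tcb tasks[MAX_TASKS];

/* Return the highest-priority ready task. Scanning starts just after the
   current task and ends with it, so equal-priority tasks share round-robin.
   Returns -1 if nothing is ready (a real kernel would run an idle task). */
static int pick_next_task(int current)
{
    int best = -1;
    for (int i = 1; i <= MAX_TASKS; i++) {
        int t = (current + i) % MAX_TASKS;
        if (tasks[t].ready &&
            (best < 0 || tasks[t].priority > tasks[best].priority))
            best = t;
    }
    return best;
}
```

Using a strict `>` keeps the first match after the current task, which is what gives equal-priority tasks their turns; a low-priority task only runs once every higher-priority task is blocked.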
A final note: if you are on a multicore system you will need to change the semaphore to use the proper ARM code, because the one presented generally won't work on multiple cores. It should be fine on a single core.
If you need further help, we really need to know which ARM processor you are using.
modified 11-Jun-18 0:54am.