Using the DMA controller on STM32F4

A little while ago I got one of the fairly common “Nokia 5110” LCD modules, an 84×48 b/w graphic LCD screen, thinking it would be handy to have in current or future projects. One of the things I especially like about this module is that it uses a serial protocol (SPI) to send data and control messages. This reduces the number of pins required tremendously. When I finally got around to playing with it, I managed to get it to work with my STM32F4-Discovery board fairly quickly. I could have left it at that, but the code I was using during my initial tests was rather crude and inefficient, and I thought this module would be a good reason to finally get my hands dirty with the DMA controller on the STM32F4.

What is DMA?

If you don’t know what a DMA controller is or why it might be helpful, a quick explanation may clarify this. DMA stands for “Direct Memory Access”. As the name suggests, it has to do with accessing memory, or more precisely with moving data between different locations in “memory space”. A very typical example would be reading the value of a receive register of a communication module such as a UART and storing it in a variable or array in RAM.

Of course you can easily do this with the CPU, so why do you need a special module for this transfer? The problem is that this type of work can tie up the CPU quite a bit. You would either have to constantly poll a status bit (or bits) in the peripheral to check whether new data is available, or use an interrupt (with the appropriate handler) to alert you to the arrival of new data and to perform the transfer. This might be perfectly fine if you’re only interested in transferring a small number of bytes, but what if you would like to transfer larger amounts of data? Continuously? In that case even the interrupt method can become quite inefficient, because there is some overhead associated with entering and exiting the interrupt handler for every single item.

So wouldn’t it be nice if there was a little helper module that you could tell to watch out for incoming bytes on the peripheral, to store them in some chunk of memory as they come in, and to alert the CPU (if at all) only once a certain number of bytes has been received? This is what a DMA controller is for: it allows you to transfer data without the involvement of the CPU (that’s where the “direct” comes from). Essentially it allows you to use the resources in your microcontroller/microprocessor system more efficiently. It frees up the CPU for other tasks while data are being shuffled back and forth between memories and peripherals (or between memories).

Setting up DMA transfers

So, what is required to set up a DMA transfer? If you think about the above explanation, it’s actually quite straightforward: you need a “source”, i.e. a memory address where you’re copying data from, a “destination”, i.e. where you’re copying to, some information about the size and number of the data items to be transferred, and some sort of signal telling the DMA controller when the next item is ready to be transferred. In practice, it’s slightly more complicated as there are a few other (quite useful) features that need to be configured to complete the setup. So let’s look at the DMA controller in the STM32F4 device family.
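
To make these ingredients a bit more concrete, here is a minimal sketch of them as a plain C model that runs on a PC. It is not ST code and not how the hardware is programmed; the names dma_model_t and dma_model_request are made up for this illustration. Each call to dma_model_request() stands for one “next item is ready” signal.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side model of a single DMA transfer (not an ST API). */
typedef struct {
    const uint8_t    *src;       /* source: where the next item is read from      */
    volatile uint8_t *dst;       /* destination: where the next item is written   */
    size_t            remaining; /* number of items still to be transferred       */
    bool              inc_src;   /* advance the source pointer after each item?   */
    bool              inc_dst;   /* advance the destination pointer after each?   */
    bool              enabled;   /* the model only reacts to requests if enabled  */
    bool              complete;  /* set once the last item has been moved         */
} dma_model_t;

/* One call corresponds to one request; it moves exactly one item. */
static void dma_model_request(dma_model_t *s)
{
    if (!s->enabled || s->remaining == 0) {
        return;                  /* disabled or finished: the request is ignored */
    }
    *s->dst = *s->src;
    if (s->inc_src) { s->src++; }
    if (s->inc_dst) { s->dst++; }
    if (--s->remaining == 0) {
        s->enabled  = false;     /* stop after the programmed number of items */
        s->complete = true;      /* this is where the CPU could be notified   */
    }
}
```

For a transfer from an array to a single peripheral register you would set inc_src and leave inc_dst cleared, so the model steps through the array while always writing to the same destination.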

When you read the data sheet or the reference manual for the STM32F4 microcontrollers regarding DMA, the first thing you will notice is that there are actually two DMA controllers on these devices. The two controllers are almost identical. The main difference between them is that only the DMA2 controller can perform memory-to-memory transfers. The other difference is that DMA1 is connected to the APB1 peripherals, whereas DMA2 is connected to the APB2 peripherals. This will become important soon.

Each DMA controller has 8 separate streams. Each stream needs to be associated with one of 8 channels. These channels carry the signal I mentioned above to notify the DMA controller that data are ready to be transferred. More formally, these are called “DMA requests”. And that’s the first slightly complicated thing in using DMA: which stream and channel do I need to use? Well, if your intended DMA transfer involves a peripheral it’s actually not complicated at all, as long as you know where to look. On pages 164 and 165 of the Reference Manual you will find the following two tables:

[Table: DMA1 request mapping]
[Table: DMA2 request mapping]

These tables show which of the DMA streams a particular peripheral is connected to and via which channel. A quick side note: the text above the tables in the manual states: “The 8 requests from the peripherals … are independently connected to each channel and their connection depends on the product implementation.” This seems to imply that these mappings can change from device to device. However, I have not yet seen any specific information about the DMA mapping in a controller-specific document, and apparently I’m not alone.
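
As a small illustration of what “associating a stream with a channel” means in terms of bits: to my understanding the channel number is a 3-bit field (CHSEL) in bits 27:25 of the stream’s configuration register, and the DMA_Channel_x constants of the Standard Peripheral Library should correspond to these values. The sketch below is plain C for a PC; dma_chsel_bits, dma_route_t and dma_find_route are names I made up, and the two table entries are only the ones that come up in this post and its comments, so please check them against the reference manual for your device.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: CHSEL[2:0] sits in bits 27:25 of the stream configuration register. */
#define DMA_SXCR_CHSEL_POS 25u

/* Returns the channel-selection field for channel 0..7. */
static uint32_t dma_chsel_bits(unsigned channel)
{
    return (uint32_t)(channel & 0x7u) << DMA_SXCR_CHSEL_POS;
}

/* Hypothetical excerpt of the request mapping tables (not a complete list). */
typedef struct {
    const char *request;    /* peripheral request name as used in the tables */
    unsigned    controller; /* 1 = DMA1, 2 = DMA2                            */
    unsigned    stream;     /* 0..7                                          */
    unsigned    channel;    /* 0..7                                          */
} dma_route_t;

static const dma_route_t dma_routes[] = {
    { "SPI2_TX", 1, 4, 0 }, /* the combination used in this post          */
    { "ADC1",    2, 0, 0 }, /* one of the entries listed for ADC1 on DMA2 */
};

/* Looks up a request by name; returns NULL if it is not in the excerpt. */
static const dma_route_t *dma_find_route(const char *request)
{
    for (size_t i = 0; i < sizeof dma_routes / sizeof dma_routes[0]; i++) {
        if (strcmp(dma_routes[i].request, request) == 0) {
            return &dma_routes[i];
        }
    }
    return NULL;
}
```

A lookup table like this is just a convenient way to keep the stream/channel pairs of a project in one place instead of scattering them over several initialization functions.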

An example using the standard peripheral library

As usual, the Standard Peripheral Library provides convenient structures and functions to get everything set up correctly. Here’s the initialization procedure I used to set up DMA transfers between an array in RAM and the SPI2 module.

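DMA_InitTypeDef dma_is;

// Start from the library defaults so that no field is left uninitialized
// (e.g. DMA_FIFOThreshold, which is not set explicitly below)
DMA_StructInit(&dma_is);

dma_is.DMA_Channel = DMA_Channel_0;                       // SPI2_TX request
dma_is.DMA_Memory0BaseAddr = (uint32_t)screenBuffer;      // source: array in RAM
dma_is.DMA_PeripheralBaseAddr = (uint32_t)&(SPI2->DR);    // destination: SPI2 DR (0x4000380C)
dma_is.DMA_MemoryDataSize = DMA_MemoryDataSize_Byte;
dma_is.DMA_PeripheralDataSize = DMA_PeripheralDataSize_Byte;
dma_is.DMA_DIR = DMA_DIR_MemoryToPeripheral;
dma_is.DMA_Mode = DMA_Mode_Normal;                        // stop after DMA_BufferSize items
dma_is.DMA_MemoryInc = DMA_MemoryInc_Enable;              // step through the array
dma_is.DMA_PeripheralInc = DMA_PeripheralInc_Disable;     // always the same register
dma_is.DMA_BufferSize = 6*84;                             // number of items (bytes) to transfer
dma_is.DMA_Priority = DMA_Priority_High;
dma_is.DMA_MemoryBurst = DMA_MemoryBurst_Single;
dma_is.DMA_PeripheralBurst = DMA_PeripheralBurst_Single;
dma_is.DMA_FIFOMode = DMA_FIFOMode_Disable;               // direct mode, no FIFO

DMA_Init(DMA1_Stream4, &dma_is);
DMA_Cmd(DMA1_Stream4, ENABLE);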

// ...

The first few lines should be fairly self-explanatory. The DMA request for the transmit buffer empty signal of the SPI2 module is located on channel 0 of DMA1, stream 4. Our memory base address is the pointer to an array (and since the name of an array decays to a pointer to its first element, we don’t need the ampersand). The peripheral base address is the pointer to the data register of the SPI2 module. We’re transmitting single bytes from memory to peripheral.

For DMA mode there are two options: normal and circular. In normal mode, the DMA stops transferring bytes after the specified number of data units, whereas in circular mode it simply returns to the initial pointer and keeps going. “Initial pointer?” you might ask: that’s where the next two lines come into play. You can ask the DMA controller to automatically increment one or both of the pointers after each transfer. That’s what’s happening in my example: while the peripheral address doesn’t change (obviously we want to keep writing to the same register), the pointer into memory does get incremented, thereby stepping through my “screenBuffer” array. The size of this array is exactly 6*84 bytes (corresponding to the 48 x 84 pixels at one bit per pixel). Lastly, we specify a priority level for this DMA stream (since other streams may be active as well, and they may compete for access to the buses).
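
In case you wonder where the peripheral address 0x4000380C comes from: it is the base address of SPI2 plus the offset of the data register within the SPI register block. The following sketch reproduces that arithmetic on a PC. The struct spi_regs_model_t is my own simplified stand-in for the first few SPI registers (16-bit registers on 4-byte boundaries), and the base address and offset are the values I take from the memory map, so treat them as assumptions to be verified against the reference manual. In real code you would normally just write &(SPI2->DR) and let the device header do this for you.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical model of the start of the SPI register block.
 * Each register is 16 bits wide and followed by 16 reserved bits. */
typedef struct {
    volatile uint16_t CR1; uint16_t reserved0; /* offset 0x00 */
    volatile uint16_t CR2; uint16_t reserved1; /* offset 0x04 */
    volatile uint16_t SR;  uint16_t reserved2; /* offset 0x08 */
    volatile uint16_t DR;  uint16_t reserved3; /* offset 0x0C */
} spi_regs_model_t;

#define APB1_BASE_ADDR 0x40000000u /* assumed start of the APB1 peripheral area */
#define SPI2_OFFSET    0x3800u     /* assumed offset of SPI2 within APB1        */

/* Address of the SPI2 data register: block base + offset of DR in the block. */
static uint32_t spi2_dr_address(void)
{
    return APB1_BASE_ADDR + SPI2_OFFSET
         + (uint32_t)offsetof(spi_regs_model_t, DR);
}
```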

The last couple of lines of the initialization structure are slightly less obvious. To understand what’s happening here you have to know that each DMA stream has its own small FIFO (first-in-first-out) buffer, which can temporarily store data from the source before transferring it to the destination. In addition, the transfer can happen in bursts instead of single transfers. Together this can be helpful to deal with bus contention issues and in other scenarios, but there are a few extra rules that need to be followed when using the FIFO and burst transfers. If you’re considering using them, make sure to carefully study pages 173ff of the Reference Manual. In my example I’m using “direct” mode (=no FIFO) and single-item transfers (=no bursts).

Once the initialization structure is defined, it’s time to initialize the DMA stream and turn it on. It should then be ready to receive and react to DMA requests on the specified channel. Lastly, once we’re ready to start the transfer we need to start issuing DMA requests. In my example this is done by the SPI2 module whenever its transmit buffer empty (TXE) flag is set; for that to happen, the transmit DMA request of the SPI module has to be enabled (in the Standard Peripheral Library with SPI_I2S_DMACmd(SPI2, SPI_I2S_DMAReq_Tx, ENABLE)). Of course the SPI module itself needs to be initialized first, which is not shown here. Similarly, I didn’t show that the clock signal to the DMA controller needs to be turned on before any of this (RCC_AHB1PeriphClockCmd(RCC_AHB1Periph_DMA1, ENABLE)), just like for many (or all?) of the other peripherals.

End of transfer and double buffering

What happens next? The DMA controller should now transfer single bytes from the “screenBuffer” array to the SPI2 data register whenever the transmit buffer empty flag gets set. It will do so 504 times (=6*84), and once completed will turn itself off (since it’s not in circular mode). If we wanted to re-transmit the data in screenBuffer (maybe after they have been altered), all we would need to do is re-enable the DMA stream, after clearing the stream’s event flags left over from the previous transfer, which the reference manual asks for before a stream is re-enabled.

If we need a continuous data stream between memory and peripheral, it may actually be better to configure the DMA stream in circular mode, or even in “double buffer” mode. In this mode there are two memory locations (hence double) and the DMA stream swaps to the respective other memory location each time it has finished dealing with one (i.e. reading from or writing to it). That allows the rest of the software to process the location/buffer that’s not currently used for the DMA transfers. To keep track of which of the two buffers is currently in use, a bit in the stream’s configuration register (CT, “current target”) becomes a status indicator once the stream is enabled (i.e. it is then read-only; you can write to this bit only while the stream is disabled). Of course you could also use a DMA generated interrupt to keep track of the two buffers.
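
The swapping logic of double buffer mode is easier to see in a small model. The sketch below is plain C for a PC, not driver code: dbuf_model_t, dbuf_init, dbuf_request and dbuf_idle_buffer are hypothetical names, the buffers are only 4 bytes long to keep things short, and I’m modelling the peripheral-to-memory direction because that makes the effect visible in the buffers. The current_target field plays the role of the status bit described above.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN 4u /* items per buffer; tiny on purpose */

/* Hypothetical host-side model of a stream in double buffer mode. */
typedef struct {
    uint8_t  buf[2][BUF_LEN]; /* the two memory areas                        */
    unsigned current_target;  /* 0 or 1: buffer the "DMA" is working on now  */
    size_t   remaining;       /* items left before the current buffer is full */
    unsigned swaps;           /* number of completed buffers so far           */
} dbuf_model_t;

static void dbuf_init(dbuf_model_t *d)
{
    *d = (dbuf_model_t){0};   /* clear both buffers, start with buffer 0 */
    d->remaining = BUF_LEN;
}

/* One request: store one incoming byte, swap buffers when the current one is full. */
static void dbuf_request(dbuf_model_t *d, uint8_t value)
{
    d->buf[d->current_target][BUF_LEN - d->remaining] = value;
    if (--d->remaining == 0) {
        d->current_target ^= 1u;  /* continue with the other buffer          */
        d->remaining = BUF_LEN;   /* circular behaviour: the counter reloads */
        d->swaps++;               /* a "transfer complete" event would fire  */
    }
}

/* The buffer the rest of the program may process right now. */
static uint8_t *dbuf_idle_buffer(dbuf_model_t *d)
{
    return d->buf[d->current_target ^ 1u];
}
```

The important constraint this makes obvious is timing: the program has to be done with the idle buffer before the other one fills up, otherwise the next swap hands it back to the DMA while it is still being processed.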

So much for a quick overview of DMA with the STM32F4. I’m sure you’ll agree with me that it can be an incredibly powerful feature that allows you to make the most of the available computing power of the microcontroller. How useful it is will of course depend on the precise nature of the task to be accomplished. If there’s nothing else for the CPU to do as long as the data haven’t been transferred, then DMA isn’t really of much help. However, if you would rather not wait for a slow peripheral to transfer to or from the microcontroller before doing other things, DMA can be an excellent choice.

33 thoughts on “Using the DMA controller on STM32F4”

  1. Thank you so much for taking the time to write this. Between your explanation and the Peripheral_Example “ADC3_DMA” that comes with the STMf4 discovery kit, I finally understand the basics of DMA and how to use it.


  2. It was a really helpful tutorial. I’m going to take images from an OV7660 camera module using the F4 Discovery board. Your explanation gave me a good start on DMA. Without DMA I was able to read images at a lower frequency. DMA will help me increase the image processing rate.
    Thank you


  3. Hi!

    First of all, thanks for this write-up, it helped me a lot!
    But I’m stuck at “all we would need to do is re-enable the DMA stream”.
    I get the correct signal at the selected pins right after the system initialization, but i can’t seem to resend the data.
    What command resets/reenables the DMA controller?



    1. Hi David!
      I can’t check this at the moment in my code, but I think all you need to do is re-issue the “DMA_Cmd(DMA1_Stream4, ENABLE);” command.


      1. I had a look at my code, which unfortunately has been modified somewhat by now.

        In my current code I am indeed clearing a flag, but that probably only became necessary because I started enabling DMA interrupts. One of the things I do in the interrupt handler is to clear the interrupt flag (“DMA_ClearFlag(DMA1_Stream4, DMA_FLAG_TCIF4);”) before re-enabling the DMA stream.

        If you want, I can take a look at your code to see if anything jumps out at me – you could send me an email at andreas[a]


      2. Yes, repeated activation works now, thanks a lot!

        But I’m unable to reset the pointers for the DMA channel; after the first transmission I only get garbage. – Any ideas?


      3. I never tried changing the source or destination addresses on the fly yet, however I remember reading in the reference manual that you can only change those under certain circumstances (either stream must be disabled or the current location must be unused in dual buffer mode).

        What exactly are you trying to achieve?


  4. Sorry for the long response time, I got sidetracked a lot 😉
    I am trying to use the SPI Port to send the same high-frequency signal each time i get a certain interrupt.
    My current setup is the following:


    NVIC_InitStruct.NVIC_IRQChannel = DMA1_Stream4_IRQn;
    NVIC_InitStruct.NVIC_IRQChannelPreemptionPriority = 0;
    NVIC_InitStruct.NVIC_IRQChannelSubPriority = 1;
    NVIC_InitStruct.NVIC_IRQChannelCmd = ENABLE;

    RCC_AHB1PeriphClockCmd(RCC_AHB1Periph_GPIOB | RCC_AHB1Periph_GPIOB, ENABLE);
    GPIO_PinAFConfig(GPIOB, GPIO_PinSource13, GPIO_AF_SPI2);
    GPIO_PinAFConfig(GPIOB, GPIO_PinSource15, GPIO_AF_SPI2);

    GPIO_InitStruct.GPIO_Mode = GPIO_Mode_AF;
    GPIO_InitStruct.GPIO_Speed = GPIO_Speed_100MHz;
    GPIO_InitStruct.GPIO_OType = GPIO_OType_PP;
    GPIO_InitStruct.GPIO_PuPd = GPIO_PuPd_NOPULL;

    GPIO_InitStruct.GPIO_Pin = GPIO_Pin_13; //SCK
    GPIO_Init(GPIOB, &GPIO_InitStruct);
    GPIO_InitStruct.GPIO_Pin = GPIO_Pin_15; //MOSI
    GPIO_Init(GPIOB, &GPIO_InitStruct);

    SPI_InitTypeDef SPI_InitStruct;

    // enable the SPI peripheral clock
    RCC_APB1PeriphClockCmd(RCC_APB1Periph_SPI2, ENABLE);

    SPI_InitStruct.SPI_Mode = SPI_Mode_Master;
    SPI_InitStruct.SPI_Direction = SPI_Direction_1Line_Tx;
    SPI_InitStruct.SPI_NSS = SPI_NSS_Soft;
    SPI_InitStruct.SPI_BaudRatePrescaler = SPI_BaudRatePrescaler_8;
    SPI_Init(SPI2, &SPI_InitStruct);
    // Enable the SPI port
    SPI_Cmd(SPI2, ENABLE);

    DMA_InitTypeDef DMA_InitStruct;

    unsigned char data[2];

    data[0] = 0x1;

    RCC_AHB1PeriphClockCmd(RCC_AHB1Periph_DMA1, ENABLE);

    DMA_InitStruct.DMA_Channel = DMA_Channel_0;
    DMA_InitStruct.DMA_Memory0BaseAddr = (uint32_t)&data;
    DMA_InitStruct.DMA_PeripheralBaseAddr = (uint32_t)&(SPI2->DR);
    DMA_InitStruct.DMA_MemoryDataSize = DMA_MemoryDataSize_Byte;
    DMA_InitStruct.DMA_PeripheralDataSize = DMA_MemoryDataSize_Byte;
    DMA_InitStruct.DMA_DIR = DMA_DIR_MemoryToPeripheral;
    DMA_InitStruct.DMA_Mode = DMA_Mode_Normal;
    DMA_InitStruct.DMA_MemoryInc = DMA_MemoryInc_Enable;
    DMA_InitStruct.DMA_PeripheralInc = DMA_PeripheralInc_Disable;
    DMA_InitStruct.DMA_BufferSize = 0;
    DMA_InitStruct.DMA_Priority = DMA_Priority_High;
    DMA_InitStruct.DMA_MemoryBurst = DMA_MemoryBurst_Single;
    DMA_InitStruct.DMA_PeripheralBurst = DMA_PeripheralBurst_Single;
    DMA_InitStruct.DMA_FIFOMode = DMA_FIFOMode_Disable;

    DMA_Init(DMA1_Stream4, &DMA_InitStruct);

    DMA_ITConfig(DMA1_Stream4, DMA_IT_TC,ENABLE);

    In my interrupt-handler i just clear the flag and shut down the dma controller:

    void DMA1_Stream4_IRQHandler(void) {
        if (DMA_GetITStatus(DMA1_Stream4, DMA_IT_TCIF4) == SET) {
            toggleGPIO(GPIOD, GPIO_Pin_13);
            DMA_Cmd(DMA1_Stream4, DISABLE);
            DMA_ClearITPendingBit(DMA1_Stream4, DMA_IT_TCIF4);
        }
    }

    And when i want to launch the transmission again i do the following:

    DMA_MemoryTargetConfig(DMA1_Stream4, (uint32_t)&data, (uint32_t)&(SPI2->DR));

    The clock-line shows 8 pulses as expected but on the data line i do not see the expected byte, any ideas?


      1. Hi David,

        At first glance, one thing stands out to me: your buffer size is 0, so I’m somewhat surprised it’s sending anything at all. This should be the number of bytes you’re trying to send (in your case 2). And since “data” is an array, you actually don’t need the & in front of it. I’m surprised the compiler isn’t complaining about that, but I don’t know what it would be pointing at, and that might be the cause of your problems. If you look at my example, I’m only doing the cast to uint32_t, but no address-of operator “&” for my screenBuffer array, since the name of an array variable by itself is essentially the address of the first element of the array.


  5. Hi, thanks for still staying with me 😉

    The buffer size is 1 in my code; I just wanted to change it from 2 to 1 when copying the code, since I was experimenting with multiple bytes, and made a typo.
    As for the ‘&’ it does not appear to change the behaviour.

    What happens is the following:
    [Image: first burst]
    [Image: all other bursts]

    Where channel 2 is the clock and channel 1 is spi-out.



    1. Ok, it seems i found my own problem -.-

      When I posted the code I actually edited it for better readability. I had an array with data in it that I used to initialise the DMA with, and in my main code I used a different array to feed data to the DMA controller – it ignored both arrays. Now I moved the array initialisation into a header file and use the same variable in both C files and it works – no idea why 🙂

      Cheers and THANKS A LOT for helping me out!


      1. I’m glad you figured it out. Sounds like you had an issue with variable scope.


  6. Hi Andreas,

    Very nice and clear description of how to use the DMA with this STM device. I am currently working on a project that requires transferring collected data to an external NOR flash memory via the SPI bus, and your information may help. I’m not sure yet how to implement the code because of the memory’s requirements and quirkiness. The external memory first requires some write-enable commands and a page address, and then a 256-byte data transfer. Once the page is transferred the memory takes some time to store the data; however, I have to keep recording incoming information. That is where the double buffering option may come in handy. Any previous experience with this type of implementation?

    Anyway, thanks a lot for the great tutorial! I wish ST had better information and examples.



    1. Thanks, Santiago!

      I have not yet tried double buffering. One thing to keep in mind here though, is that double buffering will automatically also put the DMA channel into circular mode. This may not work for your RAM (though it should be fine for collecting further data).

      Good luck!


  7. great post.

    i am getting error “.\Objects\dc.axf: Error: L6218E: Undefined symbol DMA_Cmd (referred from d1.o).”
    i am not able to find DMA_cmd in any of the library file.
    please help…


    1. Judging by the file format (d1.o), you seem to be using some pre-compiled object files. Where do they come from? Also, depending on which library you use, ST now has a new library they push (HAL or CUBE), which on one hand is very similar to the old Standard Peripheral Library, but has changed the names of many of its structures and functions. However, if that was the problem you would undoubtedly get a lot more errors, so I’m not entirely sure what’s going on.


  8. Where does the datasheet or reference manual state that DMA1 is connected to APB1 peripherals and DMA2 is connected to APB2 peripherals? It seems true, but I couldn’t find it in the datasheet or reference manual.
    I was trying to use DMA1 with ADC1 and it didn’t work, so I tried to use DMA2 and it works.

    Also, the tables you referenced (DMA1,2 request mapping) are just examples, they aren’t fixed.


    1. If you look at Figure 34 of the reference manual you see that DMA1 is connected to the “AHB-APB bridge1” which connects to APB1 peripherals, and DMA2 is connected to “AHB-APB bridge2”, which connects to APB2 peripherals. Similarly, the bus matrix figure in the datasheet (Fig 6) shows the connections of DMA1-APB1 and DMA2-APB2.

      If it is true that the mapping isn’t fixed (and you are right that the tables are described as “examples”) I don’t know how you can associate a specific DMA peripheral request with a particular DMA stream. In the code, all you do is associate a channel with a stream. That’s why certain peripheral requests show up multiple times in the table. My understanding is that you have to choose a combination of channel and stream to get a particular peripheral request. I think the reason they call them “examples” is because of this phrase “…and their connection depends on the product implementation.”


      1. Thanks, I never looked at those buses in that diagram.

        To me, it would make sense if the peripheral-stream-channel association were fixed, but ST just gives that table in the Reference Manual, says that these are just examples, then says that those associations are device-dependent and gives no list of possible associations.


      2. I absolutely agree. If the mapping is, as they say, device specific, you would expect to find a similar table (or other indications about the mapping) in the specific device’s datasheet. Yet in the datasheets I’ve seen, there are none. Maybe they’ve been added in newer versions of the datasheet? I haven’t looked.

        OK, just checked. The latest versions from March 2015 still have the exact same text and tables. I also found application note AN4640 “Peripherals interconnections on STM32F405/7xx …” which states regarding the DMA mapping: “Each stream is associated with a DMA request that can be selected out of 8 possible channel requests. … This interconnection is explained in the following tables of RM0390 and RM0090 reference manuals”, the same tables that present those mappings as “examples”.


  9. Hi Andreas,

    A thoughtful and well-written article.

    I have a question about the frequency of a DMA transfer. I have an ADC that must collect data samples at a very specific rate – i.e., 1000 samples/second. It would be nice to set this up using a DMA, but as of this point cannot see how. Presently I am using a “pace timer”, and on the timer interrupt, I add to the array. After 1 second, I process. Perhaps there’s no advantage in using DMA, but I would like to try and see what advantages there might be.

    So the question is straightforward – how do I control the DMA timing to transfer from the ADC to memory at an exact frequency?



    1. Thanks, Gary.

      What you want to do should* be easily doable, and you’re already halfway there. Using a timer to trigger the ADC conversion is of course the easiest way to ensure a constant and known sample rate. However, instead of triggering an interrupt request at the end of each conversion and copying the result into memory in the interrupt service routine, you could simply have it trigger a DMA request and have the DMA controller copy the value for you without any involvement of the CPU itself. The only interrupt you might need is the one from the DMA controller itself to tell you that the requested number of transfers has been completed.

      So, in short, set up the ADC to use a timer signal for triggering conversions (i.e. no need for a timer interrupt routine), and set up a DMA stream to transfer the ADC results to memory whenever the ADC sends a DMA request at the end of a conversion.

      Of course, if your overall program can’t do anything further until the 1000 samples are collected, and the CPU would just sit idle anyway, then you are right, a DMA-based transfer doesn’t have a real advantage (other than perhaps slightly more compact code, which usually isn’t a problem) and you may as well use interrupt routines.

      Hope this helps.

      *I’m using “should” because I haven’t actually checked if this is possible, but I would be very surprised if it wasn’t.


  10. Thanks for the reply Andreas. Now after having a look at the manual, according to the grid on DMA setup, your explanation seems to be sound. At these sample rates using the DMA is more cosmetic, but I do have some projects that will require much higher sampling and this is where I will try it.

    Hope to talk again, and will watch for more of your posts! Cheers!


  11. Hi Andreas,

    Is it possible to do a DMA from DCMI to SD interface for writing the camera image to SD card?



  12. Thanks for this explanation.
    Often, when it comes to programming, explanations seem to be written for readers who have the same skills as the writer; this one is very clear and encouraging.
    After years of programming in a very basic way I started using DMA a few weeks ago, and I still have to explore all its advantages.
    I find that tools such as STM32CubeMX are useful when a configuration is needed in a short time, but when things are not going exactly the right way one has to step back to the reference manual, and sometimes back again to the basics.


  13. I have been doing the same thing for a Sharp 2.7 inch memory display. The DMA-> SPI is working but all the DMA error flags get set.

    I trap them in the DMA interrupt, but not sure why they are occurring. Any ideas?



