Siemens Simcenter STAR-CCM+ and AMD ROCm blog post

Some of you might know that I work at Siemens as a software developer on Simcenter STAR-CCM+. Our fluid dynamics simulation software can use Nvidia GPUs to accelerate its calculations, and earlier this year we also introduced support for AMD GPUs. AMD recently published a blog post to advertise their product and ours, based on some questions they asked me and my product manager:

Siemens taps AMD Instinct GPUs to expand high-performance hardware options for Simcenter STAR-CCM+
May 16, 2024

Emulating PPC64 inside Docker

PPC64, the architecture of the IBM POWER4 through POWER7, is big-endian. The POWER8 through POWER10 also have a little-endian mode, which is why PPC64LE is significantly more common nowadays, even though these newer processors can still switch to big-endian mode. I have a bit of manually-vectorized code that specifically supports POWER7 or newer POWERs running in big-endian mode, so to make sure that still works I occasionally need to emulate a PPC64 Linux. There are not many suitable distributions left — Debian dropped the architecture after Debian 8, Ubuntu after 16.04, Void Linux for PowerPC is discontinued, even Fedora dropped it after Fedora 28. Out of the big distributions, that only leaves CentOS 7, which is getting old and only has one year of support left. However, there is Adélie Linux, a relatively new distribution still in development, and Debian unstable still has a PPC64 port. Read on below for Dockerfiles that you can use to run the two inside Docker on amd64 via multiarch.
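Once such an emulated system is running, it is worth confirming that it really is big-endian before trusting any test results. A minimal sketch, assuming Python 3.8 or newer is installed inside the container; the helper name `byte_layout` is made up for illustration:

```python
import struct
import sys


def byte_layout(value: int, order: str) -> str:
    """Return the in-memory bytes of a 32-bit unsigned integer as hex.

    order is "big" (as on PPC64) or "little" (as on PPC64LE and amd64).
    """
    fmt = ">I" if order == "big" else "<I"
    return struct.pack(fmt, value).hex(" ")


if __name__ == "__main__":
    # sys.byteorder reports the byte order of the machine (or emulator)
    # that is running this Python interpreter.
    print("native byte order:", sys.byteorder)
    print("0x01020304 stored big-endian:   ", byte_layout(0x01020304, "big"))
    print("0x01020304 stored little-endian:", byte_layout(0x01020304, "little"))
```

Inside a PPC64 container the first line should report `big`, while the amd64 host reports `little`.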


Running the MOTU M4 USB audio interface without bus power

The MOTU M4 is a high quality, yet relatively affordable USB audio interface with 2x XLR/TRS in, 2x TRS in, 2x TRS/RCA out, 2x TRS out, 1x headphone out, 1x MIDI in, 1x MIDI out. The M2 is very similar, just without the 2x TRS in and 2x TRS out. Judging by the firmware, I am tempted to claim that the M2 and M4 are technically identical, just with a different back panel with fewer connectors hooked up to the ADC/DAC.

Some high-end audio interfaces have a separate power input, but the M4 is exclusively bus-powered. In theory you only need the separate power supply when your computer does not supply enough power, but the M4 is well within the limits of what the specification guarantees. The M4 actually also has a standalone mode where you can use it as a mic preamp by connecting it to a USB power supply. However, like most audio interfaces, it has one power-related downside: it makes the speakers pop when it is turned off. Some people say it’s on the quieter end of the spectrum of audio interface popping, but I would rather have complete silence when I reboot or shut down my computer. Interestingly, MOTU managed to make it completely quiet when powering on. I tried a powered USB hub, but its power output is still controlled by the computer, so the interface loses power and the speakers pop when I reboot my computer.

There is a simple solution to this problem: the USB-C/PWR Splitter by 8086 Consultancy. You can use it to power the M4 from a USB power supply, but connect its data lines to your computer as usual. This solution may also work with some other USB interfaces, but it’s not guaranteed that the interface will keep its output powered when it loses the data connection to the computer.

There is also a significantly more expensive solution: the MOTU M6, which can optionally be powered by a separate but included power supply. You might also shop around for other audio interfaces with separate power connectors (Focusrite Scarlett 8i6 Gen3, Arturia MiniFuse 4, Universal Audio Volt 4), but it’s not guaranteed that all of them will keep running when they lose the data connection to the computer. I picked the MOTU M4 because its case and knobs are mostly metal. My previous audio interface had some parts made from soft-touch plastic, which after about a decade began getting sticky and shedding drops of plasticizer, so I was not going to spend money again on soft-touch plastic like on the Arturia’s knobs.

Scientific Article: A thermalized electrokinetics model including stochastic reactions suitable for multiscale simulations of reaction-advection-diffusion systems

I co-authored a scientific article in Journal of Computational Science.

A thermalized electrokinetics model including stochastic reactions suitable for multiscale simulations of reaction–advection–diffusion systems
Ingo Tischler, Florian Weik, Robert Kaufmann, Michael Kuron, Rudolf Weeber, Christian Holm
J. Comp. Sci. 63, 101770 (2022)
DOI: 10.1016/j.jocs.2022.101770

The journal does not provide open access to the article, but you can download it for free from chemRxiv: 10.26434/chemrxiv-2021-39nhv-v3.

Dissertation: Lattice Boltzmann methods for microswimmers in complex environments

My dissertation has been published by the university:

Lattice Boltzmann methods for microswimmers in complex environments
Michael Kuron
PhD thesis, Universität Stuttgart
DOI: 10.18419/opus-11926

Printed copies are available at the university library and at the national library in Frankfurt and Leipzig. I also have a few spare ones, so if you think you really need one, let me know. The SHA1 hash of the PDF file I submitted to the library is 45f8b26dc10c04a5221d79fa3a2c42478a2b89b6, which matches the file available online as of today.
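If you want to check a downloaded copy against that hash, something along these lines should do. It is a minimal sketch using Python's standard `hashlib`; the file name `dissertation.pdf` in the usage note is a placeholder, not the actual download name:

```python
import hashlib


def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal SHA1 digest of the file at path."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so that large PDFs do not need to fit into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha1_of_file("dissertation.pdf")` and comparing the result with the hash quoted above tells you whether the two files are byte-identical.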


This dissertation introduces, validates, and applies various models for the study of microswimmers, predominantly focusing on the development of lattice algorithms. The models are applicable to biological swimmers like bacteria, but also to artificial ones propelled via chemical reactions. The unifying theme is a complex fluidic environment, ranging from Newtonian single-component fluids, to electrolyte solutions, to viscoelastic media flowing through arbitrary geometries. A particular focus is placed on resolving each swimmer’s surface since the propulsion, or phoresis, originates from a small layer of fluid around it. Resolving the propulsion mechanism is necessary to accurately study hydrodynamic interactions with obstacles and other swimmers. It is also a prerequisite for the study of taxis, that is, alignment with an external field such as a nutrient gradient. Similarly, phoretic interactions can be investigated, like when a swimmer senses and avoids the trail where another swimmer has already depleted the fuel.

Update January 2024

Here is an updated PDF with all the preprint citations replaced with the published versions. This should make it easier to find all references via the DOI system.

Dell WD19TBS review

Four years ago, I reviewed the Dell WD15 USB-C docking station for use with a MacBook Pro. It worked reasonably well, but not all of the USB ports worked and high resolutions didn’t work over DisplayPort. Also, over time it has become a bit unstable for me, requiring the USB-C cable to occasionally be unplugged and replugged, or even requiring the dock to be power-cycled. The latter is also necessary when switching between a Dell and a MacBook, which can be a bit annoying. I also noticed that the WD15’s firmware cannot be upgraded unless you use a Dell laptop from around 2017 — newer ones will try and fail, while non-Dell laptops won’t even run the updater.

Dell’s current docking solution is the WD19. It is available in three variants: the WD19S connects via USB-C, the WD19TBS connects via Thunderbolt or USB-C, and the WD19DCS connects to certain high-end Dell laptops via two USB-C cables. Conveniently, the WD19TBS is certified for use with Macs too. The WD19/WD19TB/WD19DC (without the “S” at the end) are identical, but do not have an audio output.

This review is going to be really short. The WD19 eliminates all the WD15’s flaws that I complained about. Dell provides good documentation that tells you what works via USB-C and what requires Thunderbolt, which resolutions you can get on how many video outputs, etc., so I am not going to repeat that here (the executive summary is that there are no surprises here).

The WD19 has one downside though: it has a fan. Mine turns on and spins up to relatively loud levels every couple hours for a few minutes, which I find quite annoying. Another WD19 I have used never spins up. I don’t really think the fan is even needed for regular usage unless you have a laptop that draws more than 80W of power. So if you’re feeling adventurous and want to void the warranty, open up the WD19 and disconnect the fan. Dell laptops will occasionally display a BIOS warning about a failed fan, but other than that it works fine and completely silently.

Firmware updates no longer require Windows or even a Dell laptop. You can apply them via fwupd on Linux too.

Novation Launchkey 61 MK3 and MainStage 3.5

Santa got me a Novation Launchkey 61 MK3 this year. I learned to play piano as a kid on a Yamaha PSR-340 and have been wanting to get back into music for a while now. These days, good MIDI keyboards and great-sounding virtual instruments are available at low prices, so the up-front investment is much smaller than back then.

I wanted a MIDI keyboard with a display and a couple of buttons and faders so I could select and control virtual pianos, synths, and organs on my computer. I also wanted integration of the controls with Apple Logic Pro. Some older MIDI devices used binary plugins for this purpose (which get installed into /Library/Application Support/MIDI Device Plug-ins), but with Apple recently having switched from Intel to its own custom Arm processors and many manufacturers not providing updates in a timely manner, the better way going forward is using Lua scripts.

Browsing through the Thomann store, I found that my criteria are met by the Akai MPK 261, Nektar Panorama P6, Novation Launchkey 61 MK3, and Roland A-800 Pro. (The Nektar Panorama T6 might also be okay once the manufacturer delivers the promised update. The same might go for the Novation SL MKIII if it gets an update.) The Nektar Panorama P series only has a binary plugin for Logic Pro, but a Lua script for MainStage. The Novation MK3 has a downloadable Lua script for Logic. The Roland A-PRO series and Akai MPK series are apparently supported out of the box through Lua scripts. Finally, there is the Studiologic Mixface SL, which is a controller with faders, knobs and buttons that magnetically attaches to the Studiologic SL series of MIDI keyboards and which has a Lua script for Logic. There is also the Roland Fantom 6, a high-end synthesizer that has a binary plugin for Logic and a Lua script for MainStage.

When you are not recording, but just playing virtual instruments, a DAW like Logic Pro is overkill. That’s what Apple MainStage is for: it hosts Audio Units (virtual instruments and effects), but unlike a DAW it has no concept of recording or timeline. After seeing Roland’s and Nektar’s documentation on their support of MainStage (they display all the on-screen controls on the keyboard display and allow you to interact with them via the knobs, buttons and faders), I wanted to see how much I could do with the Launchkey. It has special MIDI messages for all kinds of things and should thus be able to do most of the same. The Lua scripts that configure MIDI devices are installed into ~/Music/Audio Music Apps/MIDI Device Scripts (for Logic) and ~/Music/Audio Music Apps/MainStage Devices (for MainStage). The Lua API is not documented publicly, but can easily be deduced by poking through Apple’s own scripts, which are in /Applications/MainStage Device Scripts. The basic API is identical between MainStage and Logic, but Logic uses a different parameter feedback mechanism and supports multiple layers (or “modes”), neither of which is used by any of Apple’s scripts.

I am happy to report that I managed to create a complete MainStage integration for the Launchkey that pretty much matches what Roland (Fantom) and Nektar managed to do. Of course, due to the lack of a graphical display, it’s not as nice, but it only costs half as much as the Nektar and a tenth of the Roland Fantom. Automatic mapping of knobs, faders, buttons, and drum pads works perfectly. The LEDs of the buttons mirror the state of the UI. The display shows parameter feedback (name and value) when you move a knob or fader. This goes beyond what the Roland A-800 or Akai MPK261 do, which cost about the same as the Launchkey, but cannot display parameter information.

Note that MainStage’s automatic mapping of controls has a few bugs. My device script cannot work around these, but you can manually re-map these controls if you need them:

  • The Keyboard quick-start project does not map Smart Drawbars to MIDI faders. The Tonewheel organ project template does however. You can manually map the Smart Drawbar controls though.
  • Smart Faders are not mapped to MIDI faders. You can manually map the Smart Fader controls.
  • Instruments that have Smart Controls spread across multiple pages only have their first page’s controls mapped. You can manually map the Tab 2 Smart Knobs though.
  • Drumpads on the keyboard trigger notes in the C6-B7 range and are mapped to MainStage’s Drum Pad controls. However, the virtual instruments expect notes in the C1-B2 range. You can manually change the trigger notes on all 24 drum channels.
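To work out which trigger note each of the 24 pads needs after the change, the mismatch in the last item can be expressed as a fixed offset of five octaves. The following is a small sketch, not part of the device script; it assumes the octave-naming convention in which middle C (MIDI note 60) is called C3, and the function names are made up for illustration:

```python
SEMITONES = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
             "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}


def note_number(name: str, octave: int, middle_c_octave: int = 3) -> int:
    """Convert a note name and octave to a MIDI note number.

    middle_c_octave is an assumption: 3 means that note 60 is labelled C3.
    """
    return 60 + (octave - middle_c_octave) * 12 + SEMITONES[name]


def remap_pad_note(note: int) -> int:
    """Move a pad note from the C6-B7 range down to the C1-B2 range."""
    offset = note_number("C", 1) - note_number("C", 6)  # five octaves down
    return note + offset


# The 24 notes sent by the pads and the notes the instruments expect.
pad_notes = list(range(note_number("C", 6), note_number("B", 7) + 1))
expected_notes = [remap_pad_note(n) for n in pad_notes]
```

The offset of 60 semitones does not depend on the naming convention, only the absolute note numbers do.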

Check out if you want to use your own Launchkey MK3 with MainStage. The versions for the smaller (25-key, 37-key, 49-key) models are untested, but should work just as well.

Update February 2022: The Novation SL MKIII got its promised Logic Pro integration. MainStage integration could be done in a similar way as I did for the Launchkey MK3. The Novation SL MKIII seems quite comparable to the Nektar Panorama P6, but a bit more modern.

Update March 2022: I uploaded a video to YouTube and tweeted about this, hoping to get it out to more people. Novation even liked my tweet:

Update April 2022: Downloadable installer packages (.pkg files) are now available for release versions at You will need to right-click to install the package as it is not signed.

Update September 2022: The Novation SL61 MK3 is not currently supported as its MIDI protocol is completely different from that of the Launchkey MK3. I would gladly add support for it, but don’t own this keyboard, so unless someone ships me one, it won’t happen. The SL61 would be perfect for MainStage with its little displays etc.

Update May 2023: The new Arturia KeyLab Essential 61 mk3 also fulfills my requirements and it has a Lua script for Logic. Writing one for MainStage should also be possible.

Scientific Article: An extensible lattice Boltzmann method for viscoelastic flows: complex and moving boundaries in Oldroyd-B fluids

I’ve published a scientific article in the European Physical Journal E.

An extensible lattice Boltzmann method for viscoelastic flows: complex and moving boundaries in Oldroyd-B fluids
Michael Kuron, Cameron Stewart, Joost de Graaf, and Christian Holm
European Physical Journal E 44, 1 (2021)

The article is available as open-access from the publisher, thanks to Projekt DEAL. One of my pictures even made it onto the cover of the January 2021 issue:

Scientific Article: waLBerla: A block-structured high-performance framework for multiphysics simulations

I co-authored a scientific article in Computers & Mathematics with Applications:

waLBerla: A block-structured high-performance framework for multiphysics simulations
Martin Bauer, Sebastian Eibl, Christian Godenschwager, Nils Kohl, Michael Kuron, Christoph Rettinger, Florian Schornbaum, Christoph Schwarzmeier, Dominik Thönnes, Harald Köstler, and Ulrich Rüde
Comp. Math. Appl. 81, 478 (2021)

The journal does not provide open access to the article, but you can download it for free from arXiv: arXiv:1909.13772.

Ubuntu 20.04: OpenMPI bind-to NUMA is broken when running without mpiexec

I tend to set the CPU pinning for my OpenMPI programs to the NUMA node. That way, they always access fast local memory without having to cross between processors. Some recent CPUs like the AMD Ryzen Threadripper have multiple NUMA nodes per socket, so pinning to the socket is not the same thing.
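To see which logical CPUs belong to which NUMA node on a given Linux machine, you can read the kernel's sysfs entries. This is a minimal sketch that assumes the usual layout under /sys/devices/system/node; the function names are made up for illustration:

```python
import glob
import os
from typing import Dict, Set


def parse_cpulist(text: str) -> Set[int]:
    """Parse a kernel CPU list such as "0-5,12-17" into a set of CPU numbers."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def numa_nodes(sysfs: str = "/sys/devices/system/node") -> Dict[str, Set[int]]:
    """Map each NUMA node name (e.g. "node0") to its set of logical CPUs."""
    nodes: Dict[str, Set[int]] = {}
    for path in sorted(glob.glob(os.path.join(sysfs, "node[0-9]*", "cpulist"))):
        node = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            nodes[node] = parse_cpulist(f.read())
    return nodes
```

On a machine with several NUMA nodes per socket, `numa_nodes()` returns more entries than there are sockets, which is exactly the case where pinning to the socket and pinning to the NUMA node differ.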

Since upgrading to Ubuntu 20.04, we were seeing error messages like this:

$ python3 -m mpi4py.bench helloworld
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

 Setting processor affinity failed failed
 --> Returned value Error (-1) instead of ORTE_SUCCESS

Launching through mpiexec/mpirun, even if it was with just one MPI rank, did not show the error:

$ mpiexec -n 1 python3 -m mpi4py.bench helloworld
Hello, World! I am process 0 of 1 on host1.
$ mpirun -n 1 python3 -m mpi4py.bench helloworld
Hello, World! I am process 0 of 1 on host1.
$ mpiexec -n 4 python3 -m mpi4py.bench helloworld
Hello, World! I am process 3 of 4 on host1.
Hello, World! I am process 0 of 4 on host1.
Hello, World! I am process 1 of 4 on host1.
Hello, World! I am process 2 of 4 on host1.

If you look through the OpenMPI code, you can see that CPU pinning is done by different code depending on whether you run standalone (called singleton mode) or through mpiexec. The relevant bit for the former is in ess_base_fns.c. It searches for a hwloc object of type HWLOC_OBJ_NODE (which is deprecated on the hwloc side and identical to the newer HWLOC_OBJ_NUMANODE). Since hwloc 2.0, NUMA nodes are no longer containers for CPU cores, but exist beside them inside a HWLOC_OBJ_GROUP.

$ lstopo --version
lstopo 1.11.9
$ lstopo --output-format console
Machine (31GB total) + Package L#0
  NUMANode L#0 (P#0 16GB)
    L3 L#0 (8192KB)
      L2 L#0 (512KB) + L1d L#0 (32KB) + L1i L#0 (64KB) + Core L#0
        PU L#0 (P#0)
        PU L#1 (P#12)
$ lstopo --version
lstopo 2.1.0
$ lstopo --output-format console
Machine (31GB total) + Package L#0
  Group0 L#0
    NUMANode L#0 (P#0 16GB)
    L3 L#0 (8192KB)
      L2 L#0 (512KB) + L1d L#0 (32KB) + L1i L#0 (64KB) + Core L#0
        PU L#0 (P#0)
        PU L#1 (P#12)
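The difference between the two outputs can also be checked programmatically. The sketch below is an illustration only: it looks at the indentation of lstopo's console output to find the object that encloses the first NUMANode line and to see whether anything is nested below it. The function name is made up, and it relies on nothing beyond the indentation visible in the listings above:

```python
from typing import Optional, Tuple


def numa_placement(lstopo_text: str) -> Tuple[Optional[str], bool]:
    """Return (enclosing object line, has_children) for the first NUMANode."""
    lines = [line for line in lstopo_text.splitlines() if line.strip()]

    def indent(line: str) -> int:
        return len(line) - len(line.lstrip())

    for i, line in enumerate(lines):
        if line.strip().startswith("NUMANode"):
            # The enclosing object is the nearest earlier line that is
            # indented less than the NUMANode line.
            parent = None
            for previous in reversed(lines[:i]):
                if indent(previous) < indent(line):
                    parent = previous.strip()
                    break
            # Children would appear on the next line with deeper indentation.
            has_children = i + 1 < len(lines) and indent(lines[i + 1]) > indent(line)
            return parent, has_children
    raise ValueError("no NUMANode line found")
```

For the hwloc 1.11.9 listing this reports that the NUMA node has children (the caches and cores), while for the 2.1.0 listing it reports `Group0 L#0` as the enclosing object and no children.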

The current OpenMPI master (i.e. versions beyond the 4.1.x series) doesn’t bind through hwloc anymore, so the issue is fixed upstream (if only by accident). However, we’re stuck with Ubuntu 20.04 for the next two years, so let’s fix it ourselves. We load up the incriminating file, /usr/lib/x86_64-linux-gnu/openmpi/lib/, in Hopper and jump to orte_ess_base_proc_binding. Comparing it to its C code quickly reveals the instruction we need to change:

0x3 is OPAL_BIND_TO_NUMA and 0xd is HWLOC_OBJ_NODE. Looking at the hex code tells us that we need to make this change:

- 66 83 F8 03 0F 85 70 02 00 00 BA 0D 00 00 00
+ 66 83 F8 03 0F 85 70 02 00 00 BA 0C 00 00 00

Here’s a bit of Python code to do that:

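import mmap
# The path below ends at the lib/ directory; append the name of the library file to patch.
with open("/usr/lib/x86_64-linux-gnu/openmpi/lib/", 'r+b') as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)
    # Seek to the instruction sequence that loads HWLOC_OBJ_NODE (0x0D) ...
    m.seek(m.find(bytes.fromhex("66 83 F8 03 0F 85 70 02 00 00 BA 0D 00 00 00")))
    # ... and overwrite it with the variant that loads 0x0C instead.
    m.write(bytes.fromhex("66 83 F8 03 0F 85 70 02 00 00 BA 0C 00 00 00"))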
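Since a binary patch like this modifies a system library in place, a more defensive variant may be worth the extra lines. The sketch below is not what I originally ran. It only shows the idea of refusing to patch unless the byte sequence occurs exactly once, and it works on a bytes object so that you can apply it to a copy of the file first. The function name is made up for illustration:

```python
OLD = bytes.fromhex("66 83 F8 03 0F 85 70 02 00 00 BA 0D 00 00 00")
NEW = bytes.fromhex("66 83 F8 03 0F 85 70 02 00 00 BA 0C 00 00 00")


def patch_once(data: bytes, old: bytes = OLD, new: bytes = NEW) -> bytes:
    """Replace old with new, but only if old occurs exactly once in data."""
    if len(old) != len(new):
        # A patch of different length would shift all following code.
        raise ValueError("old and new byte sequences must have equal length")
    count = data.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    return data.replace(old, new)
```

You would read the library into memory, pass its contents through `patch_once`, and write the result to a separate file for inspection before replacing the original.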

Update 2021-02-26

The recent kernel update from to switched us from HWLOC_OBJ_GROUP to HWLOC_OBJ_DIE. lstopo now reports

$ lstopo --output-format console
 Machine (31GB total) + Package L#0
   Die L#0
     NUMANode L#0 (P#0 16GB)
     L3 L#0 (8192KB)
       L2 L#0 (512KB) + L1d L#0 (32KB) + L1i L#0 (64KB) + Core L#0
         PU L#0 (P#0)
         PU L#1 (P#16)

So the patch needs to be modified to have 0x13 in its fourth-to-last byte now.
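The only thing that differs between the patch variants is the 32-bit immediate operand that follows the BA opcode byte, so the replacement sequence can be generated from the object type value. A short sketch, with the caveat that the numeric values are simply the ones appearing in the patches in this post (0x0D for the NUMA node, 0x0C for the group, 0x13 for the die) and that the helper name is made up:

```python
# Object type values as they appear in the byte patterns in this post.
HWLOC_OBJ_GROUP = 0x0C
HWLOC_OBJ_NUMANODE = 0x0D
HWLOC_OBJ_DIE = 0x13

# Everything up to and including the BA opcode byte stays the same.
PREFIX = bytes.fromhex("66 83 F8 03 0F 85 70 02 00 00 BA")


def binding_pattern(obj_type: int) -> bytes:
    """Return the instruction bytes that load obj_type as the search type."""
    # The operand is a 32-bit little-endian immediate, so the type value
    # ends up in the fourth-to-last byte of the sequence.
    return PREFIX + obj_type.to_bytes(4, "little")
```

`binding_pattern(HWLOC_OBJ_NUMANODE)` reproduces the original bytes, and `binding_pattern(HWLOC_OBJ_DIE)` gives the replacement needed after the kernel update.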

Update 2021-05-07

The AMD Epyc still uses HWLOC_OBJ_GROUP instead of HWLOC_OBJ_DIE and thus needs the previous patch:

Machine (252GB total)
   Package L#0
     Group0 L#0
       NUMANode L#0 (P#0 31GB)
       L3 L#0 (16MB)
         L2 L#0 (512KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0
           PU L#0 (P#0)
           PU L#1 (P#48)

Update 2022

Unfortunately, OpenMPI 5 had still not been released, so Ubuntu 22.04 retains this problem. My binary-patching trick does not work anymore either because the compiler makes some complex optimizations. Therefore, I suggest you use

OMPI_MCA_rmaps_base_mapping_policy=l3cache OMPI_MCA_hwloc_base_binding_policy=l3cache

instead of

OMPI_MCA_rmaps_base_mapping_policy=numa OMPI_MCA_hwloc_base_binding_policy=numa

This still gives you the benefit of pinning to more than a single core, which gives the kernel some scheduling flexibility.
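If you launch your MPI programs from a Python wrapper rather than from a shell, the same two variables can be put into the child process environment. A minimal sketch; the function name is made up, and the command in the usage note is just the benchmark from earlier in this post:

```python
import os
from typing import Dict, Mapping, Optional


def binding_env(policy: str = "l3cache",
                base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the OpenMPI policies set."""
    env = dict(os.environ if base is None else base)
    # Both variables are set to the same policy, as in the lines above.
    env["OMPI_MCA_rmaps_base_mapping_policy"] = policy
    env["OMPI_MCA_hwloc_base_binding_policy"] = policy
    return env
```

For example, `subprocess.run(["python3", "-m", "mpi4py.bench", "helloworld"], env=binding_env())` would start the benchmark in singleton mode with the l3cache policy.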