I’ve looked into it, and indeed the garbage collector has gotten better for us. I’ve also seen some compilation flags that may influence that, although I’ve never tried them.
I believe it’s possible, but not with low latency, and I’m not willing to give that up))
It’s definitely possible to write non-allocating, non-GC-ing real-time code in Haskell with predictable performance, but you’d need to do it within a framework.
I haven’t looked into the Atom DSL which you linked to, but it looks interesting too.
You can also get control over allocations, without removing them completely, by using streaming libraries like pipes/conduit/etc. These are sometimes claimed to get as fast as C.
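For a taste, here’s a minimal sketch assuming the conduit package (mapC, sumC and friends are its stock combinators). Thanks to fusion, each element is transformed and folded as it is produced, so no intermediate million-element list is ever materialized:

```haskell
import Conduit

main :: IO ()
main = do
  -- Elements stream one at a time through the pipeline; the doubled
  -- values are consumed by sumC as soon as they are produced.
  total <- runConduit $ yieldMany [1 :: Int .. 1000000] .| mapC (* 2) .| sumC
  print total
```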
An older way of dealing with streaming computations is to use arrows. These are kind of weird to work with but they at least “feel” like composing UGens.
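A toy illustration (all names here are made up): plain functions already form an Arrow, so one-sample processors compose serially and in parallel much like patching UGens:

```haskell
import Control.Arrow ((&&&), (>>>))

-- Hypothetical one-sample processors as plain functions.
gain :: Float -> Float -> Float
gain g x = g * x

clip :: Float -> Float
clip = max (-1) . min 1

-- Serial composition, like patching UGens in series:
chain :: Float -> Float
chain = gain 0.5 >>> clip

-- Fan-out to two parallel processors, like splitting a signal:
split :: Float -> (Float, Float)
split = gain 0.25 &&& clip
```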
I do wonder, though: when working with large blocks of memory, one can run into all kinds of timing issues due to caching behavior. Memory cache misses can be quite expensive in terms of both time and energy consumption. Pre-allocating a large block of destination memory therefore won’t, in general, automatically lead to predictable performance.
If I understood correctly, the destination-passing style (DPS) mentioned above pre-allocates memory to write results to and then passes it around to the calculations. So depending on what you are calculating (e.g. writing output to a buffer), you might run into such issues.
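If that reading is right, a rough sketch of the pattern with mutable vectors (my illustration, not the paper’s actual API): the caller allocates the destination once, and each stage writes into it instead of allocating its own result:

```haskell
import Control.Monad (forM_)
import qualified Data.Vector.Unboxed.Mutable as VM

-- Destination-passing style: dst is owned by the caller.
scaleInto :: VM.IOVector Float -> VM.IOVector Float -> Float -> IO ()
scaleInto dst src g =
  forM_ [0 .. VM.length src - 1] $ \i -> do
    x <- VM.read src i
    VM.write dst i (g * x)

demo :: IO ()
demo = do
  src <- VM.replicate 64 1.0  -- pre-allocated once, reused every block
  dst <- VM.new 64
  scaleInto dst src 0.5
```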
I think there is some trickery there: “The elements of the intermediate vectors are consumed as they are produced”. The important thing is that there is no interruption. As I understand it, it’s not so different from scsynth. (I’m still absorbing this stuff, idk.)
I think the usual strategy in audio is to allocate everything in one big block, with areas of memory adjacent where their execution is adjacent (meaning, basically, if you have node_a -> node_b, then you lay out [memory_of_node_a, memory_of_node_b]). IIRC both scsynth and supernova already do this. This way, you have a higher likelihood of cache hits as you traverse the graph. Both servers do this for executable memory as well; instruction cache misses are, I think, even more costly.
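A hypothetical sketch of that layout (buffer size and element type are assumptions): carve per-node buffers out of a single allocation, in graph execution order, so node_a’s memory sits right before node_b’s:

```haskell
import Foreign.C.Types (CFloat)
import Foreign.Marshal.Alloc (mallocBytes)
import Foreign.Ptr (Ptr, plusPtr)

bufSamples, bytesPerNode :: Int
bufSamples   = 64               -- samples per node buffer (assumption)
bytesPerNode = bufSamples * 4   -- size of a CFloat in bytes

-- One big block, sliced into adjacent per-node buffers in execution
-- order. (Real code would also free the block and handle alignment.)
allocNodeBuffers :: Int -> IO [Ptr CFloat]
allocNodeBuffers nNodes = do
  base <- mallocBytes (nNodes * bytesPerNode)
  pure [base `plusPtr` (i * bytesPerNode) | i <- [0 .. nNodes - 1]]
```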
You’re definitely right re “automatically lead to predictable performance”, but a big block with a predictable order is better than a big block with a bad order, or heavily fragmented memory.
@scztt I’m confused, you said the opposite. So a big block WILL lead to predictable performance in this context of no GC, no malloc, etc., or are you saying “but not automatically”?
So it boils down to always getting one big block for all your nodes, regardless.
“Big block” is probably the simplest strategy for getting predictable performance. I say “but not automatically” just because cache performance is complex and not perfectly predictable, and can vary a lot across e.g. different CPUs. You can make some basic assumptions, but most cache-optimized code I’ve encountered has at least one or two things that would not be obvious without discovering them in real-world testing.
Generally, you want to keep the memory footprint as small as possible. For example, all the input and output buffers are allocated from one larger block of memory with a buffer coloring algorithm to keep the total amount of accessed memory as small as possible. (In scsynth, these wire buffers can be shared among all Synth instances; in Supernova, however, Synths may run in parallel so each of them needs its own set of wire buffers.)
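For illustration, here’s a toy greedy interval-coloring sketch of the general idea (not scsynth’s actual algorithm): each wire is live from its first to its last use in execution order, and wires whose live ranges don’t overlap can share a buffer:

```haskell
import Data.List (partition, sortOn)

type Interval = (Int, Int)  -- (firstUse, lastUse) in execution order

-- Walk the wires in order of first use, return buffers whose last use
-- has passed to the free list, and give each new wire a free buffer.
colorIntervals :: [Interval] -> [(Interval, Int)]
colorIntervals = go [] [0 ..] . sortOn fst
  where
    go _ _ [] = []
    go active free ((s, e) : rest) =
      let (done, still) = partition (\(end, _) -> end < s) active
          (buf : free') = map snd done ++ free  -- reuse freed buffers first
      in ((s, e), buf) : go ((e, buf) : still) free' rest
```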
For anyone interested in the topic of data locality and data oriented design, I highly recommend Mike Acton’s (in)famous CppCon talk: https://www.youtube.com/watch?v=rX0ItVEVjHc
He presents a perspective that seems to oppose Simon Peyton Jones’ paper. Did you do that on purpose? ))) My attention wasn’t fully on it and I still have to watch the whole thing, but it appeared he drew a sharp distinction between “abstract models” (other coders) and those who immerse themselves in the shit, I mean, in the practicalities of data and hardware (his company).
It strikes me because, for me, concepts such as categories and functors are more than mere mental constructs or ways to avoid confronting reality; they’re a way to understand how we think in general. His presentation was refreshing and made me think about that again.
Functors emerge as unique conduits between different domains, preserving their inherent structures. Imagine a functor as a means to integrate one domain into another seamlessly, without disrupting their relational dynamics. It’s akin to embedding a model from one domain into another, where the original serves as a blueprint or scheme within the new context.
The integration of one domain into another can manifest in VARIOUS forms, some of them similar, others markedly different. One approach might condense the entirety of the original domain into a single point, whereas another might meticulously map each element and its connections to new counterparts in the receiving domain. The same embedding can be actualized in multiple ways, and we can always compare these diverse implementations: there is a special class of mappings between functors that preserve the functors’ structure (natural transformations).
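In Haskell terms, a small illustration (mine, not from the talk or the paper): fmap embeds functions from one domain into another while preserving composition, one functor can collapse everything to a point, and polymorphic functions between functors are those structure-preserving mappings (natural transformations):

```haskell
-- A functor that condenses the whole source domain into a single point.
newtype Collapse a = Collapse ()

instance Functor Collapse where
  fmap _ (Collapse ()) = Collapse ()

-- A natural transformation from [] to Maybe: it commutes with fmap,
-- i.e. safeHead . fmap f == fmap f . safeHead.
safeHead :: [a] -> Maybe a
safeHead []      = Nothing
safeHead (x : _) = Just x
```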
The thing is, this man KNOWS A FUCKING LOT ABOUT HARDWARE. I’m sure about that!!! But he didn’t really need the manifesto, did he? hehe
It’s just one of those ways of integrating domains…
I mean, I would prefer to know more about memory management and not ask you guys stupid questions (I learn a lot from you). I hope I will get there, but I don’t think that if I knew more it would change something ‘deep’.
He presents a perspective that seems to oppose Simon Peyton Jones’ paper. Did you do that on purpose? )))
I haven’t read the paper, so it was not intentional
I hope I will get there, but I don’t think that if I knew more it would change something ‘deep’.
I don’t think there is anything ‘deep’ about low-level programming. You need to follow the hardware, and a simple imperative programming style is likely the best way to achieve that. We use OOP and FP to build all these wonderful abstractions, but they eventually all break down once we try to squeeze the last bit of performance out of our machines. Also, keep in mind that OOP and FP were invented at a time when memory latency simply wasn’t an issue; RAM access was basically as fast as any other instruction. This has changed dramatically over the past decades!
To be clear: OOP and FP certainly have their place, just not in low-level high-performance code
EDIT: if you want the real fun, have a look at advanced SIMD programming. Often you cannot even write trivial for-loops; instead you need to figure out how to manually reorder and interleave your operations to make the best use of the SIMD registers and prevent CPU pipeline stalls.
I think there’s a misunderstanding. My point was that with SIMD programming sometimes you can’t even do simple sequential programming. Lambda calculus simply does not help you at all when trying to deal with the individual bits of memory in the most efficient way.
I have some experiments going on. I’ve even written new FFI bindings for JACK, to understand the unsafe side and to look for some optimizations.
Did you know about CLAP plugins? Since they can represent so many things, Haskell could use them as UGens, Synths, and GUIs. It looks like an exciting path.
I’m exploring JACK2’s potential as a platform for parallelization, and trying to determine what a good balance would be across several independent JACK clients (just like non-mixer).
" GHC, since version 8.4, guarantees that garbage collection will never occur during an unsafe call, even in the bytecode interpreter, and further guarantees that unsafe calls will be performed in the calling thread. Making it safe to pass heap-allocated objects to unsafe functions."