As I am beginning to write UGen’s, I am also trying to implement SIMD versions to get the most out them.

Two questions have come up:

- It seems that in my case
`sin()`

and`cos()`

are significantly slower on nova vectors—`vec<float>`

— vs. simply iterating through each sample and calculating sin/cos on each sample. (MacBookPro, 2.9 GHz Intel Core i7)

```
const int vs = nova::vec<float>::size;
const int loops = nSamples / vs;
for (int i = 0; i < loops; ++i)
{
vec<float> r, r2, sinr, cosr, cosrm2, sinr2, cosr2;
r.load_aligned(rotation);
r2 = r * 2;
sinr = sin(r);
cosr = cos(r);
cosrm2 = cosr * 2;
sinr2 = sin(r2);
cosr2 = cos(r2);
cosrm2.store_aligned(cm2);
sinr.store_aligned(s);
cosr.store_aligned(c);
sinr2.store_aligned(s2);
cosr2.store_aligned(c2);
rotation += vs;
cm2 += vs;
s += vs;
c += vs;
s2 += vs;
c2 += vs;
}
```

(`rotation`

, `cm2`

, `s`

, etc. are `float *`

for data buffers, but could be I/O buffers, etc…)

While I haven’t meticulously isolated the cost of sin/cos on these vectors, in the context of the UGen I’m writing, simply swapping out a basic sample-iterating pattern that doesn’t use `nova::vec`

:

```
for (int frm = 0; frm != nSamples; ++frm)
{
float r = *rotation++;
float r2 = r * 2;
float cosr = cos(r);
*s++ = sin(r);
*c++ = cosr;
*cm2++ = cosr * 2;
*s2++ = sin(r2);
*c2++ = cos(r2);
}
```

shows that it’s **twice** as fast. This seems counterintuitive, as I typically see a pretty decent speedup on things like SIMD binary operations.

Is this expected? How this is implemented is a bit obscure to me…

Is there something obviously wrong with how I’m using `nova::vec`

?

I would imagine specific performance is architecture dependent.

If there is indeed a slowdown, this might be affecting other operations?

- This is a speculative question about whether this exists or could be implemented (feature request): Is there something like a “SIMD pointer type” for which there could be defined a custom iterator which steps
`nova::vec<float>::size`

?

Such an iterator could make code like that above more concise:

```
const int vs = nova::vec<float>::size;
const int loops = nSamples / vs;
for (int i = 0; i < loops; ++i)
{
vec<float> r, r2, sinr, cosr, cosrm2, sinr2, cosr2;
r.load_aligned(rotation++);
r2 = r * 2;
sinr = sin(r);
cosr = cos(r);
cosrm2 = cosr * 2;
sinr2 = sin(r2);
cosr2 = cos(r2);
cosrm2.store_aligned(cm2++);
sinr.store_aligned(s++);
cosr.store_aligned(c++);
sinr2.store_aligned(s2++);
cosr2.store_aligned(c2++);
}
```

… and potentially get a performance boost?

Thanks for any insights!