I think we have barely scratched the surface of inference efficiency for post-trained generative models.
A uniquely efficient hardware stack, for either training or inference, would be a great moat in an industry that seems to offer few moats.
I keep waiting to hear of more adoption of Cerebras Systems' wafer-scale chips. They may be held back by not offering the full hardware stack, i.e. their own data centers optimized around wafer-scale compute units. (They do partner with AWS, as a third-party provider, in competition with AWS's own silicon.)