There’s a lot of hype every day around new models that can do language + images + audio, etc., all at once. There’s a convenience to it, but is it the best way? I can’t help but think “jack of all trades” whenever a new one is announced.
I suppose it could be driven by the race toward the first AGI, but if you’re building an AI tool or platform, wouldn’t you want a specialized model rather than a general one?
Would be interested to hear if anyone here has actually built something on top of one of these, or tried and hit a wall.