Unlike text, which is relatively uniform, spoken language is richly layered with cultural nuances, colloquialisms, and emotion. Startups building voice-first AI models are now doubling down on one thing above all else: the depth and diversity of datasets.