Keep it simple, stupid. From OpenAI to simple batch processing to the simplicity of AWS SageMaker algorithms, that’s the theme this week.
Thanks for reading Post Deployment Data Science Newsletter! Subscribe for free to receive new posts and support my work.
Our industry is moving fast, and as data scientists we sometimes forget that the core of our work is simple elegance. We need to strive for simple models, simple architectures, and simple data transformations (when possible, of course).
This week we will look at the conversations that happened about simplicity in data science, learn about batch processing post-deployment (#batchforlife), and see how easy SageMaker algorithms are to use (🤯)… oh, and a data science conspiracy that impacts all of us.
KISS at OpenAI
Shots fired
Friendly reminder: The fancier the model, the less likely it is to work.
The art of benchmarking with simple models is something junior data scientists (and some seniors) struggle with.
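To make the benchmarking point concrete, here is a minimal sketch of what "start with a simple baseline" can look like. Everything in it is an assumption for illustration: the toy churn labels are made up, and `fancy_model` is a hypothetical stand-in for whatever complicated model you are tempted to ship.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most common training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda _features: most_common_label


def accuracy(predict, features, labels):
    """Fraction of examples where the prediction matches the label."""
    correct = sum(predict(x) == y for x, y in zip(features, labels))
    return correct / len(labels)


# Toy, made-up data (illustrative only): 1 = churned, 0 = stayed.
train_y = [0, 0, 0, 1, 0, 0, 1, 0]
test_X = [[0.2], [0.9], [0.1], [0.8], [0.3]]
test_y = [0, 1, 0, 1, 0]


def fancy_model(x):
    """Hypothetical candidate model; a plain threshold stands in for it here."""
    return int(x[0] > 0.5)


baseline_acc = accuracy(majority_baseline(train_y), test_X, test_y)
fancy_acc = accuracy(fancy_model, test_X, test_y)

print(f"baseline accuracy:  {baseline_acc:.2f}")
print(f"candidate accuracy: {fancy_acc:.2f}")
```

If the candidate cannot clearly beat "always predict the majority class", the extra complexity is probably not paying for itself yet.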
Something in the water this week was inspiring conversations about simplicity in ML across LinkedIn and Twitter.
One of the things that is unfortunately not simple is evaluating the quality of speech-to-text models, especially in an obscure language like Flemish, with words like muggengeheugen and a new dialect every two streets (ask Niels Nuyttens to pronounce this word, and then ask Wiljan Cools too...).
Normally you need humans to review all the files, and you calculate a mean opinion score. But Silke Plessers wrote a blog post researching the use of PCA-based reconstruction error to evaluate quality automatically.
Unfortunately for Greg at OpenAI, LLM evaluation is not simple either. Hopefully this research will lead to automated (quantitative) evaluation of LLMs.
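For readers who have not met PCA-based reconstruction error before, here is a rough sketch of the general idea. It is not the pipeline from Silke's blog. It assumes NumPy, and the "features" are random stand-ins rather than real speech-to-text outputs. The idea: fit PCA on data you trust, then flag new samples that the principal components cannot reconstruct well.

```python
import numpy as np


def fit_pca(X, n_components):
    """Fit PCA via SVD; return the feature means and the top principal axes."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(X, mean, components):
    """Per-sample mean squared error after projecting onto the PCA subspace and back."""
    centered = X - mean
    reconstructed = (centered @ components.T) @ components
    return np.mean((centered - reconstructed) ** 2, axis=1)


rng = np.random.default_rng(0)

# Stand-in reference data: 10-D features that live close to a 2-D subspace.
mixing = rng.normal(size=(2, 10))
reference = rng.normal(size=(200, 2)) @ mixing + 0.01 * rng.normal(size=(200, 10))
mean, components = fit_pca(reference, n_components=2)

# New samples: some follow the same structure, others are unstructured noise.
similar = rng.normal(size=(5, 2)) @ mixing
different = 3.0 * rng.normal(size=(5, 10))

similar_err = reconstruction_error(similar, mean, components)
different_err = reconstruction_error(different, mean, components)

print("mean error, similar samples:  ", similar_err.mean())
print("mean error, different samples:", different_err.mean())
```

Samples that look like the reference data reconstruct almost perfectly, while the unstructured ones do not, so the error can act as an automatic red flag. Where to put the threshold is a judgment call that depends on your data.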
#batchforlife
Deploying models in production is already complex enough. Thankfully, most machine learning use cases today need batch processing, not streaming. Maria Vechtomova and co. wrote a great post about deploying models in batch mode.
Even as streaming use cases become more common, batch processing isn’t going anywhere, especially in AWS SageMaker, where batch transforms make it simple.
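If "batch mode" sounds abstract, the core loop really is this small. The sketch below is plain Python, not SageMaker's API. `score_batch` is a hypothetical stand-in for a real model, and the records are made up.

```python
from itertools import islice


def chunked(records, batch_size):
    """Yield successive lists of at most batch_size records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def score_batch(batch):
    """Hypothetical model: the score grows with the number of orders."""
    return [round(0.1 * record["n_orders"], 2) for record in batch]


def run_batch_job(records, batch_size=2):
    """Score every record, one batch at a time, keeping ids next to scores."""
    results = []
    for batch in chunked(records, batch_size):
        scores = score_batch(batch)
        results.extend(
            {"id": record["id"], "score": score}
            for record, score in zip(batch, scores)
        )
    return results


# Made-up input records for illustration.
records = [{"id": i, "n_orders": i * 2} for i in range(5)]
predictions = run_batch_job(records)

for row in predictions:
    print(row)
```

There is no always-on endpoint and no latency budget here: read the input, score it in chunks, write the output, and shut down until the next run.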
SageMaker algorithms and their beautiful simplicity
SageMaker algorithms let you take a model from training to deployment very simply.
There is a conspiracy that affects all of data science.
Data science is changing. You need to learn post-deployment data science because all of the value your models bring only comes once they have been deployed.
I just published an intro course on the concepts of machine learning monitoring. It lays the groundwork for the more in-depth course to come.
And in the final hour Raghu Venkat. Indeed it is! Great to have you around.
And of course everyone else mentioned in this edition: Maria Vechtomova, Silke Plessers, Bojan Tunguz, Ph.D. There are of course many more people, but these are the ones I could remember. Thanks, even if I didn’t mention you here.
Thanks again, until next time, and don’t forget to: