Google Releases Stax, a Tool for Streamlining LLM Evaluation
*Currently not available in Japan.
Google has released Stax, an experimental developer tool that streamlines LLM evaluation and helps teams move beyond “vibe testing” to rigorous evaluation.
Stax provides prebuilt autoraters (LLM-as-a-judge) that can be used right away once you upload a dataset. You can also build custom autoraters.
The tool is intended to help improve the quality of LLM-powered applications and to support data-driven decision-making.
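To make the LLM-as-a-judge idea concrete, here is a minimal sketch of the general pattern: a judge function scores each (prompt, response) pair in a dataset, and the scores are aggregated. This is not Stax's API. `Example`, `stub_judge`, and `evaluate` are hypothetical names, and `stub_judge` is a deterministic stand-in for what would normally be a call to a judging model.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class Example:
    prompt: str
    response: str


# A judge maps (prompt, response) to a score in [0.0, 1.0].
Judge = Callable[[str, str], float]


def _words(text: str) -> set[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    return {w.lower().strip("?.,!") for w in text.split()}


def stub_judge(prompt: str, response: str) -> float:
    """Hypothetical stand-in for an LLM judge (assumption, not a real model).

    Empty responses score 0.0, responses sharing a word with the prompt
    score 1.0, and everything else scores 0.5.
    """
    if not response.strip():
        return 0.0
    return 1.0 if _words(prompt) & _words(response) else 0.5


def evaluate(dataset: list[Example], judge: Judge) -> dict:
    """Score every example with the judge and report the mean score."""
    scores = [judge(ex.prompt, ex.response) for ex in dataset]
    return {"scores": scores, "mean": mean(scores)}


dataset = [
    Example("What is the capital of France?", "The capital of France is Paris."),
    Example("Name a prime number.", "7"),
    Example("Summarize the article.", ""),
]

report = evaluate(dataset, stub_judge)
print(report)
```

In a real setup, the stub would be replaced by a model call with a grading rubric in its prompt. The surrounding loop and aggregation stay the same, which is what makes the results comparable across model or prompt changes.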
Source: Stop “vibe testing” your LLMs. It’s time for real evals. - Google Developers Blog
About the author
Hi there. I'm hrdtbs, a frontend expert and technical consultant. I started my career in the creative industry over 13 years ago, learning on the job as a 3DCG modeler and game engineer in the indie scene.
In 2015 I began working as a freelance web designer and engineer. I handled everything from design and development to operation and advertising, delivering comprehensive solutions for various clients.
In 2016 I joined Wemotion as CTO, where I built the engineering team from the ground up and led the development of core web and mobile applications for three years.
In 2019 I joined matsuri technologies as a Frontend Expert, and in 2020 I also began serving as a technical manager supporting streamers and content creators.
I'm so grateful to be working in this field, doing something that brings me so much joy. Thanks for stopping by.