AI Trend · 2026.07.04

The Five-Percent Disruption: China's MiniMax M3 Rewrites the AI Cost Map

5%의 혁명: 중국 MiniMax M3가 AI 비용 지형도를 다시 그리다

Hook

When Cheaper Also Means Better

'더 싸고 더 빠르다'는 말이 현실이 됐을 때

On June 1, 2026, a Chinese AI startup launched a model that outperformed both GPT-5.5 and Gemini 3.1 Pro on a rigorous coding test, and charged roughly five to ten cents for every dollar its rivals charged. The company was MiniMax, barely known outside China until that morning. The model was M3, and its architecture reset expectations of what performance was possible at that price. Within forty-eight hours of the launch, AI research teams around the world were asking the same question: if a new entrant can match frontier benchmarks at one-twentieth the compute cost, what exactly are the closed-model labs selling?

2026년 6월 1일, 한 중국 AI 스타트업이 엄격한 코딩 테스트에서 GPT-5.5와 Gemini 3.1 Pro 모두를 앞지르면서도, 경쟁사가 1달러를 받을 때 5~10센트만 받는 모델을 출시했어요. 그 회사는 그날 아침까지 중국 밖에서는 거의 알려지지 않았던 MiniMax였고, 모델은 M3였어요. M3는 어떤 가격대에서도 가능하다고 여겨지던 것을 조용히 다시 쓰는 아키텍처를 품고 등장했어요. 출시 48시간 안에, 전 세계 모든 주요 AI 연구팀이 같은 질문을 던지고 있었어요. 신규 진입자가 프론티어 벤치마크를 연산 비용의 20분의 1로 달성할 수 있다면, 폐쇄형 모델 연구소들이 팔고 있는 것은 정확히 무엇인가요?

Under the hood

The Architecture Nobody Expected

아무도 예상하지 못한 아키텍처

The secret inside M3 is an attention mechanism called MiniMax Sparse Attention, or MSA: a way of telling the model which parts of a very long document actually matter and which parts it can safely ignore, while still supporting a context window of one million tokens. Standard attention, the engine inside almost every large language model since 2017, compares every token in a text with every other token, so its compute cost grows with the square of the document length. Sparse attention breaks that curve. MSA identifies the most relevant tokens first and computes full attention only on those. At a one-million-token context, the result is 15.6 times faster decoding and 9.7 times faster document intake than MiniMax's own previous generation, at just one-twentieth the compute cost. The one-million-token context window means M3 can ingest the complete source code of a mid-sized software product, the entire chat history of a customer-support team, or five years of financial filings all at once, without truncation, and process them as a single coherent whole.

M3 안에 숨어 있는 비밀은 MiniMax Sparse Attention, 즉 MSA라는 어텐션 메커니즘이에요. 이는 100만 토큰의 컨텍스트 윈도우를 유지하면서, 매우 긴 문서에서 어떤 부분이 실제로 중요하고 어떤 부분은 안전하게 무시해도 되는지를 모델에 알려주는 방법이에요. 표준 어텐션, 즉 2017년 이후 거의 모든 대형 언어 모델의 엔진은 텍스트의 모든 단어를 다른 모든 단어와 비교해요. 이 과정은 문서 길이의 제곱에 비례해 연산 비용이 늘어나는데, 희소 어텐션은 이 곡선을 깨뜨려요. MSA는 먼저 가장 관련성 높은 토큰을 파악하고 그 토큰에 대해서만 전체 어텐션을 계산해요. 그 결과 100만 토큰 컨텍스트에서 이전 세대보다 15.6배 빠른 디코딩, 9.7배 빠른 문서 처리를 연산 비용의 20분의 1로 달성했어요. 100만 토큰 컨텍스트 윈도우는 M3가 중형 소프트웨어 제품의 전체 소스코드, 고객 지원팀의 전체 채팅 내역, 또는 5년치 금융 공시를 모두 한꺼번에, 잘림 없이, 하나의 일관된 전체로 처리할 수 있다는 뜻이에요.
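
To make the contrast between standard and sparse attention concrete, here is a minimal sketch for a single query vector in plain Python. It is a generic top-k illustration, not MiniMax's MSA, whose selection method the article does not describe. The function names and the parameter `k` are assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_attention(query, keys, values):
    """Standard attention for one query: weight every value by its key's score."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, key) / scale for key in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def topk_sparse_attention(query, keys, values, k):
    """Illustrative top-k variant: attend only to the k highest-scoring keys.

    Note: this toy version still scores every key in order to pick the top k.
    A practical design would need a cheaper selection step; that step is not
    modelled here.
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, key) / scale for key in keys]
    keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in keep])
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, keep)) for d in range(dim)]


def attention_pair_counts(n_tokens, k):
    """Token pairs mixed by dense attention (n * n) versus top-k attention (n * k)."""
    return n_tokens * n_tokens, n_tokens * min(k, n_tokens)
```

With `k` equal to the number of keys, the sparse function reproduces the dense result. As `k` shrinks, the number of token pairs grows linearly with document length rather than quadratically, which is the curve-breaking effect described above. Any specific value of `k` is hypothetical; the article gives no such figure for M3.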

What it means for the market

The End of the Compute Moat?

AI의 '규모 해자'가 무너지고 있다

For two years, the dominant narrative held that scale was a moat: that OpenAI, Google, and Anthropic had structural advantages rooted in massive training runs that smaller or newer labs could not afford to replicate, and that their closed models would always outperform open alternatives. MiniMax M3 did not match that scale (its training budget has not been publicly disclosed), but it matched, and in several tests exceeded, the performance of far more resource-intensive models by exploiting architectural efficiency rather than raw compute, suggesting the moat was always less about money than about ideas. On SWE-Bench Pro, a benchmark that simulates the full cycle of a software engineer fixing bugs in real open-source repositories, M3 scored higher than GPT-5.5, and on BrowseComp, which tests autonomous web research, its score of 83.5 surpassed Claude Opus 4.7 at 79.3. The pricing gap is the number that stops a conversation: at $0.30 per million input tokens and $1.20 per million output tokens during its launch period, M3's inference cost undercuts Gemini 3.1 Pro's comparable tier by roughly ninety percent and GPT-5.5 by a similar margin.

2년 동안 지배적인 서사는 규모가 해자라는 것이었어요. OpenAI, Google, Anthropic은 소규모 또는 신규 연구소가 복제할 여유가 없는 대규모 훈련 실행에 뿌리를 둔 구조적 우위를 가지며, 폐쇄형 모델이 오픈 대안을 항상 앞설 것이라는 믿음이었어요. MiniMax M3는 그 규모에 맞서지 않았어요. 학습 예산은 공개되지 않았지만, M3는 순수한 연산 대신 아키텍처 효율을 활용해 훨씬 많은 자원을 투입한 모델의 성능에 맞먹고 일부 테스트에서는 넘어섰어요. 이는 그 해자가 항상 돈보다 아이디어에 관한 것이었음을 시사해요. 실제 오픈소스 저장소에서 버그를 수정하는 소프트웨어 엔지니어의 전체 주기를 시뮬레이션하는 벤치마크 SWE-Bench Pro에서 M3는 GPT-5.5보다 높은 점수를 받았어요. 자율 웹 리서치를 테스트하는 BrowseComp에서는 83.5점으로 Claude Opus 4.7의 79.3점을 앞질렀어요. 대화를 멈추게 만드는 숫자는 가격 격차예요. 출시 기간 동안 입력 토큰 100만 개당 $0.30, 출력 토큰 100만 개당 $1.20으로, M3의 추론 비용은 비교 가능한 Gemini 3.1 Pro 요금제를 약 90%, GPT-5.5를 비슷한 폭으로 밑돌아요.
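
The pricing arithmetic above can be checked with a short sketch. The two M3 prices are the launch-period figures quoted in this article. The helper names and the example token counts are assumptions, and the "implied" rival price is only what a ninety percent undercut would mean mathematically, not a published rate.

```python
# Launch-period M3 prices quoted in the article (USD per million tokens).
M3_INPUT_USD_PER_MTOK = 0.30
M3_OUTPUT_USD_PER_MTOK = 1.20


def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD of one request, given per-million-token prices."""
    return (input_tokens / 1_000_000) * input_price + (
        output_tokens / 1_000_000
    ) * output_price


def implied_comparator_price(price, undercut_fraction):
    """Price a rival would have to charge for `price` to undercut it by
    `undercut_fraction` (0.9 means ninety percent cheaper)."""
    if not 0.0 <= undercut_fraction < 1.0:
        raise ValueError("undercut_fraction must be in [0, 1)")
    return price / (1.0 - undercut_fraction)


# Hypothetical workload: a long document in, a short summary out.
m3_cost = request_cost(
    800_000, 20_000, M3_INPUT_USD_PER_MTOK, M3_OUTPUT_USD_PER_MTOK
)

# What "roughly ninety percent cheaper" implies for a rival's input price.
implied_input_price = implied_comparator_price(M3_INPUT_USD_PER_MTOK, 0.9)
```

For this hypothetical request, M3 comes to roughly $0.26. A ninety percent undercut implies a rival input price of about $3.00 per million tokens, in line with the "ten cents on the dollar" end of the range quoted in the opening.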

Korea angle

Why Korean Builders Should Pay Close Attention

한국 개발자와 스타트업이 주목해야 할 이유

Korean AI companies have faced a painful cost equation since large language models went mainstream: training or licensing frontier-level models with strong Korean-language capability is expensive enough that only Naver, Kakao, and Samsung have been able to sustain serious investment in it. An open-weight M3 — available to download and fine-tune on Korean-language data without per-query fees — could materially lower the floor for hundreds of Korean startups trying to build AI-native products in legal tech, electronic health records, education, and local e-commerce. There is a geopolitical wrinkle that every Korean enterprise must address: MiniMax is a Chinese company, and any organization deploying its model will need to assess data-residency requirements under Korea's Personal Information Protection Act, especially in finance and healthcare where user data is tightly regulated.

한국 AI 기업들은 대형 언어 모델이 주류화된 이후 고통스러운 비용 방정식에 직면해 왔어요. 뛰어난 한국어 능력을 갖춘 프론티어급 모델을 훈련하거나 라이선스를 얻는 것은 네이버, 카카오, 삼성 정도만 진지하게 투자를 이어갈 수 있을 만큼 비쌌어요. 쿼리당 요금 없이 한국어 데이터로 다운로드·파인튜닝할 수 있는 오픈 웨이트 M3는 법률 기술, 전자 의무기록, 교육, 로컬 이커머스에서 AI 네이티브 제품을 구축하려는 수백 개의 한국 스타트업의 진입 장벽을 실질적으로 낮출 수 있어요. 모든 한국 기업이 해결해야 할 지정학적 문제가 있어요. MiniMax는 중국 기업이기 때문에 해당 모델을 배포하는 조직은 특히 사용자 데이터가 엄격히 규제되는 금융·의료 분야에서 개인정보 보호법상 데이터 보관 요건을 검토해야 해요.

What you can do

Three Moves Worth Making This Month

지금 당장 해볼 수 있는 세 가지 행동

Before dismissing M3 as just another Chinese model, run it on one of your actual internal tasks, such as customer-feedback summarization, internal document search, or code review, and compare its output quality against your current solution on the same prompts; the answer may surprise you. Next, watch for the open-weight release expected within weeks of this article's publication: downloading and self-hosting the model weights means you control the data pipeline end-to-end, which removes the most significant objection for regulated industries and makes a true total-cost-of-ownership comparison possible. Finally, whether you are a developer choosing a model API or a manager setting your company's AI budget for Q4, change the question you ask. It is no longer "which model is the best?" but "at five percent of the cost, how much better does the pricier model need to be to justify the gap, and can you even measure that difference on your actual workload?"

M3를 그저 또 다른 중국 모델로 치부하기 전에, 고객 피드백 요약, 내부 문서 검색, 또는 코드 리뷰 등 실제 내부 작업에 직접 돌려보고 같은 프롬프트에서 현재 솔루션과 출력 품질을 비교해 보세요. 결과가 놀라울 수도 있어요. 다음으로, 이 글 발행일로부터 몇 주 안에 예정된 오픈 웨이트 공개를 주목하세요. 모델 가중치를 다운로드해 자체 호스팅하면 데이터 파이프라인을 처음부터 끝까지 직접 통제할 수 있어 규제 산업에서 가장 큰 반대 이유가 사라지고, 진정한 총소유비용 비교가 가능해져요. 여러분이 모델 API를 선택하는 개발자든 Q4 AI 예산을 책정하는 관리자든, 이제 물어볼 만한 질문은 "어떤 모델이 최고인가?"가 아니라 "비용이 5%라면, 더 비싼 모델이 그 차이를 정당화할 만큼 얼마나 더 뛰어나야 하는가, 그리고 실제 업무에서 그 차이를 측정할 수 있는가?"예요.
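
One way to put a number on that closing question is cost per successful task. The sketch below is a simplified framing and rests on stated assumptions: each attempt either succeeds or fails, failures cost nothing beyond the attempt itself, and success rates come from your own evaluation. The function names are hypothetical.

```python
def cost_per_success(cost_per_attempt, success_rate):
    """Expected spend per successful task when failed attempts are simply retried."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def required_success_rate(cheap_success_rate, cost_ratio):
    """Success rate the pricier model needs to match the cheaper model's
    cost per success.

    cost_ratio is cheap cost divided by pricey cost, e.g. 0.05 for
    "five percent of the cost". A result above 1.0 means the pricier model
    cannot break even on this metric alone.
    """
    if not 0.0 < cost_ratio <= 1.0:
        raise ValueError("cost_ratio must be in (0, 1]")
    return cheap_success_rate / cost_ratio


# Hypothetical measurement: the cheaper model solves 60% of your tasks.
needed = required_success_rate(0.60, 0.05)
can_break_even_on_cost = needed <= 1.0
```

Under these assumptions, a model at five percent of the price wins on cost per success whenever it solves more than five percent of the tasks. The pricier model then has to justify itself on things this sketch leaves out, such as the cost of a wrong answer, latency, or human review time.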

KEY TERMS

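sparse attention (n. phrase): 희소 어텐션. An efficient attention mechanism in transformer models that selects the important tokens and computes attention only over them, instead of over every token.
open-weight model (n. phrase): 오픈 웨이트 모델. An AI model whose trained weights (internal parameters) are published, so that anyone can download, run, or fine-tune it.
context window (n. phrase): 컨텍스트 윈도우. The maximum amount of text an AI model can process at once; the larger it is, the longer the documents or conversations the model can handle.
benchmark (n.): 벤치마크. An evaluation standard or test that measures and compares AI model performance on standardized tasks.
inference cost (n. phrase): 추론 비용. The compute and cloud cost incurred when an AI model actually generates responses; it directly affects the profitability of an AI service.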
