Serving LLM locally using LM Studio

Yuwei Sung
3 min read · Feb 25, 2025


Thanks to ChatGPT, we no longer need to wade through Google search results and guess which link holds the right answer. But ChatGPT is a hosted service: users need an API token to access the models over the internet. Recently, I found that LM Studio's developer mode has a "server" option that exposes downloaded models through an OpenAI-compatible API. This article shows how to run an LLM server with different models on my laptop, so I can start writing a prompt interface.
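To preview where this is heading, here is a minimal sketch of such a prompt interface using the official openai Python client pointed at LM Studio's local server. The port (1234) is LM Studio's default, and the model identifier is the one I download below; adjust both to whatever your server actually exposes.

```python
from openai import OpenAI

# LM Studio's local server speaks the OpenAI chat API.
# Port 1234 is the default; any non-empty api_key string works locally.
client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-8b",  # the model loaded in LM Studio
    messages=[{"role": "user", "content": "What is the capital of Oregon?"}],
)
print(response.choices[0].message.content)
```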

Follow the docs to download and install LM Studio; it is very straightforward. After opening the app, it prompts you to pick a model to download. In my case, I chose deepseek-r1-distill-llama-8b. A couple of minutes later, the model is loaded automatically, and you can create a new chat and start asking questions. Now you have a very small "offline" chat app you can "talk" to.
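Once you flip on the server in developer mode, you can confirm which models it exposes by hitting the standard models endpoint. A quick sketch, again assuming the default port:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# The /v1/models endpoint lists every model the local server exposes.
for model in client.models.list():
    print(model.id)
```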

For example, I typed "what's capital of 50 states" in my bad English. The assistant spent 27 seconds "thinking" and then gave me its "thoughts" and "results".
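DeepSeek R1 distills typically wrap that reasoning in <think>…</think> tags in the raw output. If you consume the response programmatically rather than in the chat UI, a small sketch like this (assuming the tag convention holds for your model build) separates the "thoughts" from the final answer:

```python
import re

def split_thoughts(raw: str) -> tuple[str, str]:
    """Return (thoughts, answer) from a DeepSeek-R1-style response."""
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if match is None:
        return "", raw.strip()  # no thinking block present
    thoughts = match.group(1).strip()
    answer = raw[match.end():].strip()  # everything after </think>
    return thoughts, answer

# Hypothetical raw output, abbreviated for illustration:
thoughts, answer = split_thoughts(
    "<think>The user wants all 50 state capitals...</think>Here they are: ..."
)
print(answer)
```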

LM Studio also shows details such as the token counts and the stop reason. I am not sure the assistant's answer is 100% correct, though.
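Those same details are available programmatically: an OpenAI-style chat completion carries a usage object with token counts and a finish_reason per choice, so a client can log them. A sketch under the same assumptions as above:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-8b",
    messages=[{"role": "user", "content": "what's capital of 50 states"}],
)

choice = response.choices[0]
print("stop reason:", choice.finish_reason)  # e.g. "stop" or "length"
print("prompt tokens:", response.usage.prompt_tokens)
print("completion tokens:", response.usage.completion_tokens)
print("total tokens:", response.usage.total_tokens)
```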


Written by Yuwei Sung

A data nerd who went from data center field engineer to cloud database reliability engineer.