Serving LLM locally using LM Studio
Thanks to ChatGPT, we no longer need to dig through Google search results and figure out which link has the right answer. But ChatGPT is a hosted service: users need an API token to access the models over the internet. Recently, I found that LM Studio's developer mode has a "server" option that exposes the downloaded models through an OpenAI-compatible API. This article shows how to run an LLM server with different models on my laptop, so I can start writing a prompt interface.
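
As a preview of where this is headed, here is a minimal sketch of such a prompt interface in Python, using the official openai package pointed at the local server. I am assuming LM Studio's default server address of http://localhost:1234/v1 and the model identifier of my download; adjust both to match your setup.

```python
# Minimal sketch of a prompt interface against LM Studio's local server.
# Assumptions: the server runs on the default http://localhost:1234/v1,
# and the model identifier matches what LM Studio shows for your download.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's OpenAI-compatible endpoint
    api_key="lm-studio",  # any non-empty string; no real token is needed locally
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-8b",  # the model loaded in LM Studio
    messages=[{"role": "user", "content": "Hello! Who are you?"}],
)
print(response.choices[0].message.content)
```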
Follow the docs to download and install LM Studio; it is very straightforward. When you open the app, it asks which model you want to download. In my case, I chose deepseek-r1-distill-llama-8b. A couple of minutes later, the model is loaded automatically, and you can create a new chat and start asking questions. Now you have a small "offline" chat app you can "talk" to.
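
Once the server option is enabled in developer mode, one way to sanity-check that the downloaded model is actually reachable over the API is to list the models the server exposes. A small sketch, again assuming the default port:

```python
# Sanity check: list the models the local server exposes.
# Assumes the server is enabled in LM Studio's developer mode on the default port.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
for model in client.models.list():
    print(model.id)  # e.g. deepseek-r1-distill-llama-8b
```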

For example, I typed "what's capital of 50 states" with my bad English. The assistant spent 27 seconds "thinking" and then gave me its "thoughts" and "results".
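
You can reproduce this over the API instead of the chat UI by sending the same prompt programmatically. A sketch under one assumption: the distilled R1 models typically wrap their reasoning in <think>...</think> tags inside the message content, so we can split the "thoughts" from the answer.

```python
# Send the same question over the API and separate "thoughts" from the answer.
# Assumption: the R1 distill model emits its reasoning inside <think>...</think> tags.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-8b",
    messages=[{"role": "user", "content": "what's capital of 50 states"}],
)

content = response.choices[0].message.content
if "</think>" in content:
    thoughts, answer = content.split("</think>", 1)
    print("THOUGHTS:", thoughts.replace("<think>", "").strip())
    print("ANSWER:", answer.strip())
else:
    print(content)
```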

LM Studio also shows details such as token counts and the stop reason. I am not sure the assistant's answer is 100% correct, though.
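
The same details are available in the API response. Here is a short sketch that prints them; the field names follow the standard OpenAI chat completions schema, which LM Studio's server mirrors.

```python
# Inspect token counts and the stop reason from a chat completion response.
# Field names follow the standard OpenAI chat completions schema.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-8b",
    messages=[{"role": "user", "content": "what's capital of 50 states"}],
)

print("stop reason:", response.choices[0].finish_reason)  # e.g. "stop" or "length"
print("prompt tokens:", response.usage.prompt_tokens)
print("completion tokens:", response.usage.completion_tokens)
print("total tokens:", response.usage.total_tokens)
```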