Generative AI and its impact on my daily development work
So far, I have tried to avoid the Generative AI buzz, both in my daily usage and in my learning, for as long as possible. But it is becoming harder with every passing day to ignore it.
In the last few months, I was exposed to more and more GenAI as a user. Firstly, due to free (but limited) access to GitHub Copilot. Then, thanks to my work account, I got a Copilot Business subscription as well. I really doubt that I have made any special use of this Business plan yet.
Due to this access, I also connected my VS Code to GitHub Copilot, and now it provides code completions on every key I type, which is bewildering (and sometimes plain irritating too, due to the bad UX of mixing my text with suggested text).
A side effect of this is that I am using more GenAI than I would have imagined 6 months ago. While I still have not been able to create anything of substantial value with it, I would say that I have gained some decent experience with it. So, I thought I would document it here.
The good
The amount of knowledge AI possesses is mind-boggling! And the fact that it can coherently present that information in an easy-to-use digest is amazing too!
The bad
But... it is a little too confident about the quality of the content it dishes out. I see AI hallucinations literally every day in my AI interactions! I feel that GenAI should know, and indicate, what it is confident in saying and what is “made up” output!
The ugly
The most disturbing part about Copilot, Gemini and the like is that all the large AI models are controlled by a small handful of companies in the world. They already know so much about us, and now, by making us dependent upon these AI assistants, they can soon have an ironclad grip on our lives! Once we become overly dependent on them, it would be easy for them to control pricing, access, the number of tokens, etc., and we (developers) would have lost our patience to code entire applications by then. That’s when those large GenAI models and the companies behind them win!
Can I target and solve that ugly part?
I decidedly hate the fact that this is making an entire developer community dependent upon some extremely large computer to suggest code to us. And yet, I can see the value these AI models can bring to our productivity (at least when you already know the principles).
So – I decided to dig into running the AI models locally!
The Local AI
Thankfully, there are many AI models – even some very capable ones – available for free (to use) on the internet. So we can download them and run them on our own hardware. The tooling around this has also matured a lot in the last 6-12 months. Let’s see how we can get a local AI set up.
The Hardware
The first thing I tried to understand was what hardware I was going to need. The internet obviously pointed me to a GPU. I also slowly learnt that not all GPUs are the same when it comes to GenAI. GenAI developers are really fond of Nvidia GPUs!
Now, I had an Nvidia GPU (GTX 1650 with 4GB VRAM) sitting in my older laptop, which I had just retired. It’s a pity that I never ran any AI model with it. Again, people on the internet will promptly tell you that a GTX 1650 with 4GB VRAM is so weak a GPU that GenAI models can barely run on it!
Some of those claims were true and some were not. Yes, the 4GB GTX 1650 is really a very basic GPU, but it still works fine! You can run fairly decent AI models with it. Specifically, many newer models are coming up which are optimized to work with fewer resources, e.g. the Phi-3 model from Microsoft.
I tried another model – one optimized for coding – the Qwen3-4b-thinking model with my GPU, and it could spit out BRDs and code snippets at an impressive (and somewhat acceptable) token generation speed (20 tokens/s, with 1.21s for the first token). See the last part of this post for how to set this up and get going.
The alternative GPU of the new world – NPU
Some tangential research led me to the NPU (Neural Processing Unit), which is now available in newer Copilot+ PCs. I recently purchased a Lenovo Yoga 7 laptop that came with an NPU. I was told not to expect much from an NPU as it only goes up to about 50 TOPS. NPUs are specialized to work as low-powered AI accelerators, so they provide AI acceleration without draining the battery.
Sounds cool, eh? Well, beware! There aren’t many easy ways to consume the NPU in a traditional AI chatbot UI just yet (as of Aug 2025). So, if you try to run local AI tools and use the above Qwen3 model, it will not make any use of the NPU. It will simply use the integrated AMD GPU while the NPU sits idle.
Image generation
Another aspect of GenAI is image generation – generating an image based on a text prompt or on another image. I decided to dig a bit deeper into how this is achieved (again, without getting locked into the large models offered by big corporations), and I learnt about the Stable Diffusion model. You can run this model just like other normal GenAI models. It is of course a lot slower (heavier on resources), but it works. When I tried using this model on the new laptop, I was able to generate simple images using the GPU (it takes almost 1-2 minutes to generate a 20-pass image).
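If you are curious what this looks like in code, here is a rough Python sketch of how such local image generation is typically wired up, using Hugging Face’s diffusers library. This is not my exact setup – the model id, prompt and options below are illustrative assumptions, and a small 4GB card will likely need the half-precision and attention-slicing options to fit in VRAM.

```python
# Rough sketch of local text-to-image generation with Stable Diffusion,
# using Hugging Face's diffusers library. Model id, prompt and options are
# illustrative; a 4GB GPU will likely need fp16 and attention slicing.
import torch
from diffusers import StableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"  # example checkpoint only
device = "cuda" if torch.cuda.is_available() else "cpu"

pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
)
pipe = pipe.to(device)
pipe.enable_attention_slicing()  # trades a little speed for much lower peak VRAM

result = pipe(
    "a watercolor painting of a lighthouse at sunset",
    num_inference_steps=20,  # roughly the "20-pass" image mentioned above
)
result.images[0].save("lighthouse.png")
```

The number of inference steps is the main speed/quality knob here – fewer steps finish faster on weak hardware, at the cost of detail.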
Image generation on AMD NPU
Now, this is where it gets interesting! While I was not able to use any ChatGPT-style prompter for text GenAI with the AMD NPU, AMD and Amuse have partnered to come up with AMD-specific models and their integration in Amuse, which allows utilizing the NPU for generating images! That was pretty impressive to me. AMD already offers some AMD Ryzen NPU-specific model variants for text GenAI, but so far they cannot simply be pulled and used via local AI tools.
Note about AI model licensing
It is very important to highlight that the licensing of each model is different. Just because you are not using, e.g., Copilot or Grok and are running things locally does not mean you can use the output for commercial purposes. Please check the model’s license before using any model output commercially.
How to set up local AI
Ok! So, if you really made it this far, I congratulate you! You really seem interested in knowing how to get AI models to run locally! Alright, let’s get our feet wet!
You are going to need some hardware support – at least an integrated GPU – otherwise you would be staring at a very slow and hot CPU!
Firstly, you will need a tool like Ollama or LM Studio. Both offer great OS support, so pick whichever works with your current OS.
Then you need to choose the model. Models have suffixes like 4b, 7b, etc., which stand for how many billion parameters the model uses. The higher the number, the better the predictions will be, but that also means the more resources it is going to consume. Typically, a lower-end 4GB GPU can only work reasonably well with 4b models.
So you can choose and try something like Qwen3 / Gemma 3 / Mistral / Stable Code in a 4b instruct variant.
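To make this concrete, here is a minimal Python sketch that sends a prompt to a locally running Ollama server over its default REST endpoint (http://localhost:11434). The model tag below is just an example – replace it with whatever you actually pulled via Ollama.

```python
# Minimal sketch: send one prompt to a locally running Ollama server.
# Assumes Ollama is installed, serving on the default port (11434),
# and that the model tag below has already been pulled locally.
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen3:4b"  # example tag; replace with a model you actually have

def ask(prompt: str) -> str:
    """Return the full (non-streamed) completion for a single prompt."""
    payload = {"model": MODEL, "prompt": prompt, "stream": False}
    resp = requests.post(OLLAMA_URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(ask("Write a Python function that reverses a string."))
```

If you want the usual chatbot feel, set "stream" to True and read the response line by line as tokens arrive. LM Studio similarly exposes a local server (with an OpenAI-compatible API), so the same idea works there with a different URL and request shape.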
The next wanderings
Of course, I have so far only scratched the surface of Local AI.
- I still have not connected VS Code to a locally running AI model.
- I want to try a few different models like Phi-3, Stable Code, Mistral, etc. to see which works better and provides better quality output for similar prompts.
- I would also like to run my local AI on the older laptop with the dedicated GPU and connect VS Code from my current laptop over the network to that remote local AI!
- Also, FastFlowLM / GAIA could be a game changer for NPU-based models. I should try that out.
I hope I can write another blog about that in the near future!