Blog

  • Running AI Locally – First impressions

    Generative AI and its impact on my daily development work

    So far, I had tried to avoid the Generative AI buzz, both in my day-to-day usage and in my learning, for as long as possible. But with every passing day it is becoming harder and harder to ignore.

    In the last few months, I have been exposed to more and more GenAI as a user. First through free (but limited) access to GitHub Copilot, then, thanks to my work account, a Copilot Business subscription as well. I really doubt I have made any special use of the Business plan yet.

    With this access, I also connected my VS Code to GitHub Copilot, and now it offers code completions on every key I type, which is bewildering (and sometimes plain irritating too, thanks to the poor UX of mixing my text with suggested text).

    A side effect of this is that I am using more GenAI than I would have imagined 6 months ago. While I still have not created anything of substantial value with it, I would say I have gained some decent experience. So I thought I would document it here.

    The good

    The amount of knowledge AI possesses is mind boggling! And the fact that it can coherently present that information in an easy-to-digest form is amazing too!

    The bad

    But… it is a little too confident about the quality of the content it dishes out. I see AI hallucinations literally every day in my interactions! I feel GenAI should know, and indicate, what it is confident about and what is “made up” output!

    The ugly

    The most disturbing part about Copilot, Gemini and the like is that all the large AI models are controlled by a small handful of companies. They already know so much about us, and by making us dependent upon these AI assistants, they can soon have an ironclad grip on our lives! Once we become overly dependent, it would be easy for them to control pricing, access, number of tokens and so on, and by then we developers would have lost the patience to code entire applications ourselves. That’s when those large GenAI models, and the companies behind them, win!

    Can I target and solve that ugly part?

    I decidedly hate the fact that this is making an entire developer community dependent upon some extremely large computer to suggest code to us. And yet, I can see the value these AI models can bring to our productivity (at least when you already know the principles).

    So – I decided to dig into running AI models locally!

    The Local AI

    Thankfully, many AI models – even some of the most capable ones – are available for free (to use) on the internet. So we can download them and run them on our own hardware. The tooling around this has also matured a lot in the last 6-12 months. Let’s see how we can get a local AI setup going.

    The Hardware

    The first thing I set out to understand was what hardware I was going to need. The internet obviously pointed me to a GPU. I also slowly learnt that not all GPUs are equal when it comes to GenAI: GenAI developers are really fond of Nvidia GPUs!

    Now, I had an Nvidia GPU (GTX 1650 with 4GB VRAM) sitting in the older laptop I had just retired. It’s a pity I never ran any AI model on it. Again, people on the internet will promptly tell you that a GTX 1650 with 4GB VRAM is such a weak GPU that GenAI models can barely run on it!

    Some of those claims were true and some were not. Yes, a 4GB GTX 1650 is really a very basic GPU, but it still works fine! You can run fairly decent AI models with it. In particular, many newer models are being optimized to work with fewer resources, e.g. the Phi-3 model from Microsoft.

    I tried another model – the coding-oriented Qwen3-4b-thinking model – with my GPU, and it could spit out BRDs and code snippets at an impressive (well, somewhat acceptable) token generation speed (20 tokens/s, with 1.21s to the first token). See the last part of the post for how to set this up and get going.

    The alternative GPU of the new world – the NPU

    Some tangential research led me to the NPU (Neural Processing Unit), which is now available in newer Copilot+ PCs. I recently purchased a Lenovo Yoga 7 laptop that came with an NPU. I was told not to expect much from an NPU, as it only goes up to about 50 TOPS. NPUs are specialized as low-powered AI accelerators, so they provide AI acceleration without draining the battery.

    Sounds cool, eh? Well, beware! There aren’t many easy ways to use the NPU from a traditional AI chatbot UI just yet (as of Aug 2025). So if you run local AI tools with the Qwen3 model mentioned above, they will not make any use of the NPU; they will simply use the integrated AMD GPU while the NPU sits idle.

    Image generation

    Another aspect of GenAI is image generation – generating an image from a text prompt or from another image. When I dug a bit deeper into how this is achieved (again, without getting locked into the large models offered by big corporations), I learnt about the Stable Diffusion model. You can run this model just like other GenAI models. It is of course a lot slower (heavier on resources), but it works. When I tried this model on the new laptop, I was able to generate simple images using the GPU (it takes about 1-2 minutes to generate a 20-pass image).

    Image generation on AMD NPU

    Now this is where it gets interesting! While I was not able to use any ChatGPT-style prompter for text GenAI with the AMD NPU, AMD and Amuse have partnered on AMD-specific models and their integration in Amuse, which allows utilizing the NPU for generating images! That was pretty impressive to me. AMD already offers some Ryzen NPU-specific model variants for text GenAI, but so far they cannot simply be pulled and used via local AI tools.

    Note about AI model licensing

    It is very important to highlight that the licensing of each model is different. Just because you are not using, say, Copilot or Grok and are running things locally does not mean you can use the output for commercial purposes. Please check the model’s license before using its output commercially.

    How to set up local AI

    OK! If you really made it this far, I congratulate you! You clearly are interested in getting AI models to run locally! Alright, let’s get our feet wet!

    You are going to need some hardware support – at least an integrated GPU – otherwise you will be staring at a very slow and hot CPU!

    First, you will need a tool like Ollama or LM Studio. Both offer great OS support, so use whichever works with your current OS.

    Then you need to choose a model. Models have suffixes like 4b, 7b, etc., which indicate how many billions of parameters the model uses. The higher the number, the better the predictions, but also the more resources it will consume. Typically, a lower-end 4GB GPU only works reasonably well with 4b models.

    So you can pick and try something like Qwen3 / Gemma 3 / Mistral / Stable Code in a 4b-parameter variant.
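
    To make this concrete, here is a minimal sketch of the Ollama route (the exact model tags change over time, so check the Ollama model library for current names before pulling):

    # Install Ollama (Linux; see ollama.com for other OSes)
    curl -fsSL https://ollama.com/install.sh | sh

    # Pull a small (4b-parameter) model and chat with it interactively
    ollama pull qwen3:4b
    ollama run qwen3:4b "Write a Python function that reverses a string."

    # Ollama also exposes a local HTTP API on port 11434,
    # which other tools (editors, chat UIs) can talk to
    curl http://localhost:11434/api/generate -d '{"model": "qwen3:4b", "prompt": "Hello"}'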

    The next wanderings

    Of course, I have so far only scratched the surface of Local AI.

    • I still have not connected VS Code to a locally running AI model.
    • I want to try a few different models like Phi-3, Stable Code, Mistral, etc. to see which works better and provides better quality output for similar prompts.
    • I would also like to run my local AI on the older laptop with the dedicated GPU and connect VS Code from my current laptop to that remote local AI over the network!
    • Also, FastFlowLM / GAIA could be a game changer for NPU-based models. I should try them out.

    I hope I can write another blog about all that in the near future!

  • k8s version upgrade nuisance – apt’s extreme configurability to the rescue

    Ever since last year’s migration of the Kubernetes APT repo to the community repository, every Kubernetes upgrade has become a minor nuisance for people like me who have proxy repositories sitting between the k8s apt repo and the actual target Kubernetes nodes.

    Before the migration to the community apt repository, the old repository – apt.kubernetes.io – was a single repository for all Kubernetes versions together. This was simple to proxy: we could just add the apt.kubernetes.io repo in Nexus / Artifactory and use that proxy repo path as the apt repository on our machines.

    Now, in the new world, each k8s minor version has its own repo, e.g. https://pkgs.k8s.io/core:/stable:/v1.31/deb/. I cannot fathom the reason for such a decision!

    So each time you upgrade the k8s version, you need to update the Nexus apt proxy repository configuration so that the upstream path points to the correct version. This also means we cannot properly support multiple Kubernetes versions through a single proxy.
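
    To picture the setup, here is roughly what the apt source entry looks like on a node (a sketch; the Nexus URL is the same placeholder used in the errors below, and your repo names will differ):

    # /etc/apt/sources.list.d/kubernetes.list
    #
    # Pointing directly at upstream ties the entry to one minor version:
    #   deb https://pkgs.k8s.io/core:/stable:/v1.31/deb/ /
    #
    # Through the Nexus apt proxy, the node only ever sees the proxy URL;
    # switching v1.30 -> v1.31 happens in the Nexus repo configuration instead:
    deb https://nexus.xyz.com/repository/k8s-package-new-proxy /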

    While I have grown accustomed to this nuisance over the last few months, I recently started observing another, blocking issue – especially once I started upgrading the OS on our Kubernetes nodes to Ubuntu 24.04.

    The apt version in Ubuntu 24.04 is stricter about where it fetches packages from, how the upstream behaves, etc. So I started getting new errors when using apt with the Nexus upstream.

    Error 1 – Enforcement of signature verification

    W: GPG error: https://nexus.xyz.com/repository/k8s-package-proxy jammy InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 7EA0A9C3F273FCD8
    W: GPG error: https://nexus.xyz.com/repository/k8s-package-new-proxy  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 234654DA9A296436

    You can find apt’s new behaviour of failing with an error rather than a warning documented in this thread.

    As unfortunate as it is, the Nexus apt plugin does not proxy the public key from upstream, nor does it provide its own. So if you are using Nexus as an apt proxy, you must work around the above error. See below for how it can be done.

    Error 2 – Complaining that I have repointed my proxy repo to a new k8s version

    As noted above, with every Kubernetes version I have to update the upstream location in the Nexus apt proxy configuration. But with Ubuntu 24.04, this change started breaking the node OS upgrade with the errors below.

    E: Repository 'https://nexus.xyz.com/repository/k8s-package-new-proxy  InRelease' changed its 'Origin' value from 'obs://build.opensuse.org/isv:kubernetes:core:stable:v1.29/deb' to 'obs://build.opensuse.org/isv:kubernetes:core:stable:v1.30/deb'
    E: Repository 'https://nexus.xyz.com/repository/k8s-package-new-proxy  InRelease' changed its 'Label' value from 'isv:kubernetes:core:stable:v1.29' to 'isv:kubernetes:core:stable:v1.30'

    After running around searching for a solution, I finally found a way to work around both issues. I was thankful to be dealing with software written to address such special needs via configuration while keeping sensible defaults, and one with a great amount of documentation and a huge community of users and maintainers!

    Solution(s)

    apt lets you drop in a new apt conf file, e.g. /etc/apt/apt.conf.d/99-allow-nexus, with the content below:

    # NOTE: you should use the settings below only after careful consideration.
    # If used incorrectly, you could end up trusting a malicious online apt repository to install packages in your environment! You have been warned!!
    # Fix issue 1 - ignore the absence of a signature during apt update
    Acquire::AllowInsecureRepositories "true";
    # Fix issue 1 - ignore the absence of a signature during apt install
    APT::Get::AllowUnauthenticated "true";
    # Fix issue 2 - ignore the fact that the upstream has changed release information like Origin, Label, etc.
    Acquire::AllowReleaseInfoChange "true";

    Once we provisioned the above config file to our machines, they all got upgraded to Ubuntu 24.04 (and had their k8s packages installed) without a hitch!
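
    If you prefer not to bake these settings into a file, the same knobs can also be passed ad hoc as -o options for a one-off run (a sketch using the options above; kubeadm is just an example package):

    # One-off equivalents of the config file, passed on the command line
    sudo apt-get update \
      -o Acquire::AllowInsecureRepositories=true \
      -o Acquire::AllowReleaseInfoChange=true
    sudo apt-get install -y kubeadm \
      -o APT::Get::AllowUnauthenticated=true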

    … so apt’s configurability saved the day!

    Hope you enjoyed reading this quick bite!

    Now I need to run to the k8s infra maintainers and ask why we cannot have a single root apt repository for all versions! Until then – bye!

  • Byobu Tips

    I have used Byobu as my default terminal multiplexer for many years now. The best part is that Byobu comes pre-installed on Ubuntu. My experience is largely with Ubuntu, so I am not sure whether CentOS/RHEL and other Linux distributions provide this package out of the box.

    Internally, Byobu uses tmux as the multiplexer, so many of its features are inherited from tmux; Byobu just makes those tmux features much easier to use.

    Some of the not-so-obvious features that I like about Byobu are:
    • Save the scrollback buffer to a file. You can use the printscreen facility in Byobu: hit Ctrl + F7 to open the current window’s history in your editor, then save that file somewhere for your records. Very handy when you are making changes and want a record of what happened.
    • Rename the tabs. You can create multiple tabs / windows using the F2 key and navigate between them with F3 and F4. Those tabs usually just get a numeric index, but you can hit F8 to rename them to something more meaningful, like dev and prod.
    • Resize the panes in a given tab. Within a tab, Byobu can create further split screens using Ctrl+F2 and Shift+F2, which split the current screen area into vertical or horizontal halves. Sometimes you want to resize these panes for a better viewing area for the task at hand; use Shift + Alt + arrow keys to resize them.
    • Re-arrange the panes. Sometimes you are not happy that you split the screen horizontally; use Shift+F8 (sometimes repeatedly) to cycle through the various layouts Byobu offers until one suits you.

    For now, that’s it. Do you have a favourite Byobu trick? Leave a message, as I would like to learn it as well!

  • Add `allowVolumeExpansion: true` to your storage class

    StorageClass

    If you have had a Kubernetes cluster running for a couple of years, you probably have a legacy StorageClass defined that does not have allowVolumeExpansion: true set. If this bit is not set on your StorageClass, you cannot expand PVCs that use it.

    But if you try to edit the StorageClass to add this attribute to the spec, Kubernetes does not allow you to do so, saying that the field is immutable and cannot be changed after creation.

    So how do we expand PVCs that were created using a legacy StorageClass without the allowVolumeExpansion bit set?

    It turns out it’s quite easy. You can delete your old StorageClass without affecting your PVCs, and recreate it with the same name, but this time with allowVolumeExpansion: true in the spec. A sketch of the steps follows.
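
    Here is a minimal sketch of the dance (the StorageClass name standard-legacy and the PVC name my-data are placeholders; double-check the exported YAML before re-applying it):

    # Save the current definition (strip status, uid, resourceVersion and
    # creationTimestamp from the saved YAML before re-applying)
    kubectl get storageclass standard-legacy -o yaml > standard-legacy.yaml

    # Edit the saved YAML and add the missing bit:
    #   allowVolumeExpansion: true

    # Delete and recreate the StorageClass under the same name;
    # existing PVs/PVCs keep working, they only reference it by name
    kubectl delete storageclass standard-legacy
    kubectl apply -f standard-legacy.yaml

    # Now expanding a PVC is just a spec change
    kubectl patch pvc my-data -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'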

    Now the old PVCs that were created using that StorageClass can be expanded, just like that!

    That’s it for today!

  • push / pull images without Docker

    Increasingly, Docker is being replaced with containerd in Kubernetes clusters. These days, in my daily work, I deal almost exclusively with clusters that use containerd as the CRI. In such cases, there is sometimes a need to pull images or push them to, say, internal / private registries. Without the docker CLI installed and the Docker daemon running, how do we achieve this? Read on.

    Option 1 – use ctr – the containerd CLI

    containerd ships with its own CLI – ctr – which is a lot more minimal than the docker CLI. Fortunately, push and pull commands are available. I have yet to find a way to see logs and exec into containers with it (at least in v1.4).

    # pull image using ctr 
    sudo ctr i pull docker.io/utkuozdemir/pv-migrate-sshd:1.0.0
    
    # tag the image for your internal container registry
    sudo ctr i tag docker.io/utkuozdemir/pv-migrate-sshd:1.0.0 my-internal-nexus:5002/pv-migrate-sshd:1.0.0
    
    # Push to internal container registry server
    sudo ctr i push --platform linux/amd64 my-internal-nexus:5002/pv-migrate-sshd:1.0.0

    One thing to remember about ctr is that, by default, it expects you to have the image content for all platforms when pushing. If you have pulled the image, it will typically be for one platform only, say linux/amd64. So in the push command we must specify --platform linux/amd64. If we don’t, ctr push fails with a very cryptic error about missing image content.

    Option 2 – use crictl

    The Kubernetes CRI team has created crictl, which you can download and install; it provides a Docker-like CLI to deal with containers.

    crictl is certainly much easier to use than ctr, but downloading arbitrary things onto your servers is, rightfully, not an option in many places due to security concerns. In such cases, ctr is the only option you are left with.
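
    For illustration, here are a few common crictl invocations (the container ID below is a placeholder; crictl talks to the CRI socket, so it is geared towards inspecting what the kubelet is running):

    # list running containers and pods known to the CRI
    sudo crictl ps
    sudo crictl pods

    # pull an image through containerd
    sudo crictl pull docker.io/utkuozdemir/pv-migrate-sshd:1.0.0

    # logs and exec - the bits that ctr makes hard
    sudo crictl logs <container-id>
    sudo crictl exec -it <container-id> sh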

    That’s it for now.