<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Computer Vision - Tag - Vindrin</title><link>https://vindrin.top/tags/computer-vision/</link><description>Computer Vision - Tag - Vindrin</description><generator>Hugo -- gohugo.io</generator><language>en</language><managingEditor>vindrin@outlook.com (Vindrin)</managingEditor><webMaster>vindrin@outlook.com (Vindrin)</webMaster><lastBuildDate>Wed, 18 Feb 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://vindrin.top/tags/computer-vision/" rel="self" type="application/rss+xml"/><item><title>Local AI Image Classifier</title><link>https://vindrin.top/project/ai-image-classifier/</link><pubDate>Wed, 18 Feb 2026 00:00:00 +0000</pubDate><author>vindrin@outlook.com (Vindrin)</author><guid>https://vindrin.top/project/ai-image-classifier/</guid><description><![CDATA[<h1 id="local-ai-image-classifier">Local AI Image Classifier</h1>
<p>Classify images by text description. No cloud API is needed; everything runs locally.</p>
<h2 id="how-it-works">How it works</h2>
<p>Uses <strong>OpenAI CLIP</strong> to embed the image and each text label, then scores how similar the image is to every label. The highest-scoring label wins.</p>
<div class="code-block code-line-numbers open" style="counter-reset: code-block 0">
    <div class="code-header language-python">
        <span class="code-title"><i class="arrow fas fa-angle-right" aria-hidden="true"></i></span>
        <span class="ellipses"><i class="fas fa-ellipsis-h" aria-hidden="true"></i></span>
        <span class="copy" title="Copy to clipboard"><i class="far fa-copy" aria-hidden="true"></i></span>
    </div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">PIL</span> <span class="kn">import</span> <span class="n">Image</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">clip</span><span class="o">,</span> <span class="nn">torch</span>
</span></span><span class="line"><span class="cl">
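</span></span><span class="line"><span class="cl"><span class="c1"># Keep the model and its inputs on the same device</span>
</span></span><span class="line"><span class="cl"><span class="n">device</span> <span class="o">=</span> <span class="s2">&#34;cuda&#34;</span> <span class="k">if</span> <span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">is_available</span><span class="p">()</span> <span class="k">else</span> <span class="s2">&#34;cpu&#34;</span>
</span></span><span class="line"><span class="cl"><span class="n">model</span><span class="p">,</span> <span class="n">preprocess</span> <span class="o">=</span> <span class="n">clip</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">&#34;ViT-B/32&#34;</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="n">device</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">image</span> <span class="o">=</span> <span class="n">preprocess</span><span class="p">(</span><span class="n">Image</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="s2">&#34;photo.jpg&#34;</span><span class="p">))</span><span class="o">.</span><span class="n">unsqueeze</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">labels</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&#34;a cat&#34;</span><span class="p">,</span> <span class="s2">&#34;a dog&#34;</span><span class="p">,</span> <span class="s2">&#34;a car&#34;</span><span class="p">,</span> <span class="s2">&#34;a tree&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="cl"><span class="n">text</span> <span class="o">=</span> <span class="n">clip</span><span class="o">.</span><span class="n">tokenize</span><span class="p">(</span><span class="n">labels</span><span class="p">)</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="n">device</span><span class="p">)</span>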
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">with</span> <span class="n">torch</span><span class="o">.</span><span class="n">no_grad</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="n">logits</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="n">text</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">probs</span> <span class="o">=</span> <span class="n">logits</span><span class="o">.</span><span class="n">softmax</span><span class="p">(</span><span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
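</span></span><span class="line"><span class="cl"><span class="c1"># argmax() returns a tensor; item() turns it into a plain int index</span>
</span></span><span class="line"><span class="cl"><span class="n">best</span> <span class="o">=</span> <span class="n">labels</span><span class="p">[</span><span class="n">probs</span><span class="o">.</span><span class="n">argmax</span><span class="p">()</span><span class="o">.</span><span class="n">item</span><span class="p">()]</span>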
</span></span><span class="line"><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Predicted: </span><span class="si">{</span><span class="n">best</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span></span></span></code></pre></div></div>
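<p>To make the scoring step concrete, here is a minimal, dependency-free sketch of what happens to the embeddings: cosine similarity between the image vector and each label vector, multiplied by a scale factor, then a softmax. The helper names (<code>cosine_similarity</code>, <code>softmax</code>, <code>rank_labels</code>), the toy 3-dimensional vectors, and the scale of 100 are assumptions for illustration only; the real model produces much larger embeddings and applies its own learned scale.</p>

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(image_vec, label_vecs, labels, scale=100.0):
    """Return (label, probability) pairs, best match first.

    `scale` is an assumed stand-in for the model's learned logit scale.
    """
    logits = [scale * cosine_similarity(image_vec, v) for v in label_vecs]
    probs = softmax(logits)
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


# Made-up embeddings, for illustration only (not real CLIP output)
labels = ["a cat", "a dog", "a car", "a tree"]
label_vecs = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_vec = [0.9, 0.1, 0.0]

ranking = rank_labels(image_vec, label_vecs, labels)
best_label, best_prob = ranking[0]
print(f"Predicted: {best_label} ({best_prob:.2f})")
```

<p>Because the softmax runs only over the labels supplied, the probabilities say which label fits best among those candidates, not whether any of them is actually correct.</p>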
<h2 id="interface">Interface</h2>
<p>Flask-based web UI. Drop an image, enter labels, get a classification.</p>]]></description></item></channel></rss>