<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Learning - Tags - Vindrin</title><link>https://vindrin.top/zh-cn/tags/%E5%AD%A6%E4%B9%A0/</link><description>Learning - Tags - Vindrin</description><generator>Hugo -- gohugo.io</generator><language>zh-CN</language><managingEditor>vindrin@outlook.com (Vindrin)</managingEditor><webMaster>vindrin@outlook.com (Vindrin)</webMaster><lastBuildDate>Sun, 05 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://vindrin.top/zh-cn/tags/%E5%AD%A6%E4%B9%A0/" rel="self" type="application/rss+xml"/><item><title>What I Wish I Had Known Before Learning Machine Learning</title><link>https://vindrin.top/zh-cn/posts/ml-lessons-learned/</link><pubDate>Sun, 05 Apr 2026 00:00:00 +0000</pubDate><author>vindrin@outlook.com (Vindrin)</author><guid>https://vindrin.top/zh-cn/posts/ml-lessons-learned/</guid><description><![CDATA[<h1 id="学机器学习之前我希望知道的事">What I Wish I Had Known Before Learning Machine Learning</h1>
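<p>I spent three months grinding through ML tutorials. This is what I would tell myself at the start.</p>
<h2 id="1-先学数学再写代码">1. Learn the math first, then write code</h2>
<p>You can copy and paste model code without understanding it, but when something breaks you will have no idea where to start. Build a solid foundation in linear algebra and probability, and you will save a lot of time later.</p>
<h2 id="2-从小开始">2. Start small</h2>
<p>Don't jump straight into GPT or diffusion models. Start with linear regression on a CSV file, and make sure you really understand what a loss function is.</p>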
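<p>As a rough illustration of what "loss function" means here, the sketch below computes the mean squared error of a straight line on a tiny CSV. The CSV contents, the column names <code>hours</code> and <code>score</code>, and the helpers <code>load_xy</code> and <code>mse_loss</code> are all made up for this example; NumPy's <code>np.polyfit</code> stands in for whatever fitting routine you end up using.</p>

```python
import csv
import io

import numpy as np

# Hypothetical CSV contents; in practice this would be read from a file on disk.
CSV_TEXT = """hours,score
1,52
2,55
3,61
4,64
5,70
"""


def load_xy(text):
    """Parse the two-column CSV text into NumPy arrays x and y."""
    rows = list(csv.DictReader(io.StringIO(text)))
    x = np.array([float(r["hours"]) for r in rows])
    y = np.array([float(r["score"]) for r in rows])
    return x, y


def mse_loss(w, b, x, y):
    """Mean squared error of the line y_hat = w * x + b."""
    residuals = (w * x + b) - y
    return float(np.mean(residuals ** 2))


x, y = load_xy(CSV_TEXT)

# Loss of a hand-picked guess versus the least-squares line (degree-1 polyfit).
guess_loss = mse_loss(4.0, 50.0, x, y)
w_fit, b_fit = np.polyfit(x, y, 1)
fit_loss = mse_loss(w_fit, b_fit, x, y)

print(f"guess: w=4.000, b=50.000, loss={guess_loss:.3f}")
print(f"fit:   w={w_fit:.3f}, b={b_fit:.3f}, loss={fit_loss:.3f}")
```

<p>The loss is just a number that says how wrong a given line is. "Training" a linear regression means finding the <code>w</code> and <code>b</code> that make that number as small as possible, which is why the fitted line scores lower than the guess.</p>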
<h2 id="3-sklearn-是最好的老师">3. Sklearn is the best teacher</h2>
<p>Before touching PyTorch or TensorFlow, get comfortable with <code>scikit-learn</code>. It presents the core concepts more clearly than anything else.</p>
<div class="code-block code-line-numbers open" style="counter-reset: code-block 0">
    <div class="code-header language-python">
        <span class="code-title"><i class="arrow fas fa-angle-right" aria-hidden="true"></i></span>
        <span class="ellipses"><i class="fas fa-ellipsis-h" aria-hidden="true"></i></span>
        <span class="copy" title="Copy to clipboard"><i class="far fa-copy" aria-hidden="true"></i></span>
    </div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">sklearn.linear_model</span> <span class="kn">import</span> <span class="n">LinearRegression</span>
</span></span><span class="line"><span class="cl"><span class="n">model</span> <span class="o">=</span> <span class="n">LinearRegression</span><span class="p">()</span>
</span></span><span class="line"><span class="cl"><span class="n">model</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">score</span><span class="p">(</span><span class="n">X_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">))</span></span></span></code></pre></div></div>
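<p>The snippet above assumes that <code>X_train</code>, <code>y_train</code>, <code>X_test</code> and <code>y_test</code> already exist. A self-contained version might look like the sketch below. The synthetic data and its coefficients are assumptions made for illustration, not something from a real dataset.</p>

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration):
# y = 3*x0 - 2*x1 + 5, plus a little Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(scale=0.1, size=200)

# Hold out 25% of the rows so the score is measured on data the model never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# For regressors, score() returns the R^2 coefficient of determination, not accuracy.
r2 = model.score(X_test, y_test)
print(f"R^2 on the test set: {r2:.3f}")
print("coefficients:", model.coef_, "intercept:", model.intercept_)
```

<p>Because the data was generated from a known line, the learned coefficients should land close to 3, -2 and 5, which is a handy sanity check when you are still learning what <code>fit</code> actually does.</p>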
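<h2 id="4-过拟合是真实存在的问题">4. Overfitting is a real problem</h2>
<p>Your model looks great on the training set, then falls apart on new data. Learn train/test splits, cross-validation, and regularization as early as you can.</p>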
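<p>The three tools named above can be seen together in a small sketch. It fits a deliberately over-flexible degree-12 polynomial to a noisy sine curve, once without a penalty and once with Ridge (L2) regularization, then runs 5-fold cross-validation. The data, the degree, <code>alpha=1.0</code>, and the helper <code>poly_model</code> are all arbitrary choices made for illustration.</p>

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): a noisy sine curve.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(60, 1))
y = np.sin(3.0 * X[:, 0]) + rng.normal(scale=0.3, size=60)

# Train/test split: half the points are kept aside for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)


def poly_model(regressor, degree=12):
    """Polynomial features of the given degree, followed by a linear regressor."""
    return make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False), regressor
    )


# No regularization: the model is free to chase noise in the training set.
overfit = poly_model(LinearRegression()).fit(X_train, y_train)
# Same features, but Ridge adds an L2 penalty that shrinks the coefficients.
ridge = poly_model(Ridge(alpha=1.0)).fit(X_train, y_train)

overfit_train = overfit.score(X_train, y_train)
overfit_test = overfit.score(X_test, y_test)
ridge_train = ridge.score(X_train, y_train)
ridge_test = ridge.score(X_test, y_test)
print(f"no penalty: train R^2={overfit_train:.3f}, test R^2={overfit_test:.3f}")
print(f"ridge:      train R^2={ridge_train:.3f}, test R^2={ridge_test:.3f}")

# 5-fold cross-validation: every point is used for validation exactly once.
cv_scores = cross_val_score(poly_model(Ridge(alpha=1.0)), X, y, cv=5)
print("ridge CV R^2 per fold:", np.round(cv_scores, 3))
```

<p>The thing to look at is the gap between the training score and the test score. The unpenalized model always matches the training data at least as well as Ridge does, so a large drop on the test set is the signature of overfitting. Cross-validation gives a steadier estimate than a single split.</p>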
<h2 id="5-数据处理占-80-的工作量">5. Data processing is 80% of the work</h2>
<p>Cleaning data, handling missing values, feature engineering: this is where most of your time goes. Accept it.</p>]]></description></item></channel></rss>