<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Ffeed.xml" rel="self" type="application/atom+xml" /><link href="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2F" rel="alternate" type="text/html" /><updated>2026-03-05T13:59:10+00:00</updated><id>https://blog.scikit-learn.org/feed.xml</id><title type="html">scikit-learn Blog</title><subtitle>The official blog of scikit-learn, an open source library for machine learning in Python.</subtitle><author><name>{&quot;bio&quot;=&gt;&quot;Open source library for machine learning in Python.&quot;, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github-square&quot;, &quot;url&quot;=&gt;&quot;https://github.com/scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;LinkedIn&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-linkedin&quot;, &quot;url&quot;=&gt;&quot;https://linkedin.com/company/scikit-learn/&quot;}, {&quot;label&quot;=&gt;&quot;Bluesky&quot;, &quot;icon&quot;=&gt;&quot;&quot;, &quot;url&quot;=&gt;&quot;https://bsky.app/profile/scikit-learn.org&quot;}, {&quot;label&quot;=&gt;&quot;Mastodon&quot;, &quot;icon&quot;=&gt;&quot;fab fa-brands fa-mastodon&quot;, &quot;url&quot;=&gt;&quot;https://fosstodon.org/@sklearn&quot;}, {&quot;label&quot;=&gt;&quot;YouTube&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-youtube&quot;, &quot;url&quot;=&gt;&quot;https://www.youtube.com/@scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;Facebook&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-facebook-square&quot;, &quot;url&quot;=&gt;&quot;https://facebook.com/scikitlearnofficial/&quot;}, {&quot;label&quot;=&gt;&quot;Instagram&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-instagram&quot;, &quot;url&quot;=&gt;&quot;https://instagram.com/scikitlearnofficial/&quot;}]}</name></author><entry><title type="html">Update 
on array API adoption in scikit-learn</title><link href="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fupdates%2Fupdate-array-api%2F" rel="alternate" type="text/html" title="Update on array API adoption in scikit-learn" /><published>2026-03-05T00:00:00+00:00</published><updated>2026-03-05T00:00:00+00:00</updated><id>https://blog.scikit-learn.org/updates/update-array-api</id><content type="html" xml:base="https://blog.scikit-learn.org/updates/update-array-api/"><![CDATA[<div>
  <img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2Farray-api-scikit-learn-2026-feature.png" alt="" />
  

  
  
  
  

  

  

Author:  
      <a itemprop="sameAs" content="https://github.com/lucyleeow" href="iframe.php?url=https%3A%2F%2Fgithub.com%2Flucyleeow" rel="me noopener noreferrer" style="vertical-align:top;"><img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fauthor_images%2Flucyliu.jpeg" style="width:1em;margin-right:.5em;border-radius: 50%;" alt="Author Icon" class="orcid-icon" />Lucy Liu</a>
     

<br /><br />

</div>
<p><em>Note: this blog post is a cross-post of a <a href="iframe.php?url=https%3A%2F%2Flabs.quansight.org%2Fblog%2Farray-api-scikit-learn-2026">Quansight Labs blog post</a>.</em></p>

<p>The <a href="iframe.php?url=https%3A%2F%2Fdata-apis.org%2F">Consortium for Python Data API Standards</a>
developed the <a href="iframe.php?url=https%3A%2F%2Fdata-apis.org%2Farray-api%2F">Python array API standard</a>
to define a consistent interface for array libraries, specifying core
operations, data types, and behaviours. This enables ‘array-consuming’
libraries (such as scikit-learn) to write array-agnostic code that can
be run on any array API compliant backend. Adopting array API support in scikit-learn
means that users can pass arrays from any array API compliant library to
functions that have been converted to be array-agnostic. This is useful because it
allows users to take advantage of array library features, such as hardware
acceleration, most notably via GPUs.</p>
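
<p>As a rough illustration of what 'array-agnostic' means, the sketch below retrieves the namespace of the input array and then calls only functions defined by the standard. The helper names <code class="language-plaintext highlighter-rouge">namespace_of</code> and <code class="language-plaintext highlighter-rouge">standardize</code> are hypothetical and are not part of scikit-learn; the NumPy fallback is an assumption made so the example also runs on NumPy versions that predate <code class="language-plaintext highlighter-rouge">__array_namespace__</code>.</p>

```python
import numpy as np


def namespace_of(x):
    """Return the array API namespace of `x` (hypothetical helper)."""
    # Array API compliant arrays expose `__array_namespace__`.
    # Assumption: fall back to NumPy for arrays that do not.
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    return np


def standardize(x):
    """Center and scale the columns of a 2D array with standard functions only."""
    xp = namespace_of(x)
    mean = xp.mean(x, axis=0)
    std = xp.std(x, axis=0)
    return (x - mean) / std


X = np.asarray([[1.0, 10.0], [3.0, 30.0]])
print(standardize(X))
```

<p>The same function body should work unchanged for any input whose namespace implements <code class="language-plaintext highlighter-rouge">mean</code> and <code class="language-plaintext highlighter-rouge">std</code> as specified, which is the property the converted scikit-learn functions rely on.</p>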

<p>Indeed, GPU support in scikit-learn has been of interest for a long time: 11 years
ago, we added an entry to our FAQ page explaining that we had no plans to add GPU
support in the near future due to the software dependencies and platform specific
issues it would introduce. By relying on the array API standard, however, these
concerns can now be avoided.</p>

<p>In this blog post, I will provide an update on the array API adoption work in
scikit-learn since its initial introduction in version 1.3 two years ago.
Thomas Fan’s <a href="iframe.php?url=https%3A%2F%2Flabs.quansight.org%2Fblog%2Farray-api-support-scikit-learn">blog post</a>
provides details on the status when array API support was initially added.</p>

<h2 id="current-status">Current status</h2>

<p>Since the introduction of array API support in version 1.3 of scikit-learn,
several key developments have followed.</p>

<h3 id="vendoring-array-api-compat-and-array-api-extra">Vendoring <code class="language-plaintext highlighter-rouge">array-api-compat</code> and <code class="language-plaintext highlighter-rouge">array-api-extra</code></h3>

<p>Scikit-learn now vendors both
<a href="iframe.php?url=https%3A%2F%2Fdata-apis.org%2Farray-api-compat%2F"><code class="language-plaintext highlighter-rouge">array-api-compat</code></a> and
<a href="iframe.php?url=https%3A%2F%2Fdata-apis.org%2Farray-api-extra%2F"><code class="language-plaintext highlighter-rouge">array-api-extra</code></a>.
<code class="language-plaintext highlighter-rouge">array-api-compat</code> is a wrapper around common array libraries (e.g., PyTorch,
CuPy, JAX) that bridges gaps to ensure compatibility with the standard. It
enables adoption of backwards incompatible changes while still allowing array
libraries time to adopt the standard slowly. <code class="language-plaintext highlighter-rouge">array-api-extra</code> provides array
functions not included in the standard but deemed useful for array-consuming
libraries.</p>

<p>We chose to vendor these now much more mature libraries in order to avoid the
complexity of conditionally handling optional dependencies throughout the
codebase. This approach follows precedent, as SciPy also vendors these
packages.</p>

<h3 id="array-libraries-supported">Array libraries supported</h3>

<p>Scikit-learn currently supports CuPy ndarrays, PyTorch tensors (testing
against all devices: ‘cpu’, ‘cuda’, ‘mps’ and ‘xpu’) and NumPy arrays. JAX
support is also on the horizon. The main focus of this work is addressing
in-place mutations in the codebase. Follow
<a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fpull%2F29647">PR #29647</a> for
updates.</p>

<p>Beyond these libraries, scikit-learn also tests against <code class="language-plaintext highlighter-rouge">array-api-strict</code>, a
reference implementation that strictly adheres to the array API specification.
The purpose of <code class="language-plaintext highlighter-rouge">array-api-strict</code> is to help automate compliance checks for
consuming libraries and to enable development and testing of array
API functionality without the need for GPU or other specialized hardware.
Array libraries that conform to the standard and pass the <code class="language-plaintext highlighter-rouge">array-api-tests</code> suite
should be accepted by scikit-learn and SciPy, without any additional modifications
from maintainers.</p>

<h3 id="estimators-and-metrics-with-array-api-support">Estimators and metrics with array API support</h3>

<p>The full list of metrics and estimators that now support array API can be
found in our
<a href="iframe.php?url=https%3A%2F%2Fscikit-learn.org%2Fdev%2Fmodules%2Farray_api.html%23">Array API support</a>
documentation page. The majority of high impact metrics have now been
converted to be array API compatible. Many transformers are also now
supported, notably <code class="language-plaintext highlighter-rouge">LabelBinarizer</code>, which is widely used internally and
simplifies other conversions.</p>

<p>Conversion of estimators is much more complicated as it often involves
benchmarking different variations of code or consensus gathering on
implementation choices. It generally requires many months of work by several
maintainers. Nonetheless, support for <code class="language-plaintext highlighter-rouge">LogisticRegression</code>, <code class="language-plaintext highlighter-rouge">GaussianNB</code>,
<code class="language-plaintext highlighter-rouge">GaussianMixture</code>, <code class="language-plaintext highlighter-rouge">Ridge</code> (and family: <code class="language-plaintext highlighter-rouge">RidgeCV</code>, <code class="language-plaintext highlighter-rouge">RidgeClassifier</code>,
<code class="language-plaintext highlighter-rouge">RidgeClassifierCV</code>), <code class="language-plaintext highlighter-rouge">Nystroem</code> and <code class="language-plaintext highlighter-rouge">PCA</code> has been added. Work on
<code class="language-plaintext highlighter-rouge">GaussianProcessRegressor</code> is also underway (follow at
<a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fpull%2F33096">PR #33096</a>).</p>

<h3 id="handling-mixed-array-namespaces-and-devices">Handling mixed array namespaces and devices</h3>

<p>scikit-learn takes a unique approach among ‘array-consuming’ libraries by
supporting mixed array namespace and device inputs. This design choice enables
the framework to handle the practical complexities of end-to-end machine
learning pipelines.</p>

<p>String-valued class labels are common in classification tasks and enable users
to work with interpretable categories rather than integer codes. NumPy is
currently the only array library with string array support, meaning that any
workflow involving both GPU-accelerated computation and string labels
necessarily involves mixed array type inputs.</p>

<p>Mixed array input support also enables flexible pipeline workflows. Pipelines
provide significant value by chaining preprocessing steps and estimators into
reusable workflows that prevent data leakage and ensure consistent
preprocessing. However, they have an intentional design limitation: pipeline
steps can transform feature arrays (<code class="language-plaintext highlighter-rouge">X</code>) but cannot modify target arrays
(<code class="language-plaintext highlighter-rouge">y</code>). Allowing mixed array inputs means a pipeline can include a
<code class="language-plaintext highlighter-rouge">FunctionTransformer</code> step that moves feature data from CPU to GPU to leverage
hardware acceleration, while allowing the target array, which cannot be
modified, to remain on CPU.</p>

<p>For example, mixed array inputs enable a pipeline where string classification
features are encoded on CPU (as only NumPy supports string arrays), converted
to torch CUDA tensors, then passed to the array API-compatible
<code class="language-plaintext highlighter-rouge">RidgeClassifier</code> for GPU-accelerated computation:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">partial</span>

<span class="kn">from</span> <span class="nn">sklearn.linear_model</span> <span class="kn">import</span> <span class="n">RidgeClassifier</span>
<span class="kn">from</span> <span class="nn">sklearn.pipeline</span> <span class="kn">import</span> <span class="n">make_pipeline</span>
<span class="kn">from</span> <span class="nn">sklearn.preprocessing</span> <span class="kn">import</span> <span class="n">FunctionTransformer</span><span class="p">,</span> <span class="n">TargetEncoder</span>
<span class="kn">import</span> <span class="nn">torch</span>

<span class="n">pipeline</span> <span class="o">=</span> <span class="n">make_pipeline</span><span class="p">(</span>
    <span class="c1"># Encode string categories with average target values
</span>    <span class="n">TargetEncoder</span><span class="p">(),</span>
    <span class="c1"># Convert feature array `X` to Torch CUDA device
</span>    <span class="n">FunctionTransformer</span><span class="p">(</span><span class="n">partial</span><span class="p">(</span><span class="n">torch</span><span class="p">.</span><span class="n">asarray</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">torch</span><span class="p">.</span><span class="n">float32</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s">"cuda"</span><span class="p">)),</span>
    <span class="n">RidgeClassifier</span><span class="p">(</span><span class="n">solver</span><span class="o">=</span><span class="s">"svd"</span><span class="p">),</span>
<span class="p">)</span>
</code></pre></div></div>

<p>Work on adding mixed array type inputs for metrics and estimators is underway
and expected to progress quickly. This work includes developing a robust
testing framework, including for pipelines using mixed array types (follow
<a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fpull%2F32755">PR #32755</a> for details).</p>

<p>Finally, we have also revived our work to support the ability to fit and
predict on different namespaces/devices. This allows users to train models on
GPU hardware but deploy predictions on CPU hardware, optimizing costs and
accommodating different resource availability between training and production
environments. Follow
<a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fpull%2F33076">PR #33076</a> for
details.</p>

<h2 id="challenges">Challenges</h2>

<p>The challenges of array API adoption remain largely unchanged from when this
work began. These are also common to other array-consuming libraries, with a
notable addition: the need to handle array movement between namespaces and
devices to support mixed array type inputs.</p>

<h3 id="array-api-standard-is-a-subset-of-numpys-api">Array API Standard is a subset of NumPy’s API</h3>

<p>The array API standard only includes widely-used functions implemented across
most array libraries, meaning many NumPy functions are absent. When such a
function is encountered while adding array API support, we have the following
options:</p>

<ul>
  <li>add the function to <code class="language-plaintext highlighter-rouge">array-api-extra</code> - this allows other array-consuming
libraries to benefit and allows sharing of maintenance burden, but is only
relevant for more widely used functions</li>
  <li>add our own implementation in scikit-learn - these functions live in
<code class="language-plaintext highlighter-rouge">sklearn/utils/_array_api.py</code></li>
  <li>check if SciPy implements an array API compatible version of the function</li>
</ul>

<p>The <code class="language-plaintext highlighter-rouge">quantile</code> function illustrates this decision-making process. <code class="language-plaintext highlighter-rouge">quantile</code>
is not included in the standard as it is not widely used (outside of
scikit-learn) and while it is implemented in most array libraries, the
set of quantile methods supported and their APIs vary. Currently, scikit-learn maintains its
own array API compatible version that supports both weights and NaNs, but due
to the maintenance burden we decided to investigate alternatives. SciPy has an
array API compatible implementation, but it did not support weights. We thus
investigated adding <code class="language-plaintext highlighter-rouge">quantile</code> to <code class="language-plaintext highlighter-rouge">array-api-extra</code>; however, during this
effort, SciPy decided to add weight support. Thus, we ultimately decided to
transition to the SciPy implementation once our minimum SciPy version allows.</p>
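
<p>To make 'weight support' concrete, here is a minimal NumPy-only sketch of a weighted quantile using the inverted-CDF definition. It is an illustration under simplifying assumptions (1D input, no NaN handling, a single method) and is not the scikit-learn or SciPy implementation; the function name <code class="language-plaintext highlighter-rouge">weighted_quantile</code> is hypothetical.</p>

```python
import numpy as np


def weighted_quantile(values, weights, q):
    """Weighted quantile of a 1D array (inverted-CDF definition, no NaN handling)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    cumulative = np.cumsum(weights[order])
    # Index of the smallest value whose cumulative weight reaches q * total weight.
    idx = np.searchsorted(cumulative, q * cumulative[-1], side="left")
    idx = min(int(idx), values.shape[0] - 1)  # guard against rounding at q = 1
    return values[order][idx]


# Equal weights give the usual (inverted-CDF) median; a heavy last sample shifts it.
print(weighted_quantile([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0], 0.5))
print(weighted_quantile([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 5.0], 0.5))
```

<p>Even this small version has to pick one of several quantile definitions, which hints at why the supported methods and APIs differ between array libraries.</p>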

<h3 id="compiled-code">Compiled code</h3>

<p>Many performance-critical parts of scikit-learn are written using compiled
code extensions in Cython, C or C++. These directly access the underlying
memory buffers of NumPy arrays and are thus restricted to CPU.</p>

<p>Metrics and estimators that rely on compiled code handle this in one of two ways:
either convert arrays to NumPy first, or maintain two parallel branches of code, one
for NumPy (compiled) and one for other array types (array API compatible).
When performance is less critical or array API conversion provides no gains
(e.g., <code class="language-plaintext highlighter-rouge">confusion_matrix</code>), we convert to NumPy. When performance gains are
significant, we accept the maintenance burden of dual code paths. This was the case for
<code class="language-plaintext highlighter-rouge">LogisticRegression</code>, and the extensive process required to make such implementation
decisions can be seen in
<a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fpull%2F32644">PR #32644</a>.</p>

<h3 id="unspecified-behaviour-in-the-standard">Unspecified behaviour in the standard</h3>

<p>The array API standard intentionally leaves some function behaviours
unspecified, permitting implementation differences across array libraries. For
example, the order of unique elements is not specified for the <code class="language-plaintext highlighter-rouge">unique_*</code>
functions and as of NumPy version 2.3, some <code class="language-plaintext highlighter-rouge">unique_*</code> functions no longer
return sorted values. This will require code amendments in cases where sorted
output was relied upon.</p>
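
<p>One possible amendment, sketched here under the assumption that an explicit sort is acceptable performance-wise, is to stop relying on the ordering of the unique values and sort them explicitly. The helper name <code class="language-plaintext highlighter-rouge">sorted_unique</code> is hypothetical.</p>

```python
import numpy as np


def sorted_unique(x, xp=np):
    """Unique values of `x` in ascending order, whatever order the backend returns."""
    # `unique_values` is the array API name; the fallback to `unique` is an
    # assumption made so the sketch also runs on NumPy versions older than 2.0.
    unique = getattr(xp, "unique_values", None) or xp.unique
    return xp.sort(unique(x))


labels = np.asarray([3, 1, 2, 3, 1])
print(sorted_unique(labels))
```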

<p>Similarly, NaN handling is also unspecified for <code class="language-plaintext highlighter-rouge">sort</code>; however, in this case, all
array libraries currently supported by scikit-learn follow NumPy’s NaN
semantics, placing NaNs at the end. This consistency eliminates the need for
special handling code, though comprehensive testing remains essential when
adding support for new array libraries.</p>

<h3 id="device-transfer">Device transfer</h3>

<p>Supporting mixed array namespace and device inputs necessitates conversion of arrays
between different namespaces and devices. This presented a number of
considerations and challenges.</p>

<p>The array API standard adopted DLPack as the recommended
<a href="iframe.php?url=https%3A%2F%2Fdata-apis.org%2Farray-api%2Flatest%2Fdesign_topics%2Fdata_interchange.html%23data-interchange">data interchange</a>
protocol. This protocol is widely implemented in array libraries and offers an
efficient, C ABI compatible protocol for array conversion. While this provided
us with an easy way to implement these transfers, there were limitations.
Cross-device transfer capability was only introduced in DLPack v1, released in
September 2024. This meant that only the latest PyTorch and CuPy versions have
support for DLPack v1. Moreover, not all array libraries have adopted support
yet. We therefore implemented a ‘manual’ fallback; however, this requires
conversion via NumPy when the transfer involves two non-NumPy arrays.
Additionally, there are no DLPack tests in
<a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fdata-apis%2Farray-api-tests">array-api-tests</a>, a testing
suite to verify standard compliance, leaving DLPack implementation bugs easier
to overlook. Despite these challenges, scikit-learn will benefit from future
improvements, such as addition of a C-level API for DLPack exchange that
bypasses Python function calls, offering significant benefit for GPU
applications.</p>
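
<p>As a simplified illustration of the 'DLPack first, manual fallback second' idea, the sketch below converts a CPU-resident array to NumPy. It covers only the to-NumPy leg, the helper name <code class="language-plaintext highlighter-rouge">to_numpy</code> is hypothetical, and the set of exceptions caught is an assumption rather than what scikit-learn does.</p>

```python
import numpy as np


def to_numpy(x):
    """Convert a CPU-resident array to NumPy, preferring the DLPack protocol."""
    try:
        # DLPack exchange typically avoids copying the underlying buffer.
        return np.from_dlpack(x)
    except (AttributeError, TypeError, BufferError, RuntimeError):
        # Assumed 'manual' fallback for producers without usable DLPack support.
        return np.asarray(x)


a = np.arange(3.0)
print(to_numpy(a))
```

<p>A transfer between two non-NumPy arrays would chain this with a second conversion from NumPy into the target namespace, which is where the extra cost of the fallback comes from.</p>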

<p>Beyond the technical considerations, there were also user interface
considerations. How should we inform users that these conversions, which incur
memory and performance cost, are occurring? We decided against warnings, which
risk being ignored or becoming a nuisance, and to instead clearly document this
behaviour. Additionally, different devices have different data type limitations;
for example, Apple MPS only supports float32. How best to handle these differences
when performing conversions while ensuring users are informed of precision
impacts is an ongoing consideration.</p>

<h2 id="a-quick-benchmark">A quick benchmark</h2>

<p>Array API support for <code class="language-plaintext highlighter-rouge">Ridge</code> regression was added in version 1.5, enabling
GPU-accelerated linear models in scikit-learn. Combined with support of
several transformers, this allows for complete preprocessing and estimation
pipelines on GPU.</p>

<p>The following benchmark shows the use of the <code class="language-plaintext highlighter-rouge">MaxAbsScaler</code> transformer
followed by <code class="language-plaintext highlighter-rouge">Ridge</code> regression using randomly generated data with 500,000
samples and 300 features. The benchmarks were run on AMD Ryzen Threadripper
2970WX CPU, NVIDIA Quadro RTX 8000 GPU and Apple M4 GPU (Metal 3).</p>

<p>The figure below shows the performance speed up on CuPy, Torch CPU and Torch
GPU relative to NumPy.</p>

<p><img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2Farray_api_timings.png" alt="Benchmarking of MaxAbsScaler/Ridge pipeline" title="Benchmarking of MaxAbsScaler/Ridge pipeline" /></p>

<p><em>Performance speedup relative to NumPy across different backends.</em></p>

<p>The observed speedups are representative of performance gains achievable with
sufficiently large datasets on datacenter-grade GPUs for linear
algebra-intensive workloads. Mobile GPUs, such as those in laptops, would
typically yield more modest improvements.</p>

<p>Note that scikit-learn’s <code class="language-plaintext highlighter-rouge">Ridge</code> regressor currently only supports the ‘svd’
solver for array API inputs. We selected this solver for initial implementation as it exclusively
uses standard-compliant functions available across all backends and is the
most stable solver. Support for the ‘cholesky’ solver is also underway (see
details in <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fpull%2F29318">PR #29318</a>).</p>

<h2 id="looking-forward">Looking forward</h2>

<p>As of version 1.8, array API support is still in experimental mode and thus
not enabled by default. However, we welcome early adopters and interested
users to try it and report any
<a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fissues">issues</a>. See
<a href="iframe.php?url=https%3A%2F%2Fscikit-learn.org%2Fdev%2Fmodules%2Farray_api.html%23enabling-array-api-support">our documentation</a>
for details on enabling array API support.</p>

<p>Before removing experimental status, we would like to:</p>

<ul>
  <li>develop a system for automatically documenting functions and classes that
support array API, potentially with the ability to add relevant details</li>
  <li>add mixed array type input support</li>
  <li>support fit and predict on different hardware by allowing conversion of
fitted estimators between namespaces/devices using utility functions</li>
  <li>improve testing, in particular for the new mixed array type
functionalities</li>
  <li>improve documentation, including adding an example to our gallery</li>
  <li>decide on the minimal dependency versions required</li>
  <li>get real world user feedback</li>
</ul>

<p>Alongside these infrastructure and framework improvements, we look forward to
adding support for more estimators. These improvements will deliver
production-ready GPU support and flexible deployment options to scikit-learn
users. We welcome community involvement through testing and feedback
throughout this development phase.</p>

<h2 id="acknowledgements">Acknowledgements</h2>

<p>Work on array API in scikit-learn has been a combined effort from many
contributors. This work was partly funded by CZI and NASA Roses.</p>

<p>I would like to thank <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fogrisel">Olivier Grisel</a>,
<a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fbetatim">Tim Head</a> and
<a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fev-br">Evgeni Burovski</a> for helping me with my array API
questions.</p>]]></content><author><name>{&quot;bio&quot;=&gt;&quot;Open source library for machine learning in Python.&quot;, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github-square&quot;, &quot;url&quot;=&gt;&quot;https://github.com/scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;LinkedIn&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-linkedin&quot;, &quot;url&quot;=&gt;&quot;https://linkedin.com/company/scikit-learn/&quot;}, {&quot;label&quot;=&gt;&quot;Bluesky&quot;, &quot;icon&quot;=&gt;&quot;&quot;, &quot;url&quot;=&gt;&quot;https://bsky.app/profile/scikit-learn.org&quot;}, {&quot;label&quot;=&gt;&quot;Mastodon&quot;, &quot;icon&quot;=&gt;&quot;fab fa-brands fa-mastodon&quot;, &quot;url&quot;=&gt;&quot;https://fosstodon.org/@sklearn&quot;}, {&quot;label&quot;=&gt;&quot;YouTube&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-youtube&quot;, &quot;url&quot;=&gt;&quot;https://www.youtube.com/@scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;Facebook&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-facebook-square&quot;, &quot;url&quot;=&gt;&quot;https://facebook.com/scikitlearnofficial/&quot;}, {&quot;label&quot;=&gt;&quot;Instagram&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-instagram&quot;, &quot;url&quot;=&gt;&quot;https://instagram.com/scikitlearnofficial/&quot;}]}</name></author><category term="Updates" /><category term="Open Source" /><category term="Machine Learning" /><category term="Array API" /><category term="Performance" /><summary type="html"><![CDATA[Author: Lucy Liu Note: this blog post is a cross-post of a Quansight Labs blog post.]]></summary></entry><entry><title type="html">Enhancing user experience through interactive inspection</title><link href="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fupdates%2Fenhancing-user-experience%2F" rel="alternate" type="text/html" title="Enhancing user experience through interactive inspection" 
/><published>2026-01-06T00:00:00+00:00</published><updated>2026-01-06T00:00:00+00:00</updated><id>https://blog.scikit-learn.org/updates/enhancing-user-experience</id><content type="html" xml:base="https://blog.scikit-learn.org/updates/enhancing-user-experience/"><![CDATA[<div>
  <img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2Fsklearn_czi.png" alt="" />
  

  
  
  
  

  

  

Author:  
      <a itemprop="sameAs" content="https://deamarialeon.com" href="iframe.php?url=https%3A%2F%2Fdeamarialeon.com" rel="me noopener noreferrer" style="vertical-align:top;"><img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fauthor_images%2Fdea-leon.png" style="width:1em;margin-right:.5em;border-radius: 50%;" alt="Author Icon" class="orcid-icon" />Dea María Léon</a>
      
        <a href="iframe.php?url=mailto%3Adeamarialeon%40gmail.com" title="deamarialeon@gmail.com"><span><i class="elastic-fai fas fa-envelope"></i></span></a>
      

<br /><br />

</div>

<p>User experience (UX) has always been an important focus for <code class="language-plaintext highlighter-rouge">scikit-learn</code>. 
As we know, UX encompasses many aspects, but here we will focus specifically on 
how easy it is for the user to understand <code class="language-plaintext highlighter-rouge">scikit-learn</code> models during development, 
especially while using tools like Jupyter notebooks.</p>

<h2 id="first-visualizations">First visualizations</h2>

<p>Initial work to allow users to inspect their models interactively began in 2019, 
when Thomas J. Fan introduced HTML visualizations for estimators. 
He continued to build on this foundation with additional improvements 
in subsequent contributions.</p>

<h2 id="lack-of-resources-to-go-forward">Lack of resources to go forward</h2>

<p>In June 2023, <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fissues%2F26595">issue 26595 was opened by Gaël Varoquaux</a> outlining several potential enhancements 
to the HTML displays. These ideas stemmed from direct interactions with users, 
which clearly highlighted the need for further work in this area. 
Unfortunately, due to a lack of resources, the issue remained open for 
approximately a year and a half.</p>

<h2 id="wellcome-grant-awarded-to-scikit-learn">Wellcome grant awarded to <code class="language-plaintext highlighter-rouge">scikit-learn</code></h2>

<p>It was not until the end of 2023, when Guillaume Lemaitre applied for a grant with
the help of NumFOCUS, that the broader topic of Predictive model evaluation and 
inspection was formalized. Enhancing user experience through interactive inspection 
is an essential part of this effort and falls within the scope of the grant.</p>

<p>The grant was awarded to <code class="language-plaintext highlighter-rouge">scikit-learn</code> by the Chan Zuckerberg 
Initiative (CZI) through its Essential Open-Source Software for Science 
(EOSS) program. It is funded by The Wellcome Trust and administered by NumFOCUS. 
Thanks to this financial support, work is well underway, and several objectives
from that issue have already been completed. 
<a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fadministrative%2Fblob%2Fmaster%2Fczi_eoss_proposal%2FEOSS6-0000000551_202312181059.pdf">See the grant application here.</a></p>

<h2 id="first-milestone-added-interactive-parameters-table-for-each-element">First milestone: Added interactive parameters table for each element</h2>

<p>The first milestone was introduced in <code class="language-plaintext highlighter-rouge">scikit-learn</code> version 1.7, released in 
June 2025. A parameters table was added to the HTML representation of models, 
displaying parameter names and their corresponding values. 
Non-default parameters—those explicitly set by the user—are highlighted. 
In addition, a copy-to-clipboard button is available for each parameter name. 
The parameter name that is copied to the clipboard is the fully qualified name, 
which is shown on hover as well. The parameters table is collapsed by default 
and can be opened by the user.</p>

<p>The following two images show a pipeline table before and after the milestone.</p>

<figure>
<img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2Fenhancing-UX%2Fbefore1_7.png" alt="HTML visualization before scikit-learn 1.7" />
<figcaption>
HTML visualization before scikit-learn version 1.7
</figcaption>
</figure>

<figure>
<img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2Fenhancing-UX%2Fscikit-learn1_7.png" alt="HTML visualization with scikit-learn 1.7" style="width:50%" />
<figcaption>
HTML visualization with scikit-learn version 1.7
</figcaption>
</figure>

<h2 id="second-milestone-links-to-parameters-documentation-and-tooltip-preview">Second milestone: Links to parameters documentation and tooltip preview</h2>

<p>This feature was further enhanced in version 1.8, released in December 2025.
We added tooltips that provide documentation for each parameter, 
as well as links to the online documentation. 
See the GIF below or this example for more details: 
<a href="iframe.php?url=https%3A%2F%2Fscikit-learn.org%2Fstable%2Fauto_examples%2Fmiscellaneous%2Fplot_estimator_representation.html">Displaying estimators and complex pipelines</a>.</p>

<figure>
<img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2Fenhancing-UX%2Fscikit-learn-1_8.gif" alt="HTML visualization with scikit-learn 1.8" />
<figcaption>
HTML visualization after scikit-learn 1.8
</figcaption>
</figure>

<h2 id="planned-improvements">Planned improvements</h2>

<p>More features are now being implemented. In particular, users will be able to 
visualize feature names and values and display fitted attributes, and we will further improve 
the overall appearance of the interactive displays.</p>]]></content><author><name>{&quot;bio&quot;=&gt;&quot;Open source library for machine learning in Python.&quot;, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github-square&quot;, &quot;url&quot;=&gt;&quot;https://github.com/scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;LinkedIn&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-linkedin&quot;, &quot;url&quot;=&gt;&quot;https://linkedin.com/company/scikit-learn/&quot;}, {&quot;label&quot;=&gt;&quot;Bluesky&quot;, &quot;icon&quot;=&gt;&quot;&quot;, &quot;url&quot;=&gt;&quot;https://bsky.app/profile/scikit-learn.org&quot;}, {&quot;label&quot;=&gt;&quot;Mastodon&quot;, &quot;icon&quot;=&gt;&quot;fab fa-brands fa-mastodon&quot;, &quot;url&quot;=&gt;&quot;https://fosstodon.org/@sklearn&quot;}, {&quot;label&quot;=&gt;&quot;YouTube&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-youtube&quot;, &quot;url&quot;=&gt;&quot;https://www.youtube.com/@scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;Facebook&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-facebook-square&quot;, &quot;url&quot;=&gt;&quot;https://facebook.com/scikitlearnofficial/&quot;}, {&quot;label&quot;=&gt;&quot;Instagram&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-instagram&quot;, &quot;url&quot;=&gt;&quot;https://instagram.com/scikitlearnofficial/&quot;}]}</name></author><category term="Updates" /><category term="Open Source" /><category term="Funding" /><category term="Diversity" /><category term="Machine Learning" /><summary type="html"><![CDATA[Author: Dea María Léon]]></summary></entry><entry><title type="html">Interview with Virgil Chan, scikit-learn Team Member</title><link href="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fteam%2Fvirgil-chan-interview%2F" rel="alternate" type="text/html" title="Interview with Virgil Chan, scikit-learn Team Member" 
/><published>2025-11-26T00:00:00+00:00</published><updated>2025-11-26T00:00:00+00:00</updated><id>https://blog.scikit-learn.org/team/virgil-chan-interview</id><content type="html" xml:base="https://blog.scikit-learn.org/team/virgil-chan-interview/"><![CDATA[<div>
  <img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2F" alt="" />
  

  
  
  
  

  

  

Author:  
      <a itemprop="sameAs" content="https://reshamas.github.io" href="iframe.php?url=https%3A%2F%2Freshamas.github.io" rel="me noopener noreferrer" style="vertical-align:top;"><img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fauthor_images%2Freshama_shaikh.jpeg" style="width:1em;margin-right:.5em;border-radius: 50%;" alt="Author Icon" class="orcid-icon" />Reshama Shaikh</a>
     , 


  
  
  
  

  

  


      <a itemprop="sameAs" content="https://virchan.github.io" href="iframe.php?url=https%3A%2F%2Fvirchan.github.io" rel="me noopener noreferrer" style="vertical-align:top;"><img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fauthor_images%2Fvirgil-chan.jpg" style="width:1em;margin-right:.5em;border-radius: 50%;" alt="Author Icon" class="orcid-icon" />Virgil Chan</a>
     

<br /><br />

</div>

<p>BIO: Virgil Chan is currently a Forward Deployed Engineer - Pre-Sales at Union.ai. Before that, he worked as a consultant in the San Francisco Bay Area, specialising in predictive data analytics and machine learning. Earlier, he studied mathematics before moving into data science. Virgil joined the scikit-learn team as a Contributor Experience Team member in December 2024.</p>

<ul>
  <li>GitHub: <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fvirchan">@virchan</a></li>
  <li>LinkedIn: <a href="iframe.php?url=https%3A%2F%2Fwww.linkedin.com%2Fin%2Fvirgil-chan-0a65b11b8">@virgil-chan</a></li>
  <li>Website: <a href="iframe.php?url=https%3A%2F%2Fvirchan.github.io">https://virchan.github.io</a></li>
</ul>

<ol>
  <li>
    <p><strong>Tell us about yourself.</strong></p>

    <p>My name is Virgil, and I’m currently working as a Forward Deployed Engineer – Pre-Sales at Union.ai. Based in San Jose, California, I previously worked as a consultant, using libraries from the scientific Python ecosystem on data science and machine-learning projects, including medical data analysis, traffic-network prediction, and model evaluation. Before deciding that computers are more fun, I was doing mathematical research in topology.</p>
  </li>
  <li>
    <p><strong>How did you first get involved in open source?</strong></p>

    <p>I first got involved in open source during the COVID-19 lockdown. I used that time to study Python programming, data science, analytics, and machine learning, and that’s when I discovered libraries like NumPy, Pandas, scikit-learn, NetworkX, and TensorFlow. Once I became more confident in my skills, I started working as a consultant and used these libraries to deliver data-driven solutions for clients.</p>
  </li>
  <li>
    <p><strong>We would love to learn of your open source journey.</strong></p>

    <p>I was transitioning from academia into software development, and I quickly learnt that companies valued hands-on experience more than an advanced degree. At the same time, the rise of GPU-driven workloads and LLM-based solutions made my earlier consulting projects look less impressive on paper. I ended up stuck in the infinite loop of no-job-no-experience.</p>

    <p>Even though I came from a non-traditional background, and my resume didn’t match what recruiters and ATS systems usually look for, I’ve always believed that my experience is something I can build myself. Since companies weren’t keen on training junior developers, open source became one of the not-so-many viable paths. I started looking for a project where I could grow, be useful, and apply my academic training in a meaningful way. That search naturally led me to scikit-learn.</p>
  </li>
  <li>
    <p><strong>How did you get involved in scikit-learn?</strong></p>

    <p>My first PR to scikit-learn (<a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fpull%2F27913">scikit-learn/scikit-learn#27913</a>) was a classic “good first issue”: adding the URL of a scikit-learn example to the relevant places in the documentation. I opened it in December 2023 and it was merged into the main branch in March 2024. Maren helped me navigate the codebase and understand the CI workflow, which gave me a solid foundation for later contributions. Even though I’m now more experienced with the contributing workflow, I still revisit that PR from time to time to remind myself of the challenges first-time contributors face, and how I can support them.</p>

    <p>My next PR (<a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fpull%2F29709">scikit-learn/scikit-learn#29709</a>) was more technical, fixing a bug in the (root) mean squared log error function. The expected behaviour was to check that inputs were in the domain of $\log(1 + x)$, but the implementation at the time checked the domain of $\log(x)$ instead. It was one of the few issues I fully understood and knew how to solve, so I volunteered to create a PR. Adrin reviewed it and mentored me throughout the process. Once everything looked good and the CI passed, he asked me to add array API support to the function. And that’s where the fun began.</p>

    <p>I had no idea what the array API was, but I already had the habit of reading discussions and merged PRs in my spare time. With a bit of Googling, I quickly understood what needed to be done and the broader importance of the array API project. In fact, completing the array API project has become one of my mid-term goals for my scikit-learn work. Under the guidance of Adrin, Guillaume, Olivier, and Omar, my PRs improved, and contributing became even more rewarding because of how supportive the maintainers were. I also started reviewing PRs, especially from first-time contributors working on the same “good first issue” I began with. In December 2024, I joined the scikit-learn team.</p>

    <p>I’m honoured that the team welcomed me and trusted me with more responsibility, such as representing scikit-learn at the <a href="iframe.php?url=https%3A%2F%2Fscientific-python.org%2Fsummits%2Fdeveloper%2F2025%2F">Scientific Python Developer Summit in May 2025</a>, <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fpull%2F31068">implementing temperature scaling as a new feature (with Christian)</a>, and having the ability to run <a href="iframe.php?url=https%3A%2F%2Fbetatim.github.io%2Fposts%2Fgithub-action-with-gpu%2F">CUDA CI</a> myself. It feels good to pass the same positivity I received back into the community.</p>
  </li>
  <li>
    <p><strong>To which OSS projects and communities do you contribute?</strong></p>

    <p>I’m also interested in scaling machine-learning algorithms, so I’ve been exploring CUDA and cuML as well.</p>
  </li>
  <li>
    <p><strong>What is alluring about OSS?</strong></p>

    <p>Open source fosters a collaborative environment where everyone wins: end-users, maintainers, and contributors. Because it is volunteer-driven, it becomes easier to recognise that the problem itself is the problem, the bug or the issue, rather than the people involved. As a result, the usual institutional complications, such as power or ego struggles, conflicts of interest over funding, or pressure from deadlines, are far less likely to drag the project down. People have more freedom to focus on solving problems, which creates an ideal environment for exploration, experimentation, collaboration, learning, and growth.</p>

    <p>Open source has given me the chance to grow, develop new skills, and broaden my perspective, something I’ve been battling since finishing college. By trading my time for responsibility, I’ve found open source to be a meaningful and genuinely rewarding experience.</p>
  </li>
  <li>
    <p><strong>What are your favorite resources, books, courses, conferences, etc?</strong></p>

    <p>I found the <a href="iframe.php?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DdMbWkAosBVA">interview between scikit-learn and Code for Thought</a> on YouTube. The maintainers shared their open-source journeys from how they got started to how they became involved in scikit-learn, which I found inspiring and motivating. For example, I can’t agree more with Gael’s point that “open source should be spontaneous” and that “a diversity of opinion will make better software.” I also learned from Adrin that I could get more involved in the project by becoming the second reviewer for a PR, which gave me the confidence to start reviewing PRs. I think this interview can help people understand the project from a more human and non-technical perspective.</p>
  </li>
  <li>
    <p><strong>What are your hobbies, outside of work and open source?</strong></p>

    <p>If I’m done with work and house chores, I usually listen to music. I enjoy classical music (Mozart, Brahms, Rachmaninoff, etc.), and I’m currently getting more exposure to Chopin’s work. I also like Rock ‘n’ Roll (Led Zeppelin, Eric Clapton, Deep Purple, etc.), and I find that AC/DC can “push me to eleven” whenever I’m stuck at work.</p>

    <p>I also enjoy reading novels. At the moment I’m reading The Silmarillion by Tolkien, and my to-read list keeps growing.</p>

    <p>I like hanging out with cats as well. I volunteer with an animal rescue group in San Jose, where I help care for the cats in their sanctuary and assist at adoption fairs.</p>
  </li>
</ol>]]></content><author><name>{&quot;bio&quot;=&gt;&quot;Open source library for machine learning in Python.&quot;, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github-square&quot;, &quot;url&quot;=&gt;&quot;https://github.com/scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;LinkedIn&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-linkedin&quot;, &quot;url&quot;=&gt;&quot;https://linkedin.com/company/scikit-learn/&quot;}, {&quot;label&quot;=&gt;&quot;Bluesky&quot;, &quot;icon&quot;=&gt;&quot;&quot;, &quot;url&quot;=&gt;&quot;https://bsky.app/profile/scikit-learn.org&quot;}, {&quot;label&quot;=&gt;&quot;Mastodon&quot;, &quot;icon&quot;=&gt;&quot;fab fa-brands fa-mastodon&quot;, &quot;url&quot;=&gt;&quot;https://fosstodon.org/@sklearn&quot;}, {&quot;label&quot;=&gt;&quot;YouTube&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-youtube&quot;, &quot;url&quot;=&gt;&quot;https://www.youtube.com/@scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;Facebook&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-facebook-square&quot;, &quot;url&quot;=&gt;&quot;https://facebook.com/scikitlearnofficial/&quot;}, {&quot;label&quot;=&gt;&quot;Instagram&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-instagram&quot;, &quot;url&quot;=&gt;&quot;https://instagram.com/scikitlearnofficial/&quot;}]}</name></author><category term="Team" /><category term="Open Source" /><summary type="html"><![CDATA[Author: Reshama Shaikh , Virgil Chan]]></summary></entry><entry><title type="html">scikit-learn Completes the GitHub Secure Open Source Training</title><link href="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fpress%2Fgh-sosf%2F" rel="alternate" type="text/html" title="scikit-learn Completes the GitHub Secure Open Source Training" /><published>2025-08-16T00:00:00+00:00</published><updated>2025-08-16T00:00:00+00:00</updated><id>https://blog.scikit-learn.org/press/gh-sosf</id><content type="html" 
xml:base="https://blog.scikit-learn.org/press/gh-sosf/"><![CDATA[<div>
  <img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2Fgh-sosf%2Fcover.png" alt="" />
  

  
  
  
  

  

  

Author:  
      <a itemprop="sameAs" content="https://reshamas.github.io" href="iframe.php?url=https%3A%2F%2Freshamas.github.io" rel="me noopener noreferrer" style="vertical-align:top;"><img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fauthor_images%2Freshama_shaikh.jpeg" style="width:1em;margin-right:.5em;border-radius: 50%;" alt="Author Icon" class="orcid-icon" />Reshama Shaikh</a>
     

<br /><br />

</div>

<h2 id="summary">Summary</h2>

<p>scikit-learn was honored to be selected to participate in Cohort 2 of the GitHub Secure Open Source Fund (OSF) Training Program. Cohort 1 took place earlier in 2025 with 19 projects, and Cohort 2 took place with 52 projects during June 2025.</p>

<figure>
 <img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2Fgh-sosf%2Fblog-title.png" alt="GitHub announcement of GH-S-OS Fund" style="border-width: thick" max-width="50%" max-height="50%" /> 
 <figcaption>
 Original post: <a href="iframe.php?url=https%3A%2F%2Fgithub.blog%2Fopen-source%2Fmaintainers%2Fsecuring-the-supply-chain-at-scale-starting-with-71-important-open-source-projects">GH Secure OSS Announcement</a>
 </figcaption>
</figure>

<p>It was an intense 3-week training program, with over 90 open source maintainers joining the training. Read the announcement from GitHub: <a href="iframe.php?url=https%3A%2F%2Fgithub.blog%2Fopen-source%2Fmaintainers%2Fsecuring-the-supply-chain-at-scale-starting-with-71-important-open-source-projects">Securing the supply chain at scale: Starting with 71 important open source projects</a></p>

<p>There were numerous workshops delivered by experts in the GitHub Security Lab. For many of these workshops, the learning materials are publicly available, and they are shared below.</p>

<h3 id="github-security-lab">GitHub Security Lab</h3>
<p>GitHub has its own security department, and GitHub Security Lab’s mission is to empower developers and secure open source.</p>
<ul>
  <li>GitHub Security Lab: <a href="iframe.php?url=https%3A%2F%2Fsecuritylab.github.com%2Fresources-os">Resources</a></li>
</ul>

<figure>
 <img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2Fgh-sosf%2Fgh-security-lab.png" alt="GitHub Security Lab" style="padding:1px;border:solid black" max-width="50%" max-height="50%" />  
 <figcaption>
 Original post: <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2FGitHubSecurityLab">GitHub Security Lab</a>
 </figcaption>
</figure>

<h2 id="resources-for-security-training">Resources for Security Training</h2>
<p>The program included many sessions led by experts in the field. Below we share the materials that are available to the public.</p>

<ul>
  <li><a href="iframe.php?url=https%3A%2F%2Fdocs.github.com%2Fen%2Fcode-security%2Fsecurity-advisories%2Fworking-with-repository-security-advisories%2Fconfiguring-private-vulnerability-reporting-for-a-repository">Configuring private vulnerability reporting for a repository</a>
    <blockquote>
      <p>Owners and administrators of public repositories can allow security researchers to report vulnerabilities securely in the repository by enabling private vulnerability reporting.</p>
    </blockquote>
  </li>
  <li><a href="iframe.php?url=https%3A%2F%2Fsecurityscorecards.dev">OpenSSF Scorecard</a></li>
  <li><a href="iframe.php?url=https%3A%2F%2Fmicrosoft.design%2Farticles%2Fsecure-by-design-a-ux-toolkit">Secure by design: A UX toolkit</a></li>
</ul>

<h4 id="codeql-from-zero-to-hero">CodeQL: From Zero to Hero</h4>

<p>This workshop introduces the fundamentals of security research and static analysis used when looking for vulnerabilities in software. It uses an example of a simple vulnerability, walks through how CodeQL could detect it, and provides examples of how the audience could use CodeQL to find vulnerabilities themselves.</p>

<p>slides: <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fsylwia-budzynska%2F2025-soss-codeql-workshop%2Fblob%2Fmain%2FSOSS-CodeQL-slides.pdf">Finding Vulnerabilities with CodeQL</a></p>

<figure>
 <img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2Fgh-sosf%2FCodeQL.png" alt="CodeQL audience and topics covered" style="padding:1px;border:solid black" max-width="50%" max-height="50%" /> 
 <figcaption>
 Original post: <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fsylwia-budzynska%2F2025-soss-codeql-workshop">Finding Vulnerabilities with CodeQL</a>
 </figcaption>
</figure>

<h4 id="developing-secure-software">Developing Secure Software</h4>

<p>This course includes specific tips on how to use and develop open source and other software securely. Learn the security basics to develop software that is hardened against attacks, and understand how you can reduce the damage and speed the response when a vulnerability is exploited.</p>

<p>It was developed by the Open Source Security Foundation (OpenSSF), a cross-industry collaboration that brings together leaders to improve the security of open source software by building a broader community, targeted initiatives, and best practices.</p>

<ul>
  <li>Online, Self Paced</li>
  <li>16-20 Hours of Course Material</li>
  <li>Quizzes and Hands-on Labs</li>
</ul>

<figure>
 <img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2Fgh-sosf%2Fdss-lfd121.png" alt="course: Developing Secure Software" style="padding:1px;border:solid black" max-width="50%" max-height="50%" /> 
 <figcaption>
 Original post: <a href="iframe.php?url=https%3A%2F%2Ftraining.linuxfoundation.org%2Ftraining%2Fdeveloping-secure-software-lfd121">LFD121: Developing Secure Software</a>
 </figcaption>
</figure>

<h4 id="oss-fuzz">OSS-Fuzz</h4>
<p><a href="iframe.php?url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FFuzzing">Fuzz testing</a> is a well-known technique for uncovering programming errors in software.</p>

<figure>
 <img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2Fgh-sosf%2Foss-fuzz.png" alt="OSS-Fuzz" style="padding:1px;border:solid black" max-width="50%" max-height="50%" /> 
 <figcaption>
 Original post: <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fgoogle%2Foss-fuzz">OSS-Fuzz</a>
 </figcaption>
</figure>

<h3 id="secure-code-game">Secure Code Game</h3>
<p>Secure Code Game is a GitHub Security Lab initiative that provides an in-repo learning experience in which learners secure intentionally vulnerable code. It is also an open source project that welcomes your contributions as a way to give back to the community.</p>

<figure>
 <img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2Fgh-sosf%2Fsecure-code-game.png" alt="Secure Code Game" style="padding:1px;border:solid black" max-width="50%" max-height="50%" /> 
 <figcaption>
 Original post: <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fskills%2Fsecure-code-game">Secure Code Game</a>
 </figcaption>
</figure>

<h3 id="participate-in-future-cohorts-of-the-github-secure-open-source-training">Participate in Future Cohorts of the GitHub Secure Open Source Training</h3>
<p>If you are a maintainer of an open source project, this training is an excellent opportunity to secure your project with guidance from highly trained experts in the security field. <a href="iframe.php?url=https%3A%2F%2Fdocs.google.com%2Fforms%2Fd%2Fe%2F1FAIpQLScDBalom0XhmJrvyI3kwD7dZ-dD4_uhmLNysVXtA8fH_WUKoA%2Fviewform">Applications are open</a>.</p>

<h3 id="references">References</h3>
<ul>
  <li><a href="iframe.php?url=https%3A%2F%2Fgithub.blog%2Fopen-source%2Fmaintainers%2Fsecuring-the-supply-chain-at-scale-starting-with-71-important-open-source-projects">Securing the supply chain at scale: Starting with 71 important open source projects</a> (11-Aug-2025)</li>
  <li>TechCrunch: <a href="iframe.php?url=https%3A%2F%2Ftechcrunch.com%2F2024%2F11%2F19%2Fgithub-launches-1-25m-open-source-fund-with-a-focus-on-security">GitHub launches $1.25M open source fund with a focus on security</a> (19-Nov-2024)</li>
  <li><a href="iframe.php?url=https%3A%2F%2Fresources.github.com%2Fgithub-secure-open-source-fund%2F">GitHub Secure Open Source Fund</a></li>
  <li><a href="iframe.php?url=https%3A%2F%2Fwww.eclipse.org%2Fsecurity%2Fpolicy">Eclipse Foundation Security Policy</a></li>
  <li><a href="iframe.php?url=https%3A%2F%2Fwww.linuxfoundation.org%2Fsecurity">Linux Foundation Security Policy</a></li>
</ul>

<h3 id="blogs-from-participating-open-source-projects">Blogs from Participating Open Source Projects</h3>
<ul>
  <li>OpenCV: <a href="iframe.php?url=https%3A%2F%2Fopencv.org%2Fblog%2Fopencvs-participation-in-the-github-secure-open-source-fund">OpenCV’s Participation in the GitHub Secure Open Source Fund</a></li>
  <li>Bootstrap: <a href="iframe.php?url=https%3A%2F%2Fwww.linkedin.com%2Fpulse%2Fbootstrap-github-secure-open-source-fund-julien-d%252525C3%252525A9ramond-cvjie">Bootstrap at GitHub Secure Open Source Fund</a></li>
  <li>Cobra &amp; Viper: <a href="iframe.php?url=https%3A%2F%2Fspf13.com%2Fp%2Fcobra-viper-fortify-security-as-part-of-github-secure-open-source-fund">Cobra &amp; Viper Fortify Security as Part of GitHub Secure Open Source Fund</a></li>
  <li>Zitadel: <a href="iframe.php?url=https%3A%2F%2Fzitadel.com%2Fblog%2Fgithub-secure-open-source-fund">A Leap Forward in Security: Our Journey with the GitHub Secure Open Source Fund</a></li>
</ul>

<h2 id="acknowledgments">Acknowledgments</h2>

<p>Thank you to the funders and ecosystem partners of the GitHub Secure Open Source Fund.</p>

<p><strong>Funding Partners:</strong> Alfred P. Sloan Foundation, American Express, Chainguard, Datadog, Herodevs, Kraken, Mayfield, Microsoft, Shopify, Stripe, Superbloom, Vercel, Zerodha, 1Password</p>

<figure>
 <img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2Fgh-sosf%2Ffunders2.png" alt="Sponsors" style="padding:1px;border:solid black" max-width="50%" max-height="50%" /> 
 <figcaption>
 Original post: <a href="iframe.php?url=https%3A%2F%2Fgithub.blog%2Fopen-source%2Fmaintainers%2Fsecuring-the-supply-chain-at-scale-starting-with-71-important-open-source-projects">GH Secure OSS Announcement</a>
 </figcaption>
</figure>

<p><strong>Ecosystem Partners:</strong> Ecosyste.ms, CURIOSS, Digital Data Design Institute Lab for Innovation Science, Digital Infrastructure Insights Fund, Microsoft for Startups, Mozilla, OpenForum Europe, Open Source Collective, OpenUK, Open Technology Fund, OpenSSF, Open Source Initiative, OpenJS Foundation, University of California, Santa Cruz OSPO, Sovereign Tech Agency, SustainOSS</p>

<figure>
 <img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2Fgh-sosf%2Fecosystem.png" alt="Ecosystem Partners" style="padding:1px;border:solid black" max-width="50%" max-height="50%" /> 
 <figcaption>
 Original post: <a href="iframe.php?url=https%3A%2F%2Fgithub.blog%2Fopen-source%2Fmaintainers%2Fsecuring-the-supply-chain-at-scale-starting-with-71-important-open-source-projects">GH Secure OSS Announcement</a>
 </figcaption>
</figure>]]></content><author><name>{&quot;bio&quot;=&gt;&quot;Open source library for machine learning in Python.&quot;, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github-square&quot;, &quot;url&quot;=&gt;&quot;https://github.com/scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;LinkedIn&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-linkedin&quot;, &quot;url&quot;=&gt;&quot;https://linkedin.com/company/scikit-learn/&quot;}, {&quot;label&quot;=&gt;&quot;Bluesky&quot;, &quot;icon&quot;=&gt;&quot;&quot;, &quot;url&quot;=&gt;&quot;https://bsky.app/profile/scikit-learn.org&quot;}, {&quot;label&quot;=&gt;&quot;Mastodon&quot;, &quot;icon&quot;=&gt;&quot;fab fa-brands fa-mastodon&quot;, &quot;url&quot;=&gt;&quot;https://fosstodon.org/@sklearn&quot;}, {&quot;label&quot;=&gt;&quot;YouTube&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-youtube&quot;, &quot;url&quot;=&gt;&quot;https://www.youtube.com/@scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;Facebook&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-facebook-square&quot;, &quot;url&quot;=&gt;&quot;https://facebook.com/scikitlearnofficial/&quot;}, {&quot;label&quot;=&gt;&quot;Instagram&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-instagram&quot;, &quot;url&quot;=&gt;&quot;https://instagram.com/scikitlearnofficial/&quot;}]}</name></author><category term="Press" /><category term="Open Source" /><summary type="html"><![CDATA[Author: Reshama Shaikh]]></summary></entry><entry><title type="html">Skolar: an open-source initiative to democratize open data science</title><link href="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fupdates%2Fprobabl-skolar%2F" rel="alternate" type="text/html" title="Skolar: an open-source initiative to democratize open data science" /><published>2025-06-30T00:00:00+00:00</published><updated>2025-06-30T00:00:00+00:00</updated><id>https://blog.scikit-learn.org/updates/probabl-skolar</id><content type="html" 
xml:base="https://blog.scikit-learn.org/updates/probabl-skolar/"><![CDATA[<div>
  <img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2Fsklearn_skolar.png" alt="" />
  

  
  
  
  

  

  

Author:  
      <a itemprop="sameAs" content="https://skolar.probabl.ai/" href="iframe.php?url=https%3A%2F%2Fskolar.probabl.ai%2F" rel="me noopener noreferrer" style="vertical-align:top;"><img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fauthor_images%2Fskolar-logo.png" style="width:1em;margin-right:.5em;border-radius: 50%;" alt="Author Icon" class="orcid-icon" />Skolar</a>
     , 


  
  
  
  

  

  


      <a itemprop="sameAs" content="https://www.linkedin.com/in/gittospenelope-data-analyst-growth-bilingual/" href="iframe.php?url=https%3A%2F%2Fwww.linkedin.com%2Fin%2Fgittospenelope-data-analyst-growth-bilingual%2F" rel="me noopener noreferrer" style="vertical-align:top;"><img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fauthor_images%2Fpenelope_gittos.jpeg" style="width:1em;margin-right:.5em;border-radius: 50%;" alt="Author Icon" class="orcid-icon" />Pénélope Gittos</a>
     

<br /><br />

</div>

<p><span style="color:red"><em>This blog post has been submitted by Probabl, a sponsor of scikit-learn.</em> </span>
The scikit-learn project values educational efforts that build and nurture a
strong, vibrant open-source community. The goal of these efforts is straightforward: give
everyone, everywhere, the tools they need to easily grasp, engage with, and
meaningfully contribute to data science using open-source software. This mission
is shared and actively supported by <a href="iframe.php?url=https%3A%2F%2Fprobabl.ai%2F">Probabl</a>, a company
that helps maintain scikit-learn by employing many of its core contributors and
investing in its long-term sustainability. With Probabl’s support and a deep
commitment from the community, the scikit-learn ecosystem continues building bridges between research,
software, and education.</p>

<p>When the <a href="iframe.php?url=https%3A%2F%2Finria.github.io%2Fscikit-learn-mooc%2F">Inria scikit-learn MOOC</a>
(Massive Open Online Course) first went live, the community got a front-row seat
to the amazing impact of practical, accessible and open learning. Created by
several core developers and maintainers of scikit-learn—now working at
Probabl—the MOOC has reached over 40,000 learners worldwide, clearly
highlighting the demand for organized, hands-on resources that blend theory with
real-world practice.</p>

<p>Today, Probabl is excited to introduce
<a href="iframe.php?url=https%3A%2F%2Fapp.arcade.software%2Fshare%2FvCN6ik9dR22zD35XP5a7">Skolar</a>, a new, fully
open-source educational initiative, built directly from your feedback and all
the lessons we’ve learned along the way. Developed and extended by those same
core developers of scikit-learn, Skolar is designed specifically for data
science practitioners, offering hands-on, high-quality learning resources
grounded in real-world applications and open-source values.</p>

<p>Skolar exists to boost our shared values: openness, teamwork, and practicality.
It offers clear, interactive tutorials and structured courses carefully designed
to match industry challenges and specialized use-cases. But even more
importantly, it captures the true spirit of open source: encouraging
collaboration, peer-to-peer learning, and guidance from experts.</p>

<p>Right now, we’re just at the beginning. Today, you can dive into our
Scikit-learn Associate Practitioner online course, adapted from the popular
Inria MOOC but enhanced with new material on unsupervised learning, especially
clustering.</p>

<p>The next stages, professional and expert levels, will be released soon. We’ll
also add more courses covering other open-source libraries such as
<a href="iframe.php?url=https%3A%2F%2Fskrub-data.org">skrub</a> (for data wrangling),
<a href="iframe.php?url=https%3A%2F%2Fsoda-inria.github.io%2Fhazardous%2F">hazardous</a> (for survival analysis),
and <a href="iframe.php?url=https%3A%2F%2Ffairlearn.org%2F">fairlearn</a> (for fairness).
Additionally, our scikit-learn team is planning to create industry-specific
modules tackling real-world needs in fields like healthcare, finance, medicine,
and beyond.</p>

<p>At its core, Skolar is about empowering people through education, driven
entirely by our passion for openness and collaboration. We firmly believe that
true open data science begins with community-built learning resources. We warmly
welcome you, whether you’re a contributor, learner, teacher, or just someone
curious, to join us. Help shape Skolar’s future and support open-source
education in data science.</p>

<p>Create your account on Skolar today: https://skolar.probabl.ai</p>

<p>Contribute to the <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fprobabl-ai%2Fscikit-learn-course">scikit-learn course
contents</a>, or contribute to
the learning platform’s <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2FFrance-ioi%2FAlgoreaBackend">backend</a>
or <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2FFrance-ioi%2FAlgoreaFrontend">frontend</a>.</p>]]></content><author><name>{&quot;bio&quot;=&gt;&quot;Open source library for machine learning in Python.&quot;, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github-square&quot;, &quot;url&quot;=&gt;&quot;https://github.com/scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;LinkedIn&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-linkedin&quot;, &quot;url&quot;=&gt;&quot;https://linkedin.com/company/scikit-learn/&quot;}, {&quot;label&quot;=&gt;&quot;Bluesky&quot;, &quot;icon&quot;=&gt;&quot;&quot;, &quot;url&quot;=&gt;&quot;https://bsky.app/profile/scikit-learn.org&quot;}, {&quot;label&quot;=&gt;&quot;Mastodon&quot;, &quot;icon&quot;=&gt;&quot;fab fa-brands fa-mastodon&quot;, &quot;url&quot;=&gt;&quot;https://fosstodon.org/@sklearn&quot;}, {&quot;label&quot;=&gt;&quot;YouTube&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-youtube&quot;, &quot;url&quot;=&gt;&quot;https://www.youtube.com/@scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;Facebook&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-facebook-square&quot;, &quot;url&quot;=&gt;&quot;https://facebook.com/scikitlearnofficial/&quot;}, {&quot;label&quot;=&gt;&quot;Instagram&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-instagram&quot;, &quot;url&quot;=&gt;&quot;https://instagram.com/scikitlearnofficial/&quot;}]}</name></author><category term="Updates" /><category term="Sponsor" /><category term="Open Source" /><category term="Machine Learning" /><summary type="html"><![CDATA[Author: Skolar , Pénélope Gittos]]></summary></entry><entry><title type="html">Changes and development of scikit-learn’s developer API</title><link href="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fupdates%2Fdev-api%2F" rel="alternate" type="text/html" title="Changes and development of scikit-learn’s developer API" 
/><published>2024-12-12T00:00:00+00:00</published><updated>2024-12-12T00:00:00+00:00</updated><id>https://blog.scikit-learn.org/updates/dev-api</id><content type="html" xml:base="https://blog.scikit-learn.org/updates/dev-api/"><![CDATA[<div>
  <img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2FBSD_watermark.svg" alt="" />
  

  
  
  
  

  

  

Author:  
      <a itemprop="sameAs" content="https://adrin.info/" href="iframe.php?url=https%3A%2F%2Fadrin.info%2F" rel="me noopener noreferrer" style="vertical-align:top;"><img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fauthor_images%2Fadrin-jalali.jpeg" style="width:1em;margin-right:.5em;border-radius: 50%;" alt="Author Icon" class="orcid-icon" />Adrin Jalali</a>
     

<br /><br />

</div>

<p>Historically, scikit-learn’s API has been divided into public and private. Public API is
intended to be used by users, and private API is used internally in scikit-learn to
develop new features and estimators. However, many of those functionalities have become
essential to develop scikit-learn estimators by third parties who develop them outside
the scikit-learn codebase.</p>

<p>When it comes to our public API, we have very strict and high standards on backward
compatibility. The rule of thumb is that no change should cause a change in users’
code unless we warn about it for two release cycles, which means we give users a year
to update their code.</p>

<p>On the other hand, we have no such guarantees or constraints on our private API. This
is a problem for third party developers who would like to use the same methods that
scikit-learn developers use to develop their estimators: a private API that keeps
changing without prior warning makes their code hard to maintain.</p>

<p>As a result, we’ve been working on creating a developer API which would sit somewhere
between our public and private API in terms of backward compatibility. That means we
intend to try to keep that API stable, and if needed, introduce changes with one release
cycle warning.</p>

<p>In the past few releases, we’ve slowly introduced more functionalities under this
umbrella. <code class="language-plaintext highlighter-rouge">__sklearn_clone__</code> and <code class="language-plaintext highlighter-rouge">__sklearn_is_fitted__</code> are two examples.</p>
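
<p>As a minimal sketch of the second hook: the hypothetical <code class="language-plaintext highlighter-rouge">ConstantPredictor</code> below (a name made up for this illustration) keeps a private flag and reports it through <code class="language-plaintext highlighter-rouge">__sklearn_is_fitted__</code>. When that method is defined, <code class="language-plaintext highlighter-rouge">check_is_fitted</code> relies on it rather than looking for attributes that end with an underscore.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from sklearn.base import BaseEstimator
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


class ConstantPredictor(BaseEstimator):
    """Hypothetical estimator that tracks its fitted state with a private flag."""

    def __init__(self, value=0.0):
        self.value = value

    def fit(self, X, y=None):
        # Nothing is learned here; the flag is the only fitted state.
        self._fitted = True
        return self

    def __sklearn_is_fitted__(self):
        # check_is_fitted calls this hook when it is defined.
        return getattr(self, "_fitted", False)


est = ConstantPredictor()
try:
    check_is_fitted(est)
except NotFittedError:
    print("not fitted yet")

# After fit, the same call returns without raising.
check_is_fitted(est.fit([[0.0]]))
</code></pre></div></div>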

<p>In the 1.6 release, we focused on the testing infrastructure and estimator tag system.
Estimator tags used to be private, and we were not sure about their design. In the 1.6
release, new tags are introduced and using them looks like the following:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">sklearn.base</span> <span class="kn">import</span> <span class="n">BaseEstimator</span><span class="p">,</span> <span class="n">ClassifierMixin</span>

<span class="k">class</span> <span class="nc">MyEstimator</span><span class="p">(</span><span class="n">ClassifierMixin</span><span class="p">,</span> <span class="n">BaseEstimator</span><span class="p">):</span>

  <span class="p">...</span>

  <span class="k">def</span> <span class="nf">__sklearn_tags__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="n">tags</span> <span class="o">=</span> <span class="nb">super</span><span class="p">().</span><span class="n">__sklearn_tags__</span><span class="p">()</span>
    <span class="c1"># modify tags here
</span>    <span class="n">tags</span><span class="p">.</span><span class="n">non_deterministic</span> <span class="o">=</span> <span class="bp">True</span>
    <span class="k">return</span> <span class="n">tags</span>
</code></pre></div></div>

<p>The new tags mostly follow the same structure as the old tags, but there are certain
changes to them. The main change is that the old <code class="language-plaintext highlighter-rouge">_xfail_checks</code> is no longer present
in the new tags. That tag was used to tell the common testing tools about the tests
which are known to fail and are to be skipped. That information is now directly passed
to the test functionalities. The old way of skipping a test was the following:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">sklearn.base</span> <span class="kn">import</span> <span class="n">BaseEstimator</span><span class="p">,</span> <span class="n">ClassifierMixin</span>

<span class="k">class</span> <span class="nc">MyEstimator</span><span class="p">(</span><span class="n">ClassifierMixin</span><span class="p">,</span> <span class="n">BaseEstimator</span><span class="p">):</span>

  <span class="p">...</span>

  <span class="k">def</span> <span class="nf">_more_tags</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="k">return</span> <span class="p">{</span>
      <span class="s">"_xfail_checks"</span><span class="p">:</span> <span class="p">{</span>
        <span class="s">"check_to_skip_name"</span><span class="p">:</span> <span class="s">"this check is known to fail"</span><span class="p">,</span>
        <span class="p">...</span>
      <span class="p">}</span>
    <span class="p">}</span>
</code></pre></div></div>

<p>Calling <code class="language-plaintext highlighter-rouge">check_estimator</code>, or using <code class="language-plaintext highlighter-rouge">parametrize_with_checks</code> with <code class="language-plaintext highlighter-rouge">pytest</code>,
would then automatically ignore those tests for the estimator.</p>

<p>Instead, in this release, you pass that information directly to those methods:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">sklearn.utils.estimator_checks</span> <span class="kn">import</span> <span class="n">check_estimator</span><span class="p">,</span> <span class="n">parametrize_with_checks</span>

<span class="n">CHECKS_EXPECTED_TO_FAIL</span> <span class="o">=</span> <span class="p">{</span>
  <span class="s">"check_to_skip_name"</span><span class="p">:</span> <span class="s">"this check is known to fail"</span><span class="p">,</span>
  <span class="p">...</span>
<span class="p">}</span>

<span class="c1"># Using check_estimator
</span><span class="k">def</span> <span class="nf">test_with_check_estimator</span><span class="p">():</span>
  <span class="n">check_estimator</span><span class="p">(</span><span class="n">MyEstimator</span><span class="p">(),</span> <span class="n">expected_failed_checks</span><span class="o">=</span><span class="n">CHECKS_EXPECTED_TO_FAIL</span><span class="p">)</span>

<span class="c1"># Using parametrize_with_checks
</span><span class="o">@</span><span class="n">parametrize_with_checks</span><span class="p">(</span>
  <span class="p">[</span><span class="n">MyEstimator</span><span class="p">()],</span>
  <span class="n">expected_failed_checks</span><span class="o">=</span><span class="k">lambda</span> <span class="n">est</span><span class="p">:</span> <span class="n">CHECKS_EXPECTED_TO_FAIL</span>
<span class="p">)</span>
<span class="k">def</span> <span class="nf">test_with_parametrize_with_checks</span><span class="p">(</span><span class="n">estimator</span><span class="p">,</span> <span class="n">check</span><span class="p">):</span>
  <span class="n">check</span><span class="p">(</span><span class="n">estimator</span><span class="p">)</span>
</code></pre></div></div>

<p>While working on the testing infrastructure, we have also been improving our tests,
which means this release has a particularly high number of changes in
their names and in what they do. The changes will make it easier for developers to fix
issues with their estimators. Note that you can now pass <code class="language-plaintext highlighter-rouge">legacy=False</code> to both
<code class="language-plaintext highlighter-rouge">check_estimator</code> and <code class="language-plaintext highlighter-rouge">parametrize_with_checks</code> to include only strictly API related
tests.</p>

<p>The above changes mean developers need to update their estimators and, depending on
what they use, write scikit-learn version-specific code to support multiple
scikit-learn versions. To make that process easier, we’ve worked on a package called
<a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fsklearn-compat%2Fsklearn-compat%2F"><code class="language-plaintext highlighter-rouge">sklearn_compat</code></a>. You can either
depend on it as a package dependency, or vendor a single file inside your project. At
the moment this project is in its infancy and might change in the future. But hopefully
it helps developers out there.</p>
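
<p>The kind of version-specific branching mentioned above could look roughly like the following sketch. It is not how <code class="language-plaintext highlighter-rouge">sklearn_compat</code> works internally; the helper names <code class="language-plaintext highlighter-rouge">parse_major_minor</code> and <code class="language-plaintext highlighter-rouge">uses_new_tags</code> are hypothetical, and the 1.6 threshold is taken from the release discussed in this post. In a real project the version string would typically come from <code class="language-plaintext highlighter-rouge">sklearn.__version__</code>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

# Assumption: the new tag API described in this post starts with 1.6.
NEW_TAGS_SINCE = (1, 6)


def parse_major_minor(version):
    """Return (major, minor) from a string such as '1.6.1' or '1.7.dev0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def uses_new_tags(version):
    """True when this version is expected to call __sklearn_tags__."""
    # Tuples compare element by element, so (1, 10) sorts after (1, 6).
    return parse_major_minor(version) &gt;= NEW_TAGS_SINCE


print(uses_new_tags("1.5.2"))   # False
print(uses_new_tags("1.6.0"))   # True
print(uses_new_tags("1.10.1"))  # True
</code></pre></div></div>

<p>Comparing integer tuples rather than raw strings avoids the pitfall where <code class="language-plaintext highlighter-rouge">"1.10"</code> sorts before <code class="language-plaintext highlighter-rouge">"1.6"</code> alphabetically.</p>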

<p>If you think there are missing functionalities in the developer API, please let us know
and give us feedback on our <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fissues">issue tracker</a>.</p>]]></content><author><name>{&quot;bio&quot;=&gt;&quot;Open source library for machine learning in Python.&quot;, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github-square&quot;, &quot;url&quot;=&gt;&quot;https://github.com/scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;LinkedIn&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-linkedin&quot;, &quot;url&quot;=&gt;&quot;https://linkedin.com/company/scikit-learn/&quot;}, {&quot;label&quot;=&gt;&quot;Bluesky&quot;, &quot;icon&quot;=&gt;&quot;&quot;, &quot;url&quot;=&gt;&quot;https://bsky.app/profile/scikit-learn.org&quot;}, {&quot;label&quot;=&gt;&quot;Mastodon&quot;, &quot;icon&quot;=&gt;&quot;fab fa-brands fa-mastodon&quot;, &quot;url&quot;=&gt;&quot;https://fosstodon.org/@sklearn&quot;}, {&quot;label&quot;=&gt;&quot;YouTube&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-youtube&quot;, &quot;url&quot;=&gt;&quot;https://www.youtube.com/@scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;Facebook&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-facebook-square&quot;, &quot;url&quot;=&gt;&quot;https://facebook.com/scikitlearnofficial/&quot;}, {&quot;label&quot;=&gt;&quot;Instagram&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-instagram&quot;, &quot;url&quot;=&gt;&quot;https://instagram.com/scikitlearnofficial/&quot;}]}</name></author><category term="Updates" /><category term="Open Source" /><category term="Machine Learning" /><category term="License" /><summary type="html"><![CDATA[Author: Adrin Jalali]]></summary></entry><entry><title type="html">Announcing the launch of the scikit-learn user survey</title><link href="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fupdates%2Fsurvey-announcement%2F" rel="alternate" type="text/html" title="Announcing the launch of the scikit-learn user survey" 
/><published>2024-09-02T00:00:00+00:00</published><updated>2024-09-02T00:00:00+00:00</updated><id>https://blog.scikit-learn.org/updates/survey-announcement</id><content type="html" xml:base="https://blog.scikit-learn.org/updates/survey-announcement/"><![CDATA[<div>
  <img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2F" alt="" />
  

  
  
  
  

  

  

Author:  
      <a itemprop="sameAs" content="https://github.com/inessapawson" href="iframe.php?url=https%3A%2F%2Fgithub.com%2Finessapawson" rel="me noopener noreferrer" style="vertical-align:top;"><img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fauthor_images%2Finessa-pawson.jpg" style="width:1em;margin-right:.5em;border-radius: 50%;" alt="Author Icon" class="orcid-icon" />Inessa Pawson</a>
      
        <a href="iframe.php?url=mailto%3Ainessapawson%40gmail.com" title="inessapawson@gmail.com"><span><i class="elastic-fai fas fa-envelope"></i></span></a>
      , 


  
  
  
  

  

  


      <a itemprop="sameAs" content="https://github.com/francoisgoupil" href="iframe.php?url=https%3A%2F%2Fgithub.com%2Ffrancoisgoupil" rel="me noopener noreferrer" style="vertical-align:top;"><img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fauthor_images%2Ffrancois_goupil.jpeg" style="width:1em;margin-right:.5em;border-radius: 50%;" alt="Author Icon" class="orcid-icon" />François Goupil</a>
      
        <a href="iframe.php?url=mailto%3Afrancois.goupil%40inria.fr" title="francois.goupil@inria.fr"><span><i class="elastic-fai fas fa-envelope"></i></span></a>
      

<br /><br />

</div>

<p>We are excited to announce the launch of the scikit-learn user survey! Scikit-learn
continues to evolve thanks to contributions from its diverse user community. As we plan
for future releases, we want to ensure we are focusing on what matters most to you — our
users.</p>

<p>The goal of this survey is to better understand how users interact with the library,
identify any pain points, learn about the features you find most useful, and what’s
missing. This is your chance to have a say in how the library grows and adapts to meet
the evolving needs of the machine learning community.</p>

<p>The survey will take about 15 minutes of your time. It is available in Arabic, French,
English, Japanese, Mandarin, Spanish, and Portuguese. You have the option to remain
completely anonymous, and the data collected will be used solely for the purpose of
improving scikit-learn.</p>

<p>This user survey is a truly collaborative effort. We would like to thank the teams from
probabl, University of Oxford (UK), and POSSEE OpenTeams, as well as many scikit-learn
contributors, for their time and effort in designing and translating it.</p>

<p>Once the survey closes, we’ll analyze the responses and publish the findings in a
follow-up blog post.</p>

<p>To take the survey, visit: 
<a href="iframe.php?url=https%3A%2F%2Fforms.gle%2Fp5P7AweCJCbFMzfo6">https://forms.gle/p5P7AweCJCbFMzfo6</a>. 
The survey will remain open until October 14th, 2024, and we encourage you to share it with your 
colleagues and extended network.</p>

<p>We value every contribution in our community, and we’re committed to making scikit-learn
even better. Your feedback is the foundation upon which scikit-learn will continue to
grow and evolve. We look forward to hearing from you!</p>]]></content><author><name>{&quot;bio&quot;=&gt;&quot;Open source library for machine learning in Python.&quot;, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github-square&quot;, &quot;url&quot;=&gt;&quot;https://github.com/scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;LinkedIn&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-linkedin&quot;, &quot;url&quot;=&gt;&quot;https://linkedin.com/company/scikit-learn/&quot;}, {&quot;label&quot;=&gt;&quot;Bluesky&quot;, &quot;icon&quot;=&gt;&quot;&quot;, &quot;url&quot;=&gt;&quot;https://bsky.app/profile/scikit-learn.org&quot;}, {&quot;label&quot;=&gt;&quot;Mastodon&quot;, &quot;icon&quot;=&gt;&quot;fab fa-brands fa-mastodon&quot;, &quot;url&quot;=&gt;&quot;https://fosstodon.org/@sklearn&quot;}, {&quot;label&quot;=&gt;&quot;YouTube&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-youtube&quot;, &quot;url&quot;=&gt;&quot;https://www.youtube.com/@scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;Facebook&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-facebook-square&quot;, &quot;url&quot;=&gt;&quot;https://facebook.com/scikitlearnofficial/&quot;}, {&quot;label&quot;=&gt;&quot;Instagram&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-instagram&quot;, &quot;url&quot;=&gt;&quot;https://instagram.com/scikitlearnofficial/&quot;}]}</name></author><category term="Updates" /><category term="Community" /><category term="Open Source" /><summary type="html"><![CDATA[Author: Inessa Pawson , François Goupil]]></summary></entry><entry><title type="html">Chan Zuckerberg Initiative considers scikit-learn an Essential Open Source Software</title><link href="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Ffunding%2Fczi-eoss6-announcement%2F" rel="alternate" type="text/html" title="Chan Zuckerberg Initiative considers scikit-learn an Essential Open Source Software" 
/><published>2024-08-06T00:00:00+00:00</published><updated>2024-08-06T00:00:00+00:00</updated><id>https://blog.scikit-learn.org/funding/czi-eoss6-announcement</id><content type="html" xml:base="https://blog.scikit-learn.org/funding/czi-eoss6-announcement/"><![CDATA[<div>
  <img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2Fsklearn_czi.png" alt="" />
  

  
  
  
  

  

  

Author:  
      <a itemprop="sameAs" content="https://github.com/glemaitre" href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fglemaitre" rel="me noopener noreferrer" style="vertical-align:top;"><img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fauthor_images%2Fguillaume-lemaitre.jpg" style="width:1em;margin-right:.5em;border-radius: 50%;" alt="Author Icon" class="orcid-icon" />Guillaume Lemaitre</a>
     , 


  
  
  
  

  

  


      <a itemprop="sameAs" content="https://github.com/lucyleeow" href="iframe.php?url=https%3A%2F%2Fgithub.com%2Flucyleeow" rel="me noopener noreferrer" style="vertical-align:top;"><img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fauthor_images%2Flucyliu.jpeg" style="width:1em;margin-right:.5em;border-radius: 50%;" alt="Author Icon" class="orcid-icon" />Lucy Liu</a>
     

<br /><br />

</div>

<p>We are delighted to announce that <code class="language-plaintext highlighter-rouge">scikit-learn</code> has been awarded a grant from
the <a href="iframe.php?url=https%3A%2F%2Fchanzuckerberg.com%2F">Chan Zuckerberg Initiative (CZI)</a>’s <a href="iframe.php?url=https%3A%2F%2Fchanzuckerberg.com%2Frfa%2Fessential-open-source-software-for-science%2F">Essential Open
Source Software for Science
(EOSS)</a>
program. This grant is funded by <a href="iframe.php?url=https%3A%2F%2Fwellcome.org%2F">Wellcome Trust</a>.
As in previous rounds, this cycle supports open-source software projects that are
essential to biomedical research. This is the third time that CZI EOSS supports
<code class="language-plaintext highlighter-rouge">scikit-learn</code>.</p>

<p>In this new grant, we will focus on improving the <a href="iframe.php?url=https%3A%2F%2Fchanzuckerberg.com%2Feoss%2Fproposals%2Fpredictive-models-evaluation-inspection-in-scikit-learn%2F">evaluation and inspection of
predictive
models</a>.</p>

<h2 id="predictive-models-evaluation--inspection">Predictive models evaluation &amp; inspection</h2>

<p>When building a machine learning pipeline for a specific research problem, two key
aspects are closely connected: (i) design of the pipeline and (ii) assessment, analysis, and
inspection of it. Researchers strive to identify the optimal pipeline, maximizing specific
evaluation metrics, while also seeking to explain the validity and rationale behind
the pipeline’s predictions. This is the cornerstone of answering research
questions. With this proposal we aim to improve and extend the available <code class="language-plaintext highlighter-rouge">scikit-learn</code>
tools.</p>

<p><code class="language-plaintext highlighter-rouge">scikit-learn</code> provides building blocks for model evaluation and statistical analysis of
results. Originally, this information was presented in a raw format and required
expertise from scientists to create intuitive reports for outreach to peers and
outsiders. Recently, the <code class="language-plaintext highlighter-rouge">scikit-learn</code> community developed displays to easily generate
visual figures for communicating such results. However, these displays are still in
their early development stages and do not leverage all available statistical analysis
tools (i.e., cross-validation) from <code class="language-plaintext highlighter-rouge">scikit-learn</code>. We therefore aim to expand these
displays to use the right statistical tools, and so promote the adoption of best
practices when reporting results. We also intend to create new displays
to support common analysis tasks that are not yet covered in <code class="language-plaintext highlighter-rouge">scikit-learn</code>.</p>

<p>In the domain of model inspection, we aim to address several areas: (i) model inspection
during training, (ii) enhancing user experience through interactive inspection, and
(iii) model explainability. First, during the training of a pipeline, researchers are
interested in monitoring the internal characteristics of the model, a long-standing
issue that has not yet been addressed in <code class="language-plaintext highlighter-rouge">scikit-learn</code>. We want to build upon some initial work
by implementing a “callback” framework that allows users to track these internal
parameters. Next, researchers commonly use interactive tools such as Jupyter Notebook to
develop pipelines. <code class="language-plaintext highlighter-rouge">scikit-learn</code> started some efforts to visually and interactively
display pipelines in these environments. However, there is room for improvement in terms
of user interaction and accessibility. Finally, as <code class="language-plaintext highlighter-rouge">scikit-learn</code> is widely used as a
reference package, it is crucial to improve the section of the library dedicated to
model explainability. We aim to improve the documentation and user experience with the
existing explainability tools, so that users can choose the appropriate tool for their
use cases. In addition, we propose to work on a scikit-learn enhancement proposal (SLEP)
to define a common API for model explainability within scikit-learn. Ultimately, the
goal is to come to a consensus to provide scikit-learn end-users with a consistent
experience when using model explainability tools.</p>

<p>On top of all these items, we intend to continue working on the general maintenance of
the project, addressing bug reports and performance regressions. As a community-driven
project, we also want to dedicate time to reviewing external contributions.</p>

<h2 id="involved-people">Involved people</h2>

<p>To execute this project, we plan the following hires:</p>

<ul>
  <li><a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Flucyleeow">Lucy Liu</a> (Quansight Labs) will work about half-time on
the project, on topics related to displays and feature importance.</li>
  <li>We will hire full-time interns to work on the other part of the project. The
initial plan is to hire two interns for a period of 6 months each and repeat this
process for the next 2 years. We want to provide opportunities to underrepresented
groups in the field of machine learning and data science, similarly to previous
initiatives (cf. <a href="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fdiversity%2Fmentoring%2F">NumFOCUS Small Development
Grant</a>).</li>
</ul>

<h2 id="past-czi-eoss-grants">Past CZI EOSS grants</h2>

<p>In the past <code class="language-plaintext highlighter-rouge">scikit-learn</code> has been awarded two grants from the CZI EOSS program:</p>

<ul>
  <li><a href="iframe.php?url=https%3A%2F%2Fchanzuckerberg.com%2Feoss%2Fproposals%2Fscikit-learn-maintenance-and-enhancement-for-gradient-boosting%2F">CZI EOSS Cycle 1</a>
helped create the
<a href="iframe.php?url=https%3A%2F%2Fscikit-learn.org%2Fstable%2Fmodules%2Fgenerated%2Fsklearn.ensemble.HistGradientBoostingClassifier.html"><code class="language-plaintext highlighter-rouge">HistGradientBoostingClassifier</code></a> and
<a href="iframe.php?url=https%3A%2F%2Fscikit-learn.org%2Fstable%2Fmodules%2Fgenerated%2Fsklearn.ensemble.HistGradientBoostingRegressor.html"><code class="language-plaintext highlighter-rouge">HistGradientBoostingRegressor</code></a> estimators.
These estimators are the equivalent of gradient boosting models implemented in
<code class="language-plaintext highlighter-rouge">LightGBM</code> and <code class="language-plaintext highlighter-rouge">XGBoost</code>.</li>
  <li><a href="iframe.php?url=https%3A%2F%2Fchanzuckerberg.com%2Feoss%2Fproposals%2Fmaintenance-extension-of-scikit-learn-machine-learning-in-python%2F">CZI EOSS Cycle 4</a>
extended <code class="language-plaintext highlighter-rouge">scikit-learn</code> to work better with missing values and categorical data in
several estimators.</li>
</ul>

<p>Both grants allowed us to maintain and enhance <code class="language-plaintext highlighter-rouge">scikit-learn</code> to better serve the
community.</p>]]></content><author><name>{&quot;bio&quot;=&gt;&quot;Open source library for machine learning in Python.&quot;, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github-square&quot;, &quot;url&quot;=&gt;&quot;https://github.com/scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;LinkedIn&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-linkedin&quot;, &quot;url&quot;=&gt;&quot;https://linkedin.com/company/scikit-learn/&quot;}, {&quot;label&quot;=&gt;&quot;Bluesky&quot;, &quot;icon&quot;=&gt;&quot;&quot;, &quot;url&quot;=&gt;&quot;https://bsky.app/profile/scikit-learn.org&quot;}, {&quot;label&quot;=&gt;&quot;Mastodon&quot;, &quot;icon&quot;=&gt;&quot;fab fa-brands fa-mastodon&quot;, &quot;url&quot;=&gt;&quot;https://fosstodon.org/@sklearn&quot;}, {&quot;label&quot;=&gt;&quot;YouTube&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-youtube&quot;, &quot;url&quot;=&gt;&quot;https://www.youtube.com/@scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;Facebook&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-facebook-square&quot;, &quot;url&quot;=&gt;&quot;https://facebook.com/scikitlearnofficial/&quot;}, {&quot;label&quot;=&gt;&quot;Instagram&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-instagram&quot;, &quot;url&quot;=&gt;&quot;https://instagram.com/scikitlearnofficial/&quot;}]}</name></author><category term="Funding" /><category term="Open Source" /><category term="Funding" /><category term="Internship" /><category term="Diversity" /><summary type="html"><![CDATA[Author: Guillaume Lemaitre , Lucy Liu]]></summary></entry><entry><title type="html">Interview with Adam Li, scikit-learn Team Member</title><link href="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fteam%2Fadam-li-interview%2F" rel="alternate" type="text/html" title="Interview with Adam Li, scikit-learn Team Member" 
/><published>2024-07-24T00:00:00+00:00</published><updated>2024-07-24T00:00:00+00:00</updated><id>https://blog.scikit-learn.org/team/adam-li-interview</id><content type="html" xml:base="https://blog.scikit-learn.org/team/adam-li-interview/"><![CDATA[<div>
  <img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2Fadam-li-interview.png" alt="" />
  

  
  
  
  

  

  

Author:  
      <a itemprop="sameAs" content="https://reshamas.github.io" href="iframe.php?url=https%3A%2F%2Freshamas.github.io" rel="me noopener noreferrer" style="vertical-align:top;"><img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fauthor_images%2Freshama_shaikh.jpeg" style="width:1em;margin-right:.5em;border-radius: 50%;" alt="Author Icon" class="orcid-icon" />Reshama Shaikh</a>
     , 


  
  
  
  

  

  


      <a itemprop="sameAs" content="https://adam2392.github.io/" href="iframe.php?url=https%3A%2F%2Fadam2392.github.io%2F" rel="me noopener noreferrer" style="vertical-align:top;"><img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fauthor_images%2Fadam-li.jpeg" style="width:1em;margin-right:.5em;border-radius: 50%;" alt="Author Icon" class="orcid-icon" />Adam Li</a>
     

<br /><br />

</div>

<p>BIO: Adam is currently a Postdoctoral Research Scientist at Columbia University in the Causal Artificial Intelligence Lab, directed by <a href="iframe.php?url=https%3A%2F%2Fcausalai.net%2F">Dr. Elias Bareinboim</a>. He is an <a href="iframe.php?url=https%3A%2F%2Fcifellows2021.org%2F2021-class%2F">NSF-funded Computing Innovation Research Fellow</a>. He did his PhD in biomedical engineering, specializing in computational neuroscience and machine learning at Johns Hopkins University working with Dr. Sridevi V. Sarma in the <a href="iframe.php?url=https%3A%2F%2Fsarmalab.icm.jhu.edu%2F">Neuromedical Control Systems group</a>. He also jointly obtained an MS in Applied Mathematics and Statistics with a focus in statistical learning theory, optimization and matrix analysis. He was fortunate to be a <a href="iframe.php?url=https%3A%2F%2Ficm.jhu.edu%2F2017%2F03%2F20%2Fadam-li-selected-for-nsf-graduate-research-and-whitaker-international-fellowships%2F%23.YH2ZT6lKj0o">NSF-GRFP fellow, Whitaker International Fellow</a>, <a href="iframe.php?url=https%3A%2F%2Ficm.jhu.edu%2F2017%2F06%2F16%2Fadam-li-icm-phd-student-selected-for-chateaubriand-fellowship%2F%23.YH2Zi6lKj0o">Chateaubriand Fellow</a> and <a href="iframe.php?url=https%3A%2F%2Ficm.jhu.edu%2F2020%2F07%2F20%2Fadam-li-icm-phd-student-receives-arcs-scholarship%2F%23.YH2ZbKlKj0o">ARCS Chapter Scholar</a> during his time at JHU. Adam officially joined the scikit-learn team as a maintainer in July 2024.</p>

<ul>
  <li>GitHub: <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fadam2392">@adam2392</a></li>
  <li>LinkedIn: <a href="iframe.php?url=https%3A%2F%2Fwww.linkedin.com%2Fin%2Fadam2392%2F">@adam2392</a></li>
  <li>Website: <a href="iframe.php?url=https%3A%2F%2Fadam2392.github.io%2F">https://adam2392.github.io</a></li>
</ul>

<p>Link to scikit-learn contributions (issues, pull requests):</p>
<ul>
  <li><a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fpull%2F27966">FEA Add missing-value support for ExtaTreeClassifier and ExtaTreeRegressor</a></li>
  <li><a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fpull%2F29331">DOC Fix tree explanation of tree_.value in example</a></li>
  <li><a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fpull%2F28622">ENH Enable prediction of isolation forest in parallel</a></li>
  <li><a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fpull%2F26736">ENH Adding estimators_samples_ attribute to forest models</a></li>
  <li><a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fpull%2F28494">FEA SLEP006: Metadata routing for SelfTrainingClassifier</a></li>
  <li><a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fpull%2F29266">FEAT SLEP006 permutation_test_score to support metadata routing</a></li>
  <li><a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fpull%2F29437">FEA Categorical split support for DecisionTree*, ExtraTree*, RandomForest* and ExtraTrees* #29437</a></li>
  <li>Issue: <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fissues%2F20819">Adding Oblique Trees (Forest-RC) to the Cythonized Tree Module</a></li>
</ul>

<ol>
  <li>
    <p><strong>Tell us about yourself.</strong></p>

    <p>I currently live in New York City, where I work on theoretical and applied AI research through the lens of causal inference, statistical modeling, dynamical systems and signal processing. My current research is focused on telling a causal story, specifically in the case where one has multiple distributions of data from the same causal system. For example, one may have access to brain recordings from monkeys and humans. Given these heterogeneous datasets, I am interested in answering: what causal relationships can we learn? This is known as the causal discovery problem, where given data, one attempts to learn what causes what. Another problem that I work on that is highly relevant to generative AI is the problem of causal representation learning. Here, I develop theory and train deep neural networks to understand causality among latent factors. Specifically, we demonstrate how to leverage multiple datasets and a causal neural network to generate data that is causally realistic. This can enable more robust data generation from general latent variable models.</p>
  </li>
  <li>
    <p><strong>How did you first become involved in open source and scikit-learn?</strong></p>

    <p>I first got involved in open source as a user. I was making the switch from Matlab to Python and started using packages like numpy and scipy pretty regularly. In my PhD research, I dealt with a lot of electrophysiological data (i.e. EEG brain recordings).  I was writing hundreds of lines of code to load and preprocess data, and it was always changing based on different constraints.  That was when I discovered <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fmne-tools%2Fmne-bids">MNE-BIDS</a>, a Python package within the MNE framework for reading and writing brain recording data in a structured format. This changed my life because now my preprocessing and data loading code was a few lines of code that adhered to an open standard tested by thousands of researchers. I realized the value of open source, and began contributing in my spare time.</p>
  </li>
  <li>
    <p><strong>We would love to learn of your open source journey.</strong></p>

    <p>I first started contributing to open source in the <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fmne-tools">MNE</a> organization. This package implements data structures for the processing and analysis of neural recording data (e.g. MEG, EEG, iEEG data). I contributed over 70 pull requests in the MNE-BIDS package, and subsequently was invited to be a maintainer for <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fmne-tools%2Fmne-bids">MNE-BIDS</a> and <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fmne-tools%2Fmne-python">MNE-Python</a>. Later on, I participated in a Google Summer of Code to port the connectivity submodule within MNE-Python to a new package, known as <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fmne-tools%2Fmne-connectivity">MNE-Connectivity</a>. I added new data structures and algorithms to support the development of connectivity algorithms for neural recording data. Later on, I also worked with a team on porting a neural network architecture from Matlab to the MNE framework to automatically classify ICA-derived components. This became known as <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fmne-tools%2Fmne-icalabel">MNE-ICALabel</a>. These projects gave me the experience necessary to work in a large asynchronous team environment that is common in OSS. They also taught me how to begin contributing to an OSS project. This led me to scikit-learn.</p>

    <p>I first got involved in scikit-learn as a user who was heavily interested in the decision tree models in scikit-learn (random forests, randomized trees). Here, I was interested in contributing a <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fissues%2F20819">new oblique decision tree model</a> that was a generalization of the existing random forest model. However, the code was not easily added to scikit-learn, and a decision on whether to include it has not yet been reached. Throughout this process, I learned about the challenges and intricacies of maintaining such a large OSS project as scikit-learn. It is not trivial to simply add new features to a large OSS project because code comes with a maintenance cost, and should fit with the current internal design. At that time, there were very few maintainers who were able to maintain the tree submodule, and as such new features were included conservatively.</p>

    <p>I was eager to improve the project to enable more exciting features for the community, so I began contributing to scikit-learn starting with smaller issues such as documentation improvements or minor bug fixes to get acquainted with the codebase. I also refactored various Cython code to begin upgrading the codebase, especially in the tree submodule. Throughout this process, I identified other projects the maintainer team was working on, and also contributed there. For example, I added metadata routing to a variety of different functions and estimators in scikit-learn. I also began reviewing PRs for the tree submodule and metadata routing where I had knowledge. I also added missing-value support for extremely randomized tree models (called ExtraTrees in scikit-learn). This allows users to pass in data that contains missing values (encoded as <code class="language-plaintext highlighter-rouge">np.nan</code>) to ExtraTrees. Around this time, I was invited to join the maintainer team of scikit-learn. More recently, I have taken on the project to add <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fscikit-learn%2Fscikit-learn%2Fpull%2F29437">categorical data support</a> to the decision tree models, which will make random forests and extremely randomized tree models more performant and capable of handling real-world settings where categorical data is common.</p>
  </li>
  <li>
    <p><strong>To which OSS projects and communities do you contribute?</strong></p>

    <p>I currently primarily contribute to scikit-learn, <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fpy-why%2Fdodiscover">PyWhy</a> (a community for causal inference in Python), and also develop my own OSS project: <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2Fneurodata%2Ftreeple">treeple</a>. Treeple is an exciting package that implements different decision tree models beyond those offered in scikit-learn with an efficient Cython implementation stemming from the scikit-learn tree internals.</p>
  </li>
  <li>
    <p><strong>What do you find alluring about OSS?</strong></p>

    <p>OSS is so exciting because of the impact it has. Everyone from private projects to other OSS projects will use OSS. Any fixes to documentation, performance improvements, or new features can impact the workflows of potentially millions of people. This is what makes contributing to OSS so exciting. Moreover, this impact ensures that best practices are usually carried out in these projects, and it’s a great playground to learn from the best, while giving back to the larger community.</p>
  </li>
  <li>
    <p><strong>What pain points do you observe in community-led OSS?</strong></p>

    <p>Right now, community-led OSS moves very slowly in most places. This is for a number of very good reasons: i) not releasing buggy features that may impact millions of people, and ii) backwards compatibility. One of the challenges of maintaining a high-quality OSS project is that you would like to satisfy your users, who may all utilize different components of the project from different versions. As such, many community-led OSS projects take a conservative approach when implementing new features and new ideas. However, there may be many exciting, better features that are already known by the community, but still lack an OSS implementation.</p>

    <p>I think this can be partially solved by increased funding for OSS, so OSS maintainers and developers are able to dedicate more time to maintaining and improving the projects. In addition, I think this can be improved if more developers in the community contribute to said OSS projects. I hope that I have convinced you though that contributing to OSS is impactful and highly educational.</p>
  </li>
  <li>
    <p><strong>If we discuss how far OS has evolved in 10 years, what would you like to see happen?</strong></p>

    <p>I think more interoperability and integrated workflows will make projects that utilize OSS more streamlined and efficient. For example, right now there are different array libraries (e.g. numpy, cupy, xarray, pytorch, etc.), which all support some form of an n-dimensional array, but each with a slightly different API. This makes it very painful to transition across different libraries that use different arrays. In addition, there are multiple dataframe libraries, such as pandas and polars, and this problem of API consistency also arises there.</p>

    <p>Some work has been done on the Array API front to allow different array libraries to serve as backends given a common API. This will enable GPU acceleration for free without a single code change, which is great! This will be exciting because users will eventually only have to write code in a single way, and can then leverage whichever array/dataframe library best fits their use case.</p>
  </li>
  <li>
    <p><strong>What are your hobbies, outside of work and open source?</strong></p>

    <p>I enjoy running, trying new restaurants and bars, cooking and reading. I’m currently training for a half-marathon, where my goal is to run under 8 minutes per mile. I’m also trying to perfect a salad with an Asian-themed dressing. In a past life, I was a bboy (breakdancer) for ten years until I stopped in graduate school because I got busy (and old).</p>
  </li>
</ol>]]></content><author><name>{&quot;bio&quot;=&gt;&quot;Open source library for machine learning in Python.&quot;, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github-square&quot;, &quot;url&quot;=&gt;&quot;https://github.com/scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;LinkedIn&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-linkedin&quot;, &quot;url&quot;=&gt;&quot;https://linkedin.com/company/scikit-learn/&quot;}, {&quot;label&quot;=&gt;&quot;Bluesky&quot;, &quot;icon&quot;=&gt;&quot;&quot;, &quot;url&quot;=&gt;&quot;https://bsky.app/profile/scikit-learn.org&quot;}, {&quot;label&quot;=&gt;&quot;Mastodon&quot;, &quot;icon&quot;=&gt;&quot;fab fa-brands fa-mastodon&quot;, &quot;url&quot;=&gt;&quot;https://fosstodon.org/@sklearn&quot;}, {&quot;label&quot;=&gt;&quot;YouTube&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-youtube&quot;, &quot;url&quot;=&gt;&quot;https://www.youtube.com/@scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;Facebook&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-facebook-square&quot;, &quot;url&quot;=&gt;&quot;https://facebook.com/scikitlearnofficial/&quot;}, {&quot;label&quot;=&gt;&quot;Instagram&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-instagram&quot;, &quot;url&quot;=&gt;&quot;https://instagram.com/scikitlearnofficial/&quot;}]}</name></author><category term="Team" /><category term="Open Source" /><summary type="html"><![CDATA[Author: Reshama Shaikh , Adam Li]]></summary></entry><entry><title type="html">Interview with Yao Xiao, scikit-learn Team Member</title><link href="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fteam%2Fyao-interview%2F" rel="alternate" type="text/html" title="Interview with Yao Xiao, scikit-learn Team Member" /><published>2024-07-18T00:00:00+00:00</published><updated>2024-07-18T00:00:00+00:00</updated><id>https://blog.scikit-learn.org/team/yao-interview</id><content type="html" 
xml:base="https://blog.scikit-learn.org/team/yao-interview/"><![CDATA[<div>
  <img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fposts_images%2F" alt="" />

Author:
      <a itemprop="sameAs" content="https://reshamas.github.io" href="iframe.php?url=https%3A%2F%2Freshamas.github.io" rel="me noopener noreferrer" style="vertical-align:top;"><img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fauthor_images%2Freshama_shaikh.jpeg" style="width:1em;margin-right:.5em;border-radius: 50%;" alt="Author Icon" class="orcid-icon" />Reshama Shaikh</a>,
      <a itemprop="sameAs" content="https://charlie-xiao.github.io/" href="iframe.php?url=https%3A%2F%2Fcharlie-xiao.github.io%2F" rel="me noopener noreferrer" style="vertical-align:top;"><img src="iframe.php?url=https%3A%2F%2Fblog.scikit-learn.org%2Fassets%2Fimages%2Fauthor_images%2Fyao-xiao.jpeg" style="width:1em;margin-right:.5em;border-radius: 50%;" alt="Author Icon" class="orcid-icon" />Yao Xiao</a>

<br /><br />

</div>

<p>Yao Xiao recently earned his undergraduate degree in mathematics and computer science. He will be pursuing a Master’s degree in Computational Science and Engineering at Harvard SEAS. Yao joined the scikit-learn team in February 2024.</p>

<ol>
  <li>
    <p><strong>Tell us about yourself.</strong></p>

    <p>My name is Yao Xiao and I live in Shanghai, China. At the time of this interview, I have just received my Bachelor’s degree in Honors Mathematics and Computer Science at NYU Shanghai, and I’m going to pursue a Master’s degree in Computational Science and Engineering at Harvard SEAS. My current research interests are in networks and systems (e.g. sys4ml and ml4sys), but this may change in the future.</p>

    <ul>
      <li>GitHub: <a href="iframe.php?url=https%3A%2F%2Fgithub.com%2FCharlie-XIAO">@Charlie-XIAO</a></li>
      <li>LinkedIn: <a href="iframe.php?url=https%3A%2F%2Fwww.linkedin.com%2Fin%2Fyao-xiao-200073244%2F">@yao-xiao</a></li>
      <li>Website: <a href="iframe.php?url=https%3A%2F%2Fcharlie-xiao.github.io%2F">https://charlie-xiao.github.io</a></li>
    </ul>
  </li>
  <li>
    <p><strong>How did you first become involved in open source and scikit-learn?</strong></p>

    <p>In my junior year, I took a course at NYU Courant called Open Source Software Development, where we needed to make contributions to an open source project as our final project, and I chose scikit-learn.</p>
  </li>
  <li>
    <p><strong>We would love to learn of your open source journey.</strong></p>

    <p>I was lucky to get involved in a fairly easy meta-issue when I first started contributing to scikit-learn. I made quite a few PRs towards that issue, familiarizing myself with the coding standards, contributing workflow, etc. Along the way, I gradually explored the codebase and learned a lot from the maintainers about how to write better code. After that meta-issue was completed, I decided to continue contributing since I enjoyed the experience. I started looking through the open issues, tried reproducing and investigating them, then opened PRs for those that I was able to solve. It is a cycle of becoming familiar with more parts of the codebase, which makes it possible to open more PRs, and so on. While contributing to scikit-learn, there are sometimes issues to solve upstream, so I also had opportunities to contribute to projects like pandas and pydata-sphinx-theme. To this day I’m still far from familiar with the entire scikit-learn project, but I will definitely continue this amazing open source journey.</p>
  </li>
  <li>
    <p><strong>To which OSS projects and communities do you contribute?</strong></p>

    <p>I have contributed to scikit-learn, pandas, pydata-sphinx-theme, and sphinx-gallery. I’m also writing some small pieces of software that I have decided to make open source.</p>
  </li>
  <li>
    <p><strong>What do you find alluring about OSS?</strong></p>

    <p>It is amazing to feel that my code is being used by so many people all around the world through contributing to open source projects. Well, it might be inappropriate to say “my code”, but I do feel like I am making actual contributions to the community instead of just writing code for myself. Also, OSS makes me care about code quality and so on instead of merely making things “work”, which is very important for programmers but not really taught in school.</p>
  </li>
  <li>
    <p><strong>What pain points do you observe in community-led OSS?</strong></p>

    <p>Collaboration can lead to better code, but it also slows down the development process. Especially when there are not enough reviewers around, issues and PRs can easily get stale or forgotten. But I would say it’s more of a tradeoff than a pain point.</p>
  </li>
  <li>
    <p><strong>If we discuss how far OS has evolved in 10 years, what would you like to see happen?</strong></p>

    <p>I couldn’t say about the past 10 years since I’ve only been involved for about one and a half years, but regarding the scientific Python ecosystem I would like to see better coordination across projects (which is already happening). For instance, a common interface for array libraries and dataframe libraries would allow downstream dependents to easily provide more flexible support for different input/output types, etc. And as a Chinese person, I also hope that open source can thrive in my country some day as well.</p>
  </li>
  <li>
    <p><strong>What are your favorite resources, books, courses, conferences, etc?</strong></p>

    <p>As for physical books, I would recommend <em>The Pragmatic Programmer</em> by Andy Hunt and Dave Thomas, and <em>Refactoring: Improving the Design of Existing Code</em> by Martin Fowler and Kent Beck. As for courses, I like MIT’s <em>The Missing Semester of Your CS Education</em>. For learning Python in particular, <em>The Python Tutorial</em> in the official Python documentation is good enough for me. By the way, I want to mention that the <strong>documentation</strong> of most languages and popular packages is very nice, and it is the best place to learn the most up-to-date information.</p>
  </li>
  <li>
    <p><strong>What are your hobbies, outside of work and open source?</strong></p>

    <p>I would say my biggest hobby is programming (not for school, not for work, just for fun). I’ve recently been fascinated with <a href="iframe.php?url=https%3A%2F%2Fv2.tauri.app%2F">Tauri</a> and have written a lot of small desktop applications for myself in my spare time. Apart from this, I also love playing the piano and I’m an anime lover, so I often listen to or play piano versions of anime theme songs (mostly arranged by <a href="iframe.php?url=https%3A%2F%2Fwww.animenzpiano.com%2F">Animenz</a>).</p>
  </li>
</ol>]]></content><author><name>{&quot;bio&quot;=&gt;&quot;Open source library for machine learning in Python.&quot;, &quot;links&quot;=&gt;[{&quot;label&quot;=&gt;&quot;GitHub&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-github-square&quot;, &quot;url&quot;=&gt;&quot;https://github.com/scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;LinkedIn&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-linkedin&quot;, &quot;url&quot;=&gt;&quot;https://linkedin.com/company/scikit-learn/&quot;}, {&quot;label&quot;=&gt;&quot;Bluesky&quot;, &quot;icon&quot;=&gt;&quot;&quot;, &quot;url&quot;=&gt;&quot;https://bsky.app/profile/scikit-learn.org&quot;}, {&quot;label&quot;=&gt;&quot;Mastodon&quot;, &quot;icon&quot;=&gt;&quot;fab fa-brands fa-mastodon&quot;, &quot;url&quot;=&gt;&quot;https://fosstodon.org/@sklearn&quot;}, {&quot;label&quot;=&gt;&quot;YouTube&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-youtube&quot;, &quot;url&quot;=&gt;&quot;https://www.youtube.com/@scikit-learn&quot;}, {&quot;label&quot;=&gt;&quot;Facebook&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-facebook-square&quot;, &quot;url&quot;=&gt;&quot;https://facebook.com/scikitlearnofficial/&quot;}, {&quot;label&quot;=&gt;&quot;Instagram&quot;, &quot;icon&quot;=&gt;&quot;fab fa-fw fa-instagram&quot;, &quot;url&quot;=&gt;&quot;https://instagram.com/scikitlearnofficial/&quot;}]}</name></author><category term="Team" /><category term="Open Source" /><summary type="html"><![CDATA[Author: Reshama Shaikh , Yao Xiao]]></summary></entry></feed>