Session 5 A

# Get the linear regression coefficients
model_3.named_steps['model'].coef_

array([ 0.        ,  2.68944506, -8.69362895,  5.46899018])

# What is the score on the training data?
model_3.score(x_train, y_train)

0.9689517224841414

# What is the score on the test data?
model_3.score(x_test, y_test)

0.624938368842177
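For a regressor, `score` returns the coefficient of determination \(R^2 = 1 - SS_{\text{res}} / SS_{\text{tot}}\). The sketch below recomputes it by hand on synthetic data and compares it with `score` and `r2_score`. The data, `reg`, and `r2_manual` are illustrative assumptions, not the dataset or models used in this session.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data (an assumption; not the data used in this session)
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(30, 1))
y = np.sin(2 * np.pi * x[:, 0]) + 0.1 * rng.normal(size=30)

reg = LinearRegression().fit(x, y)
y_pred = reg.predict(x)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# The three values should agree up to floating-point error
print(r2_manual, reg.score(x, y), r2_score(y, y_pred))
```

A score of 1 means a perfect fit. On held-out data the score can drop well below the training score, which is the gap seen in the two numbers above.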

# Plot the train and test results, as well as the polynomial
plt.plot(x_train[:, 0], y_train, 'ob', label='Training data')
plt.plot(x_test[:, 0], y_test, 'og', label='Test data')
x_model = np.linspace(0, 1, 100)
y_model = model_3.predict(x_model[:, None])
plt.plot(x_model, y_model, '-r', label='Model')
plt.plot(x_test[:, 0], model_3.predict(x_test), '*r', label='Predictions')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.grid()
plt.show()

[Figure: training data, test data, the fitted degree-3 polynomial, and its test predictions, plotted as y against x]

# Define a new model with degree 10
model_10 = Pipeline([
    ('features', PolynomialFeatures(degree=10)),
    ('scaler', StandardScaler()),
    ('model', LinearRegression())
])

# Fit the model defined above
model_10.fit(x_train, y_train)

Pipeline(steps=[('features', PolynomialFeatures(degree=10)),
                ('scaler', StandardScaler()), ('model', LinearRegression())])

# What is the score on the training data?
model_10.score(x_train, y_train)

0.975489947964474

# What is the score on the test data?
model_10.score(x_test, y_test)

0.2709806393273839

model_3.named_steps['model'].coef_

array([ 0.        ,  2.68944506, -8.69362895,  5.46899018])

model_10.named_steps['model'].coef_

array([ 0.00000000e+00, -2.33114139e-01,  3.94319098e+01, -2.54326370e+02,
        6.12383498e+02, -5.38141998e+02, -2.15129994e+02,  5.04521036e+02,
        2.07668725e+02, -5.92221950e+02,  2.35487640e+02])

# Plot the train and test results, as well as the polynomial
plt.plot(x_train[:, 0], y_train, 'ob', label='Training data')
plt.plot(x_test[:, 0], y_test, 'og', label='Test data')
x_model = np.linspace(0, 1, 100)
y_model = model_10.predict(x_model[:, None])
plt.plot(x_model, y_model, '-r', label='Model')
plt.plot(x_test[:, 0], model_10.predict(x_test), '*r', label='Predictions')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.grid()
plt.show()

[Figure: training data, test data, the fitted degree-10 polynomial, and its test predictions, plotted as y against x]

Overfitting

So far, we have defined the error function as

\[ \textcolor{#960096}{E(\mathbf{w}) \;=\; \tfrac{1}{2}\|\Phi \mathbf{w} - \mathbf{y}\|^2}, \]

where the Euclidean norm measures the size of the error vector \(\mathbf{e} = \Phi \mathbf{w} - \mathbf{y}\).
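As a minimal sketch, the error function can be evaluated with NumPy. The helper `squared_error` and the three-point toy dataset are assumptions made for illustration. \(\Phi\) is built here with polynomial columns \(1, x, x^2\).

```python
import numpy as np

def squared_error(Phi, w, y):
    """Return E(w) = 1/2 * ||Phi w - y||^2 (hypothetical helper)."""
    e = Phi @ w - y            # error vector e = Phi w - y
    return 0.5 * float(e @ e)  # half the squared Euclidean norm

# Toy data (an assumption): three points and a degree-2 design matrix
x = np.array([0.0, 0.5, 1.0])
Phi = np.vander(x, N=3, increasing=True)  # columns: 1, x, x^2
y = np.array([1.0, 2.0, 4.0])

# Error for an arbitrary weight vector
w = np.array([1.0, 1.0, 1.0])
E_w = squared_error(Phi, w, y)

# Least-squares weights: with 3 points and 3 parameters the fit is exact
w_ls = np.linalg.lstsq(Phi, y, rcond=None)[0]
E_ls = squared_error(Phi, w_ls, y)

print(E_w, E_ls)
```

With as many parameters as data points, the least-squares solution interpolates the data and drives the error to (numerically) zero, which is exactly the situation in which overfitting becomes a risk.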

👉 However, we have not yet placed any restriction on the parameters \(\mathbf{w}\).
If the weights grow too large (for example, when trying to fit every training point exactly), we can fall into overfitting: the model memorizes the noise instead of generalizing.

To avoid this, we add a new term to the objective function that also measures the magnitude of the parameters. This gives Ridge regression:

\[ E_{\text{ridge}}(\mathbf{w}) = \textcolor{#960096}{\tfrac{1}{2}\|\Phi \mathbf{w} - \mathbf{y}\|^2} + \textcolor{#164ec6}{\tfrac{\lambda}{2}\|\mathbf{w}\|^2}, \]

where the hyperparameter \(\lambda \geq 0\) controls the balance between:

- A good fit to the data (first term).
- Keeping the parameters small (second term).

If \(\lambda = 0\), we recover ordinary least-squares regression.
If \(\lambda\) is large, the weights shrink considerably and the model becomes simpler.
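The effect of \(\lambda\) can be checked numerically. The sketch below fits a degree-10 polynomial pipeline with several regularization strengths and records the norm of the coefficient vector. The synthetic data and the list `alphas` are assumptions for illustration. In scikit-learn, the `alpha` parameter of `Ridge` plays the role of \(\lambda\).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic noisy data (an assumption; not the data used in this session)
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(16, 1))
y = np.sin(2 * np.pi * x[:, 0]) + 0.2 * rng.normal(size=16)

alphas = [1e-6, 1e-3, 1.0, 1e3]  # illustrative values of lambda
norms = []
for alpha in alphas:
    model = Pipeline([
        ('features', PolynomialFeatures(degree=10)),
        ('scaler', StandardScaler()),
        ('model', Ridge(alpha=alpha)),
    ])
    model.fit(x, y)
    # Euclidean norm of the fitted weight vector
    norms.append(float(np.linalg.norm(model.named_steps['model'].coef_)))

for alpha, norm in zip(alphas, norms):
    print(f"alpha={alpha:g}  ||w||={norm:.4f}")
```

The printed norms should decrease as `alpha` grows, reflecting the stronger penalty on large weights.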

What optimal solution does Ridge regression give us?
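One way to answer this is to set the gradient of \(E_{\text{ridge}}\) to zero, \(\Phi^\top(\Phi \mathbf{w} - \mathbf{y}) + \lambda \mathbf{w} = 0\), which gives the closed form

\[ \mathbf{w}^\star = (\Phi^\top \Phi + \lambda I)^{-1} \Phi^\top \mathbf{y}. \]

The sketch below checks this formula against scikit-learn on random data. The data, `lam`, and `w_closed` are assumptions for illustration, and `fit_intercept=False` is used so that no intercept is handled separately. scikit-learn's Ridge objective omits the factor \(\tfrac{1}{2}\), which does not change the minimizer when `alpha` equals \(\lambda\).

```python
import numpy as np
from sklearn.linear_model import Ridge

# Random design matrix and targets (assumptions for illustration)
rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 4))
y = rng.normal(size=20)
lam = 0.5  # regularization strength lambda

# Closed form: solve (Phi^T Phi + lambda I) w = Phi^T y
A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
w_closed = np.linalg.solve(A, Phi.T @ y)

# Same problem solved by scikit-learn, without an intercept
ridge = Ridge(alpha=lam, fit_intercept=False).fit(Phi, y)

print(w_closed)
print(ridge.coef_)
```

Using `np.linalg.solve` avoids forming the matrix inverse explicitly. For \(\lambda > 0\) the matrix \(\Phi^\top \Phi + \lambda I\) is always invertible, even when \(\Phi^\top \Phi\) is not.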

from sklearn.linear_model import Ridge, Lasso

# Define a pipeline, now including Ridge
model_10_ridge = Pipeline([
    ('features', PolynomialFeatures(degree=10)),
    ('scaler', StandardScaler()),
    ('model', Ridge(alpha=1e-3))
])

# Fit the model
model_10_ridge.fit(x_train, y_train)

Pipeline(steps=[('features', PolynomialFeatures(degree=10)),
                ('scaler', StandardScaler()), ('model', Ridge(alpha=0.001))])

# What is the score on the training data?
model_10_ridge.score(x_train, y_train)

0.9728003574124893

# What is the score on the test data?
model_10_ridge.score(x_test, y_test)

0.8762430489487874
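For a regressor, `score` returns the coefficient of determination R^2, and a `Pipeline` delegates `score` to its final step. The following minimal sketch checks that by hand on synthetic data. The data and the single-feature `Ridge` model are assumptions made for illustration, not the notebook's own variables.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

# Synthetic data (an assumption for illustration, not the notebook's data)
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 1))
y = 3.0 * X[:, 0] + rng.normal(0, 0.1, size=50)

model = Ridge(alpha=0.001).fit(X, y)
y_pred = model.predict(X)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# All three values should agree up to floating-point error
print(r2_manual, model.score(X, y), r2_score(y, y_pred))
```

Read this way, the two scores above say that the model explains about 97% of the variance in the training targets and about 88% in the test targets.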

model_10.named_steps['model'].coef_

array([ 0.00000000e+00, -2.33114139e-01,  3.94319098e+01, -2.54326370e+02,
        6.12383498e+02, -5.38141998e+02, -2.15129994e+02,  5.04521036e+02,
        2.07668725e+02, -5.92221950e+02,  2.35487640e+02])

# Inspect the coefficients
model_10_ridge.named_steps['model'].coef_

array([ 0.        ,  2.12074277, -5.03426724, -0.72828463,  1.42179304,
        1.60445406,  1.00996397,  0.30152153, -0.23913149, -0.51434418,
       -0.50884092])
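The two coefficient arrays above differ by roughly two orders of magnitude, which is the shrinkage effect of the L2 penalty. The sketch below reproduces the comparison on synthetic data. It assumes that the unregularized pipeline ends in `LinearRegression` (the final step of `model_10` is not visible in this section), and the sample size, the seed, and the helper `make_pipeline_10` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic noisy sine (sample size and seed are assumptions)
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 21)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)


def make_pipeline_10(final_estimator):
    """Hypothetical helper: degree-10 polynomial features, scaling, then a linear model."""
    return Pipeline([
        ('features', PolynomialFeatures(degree=10)),
        ('scaler', StandardScaler()),
        ('model', final_estimator),
    ])


ols = make_pipeline_10(LinearRegression()).fit(x[:, None], y)
ridge = make_pipeline_10(Ridge(alpha=0.001)).fit(x[:, None], y)

ols_coef = ols.named_steps['model'].coef_
ridge_coef = ridge.named_steps['model'].coef_

# The penalized solution should have the smaller Euclidean norm
print('OLS   coefficient norm:', np.linalg.norm(ols_coef))
print('Ridge coefficient norm:', np.linalg.norm(ridge_coef))
```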

# Plot the train and test data together with the fitted polynomial
plt.plot(x_train[:, 0], y_train, 'ob', label='Training data')
plt.plot(x_test[:, 0], y_test, 'og', label='Test data')
x_model = np.linspace(0, 1, 100)
y_model = model_10_ridge.predict(x_model[:, None])
plt.plot(x_model, y_model, '-r', label='Model')
plt.plot(x_test[:, 0], model_10_ridge.predict(x_test), '*r', label='Predictions')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.grid()
plt.show()

../_images/e1af3bba404c075550a11f15765a3a1cde9c0d3bd04c7ee2fd5034dc1e40d915.png

What happens if we increase the amount of training data?

# Generate N = 201 samples
N = 201
x = np.linspace(0, 1, N)
y = np.sin(2 * np.pi * x)  + np.random.normal(0, 0.2, N)

# Plot the scatter of points
plt.plot(x, y, 'o', label='Data', color='purple')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.grid()
plt.show()

../_images/bf0d136388ce9d0cbe087b5389e321c626464512f8d07be9d34be56ab01817d6.png

# Split into train/test sets
x_train, x_test, y_train, y_test = train_test_split(x[:, None], y, test_size=0.2, random_state=0)

# Refit the degree-10 Ridge pipeline on the larger training set
model_10_ridge.fit(x_train, y_train)

Pipeline(steps=[('features', PolynomialFeatures(degree=10)),
                ('scaler', StandardScaler()), ('model', Ridge(alpha=0.001))])
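One way to explore the question of sample size more systematically is to repeat the same fit for several values of N and compare train and test scores. The sketch below does this under stated assumptions: the sample sizes, the fixed seed, and the variable names are hypothetical, while the pipeline mirrors the one printed above.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)  # fixed seed (assumption, for reproducibility)
results = []

for n in (21, 201, 2001):  # hypothetical sample sizes
    x = np.linspace(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, n)
    x_tr, x_te, y_tr, y_te = train_test_split(
        x[:, None], y, test_size=0.2, random_state=0
    )
    pipe = Pipeline([
        ('features', PolynomialFeatures(degree=10)),
        ('scaler', StandardScaler()),
        ('model', Ridge(alpha=0.001)),
    ])
    pipe.fit(x_tr, y_tr)
    # Store (N, train R^2, test R^2)
    results.append((n, pipe.score(x_tr, y_tr), pipe.score(x_te, y_te)))

for n, r2_train, r2_test in results:
    print(f"N={n:5d}  train R^2={r2_train:.3f}  test R^2={r2_test:.3f}")
```

With more data, the gap between train and test scores would be expected to narrow, since a degree-10 polynomial has less room to fit noise. The exact numbers depend on the random draw.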

PolynomialFeatures

Parameters

	degree degree: int or tuple (min_degree, max_degree), default=2 If a single int is given, it specifies the maximal degree of the polynomial features. If a tuple `(min_degree, max_degree)` is passed, then `min_degree` is the minimum and `max_degree` is the maximum polynomial degree of the generated features. Note that `min_degree=0` and `min_degree=1` are equivalent as outputting the degree zero term is determined by `include_bias`.	10
	interaction_only interaction_only: bool, default=False If `True`, only interaction features are produced: features that are products of at most `degree` distinct input features, i.e. terms with power of 2 or higher of the same input feature are excluded: - included: `x[0]`, `x[1]`, `x[0] * x[1]`, etc. - excluded: `x[0] 2`, `x[0] 2 * x[1]`, etc.	False
	include_bias include_bias: bool, default=True If `True` (default), then include a bias column, the feature in which all polynomial powers are zero (i.e. a column of ones - acts as an intercept term in a linear model).	True
	order order: {'C', 'F'}, default='C' Order of output array in the dense case. `'F'` order is faster to compute, but may slow down subsequent estimators. .. versionadded:: 0.21	'C'

Fitted attributes

Name	Type	Value
n_features_in_ n_features_in_: int Number of features seen during :term:`fit`. .. versionadded:: 0.24	int	1
n_output_features_ n_output_features_: int The total number of polynomial output features. The number of output features is computed by iterating over all suitably sized combinations of input features.	int	11
powers_ powers_: ndarray of shape (`n_output_features_`, `n_features_in_`) `powers_[i, j]` is the exponent of the jth input in the ith output.	ndarray[int64](11, 1)	[[ 0], [ 1], [ 2], ..., [ 8], [ 9], [10]]

11 features: 1, x0, x0^2, x0^3, x0^4, x0^5, x0^6, x0^7, x0^8, x0^9, x0^10

StandardScaler