bridgescaler.distributed
========================

.. py:module:: bridgescaler.distributed


Attributes
----------

.. autoapisummary::

   bridgescaler.distributed.CENTROID_DTYPE


Classes
-------

.. autoapisummary::

   bridgescaler.distributed.DBaseScaler
   bridgescaler.distributed.DStandardScaler
   bridgescaler.distributed.DMinMaxScaler
   bridgescaler.distributed.DQuantileScaler


Functions
---------

.. autoapisummary::

   bridgescaler.distributed.fit_variable
   bridgescaler.distributed.transform_variable
   bridgescaler.distributed.inv_transform_variable
   bridgescaler.distributed.tdigest_cdf
   bridgescaler.distributed.tdigest_quantile


Module Contents
---------------

.. py:data:: CENTROID_DTYPE

.. py:class:: DBaseScaler(channels_last=True)

   Bases: :py:obj:`object`


   Base distributed scaler class. Used only to store attributes and methods shared across all distributed
   scaler subclasses.


   .. py:attribute:: x_columns_
      :value: None


   .. py:attribute:: is_array_
      :value: False


   .. py:attribute:: _fit
      :value: False


   .. py:attribute:: channels_last
      :value: True


   .. py:method:: is_fit()


   .. py:method:: extract_x_columns(x, channels_last=True)
      :staticmethod:


      Extract the variable names to be transformed from x depending on whether x is a pandas DataFrame, an
      xarray DataArray, or a numpy array. All of these assume that the columns are in the last dimension.
      If x is an xarray DataArray, there should be a coordinate variable with the same name as the last dimension
      of the DataArray being transformed.

      :param x: array of values to be transformed.
      :type x: Union[pandas.DataFrame, xarray.DataArray, numpy.ndarray]
      :param channels_last: If True, then assume the variable or channel dimension is the last dimension of the
                            array. If False, then assume the variable or channel dimension is second.
      :type channels_last: bool

      :returns: ``xv`` (numpy.ndarray), the array of values to be transformed, and ``is_array`` (bool),
                whether or not x was a numpy.ndarray.
      :rtype: tuple


   .. py:method:: extract_array(x)
      :staticmethod:


   .. py:method:: get_column_order(x_in_columns)

      Get the indices of the scaler columns that have the same name as the columns in the input x array. This
      enables users to pass a DataFrame or DataArray to transform or inverse_transform with fewer columns than
      the original scaler or columns in a different order and still have the input dataset be transformed properly.

      :param x_in_columns: list of input columns.
      :type x_in_columns: Union[list, numpy.ndarray]

      :returns: indices of the input columns from x in the scaler in order.
      :rtype: x_in_col_indices (np.ndarray)
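
      The name matching described above can be sketched in plain Python. This is an illustration only, not
      the library's own code: the helper ``column_order`` and the column names are hypothetical, and raising
      ``KeyError`` for columns that were not seen during fitting is an assumption.

      .. code-block:: python

         def column_order(scaler_columns, x_in_columns):
             """Return, for each input column, its index among the fitted scaler's columns."""
             # Map each fitted column name to its position in the scaler.
             position = {name: i for i, name in enumerate(scaler_columns)}
             # Assumed behavior: refuse columns the scaler has never seen.
             missing = [name for name in x_in_columns if name not in position]
             if missing:
                 raise KeyError(f"columns not seen during fit: {missing}")
             return [position[name] for name in x_in_columns]

         scaler_columns = ["temperature", "pressure", "humidity"]
         # Fewer columns than the scaler holds, and in a different order.
         indices = column_order(scaler_columns, ["humidity", "temperature"])

      With these indices, the stored per-column statistics can be reordered to line up with the input before
      a transform is applied.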


   .. py:method:: package_transformed_x(x_transformed, x)
      :staticmethod:


      Repackage a transformed numpy array into the same datatype as the original x, including
      all metadata.

      :param x_transformed: array after being transformed or inverse transformed
      :type x_transformed: numpy.ndarray
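      :param x: original input whose datatype and metadata are applied to the transformed values.
      :type x: Union[pandas.DataFrame, xarray.DataArray, numpy.ndarray]

      :returns: The transformed values packaged in the same datatype as x.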


   .. py:method:: set_channel_dim(channels_last=None)


   .. py:method:: process_x_for_transform(x, channels_last=None)


   .. py:method:: fit(x, weight=None)


   .. py:method:: transform(x, channels_last=None)


   .. py:method:: fit_transform(x, channels_last=None, weight=None)


   .. py:method:: inverse_transform(x, channels_last=None)


   .. py:method:: __add__(other)


   .. py:method:: subset_columns(sel_columns)


   .. py:method:: add_variables(other)


.. py:class:: DStandardScaler(channels_last=True)

   Bases: :py:obj:`DBaseScaler`


   Distributed version of StandardScaler. You can calculate this map-reduce style by fitting a scaler on each
   individual data file, returning the fitted objects, and then summing them together to represent the full
   dataset. The scaler supports numpy arrays, pandas DataFrames, and xarray DataArrays and will return a
   transformed array in the same form as the original with column or coordinate names preserved.
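
   The reduction step can be illustrated with the standard formulas for pooling counts, means, and variances.
   The sketch below is self-contained and does not call the library. The helper ``merge_stats`` is
   hypothetical, and the use of population variance (dividing by ``n``) is an assumption rather than a
   statement about how ``var_x_`` is stored.

   .. code-block:: python

      from statistics import fmean, pvariance

      def merge_stats(n_a, mean_a, var_a, n_b, mean_b, var_b):
          """Pool the count, mean, and population variance of two chunks."""
          n = n_a + n_b
          mean = (n_a * mean_a + n_b * mean_b) / n
          # Each chunk contributes its own spread plus the offset of its mean
          # from the pooled mean.
          var = (n_a * (var_a + (mean_a - mean) ** 2)
                 + n_b * (var_b + (mean_b - mean) ** 2)) / n
          return n, mean, var

      # Two "files" summarized independently (the map step).
      chunk_a = [1.0, 2.0, 3.0, 4.0]
      chunk_b = [10.0, 12.0]

      # Combine the summaries without revisiting the raw data (the reduce step).
      n, mean, var = merge_stats(
          len(chunk_a), fmean(chunk_a), pvariance(chunk_a),
          len(chunk_b), fmean(chunk_b), pvariance(chunk_b),
      )

   The pooled values match what a single pass over the concatenated data would give, which is why fitted
   scalers can be summed in any order.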


   .. py:attribute:: mean_x_
      :value: None


   .. py:attribute:: n_
      :value: 0


   .. py:attribute:: var_x_
      :value: None


   .. py:method:: fit(x, weight=None)


   .. py:method:: transform(x, channels_last=None)

      Transform the input data from its original form to standard scaled form. If your input data has a
      different dimension order than the data used to fit the scaler, use the channels_last keyword argument
      to specify whether the new data are `channels_last` (True) or `channels_first` (False).

      :param x: Input data.
      :param channels_last: Override the default channels_last parameter of the scaler.

      :returns: Transformed data in the same shape and type as x.
      :rtype: x_transformed


   .. py:method:: inverse_transform(x, channels_last=None)


   .. py:method:: get_scales()


   .. py:method:: __add__(other)


.. py:class:: DMinMaxScaler(channels_last=True)

   Bases: :py:obj:`DBaseScaler`


   Distributed MinMaxScaler calculates the min and max of each variable across datasets in parallel, then
   combines the mins and maxes in a reduction step. The scaler supports numpy arrays, pandas DataFrames, and
   xarray DataArrays and will return a transformed array in the same form as the original with column or
   coordinate names preserved.
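
   A minimal sketch of that reduction for a single variable, written without the library. The helpers
   ``merge_ranges`` and ``min_max_scale`` are hypothetical, and mapping values onto ``[0, 1]`` is the
   conventional min-max formula, assumed here rather than taken from the library's source.

   .. code-block:: python

      def merge_ranges(range_a, range_b):
          """Combine two (min, max) pairs into the range covering both."""
          (min_a, max_a), (min_b, max_b) = range_a, range_b
          return min(min_a, min_b), max(max_a, max_b)

      def min_max_scale(value, lo, hi):
          """Rescale value so that lo maps to 0 and hi maps to 1."""
          return (value - lo) / (hi - lo)

      # Each chunk reports only its own extremes.
      chunk_a = [3.0, 7.0, 5.0]
      chunk_b = [-1.0, 4.0]
      lo, hi = merge_ranges(
          (min(chunk_a), max(chunk_a)),
          (min(chunk_b), max(chunk_b)),
      )

      # Scale every value with the combined range.
      scaled = [min_max_scale(v, lo, hi) for v in chunk_a + chunk_b]

   Because taking a min of mins and a max of maxes is associative, the chunks can be combined in any
   grouping.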


   .. py:attribute:: max_x_
      :value: None


   .. py:attribute:: min_x_
      :value: None


   .. py:method:: fit(x, weight=None)


   .. py:method:: transform(x, channels_last=None)


   .. py:method:: inverse_transform(x, channels_last=None)


   .. py:method:: get_scales()


   .. py:method:: __add__(other)


.. py:function:: fit_variable(var_index, xv_shared=None, compression=None, channels_last=None)

.. py:function:: transform_variable(td_obj, xv, min_val=1e-06, max_val=0.9999999, distribution='normal')

.. py:function:: inv_transform_variable(td_obj, xv, distribution='normal')

.. py:function:: tdigest_cdf(xv, cent_mean, cent_weight, t_min, t_max, out)

.. py:function:: tdigest_quantile(qv, cent_mean, cent_weight, t_min, t_max, out)

.. py:class:: DQuantileScaler(compression=250, distribution='uniform', min_val=1e-07, max_val=0.9999999, channels_last=True)

   Bases: :py:obj:`DBaseScaler`


   Distributed Quantile Scaler that uses the crick TDigest Cython library to compute quantiles across multiple
   datasets in parallel. The library can perform fitting, transforms, and inverse transforms across variables
   in parallel using the multiprocessing library. Multidimensional arrays are stored in shared memory across
   processes to minimize inter-process communication.

   DQuantileScaler supports

   .. attribute:: compression

      Recommended number of centroids to use.

   .. attribute:: distribution

      "uniform", "normal", or "logistic".

   .. attribute:: min_val

      Minimum value for quantile to prevent -inf results when distribution is normal or logistic.

   .. attribute:: max_val

      Maximum value for quantile to prevent inf results when distribution is normal or logistic.

   .. attribute:: channels_last

      Whether to assume the last dim or the second dim is the channel/variable dimension.
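
   The role of ``min_val`` and ``max_val`` can be seen in a small standalone sketch. It does not use the
   library or a t-digest. The helper ``quantile_to_distribution`` is hypothetical, and both the clipping
   of the uniform case and the specific inverse CDFs (standard normal, standard logistic) are assumptions
   made for illustration.

   .. code-block:: python

      import math
      from statistics import NormalDist

      def quantile_to_distribution(q, distribution="uniform",
                                   min_val=1e-7, max_val=0.9999999):
          """Map a quantile in [0, 1] onto the chosen target distribution."""
          # Clip first so that q == 0 or q == 1 cannot produce -inf or inf.
          q = min(max(q, min_val), max_val)
          if distribution == "uniform":
              return q
          if distribution == "normal":
              return NormalDist().inv_cdf(q)      # standard normal inverse CDF
          if distribution == "logistic":
              return math.log(q / (1.0 - q))      # logit, the logistic inverse CDF
          raise ValueError(f"unknown distribution: {distribution}")

      # Extreme quantiles stay finite once clipped.
      low = quantile_to_distribution(0.0, "normal")
      high = quantile_to_distribution(1.0, "normal")
      median_logistic = quantile_to_distribution(0.5, "logistic")

   Without the clip, a quantile of exactly 0 or 1 has no finite image under the normal or logistic inverse
   CDF.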


   .. py:attribute:: compression
      :value: 250


   .. py:attribute:: distribution
      :value: 'uniform'


   .. py:attribute:: min_val
      :value: 1e-07


   .. py:attribute:: max_val
      :value: 0.9999999


   .. py:attribute:: centroids_
      :value: None


   .. py:attribute:: size_
      :value: None


   .. py:attribute:: min_
      :value: None


   .. py:attribute:: max_
      :value: None


   .. py:method:: td_objs_to_attributes(td_objs)


   .. py:method:: attributes_to_td_objs()


   .. py:method:: fit(x, weight=None)


   .. py:method:: transform(x, channels_last=None, pool=None)


   .. py:method:: fit_transform(x, channels_last=None, weight=None, pool=None)


   .. py:method:: inverse_transform(x, channels_last=None, pool=None)


   .. py:method:: __add__(other)