bridgescaler.distributed_tensor
===============================

.. py:module:: bridgescaler.distributed_tensor


Attributes
----------

.. autoapisummary::

   bridgescaler.distributed_tensor.CENTROID_DTYPE


Classes
-------

.. autoapisummary::

   bridgescaler.distributed_tensor.DBaseScalerTensor
   bridgescaler.distributed_tensor.DStandardScalerTensor
   bridgescaler.distributed_tensor.DMinMaxScalerTensor
   bridgescaler.distributed_tensor.DQuantileScalerTensor


Functions
---------

.. autoapisummary::

   bridgescaler.distributed_tensor.fit_variable_tensor
   bridgescaler.distributed_tensor.transform_variable_tensor
   bridgescaler.distributed_tensor.inv_transform_variable_tensor
   bridgescaler.distributed_tensor.tdigest_cdf_tensor
   bridgescaler.distributed_tensor.tdigest_quantile_tensor


Module Contents
---------------

.. py:data:: CENTROID_DTYPE

.. py:class:: DBaseScalerTensor(channels_last=True)

   Base distributed scaler class for torch.Tensor. Used only to store attributes and methods
   shared across all distributed scaler subclasses.


   .. py:attribute:: x_columns_
      :value: None


   .. py:attribute:: _fit
      :value: False


   .. py:attribute:: channels_last
      :value: True


   .. py:method:: is_fit()


   .. py:method:: extract_x_columns(x, channels_last=True)

      Extract the variable names from input x.

      The variable names are expected to be stored in the `variable_names`
      attribute of the torch.Tensor. If the attribute is missing, a warning is
      issued to notify the user that alignment validation will be limited.

      :param x: The input tensor containing data and optionally the
                `variable_names` attribute.
      :type x: torch.Tensor
      :param channels_last: If True, assume the variable (channel) dimension is the
                            last dimension of the tensor. If False, assume it is the
                            second dimension.
      :type channels_last: bool

      :returns: Variable names if available; otherwise, integer indices generated
                from the length of the variable/channel dimension.
      :rtype: x_columns (list[str] | list[int])

      :raises TypeError: If `x` is not a torch.Tensor or if `variable_names`
          is not a list.
      :raises ValueError: If `variable_names` contains duplicate entries.
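
      The fallback behavior described above can be sketched in plain Python. This is a
      minimal illustration, not the library's implementation: `extract_columns_sketch`
      and the `SimpleNamespace` stand-ins are hypothetical, and the check that `x` is a
      torch.Tensor is omitted so that the example runs without torch.

      .. code-block:: python

         import warnings
         from types import SimpleNamespace


         def extract_columns_sketch(x, channels_last=True):
             """Return variable names if present, else integer channel indices."""
             names = getattr(x, "variable_names", None)
             if names is None:
                 # No names attached: warn and fall back to positional indices.
                 warnings.warn("variable_names is missing; using integer indices.")
                 channel_dim = -1 if channels_last else 1
                 return list(range(x.shape[channel_dim]))
             if not isinstance(names, list):
                 raise TypeError("variable_names must be a list")
             if len(set(names)) != len(names):
                 raise ValueError("variable_names contains duplicate entries")
             return names


         # Hypothetical stand-ins for tensors; only `shape` and `variable_names` matter here.
         named = SimpleNamespace(shape=(8, 32, 32, 3), variable_names=["t2m", "u10", "v10"])
         unnamed = SimpleNamespace(shape=(8, 3, 32, 32))

         named_columns = extract_columns_sketch(named)
         with warnings.catch_warnings():
             warnings.simplefilter("ignore")
             # Channels-first layout: the channel dimension is index 1.
             unnamed_columns = extract_columns_sketch(unnamed, channels_last=False)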


   .. py:method:: extract_array(x)
      :staticmethod:


   .. py:method:: get_column_order(x_in_columns)

      Get the indices of the scaler columns whose names match the variables (columns) of the input tensor x.
      This lets users pass a torch.Tensor to transform or inverse_transform with fewer variables than the
      original scaler, or with variables in a different order, and still have the input transformed correctly.

      :param x_in_columns: List of input variable names.
      :type x_in_columns: list

      :returns: Integer indices into the scaler's columns, one per input variable, in the order the variables appear in x.
      :rtype: x_in_col_indices (list)
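
      The lookup can be pictured as a name-to-position mapping. The sketch below is an
      assumption about the general idea, not the library's code: `column_order_sketch`
      is a hypothetical helper, and raising `KeyError` for unknown names is a choice made
      for this example only.

      .. code-block:: python

         def column_order_sketch(scaler_columns, x_in_columns):
             """Map each input variable name to its position in the fitted scaler."""
             position = {name: i for i, name in enumerate(scaler_columns)}
             missing = [name for name in x_in_columns if name not in position]
             if missing:
                 # Assumed behavior for this sketch; the library may handle this differently.
                 raise KeyError(f"variables not in scaler: {missing}")
             return [position[name] for name in x_in_columns]


         scaler_columns = ["t2m", "u10", "v10", "sp"]
         # The input carries a subset of the fitted variables, in a different order.
         order = column_order_sketch(scaler_columns, ["sp", "t2m"])  # [3, 0]

      Indexing the fitted statistics with `order` would then select the entries that
      line up with the input's channels.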


   .. py:method:: package_transformed_x(x_transformed, x)
      :staticmethod:


      Repackage a transformed torch.Tensor into the same datatype as the original x, including
      all metadata.

      :param x_transformed: Tensor after being transformed or inverse transformed.
      :type x_transformed: torch.Tensor
      :param x: Original data.
      :type x: torch.Tensor

      :returns: The transformed data, repackaged to match the datatype and metadata of x.


   .. py:method:: set_channel_dim(channels_last=None)


   .. py:method:: process_x_for_transform(x, channels_last=None)


   .. py:method:: fit(x, weight=None)


   .. py:method:: transform(x, channels_last=None)


   .. py:method:: fit_transform(x, channels_last=None, weight=None)


   .. py:method:: inverse_transform(x, channels_last=None)


   .. py:method:: __add__(other)


   .. py:method:: subset_columns(sel_columns)


   .. py:method:: add_variables(other)


   .. py:method:: reshape_to_channels_first(stat, target)
      :staticmethod:


      Reshape 'stat' to align with the channel dimension (index 1) of 'target'.


   .. py:method:: reshape_to_channels_last(stat, target)
      :staticmethod:


      Reshape 'stat' to align with the last dimension of 'target'.
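
      In terms of shapes, aligning a per-channel statistic with a channel dimension
      usually means giving it singleton dimensions everywhere except the channel axis, so
      that it broadcasts against the target. The sketch below computes only that shape,
      for both this method and `reshape_to_channels_first`. `broadcast_shape` is a
      hypothetical helper; how the library performs the reshape is not shown in this
      reference.

      .. code-block:: python

         def broadcast_shape(n_channels, target_ndim, channels_last):
             """Shape with singleton dims everywhere except the channel axis."""
             shape = [1] * target_ndim
             shape[-1 if channels_last else 1] = n_channels
             return tuple(shape)


         # A statistic with 3 channels, aligned to a 4-dimensional target.
         first_shape = broadcast_shape(3, 4, channels_last=False)  # (1, 3, 1, 1)
         last_shape = broadcast_shape(3, 4, channels_last=True)    # (1, 1, 1, 3)

      With torch, a shape like this could be passed to `stat.reshape(...)` before
      arithmetic with the target tensor.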


.. py:class:: DStandardScalerTensor(channels_last=True)

   Bases: :py:obj:`DBaseScalerTensor`


   Distributed version of StandardScaler. It can be fit map-reduce style: fit a scaler on each individual
   data file, return the fitted objects, and then sum them together to obtain a scaler that represents the
   full dataset. The scaler supports torch.Tensor input and returns a transformed tensor.
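
   The reduction step relies on the fact that counts, means, and variances from separate
   files can be merged exactly. The sketch below shows one standard way to do this for a
   single variable using the pairwise update formula. It is an illustration under stated
   assumptions, not the library's code: `summarize` and `combine` are hypothetical
   helpers, and population variance (dividing by n) is assumed.

   .. code-block:: python

      def summarize(values):
          """Return (count, mean, population variance) for one file's values."""
          n = len(values)
          mean = sum(values) / n
          var = sum((v - mean) ** 2 for v in values) / n
          return n, mean, var


      def combine(a, b):
          """Merge two (count, mean, variance) summaries without revisiting the data."""
          n_a, mean_a, var_a = a
          n_b, mean_b, var_b = b
          n = n_a + n_b
          delta = mean_b - mean_a
          mean = mean_a + delta * n_b / n
          # Sum of squared deviations about the merged mean.
          m2 = var_a * n_a + var_b * n_b + delta ** 2 * n_a * n_b / n
          return n, mean, m2 / n


      file_1 = [1.0, 2.0, 3.0]
      file_2 = [10.0, 20.0]
      merged = combine(summarize(file_1), summarize(file_2))
      direct = summarize(file_1 + file_2)  # same statistics, computed in one pass

   With the scalers themselves, the description above and the `__add__` method suggest
   that the reduction is written as a sum of fitted scaler objects.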


   .. py:attribute:: mean_x_
      :value: None


   .. py:attribute:: n_
      :value: 0


   .. py:attribute:: var_x_
      :value: None


   .. py:method:: fit(x, weight=None)


   .. py:method:: transform(x, channels_last=None)

      Transform the input data from its original form to standard scaled form. If your input data has a
      different dimension order than the data used to fit the scaler, use the channels_last keyword argument
      to specify whether the new data are `channels_last` (True) or `channels_first` (False).

      :param x: Input data.
      :type x: torch.Tensor
      :param channels_last: Override the default channels_last parameter of the scaler.
      :type channels_last: bool or None

      :returns: Transformed data in the same shape and type as x.
      :rtype: x_transformed (torch.Tensor)


   .. py:method:: inverse_transform(x, channels_last=None)


   .. py:method:: get_scales(x_col_order=slice(None))


   .. py:method:: __add__(other)


.. py:class:: DMinMaxScalerTensor(channels_last=True)

   Bases: :py:obj:`DBaseScalerTensor`


   Distributed MinMaxScaler. It calculates the min and max of each variable on separate datasets in parallel,
   then combines the mins and maxes in a reduction step. The scaler supports torch.Tensor input and returns a
   transformed tensor in the same form as the original, with variable/column names preserved.
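
   The parallel fit followed by a reduction can be sketched for a single variable as
   follows. The helper names are hypothetical, and scaling to the range [0, 1] is the
   usual min-max convention assumed here rather than something stated in this reference.

   .. code-block:: python

      def fit_min_max(values):
          """Per-shard fit: record the min and max seen in this shard."""
          return min(values), max(values)


      def combine(a, b):
          """Reduction step: the global min is the min of mins, likewise for max."""
          return min(a[0], b[0]), max(a[1], b[1])


      def transform(values, lo, hi):
          """Scale values to [0, 1] using the combined min and max (assumed range)."""
          return [(v - lo) / (hi - lo) for v in values]


      def inverse_transform(scaled, lo, hi):
          """Undo the scaling."""
          return [s * (hi - lo) + lo for s in scaled]


      shard_1 = [2.0, 4.0]
      shard_2 = [0.0, 10.0]
      lo, hi = combine(fit_min_max(shard_1), fit_min_max(shard_2))
      scaled = transform(shard_1, lo, hi)
      restored = inverse_transform(scaled, lo, hi)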


   .. py:attribute:: max_x_
      :value: None


   .. py:attribute:: min_x_
      :value: None


   .. py:method:: fit(x, weight=None)


   .. py:method:: transform(x, channels_last=None)


   .. py:method:: inverse_transform(x, channels_last=None)


   .. py:method:: get_scales(x_col_order=slice(None))


   .. py:method:: __add__(other)


.. py:function:: fit_variable_tensor(var_index, xv, compression=None, channels_last=None)

.. py:function:: transform_variable_tensor(cent_mean, cent_weight, t_min, t_max, xv, min_val=1e-06, max_val=0.9999999, distribution='normal')

.. py:function:: inv_transform_variable_tensor(cent_mean, cent_weight, t_min, t_max, xv, distribution='normal')

.. py:function:: tdigest_cdf_tensor(xv, cent_mean, cent_weight, t_min, t_max)

.. py:function:: tdigest_quantile_tensor(qv, cent_mean, cent_weight, t_min, t_max)

.. py:class:: DQuantileScalerTensor(compression=250, distribution='uniform', min_val=1e-07, max_val=0.9999999, channels_last=True)

   Bases: :py:obj:`DBaseScalerTensor`


   Distributed Quantile Scaler for tensors that uses the crick TDigest Cython library to compute quantiles across multiple
   datasets in parallel. The library can perform fitting, transforms, and inverse transforms.

   DQuantileScalerTensor supports torch.Tensor input.

   .. attribute:: compression

      Recommended number of centroids to use.

   .. attribute:: distribution

      "uniform", "normal", or "logistic".

   .. attribute:: min_val

      Minimum value for quantile to prevent -inf results when distribution is normal or logistic.

   .. attribute:: max_val

      Maximum value for quantile to prevent inf results when distribution is normal or logistic.

   .. attribute:: channels_last

      Whether the channel/variable dimension is assumed to be the last dimension (True) or the second dimension (False).
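
   The roles of `distribution`, `min_val`, and `max_val` can be illustrated with a small
   sketch that maps a quantile in [0, 1] onto the chosen target distribution. This is an
   assumption about the general technique, not the library's implementation:
   `quantile_to_distribution` is a hypothetical helper, it uses only the Python standard
   library, and it clips the quantile in every case, including "uniform".

   .. code-block:: python

      import math
      from statistics import NormalDist


      def quantile_to_distribution(q, distribution="uniform",
                                   min_val=1e-7, max_val=0.9999999):
          """Map a quantile q in [0, 1] onto the target distribution."""
          # Clip first so the inverse CDFs below never see exactly 0 or 1.
          q = min(max(q, min_val), max_val)
          if distribution == "uniform":
              return q
          if distribution == "normal":
              return NormalDist().inv_cdf(q)  # standard normal inverse CDF
          if distribution == "logistic":
              return math.log(q / (1.0 - q))  # logit, the logistic inverse CDF
          raise ValueError(f"unknown distribution: {distribution}")


      # Without clipping, q = 0 and q = 1 would map to -inf and inf.
      low_normal = quantile_to_distribution(0.0, "normal")
      high_logistic = quantile_to_distribution(1.0, "logistic")
      median_normal = quantile_to_distribution(0.5, "normal")

   The extreme inputs map to large but finite values, which is the purpose of the two
   bounds described above.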


   .. py:attribute:: compression
      :value: 250


   .. py:attribute:: distribution
      :value: 'uniform'


   .. py:attribute:: min_val
      :value: 1e-07


   .. py:attribute:: max_val
      :value: 0.9999999


   .. py:attribute:: centroids_
      :value: None


   .. py:attribute:: size_
      :value: None


   .. py:attribute:: min_
      :value: None


   .. py:attribute:: max_
      :value: None


   .. py:attribute:: centroids_mean_tensor
      :value: None


   .. py:attribute:: centroids_weight_tensor
      :value: None


   .. py:attribute:: min_tensor
      :value: None


   .. py:attribute:: max_tensor
      :value: None


   .. py:method:: td_objs_to_attributes(td_objs)


   .. py:method:: attributes_to_td_objs()


   .. py:method:: tensorize_attributes()


   .. py:method:: fit(x, weight=None)


   .. py:method:: transform(x, channels_last=None)


   .. py:method:: fit_transform(x, channels_last=None, weight=None)


   .. py:method:: inverse_transform(x, channels_last=None)


   .. py:method:: __add__(other)