Sisyphe.core.sisypheSheet

External packages/modules

class Sisyphe.core.sisypheSheet.SisypheSheet(*args, **kwargs)

Description

Class derived from the pandas DataFrame class, adding I/O methods (xml, matfile) to it.

Inheritance

DataFrame -> SisypheSheet

Creation: 10/04/2023 Last revision: 13/12/2024
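Since SisypheSheet inherits from DataFrame, it can presumably be constructed and sliced like one. A minimal sketch of the DataFrame-subclassing pattern such a class relies on (MySheet is a hypothetical stand-in, not the PySisyphe implementation):

```python
import pandas as pd

# Hypothetical sketch of the DataFrame-subclass pattern SisypheSheet uses;
# the real class additionally provides xml/matfile IO methods.
class MySheet(pd.DataFrame):
    # Overriding _constructor makes pandas operations (slicing, copy, ...)
    # return the subclass instead of a plain DataFrame.
    @property
    def _constructor(self):
        return MySheet

sheet = MySheet({"col 1": [1.0, 2.0], "col 2": [3.0, 4.0]})
subset = sheet[["col 1"]]
print(type(subset).__name__)  # MySheet
```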

createXML(doc: Document, currentnode: Element) → None

Write the current SisypheSheet instance attributes to an xml document instance.

Parameters

doc : minidom.Document

xml document

currentnode : minidom.Element

xml root node
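The exact element layout createXML produces is internal to PySisyphe, but the calling pattern can be sketched with a plain DataFrame written into a minidom tree (the element and attribute names below are invented for illustration, not taken from the PySisyphe source):

```python
import pandas as pd
from xml.dom import minidom

# Hypothetical sketch of serializing tabular data into a minidom Document,
# the general pattern createXML/parseXML presumably follow.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

doc = minidom.Document()
root = doc.createElement("sheet")
doc.appendChild(root)
for name in df.columns:
    # one element per column, values stored as space-separated text
    col = doc.createElement("column")
    col.setAttribute("name", name)
    col.appendChild(doc.createTextNode(" ".join(str(v) for v in df[name])))
    root.appendChild(col)

print(doc.toprettyxml(indent="  "))
```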

classmethod getFileExt() → str

Get SisypheSheet file extension.

Returns

str

‘.xsheet’

classmethod getFilterExt() → str

Get SisypheSheet filter used by QFileDialog.getOpenFileName() and QFileDialog.getSaveFileName().

Returns

str

‘PySisyphe Sheet (.xsheet)’

classmethod load(filename: str) → SisypheSheet

Create a SisypheSheet instance from a PySisyphe Sheet file (.xsheet).

Parameters

filename : str

PySisyphe Sheet file name (.xsheet)

Returns

SisypheSheet

loaded sheet

classmethod loadCSV(filename: str) → SisypheSheet

Create a SisypheSheet instance from a CSV file (.csv).

Parameters

filename : str

CSV file name (.csv)

Returns

SisypheSheet

loaded sheet

classmethod loadJSON(filename: str) → SisypheSheet

Create a SisypheSheet instance from a JSON file (.json).

Parameters

filename : str

JSON file name (.json)

Returns

SisypheSheet

loaded sheet

classmethod loadTXT(filename: str, sep: str = '|') → SisypheSheet

Create a SisypheSheet instance from a text file (.txt).

Parameters

filename : str

text file name (.txt)

sep : str

optional (default ‘|’), char separator between values

Returns

SisypheSheet

loaded sheet
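Assuming loadTXT wraps the pandas CSV reader (an assumption, not confirmed by the PySisyphe source), the equivalent pandas call for a ‘|’-separated text file would be:

```python
import io
import pandas as pd

# Sketch of the pandas call loadTXT presumably wraps: read_csv with a
# '|' separator (io.StringIO stands in for a .txt file on disk).
text = "col 1|col 2\n1|3\n2|4\n"
df = pd.read_csv(io.StringIO(text), sep="|")
print(df.shape)  # (2, 2)
```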

classmethod loadXLSX(filename: str) → SisypheSheet

Create a SisypheSheet instance from an Excel file (.xlsx).

Parameters

filename : str

Excel file name (.xlsx)

Returns

SisypheSheet

loaded sheet

classmethod parseXML(doc: Document) → SisypheSheet

Create a SisypheSheet instance from xml document.

Parameters

doc : minidom.Document

xml document

Returns

SisypheSheet

new sheet

classmethod parseXMLNode(currentnode: Element) → SisypheSheet

Create a SisypheSheet instance from xml node.

Parameters

currentnode : minidom.Element

xml node

Returns

SisypheSheet

new sheet

save(filename: str) → None

Save the current SisypheSheet instance to a PySisyphe Sheet file (.xsheet).

Parameters

filename : str

PySisyphe Sheet file name (.xsheet)

saveCSV(filename: str) → None

Save the current SisypheSheet instance to a CSV file (.csv).

Parameters

filename : str

CSV file name (.csv)

saveJSON(path_or_buf: FilePath | WriteBuffer[bytes] | WriteBuffer[str] | None = None, *, orient: Literal['split', 'records', 'index', 'table', 'columns', 'values'] | None = None, date_format: str | None = None, double_precision: int = 10, force_ascii: bool_t = True, date_unit: TimeUnit = 'ms', default_handler: Callable[[Any], JSONSerializable] | None = None, lines: bool_t = False, compression: CompressionOptions = 'infer', index: bool_t | None = None, indent: int | None = None, storage_options: StorageOptions | None = None, mode: Literal['a', 'w'] = 'w') → str | None

Convert the object to a JSON string.

Note NaN’s and None will be converted to null and datetime objects will be converted to UNIX timestamps.

Parameters

path_or_buf : str, path object, file-like object, or None, default None

String, path object (implementing os.PathLike[str]), or file-like object implementing a write() function. If None, the result is returned as a string.

orient : str

Indication of expected JSON string format.

  • Series:

    • default is ‘index’

    • allowed values are: {‘split’, ‘records’, ‘index’, ‘table’}.

  • DataFrame:

    • default is ‘columns’

    • allowed values are: {‘split’, ‘records’, ‘index’, ‘columns’, ‘values’, ‘table’}.

  • The format of the JSON string:

    • ‘split’ : dict like {‘index’ -> [index], ‘columns’ -> [columns], ‘data’ -> [values]}

    • ‘records’ : list like [{column -> value}, … , {column -> value}]

    • ‘index’ : dict like {index -> {column -> value}}

    • ‘columns’ : dict like {column -> {index -> value}}

    • ‘values’ : just the values array

    • ‘table’ : dict like {‘schema’: {schema}, ‘data’: {data}}

    Describing the data, where data component is like orient='records'.

date_format : {None, ‘epoch’, ‘iso’}

Type of date conversion. ‘epoch’ = epoch milliseconds, ‘iso’ = ISO8601. The default depends on the orient. For orient='table', the default is ‘iso’. For all other orients, the default is ‘epoch’.

double_precision : int, default 10

The number of decimal places to use when encoding floating point values. The possible maximal value is 15. Passing double_precision greater than 15 will raise a ValueError.

force_ascii : bool, default True

Force encoded string to be ASCII.

date_unit : str, default ‘ms’ (milliseconds)

The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’, ‘ns’ for second, millisecond, microsecond, and nanosecond respectively.

default_handler : callable, default None

Handler to call if object cannot otherwise be converted to a suitable format for JSON. Should receive a single argument which is the object to convert and return a serialisable object.

lines : bool, default False

If ‘orient’ is ‘records’ write out line-delimited json format. Will throw ValueError if incorrect ‘orient’ since others are not list-like.

compression : str or dict, default ‘infer’

For on-the-fly compression of the output data. If ‘infer’ and ‘path_or_buf’ is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’ (otherwise no compression). Set to None for no compression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or tarfile.TarFile, respectively. As an example, the following could be passed for faster compression and to create a reproducible gzip archive: compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}.

Added in version 1.5.0: Added support for .tar files.

Changed in version 1.4.0: Zstandard support.

index : bool or None, default None

The index is only used when ‘orient’ is ‘split’, ‘index’, ‘column’, or ‘table’. Of these, ‘index’ and ‘column’ do not support index=False.

indent : int, optional

Length of whitespace used to indent each record.

storage_options : dict, optional

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details and examples on storage options.

mode : str, default ‘w’ (writing)

Specify the IO mode for output when supplying a path_or_buf. Accepted args are ‘w’ (writing) and ‘a’ (append) only. mode='a' is only supported when lines is True and orient is ‘records’.

Returns

None or str

If path_or_buf is None, returns the resulting json format as a string. Otherwise returns None.

See Also

read_json : Convert a JSON string to pandas object.

Notes

The behavior of indent=0 varies from the stdlib, which does not indent the output but does insert newlines. Currently, indent=0 and the default indent=None are equivalent in pandas, though this may change in a future release.

orient='table' contains a ‘pandas_version’ field under ‘schema’. This stores the version of pandas used in the latest revision of the schema.

Examples

>>> from json import loads, dumps
>>> df = pd.DataFrame(
...     [["a", "b"], ["c", "d"]],
...     index=["row 1", "row 2"],
...     columns=["col 1", "col 2"],
... )
>>> result = df.to_json(orient="split")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)  
{
    "columns": [
        "col 1",
        "col 2"
    ],
    "index": [
        "row 1",
        "row 2"
    ],
    "data": [
        [
            "a",
            "b"
        ],
        [
            "c",
            "d"
        ]
    ]
}

Encoding/decoding a DataFrame using 'records' formatted JSON. Note that index labels are not preserved with this encoding.

>>> result = df.to_json(orient="records")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)  
[
    {
        "col 1": "a",
        "col 2": "b"
    },
    {
        "col 1": "c",
        "col 2": "d"
    }
]

Encoding/decoding a DataFrame using 'index' formatted JSON:

>>> result = df.to_json(orient="index")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)  
{
    "row 1": {
        "col 1": "a",
        "col 2": "b"
    },
    "row 2": {
        "col 1": "c",
        "col 2": "d"
    }
}

Encoding/decoding a DataFrame using 'columns' formatted JSON:

>>> result = df.to_json(orient="columns")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)  
{
    "col 1": {
        "row 1": "a",
        "row 2": "c"
    },
    "col 2": {
        "row 1": "b",
        "row 2": "d"
    }
}

Encoding/decoding a DataFrame using 'values' formatted JSON:

>>> result = df.to_json(orient="values")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)  
[
    [
        "a",
        "b"
    ],
    [
        "c",
        "d"
    ]
]

Encoding with Table Schema:

>>> result = df.to_json(orient="table")
>>> parsed = loads(result)
>>> dumps(parsed, indent=4)  
{
    "schema": {
        "fields": [
            {
                "name": "index",
                "type": "string"
            },
            {
                "name": "col 1",
                "type": "string"
            },
            {
                "name": "col 2",
                "type": "string"
            }
        ],
        "primaryKey": [
            "index"
        ],
        "pandas_version": "1.4.0"
    },
    "data": [
        {
            "index": "row 1",
            "col 1": "a",
            "col 2": "b"
        },
        {
            "index": "row 2",
            "col 1": "c",
            "col 2": "d"
        }
    ]
}
saveLATEX(buf: FilePath | WriteBuffer[str] | None = None, *, columns: Sequence[Hashable] | None = None, header: bool_t | SequenceNotStr[str] = True, index: bool_t = True, na_rep: str = 'NaN', formatters: FormattersType | None = None, float_format: FloatFormatType | None = None, sparsify: bool_t | None = None, index_names: bool_t = True, bold_rows: bool_t = False, column_format: str | None = None, longtable: bool_t | None = None, escape: bool_t | None = None, encoding: str | None = None, decimal: str = '.', multicolumn: bool_t | None = None, multicolumn_format: str | None = None, multirow: bool_t | None = None, caption: str | tuple[str, str] | None = None, label: str | None = None, position: str | None = None) → str | None

Render object to a LaTeX tabular, longtable, or nested table.

Requires \usepackage{booktabs}. The output can be copy/pasted into a main LaTeX document or read from an external file with \input{table.tex}.

Changed in version 2.0.0: Refactored to use the Styler implementation via jinja2 templating.

Parameters

buf : str, Path or StringIO-like, optional, default None

Buffer to write to. If None, the output is returned as a string.

columns : list of label, optional

The subset of columns to write. Writes all columns by default.

header : bool or list of str, default True

Write out the column names. If a list of strings is given, it is assumed to be aliases for the column names.

index : bool, default True

Write row names (index).

na_rep : str, default ‘NaN’

Missing data representation.

formatters : list of functions or dict of {str: function}, optional

Formatter functions to apply to columns’ elements by position or name. The result of each function must be a unicode string. List must be of length equal to the number of columns.

float_format : one-parameter function or str, optional, default None

Formatter for floating point numbers. For example float_format="%.2f" and float_format="{:0.2f}".format will both result in 0.1234 being formatted as 0.12.

sparsify : bool, optional

Set to False for a DataFrame with a hierarchical index to print every multiindex key at each row. By default, the value will be read from the config module.

index_names : bool, default True

Prints the names of the indexes.

bold_rows : bool, default False

Make the row labels bold in the output.

column_format : str, optional

The columns format as specified in LaTeX table format e.g. ‘rcl’ for 3 columns. By default, ‘l’ will be used for all columns except columns of numbers, which default to ‘r’.

longtable : bool, optional

Use a longtable environment instead of tabular. Requires adding a \usepackage{longtable} to your LaTeX preamble. By default, the value will be read from the pandas config module, and set to True if the option styler.latex.environment is “longtable”.

Changed in version 2.0.0: The pandas option affecting this argument has changed.

escape : bool, optional

By default, the value will be read from the pandas config module and set to True if the option styler.format.escape is “latex”. When set to False prevents from escaping latex special characters in column names.

Changed in version 2.0.0: The pandas option affecting this argument has changed, as has the default value to False.

encoding : str, optional

A string representing the encoding to use in the output file, defaults to ‘utf-8’.

decimal : str, default ‘.’

Character recognized as decimal separator, e.g. ‘,’ in Europe.

multicolumn : bool, default True

Use multicolumn to enhance MultiIndex columns. The default will be read from the config module, and is set as the option styler.sparse.columns.

Changed in version 2.0.0: The pandas option affecting this argument has changed.

multicolumn_format : str, default ‘r’

The alignment for multicolumns, similar to column_format. The default will be read from the config module, and is set as the option styler.latex.multicol_align.

Changed in version 2.0.0: The pandas option affecting this argument has changed, as has the default value to “r”.

multirow : bool, default True

Use multirow to enhance MultiIndex rows. Requires adding a \usepackage{multirow} to your LaTeX preamble. Will print centered labels (instead of top-aligned) across the contained rows, separating groups via clines. The default will be read from the pandas config module, and is set as the option styler.sparse.index.

Changed in version 2.0.0: The pandas option affecting this argument has changed, as has the default value to True.

caption : str or tuple, optional

Tuple (full_caption, short_caption), which results in \caption[short_caption]{full_caption}; if a single string is passed, no short caption will be set.

label : str, optional

The LaTeX label to be placed inside \label{} in the output. This is used with \ref{} in the main .tex file.

position : str, optional

The LaTeX positional argument for tables, to be placed after \begin{} in the output.

Returns

str or None

If buf is None, returns the result as a string. Otherwise returns None.

See Also

io.formats.style.Styler.to_latex : Render a DataFrame to LaTeX with conditional formatting.

DataFrame.to_string : Render a DataFrame to a console-friendly tabular output.

DataFrame.to_html : Render a DataFrame as an HTML table.

Notes

As of v2.0.0 this method has changed to use the Styler implementation as part of Styler.to_latex() via jinja2 templating. This means that jinja2 is a requirement, and needs to be installed, for this method to function. It is advised that users switch to using Styler, since that implementation is more frequently updated and contains much more flexibility with the output.

Examples

Convert a general DataFrame to LaTeX with formatting:

>>> df = pd.DataFrame(dict(name=['Raphael', 'Donatello'],
...                        age=[26, 45],
...                        height=[181.23, 177.65]))
>>> print(df.to_latex(index=False,
...                   formatters={"name": str.upper},
...                   float_format="{:.1f}".format,
... ))  
\begin{tabular}{lrr}
\toprule
name & age & height \\
\midrule
RAPHAEL & 26 & 181.2 \\
DONATELLO & 45 & 177.7 \\
\bottomrule
\end{tabular}
saveMATFILE(filename: str) → None

Save the current SisypheSheet instance to a Matlab file (.mat).

Parameters

filename : str

Matlab file name (.mat)

saveNPY(filename: str) → None

Save the current SisypheSheet instance to a Numpy binary file (.npy).

Parameters

filename : str

Numpy binary file name (.npy)
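Assuming saveNPY stores the sheet's values as a plain ndarray (an assumption; row and column labels would then not be preserved), the underlying NumPy roundtrip looks like:

```python
import os
import tempfile
import numpy as np
import pandas as pd

# Sketch of the numpy calls saveNPY presumably performs: the values go
# out as an ndarray, so index and column labels are lost on reload.
df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})

path = os.path.join(tempfile.mkdtemp(), "sheet.npy")
np.save(path, df.to_numpy())
arr = np.load(path)
print(arr.shape)  # (2, 2)
```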

saveNPZ(filename: str) → None

Save the current SisypheSheet instance to a Numpy binary compressed file (.npz).

Parameters

filename : str

Numpy binary compressed file name (.npz)

saveTXT(filename: str, sep: str = '|') → None

Save the current SisypheSheet instance to a text file (.txt).

Parameters

filename : str

text file name (.txt)

sep : str

optional (default ‘|’), char separator between values
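Assuming saveTXT delegates to the pandas CSV writer (an assumption, not confirmed by the PySisyphe source), the equivalent pandas call for a ‘|’-separated text file would be:

```python
import io
import pandas as pd

# Sketch of the pandas call saveTXT presumably delegates to: to_csv with
# sep='|' (io.StringIO stands in for a .txt file on disk).
df = pd.DataFrame({"col 1": [1, 2], "col 2": [3, 4]})
buf = io.StringIO()
df.to_csv(buf, sep="|", index=False)
print(buf.getvalue())
```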

saveXLSX(excel_writer: FilePath | WriteExcelBuffer | ExcelWriter, *, sheet_name: str = 'Sheet1', na_rep: str = '', float_format: str | None = None, columns: Sequence[Hashable] | None = None, header: Sequence[Hashable] | bool_t = True, index: bool_t = True, index_label: IndexLabel | None = None, startrow: int = 0, startcol: int = 0, engine: Literal['openpyxl', 'xlsxwriter'] | None = None, merge_cells: bool_t = True, inf_rep: str = 'inf', freeze_panes: tuple[int, int] | None = None, storage_options: StorageOptions | None = None, engine_kwargs: dict[str, Any] | None = None) → None

Write object to an Excel sheet.

To write a single object to an Excel .xlsx file it is only necessary to specify a target file name. To write to multiple sheets it is necessary to create an ExcelWriter object with a target file name, and specify a sheet in the file to write to.

Multiple sheets may be written to by specifying unique sheet_name. With all data written to the file it is necessary to save the changes. Note that creating an ExcelWriter object with a file name that already exists will result in the contents of the existing file being erased.

Parameters

excel_writer : path-like, file-like, or ExcelWriter object

File path or existing ExcelWriter.

sheet_name : str, default ‘Sheet1’

Name of sheet which will contain DataFrame.

na_rep : str, default ‘’

Missing data representation.

float_format : str, optional

Format string for floating point numbers. For example float_format="%.2f" will format 0.1234 to 0.12.

columns : sequence or list of str, optional

Columns to write.

header : bool or list of str, default True

Write out the column names. If a list of string is given it is assumed to be aliases for the column names.

index : bool, default True

Write row names (index).

index_label : str or sequence, optional

Column label for index column(s) if desired. If not specified, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.

startrow : int, default 0

Upper left cell row to dump data frame.

startcol : int, default 0

Upper left cell column to dump data frame.

engine : str, optional

Write engine to use, ‘openpyxl’ or ‘xlsxwriter’. You can also set this via the options io.excel.xlsx.writer or io.excel.xlsm.writer.

merge_cells : bool, default True

Write MultiIndex and Hierarchical Rows as merged cells.

inf_rep : str, default ‘inf’

Representation for infinity (there is no native representation for infinity in Excel).

freeze_panes : tuple of int (length 2), optional

Specifies the one-based bottommost row and rightmost column that is to be frozen.

storage_options : dict, optional

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details and examples on storage options.

Added in version 1.2.0.

engine_kwargs : dict, optional

Arbitrary keyword arguments passed to excel engine.

See Also

to_csv : Write DataFrame to a comma-separated values (csv) file.

ExcelWriter : Class for writing DataFrame objects into excel sheets.

read_excel : Read an Excel file into a pandas DataFrame.

read_csv : Read a comma-separated values (csv) file into DataFrame.

io.formats.style.Styler.to_excel : Add styles to Excel sheet.

Notes

For compatibility with to_csv(), to_excel serializes lists and dicts to strings before writing.

Once a workbook has been saved it is not possible to write further data without rewriting the whole workbook.

Examples

Create, write to and save a workbook:

>>> df1 = pd.DataFrame([['a', 'b'], ['c', 'd']],
...                    index=['row 1', 'row 2'],
...                    columns=['col 1', 'col 2'])
>>> df1.to_excel("output.xlsx")  

To specify the sheet name:

>>> df1.to_excel("output.xlsx",
...              sheet_name='Sheet_name_1')  

If you wish to write to more than one sheet in the workbook, it is necessary to specify an ExcelWriter object:

>>> df2 = df1.copy()
>>> with pd.ExcelWriter('output.xlsx') as writer:  
...     df1.to_excel(writer, sheet_name='Sheet_name_1')
...     df2.to_excel(writer, sheet_name='Sheet_name_2')

ExcelWriter can also be used to append to an existing Excel file:

>>> with pd.ExcelWriter('output.xlsx',
...                     mode='a') as writer:  
...     df1.to_excel(writer, sheet_name='Sheet_name_3')

To set the library that is used to write the Excel file, you can pass the engine keyword (the default engine is automatically chosen depending on the file extension):

>>> df1.to_excel('output1.xlsx', engine='xlsxwriter')  
toClipboard(*, excel: bool = True, sep: str | None = None, **kwargs) → None

Copy object to the system clipboard.

Write a text representation of object to the system clipboard. This can be pasted into Excel, for example.

Parameters

excel : bool, default True

Produce output in a csv format for easy pasting into excel.

  • True, use the provided separator for csv pasting.

  • False, write a string representation of the object to the clipboard.

sep : str, default '\t'

Field delimiter.

**kwargs

These parameters will be passed to DataFrame.to_csv.

See Also

DataFrame.to_csv : Write a DataFrame to a comma-separated values (csv) file.

read_clipboard : Read text from clipboard and pass to read_csv.

Notes

Requirements for your platform.

  • Linux : xclip, or xsel (with PyQt4 modules)

  • Windows : none

  • macOS : none

This method uses the processes developed for the package pyperclip. A solution to render any output string format is given in the examples.

Examples

Copy the contents of a DataFrame to the clipboard.

>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['A', 'B', 'C'])
>>> df.to_clipboard(sep=',')  
... # Wrote the following to the system clipboard:
... # ,A,B,C
... # 0,1,2,3
... # 1,4,5,6

We can omit the index by passing the keyword index and setting it to false.

>>> df.to_clipboard(sep=',', index=False)  
... # Wrote the following to the system clipboard:
... # A,B,C
... # 1,2,3
... # 4,5,6

Using the original pyperclip package for any string output format:

>>> import pyperclip
>>> html = df.style.to_html()
>>> pyperclip.copy(html)