diff --git a/docs/contributors.md b/docs/contributors.md index e44d1d7..0c25089 100644 --- a/docs/contributors.md +++ b/docs/contributors.md @@ -21,3 +21,4 @@ The aim of this page is to include a list of the contributors to our project bot - [neutropolis](https://github.com/neutropolis) - [nipsn](https://github.com/nipsn) - [marcosvm13](https://github.com/marcosvm13) +- [tortolavivo23](https://github.com/tortolavivo23) diff --git a/docs/getting-started/installing.md b/docs/getting-started/installing.md index 55127ca..021927f 100644 --- a/docs/getting-started/installing.md +++ b/docs/getting-started/installing.md @@ -14,7 +14,7 @@ _This section explains how to install PyKX on your machine._ Before you start, make sure you have: - **Python** (versions 3.8-3.12) -- **Pip** +- **pip** Recommended: a virtual environment with packages such as [venv](https://docs.python.org/3/library/venv.html) from the standard library. diff --git a/docs/getting-started/quickstart.md b/docs/getting-started/quickstart.md index 58d4788..00aa53c 100644 --- a/docs/getting-started/quickstart.md +++ b/docs/getting-started/quickstart.md @@ -13,18 +13,17 @@ To complete the quickstart guide below you will need to have completed the follo To access PyKX and it's functionality import it within your Python code using the following syntax ```python -import pykx as kx +>>> import pykx as kx ``` The use of the shortened name `kx` is intended to provide a terse convention for interacting with methods and objects from this library. ## How to generate PyKX objects -The generation of PyKX objects is supported pricipally in two ways +The generation of PyKX objects is supported principally in two ways 1. Execution of q code to create these entities -2. Conversion of Python objects to analagous PyKX objects - +2. 
Conversion of Python objects to analogous PyKX objects ### Creation of PyKX objects using inbuilt PyKX functions @@ -52,7 +51,7 @@ x x1 ### Creation of PyKX objects from Python data types -Generation of PyKX objects from Python, Numpy, Pandas and PyArrow objects can be completed as follows using the `kx.toq` method. +Generation of PyKX objects from Python, NumPy, Pandas and PyArrow objects can be completed as follows using the `kx.toq` method. ```python >>> pylist = [10, 20, 30] @@ -60,11 +59,13 @@ Generation of PyKX objects from Python, Numpy, Pandas and PyArrow objects can be >>> qlist pykx.LongVector(pykx.q('10 20 30')) +>>> import numpy as np >>> nplist = np.arange(0, 10, 2) >>> qlist = kx.toq(nplist) >>> qlist pykx.LongVector(pykx.q('0 2 4 6 8')) +>>> import pandas as pd >>> df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]}) >>> df col1 col2 @@ -78,6 +79,7 @@ col1 col2 2 4 ')) +>>> import pyarrow as pa >>> patab = pa.Table.from_pandas(df) >>> patab pyarrow.Table @@ -115,8 +117,7 @@ x x1 x2 ## Interacting with PyKX Objects -PyKX objects can be interacted with in a variety of ways, through indexing using Pythonic syntax, passing PyKX objects to q/numpy functions, querying via SQL/qSQL syntax or through the use of q functionality via the context interface. Each of these is described in more depth throughout this documentation but examples of each are provided here - +PyKX objects can be interacted with in a variety of ways, through indexing using Pythonic syntax, passing PyKX objects to q/NumPy functions, querying via SQL/qSQL syntax or through the use of q functionality via the context interface. Each of these is described in more depth throughout this documentation but examples of each are provided here. * Create a PyKX list and interact with the list using indexing and slices. 
@@ -192,7 +193,7 @@ PyKX objects can be interacted with in a variety of ways, through indexing using 0.2062569 3.852387 a 0.481821 0.07970141 a ')) - ``` + ``` * Pass a PyKX object to q function @@ -211,8 +212,6 @@ PyKX objects can be interacted with in a variety of ways, through indexing using >>> qvec.apply(lambda x:x+1) pykx.LongVector(pykx.q('5 8 3 3 10 5 3 1 9 1')) ``` - - * Pass a PyKX array objects to a Numpy functions diff --git a/docs/getting-started/what_is_pykx.md b/docs/getting-started/what_is_pykx.md index 9f056b6..c487b6c 100644 --- a/docs/getting-started/what_is_pykx.md +++ b/docs/getting-started/what_is_pykx.md @@ -2,9 +2,9 @@ ## Introduction -PyKX is a Python first interface to the worlds fastest time-series database kdb+ and it's underlying vector programming language q. PyKX takes a Python first approach to integrating q/kdb+ with Python following 10+ years of integrations between these two languages. Fundamentally it provides users with the ability to efficiently query and analyze huge amounts of in-memory and on-disk time-series data. +PyKX is a Python first interface to the world's fastest time-series database kdb+ and its underlying vector programming language, q. PyKX takes a Python first approach to integrating q/kdb+ with Python following 10+ years of integrations between these two languages. Fundamentally it provides users with the ability to efficiently query and analyze huge amounts of in-memory and on-disk time-series data. -This interface exposes q as a domain-specific language (DSL) embedded within Python, taking the approach that q should principally be used for data processing and management of databases. This approach does not diminish the ability for users familiar with q or those wishing to learn more about it from making the most of advanced analytics and database management functionality rather empowers those who want to make use of the power of kdb+/q who lack this expertise to get up and running fast. 
+This interface exposes q as a domain-specific language (DSL) embedded within Python, taking the approach that q should principally be used for data processing and management of databases. This approach does not diminish the ability for users familiar with q, or those wishing to learn more about it, from making the most of its advanced analytics and database management functionality. Rather it empowers those who want to make use of the power of kdb+/q who lack this expertise to get up and running quickly. PyKX supports three principal use cases: diff --git a/docs/index.md b/docs/index.md index 7a59acc..5bdace8 100644 --- a/docs/index.md +++ b/docs/index.md @@ -2,7 +2,7 @@ ## About -PyKX is a Python-first interface for the q language and its time-series vector database kdb+. +PyKX is a Python first interface to the world's fastest time-series database kdb+ and its underlying vector programming language, q. For Python developers, PyKX unlocks the speed and power of kdb+ for data processing and storage from within your Python environment. It enables anyone with Python knowledge to apply analytics against vast amounts of data, both in-memory and on-disk, in a fraction of the time, allowing you to focus on getting the best from your data. diff --git a/docs/release-notes/changelog.md b/docs/release-notes/changelog.md index e8872cf..2b827cb 100644 --- a/docs/release-notes/changelog.md +++ b/docs/release-notes/changelog.md @@ -8,6 +8,116 @@ Currently PyKX is not compatible with Pandas 2.2.0 or above as it introduced breaking changes which cause data to be cast to the incorrect type. +## PyKX 2.5.1 + +#### Release Date + +2024-06-11 + +### Additions + +- [Pandas API](../user-guide/advanced/Pandas_API.ipynb) additions: `isnull`, `isna`, `notnull`, `notna`, `idxmax`, `idxmin`, `kurt`, `sem`. +- Addition of `filter_type`, `filter_columns`, and `custom` parameters to `QReader.csv()` to add options for CSV type guessing. 
+ + ```python + >>> import pykx as kx + >>> reader = kx.QReader(kx.q) + >>> reader.csv("myFile0.csv", filter_type = "like", filter_columns="*name", custom={"SYMMAXGR":15}) + pykx.Table(pykx.q(' + firstname lastname + ---------------------- + "Frieda" "Bollay" + "Katuscha" "Paton" + "Devina" "Reinke" + "Maurene" "Bow" + "Iseabal" "Bashemeth" + .. + ')) + ``` + +### Fixes and Improvements + +- Fix to regression in PyKX 2.5.0 where PyKX initialisation on Windows would result in a segmentation fault when using a `k4.lic` license type. +- Previously users could not make direct use of `kx.SymbolicFunction` type objects against a remote process; this has been rectified. + + === "Behaviour prior to change" + + ```python + >>> import pykx as kx + >>> kx.q('.my.func:{x+1}') + pykx.Identity(pykx.q('::')) + >>> kx.q.my.func + pykx.SymbolicFunction(pykx.q('`.my.func')) + >>> conn = kx.SyncQConnection(port=5050) + >>> conn(kx.q.my.func, 1) + ... Error Message ... + pykx.exceptions.QError: .my.func + ``` + + === "Behaviour post change" + + ```python + >>> import pykx as kx + >>> kx.q('.my.func:{x+1}') + pykx.Identity(pykx.q('::')) + >>> kx.q.my.func + pykx.SymbolicFunction(pykx.q('`.my.func')) + >>> conn = kx.SyncQConnection(port=5050) + >>> conn(kx.q.my.func, 1) + pykx.LongAtom(pykx.q('2')) + ``` + +- Previously use of the context interface for q primitive functions in licensed mode via IPC would partially run the function on the client rather than server, thus limiting usage for named entities on the server. + + === "Behaviour prior to change" + + ```python + >>> import pykx as kx + >>> conn = kx.SyncQConnection(port=5050) + >>> conn.q('tab:([]10?1f;10?1f)') + >>> conn.q.meta('tab') + ... Error Message ...
+ pykx.exceptions.QError: tab + ``` + + === "Behaviour post change" + + ```python + >>> import pykx as kx + >>> conn = kx.SyncQConnection(port=5050) + >>> conn.q('tab:([]10?1f;10?1f)') + >>> conn.q.meta('tab') + pykx.KeyedTable(pykx.q(' + c | t f a + --| ----- + x | f + x1| f + ')) + ``` + +- With the release of PyKX 2.5.0 and support of PyKX usage in paths containing spaces the context interface functionality could fail to load a requested context over IPC if PyKX was not loaded on the server. + + === "Behaviour prior to change" + + ```python + >>> import pykx as kx + >>> conn = kx.SyncQConnection(port=5050) + >>> conn.my_ctx + ... Error Message ... + ``` + + === "Behaviour post change" + + ```python + >>> import pykx as kx + >>> conn = kx.SyncQConnection(port=5050) + >>> conn.my_ctx + + ``` + +- Updated CSV analysis logic to be based on `csvutil.q` 2020.06.20. +- Fix for config value `PYKX_4_1_ENABLED` to only use 4.1 if set to `True`, `true`, or `1`. Previously any non empty value enabled 4.1. + ## PyKX 2.5.0 #### Release Date diff --git a/docs/release-notes/underq-changelog.md b/docs/release-notes/underq-changelog.md index 0659ef4..18d7c68 100644 --- a/docs/release-notes/underq-changelog.md +++ b/docs/release-notes/underq-changelog.md @@ -10,7 +10,7 @@ This changelog provides updates from PyKX 2.0.0 and above, for information relat #### Release Date -TBD +2024-05-15 ### Fixes and Improvements diff --git a/docs/roadmap.md b/docs/roadmap.md index 660fe3b..023c600 100644 --- a/docs/roadmap.md +++ b/docs/roadmap.md @@ -21,7 +21,7 @@ If you need a feature that's not included in this list please let us know by rai >>> table.select(kx.col('x1').wavg('x2')) ``` -- Addition of support for q primatives as methods off PyKX Vector and Table objects. Syntax for this will be similar to the following +- Addition of support for q primitives as methods off PyKX Vector and Table objects. 
Syntax for this will be similar to the following: ```python >>> import pykx as kx @@ -31,13 +31,7 @@ If you need a feature that's not included in this list please let us know by rai >>> vec.abs() ``` -- Performance improvements for conversions from Numpy arrays to PyKX Vector objects and vice-versa through enhanced use of C++ over Cython. -- Additions to the Pandas Like API for PyKX. - - `isnull` - - `idxmax` - - `kurt` - - `sem` - +- Performance improvements for conversions from NumPy arrays to PyKX Vector objects and vice-versa through enhanced use of C++ over Cython. - Addition of functionality for the development of streaming workflows using PyKX. - Configurable initialisation logic in the absence of a license. Thus allowing users who have their own workflows for license access to modify the instructions for their users. - Promotion of Beta functionality currently available in PyKX to full production support diff --git a/docs/user-guide/advanced/Pandas_API.ipynb b/docs/user-guide/advanced/Pandas_API.ipynb index ed3e281..8ab8c18 100644 --- a/docs/user-guide/advanced/Pandas_API.ipynb +++ b/docs/user-guide/advanced/Pandas_API.ipynb @@ -708,7 +708,7 @@ "Table.isna()\n", "```\n", "\n", - "Detects null values on a Table object.\n", + "Detects null values in a Table object.\n", "\n", "**Parameters:**\n", "\n", @@ -717,7 +717,7 @@ "\n", "| Type | Description |\n", "| :----------------: | :------------------------------------------------------------------- |\n", - "| Table | A Table with the same shape as the original but containing boolean values. 1b represents a null value was on its place and 0b represents the opposite. |" + "| Table | A Table with the same shape as the original but containing boolean values. `1b` represents a null value present in a cell, `0b` represents the opposite. 
|" ] }, { @@ -727,9 +727,23 @@ "metadata": {}, "outputs": [], "source": [ - "tab.isna()" + "tabDemo = kx.Table(data= {\n", + " 'a': [1, 0, float('nan')],\n", + " 'b': [1, 0, float('nan')],\n", + " 'c': [float('nan'), 4, 0]\n", + " })" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "d8ff16e1", + "metadata": {}, + "outputs": [], + "source": [ + "tabDemo.isna()" + ] + }, { "cell_type": "markdown", "id": "47d20b00", @@ -743,7 +757,7 @@ "\n", "Alias of Table.isna().\n", "\n", - "Detects null values on a Table object.\n", + "Detects null values in a Table object.\n", "\n", "**Parameters:**\n", "\n", @@ -752,7 +766,7 @@ "\n", "| Type | Description |\n", "| :----------------: | :------------------------------------------------------------------- |\n", - "| Table | A Table with the same shape as the original but containing boolean values. 1b represents a null value was on its place and 0b represents the opposite. |" + "| Table | A Table with the same shape as the original but containing boolean values. `1b` represents a null value present in a cell, `0b` represents the opposite. |" ] }, { @@ -762,7 +776,7 @@ "metadata": {}, "outputs": [], "source": [ - "tab.isnull()" + "tabDemo.isnull()" ] }, { @@ -787,7 +801,7 @@ "\n", "| Type | Description |\n", "| :----------------: | :------------------------------------------------------------------- |\n", - "| Table | A Table with the same shape as the original but containing boolean values. 1b represents a non null value was on its place and 0b represents the opposite. |" + "| Table | A Table with the same shape as the original but containing boolean values. `0b` represents a null value present in a cell, `1b` represents the opposite. 
|" ] }, { @@ -797,7 +811,7 @@ "metadata": {}, "outputs": [], "source": [ - "tab.notna()" + "tabDemo.notna()" ] }, { @@ -822,7 +836,7 @@ "\n", "| Type | Description |\n", "| :----------------: | :------------------------------------------------------------------- |\n", - "| Table | A Table with the same shape as the original but containing boolean values. 1b represents a non null value was on its place and 0b represents the opposite. |" + "| Table | A Table with the same shape as the original but containing boolean values. `0b` represents a null value present in a cell, `1b` represents the opposite. |" ] }, { @@ -832,12 +846,12 @@ "metadata": {}, "outputs": [], "source": [ - "tab.notnull()" + "tabDemo.notnull()" ] }, { "cell_type": "markdown", - "id": "d1c370e4", + "id": "d97d6bae", "metadata": {}, "source": [ "### Table.iloc[]\n", @@ -2261,7 +2275,7 @@ }, { "cell_type": "markdown", - "id": "d98b298c", + "id": "bc5b6dde", "metadata": {}, "source": [ "### Table.min()\n", @@ -2297,7 +2311,7 @@ "tab.min()" ] }, - { +{ "cell_type": "markdown", "id": "b52627d2", "metadata": {}, @@ -2314,7 +2328,7 @@ "\n", "| Name | Type | Description | Default |\n", "| :----------: | :--: | :------------------------------------------------------------------------------- | :-----: |\n", - "| axis | int | The axis to calculate the idxmax across 0 is columns, 1 is rows. | 0 |\n", + "| axis | int | The axis to calculate the idxmax across. 0 is columns, 1 is rows. | 0 |\n", "| skipna | bool | Ignore any null values along the axis. | True |\n", "| numeric_only | bool | Only use columns of the table that are of a numeric data type. | False |\n", "\n", @@ -2380,7 +2394,7 @@ "\n", "| Name | Type | Description | Default |\n", "| :----------: | :--: | :------------------------------------------------------------------------------- | :-----: |\n", - "| axis | int | The axis to calculate the idxmin across 0 is columns, 1 is rows. | 0 |\n", + "| axis | int | The axis to calculate the idxmin across. 
0 is columns, 1 is rows. | 0 |\n", "| skipna | bool | Ignore any null values along the axis. | True |\n", "| numeric_only | bool | Only use columns of the table that are of a numeric data type. | False |\n", "\n", @@ -2431,7 +2445,7 @@ }, { "cell_type": "markdown", - "id": "1bf3da2a", + "id": "4aee2790", "metadata": {}, "source": [ "### Table.sum()\n", @@ -2819,7 +2833,7 @@ "tab.prod(numeric_only=True)" ] }, - { +{ "cell_type": "markdown", "id": "fe565b65-fbf2-47ba-a26e-791d09fd4f55", "metadata": {}, @@ -2837,8 +2851,8 @@ "\n", "| Name | Type | Description | Default |\n", "| :----------: | :--: | :------------------------------------------------------------------------------- | :-----: |\n", - "| axis | int | Axis for the function to be applied on. 0 is columns, 1 is rows. | 0 |\n", - "| skipna | bool | not yet implemented | True |\n", + "| axis | int | Axis for the function to be applied on. 0 is columns, 1 is rows. | 0 |\n", + "| skipna | bool | Not yet implemented | True |\n", "| numeric_only | bool | Only use columns of the table that are of a numeric data type. | False |\n", "\n", "**Returns:**\n", @@ -2914,16 +2928,16 @@ "```\n", "Table.sem(axis=0, skipna=True, numeric_only=False, ddof=0)\n", "```\n", - "Return unbiased standard error of the mean over requested axis. Normalized by N-1 by default. This can be changed using the ddof argument\n", + "Return unbiased standard error of the mean over requested axis. Normalized by N-1 by default. This can be changed using the `ddof` argument.\n", "\n", "**Parameters:**\n", "\n", "| Name | Type | Description | Default |\n", "| :----------: | :--: | :------------------------------------------------------------------------------- | :-----: |\n", - "| axis | int | The axis to calculate the sum across 0 is columns, 1 is rows. | 0 |\n", - "| skipna | bool | not yet implemented | True |\n", + "| axis | int | The axis to calculate the sum across. 0 is columns, 1 is rows. 
| 0 |\n", + "| skipna | bool | not yet implemented | True |\n", "| numeric_only | bool | Only use columns of the table that are of a numeric data type. | False |\n", - "| ddof | int | Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. | 1 |\n", + "| ddof | int | Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. | 1 |\n", "\n", "**Returns:**\n", "\n", @@ -3008,7 +3022,7 @@ }, { "cell_type": "markdown", - "id": "c777923e", + "id": "ff51630f", "metadata": {}, "source": [ "### Table.skew()\n", @@ -4439,7 +4453,7 @@ } ], "source": [ - "tab.replace(True, (\"one\", \"two\", \"three\"))" + "tab.replace(True, (b\"one\", b\"two\", b\"three\"))" ] }, { diff --git a/src/pykx/__init__.py b/src/pykx/__init__.py index f055f88..e05ef27 100644 --- a/src/pykx/__init__.py +++ b/src/pykx/__init__.py @@ -231,7 +231,9 @@ def _register(self, self._call( f'{"" if name[0] == "." else "."}{name}:(enlist`)!enlist(::);' f'system "d {"" if name[0] == "." else "."}{name}";' - f'.pykx.util.loadfile["{path.parent}";"{path.name}"];', + '$[@[{get x;1b};`.pykx.util.loadfile;{0b}];' + f' .pykx.util.loadfile["{path.parent}";"{path.name}"];' + f' system"l {path}"];', wait=True, ) return name[1:] if name[0] == '.' 
else name diff --git a/src/pykx/config.py b/src/pykx/config.py index ae89596..09f9acf 100644 --- a/src/pykx/config.py +++ b/src/pykx/config.py @@ -74,7 +74,7 @@ def _is_set(envvar): pykx_dir = Path(__file__).parent.resolve(strict=True) os.environ['PYKX_DIR'] = str(pykx_dir) os.environ['PYKX_EXECUTABLE'] = sys.executable -pykx_libs_dir = Path(pykx_dir/'lib') if _get_config_value('PYKX_4_1_ENABLED', None) is None else Path(pykx_dir/'lib'/'4-1-libs') # noqa +pykx_libs_dir = Path(pykx_dir/'lib'/'4-1-libs') if _is_enabled('PYKX_4_1_ENABLED') else Path(pykx_dir/'lib') # noqa pykx_lib_dir = Path(_get_config_value('PYKX_Q_LIB_LOCATION', pykx_libs_dir)) pykx_platlib_dir = pykx_lib_dir/q_lib_dir_name lib_prefix = '' if system == 'Windows' else 'lib' @@ -97,6 +97,7 @@ def _is_set(envvar): _pwd = os.getcwd() license_located = False lic_path = '' +lic_type = '' for loc in (_pwd, _qlic, qhome): if loc=='': pass @@ -104,6 +105,7 @@ def _is_set(envvar): try: lic_path = Path(str(loc) + '/' + lic).resolve(strict=True) license_located=True + lic_type = lic qlic=Path(loc) except FileNotFoundError: continue diff --git a/src/pykx/ctx.py b/src/pykx/ctx.py index 6921c6f..1c0ac92 100644 --- a/src/pykx/ctx.py +++ b/src/pykx/ctx.py @@ -15,7 +15,6 @@ from . 
import Q from .config import ignore_qhome, pykx_lib_dir, qhome -from .core import licensed from .exceptions import PyKXException, QError from .wrappers import Identity, SymbolicFunction @@ -121,7 +120,7 @@ def __getattr__(self, key): # noqa raise AttributeError(f'{key}: {self._unsupported_keys_with_msg[key]}') if self._fqn in {'', '.q'} and key in self._q.reserved_words: # Reserved words aren't actually part of the `.q` context dict - if not licensed: + if 'QConnection' in str(self._q._call): return lambda *args: self._q._call(key, *args, wait=True) else: return self._q._call(key, wait=True) diff --git a/src/pykx/ipc.py b/src/pykx/ipc.py index 72e1cae..069e726 100644 --- a/src/pykx/ipc.py +++ b/src/pykx/ipc.py @@ -41,7 +41,7 @@ from .core import licensed from .exceptions import FutureCancelled, NoResults, PyKXException, QError, UninitializedConnection from .util import get_default_args, normalize_to_bytes, normalize_to_str -from .wrappers import CharVector, Composition, Foreign, Function, K, List, SymbolAtom +from .wrappers import CharVector, Composition, Foreign, Function, K, List, SymbolAtom, SymbolicFunction # noqa : E501 from . import _wrappers from . 
import _ipc @@ -640,6 +640,10 @@ def _send(self, raise RuntimeError("Attempted to use a closed IPC connection") tquery = type(query) debugging = (not skip_debug) and (debug or pykx_qdebug) + if issubclass(tquery, SymbolicFunction): + if licensed: + query = query.func + tquery = type(query) if not (issubclass(tquery, K) or isinstance(query, (str, bytes))): raise ValueError('Cannot send object of passed type over IPC: ' + str(tquery)) if debugging: diff --git a/src/pykx/lib/csvutil.q b/src/pykx/lib/csvutil.q index 2c7653f..fd41a41 100644 --- a/src/pykx/lib/csvutil.q +++ b/src/pykx/lib/csvutil.q @@ -1,4 +1,7 @@ / utilities to quickly load a csv file - for more exhaustive analysis of the csv contents see csvguess.q +/ 2020.06.20 - add POSTLOADEACH and POSTLOADALL filtering +/ 2020.06.03 - adjust basic handling of timespan (210,211) +/ 2020.05.17 - add basicinfo / 2020.05.06 - bugfix for infolike and info0 / 2016.11.09 - add " " as valid delimiter in P / 2016.09.03 - allow HHMMSSXYZXYZXYZ N timestamps @@ -21,13 +24,19 @@ / show delete from info where t=" " / .csvutil.data[file;info] - use the info from .csvutil.info to read the data / .csvutil.data10[file;info] - like .csvutil.data but only returns the first 10 rows +/ .csvutil.bulkload[file;info] - bulk loads file into table DATA (which must be already defined :: DATA:() ) / .csvutil.read[file]/read10[file] - for when you don't care about checking/tweaking the before reading +/ .csvutil.basicread[file]/basicread10[file] - read the basicdata, 20200520 is an int instead of a date frinstance + + \d .csvutil +POSTLOADEACH:{x}; / {delete from x where col0=-1} +POSTLOADALL:{x}; / {`col0`col1 xasc x} DELIM:"," ZAPHDRS:0b / lowercase and remove _ from colhdrs (junk characters are always removed) WIDTHHDR:25000 / number of characters read to get the header -READLINES:222 / number of lines read and used to guess the types +READLINES:5555 / number of lines read and used to guess the types SYMMAXWIDTH:11 / character columns 
narrower than this are stored as symbols SYMMAXGR:10 / max symbol granularity% before we give up and keep as a * string FORCECHARWIDTH:30 / every field (of any type) with values this wide or more is forced to character "*" @@ -42,14 +51,16 @@ nostar:{$[not"*"in raze string x$11#y;$[112,mdot<2,{all" /"in x}each dchar,.csvutil.cancast["F"]peach sdv; / fractions, "1 3/4" -> 1.75f info:update t:"G",(rules:rules,'52) from info where t="*",mw=36,mdot=0,{all x like"????????-????-????-????-????????????"}peach sdv,.csvutil.cancast["G"]peach sdv; / GUID, v3.0 or later - info:update t:"N",(rules:rules,'53),maybe:1b from info where t="n",mw=15,mdot=0,{all x in"0123456789"}each dchar,.csvutil.cancast["N"]peach sdv; / N, could be T but that'd loose precision - info:update t:"T",(rules:rules,'54),maybe:1b from info where t="n",mw=9,mdot=0,{all x in"0123456789"}each dchar,.csvutil.cancast["T"]peach sdv; + info:update t:"N",(rules:rules,'53),maybe:1b from info where extended,t="n",mw=15,mdot=0,{all x in"0123456789"}each dchar,.csvutil.cancast["N"]peach sdv; / N, could be T but that'd loose precision + info:update t:"T",(rules:rules,'54),maybe:1b from info where extended,t="n",mw=9,mdot=0,{all x in"0123456789"}each dchar,.csvutil.cancast["T"]peach sdv; info:update t:"G",(rules:rules,'55) from info where t="*",mw=38,mdot=0,{all x like"{????????-????-????-????-????????????}"}peach sdv,.csvutil.cancast["G"]peach sdv; / GUID, v3.0 or later info:update t:"J",(rules:rules,'60)from info where t="n",mdot=0,{all x in"+-0123456789"}each dchar,.csvutil.cancast["J"]peach sdv; info:update t:"I",(rules:rules,'70)from info where t="J",mw<12,.csvutil.cancast["I"]peach sdv; info:update t:"H",(rules:rules,'80)from info where t="I",mw<7,.csvutil.cancast["H"]peach sdv; info:update t:"F",(rules:rules,'90)from info where t="n",mdot<2,mw>1,.csvutil.cancast["F"]peach sdv; info:update t:"E",(rules:rules,'100),maybe:1b from info where t="F",mw<9; - info:update t:"M",(rules:rules,'110),maybe:1b from info 
where t in"nIHEF",mdot<2,mw within 4 7,.csvutil.cancast["M"]peach sdv; - info:update t:"D",(rules:rules,'120),maybe:1b from info where t in"nI",mdot in 0 2,mw within 6 11,.csvutil.cancast["D"]peach sdv; - info:update t:"V",(rules:rules,'130),maybe:1b from info where t="I",mw=6,{all x like"[012][0-9][0-5][0-9][0-5][0-9]"}peach sdv,.csvutil.nostar["V"]peach sdv; / 235959 123456 - info:update t:"U",(rules:rules,'140),maybe:1b from info where t="H",mw=4,{all x like"[012][0-9][0-5][0-9]"}peach sdv,.csvutil.nostar["U"]peach sdv; /2359 + info:update t:"M",(rules:rules,'110),maybe:1b from info where extended,t in"nIHEF",mdot<2,mw within 4 7,.csvutil.cancast["M"]peach sdv; + info:update t:"D",(rules:rules,'120),maybe:1b from info where extended,t="I",mw in 6 8,.csvutil.cancast["D"]peach sdv; + info:update t:"D",(rules:rules,'121),maybe:0b from info where t="n",mdot=0,mw within 8 10,.csvutil.cancast["D"]peach sdv; + info:update t:"D",(rules:rules,'122),maybe:0b from info where t="n",mdot=2,mw within 8 10,.csvutil.cancast["D"]peach sdv; + info:update t:"V",(rules:rules,'130),maybe:1b from info where extended,t="I",mw=6,{all x like"[012][0-9][0-5][0-9][0-5][0-9]"}peach sdv,.csvutil.nostar["V"]peach sdv; / 235959 123456 + info:update t:"U",(rules:rules,'140),maybe:1b from info where extended,t="H",mw=4,{all x like"[012][0-9][0-5][0-9]"}peach sdv,.csvutil.nostar["U"]peach sdv; /2359 info:update t:"U",(rules:rules,'150),maybe:0b from info where t="n",mw in 4 5,mdot=0,{all x like"*[0-9]:[0-5][0-9]"}peach sdv,.csvutil.cancast["U"]peach sdv; info:update t:"T",(rules:rules,'160),maybe:0b from info where t="n",mw within 7 12,mdot<2,{all x like"*[0-9]:[0-5][0-9]:[0-5][0-9]*"}peach sdv,.csvutil.cancast["T"]peach sdv; info:update t:"V",(rules:rules,'170),maybe:0b from info where t="T",mw in 7 8,mdot=0,.csvutil.cancast["V"]peach sdv; - info:update t:"T",(rules:rules,'180),maybe:1b from info where t in"EF",mw within 7 10,mdot=1,{all x like"*[0-9][0-5][0-9][0-5][0-9].*"}peach 
sdv,.csvutil.cancast["T"]peach sdv; + info:update t:"T",(rules:rules,'180),maybe:1b from info where extended,t in"EF",mw within 7 10,mdot=1,{all x like"*[0-9][0-5][0-9][0-5][0-9].*"}peach sdv,.csvutil.cancast["T"]peach sdv; / info:update t:"Z",(rules:rules,'190),maybe:0b from info where t="n",mw within 11 24,mdot<4,.csvutil.cancast["Z"]peach sdv; info:update t:"P",(rules:rules,'200),maybe:1b from info where t="n",mw within 11 29,mdot<4,{all x like"[12][0-9][0-9][0-9][ ./-][01][0-9][ ./-][0-3][0-9]*"}peach sdv,.csvutil.cancast["P"]peach sdv; - info:update t:"N",(rules:rules,'210),maybe:1b from info where t="n",mw within 3 28,mdot=1,.csvutil.cancast["N"]peach sdv; + info:update t:"N",(rules:rules,'210),maybe:0b from info where t="n",mw within 3 28,mdot=1,{all x like"*[0-9]D[0-9]*"}peach sdv,.csvutil.cancast["N"]peach sdv; + info:update t:"N",(rules:rules,'211),maybe:1b from info where extended,t="n",mw within 3 28,mdot=1,.csvutil.cancast["N"]peach sdv; info:update t:"?",(rules:rules,'220),maybe:0b from info where t="n"; / reset remaining maybe numeric info:update t:"C",(rules:rules,'230),maybe:0b from info where t="?",mw=1; / char info:update t:"D",(rules:rules,'231),maybe:0b from info where t="?",mdot=0,mw within 5 9,{all x like"*[0-9][a-sA-S][a-uA-U][b-yB-Y][0-9][0-9]*"}peach sdv,.csvutil.cancast["D"]peach sdv; / 1dec12..01dec2011 @@ -106,8 +120,18 @@ info0:{[file;onlycols] info:update j12:1b from info where t in"S*",mw<13,{all x in .Q.nA}each dchar; info:update j10:1b from info where t in"S*",mw<11,{all x in .Q.b6}each dchar; select c,ci,t,maybe,empty,res,j10,j12,ipa,mw,mdot,rules,gr,ndv,dchar from info} -info:info0[;()] / by default don't restrict columns -infolike:{[file;pattern] info0[file;{x where(lower x)like lower y}[colhdrs[file];pattern]]} / .csvutil.infolike[file;"*time"] -infoonly:info0 / only some columns .csvutil.infoonly[file;`this`and`that] +info:info0[;();1b] / by default don't restrict columns +basicinfo:info0[;();0b] / nothing clever with 
date/time-like numbers +infolike:{[file;pattern] info0[file;{x where(lower x)like lower y}[colhdrs[file];pattern];1b]} / .csvutil.infolike[file;"*time"] +infoonly:info0[;;1b] / only some columns .csvutil.infoonly[file;`this`and`that] + +/ DATA:() +bulkload:{[file;info] + if[not`DATA in system"v";'`DATA.not.defined]; + if[count DATA;'`DATA.not.empty]; + loadhdrs:exec c from info where not t=" ";loadfmts:exec t from info; + fs2[{[file;loadhdrs;loadfmts] `DATA insert $[count DATA;flip loadhdrs!(loadfmts;DELIM)0:file;POSTLOADEACH loadhdrs xcol(loadfmts;enlist DELIM)0:file]}[file;loadhdrs;loadfmts]]; + count DATA::POSTLOADALL DATA} \d . +@[.:;"\\l csvutil.custom.q";::]; / save your custom settings in csvutil.custom.q to override those set at the beginning of the file \ No newline at end of file diff --git a/src/pykx/pandas_api/pandas_meta.py b/src/pykx/pandas_api/pandas_meta.py index 66724af..67dbfc6 100644 --- a/src/pykx/pandas_api/pandas_meta.py +++ b/src/pykx/pandas_api/pandas_meta.py @@ -147,7 +147,7 @@ def size(self): def mean(self, axis: int = 0, numeric_only: bool = False): tab = self if 'Keyed' in str(type(tab)): - tab = q('{(keys x) _ 0!x}', tab) + tab = q('value', tab) if numeric_only: tab = _get_numeric_only_subtable(tab) @@ -191,8 +191,8 @@ def kurt(self, axis: int = 0, numeric_only: bool = False): ''', tab, axis, axis_keys ) + @api_return def std(self, axis: int = 0, ddof: int = 1, numeric_only: bool = False): - tab = self if 'Keyed' in str(type(tab)): tab = q.value(tab) @@ -217,7 +217,7 @@ def std(self, axis: int = 0, ddof: int = 1, numeric_only: bool = False): def median(self, axis: int = 0, numeric_only: bool = False): tab = self if 'Keyed' in str(type(tab)): - tab = q('{(keys x) _ 0!x}', tab) + tab = q('value', tab) if numeric_only: tab = _get_numeric_only_subtable(tab) @@ -236,18 +236,18 @@ def median(self, axis: int = 0, numeric_only: bool = False): @convert_result def skew(self, axis=0, skipna=True, numeric_only=False): res, cols, _ = 
preparse_computations(self, axis, skipna, numeric_only) - return (q( - '''{[row] - m:{(sum (x - avg x) xexp y) % count x}; - g1:{[m;x]m:m[x]; m[3] % m[2] xexp 3%2}[m]; - (g1 each row) * {sqrt[n * n-1] % neg[2] + n:count x} each row - }''', res), cols) + return (q(''' + {[row] + m:{(sum (x - avg x) xexp y) % count x}; + g1:{[m;x]m:m[x]; m[3] % m[2] xexp 3%2}[m]; + (g1 each row) * {sqrt[n * n-1] % neg[2] + n:count x} each row + }''', res), cols) @api_return def mode(self, axis: int = 0, numeric_only: bool = False, dropna: bool = True): tab = self if 'Keyed' in str(type(tab)): - tab = q('{(keys x) _ 0!x}', tab) + tab = q('value', tab) if numeric_only: tab = _get_numeric_only_subtable(tab) @@ -329,6 +329,8 @@ def min(self, axis=0, skipna=True, numeric_only=False): @convert_result def idxmax(self, axis=0, skipna=True, numeric_only=False): tab = self + if 'Keyed' in str(type(tab)): + tab = q('value', tab) axis = q('{$[11h~type x; `index`columns?x; x]}', axis) res, cols, ix = preparse_computations(tab, axis, skipna, numeric_only) return (q( @@ -341,6 +343,8 @@ def idxmax(self, axis=0, skipna=True, numeric_only=False): @convert_result def idxmin(self, axis=0, skipna=True, numeric_only=False): tab = self + if 'Keyed' in str(type(tab)): + tab = q('value', tab) axis = q('{$[11h~type x; `index`columns?x; x]}', axis) res, cols, ix = preparse_computations(tab, axis, skipna, numeric_only) return (q( @@ -353,23 +357,23 @@ def idxmin(self, axis=0, skipna=True, numeric_only=False): @convert_result def prod(self, axis=0, skipna=True, numeric_only=False, min_count=0): res, cols, _ = preparse_computations(self, axis, skipna, numeric_only) - return (q( - '{[row; minc] {$[y > 0; $[y>count[x]; 0N; prd x]; prd x]}[;minc] each row}', - res, - min_count - ), cols) + return (q(''' + {[row; minc] + {$[y > 0; $[y>count[x]; 0N; prd x]; prd x]}[;minc] each row + } + ''', res, min_count), + cols) @convert_result def sum(self, axis=0, skipna=True, numeric_only=False, min_count=0): res, cols, _ = 
preparse_computations(self, axis, skipna, numeric_only) - return (q( - '{[row; minc]' - '{$[y > 0;' - '$[y>count[x]; 0N; $[11h=type x; `$"" sv string x;sum x]];' - '$[11h=type x; `$"" sv string x;sum x]]}[;minc] each row}', - res, - min_count - ), cols) + return (q(''' + {[row;minc] + {$[y > 0; + $[y>count[x]; 0N; $[11h=type x; `$"" sv string x;sum x]]; + $[11h=type x; `$"" sv string x;sum x] + ]}[;minc] each row} + ''', res, min_count), cols) def agg(self, func, axis=0, *args, **kwargs): # noqa: C901 if 'KeyedTable' in str(type(self)): diff --git a/src/pykx/pykx_init.q_ b/src/pykx/pykx_init.q_ index c9f992f..5fd30ed 100644 Binary files a/src/pykx/pykx_init.q_ and b/src/pykx/pykx_init.q_ differ diff --git a/src/pykx/read.py b/src/pykx/read.py index 3196a02..bccf627 100644 --- a/src/pykx/read.py +++ b/src/pykx/read.py @@ -37,6 +37,8 @@ def __dir__(): 'Time': "T", } +_filter_types = [None, "basic", "only", "like"] + JSONKTypes = Union[ k.Table, k.Dictionary, k.BooleanAtom, k.BooleanVector, k.FloatAtom, k.FloatVector, k.CharVector, k.List @@ -71,6 +73,9 @@ def csv(self, types: Optional[Union[bytes, k.CharAtom, k.CharVector]] = None, delimiter: Union[str, bytes, k.CharAtom] = ',', as_table: Union[bool, k.BooleanAtom] = True, + filter_type: Union[str, k.CharVector] = None, + filter_columns: Union[str, list, k.CharVector, k.SymbolAtom, k.SymbolVector] = None, + custom: dict = None, ) -> Union[k.Table, k.Dictionary]: """Reads a CSV file as a table or dictionary. @@ -86,6 +91,13 @@ def csv(self, as_table: `True` if the first line of the CSV file should be treated as column names, in which case a `pykx.Table` is returned. If `False` a `pykx.List` of `pykx.Vector` is returned - one for each column in the CSV file. + filter_type: Can be `basic`, `only`, or `like`. `basic` will not search for + any types with the `extended` flag in [csvutil.q]. `only` will only process + columns that are passed in `filter_columns`. 
`like` will only process columns that match + a string pattern passed in `filter_columns`. + filter_columns: Used in tandem with `filter_type` when `only` or `like` is passed. + `only` accepts str or list of str. `like` accepts only a str pattern. + custom: A dictionary used to change default values in [csvutil.q](https://github.com/KxSystems/pykx/blob/main/src/pykx/lib/csvutil.q#L34). Returns: The CSV data as a `pykx.Table` or `pykx.List`, depending on the value of `as_table`. @@ -132,7 +144,7 @@ def csv(self, table = q.read.csv('example.csv', 'SJJ', ' ') ``` - Read a comma seperated CSV file into a `pykx.Dictionary`, guessing the datatypes of + Read a comma separated CSV file into a `pykx.Dictionary`, guessing the datatypes of each column. ```python @@ -145,6 +157,19 @@ def csv(self, ```python table = q.read.csv('example.csv', {'x1':kx.IntAtom,'x2':kx.GUIDAtom,'x3':kx.TimestampAtom}) ``` + + Read a comma separated CSV file specifying only columns that include the word "name" in them. + + ```python + table = q.read.csv('example.csv', filter_type = "like", filter_columns = '*name*') + ``` + + Read a comma separated CSV file changing the guessing variables to change the number of lines + read and used to guess the type of the column. + + ```python + table = q.read.csv('example.csv', custom = {"READLINES":1000}) + ``` """ # noqa: E501 as_table = 'enlist' if as_table else '' dict_conversion = None @@ -153,10 +178,32 @@ def csv(self, raise LicenseException('guess CSV column types') if isinstance(self._q, QConnection): raise ValueError('Cannot guess types of CSV columns over IPC.') - if isinstance(types, dict): - dict_conversion = types - types = None - types = self._q.csvutil.info(k.SymbolAtom(path))['t'] + if filter_type not in _filter_types: + raise ValueError(f'Filter type {filter_type} not in supported filter types.') + self._q.csvutil + cache = self._q('.csvutil') + try: + if custom is not None: + self._q('''{ + if[0