.. include:: references.txt

.. _construct_table:

Constructing a table
--------------------

There is a great deal of flexibility in the way that a table can be initially
constructed. Details on the inputs to the |Table| constructor are in the
`Initialization Details`_ section. However, the easiest way to understand how
to make a table is by example.

Examples
^^^^^^^^

Much of the flexibility lies in the types of data structures which can be used
to initialize the table data. The examples below show how to create a table
from scratch with no initial data, create a table with a list of columns, a
dictionary of columns, or from `numpy` arrays (either structured or
homogeneous).

Setup
"""""

For the following examples you need to import the |Table| and |Column| classes
along with the `numpy` package::

  >>> from astropy.table import Table, Column
  >>> import numpy as np

Creating from scratch
"""""""""""""""""""""

A Table can be created without any initial input data or even without any
initial columns. This is useful for building tables dynamically if the initial
size, columns, or data are not known.

.. Note::
   Adding columns or rows requires making a new copy of the entire
   table each time, so in the case of large tables this may be slow.

::

  >>> t = Table()
  >>> t.add_column(Column('a', [1, 4]))
  >>> t.add_column(Column('b', [2.0, 5.0]))
  >>> t.add_column(Column('c', ['x', 'y']))

  >>> t = Table(names=('a', 'b', 'c'), dtypes=('f4', 'i4', 'S2'))
  >>> t.add_row((1, 2.0, 'x'))
  >>> t.add_row((4, 5.0, 'y'))

List input
""""""""""

A typical case is where you have a number of data columns with the same length
defined in different variables. These might be Python lists or `numpy` arrays
or a mix of the two. These can be used to create a |Table| by putting the
column data variables into a Python list. In this case the column names are not
defined by the input data, so they must either be set using the ``names``
keyword or they will be auto-generated as ``col<N>``.

::

  >>> a = [1, 4]
  >>> b = [2.0, 5.0]
  >>> c = ['x', 'y']
  >>> t = Table([a, b, c], names=('a', 'b', 'c'))
  >>> t
  array([(1, 2.0, 'x'), (4, 5.0, 'y')],
        dtype=[('a', '<i8'), ('b', '<f8'), ('c', '|S1')])

  >>> Table([t['c'], t['a']])
  array([('x', 1), ('y', 4)],
        dtype=[('c', '|S1'), ('a', '<i8')])

  >>> Table([t['a']**2, t['b'] + 10])
  array([(1, 12.0), (16, 15.0)],
        dtype=[('a', '<i8'), ('b', '<f8')])

  >>> a = (1, 4)
  >>> b = np.array([[2, 3], [5, 6]])  # vector column
  >>> c = Column('axis', ['x', 'y'])
  >>> arr = (a, b, c)
  >>> Table(arr)  # the Column held in variable "c" keeps its own name, "axis"
  array([(1, [2, 3], 'x'), (4, [5, 6], 'y')],
        dtype=[('col0', '<i8'), ('col1', '<i8', (2,)), ('axis', '|S1')])

  >>> arr = {'a': [1, 4],
  ...        'b': [2.0, 5.0],
  ...        'c': ['x', 'y']}

  >>> Table(arr)
  array([(1, 'x', 2.0), (4, 'y', 5.0)],
        dtype=[('a', '<i8'), ('c', '|S1'), ('b', '<f8')])

  >>> Table(arr, names=('a', 'b', 'c'), dtypes=('f4', 'i4', 'S2'))
  array([(1.0, 2, 'x'), (4.0, 5, 'y')],
        dtype=[('a', '<f4'), ('b', '<i4'), ('c', '|S2')])

  >>> arr = {'a': (1, 4),
  ...        'b': np.array([[2, 3], [5, 6]]),
  ...        'c': Column('axis', ['x', 'y'])}
  >>> Table(arr, names=('a', 'b', 'c'))
  array([(1, [2, 3], 'x'), (4, [5, 6], 'y')],
        dtype=[('a', '<i8'), ('b', '<i8', (2,)), ('c', '|S1')])

  >>> Table(arr, names=('a_new', 'b_new', 'c_new'))
  Traceback (most recent call last):
    File "<stdin>", line 2, in <module>
    File "astropy/table/table.py", line 404, in __init__
      init_func(data, names, dtypes, n_cols, copy)
    File "astropy/table/table.py", line 467, in _init_from_dict
      data_list = [data[name] for name in names]
  KeyError: 'a_new'

NumPy structured array
""""""""""""""""""""""

The structured array is the standard mechanism in `numpy` for storing
heterogeneous table data. Most scientific I/O packages that read table files
(e.g. PyFITS, vo.table, asciitable) will return the table in an object that is
based on the structured array. A structured array can be created using::

  >>> arr = np.array([(1, 2.0, 'x'),
  ...                 (4, 5.0, 'y')],
  ...                dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'S2')])

From ``arr`` it is simple to create the corresponding |Table| object::

  >>> Table(arr)
  array([(1, 2.0, 'x'), (4, 5.0, 'y')],
        dtype=[('a', '<i8'), ('b', '<f8'), ('c', '|S2')])

  >>> table = Table(arr)
  >>> print(table)

**New column names**

The column names can be changed from the original values by providing the
``names`` argument::

  >>> Table(arr, names=('a_new', 'b_new', 'c_new'))
  array([(1, 2.0, 'x'), (4, 5.0, 'y')],
        dtype=[('a_new', '<i8'), ('b_new', '<f8'), ('c_new', '|S2')])

  >>> Table(arr, dtypes=('f4', 'i4', 'S4'))
  array([(1.0, 2, 'x'), (4.0, 5, 'y')],
        dtype=[('a', '<f4'), ('b', '<i4'), ('c', '|S4')])

  >>> Table(arr, names=('a_new', 'b_new', 'c_new'), dtypes=('f4', 'i4', 'S4'))
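  array([(1.0, 2, 'x'), (4.0, 5, 'y')],
        dtype=[('a_new', '<f4'), ('b_new', '<i4'), ('c_new', '|S4')])

For a homogeneous `numpy` array, column names that are not provided are
auto-generated as ``col<N>``, where ``<N>`` is the column number.

**Basic example with automatic column names**

::

  >>> arr = np.array([[1, 2, 3],
  ...                 [4, 5, 6]])
  >>> Table(arr)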
  array([(1, 2, 3), (4, 5, 6)],
        dtype=[('col0', '<i8'), ('col1', '<i8'), ('col2', '<i8')])

  >>> Table(arr, names=('a_new', 'b_new', 'c_new'), dtypes=('f4', 'i4', 'S4'))
  array([(1.0, 2, '3'), (4.0, 5, '6')],
        dtype=[('a_new', '<f4'), ('b_new', '<i4'), ('c_new', '|S4')])

  >>> t = Table(arr, copy=False)

**Python arrays versus `numpy` arrays as input**

There is a slightly subtle issue that is important to understand in the way
that |Table| objects are created. Any data input that looks like a Python list
(including a tuple) is considered to be a list of columns. In contrast, a
homogeneous `numpy` array input is interpreted as a list of rows::

  >>> arr = [[1, 2, 3],
  ...        [4, 5, 6]]
  >>> np_arr = np.array(arr)

  >>> Table(arr)  # Two columns, three rows
  array([(1, 4), (2, 5), (3, 6)],
        dtype=[('col0', '<i8'), ('col1', '<i8')])

  >>> Table(np_arr)  # Three columns, two rows
  array([(1, 2, 3), (4, 5, 6)],
        dtype=[('col0', '<i8'), ('col1', '<i8'), ('col2', '<i8')])

  >>> arr = [[1, 2.0, 'string'],  # list of rows
  ...        [2, 3.0, 'values']]
  >>> col_arr = list(zip(*arr))  # transpose to a list of columns
  >>> col_arr
  [(1, 2), (2.0, 3.0), ('string', 'values')]
  >>> t = Table(col_arr)

Table columns
"""""""""""""

A new table can be created by selecting a subset of columns in an existing
table::

  >>> t = Table(names=('a', 'b', 'c'))
  >>> t2 = t['c', 'b', 'a']  # Makes a copy of the data
  >>> print(t2)
  array([],
        dtype=[('c', '<f8'), ('b', '<f8'), ('a', '<f8')])

  >>> Table(t.columns[0:2])
  array([],
        dtype=[('a', '<f8'), ('b', '<f8')])

  >>> Table([t.columns[0], t.columns['c']])
  array([],
        dtype=[('a', '<f8'), ('c', '<f8')])

Initialization Details
^^^^^^^^^^^^^^^^^^^^^^

If provided the ``dtypes`` list overrides the base column types and must match
the length of ``names``.

**dict-like**
    The keys of the ``data`` object define the base column names. The
    corresponding values can be Column objects, numpy arrays, or list-like
    objects. The ``names`` list (optional) can be used to select
    particular fields and/or reorder the base names. The ``dtypes`` list
    (optional) must match the length of ``names`` and is used to
    override the existing or default data types.

**list-like**
    Each item in the ``data`` list provides a column of data values and can
    be a Column object, numpy array, or list-like object. The
    ``names`` list defines the name of each column. The names will be
    auto-generated if not provided (either from the ``names`` argument or
    by Column objects). If provided the ``names`` argument must match the
    number of items in the ``data`` list. The optional ``dtypes`` list
    will override the existing or default data types and must match
    ``names`` in length.

**None**
    Initialize a zero-length table. If ``names`` and optionally ``dtypes``
    are provided then the corresponding columns are created.

names
"""""

The ``names`` argument provides a way to specify the table column names or
override the existing ones. By default the column names are either taken
from existing names (for ``ndarray`` or ``Table`` input) or auto-generated
as ``col<N>``. If ``names`` is provided then it must be a list with the same
length as the number of columns. Any list elements with value ``None`` fall
back to the default name.

In the case where ``data`` is provided as a dict of columns, the ``names``
argument can be supplied to specify the order of columns. The ``names`` list
must then contain each of the keys in the ``data`` dict. If ``names`` is not
supplied then the order of columns in the output table is not deterministic.

dtypes
""""""

The ``dtypes`` argument provides a way to specify the table column data
types or override the existing types.
By default the types are either taken from existing types (for ``ndarray`` or
``Table`` input) or auto-generated by the ``numpy.array()`` routine. If
``dtypes`` is provided then it must be a list with the same length as the
number of columns. The values must be valid ``numpy.dtype`` initializers or
``None``. Any list elements with value ``None`` fall back to the default type.

In the case where ``data`` is provided as a dict of columns, the ``dtypes``
argument must be accompanied by a corresponding ``names`` argument in order to
uniquely specify the column ordering.

meta
""""

The ``meta`` argument is simply an object that contains meta-data associated
with the table. It is recommended that this object be a dict or
OrderedDict_, but the only firm requirement is that it can be copied with
the standard library ``copy.deepcopy()`` routine. By default ``meta`` is
an empty OrderedDict_.

copy
""""

By default the input ``data`` are copied into a new internal ``np.ndarray``
object in the Table object. In the case where ``data`` is either an
``np.ndarray`` object or an existing ``Table``, it is possible to use a
reference to the existing data by setting ``copy=False``. This has the
advantage of reducing memory use and being faster. However, one should take
care because any modifications to the new Table data will also be seen in the
original input data. See the `Copy versus Reference`_ section for more
information.

.. _copy_versus_reference:

Copy versus Reference
^^^^^^^^^^^^^^^^^^^^^

Normally when a new |Table| object is created, the input data are *copied* into
a new internal array object. This ensures that if the new table elements are
modified then the original data will not be affected. However, when creating a
table from a numpy ndarray object (structured or homogeneous), it is possible to
disable copying so that instead a memory reference to the original data is
used. This has the advantage of being faster and using less memory.
However, caution must be exercised because the new table data and original data
will be linked, as shown below::

  >>> arr = np.array([(1, 2.0, 'x'),
  ...                 (4, 5.0, 'y')],
  ...                dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'S2')])
  >>> arr['a']  # column "a" of the input array
  array([1, 4])
  >>> t = Table(arr, copy=False)
  >>> t['a'][1] = 99
  >>> arr['a']  # arr['a'] got changed when we modified t['a']
  array([ 1, 99])

Note that when referencing the data it is not possible to change the data types
since that operation requires making a copy of the data. In this case an error
occurs::

  >>> t = Table(arr, copy=False, dtypes=('f4', 'i4', 'S4'))
  Traceback (most recent call last):
    File "<stdin>", line 2, in <module>
    File "astropy/table/table.py", line 351, in __init__
      raise ValueError('Cannot specify dtypes when copy=False')
  ValueError: Cannot specify dtypes when copy=False

Another caveat in using referenced data is that you cannot add a new row to the
table. This generates an error because of a conflict between the two references
to the same underlying memory. Internally, adding a row may involve moving the
data to a new memory location, which would corrupt the input data object, so
`numpy` does not allow this::

  >>> t.add_row([1, 2, 3])
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "astropy/table/table.py", line 760, in add_row
      self._data.resize((newlen,), refcheck=False)
  ValueError: cannot resize this array: it does not own its data

Column and TableColumns classes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are two classes, |Column| and |TableColumns|, that are useful when
constructing new tables.

Column
""""""

A |Column| object can be created as follows, where in all cases the column
``name`` is required as the first argument and one can optionally provide
these values:

``description`` : str
    Full description of column
``units`` : str
    Physical units
``format`` : str
    `Format string`_ for outputting column values
``meta`` : dict
    Meta-data associated with the column

Initialization options
''''''''''''''''''''''

The column data values, shape, and data type are specified in one of two ways:

**Provide a ``data`` value and optionally a ``dtype`` value**

  Examples::

    col = Column('a', data=[1, 2, 3])         # shape=(3,)
    col = Column('a', data=[[1, 2], [3, 4]])  # shape=(2, 2)
    col = Column('a', data=[1, 2, 3], dtype=float)
    col = Column('a', np.array([1, 2, 3]))
    col = Column('a', ['hello', 'world'])

  The ``dtype`` argument can be any value which is an acceptable
  fixed-size data-type initializer for the numpy.dtype() method. See the
  `numpy` documentation on data type objects. Examples include:

  - Python non-string type (float, int, bool)
  - Numpy non-string type (e.g. np.float32, np.int64, np.bool_)
  - Numpy.dtype array-protocol type strings (e.g. 'i4', 'f8', 'S15')

  If no ``dtype`` value is provided then the type is inferred using
  ``np.array(data)``. When ``data`` is provided then the ``shape``
  and ``length`` arguments are ignored.

**Provide zero or more of ``dtype``, ``shape``, ``length``**

  Examples::

    col = Column('a')
    col = Column('a', dtype=int, length=10, shape=(3,4))

  The default ``dtype`` is ``np.float64`` and the default ``length`` is
  zero. The ``shape`` argument is the array shape of a single cell in the
  column. The default ``shape`` is () which means a single value in each
  element.

.. _table_format_string:

Format string
'''''''''''''

The format string controls the output of column values when a table or column
is printed or written to an ASCII table.
The format string can be either "old-style" or "new-style":

**Old-style**

  This corresponds to syntax like ``"%.4f" % value`` as documented in the
  Python documentation on string formatting operations.

  For example, use ``"%.4f"`` to print four digits after the decimal in float
  format, or ``"%6d"`` to print an integer in a 6-character wide field.

**New-style**

  This corresponds to syntax like ``"{:.4f}".format(value)`` as documented in
  the Python documentation on format string syntax.

  For example, use ``"{:.4f}"`` to print four digits after the decimal in float
  format, or ``"{:6d}"`` to print an integer in a 6-character wide field.

Note that in either case any Python format string that formats exactly
one value is valid, so ``{:.4f} angstroms`` or ``Value: %12.2f`` would
both work.

TableColumns
""""""""""""

Each |Table| object has an attribute ``columns`` which is an ordered dictionary
that stores all of the |Column| objects in the table (see also the `Column`_
section). Technically the ``columns`` attribute is a |TableColumns| object,
which is an enhanced ordered dictionary that provides easier ways to select
multiple columns. There are a few key points to remember:

- A |Table| can be initialized from a |TableColumns| object (copy is always
  True).
- Selecting multiple columns from a |TableColumns| object returns another
  |TableColumns| object.
- Selecting one column from a |TableColumns| object returns a |Column|.

The following examples show the ways to select columns from a |TableColumns|
object:

**Select columns by name**

::

  >>> t = Table(names=('a', 'b', 'c', 'd'))
  >>> t.columns['d', 'c', 'b']

**Select columns by index slicing**

::

  >>> t.columns[0:2]   # Select first two columns
  >>> t.columns[::-1]  # Reverse column order

**Select column by index or name**

::

  >>> t.columns[1]    # Choose column by index
  array([], dtype=float64)
  >>> t.columns['b']  # Choose column by name
  array([], dtype=float64)
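
**Sketch of the selection rules**

The selection rules listed in this section (one name or one index returns a
single column, several names or a slice return another ordered container) can
be mimicked with a small stand-alone class. The sketch below is *not* the
|TableColumns| implementation: ``ColumnStore`` and its ``keys`` method are
hypothetical names invented for illustration, and plain Python lists stand in
for |Column| objects. It only shows one way a single ``__getitem__`` could
dispatch on the type of the selector.

.. code-block:: python

    from collections import OrderedDict


    class ColumnStore:
        """Hypothetical ordered mapping of column name -> column data."""

        def __init__(self, items=()):
            self._cols = OrderedDict(items)

        def keys(self):
            # Column names in their stored order
            return list(self._cols.keys())

        def __getitem__(self, item):
            if isinstance(item, str):    # one name -> that column
                return self._cols[item]
            if isinstance(item, int):    # one position -> that column
                return self._cols[self.keys()[item]]
            if isinstance(item, tuple):  # several names -> new store, requested order
                return ColumnStore((name, self._cols[name]) for name in item)
            if isinstance(item, slice):  # slice of positions -> new store
                return ColumnStore((name, self._cols[name])
                                   for name in self.keys()[item])
            raise TypeError("unsupported column selector: %r" % (item,))


    cols = ColumnStore([('a', [1, 4]),
                        ('b', [2.0, 5.0]),
                        ('c', ['x', 'y']),
                        ('d', [True, False])])

    by_name = cols['d', 'c', 'b']  # tuple of names, like t.columns['d', 'c', 'b']
    first_two = cols[0:2]          # slice, like t.columns[0:2]
    reversed_cols = cols[::-1]     # reversed order, like t.columns[::-1]
    single = cols[1]               # one position, like t.columns[1]

Here ``by_name``, ``first_two`` and ``reversed_cols`` are again ``ColumnStore``
objects, while ``single`` is the data stored under ``'b'`` itself. Composition
around an ``OrderedDict`` is used rather than subclassing, so that the custom
``__getitem__`` cannot interfere with the dictionary's own methods.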