Summary:
- Minimum supported PHP version increased from 5.6 to 7.1.
- Full support for PHP 8.1.
- Iterator adjustments. (Details further below.)
- General code cleanup and bugfixes.
Breaking changes:
- Type hints from PHP 7.0 and 7.1 were added. Take note of this if you happen to extend any of the reader's classes in your code.
- Trying to iterate through a document without first calling open() will now throw an exception.
- The key of each row is 1-based now. This aligns it with the "r" attribute of the corresponding row in the actual XLSX document, which the key represents.
- The count() method was removed, as it didn't provide the intended functionality. Its actual functionality, which is to count how many rows were read so far, can be easily emulated by incrementing a counter variable within the iteration loop.
- Exception messages and types in case of errors were adjusted. If you rely on their exact types or wording in any way, make sure to adjust your exception handling accordingly.
- The methods setDecimalSeparator() and setThousandsSeparator() were removed. (This change was already communicated in a previous version, but not completely enforced yet.)
- indexFromColumnLetter() no longer returns false on error. An exception is thrown instead.
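The switch from a false return value to an exception changes how calling code has to check for errors. The following is a minimal, self-contained sketch of that pattern. The function columnLettersToIndex() is a hypothetical stand-in, not the library's implementation; its name, its zero-based result and the exception type are assumptions made for illustration only.

```php
<?php
// Hypothetical stand-in for a column-letter conversion helper.
// Assumptions: zero-based result ('A' => 0), InvalidArgumentException on bad input.
function columnLettersToIndex(string $letters): int
{
    $letters = strtoupper($letters);
    if (preg_match('/^[A-Z]+$/', $letters) !== 1) {
        throw new InvalidArgumentException('Invalid column letters: "' . $letters . '"');
    }

    $index = 0;
    foreach (str_split($letters) as $char) {
        // Treat the letters as base-26 digits where 'A' counts as 1.
        $index = $index * 26 + (ord($char) - ord('A') + 1);
    }

    return $index - 1;
}

// Before: the result was compared against false.
// Now: invalid input is handled by catching the exception.
try {
    $index = columnLettersToIndex('A1');
} catch (InvalidArgumentException $e) {
    $index = null; // react to the invalid input here
}
```

Code that previously tested the result with `=== false` would silently stop detecting errors after the update, so it is worth searching for such comparisons.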
Breaking changes for code addressing SharedStrings directly:
- SharedStrings now manages its own temporary files. Manual management from the outer scope is no longer necessary.
- Attempting to read SharedStrings data after closing the SharedStrings instance will now throw an exception.
- setHandleCurrentIndex() and setCount() have been removed from SharedStringsOptimizedFile, as they served no real function.
Non-breaking changes:
- Calling close() now properly cleans up unnecessary resources from the reader that may have an impact on its memory consumption.
- Documentation improvements.
Iterator adjustments:
The Iterator interface allows iteration through the document by using a foreach on the Reader instance. The previous implementation of the reader did not follow the Iterator interface rules correctly. The adjustments in this update rectify this. As a result, take note of the following changes:
If you're using foreach on the Reader instance to read the document contents (like the example code in the readme):
- The key of each element now represents the actual row number, which, in XLSX, starts counting at 1. (Previous versions started at 0.)
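As a rough illustration of the 1-based keys, the sketch below iterates over a stand-in data source instead of a real Reader instance. The generator exampleRows() and its sample data are hypothetical; only the key behavior mirrors what is described above. The same loop also shows how a plain counter variable can take the place of the removed count() method.

```php
<?php
// Hypothetical stand-in for an opened reader:
// yields rows keyed by their 1-based row number.
function exampleRows(): Generator
{
    $rows = [['Name', 'Qty'], ['Apple', '3'], ['Pear', '5']];
    foreach ($rows as $offset => $row) {
        yield $offset + 1 => $row; // row numbers start at 1, as in XLSX
    }
}

$rowsRead = 0;    // counts rows read so far, replacing count()
$rowNumbers = [];
foreach (exampleRows() as $rowNumber => $row) {
    $rowsRead++;
    $rowNumbers[] = $rowNumber;
}
```

Code that used the key as a zero-based array offset needs to subtract 1 after the update.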
If you're calling the methods current() or key() directly:
- Do not call current() or key() without first checking the return value of valid(). Trying to access invalid positions will now throw an exception.
- current() and key() both start at the first position now, regardless of the order in which they are called.
- key() is 1-based now. This aligns it with the value of the "r" attribute of the row in the actual XLSX document, which this method represents.
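The rules above for calling current() and key() directly can be sketched with a small stand-in iterator. ExampleRowIterator is a hypothetical class written for this illustration, not the library's Reader; the exception type is an assumption. It follows the same contract: 1-based keys, and an exception when an invalid position is accessed.

```php
<?php
// Hypothetical stand-in that mimics the described contract.
class ExampleRowIterator implements Iterator
{
    private $rows;
    private $position = 0;

    public function __construct(array $rows)
    {
        $this->rows = array_values($rows);
    }

    public function rewind(): void
    {
        $this->position = 0;
    }

    public function valid(): bool
    {
        return $this->position < count($this->rows);
    }

    public function next(): void
    {
        $this->position++; // returns nothing; use current() to get the row
    }

    public function current(): array
    {
        $this->assertValid();
        return $this->rows[$this->position];
    }

    public function key(): int
    {
        $this->assertValid();
        return $this->position + 1; // 1-based, like the "r" attribute
    }

    private function assertValid(): void
    {
        if (!$this->valid()) {
            throw new OutOfBoundsException('No row at the current position.');
        }
    }
}

// Always check valid() before calling current() or key().
$iterator = new ExampleRowIterator([['a'], ['b']]);
$seen = [];
for ($iterator->rewind(); $iterator->valid(); $iterator->next()) {
    $seen[$iterator->key()] = $iterator->current();
}
```

After the loop ends the iterator sits on an invalid position, so a further call to current() or key() in this sketch throws.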
- Fixed an issue that made returnUnformatted overrule all configuration options for date/time values.
Breaking changes:
- next() no longer returns the current row. Use current() instead.
- SkipEmptyCells needs to be supplied as a ReaderSkipConfiguration constant now.
Non-breaking changes:
- New configuration option "SkipEmptyRows". Use it to exclude either all empty rows or all empty rows at the end of the document from the output. Use ReaderSkipConfiguration values to configure it.
- Configuration option "SkipEmptyCells" can now be configured to only skip trailing empty cells.
- Added support for scientific notation format.
- Fraction formatting support was enhanced.
- Fixed: The "General" format did not output values as decimals if they were stored using scientific notation internally.
- Fixed: Assorted edge cases in number formatting.
- Added notes to documentation of "ReturnUnformatted" and "ReturnPercentageDecimal" about possible gotchas.
- Internal refactorings.
Breaking changes:
- Reader configuration options must now be supplied to the Reader constructor via a ReaderConfiguration instance. Supplying configuration options via an array is no longer supported.
- When the "ReturnUnformatted" option is set, percentage values are now returned as strings instead of numbers. This aligns their behavior with that of other values.
- setDecimalSeparator() and setThousandsSeparator() methods have been removed, as they no longer had any function.
- Forced date/time format '' (empty string) gets interpreted correctly now.
Non-breaking changes:
- New configuration option "ReturnPercentageDecimal". When set to true, percentage values will be returned using their technical, internal representation ('50%' => '0.5') rather than how they are displayed within a document ('50%' => '50').
- Remove unnecessary restriction of custom formats to predetermined formats from the official specification documents.
- SharedStringsConfiguration calls can now be chained.
- Fix potential resource leaks caused by not closing reader instances.
- Update README.md to reflect the current code state.
- Added support for empty rows with attributes (or: self-closing row tags).
- Minor improvement of test handling.
- Added support for multi-range row span values, fixing issues caused by sheets that use them.
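The two percentage representations mentioned for the "ReturnPercentageDecimal" option differ only by a factor of 100. The sketch below shows that relationship. The function percentageOutput() is hypothetical and exists only for this illustration; it is not part of the library and does not reproduce its number formatting.

```php
<?php
// Hypothetical helper: $internal is the value as stored in the document
// (e.g. '0.5' for a cell displayed as 50%).
function percentageOutput(string $internal, bool $returnPercentageDecimal): string
{
    if ($returnPercentageDecimal) {
        return $internal; // technical, internal representation
    }

    // Displayed representation: the stored value multiplied by 100.
    return (string) ((float) $internal * 100);
}

$asDecimal = percentageOutput('0.5', true);
$asDisplayed = percentageOutput('0.5', false);
```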
Breaking changes:
- Public-facing method "setCurrencyCode" has been removed, as the currency_code value had no effect to begin with.
Non-breaking changes:
- New configuration option "ReturnUnformatted". If set to true, cell values will be returned without number formatting applied. (Note: Date/Time values are still controlled by the "ReturnDateTimeObjects" option.)
- Number format parsing has been improved. The reader is now capable of parsing more complex number formats.
- General format now outputs cell values as-is, instead of attempting to cast them to a float.
- Fixed issues regarding negative date/time values, causing very early date definitions to lead to unexpected errors.
- Fixed number formatting not being applied in all expected cases.
- Fixed a bug that caused empty shared strings to be treated incorrectly under certain conditions.
- Fixed a bug that caused cell formats making use of currency strings and language ids to break.
- Fixed a "continue in switch" warning in PHP 7.3.
- Added the option to use alphabetical column names (A, B, AA, ZX) instead of numeric indexes in returned row contents, using the parameter "OutputColumnNames".
- Fixed a bug that caused leading zeros in text cell content to get removed if the cell was set to text via an apostrophe prefix.
- Fixed an issue that prevented empty rows from being properly output in all appropriate cases.
- Fixed an issue that caused format parsing to cease working for some files.
- New configuration parameters to control automatic re-formatting of found Date/Time values: forceDateFormat, forceTimeFormat, forceDateTimeFormat
- Improved handling of potential errors when working with subdirectories of the configured temporary directory
- Fixed composer.json lacking ext-xmlreader requirement
- Improved support for different XLSX file generators:
  - Improved awareness of XML namespaces.
- Improved support for newer OOXML editions:
  - Namespace URIs from newer versions of the OOXML standard are now recognized and handled accordingly.
- Dropped requirement for SimpleXMLElement.
- Minor improvements in handling used document resources.
- Bugfix: Check whether the row that is about to be read is the one the read() function actually obtained; return an empty row if it is not.
- Bugfix: Differentiate between a sheet's internal ID and its position within the document.
- Removed unnecessary test files.
- Minor code quality improvements.
- Initial fork of the original library. Only the XLSX-relevant parts of the code were inherited, the rest removed.
- Added option 'SkipEmptyCells' to control whether empty cell values are included in the output.
- Added option 'CustomFormats' to define and overwrite format values.
- Ensure deletion of temporary files after run.
- Fix: MAP Toolkit xlsx files can be parsed.
- PHP 7 compliance.
- Allow configuration of locale based values.
- Include PHPUnit and tests for iterator, file location, shared strings, sheet handling, namespaces and temporary directories handling.
- Major structural refactoring and application of PSR1, PSR2 and PSR4 (namespace directory structure).
- Added a special case for cells formatted as text in XLSX. Previously leading zeros would get truncated if a text cell contained only numbers.
- Implemented SeekableIterator. Thanks to paales for the suggestion (Issue #54 and Pull request #55).
- Fixed a bug in CSV and ODS reading where reading position 0 multiple times in a row would result in internal pointer being advanced and reading the next line. (E.g. reading row #0 three times would result in rows #0, #1, and #2.) This could have happened on multiple calls to current() while in #0 position, or calls to seek(0) and current().
- Pull request #85: Fixed an index check. (Thanks to pa-m).
- Issue #50: Fixed an XLSX rewind issue. (Thanks to osuwariboy)
- Issue #52, #53: Apache POI compatibility for XLSX. (Thanks to dimapashkov)
- Issue #61: Autoload fix in the main class. (Thanks to i-bash)
- Issue #60, #69, #72: Fixed an issue where XLSX changeSheet may not work. (Thanks to jtresponse, osuwariboy)
- Issue #70: Added a check for constructor parameter correctness.
- Attempt to replicate Excel's "General" format in XLSX files that is applied to otherwise unformatted cells. Currently only decimal number values are converted to PHP's floats.
- Fix for formulas being returned along with values in XLSX files. (Thanks to marktag)
- Fix for macro sheets appearing when parsing XLS files. (Thanks to osuwariboy)
- Fix for a PHP warning that occurs with completely empty sheets in XLS files.
- XLSM (macro-enabled XLSX) files are recognized and read, too.
- composer.json file is added to the repository (thanks to matej116)
- Fix for repeated columns in ODS files not reading correctly (thanks to etfb)
- Fix for filename extension reading (Thanks to osuwariboy)
- A fix for the case when the row count wasn't read correctly from the sheet in an XLS file.
- Fixed file type choice when using mime-types (previously there were problems with XLSX and ODS mime-types) (Thanks to incratec)
- Fixed an error in XLSX iterator where current() would advance the iterator forward with each call. (Thanks to osuwariboy)
- Multiple sheet reading is now supported:
  - The getSheets() method lets you retrieve a list of all sheets present in the file.
  - The changeSheet($Index) method changes the sheet in the reader to the one specified.
- Previously temporary files that were extracted, were deleted after the SpreadsheetReader was destroyed but the empty directories remained. Now those are cleaned up as well.
- Bugfix for shared string caching in XLSX files. When the shared string count was larger
than the caching limit, instead of them being read from file, empty strings were returned.
- XLS file reading relies on the external Spreadsheet_Excel_Reader class which, by default,
reads additional information about cells like fonts, styles, etc. Now that is disabled
to save some memory since the style data is unnecessary anyway.
(Thanks to ChALkeR for the tip.)
Martins Pilsetnieks pilsetnieks@gmail.com