LeagueCSV

BOM sequences

Detecting the BOM sequence

To improve interoperability with programs that consume CSV, the package provides a ByteSequence interface to help you detect the appropriate BOM sequence.

Constants

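The ByteSequence interface provides the following constants:

ByteSequence::BOM_UTF8: the UTF-8 BOM;
ByteSequence::BOM_UTF16_BE: the UTF-16 BOM with big-endian byte order;
ByteSequence::BOM_UTF16_LE: the UTF-16 BOM with little-endian byte order;
ByteSequence::BOM_UTF32_BE: the UTF-32 BOM with big-endian byte order;
ByteSequence::BOM_UTF32_LE: the UTF-32 BOM with little-endian byte order.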

Info::fetchBOMSequence

function League\Csv\Info::fetchBOMSequence(string $str): ?string

The Info::fetchBOMSequence static method expects a string and returns the BOM sequence found at its start or null otherwise.

use League\Csv\Info;

Info::fetchBOMSequence('hello world!'); //returns null
Info::fetchBOMSequence(Info::BOM_UTF8.'hello world!'); //returns "\xEF\xBB\xBF"
Info::fetchBOMSequence('hello world!'.Info::BOM_UTF16_BE); //returns null
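To make the "found at its start" rule concrete, here is a minimal plain-PHP sketch of prefix-based BOM detection. The detectBomSequence function is a hypothetical stand-in written for illustration; it is not the package's implementation. The byte values are the standard Unicode BOM sequences.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for Info::fetchBOMSequence, for illustration only.
function detectBomSequence(string $str): ?string
{
    // Longer sequences are tested first: the UTF-32 LE BOM ("\xFF\xFE\x00\x00")
    // begins with the UTF-16 LE BOM ("\xFF\xFE").
    $sequences = [
        "\x00\x00\xFE\xFF", // UTF-32 BE
        "\xFF\xFE\x00\x00", // UTF-32 LE
        "\xEF\xBB\xBF",     // UTF-8
        "\xFE\xFF",         // UTF-16 BE
        "\xFF\xFE",         // UTF-16 LE
    ];

    foreach ($sequences as $sequence) {
        // Only the first bytes of the string are compared.
        if (0 === strncmp($str, $sequence, strlen($sequence))) {
            return $sequence;
        }
    }

    return null;
}

var_dump(detectBomSequence('hello world!'));                     // NULL
echo bin2hex((string) detectBomSequence("\xEF\xBB\xBFhello")), PHP_EOL; // efbbbf
var_dump(detectBomSequence("hello world!\xFE\xFF"));             // NULL
```

The ordering of the candidates matters in this sketch: testing the two-byte UTF-16 LE sequence before the four-byte UTF-32 LE one would misreport a UTF-32 LE document.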

bom_match

Since version 9.7 this function is deprecated and you are encouraged to use Info::fetchBOMSequence instead.

function League\Csv\bom_match(string $str): string

The League\Csv\bom_match function expects a string and returns the BOM sequence found at its start or an empty string otherwise.

use League\Csv\ByteSequence;
use function League\Csv\bom_match;

bom_match('hello world!'); //returns ''
bom_match(ByteSequence::BOM_UTF8.'hello world!'); //returns "\xEF\xBB\xBF"
bom_match('hello world!'.ByteSequence::BOM_UTF16_BE); //returns ''
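When moving away from bom_match, the main difference to account for is the "not found" value: Info::fetchBOMSequence returns null where bom_match returned an empty string. The following is a minimal migration sketch, assuming league/csv 9.7 or later installed through Composer. The bomOrEmptyString helper is a hypothetical name and not part of the package.

```php
<?php

declare(strict_types=1);

require 'vendor/autoload.php'; // assumption: Composer install of league/csv >= 9.7

use League\Csv\Info;

// Hypothetical helper: keeps the empty-string contract of bom_match
// while relying on Info::fetchBOMSequence under the hood.
function bomOrEmptyString(string $str): string
{
    return Info::fetchBOMSequence($str) ?? '';
}

var_dump(bomOrEmptyString('hello world!') === '');
var_dump(bomOrEmptyString(Info::BOM_UTF8.'hello world!') === Info::BOM_UTF8);
```

Code that compared the result of bom_match against '' can keep doing so through such a wrapper, or be changed to compare against null directly.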

Managing CSV documents BOM sequence

Detecting the BOM sequence

public AbstractCsv::getInputBOM(void): string

The CSV document's current BOM sequence is detected using the getInputBOM method. This method returns the BOM sequence in use, or an empty string if none is found or recognized. The detection is done using the bom_match function.

use League\Csv\Writer;

$csv = Writer::createFromPath('/path/to/file.csv');
$bom = $csv->getInputBOM();

Setting the outputted BOM sequence

public AbstractCsv::setOutputBOM(string $sequence): self
public AbstractCsv::getOutputBOM(void): string

All connection classes implement the ByteSequence interface.

use League\Csv\Reader;

$csv = Reader::createFromPath('/path/to/file.csv', 'r');
$csv->setOutputBOM(Reader::BOM_UTF8);
$bom = $csv->getOutputBOM(); //returns "\xEF\xBB\xBF"

The default output BOM sequence is an empty string.

The output BOM sequence is never saved to the CSV document.
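A minimal sketch of what "never saved" means in practice, assuming league/csv 9.x installed through Composer. The temporary file and its content are hypothetical. The script captures what output() sends and then compares the file on disk with what was originally written.

```php
<?php

declare(strict_types=1);

require 'vendor/autoload.php'; // assumption: Composer install of league/csv 9.x

use League\Csv\Reader;

// Hypothetical temporary file holding a document without any BOM.
$content = "john,doe\njane,doe\n";
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, $content);

$csv = Reader::createFromPath($path, 'r');
$csv->setOutputBOM(Reader::BOM_UTF8);

// Capture what output() would send to the client.
ob_start();
$csv->output();
$sent = (string) ob_get_clean();

// The output BOM is prepended to what was sent...
var_dump(0 === strncmp($sent, Reader::BOM_UTF8, strlen(Reader::BOM_UTF8)));
// ...while the file on disk is left untouched.
var_dump(file_get_contents($path) === $content);

unset($csv);   // release the file handle before removing the file
unlink($path);
```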

Controlling Input BOM usage

Since version 9.4.0.

If your document contains a BOM sequence, the following methods control its presence when processing the document.

AbstractCsv::skipInputBOM(): self;
AbstractCsv::includeInputBOM(): self;
AbstractCsv::isInputBOMIncluded(): bool;

By default, and to avoid a BC break, the input BOM, if present, is skipped.

If your document does not contain a BOM sequence, you can speed up the CSV iterator by calling includeInputBOM: no attempt to detect and remove a BOM will then take place.

use League\Csv\Reader;

$raw_csv = Reader::BOM_UTF8."john,doe,john.doe@example.com\njane,doe,jane.doe@example.com\n";
$csv = Reader::createFromString($raw_csv);
$csv->setOutputBOM(Reader::BOM_UTF16_BE);
$csv->includeInputBOM();
ob_start();
$csv->output();
$document = ob_get_clean();

The returned $document will contain two BOM sequences instead of one.

If you are using a non-seekable stream, you should disable BOM skipping; otherwise an exception will be triggered.

The BOM sequence is never removed from the CSV document; it is only skipped from the result set.
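The difference between skipping and including the input BOM can be observed on the first record. This is a minimal sketch, assuming league/csv 9.4 or later installed through Composer. The sample data and the $firstCell closure are hypothetical and only serve the illustration.

```php
<?php

declare(strict_types=1);

require 'vendor/autoload.php'; // assumption: Composer install of league/csv >= 9.4

use League\Csv\Reader;

$raw_csv = Reader::BOM_UTF8."john,doe\njane,doe\n";
$csv = Reader::createFromString($raw_csv);

// Hypothetical helper returning the first cell of the first record.
$firstCell = static function (Reader $csv): string {
    foreach ($csv->getRecords() as $record) {
        return (string) $record[0];
    }

    return '';
};

// Default behaviour: the input BOM is skipped from the result set.
$skipped = $firstCell($csv);
var_dump($csv->isInputBOMIncluded());
var_dump($skipped === 'john');

// After includeInputBOM(), the BOM should stay attached to the first cell.
$csv->includeInputBOM();
$included = $firstCell($csv);
var_dump($included === Reader::BOM_UTF8.'john');

// In both cases the document itself still starts with the BOM.
var_dump($csv->getInputBOM() === Reader::BOM_UTF8);
```

Calling skipInputBOM() afterwards restores the default behaviour.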