BOM sequences
Detecting the BOM sequence
To improve interoperability with programs interacting with CSV, the package now provides a ByteSequence interface to help you detect the appropriate BOM sequence.
Constants
The ByteSequence interface provides the following constants:
ByteSequence::BOM_UTF8: contains the UTF-8 BOM sequence;
ByteSequence::BOM_UTF16_BE: contains the UTF-16 BOM with Big-Endian sequence;
ByteSequence::BOM_UTF16_LE: contains the UTF-16 BOM with Little-Endian sequence;
ByteSequence::BOM_UTF32_BE: contains the UTF-32 BOM with Big-Endian sequence;
ByteSequence::BOM_UTF32_LE: contains the UTF-32 BOM with Little-Endian sequence.
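Because BOM bytes are invisible in most editors, it can help to look at them in hexadecimal. The following is a minimal sketch that does not load the package: the array keys only mirror the constant names above, and the byte values are the ones defined by the Unicode standard, which the constants are assumed to hold.

```php
<?php

// Hypothetical stand-ins for the ByteSequence constants (assumed byte values,
// taken from the Unicode standard; the package itself is not required here).
$sequences = [
    'BOM_UTF8'     => "\xEF\xBB\xBF",
    'BOM_UTF16_BE' => "\xFE\xFF",
    'BOM_UTF16_LE' => "\xFF\xFE",
    'BOM_UTF32_BE' => "\x00\x00\xFE\xFF",
    'BOM_UTF32_LE' => "\xFF\xFE\x00\x00",
];

foreach ($sequences as $name => $bytes) {
    // bin2hex() makes the otherwise invisible bytes readable.
    printf("%-13s %d bytes  %s\n", $name, strlen($bytes), strtoupper(bin2hex($bytes)));
}
```

Note that the UTF-32 Little-Endian sequence starts with the same two bytes as the UTF-16 Little-Endian one, which matters when detecting a BOM at the start of a string.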
Info::fetchBOMSequence
function League\Csv\Info::fetchBOMSequence(string $str): ?string
The Info::fetchBOMSequence static method expects a string and returns the BOM sequence found at its start or null otherwise.
use League\Csv\Info;
Info::fetchBOMSequence('hello world!'); //returns null
Info::fetchBOMSequence(Info::BOM_UTF8.'hello world!'); //returns "\xEF\xBB\xBF"
Info::fetchBOMSequence('hello world!'.Info::BOM_UTF16_BE); //returns null
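To illustrate what "found at its start" means, here is a minimal sketch of a prefix-based detection. It is not the package's implementation: sketch_fetch_bom is a hypothetical helper, and the candidate order (longest sequences first) is an assumption made so that a UTF-32 Little-Endian BOM is not mistaken for a UTF-16 Little-Endian one.

```php
<?php

/**
 * Hypothetical helper (not part of the package): returns the BOM found
 * at the very start of $str, or null when none is recognized.
 */
function sketch_fetch_bom(string $str): ?string
{
    // Longest sequences first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
    $candidates = [
        "\x00\x00\xFE\xFF", // UTF-32 BE
        "\xFF\xFE\x00\x00", // UTF-32 LE
        "\xEF\xBB\xBF",     // UTF-8
        "\xFE\xFF",         // UTF-16 BE
        "\xFF\xFE",         // UTF-16 LE
    ];

    foreach ($candidates as $bom) {
        // strncmp() is binary safe, so null bytes in the BOM are compared too.
        if (strncmp($str, $bom, strlen($bom)) === 0) {
            return $bom;
        }
    }

    return null;
}

var_dump(sketch_fetch_bom('hello world!'));                        // no BOM at all
var_dump(sketch_fetch_bom("\xEF\xBB\xBF" . 'hello world!'));       // BOM at the start
var_dump(sketch_fetch_bom('hello world!' . "\xFE\xFF"));           // BOM at the end is ignored
```

Only the leading bytes are inspected, which is why a sequence appended at the end of the string is not reported.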
bom_match
function League\Csv\bom_match(string $str): string
The League\Csv\bom_match function expects a string and returns the BOM sequence found at its start or an empty string otherwise.
use League\Csv\ByteSequence;
use function League\Csv\bom_match;
bom_match('hello world!'); //returns ''
bom_match(ByteSequence::BOM_UTF8.'hello world!'); //returns "\xEF\xBB\xBF"
bom_match('hello world!'.ByteSequence::BOM_UTF16_BE); //returns ''
Managing the CSV document BOM sequence
Detecting the BOM sequence
public AbstractCsv::getInputBOM(void): string
The current BOM sequence of the CSV document is detected using the getInputBOM method. This method returns the BOM sequence currently in use, or an empty string if none is found or recognized. The detection is done using the Info::fetchBOMSequence static method.
use League\Csv\Reader;
$csv = Reader::createFromPath('/path/to/file.csv');
$bom = $csv->getInputBOM();
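Once the input BOM is known, one possible use is to derive the name of the document encoding. The sketch below is only an illustration: sketch_bom_to_encoding is a hypothetical helper that is not part of the package, and the returned names are the ones the mbstring extension is assumed to accept.

```php
<?php

/**
 * Hypothetical helper (not part of the package): maps a detected BOM
 * to an encoding name, or returns null for an empty or unknown sequence.
 */
function sketch_bom_to_encoding(string $bom): ?string
{
    $map = [
        "\xEF\xBB\xBF"     => 'UTF-8',
        "\xFE\xFF"         => 'UTF-16BE',
        "\xFF\xFE"         => 'UTF-16LE',
        "\x00\x00\xFE\xFF" => 'UTF-32BE',
        "\xFF\xFE\x00\x00" => 'UTF-32LE',
    ];

    // An empty string (no BOM detected) is not a key, so it yields null.
    return $map[$bom] ?? null;
}

var_dump(sketch_bom_to_encoding("\xFF\xFE")); // UTF-16 Little-Endian BOM
var_dump(sketch_bom_to_encoding(''));         // no BOM detected
```

With the package loaded, the value returned by $csv->getInputBOM() could be passed to such a helper directly.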
Setting the output BOM sequence
public AbstractCsv::setOutputBOM(string $sequence): self
public AbstractCsv::getOutputBOM(void): string
setOutputBOM: sets the output BOM you want your CSV to be associated with.
getOutputBOM: gets the output BOM you want your CSV to be associated with.
use League\Csv\ByteSequence;
use League\Csv\Reader;
$csv = Reader::createFromPath('/path/to/file.csv', 'r');
$csv->setOutputBOM(ByteSequence::BOM_UTF8);
$bom = $csv->getOutputBOM(); //returns "\xEF\xBB\xBF"
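The idea behind an output BOM can be pictured as a sequence written once, ahead of the records, when the document is emitted. The following sketch shows that idea with plain PHP streams and without the package; the variable names and the sample records are illustrative assumptions, not the package's internals.

```php
<?php

$outputBom = "\xEF\xBB\xBF";        // assumed value of ByteSequence::BOM_UTF8
$records   = "john,doe\njane,doe\n"; // sample CSV content for illustration

// Sketch of the idea only: the BOM is written first, then the records follow.
$stream = fopen('php://memory', 'w+');
fwrite($stream, $outputBom);
fwrite($stream, $records);
rewind($stream);
$document = stream_get_contents($stream);
fclose($stream);

// The first three bytes of the emitted document are the UTF-8 BOM.
var_dump(bin2hex(substr($document, 0, 3)));
```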
Controlling Input BOM usage
If your document contains a BOM sequence, the following methods control whether it is kept while the document is processed.
AbstractCsv::skipInputBOM(): self;
AbstractCsv::includeInputBOM(): self;
AbstractCsv::isInputBOMIncluded(): bool;
skipInputBOM: enables skipping the input BOM from your CSV document.
includeInputBOM: preserves the input BOM from your CSV document while accessing its content.
isInputBOMIncluded: tells whether the input BOM will be included or skipped.
If your document does not contain any BOM sequence, you can speed up the CSV iterator by calling includeInputBOM: no operation to detect and remove a BOM sequence will then take place.
use League\Csv\Reader;
$raw_csv = Reader::BOM_UTF8."john,doe,john.doe@example.com\njane,doe,jane.doe@example.com\n";
$csv = Reader::createFromString($raw_csv);
$csv->setOutputBOM(Reader::BOM_UTF16_BE);
$csv->includeInputBOM();
ob_start();
$csv->output();
$document = ob_get_clean();
The returned $document will contain two BOM markers instead of one: the UTF-16 Big-Endian output BOM and the preserved UTF-8 input BOM.
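The difference between including and skipping the input BOM can be sketched at the byte level with plain string operations. This is an illustration under assumptions, not the package's code: the two BOM variables stand in for the BOM_UTF8 and BOM_UTF16_BE constants, and $included and $skipped are hypothetical names for the two possible outputs.

```php
<?php

$utf8Bom    = "\xEF\xBB\xBF"; // assumed value of BOM_UTF8 (input BOM)
$utf16BeBom = "\xFE\xFF";     // assumed value of BOM_UTF16_BE (output BOM)

$raw = $utf8Bom . "john,doe,john.doe@example.com\njane,doe,jane.doe@example.com\n";

// Input BOM included: the raw bytes follow the output BOM untouched,
// so the result starts with two BOM markers.
$included = $utf16BeBom . $raw;

// Input BOM skipped: the detected sequence is dropped before output,
// so only the output BOM remains.
$skipped = $utf16BeBom . substr($raw, strlen($utf8Bom));

var_dump(bin2hex(substr($included, 0, 5)));
var_dump(bin2hex(substr($skipped, 0, 2)));
```

Unless both markers are wanted, leaving the input BOM skipped when setting a different output BOM avoids the duplicated prefix.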