LeagueCSV

Versions

BOM and CSV encoding

Depending on the software you are using to read your CSV you may need to adjust the CSV document outputting script.

To work as intended you CSV object needs to support the stream filter API

MS Excel on Windows

On Windows, MS Excel expects an UTF-8 encoded CSV with its corresponding BOM character. To fulfill this requirement, you simply need to add the UTF-8 BOM character if needed as explained below:

use League\Csv\Bom;
use League\Csv\Reader;

$reader = Reader::createFromPath('/path/to/my/file.csv', 'r');
//let's set the output BOM
$reader->setOutputBOM(Bom::Utf8);
//let's convert the incoming data from iso-88959-15 to utf-8
$reader->addStreamFilter('convert.iconv.ISO-8859-15/UTF-8');
//BOM detected and adjusted for the output
echo $reader->getContent();

The conversion is done with the iconv extension using its bundled stream filters.

MS Excel on MacOS

On a MacOS system, MS Excel requires a CSV encoded in UTF-16 LE using the tab character as delimiter. Here’s an example on how to meet those requirements using the League\Csv package.

use League\Csv\Bom;
use League\Csv\CharsetConverter;
use League\Csv\Reader;
use League\Csv\Writer;

//the current CSV is ISO-8859-15 encoded with a ";" delimiter
$origin = Reader::createFromPath('/path/to/french.csv', 'r');
$origin->setDelimiter(';');

//let's use stream resource
$writer = Writer::createFromStream(fopen('php://temp', 'r+'));
//let's set the output BOM
$writer->setOutputBOM(Bom::Utf16Le);
//we set the tab as the delimiter character
$writer->setDelimiter("\t");
//let's convert the incoming data from iso-88959-15 to utf-16
CharsetConverter::addTo($writer, 'ISO-8859-15', 'UTF-16');
//we insert csv data
$writer->insertAll($origin);
//all is good let's output the results
$writer->output('mycsvfile.csv');

The conversion is done with the mbstring extension using the League\Csv\CharsetConverter.

Skipping The BOM sequence with the Reader class

Since version 9.9.0.

In order to ensure the correct removal of the sequence and avoid bugs while parsing the CSV, the filter can skip the BOM sequence completely when using the Reader class and convert the CSV content from the BOM sequence encoding charset to UTF-8. To work as intended call the Reader::includeInputBOM method to ensure the default BOM removal behaviour is disabled and add the stream filter to you reader instance using the static method CharsetConverter::skipBOM method;

<?php

use League\Csv\Bom;
use League\Csv\Reader;
use League\Csv\CharsetConverter;

$input = Bom::Utf16Be->value."john,doe,john.doe@example.com\njane,doe,jane.doe@example.com\n";
$document = Reader::createFromString($input);
$document->includeInputBOM(); // de-activate the default skipping mechanism
CharsetConverter::addBOMSkippingTo($document);
var_dump([...$document]);
// returns the document content without the skipped BOM sequence 
// [
//     ['john', 'doe', 'john.doe@example.com'],
//     ['jane', 'doe', 'jane.doe@example.com'],
// ]

Once the filter is applied, the Reader instance looses all information regarding its own BOM sequence. The sequence is still be present but the instance is no longer able to detect it.