Charset conversion
The CharsetConverter
class converts your CSV records using the mbstring
extension and its supported character encodings.
Settings
public CharsetConverter::inputEncoding(string $input_encoding): self
public CharsetConverter::outputEncoding(string $output_encoding): self
The inputEncoding
and outputEncoding
methods set the object encoding properties. By default, the input encoding and the output encoding are set to UTF-8
.
When building a CharsetConverter
object, the methods do not need to be called in any particular order, and may be called multiple times. Because the CharsetConverter
is immutable, each time its setter methods are called they return a new object without modifying the current one.
Conversion
public CharsetConverter::convert(iterable $records): iterable
CharsetConverter::convert
converts the collection of records according to the encoding settings.
use League\Csv\CharsetConverter;
$csv = new SplFileObject('/path/to/french.csv', 'r');
$csv->setFlags(SplFileObject::READ_CSV | SplFileObject::SKIP_EMPTY);
$encoder = (new CharsetConverter())->inputEncoding('iso-8859-15');
$records = $encoder->convert($csv);
The resulting data is converted from iso-8859-15
to the default UTF-8
since no output encoding charset was set using the CharsertConverter::outputEncoding
method.
CharsetConverter as a Writer formatter
public CharsetConverter::__invoke(array $record): array
Using the CharsetConverter::__invoke
method, you can register a CharsetConverter
object as a record formatter using the Writer::addFormatter method.
use League\Csv\CharsetConverter;
use League\Csv\Writer;
$encoder = (new CharsetConverter())
->inputEncoding('utf-8')
->outputEncoding('iso-8859-15')
;
$writer = Writer::createFromPath('/path/to/your/csv/file.csv');
$writer->addFormatter($encoder);
$writer->insertOne(["foo", "bébé", "jouet"]);
//all 'utf-8' characters are now automatically encoded into 'iso-8859-15' charset
CharsetConverter as a PHP stream filter
public static CharsetConverter::addTo(AbstractCsv $csv, string $input_encoding, string $output_encoding): AbstractCsv
public static CharsetConverter::register(): void
public static CharsetConverter::getFiltername(string $input_encoding, string $output_encoding): string
Usage with CSV objects
If your CSV object supports PHP stream filters then you can use the CharsetConverter
class as a PHP stream filter using the library stream filtering mechanism instead.
The CharsetConverter::addTo
static method:
- registers the
CharsetConverter
class under the generic filternameconvert.league.csv.*
if it is not registered yet; - configures the stream filter using the supplied parameters;
- adds the configured stream filter to the submitted CSV object;
use League\Csv\CharsetConverter;
use League\Csv\Writer;
$writer = Writer::createFromPath('/path/to/your/csv/file.csv');
CharsetConverter::addTo($writer, 'utf8', 'iso-8859-15');
$writer->insertOne(["foo", "bébé", "jouet"]);
//all 'utf-8' characters are now automatically encoded into 'iso-8859-15' charset
Usage with PHP stream resources
To use this stream filter outside League\Csv
objects you need to:
- register the stream filter using
CharsetConverter::register
method. - use
CharsetConverter::getFiltername
with one of PHP’s attaching stream filter functions with the correct arguments as shown below:
use League\Csv\CharsetConverter;
CharsetConverter::register();
$resource = fopen('/path/to/my/file', 'r');
$filter = stream_filter_append(
$resource,
CharsetConverter::getFiltername('utf-8', 'iso-8859-15'),
STREAM_FILTER_READ
);
while (false !== ($record = fgetcsv($resource))) {
//$record is correctly encoded
}
When not mentioned, PHP will register the stream filter twice as a stream filter that can be used on read and as a stream filter that can be used on write. This behaviour may introduce subtle issues if you are not aware of that behaviour. To avoid such scenario we are introducing the following more strict methods:
CharsetConverter::appendOnReadTo
,CharsetConverter::appendOnWriteTo
,CharsetConverter::prependOnReadTo
,CharsetConverter::prependOnWriteTo
To better convey when the conversion will happen. So if you only want to convert the resource on read you should use the following snippet
use League\Csv\CharsetConverter;
$resource = fopen('/path/to/my/file', 'r+');
$filter = CharsetConverter::appendOnReadTo($resource, 'utf-8', 'iso-8859-15');
echo stream_get_contents($resource); // the return string is converted from 'utf-8' to 'iso-8859-15'