中文分词和中文搜索 xunsearch

安装和使用

一、安装

root@iZbp1bdm1m8u064ukgg8zwZ:/home/zheng# wget http://www.xunsearch.com/download/xunsearch-full-latest.tar.bz2
--2018-12-20 14:05:23--  http://www.xunsearch.com/download/xunsearch-full-latest.tar.bz2
Resolving www.xunsearch.com (www.xunsearch.com)... 202.75.216.233
Connecting to www.xunsearch.com (www.xunsearch.com)|202.75.216.233|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10302627 (9.8M) [text/plain]
Saving to: ‘xunsearch-full-latest.tar.bz2’

xunsearch-full-latest.tar.b 100%[=========================================>]   9.83M  10.3MB/s    in 1.0s    

2018-12-20 14:05:24 (10.3 MB/s) - ‘xunsearch-full-latest.tar.bz2’ saved [10302627/10302627]

root@iZbp1bdm1m8u064ukgg8zwZ:/home/zheng# ll
total 55012
drwxr-xr-x  7 zheng zheng     4096 Dec 20 14:05 ./
drwxr-xr-x  5 root       root           4096 Aug 12 15:12 ../
-rw-r--r--  1 root       root       10302627 Nov 16 19:22 xunsearch-full-latest.tar.bz2
root@iZbp1bdm1m8u064ukgg8zwZ:/home/zheng# tar -xjf xunsearch-full-latest.tar.bz2 
root@iZbp1bdm1m8u064ukgg8zwZ:/home/zheng# ll
total 55016
drwxr-xr-x  8 zheng zheng     4096 Dec 20 14:05 ./
drwxr-xr-x  5 root       root           4096 Aug 12 15:12 ../
drwxr-xr-x  3        501 staff          4096 Nov 16 19:16 xunsearch-full-1.4.12/
-rw-r--r--  1 root       root       10302627 Nov 16 19:22 xunsearch-full-latest.tar.bz2
root@iZbp1bdm1m8u064ukgg8zwZ:/home/zheng# cd xunsearch-full-1.4.12/
root@iZbp1bdm1m8u064ukgg8zwZ:/home/zheng/xunsearch-full-1.4.12# ll
total 40
drwxr-xr-x 3        501 staff       4096 Nov 16 19:16 ./
drwxr-xr-x 8 zheng zheng  4096 Dec 20 14:05 ../
-rw-r--r-- 1        501 staff        120 Dec 31  2016 ._.DS_Store
-rw-r--r-- 1        501 staff       6148 Dec 31  2016 .DS_Store
drwxr-xr-x 2        501 staff       4096 Nov 16 19:21 packages/
-rw-r--r-- 1        501 staff       2937 Dec  5  2014 README.md
-rwxr-xr-x 1        501 staff      11165 Oct 16 13:10 setup.sh*
root@iZbp1bdm1m8u064ukgg8zwZ:/home/zheng/xunsearch-full-1.4.12# sh setup.sh 

+==========================================+
| Welcome to setup xunsearch(full)         |
| 欢迎使用 xunsearch (完整版) 安装程序     |
+------------------------------------------+
| Follow the on-screen instructions please |
| 请按照屏幕上的提示操作以完成安装         |
+==========================================+

Please specify the installation directory
请指定安装目录 (默认为中括号内的值)
[/usr/local/xunsearch]:setup.sh: 111: read: Illegal option -e
[/usr/local/xunsearch]:

Confirm the installation directory
请确认安装目录:/usr/local/xunsearch [Y/n]y

Checking scws ... no
Installing scws (1.2.3) ... 
Extracting scws package ...
Configuring scws ...
Compiling & installing scws ...
Checking scws dict ... no
Extracting scws dict file ... 
Checking libuuid ... no, try to install it
Extracting libuuid package ...
Configuring libuuid ...
Compiling & installing libuuid ...
Checking xapian-core-scws ... no
Installing xapian-core-scws (1.4.9) ... 
Extracting xapian-core-scws package ...
Configuring xapian-core-scws ...
Compiling & installing xapian-core-scws ...
Checking libevent ... no
Installing libevent (2.0.21-stable) ... 
Extracting libevent package ...
Configuring libevent ...
Compiling & installing libevent ...
Extracting xunsearch package (1.4.12) ...
Configuring xunsearch ...
Compiling & installing xunsearch ...
Cleaning ... done

+=================================================+
| Installation completed successfully, Thanks you |
| 安装成功,感谢选择和使用 xunsearch              |
+-------------------------------------------------+
| 说明和注意事项:                                |
| 1. 开启/重新开启 xunsearch 服务程序,命令如下: |
|    /usr/local/xunsearch/bin/xs-ctl.sh restart
|    强烈建议将此命令写入服务器开机脚本中         |
|                                                 |
| 2. 所有的索引数据将被保存在下面这个目录中:     |
|    /usr/local/xunsearch/data
|    如需要转移到其它目录,请使用软链接。         |
|                                                 |
| 3. 您现在就可以在我们提供的开发包(SDK)基础上    |
|    开发您自己的搜索了。                         |
|    目前只支持 PHP 语言,参见下面文档:          |
|    /usr/local/xunsearch/sdk/php/README
+=================================================+

root@iZbp1bdm1m8u064ukgg8zwZ:/home/zheng/xunsearch-full-1.4.12# /usr/local/xunsearch/bin/xs-ctl.sh restart
WARNING: no server[xs-indexd] is running (BIND:127.0.0.1:8383)
INFO: re-starting server[xs-indexd] ... (BIND:127.0.0.1:8383)
WARNING: no server[xs-searchd] is running (BIND:127.0.0.1:8384)
INFO: re-starting server[xs-searchd] ... (BIND:127.0.0.1:8384)
root@iZbp1bdm1m8u064ukgg8zwZ:/home/wolonggang/xunsearch-full-1.4.12#

xunsearch PHP-SDK 使用

$xs = new \XS('demo');
$tokenizer = new \XSTokenizerScws;

// 添加分词词典, 支持 TXT/XDB 格式
// $tokenizer = $tokenizer->addDict();

// 获取分词结果
// $tokenizer = $tokenizer->getResult();

// XSTokenizer 接口
// $tokenizer = $tokenizer->getTokens();

// 获取重要词统计结果
// $tokenizer = $tokenizer->getTops();

// 获取 scws 版本号
// $tokenizer = $tokenizer->getVersion();

// 判断是否包含指定词性的词
// $tokenizer = $tokenizer->hasWord();

// 设置字符集
// $tokenizer = $tokenizer->setCharset();

// 设置分词词典, 支持 TXT/XDB 格式
// $tokenizer = $tokenizer->setDict();

// 设置散字二元组合
// $tokenizer = $tokenizer->setDuality();

// 设置忽略标点符号
$tokenizer = $tokenizer->setIgnore();

// 设置复合分词选项
// $tokenizer = $tokenizer->setMulti();

$text = '清华大学怎么样?';
$words1 = $tokenizer->getResult($text, 3);
$words2 = $tokenizer->getTops($text, 3);

print_r("<pre>");
print_r($tokenizer);
print_r($words1);
print_r($words2);
exit;

How to Read Big Files with PHP

It’s not often that we, as PHP developers, need to worry about memory management. The PHP engine does a stellar job of cleaning up after us, and the web server model of short-lived execution contexts means even the sloppiest code has no long-lasting effects.

Measuring Success

The only way to be sure we’re making any improvement to our code is to measure a bad situation and then compare that measurement to another after we’ve applied our fix. In other words, unless we know how much a “solution” helps us (if at all), we can’t know if it really is a solution or not.

There are two metrics we can care about. The first is CPU usage. How fast or slow is the process we want to work on? The second is memory usage. How much memory does the script take to execute? These are often inversely proportional — meaning that we can offload memory usage at the cost of CPU usage, and vice versa.

In an asynchronous execution model (like with multi-process or multi-threaded PHP applications), both CPU and memory usage are important considerations. In traditional PHP architecture, these generally become a problem when either one reaches the limits of the server.

It’s impractical to measure CPU usage inside PHP. If that’s the area you want to focus on, consider using something like top, on Ubuntu or macOS. For Windows, consider using the Linux Subsystem, so you can use top in Ubuntu.

For the purposes of this tutorial, we’re going to measure memory usage. We’ll look at how much memory is used in “traditional” scripts. We’ll implement a couple of optimization strategies and measure those too. In the end, I want you to be able to make an educated choice.

The methods we’ll use to see how much memory is used are:

// formatBytes is taken from the php.net documentation

memory_get_peak_usage();

function formatBytes($bytes, $precision = 2) {
    $units = array("b", "kb", "mb", "gb", "tb");

    $bytes = max($bytes, 0);
    $pow = floor(($bytes ? log($bytes) : 0) / log(1024));
    $pow = min($pow, count($units) - 1);

    $bytes /= (1 << (10 * $pow));

    return round($bytes, $precision) . " " . $units[$pow];
}

We’ll use these functions at the end of our scripts, so we can see which script uses the most memory at one time.

What Are Our Options?

There are many approaches we could take to read files efficiently. But there are also two likely scenarios in which we could use them. We could want to read and process data all at the same time, outputting the processed data or performing other actions based on what we read. We could also want to transform a stream of data without ever really needing access to the data.

Let’s imagine, for the first scenario, that we want to be able to read a file and create separate queued processing jobs every 10,000 lines. We’d need to keep at least 10,000 lines in memory, and pass them along to the queued job manager (whatever form that may take).

For the second scenario, let’s imagine we want to compress the contents of a particularly large API response. We don’t care what it says, but we need to make sure it’s backed up in a compressed form.

In both scenarios, we need to read large files. In the first, we need to know what the data is. In the second, we don’t care what the data is. Let’s explore these options…

Reading Files, Line By Line

There are many functions for working with files. Let’s combine a few into a naive file reader:

// from memory.php

function formatBytes($bytes, $precision = 2) {
    $units = array("b", "kb", "mb", "gb", "tb");

    $bytes = max($bytes, 0);
    $pow = floor(($bytes ? log($bytes) : 0) / log(1024));
    $pow = min($pow, count($units) - 1);

    $bytes /= (1 << (10 * $pow));

    return round($bytes, $precision) . " " . $units[$pow];
}

print formatBytes(memory_get_peak_usage());
// from reading-files-line-by-line-1.php

function readTheFile($path) {
    $lines = [];
    $handle = fopen($path, "r");

    while(!feof($handle)) {
        $lines[] = trim(fgets($handle));
    }

    fclose($handle);
    return $lines;
}

readTheFile("shakespeare.txt");

require "memory.php";

We’re reading a text file containing the complete works of Shakespeare. The text file is about 5.5MB, and the peak memory usage is 12.8MB. Now, let’s use a generator to read each line:

// from reading-files-line-by-line-2.php

function readTheFile($path) {
    $handle = fopen($path, "r");

    while(!feof($handle)) {
        yield trim(fgets($handle));
    }

    fclose($handle);
}

readTheFile("shakespeare.txt");

require "memory.php";

The text file is the same size, but the peak memory usage is 393KB. This doesn’t mean anything until we do something with the data we’re reading. Perhaps we can split the document into chunks whenever we see two blank lines. Something like this:

// from reading-files-line-by-line-3.php

$iterator = readTheFile("shakespeare.txt");

$buffer = "";

foreach ($iterator as $iteration) {
    preg_match("/\n{3}/", $buffer, $matches);

    if (count($matches)) {
        print ".";
        $buffer = "";
    } else {
        $buffer .= $iteration . PHP_EOL;
    }
}

require "memory.php";

Any guesses how much memory we’re using now? Would it surprise you to know that, even though we split the text document up into 1,216 chunks, we still only use 459KB of memory? Given the nature of generators, the most memory we’ll use is that which we need to store the largest text chunk in an iteration. In this case, the largest chunk is 101,985 characters.

I’ve already written about the performance boosts of using generators and Nikita Popov’s Iterator library, so go check that out if you’d like to see more!

Generators have other uses, but this one is demonstrably good for performant reading of large files. If we need to work on the data, generators are probably the best way.

Piping Between Files

In situations where we don’t need to operate on the data, we can pass file data from one file to another. This is commonly called piping (presumably because we don’t see what’s inside a pipe except at each end … as long as it’s opaque, of course!). We can achieve this by using stream methods. Let’s first write a script to transfer from one file to another, so that we can measure the memory usage:

// from piping-files-1.php

file_put_contents(
    "piping-files-1.txt", file_get_contents("shakespeare.txt")
);

require "memory.php";

Unsurprisingly, this script uses slightly more memory to run than the text file it copies. That’s because it has to read (and keep) the file contents in memory until it has written to the new file. For small files, that may be okay. When we start to use bigger files, no so much…

Let’s try streaming (or piping) from one file to another:

// from piping-files-2.php

$handle1 = fopen("shakespeare.txt", "r");
$handle2 = fopen("piping-files-2.txt", "w");

stream_copy_to_stream($handle1, $handle2);

fclose($handle1);
fclose($handle2);

require "memory.php";

This code is slightly strange. We open handles to both files, the first in read mode and the second in write mode. Then we copy from the first into the second. We finish by closing both files again. It may surprise you to know that the memory used is 393KB.

That seems familiar. Isn’t that what the generator code used to store when reading each line? That’s because the second argument to fgets specifies how many bytes of each line to read (and defaults to -1 or until it reaches a new line).

The third argument to stream_copy_to_stream is exactly the same sort of parameter (with exactly the same default). stream_copy_to_stream is reading from one stream, one line at a time, and writing it to the other stream. It skips the part where the generator yields a value, since we don’t need to work with that value.

Piping this text isn’t useful to us, so let’s think of other examples which might be. Suppose we wanted to output an image from our CDN, as a sort of redirected application route. We could illustrate it with code resembling the following:

// from piping-files-3.php

file_put_contents(
    "piping-files-3.jpeg", file_get_contents(
        "https://github.com/assertchris/uploads/raw/master/rick.jpg"
    )
);

// ...or write this straight to stdout, if we don't need the memory info

require "memory.php";

Imagine an application route brought us to this code. But instead of serving up a file from the local file system, we want to get it from a CDN. We may substitute file_get_contents for something more elegant (like Guzzle), but under the hood it’s much the same.

The memory usage (for this image) is around 581KB. Now, how about we try to stream this instead?

// from piping-files-4.php

$handle1 = fopen(
    "https://github.com/assertchris/uploads/raw/master/rick.jpg", "r"
);

$handle2 = fopen(
    "piping-files-4.jpeg", "w"
);

// ...or write this straight to stdout, if we don't need the memory info

stream_copy_to_stream($handle1, $handle2);

fclose($handle1);
fclose($handle2);

require "memory.php";

The memory usage is slightly less (at 400KB), but the result is the same. If we didn’t need the memory information, we could just as well print to standard output. In fact, PHP provides a simple way to do this:

$handle1 = fopen(
    "https://github.com/assertchris/uploads/raw/master/rick.jpg", "r"
);

$handle2 = fopen(
    "php://stdout", "w"
);

stream_copy_to_stream($handle1, $handle2);

fclose($handle1);
fclose($handle2);

// require "memory.php";

Other Streams

There are a few other streams we could pipe and/or write to and/or read from:

  • php://stdin (read-only)
  • php://stderr (write-only, like php://stdout)
  • php://input (read-only) which gives us access to the raw request body
  • php://output (write-only) which lets us write to an output buffer
  • php://memory and php://temp (read-write) are places we can store data temporarily. The difference is that php://temp will store the data in the file system once it becomes large enough, while php://memory will keep storing in memory until that runs out.

Filters

There’s another trick we can use with streams called filters. They’re a kind of in-between step, providing a tiny bit of control over the stream data without exposing it to us. Imagine we wanted to compress our shakespeare.txt. We might use the Zip extension:

// from filters-1.php

$zip = new ZipArchive();
$filename = "filters-1.zip";

$zip->open($filename, ZipArchive::CREATE);
$zip->addFromString("shakespeare.txt", file_get_contents("shakespeare.txt"));
$zip->close();

require "memory.php";

This is a neat bit of code, but it clocks in at around 10.75MB. We can do better, with filters:

// from filters-2.php

$handle1 = fopen(
    "php://filter/zlib.deflate/resource=shakespeare.txt", "r"
);

$handle2 = fopen(
    "filters-2.deflated", "w"
);

stream_copy_to_stream($handle1, $handle2);

fclose($handle1);
fclose($handle2);

require "memory.php";

Here, we can see the php://filter/zlib.deflate filter, which reads and compresses the contents of a resource. We can then pipe this compressed data into another file. This only uses 896KB.

I know this is not the same format, or that there are upsides to making a zip archive. You have to wonder though: if you could choose the different format and save 12 times the memory, wouldn’t you?

To uncompress the data, we can run the deflated file back through another zlib filter:

// from filters-2.php

file_get_contents(
    "php://filter/zlib.inflate/resource=filters-2.deflated"
);

Streams have been extensively covered in “Understanding Streams in PHP” and “Using PHP Streams Effectively”. If you’d like a different perspective, check those out!

Customizing Streams

fopen and file_get_contents have their own set of default options, but these are completely customizable. To define them, we need to create a new stream context:

// from creating-contexts-1.php

$data = join("&", [
    "twitter=assertchris",
]);

$headers = join("\r\n", [
    "Content-type: application/x-www-form-urlencoded",
    "Content-length: " . strlen($data),
]);

$options = [
    "http" => [
        "method" => "POST",
        "header"=> $headers,
        "content" => $data,
    ],
];

$context = stream_content_create($options);

$handle = fopen("https://example.com/register", "r", false, $context);
$response = stream_get_contents($handle);

fclose($handle);

In this example, we’re trying to make a POST request to an API. The API endpoint is secure, but we still need to use the http context property (as is used for http and https). We set a few headers and open a file handle to the API. We can open the handle as read-only since the context takes care of the writing.

There are loads of things we can customize, so it’s best to check out the documentation if you want to know more.

Making Custom Protocols and Filters

Before we wrap things up, let’s talk about making custom protocols. If you look at the documentation, you can find an example class to implement:

Protocol {
    public resource $context;
    public __construct ( void )
    public __destruct ( void )
    public bool dir_closedir ( void )
    public bool dir_opendir ( string $path , int $options )
    public string dir_readdir ( void )
    public bool dir_rewinddir ( void )
    public bool mkdir ( string $path , int $mode , int $options )
    public bool rename ( string $path_from , string $path_to )
    public bool rmdir ( string $path , int $options )
    public resource stream_cast ( int $cast_as )
    public void stream_close ( void )
    public bool stream_eof ( void )
    public bool stream_flush ( void )
    public bool stream_lock ( int $operation )
    public bool stream_metadata ( string $path , int $option , mixed $value )
    public bool stream_open ( string $path , string $mode , int $options ,
        string &$opened_path )
    public string stream_read ( int $count )
    public bool stream_seek ( int $offset , int $whence = SEEK_SET )
    public bool stream_set_option ( int $option , int $arg1 , int $arg2 )
    public array stream_stat ( void )
    public int stream_tell ( void )
    public bool stream_truncate ( int $new_size )
    public int stream_write ( string $data )
    public bool unlink ( string $path )
    public array url_stat ( string $path , int $flags )
}

We’re not going to implement one of these, since I think it is deserving of its own tutorial. There’s a lot of work that needs to be done. But once that work is done, we can register our stream wrapper quite easily:

if (in_array("highlight-names", stream_get_wrappers())) {
    stream_wrapper_unregister("highlight-names");
}

stream_wrapper_register("highlight-names", "HighlightNamesProtocol");

$highlighted = file_get_contents("highlight-names://story.txt");

Similarly, it’s also possible to create custom stream filters. The documentation has an example filter class:

Filter {
    public $filtername;
    public $params
    public int filter ( resource $in , resource $out , int &$consumed ,
        bool $closing )
    public void onClose ( void )
    public bool onCreate ( void )
}

This can be registered just as easily:

$handle = fopen("story.txt", "w+");
stream_filter_append($handle, "highlight-names", STREAM_FILTER_READ);

highlight-names needs to match the filtername property of the new filter class. It’s also possible to use custom filters in a php://filter/highligh-names/resource=story.txt string. It’s much easier to define filters than it is to define protocols. One reason for this is that protocols need to handle directory operations, whereas filters only need to handle each chunk of data.

If you have the gumption, I strongly encourage you to experiment with creating custom protocols and filters. If you can apply filters to stream_copy_to_streamoperations, your applications are going to use next to no memory even when working with obscenely large files. Imagine writing a resize-image filter or and encrypt-for-application filter.

Summary

Though this isn’t a problem we frequently suffer from, it’s easy to mess up when working with large files. In asynchronous applications, it’s just as easy to bring the whole server down when we’re not careful about memory usage.

This tutorial has hopefully introduced you to a few new ideas (or refreshed your memory about them), so that you can think more about how to read and write large files efficiently. When we start to become familiar with streams and generators, and stop using functions like file_get_contents: an entire category of errors disappear from our applications. That seems like a good thing to aim for!

original text:https://www.sitepoint.com/performant-reading-big-files-php/

php7.2.X新特性

新的对象类型

这种新的对象类型, object, 引进了可用于逆变(contravariant)参数输入和协变(covariant)返回任何对象类型。

<?php

function test(object $obj) : object
{
return new SplQueue();
}

test(new StdClass());

通过名称加载扩展

扩展文件不再需要通过文件加载 (Unix下以.so为文件扩展名,在Windows下以 .dll 为文件扩展名) 进行指定。可以在php.ini配置文件进行启用, 也可以使用 dl() 函数进行启用。

允许重写抽象方法(Abstract method)

当一个抽象类继承于另外一个抽象类的时候,继承后的抽象类可以重写被继承的抽象类的抽象方法。

<?php

abstract class A
{
abstract function test(string $s);
}
abstract class B extends A
{
// overridden – still maintaining contravariance for parameters and covariance for return
abstract function test($s) : int;
}

使用Argon2算法生成密码散列

Argon2 已经被加入到密码散列(password hashing) API (这些函数以 password_ 开头), 以下是暴露出来的常量:

  • PASSWORD_ARGON2I
  • PASSWORD_ARGON2_DEFAULT_MEMORY_COST
  • PASSWORD_ARGON2_DEFAULT_TIME_COST
  • PASSWORD_ARGON2_DEFAULT_THREADS

新增 ext/PDO(PDO扩展) 字符串扩展类型

当你准备支持多语言字符集,PDO的字符串类型已经扩展支持国际化的字符集。以下是扩展的常量:

  • PDO::PARAM_STR_NATL
  • PDO::PARAM_STR_CHAR
  • PDO::ATTR_DEFAULT_STR_PARAM

这些常量通过PDO::PARAM_STR利用位运算OR进行计算:

<?php

$db->quote(‘über’, PDO::PARAM_STR | PDO::PARAM_STR_NATL);

为 ext/PDO新增额外的模拟调试信息

PDOStatement::debugDumpParams()方法已经更新,当发送SQL到数据库的时候,在一致性、行查询(包括替换绑定占位符)将会显示调试信息。这一特性已经加入到模拟调试中(在模拟调试打开时可用)。

ext/LDAP(LDAP扩展) 支持新的操作方式

LDAP 扩展已经新增了EXOP支持. 扩展暴露以下函数和常量:

  • ldap_parse_exop()
  • ldap_exop()
  • ldap_exop_passwd()
  • ldap_exop_whoami()
  • LDAP_EXOP_START_TLS
  • LDAP_EXOP_MODIFY_PASSWD
  • LDAP_EXOP_REFRESH
  • LDAP_EXOP_WHO_AM_I
  • LDAP_EXOP_TURN

ext/sockets(sockets扩展)添加了地址信息

sockets扩展现在具有查找地址信息的能力,且可以连接到这个地址,或者进行绑定和解析。为此添加了以下一些函数:

  • socket_addrinfo_lookup()
  • socket_addrinfo_connect()
  • socket_addrinfo_bind()
  • socket_addrinfo_explain()

扩展了参数类型

重写方法和接口实现的参数类型现在可以省略了。不过这仍然是符合LSP,因为现在这种参数类型是逆变的。

<?php

interface A
{
public function Test(array $input);
}

class B implements A
{
public function Test($input){} // type omitted for $input
}

允许分组命名空间的尾部逗号

命名空间可以在PHP 7中使用尾随逗号进行分组引入。

<?php

use Foo\Bar\{
Foo,
Bar,
Baz,
};

php7.1.X新特性

可为空(Nullable)类型

参数以及返回值的类型现在可以通过在类型前加上一个问号使之允许为空。 当启用这个特性时,传入的参数或者函数返回的结果要么是给定的类型,要么是 null 。

<?php

function testReturn(): ?string
{
return ‘elePHPant’;
}

var_dump(testReturn());

function testReturn(): ?string
{
return null;
}

var_dump(testReturn());

function test(?string $name)
{
var_dump($name);
}

test(‘elePHPant’);
test(null);
test();

Void 函数

一个新的返回值类型void被引入。 返回值声明为 void 类型的方法要么干脆省去 return 语句,要么使用一个空的 return 语句。 对于 void 函数来说,NULL 不是一个合法的返回值。

<?php
function swap(&$left, &$right) : void
{
if ($left === $right) {
return;
}

$tmp = $left;
$left = $right;
$right = $tmp;
}

$a = 1;
$b = 2;
var_dump(swap($a, $b), $a, $b);

Symmetric array destructuring

短数组语法([])现在作为list()语法的一个备选项,可以用于将数组的值赋给一些变量(包括在foreach中)。

<?php
$data = [
[1, ‘Tom’],
[2, ‘Fred’],
];

// list() style
list($id1, $name1) = $data[0];

// [] style
[$id1, $name1] = $data[0];

// list() style
foreach ($data as list($id, $name)) {
// logic here with $id and $name
}

// [] style
foreach ($data as [$id, $name]) {
// logic here with $id and $name
}

类常量可见性

现在起支持设置类常量的可见性。

<?php
class ConstDemo
{
const PUBLIC_CONST_A = 1;
public const PUBLIC_CONST_B = 2;
protected const PROTECTED_CONST = 3;
private const PRIVATE_CONST = 4;
}

iterable 伪类

现在引入了一个新的被称为iterable的伪类 (与callable类似)。 这可以被用在参数或者返回值类型中,它代表接受数组或者实现了Traversable接口的对象。 至于子类,当用作参数时,子类可以收紧父类的iterable类型到array 或一个实现了Traversable的对象。对于返回值,子类可以拓宽父类的 array或对象返回值类型到iterable

<?php
function iterator(iterable $iter)
{
foreach ($iter as $val) {
//
}
}

多异常捕获处理

一个catch语句块现在可以通过管道字符(|)来实现多个异常的捕获。 这对于需要同时处理来自不同类的不同异常时很有用。

<?php
try {
// some code
} catch (FirstException | SecondException $e) {
// handle first and second exceptions
}

list()现在支持键名

现在list()和它的新的[]语法支持在它内部去指定键名。这意味着它可以将任意类型的数组 都赋值给一些变量(与短数组语法类似)

<?php
$data = [
[“id” => 1, “name” => ‘Tom’],
[“id” => 2, “name” => ‘Fred’],
];

// list() style
list(“id” => $id1, “name” => $name1) = $data[0];

// [] style
[“id” => $id1, “name” => $name1] = $data[0];

// list() style
foreach ($data as list(“id” => $id, “name” => $name)) {
// logic here with $id and $name
}

// [] style
foreach ($data as [“id” => $id, “name” => $name]) {
// logic here with $id and $name
}

支持为负的字符串偏移量

现在所有支持偏移量的字符串操作函数 都支持接受负数作为偏移量,包括通过[]{}操作字符串下标。在这种情况下,一个负数的偏移量会被理解为一个从字符串结尾开始的偏移量。

<?php
var_dump(“abcdef”[-2]);
var_dump(strpos(“aabbcc”, “b”, -3));

$string = ‘bar’;
echo “The last character of ‘$string’ is ‘$string[-1]’.\n”;

ext/openssl 支持 AEAD

通过给openssl_encrypt()openssl_decrypt() 添加额外参数,现在支持了AEAD (模式 GCM and CCM)。

通过 Closure::fromCallable() 将callables转为闭包

Closure新增了一个静态方法,用于将callable快速地 转为一个Closure 对象。

<?php
class Test
{
public function exposeFunction()
{
return Closure::fromCallable([$this, ‘privateFunction’]);
}

private function privateFunction($param)
{
var_dump($param);
}
}

$privFunc = (new Test)->exposeFunction();
$privFunc(‘some value’);

异步信号处理

一个新的名为 pcntl_async_signals() 的方法现在被引入, 用于启用无需 ticks (这会带来很多额外的开销)的异步信号处理。

<?php
pcntl_async_signals(true); // turn on async signals

pcntl_signal(SIGHUP,  function($sig) {
echo “SIGHUP\n”;
});

posix_kill(posix_getpid(), SIGHUP);

HTTP/2 server push support in ext/curl

对服务器推送的支持现在已经被加入到 CURL 扩展中( 需要版本 7.46 或更高)。这个可以通过 curl_multi_setopt() 函数与新的常量 CURLMOPT_PUSHFUNCTION 来进行调节。常量 CURL_PUST_OK 和 CURL_PUSH_DENY 也已经被添加进来,以便服务器推送的回调函数来表明自己会同意或拒绝处理。

php7.0.X新特性

标量类型声明

标量类型声明 有两种模式: 强制 (默认) 和 严格模式。 现在可以使用下列类型参数(无论用强制模式还是严格模式): 字符串(string), 整数 (int), 浮点数 (float), 以及布尔值 (bool)。它们扩充了PHP5中引入的其他类型:类名,接口,数组和 回调类型

<?php
// Coercive mode
function sumOfInts(int ...$ints)
{
return array_sum($ints);
}

var_dump(sumOfInts(2, '3', 4.1)); // int(9)

返回值类型声明

PHP 7 增加了对返回类型声明的支持。 类似于参数类型声明,返回类型声明指明了函数返回值的类型。可用的类型与参数声明中可用的类型相同。

<?php

function arraysSum(array ...$arrays): array
{
return array_map(function(array $array): int {
return array_sum($array);
}, $arrays);
}

print_r(arraysSum([1,2,3], [4,5,6], [7,8,9]));

Array
(
    [0] => 6
    [1] => 15
    [2] => 24
)

null合并运算符

由于日常使用中存在大量同时使用三元表达式和 isset()的情况, 我们添加了null合并运算符 (??) 这个语法糖。如果变量存在且值不为NULL, 它就会返回自身的值,否则返回它的第二个操作数。

<?php
// Fetches the value of $_GET['user'] and returns 'nobody'
// if it does not exist.
$username = $_GET['user'] ?? 'nobody';
// This is equivalent to:
$username = isset($_GET['user']) ? $_GET['user'] : 'nobody';

// Coalesces can be chained: this will return the first
// defined value out of $_GET['user'], $_POST['user'], and
// 'nobody'.
$username = $_GET['user'] ?? $_POST['user'] ?? 'nobody';
?>

太空船操作符(组合比较符)

太空船操作符用于比较两个表达式。当$a小于、等于或大于$b时它分别返回-1、0或1。 比较的原则是沿用 PHP 的常规比较规则进行的。

<?php
// 整数
echo 1 <=> 1; // 0
echo 1 <=> 2; // -1
echo 2 <=> 1; // 1

// 浮点数
echo 1.5 <=> 1.5; // 0
echo 1.5 <=> 2.5; // -1
echo 2.5 <=> 1.5; // 1

// 字符串
echo "a" <=> "a"; // 0
echo "a" <=> "b"; // -1
echo "b" <=> "a"; // 1
?>

通过 define() 定义常量数组

Array 类型的常量现在可以通过 define() 来定义。在 PHP5.6 中仅能通过 const 定义。

<?php
define('ANIMALS', [
'dog',
'cat',
'bird'
]);

echo ANIMALS[1]; // 输出 "cat"
?>

匿名类

现在支持通过new class 来实例化一个匿名类,这可以用来替代一些“用后即焚”的完整类定义。

<?php
interface Logger {
public function log(string $msg);
}

class Application {
private $logger;

public function getLogger(): Logger {
return $this->logger;
}

public function setLogger(Logger $logger) {
$this->logger = $logger;
}
}

$app = new Application;
$app->setLogger(new class implements Logger {
public function log(string $msg) {
echo $msg;
}
});

var_dump($app->getLogger());
?>

Unicode codepoint 转译语法

这接受一个以16进制形式的 Unicode codepoint,并打印出一个双引号或heredoc包围的 UTF-8 编码格式的字符串。 可以接受任何有效的 codepoint,并且开头的 0 是可以省略的。

echo "\u{aa}";
echo "\u{0000aa}";
echo "\u{9999}";

Closure::call()

Closure::call() 现在有着更好的性能,简短干练的暂时绑定一个方法到对象上闭包并调用它。

<?php
class A {private $x = 1;}

// PHP 7 之前版本的代码
$getXCB = function() {return $this->x;};
$getX = $getXCB->bindTo(new A, 'A'); // 中间层闭包
echo $getX();

// PHP 7+ 及更高版本的代码
$getX = function() {return $this->x;};
echo $getX->call(new A);

unserialize()提供过滤

这个特性旨在提供更安全的方式解包不可靠的数据。它通过白名单的方式来防止潜在的代码注入。

<?php

// 将所有的对象都转换为 __PHP_Incomplete_Class 对象
$data = unserialize($foo, ["allowed_classes" => false]);

// 将除 MyClass 和 MyClass2 之外的所有对象都转换为 __PHP_Incomplete_Class 对象
$data = unserialize($foo, ["allowed_classes" => ["MyClass", "MyClass2"]);

// 默认情况下所有的类都是可接受的,等同于省略第二个参数
$data = unserialize($foo, ["allowed_classes" => true]);

IntlChar

新增加的 IntlChar 类旨在暴露出更多的 ICU 功能。这个类自身定义了许多静态方法用于操作多字符集的 unicode 字符。

<?php

printf('%x', IntlChar::CODEPOINT_MAX);
echo IntlChar::charName('@');
var_dump(IntlChar::ispunct('!'));

预期

预期是向后兼用并增强之前的 assert() 的方法。 它使得在生产环境中启用断言为零成本,并且提供当断言失败时抛出特定异常的能力。

老版本的API出于兼容目的将继续被维护,assert()现在是一个语言结构,它允许第一个参数是一个表达式,而不仅仅是一个待计算的 string或一个待测试的boolean

<?php
ini_set('assert.exception', 1);

class CustomError extends AssertionError {}

assert(false, new CustomError('Some error message'));
?>

Group use declarations

从同一 namespace 导入的类、函数和常量现在可以通过单个 use 语句 一次性导入了。

<?php

// PHP 7 之前的代码
use some\namespace\ClassA;
use some\namespace\ClassB;
use some\namespace\ClassC as C;

use function some\namespace\fn_a;
use function some\namespace\fn_b;
use function some\namespace\fn_c;

use const some\namespace\ConstA;
use const some\namespace\ConstB;
use const some\namespace\ConstC;

// PHP 7+ 及更高版本的代码
use some\namespace\{ClassA, ClassB, ClassC as C};
use function some\namespace\{fn_a, fn_b, fn_c};
use const some\namespace\{ConstA, ConstB, ConstC};
?>

生成器可以返回表达式

此特性基于 PHP 5.5 版本中引入的生成器特性构建的。 它允许在生成器函数中通过使用 return 语法来返回一个表达式 (但是不允许返回引用值), 可以通过调用 Generator::getReturn() 方法来获取生成器的返回值, 但是这个方法只能在生成器完成产生工作以后调用一次。

<?php

$gen = (function() {
yield 1;
yield 2;

return 3;
})();

foreach ($gen as $val) {
echo $val, PHP_EOL;
}

echo $gen->getReturn(), PHP_EOL;

Generator delegation

现在,只需在最外层生成其中使用 yield from, 就可以把一个生成器自动委派给其他的生成器, Traversable 对象或者 array

<?php

function gen()
{
yield 1;
yield 2;

yield from gen2();
}

function gen2()
{
yield 3;
yield 4;
}

foreach (gen() as $val)
{
echo $val, PHP_EOL;
}

?>

整数除法函数 intdiv()

新加的函数 intdiv() 用来进行 整数的除法运算。

<?php

var_dump(intdiv(10, 3));
?>

会话选项

session_start() 可以接受一个 array 作为参数, 用来覆盖 php.ini 文件中设置的 会话配置选项。

在调用 session_start() 的时候, 传入的选项参数中也支持 session.lazy_write 行为, 默认情况下这个配置项是打开的。它的作用是控制 PHP 只有在会话中的数据发生变化的时候才 写入会话存储文件,如果会话中的数据没有发生改变,那么 PHP 会在读取完会话数据之后, 立即关闭会话存储文件,不做任何修改,可以通过设置 read_and_close 来实现。

例如,下列代码设置 session.cache_limiter 为 private,并且在读取完毕会话数据之后马上关闭会话存储文件。

<?php
session_start([
'cache_limiter' => 'private',
'read_and_close' => true,
]);
?>

preg_replace_callback_array()

在 PHP 7 之前,当使用 preg_replace_callback() 函数的时候, 由于针对每个正则表达式都要执行回调函数,可能导致过多的分支代码。 而使用新加的 preg_replace_callback_array() 函数, 可以使得代码更加简洁。

现在,可以使用一个关联数组来对每个正则表达式注册回调函数, 正则表达式本身作为关联数组的键, 而对应的回调函数就是关联数组的值。

CSPRNG Functions

新加入两个跨平台的函数: random_bytes() 和 random_int() 用来产生高安全级别的随机字符串和随机整数。

可以使用 list() 函数来展开实现了 ArrayAccess 接口的对象

在之前版本中,list() 函数不能保证 正确的展开实现了 ArrayAccess 接口的对象, 现在这个问题已经被修复。

其他特性

允许在克隆表达式上访问对象成员,例如: (clone $foo)->bar()