GraphQL: A Query Language for APIs

GraphQL is both a query language for APIs and a runtime for fulfilling those queries with your data. GraphQL provides a complete and understandable description of the data in your API, gives clients the power to ask for exactly what they need and nothing more, makes it easier to evolve APIs over time, and enables powerful developer tools.

Ask for what you need, get exactly that

Send a GraphQL query to your API and get exactly what you need, nothing more and nothing less. GraphQL queries always return predictable results. Apps using GraphQL are fast and stable because they control the data they get, not the server.

Get many resources in a single request

GraphQL queries access not just the properties of one resource but also smoothly follow references between them. While a typical REST API requires loading from multiple URLs to fetch multiple resources, a GraphQL API gets all the data your app needs in a single request, so apps using GraphQL can be quick even on slow mobile network connections.
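
To make this concrete, here is a minimal PHP sketch. The endpoint (https://example.com/graphql) and the hero/friends fields are hypothetical, purely for illustration: a single POST carries a query naming exactly the fields we want, nested references included, and the JSON response mirrors that shape.

<?php

// One request fetches the hero *and* the names of their friends.
// (Hypothetical endpoint and schema; illustrative only.)
$query = '{ hero { name friends { name } } }';

$context = stream_context_create([
    'http' => [
        'method'  => 'POST',
        'header'  => 'Content-Type: application/json',
        'content' => json_encode(['query' => $query]),
    ],
]);

$response = file_get_contents('https://example.com/graphql', false, $context);
print_r(json_decode($response, true));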

Describe what's possible with a type system

GraphQL APIs are organized in terms of types and fields, not endpoints. You can access the full capabilities of your data from a single endpoint. GraphQL uses types to ensure apps only ask for what's possible, and to provide clear and helpful errors. Apps can use types to avoid writing manual parsing code.

Move faster with powerful developer tools

Know exactly what data you can request from your API without leaving your editor, highlight potential issues before sending a query, and take advantage of improved code intelligence. By leveraging your API's type system, GraphQL makes it easy to build powerful tools like GraphiQL.

Evolve your API without versions

Add new fields and types to your GraphQL API without impacting existing queries. Aging fields can be deprecated and hidden from tools. By using a single evolving version, GraphQL APIs give apps continuous access to new features and encourage cleaner, more maintainable server code.

Bring your own data and code

GraphQL creates a uniform API across your entire application without being limited by a specific storage engine. GraphQL engines are available in many languages, letting a GraphQL API make better use of your existing data and code. You only provide functions for each field in the type system, and GraphQL calls them with optimal concurrency.
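
As one hedged example of "a function per field", here is a sketch using the third-party webonyx/graphql-php library (one of several PHP engines, installed via Composer). The schema, the echo field, and its resolver are illustrative assumptions, not part of the original text.

<?php

require 'vendor/autoload.php';

use GraphQL\GraphQL;
use GraphQL\Type\Definition\ObjectType;
use GraphQL\Type\Definition\Type;
use GraphQL\Type\Schema;

// One resolve function per field; the engine handles query execution.
$queryType = new ObjectType([
    'name'   => 'Query',
    'fields' => [
        'echo' => [
            'type'    => Type::string(),
            'args'    => ['message' => Type::nonNull(Type::string())],
            'resolve' => function ($root, array $args) {
                return $args['message'];
            },
        ],
    ],
]);

$schema = new Schema(['query' => $queryType]);
$result = GraphQL::executeQuery($schema, '{ echo(message: "hello") }');
print_r($result->toArray()); // ['data' => ['echo' => 'hello']]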

IT Terminology Explained: Backups

Many people have only a fuzzy idea of what hot backup, cold backup, and cloud backup mean, so here is a quick tour of the backup terminology used in IT. No more embarrassing yourself.

Suppose you are a woman with a boyfriend, and at the same time you keep things ambiguous with another man: more than a friend, not quite a lover. You could dump your current boyfriend at any moment and the other one would step in immediately. That is a cold backup.

Suppose you are a woman dating two men at the same time. Both are your boyfriends, they never interfere with each other, and each runs independently. That is dual-machine hot backup.

Suppose you are a woman not quite reassured by the sense of security your boyfriend provides. Somewhere far away, unknown to him, you keep in touch with another man. You tell him you have no boyfriend and are in a complicated phase, and the moment you and your boyfriend split up, you can move your affections to him. That is off-site disaster recovery.

Suppose you are a woman with a boyfriend, and you have also paid a matchmaking agency to keep an eye out for good prospects. The moment you and this boyfriend break up, the agency immediately lines up a replacement, and your love life keeps running without interruption. That is cloud backup.

Data security matters more than anything. Have you backed yourself up today?

Suppose you are a woman who doubts her boyfriend's loyalty and buys a loyalty-testing service on Taobao. That is a disaster drill. Friendly reminder: never run a disaster drill without a backup in place, or you may lose your data for good.

Suppose you are a woman with a best friend so close that the two of you can share one boyfriend. That is NAS.

Suppose you are a woman whose boyfriend is more than you can handle alone, and your girlfriends have to help you keep him steady. That is load balancing, or QoS.

Suppose you are a woman who dines with A, shops with B, and sleeps with C, and together they add up to one complete boyfriend. That... that is a supercomputing cluster. Women with a low clock speed are advised against this; you will simply crash.

The above is partly from Zhihu:

古的白 http://www.zhihu.com/people/2a61334b801a53aeb8d563e702c4da56

My backup suddenly got a girlfriend. Should I win him back?

Backups exist so that things run better, so here is a bit more operations knowledge:

Suppose you are a woman whose boyfriend is addicted to games, often going down without warning and not answering his phone. So once you have agreed to go shopping in the afternoon, you call him every so often to check that he can still provide service. That is heartbeat monitoring.

Suppose you are a woman who wants to go shopping while boyfriend A is gaming and not answering his phone, so you send the shopping request to backup boyfriend B instead, keeping the service running without interruption. That is failover.

Suppose you are a woman with many things that need a boyfriend. You shop, travel, dine, and do unspeakable things with A, while B only gets to accompany you shopping and never enjoys the full rights of a boyfriend. That is a master-slave configuration.

Suppose you are a woman whose needs are so intense that one boyfriend simply cannot cope, so you take two: A on odd-numbered days and B on even-numbered days, reducing the pressure on any single boyfriend. That is load balancing.

Suppose you are a woman with several boyfriends. Combine that with heartbeat monitoring, failover, and load balancing and you reach the ultimate experience. That is a cluster (LVS). Note: when a single machine can handle the demand, a cluster is not recommended; it leaves a lot of resources idle and raises maintenance costs.

Suppose you are a woman whose needs have grown so much that one boyfriend cluster can no longer keep up, so you add several more. That is horizontal scaling across clusters, or a multi-cluster grid.

Suppose you are a woman whose boyfriend is physically too weak to meet your needs, so you buy lots of supplements to upgrade him and increase single-machine capacity. That is vertical scaling. Keep in mind that vertical scaling costs more and more while delivering less and less.

Suppose you are a woman who often goes out with her boyfriend, and just when the mood strikes you find yourselves without protection and have to run to a store. So you stash supplies at every place you regularly visit, drastically cutting the waiting time. That is a CDN.

Suppose you are a woman whose boyfriend is handsome, charming, rich, and devoted only to you, so you earn the hostility of woman B. In the name of friendship, she asks your boyfriend every weekend to fix her computer and her fridge, taking up so much of his time that he can no longer serve you. That is a denial-of-service attack, or DoS.

Suppose you are a woman resented over her boyfriend, but his processing power is so strong that he handles requests faster than she can make them, so she hires a crowd of women to pester him in shifts. That is a distributed denial-of-service attack, or DDoS.

Suppose you are a woman who notices her boyfriend always busy handling unimportant requests from others, so you give him a whitelist and require him to serve only requests on it, rejecting anyone of unknown identity. That is access control, also known as session tracking.

Suppose you are a woman who finds that, even after the measures above, her boyfriend's request load has barely dropped. Investigating, you discover someone has been forging your WeChat avatar and nickname to send requests to him. That is cross-site request forgery, or CSRF.

Suppose you are a woman who receives a parcel and sends her boyfriend to collect it, only to find that someone has mailed you a letter full of insults. That is a cross-site scripting attack, or XSS. Note that the sender could just as well mail you a tiny listening device to spy on your privacy.

Suppose you are a woman who, to counter these threats, requires every piece of mail addressed to you via your boyfriend to be inspected. That is data validation and filtering.

Suppose you are a woman whose boyfriend is so outstanding that others covet him. They study him, tweak a few details, and produce a boyfriend B who is 99 percent identical. That is not plagiarism, that is reverse engineering; think of boyfriend knock-offs.

Suppose you are a woman who asks her boyfriend to last ten minutes, then fifteen, then twenty, to find out where his limits lie. That is stress testing.

The point of stress testing is to find out whether one boyfriend can handle the demand, and so whether to enable a boyfriend cluster or upgrade his processing power. Do not stress-test a boyfriend who is live in production; it can end in an outage and you may lose everything.

Suppose you are a woman who, to keep her boyfriend running smoothly, checks his WeChat, Weibo, and other social accounts every day for clues to potential problems. That is data analysis.

Suppose you are a woman whose boyfriend is a social-media power user, producing a flood of posts on Weibo, Zhihu, and WeChat every day. You find you cannot analyze it as fast as he produces it, so you bring in your best friend to analyze it with you. That is parallel computing.

Suppose you are a woman whose boyfriend is so restless and flirtatious that he generates an astronomical backlog of information to process. You and your friends are exhausted and still cannot keep up, so you pay twenty helpers on Zhihu to analyze it with you. That is cloud computing.

Suppose you are a woman who, after applying cloud computing, ends up with a large pile of neatly organized boyfriend data, such as:

Location          Active hours    Count
如家 (Home Inn)   xxxx            123
汉庭 (Hanting)    xxxx            45

That is data statistics.

Suppose you are a woman who, given the places her boyfriend frequents, looks at the hotels and the sensitive time slots and concludes he is probably cheating. That is data mining.

Suppose you are a woman who, after analyzing your boyfriend's data, learns that he is about to head out this afternoon to book a room. Just before he leaves, you text him: did you bring protection? If not, you can buy some from me. That is precision push, and it works hand in hand with data mining.

Suppose you are a woman whose boyfriend is always out roaming and forever running into problems, so you rent a room, stock it with everything he could need, and tell him: no more hotels, just come to my room, everything is ready. That is a container.

Suppose you are a woman whose boyfriend is a programmer who stays up at night giving everyone an accessible tour of hot backup, cold backup, and cloud backup. Then you will have a use for every one of these backups. (from 海鸥 http://www.zhihu.com/people/36da2198b32aff4416173e95b3ae3535 )

Suppose you are a woman who opens an interface to her boyfriend once a day to collect data. The tool your boyfriend uses to connect the two of you is the interface "machine", and the data you collect is "stream" data. Collecting around the clock, 24 hours a day, is real-time data collection. Developing a new interface to communicate with him is virtualization. Collecting data from several different boyfriends makes you a big data center. Deciding one day to have a baby is a big data application. Not knowing whose baby it is: that is data masking. Judging from the baby's looks, though, dark skin and blond hair, that is cross-domain data fusion and modeling. Putting the baby on display and charging admission is big data monetization. (from 彩色郁金香 http://www.zhihu.com/people/4dffd99714e85bf909f7716a3bc58df1 )

Reference: https://www.zhihu.com/question/263789393/answer/274245200

Chinese Word Segmentation and Search with xunsearch

Installation and Usage

1. Installation

root@iZbp1bdm1m8u064ukgg8zwZ:/home/zheng# wget http://www.xunsearch.com/download/xunsearch-full-latest.tar.bz2
--2018-12-20 14:05:23--  http://www.xunsearch.com/download/xunsearch-full-latest.tar.bz2
Resolving www.xunsearch.com (www.xunsearch.com)... 202.75.216.233
Connecting to www.xunsearch.com (www.xunsearch.com)|202.75.216.233|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10302627 (9.8M) [text/plain]
Saving to: ‘xunsearch-full-latest.tar.bz2’

xunsearch-full-latest.tar.b 100%[=========================================>]   9.83M  10.3MB/s    in 1.0s    

2018-12-20 14:05:24 (10.3 MB/s) - ‘xunsearch-full-latest.tar.bz2’ saved [10302627/10302627]

root@iZbp1bdm1m8u064ukgg8zwZ:/home/zheng# ll
total 55012
drwxr-xr-x  7 zheng zheng     4096 Dec 20 14:05 ./
drwxr-xr-x  5 root       root           4096 Aug 12 15:12 ../
-rw-r--r--  1 root       root       10302627 Nov 16 19:22 xunsearch-full-latest.tar.bz2
root@iZbp1bdm1m8u064ukgg8zwZ:/home/zheng# tar -xjf xunsearch-full-latest.tar.bz2 
root@iZbp1bdm1m8u064ukgg8zwZ:/home/zheng# ll
total 55016
drwxr-xr-x  8 zheng zheng     4096 Dec 20 14:05 ./
drwxr-xr-x  5 root       root           4096 Aug 12 15:12 ../
drwxr-xr-x  3        501 staff          4096 Nov 16 19:16 xunsearch-full-1.4.12/
-rw-r--r--  1 root       root       10302627 Nov 16 19:22 xunsearch-full-latest.tar.bz2
root@iZbp1bdm1m8u064ukgg8zwZ:/home/zheng# cd xunsearch-full-1.4.12/
root@iZbp1bdm1m8u064ukgg8zwZ:/home/zheng/xunsearch-full-1.4.12# ll
total 40
drwxr-xr-x 3        501 staff       4096 Nov 16 19:16 ./
drwxr-xr-x 8 zheng zheng  4096 Dec 20 14:05 ../
-rw-r--r-- 1        501 staff        120 Dec 31  2016 ._.DS_Store
-rw-r--r-- 1        501 staff       6148 Dec 31  2016 .DS_Store
drwxr-xr-x 2        501 staff       4096 Nov 16 19:21 packages/
-rw-r--r-- 1        501 staff       2937 Dec  5  2014 README.md
-rwxr-xr-x 1        501 staff      11165 Oct 16 13:10 setup.sh*
root@iZbp1bdm1m8u064ukgg8zwZ:/home/zheng/xunsearch-full-1.4.12# sh setup.sh 

+==========================================+
| Welcome to setup xunsearch(full)         |
| 欢迎使用 xunsearch (完整版) 安装程序     |
+------------------------------------------+
| Follow the on-screen instructions please |
| 请按照屏幕上的提示操作以完成安装         |
+==========================================+

Please specify the installation directory
请指定安装目录 (默认为中括号内的值)
[/usr/local/xunsearch]:setup.sh: 111: read: Illegal option -e
[/usr/local/xunsearch]:

Confirm the installation directory
请确认安装目录:/usr/local/xunsearch [Y/n]y

Checking scws ... no
Installing scws (1.2.3) ... 
Extracting scws package ...
Configuring scws ...
Compiling & installing scws ...
Checking scws dict ... no
Extracting scws dict file ... 
Checking libuuid ... no, try to install it
Extracting libuuid package ...
Configuring libuuid ...
Compiling & installing libuuid ...
Checking xapian-core-scws ... no
Installing xapian-core-scws (1.4.9) ... 
Extracting xapian-core-scws package ...
Configuring xapian-core-scws ...
Compiling & installing xapian-core-scws ...
Checking libevent ... no
Installing libevent (2.0.21-stable) ... 
Extracting libevent package ...
Configuring libevent ...
Compiling & installing libevent ...
Extracting xunsearch package (1.4.12) ...
Configuring xunsearch ...
Compiling & installing xunsearch ...
Cleaning ... done

+=================================================+
| Installation completed successfully, Thanks you |
| 安装成功,感谢选择和使用 xunsearch              |
+-------------------------------------------------+
| 说明和注意事项:                                |
| 1. 开启/重新开启 xunsearch 服务程序,命令如下: |
|    /usr/local/xunsearch/bin/xs-ctl.sh restart
|    强烈建议将此命令写入服务器开机脚本中         |
|                                                 |
| 2. 所有的索引数据将被保存在下面这个目录中:     |
|    /usr/local/xunsearch/data
|    如需要转移到其它目录,请使用软链接。         |
|                                                 |
| 3. 您现在就可以在我们提供的开发包(SDK)基础上    |
|    开发您自己的搜索了。                         |
|    目前只支持 PHP 语言,参见下面文档:          |
|    /usr/local/xunsearch/sdk/php/README
+=================================================+

root@iZbp1bdm1m8u064ukgg8zwZ:/home/zheng/xunsearch-full-1.4.12# /usr/local/xunsearch/bin/xs-ctl.sh restart
WARNING: no server[xs-indexd] is running (BIND:127.0.0.1:8383)
INFO: re-starting server[xs-indexd] ... (BIND:127.0.0.1:8383)
WARNING: no server[xs-searchd] is running (BIND:127.0.0.1:8384)
INFO: re-starting server[xs-searchd] ... (BIND:127.0.0.1:8384)
root@iZbp1bdm1m8u064ukgg8zwZ:/home/wolonggang/xunsearch-full-1.4.12#

Using the xunsearch PHP SDK

$xs = new \XS('demo');
$tokenizer = new \XSTokenizerScws;

// Other tokenizer methods:
// $tokenizer->addDict(...);    // add a segmentation dictionary (TXT/XDB format)
// $tokenizer->getResult(...);  // get segmentation results
// $tokenizer->getTokens(...);  // the XSTokenizer interface
// $tokenizer->getTops(...);    // get the most important words, with statistics
// $tokenizer->getVersion();    // get the scws version number
// $tokenizer->hasWord(...);    // check whether words of a given part of speech occur
// $tokenizer->setCharset(...); // set the character set
// $tokenizer->setDict(...);    // replace the segmentation dictionary (TXT/XDB format)
// $tokenizer->setDuality(...); // enable duality combination of isolated characters
// $tokenizer->setMulti(...);   // set compound segmentation options

// Ignore punctuation while segmenting
$tokenizer->setIgnore();

$text = '清华大学怎么样?';
$words1 = $tokenizer->getResult($text);
$words2 = $tokenizer->getTops($text, 3);

print_r("<pre>");
print_r($tokenizer);
print_r($words1);
print_r($words2);
exit;
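
Beyond tokenizing, the SDK's main job is indexing and searching. Here is a minimal sketch, assuming a project named demo is configured in the SDK's app directory (demo.ini) with pid, subject, and message fields; the document contents are placeholders.

<?php

require_once '/usr/local/xunsearch/sdk/php/lib/XS.php';

$xs = new XS('demo');

// Index one document.
$doc = new XSDocument;
$doc->setFields([
    'pid'     => 1,
    'subject' => '清华大学怎么样?',
    'message' => '一所位于北京的大学。',
]);
$xs->index->add($doc);
$xs->index->flushIndex();

// Search it back.
$docs = $xs->search->setQuery('清华大学')->setLimit(5)->search();
foreach ($docs as $hit) {
    echo $hit->subject, "\n";
}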

How to Read Big Files with PHP

It’s not often that we, as PHP developers, need to worry about memory management. The PHP engine does a stellar job of cleaning up after us, and the web server model of short-lived execution contexts means even the sloppiest code has no long-lasting effects.

Measuring Success

The only way to be sure we’re making any improvement to our code is to measure a bad situation and then compare that measurement to another after we’ve applied our fix. In other words, unless we know how much a “solution” helps us (if at all), we can’t know if it really is a solution or not.

There are two metrics we can care about. The first is CPU usage. How fast or slow is the process we want to work on? The second is memory usage. How much memory does the script take to execute? These are often inversely proportional — meaning that we can offload memory usage at the cost of CPU usage, and vice versa.

In an asynchronous execution model (like with multi-process or multi-threaded PHP applications), both CPU and memory usage are important considerations. In traditional PHP architecture, these generally become a problem when either one reaches the limits of the server.

It’s impractical to measure CPU usage inside PHP. If that’s the area you want to focus on, consider using something like top, on Ubuntu or macOS. For Windows, consider using the Linux Subsystem, so you can use top in Ubuntu.

For the purposes of this tutorial, we’re going to measure memory usage. We’ll look at how much memory is used in “traditional” scripts. We’ll implement a couple of optimization strategies and measure those too. In the end, I want you to be able to make an educated choice.

The methods we’ll use to see how much memory is used are:

// formatBytes is taken from the php.net documentation

memory_get_peak_usage();

function formatBytes($bytes, $precision = 2) {
    $units = array("b", "kb", "mb", "gb", "tb");

    $bytes = max($bytes, 0);
    $pow = floor(($bytes ? log($bytes) : 0) / log(1024));
    $pow = min($pow, count($units) - 1);

    $bytes /= (1 << (10 * $pow));

    return round($bytes, $precision) . " " . $units[$pow];
}

We’ll use these functions at the end of our scripts, so we can see which script uses the most memory at one time.

What Are Our Options?

There are many approaches we could take to read files efficiently. But there are also two likely scenarios in which we could use them. We could want to read and process data all at the same time, outputting the processed data or performing other actions based on what we read. We could also want to transform a stream of data without ever really needing access to the data.

Let’s imagine, for the first scenario, that we want to be able to read a file and create separate queued processing jobs every 10,000 lines. We’d need to keep at least 10,000 lines in memory, and pass them along to the queued job manager (whatever form that may take).

For the second scenario, let’s imagine we want to compress the contents of a particularly large API response. We don’t care what it says, but we need to make sure it’s backed up in a compressed form.

In both scenarios, we need to read large files. In the first, we need to know what the data is. In the second, we don’t care what the data is. Let’s explore these options…

Reading Files, Line By Line

There are many functions for working with files. Let’s combine a few into a naive file reader:

// from memory.php

function formatBytes($bytes, $precision = 2) {
    $units = array("b", "kb", "mb", "gb", "tb");

    $bytes = max($bytes, 0);
    $pow = floor(($bytes ? log($bytes) : 0) / log(1024));
    $pow = min($pow, count($units) - 1);

    $bytes /= (1 << (10 * $pow));

    return round($bytes, $precision) . " " . $units[$pow];
}

print formatBytes(memory_get_peak_usage());
// from reading-files-line-by-line-1.php

function readTheFile($path) {
    $lines = [];
    $handle = fopen($path, "r");

    while(!feof($handle)) {
        $lines[] = trim(fgets($handle));
    }

    fclose($handle);
    return $lines;
}

readTheFile("shakespeare.txt");

require "memory.php";

We’re reading a text file containing the complete works of Shakespeare. The text file is about 5.5MB, and the peak memory usage is 12.8MB. Now, let’s use a generator to read each line:

// from reading-files-line-by-line-2.php

function readTheFile($path) {
    $handle = fopen($path, "r");

    while(!feof($handle)) {
        yield trim(fgets($handle));
    }

    fclose($handle);
}

readTheFile("shakespeare.txt");

require "memory.php";

The text file is the same size, but the peak memory usage is 393KB. This doesn’t mean anything until we do something with the data we’re reading. Perhaps we can split the document into chunks whenever we see two blank lines. Something like this:

// from reading-files-line-by-line-3.php

$iterator = readTheFile("shakespeare.txt");

$buffer = "";

foreach ($iterator as $iteration) {
    preg_match("/\n{3}/", $buffer, $matches);

    if (count($matches)) {
        print ".";
        $buffer = "";
    } else {
        $buffer .= $iteration . PHP_EOL;
    }
}

require "memory.php";

Any guesses how much memory we’re using now? Would it surprise you to know that, even though we split the text document up into 1,216 chunks, we still only use 459KB of memory? Given the nature of generators, the most memory we’ll use is that which we need to store the largest text chunk in an iteration. In this case, the largest chunk is 101,985 characters.

I’ve already written about the performance boosts of using generators and Nikita Popov’s Iterator library, so go check that out if you’d like to see more!

Generators have other uses, but this one is demonstrably good for performant reading of large files. If we need to work on the data, generators are probably the best way.

Piping Between Files

In situations where we don’t need to operate on the data, we can pass file data from one file to another. This is commonly called piping (presumably because we don’t see what’s inside a pipe except at each end … as long as it’s opaque, of course!). We can achieve this by using stream methods. Let’s first write a script to transfer from one file to another, so that we can measure the memory usage:

// from piping-files-1.php

file_put_contents(
    "piping-files-1.txt", file_get_contents("shakespeare.txt")
);

require "memory.php";

Unsurprisingly, this script uses slightly more memory to run than the text file it copies. That’s because it has to read (and keep) the file contents in memory until it has written to the new file. For small files, that may be okay. When we start to use bigger files, not so much…

Let’s try streaming (or piping) from one file to another:

// from piping-files-2.php

$handle1 = fopen("shakespeare.txt", "r");
$handle2 = fopen("piping-files-2.txt", "w");

stream_copy_to_stream($handle1, $handle2);

fclose($handle1);
fclose($handle2);

require "memory.php";

This code is slightly strange. We open handles to both files, the first in read mode and the second in write mode. Then we copy from the first into the second. We finish by closing both files again. It may surprise you to know that the memory used is 393KB.

That seems familiar. Isn’t that what the generator code used to store when reading each line? That’s because the second argument to fgets specifies how many bytes of each line to read (when omitted, it reads until it reaches a new line).

The third argument to stream_copy_to_stream is a similar sort of parameter: it defaults to -1, meaning copy until the end of the stream. stream_copy_to_stream reads from one stream in small buffered chunks and writes them to the other. It skips the part where the generator yields a value, since we don’t need to work with that value.

Piping this text isn’t useful to us, so let’s think of other examples which might be. Suppose we wanted to output an image from our CDN, as a sort of redirected application route. We could illustrate it with code resembling the following:

// from piping-files-3.php

file_put_contents(
    "piping-files-3.jpeg", file_get_contents(
        "https://github.com/assertchris/uploads/raw/master/rick.jpg"
    )
);

// ...or write this straight to stdout, if we don't need the memory info

require "memory.php";

Imagine an application route brought us to this code. But instead of serving up a file from the local file system, we want to get it from a CDN. We may substitute file_get_contents for something more elegant (like Guzzle), but under the hood it’s much the same.

The memory usage (for this image) is around 581KB. Now, how about we try to stream this instead?

// from piping-files-4.php

$handle1 = fopen(
    "https://github.com/assertchris/uploads/raw/master/rick.jpg", "r"
);

$handle2 = fopen(
    "piping-files-4.jpeg", "w"
);

// ...or write this straight to stdout, if we don't need the memory info

stream_copy_to_stream($handle1, $handle2);

fclose($handle1);
fclose($handle2);

require "memory.php";

The memory usage is slightly less (at 400KB), but the result is the same. If we didn’t need the memory information, we could just as well print to standard output. In fact, PHP provides a simple way to do this:

$handle1 = fopen(
    "https://github.com/assertchris/uploads/raw/master/rick.jpg", "r"
);

$handle2 = fopen(
    "php://stdout", "w"
);

stream_copy_to_stream($handle1, $handle2);

fclose($handle1);
fclose($handle2);

// require "memory.php";

Other Streams

There are a few other streams we could pipe and/or write to and/or read from (a short example using php://temp follows the list):

  • php://stdin (read-only)
  • php://stderr (write-only, like php://stdout)
  • php://input (read-only) which gives us access to the raw request body
  • php://output (write-only) which lets us write to an output buffer
  • php://memory and php://temp (read-write) are places we can store data temporarily. The difference is that php://temp will store the data in the file system once it becomes large enough, while php://memory will keep storing in memory until that runs out.
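
For instance, php://temp behaves like an in-memory stream until the data grows past a threshold (about 2MB by default), after which it spills over to a temporary file:

// a minimal php://temp example (not from the original article)

$handle = fopen("php://temp", "r+");

fwrite($handle, "some data we only need for a moment");
rewind($handle);

print stream_get_contents($handle);

fclose($handle);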

Filters

There’s another trick we can use with streams called filters. They’re a kind of in-between step, providing a tiny bit of control over the stream data without exposing it to us. Imagine we wanted to compress our shakespeare.txt. We might use the Zip extension:

// from filters-1.php

$zip = new ZipArchive();
$filename = "filters-1.zip";

$zip->open($filename, ZipArchive::CREATE);
$zip->addFromString("shakespeare.txt", file_get_contents("shakespeare.txt"));
$zip->close();

require "memory.php";

This is a neat bit of code, but it clocks in at around 10.75MB. We can do better, with filters:

// from filters-2.php

$handle1 = fopen(
    "php://filter/zlib.deflate/resource=shakespeare.txt", "r"
);

$handle2 = fopen(
    "filters-2.deflated", "w"
);

stream_copy_to_stream($handle1, $handle2);

fclose($handle1);
fclose($handle2);

require "memory.php";

Here, we can see the php://filter/zlib.deflate filter, which reads and compresses the contents of a resource. We can then pipe this compressed data into another file. This only uses 896KB.

I know this isn’t the same format, and that there are upsides to making a zip archive. You have to wonder though: if you could choose a different format and use roughly a twelfth of the memory, wouldn’t you?

To uncompress the data, we can run the deflated file back through another zlib filter:

// from filters-2.php

file_get_contents(
    "php://filter/zlib.inflate/resource=filters-2.deflated"
);

Streams have been extensively covered in “Understanding Streams in PHP” and “Using PHP Streams Effectively”. If you’d like a different perspective, check those out!

Customizing Streams

fopen and file_get_contents have their own set of default options, but these are completely customizable. To define them, we need to create a new stream context:

// from creating-contexts-1.php

$data = join("&", [
    "twitter=assertchris",
]);

$headers = join("\r\n", [
    "Content-type: application/x-www-form-urlencoded",
    "Content-length: " . strlen($data),
]);

$options = [
    "http" => [
        "method" => "POST",
        "header"=> $headers,
        "content" => $data,
    ],
];

$context = stream_context_create($options);

$handle = fopen("https://example.com/register", "r", false, $context);
$response = stream_get_contents($handle);

fclose($handle);

In this example, we’re trying to make a POST request to an API. The API endpoint is secure, but we still need to use the http context property (as is used for http and https). We set a few headers and open a file handle to the API. We can open the handle as read-only since the context takes care of the writing.

There are loads of things we can customize, so it’s best to check out the documentation if you want to know more.

Making Custom Protocols and Filters

Before we wrap things up, let’s talk about making custom protocols. If you look at the documentation, you can find an example class to implement:

Protocol {
    public resource $context;
    public __construct ( void )
    public __destruct ( void )
    public bool dir_closedir ( void )
    public bool dir_opendir ( string $path , int $options )
    public string dir_readdir ( void )
    public bool dir_rewinddir ( void )
    public bool mkdir ( string $path , int $mode , int $options )
    public bool rename ( string $path_from , string $path_to )
    public bool rmdir ( string $path , int $options )
    public resource stream_cast ( int $cast_as )
    public void stream_close ( void )
    public bool stream_eof ( void )
    public bool stream_flush ( void )
    public bool stream_lock ( int $operation )
    public bool stream_metadata ( string $path , int $option , mixed $value )
    public bool stream_open ( string $path , string $mode , int $options ,
        string &$opened_path )
    public string stream_read ( int $count )
    public bool stream_seek ( int $offset , int $whence = SEEK_SET )
    public bool stream_set_option ( int $option , int $arg1 , int $arg2 )
    public array stream_stat ( void )
    public int stream_tell ( void )
    public bool stream_truncate ( int $new_size )
    public int stream_write ( string $data )
    public bool unlink ( string $path )
    public array url_stat ( string $path , int $flags )
}

We’re not going to implement one of these, since I think it is deserving of its own tutorial. There’s a lot of work that needs to be done. But once that work is done, we can register our stream wrapper quite easily:

if (in_array("highlight-names", stream_get_wrappers())) {
    stream_wrapper_unregister("highlight-names");
}

stream_wrapper_register("highlight-names", "HighlightNamesProtocol");

$highlighted = file_get_contents("highlight-names://story.txt");

Similarly, it’s also possible to create custom stream filters. The documentation has an example filter class:

Filter {
    public $filtername;
    public $params;
    public int filter ( resource $in , resource $out , int &$consumed ,
        bool $closing )
    public void onClose ( void )
    public bool onCreate ( void )
}

This can be registered just as easily:

$handle = fopen("story.txt", "w+");
stream_filter_append($handle, "highlight-names", STREAM_FILTER_READ);

highlight-names needs to match the filtername property of the new filter class. It’s also possible to use custom filters in a php://filter/highlight-names/resource=story.txt string. It’s much easier to define filters than it is to define protocols. One reason for this is that protocols need to handle directory operations, whereas filters only need to handle each chunk of data.
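
To make that concrete, here is a hedged sketch of a custom filter: a hypothetical HighlightNamesFilter that wraps one hard-coded name in asterisks. A filter class extends the built-in php_user_filter and processes the stream bucket by bucket.

// a minimal custom filter sketch (hypothetical; not from the original article)

class HighlightNamesFilter extends php_user_filter
{
    public function filter($in, $out, &$consumed, $closing)
    {
        while ($bucket = stream_bucket_make_writeable($in)) {
            $bucket->data = str_replace("Hamlet", "*Hamlet*", $bucket->data);
            $consumed += $bucket->datalen;
            stream_bucket_append($out, $bucket);
        }

        return PSFS_PASS_ON;
    }
}

stream_filter_register("highlight-names", "HighlightNamesFilter");

$handle = fopen("story.txt", "r");
stream_filter_append($handle, "highlight-names", STREAM_FILTER_READ);

print stream_get_contents($handle);
fclose($handle);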

If you have the gumption, I strongly encourage you to experiment with creating custom protocols and filters. If you can apply filters to stream_copy_to_stream operations, your applications are going to use next to no memory even when working with obscenely large files. Imagine writing a resize-image filter or an encrypt-for-application filter.

Summary

Though this isn’t a problem we frequently suffer from, it’s easy to mess up when working with large files. In asynchronous applications, it’s just as easy to bring the whole server down when we’re not careful about memory usage.

This tutorial has hopefully introduced you to a few new ideas (or refreshed your memory about them), so that you can think more about how to read and write large files efficiently. When we become familiar with streams and generators, and stop using functions like file_get_contents, an entire category of errors disappears from our applications. That seems like a good thing to aim for!

Original article: https://www.sitepoint.com/performant-reading-big-files-php/

New Features in PHP 7.2.x

New object type

A new type, object, has been introduced. It permits any object as a parameter type (contravariantly) and as a return type (covariantly).

<?php

function test(object $obj): object
{
    return new SplQueue();
}

test(new StdClass());

Loading extensions by name

Extension files no longer need to be specified by filename (with the .so extension on Unix or .dll on Windows). An extension can be enabled by name in php.ini, or via the dl() function.
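
A minimal illustration of both places the suffix can now be dropped (note that dl() is only available on the CLI and a few other SAPIs):

<?php

// In php.ini, "extension=curl" now works in place of
// "extension=curl.so" (Unix) or "extension=php_curl.dll" (Windows).

// The same applies to dl() on the CLI:
if (!extension_loaded('sqlite3')) {
    dl('sqlite3'); // the platform-specific filename is resolved for us
}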

Overriding abstract methods

When an abstract class extends another abstract class, it can now override the parent's abstract methods.

<?php

abstract class A
{
    abstract function test(string $s);
}

abstract class B extends A
{
    // overridden: still maintaining contravariance for parameters
    // and covariance for the return type
    abstract function test($s): int;
}

Password hashing with Argon2

Argon2 has been added to the password hashing API (the functions prefixed with password_). The following constants are exposed (a short example follows the list):

  • PASSWORD_ARGON2I
  • PASSWORD_ARGON2_DEFAULT_MEMORY_COST
  • PASSWORD_ARGON2_DEFAULT_TIME_COST
  • PASSWORD_ARGON2_DEFAULT_THREADS
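
A minimal example of the new algorithm, assuming PHP was built with Argon2 support (--with-password-argon2):

<?php

// Hash with Argon2i, spelling out the default cost constants:
$hash = password_hash('secret', PASSWORD_ARGON2I, [
    'memory_cost' => PASSWORD_ARGON2_DEFAULT_MEMORY_COST,
    'time_cost'   => PASSWORD_ARGON2_DEFAULT_TIME_COST,
    'threads'     => PASSWORD_ARGON2_DEFAULT_THREADS,
]);

var_dump(password_verify('secret', $hash)); // bool(true)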

New string types in ext/PDO

To help support multilingual character sets, PDO's string type has been extended with national character set variants. The new constants are:

  • PDO::PARAM_STR_NATL
  • PDO::PARAM_STR_CHAR
  • PDO::ATTR_DEFAULT_STR_PARAM

These are combined with PDO::PARAM_STR using a bitwise OR:

<?php

$db->quote('über', PDO::PARAM_STR | PDO::PARAM_STR_NATL);

Additional emulated prepares debugging information for ext/PDO

The PDOStatement::debugDumpParams() method has been updated to show the SQL actually sent to the database, with bound placeholders substituted into the query. This information is available when emulated prepares are turned on.
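
A minimal sketch, assuming a hypothetical $db PDO connection and a users table:

<?php

// With emulated prepares on, the dump includes the raw SQL sent to
// the server, with the bound value substituted for the placeholder.
$db->setAttribute(PDO::ATTR_EMULATE_PREPARES, true);

$stmt = $db->prepare('SELECT * FROM users WHERE id = :id');
$stmt->execute(['id' => 7]);
$stmt->debugDumpParams();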

New extended operation (EXOP) support in ext/LDAP

The LDAP extension has gained EXOP support, exposing the following functions and constants (a short sketch follows the list):

  • ldap_parse_exop()
  • ldap_exop()
  • ldap_exop_passwd()
  • ldap_exop_whoami()
  • LDAP_EXOP_START_TLS
  • LDAP_EXOP_MODIFY_PASSWD
  • LDAP_EXOP_REFRESH
  • LDAP_EXOP_WHO_AM_I
  • LDAP_EXOP_TURN
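
A minimal sketch with a hypothetical server and credentials, using the who-am-I extended operation:

<?php

// Connect and bind (hypothetical host and DN):
$conn = ldap_connect('ldap.example.com');
ldap_set_option($conn, LDAP_OPT_PROTOCOL_VERSION, 3);
ldap_bind($conn, 'cn=admin,dc=example,dc=com', 'secret');

// Ask the directory which identity we are bound as:
$identity = ldap_exop_whoami($conn);
var_dump($identity); // e.g. "dn:cn=admin,dc=example,dc=com"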

Address information in ext/sockets

The sockets extension can now look up address information and then connect to it, bind to it, or explain it. The following functions were added for this (see the sketch after the list):

  • socket_addrinfo_lookup()
  • socket_addrinfo_connect()
  • socket_addrinfo_bind()
  • socket_addrinfo_explain()
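
A minimal sketch (hypothetical host and port): look up candidate addresses, inspect the first, then connect to it:

<?php

// Resolve address candidates for example.com:80 over TCP/IPv4:
$addrinfo = socket_addrinfo_lookup('example.com', '80', [
    'ai_family'   => AF_INET,
    'ai_socktype' => SOCK_STREAM,
]);

print_r(socket_addrinfo_explain($addrinfo[0])); // describe the first hit
$socket = socket_addrinfo_connect($addrinfo[0]); // then connect to it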

Parameter type widening

Parameter types in overridden methods and interface implementations can now be omitted. This remains LSP-compliant, since parameter types are contravariant.

<?php

interface A
{
    public function Test(array $input);
}

class B implements A
{
    public function Test($input) {} // type omitted for $input
}

Trailing commas in grouped namespaces

Namespaces grouped into a single use statement can now end with a trailing comma.

<?php

use Foo\Bar\{
    Foo,
    Bar,
    Baz,
};