Faup stands for Finally An Url Parser. It is a library and command line tool that parses URLs and normalizes their fields under two constraints:

  1. Work with real-life URLs (resilient to badly formatted ones)
  2. Be fast: no allocation for string parsing, and each character is read only once

Quick Start

What is provided?

  • A static library you can embed in your software (faup_static)
  • A dynamic library you can link against (faupl)
  • A command line tool you can use to extract various parts of a URL (faup)

Why Yet Another URL Extraction Library?

Because they all suck. Find a library that can extract, say, a TLD even if you have an IP address, or http://localhost, or anything that may confuse your regex so much that you end up with an unmaintainable one.

Command line usage

Simply pipe your URL in or pass it as a parameter:

$ echo "www.github.com" |faup -p
scheme,credential,subdomain,domain,host,tld,port,resource_path,query_string,fragment
,,www,github.com,www.github.com,com,,,,

$ faup www.github.com
,,www,github.com,www.github.com,com,,,,

If the argument is a file, every URL it contains is parsed, one per line:

$ cat urls.txt 
https://foo:bar@example.com
localhost
www.mozilla.org:80/index.php

$ faup -p urls.txt 
scheme,credential,subdomain,domain,domain_without_tld,host,tld,port,resource_path,query_string,fragment
https,foo:bar,,example.com,example,example.com,com,,,,
,,,localhost,localhost,localhost,,,,,
,,www,mozilla.org,mozilla,www.mozilla.org,org,80,/index.php,,
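
Because the first line of this output names the fields, the result is easy to consume from a script. The following is a minimal sketch, not part of faup itself: it feeds the sample output shown above to Python's csv module. The helper name parse_faup_csv is hypothetical, and the sketch assumes that none of the URLs contain commas or quotes, which this README does not confirm faup escapes.

```python
import csv
import io

# Sample output copied from the `faup -p urls.txt` run shown above.
SAMPLE = """\
scheme,credential,subdomain,domain,domain_without_tld,host,tld,port,resource_path,query_string,fragment
https,foo:bar,,example.com,example,example.com,com,,,,
,,,localhost,localhost,localhost,,,,,
,,www,mozilla.org,mozilla,www.mozilla.org,org,80,/index.php,,
"""


def parse_faup_csv(text):
    """Return one dict per URL, keyed by the field names in the header line."""
    return list(csv.DictReader(io.StringIO(text)))


rows = parse_faup_csv(SAMPLE)
for row in rows:
    # Empty strings mean the field was absent from the URL.
    print(row["host"], row["tld"] or "(no TLD)", row["port"] or "(no port)")
```

Fields that faup could not find come back as empty strings, so a check such as `if row["tld"]` is enough to separate hosts like localhost from regular domains.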

Extract only the TLD field

Faup uses the Mozilla list to extract TLDs that span more than one level, such as co.uk. It can also handle the exceptions defined in that list.

$ faup -f tld slashdot.org
org

$ faup -f tld www.bbc.co.uk
co.uk
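
To illustrate why a suffix list is needed, here is a rough sketch of longest-suffix matching. It is not faup's implementation: the SUFFIXES set is a tiny hand-picked assumption standing in for the Mozilla list, the function name split_tld is hypothetical, and the wildcard and exception rules of the real list are ignored.

```python
# Tiny hand-picked suffix set for illustration only. The real Mozilla list
# is far larger and also contains wildcard and exception rules.
SUFFIXES = {"org", "com", "uk", "co.uk", "jp"}


def split_tld(host, suffixes=SUFFIXES):
    """Return (domain_without_tld, tld) using the longest matching suffix."""
    labels = host.lower().split(".")
    # Walk from the longest candidate suffix to the shortest one, so that
    # "co.uk" wins over "uk" for "www.bbc.co.uk".
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in suffixes:
            name = labels[start - 1] if start > 0 else ""
            return name, candidate
    # No known suffix at all, e.g. "localhost".
    return host, ""


print(split_tld("slashdot.org"))
print(split_tld("www.bbc.co.uk"))
print(split_tld("localhost"))
```

A regular expression that simply takes the last label would report uk for www.bbc.co.uk, which is the kind of mistake a list-driven lookup avoids.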

JSON output, high level TLDs

JSON output can be requested like this:

$ faup -o json www.takatoukiter.foobar.yokohama.jp
{
    "scheme": "",
    "credential": "",
    "subdomain": "www",
    "domain": "takatoukiter.foobar.yokohama.jp",
    "domain_without_tld": "takatoukiter",
    "host": "www.takatoukiter.foobar.yokohama.jp",
    "tld": "foobar.yokohama.jp",
    "port": "",
    "resource_path": "",
    "query_string": "",
    "fragment": ""
}
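
The JSON form is convenient when faup is called from another program. The sketch below parses the sample output shown above with Python's json module. The helper faup_json is hypothetical and only assumes that a faup binary is on the PATH and accepts the -o json option used in this section; it is defined but not called here.

```python
import json
import subprocess

# Sample output copied from the `faup -o json` run shown above.
SAMPLE = """
{
    "scheme": "",
    "credential": "",
    "subdomain": "www",
    "domain": "takatoukiter.foobar.yokohama.jp",
    "domain_without_tld": "takatoukiter",
    "host": "www.takatoukiter.foobar.yokohama.jp",
    "tld": "foobar.yokohama.jp",
    "port": "",
    "resource_path": "",
    "query_string": "",
    "fragment": ""
}
"""


def faup_json(url):
    """Hypothetical helper: run `faup -o json URL` and decode its output."""
    result = subprocess.run(
        ["faup", "-o", "json", url],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


record = json.loads(SAMPLE)
# Keep only the fields that were actually present in the URL.
present = {key: value for key, value in record.items() if value}
print(present)
```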

Building faup

To get and build faup, you need cmake. As cmake doesn't allow building the binary in the source directory, you have to create a separate build directory.

git clone https://github.com/stricaud/faup.git
cd faup
mkdir build
cd build
cmake .. && make
sudo make install

Lua support

Faup can be compiled without Lua support. In that case, CMake will output the following line:

-- Could NOT find Lua51 (missing:  LUA_INCLUDE_DIR) 

If you want Lua functionality, install the Lua development headers before running the build steps above.

For example, on Red Hat systems:

# yum -y install lua lua-devel

CMake 2.8 for Red Hat/CentOS 6.x

The following error may appear if you have an outdated version of CMake (as is the case on Red Hat and CentOS 6.x systems):

CMake Error at CMakeLists.txt:1 (cmake_minimum_required):
  CMake 2.8 or higher is required.  You are running version 2.6.4

-- Configuring incomplete, errors occurred!

To install CMake 2.8 manually on Red Hat/CentOS systems, build it from the sources by following these instructions:

# Install dependencies
yum install ncurses-devel gcc gcc-c++ make

# Get the sources
cd /usr/local/
wget http://www.cmake.org/files/v2.8/cmake-2.8.12.2.tar.gz
tar xzf cmake-2.8.12.2.tar.gz
cd cmake-2.8.12.2

# Compile and install the sources
./configure
make
make install

# Clean up the sources
cd /usr/local
rm -rf cmake-2.8.12.2 cmake-2.8.12.2.tar.gz

# Add cmake to the PATH
echo "PATH=/usr/local/bin/:\$PATH" > /etc/profile.d/cmake28.sh 
source /etc/profile

FAQ

Why do I receive the error “libfaupl.so.1: cannot open shared object file” when trying to run faup?

If you get a shared library loading error similar to the following when trying to run faup, it is probably because your platform (e.g. Ubuntu/Debian) does not include /usr/local/lib, or whichever directory faup's shared library was installed in, in its default shared library search path:

$ faup
faup: error while loading shared libraries: libfaupl.so.1: cannot open shared object file: No such file or directory

A good way to see which shared libraries faup loads is to use the ldd command:

$ ldd /usr/local/bin/faup 
    linux-vdso.so.1 =>  (0x00007fff89735000)
    libfaupl.so.1 => not found
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f55a6082000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f55a641a000)

To update the shared library search path, you can use the ldconfig command. For example, if the faup libraries are installed in /usr/local/lib, you can add the path as follows:

$ echo '/usr/local/lib' | sudo tee -a /etc/ld.so.conf.d/faup.conf
$ sudo ldconfig
$ ldd /usr/local/bin/faup 
    linux-vdso.so.1 =>  (0x00007fff550d5000)
    libfaupl.so.1 => /usr/local/lib/libfaupl.so.1 (0x00007f41f9102000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f41f8d78000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f41f9317000)
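
The same check can be automated, for example in an install script. The sketch below scans ldd output for entries marked "not found". The function name missing_libraries is hypothetical, and the two samples are the ldd listings shown above, before and after running ldconfig.

```python
# ldd listings copied from this FAQ entry: before and after the ldconfig fix.
BEFORE = """\
    linux-vdso.so.1 =>  (0x00007fff89735000)
    libfaupl.so.1 => not found
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f55a6082000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f55a641a000)
"""

AFTER = """\
    linux-vdso.so.1 =>  (0x00007fff550d5000)
    libfaupl.so.1 => /usr/local/lib/libfaupl.so.1 (0x00007f41f9102000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f41f8d78000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f41f9317000)
"""


def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Lines look like "name => target"; the loader line has no arrow.
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.strip() == "not found":
            missing.append(name)
    return missing


print(missing_libraries(BEFORE))
print(missing_libraries(AFTER))
```

An empty list after the ldconfig step indicates that every library faup depends on can now be resolved.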


