- New site config directive: strip_attr: [XPath attribute selector] (e.g. //img/@srcset) - removes the matched attribute from its element
- New site config directive: insert_detected_image: yes/no (default yes) - inserts the og:image image into the body if no other images were extracted
- Bug fix: Better handling of Internationalized Domain Names (IDNs)
- Bug fix: Relative base URLs (<base>) now resolved against page URL
- Bug fix: Wrong site config file chosen in certain cases (when both wildcard and exact subdomain files were available and cached in APCu)
- Bug fix: &apos; HTML entities not converted correctly when parsing with Gumbo PHP
- Remove srcset (+ sizes) attributes on img elements if it looks like they only contain relative URLs (browser will use src attribute value instead)
- https:// URLs are now rewritten to sec:// before being submitted, to avoid overzealous security software blocking the request on some servers. There is no redirect; this only affects newly submitted URLs on index.php
- HTML5-PHP library updated
- Language Detect library updated
- Site config files updated for better extraction
- Minimum PHP version is now 5.4. If you must use PHP 5.3, please stick with Full-Text RSS 3.7
- Insert og:image (if we find one) at the top of the article when no images have been extracted
- Additional lazy image load handling - helps preserve more images designed for JS-enabled browsers
- Original GUID values from feed items now preserved
- New config option favour_effective_url determines whether an item's effective URL (after redirects) should replace the original item URL in feed output
- Adding &use_effective_url to querystring will replace original feed item URL with effective URL (unless disabled with config option above)
- APCu stats view in admin panel fixed to work with recent versions of APCu
- HTML5-PHP library updated
- Tested for PHP 7 compatibility
- VPS Puppet script (ubuntu-15.10.pp) updated - fixes issue with IDN encodings, among other things. (This is intended for setting up a new Ubuntu 15.10 instance for running Full-Text RSS.)
- Open Graph properties og:title, og:type, og:url, og:image, and og:description now returned if found in the page being processed
- Bug fix: certain XPath expressions weren't being evaluated correctly when HTML5 parsing was enabled
- Cookie handling now only on redirects - fixes issue with certain sites (thanks to Dave Vasilevsky)
- Compatibility test will no longer show HHVM as incompatible - Full-Text RSS worked with HHVM 3.7.1 in our tests (but without Tidy support and no automatic site config updates)
- Humble HTTP Agent updated to support version 2 of PHP's HTTP extension
- HTML5-PHP library updated
- Site config files can now include HTTP headers (user-agent, cookie, referer), e.g. http_header(user-agent): PHP/5.6
- Config option removed: $options->user_agents - use site config files.
- Site config files which use single_page_link can now follow it with if_page_contains: XPath to make it conditional.
- Minimum supported PHP version is now 5.3. If you must use PHP 5.2, please download Full-Text RSS 3.4
- New request parameter: siteconfig lets you submit extraction rules directly in request
- New request parameter: accept=(auto|feed|html) determines what we'll accept as a response (deprecates html=1 parameter)
- New request parameter: key_redirect=0 to prevent HTTP redirect to hide API key
- Site config files can now contain native_ad_clue: [xpath] to check for elements which signify that the article is a native ad
- New config option: remove_native_ads - when set to true, articles identified as native ads (see above) are removed from the output. This applies only when processing feeds; output is unaffected when the input URL points to an HTML page.
- Feed output will include <dc:type>Native Ad</dc:type> for articles which appear to be native ads.
- New config option: user_submitted_config to determine whether siteconfig parameter is enabled or not
- Feed output now includes <atom:link rel="self"...> with URL of the generated feed
- Feed output now includes <atom:link rel="alternate"...> with the original (input) URL
- Feed output now includes <atom:link rel="related"...> with URL to subscribe to the generated feed (using subtome.com)
- Feed preview stylesheet (feed.xsl) now presents a "subscribe to feed" link
- Fixed character encoding issue for certain texts
- Fixed character encoding issue for certain characters in HTML5 parsing mode
- Use base element, if present in HTML, when rewriting URLs
- A short excerpt from the first few lines of the extracted content can now be included in the output (pass &summary=1 in querystring, see $options->summary in config file for more info)
- Full content can now be excluded from the output (pass &content=0 in querystring, see $options->content in config file for more info)
- Site config files can now be automatically updated from our GitHub repository (URL to call visible in admin area)
- Site config files updated for better extraction
- PHP Readability updated to be more lenient when pruning HTML
- Language detection library updated
- HTML meta refresh redirects now also followed
- APC stats (if APC is available on your server) now visible in admin area
- Bug fix: Duplicate find_string and replace_string values in site config files no longer removed (thanks Fabrizio!)
- Bug fix: MIME type actions now applied when following single page URLs
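
Several entries above introduce request parameters (accept, summary, content, siteconfig). As a rough illustration of how they might be combined, here is a minimal Python sketch that builds a request URL. The install location, the endpoint name makefulltextfeed.php, and the url parameter are assumptions that the changelog does not state, and the newline-separated format of the siteconfig value is assumed to mirror a site config file. The helper build_request_url is hypothetical and not part of Full-Text RSS.

```python
from urllib.parse import urlencode

# Hypothetical install location. The endpoint name and the "url" parameter
# are assumptions; the changelog does not name them.
BASE = "https://example.org/full-text-rss/makefulltextfeed.php"


def build_request_url(input_url, accept="auto", summary=False,
                      content=True, siteconfig=None, base=BASE):
    """Return a request URL using parameters named in the changelog."""
    if accept not in ("auto", "feed", "html"):
        raise ValueError("accept must be auto, feed, or html")
    params = {"url": input_url, "accept": accept}
    if summary:
        params["summary"] = "1"      # include a short excerpt
    if not content:
        params["content"] = "0"      # exclude the full content
    if siteconfig:
        # Assumed format: one directive per line, as in a site config file.
        params["siteconfig"] = siteconfig
    return base + "?" + urlencode(params)


# Example: accept only a feed, return excerpts without full content,
# and submit two extraction rules directly in the request.
rules = "strip_attr: //img/@srcset\ninsert_detected_image: no"
request_url = build_request_url("http://example.com/feed", accept="feed",
                                summary=True, content=False, siteconfig=rules)
print(request_url)
```

Per the entries above, the siteconfig parameter only takes effect when the user_submitted_config option allows it, and accept supersedes the older html=1 parameter.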