Category Archives: News

Rule #553 of the Internet

 

Rule #553 of the Internet: You know your app’s doing well when idiots make the effort to attack it for no apparent reason. For a while now, Textise has been suffering chronic performance problems and regular outages. You might have noticed. I certainly did!

First of all, my hosting company started complaining that Textise was hogging all the CPU on the shared server it was on. So they throttled it. This was understandable but reduced performance even further. It seemed that hundreds of thousands of requests were hitting the app every day, all sourced from the Opera browser. Obviously, this immediately looked suspicious, given that Opera isn’t the most popular browser on the planet, and none of these requests were showing up in Google Analytics (which was presumably treating them as bots).

So, I signed up for CloudFlare, a proxy service that can filter out malicious requests before they hit your app server. CloudFlare found threats, and stopped them, but it seemed to miss the Opera-sourced attacks, which didn’t reduce at all.

Plan B: I moved Textise from the shared server to a dedicated, physical box. This costs ten times more a year but at least allows me to see exactly what’s going on. The new server coped better with the traffic but still had to be throttled to stop it crashing out on a regular basis.

Plan C: I added code to the Textise app to reject calls from Opera. This did, finally, reduce CPU, but I was unhappy about such a blanket approach.

Plan D: I trawled through the server logs and, with the help of the R Project, I extracted page hit info from Google Analytics so I could compare the two. Eventually, I found another way to identify the malicious requests, meaning that genuine Opera users would still be able to use Textise, and coded it into the app. I talked to the folks at CloudFlare, in the hope that there was a way I could configure CloudFlare to do something similar, but it turned out that would cost me mucho cash, so the code stays in the application. This is a shame, as I’d rather these stupid calls were blocked well before they get anywhere near my server.
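For the curious, here is a minimal sketch of that kind of log-versus-analytics comparison. It is written in Python rather than R, it assumes the server writes the common "combined" access-log format, and the function names and the ratio threshold are made up for illustration. It is not the check that actually went into Textise, which isn't described here.

```python
import re
from collections import Counter

# Assumption: "combined" log format, where the last quoted field is the User-Agent.
LOG_LINE = re.compile(
    r'"(?P<request>[^"]*)" \d{3} \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def browser_family(agent):
    """Very rough browser guess from a User-Agent string."""
    if "Opera" in agent or "OPR/" in agent:
        return "Opera"
    if "Firefox/" in agent:
        return "Firefox"
    if "Chrome/" in agent:
        return "Chrome"
    return "Other"


def tally_by_browser(lines):
    """Count server-log hits per browser family, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if match:
            counts[browser_family(match.group("agent"))] += 1
    return counts


def suspicious(log_counts, analytics_counts, ratio=10):
    """Browsers whose log hits are at least `ratio` times their analytics hits."""
    return sorted(
        browser
        for browser, hits in log_counts.items()
        if hits >= ratio * max(analytics_counts.get(browser, 0), 1)
    )
```

A browser that dominates the server log but barely registers in the analytics export is worth a closer look, although a big gap alone doesn't prove the traffic is malicious.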

I’ve now also added SSL to the site. This doesn’t stop attacks, of course, but it means that your use of Textise is protected. A downside is the bookmarklet and Firefox add-on were slightly broken. I’ve now fixed the bookmarklet (to update, just drag it into your bookmark bar again) but, because Mozilla are changing the way that add-ons work again, I need to re-write the FF add-on, which will take a little while longer.

 

Can you hear me, Mother?

We’re trialling ReadSpeaker’s funky text-to-voice technology on Textised pages where the main content can be identified, like the BBC News Page.

Why not give it a go and let us know what you think?

SEOtext – “What the Search Engine saw”

Announcing our newest product: SEOtext, the SEO tool that’s powered by Textise. SEOtext gets right into the textual content of a web page.

SEOtext can analyse the text of a web site to give you invaluable information about keyword frequency, which is, according to our new prospective partner, “truly ground breaking” for SEO. SEOtext can also help you to detect copyright breaches by extracting the “raw” text of a page.

For more information, see the new SEOtext Page.

Textise 3.0

I’m pleased to say that I’ve now published a new version of Textise. Let’s call it version 3.0. This version sports a brand new hairstyle and an enquiring mind.

Seek And Ye Shall Find

First up, you now have the option of displaying a search box on some sites. This option is not switched on by default because, to be honest, it breaks the “text only” model, but you can enable it on the Options Page. When enabled, you’ll find a search box on selected sites, right at the top of the page. The feature doesn’t work with every site in existence containing a search form because the job of disentangling the myriad ways in which forms are submitted is too much for my brain. Instead, I’ve added search boxes to popular sites whose search functions are easily replicated. At the time of writing, these are:

addons.mozilla
about.com
amazon.com
amazon.co.uk
ask.com
baidu.com
bbc.co.uk
bing.com
cdwow.com
dictionary.cambridge.org
ebay.com
ebay.co.uk
google.co.uk
google.com
imdb.com
twitter.com
metacritic.com
reddit.com
sogou.com
taobao.com
wikipedia.org
yahoo.com
yandex.ru
youtube

I’ll be adding others as I find (and test) them. If you have suggestions for other sites, just write to me via the Contact Page.
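To give a flavour of what “easily replicated” means, here is a rough Python sketch of a per-site lookup table of search URL templates. The table, the function name and the three example entries are illustrative assumptions, not Textise’s actual code or its full site list.

```python
from urllib.parse import quote_plus

# Hypothetical table: site -> search URL template with a {query} placeholder.
SEARCH_TEMPLATES = {
    "google.com": "https://www.google.com/search?q={query}",
    "bing.com": "https://www.bing.com/search?q={query}",
    "wikipedia.org": "https://en.wikipedia.org/w/index.php?search={query}",
}


def search_url(site, terms):
    """Return the search URL for `terms` on `site`, or None if the site isn't listed."""
    template = SEARCH_TEMPLATES.get(site)
    if template is None:
        return None
    # quote_plus escapes the terms for use in a query string (spaces become "+").
    return template.format(query=quote_plus(terms))
```

Sites that submit their search forms via scripts or POST requests don't fit a simple template like this, which is roughly why only some sites get a box.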

Now that pages can contain search boxes, I’ve also decided to reduce the number of search engines available from the home page. These are now Bing, Google and Yahoo only.

Skip to content

Regular users of Textise will know that it skips over the nasty navigation section of a page, directly to the main content – but only sometimes! This limitation is imposed by the page itself: if Textise can find an internal bookmark and a “Skip to content”-type link that points to it, it uses it. Otherwise you’re stuck with ten screens of unpleasantness to scroll through.
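The check described above can be sketched roughly as follows. This is a Python illustration built on the standard library’s html.parser, with class and function names invented for the example. Treating any link whose text contains “skip” as a candidate is an assumption, and Textise’s own matching rules may well differ.

```python
from html.parser import HTMLParser


class SkipLinkFinder(HTMLParser):
    """Collects in-page anchors and the targets of links whose text mentions "skip"."""

    def __init__(self):
        super().__init__()
        self.anchors = set()      # every id, plus every <a name="..."> on the page
        self.candidates = []      # fragment targets of "skip"-type links, in page order
        self._open_target = None  # fragment of the <a href="#..."> currently open
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.anchors.add(attrs["id"])
        if tag == "a":
            if attrs.get("name"):
                self.anchors.add(attrs["name"])
            href = attrs.get("href") or ""
            if href.startswith("#") and len(href) > 1:
                self._open_target = href[1:]
                self._text = []

    def handle_data(self, data):
        if self._open_target is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._open_target is not None:
            if "skip" in "".join(self._text).lower():
                self.candidates.append(self._open_target)
            self._open_target = None


def find_skip_target(html):
    """Return the anchor a skip link points to, or None if the page offers none."""
    finder = SkipLinkFinder()
    finder.feed(html)
    finder.close()
    for target in finder.candidates:
        if target in finder.anchors:
            return target
    return None
```

Both halves have to be present: a skip link with no matching bookmark is ignored, and so is a bookmark that nothing links to.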

So it’s a useful feature, although not previously optional. Now, however, you’ll find that you can disable it on the Options Page. It’s still enabled by default. One reason you might want to disable it is that the new search boxes appear at the top of the page, not at the start of the content, and you may not want to skip straight past them.

*** Stop Press ***
Another option is now available: Strip navigation. This option will remove all content up to the main content, assuming Textise can identify it. It requires you to also have “Skip to main content” selected. Doesn’t, unfortunately, work with PDF conversions. I’ll have a think about that.
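As a rough illustration of the idea (in Python, and not the code Textise runs), stripping navigation can be as simple as discarding everything before the element that the skip link points to. The names below are hypothetical, and the sketch assumes the target’s id (or anchor name) is already known.

```python
from html.parser import HTMLParser


class _TargetLocator(HTMLParser):
    """Records where the first element with the wanted id (or anchor name) starts."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.position = None  # (line, column) of the matching start tag

    def handle_starttag(self, tag, attrs):
        if self.position is not None:
            return
        attrs = dict(attrs)
        if attrs.get("id") == self.target or (
            tag == "a" and attrs.get("name") == self.target
        ):
            # getpos() here gives the 1-based line and 0-based column of the tag.
            self.position = self.getpos()


def strip_navigation(html, target):
    """Drop everything before the target element; return html unchanged if not found."""
    locator = _TargetLocator(target)
    locator.feed(html)
    locator.close()
    if locator.position is None:
        return html
    line, column = locator.position
    # The parser counts lines by "\n" only, so split the same way to get an offset.
    lines = html.split("\n")
    offset = sum(len(text) + 1 for text in lines[: line - 1]) + column
    return html[offset:]
```

When the main content can't be identified the page is returned untouched, which matches the “assuming Textise can identify it” caveat above.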

PDF

I’ve wanted to add the option to convert Textise’s text only output to PDF format for a while now but just couldn’t find an easy-to-use, cost-effective (free) solution.

So I’m pleased to report that pdfcrowd has come to the rescue, offering a fantastic free service that requires the addition of a single line of HTML code. Genius. Try the “Convert this page to a PDF” option next time you convert a page to text. Very useful if you want to save your text only output to read later.

ALEXA

There’s lots of anti-Alexa emotion on the ‘Net and there are some valid doubts expressed about its accuracy/relevance/usefulness. However, when Textise’s Alexa rank tumbles (which is a good thing) like it has been recently, I’m happy to go along with it.

Today’s rankings for Textise (I’m writing this on 21/02/2014) are 355,852 globally, 16,486 in the UK. These are pretty good numbers when you consider that there are over 650 million web sites in the world, over 10 million of them in the UK (with a “.uk” address).

Update 22/02/2014: Ha! The Alexa scores have now gone up! Told you it was rubbish. 🙂

Privacy

The use of cookies on web sites is becoming a big topic so I’ve finally got round to adding a Privacy Policy page where you can see how Textise uses them. In a sentence, Textise uses cookies to store your preferences and to provide anonymous analytical data (using Google Analytics). The analytical data is really useful because it allows me to see which sites are being converted to text and where users are located.

You’ll notice a pop-up nag box on the site now, asking you to accept the use of cookies. I’m afraid this also appears on the text only output but I have to make sure that all users see it and not everyone visits the home page (lots of people use the Firefox add-on or the bookmarklet and never need to go there – I know this because of the analytical data I get!).

Rest assured though, you need only click the acceptance button once and it’ll disappear from the whole site, forever (as long as you don’t delete your cookies!).

FIXES

Both the site and the web service have been spruced up a bit.

I noticed that the Mail Online site was causing Textise to crash out for no obvious reason. It’s a pretty nasty site, with long, long pages of links to the usual salacious celebrity stories and bigotry wrapped up as journalism. As you can tell, I love the Daily Mail but I didn’t let my personal feelings get in the way. It turned out that the problem was caused by the presence of unusual (and unnecessary) hexadecimal characters in the HTML of the pages, which I promptly zapped. All good now (apart from the wretched rag itself).

Some Chinese sites were also causing problems, this time by including “on” events in image tags (for example, onerror=”…javascript…”). This made an absolute mockery of my image processing code. Also now fixed.
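One way to neutralise such attributes, sketched here in Python with regular expressions, is to remove every on-something attribute from image tags before any further processing. This is an assumption about the general approach rather than the actual fix, and a regex pass like this can misfire on unusual markup (for example, text inside another attribute’s value that happens to look like an event attribute).

```python
import re

# An <img ...> tag; quoted attribute values may themselves contain ">".
IMG_TAG = re.compile(r"""<img\b(?:"[^"]*"|'[^']*'|[^'">])*>""", re.IGNORECASE)

# An event attribute such as onerror="..." or onload='...' or onclick=foo.
ON_ATTR = re.compile(
    r"""\s+on\w+\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE
)


def strip_image_event_handlers(html):
    """Remove on* event attributes from every <img> tag, leaving other tags alone."""
    return IMG_TAG.sub(lambda match: ON_ATTR.sub("", match.group(0)), html)
```

For example, `<img src="a.png" onerror="alert(1)" alt="x">` comes out as `<img src="a.png" alt="x">`.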

I’ve also modified various other bits of code, sometimes in a spirit of tidiness, other times to very slightly improve performance. Actually, performance has improved dramatically anyway since the recent move to Hosting UK but you know what that (UK) supermarket ad says. No? Oh well.

If you have any comments or suggestions for Textise, please get in touch via the Contact Page or add a comment on the Feedback & Suggestions Page.

And your host for tonight is…!

This weekend (Saturday 23/11/2013) I intend moving Textise to the new hosting on HostingUK. There’s absolutely no way this can go wrong, of course, but I thought I ought to warn you anyway…

You’ll know when Textise is running on the new servers if you see “Textise is now hosted by HostingUK” at the top of your text only output. Or if the site’s completely broken (which, if you remember, definitely can’t happen).

Textise has been really busy this week, which is great to see. And they said that we didn’t need text only pages! Hah!

Update 23/11/2013 13:36 GMT

The move to our new hosting has now completed successfully!

Please note that you may have to first remove the “textise.net/textiseOptions” cookie from your browser before you’ll be able to make changes on the Options page.

Note also that, at the moment, you’ll be sent back to the home page after making changes in Options. This is because of some changes I had to make due to a weird error being thrown during cookie processing. I’ll be fixing this later!

Update 23/11/2013 14:35 GMT

Fixed: After making changes on the Options page, you’ll now go back to the text only output you were viewing.

One Direction

Just made a long-needed change to Textise. I noticed a little while ago that sanantonio.gov pages had stopped reporting SSL errors and were now complaining about “too many redirects”. I was pretty sure that the web service had code to deal with this but guess what? – it didn’t.

The solution I found came in two parts and I have to admit I didn’t do proper testing to see if just one part solved the problem. The first part – and no big surprise here – was to set my HttpWebRequest object’s AllowAutoRedirect property to True. The second part was to create a cookie container for the HttpWebRequest. This feels like a useful thing to do in general, which is why I added it anyway.
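The Textise web service uses .NET’s HttpWebRequest, so the following is only a loose Python analogue of the same two ideas, using the standard library’s urllib. The function names are invented for the example. One common cause of “too many redirects” is a server that sets a cookie while redirecting and keeps redirecting until the cookie comes back, so a client that throws cookies away can bounce around until it hits its redirect limit. Whether that was the cause here wasn’t tested.

```python
import urllib.request
from http.cookiejar import CookieJar


def build_fetcher():
    """Create an opener that follows redirects and remembers cookies between hops."""
    jar = CookieJar()
    # build_opener() installs a redirect handler by default; the cookie
    # processor is the extra piece that stores and resends cookies.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch(url, opener, timeout=10):
    """Fetch a URL, returning the final URL after redirects and the raw body."""
    with opener.open(url, timeout=timeout) as response:
        return response.geturl(), response.read()
```

Reusing the same opener for every hop is what keeps the cookies alive, and the jar can be inspected afterwards to see what the site set along the way.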

The upshot of these changes is that the pages at the official San Antonio web site now convert to text properly. Other sites will presumably work better now too.

In other news, I’ve noticed that the Amazon UK servers are refusing connections to Textise, so apologies if you regularly use the Amazon UK search from the home page. Hopefully they’ll see the error of their ways soon!

I’m pleased to report that the official State of Oregon web site is now using Textise for its accessibility/text only links. We had a little hiccup when some of their users reported errors when using Internet Explorer with the text-only links on the Oregon Courts pages. I traced this to the fact that the links had been set up with a ‘target=_blank’, which is not possible when using the example code that I published on the For Web Developers page. This is because the code sets the current page’s location to the text-only version, meaning that the ‘target=_blank’ is either ignored (in Chrome and Firefox, for example) or causes an error (in Internet Explorer). I’m unsure why IE is being pernickety here when other browsers can cope but that’s cross-browser compatibility for you! I’ve set up a test page where you can view the different behaviours, as well as an example of sending the text-only results to a new tab or page. Have a look at the source to see what’s going on. I’ll also add the alternative code to the For Web Developers page.

So things are moving forward nicely, meaning that this post’s title is quite appropriate and not at all a cynical attempt to drive more traffic to my site by misleading fans of boy bands.

Subscriptions

I’ve been making some changes to the way that Textise manages subscribers. Subscriptions are available to web masters who want to use Textise to power Text Only/Accessibility links on their pages. The rules for subscriptions are pretty simple, as laid out on the For Web Developers page:

Commercial use

Textise is free for personal use. For commercial purposes (including, but not restricted to, the creation of text-only links that use Textise or calls to the web service for commercial web sites), please contact me using the Contact Us page.

My intention isn’t to charge all sites for this facility. For example, one current site using Textise is the Tanzanian Training Centre for International Health and I’m very happy for them to continue using the tool for free.

I was recently contacted by the official web site for the State Of Oregon, requesting a subscription for a few of their domains, and I’m pleased to say that their new, Textise-driven text-only links should go live on 23 July 2013. The web master of the sites also requested some small changes to the way their text-only pages render, with which I was happy to comply. These changes are:

  • The request for PayPal donations at the bottom of a text-only page is now removed for subscribing sites.
  • The message “This page has been Textised!” is changed to “This text only page was specially created by Textise for <domain name>” for subscribing sites.

These changes will apply to all subscribers from now on. In the future I intend making further changes that will benefit subscribers, for example the facility to convert a Textised page into a PDF.

If you run a web site and would like to use Textise to create maintenance-free Text Only/Accessibility links on your pages (or you already use Textise in this way and haven’t got round to subscribing), please get in touch via the Contact Us page.