Textise 3.0

I’m pleased to say that I’ve now published a new version of Textise. Let’s call it version 3.0. This version sports a brand new hairstyle and an enquiring mind.

Seek And Ye Shall Find

First up, you now have the option of displaying a search box on some sites. This option is not switched on by default because, to be honest, it breaks the “text only” model, but you can enable it on the Options Page. When enabled, you’ll find a search box on selected sites, right at the top of the page. The feature doesn’t work with every site that has a search form because disentangling the myriad ways in which forms are submitted is too much for my brain. Instead, I’ve added search boxes to popular sites whose search functions are easily replicated. At the time of writing, these are:

addons.mozilla
about.com
amazon.com
amazon.co.uk
ask.com
baidu.com
bbc.co.uk
bing.com
cdwow.com
dictionary.cambridge.org
ebay.com
ebay.co.uk
google.co.uk
google.com
imdb.com
twitter.com
metacritic.com
reddit.com
sogou.com
taobao.com
wikipedia.org
yahoo.com
yandex.ru
youtube

I’ll be adding others as I find (and test) them. If you have suggestions for other sites, just write to me via the Contact Page.
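To give a feel for what “easily replicated” means, here is a minimal sketch in Python. It is not Textise’s actual code: the `SEARCH_TEMPLATES` table and the `search_url_for` function are hypothetical names, and only three sites are shown. The idea is simply that a site qualifies when its search can be expressed as a single URL with the query dropped in.

```python
from urllib.parse import quote_plus, urlsplit

# Hypothetical lookup table: domain -> search URL with a slot for the query.
# Only a few sites are shown; the patterns are illustrative.
SEARCH_TEMPLATES = {
    "google.com": "https://www.google.com/search?q={query}",
    "bing.com": "https://www.bing.com/search?q={query}",
    "wikipedia.org": "https://en.wikipedia.org/w/index.php?search={query}",
}


def search_url_for(page_url, query):
    """Return a search URL for the site hosting page_url, or None if unsupported."""
    host = urlsplit(page_url).hostname or ""
    for domain, template in SEARCH_TEMPLATES.items():
        # Match the bare domain or any subdomain of it (www., en., and so on).
        if host == domain or host.endswith("." + domain):
            return template.format(query=quote_plus(query))
    return None
```

A site that needs a POST request, hidden form fields or JavaScript to submit its search would not fit a table like this, which is the kind of case the feature leaves out.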

Now that pages can contain search boxes, I’ve also decided to reduce the number of search engines available from the home page. These are now Bing, Google and Yahoo only.

Skip to content

Regular users of Textise will know that it skips over the nasty navigation section of a page, directly to the main content – but only sometimes! This limitation is imposed by the page itself: if Textise can find an internal bookmark and a “Skip to content”-type link that points to it, it uses it. Otherwise you’re stuck with ten screens of unpleasantness to scroll through.
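As a rough illustration of that check, here is a minimal Python sketch. It is an assumption about how such a test could be written, not Textise’s own code; `SkipLinkFinder`, `find_skip_target` and the `SKIP_TEXT` pattern are hypothetical names. It looks for a link whose text reads like “Skip to content” and whose `href` points at a bookmark that really exists in the page.

```python
import re
from html.parser import HTMLParser

# Link text that counts as a "Skip to content"-type link (an assumed pattern).
SKIP_TEXT = re.compile(r"skip\s+to\s+(the\s+)?(main\s+)?content", re.IGNORECASE)


class SkipLinkFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.anchors = set()     # every bookmark (id=, or name= on <a>) in the page
        self.candidates = []     # targets of skip-style links, in page order
        self._pending = None     # bookmark name of the in-page link currently open
        self._text = []          # text collected inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.anchors.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.anchors.add(attrs["name"])
        href = attrs.get("href") or ""
        if tag == "a" and href.startswith("#") and len(href) > 1:
            self._pending = href[1:]
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._pending is not None:
            if SKIP_TEXT.search("".join(self._text)):
                self.candidates.append(self._pending)
            self._pending = None


def find_skip_target(html):
    """Return the bookmark a skip link points to, or None if there isn't one."""
    finder = SkipLinkFinder()
    finder.feed(html)
    finder.close()
    for fragment in finder.candidates:
        if fragment in finder.anchors:
            return fragment
    return None
```

If `find_skip_target` returns `None`, either the page has no skip link or the link points nowhere, and that is the “ten screens of unpleasantness” case.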

So it’s a useful feature, although not previously optional. Now, however, you’ll find that you can disable it on the Options Page. It’s still enabled by default. One reason you might want to disable it is that the new search boxes appear at the top of the page, not at the start of the content, and you may not want to skip straight past them.

*** Stop Press ***
Another option is now available: Strip navigation. This option removes everything before the main content, assuming Textise can identify it. It requires you to also have “Skip to main content” selected. Unfortunately, it doesn’t work with PDF conversions. I’ll have a think about that.
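In outline, stripping navigation amounts to cutting the page at the main-content bookmark. The Python sketch below shows one way that could look; it is an assumption for illustration rather than Textise’s implementation, and `strip_navigation` and its `target` parameter (the bookmark name) are hypothetical.

```python
import re


def strip_navigation(html, target):
    """Drop everything before the tag carrying the bookmark `target`.

    Returns the page unchanged when the bookmark can't be found.
    """
    # Find an opening tag whose id= (or name=) attribute equals the target.
    # The attribute name is matched case-insensitively, the value exactly.
    pattern = re.compile(
        r"<[a-zA-Z][^<>]*\s(?i:id|name)\s*=\s*(['\"]?)"
        + re.escape(target)
        + r"\1(?=[\s/>])[^<>]*>"
    )
    match = pattern.search(html)
    if match is None:
        return html
    return html[match.start():]
```

For example, `strip_navigation(page, "main")` keeps the page from `<div id="main">` onwards. Cutting the markup like this also discards the opening `<html>` and `<body>` tags, which is harmless when the result is only going to be turned into plain text.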

PDF

I’ve wanted to add the option to convert Textise’s text only output to PDF format for a while now but just couldn’t find an easy-to-use, cost-effective (free) solution.

So I’m pleased to report that pdfcrowd has come to the rescue, offering a fantastic free service that requires the addition of a single line of HTML code. Genius. Try the “Convert this page to a PDF” option next time you convert a page to text. Very useful if you want to save your text only output to read later.

Alexa

There’s lots of anti-Alexa emotion on the ‘Net and there are some valid doubts expressed about its accuracy/relevance/usefulness. However, when Textise’s Alexa rank tumbles (which is a good thing), as it has recently, I’m happy to go along with it.

Today’s rankings for Textise (I’m writing this on 21/02/2014) are 355,852 globally, 16,486 in the UK. These are pretty good numbers when you consider that there are over 650 million web sites in the world, over 10 million of them in the UK (with a “.uk” address).

Update 22/02/2014: Ha! The Alexa scores have now gone up! Told you it was rubbish. 🙂

Privacy

The use of cookies on web sites is becoming a big topic so I’ve finally got round to adding a Privacy Policy page where you can see how Textise uses them. In a sentence, Textise uses cookies to store your preferences and to provide anonymous analytical data (using Google Analytics). The analytical data is really useful because it allows me to see which sites are being converted to text and where users are located.

You’ll notice a pop-up nag box on the site now, asking you to accept the use of cookies. I’m afraid this also appears on the text only output but I have to make sure that all users see it and not everyone visits the home page (lots of people use the Firefox add-on or the bookmarklet and never need to go there – I know this because of the analytical data I get!).

Rest assured though, you need only click the acceptance button once and it’ll disappear from the whole site, forever (as long as you don’t delete your cookies!).

Fixes

Both the site and the web service have been spruced up a bit.

I noticed that the Mail Online site was causing Textise to crash out for no obvious reason. It’s a pretty nasty site, with long, long pages of links to the usual salacious celebrity stories and bigotry wrapped up as journalism. As you can tell, I love the Daily Mail but I didn’t let my personal feelings get in the way. It turned out that the problem was caused by the presence of unusual (and unnecessary) hexadecimal characters in the HTML of the pages, which I promptly zapped. All good now (apart from the wretched rag itself).
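The post doesn’t say exactly which characters were to blame, so the following Python sketch rests on an assumption: that they were stray control characters (such as `\x00`) embedded in the markup. `CONTROL_CHARS` and `zap_control_chars` are hypothetical names, and this is one possible clean-up rather than the actual fix.

```python
import re

# ASCII control characters, excluding tab (\x09), line feed (\x0a) and
# carriage return (\x0d), which are legitimate whitespace in HTML.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def zap_control_chars(html):
    """Remove control characters that serve no purpose in an HTML page."""
    return CONTROL_CHARS.sub("", html)
```

Running the page through a filter like this before any parsing means the rest of the conversion never sees the offending bytes.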

Some Chinese sites were also causing problems, this time by including “on” events in image tags (for example, onerror=”…javascript…”). This made an absolute mockery of my image processing code. Also now fixed.
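One way to neutralise such attributes is to remove them before the image tags are processed. The Python sketch below is an assumption about how that might be done, not the fix actually applied in Textise; `strip_image_events` and the two patterns are hypothetical names.

```python
import re

# A whole <img ...> tag. Quoted attribute values are consumed as units so
# that a ">" inside a JavaScript handler doesn't end the match early.
IMG_TAG = re.compile(r"""<img\b(?:"[^"]*"|'[^']*'|[^'">])*>""", re.IGNORECASE)

# An "on" event attribute (onerror, onload, ...) with a double-quoted,
# single-quoted or unquoted value.
EVENT_ATTR = re.compile(
    r"""\s+on[a-z]+\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)


def strip_image_events(html):
    """Remove on* event attributes from every <img> tag, leaving other tags alone."""
    return IMG_TAG.sub(lambda m: EVENT_ATTR.sub("", m.group(0)), html)
```

This is a regular-expression shortcut rather than a full HTML parse, so it has limits: an `alt` text that happens to contain something like ` one=two` would lose that fragment as well.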

I’ve also modified various other bits of code, sometimes in a spirit of tidiness, other times to very slightly improve performance. Actually, performance has improved dramatically anyway since the recent move to Hosting UK but you know what that (UK) supermarket ad says. No? Oh well.

If you have any comments or suggestions for Textise, please get in touch via the Contact Page or add a comment on the Feedback & Suggestions Page.
