Behind the scenes, we’re constantly improving the search engine that powers the website.
At the beginning of December, we made a major upgrade to our search technology to make your searches faster. Our lead Search Engineer, Dr Boon Low, explains what we did and what it means for you.
As The British Newspaper Archive’s lead search engineer, my job is to deliver the best search experience we can for our customers. It’s a complex job – every time a search is performed, we look through billions of words spread across hundreds of millions of articles and millions of pages, and try to bring back the most relevant matches for what you’re looking for.
The team’s work revolves around three main areas: improving the search interface, so it’s easy to search and your results are presented in an easy-to-understand way; interpreting what you’re searching for and bringing back the most relevant results; and finally, doing this quickly and efficiently, so your results are delivered in a flash.
Over the course of the summer, we started looking at ways to increase the speed of searches. The first thing that we looked at was increasing the speed of the software that processes your searches and returns the results (turbo-charging the brains of the search, if you like). We also looked at hardware improvements we could make to give the software the best possible chance of working at full capacity (beefing up the body that supports the brains). We then started building new search machines and testing how they worked with the huge volume of data that sits behind The BNA until we had found a build that we were happy with.
By the end of November, we were ready to go, and now came the tricky part: putting it all live. This involved bringing the new machine into service on the live site and making sure that it worked well with the older machines. We then took each older machine out of service, upgraded it and put it back into service in turn until the whole farm of machines was upgraded (yes, we really do call it a farm!).
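For readers curious about the mechanics, here is a minimal sketch of the "one machine at a time" rolling upgrade described above. It is illustrative only and is not the BNA's actual tooling. The Machine class, the rolling_upgrade function, the version labels and the min_in_service parameter are all hypothetical. The point it shows is simple: before each older machine is taken out of service, we check that enough other machines are still serving searches.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    # Hypothetical stand-in for one search machine in the farm
    name: str
    version: str
    in_service: bool = True


def rolling_upgrade(farm, new_version, min_in_service=1):
    """Upgrade machines one at a time so the farm never goes fully offline.

    Returns a list recording how many machines were still serving
    while each older machine was out of service being upgraded.
    """
    serving_log = []
    for machine in farm:
        if machine.version == new_version:
            continue  # already on the new build (e.g. the newly added machine)
        # How many machines would still be serving if this one is taken out?
        serving_after = sum(m.in_service for m in farm) - 1
        if serving_after < min_in_service:
            raise RuntimeError("upgrade would leave too few machines serving")
        machine.in_service = False        # take this one machine out of service
        serving_log.append(serving_after)
        machine.version = new_version     # placeholder for the real upgrade work
        machine.in_service = True         # put it back into service
    return serving_log


farm = [Machine("new-1", "v2"), Machine("old-1", "v1"), Machine("old-2", "v1")]
print(rolling_upgrade(farm, "v2"))
print([m.version for m in farm])
```

In this toy farm, at least two machines are always answering searches, which is the property that let the upgrade happen with no downtime.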
It’s harder than it sounds, because we made the changes directly to the live system rather than taking the search offline while we worked. We’re happy to say that we completed the work successfully over three days with no downtime at all – in fact, you shouldn’t even have noticed that we were doing anything behind the scenes.
So what’s the end result? Well, it’s fair to say that we are all delighted with the improvement in speed, which was better than we had anticipated from our early tests.
Before the upgrade, we were taking 1.6 seconds on average to return results for search queries. Since the upgrade, average search time is down to 0.6 seconds – we have managed to shave an entire second off.
To put it another way, the vast majority of searches on The BNA now complete in under a second – around the same speed as searches on Google – and the average search is now more than two and a half times as fast as before.
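The speed figures above can be checked with a little arithmetic. This short sketch uses only the two averages quoted in the post (1.6 seconds before and 0.6 seconds after). It shows that one second is saved per average search, which works out to roughly 2.67 times as fast, or about 62.5% less time per search.

```python
# Average search times quoted in the post, in seconds
before = 1.6
after = 0.6

saved = before - after           # time shaved off an average search
speedup = before / after         # how many times as fast the new setup is
time_reduction = saved / before  # fraction of the old search time removed

print(f"saved: {saved:.1f} s")
print(f"speed-up: {speedup:.2f}x")
print(f"time reduction: {time_reduction:.1%}")
```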
We hope you notice the difference!
12 comments on “We’ve upgraded! A radical improvement to the speed of your searches”
Great news, and keep up the good work.
However, although I was really pleased to see that you have started to upload the Coventry Standard for 1879, it barely works! The quality is so poor that hardly any of the words are picked up in searches. Have a go yourself.
Hi Damien,
Thanks for your comment.
We’re sorry to hear about this – and we understand your disappointment.
From what we know so far, the microfilm images for this particular batch of newspapers were not ideal.
We’ll send you further info about this matter when we’ve looked into it in more detail.
Kind regards,
The BNA Team.
Merry Christmas, BNA,
I don’t want to sound like Scrooge, but I would gladly have traded those extra seconds for four months of new content.
What are your plans for 2014?
Hello Kate,
As always, we have an excellent pipeline of new content on the way. We’re currently reviewing our plans for 2014, but as soon as we’ve got concrete announcements to make, rest assured we’ll be in touch. Hope you’re having an excellent Christmas.
Best Wishes,
Team BNA
Please, please, please can we have some Black Country and Wolverhampton newspapers? The population is/was huge. The Birmingham papers don’t really cover it much, and the Staffordshire papers seem to have covered mid/north Staffs. It’s a big gap!
I would love to have wildcards enabled rather than having to second-guess the strange results of OCR. Are there any plans for this?
Also, when I correct some text, why is it not updated in the snippet that’s shown, even after several months?
Keep up the good work – it’s a wonderful resource!
It’s good to hear about the search speed, but I didn’t consider this to be important. I have given up searching on quite a few occasions, but never because the search speed was slow. The problem was that many small articles are often taken together as a single unit for the purposes of a search: for instance, police reports, entertainments, court reports, etc. There is still no support for proximity/adjacency operators, which can make it extremely hard to keep keywords in the context of a specific single report. I mentioned this when the project first began, as I was used to using such operators with the Gale database, but I was fobbed off. To make things worse, there is no highlighting of the keyword instances on the page [this is certainly true through the findmypast interface, and I have been told it is the same with the native interface], and this has meant that I have simply abandoned some searches as ‘not practical’. If I have 50 newspaper hits, and it would take 20 minutes to try to find the keyword hits in each paper (knowing that they might be false matches), then this is not going to work.
As an annual subscriber, with considerable experience of using digitised newspaper databases virtually since their inception, I have two major “beefs” about BNA.
1) Virtually always I want my results in order of date ascending. There ought to be a place where readers can set this (or date descending) as their global preference, and only have to change it when they want “relevance” instead. IMHO the relevance is not accurate anyway (see next point).
2) The “exact search” checkbox should do what it says. I am researching a biography of a man whose surname was “Blackburne” and I am plagued by results for “Blackburn” the Lancashire town and various people who spelled their surname “Blackburn”. Eliminating these “by hand” means 50% of the time I spend on BNA is wasted.
Ticking “exact search” seems to make no difference here, presumably because your OCR makes no distinction between the two forms of the name. And relevance searches don’t necessarily put “Blackburne” ahead of “Blackburn” either.
I made inquiries with BNA a while back as to whether it was technically possible to introduce a “recently added pages” search option. They said it was. I begged them, telling them “I am on my knees as I write”, to please introduce this option. What if you’re researching a Mr Peter Thomas and you decide after a few weeks or months to do a fresh search, only to be completely flooded with thousands of results you’ve seen before? Also, why do you have to actually view an article before you can tell that it’s one that you’ve already added to your “bookmarked” collection? Surely, when doing a repeat word search, it must be possible to show which items have previously been “bookmarked” without having to actually click on them. You could forgive some clunkiness in some old newspaper sites, but brightsolid are supposed to be a cutting-edge, C21st bunch of guys. Hmm.
Last complaint: whacked-out title selection. You’re never going to be able to satisfy everyone, so if you’re going to add a new title, then really go for it and do a comprehensive run. Who needs one year of this or six months of that?
By the way, if any of you reading this have a BL reader’s card, did you know that you can search and view C19th/early C20th African and South American papers for free from home? That was the only place I found an obituary for a person I was researching.
Thank you all very much for your feedback and the suggestions about how to make the website even better. We’re very keen to make improvements to The British Newspaper Archive, so your comments have all been passed on.
We’ve also set up a survey to allow us to collate your feedback and gain a better understanding of what you want from the website. We’d be very grateful if you could spare us 10 minutes to complete it. Just click on this link to take the survey:
https://www.surveymonkey.com/s/RYPYF2V
Very best wishes and thanks again,
The British Newspaper Archive team
I continue to be disappointed with the search experience. For example, I am searching for articles referring to “Lord Chief Justice Coleridge” after 1900. I enter that phrase in the phrase box, with parentheses. I expect to get back only those articles with that phrase, but instead I receive more than 27,000 “results” with unrelated “Lords”, American “Chief Justices”, etc. I don’t care how fast your search engine is if it cannot deliver the specific results requested. I really wish we could go back to the previous British Library newspaper archive site, because Brightsolid has proven to be a major disappointment.
Hi bibliomaine,
Sorry to hear that you’ve been struggling to find relevant results. The best way to search for a person is to put the name inside double quotation marks (or by using the advanced search and specifying that the name is a phrase). This video tutorial will show you how to do this: http://blog.britishnewspaperarchive.co.uk/2015/01/20/how-to-search-the-british-newspaper-archive-for-a-persons-name/
You can also use the advanced search to narrow your results down by date, so that you only see articles published after 1900. This video tutorial will show you how to do that: http://blog.britishnewspaperarchive.co.uk/2014/09/08/top-tip-searching-newspapers-from-a-particular-date-such-as-world-war-one/
As an example, here are the results we’ve found for “lord chief justice coleridge” from 1 January 1900 to 31 December 1955: http://www.britishnewspaperarchive.co.uk/search/results/1900-01-01/1955-12-31?basicsearch=%22lord%20chief%20justice%20coleridge%22&phrasesearch=lord%20chief%20justice%20coleridge&sortorder=score
We hope that helps!