Let down by Google

The Fredures search engine is powered by Google Custom Search which, in principle, is a fantastic resource, making it possible to target specified websites in order to get highly focused results.

Unfortunately, however, Google has curtailed one of the most powerful features of its custom search. It used to be possible for the search engine to extract links from web pages. For various reasons, which I won’t go into, this was a fantastic feature for my purposes, and I made extensive use of it when I set up the Fredures search engine a couple of years ago.

However, Google disabled this feature last year, and this changes the whole way these search engines work. I knew the change had happened, but I have only just started to realize how badly it affects the way Fredures works (or doesn’t work)!

I am currently looking at ways round the problem. This sets me back even further than I already am. It seems as if I am taking two steps backwards for every step forwards. My initial target of getting everything running smoothly by the end of January is already impossible. Now I’m wondering whether the new target (to go fully operational sometime in February) is feasible…


The best-laid plans of mice and men…

Clearly I am not going to make my January 31 deadline for launching the Educational Hub website…

First it was technical glitches, like menus that work fine on a computer but not on a mobile phone. I’m sorting that one out, slowly, but now I’ve gone down with flu!

One day is just too short! I want to get the website up and running properly and get back to making videos, but everything seems to be receding further from my grasp even as I reach out for it. I suppose there’s a lesson in there somewhere!

Early English Books Online-Text Creation Partnership (EEBO-TCP)

This is another database resource, and it’s one I’ve used a lot. It’s the result of a collaboration between ProQuest, a for-profit company, and the University of Michigan, the University of Oxford, and the Council on Library and Information Resources, with the cooperation of some 150 other libraries around the world.

Why am I writing about a commercial resource when my whole purpose is to promote open-access resources? Well, it’s because 25,368 texts published before 1700 came into the public domain just a few years ago, and the story of how that happened is an interesting example of how universities and libraries have interacted with the private sector to create a database. More about that later.

What they say

“130,000 works, microfilmed over 70 years from more than 200 libraries worldwide, were made available online by ProQuest in one collection. Early English Books Online (EEBO) is now one of the most successful research collections ProQuest has ever produced and it is used by students and scholars in over 1,000 institutions worldwide.”

https://www.sc.pages04.net/lp/43888/470018/1208PQ%20EEBOTCP%20Brox-Jap.pd

But…

If you’re paying attention here (and you’ll need to, because it’s complicated!) you’ll notice that they are talking about Early English Books Online (EEBO), not Early English Books Online Text Creation Partnership (EEBO-TCP), which is a different kettle of fish.

Here’s the point. EEBO is not open access. It consists of PDF files that have not been run through optical character recognition (OCR). If you want to see the actual pages of these early modern texts this is what you need, but there’s no way you can get access to it except through a university or some other institution. Well, I guess if you were a billionaire you could take out a subscription and not notice it, but it’s priced way beyond what the average person could afford.

So what is EEBO-TCP, and how is it connected to EEBO? Well, as you might expect, having digitized all these books as PDF files, the next thing people wanted to do with them was make them searchable. EEBO-TCP consists of (at the time of writing) 58,531 EEBO texts converted into searchable plain text.

“Wait a minute!” I hear you cry. “Didn’t you say there were 25,368 texts?”

Ah! So you have been paying attention. Yes, that’s right. 25,368 texts are in the public domain. But a further 34,963 texts are still only accessible if you have a log-in. Furthermore, if you have a log-in (which, you’ll remember, basically means you have to be a member of a participating institution) you can click through from the plain text version to the PDFs. This is particularly useful for those books which do not mark the page number – and many (perhaps most) early modern books do not mark page numbers. If you can see the PDF you can figure out the signature or folio, which is what was used before page numbers came into vogue, but if you can’t see the PDF you can access the text but you can’t give a proper reference for it.

Smart, huh? I mean smart from ProQuest’s point of view. The text-readable material that has come into the public domain is a fantastic resource, and it would be that much better if you could access the PDFs as well, but that requires a subscription.

This is the way “freemiums” work. A “freemium” is a resource that gives you a certain amount free, but holds back features you can only get by paying for them. They offer something, but they’ve got more than they offer.

Still, something’s better than nothing, and the 25,000+ public domain texts are still a resource worth knowing about if you’re interested in the literature of that period.

So how does it work?

A couple of years ago a colleague of mine in the world of antiquarian books posted a Christmas message on Facebook featuring what he said was the earliest occurrence of the salutation “Merry Christmas” in print. That’s what the Oxford English Dictionary (another resource you have to pay for if you want it in its complete form!) says, and that’s what he was going by.

Without wanting to go into killjoy mode, I felt it was worth putting his claim to the test, so I searched for “Merry Christmas” on EEBO-TCP. I found that the first occurrence with this spelling was in 1577, several decades earlier than the text my colleague had featured in his Facebook post.

In the same way, it has been possible to show that a fair number of the expressions credited to Shakespeare had in fact found their way into print before he used them. We can also establish patterns of usage. For example, we can check the use of the word “cruelty” in proximity to the word “Catholic” to see how closely these two things were connected in people’s minds, and we can check the frequency of such usage over time, to get some idea of whether attitudes changed, or were related to specific events, such as the Gunpowder Plot.

Of course, we can’t just go by the raw results. We need to examine each result and see the full context, but being able to isolate all the relevant texts in this way is something that people could only have dreamed about until just a few years ago.
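For anyone curious about the mechanics, here is a rough sketch in Python of the kind of proximity search and year-by-year count described above. I should stress that this is only an illustration of the idea, not how EEBO-TCP’s own search works: the sample sentences and dates are invented, the function names are my own, and it matches exact modern spellings only, which is a serious limitation with early modern texts.

```python
import re
from collections import Counter

# Invented sample data: (year, text) pairs standing in for plain-text
# transcriptions. These are NOT real EEBO-TCP records.
SAMPLE_TEXTS = [
    (1590, "The cruelty of the Catholic league was much spoken of."),
    (1606, "Such cruelty did the Catholic plotters intend against the King."),
    (1606, "A merry tale of a miller and his wife."),
    (1640, "The Catholic gentleman showed kindness, and no cruelty was found in him."),
]


def tokenize(text):
    """Split a text into lower-case words, ignoring punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def near(text, word_a, word_b, window=5):
    """True if word_a and word_b occur within `window` words of each other."""
    tokens = tokenize(text)
    positions_a = [i for i, t in enumerate(tokens) if t == word_a]
    positions_b = [i for i, t in enumerate(tokens) if t == word_b]
    return any(abs(a - b) <= window for a in positions_a for b in positions_b)


def hits_by_year(texts, word_a, word_b, window=5):
    """Count, for each year, the texts in which the two words occur close together."""
    counts = Counter()
    for year, text in texts:
        if near(text, word_a, word_b, window):
            counts[year] += 1
    return dict(sorted(counts.items()))


def earliest_year(texts, phrase):
    """Earliest year in which the exact phrase appears, or None if it never does."""
    years = [year for year, text in texts if phrase.lower() in text.lower()]
    return min(years) if years else None


if __name__ == "__main__":
    print(hits_by_year(SAMPLE_TEXTS, "cruelty", "catholic"))
    print(earliest_year(SAMPLE_TEXTS, "merry tale"))
```

Widening the window (say, window=10) catches looser associations at the cost of more false positives, which is exactly why each hit still needs to be read in context.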

Slowly, the database is redefining our understanding of the early modern period. It’s frustrating that the whole database isn’t open access, together with the PDFs that underlie it, but it’s still a lot better than nothing!

The Internet Archive

The Internet Archive is a massive database, probably the biggest one out there. It houses digitized versions of millions of books, journals, magazines, audiobooks, videos and much more.

It also takes “snapshots” of websites, providing a history of changes made at any given URL at any time in its history. This resource is called the Wayback Machine.

This is particularly valuable if you are looking for something that used to be available at a particular URL but is no longer there. Using the Wayback Machine you can find that information again and – invaluably if you are citing it as an academic reference or something like that – it will tell you when that information was available at that URL and when it ceased to be available. The Wayback Machine provides data on 330 billion web pages!
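If you would rather look up a snapshot from a script than through the website, the Internet Archive offers a simple “availability” lookup at archive.org/wayback/available. The sketch below is my own minimal illustration, using only Python’s standard library, and the function names are invented. Note that this lookup returns only the snapshot closest to the date you ask for, not the full history of a URL.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime

API_ENDPOINT = "https://archive.org/wayback/available"


def build_query_url(page_url, timestamp=None):
    """Build the lookup URL. `timestamp` is optional, in YYYYMMDD form."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_closest(data):
    """Return (snapshot_url, capture_datetime) from a decoded response, or None."""
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    captured = datetime.strptime(closest["timestamp"], "%Y%m%d%H%M%S")
    return closest["url"], captured


def lookup(page_url, timestamp=None):
    """Ask the Wayback Machine for the snapshot closest to `timestamp` (needs a network connection)."""
    with urllib.request.urlopen(build_query_url(page_url, timestamp), timeout=30) as response:
        return parse_closest(json.load(response))


if __name__ == "__main__":
    result = lookup("archive.org/about", "20170101")
    if result is None:
        print("No snapshot found")
    else:
        snapshot_url, captured = result
        print(snapshot_url, "captured on", captured.date())
```

The capture date is the useful part for a citation: it lets you say exactly when the page you are quoting was archived.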

The Internet Archive is funded by a number of bodies, among them the National Science Foundation, the Council on Library and Information Resources, and the Institute of Museum and Library Services.

What they say

“The Internet Archive, a 501(c)(3) non-profit, is building a digital library of Internet sites and other cultural artifacts in digital form. Like a paper library, we provide free access to researchers, historians, scholars, the print disabled, and the general public. Our mission is to provide Universal Access to All Knowledge.”

https://archive.org/about

What I say

This is one of those resources that I just keep going back to! How many times have I searched elsewhere in vain for an out-of-print book only to find it on the Internet Archive!

There may be an issue with its funding; quite a lot of it seems to come from sources linked directly or indirectly to the American government. As the Society of American Archivists points out, archiving is never entirely neutral; there are always choices to be made. However, as far as I can make out, the Internet Archive is constantly expanding the frontiers of its activities, making as much material available as possible:

“Even the Internet Archive, a repository of online content, has positioned itself as a tool of accountability through the Wayback Machine and its recent endeavor to collect the 45th president’s online statements, interviews, and sound bites.”

https://issuesandadvocacy.wordpress.com/2017/03/27/archivists-on-the-issues-the-neutrality-lie-and-archiving-in-the-now/

TLDR: Whatever the pros and cons of its archiving policy, the Internet Archive is too huge and comprehensive to ignore!

Open Culture

Open Culture is one of a number of “metaresources” collating and bringing together information about online resources. Its main focus is online lecture courses and MOOCs (massive open online courses). At the time of writing Open Culture provides links to some 1,300 lecture courses and over 1,000 MOOCs. There are also listings for audiobooks, ebooks, language courses and resources for children.

What they say

“Web 2.0 has given us great amounts of intelligent audio and video. It’s all free. It’s all enriching. But it’s also scattered across the web, and not easy to find. Our whole mission is to centralize this content, curate it, and give you access to this high quality content whenever and wherever you want it.”

http://www.openculture.com/faq

What I say

The core content of Open Culture is its links to lecture courses and MOOCs. It’s good but, like other sites aiming to collate open access resources (including this one!), it struggles to stay ahead of the game. Dhawal Shah’s Class Central, which focuses exclusively on MOOCs, lists over 7,000 – seven times as many as Open Culture.

If it were a competition, Open Culture might lose out to Class Central, at least when it comes to listing MOOCs, but it’s not a competition! The more sites that are out there promoting this kind of online content the more people will become aware of them.

W3Schools and other web development resources

I am planning to post details of one outstanding open-access educational resource every day until the launch date of Educational Hub at the end of this month. Today’s choice is W3Schools, a website devoted to teaching coding for web development.

What they say

From the W3Schools “About” page:

“W3Schools is a web developers site, with tutorials and references on web development languages such as HTML, CSS, JavaScript, PHP, SQL, Python, jQuery, W3.CSS, and Bootstrap, covering most aspects of web programming…

“W3Schools was originally created in 1998 by Refsnes Data, a Norwegian software development and consulting company…

“W3Schools is, and will always be, a completely free developers resource…

“Many people work very hard to ensure w3schools remains useful, educational, updated, and interesting.”

https://www.w3schools.com/about/default.asp

What I say

I have used W3Schools countless times in designing and building my websites. I am not trained in coding or computer programming, but the explanations are nearly always clear and straightforward enough for a non-specialist like me to be able to understand them.

This is a wonderful example of the internet at its best – a collaborative open-access resource that has been worked on over the years. It is not the only site of its kind – among others are Tutorialspoint (“Our content and resources are freely available and we prefer to keep it that way”), Code.org (“Code.org® is a nonprofit dedicated to expanding access to computer science in schools and increasing participation by women and underrepresented minorities”) and Exercism (“Exercism is free forever … Exercism is entirely open source and relies on the contributions of thousands of wonderful people”) – but it is the most comprehensive and extensive.

Resources like these are levelling the playing field and changing the world we live in. By sharing them, promoting them and using them we can be the change we want to see!

What’s next?

Once I get the coding sorted out, the next step is going to be an online form for people to create a listing on Edhub. That’s much easier and requires a lot less input than creating web pages on the site, and a listing can always be expanded into web pages later on.

But I’m only an amateur when it comes to coding, and creating a form that will do the trick is testing me to the limits. I’m probably going to have to call in some outside help on this one, but I’m hoping it’ll be ready to roll within a week or so.

Edhub goes online!

The Educational Hub went online on January 1st. I’ve been using it for my regular teaching (John Wilson’s pages) and to showcase the educational videos I’ve been working on since I retired (Ano sensei!), and now I’m opening it up to other teachers.

Essentially, it’s a free hosting site for teachers to showcase their materials. I’m starting small – just grapevining, rather than advertising – because I wouldn’t be able to handle a large influx of people wanting to use it, but the long-term aim is to attract as many users as possible.

I’m hoping to establish a nucleus of about a dozen users at this pilot stage, before moving on to advertise it more widely. Any teething difficulties should get sorted out at this stage, and I’ll be able to get a feel for how it will work. Will it be a bunch of teachers who just happen to host their materials on the same site? Or will it become a kind of community of educators, sharing their concerns and airing their views? Obviously, the latter would be more exciting, but only time will tell!