Open Education Tagger

Set up your own data collection and search engine infrastructure for educational resources. No command-line skills required, just a modern web browser.

Current state: beta / experimental (v.0.9)

Video tutorial (coming soon)

Toolkit

Open Education Tagger consists of three core components: a spreadsheet template for collecting resources, a browser tool called "spreadsheet2index" (source code), and a generalized search UI that can be configured via simple GET parameters (source code).
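For illustration, configuring the search UI via GET parameters might look like this. A minimal TypeScript sketch; the parameter names (esurl, index, readkey) are hypothetical, check the search UI source code for the actual ones:

```ts
// Sketch: assembling a search UI link from GET parameters.
// The parameter names below are hypothetical -- see the search UI source code for the real ones.
const searchUiBase = "https://example.org/search-ui/"; // placeholder deployment URL

const params = new URLSearchParams({
  esurl: "yourcluster.bonsaisearch.net", // Elasticsearch host (step 2)
  index: "oer-index",                    // index name (step 4)
  readkey: "readUser:readPass",          // read-only key (step 3)
});

console.log(`${searchUiBase}?${params.toString()}`);
```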

Demo / Public playground

Edit this publicly editable demo spreadsheet; its data is sent to the index automatically every 10 minutes:

Collect | Explore in search UI

Elasticsearch hosted at bonsai.io (Privacy Policy); see also the auto-sync script.

Why yet another search project?

There are a lot of professional infrastructure, metadata, and platform projects out there, but I wasn't aware of a solution that can be easily configured and deployed with limited resources, limited staff, and limited command-line skills. So this little open source project is my contribution to closing that gap.

For nOERds: Architecture graphic | README


Set it up yourself

Video tutorial: coming soon (hopefully ;-))

0. Prepare

Please open your browser's developer tools (tutorials: Chrome, Firefox); we'll need them in the next steps to monitor requests.

Info: The following settings are stored automatically in your browser's local storage; you can delete them at any time (see the footer button "Delete local browser storage").
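The tool handles this for you; purely for illustration, a minimal sketch of how such settings can be kept in and removed from local storage (the key name "oet-settings" is made up):

```ts
// Sketch: persisting form settings in the browser's local storage.
// The key name "oet-settings" is hypothetical.
const settings = { host: "yourcluster.bonsaisearch.net", index: "oer-index" };

localStorage.setItem("oet-settings", JSON.stringify(settings));            // save
const restored = JSON.parse(localStorage.getItem("oet-settings") ?? "{}"); // load
localStorage.removeItem("oet-settings"); // roughly what "Delete local browser storage" does
console.log(restored);
```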

1. Google Drive Spreadsheet Template 📝

Collect educational resources collaboratively in a simple web spreadsheet. I prepared a template:

Clone it
View template

After you have duplicated this spreadsheet, publish it ("File -> Publish to the web") and close the publish dialogue. Afterwards you can use your spreadsheet ID in the next steps.
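If you prefer to extract the ID programmatically instead of copying it by hand, a small sketch:

```ts
// Sketch: extracting the spreadsheet ID from a Google Sheets URL.
const url =
  "https://docs.google.com/spreadsheets/d/1gqRt0UxtcTNGKduQnTlV1MR3U5ByBkzCyTMkWE6wb04/edit?usp=sharing";

const match = url.match(/\/spreadsheets\/d\/([a-zA-Z0-9_-]+)/);
const spreadsheetId = match?.[1];
console.log(spreadsheetId); // "1gqRt0UxtcTNGKduQnTlV1MR3U5ByBkzCyTMkWE6wb04"
```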

2. Setup your own Elasticsearch index 🗄

Elasticsearch is an open source search and database technology that allows data to be stored and queried quickly and flexibly.
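To give a feel for what that means in practice, here is a minimal sketch of storing and searching one document through the Elasticsearch REST API (host, index name and credentials are placeholders; the key comes from step 3):

```ts
// Sketch: indexing and searching a single document via the Elasticsearch REST API.
// Host, index name and credentials are placeholders.
const host = "https://yourcluster.bonsaisearch.net";
const auth = "Basic " + btoa("adminUser:adminPass"); // read-write key, see step 3

// Store one resource as a JSON document.
await fetch(`${host}/oer-index/_doc/1`, {
  method: "PUT",
  headers: { "Content-Type": "application/json", Authorization: auth },
  body: JSON.stringify({ title: "Open textbook on statistics", license: "CC BY" }),
});

// Full-text search across all fields.
const res = await fetch(`${host}/oer-index/_search?q=statistics`, {
  headers: { Authorization: auth },
});
console.log(await res.json());
```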

Create your own instance, for example at one of these providers (e.g. appbase.io or bonsai.io):

There are professional cloud providers (Elastic Cloud, AWS, Google, Azure, etc.) for Elasticsearch hosting, but they require considerably more configuration during setup. Since Elasticsearch is open source, you can of course also install it on your own server if you have nerd superpowers.

3. Elasticsearch authentication 🔑

After you have created your instance, open its security / access settings (some providers auto-generate credentials, e.g. appbase; on others you have to configure them yourself). Access or create:

  • A read-write/admin key for step 4
  • A read-only key (not all providers offer one on free accounts) for the search UI; this will be used in step 5

TODO: add images/screenshots

Special hint: On bonsai.io you need to create an index first via "PUT /yourindex".
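That "PUT /yourindex" request can be sent with any HTTP client; a sketch using fetch and the admin key (hostname, index name and key are placeholders):

```ts
// Sketch: creating an index on bonsai.io before the first sync ("PUT /yourindex").
// Hostname, index name and key are placeholders.
const host = "https://yourcluster.bonsaisearch.net";
const auth = "Basic " + btoa("adminUser:adminPass");

const res = await fetch(`${host}/oer-index`, {
  method: "PUT",
  headers: { Authorization: auth },
});
console.log(res.status); // 200 if the index was created
```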

4. Send spreadsheet to index 📝 ➡️ 🗄️

This performs a one-way sync from the given Google Drive spreadsheet to your Elasticsearch instance. The spreadsheet ID is saved as a data field in the Elasticsearch entries. Before the sync, all existing entries with that spreadsheet ID are deleted from the given Elasticsearch instance (this only matters if you have synced data before). Don't worry about this on a newly created Elasticsearch instance, but don't use it on important real-world databases.
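Conceptually, the sync does something like the following. This is a simplified sketch, not the actual spreadsheet2index code; host, index name, key and row data are placeholders:

```ts
// Sketch of the one-way sync: delete old entries for this spreadsheet, then insert fresh rows.
// Not the actual spreadsheet2index code; names and data are placeholders.
const host = "https://yourcluster.bonsaisearch.net";
const index = "oer-index";
const auth = "Basic " + btoa("adminUser:adminPass");
const spreadsheetId = "1gqRt0UxtcTNGKduQnTlV1MR3U5ByBkzCyTMkWE6wb04";

// 1. Delete every entry previously synced from this spreadsheet.
await fetch(`${host}/${index}/_delete_by_query`, {
  method: "POST",
  headers: { "Content-Type": "application/json", Authorization: auth },
  body: JSON.stringify({ query: { term: { spreadsheetid: spreadsheetId } } }),
});

// 2. Insert the current spreadsheet rows via the bulk API (newline-delimited JSON).
const rows = [
  { title: "Open textbook on statistics", url: "https://example.org", spreadsheetid: spreadsheetId },
];
const bulkBody =
  rows.map((row) => JSON.stringify({ index: {} }) + "\n" + JSON.stringify(row)).join("\n") + "\n";

await fetch(`${host}/${index}/_bulk`, {
  method: "POST",
  headers: { "Content-Type": "application/x-ndjson", Authorization: auth },
  body: bulkBody,
});
```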

  • Spreadsheet ID: copy the ID from your browser's address bar: https://docs.google.com/spreadsheets/d/THIS-IS-THE-ID/edit?usp=sharing, e.g. '1gqRt0UxtcTNGKduQnTlV1MR3U5ByBkzCyTMkWE6wb04'. The spreadsheet must be published via "File -> Publish to the web".
  • Worksheet ID: in 99.999% of cases this will be "od6", see the tutorial by @scottcents.
  • Host name: retrieved from "https://user:pass@THE-HOSTNAME.DOMAIN/index", e.g. "scalr.api.appbase.io" or "yourcluster.bonsaisearch.net" (without https://).
  • Index name: this is the app name on appbase.io; on bonsai.io you have to create it manually first (see step 3).
  • Admin key: a "user:pass" string, e.g. "SxX2673783m:cc2829282921b19fe0dc567". It needs write permissions.

Warning: This process will delete all Elasticsearch entries whose field 'spreadsheetid' matches YOUR-SUBMITTED-SPREADSHEET-ID. This can't be undone. After the deletion, the current values from the given spreadsheet are inserted.

This is just a quick & dirty script, not a professional product. Use it at your own risk; no warranty is given. The admin key might get exposed through the network. Advanced setup: heroku.com scheduled worker (cron job).
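For the advanced setup, the idea is simply to re-run the sync on a schedule. A rough sketch; runSync() is a hypothetical stand-in for the sync logic sketched above:

```ts
// Sketch: a scheduled worker that re-runs the one-way sync periodically.
// runSync() is a hypothetical stand-in for the sync logic sketched in step 4.
async function runSync(): Promise<void> {
  console.log("syncing spreadsheet to index…");
  // …_delete_by_query + _bulk as sketched above…
}

// On heroku.com the Scheduler add-on would invoke a script like this on a schedule;
// a long-running worker can achieve the same with a plain interval (here: every 10 minutes).
setInterval(() => {
  runSync().catch((err) => console.error("sync failed:", err));
}, 10 * 60 * 1000);
```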

5. Explore the search interface 🗄 ➡️ 🔎

Nice, the data should be in the Elasticsearch index by now. How can users explore it in a pleasant way? The open source library reactivesearch provides a modern-looking search interface for Elasticsearch databases; here I prepared a general interface that works out of the box with the current spreadsheet template:

Hostname and index name are configured as in step 4. (If you just want to get your URL for the search interface, you can leave the admin key blank and proceed here.)

Read-only key: a "user:pass" string, e.g. "SxX2673783m:cc2829282921b19fe0dc567". Security: this access key should only have permission to read data, because it will be exposed to end users of the search UI and in the URL / browser history.
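A quick way to check that a key really is read-only before exposing it (host, index name and key are placeholders): searching should succeed, writing should be rejected.

```ts
// Sketch: verifying that a key is read-only before putting it into the public search URL.
// Host, index name and key are placeholders.
const host = "https://yourcluster.bonsaisearch.net";
const auth = "Basic " + btoa("readUser:readPass"); // the key you plan to expose

// Reading should succeed (HTTP 200) …
const read = await fetch(`${host}/oer-index/_search?q=*`, { headers: { Authorization: auth } });
console.log("search:", read.status);

// … while writing should be rejected (HTTP 401/403).
const write = await fetch(`${host}/oer-index/_doc/should-fail`, {
  method: "PUT",
  headers: { "Content-Type": "application/json", Authorization: auth },
  body: JSON.stringify({ title: "write test" }),
});
console.log("write:", write.status);
```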

Advanced: Fork it and build your own
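If you fork the search UI, the core of a custom interface is just a handful of reactivesearch components. A minimal sketch, assuming reactivesearch v3's component API; host, index name, credentials and field names are placeholders:

```tsx
// Sketch of a minimal custom search UI with reactivesearch (assumed v3 component API).
// Host, index name, credentials and field names are placeholders.
import React from "react";
import { ReactiveBase, DataSearch, ReactiveList } from "@appbaseio/reactivesearch";

const ES_URL = "https://yourcluster.bonsaisearch.net"; // Elasticsearch host (step 2)
const INDEX = "oer-index";                             // index name (step 4)
const READ_KEY = "readUser:readPass";                  // read-only key (step 3)

export const App = () => (
  <ReactiveBase app={INDEX} url={ES_URL} credentials={READ_KEY}>
    <DataSearch componentId="q" dataField={["title", "description"]} placeholder="Search resources…" />
    <ReactiveList
      componentId="results"
      dataField="title"
      react={{ and: ["q"] }}
      renderItem={(item: any) => <div key={item._id}>{item.title}</div>}
    />
  </ReactiveBase>
);
```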

All original source code is provided under CC0 | Hosted on GitLab.com (Privacy) | Imprint | Privacy