How to create your own search API for a static website with JavaScript and PHP

Five years ago, I ditched WordPress for a static site generator. The move brought a lot of benefits, but one of the things I lost was built-in search.

I ended up hacking together my own solution using an inline object with all of the content from all of my articles, and some JavaScript to sort through it and find what the person was searching for.

It’s worked great!

But after years of daily writing, the /search page was getting bigger and slower to load. Because of that inline JS object, it had reached several megabytes in size!

As part of the content migration I wrote about yesterday, I decided to update how search works. Today, I’m going to share what I did.

Let’s dig in!

The approach

At a high level, here’s what I do…

  1. All searchable content gets stored as a JSON file on the server.
  2. Some JavaScript on the /search page calls my search API and sends the query the user is searching for.
  3. The search API loops through the content in that JSON file and finds matches.
  4. The search API sends the matches back, and the JavaScript renders them on the page.

Because there’s no database involved and the API is just vanilla PHP, the whole thing is absurdly fast—much faster than any WordPress search I’ve ever used!
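Seen from the browser, step 2 might look roughly like the sketch below. The /api/search.php endpoint and the buildSearchRequest() and searchSite() names are assumptions for illustration, not the actual setup.

```javascript
// Hypothetical endpoint; the real path depends on how the site is deployed
const SEARCH_ENDPOINT = '/api/search.php';

// Package the query as a JSON POST request
function buildSearchRequest (query) {
	return {
		url: SEARCH_ENDPOINT,
		options: {
			method: 'POST',
			headers: {'Content-Type': 'application/json'},
			body: JSON.stringify({q: query})
		}
	};
}

// Call the API and resolve with the array of matches
async function searchSite (query) {
	const {url, options} = buildSearchRequest(query);
	const response = await fetch(url, options);
	if (!response.ok) {
		throw new Error(`Search failed with status ${response.status}`);
	}
	return response.json();
}
```

Keeping the request-building separate from the fetch() call makes it easy to check what gets sent without touching the network.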

Creating a search index

For this to work, I need an index of content for the search API to look at.

Hugo, my static site generator of choice, provides a built-in way to generate output formats beyond just HTML. I believe 11ty has a similar feature.

I configured my /search page template to generate both HTML and a JSON file.

Then, I moved the templating I was using to create my old JavaScript object of posts to the template for the JSON file. I also added some front matter to specify which content types should be included, because I wanted to include articles, courses, and my toolkit in the results.

<!-- search.md front matter -->
---
title: Search
type: search
searchTypes: ["articles", "courses", "toolkit"]
outputs: ["html", "json"]
---

Your setup may differ.

Here’s what the JSON template looks like. Hugo’s templates are written in Go’s template language.

[
{{- $.Scratch.Set "comma" false -}}
{{- range $type := .Params.searchTypes -}}
	{{- range $page := (where $.Site.Pages "Type" $type) -}}
	{{- if ($.Scratch.Get "comma") -}},{{- else -}}{{- $.Scratch.Set "comma" true -}}{{- end -}}{
		"title": {{ $page.Title | htmlEscape | jsonify }},
		"url": {{ $page.Permalink | jsonify }},
		"date": "{{ $page.PublishDate.Format "January 2, 2006" }}",
		"datetime": {{ $page.PublishDate | jsonify }},
		"type": "{{ title $page.Type }}",
		"content": {{ $page.Content | plainify | jsonify }},
		"summary": {{ $page.Summary | plainify | jsonify }}
	}
	{{- end -}}
{{- end -}}
]

This creates a JSON file with the title, URL, published date, content type, content, and summary.
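Each entry in the generated file should end up shaped roughly like the one in this sketch. The values are made up for illustration; only the keys come from the template.

```javascript
// A made-up entry; only the keys mirror the template above
const sampleIndex = `[
	{
		"title": "An example article",
		"url": "https://example.com/an-example-article/",
		"date": "January 2, 2006",
		"datetime": "2006-01-02T00:00:00Z",
		"type": "Articles",
		"content": "The full text of the article, with the HTML stripped out.",
		"summary": "A shorter excerpt of the article."
	}
]`;

// The file is plain JSON, so anything that can parse JSON can read it
const entries = JSON.parse(sampleIndex);
console.log(Object.keys(entries[0]));
```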

The Search API

Next, I had to set up a search API to respond to queries.

I learned to code with WordPress and PHP, and it runs basically anywhere, so that’s my go-to language for this kind of thing. I created a search.php file and set it up to receive and respond to Ajax requests.

First, I added a few helper functions.

The send_response() function sends back an encoded JSON object as a response (with an HTTP status code). The get_request_data() function gets data from the API request (and supports encoded form data, JSON objects, the FormData object, and query string variables).

<?php

/**
 * Send an API response
 * @param  array    $response The data to send back
 * @param  integer  $code     The HTTP status code
 */
function send_response ($response, $code = 200) {
	http_response_code($code);
	header('Content-Type: application/json');
	die(json_encode($response));
}

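/**
 * Get the data sent with the API request
 * @return array The request data
 */
function get_request_data () {
	// A JSON request body isn't parsed into $_POST, so decode it manually
	$json = json_decode(file_get_contents('php://input'), true);
	// Merge query string variables, form data, and any JSON body
	return array_merge($_GET, $_POST, is_array($json) ? $json : []);
}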

When the file is called, the first thing it does is get the request data and assign it to the $data variable. Then, it checks for a query (the q parameter).

If there isn’t one, it returns an error message.

<?php

// Get the request data
$data = get_request_data();

// Check for required data
if (empty($data['q'])) {
	send_response(['msg' => 'Sorry, no matches were found.'], 400);
}

I created a search() function that accepts the $query the user is searching for as an argument.

Then, I copy/pasted in my original JavaScript code and began converting it to PHP.

First, I convert the $query to lowercase, and convert it into an array of individual words.

<?php

/**
 * Do a search
 * @param  String $query The search query
 * @return Array         The search results
 */
function search ($query) {

	// Get an array of query words
	$query_arr = explode(' ', strtolower($query));

}

There are a bunch of generic words you generally don’t want to include in a search (a, an, the, and so on).

I created an array of those $stop_words. Then, I loop through the $query_arr and generate a new $cleaned_query with those words omitted.

<?php

// Get an array of query words
$query_arr = explode(' ', strtolower($query));

// A list of words to ignore
$stop_words = ['a', 'an', 'and', 'are', 'aren\'t', 'as', 'by', 'can', 'cannot', 'can\'t', 'could', 'couldn\'t', 'how', 'is', 'isn\'t', 'it', 'its', 'it\'s', 'that', 'the', 'their', 'there', 'they', 'they\'re', 'them', 'to', 'too', 'us', 'very', 'was', 'we', 'well', 'were', 'what', 'whatever', 'when', 'whenever', 'where', 'with', 'would', 'yet', 'you', 'your', 'yours', 'yourself', 'yourselves'];

// Remove the ignored words
$cleaned_query = [];
foreach ($query_arr as $word) {
	// Skip empty strings (left by extra spaces) and ignored words
	if ($word === '' || in_array($word, $stop_words)) continue;
	$cleaned_query[] = $word;
}
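The same two steps are compact in JavaScript, too. This is a sketch rather than the original front-end code, with a shortened stop word list and a hypothetical cleanQuery() name.

```javascript
// Shortened list for illustration; the PHP version above has the full set
const stopWords = ['a', 'an', 'and', 'how', 'is', 'the', 'to', 'with'];

function cleanQuery (query) {
	return query
		.toLowerCase()
		.split(' ')
		.filter(function (word) {
			// Drop empty strings (from repeated spaces) and stop words
			return word.length > 0 && !stopWords.includes(word);
		});
}

console.log(cleanQuery('How to build a search API'));
// ['build', 'search', 'api']
```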

Next, I created a helper function to get the actual search JSON file, read its content, and convert it from a string to an array.

The specific $path you use will vary by how you have your site and directories set up.

/**
 * Get the search index
 * @return array The decoded search index, or an empty array if it can't be read
 */
function get_search_file () {

	// File path
	$path = __DIR__ . '/path/to/search/index.json';

	// If file exists, decode and return it
	if (file_exists($path)) {
		$index = json_decode(file_get_contents($path), true);
		if (is_array($index)) return $index;
	}

	// Otherwise, return an empty fallback
	return [];

}

In the search() function, I run the get_search_file() function to get the search index.

Then, I loop through each item and actually do my search.

For each $word in my $cleaned_query, I check if that word is in the $article title or content. I give the title a lot of priority for matching, and each instance of the word in the content bumps the priority a little bit.

If any of the words match, I push the $article into the $results array.

<?php

// Get the source data
$file = get_search_file();

// Create results array
$results = [];
foreach ($file as $article) {

	// Setup priority count
	$priority = 0;

	// Assign priority for matches
	foreach ($cleaned_query as $word) {

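		// Escape the word so special characters can't break the pattern
		$pattern = '/\b' . preg_quote($word, '/') . '\b/i';

		// If word is in title (case-insensitive)
		if (preg_match($pattern, $article['title'])) {
			$priority += 100;
		}

		// Add one for each time the word appears in the article
		// (preg_match_all() returns the number of matches)
		$priority += (int) preg_match_all($pattern, $article['content']);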

	}

	// If any matches, push to results
	if ($priority > 0) {
		$article['priority'] = $priority;
		$results[] = $article;
	}

}
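To make the weighting concrete, here’s the scoring rule as a small JavaScript sketch. The scoreArticle() and escapeRegExp() names and the sample article are hypothetical. It counts every whole-word occurrence in the content, as described above.

```javascript
// Escape characters that have special meaning in a regular expression
function escapeRegExp (str) {
	return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function scoreArticle (article, words) {
	let priority = 0;
	for (const word of words) {
		const safe = escapeRegExp(word);
		// A whole-word match in the title counts for a lot
		if (new RegExp(`\\b${safe}\\b`, 'i').test(article.title)) {
			priority += 100;
		}
		// Each whole-word match in the content adds one more
		const matches = article.content.match(new RegExp(`\\b${safe}\\b`, 'gi'));
		priority += matches ? matches.length : 0;
	}
	return priority;
}

const article = {
	title: 'Search for static sites',
	content: 'A search API makes search fast.'
};
console.log(scoreArticle(article, ['search', 'php'])); // 102
```

The title match contributes 100, and the two occurrences of "search" in the content add 2 more. "php" appears in neither, so it adds nothing.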

I want the highest matching items to show first, so I run a sort on my $results, ordering them by $priority.

Then, I return the $results.

<?php

/**
 * Do a search
 * @param  String $query The search query
 * @return Array         The search results
 */
function search ($query) {

	// ...

	// Sort the results by priority, highest first
	// (a closure avoids a "cannot redeclare" error if search() runs twice)
	usort($results, function ($a, $b) {
		return $b['priority'] <=> $a['priority'];
	});

	// Return the search results
	return $results;

}
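For comparison, a descending sort like this takes only a few lines in JavaScript. The sample results below are made up for illustration.

```javascript
// Made-up results; only the priority values matter here
const results = [
	{title: 'Content match only', priority: 3},
	{title: 'Title and content match', priority: 104},
	{title: 'Title match', priority: 100}
];

// Subtracting a from b sorts from highest priority to lowest
results.sort(function (a, b) {
	return b.priority - a.priority;
});

console.log(results.map(function (item) { return item.priority; }));
// [104, 100, 3]
```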

Now, I can pass the $data['q'], the query, into the search() function to do a search.

Then, I pass the $results into the send_response() function to send the response back to the requesting JavaScript file.

<?php

// Get search data
$results = search($data['q']);

// Send the results back to the browser
send_response($results);

Doing a search in the front end

Back in my HTML, I include a basic search form.

By default, the form actually makes a request to DuckDuckGo.com, with results restricted to GoMakeThings.com. This way, if the JavaScript fails, users can still search.

<form action="https://duckduckgo.com/" method="get" id="form-search">
	<label for="input-search">Enter your search criteria:</label>
	<input type="text" name="q" id="input-search">
	<input type="hidden" name="sites" value="gomakethings.com">
	<button>
		Search
	</button>
</form>

Once the JavaScript loads, it intercepts submit events on the form and calls my search API.

But… that’s an article all by itself, so we’ll look at that tomorrow!