Creating a static version of a (Craft CMS) website
All the websites I develop run on shared hosting plans, which sometimes leads to poor performance. One way to improve page speed is to generate a static version of the website instead of having Craft CMS render every page on each request (of course Craft CMS does some caching as well, but on slow servers this usually doesn't help much).
Generating a static version means creating HTML files for every page and serving them directly. No database queries are run, and no PHP scripts or other processing needs to be executed on the server.
Static site generators
Some years ago I found out about static site generators like eleventy. What I didn't like was the build process: every time a change is made to the website, one or more pages need to be regenerated. I'm not a big fan of build processes with all the different tools and dependencies involved. It happened to me that, while trying to keep everything up to date, I had to rewrite parts of my code because a new major version of eleventy was available and things no longer worked as before.
Why not just scrape the HTML code of your Craft CMS website?
At some point I thought: why not generate a static version of the website by just scraping it myself? That way I could still use Craft's front-end tools (like image transforms and automatic srcset generation, asset version hashing, previews ...) and plugins, as well as Twig, to build the front end, and then use an automated script to save the HTML files.
Here's how I did it:
Put the Craft installation on a subdomain
Since the static version should be available via the (root) domain, I put the Craft installation on a subdomain. This subdomain should not get indexed by search engines, so I put
CRAFT_DISALLOW_ROBOTS=true
in my .env file.
Create a kind of sitemap
For the scraping script to know which pages to download, I created a fake JSON array, a kind of sitemap. In the templates folder, I added a file called scrape-list.twig with the following code:
[
    {% for entry in craft.entries.section(['section1', 'section2', ...]).all() %}
        {
            "title": "{{ entry.title }}",
            "url": "{{ entry.url }}",
            "slug": "{{ entry.slug }}",
            "type": "{{ entry.type }}"
        }{{ not loop.last ? "," }}
    {% endfor %}
]
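Rendered, this produces something like the following (the values here are made up):
[
    {
        "title": "Contact",
        "url": "https://subdomain.domain.tld/contact",
        "slug": "contact",
        "type": "page"
    },
    ...
]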
What may need to be adapted is the array of section handles. If you use templates without sections, like a 404 page, you can add those pages to the array by hand, as shown below.
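For example, a 404 page could be appended after the loop, with an extra comma separating it from the generated entries. A sketch of how the end of the template would then look; the URL, slug, and type values are assumptions that depend on how the 404 template is routed:
    {% endfor %},
    {
        "title": "Page not found",
        "url": "{{ siteUrl }}404",
        "slug": "404",
        "type": "error"
    }
]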
Create the scraper
Next, I created the file scraper.php in the web folder of the Craft CMS installation. This script takes care of scraping the pages from the subdomain. The root domain is mapped to a static folder in the same parent folder as the web folder.
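Assuming a typical Craft project structure, the layout looks roughly like this (folder names other than web and static are examples):
project/
├── assets/         ← symlinked into static/ later
├── templates/
│   └── scrape-list.twig
├── web/            ← document root of subdomain.domain.tld (Craft)
│   ├── scraper.php
│   └── uploads/    ← symlinked into static/ later
└── static/         ← document root of domain.tld (the generated HTML)
The scraper itself: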
<?php
$cmsurl = "https://subdomain.domain.tld";
$rooturl = "https://domain.tld";
$jsonurl = $cmsurl . "/scrape-list";
$staticFolderPath = "../static";

// get the JSON text with the page data
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $jsonurl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
curl_close($ch);

// convert the text to JSON and stop if the list could not be parsed
$jsonData = json_decode($data);
if ($jsonData === null) {
    exit("Could not fetch or parse the scrape list.");
}

// delete everything in the static folder
shell_exec("rm -r ../static/*");

foreach ($jsonData as $page) {
    // remove the base URL to get the file name / folder name
    $filename = str_replace($cmsurl, "", $page->url);

    // get the HTML code of the page
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $page->url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $content = curl_exec($ch);
    curl_close($ch);

    // point all links and asset references at the root domain
    $content = str_replace($cmsurl, $rooturl, $content);

    // all the files are called index.html but put in subfolders:
    // the contact page for example ends up at static/contact/index.html
    if (!is_dir($staticFolderPath . "/" . $filename)) {
        mkdir($staticFolderPath . "/" . $filename, 0755, true);
    }
    file_put_contents($staticFolderPath . "/" . $filename . "/index.html", $content, LOCK_EX);
}

// create symlink for assets
shell_exec("ln -s *absolutePathTo*/assets *absolutePathTo*/static/assets");
// create symlink for uploads
shell_exec("ln -s *absolutePathTo*/web/uploads *absolutePathTo*/static/uploads");
Execute a webhook after changes are made
Every time an entry gets changed, a new static version should be generated. The websites I create usually don't have many subpages and their contents don't change often, so I just regenerate the whole website rather than only the pages that actually changed. Websites with frequent changes and/or lots of pages should follow a different approach.
What needs to be done now is to execute the scraper.php file whenever that happens. This can be done via a webhook; there is a plugin available for this.
As "Sender Class" I chose craft\elements\Entry, as "Event Name" afterSave.
As filters, I set "Element is enabled" to required, and "Element is a Draft" to prevent the webhook execution.
For "Request Method & URL" I used GET and set the URL to scraper.php.
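If you'd rather not depend on a plugin, the same trigger can be wired up in a small custom module. A sketch, with the module bootstrapping omitted; the scraper URL is the one from above:
use craft\elements\Entry;
use yii\base\Event;

// listen for the same afterSave event the webhook reacts to
Event::on(Entry::class, Entry::EVENT_AFTER_SAVE, function (Event $event) {
    $entry = $event->sender;
    // mirror the webhook filters: skip drafts and disabled entries
    if ($entry->getIsDraft() || !$entry->enabled) {
        return;
    }
    // fire a GET request to the scraper, like the webhook would
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "https://subdomain.domain.tld/scraper.php");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_exec($ch);
    curl_close($ch);
});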
PHP config
For this script to work, the shell_exec function needs to be available; shared hosts sometimes block it via the disable_functions setting. The value of max_execution_time is also relevant, because scraping every page in a single request can take a while.
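A quick check at the top of scraper.php makes both problems visible early. A sketch; the 300-second limit is an arbitrary choice, and many shared hosts ignore set_time_limit:
// abort early if the host has disabled shell_exec
if (!function_exists('shell_exec')) {
    exit('shell_exec is not available on this server.');
}
// try to raise the time limit for larger sites (hosts may ignore this)
set_time_limit(300);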