#!/usr/bin/env ruby
# frozen_string_literal: true
# ---------------------------------------------------------
# HTMLProofer Runner Script
# ---------------------------------------------------------
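# This script checks the generated static site in _site
# (usually built by Jekyll or another static site
# generator) for:
#   - Broken internal and external links
#   - Invalid HTML
#   - Missing OpenGraph tags
#   - Missing favicons
#
# External links are reported only when they return a 4xx
# status; 429 (Too Many Requests) responses are ignored.
#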
# HTMLProofer helps ensure the built website is clean,
# accessible, and free of broken external/internal links.
#
# This version adds clear comments and formatting to make
# it easier for contributors to understand and maintain.
# ---------------------------------------------------------
require "bundler/setup"
require "html-proofer"
# ---------------------------------------------------------
# URLs & patterns to ignore during link checking.
# Some websites block automated requests, cause false
# positives, or frequently return rate-limit errors.
# ---------------------------------------------------------
url_ignores = [
  "https://okdistribute.xyz/post/okf-de",
  "https://www.drupal.org/community-initiatives/drupal-core/usability",
  "https://scripts.sil.org/ofl",
  "https://the-orbit.net/almostdiamonds/2014/04/10/so-youve-got-yourself-a-policy-now-what/",
  "https://pages.18f.gov/open-source-guide/making-readmes-readable/",
  "https://foundation.mozilla.org/en/blog/its-a-wrap-movement-building-from-home/",
  "https://sloan.org/programs/digital-technology",
  "https://www.jfklibrary.org/learn/education/teachers/curricular-resources/ask-not-what-your-country-can-do-for-you",
  "https://stackoverflow.com/questions/18664074/",
  "https://geekfeminism.fandom.com/wiki/Meritocracy",
  "https://news.ycombinator.com/item?id=7531689",
  # Regex patterns for broader ignore rules
  %r{^https?://stackoverflow\.com/questions/18664074/},
  %r{^https?://readwrite\.com/2014/10/10/open-source-diversity-how-to-contribute/},
  %r{^https?://twitter\.com/},
  %r{^https?://(www\.)?kickstarter\.com/},
  %r{^https://guides\.github\.com/},
  %r{^https://help\.github\.com/},
  %r{^https://github\.com/},
  %r{^https?://(www\.)?reddit\.com},
  %r{^https://rockwoodleadership\.org/},
  %r{^https://(www\.)?npmjs\.com},
  %r{^https://(www\.)?quora\.com},
  %r{^https?://(www\.)?medium\.com},
]
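# ---------------------------------------------------------
# Illustrative sketch only (not part of HTMLProofer's API):
# how the ignore entries above are assumed to be applied.
# A String entry is assumed to match only the exact URL,
# while a Regexp entry matches any URL the pattern fits.
# `url_ignored?` is a hypothetical helper for reasoning
# about the list; the runner below does not call it.
# ---------------------------------------------------------
def url_ignored?(url, ignores)
  ignores.any? do |pattern|
    case pattern
    when String then pattern == url       # exact comparison
    when Regexp then pattern.match?(url)  # pattern match
    else false                            # unknown entry types never match
    end
  end
end
# Usage: url_ignored?("https://twitter.com/github", url_ignores)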
# ---------------------------------------------------------
# Run HTMLProofer with project-specific settings
# ---------------------------------------------------------
HTMLProofer::Runner.new(
  ["_site"],                     # Directory containing the generated site
  parallel: { in_threads: 4 },   # Speed up checks using 4 threads
  type: :directory,
  ignore_urls: url_ignores,      # Skip known-problematic URLs
  check_html: true,              # Validate HTML structure
  check_opengraph: true,         # Check for OpenGraph tags
  favicon: true,                 # Ensure favicon exists
  assume_extension: true,        # Allow links without file extensions
  allow_missing_href: true,      # Don't fail on <a> tags with no href
  enforce_https: false,          # Allow HTTP links
  only_4xx: true,                # Only report 4xx errors from external URLs
  ignore_status_codes: [429]     # Ignore Too Many Requests (rate-limit)
).run