# gatsby-plugin-merge-robots
Based on the default Gatsby robots.txt plugin. Creates a `robots.txt` for your Gatsby site on build.

Have another app, service, or subdomain that isn't managed by your Gatsby site, but need to consolidate everything into one `robots.txt`? `gatsby-plugin-merge-robots` is a drop-in replacement for `gatsby-plugin-robots-txt` with one added parameter:
```js
module.exports = {
  plugins: [
    {
      resolve: 'gatsby-plugin-merge-robots',
      options: {
        host: 'https://www.example.com',
        sitemap: 'https://www.example.com/sitemap.xml',
        policy: [{userAgent: '*', allow: '/'}],
        external: ['https://example.com/blog/robots.txt']
      }
    }
  ]
};
```
The plugin will fetch and refactor the external `robots.txt` by removing unnecessary metadata and transforming its paths to be compatible with your root-level `robots.txt`, then append the result to the output of your Gatsby `robots.txt`. Magic! ⚡
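For example, assuming the external blog lives under `/blog` and serves rules like the first block below (the rules and paths are purely illustrative), the merged root-level output might look roughly like this:

```text
# https://example.com/blog/robots.txt (illustrative external file)
User-agent: *
Disallow: /wp-admin/

# Resulting /robots.txt (sketch of the merged output)
User-agent: *
Allow: /
Disallow: /blog/wp-admin/

Sitemap: https://www.example.com/sitemap.xml
Host: https://www.example.com
```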
## Install

```shell
yarn add gatsby-plugin-merge-robots
```

or

```shell
npm install --save gatsby-plugin-merge-robots
```
## How To Use

`gatsby-config.js`

```js
module.exports = {
  siteMetadata: {
    siteUrl: 'https://www.example.com'
  },
  plugins: ['gatsby-plugin-merge-robots']
};
```
## Options

This plugin uses `generate-robotstxt` to generate the content of `robots.txt`, and it supports the following options:
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| `host` | `String` | `${siteMetadata.siteUrl}` | Host of your site |
| `sitemap` | `String` / `String[]` | `${siteMetadata.siteUrl}/sitemap/sitemap-index.xml` | Path(s) to sitemap.xml |
| `policy` | `Policy[]` | `[]` | List of `Policy` rules |
| `configFile` | `String` | `undefined` | Path to external config file |
| `output` | `String` | `/robots.txt` | Path where to create the `robots.txt` |
`gatsby-config.js`

```js
module.exports = {
  plugins: [
    {
      resolve: 'gatsby-plugin-merge-robots',
      options: {
        host: 'https://www.example.com',
        sitemap: 'https://www.example.com/sitemap.xml',
        policy: [{userAgent: '*', allow: '/'}]
      }
    }
  ]
};
```
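Each `policy` entry follows `generate-robotstxt`'s `Policy` shape and can carry more than one rule per user agent. A minimal sketch, assuming illustrative agents and paths (the specific rules below are not part of the original docs):

```js
// gatsby-config.js excerpt — illustrative policy rules only
module.exports = {
  plugins: [
    {
      resolve: 'gatsby-plugin-merge-robots',
      options: {
        policy: [
          {
            userAgent: 'Googlebot',
            allow: '/',
            disallow: ['/search'], // keep internal search results out of the index
            crawlDelay: 2
          },
          {userAgent: '*', allow: '/'}
        ]
      }
    }
  ]
};
```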
### `env`-option
`gatsby-config.js`

```js
module.exports = {
  plugins: [
    {
      resolve: 'gatsby-plugin-merge-robots',
      options: {
        host: 'https://www.example.com',
        sitemap: 'https://www.example.com/sitemap.xml',
        env: {
          development: {
            policy: [{userAgent: '*', disallow: ['/']}]
          },
          production: {
            policy: [{userAgent: '*', allow: '/'}]
          }
        }
      }
    }
  ]
};
```
The `env` key will be taken from `process.env.GATSBY_ACTIVE_ENV` first (see Gatsby Environment Variables for more information on this variable), falling back to `process.env.NODE_ENV`. When neither is available, it defaults to `development`.
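In other words, the lookup behaves like the following sketch (assumed from the description above, not the plugin's actual source):

```js
// Assumed env-key resolution order, per the prose above
const activeEnv =
  process.env.GATSBY_ACTIVE_ENV || process.env.NODE_ENV || 'development';
console.log(activeEnv); // e.g. 'development' when neither variable is set
```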
You can resolve the `env` key by using the `resolveEnv` function:
`gatsby-config.js`

```js
module.exports = {
  plugins: [
    {
      resolve: 'gatsby-plugin-merge-robots',
      options: {
        host: 'https://www.example.com',
        sitemap: 'https://www.example.com/sitemap.xml',
        resolveEnv: () => process.env.GATSBY_ENV,
        env: {
          development: {
            policy: [{userAgent: '*', disallow: ['/']}]
          },
          production: {
            policy: [{userAgent: '*', allow: '/'}]
          }
        }
      }
    }
  ]
};
```
### `configFile`-option

You can use the `configFile` option to set specific external configuration:
`gatsby-config.js`

```js
module.exports = {
  plugins: [
    {
      resolve: 'gatsby-plugin-merge-robots',
      options: {
        configFile: 'robots-txt.config.js'
      }
    }
  ]
};
```

`robots-txt.config.js`

```js
module.exports = {
  host: 'https://www.example.com',
  sitemap: 'https://www.example.com/sitemap.xml',
  policy: [{userAgent: '*'}]
};
```
### Netlify

If you would like to disable crawlers for deploy previews, you can use the following snippet:
`gatsby-config.js`

```js
const {
  NODE_ENV,
  URL: NETLIFY_SITE_URL = 'https://www.example.com',
  DEPLOY_PRIME_URL: NETLIFY_DEPLOY_URL = NETLIFY_SITE_URL,
  CONTEXT: NETLIFY_ENV = NODE_ENV
} = process.env;
const isNetlifyProduction = NETLIFY_ENV === 'production';
const siteUrl = isNetlifyProduction ? NETLIFY_SITE_URL : NETLIFY_DEPLOY_URL;

module.exports = {
  siteMetadata: {
    siteUrl
  },
  plugins: [
    {
      resolve: 'gatsby-plugin-merge-robots',
      options: {
        resolveEnv: () => NETLIFY_ENV,
        env: {
          production: {
            policy: [{userAgent: '*'}]
          },
          'branch-deploy': {
            policy: [{userAgent: '*', disallow: ['/']}],
            sitemap: null,
            host: null
          },
          'deploy-preview': {
            policy: [{userAgent: '*', disallow: ['/']}],
            sitemap: null,
            host: null
          }
        }
      }
    }
  ]
};
```
### `query`-option

By default, the site URL will come from the Gatsby node `site.siteMetadata.siteUrl`. Like in Gatsby's sitemap plugin, an optional GraphQL query can be used to provide a different value from another data source, as long as it returns the same shape:
`gatsby-config.js`

```js
module.exports = {
  plugins: [
    {
      resolve: 'gatsby-plugin-merge-robots',
      options: {
        query: `{
          site: MyCustomDataSource {
            siteMetadata {
              siteUrl
            }
          }
        }`
      }
    }
  ]
};
```
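Whatever source the query reads from, the result must resolve to the same shape the plugin expects by default. A sketch (`MyCustomDataSource` is the placeholder from the example above; the value is illustrative):

```js
// Sketch of the shape the custom query should resolve to
// (mirrors the default site.siteMetadata.siteUrl lookup)
const expectedQueryResult = {
  site: {
    siteMetadata: {
      siteUrl: 'https://www.example.com'
    }
  }
};
```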
## License

MIT License