Testing Plagiarism Detection in Automated Publishing Pipelines with Originality.ai

Some links in this article are affiliate links to products I actually use. If you sign up through them, I may earn a small commission — at no extra cost to you.
This post is part of an ongoing series documenting how I built Nexus, my automated content publishing pipeline. Earlier posts covered generation and scheduling – this one covers a quality gate I almost skipped: plagiarism detection in automated publishing.
Why I Added a Content Check
Plagiarism detection becomes a hard requirement the moment you start automating content generation. There is a risk that large language models generate phrasing that mirrors existing web content closely enough to trigger search engine filters and penalties, and a duplicate content checker is the most practical safety net I could add.
Manually copy-pasting drafts into one of the available web-based checkers is the obvious solution, but it did not feel like the right approach for an automated setup. Ideally, I wanted to build the check directly into the pipeline itself as a preventative ‘good hygiene’ step to flag potential issues.
The challenge is finding plagiarism checking that integrates reliably into automated workflows, handles AI-generated content appropriately, and doesn’t create more problems than it solves.
Tools I Evaluated
I looked at three approaches: Copyscape, Originality.ai, and open source solutions.
| Tool | API Available | What It Checks | Pricing Reference | Notes |
|---|---|---|---|---|
| Copyscape | Yes | Web index | Copyscape premium | Focused on plagiarism checks and includes a WordPress plugin. They also offer a service that sends automatic alerts if duplicates of your content are found on the web. |
| Originality.ai | Yes | Web index | Originality.ai pricing | In addition to the core plagiarism check, they offer services such as AI detection, Grammar Checker, Content Quality, and Fact Checker, as well as a WordPress plugin. |
| Custom | – | Web, local database | – | Several options exist, but none stood out as a viable choice for my requirements. |
How the Check Fits Into the Process
Ideally, an automated plagiarism or duplicate content check runs immediately after the draft is generated. The raw text is passed to an external API, which returns a similarity score. Content scoring above a set threshold is either flagged and highlighted for additional scrutiny during the manual review or, better still, automatically re-generated using the score as feedback to eliminate the duplication.
Unfortunately, there were a couple of hurdles. First, none of the currently available open source solutions seemed to fit into this type of flow. Second, API access to the commercial services requires premium or enterprise plans, which are completely out of my budget!
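The flow described above can be sketched roughly as follows. This is a minimal illustration, not a working integration: `score_fn` and `regenerate_fn` are hypothetical callables standing in for a similarity API and the draft generator, and the 0.30 threshold and three-attempt limit are placeholder values rather than recommendations.

```python
from dataclasses import dataclass
from typing import Callable

# Placeholder values (assumptions): tune these against your own drafts.
SIMILARITY_THRESHOLD = 0.30
MAX_ATTEMPTS = 3


@dataclass
class GateResult:
    text: str           # the final draft text
    score: float        # similarity score of the final draft
    attempts: int       # how many drafts were scored in total
    needs_review: bool  # True if the final draft is still at or above the threshold


def run_gate(
    draft: str,
    score_fn: Callable[[str], float],
    regenerate_fn: Callable[[str, float], str],
    threshold: float = SIMILARITY_THRESHOLD,
    max_attempts: int = MAX_ATTEMPTS,
) -> GateResult:
    """Score a draft and regenerate it while the similarity score is too high."""
    text = draft
    score = score_fn(text)
    attempts = 1
    while score >= threshold and attempts < max_attempts:
        # Pass the score back to the generator as feedback for the rewrite.
        text = regenerate_fn(text, score)
        score = score_fn(text)
        attempts += 1
    return GateResult(text, score, attempts, needs_review=score >= threshold)
```

Passing the scorer and generator in as functions keeps the gate independent of any particular vendor, and lets the loop be exercised with a fake scorer before any paid API is involved.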
Back To The Drawing Board
Given the limitations, I decided upon another strategy. In my process, I would be reviewing and updating the content anyway prior to publishing, so a WordPress plugin to run the check against the draft post would be the next best thing.
With this approach, I decided upon Originality.ai for the following reasons:
- the option of a pay-as-you-go pricing model alongside monthly or yearly subscriptions
- the additional services available, e.g. grammar and spelling checks, and a fact checker to help catch AI hallucinations
The Results In Practice
I’m quite happy with it – so far, Originality.ai flagged zero plagiarism issues across the first few AI-generated drafts I ran through it. The AI-detection scores were more interesting – and a useful reminder that “passes plagiarism check” and “reads as human-written” are two different things, even after manual editing.
Taking this post as an example, the result shows an overall confidence of 52% that the post was AI-generated (not that 52% was written by AI), with colour coded results for individual sections reflecting the confidence of each. And 100% confidence it is original ;).

The WordPress plugin only performs an AI check; it does not pick up the default scan types you may have configured in your Originality.ai account settings. If you want plagiarism or grammar checks alongside the AI detection, you’ll need to run those from their website against your content. This is a straightforward step, since the content will already have been loaded automatically from the WordPress post when you view the detailed results of a scan.
Also note that certain block types — tables in particular — can occasionally render incorrectly in the automatically loaded content. This means grammatical “errors” might be flagged in those sections that are artefacts of the formatting rather than genuine issues. Easy to dismiss once you know to expect it, but worth keeping in mind so you do not overlook a real issue.
These observations have been shared as feedback with the Originality.ai team.
Limitations
I could not incorporate an automated check against a plagiarism API. Even so, it’s important to point out that a similarity score is not the same as plagiarism. A high score often just means the article uses common industry terminology or quotes a public source. Distinguishing between the two still requires judgement – the score only tells you where to look.
If you build a self-hosted alternative to avoid external API costs, you take on the development and maintenance overhead of your own infrastructure. That trade-off is typically only worth it at significant scale.
Finally, these tools only catch surface-level matches. Two articles can cover identical ideas in different words and both pass cleanly.
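To make the surface-level point concrete, here is a small sketch of one generic text-overlap measure: Jaccard similarity over word trigrams ("shingles"). It is an assumption chosen for illustration, not a description of how Originality.ai, Copyscape, or any other commercial checker works internally, and the sample sentences are made up.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: str, b: str, n: int = 3) -> float:
    """Share of word n-grams the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "Large language models can generate phrasing that mirrors existing web content."
near_copy = "Large language models can generate phrasing that mirrors existing online content."
paraphrase = "Text produced by an LLM sometimes echoes wording already published on the internet."

print(f"near copy:  {jaccard(original, near_copy):.2f}")
print(f"paraphrase: {jaccard(original, paraphrase):.2f}")
```

The near copy shares most of its trigrams with the original and scores high, while the paraphrase shares none and scores zero even though it makes the same point.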
Try It Yourself
For most solo publishing operations, the personal review step with the plugin is probably the right balance of effort versus risk. If you are running a similar automated publishing setup, the simplest starting point is the Originality.ai plugin. Install it, connect to their backend, and you get the score panel directly inside the post editor.
If you have access to a higher subscription plan, API integration with a commercial service could be appealing. Once you have a baseline feel for your typical scores, a threshold-based notification or re-generation step becomes a natural next addition to the pipeline.
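One way to turn a baseline feel into a number is to derive the threshold from the scores of drafts you have already reviewed. The sketch below assumes scores are fractions between 0 and 1 and uses the mean plus two standard deviations as the cut-off. Both the scale and the multiplier are assumptions to tune, and the sample scores are made-up illustrative values, not results from any tool.

```python
from statistics import mean, pstdev


def baseline_threshold(scores: list[float], k: float = 2.0) -> float:
    """Return mean + k standard deviations of past scores, capped at 1.0."""
    if len(scores) < 2:
        raise ValueError("need at least two past scores to form a baseline")
    return min(1.0, mean(scores) + k * pstdev(scores))


def should_notify(score: float, threshold: float) -> bool:
    """True when a new draft scores above the baseline threshold."""
    return score > threshold


# Made-up similarity scores from previously reviewed drafts (illustrative only).
past_scores = [0.02, 0.04, 0.03, 0.05, 0.01]
threshold = baseline_threshold(past_scores)

for new_score in (0.04, 0.10):
    action = "notify" if should_notify(new_score, threshold) else "ok"
    print(f"score={new_score:.2f} threshold={threshold:.3f} -> {action}")
```

A relative threshold like this adapts to whatever scale your checker reports on, but it only makes sense once there are enough reviewed drafts for the baseline to be meaningful.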






