Apache Tika

Description
The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
Home page for this solution: https://tika.apache.org/
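As an illustration of the "single interface" idea, here is a minimal sketch that sends any file to a Tika Server over HTTP and reads back either plain text or metadata. It assumes a Tika Server instance is already running locally on port 9998 (its default) and uses the server's `/tika` and `/meta` endpoints. The helper names (`build_request`, `extract_text`, `extract_metadata`) are hypothetical and chosen for this example only; they are not part of Tika.

```python
import json
import urllib.request

# Assumption: a Tika Server instance is listening here (9998 is its default port).
TIKA_URL = "http://localhost:9998"


def build_request(data: bytes, endpoint: str, accept: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request for a Tika Server endpoint."""
    return urllib.request.Request(
        url=f"{TIKA_URL}/{endpoint}",
        data=data,
        method="PUT",
        headers={
            "Accept": accept,
            # Generic content type so the server detects the real format itself.
            "Content-Type": "application/octet-stream",
        },
    )


def extract_text(data: bytes) -> str:
    """Return the plain text of a document, whatever its file type."""
    with urllib.request.urlopen(build_request(data, "tika", "text/plain")) as resp:
        return resp.read().decode("utf-8")


def extract_metadata(data: bytes) -> dict:
    """Return the document metadata as a dictionary."""
    with urllib.request.urlopen(build_request(data, "meta", "application/json")) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    import sys

    # Usage: python tika_sketch.py some_file.pdf
    with open(sys.argv[1], "rb") as fh:
        payload = fh.read()
    print(extract_metadata(payload).get("Content-Type"))
    print(extract_text(payload)[:500])
```

The same two calls work unchanged for a PPT, XLS, or PDF file, because format detection and parser selection happen on the Tika side.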
Overview
Key | Value |
---|---|
Name | tika |
Description | The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). |
License | Apache License 2.0 |
Programming Language | Java |
Created | 2009-05-21 |
Last update | 2025-03-31 |
GitHub Stars | 2888 |
Project Home Page | https://tika.apache.org/ |
Code Repository | |
OpenSSF Scorecard | |
Note:
The created date is the date the repository was created on GitHub.com.
The last update is only the most recent date on which I ran an automatic check.
Do not read too much into GitHub stars. It's a vanity metric! Star counts are misleading and don't indicate whether the SBB is high-quality or very popular.