ArchiveBox

🗃 The open source self-hosted web archive. Takes browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

All edits

Edits on 19 February, 2020
Golden AI: "Adding location topic from lookup Montréal, Québec"
Golden AI edited on 19 February, 2020 2:08 am
Edits made to:
Infobox (+1 properties)

Infobox

Location
Edits on 1 May, 2019
Nick Sweeting
Nick Sweeting approved a suggestion from Golden's AI on 1 May, 2019 4:00 am
Edits made to:
Article (+4/-4 characters)

Article

ArchiveBox takes a list of website URLs you want to archive, and creates a local, static, browsable HTML clone of the content from those websites (it saves HTML, JS, media files, PDFs, images and more).

Nick Sweeting: "Approved suggestion from source: https://archivebox.io"
Nick Sweeting approved a suggestion from Golden's AI on 1 May, 2019 4:00 am
Edits made to:
Infobox (+1 properties)
Nick Sweeting
Nick Sweeting edited on 1 May, 2019 4:00 am
Edits made to:
Infobox (+2 properties)
Description (+137 characters)
Article (+3148 characters)
People (+1 rows) (+2 cells) (+20 characters)
Further reading (+1 rows) (+3 cells) (+82 characters)
Categories (+2 topics)
Topic thumbnail

ArchiveBox

🗃 The open source self-hosted web archive. Takes browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Article

ArchiveBox takes a list of website URLs you want to archive, and creates a local, static, browsable HTML clone of the content from those websites (it saves HTML, JS, media files, PDFs, images and more).

After installing the dependencies, just pipe some new links into the ./archive command to start your archive.
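As a rough illustration of the piping step, the sketch below sends one URL per line to a command's standard input from Python. The helper name pipe_links is hypothetical, and the one-URL-per-line input format is an assumption rather than something this page specifies. The demo uses a stand-in command that echoes its input, so it runs without ArchiveBox installed.

```python
import subprocess
import sys


def pipe_links(links, command=("./archive",)):
    """Send one URL per line to the command's standard input (hypothetical helper)."""
    payload = "\n".join(links) + "\n"
    return subprocess.run(
        list(command),
        input=payload,
        text=True,
        capture_output=True,
        check=True,  # raise CalledProcessError if the command exits non-zero
    )


if __name__ == "__main__":
    # Stand-in for ./archive: a tiny Python process that echoes whatever it receives.
    echo = (sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())")
    result = pipe_links(["https://example.com"], command=echo)
    print(result.stdout, end="")
```

With ArchiveBox checked out and its dependencies installed, calling pipe_links with the default command from the repository directory would presumably hand the same links to ./archive.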

Running ./archive adds only new, unique links into output/ on each run. Because it ignores duplicates and archives each link only the first time you add it, you can schedule it to run on a timer and re-import all your feeds multiple times a day. It runs quickly even when the feeds are large, because it archives only the links added since the last run. For each link, it runs through all the archive methods. Methods that fail save None and are automatically retried on the next run; methods that succeed save their output into the data folder and are never retried or overwritten by subsequent runs. Support for saving multiple snapshots of each site over time will be added soon (along with the ability to view diffs of the changes between runs).
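The run behaviour described above can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not ArchiveBox's actual code: the in-memory index, the archive_run function, and the method names are all hypothetical. It shows only the rule that a method with a saved output is skipped, while a method whose result is None is tried again.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical types: a method takes a URL and returns an output path, or fails.
Method = Callable[[str], Optional[str]]
Index = Dict[str, Dict[str, Optional[str]]]


def archive_run(index: Index, links: Iterable[str], methods: Dict[str, Method]) -> int:
    """Add new links, run every method that has no saved output, return the call count."""
    for url in links:
        index.setdefault(url, {})  # duplicates are ignored: existing entries are kept

    calls = 0
    for url, results in index.items():
        for name, method in methods.items():
            if results.get(name) is not None:
                continue  # succeeded on an earlier run: never retried or overwritten
            calls += 1
            try:
                results[name] = method(url)
            except Exception:
                results[name] = None  # failure is stored as None and retried next run
    return calls


if __name__ == "__main__":
    pdf_attempts = []

    def fetch_html(url: str) -> Optional[str]:
        return "index.html"

    def fetch_pdf(url: str) -> Optional[str]:
        pdf_attempts.append(url)
        if len(pdf_attempts) == 1:
            raise RuntimeError("simulated failure on the first attempt")
        return "output.pdf"

    index: Index = {}
    methods = {"html": fetch_html, "pdf": fetch_pdf}
    feed = ["https://example.com"]
    print(archive_run(index, feed, methods))  # first run: both methods are attempted
    print(archive_run(index, feed, methods))  # second run: only the failed method
    print(archive_run(index, feed, methods))  # third run: nothing left to do
```

Storing None for a failed method keeps the retry decision local to each (link, method) pair, which is why re-importing a large feed stays cheap in this model: finished work is never repeated.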

Whether you want to learn which organizations are the big players in the web archiving space, want to find a specific open source tool for your web archiving needs, or just want to see where archivists hang out online, our Community Wiki page serves as an index of the broader web archiving community. Check it out to learn about some of the coolest web archiving projects and communities on the web!

Learn why archiving the internet is important by reading the "On the Importance of Web Archiving" blog post.

People

Name | Role | Related Golden topics
Nick Sweeting | Creator |

Further reading

Title | Author | Link | Type

Infobox

Categories

Nick Sweeting: "Initial topic creation"
Nick Sweeting created this topic on 1 May, 2019 3:56 am
Edits made to:
Topic thumbnail

 ArchiveBox

🗃 The open source self-hosted web archive. Takes browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Text is available under the Creative Commons Attribution-ShareAlike 4.0; additional terms apply. By using this site, you agree to our Terms & Conditions.