DocumentCloud is a web-based software platform for organizing, researching, annotating, and publishing primary source documents. We offer a set of tools that help you find and tell stories in your documents.
Thanks to generous funding from the Knight Foundation, DocumentCloud has been offered for free, exclusively to journalism organizations, since its launch in 2010. We are currently developing a paid model to ensure the platform's sustainability; you can read more about that effort at the DocumentCloud blog.
Currently, we are offering accounts to journalists — both those within news organizations and those with a demonstrated freelance portfolio. In addition, we are opening the platform on a trial basis to users in related industries who work with public documents. Please apply.
You can find answers to most questions you have either in this FAQ or our Help documentation. If you can’t find the answer, or if you’re having trouble using DocumentCloud, please email us at [email protected].
The current version, as well as past versions, is available here.
By default, any document you upload is set to "private" access and is viewable only by you. If you set the access level to "private to [your organization name]," then other DocumentCloud users in your organization can view the document, but not the public.
Start by filling out an application. Learn more about getting accounts at our contact page.
Currently, we create accounts under the umbrella of an organization. That is, each user account is tied to at least one organization. This allows, for example, users within that organization to collaborate privately on documents.
To log in, enter your account email address and password in the login box on the DocumentCloud home page. To log out, click the "Log Out" link at the top right of the workspace. Note: If you’re logged in on a shared computer, always log out when you’re done working.
If you’ve forgotten your password, go to our password reset page.
To change your email address, click the "Accounts" tab at the top of the workspace. Enter a new email address in the email box and click "Save." To change your password, click the "Change Password" button, type your new password, and click "Save."
Yes, the DocumentCloud Terms of Service require that accounts use real names and valid email addresses. However, we allow organizations to create one shared account for posting documents and another shared account for use with automation technology or our API. These accounts should have an appropriate name, such as "[Organization name] Documents."
Anyone in your organization who has a DocumentCloud account with administrator privileges can create accounts. Check around your organization first; if you’re not sure who your administrators are, email us at [email protected].
The abilities are largely similar but with some notable exceptions:
Yes, it is possible to use the same email with accounts under more than one organization. However, one of those organizations will be set as your default for uploading. We recommend using a different email for additional accounts.
Currently, there is no limit.
Please notify DocumentCloud at [email protected] if your organization is closing, changing its name or experiencing another significant change. We value the documents uploaded and made public by our contributors and ask that you do not delete them. We will be glad to work with you on a transition plan.
You are able at any time to download the original documents you uploaded to our service. Generally, we defer to each organization regarding disposition of the documents you uploaded to DocumentCloud while in their employment. Every organization has its own rules governing ownership of material generated while in their employment or service.
Accounts cannot be deleted, but they can be disabled by another user within your organization who has administrator-level privileges. We (and the public) value the documents you uploaded and made public and are glad to continue to host them.
We feature more than a million documents provided by contributors ranging from The New York Times to The Guardian and hundreds of large and small news organizations, freelance journalists and others who report using public documents.
No, you do not. We are proud to provide a valuable public resource at no cost.
No, whether you have an account or not, the only documents available for viewing are those explicitly shared by the users who upload them.
Using the search bar in the workspace, type the text you'd like to find in documents or search by attributes including Title, Account, Project and Group. Click once in the search bar to see available attributes, then select the value. Learn more in our Help document on searching.
To start contributing, please apply for an account. You will need an active account to contribute documents. You can apply here.
The most common file type our users upload is PDF, but DocumentCloud can process any file that LibreOffice can convert into a PDF. This includes Word documents, Excel spreadsheets, PowerPoint presentations, HTML and image files. We cannot process video, audio or closed-format files such as Outlook PST files.
Yes, 400 MB is the largest file you can upload.
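If you upload many files, a quick local check can catch an oversized or unsupported file before you submit it. The following is a minimal sketch, not DocumentCloud's own validation: the function name, the list of file extensions, and the reading of "400 MB" as 400 × 1024 × 1024 bytes are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: "400 MB" is read here as 400 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 400 * 1024 * 1024

# Hypothetical, non-exhaustive examples of video, audio and closed-format
# file extensions. The real platform may use a different list.
UNSUPPORTED_SUFFIXES = {".mp4", ".mov", ".mp3", ".wav", ".pst"}


def upload_problems(filename: str, size_bytes: int) -> list[str]:
    """Return the reasons a file would likely be rejected (empty if none)."""
    problems = []
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file is larger than 400 MB")
    if Path(filename).suffix.lower() in UNSUPPORTED_SUFFIXES:
        problems.append("file type cannot be processed")
    return problems
```

An empty list means the file passes both checks; anything else is worth fixing before you upload.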
DocumentCloud is intended to be a repository of public documents. Our Terms of Service prohibit uploading copyrighted material that is not yours.
Processing times will vary depending on the size of the file, whether or not it needs to be OCR’d, and current activity on the platform. Small documents are usually processed in a minute or less; larger documents can take significantly longer.
When you upload a document, we save the original file. We extract images of each page in several sizes for our workspace and embeds. If there is a text layer embedded in the document, we retrieve that and store it in a database for searching. If there isn’t, we OCR the document to capture text. Finally, we pass the extracted text to the OpenCalais API to locate entities such as names, places and organizations.
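The choice between the embedded text layer and OCR can be pictured as a simple fallback. This is an illustrative sketch only and not our processing code: the function name, the `run_ocr` callable, and the decision to treat a whitespace-only text layer as missing are assumptions.

```python
from typing import Callable, Optional


def text_for_search(embedded_text: Optional[str],
                    run_ocr: Callable[[], str]) -> str:
    """Prefer the embedded text layer; fall back to OCR when it is absent."""
    # Assumption: a text layer containing only whitespace counts as missing.
    if embedded_text and embedded_text.strip():
        return embedded_text
    # OCR is passed in as a callable so it runs only when actually needed.
    return run_ocr()
```

Passing OCR in as a callable reflects the point of the fallback: the slower step is skipped entirely when a usable text layer already exists.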
By default, documents are set to private access upon upload. That means only you can see it. You have the option of setting access to "private to your organization," meaning anyone else with a DocumentCloud account in your organization can see it, or "public," meaning it’s viewable by everyone.
Yes, we offer OCR in more than 20 languages via the Tesseract OCR engine’s language packs. In the workspace under the "Accounts" tab, choose your preferred OCR language from the "New documents" drop-down menu. Then click "Save."
We are glad to add languages supported by Tesseract, the OCR engine we use. Please see en.wikipedia.org/wiki/Tesseract_(software) for a list of languages and contact us at [email protected] to discuss your needs.
You can add several pieces of information either before or after uploading. These include a source, description, published URL and related article URL. To access these fields after you’ve uploaded a document, select the document and choose "Edit Document Information."
Use the Related Article URL to tell readers the location of the article that uses this document as source material. Adding a URL in this field creates a Related Article link in the sidebar of the full viewer. The Published URL is the page where the document is embedded. Most users won't need to provide this — our platform can usually detect where the document is embedded. If a document might be accessed at more than one URL, however, you can specify the URL we should send users to if they find the document through a search of DocumentCloud.
DocumentCloud allows you to define and search your own set of custom data (key/value pairs) associated with specific documents. To edit data for individual documents in the workspace, select the documents you wish to update, and choose "Edit Document Data" from the "Edit" menu. See "Editing and Searching your own Custom Data" in our Help section to learn more.
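To make the idea of key/value pairs concrete, here is a small local sketch of how such data can be used to filter documents. It does not call DocumentCloud; the sample documents, the keys `state` and `type`, and the `matches` helper are all hypothetical.

```python
def matches(data: dict[str, str], **wanted: str) -> bool:
    """True when every requested key is present with the requested value."""
    return all(data.get(key) == value for key, value in wanted.items())


# Hypothetical documents, each carrying its own custom key/value data.
documents = [
    {"title": "Budget 2012", "data": {"state": "CA", "type": "budget"}},
    {"title": "Audit 2012", "data": {"state": "CA", "type": "audit"}},
    {"title": "Budget 2013", "data": {"state": "NY", "type": "budget"}},
]

# Keep only the documents tagged as California budgets.
california_budgets = [
    doc["title"]
    for doc in documents
    if matches(doc["data"], state="CA", type="budget")
]
```

Choosing a small, consistent set of keys up front makes this kind of filtering far more useful than tagging documents ad hoc.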
Double-click the document to open it in the document workspace. In the sidebar, click "Reorder Pages." You’ll see thumbnails of all the pages in your document. Drag and drop pages into the order you’d like, then click "Save Page Order."
To insert one or more pages:
We recommend you retain a backup of your document before removing pages. To remove one or more pages:
We recommend you retain a backup of your document before making redactions. To redact a portion of a document:
When you redact a portion of a document, we erase all data related to the redacted information, create a new redacted document, and delete the original document. Any text that was part of the redacted portion is deleted.
Currently, we do not support this. We recommend that you change the orientation of the page before uploading your document.
In the workspace, right-click a document and select "Delete Document." You’ll be asked to confirm your choice.
No, once you delete a document it’s permanently deleted from our platform.
Select one or more documents in the workspace. Under the "Analyze" menu, select "View Entities."
Select one or more documents in the workspace. Under the "Analyze" menu, select "View Timeline." To zoom in on a time period, drag your cursor over a selection of dates.
We use Tesseract, an open-source OCR engine. Google currently sponsors development.
No. If your document contains embedded text, we save that in our database. We use OCR only when there’s no text layer.
Yes. Double-click the document to open it in the document workspace. In the sidebar, click "Reprocess Text." In the dialog, click "Force OCR."
To edit text:
Select the document in the workspace. Under the "Publish" menu, choose "Download Full Text." Your browser will open a separate window or tab at a URL that contains all the text. You can save it or copy and paste it into a text editor.
In DocumentCloud, notes are a way to highlight important sections of documents with a short headline and explanatory text. Notes can either be private — viewable only by you — or public, meaning if you share your document with others in your organization or make it fully public, then readers will see your notes.
To add a note:
Find the note on your document and select it. Click the pencil icon to the right of the headline to edit the note.
You can save a public note as "Draft" if you’re not ready to publish it yet. While it’s in draft mode, the note will not be visible to others and will show a "Draft" icon in its header. You can switch the note to public by selecting it and clicking "Publish."
Public notes are visible to anyone who has access to the document. Private notes are viewable only by the person who uploaded the document.
Not at this time. If you need to change the status of a note from public to private, or vice versa, edit the note, copy its contents, and delete it. Then draw a new public or private note and paste in the information you copied.
Instead of highlighting a portion of a document, you can create a note that appears at the top of a page. To do this, follow the directions for creating a note and, rather than drawing a box on a page, click in between any two pages (or above the first page).
You can format text in notes by using some basic HTML codes. For example, to bold a phrase, precede it with <b> and end it with </b>.
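If you prepare note text in a script before pasting it in, a small helper can add the tags for you. This is a sketch under assumptions: the `bold` helper is hypothetical, and escaping `&`, `<` and `>` in the phrase is a general HTML precaution rather than a documented DocumentCloud requirement.

```python
from html import escape


def bold(phrase: str) -> str:
    """Wrap a phrase in <b> tags, escaping characters HTML treats specially."""
    return f"<b>{escape(phrase, quote=False)}</b>"


# Hypothetical note text with one bolded phrase.
note_text = "The contract total was " + bold("$4.2 million") + ", per page 3."
```

The resulting `note_text` string can be pasted into the note's text box as-is.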
Any public notes you create are visible (i.e., published) as soon as you set the document’s access level to "public."
Each public note has a specific URL that you can share. To find it, select the note. Then click the chain-link icon to the right of the small headline. In your browser, the URL will change to the note link. Copy that and use it on your website. When a reader clicks the link, they’ll be directed to the document with the note open.
In the workspace, select a document that has notes. Under the "Publish" menu, choose "Embed a Note." That will launch a dialog box to lead you through creating an embed code. For more on publishing notes and other assets, please consult our Help documents.
Projects are labels you can apply to groups of documents to organize them by topic or project. A document can live in more than one project.
In the workspace at left, click the "New Project" button. Give your project a name and click "Save."
Not at this time. However, if you are looking for a way to easily organize, filter or search your documents, we recommend you add custom data.
You can drag and drop the file icon on the project title at the left of the workspace. Or highlight the file in the workspace, click the "Projects" icon, and choose the name of a project.
Select a document in the workspace. Click the "Projects" menu, which displays all your project titles. You’ll see a check mark next to each project the document belongs to. Find the project that you want to remove the document from, and click the title to remove the check mark.
In the workspace, hover over the name of your project in the project list, then click the pencil icon to show the project editing dialog. Click "Add a collaborator to this project." You can add email addresses of people who have DocumentCloud accounts — whether they belong to your organization or another.
At this time, no. You can edit up to 30 documents at a time in the workspace by setting the view to thumbnails and selecting all documents with your cursor. If you have a large number of documents, please contact us at [email protected] to discuss other options.
At the top left of the workspace is a link to view all documents that users have either set to "public" access or "private to your organization." Clicking the arrow next to your organization name will reveal each user, and you can then click their name to see their documents.
Yes, you can do this by granting them "reviewer" access. To do so:
We currently offer four embed types:
Please see our help section on Embedding and Publishing for details.
DocumentCloud provides a wizard that walks you through the steps of generating codes for each type of embed. You then typically add the embed code to your website using your content management system’s embed function. Please see our help section on Embedding and Publishing for details.
In the workspace, select one or more documents. Under the "Edit" menu, choose "Access Level." Pick one of the three options. "Public Access" means anyone on the internet can search for and view the document. "Private Access" means only you and people with explicit permission (via collaboration) have access. "Private to [Organization Name]" means only the people in your organization have access. (No freelancers.)
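The three access levels can be summarized as a small decision function. This is a literal reading of the description above, written as a sketch: the `Document` class, its field names, and the `can_view` function are hypothetical and do not represent how DocumentCloud enforces access internally.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    owner: str
    organization: str
    access: str  # one of "public", "private", "organization"
    collaborators: set[str] = field(default_factory=set)


def can_view(doc: Document, user: Optional[str],
             user_org: Optional[str]) -> bool:
    """Decide whether a (possibly anonymous) viewer may see a document."""
    if doc.access == "public":
        return True  # anyone on the internet
    if user is None:
        return False  # anonymous viewers see public documents only
    if user == doc.owner:
        return True  # uploaders can always see their own documents
    if doc.access == "private":
        return user in doc.collaborators  # explicit permission only
    if doc.access == "organization":
        return user_org == doc.organization  # same organization only
    return False  # unknown access level: deny by default
```

Denying access for any unrecognized level is a deliberate choice in this sketch, so that a typo in the access value never exposes a document.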
If your document is private, you can set a publication date for it:
The date and time set will show in the workspace. If you change your mind, re-open the publication date dialog and choose "Cancel."
Make sure you have set the document’s access to "public." Double-click a document to open it in the document workspace. The URL for sharing the document is now in your browser’s URL bar. The format is https://www.documentcloud.org/documents/[ID Number]-[document-title-words].html
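The URL format above can be sketched in a few lines. Treat this as an illustration only: the function name and the slug rule (lowercase, with each run of non-alphanumeric characters replaced by a hyphen) are assumptions, and the document ID and title in the usage note are made up. DocumentCloud assigns the real ID and title words, so copy the URL from your browser rather than constructing it yourself.

```python
import re


def public_document_url(doc_id: int, title: str) -> str:
    """Build a URL in the documented [ID Number]-[document-title-words] shape."""
    # Assumed slug rule: lowercase the title, then collapse every run of
    # characters other than a-z and 0-9 into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"https://www.documentcloud.org/documents/{doc_id}-{slug}.html"
```

For example, `public_document_url(12345, "City Budget, 2012")` returns `https://www.documentcloud.org/documents/12345-city-budget-2012.html`.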
Yes, all of our embed types can be viewed on phone or tablet screens. However, our page and note embeds are responsive and are the best choice for display on those devices.
The best way is to install our custom WordPress plugin, which lets you embed by entering shortcodes into your text. See the documentation for details.
You can set the width and height while creating it using the embed wizard.
No, only you or DocumentCloud users in your organization can make changes to your documents.
If you embed a document with our full viewer, you can disable the link to the original PDF that appears in the sidebar by unchecking the "Link to the original PDF" box in the embed wizard. Nevertheless, once you set your document to public access, it will appear in Internet search results, and people will be able to download it.
Check or uncheck the "Show the Sidebar" option in the embed wizard. When embedding the viewer at narrow widths, hiding the sidebar is usually a good idea.
DocumentCloud's API provides resources to search, upload, edit, and organize documents as well as to work with projects. In addition, an oEmbed service provides easy integration of embedding documents, pages and notes. Full documentation is available.
As with DocumentCloud’s workspace, you need an account to use the API to upload, update or delete documents, or create and modify projects. Other API functions, such as search, do not require an account. Consult the documentation for details.
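Because search does not require an account, a search request is just a URL with query parameters. The sketch below only builds that URL and makes no network call. The endpoint path and the parameter names `q`, `page` and `per_page` are assumptions here; confirm them against the API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumption: location of the search resource. Verify in the API documentation.
SEARCH_ENDPOINT = "https://www.documentcloud.org/api/search.json"


def search_url(query: str, page: int = 1, per_page: int = 10) -> str:
    """Build a search URL; parameter names are assumed, not confirmed."""
    params = {"q": query, "page": page, "per_page": per_page}
    # urlencode percent-encodes special characters and turns spaces into "+".
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"
```

You could then fetch the resulting URL with any HTTP client, for example `urllib.request.urlopen(search_url("budget"))`, and parse the response as JSON.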
The open-source community has contributed several helpful libraries for interacting with DocumentCloud's API. These include Python, Ruby and Node.js wrappers. See the listing in the API documentation.
Yes. Please read our complete API Guidelines and Terms of Service.