DocumentCloud

Frequently Asked Questions

General

What is DocumentCloud?

DocumentCloud is a web-based software platform for organizing, researching, annotating, and publishing primary source documents. We offer a set of tools that help you find and tell stories in your documents.

How much does it cost?

Thanks to generous funding from the Knight Foundation, DocumentCloud has been offered free of charge exclusively to journalism organizations since its launch in 2010. We are currently developing a paid model to ensure the platform's sustainability; you can read more about that effort on the DocumentCloud blog.

Who can have accounts?

Currently, we are offering accounts to journalists — both those within news organizations and those with a demonstrated freelance portfolio. In addition, we are opening the platform on a trial basis to users in related industries who work with public documents. Please apply.

Where can I get help?

You can find answers to most questions you have either in this FAQ or our Help documentation. If you can’t find the answer, or if you’re having trouble using DocumentCloud, please email us at support@documentcloud.org.

Where can I read your Terms of Service?

The current version and past versions are available here.

Privacy

What is DocumentCloud’s privacy policy?

Please read our complete Privacy Policy for details.

Who can see documents in my account?

By default, any document you upload is set to "private" access and is viewable only by you. If you set the access level to "private to [your organization name]," then other DocumentCloud users in your organization can view the document, but not the public.

Accounts

How do I get an account?

Start by filling out an application. Learn more about getting accounts at our contact page.

How does DocumentCloud organize accounts?

Currently, we create accounts under the umbrella of an organization. That is, each user account is tied to at least one organization. This allows, for example, users within that organization to collaborate privately on documents.

How do I log in and log out?

To log in, enter your account email address and password in the login box on the DocumentCloud home page. To log out, click the "Log Out" link at the top right of the workspace. Note: If you’re logged in on a shared computer, always log out when you’re done working.

How do I reset my password?

If you’ve forgotten your password, go to our password reset page.

How do I change my email address and password?

To change your email address, in the workspace, click the "Accounts" tab at the top. Enter a new email address in the email box and click "Save." To change your password, click the "Change Password" button, type your new password, and click "Save."

Do I have to use my real name for my account?

Yes, the DocumentCloud Terms of Service require that accounts use real names and valid email addresses. However, we allow organizations to create one shared account for posting documents and another shared account for use with automation technology or our API. These accounts should have an appropriate name, such as "[Organization name] Documents."

How can I get accounts for others in my organization?

Anyone in your organization who has a DocumentCloud account with administrator privileges can create accounts. Check around your organization, but if you’re not sure, email us at support@documentcloud.org.

What are the differences between admin, contributor and freelancer accounts?

The abilities are largely similar but with some notable exceptions:

Feature                                          Administrator   Contributor   Freelancer
Add, modify or disable accounts                  Yes             No            No
Access documents shared across an organization   Yes             Yes           No
Upload, annotate, edit, publish documents        Yes             Yes           Yes

Can I have more than one account with the same email?

Yes, it is possible to use the same email with accounts under more than one organization. However, one of those organizations will be set as your default for uploading. We recommend using a different email for additional accounts.

How many user accounts can an organization have under its account?

Currently, there is no limit.

What happens if my organization closes?

Please notify DocumentCloud at support@documentcloud.org if your organization is closing, changing its name or experiencing another significant change. We value the documents uploaded and made public by our contributors and ask that you do not delete them. We will be glad to work with you on a transition plan.

What happens to my documents and account if I leave my current organization?

You are able at any time to download the original documents you uploaded to our service. Generally, we defer to each organization regarding disposition of the documents you uploaded to DocumentCloud while in their employment. Every organization has its own rules governing ownership of material generated while in their employment or service.

Can I delete my account?

Accounts cannot be deleted, but they can be disabled by another user within your organization who has administrator-level privileges. We (and the public) value the documents you uploaded and made public and are glad to continue to host them.

Search

What can I find in the public catalog?

We feature more than a million documents provided by contributors ranging from The New York Times to The Guardian and hundreds of large and small news organizations, freelance journalists and others who report using public documents.

Do I need an account to search the public catalog?

No, you do not. We are proud to provide a valuable public resource at no cost.

Will getting a DocumentCloud account allow me to see more documents?

No, whether you have an account or not, the only documents available for viewing are those explicitly shared by the users who upload them.

How can I search documents?

Using the search bar in the workspace, type the text you'd like to find in documents or search by attributes including Title, Account, Project and Group. Click once in the search bar to see available attributes, then select the value. Learn more in our Help document on searching.

Can I contribute documents to your public catalog?

You need an active account to contribute documents. To get started, apply here.

Uploading

What kinds of file types can I upload?

The most common file type our users upload is PDF, but DocumentCloud can process any file that LibreOffice can convert into a PDF. This includes Word documents, Excel spreadsheets, PowerPoint presentations, HTML and image files. We cannot process video, audio or closed-format files such as Outlook PST files.

Is there a limit on the size of a file I can upload?

Yes, 400 MB is the largest file you can upload.
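
If you upload files with a script, a check along these lines can catch oversized files before you send them. This is only a minimal sketch: the function name is hypothetical, and it assumes "400 MB" means 400 × 1024 × 1024 bytes, which this FAQ does not specify.

```python
import os

# Assumption: 1 MB = 1024 * 1024 bytes; the FAQ only says "400 MB".
MAX_UPLOAD_BYTES = 400 * 1024 * 1024


def within_upload_limit(path: str, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```

If the platform counts a megabyte as 1,000,000 bytes instead, lower the constant to stay on the safe side.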

Are there restrictions on the content of documents I can upload?

DocumentCloud is intended to be a repository of public documents. Our Terms of Service prohibit uploading copyrighted material that is not yours.

How long does it take to process a document?

Processing times will vary depending on the size of the file, whether or not it needs to be OCR’d, and current activity on the platform. Small documents are usually processed in a minute or less; larger documents can take significantly longer.

What does DocumentCloud do with the documents I upload?

When you upload a document, we save the original file. We extract images of each page in several sizes for our workspace and embeds. If there is a text layer embedded in the document, we retrieve that and store it in a database for searching. If there isn’t, we OCR the document to capture text. Finally, we pass the extracted text to the OpenCalais API to locate entities such as names, places and organizations.
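
The order of those steps can be sketched as a small function. This is an illustration of the description above, not DocumentCloud's actual code; the function name and step labels are hypothetical, and the `force_ocr` flag stands in for the "Force OCR" option covered later in this FAQ.

```python
from typing import List


def processing_steps(has_text_layer: bool, force_ocr: bool = False) -> List[str]:
    """List the processing steps for an upload, in the order described above."""
    steps = ["save original file", "extract page images"]
    # Text comes from the embedded layer when one exists, otherwise from OCR.
    if has_text_layer and not force_ocr:
        steps.append("read embedded text layer")
    else:
        steps.append("run OCR")
    # Entity extraction always runs on whatever text was captured.
    steps.append("extract entities")
    return steps
```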

When I upload a document, can anyone else see it?

By default, documents are set to private access upon upload. That means only you can see it. You have the option of setting access to "private to your organization," meaning anyone else with a DocumentCloud account in your organization can see it, or "public," meaning it’s viewable by everyone.

Can I OCR my documents in languages other than English?

Yes, we offer OCR in more than 20 languages via the Tesseract OCR engine’s language packs. In the workspace under the "Accounts" tab, choose your preferred language for OCR under the "New documents" drop down. Then click "Save."

I don’t see the language I need for OCR. Can you add it?

We are glad to add languages supported by Tesseract, the OCR engine we use. Please see en.wikipedia.org/wiki/Tesseract_(software) for a list of languages and contact us at support@documentcloud.org to discuss your needs.

Working with Documents

What information can I add to my documents?

You can add several pieces of information either before or after uploading. These include a source, description, published URL and related article URL. To access these fields after you’ve uploaded a document, select the document and choose "Edit Document Information."

What is the difference between Related Article URL and Published URL?

Use the Related Article URL to tell readers the location of the article that uses this document as source material. Adding a URL in this field creates a Related Article link in the sidebar of the full viewer. The Published URL is the page where the document is embedded. Most users won't need to provide this — our platform can usually detect where the document is embedded. If a document might be accessed at more than one URL, however, you can specify the URL we should send users to if they find the document through a search of DocumentCloud.

How can I add custom data (tags) to organize and search my documents?

DocumentCloud allows you to define and search your own set of custom data (key/value pairs) associated with specific documents. To edit data for individual documents in the workspace, select the documents you wish to update, and choose "Edit Document Data" from the "Edit" menu. See "Editing and Searching your own Custom Data" in our Help section to learn more.
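
To picture how key/value pairs help with organizing, here is a small local sketch. It does not call DocumentCloud; the document records, the keys ("agency", "year") and the function name are all made up for illustration.

```python
from typing import Dict, List

# Hypothetical records: each document carries its own key/value data.
documents: List[Dict] = [
    {"title": "Budget memo", "data": {"agency": "parks", "year": "2014"}},
    {"title": "Audit report", "data": {"agency": "transit", "year": "2014"}},
]


def filter_by_data(docs: List[Dict], key: str, value: str) -> List[Dict]:
    """Return the documents whose custom data has `key` set to `value`."""
    return [doc for doc in docs if doc["data"].get(key) == value]
```

Here `filter_by_data(documents, "agency", "parks")` returns only the "Budget memo" record, which is the kind of narrowing a custom-data search gives you in the workspace.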

How do I change the order of pages in a document I uploaded?

Double-click the document to open it in the document workspace. In the sidebar, click "Reorder Pages." You’ll see thumbnails of all the pages in your document. Drag and drop pages into the order you’d like, then click "Save Page Order."

How do I insert or replace pages in a document I uploaded?

To insert or replace one or more pages:

  • Double-click the document to open it in the document workspace.
  • In the sidebar, click "Insert/Replace Pages." You’ll see thumbnails of all the pages in your document.
  • To insert new pages at a specific position within the document, click between the pages. If you'd like to replace a specific page with a new copy, click on the page you'd like to remove. Hold down the shift key to select multiple pages to replace at once.
  • When you're ready, click the "Upload Pages" button.

How do I remove pages from a document I uploaded?

We recommend you retain a backup of your document before removing pages. To remove one or more pages:

  • Double-click the document to open it in the document workspace. In the sidebar, click "Remove Pages." You’ll see thumbnails of all the pages in your document.
  • Select the pages you’d like to delete from your document, and then click "Remove [X] Pages."
  • Note that once you remove pages, they are permanently deleted and your original document is replaced.

How do I redact portions of my document?

We recommend you retain a backup of your document before making redactions. To redact a portion of a document:

  • Double-click the document to open it in the document workspace. In the sidebar, click "Redact Document."
  • Click and drag to draw a black rectangle over each portion of the document you'd like to redact. (You can redact more than one section at a time.)
  • When finished, click "Save Redactions."

Does redacting a document also remove the text extracted from it?

When you redact a portion of a document, we erase all data related to the redacted information, create a new redacted document, and delete the original document. Any text that was part of the redacted portion is deleted.

Can I change the orientation of a page in a document?

Currently, we do not support this. We recommend that you change the orientation of the page before uploading your document.

How do I delete documents?

In the workspace, right-click a document and select "Delete Document." You’ll be asked to confirm your choice.

Once I delete a document, can I get it back?

No, once you delete a document it’s permanently deleted from our platform.

Analyzing Data in Documents

How can I see entities extracted from my documents?

Select one or more documents in the workspace. Under the "Analyze" menu, select "View Entities."

How can I see a timeline of the dates in a document?

Select one or more documents in the workspace. Under the "Analyze" menu, select "View Timeline." To zoom in on a time period, drag your cursor over a selection of dates.

Working with OCR and Document Text

What kind of OCR software does DocumentCloud use?

We use Tesseract, an open-source OCR engine. Google currently sponsors development.

Do you OCR every document I upload?

No. If your document contains embedded text, we save that in our database. We use OCR only when there’s no text layer.

Can I OCR a document even if it has text embedded in it?

Yes. Double-click the document to open it in the document workspace. In the sidebar, click "Reprocess Text." In the dialog, click "Force OCR."

How do I edit the text extracted from my document?

To edit text:

  • Double-click the document to open it in the document workspace. In the sidebar, click "Edit Page Text."
  • Navigate through pages in your document using the arrows.
  • Save your edits by clicking "Save Text."

How do I download all the text from a document?

Select the document in the workspace. Under the "Publish" menu, choose "Download Full Text." Your browser will open a separate window or tab at a URL that contains all the text. You can save it or copy and paste it into a text editor.

Annotating

What are notes?

In DocumentCloud, notes are a way to highlight important sections of documents with a short headline and explanatory text. Notes can either be private — viewable only by you — or public, meaning if you share your document with others in your organization or make it fully public, then readers will see your notes.

How do I add a note?

To add a note:

  • Double-click the document to open it in the document workspace. In the sidebar, click either "Add a Public Note" or "Add a Private Note."
  • Drag your cursor to draw a box over the area of the document you want to highlight.
  • When you release your cursor, you’ll see a dialog box that lets you add a short headline and some explanatory text.
  • When done, click "Save."

How do I edit an existing note?

Find the note on your document and select it. Click the pencil icon to the right of the headline to edit the note.

What is a draft note?

You can save a public note as "Draft" if you’re not ready to publish it yet. While it’s in draft mode, the note will not be visible to others and will show a "Draft" icon in its header. You can switch the note to public by selecting it and clicking "Publish."

What is the difference between public and private notes?

Public notes are visible to anyone who has access to the document. Private notes are viewable only by the person who uploaded the document.

Can I make a private note public or vice-versa?

Not at this time. If you need to change the status of a note from public to private, or vice-versa, edit the note, copy the contents, and delete it. Then draw a new public or private note and add the information you copied.

What is a page note, and how do I add one?

Instead of highlighting a portion of a document, you can create a note that appears at the top of a page. To do this, follow the directions for creating a note and, rather than drawing a box on a page, click in between any two pages (or above the first page).

How can I format the text of notes to make words bold, italic, etc.?

You can format text in notes by using some basic HTML codes. For example, to bold a phrase, precede it with <b> and end it with </b>.
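
If you prepare note text in a script before pasting it in, the bold example above could look like this. The helper name and the sample sentence are hypothetical, and the sketch covers only the <b> tag mentioned above.

```python
def bold(phrase: str) -> str:
    """Wrap a phrase in <b> and </b> so it displays in bold in a note."""
    return "<b>" + phrase + "</b>"


# Hypothetical note text with one bolded phrase.
note_text = "The memo was signed " + bold("two days before") + " the vote."
```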

How do I publish a note?

Any public notes you create are visible (i.e., published) as soon as you set the document’s access level to "public."

How do I link directly to a note from a website?

Each public note has a specific URL that you can share. To find it, select the note. Then click the chain-link icon to the right of the small headline. In your browser, the URL will change to the note link. Copy that and use it on your website. When a reader clicks the link, they’ll be directed to the document with the note open.

How do I embed a note on a website?

In the workspace, select a document that has notes. Under the "Publish" menu, choose "Embed a Note." That will launch a dialog box to lead you through creating an embed code. For more on publishing notes and other assets, please consult our Help documents.

Projects

What are projects?

Projects are labels you can apply to groups of documents to organize them by topic or project. A document can live in more than one project.

How do I create a project?

In the workspace at left, click the "New Project" button. Give your project a name and click "Save."

Can I create sub-projects inside a project?

Not at this time. However, if you are looking for a way to easily organize, filter or search your documents, we recommend you add custom data.

How do I add documents to a project?

You can drag and drop the file icon on the project title at the left of the workspace. Or highlight the file in the workspace, click the "Projects" icon, and choose the name of a project.

How do I remove documents from a project?

Select a document in the workspace. Click the "Projects" menu, which displays all your project titles. You’ll see a check mark next to each project the document belongs to. Find the project that you want to remove the document from, and click the title to remove the check mark.

How do I share a project with others?

In the workspace, hover over the name of your project in the project list, then click the pencil icon to show the project editing dialog. Click "Add a collaborator to this project." You can add email addresses of people who have DocumentCloud accounts — whether they belong to your organization or another.

Can I make all documents in a project public at once?

At this time, no. You can edit up to 30 documents at a time in the workspace by setting the view to thumbnails and selecting all documents with your cursor. If you have a large number of documents, please contact us at support@documentcloud.org to discuss other options.

Collaboration

How can I see documents others in my organization have uploaded?

At the top left of the workspace is a link to view all documents that users have either set to "public" access or "private to your organization." Clicking the arrow next to your organization name will reveal each user, and you can then click their name to see their documents.

Can I let someone see a private document without letting them make changes to it?

Yes, you can do this by granting them "reviewer" access. To do so:

  • Select one or more documents in the workspace. Under the "Analyze" menu, select "Share this document."
  • In the dialog box that opens, click "Add Reviewer" and provide an email address and name for the person.
  • Provide an optional message and click "Send."
  • They’ll receive an email with a unique link to access the document.

Embedding and Sharing Documents

What options do I have for embedding documents?

We currently offer four embed types:

  • Document: A viewer that shows the complete document, including attribution, all notes, and an available sidebar with navigation and attribution.
  • Page: A lightweight, responsive single page that includes attribution and click-through to the full document.
  • Note: A single annotation that includes attribution and click-through to the full document.
  • Document Set: A collection of documents, based on result of a search or a project. Includes a search widget and ability to click through to each document.

Please see our help section on Embedding and Publishing for details.

How do I embed a document, page, notes or collection of documents on my website?

DocumentCloud provides a wizard that walks you through the steps of generating codes for each type of embed. You then typically add the embed code to your website using your content management system’s embed function. Please see our help section on Embedding and Publishing for details.

How do I make a document visible to the public?

In the workspace, select one or more documents. Under the "Edit" menu, choose "Access Level." Pick one of the three options. "Public Access" means anyone on the internet can search for and view the document. "Private Access" means only you and people with explicit permission (via collaboration) have access. "Private to [Organization Name]" means only the people in your organization have access. (No freelancers.)
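
The three access levels can be summarized as a simple rule. This is a simplified reading of the descriptions above, not the platform's actual permission logic; the enum, function and parameter names are all hypothetical.

```python
from enum import Enum


class Access(Enum):
    PUBLIC = "public"
    PRIVATE = "private"
    ORGANIZATION = "organization"


def can_view(access: Access, is_owner: bool = False,
             has_explicit_permission: bool = False,
             in_same_org: bool = False, is_freelancer: bool = False) -> bool:
    """Apply the access rules described above to a single viewer."""
    if access is Access.PUBLIC:
        return True  # anyone on the internet
    if is_owner:
        return True  # you can always see your own documents
    if access is Access.PRIVATE:
        return has_explicit_permission  # granted via collaboration
    # Private to the organization: members only, no freelancers.
    return in_same_org and not is_freelancer
```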

How do I set a time for a document to become public?

If your document is private, you can set a publication date for it:

  • Select one or more documents in the workspace. Under the "Publish" menu, select "Set Publication Date."
  • Choose the date and time for publication and press "Save."

The date and time set will show in the workspace. If you change your mind, re-open the publication date dialog and choose "Cancel."

How do I get the URL of a document I want to share?

Make sure you have set the document’s access to "public." Double-click a document to open it in the document workspace. The URL for sharing the document is now in your browser’s URL bar. The format is https://www.documentcloud.org/documents/[ID Number]-[document-title-words].html
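
If you need to pull the ID out of such a URL in a script, a sketch like the following may help. It assumes the ID is made up of digits only, which the format above suggests but does not guarantee; the function name and the sample URL in the note below are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches https://www.documentcloud.org/documents/[ID Number]-[document-title-words].html
# Assumption: the ID number consists of digits only.
DOCUMENT_URL = re.compile(
    r"^https://www\.documentcloud\.org/documents/(\d+)-([^/]+)\.html$"
)


def parse_document_url(url: str) -> Optional[Tuple[int, str]]:
    """Return (id, title_words) for a document URL, or None if it does not match."""
    match = DOCUMENT_URL.match(url)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

For a made-up URL such as https://www.documentcloud.org/documents/1234567-sample-report.html, this returns (1234567, "sample-report").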

Will DocumentCloud work with my CMS?

DocumentCloud is used by hundreds of organizations worldwide with many different content management systems. If your CMS offers the ability to embed snippets of JavaScript and HTML, you should be fine. We’re available to talk with your CMS’s developers to iron out any questions. Please contact us at support@documentcloud.org.

Do embeds work on phones?

Yes, all of our embed types can be viewed on phone or tablet screens. However, our page and note embeds are responsive and are the best choice for display on those devices.

How do I embed documents using WordPress?

The best way is to install our custom WordPress plugin, which lets you embed by entering shortcodes into your text. See the documentation for details.

How do I change the width or height of an embed on my site?

You can set the width and height while creating it using the embed wizard.

Can readers make changes to documents I embed?

No, only you or DocumentCloud users in your organization can make changes to your documents.

Can I prevent people from downloading my original document?

If you embed a document with our full viewer, you can disable the link to the original PDF that appears in the sidebar by unchecking the "Link to the original PDF" box in the embed wizard. Nevertheless, once you set your document to public access, it will appear in Internet search results, and people will be able to download it.

How do I make the sidebar show or hide in the document viewer?

Check or uncheck the "Show the Sidebar" option in the embed wizard. When embedding the viewer at narrow widths, hiding the sidebar is usually a good idea.

DocumentCloud's API

What is the DocumentCloud API?

DocumentCloud's API provides resources to search, upload, edit, and organize documents as well as to work with projects. In addition, an oEmbed service provides easy integration of embedding documents, pages and notes. Full documentation is available.
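
As a rough illustration of how an oEmbed request is usually formed, the sketch below builds a request URL without sending it. The endpoint path is an assumption that this FAQ does not state, so confirm it in the API documentation; `url` and `maxwidth` are standard oEmbed request parameters, and the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: this endpoint path is not given in this FAQ; check the API documentation.
OEMBED_ENDPOINT = "https://www.documentcloud.org/api/oembed.json"


def oembed_request_url(resource_url: str, maxwidth: Optional[int] = None) -> str:
    """Build (but do not send) an oEmbed request URL for a document, page or note."""
    params = {"url": resource_url}
    if maxwidth is not None:
        params["maxwidth"] = str(maxwidth)
    return OEMBED_ENDPOINT + "?" + urlencode(params)
```

A CMS that supports oEmbed would fetch a URL like this and insert the HTML it gets back, so you rarely need to build the request by hand.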

Do you need a DocumentCloud account to use the API?

As with DocumentCloud’s workspace, you need an account to use the API to upload, update or delete documents, or create and modify projects. Other API functions, such as search, do not require an account. Consult the documentation for details.
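
Because search does not require an account, a search request needs no credentials. The sketch below only builds the request URL; the endpoint path and the `q` parameter name are assumptions not stated in this FAQ, so check them against the API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumption: the endpoint path and the "q" parameter are illustrative only;
# confirm both in the API documentation.
SEARCH_ENDPOINT = "https://www.documentcloud.org/api/search.json"


def search_url(query: str) -> str:
    """Build an unauthenticated search URL for the given query text."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": query})
```

Uploading, updating or deleting documents would additionally require your account credentials, as described above.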

What libraries are available for working with the API?

The open-source community has contributed several helpful libraries for interacting with DocumentCloud's API. These include Python, Ruby and Node.js wrappers. See the listing in the API documentation.

Are there limits on API use?

Yes. Please read our complete API Guidelines and Terms of Service.