In [1]:
import sys; sys.path.append(_dh[0].split("knowknow")[0])
from knowknow import *
In [2]:
showdocs("counter")

Counting coocurrences

Cultural phenomena are rich in meaning and context. Moreover, the meaning and context are what we care about, so stripping that would be a disservice. "Consider Geertz:"

Not only is the semantic structure of the figure a good deal more complex than it appears on the surface, but an analysis of that structure forces one into tracing a multiplicity of referential connections between it and social reality, so that the final picture is one of a configuration of dissimilar meanings out of whose interworking both the expressive power and the rhetorical force of the final symbol derive. (Geertz [1955] 1973, Chapter 8 Ideology as a Cultural System, p. 213)

The way people understanding their world shape their action, and understandings are heterogeneous in any community, woven into a complex web of interacting pieces and parts. Understandings are constantly evolving, shifting with every conversation or Breaking News. Any quantitative technique for studying meaning must be able to capture the relational structure of cultural objects, their temporal dynamics, or it cannot be meaning.

These considerations motivate how I have designed the data structure and code for this project. My attention to "cooccurrences" in what follows is an application of Levi Martin and Lee's (2018) formal approach to meaning. They develop the symbolic formalism I use below, as well as showing several general analytic strategies for inductive, ground-up meaning-making from count data. This approach is quite general, useful for many applications.

The process is rather simple, I count cooccurrences between various attributes. For each document, for each citation in that document, I increment a dozen counters, depending on attributes of the citation, paper, journal, or author. This counting process is done once, and can be used as a compressed form of the dataset for all further analyses. In the terminology of Levi Martin and Lee, I am constructing "hypergraphs", and I will use their notation in what follows. For example $[c*fy]$ indicates the dataset which maps from $(c, fy) \to count$. $c$ is the name of the cited work. $fy$ is the publication year of the article which made the citation. $count$ is the number of citations which are at the intersection of these properties.

  • $[c]$ the number of citations each document receives
  • $[c*fj]$ the number of citations each document receives from each journal's articles
  • $[c*fy]$ the number of citations each document receives from each year's articles
  • $[fj]$ the number of citations from each journal
  • $[fj*fy]$ the number of citations in each journal in each year
  • $[t]$ cited term total counts
  • $[fy*t]$ cited term time series
  • term cooccurrence with citation and journal ($[c*t]$ and [fj*t]$)
  • "author" counts, the number of citations by each author ($[a]$ $[a*c]$ $[a*j*y]$)
  • [c*c]$, the cooccurrence network between citations
  • the death of citations can be studied using the $[c*fy]$ hypergraph
  • $[c*fj*t]$ could be used for analyzing differential associations of $c$ to $t$ across publication venues
  • $[ta*ta]$, $[fa*fa]$, $[t*t]$ and $[c*c]$ open the door to network-scientific methods

References

  • Martin, John Levi, and Monica Lee. 2018. “A Formal Approach to Meaning.” Poetics 68(February):10–17.
  • Geertz, Clifford. 1973. The Interpretation of Cultures. New York: Basic Books, Inc.

README

First, you need to get some data. In accordance with JSTOR's usage policies, I do not provide any full-text data. And that's the data you need to use this notebook. You can obtain your own data by requesting full OCR data packages through JSTOR's Data for Research initiative.

Make sure to read carefully through "User Settings." Set the appropriate settings, and run the entire notebook.

This will create a new "database" of counts, which can be recalled by running my_counts = get_cnt( '<DB_NAME_HERE>' ).

User Settings

database_name is the name you choose for the final dataset of counts

zipdir is the directory which contains the .zip files JSTOR provides to you (not included)

mode choose between "basic" and "all" mode

  1. "basic" mode

    • this mode is not typically faster than everything, but it does reduce RAM overhead
      • on ~200k articles the running counters take up more than 16GB RAM
      • to counter this, I first run simple statistics, then rerun this notebook again, filtering based on the descriptive statistics
    • includes c counts, the number of citations each document receives
    • includes c.fj counts, the number of citations each document receives from each journal's articles
    • includes c.fy counts, the number of citations each document receives from each year's articles
    • includes fj counts, the number of citations from each journal
    • includes fj.fy counts, the number of citations in each journal in each year
    • includes t fy.t counts, for term time series and filtering
  2. "all" mode

    • you must run this if you want to run all analyses included in this project
    • includes all counts from basic mode
    • includes term cooccurrence with citation and journal (c.t fj.t)
    • includes "author" counts, the number of citations by each author (a a.c a.j.y)
    • includes c.c, the cooccurrence network between citations
In [3]:
database_name = 'sociology-jstor-basicall'
zipdir = 'G:/My Drive/projects/qualitative analysis of literature/pre 5-12-2020/003 process JSTOR output/RaW dAtA/'
mode = 'all'

I use citation and journal filters while counting. This filtering is important when working with large datasets. You can run the "trend summaries/cysum" on a basic database, and use the variable it automatically generates, "<DBNAME>.included_citations" to modify which citations to use when computing the all database.

In most cases, it's best to set use_included_citations_filter and use_included_journals_filter both to False the first time you run this notebook on a new dataset.

In [4]:
use_included_citations_filter = True
use_included_journals_filter = True

# not necessary if you're not filtering based on citations and journals pre-count
included_citations = load_variable("sociology-jstor.included_citations")
included_journals = ['Acta Sociologica', 'Administrative Science Quarterly', 'American Journal of Political Science', 'American Journal of Sociology', 'American Sociological Review', 'Annual Review of Sociology', 'BMS: Bulletin of Sociological Methodology / Bulletin de Méthodologie Sociologique', 'Berkeley Journal of Sociology', 'Contemporary Sociology', 'European Sociological Review', 'Hitotsubashi Journal of Social Studies', 'Humboldt Journal of Social Relations', 'International Journal of Sociology', 'International Journal of Sociology of the Family', 'International Review of Modern Sociology', 'Journal for the Scientific Study of Religion', 'Journal of Health and Social Behavior', 'Journal of Marriage and Family', 'Language in Society', 'Michigan Sociological Review', 'Polish Sociological Review', 'Review of Religious Research', 'Social Forces', 'Social Indicators Research', 'Social Problems', 'Social Psychology Quarterly', 'Sociological Bulletin', 'Sociological Focus', 'Sociological Forum', 'Sociological Methodology', 'Sociological Perspectives', 'Sociological Theory', 'Sociology', 'Sociology of Education', 'Sociology of Religion', 'Symbolic Interaction', 'The American Sociologist', 'The British Journal of Sociology', 'The Canadian Journal of Sociology', 'The Sociological Quarterly', 'Theory and Society']

Terms are iteratively pruned. After CONSOLIDATE_EVERY_N_CITS citations are counted, the algorithm will keep only the top NUM_TERMS_TO_KEEP terms, blacklisting the rest and not counting them anymore. This doesn't hurt the dataset, but dramatically reduces the RAM overhead and the size of the final dataset on disk.

In [5]:
CONSOLIDATE_TERMS = True

NUM_TERMS_TO_KEEP = 5000

CONSOLIDATE_EVERY_N_CITS = NUM_TERMS_TO_KEEP*3
#CONSOLIDATE_EVERY_N_CITS = 1000

NPERYEAR = 300

It's also convenient to be able to rename various entities. There were a few different names for the Canadian Journal of Sociology. If you want to filter on something other than journals, you'll have to modify the code and add this feature.

In [6]:
journal_map = {} # default
journal_map = {
    "Canadian Journal of Sociology / Cahiers canadiens de sociologie": 'The Canadian Journal of Sociology',
    "The Canadian Journal of Sociology / Cahiers canadiens de\n                sociologie": 'The Canadian Journal of Sociology',
    'The Canadian Journal of Sociology / Cahiers canadiens de sociologie': 'The Canadian Journal of Sociology'
}

imports

In [7]:
# utilities
from nltk import sent_tokenize
from zipfile import ZipFile

import os
import sys
sys.path.insert(0, os.path.abspath('./creating variables/'))

# library functions for cleaning and extracting in-text citations from OCR
from cnt_cooc_jstor_lib import (
    citation_iterator, getOuterParens, 
    Document, ParseError, 
    clean_metadata
)

# XML parser
from lxml.etree import _ElementTree as ElementTree
from lxml import etree
recovering_parser = etree.XMLParser(recover=True)
In [8]:
# getting ready for term counting
from nltk.corpus import stopwords as sw
stopwords = set(sw.words('english'))
In [9]:
zipfiles = list(Path(zipdir).glob("*.zip"))

helpers

The following helper function file_iterator iterates through all documents inside a list of zipfiles

Each iteration returns:

  1. the document DOI
  2. the metadata file contents
  3. the ocr file contents
In [10]:
def getname(x):
    x = x.split("/")[-1]
    x = re.sub(r'(\.xml|\.txt)','',x)
    return x

def file_iterator(zipfiles):
    from random import shuffle
    
    all_files = []
    for zf in zipfiles:
        archive = ZipFile(zf, 'r')
        files = archive.namelist()
        names = list(set(getname(x) for x in files))
        
        all_files += [(archive,name) for name in names]
        
    shuffle(all_files)
        
    for archive, name in all_files:
        try:
            yield(
                name.split("-")[-1].replace("_", "/"),
                archive.read("metadata/%s.xml" % name),
                archive.read("ocr/%s.txt" % name).decode('utf8')
            )
        except KeyError: # some very few articles don't have both
            continue

get_page_strings takes the string contents of an XML file produced by JSTOR. The XML file in question represents the text of a given article. This function cleans the text for OCR peculiarities, and splits the document into pages for further processing.

In [11]:
def basic_ocr_cleaning(x):
    # remove multiple spaces in a row
    x = re.sub(r" +", ' ', str(x))
    # remove hyphenations [NOTE this should be updated, with respect to header and footer across pages...]
    x = re.sub(r"([A-Za-z]+)-\s+([A-Za-z]+)", "\g<1>\g<2>", x)
    
    x = x.strip()
    return x

def get_content_string(ocr_string):
    docXml = etree.fromstring(ocr_string, parser=recovering_parser)
    pages = docXml.findall(".//page")

    page_strings = []
    for p in pages:
        if p.text is None:
            continue
        page_strings.append(p.text)

    secs = docXml.findall(".//sec")

    for s in secs:
        if s.text is None:
            continue
        if s.text.strip() == '':
            try_another = etree.tostring(s, encoding='utf8', method='text').decode("utf8").strip()
            #print(try_another)
            if try_another == '':
                continue

            page_strings.append(try_another)
        else:
            page_strings.append(s.text.strip())

    return basic_ocr_cleaning( "\n\n".join(page_strings) )

consolidate terms was built to eliminate all terms which are not in the top NUM_TERMS_TO_KEEP. This is done by sorting fromyear-term, or fy.t counts in descending order. The top entry here is the term-year pair which accumulated the most appearances in citation contexts. I take the top 1000 t's in this sorted list and preserve them, and blacklist the rest.

In [12]:
term_whitelist = set()

def consolidate_terms():
    global term_whitelist, CONSOLIDATION_CUTOFF
    

    have_now = set(cnt_doc['t'])
    # this is where the filtering occurs
    
    to_keep = set()
    if True:
        
        # takes terms based on the maximum number I can take...
        terms = list(cnt_doc['t'].keys())
        counts = np.array([cnt_doc['t'][k] for k in terms])
        argst = list(reversed(np.argsort(counts)))
        
        to_keep = [terms[i] for i in argst if '-' in terms[i][0]][:NUM_TERMS_TO_KEEP//2] # half should be 2-tuples
        to_keep += [terms[i] for i in argst if not '-' in terms[i][0]][:NUM_TERMS_TO_KEEP//2] # half should be 1-tuples
        
        to_remove = have_now.difference(to_keep)
        to_remove = set("-".join(x) for x in to_remove)
            
    
    if False:
        # takes the top 5000 terms in terms of yearly count
        sort_them = sorted(cnt_doc['fy.t'], key=lambda x: -cnt_doc['fy.t'][x])
        to_keep = defaultdict(set)
        
        i = 0
        while not len(to_keep) or (
            min(len(x) for x in to_keep.values()) < NPERYEAR and 
            i < len(sort_them)
        ):
            # adds the term to the year set, if it's not already "full"
            me = sort_them[i]
            me_fy, me_t = me
            
            # eventually, we don't count it :P
            if cnt_doc['t'][me_t] < CONSOLIDATION_CUTOFF:
                break
            
            if len(to_keep[me_fy]) < NPERYEAR:
                to_keep[me_fy].add(me_t) 
            i += 1
            
        if False: # useful for debugging
            print({
                k: len(v)
                for k,v in to_keep.items()
            })
            
        to_keep = set(chain.from_iterable(x for x in to_keep.values()))
        to_remove = have_now.difference(to_keep)
    
    
    # so that we never log counts for these again:
    term_whitelist.update([x[0] for x in to_keep])

    # the rest of the code is pruning all other term counts for this term in memory
    print("consolidating... removing", len(to_remove), 'e.g.', sample(to_remove,5))
    
    to_prune = ['t','fy.t','fj.t','c.t']
    for tp in to_prune:
        
        whichT = tp.split(".").index('t') # this checks where 't' is in the name of the variable (first or second?)

        print("pruning '%s'..." % tp)

        tydels = [x for x in cnt_doc[tp] if x[ whichT ] in to_remove]
            
        print("old size:", len(cnt_doc[tp]))
        for tr in tydels:
            del cnt_doc[tp][tr]
            del cnt_ind[tp][tr]
        print("new size:", len(cnt_doc[tp]))
        
    
    print("final terms: ", ", ".join( sample(list("-".join(list(x)) for x in cnt_doc['t']), 200) ))

Counting algorithm

The following cells contain the counting function, which accounts for a document in various ways. This function should be relatively simple to extend, if you want to count other combinations, or different attributes altogether.

In [13]:
cnt_ind = defaultdict(lambda:defaultdict(int))
track_doc = defaultdict(lambda:defaultdict(set))
cnt_doc = defaultdict(lambda:defaultdict(int))

def cnt(term, space, doc):
    # it's a set, yo
    track_doc[space][term].add(doc)
    # update cnt_doc
    cnt_doc[space][term] = len(track_doc[space][term])
    # update ind count
    cnt_ind[space][term] += 1
In [14]:
cits = 0
last_print = 0
citations_skipped = 0

def account_for(doc):
    global cits, last_print, mode, citations_skipped
    
    # consolidating "terms" counter as I go, to limit RAM overhead
    # I'm only interested in the most common 1000
    if CONSOLIDATE_TERMS and \
            not len(term_whitelist) and \
            cits - last_print > CONSOLIDATE_EVERY_N_CITS:
        print("Citation %s" % cits)
        print("Term %s" % len(cnt_doc['t']))
        #print(sample(list(cnt_doc['t']), 10))
        last_print = cits
        consolidate_terms()


    if 'citations' not in doc or not len(doc['citations']):
        #print("No citations", doc['doi'])
        return

    for c in doc['citations']:
        if 'contextPure' not in c:
            raise Exception("no contextPure...")



        for cited in c['citations']:
            
            if use_included_citations_filter and (cited not in included_citations):
                citations_skipped += 1
                continue
            
            cits += 1
            cnt(doc['year'], 'fy', doc['doi'])

            # citation
            cnt(cited, 'c', doc['doi'])

            # journal
            cnt(doc['journal'], 'fj', doc['doi'])

            # journal year
            cnt((doc['journal'], doc['year']), 'fj.fy', doc['doi'])

            # citation journal
            cnt((cited, doc['journal']), 'c.fj', doc['doi'])

            # citation year
            cnt((cited, doc['year']), 'c.fy', doc['doi'])

            
        # constructing the tuples set :)
        sp = c['contextPure'].lower()
        sp = re.sub("[^a-zA-Z\s]+", "", sp) # removing extraneous characters
        sp = re.sub("\s+", " ", sp) # removing extra characters
        sp = sp.strip()
        sp = sp.split() # splitting into words
        
        sp = [x for x in sp if x not in stopwords] # strip stopwords
        
        if False:
            tups = set(zip(sp[:-1], sp[1:])) # two-word tuples
        elif False:
            tups = set( (t1,t2) for t1 in sp for t2 in sp if t1!=t2 )# every two-word pair :)
        else:
            
            tups = set( "-".join(sorted(x)) for x in set(zip(sp[:-1], sp[1:]))) # two-word tuples
            tups.update( sp ) # one-word tuples
            
        #print(len(tups),c['contextPure'], "---", tups)
        
        if len(term_whitelist):
            tups = [x for x in tups if x in term_whitelist]

        # just term count, in case we are using the `basic` mode
        for t1 in tups:
            # term
            cnt((t1,), 't', doc['doi'])

            # term year
            cnt((doc['year'], t1), 'fy.t', doc['doi'])
            
        
        if mode == 'all':


            for cited in c['citations']:
                
                if use_included_citations_filter and (cited not in included_citations):
                    continue
                    
                # term features
                for t1 in tups:
                    
                    # cited work, tuple
                    cnt((cited, t1), 'c.t', doc['doi'])

                    # term journal
                    cnt((doc['journal'], t1), 'fj.t', doc['doi'])

                    if False: # eliminating data I'm not using

                        # author loop
                        for a in doc['authors']:
                            # term author
                            cnt((a, t1), 'fa.t', doc['doi'])
                            
                    if len(term_whitelist): # really don't want to do this too early. wait until it's narrowed down to the 5k
                        # term term...
                        for t2 in tups:
                            # if they intersect each other, continue...
                            if len(set(t1).intersection(set(t2))) >= min(len(t1),len(t2)):
                                continue

                            # term term
                            cnt((t1,t2), 't.t', doc['doi'])

                # author loop
                for a in doc['authors']:
                    # citation author
                    cnt((cited,a), 'c.fa', doc['doi'])

                    # year author journal
                    cnt((a, doc['journal'], doc['year']), 'fa.fj.fy', doc['doi'])

                    # author
                    cnt((a,), 'fa', doc['doi'])

                # add to counters for citation-citation counts
                for cited1 in c['citations']:
                    for cited2 in c['citations']:
                        if cited1 >= cited2:
                            continue

                        cnt(( cited1, cited2 ), 'c.c', doc['doi'])
                        cnt(( cited1, cited2, doc['year'] ), 'c.c.fy', doc['doi'])

Master counting cell

This cell is long-running

In [15]:
seen = set()

skipped = 0

total_count = Counter()
doc_count = Counter()
pair_count = Counter()

debug = False

for i, (doi, metadata_str, ocr_str) in enumerate( file_iterator(zipfiles) ):

    if i % 1000 == 0:
        print("Document", i, "...", 
              len(cnt_doc['fj'].keys()), "journals...", 
              len(cnt_doc['c'].keys()), "cited works...", 
              len(cnt_doc['fa'].keys()), "authors...",
              len(cnt_doc['t'].keys()), "terms used...",
              citations_skipped, "skipped citations...",
              cnt_doc['t'][('social',)], "'social' terms"
             )

    try:
        drep = clean_metadata( doi, metadata_str )
        
        # sometimes multiple journal names map onto the same journal, for all intents and purposes
        if drep['journal'] in journal_map:
            drep['journal'] = journal_map[drep['journal']]
        
        # only include journals in the list "included_journals"
        if use_included_journals_filter and (drep['journal'] not in included_journals):
            continue
        
        if debug: print("got meta")

        if drep['type'] != 'research-article':
            continue
            
        # some types of titles should be immediately ignored
        def title_looks_researchy(lt):
            lt = lt.lower()
            lt = lt.strip()

            for x in ["book review", 'review essay', 'back matter', 'front matter', 'notes for contributors', 'publication received', 'errata:', 'erratum:']:
                if x in lt:
                    return False

            for x in ["commentary and debate", 'erratum', '']:
                if x == lt:
                    return False

            return True

        lt = drep['title'].lower()
        if not title_looks_researchy(lt):
            continue

        # Don't process the document if there are no authors
        if not len(drep['authors']):
            continue

        drep['content'] = get_content_string(ocr_str)
        
        drep['citations'] = []
        
        # loop through the matching parentheses in the document
        for index, (parenStart, parenContents) in enumerate(getOuterParens(drep['content'])):
            
            citations = list(citation_iterator(parenContents))
            if not len(citations):
                continue

                
            citation = {
                "citations": citations,
                "contextLeft": drep['content'][parenStart-400+1:parenStart+1],
                "contextRight": drep['content'][parenStart + len(parenContents) + 1:parenStart + len(parenContents) + 1 + 100],
                "where": parenStart
            }


            # cut off any stuff before the first space
            first_break_left = re.search(r"[\s\.!\?]+", citation['contextLeft'])
            if first_break_left is not None:
                clean_start_left = citation['contextLeft'][first_break_left.end():]
            else:
                clean_start_left = citation['contextLeft']

            # cut off any stuff after the last space
            last_break_right = list(re.finditer(r"[\s\.!\?]+", citation['contextRight']))
            if len(last_break_right):
                clean_end_right = citation['contextRight'][:last_break_right[-1].start()]
            else:
                clean_end_right = citation['contextRight']

            # we don't want anything more than a sentence
            
            sentence_left = sent_tokenize(clean_start_left)
            if len(sentence_left):
                sentence_left = sentence_left[-1]
            else:
                sentence_left = ""

            sentence_right = sent_tokenize(clean_end_right)[0]
            if len(sentence_right):
                sentence_right = sentence_right[0]
            else:
                sentence_right = ""

            # finally, strip the parentheses from the string
            sentence_left = sentence_left[:-1]
            sentence_right = sentence_right[1:]

            # add the thing in context
            full = sentence_left + "<CITATION>" + sentence_right

            citation['contextPure'] = sentence_left
            #print(full)

            drep['citations'].append(citation)
            
            
            
            
            
        # now that we have all the information we need,
        # we simply need to "count" this document in a few different ways
        account_for(drep)


    except ParseError as e:
        print("parse error...", e.args, doi)
Document 0 ... 0 journals... 0 cited works... 0 authors... 0 terms used... 0 skipped citations... 0 'social' terms
Document 1000 ... 37 journals... 4005 cited works... 383 authors... 64272 terms used... 2088 skipped citations... 132 'social' terms
Document 2000 ... 40 journals... 8208 cited works... 754 authors... 127737 terms used... 4635 skipped citations... 274 'social' terms
Citation 15075
Term 138929
consolidating... removing 133929 e.g. ['benefit-value', 'conditions-members', 'linked-violence', 'elaborate-independent', 'decoupled']
pruning 't'...
old size: 138929
new size: 5000
pruning 'fy.t'...
old size: 210650
new size: 56089
pruning 'fj.t'...
old size: 159555
new size: 39169
pruning 'c.t'...
old size: 371000
new size: 157187
final terms:  courts, measures-two, normal, however, number-total, intergenerational-mobility, perspectives-sociological, acceptable, minimum, conclude, actually, corporate, cases-many, data-present, time-work, political-religious, phenomenon, unequal, simply, require, research-scientific, analysis-comprehensive, logistic-regression, made-study, giving, associated-positively, worker, accordingly, actor, bureaucratic, attitudes, conducted-research, also-suggests, family-may, focuses-research, research-supported, heterogeneous, distribution-unequal, economic-sociology, family-values, cognitive-development, informal, account-fact, development-economic, differential-treatment, american, committee, emphasized-importance, theory, female, foreign, attributed, facts, incidence, interaction, degree, general-health, controversial, democracy, differences-racial, constraints, administration-reagan, independent-variables, dominant-two, conducted, also-scholars, determinants, rest, attendance-religious, sample-women, arises, position, large-number, extent, recommended, institutions-social, suggest, tend-women, children-less, centers, marijuana-use, desirability-effects, must, distance-social, belief-system, shaping, journal, women-working, understanding, central-one, educational-stratification, score, drugs, divorce-found, associated-significantly, based-studies, field, average, analysis-based, agents, historical-particular, case-may, federal-government, participation-rates, know-people, cultural-resources, inner, middle, framework-theoretical, chronic, war, due-women, devoted, importance-increase, initial, instance, play, circumstances, improve, study-used, historical-periods, differences-significant, showed, explicitly, high-levels, psychology-quarterly, intrinsic, distinctions-within, integration, description, position-privileged, black-men, analysis-class, black-communities, citizens, families-poor, american-families, studies-used, levels-social, communities-ethnic, ajs-volume, connected, found, structures, measure-used, men-often, mixed, income-increases, enforcement, assume-likely, met, housing, needs, analyses-sociological, initiatives-policy, strong, percent-year, class-upper, life-social, continued, aspect-one, school, social-support, remained, international-journal, family-income, typically-women, sentiment, helping, still-women, along-lines, contextual, task, whose, recent-study, different-types, likely-men, one-way, experimental-studies, know, positive, reducing, comes, data-sets, eg, article-present, commonly-used, across-vary, economic-growth, resource-theory, fashion, theoretically, differences-substantial, relative, homogeneity, adults-among, measure-scale, different-social, contributes, national-survey
Document 3000 ... 40 journals... 11991 cited works... 1151 authors... 5000 terms used... 7259 skipped citations... 442 'social' terms
Document 4000 ... 40 journals... 15778 cited works... 1537 authors... 5000 terms used... 9700 skipped citations... 600 'social' terms
Document 5000 ... 41 journals... 19684 cited works... 1919 authors... 5000 terms used... 12576 skipped citations... 779 'social' terms
Document 6000 ... 41 journals... 22992 cited works... 2298 authors... 5000 terms used... 15524 skipped citations... 937 'social' terms
Document 7000 ... 41 journals... 25927 cited works... 2655 authors... 5000 terms used... 18045 skipped citations... 1093 'social' terms
Document 8000 ... 41 journals... 28604 cited works... 2979 authors... 5000 terms used... 20425 skipped citations... 1227 'social' terms
Document 9000 ... 41 journals... 31243 cited works... 3319 authors... 5000 terms used... 23114 skipped citations... 1374 'social' terms
Document 10000 ... 41 journals... 33736 cited works... 3627 authors... 5000 terms used... 25640 skipped citations... 1516 'social' terms
Document 11000 ... 41 journals... 36382 cited works... 3986 authors... 5000 terms used... 28119 skipped citations... 1672 'social' terms
Document 12000 ... 41 journals... 38859 cited works... 4345 authors... 5000 terms used... 30865 skipped citations... 1822 'social' terms
Document 13000 ... 41 journals... 41130 cited works... 4651 authors... 5000 terms used... 34004 skipped citations... 1984 'social' terms
Document 14000 ... 41 journals... 43336 cited works... 4961 authors... 5000 terms used... 36566 skipped citations... 2143 'social' terms
Document 15000 ... 41 journals... 45372 cited works... 5255 authors... 5000 terms used... 38991 skipped citations... 2288 'social' terms
Document 16000 ... 41 journals... 47351 cited works... 5561 authors... 5000 terms used... 41301 skipped citations... 2438 'social' terms
Document 17000 ... 41 journals... 49427 cited works... 5836 authors... 5000 terms used... 43792 skipped citations... 2585 'social' terms
Document 18000 ... 41 journals... 51182 cited works... 6131 authors... 5000 terms used... 46017 skipped citations... 2721 'social' terms
Document 19000 ... 41 journals... 52939 cited works... 6429 authors... 5000 terms used... 48286 skipped citations... 2855 'social' terms
Document 20000 ... 41 journals... 54599 cited works... 6696 authors... 5000 terms used... 50541 skipped citations... 2995 'social' terms
Document 21000 ... 41 journals... 56646 cited works... 7011 authors... 5000 terms used... 53548 skipped citations... 3154 'social' terms
Document 22000 ... 41 journals... 58289 cited works... 7266 authors... 5000 terms used... 55630 skipped citations... 3299 'social' terms
Document 23000 ... 41 journals... 59973 cited works... 7520 authors... 5000 terms used... 57997 skipped citations... 3448 'social' terms
Document 24000 ... 41 journals... 61676 cited works... 7792 authors... 5000 terms used... 60617 skipped citations... 3598 'social' terms
Document 25000 ... 41 journals... 63274 cited works... 8033 authors... 5000 terms used... 62852 skipped citations... 3730 'social' terms
Document 26000 ... 41 journals... 65019 cited works... 8343 authors... 5000 terms used... 65762 skipped citations... 3890 'social' terms
Document 27000 ... 41 journals... 66667 cited works... 8595 authors... 5000 terms used... 68141 skipped citations... 4035 'social' terms
Document 28000 ... 41 journals... 68079 cited works... 8828 authors... 5000 terms used... 70554 skipped citations... 4169 'social' terms
Document 29000 ... 41 journals... 69512 cited works... 9101 authors... 5000 terms used... 73232 skipped citations... 4306 'social' terms
Document 30000 ... 41 journals... 70854 cited works... 9337 authors... 5000 terms used... 75424 skipped citations... 4443 'social' terms
Document 31000 ... 41 journals... 72364 cited works... 9619 authors... 5000 terms used... 78213 skipped citations... 4590 'social' terms
Document 32000 ... 41 journals... 73742 cited works... 9886 authors... 5000 terms used... 80820 skipped citations... 4736 'social' terms
Document 33000 ... 41 journals... 75028 cited works... 10119 authors... 5000 terms used... 83233 skipped citations... 4874 'social' terms
Document 34000 ... 41 journals... 76377 cited works... 10398 authors... 5000 terms used... 85754 skipped citations... 5027 'social' terms
Document 35000 ... 41 journals... 77529 cited works... 10611 authors... 5000 terms used... 87899 skipped citations... 5145 'social' terms
Document 36000 ... 41 journals... 78559 cited works... 10853 authors... 5000 terms used... 89937 skipped citations... 5277 'social' terms
Document 37000 ... 41 journals... 79729 cited works... 11088 authors... 5000 terms used... 92671 skipped citations... 5420 'social' terms
Document 38000 ... 41 journals... 81025 cited works... 11332 authors... 5000 terms used... 94951 skipped citations... 5555 'social' terms
Document 39000 ... 41 journals... 82270 cited works... 11583 authors... 5000 terms used... 97218 skipped citations... 5700 'social' terms
Document 40000 ... 41 journals... 83494 cited works... 11859 authors... 5000 terms used... 99765 skipped citations... 5849 'social' terms
Document 41000 ... 41 journals... 84881 cited works... 12111 authors... 5000 terms used... 102869 skipped citations... 6006 'social' terms
Document 42000 ... 41 journals... 86147 cited works... 12379 authors... 5000 terms used... 106297 skipped citations... 6156 'social' terms
Document 43000 ... 41 journals... 87056 cited works... 12564 authors... 5000 terms used... 108547 skipped citations... 6280 'social' terms
Document 44000 ... 41 journals... 88190 cited works... 12781 authors... 5000 terms used... 110968 skipped citations... 6424 'social' terms
Document 45000 ... 41 journals... 89451 cited works... 13012 authors... 5000 terms used... 114036 skipped citations... 6559 'social' terms
Document 46000 ... 41 journals... 90758 cited works... 13280 authors... 5000 terms used... 117033 skipped citations... 6723 'social' terms
Document 47000 ... 41 journals... 91876 cited works... 13525 authors... 5000 terms used... 119841 skipped citations... 6876 'social' terms
Document 48000 ... 41 journals... 92875 cited works... 13745 authors... 5000 terms used... 122360 skipped citations... 7011 'social' terms
Document 49000 ... 41 journals... 93981 cited works... 13972 authors... 5000 terms used... 125203 skipped citations... 7153 'social' terms
Document 50000 ... 41 journals... 94810 cited works... 14204 authors... 5000 terms used... 127185 skipped citations... 7277 'social' terms
Document 51000 ... 41 journals... 95651 cited works... 14416 authors... 5000 terms used... 129508 skipped citations... 7422 'social' terms
Document 52000 ... 41 journals... 96539 cited works... 14646 authors... 5000 terms used... 131721 skipped citations... 7554 'social' terms
Document 53000 ... 41 journals... 97560 cited works... 14884 authors... 5000 terms used... 134190 skipped citations... 7717 'social' terms
Document 54000 ... 41 journals... 98395 cited works... 15069 authors... 5000 terms used... 136473 skipped citations... 7850 'social' terms
Document 55000 ... 41 journals... 99293 cited works... 15294 authors... 5000 terms used... 139052 skipped citations... 7990 'social' terms
Document 56000 ... 41 journals... 100179 cited works... 15516 authors... 5000 terms used... 141950 skipped citations... 8136 'social' terms
Document 57000 ... 41 journals... 101199 cited works... 15768 authors... 5000 terms used... 144235 skipped citations... 8291 'social' terms
Document 58000 ... 41 journals... 102132 cited works... 16002 authors... 5000 terms used... 147013 skipped citations... 8435 'social' terms
Document 59000 ... 41 journals... 102984 cited works... 16207 authors... 5000 terms used... 149535 skipped citations... 8573 'social' terms
Document 60000 ... 41 journals... 103808 cited works... 16412 authors... 5000 terms used... 152055 skipped citations... 8720 'social' terms
Document 61000 ... 41 journals... 104674 cited works... 16608 authors... 5000 terms used... 154791 skipped citations... 8861 'social' terms
Document 62000 ... 41 journals... 105515 cited works... 16804 authors... 5000 terms used... 157672 skipped citations... 9018 'social' terms
Document 63000 ... 41 journals... 106187 cited works... 17001 authors... 5000 terms used... 159960 skipped citations... 9143 'social' terms
Document 64000 ... 41 journals... 106826 cited works... 17180 authors... 5000 terms used... 161776 skipped citations... 9263 'social' terms
Document 65000 ... 41 journals... 107523 cited works... 17390 authors... 5000 terms used... 163889 skipped citations... 9406 'social' terms
Document 66000 ... 41 journals... 108271 cited works... 17575 authors... 5000 terms used... 166523 skipped citations... 9527 'social' terms
Document 67000 ... 41 journals... 109007 cited works... 17788 authors... 5000 terms used... 169249 skipped citations... 9672 'social' terms
Document 68000 ... 41 journals... 109690 cited works... 17991 authors... 5000 terms used... 171774 skipped citations... 9820 'social' terms
Document 69000 ... 41 journals... 110410 cited works... 18211 authors... 5000 terms used... 173910 skipped citations... 9984 'social' terms
Document 70000 ... 41 journals... 111109 cited works... 18409 authors... 5000 terms used... 176795 skipped citations... 10132 'social' terms
Document 71000 ... 41 journals... 111722 cited works... 18618 authors... 5000 terms used... 179339 skipped citations... 10274 'social' terms
Document 72000 ... 41 journals... 112259 cited works... 18783 authors... 5000 terms used... 181091 skipped citations... 10401 'social' terms
Document 73000 ... 41 journals... 112892 cited works... 18975 authors... 5000 terms used... 183587 skipped citations... 10523 'social' terms
Document 74000 ... 41 journals... 113614 cited works... 19214 authors... 5000 terms used... 186346 skipped citations... 10666 'social' terms
Document 75000 ... 41 journals... 114303 cited works... 19426 authors... 5000 terms used... 188913 skipped citations... 10808 'social' terms
Document 76000 ... 41 journals... 114999 cited works... 19626 authors... 5000 terms used... 191607 skipped citations... 10947 'social' terms
Document 77000 ... 41 journals... 115597 cited works... 19810 authors... 5000 terms used... 194094 skipped citations... 11097 'social' terms
Document 78000 ... 41 journals... 116286 cited works... 20025 authors... 5000 terms used... 197306 skipped citations... 11249 'social' terms
Document 79000 ... 41 journals... 116819 cited works... 20193 authors... 5000 terms used... 199704 skipped citations... 11390 'social' terms
Document 80000 ... 41 journals... 117288 cited works... 20357 authors... 5000 terms used... 201988 skipped citations... 11517 'social' terms
Document 81000 ... 41 journals... 117888 cited works... 20532 authors... 5000 terms used... 205009 skipped citations... 11657 'social' terms
Document 82000 ... 41 journals... 118435 cited works... 20702 authors... 5000 terms used... 207538 skipped citations... 11787 'social' terms
Document 83000 ... 41 journals... 118932 cited works... 20918 authors... 5000 terms used... 209998 skipped citations... 11926 'social' terms
Document 84000 ... 41 journals... 119424 cited works... 21115 authors... 5000 terms used... 212325 skipped citations... 12060 'social' terms
Document 85000 ... 41 journals... 120016 cited works... 21325 authors... 5000 terms used... 215005 skipped citations... 12207 'social' terms
Document 86000 ... 41 journals... 120550 cited works... 21490 authors... 5000 terms used... 217360 skipped citations... 12340 'social' terms
Document 87000 ... 41 journals... 120991 cited works... 21667 authors... 5000 terms used... 219758 skipped citations... 12473 'social' terms
Document 88000 ... 41 journals... 121507 cited works... 21880 authors... 5000 terms used... 222288 skipped citations... 12606 'social' terms
Document 89000 ... 41 journals... 122055 cited works... 22074 authors... 5000 terms used... 224987 skipped citations... 12755 'social' terms
Document 90000 ... 41 journals... 122511 cited works... 22260 authors... 5000 terms used... 227579 skipped citations... 12889 'social' terms
Document 91000 ... 41 journals... 123004 cited works... 22481 authors... 5000 terms used... 229932 skipped citations... 13026 'social' terms
Document 92000 ... 41 journals... 123362 cited works... 22633 authors... 5000 terms used... 231941 skipped citations... 13152 'social' terms
Document 93000 ... 41 journals... 123887 cited works... 22865 authors... 5000 terms used... 234895 skipped citations... 13301 'social' terms
parse error... ('No valid year found',) 10.2307/26650770
Document 94000 ... 41 journals... 124355 cited works... 23061 authors... 5000 terms used... 237218 skipped citations... 13435 'social' terms
Document 95000 ... 41 journals... 124804 cited works... 23232 authors... 5000 terms used... 239781 skipped citations... 13574 'social' terms
parse error... ('No valid year found',) 10.2307/26650789
Document 96000 ... 41 journals... 125264 cited works... 23424 authors... 5000 terms used... 242351 skipped citations... 13700 'social' terms
Document 97000 ... 41 journals... 125706 cited works... 23659 authors... 5000 terms used... 244903 skipped citations... 13841 'social' terms
Document 98000 ... 41 journals... 126187 cited works... 23821 authors... 5000 terms used... 247705 skipped citations... 13990 'social' terms
Document 99000 ... 41 journals... 126643 cited works... 24036 authors... 5000 terms used... 250261 skipped citations... 14130 'social' terms
Document 100000 ... 41 journals... 127069 cited works... 24215 authors... 5000 terms used... 253176 skipped citations... 14267 'social' terms
Document 101000 ... 41 journals... 127495 cited works... 24381 authors... 5000 terms used... 255609 skipped citations... 14414 'social' terms
Document 102000 ... 41 journals... 127856 cited works... 24538 authors... 5000 terms used... 257941 skipped citations... 14553 'social' terms
Document 103000 ... 41 journals... 128198 cited works... 24703 authors... 5000 terms used... 260312 skipped citations... 14689 'social' terms
Document 104000 ... 41 journals... 128520 cited works... 24866 authors... 5000 terms used... 262095 skipped citations... 14816 'social' terms
Document 105000 ... 41 journals... 128864 cited works... 25047 authors... 5000 terms used... 264760 skipped citations... 14958 'social' terms
Document 106000 ... 41 journals... 129233 cited works... 25216 authors... 5000 terms used... 267541 skipped citations... 15079 'social' terms
Document 107000 ... 41 journals... 129680 cited works... 25406 authors... 5000 terms used... 270347 skipped citations... 15226 'social' terms
Document 108000 ... 41 journals... 130059 cited works... 25582 authors... 5000 terms used... 273150 skipped citations... 15358 'social' terms
Document 109000 ... 41 journals... 130347 cited works... 25759 authors... 5000 terms used... 275391 skipped citations... 15493 'social' terms
Document 110000 ... 41 journals... 130730 cited works... 25939 authors... 5000 terms used... 278041 skipped citations... 15629 'social' terms
Document 111000 ... 41 journals... 131028 cited works... 26115 authors... 5000 terms used... 280534 skipped citations... 15763 'social' terms
Document 112000 ... 41 journals... 131311 cited works... 26303 authors... 5000 terms used... 282657 skipped citations... 15900 'social' terms
Document 113000 ... 41 journals... 131632 cited works... 26476 authors... 5000 terms used... 284847 skipped citations... 16010 'social' terms
Document 114000 ... 41 journals... 131918 cited works... 26654 authors... 5000 terms used... 287296 skipped citations... 16139 'social' terms
Document 115000 ... 41 journals... 132270 cited works... 26853 authors... 5000 terms used... 290082 skipped citations... 16293 'social' terms
Document 116000 ... 41 journals... 132478 cited works... 27012 authors... 5000 terms used... 291998 skipped citations... 16407 'social' terms
Document 117000 ... 41 journals... 132806 cited works... 27163 authors... 5000 terms used... 294556 skipped citations... 16552 'social' terms
Document 118000 ... 41 journals... 133123 cited works... 27372 authors... 5000 terms used... 297203 skipped citations... 16691 'social' terms
Document 119000 ... 41 journals... 133467 cited works... 27575 authors... 5000 terms used... 300392 skipped citations... 16843 'social' terms
Document 120000 ... 41 journals... 133692 cited works... 27732 authors... 5000 terms used... 302385 skipped citations... 16969 'social' terms
Document 121000 ... 41 journals... 133927 cited works... 27876 authors... 5000 terms used... 304663 skipped citations... 17095 'social' terms
Document 122000 ... 41 journals... 134189 cited works... 28055 authors... 5000 terms used... 306883 skipped citations... 17233 'social' terms
Document 123000 ... 41 journals... 134445 cited works... 28212 authors... 5000 terms used... 309358 skipped citations... 17372 'social' terms
Document 124000 ... 41 journals... 134677 cited works... 28365 authors... 5000 terms used... 311697 skipped citations... 17488 'social' terms
Document 125000 ... 41 journals... 134946 cited works... 28510 authors... 5000 terms used... 314237 skipped citations... 17618 'social' terms
Document 126000 ... 41 journals... 135200 cited works... 28685 authors... 5000 terms used... 317594 skipped citations... 17756 'social' terms
Document 127000 ... 41 journals... 135423 cited works... 28839 authors... 5000 terms used... 319987 skipped citations... 17890 'social' terms
Document 128000 ... 41 journals... 135651 cited works... 28989 authors... 5000 terms used... 322730 skipped citations... 18026 'social' terms
Document 129000 ... 41 journals... 135835 cited works... 29118 authors... 5000 terms used... 324492 skipped citations... 18142 'social' terms
Document 130000 ... 41 journals... 136077 cited works... 29272 authors... 5000 terms used... 327036 skipped citations... 18286 'social' terms
Document 131000 ... 41 journals... 136285 cited works... 29418 authors... 5000 terms used... 329392 skipped citations... 18413 'social' terms
Document 132000 ... 41 journals... 136475 cited works... 29569 authors... 5000 terms used... 331660 skipped citations... 18526 'social' terms
Document 133000 ... 41 journals... 136694 cited works... 29723 authors... 5000 terms used... 333942 skipped citations... 18672 'social' terms
Document 134000 ... 41 journals... 136881 cited works... 29881 authors... 5000 terms used... 336862 skipped citations... 18815 'social' terms
Document 135000 ... 41 journals... 137037 cited works... 30024 authors... 5000 terms used... 338877 skipped citations... 18943 'social' terms
Document 136000 ... 41 journals... 137229 cited works... 30171 authors... 5000 terms used... 341314 skipped citations... 19075 'social' terms
Document 137000 ... 41 journals... 137436 cited works... 30338 authors... 5000 terms used... 344466 skipped citations... 19206 'social' terms
Document 138000 ... 41 journals... 137655 cited works... 30505 authors... 5000 terms used... 347232 skipped citations... 19334 'social' terms
Document 139000 ... 41 journals... 137864 cited works... 30651 authors... 5000 terms used... 350515 skipped citations... 19465 'social' terms
Document 140000 ... 41 journals... 138027 cited works... 30803 authors... 5000 terms used... 352754 skipped citations... 19591 'social' terms
Document 141000 ... 41 journals... 138186 cited works... 30951 authors... 5000 terms used... 355264 skipped citations... 19719 'social' terms
Document 142000 ... 41 journals... 138370 cited works... 31135 authors... 5000 terms used... 357988 skipped citations... 19851 'social' terms
Document 143000 ... 41 journals... 138511 cited works... 31309 authors... 5000 terms used... 360623 skipped citations... 19971 'social' terms
Document 144000 ... 41 journals... 138650 cited works... 31465 authors... 5000 terms used... 363604 skipped citations... 20110 'social' terms
Document 145000 ... 41 journals... 138810 cited works... 31624 authors... 5000 terms used... 366257 skipped citations... 20243 'social' terms
Document 146000 ... 41 journals... 138963 cited works... 31776 authors... 5000 terms used... 368969 skipped citations... 20380 'social' terms
Document 147000 ... 41 journals... 139089 cited works... 31899 authors... 5000 terms used... 371263 skipped citations... 20501 'social' terms
Document 148000 ... 41 journals... 139221 cited works... 32054 authors... 5000 terms used... 373344 skipped citations... 20619 'social' terms
Document 149000 ... 41 journals... 139325 cited works... 32181 authors... 5000 terms used... 375424 skipped citations... 20729 'social' terms
Document 150000 ... 41 journals... 139461 cited works... 32345 authors... 5000 terms used... 378195 skipped citations... 20874 'social' terms
Document 151000 ... 41 journals... 139580 cited works... 32478 authors... 5000 terms used... 380364 skipped citations... 20991 'social' terms
Document 152000 ... 41 journals... 139687 cited works... 32623 authors... 5000 terms used... 383104 skipped citations... 21113 'social' terms
Document 153000 ... 41 journals... 139823 cited works... 32766 authors... 5000 terms used... 385502 skipped citations... 21242 'social' terms
Document 154000 ... 41 journals... 139928 cited works... 32903 authors... 5000 terms used... 387538 skipped citations... 21360 'social' terms
Document 155000 ... 41 journals... 140037 cited works... 33054 authors... 5000 terms used... 390456 skipped citations... 21484 'social' terms
Document 156000 ... 41 journals... 140111 cited works... 33181 authors... 5000 terms used... 392655 skipped citations... 21604 'social' terms
Document 157000 ... 41 journals... 140206 cited works... 33359 authors... 5000 terms used... 395402 skipped citations... 21765 'social' terms
Document 158000 ... 41 journals... 140295 cited works... 33509 authors... 5000 terms used... 398069 skipped citations... 21892 'social' terms
Document 159000 ... 41 journals... 140414 cited works... 33686 authors... 5000 terms used... 400623 skipped citations... 22025 'social' terms
Document 160000 ... 41 journals... 140497 cited works... 33832 authors... 5000 terms used... 402992 skipped citations... 22164 'social' terms
Document 161000 ... 41 journals... 140581 cited works... 33958 authors... 5000 terms used... 405359 skipped citations... 22294 'social' terms
Document 162000 ... 41 journals... 140669 cited works... 34092 authors... 5000 terms used... 407669 skipped citations... 22419 'social' terms
Document 163000 ... 41 journals... 140740 cited works... 34235 authors... 5000 terms used... 410276 skipped citations... 22535 'social' terms
Document 164000 ... 41 journals... 140820 cited works... 34398 authors... 5000 terms used... 412781 skipped citations... 22659 'social' terms
Document 165000 ... 41 journals... 140889 cited works... 34551 authors... 5000 terms used... 414935 skipped citations... 22798 'social' terms
Document 166000 ... 41 journals... 140973 cited works... 34696 authors... 5000 terms used... 418345 skipped citations... 22938 'social' terms
Document 167000 ... 41 journals... 141048 cited works... 34851 authors... 5000 terms used... 421095 skipped citations... 23064 'social' terms
Document 168000 ... 41 journals... 141122 cited works... 35002 authors... 5000 terms used... 423559 skipped citations... 23201 'social' terms
Document 169000 ... 41 journals... 141184 cited works... 35145 authors... 5000 terms used... 426185 skipped citations... 23333 'social' terms
Document 170000 ... 41 journals... 141229 cited works... 35275 authors... 5000 terms used... 428642 skipped citations... 23452 'social' terms
Document 171000 ... 41 journals... 141290 cited works... 35417 authors... 5000 terms used... 431166 skipped citations... 23581 'social' terms
Document 172000 ... 41 journals... 141329 cited works... 35571 authors... 5000 terms used... 433598 skipped citations... 23697 'social' terms
Document 173000 ... 41 journals... 141388 cited works... 35723 authors... 5000 terms used... 436661 skipped citations... 23839 'social' terms
Document 174000 ... 41 journals... 141422 cited works... 35868 authors... 5000 terms used... 438694 skipped citations... 23950 'social' terms
Document 175000 ... 41 journals... 141449 cited works... 36001 authors... 5000 terms used... 441039 skipped citations... 24087 'social' terms
Document 176000 ... 41 journals... 141488 cited works... 36120 authors... 5000 terms used... 443983 skipped citations... 24214 'social' terms
Document 177000 ... 41 journals... 141507 cited works... 36289 authors... 5000 terms used... 446523 skipped citations... 24345 'social' terms
Document 178000 ... 41 journals... 141528 cited works... 36420 authors... 5000 terms used... 448845 skipped citations... 24470 'social' terms
Document 179000 ... 41 journals... 141558 cited works... 36543 authors... 5000 terms used... 451261 skipped citations... 24595 'social' terms
Document 180000 ... 41 journals... 141582 cited works... 36691 authors... 5000 terms used... 454158 skipped citations... 24732 'social' terms
Document 181000 ... 41 journals... 141589 cited works... 36807 authors... 5000 terms used... 456376 skipped citations... 24833 'social' terms
Document 182000 ... 41 journals... 141596 cited works... 36958 authors... 5000 terms used... 459122 skipped citations... 24971 'social' terms
Document 183000 ... 41 journals... 141603 cited works... 37091 authors... 5000 terms used... 461318 skipped citations... 25090 'social' terms
Document 184000 ... 41 journals... 141607 cited works... 37241 authors... 5000 terms used... 464083 skipped citations... 25236 'social' terms
In [16]:
list(cnt_doc['t'])[:5]
Out[16]:
[('social',), ('played',), ('ever',), ('candidates',), ('money',)]
In [17]:
len([x for x in cnt_doc['t'] if not '-' in x[0]])
Out[17]:
2500
In [18]:
min(list(cnt_doc['t'].values()))
Out[18]:
4
In [19]:
for k,v in cnt_doc.items():
    print(k, len(v))
fj 41
c 141608
fa 37329
t 5000
fy 104
fj.fy 1849
c.fj 463963
c.fy 588668
fy.t 238387
c.t 9661193
fj.t 164358
c.fa 1328474
fa.fj.fy 68269
c.c 1019555
c.c.fy 1122173
t.t 10174838

Save the database

In [20]:
save_cnt("%s.doc"%database_name, cnt_doc)
Saving sociology-jstor-basicall.doc ___ fj
Saving sociology-jstor-basicall.doc ___ c
Saving sociology-jstor-basicall.doc ___ fa
Saving sociology-jstor-basicall.doc ___ t
Saving sociology-jstor-basicall.doc ___ fy
Saving sociology-jstor-basicall.doc ___ fj.fy
Saving sociology-jstor-basicall.doc ___ c.fj
Saving sociology-jstor-basicall.doc ___ c.fy
Saving sociology-jstor-basicall.doc ___ fy.t
Saving sociology-jstor-basicall.doc ___ c.t
Saving sociology-jstor-basicall.doc ___ fj.t
Saving sociology-jstor-basicall.doc ___ c.fa
Saving sociology-jstor-basicall.doc ___ fa.fj.fy
Saving sociology-jstor-basicall.doc ___ c.c
Saving sociology-jstor-basicall.doc ___ c.c.fy
Saving sociology-jstor-basicall.doc ___ t.t
In [21]:
save_cnt("%s.ind"%database_name, cnt_ind)
Saving sociology-jstor-basicall.ind ___ fy
Saving sociology-jstor-basicall.ind ___ c
Saving sociology-jstor-basicall.ind ___ fj
Saving sociology-jstor-basicall.ind ___ fj.fy
Saving sociology-jstor-basicall.ind ___ c.fj
Saving sociology-jstor-basicall.ind ___ c.fy
Saving sociology-jstor-basicall.ind ___ t
Saving sociology-jstor-basicall.ind ___ fy.t
Saving sociology-jstor-basicall.ind ___ c.t
Saving sociology-jstor-basicall.ind ___ fj.t
Saving sociology-jstor-basicall.ind ___ c.fa
Saving sociology-jstor-basicall.ind ___ fa.fj.fy
Saving sociology-jstor-basicall.ind ___ fa
Saving sociology-jstor-basicall.ind ___ c.c
Saving sociology-jstor-basicall.ind ___ c.c.fy
Saving sociology-jstor-basicall.ind ___ t.t
In [ ]: