As distribuições do acaso

Authors

  • F. G. Brieger, Universidade de S. Paulo; Escola Superior de Agricultura Luiz de Queiroz; Seção de Genética

DOI:

https://doi.org/10.1590/S0071-12761945000100011

Abstract

1) The present paper deals with the mathematical basis and the relations of the different chance distributions. It is shown that the concepts of classical statistics may only be applied correctly when dealing with unlimited populations, where the number of variables is so large that it may be considered infinite. After the attempts of LEXIS and HELMERT, a partial solution was found by KARL PEARSON and by STUDENT, until finally R. A. FISHER gave the general solution, solving the problem of statistical analysis in a general form and determining the chance distribution in small samples.

2) As a basis for the formulas, I always use the relative deviate, which may be determined in two ways:
the simple relative deviate: D = (v − v̄)/σ or D = (v̄₁ − v̄₂)/σ
the compound relative deviate: D = ±√(Σ(v − v̄)²/n₁) : σ₀ = σ₁ : σ₀

3) The deviates are always defined by two degrees of freedom, n1 for the dividend and n2 for the divisor. According to the values combined in any given case, we may distinguish four basic chance distributions, which we shall name after their respective authors: the distributions of GAUSS, STUDENT, PEARSON and FISHER. The mathematical definitions and the corresponding degrees of freedom are given both in formulae 3-1 to 3-4 on pg. and in the lower part of Fig. 2. The upper part of Fig. 2 represents these four distributions graphically. The equations and the form of the corresponding curves are discussed in detail.

4) The main differences between the simple and the compound relative deviate are discussed: a) Simple deviates always have a definite sign, being either positive or negative according to the sign of the numerator. Correspondingly, the distributions of GAUSS and STUDENT are symmetrical with regard to the abscissa zero and extend on both sides of it to plus and minus infinity. Compound deviates, on the other hand, have no definite sign, since the numerator is a square root.
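The two kinds of relative deviate defined in point 2 can be illustrated numerically. The following sketch is not from the paper: the sample values and the "ideal" error σ₀ are invented for illustration only.

```python
import numpy as np

# Hypothetical observations (invented for illustration).
v = np.array([9.8, 10.2, 10.5, 9.6, 10.4, 10.1])
v_bar = v.mean()
s = v.std(ddof=1)              # estimated standard deviation of one observation

# Simple relative deviate: D = (v - v_bar) / sigma.
# It keeps the sign of the numerator (here negative, since v[0] < v_bar).
D_simple = (v[0] - v_bar) / s

# Compound relative deviate: a ratio of two standard errors, sigma_1 : sigma_0.
# sigma_0 stands for an "ideal" (or at least better) estimate of the error;
# the value 0.30 is an assumption made for this example.
sigma_0 = 0.30
D_compound = s / sigma_0       # always taken with positive sign

print(D_simple, D_compound)
```

Since D_compound arises from a square root, only its magnitude is meaningful, which is why the PEARSON and FISHER distributions of point 3 run from zero upward.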
The distributions of PEARSON and FISHER, accordingly, are discontinuous at the value zero, and we obtain two identical and independent curves which go from zero to plus infinity, resp. from zero to minus infinity. b) Secondly, when studying simple deviates we admit that both positive and negative large deviates may occur as a consequence of an increase in variability. Consequently we have to use, in the corresponding tests, bilateral limits of probability (Fig. 3). When analysing compound deviates, we are comparing one standard error with another, which may be either an ideal value or at least a better estimate. Admitting that only an increase of variability may occur, we apply in tests based on PEARSON's or FISHER's distributions only the upper (superior) unilateral limit of probability (Fig. 4). The tables thus far published for these distributions contain the unilateral limits only. A more complete table, including bilateral limits, has been computed by myself and is already in press.

5) Discussing the relations of the four distributions, it is shown that mathematically their formulas can easily be transformed from one into the other by changing the respective values of the degrees of freedom. The application of a few principles of mathematics is sufficient, besides remembering that the distributions of PEARSON and FISHER correspond only to half a distribution of STUDENT or GAUSS. Thus it is shown: a) that for n2 bigger than 30, the distribution of STUDENT is so near to that of GAUSS (or normal) as to permit its substitution; b) that for n1 bigger than 30, the distribution of PEARSON becomes almost symmetrical and may be substituted by a modified distribution of GAUSS (or normal) with mean equal to one and standard error 1/√(2n1); c) that the distribution of FISHER with n1 = n2 becomes more or less symmetrical when both reach the limit of 50, or better still 100, and then may be substituted by a modified distribution of GAUSS with mean one and standard error 1/√n1; d) that the distributions of FISHER, when n1 differs from n2, may be substituted either by the corresponding distribution of PEARSON, if n1 is small and n2 bigger than 100, or by a modified distribution of GAUSS with mean unity and standard error equal to √((n1 + n2)/(2 n1 n2)), when n1 goes beyond 50 and n2 beyond 100, or vice versa.

6) The formulas generally given in the literature to characterize the different distributions are far from uniform and use different measures for the abscissa. Thus in tests for FISHER's distribution, the natural logarithm of the deviate was used initially (FISHER's z-test). Later on SNEDECOR (1937) recommended the square of the deviate (F-test) and BRIEGER (1937) the deviate itself (θ-test). In the χ²-test, based on PEARSON's distribution, one generally uses the square of the compound deviate, multiplied by the degrees of freedom n1. The t-test, based on STUDENT's distribution, finally makes use of the simple deviate itself. The inevitable algebraic consequence of this variation in the units of measure is that the severity, and thus the statistical efficiency, of the tests is not comparable. Decimals in the t-test and θ-test correspond to hundreds in the F-test, and to almost anything, depending upon the values of n1, in the χ²-test.

7) In the last chapter a few rather complicated problems are discussed, which can be solved approximately in practical tests, but which are still unsolved from the theoretical point of view. We shall mention here only two of the questions raised: a) When analysing the difference between a variable and its mean (or between a partial mean and a general mean), generally only the standard error of the first term is used, the other being considered a constant: D = (v − v̄)/σ_v. However, with more justification both terms may be considered variable, and thus one should apply the formula: D = (v − v̄)/√(σ_v² − σ_v̄²). The first, simple value of D should follow a distribution of STUDENT, and its analysis thus does not present any difficulties. But in the second form we combine the term v, with standard error σ_v, which should follow STUDENT's distribution, and the mean v̄, with standard error σ_v̄, which generally will follow the distribution of GAUSS. How shall we combine the requirements of these two distributions simultaneously? BEHRENS' test seems to give a solution, which however is not very easy to apply, and which is not sufficient when the second term also follows a distribution of STUDENT, but with a different degree of freedom. b) The second problem arises in connection with the analysis of variance in its most simple form, i.e. the test "within-between". If we want to compare, by a t-test, the partial mean of one sample, v̄_a, with the general mean v̄ of the whole experiment, we must decide whether we should use the standard error of this sample, σ_a, based on n_p degrees of freedom, or the error "within", σ_D, which is a balanced mean value of all the m individual sample errors. At the same time we have an alternative choice with regard to the degrees of freedom:
D = (v̄_a − v̄)/σ_a, with n1 = 1, n2 = n_p, or
D = (v̄_a − v̄)/σ_D, with n1 = 1, n2 = m·n_p.
Thus it is evident that the use of the value σ_D not only alters the value of the relative deviate D, but also the limits of probability to be applied, which depend upon the degrees of freedom. However, we must justify the substitution of the partial error σ_a by the error "within" σ_D, and this should be done by determining whether the ratio σ_a : σ_D is due to chance only, i.e. that there is really no difference between the two errors from a statistical point of view. The necessary test, however: D = σ_a/σ_D, with n1 = n_p, n2 = m·n_p, generally does not allow a very decisive answer, since the degree of freedom n_p is in most cases small.
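The preliminary test of σ_a against σ_D in point 7b can be sketched as follows. The data are random and purely hypothetical; the error "within" is taken, as in the text, as the balanced mean of the m individual sample variances, and the degrees of freedom are written in the modern convention n_p − 1 and m(n_p − 1) rather than the paper's n_p and m·n_p.

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(0)
m, n_p = 5, 6                        # m samples of n_p observations each (illustrative)
samples = rng.normal(10.0, 1.0, size=(m, n_p))

s2 = samples.var(axis=1, ddof=1)     # the m individual sample variances
s2_D = s2.mean()                     # error "within": balanced mean of those variances

a = 0                                # the sample whose error is under test
D = np.sqrt(s2[a] / s2_D)            # compound relative deviate sigma_a : sigma_D
F_obs = D**2                         # the same quantity on SNEDECOR's F scale

# Upper unilateral 5% limit of FISHER's distribution,
# with n1 = n_p - 1 and n2 = m*(n_p - 1) degrees of freedom.
F_crit = f.ppf(0.95, n_p - 1, m * (n_p - 1))
print(F_obs, F_crit)
```

If F_obs stays below F_crit, the ratio σ_a : σ_D may be ascribed to chance and the substitution of σ_a by σ_D is defensible; the small value of n_p − 1 is exactly what makes this test so indecisive in practice.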
Whenever there is some reason to doubt whether the substitution is really justified, it seems to me reasonable to use the probably better estimate σ_D instead of the individual sample error σ_a, while at the same time making allowance for these doubts by not substituting the degrees of freedom: D = (v̄_a − v̄)/σ_D, with n1 = 1, n2 = n_p. A more complete formula naturally would be the following: D = (v̄_a − v̄)/σ_D, with n1 = 1, n2 = ... c) These two examples should be sufficient to show that there are still important theoretical problems to be solved, in spite of the really very considerable progress achieved with regard to the theory and methods of analysis of simple and compound relative deviates from uniform small or large, but always limited, samples.
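The incompatibility of scales described in point 6 reduces to simple algebraic identities: the square of a bilateral t limit equals the unilateral F limit with n1 = 1, and χ² with n1 degrees of freedom equals n1 times F in the limit n2 → ∞. A quick numerical check, using modern scipy rather than the paper's tables:

```python
from scipy.stats import t, chi2, f

n2 = 10
# t and F: the bilateral 5% t limit, squared, is the unilateral 5% F limit with n1 = 1.
t_lim = t.ppf(0.975, n2)
F_lim = f.ppf(0.95, 1, n2)
print(t_lim**2, F_lim)           # these agree

# chi-square and F: chi2 with n1 degrees of freedom is n1 * F as n2 grows without bound.
n1 = 4
chi2_lim = chi2.ppf(0.95, n1)
F_inf = f.ppf(0.95, n1, 10**7)   # a very large n2 stands in for infinity
print(chi2_lim, n1 * F_inf)      # nearly equal
```

This is why decimals on the t scale correspond to much larger steps on the F scale, and to steps depending on n1 on the χ² scale: the three tests measure the same deviate in different units.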


Published

1945-01-01

Issue

Section

Not defined

How to Cite

Brieger, F. G. (1945). As distribuições do acaso. Anais Da Escola Superior De Agricultura Luiz De Queiroz, 2, 321-392. https://doi.org/10.1590/S0071-12761945000100011